The Lightspeed Matlab toolbox

version 2.6

by Tom Minka

This library provides highly optimized versions of primitive functions such as repmat, set intersection, and gammaln. It provides efficient random number generators and evaluation of common probability densities. It provides routines for counting floating-point operations (FLOPS), useful for benchmarking algorithms. There are also some useful utilities such as filename globbing and parsing of variable-length argument lists.

You will be surprised how much faster Matlab runs with the routines in this toolbox. For examples, check out the benchmarks test_repmat.m and test_solve_tri.m. The new repmat(1,m,n) is faster than ones(m,n)!

The library also contains some useful graphics functions such as axis_pct and mobile_text (in a subdirectory).

The toolbox has been tested on all versions of Matlab from 6.5 to 7.12 with Windows XP, Vista, and 7. It has been tested on 32-bit and 64-bit machines, with Microsoft Visual Studio 2008 and 2010 Professional and 2010 Express Editions. It should work on Macs and Linux as well. Most (but not all) functions work with Matlab 6.1 and 5. See the README.txt file for installation instructions.

Alternate download
If the above links don't work, get it at Microsoft research downloads under the name "Fast subroutines for Matlab programs".

Note: For Matlab R2013b and later, you will need to replace install_lightspeed.m with the following:

Further information

See my general tips on accelerating matlab.

Marios Athineos's links to tips and tricks. Lightspeed incorporates many of the tricks described there.

Kevin Murphy has a list of other useful Matlab packages.

Flop counting

Matlab is frequently used for research, so I have included routines that faciliate research. In particular, Lightspeed features a set of routines for accurate floating-point operation (flop) counting. These flop counts allow machine-independent and programmer-independent comparison of numerical algorithms, because they represent the minimal number of operations that the algorithm needs. Consequently, you can compare algorithms based on their Matlab implementations alone, without having to code them efficiently in C and report their run time on a particular processor. Hopefully these routines will allow more informative algorithm comparisons in research papers.

The flop count routines in lightspeed are significantly more accurate than the flops function which was included in Matlab up to version 5. For example, according to Matlab 5, inv(3) requires more flops than 1/3. Matlab 5 also returned slightly incorrect flop counts for matrix multiplies and overly pessimistic flop counts for solving linear systems. Lightspeed always returns the minimal number of flops for matrix operations, as though the best possible algorithm was used, no matter what method you are actually using.

Perhaps the biggest difference between flop counting in lightspeed versus Matlab 5 is the handling of special functions like exp() and sqrt(). Matlab 5 counted these as one flop, which is a gross underestimate. A more accurate count for exp(), based on the Pentium 4 processor, is 40 flops. Similarly, sqrt() is 8 flops. Interestingly, the run time in Matlab 6 for sqrt() is significantly longer than exp(), and even longer than the time for x^(0.51) (raising to an arbitrary power other than 0.5). This emphasizes the importance of measuring the `idealized' run time, represented by flops, rather than the actual run time, which is subject to odd inefficiencies in Matlab (or any programming language).

Flop counting in lightspeed is a more manual process than in Matlab 5. In Matlab 5, the flop counter is incremented automatically after every operation. Lightspeed does not increment the flop counter automatically. Instead, you must specify which operations should have their flops counted. For example, after performing a matrix multiply you should call addflops(flops_mul(...)), and after every Cholesky decomposition you should call addflops(flops_chol(...)). Some operations do not have a dedicated flops routine. For these you should consult the help for flops.

Manual flop counting has two advantages. First, it can be different from the operation that you actually performed, allowing you to count the flops for an `idealized' algorithm rather than the one you implemented. Second, since only the operations that you explicitly count get added to the flop counter, unrelated operations (such as debugging code) will not interfere with the result.

Incrementing the flop counter on every operation can cause your code to run slower. To avoid this, you can batch up the count for many operations. For example, to get the flop count for a loop, you can save time by computing the flops for one iteration of the loop and then multiply by the number of iterations. For examples, see dirichlet_fit_newton.m in fastfit or train_cg.m in my logistic regression library.

Table of contents

% Lightspeed Toolbox.  
% Efficient operations for Matlab programming.
% Version 2.6   05-May-2011
% By Tom Minka
% (c) Microsoft Corporation. All rights reserved. 
% Matrix algebra
%   repmat           - Fast replacement for matlab's repmat.
%   xrepmat          - Matlab's original repmat.
%   setnonzeros      - Fast creation of sparse matrix.
%   row_sum          - Sum for each row.  Faster than 'sum'.
%   scale_rows       - Scale each row of a matrix.
%   scale_cols       - Scale each column of a matrix.
%   solve_triu       - Left division by upper triangular matrix.
%   solve_tril       - Left division by lower triangular matrix.
%   sqdist           - Squared Euclidean and Mahalanobis distance.
%   isposdef         - Check for positive-definiteness.
%   logdet           - log(determinant) for positive definite matrix.
%   cholproj         - Projected Cholesky factorization.
%   inv_posdef       - Invert positive definite matrix.
% Statistics
%   mvnormpdf        - Multivariate normal density.
%   mvnormpdfln      - Log of multivariate normal density.
%   normcdf          - Normal cumulative distribution.
%   normcdfln        - Log of normal cumulative distribution.
%   normcdflogit     - Logit of normal cumulative distribution.
%   invnormcdf       - Normal quantile function.
%   wishpdf          - Wishart probability density function.
%   wishpdfln        - Log of Wishart probability density function.
%   sample           - Sample from categorical distribution.
%   sample_vector    - Sample from multiple categorical distributions.
%   sample_hist      - Sample from multinomial distribution.
%   randbinom        - Sample from binomial distribution.
%   randnorm         - Sample from multivariate normal.
%   randgamma        - Sample from Gamma distribution.
%   randbeta         - Sample from Beta distribution.
%   randwishart      - Sample from Wishart distribution.
%   randomseed       - Get or set the random seed.
%   int_hist         - Histogram of integer values.
% Utility
%   logsumexp        - Sum in the log domain.
%   logmulexp        - Matrix multiply in the log domain.
%   ndgridmat        - Matrix of grid points.
%   ind2subv         - Subscript vector from linear index.
%   subv2ind         - Linear index from subscript vector.
%   gammaln          - Fast replacement for matlab's gammaln.
%   digamma          - Derivative of gammaln.
%   trigamma         - Derivative of digamma.
%   tetragamma       - Derivative of trigamma.
%   ndsum            - Sum over multiple dimensions.
%   ndmax            - Maximum over multiple dimensions.
%   ndlogsumexp      - Sum over multiple dimensions in the log domain.
%   maxdiff          - Maximum difference between structs or arrays.
%   sameobject       - Test if two variables correspond to the same object.
%   find_sameobject  - Find an object in a cell array.
%   toJava           - Convert to Java representation.
%   fromJava         - Convert from Java to Matlab.
%   glob             - Filename expansion via wildcards.
%   globstrings      - String matching via wildcards.
% Argument lists
%   argfilter        - Remove unwanted arguments from a key/value list.
%   makestruct       - Cell-friendly alternative to STRUCT.
%   setfields        - Set multiple fields of a structure.
%   struct2arglist   - Convert structure to cell array of fields/values.
% Mutation
%   mutable          - Convert to a mutable object.
%   immutable        - Convert to an ordinary (immutable) object.
% Set operations
%   ismember_sorted  - True for member of sorted set.
%   match            - Location of matches in a set.
%   match_sorted     - Location of matches in a sorted set.
%   setdiff_sorted   - Set difference between sorted sets.
%   intersect_sorted - Set intersection between sorted sets.
%   union_sorted     - Set union of sorted sets.
%   union_sorted_rows - Set union of sorted sets of row vectors.
%   duplicated       - Find duplicated rows in a matrix.
% Readability
%   rows             - Number of rows.
%   cols             - Number of columns.
%   col_sum          - Sum for each column.
%   setdiag          - Modify the diagonals of a matrix.
%   finddiag         - Index of elements on diagonals.
%   argmax           - Index of maximum element.
%   argmin           - Index of minimum element.
% Flop counting
%   flops            - Read/write flop counter.
%   addflops         - Add to flop counter.
%   flops_chol       - Flops for Cholesky decomposition.
%   flops_col_sum    - Flops for column sums.
%   flops_det        - Flops for matrix determinant.
%   flops_digamma    - Flops for gammaln, digamma, and trigamma.
%   flops_div        - Flops for division.
%   flops_exp        - Flops for exponential.
%   flops_inv        - Flops for matrix inversion.
%   flops_mul        - Flops for real matrix multiplication.
%   flops_normpdfln  - Flops for normpdfln.
%   flops_pow        - Flops for raising to real power.
%   flops_randnorm   - Flops for randnorm.
%   flops_row_sum    - Flops for row sums.
%   flops_sample     - Flops for sample(p,n).
%   flops_solve      - Flops for matrix left division.
%   flops_solve_tri  - Flops for triangular left division.
%   flops_spadd      - Flops for sparse matrix addition.
%   flops_spmul      - Flops for sparse matrix multiplication.
%   flops_sqrt       - Flops for square root.
% Stand alone programs
%   matfile          - Read/write MAT files.
%   tests/test_flops - Compare time versus flops for various math operations.
% Graphics utilities
%  see graphics/Contents.m
% Demos
%   tests/test_repmat,
%   tests/test_solve_tri, ...

Changes for 2.6

Updated the installer to handle MacOSX 10.6 and older versions of matlab (all the way back to 7.0.1).

Changes for 2.5

Updated the installer to handle Microsoft Visual Studio 2010 Express with Windows SDK 7.1. All mex files now use -largeArrayDims. Fixed a bug in multivariate gammaln (used by wishpdf).

Changes for 2.4

Lightspeed's normpdf is now called mvnormpdf, for compatibility with the statistics toolbox. Added invnormcdf. Updated the installer.

Changes for 2.3

Support for Matlab R2009a and 64-bit architectures. install_lightspeed is more robust to different matlab versions and platforms (pc, mac, unix). setnonzeros and sparse_nocheck are new functions for quickly filling in a sparse matrix. Flop counting is more accurate in some cases. Added flops_lu and flops_abs. Fixed a bug in union_rows_sorted. Added the tetragamma function.

Changes for 2.2

The installer now supports Matlab 7.5. randomseed is a new function that allows you to set the seed for lightspeed's random number generator. cholproj is a new function that provides an approximate cholesky decomposition even when a matrix is not exactly positive definite, for example due to roundoff errors. The tests have been moved to a subdirectory. More bug-fixes, thanks to people who have sent in bug reports.

Changes for 2.1

graphics/hhist has been greatly improved, and now has the same interface as hist. You'll never have to use hist again! normcdflogit is a new function that provides accurate evaluation of the normal CDF in the tails. A variety of bug-fixes, thanks to people who have sent in bug reports.

Changes for 2.0

Many new functions have been added, including sorted-set functions and mutation. Graphics functions have been moved to a subdirectory. The formulas for flop counting have been changed, to better reflect the Pentium 4 architecture. In particular, note the addition of flops_div (division is 8 flops on a Pentium 4), flops_sqrt, and flops_log.

Changes for 1.3

The main change from version 1.2 is that tri_inv_times has been renamed to solve_triu and solve_tril, and made more efficient. The tricky part is that these have to be linked against Matlab's internal BLAS library.

The mex files have also been changed to use the Matlab 5 API.

Tom Minka