Making trackpy fast(er)
=======================

This notebook explains specific features built into ``trackpy`` to
boost performance, and it also provides some general suggestions on how
to make your particle tracking run faster.

First, set up matplotlib...

.. code:: python

    %matplotlib inline
    import matplotlib
    from matplotlib.pyplot import imshow

Let's check the ``trackpy`` version. If it's not 0.3.0 or greater, the
following *may* not work.

.. code:: python

    import trackpy
    trackpy.__version__



.. parsed-literal::

    '0.2.3-296-g3e96836'


General advice
--------------

Subnetworks
~~~~~~~~~~~

As we'll see in the examples below, the need to resolve "subnetworks" —
in which multiple neighboring particles in one frame could correspond
to multiple particles in the next — can drastically slow down tracking.
There is a separate tutorial on dealing with (excessive) subnetworks.
In the text below, we'll look at some methods for finding out just how
much of a problem subnetworks are in your application, so that you can
weigh the various remedies against a potential loss of accuracy.

Feature-finding: The best advice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Judicious use of the ``minmass`` and/or ``threshold`` parameters during
feature-finding, as described in the "walkthrough" tutorial notebook,
can considerably speed up feature-finding, linking, and analysis of
tracks. The best way to speed up a computation is usually to avoid
doing it at all.

Feature-finding: The details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``locate()`` and ``batch()`` functions for feature-finding can use
either of two engines for their work, as specified by the optional
``engine`` parameter:

- The ``numba`` engine is reasonably fast. It is the default when numba
  is installed (see below).
- The ``python`` engine is currently slightly more accurate, but
  considerably slower.

In addition, both engines do some extra work that you might not always
need:

- They iteratively try to refine the position of the feature centroid,
  for the best possible subpixel accuracy. This can be turned off with
  the parameter ``max_iterations=0``.
- They compute extra data about the feature, such as its eccentricity.
  This can be turned off with ``characterize=False``.

**Note:** Before sacrificing accuracy for performance, it's best to
know objectively how much performance you are gaining. Read on to the
Advanced Topics section for information about how to time
feature-finding.

Using numba for acceleration
----------------------------

Two key algorithms in ``trackpy`` can be sped up by the
`numba <http://numba.pydata.org/>`__ array-based just-in-time compiler
for Python:

- Iteratively refining the centroids of features, for better subpixel
  resolution.
- Resolving subnetworks, as part of the linking process.

The source code actually contains distinct Python and numba versions of
each — and not one numba version but several, each optimized for a
different use, and for 2D and 3D images. The default setting is to use
the appropriate numba version when possible, but to fall back on the
general-purpose Python version if numba is unavailable. You can also
control these choices manually, using the ``engine`` argument to
``locate()``, and the ``link_strategy`` argument when linking.

If you don't have ``numba``, we recommend using the Anaconda or Canopy
scientific Python distributions, which include ``numba`` and many other
useful packages. Note that ``numba`` can be difficult to install
outside of these distributions.
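For concreteness, here is a minimal sketch (not part of the original
notebook) of selecting these implementations by hand. The ``engine``
and ``link_strategy`` arguments are the ones described above; the
specific option value ``'numba'`` for ``link_strategy`` is an
assumption, so check the docstrings for the accepted choices.

.. code:: python

    import trackpy as tp
    import pims

    # Same sample movie as the walkthrough tutorial.
    frames = pims.open('../sample_data/bulk_water/*.png', as_grey=True)

    # Feature-finding with an explicit engine choice.
    f_numba = tp.locate(frames[0], 11, minmass=100, invert=True, engine='numba')
    f_python = tp.locate(frames[0], 11, minmass=100, invert=True, engine='python')

    # Linking with an explicit subnet-resolution strategy (option name assumed).
    f = tp.batch(frames[:10], 11, minmass=100, invert=True)
    t = tp.link_df(f, 5, memory=3, link_strategy='numba')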
Theoretical performance diagnostics: ``performance_report()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``trackpy`` provides an easy way to check whether your system is
configured for the best performance. Currently, it reports only whether
``numba`` is working.

.. code:: python

    import trackpy.diag
    trackpy.diag.performance_report()


.. parsed-literal::

    Yes, but could it be faster?
    FAST: numba is available and enabled (fast subnets and feature-finding).


Advanced topics
---------------

Empirical performance diagnostics: Timing and profiling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What follows is a crash course in how to evaluate the performance of
tracking. The general suggestions given above are good starting points,
but they are just that — general suggestions. For your *specific*
application, you need *specific* data about how long the computation
takes, and how the computer is spending its time. This information may
help you make your own adjustments to your tracking method, or it may
help you when asking the ``trackpy`` developers for assistance (chances
are we'll be happy to help!).

You will use these techniques with your own data, but for this example,
let's use the data from the "walkthrough" tutorial.

Setup
~~~~~

.. code:: python

    import trackpy as tp
    import pims

.. code:: python

    frames = pims.open('../sample_data/bulk_water/*.png', as_grey=True)

The speed at which we can read images from disk is not something we can
hope to improve much, so let's get that out of the way before we do our
tests. Also, we don't need to use all the frames for this example
(though you're welcome to).

.. code:: python

    # This gets the image data out of the pims ImageSequence object,
    # and into a good old list in memory.
    frames_preload = list(frames[:50])

Just so we don't forget what these images look like:

.. code:: python

    imshow(frames_preload[0], cmap=matplotlib.cm.gray)


.. image:: performance_files/performance_14_1.png

Before we identify features, we need to give ``trackpy`` a chance to
"warm up" — initialize ``fftw`` and compile certain functions, if it
hasn't done so already. So we'll find features in just 3 frames:

.. code:: python

    f2 = tp.batch(frames_preload[:3], 11, minmass=100, invert=True)
    _ = tp.batch(frames_preload[:3], 11, minmass=100, invert=True, characterize=False)


.. parsed-literal::

    Frame 2: 506 features


Timing
~~~~~~

Let's find out — precisely — how long it takes to find features in a
single frame. You'll notice that the following cell begins with the
line ``%%timeit``. This is an example of a *cell magic*, which gives
IPython special instructions about how to run the rest of the cell.

.. code:: python

    %%timeit
    f1 = tp.batch(frames_preload[:1], 11, minmass=100, invert=True)


.. parsed-literal::

    Frame 0: 500 features
    10 loops, best of 3: 89.7 ms per loop


``%%timeit`` runs the same code multiple times and returns a somewhat
accurate measurement of execution time. We can use it to see how much
faster we can find features in this image when we turn off centroid
refinement and feature characterization:

.. code:: python

    %%timeit
    f1 = tp.batch(frames_preload[:1], 11, minmass=100, invert=True,
                  max_iterations=0, characterize=False)


.. parsed-literal::

    Frame 0: 499 features
    10 loops, best of 3: 78.8 ms per loop


One thing to note is that the ratio of features to pixels is rather low
in this image. If you were finding features in a dense packing of
particles, the timing difference could be much more significant.
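As an aside (not in the original notebook), if you are timing
``trackpy`` from a plain Python script where IPython's ``%%timeit``
magic is unavailable, the standard-library ``timeit`` module gives a
rough equivalent. This sketch assumes ``tp`` and ``frames_preload`` are
defined as in the cells above.

.. code:: python

    import timeit

    # Time ten repetitions of the same call and report the mean per-call cost.
    elapsed = timeit.timeit(
        lambda: tp.batch(frames_preload[:1], 11, minmass=100, invert=True),
        number=10)
    print('%.1f ms per call' % (1000 * elapsed / 10))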
Profiling
~~~~~~~~~

Next, we'll try to get a sense of what's taking so long. The ``%%prun``
cell magic instructs IPython to run the rest of the cell's code using a
*profiler*.

.. code:: python

    %%prun
    f = tp.batch(frames_preload, 11, minmass=100, invert=True)


.. parsed-literal::

    Frame 49: 522 features


When execution is complete, a pager will pop up in the lower portion of
the notebook window. This is the profiler output. We reproduce a
portion of it here. Your results will depend on your data and the
computer you are using.

::

             123225 function calls (121986 primitive calls) in 4.480 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           50    2.709    0.054    2.709    0.054 {scipy.ndimage._nd_image.min_or_max_filter}
           50    0.481    0.010    0.481    0.010 {scipy.ndimage._nd_image.binary_erosion}
          100    0.299    0.003    0.299    0.003 {scipy.ndimage._nd_image.correlate1d}
          100    0.205    0.002    0.205    0.002 {scipy.ndimage._nd_image.uniform_filter1d}
          100    0.128    0.001    0.128    0.001 {method 'nonzero' of 'numpy.ndarray' objects}
           50    0.079    0.002    0.164    0.003 feature.py:103(refine)
          100    0.078    0.001    0.078    0.001 {numpy.core.multiarray.where}
          100    0.053    0.001    0.053    0.001 {method 'query_pairs' of 'scipy.spatial.ckdtree.cKDTree' objects}
           50    0.036    0.001    0.607    0.012 preprocessing.py:14(bandpass)
           50    0.033    0.001    0.195    0.004 feature.py:27(percentile_threshold)
           50    0.032    0.001    2.989    0.060 feature.py:36(local_maxima)
           50    0.029    0.001    0.526    0.011 uncertainty.py:12(measure_noise)

This is a record of how the computer spent its time while finding
features. The various numbers are explained in the `documentation for
the Python profiler module
<https://docs.python.org/3/library/profile.html>`__. You shouldn't try
to understand every line, but you can get some idea of which functions
are called once for each frame (call counts that are multiples of 50),
and which are called for each particle. Note that ``{functions in
braces}`` are written in C, not Python, are not part of ``trackpy``,
and tend to be less user-serviceable.
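If you are not working in IPython, a comparable report can be collected
with the standard-library profiler. This is a minimal sketch (not part
of the original notebook); it assumes ``tp`` and ``frames_preload`` are
defined at the top level of your script, as above.

.. code:: python

    import cProfile
    import pstats

    # Profile the same batch() call, save the statistics to a file,
    # and print the ten most expensive functions by internal time.
    cProfile.run("f = tp.batch(frames_preload, 11, minmass=100, invert=True)",
                 "batch_profile")
    pstats.Stats("batch_profile").sort_stats("tottime").print_stats(10)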
Now let's look at the linking step. Once again, we'll prime
``trackpy``, to avoid distorting our results with one-time setup code:

.. code:: python

    # Use a large search_range, to be sure that numba subnet code is compiled and used.
    t1 = tp.link_df(f2, 10, memory=3)


.. parsed-literal::

    Frame 2: 506 trajectories present


.. code:: python

    %%prun
    t = tp.link_df(f, 5, memory=3)


.. parsed-literal::

    Frame 49: 522 trajectories present

::

             770140 function calls (766921 primitive calls) in 1.652 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           51    0.229    0.004    1.180    0.023 linking.py:926(link)
          149    0.140    0.001    0.322    0.002 {map}
    28925/28825    0.096    0.000    0.099    0.000 {numpy.core.multiarray.array}
           49    0.082    0.002    0.240    0.005 linking.py:1182(assign_candidates)
           49    0.077    0.002    0.080    0.002 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
           49    0.073    0.001    0.118    0.002 linking.py:1069(_assign_links)
        25945    0.064    0.000    0.179    0.000 linking.py:368(__init__)
        27415    0.048    0.000    0.117    0.000 numeric.py:394(asarray)
        25945    0.042    0.000    0.042    0.000 linking.py:300(__init__)
        54280    0.040    0.000    0.051    0.000 {method 'sort' of 'list' objects}
           50    0.039    0.001    0.039    0.001 {pandas.lib.clean_index_list}
            1    0.037    0.037    1.652    1.652 linking.py:470(link_df)
           98    0.031    0.000    0.097    0.001 linking.py:32(rebuild)
          300    0.024    0.000    0.043    0.000 common.py:303(_isnull_ndarraylike)
        54115    0.022    0.000    0.022    0.000 linking.py:345(track)
        54280    0.021    0.000    0.021    0.000 linking.py:47()
          101    0.021    0.000    0.021    0.000 {method 'get_indexer' of 'pandas.index.IndexEngine' objects}
       112445    0.018    0.000    0.018    0.000 {method 'append' of 'list' objects}
         1265    0.018    0.000    0.018    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        25945    0.018    0.000    0.030    0.000 linking.py:196(add_point)
    89361/86242    0.017    0.000    0.019    0.000 {len}
          136    0.016    0.000    0.021    0.000 linking.py:1375(numba_link)

This goes much faster than feature identification, though to be fair,
we have far, far more pixels than particles. Note the calls to
``numba_link`` at the end of the report; this is the subnet resolution
code referred to above. If you require a large ``search_range``, these
computations can dramatically slow down linking. As it is, they are
under control.

``pyfftw`` and the frequency-domain bandpass
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``trackpy`` performs bandpass filtering to prepare an image for
feature-finding. As of version 0.3, this is done with convolution;
formerly, it was done using fast Fourier transforms (FFT). The two
methods yield very close results, but the convolution method is roughly
twice as fast.

Because the two methods are *not* mathematically identical, ``trackpy``
still includes the FFT method in case you would like to use it (or just
compare its results). This method is not available as part of
``locate()``; you must perform the bandpass separately and then call
``locate()`` with ``preprocess=False``, as sketched below. You can
perform the FFT bandpass with one of two functions:

- ``trackpy.preprocessing.legacy_bandpass()`` uses the FFT functions
  built into numpy.
- ``trackpy.preprocessing.legacy_bandpass_fftw()`` uses the FFTW
  library, an exceptionally fast implementation of FFT. ``pyfftw`` is
  the module that lets Python programs use it. Like numba, it is
  optional, but it can speed up the bandpass operation by a factor of
  ~3. (The default method now in ``trackpy`` is faster still.)

You should be able to install FFTW from your Linux package manager, or
on the Mac using `Homebrew <http://brew.sh>`__ (with the command
``brew install fftw``). Once you've done that, ``pyfftw`` itself should
install with ``pip install pyfftw``.
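To make that workflow concrete, here is a minimal sketch (not from the
original notebook) of applying the legacy FFT bandpass by hand and then
skipping ``locate()``'s built-in preprocessing. The length-scale
arguments passed to ``legacy_bandpass()`` (noise size 1, feature size
11) and the manual inversion are assumptions chosen to mirror the
walkthrough data; check the function's docstring before relying on
them.

.. code:: python

    import trackpy as tp
    from trackpy.preprocessing import legacy_bandpass  # or legacy_bandpass_fftw

    # The walkthrough images have dark features on a bright background, so
    # invert by hand here, since locate()'s preprocessing is bypassed.
    raw = frames_preload[0]
    inverted = raw.max() - raw

    # FFT-based bandpass, then feature-finding with preprocessing disabled.
    filtered = legacy_bandpass(inverted, 1, 11)
    f_fft = tp.locate(filtered, 11, minmass=100, preprocess=False)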
Neighbor-finding strategy: KDTree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An important operation in tracking (i.e. linking) is finding all the
particles that are close to a certain point in space. ``trackpy`` has
two ways of doing this:

- A KDTree, which handles only the more standard cases (e.g. Euclidean
  geometries) but is written in C. (This is the default.)
- A hash table written in pure Python. It is very extensible but tends
  to be slower.

The method is set by the ``neighbor_strategy`` argument to the various
linking functions, which again defaults to the KDTree. The KDTree is a
standard part of ``scipy``; if you can run ``trackpy`` at all, it is
enabled.

--------------

Well, I hope you've enjoyed this brief tour of ``trackpy`` performance.
Just remember that if you come up with new ways to improve performance,
the ``trackpy`` community will be interested!