Walkthrough =========== This notebook follows an example project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water. At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the `full list of example notebooks `__ to learn more. Scientific IPython Setup ------------------------ We need Python's plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I'll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification. .. code:: python from __future__ import division, unicode_literals, print_function # for compatibility with Python 2 and 3 import matplotlib as mpl import matplotlib.pyplot as plt # the following line only works in an IPython notebook %matplotlib notebook # Optionally, tweak styles. mpl.rc('figure', figsize=(10, 6)) mpl.rc('image', cmap='gray') We also might want to use scientific Python libraries. Finally, we'll import ``trackpy`` itself and its sister project, ``pims``. .. code:: python import numpy as np import pandas as pd from pandas import DataFrame, Series # for convenience import pims import trackpy as tp We use the alias ``tp`` for brevity. Step 1: Read the Data --------------------- Opening images or video ~~~~~~~~~~~~~~~~~~~~~~~ To get our data into Python, we use our sister project, `PIMS `__ (Python Image Sequence). PIMS makes is easy and convenient to load and process video data from many formats with one consistent interface. You can read in: \* a directory or zipfile of sequential images using ``ImageSequence`` \* a multi-frame TIFF file using ``TiffStack`` \* a video (AVI, MOV, etc.) using ``Video`` \* specialty formats used in microscopy and scientific video capture: \* ``Cine`` \* ``LSM`` \* Files supported by `Bioformats `__ \* ``ND2`` using `PIMS\_ND2 `__ ``ImageSequence`` works out of the box, so we recommended trying that to start. The others require some extra dependencies. See the README page. .. code:: python frames = pims.ImageSequence('../sample_data/bulk_water/*.png', as_grey=True) .. code:: python frames .. parsed-literal:: Source: /Users/dallan/Documents/Repos/trackpy-examples/sample_data/bulk_water/*.png Length: 300 frames Frame Shape: 424 x 640 Pixel Datatype: uint8 We can access any frame like ``frames[frame_number]``. The image is represented as a numpy array of intensities. .. code:: python print(frames[0]) # the first frame .. parsed-literal:: [[124 124 124 ..., 119 120 121] [124 124 124 ..., 119 121 121] [124 124 123 ..., 120 122 123] ..., [125 126 125 ..., 107 97 96] [125 125 124 ..., 115 108 105] [125 125 124 ..., 123 118 116]] In an IPython notebook, the frame is represented by displaying the image. .. code:: python frames[0] .. raw:: html

Alternatively, to make a proper plot with axes and control over scaling, use matplotlib. .. code:: python plt.imshow(frames[0]) .. parsed-literal:: .. raw:: html

.. parsed-literal:: Frames behave like numpy arrays with a few extra properties. .. code:: python frames[123].frame_no .. parsed-literal:: 123 .. code:: python frames[123].metadata # Scientific formats can pass experiment meta data here. .. parsed-literal:: {} Step 2: Locate Features ----------------------- Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we'll see below. I estimate 11 pixels. .. code:: python f = tp.locate(frames[0], 11, invert=True) The algorithm looks for *bright* features; since my features are dark, I set ``invert=True``. ``locate`` returns a spreadsheet-like object called a DataFrame. It lists - each feature's position, - various characterizations of its appearance, which we will use to filter out spurious features, - the "signal" strength and an estimate of uncertainty, both derived from this paper You can read more about DataFrames in the `pandas documentation `__. They can easily be exported to formats like CSV, Excel, SQL, HDF5, etc. .. code:: python f.head() # shows the first few rows of data .. raw:: html

	x	y	mass	size	ecc	signal	raw_mass	ep
0	294.900375	5.611320	345.949923	2.558583	0.186067	17.881905	10719	0.089503
1	310.237439	6.915316	246.550508	3.215075	0.008316	5.494440	10491	0.154562
2	36.036426	8.142955	436.058739	2.927963	0.116840	12.587263	11232	0.045968
3	67.322668	7.721634	405.889269	2.834555	0.065217	13.885948	11000	0.058931
4	274.313019	8.450485	288.508050	2.819906	0.161817	9.190700	10594	0.116354

.. code:: python plt.figure() # make a new figure tp.annotate(f, frames[0]) .. parsed-literal:: .. raw:: html

.. parsed-literal:: Refine parameters to elminate spurious features ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren't actually particles. There are many ways to distinguish real particles from spurrious ones. The most important way is to look at total brightness ("mass"). .. code:: python fig, ax = plt.subplots() ax.hist(f['mass'], bins=20) # Optionally, label the axes. ax.set(xlabel='mass', ylabel='count') .. parsed-literal:: .. raw:: html

.. parsed-literal:: [, ] .. code:: python f = tp.locate(frames[0], 11, invert=True, minmass=200) .. code:: python plt.figure() tp.annotate(f, frames[0]) .. parsed-literal:: .. raw:: html

.. parsed-literal:: There are more options for controling and optimizing feature-finding. You can review them in the `documentation `__. Or, pull them up as you work by typing ``tp.locate?`` into IPython. Check for subpixel accuracy ~~~~~~~~~~~~~~~~~~~~~~~~~~~ As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal part of the x and/or y positions are evenly distributed. Trackpy provides a convenience plotting function for this: .. code:: python plt.figure() tp.subpx_bias(f) .. parsed-literal:: .. raw:: html

.. parsed-literal:: /Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/pandas/tools/plotting.py:3235: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared "is being cleared", UserWarning) .. parsed-literal:: Locate features in all frames ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Or, to start, just explore a subset of the frames. +-----------------------------+---------------------------------------+ | selection | syntax example | +=============================+=======================================+ | all the frames | ``frames[:]`` or simply ``frames``. | +-----------------------------+---------------------------------------+ | the first 10 frames | ``frames[:10]`` | +-----------------------------+---------------------------------------+ | the last 10 frames | ``frames[-10:]`` | +-----------------------------+---------------------------------------+ | a range of frames | ``frames[100:200]`` | +-----------------------------+---------------------------------------+ | every 10th frame | ``frame[::10]`` | +-----------------------------+---------------------------------------+ | a list of specific frames | ``frames[[100, 107, 113]]`` | +-----------------------------+---------------------------------------+ We'll locate features in the first 300 frames from this video. We use ``tp.batch``, which calls ``tp.locate`` on each frame and collects the results. .. code:: python f = tp.batch(frames[:300], 11, minmass=200, invert=True) .. parsed-literal:: Frame 299: 461 features As each frame is analyzed, ``tp.batch`` reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters. Step 3: Link features into particle trajectories ------------------------------------------------ We have the locations of the particles in each frame. Next we'll track particles from frame to frame, giving each one a number for identification. First, we must must specify a maximum displacement, the farthest a particle can travel between frames. We should choose the smallest reasonable value because a large value slows computation time considerably. In this case, 5 pixels is reasonable. Second, we allow for the possibility that a particle might be missed for a few frames and then seen again. (Perhaps its "mass" slipped below our cutoff due to noise in the video.) Memory keeps track of disappeared particles and maintains their ID for up to some number of frames after their last appearance. We'll choose 3. .. code:: python t = tp.link_df(f, 5, memory=3) .. parsed-literal:: Frame 299: 461 trajectories present The result is the features DataFrame ``f`` with an additional column, ``particle``, identifying each feature with a label. .. code:: python t.head() .. raw:: html

	x	y	mass	size	ecc	signal	raw_mass	ep	frame
0	294.900375	5.611320	345.949923	2.558583	0.186067	17.881905	10719	0.089503	0
1	295.360118	5.316313	388.202575	2.524643	0.214783	18.883275	10786	0.078429	1
2	295.970101	5.627616	373.436505	2.630296	0.189075	17.458842	10710	0.088747	2
3	295.734210	5.843516	392.386204	2.617637	0.187835	18.511646	10750	0.080728	3
4	295.686010	5.810687	375.745878	2.543898	0.200468	18.458071	10737	0.083059	4

Filter spurious trajectories. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We have more filtering to do. Empheremeral trajectories -- seen only for a few frames -- are usually spurious and never useful. The convenience function ``filter_stubs`` keeps only trajectories that last for a given number of frames. .. code:: python t1 = tp.filter_stubs(t, 50) # Compare the number of particles in the unfiltered and filtered data. print('Before:', t['particle'].nunique()) print('After:', t1['particle'].nunique()) .. parsed-literal:: Before: 4195 After: 828 We can also filter trajectories by their appearance. At this stage, with trajectories linked, we can look at a feature's "average appearance" throughout its trajectory, giving a more accurate picture. .. code:: python plt.figure() tp.mass_size(t1.groupby('particle').mean()) # convenience function -- just plots size vs. mass .. parsed-literal:: .. raw:: html

.. parsed-literal:: The particles with especially low mass or especially large size are probably out of focus or aggregated, respectively. It is best to experiment by trial and error, filtering out regions of mass-size space and looking at the results using ``tp.annotate`` and ``tp.circle``. In the end, we need to separate the good particles from the spurious ones, and it doesn't matter how we get it done. .. code:: python condition = lambda x: ((x['mass'].mean() > 250) & (x['size'].mean() < 3.0) & (x['ecc'].mean() < 0.1)) t2 = tp.filter(t1, condition) # a wrapper for pandas' filter that works around a bug in v 0.12 .. code:: python plt.figure() tp.annotate(t2[t2['frame'] == 0], frames[0]) .. parsed-literal:: .. raw:: html

.. parsed-literal:: Trace the trajectories. .. code:: python plt.figure() tp.plot_traj(t1) .. parsed-literal:: .. raw:: html

.. parsed-literal:: Remove overall drift ~~~~~~~~~~~~~~~~~~~~ Compute the overall drifting motion, which we will subtract away, adopting the reference frame of the particles' average position. .. code:: python d = tp.compute_drift(t1) .. code:: python plt.figure() d.plot() .. parsed-literal:: .. raw:: html

.. parsed-literal:: .. raw:: html

.. parsed-literal:: .. code:: python tm = tp.subtract_drift(t1, d) With the overall drifting motion subtracted out, we plot the trajectories again. We observe nice random walks. .. code:: python plt.figure() tp.plot_traj(tm) .. parsed-literal:: .. raw:: html

.. parsed-literal:: Step 4: Analyze trajectories ---------------------------- Mean Squared Displacement of Individal Probes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compute the mean squared displacement of each particle and plot MSD vs. lag time. .. code:: python im = tp.imsd(tm, 100/285., 24) # microns per pixel = 100/285., frames per second = 24 .. code:: python fig, ax = plt.subplots() ax.plot(im.index, im, 'k-', alpha=0.1) # black lines, semitransparent ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]', xlabel='lag time $t$') ax.set_xscale('log') ax.set_yscale('log') .. parsed-literal:: .. raw:: html

Since we only analyzed 300 frames, the statistics are poor at large lag times. With more frames, we can study larger lag times. Ensemble Mean Squared Displacement ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python em = tp.emsd(tm, 100/285., 24) .. code:: python fig, ax = plt.subplots() ax.plot(em.index, em, 'o') ax.set_xscale('log') ax.set_yscale('log') ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]', xlabel='lag time $t$') ax.set(ylim=(1e-2, 10)) .. parsed-literal:: .. raw:: html

.. parsed-literal:: [(0.01, 10)] We can easily fit this ensemble mean-squared displacement to a power law, :math:`At^n`, using a convenience function, ``fit_powerlaw``, that performs a linear regression in log space. .. code:: python plt.figure() tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]') plt.xlabel('lag time $t$') .. parsed-literal:: .. raw:: html

.. parsed-literal:: /Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/matplotlib/artist.py:221: MatplotlibDeprecationWarning: This has been deprecated in mpl 1.5, please use the axes property. A removal date has not been set. warnings.warn(_get_axes_msg, mplDeprecation, stacklevel=1) .. parsed-literal:: In water, a viscous material, the expected power-law exponent :math:`n = 1`. The value of :math:`A = 4D`, where :math:`D` is the particles' diffusivity. :math:`D` is related to viscosity :math:`\eta`, particle radius :math:`a`, and temperature :math:`T` as :math:`D = \displaystyle\frac{kT}{6\pi\eta a}`. For particles with a 1 :math:`\mu\text{m}` diameter in room-temperature water, :math:`A \approx 1.66`. Our value above is not far off. **This is the end of the walkthrough. Keep reading to review the more advanced capabilities in trackpy.** Preview of Advanced Features ---------------------------- These are covered in greater detail in later tutorials. Streaming --------- The feature-finding and trajectory-linking functions ``batch`` and ``link_df`` keep all of their results in memory. This approach is simple, but it isn't necessary. We can prcoess an unlimited number of frames if we save the results as we go. Trackpy includes a class to manage storing an retrieving data framewise in an HDF5 format. The general idea is easily extensive to other formats. .. code:: python with tp.PandasHDFStore('data.h5') as s: tp.batch(frames, 11, invert=True, minmass=200, output=s) .. parsed-literal:: Frame 299: 461 features .. code:: python with tp.PandasHDFStore('data.h5') as s: for linked in tp.link_df_iter(s, 5, memory=3): s.put(linked) .. parsed-literal:: Frame 299: 461 trajectories present You can get results by frame with ``s.get(frame_number)`` or, when you have sufficient memory, retrieve them all. The results is identifical to what you would obtained using the basic functions ``batch`` and ``link_df``. .. code:: python with tp.PandasHDFStore('data.h5') as s: trajectories = pd.concat(iter(s)) Optional High Performance Component: Numba ------------------------------------------ The core, time-consuming steps in particle location and linking are implemented in Python/numpy and also in pure Python optimized for numba. If numba is installed, trackpy will detect it and use it by default. You can switch it on and off to compare performance. .. code:: python %timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='numba') .. parsed-literal:: Frame 9: 469 features 1 loops, best of 3: 1.3 s per loop .. code:: python %timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='python') .. parsed-literal:: Frame 9: 463 features 1 loops, best of 3: 11.8 s per loop The linking functions ``link_df`` or ``link_df_iter`` support various options for ``link_strategy``, one of which is numba-based. Read the their docstrings for details. Parellelization --------------- Feature-finding can easily be parallelized: each frame an independent task, and the tasks can be divided among the available CPUs. IPython parallel makes this very straightforward. It is simplest to try this on the CPUs of the local machine. To do this from an IPython notebook, go to File > Open, and click the "Clusters" tab. Fill in the "engines" field with the number of available CPUs (e.g., 4) and click start. Now you are running a cluster -- it's that easy. More information on IPython parallel is available in `this section of the IPython documentation `__. .. code:: python from ipyparallel import Client client = Client() view = client.load_balanced_view() client[:] .. parsed-literal:: .. code:: python curried_locate = lambda image: tp.locate(image, 11, invert=True, minmass=200) .. code:: python %%px import trackpy as tp .. code:: python %%timeit amr = view.map_async(curried_locate, frames[:32]) amr.wait_interactive() .. parsed-literal:: 32/32 tasks finished after 2 s done 1 loops, best of 3: 2.14 s per loop .. code:: python %%timeit serial_result = list(map(curried_locate, frames[:32])) .. parsed-literal:: 1 loops, best of 3: 4.51 s per loop