Walkthrough

This notebook follows an example project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water.

At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the full list of example notebooks to learn more.

Scientific IPython Setup

We need Python’s plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I’ll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification.

from __future__ import division, unicode_literals, print_function  # for compatibility with Python 2 and 3

import matplotlib as mpl
import matplotlib.pyplot as plt

# the following line only works in an IPython notebook
%matplotlib notebook

# Optionally, tweak styles.
mpl.rc('figure',  figsize=(10, 6))
mpl.rc('image', cmap='gray')

We might also want some standard scientific Python libraries: numpy and pandas. Finally, we’ll import trackpy itself and its sister project, pims.

import numpy as np
import pandas as pd
from pandas import DataFrame, Series  # for convenience

import pims
import trackpy as tp

We use the alias tp for brevity.

Step 1: Read the Data

Opening images or video

To get our data into Python, we use our sister project, PIMS (Python Image Sequence). PIMS makes it easy and convenient to load and process video data from many formats with one consistent interface.

You can read in:

  • a directory or zipfile of sequential images using ImageSequence
  • a multi-frame TIFF file using TiffStack
  • a video (AVI, MOV, etc.) using Video
  • specialty formats used in microscopy and scientific video capture:
      • Cine
      • LSM
      • files supported by Bioformats
      • ND2 using PIMS_ND2

ImageSequence works out of the box, so we recommend trying that to start. The others require some extra dependencies; see the README page.
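
The other readers follow the same pattern. The calls below are only an illustrative sketch (the filenames are placeholders, and each reader needs its own optional dependency installed); they are not part of this example:

# tiff_frames = pims.TiffStack('stack.tif')    # a multi-frame TIFF
# video_frames = pims.Video('movie.avi')       # AVI, MOV, etc.
# cine_frames = pims.Cine('highspeed.cine')    # Vision Research Cine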

frames = pims.ImageSequence('../sample_data/bulk_water/*.png', as_grey=True)
frames
<Frames>
Source: /Users/dallan/Documents/Repos/trackpy-examples/sample_data/bulk_water/*.png
Length: 300 frames
Frame Shape: 424 x 640
Pixel Datatype: uint8

We can access any frame like frames[frame_number]. The image is represented as a numpy array of intensities.

print(frames[0])  # the first frame
[[124 124 124 ..., 119 120 121]
 [124 124 124 ..., 119 121 121]
 [124 124 123 ..., 120 122 123]
 ...,
 [125 126 125 ..., 107  97  96]
 [125 125 124 ..., 115 108 105]
 [125 125 124 ..., 123 118 116]]

In an IPython notebook, simply evaluating a frame displays it as an image.

frames[0]

Alternatively, to make a proper plot with axes and control over scaling, use matplotlib.

plt.imshow(frames[0])

Frames behave like numpy arrays with a few extra properties.

frames[123].frame_no
123
frames[123].metadata  # Scientific formats can pass experiment meta data here.
{}

Step 2: Locate Features

Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we’ll see below. I estimate 11 pixels.

f = tp.locate(frames[0], 11, invert=True)

The algorithm looks for bright features; since my features are dark, I set invert=True.

locate returns a spreadsheet-like object called a DataFrame. It lists

  • each feature’s position,
  • various characterizations of its appearance, which we will use to filter out spurious features,
  • the “signal” strength and an estimate of uncertainty, both derived from this paper

You can read more about DataFrames in the pandas documentation. They can easily be exported to formats like CSV, Excel, SQL, HDF5, etc.
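
For example, the features located above can be written out in one line per format (the filenames here are arbitrary, and the Excel and HDF5 writers need the optional openpyxl and PyTables packages):

f.to_csv('features_frame0.csv')              # comma-separated values
f.to_excel('features_frame0.xlsx')           # Excel spreadsheet
f.to_hdf('features_frame0.h5', 'features')   # HDF5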

f.head() # shows the first few rows of data
   x           y         mass        size      ecc       signal     raw_mass  ep        frame
0  294.900375  5.611320  345.949923  2.558583  0.186067  17.881905  10719     0.089503  0
1  310.237439  6.915316  246.550508  3.215075  0.008316   5.494440  10491     0.154562  0
2   36.036426  8.142955  436.058739  2.927963  0.116840  12.587263  11232     0.045968  0
3   67.322668  7.721634  405.889269  2.834555  0.065217  13.885948  11000     0.058931  0
4  274.313019  8.450485  288.508050  2.819906  0.161817   9.190700  10594     0.116354  0

plt.figure()  # make a new figure
tp.annotate(f, frames[0])

Refine parameters to eliminate spurious features

Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren’t actually particles. There are many ways to distinguish real particles from spurious ones. The most important way is to look at total brightness (“mass”).

fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)

# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count')

Based on the histogram, we choose a minimum “mass” (total brightness) of 200 to exclude the spurious features, and locate again with that cutoff.

f = tp.locate(frames[0], 11, invert=True, minmass=200)
plt.figure()
tp.annotate(f, frames[0])

There are more options for controlling and optimizing feature-finding. You can review them in the documentation. Or, pull them up as you work by typing tp.locate? into IPython.
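
As a quick sketch, a call with a few of the commonly tuned keyword arguments spelled out looks like this (the extra values shown here are the defaults, so the result is the same as before):

f = tp.locate(frames[0], 11, invert=True,
              minmass=200,      # discard dim features
              separation=12,    # minimum separation between features, in pixels
              percentile=64,    # local maxima must exceed this brightness percentile
              noise_size=1)     # width of the Gaussian blur that suppresses noise, in pixels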

Check for subpixel accuracy

As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal parts of the x and/or y positions are evenly distributed. Trackpy provides a convenience plotting function for this:

plt.figure()
tp.subpx_bias(f)

If we use a mask size that is too small, the histogram often shows a dip in the middle.

plt.figure()
tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=200))

Locate features in all frames

Or, to start, just explore a subset of the frames.

selection                    syntax example
all the frames               frames[:] or simply frames
the first 10 frames          frames[:10]
the last 10 frames           frames[-10:]
a range of frames            frames[100:200]
every 10th frame             frames[::10]
a list of specific frames    frames[[100, 107, 113]]

We’ll locate features in the first 300 frames from this video. We use tp.batch, which calls tp.locate on each frame and collects the results.

f = tp.batch(frames[:300], 11, minmass=200, invert=True)
Frame 299: 461 features

As each frame is analyzed, tp.batch reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters.

Step 4: Analyze trajectories

Mean Squared Displacement of Individual Probes

Compute the mean squared displacement of each particle and plot MSD vs. lag time. (Here tm is the DataFrame of particle trajectories produced by the linking step.)

im = tp.imsd(tm, 100/285., 24)  # microns per pixel = 100/285., frames per second = 24
fig, ax = plt.subplots()
ax.plot(im.index, im, 'k-', alpha=0.1)  # black lines, semitransparent
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
       xlabel='lag time $t$')
ax.set_xscale('log')
ax.set_yscale('log')

Since we only analyzed 300 frames, the statistics are poor at large lag times. With more frames, we can study larger lag times.

Ensemble Mean Squared Displacement

em = tp.emsd(tm, 100/285., 24)
fig, ax = plt.subplots()
ax.plot(em.index, em, 'o')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
       xlabel='lag time $t$')
ax.set(ylim=(1e-2, 10))

We can easily fit this ensemble mean-squared displacement to a power law, \(At^n\), using a convenience function, fit_powerlaw, that performs a linear regression in log space.

plt.figure()
tp.utils.fit_powerlaw(em)  # performs linear best fit in log space, plots
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$')
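
If you want the fitted parameters themselves rather than just the plot, fit_powerlaw also returns them. A minimal sketch, assuming a trackpy version in which fit_powerlaw accepts a plot keyword and returns a DataFrame with columns 'n' and 'A':

result = tp.utils.fit_powerlaw(em, plot=False)  # fit only, skip the plot
print(result)  # one row with the exponent n and the prefactor A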

In water, a viscous material, the expected power-law exponent is \(n = 1\). The value of \(A = 4D\), where \(D\) is the particles’ diffusivity. \(D\) is related to viscosity \(\eta\), particle radius \(a\), and temperature \(T\) as

\(D = \displaystyle\frac{kT}{6\pi\eta a}\).

For particles with a 1 \(\mu\text{m}\) diameter in room-temperature water, \(A \approx 1.66\) \(\mu\text{m}^2/\text{s}\). Our value above is not far off.
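
As a sanity check on that number, here is the back-of-the-envelope calculation behind it (the temperature and viscosity are assumed, nominal values for room-temperature water):

import math

k_B = 1.381e-23   # Boltzmann constant, J/K
T = 293.0         # assumed room temperature, K
eta = 1.0e-3      # assumed viscosity of water near 20 C, Pa s
a = 0.5e-6        # particle radius for a 1 micron diameter, m

D = k_B * T / (6 * math.pi * eta * a)   # Stokes-Einstein diffusivity, m^2/s
A = 4 * D * 1e12                        # A = 4D, converted to um^2/s
print(A)                                # roughly 1.7, in line with the quoted 1.66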

This is the end of the walkthrough. Keep reading to review the more advanced capabilities in trackpy.

Preview of Advanced Features

These are covered in greater detail in later tutorials.

Streaming

The feature-finding and trajectory-linking functions batch and link_df keep all of their results in memory. This approach is simple, but it isn’t necessary. We can process an unlimited number of frames if we save the results as we go.

Trackpy includes a class to manage storing and retrieving data framewise in an HDF5 format. The general idea is easily extensible to other formats.

with tp.PandasHDFStore('data.h5') as s:
    tp.batch(frames, 11, invert=True, minmass=200, output=s)
Frame 299: 461 features
with tp.PandasHDFStore('data.h5') as s:
    for linked in tp.link_df_iter(s, 5, memory=3):
        s.put(linked)
Frame 299: 461 trajectories present

You can get results by frame with s.get(frame_number) or, when you have sufficient memory, retrieve them all. The result is identical to what you would obtain using the basic functions batch and link_df.

with tp.PandasHDFStore('data.h5') as s:
    trajectories = pd.concat(iter(s))
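
To pull out a single frame instead, use get (a quick sketch; frame 100 is an arbitrary choice):

with tp.PandasHDFStore('data.h5') as s:
    frame_100_results = s.get(100)  # linked features from frame 100 only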

Optional High Performance Component: Numba

The core, time-consuming steps in particle location and linking are implemented in Python/numpy and also in pure Python optimized for numba. If numba is installed, trackpy will detect it and use it by default. You can switch it on and off to compare performance.

%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='numba')
Frame 9: 469 features
1 loops, best of 3: 1.3 s per loop
%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='python')
Frame 9: 463 features
1 loops, best of 3: 11.8 s per loop

The linking functions link_df and link_df_iter support various options for link_strategy, one of which is numba-based. Read their docstrings for details.
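
For example, to request the numba-based linker explicitly (a sketch, assuming numba is installed; f holds the features from batch above):

t = tp.link_df(f, 5, memory=3, link_strategy='numba')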

Parallelization

Feature-finding can easily be parallelized: each frame is an independent task, and the tasks can be divided among the available CPUs. IPython parallel makes this very straightforward.

It is simplest to try this on the CPUs of the local machine. To do this from an IPython notebook, go to File > Open, and click the “Clusters” tab. Fill in the “engines” field with the number of available CPUs (e.g., 4) and click start. Now you are running a cluster – it’s that easy. More information on IPython parallel is available in this section of the IPython documentation.

from ipyparallel import Client

client = Client()
view = client.load_balanced_view()

client[:]
<DirectView [0, 1, 2, 3]>
curried_locate = lambda image: tp.locate(image, 11, invert=True, minmass=200)
%%px

import trackpy as tp
%%timeit
amr = view.map_async(curried_locate, frames[:32])
amr.wait_interactive()
  32/32 tasks finished after    2 s
done
1 loops, best of 3: 2.14 s per loop
%%timeit
serial_result = list(map(curried_locate, frames[:32]))
1 loops, best of 3: 4.51 s per loop