This notebook follows an example project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water.
At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the full list of example notebooks to learn more.
We need Python’s plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I’ll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification.
from __future__ import division, unicode_literals, print_function # for compatibility with Python 2 and 3
import matplotlib as mpl
import matplotlib.pyplot as plt
# the following line only works in an IPython notebook
%matplotlib notebook
# Optionally, tweak styles.
mpl.rc('figure', figsize=(10, 6))
mpl.rc('image', cmap='gray')
We also might want to use scientific Python libraries. Finally, we’ll import trackpy itself and its sister project, pims.
import numpy as np
import pandas as pd
from pandas import DataFrame, Series # for convenience
import pims
import trackpy as tp
We use the alias tp for brevity.
To get our data into Python, we use our sister project, PIMS (Python Image Sequence). PIMS makes is easy and convenient to load and process video data from many formats with one consistent interface.
You can read in: * a directory or zipfile of sequential images using ImageSequence * a multi-frame TIFF file using TiffStack * a video (AVI, MOV, etc.) using Video * specialty formats used in microscopy and scientific video capture: * Cine * LSM * Files supported by Bioformats * ND2 using PIMS_ND2
ImageSequence works out of the box, so we recommended trying that to start. The others require some extra dependencies. See the README page.
frames = pims.ImageSequence('../sample_data/bulk_water/*.png', as_grey=True)
frames
<Frames>
Source: /Users/dallan/Documents/Repos/trackpy-examples/sample_data/bulk_water/*.png
Length: 300 frames
Frame Shape: 424 x 640
Pixel Datatype: uint8
We can access any frame like frames[frame_number]. The image is represented as a numpy array of intensities.
print(frames[0]) # the first frame
[[124 124 124 ..., 119 120 121]
[124 124 124 ..., 119 121 121]
[124 124 123 ..., 120 122 123]
...,
[125 126 125 ..., 107 97 96]
[125 125 124 ..., 115 108 105]
[125 125 124 ..., 123 118 116]]
In an IPython notebook, the frame is represented by displaying the image.
frames[0]
Alternatively, to make a proper plot with axes and control over scaling, use matplotlib.
plt.imshow(frames[0])
<IPython.core.display.Javascript object>
<matplotlib.image.AxesImage at 0x10d4f44a8>
Frames behave like numpy arrays with a few extra properties.
frames[123].frame_no
123
frames[123].metadata # Scientific formats can pass experiment meta data here.
{}
Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we’ll see below. I estimate 11 pixels.
f = tp.locate(frames[0], 11, invert=True)
The algorithm looks for bright features; since my features are dark, I set invert=True.
locate returns a spreadsheet-like object called a DataFrame. It lists
You can read more about DataFrames in the pandas documentation. They can easily be exported to formats like CSV, Excel, SQL, HDF5, etc.
f.head() # shows the first few rows of data
x | y | mass | size | ecc | signal | raw_mass | ep | frame | |
---|---|---|---|---|---|---|---|---|---|
0 | 294.900375 | 5.611320 | 345.949923 | 2.558583 | 0.186067 | 17.881905 | 10719 | 0.089503 | 0 |
1 | 310.237439 | 6.915316 | 246.550508 | 3.215075 | 0.008316 | 5.494440 | 10491 | 0.154562 | 0 |
2 | 36.036426 | 8.142955 | 436.058739 | 2.927963 | 0.116840 | 12.587263 | 11232 | 0.045968 | 0 |
3 | 67.322668 | 7.721634 | 405.889269 | 2.834555 | 0.065217 | 13.885948 | 11000 | 0.058931 | 0 |
4 | 274.313019 | 8.450485 | 288.508050 | 2.819906 | 0.161817 | 9.190700 | 10594 | 0.116354 | 0 |
plt.figure() # make a new figure
tp.annotate(f, frames[0])
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x11e3e1550>
Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren’t actually particles. There are many ways to distinguish real particles from spurrious ones. The most important way is to look at total brightness (“mass”).
fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)
# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count')
<IPython.core.display.Javascript object>
[<matplotlib.text.Text at 0x10d8092e8>, <matplotlib.text.Text at 0x11e6d2f28>]
f = tp.locate(frames[0], 11, invert=True, minmass=200)
plt.figure()
tp.annotate(f, frames[0])
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x11d9ca5c0>
There are more options for controling and optimizing feature-finding. You can review them in the documentation. Or, pull them up as you work by typing tp.locate? into IPython.
As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal part of the x and/or y positions are evenly distributed. Trackpy provides a convenience plotting function for this:
plt.figure()
tp.subpx_bias(f)
<IPython.core.display.Javascript object>
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/pandas/tools/plotting.py:3235: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
"is being cleared", UserWarning)
<matplotlib.axes._subplots.AxesSubplot at 0x11d9ca240>
If we use a mask size that is too small, the histogram often shows a dip in the middle.
plt.figure()
tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=200))
<IPython.core.display.Javascript object>
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/pandas/tools/plotting.py:3235: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
"is being cleared", UserWarning)
<matplotlib.axes._subplots.AxesSubplot at 0x1152326a0>
Or, to start, just explore a subset of the frames.
selection | syntax example |
---|---|
all the frames | frames[:] or simply frames. |
the first 10 frames | frames[:10] |
the last 10 frames | frames[-10:] |
a range of frames | frames[100:200] |
every 10th frame | frame[::10] |
a list of specific frames | frames[[100, 107, 113]] |
We’ll locate features in the first 300 frames from this video. We use tp.batch, which calls tp.locate on each frame and collects the results.
f = tp.batch(frames[:300], 11, minmass=200, invert=True)
Frame 299: 461 features
As each frame is analyzed, tp.batch reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters.
We have the locations of the particles in each frame. Next we’ll track particles from frame to frame, giving each one a number for identification.
First, we must must specify a maximum displacement, the farthest a particle can travel between frames. We should choose the smallest reasonable value because a large value slows computation time considerably. In this case, 5 pixels is reasonable.
Second, we allow for the possibility that a particle might be missed for a few frames and then seen again. (Perhaps its “mass” slipped below our cutoff due to noise in the video.) Memory keeps track of disappeared particles and maintains their ID for up to some number of frames after their last appearance. We’ll choose 3.
t = tp.link_df(f, 5, memory=3)
Frame 299: 461 trajectories present
The result is the features DataFrame f with an additional column, particle, identifying each feature with a label.
t.head()
x | y | mass | size | ecc | signal | raw_mass | ep | frame | particle | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 294.900375 | 5.611320 | 345.949923 | 2.558583 | 0.186067 | 17.881905 | 10719 | 0.089503 | 0 | 0 |
1 | 295.360118 | 5.316313 | 388.202575 | 2.524643 | 0.214783 | 18.883275 | 10786 | 0.078429 | 1 | 0 |
2 | 295.970101 | 5.627616 | 373.436505 | 2.630296 | 0.189075 | 17.458842 | 10710 | 0.088747 | 2 | 0 |
3 | 295.734210 | 5.843516 | 392.386204 | 2.617637 | 0.187835 | 18.511646 | 10750 | 0.080728 | 3 | 0 |
4 | 295.686010 | 5.810687 | 375.745878 | 2.543898 | 0.200468 | 18.458071 | 10737 | 0.083059 | 4 | 0 |
We have more filtering to do. Empheremeral trajectories – seen only for a few frames – are usually spurious and never useful. The convenience function filter_stubs keeps only trajectories that last for a given number of frames.
t1 = tp.filter_stubs(t, 50)
# Compare the number of particles in the unfiltered and filtered data.
print('Before:', t['particle'].nunique())
print('After:', t1['particle'].nunique())
Before: 4195
After: 828
We can also filter trajectories by their appearance. At this stage, with trajectories linked, we can look at a feature’s “average appearance” throughout its trajectory, giving a more accurate picture.
plt.figure()
tp.mass_size(t1.groupby('particle').mean()) # convenience function -- just plots size vs. mass
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x11e675d30>
The particles with especially low mass or especially large size are probably out of focus or aggregated, respectively. It is best to experiment by trial and error, filtering out regions of mass-size space and looking at the results using tp.annotate and tp.circle. In the end, we need to separate the good particles from the spurious ones, and it doesn’t matter how we get it done.
condition = lambda x: ((x['mass'].mean() > 250) & (x['size'].mean() < 3.0) &
(x['ecc'].mean() < 0.1))
t2 = tp.filter(t1, condition) # a wrapper for pandas' filter that works around a bug in v 0.12
plt.figure()
tp.annotate(t2[t2['frame'] == 0], frames[0])
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x11e10cf28>
Trace the trajectories.
plt.figure()
tp.plot_traj(t1)
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x11d9bd240>
Compute the overall drifting motion, which we will subtract away, adopting the reference frame of the particles’ average position.
d = tp.compute_drift(t1)
plt.figure()
d.plot()
<IPython.core.display.Javascript object>
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x122f93f28>
tm = tp.subtract_drift(t1, d)
With the overall drifting motion subtracted out, we plot the trajectories again. We observe nice random walks.
plt.figure()
tp.plot_traj(tm)
<IPython.core.display.Javascript object>
<matplotlib.axes._subplots.AxesSubplot at 0x1182f2ba8>
Compute the mean squared displacement of each particle and plot MSD vs. lag time.
im = tp.imsd(tm, 100/285., 24) # microns per pixel = 100/285., frames per second = 24
fig, ax = plt.subplots()
ax.plot(im.index, im, 'k-', alpha=0.1) # black lines, semitransparent
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set_xscale('log')
ax.set_yscale('log')
<IPython.core.display.Javascript object>
Since we only analyzed 300 frames, the statistics are poor at large lag times. With more frames, we can study larger lag times.
em = tp.emsd(tm, 100/285., 24)
fig, ax = plt.subplots()
ax.plot(em.index, em, 'o')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set(ylim=(1e-2, 10))
<IPython.core.display.Javascript object>
[(0.01, 10)]
We can easily fit this ensemble mean-squared displacement to a power law, \(At^n\), using a convenience function, fit_powerlaw, that performs a linear regression in log space.
plt.figure()
tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$')
<IPython.core.display.Javascript object>
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/matplotlib/artist.py:221: MatplotlibDeprecationWarning: This has been deprecated in mpl 1.5, please use the
axes property. A removal date has not been set.
warnings.warn(_get_axes_msg, mplDeprecation, stacklevel=1)
<matplotlib.text.Text at 0x12a52ca20>
In water, a viscous material, the expected power-law exponent \(n = 1\). The value of \(A = 4D\), where \(D\) is the particles’ diffusivity. \(D\) is related to viscosity \(\eta\), particle radius \(a\), and temperature \(T\) as
\(D = \displaystyle\frac{kT}{6\pi\eta a}\).
For particles with a 1 \(\mu\text{m}\) diameter in room-temperature water, \(A \approx 1.66\). Our value above is not far off.
This is the end of the walkthrough. Keep reading to review the more advanced capabilities in trackpy.
These are covered in greater detail in later tutorials.
The feature-finding and trajectory-linking functions batch and link_df keep all of their results in memory. This approach is simple, but it isn’t necessary. We can prcoess an unlimited number of frames if we save the results as we go.
Trackpy includes a class to manage storing an retrieving data framewise in an HDF5 format. The general idea is easily extensive to other formats.
with tp.PandasHDFStore('data.h5') as s:
tp.batch(frames, 11, invert=True, minmass=200, output=s)
Frame 299: 461 features
with tp.PandasHDFStore('data.h5') as s:
for linked in tp.link_df_iter(s, 5, memory=3):
s.put(linked)
Frame 299: 461 trajectories present
You can get results by frame with s.get(frame_number) or, when you have sufficient memory, retrieve them all. The results is identifical to what you would obtained using the basic functions batch and link_df.
with tp.PandasHDFStore('data.h5') as s:
trajectories = pd.concat(iter(s))
The core, time-consuming steps in particle location and linking are implemented in Python/numpy and also in pure Python optimized for numba. If numba is installed, trackpy will detect it and use it by default. You can switch it on and off to compare performance.
%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='numba')
Frame 9: 469 features
1 loops, best of 3: 1.3 s per loop
%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='python')
Frame 9: 463 features
1 loops, best of 3: 11.8 s per loop
The linking functions link_df or link_df_iter support various options for link_strategy, one of which is numba-based. Read the their docstrings for details.
Feature-finding can easily be parallelized: each frame an independent task, and the tasks can be divided among the available CPUs. IPython parallel makes this very straightforward.
It is simplest to try this on the CPUs of the local machine. To do this from an IPython notebook, go to File > Open, and click the “Clusters” tab. Fill in the “engines” field with the number of available CPUs (e.g., 4) and click start. Now you are running a cluster – it’s that easy. More information on IPython parallel is available in this section of the IPython documentation.
from ipyparallel import Client
client = Client()
view = client.load_balanced_view()
client[:]
<DirectView [0, 1, 2, 3]>
curried_locate = lambda image: tp.locate(image, 11, invert=True, minmass=200)
%%px
import trackpy as tp
%%timeit
amr = view.map_async(curried_locate, frames[:32])
amr.wait_interactive()
32/32 tasks finished after 2 s
done
1 loops, best of 3: 2.14 s per loop
%%timeit
serial_result = list(map(curried_locate, frames[:32]))
1 loops, best of 3: 4.51 s per loop