walkthrough¶
Walkthrough¶
This notebook follows an example trackpy project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water.
At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the rest of the documentation to learn more.
Scientific IPython Setup¶
We need Python's plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I'll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification.
from __future__ import division, unicode_literals, print_function # for compatibility with Python 2 and 3
import matplotlib as mpl
import matplotlib.pyplot as plt
# change the following to %matplotlib notebook for interactive plotting
%matplotlib inline
# Optionally, tweak styles.
mpl.rc('figure', figsize=(10, 5))
mpl.rc('image', cmap='gray')
We also might want to use scientific Python libraries. Finally, we'll import trackpy
itself and its sister project, pims
.
import numpy as np
import pandas as pd
from pandas import DataFrame, Series # for convenience
import pims
import trackpy as tp
We use the alias tp
for brevity.
Step 1: Read the Data¶
Opening images or video¶
To get our data into Python, we use our sister project, PIMS (Python Image Sequence). PIMS makes it easy and convenient to load and process video data from many formats with one consistent interface.
Using PIMS, trackpy can read:
- a directory or zipfile of sequential images
- a multi-frame TIFF file
- a video (AVI, MOV, etc.)
- specialty formats used in microscopy and scientific video capture:
Cine
, NorPixseq
LSM
- Files supported by Bioformats
ND2
using PIMS_ND2
(Some of the formats require some extra dependencies. For a complete list, see the README page for PIMS, or the installation instructions in the documentation.)
For many formats, using pims.open
just works. Since these images are in color, we also need to set up a pipeline to convert each image to grayscale when it is read.
@pims.pipeline
def gray(image):
return image[:, :, 1] # Take just the green channel
frames = gray(pims.open('../sample_data/bulk_water/*.png'))
frames
(ImageSequence,) processed through proc_func. Original repr: <Frames> Source: /Users/nkeim/projects/trackpy/trackpy-examples/sample_data/bulk_water/*.png Length: 300 frames Frame Shape: (424, 640, 4) Pixel Datatype: uint8
We can access any frame like frames[frame_number]
. The image is represented as a numpy array of intensities. If you're using the Anaconda distribution of Python, these should be in the range [0, 255]. If you have a more custom environment they may be in the range [0, 1], in which case you'll have to experiment with the minmass
parameters below.
print(frames[0]) # the first frame
[[125 125 125 ... 120 120 121] [125 125 125 ... 120 121 121] [125 125 124 ... 121 123 124] ... [125 126 125 ... 108 98 97] [125 125 125 ... 116 109 106] [125 125 125 ... 124 119 117]]
In an IPython notebook, the frame is represented directly by displaying the image.
frames[0]
Alternatively, to make a proper plot with axes and control over scaling, use matplotlib's imshow()
method.
plt.imshow(frames[0]);
Frames behave like numpy arrays, but with a few extra properties.
frames[123].frame_no
123
frames[123].metadata # Scientific formats can pass experiment meta data here.
{}
Step 2: Locate Features¶
Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we'll see below. We estimate 11 pixels.
f = tp.locate(frames[0], 11, invert=True)
The algorithm looks for bright features; since the features in this set of images are dark, we set invert=True
.
locate
returns a spreadsheet-like object called a DataFrame. It lists
- each feature's position,
- various characterizations of its appearance, which we will use to filter out spurious features,
- the "signal" strength and an estimate of uncertainty, both derived from this paper
More information about DataFrames may be found in the pandas documentation. DataFrames can easily be exported to formats like CSV, Excel, SQL, HDF5, etc.
f.head() # shows the first few rows of data
y | x | mass | size | ecc | signal | raw_mass | ep | frame | |
---|---|---|---|---|---|---|---|---|---|
0 | 4.750000 | 103.668564 | 192.862485 | 2.106615 | 0.066390 | 10.808405 | 10714.0 | 0.073666 | 0 |
1 | 5.249231 | 585.779487 | 164.659302 | 2.962674 | 0.078936 | 4.222033 | 10702.0 | 0.075116 | 0 |
2 | 5.785986 | 294.792544 | 244.624615 | 2.244542 | 0.219217 | 15.874846 | 10686.0 | 0.077141 | 0 |
3 | 5.869369 | 338.173423 | 187.458282 | 2.046201 | 0.185333 | 13.088304 | 10554.0 | 0.099201 | 0 |
4 | 6.746377 | 310.584169 | 151.486558 | 3.103294 | 0.053342 | 4.475355 | 10403.0 | 0.147430 | 0 |
tp.annotate(f, frames[0]);
Refine parameters to elminate spurious features¶
Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren't actually particles. Rejecting them often improves results and speeds up feature-finding. There are many ways to distinguish real particles from spurious ones. The most important way is to look at total brightness ("mass").
fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)
# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count');
We can then specify the minmass
parameter. If your image is especially noisy, you may also find the threshold
parameter useful.
f = tp.locate(frames[0], 11, invert=True, minmass=20)
tp.annotate(f, frames[0]);
There are more options for controling and optimizing feature-finding. You can review them in the documentation, where the most comprehensive description is in the API reference. Or, pull them up as you work by typing tp.locate?
into IPython.
Check for subpixel accuracy¶
As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal part of the x and/or y positions are evenly distributed. Trackpy provides a convenience plotting function for this called subpx_bias
:
tp.subpx_bias(f)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x1c268a1490>, <matplotlib.axes._subplots.AxesSubplot object at 0x1c269a54d0>]], dtype=object)
If we use a mask size that is too small, the histogram often shows a dip in the middle.
tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=20));
Locate features in all frames¶
Or, to start, just explore a subset of the frames.
selection | syntax example |
---|---|
all the frames | frames[:] or simply frames . |
the first 10 frames | frames[:10] |
the last 10 frames | frames[-10:] |
a range of frames | frames[100:200] |
every 10th frame | frame[::10] |
a list of specific frames | frames[[100, 107, 113]] |
We'll locate features in the first 300 frames from this video. We use tp.batch
, which calls tp.locate
on each frame and collects the results.
f = tp.batch(frames[:300], 11, minmass=20, invert=True);
Frame 299: 624 features
As each frame is analyzed, tp.batch
reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters.
If your images are small, you may find that printing this number actually slows down batch
significantly! In that case you can run tp.quiet()
to turn it off.
Step 3: Link features into particle trajectories¶
We have the locations of the particles in each frame. Next we'll track particles from frame to frame, giving each one a number for identification.
First, we must must specify a maximum displacement, the farthest a particle can travel between frames. We should choose the smallest reasonable value because a large value slows computation time considerably. In this case, 5 pixels is reasonable.
Second, we allow for the possibility that a particle might be missed for a few frames and then seen again. (Perhaps its "mass" slipped below our cutoff due to noise in the video.) Memory keeps track of disappeared particles and maintains their ID for up to some number of frames after their last appearance. Here we use 3 frames.
# tp.quiet() # Turn off progress reports for best performance
t = tp.link(f, 5, memory=3)
Frame 299: 624 trajectories present.
The result is the features DataFrame f
with an additional column, particle
, identifying each feature with a label. We denote this new DataFrame t
.
t.head()
y | x | mass | size | ecc | signal | raw_mass | ep | frame | particle | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4.750000 | 103.668564 | 192.862485 | 2.106615 | 0.066390 | 10.808405 | 10714.0 | 0.073666 | 0 | 0 |
365 | 284.118980 | 25.313881 | 149.037779 | 2.321961 | 0.031799 | 7.008575 | 10770.0 | 0.067577 | 0 | 1 |
364 | 282.753601 | 534.788476 | 222.754482 | 1.908443 | 0.103416 | 15.874846 | 10415.0 | 0.141946 | 0 | 2 |
363 | 280.010398 | 275.185353 | 186.782757 | 2.508665 | 0.183181 | 7.261897 | 10438.0 | 0.132499 | 0 | 3 |
362 | 279.134153 | 252.780324 | 151.064355 | 2.938060 | 0.253812 | 4.306474 | 10359.0 | 0.171764 | 0 | 4 |
Filter spurious trajectories¶
We have more filtering to do. Ephemeral trajectories — seen only for a few frames — are usually spurious and never useful. The convenience function filter_stubs
keeps only trajectories that last for a given number of frames.
t1 = tp.filter_stubs(t, 25)
# Compare the number of particles in the unfiltered and filtered data.
print('Before:', t['particle'].nunique())
print('After:', t1['particle'].nunique())
Before: 13715 After: 1505
We can also filter trajectories by their particles' appearance. At this stage, with trajectories linked, we can look at a feature's "average appearance" throughout its trajectory, giving a more accurate picture.
plt.figure()
tp.mass_size(t1.groupby('particle').mean()); # convenience function -- just plots size vs. mass
The particles with especially low mass, or that are especially large or non-circular (eccentric), are probably out of focus or aggregated, respectively. It is best to experiment by trial and error, filtering out regions of mass-size space and looking at the results using tp.annotate
. In the end, we need to separate the good particles from the spurious ones, and it doesn't matter how we get it done.
t2 = t1[((t1['mass'] > 50) & (t1['size'] < 2.6) &
(t1['ecc'] < 0.3))]
plt.figure()
tp.annotate(t2[t2['frame'] == 0], frames[0]);
Trace the trajectories using plot_traj()
:
plt.figure()
tp.plot_traj(t2);
Remove overall drift¶
Compute the overall drifting motion, which we will subtract away, adopting the reference frame of the particles' average position.
d = tp.compute_drift(t2)
d.plot()
plt.show()
tm = tp.subtract_drift(t2.copy(), d)
With the overall drifting motion subtracted out, we plot the trajectories again. We observe nice random walks.
ax = tp.plot_traj(tm)
plt.show()
Step 4: Analyze trajectories¶
Trackpy includes several functions to help with some common analyses for particle trajectories. (See the "Static Analysis" and "Motion Analysis" sections of the API reference.)
Here, we can show that these data are consistent with colloidal particles undergoing Brownian motion in water.
Mean Squared Displacement of Individal Probes¶
Compute the mean squared displacement (MSD) of each particle using the imsd
function, and plot MSD vs. lag time.
im = tp.imsd(tm, 100/285., 24) # microns per pixel = 100/285., frames per second = 24
fig, ax = plt.subplots()
ax.plot(im.index, im, 'k-', alpha=0.1) # black lines, semitransparent
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set_xscale('log')
ax.set_yscale('log')
Since we only analyzed 300 frames, the statistics are poor at large lag times. With more frames, we can study larger lag times.
Ensemble Mean Squared Displacement¶
Now use the emsd
function to compute the ensemble mean squared displacement (EMSD) of all particles:
em = tp.emsd(tm, 100/285., 24) # microns per pixel = 100/285., frames per second = 24
fig, ax = plt.subplots()
ax.plot(em.index, em, 'o')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set(ylim=(1e-2, 10));
We can easily fit this ensemble mean-squared displacement to a power law, $At^n$, using a convenience function, fit_powerlaw
, which performs a linear regression in log space.
plt.figure()
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$');
tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots]
n | A | |
---|---|---|
msd | 1.0787 | 1.595089 |
In water, a viscous material, the expected power-law exponent $n = 1$. The value of $A = 4D$, where $D$ is the particles' diffusivity. $D$ is related to viscosity $\eta$, particle radius $r$, and temperature $T$ through the Stokes-Einstein equation:
$$ D = \displaystyle\frac{k_B T}{6\pi\eta r} $$where $k_B$ is the Boltzmann constant. For particles with a 1 $\mu\text{m}$ diameter in room-temperature water, $A \approx 1.66$ $\mu\textrm{m}^2 / \textrm{s}$. Our values of $n$ and $A$ above are not far off. (If you'd like to know more about this measurement and its uncertainty, this paper is a thorough discussion at the advanced undergraduate level.)
This is the end of the walkthrough. Keep reading to review a few more advanced capabilities in trackpy.
Preview of Some Advanced Features¶
Check out the other tutorials for in-depth explorations of these topics and more!
Parallelization¶
Feature-finding can easily be parallelized: each frame is an independent task, and tasks can be divided among the multiple CPUs in a modern computer. Starting with trackpy v0.4.2, this capability is built into batch
. See the parallelization tutorial for details.
%timeit tp.batch(frames[:20], 11, invert=True, minmass=20, processes='auto')
Frame 19: 576 features 766 ms ± 9.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Optional High Performance Component: Numba¶
The core, time-consuming steps in particle location and linking are implemented in Python/numpy and also in pure Python optimized for numba. If numba is installed, trackpy will detect it and use it by default. You can switch it on and off to compare performance.
%timeit tp.batch(frames[:20], 11, invert=True, minmass=20, engine='numba')
Frame 19: 576 features 1.08 s ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit tp.batch(frames[:20], 11, invert=True, minmass=20, engine='python')
Frame 19: 576 features 3.42 s ± 41.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The linking functions similarly use numba if it's available; you can control this manually using the link_strategy
option.
Streaming¶
The feature-finding and trajectory-linking functions batch
and link
keep all of their results in memory. This approach is simple, but it isn't necessary. We can prcoess an unlimited number of frames if we save the results as we go.
Trackpy includes a class to manage storing an retrieving data framewise in the widely-used HDF5 format. The general idea is easily extensible to other formats.
with tp.PandasHDFStore('data.h5') as s:
tp.batch(frames, 11, invert=True, minmass=200, output=s)
Frame 299: 328 features
with tp.PandasHDFStore('data.h5') as s:
# As before, we require a minimum "life" of 5 frames and a memory of 3 frames
for linked in tp.link_df_iter(s, 5, memory=3):
s.put(linked)
Frame 299: 328 trajectories present.
You can get results by frame with s.get(frame_number)
or, when you have sufficient memory, retrieve them all. The results are identifical to what you would obtain using the basic functions batch
and link
.
with tp.PandasHDFStore('data.h5') as s:
trajectories = pd.concat(iter(s))