walkthrough¶
Walkthrough¶
This notebook follows an example trackpy project from start to finish. We take video of micron-sized particles diffusing in water, track them, and analyze the trajectories to obtain the viscosity of water.
At the bottom of the notebook, we very briefly survey the more advanced features of trackpy. Browse the rest of the documentation to learn more.
Scientific IPython Setup¶
We need Python's plotting library, matplotlib. Your environment might load matplotlib automatically, but for this tutorial I'll load it explicitly using this convention. If you are unfamiliar with matplotlib, do the same as I do here, and everything that follows will work without modification.
from __future__ import division, unicode_literals, print_function # for compatibility with Python 2 and 3
import matplotlib as mpl
import matplotlib.pyplot as plt
# change the following to %matplotlib notebook for interactive plotting
%matplotlib inline
# Optionally, tweak styles.
mpl.rc('figure', figsize=(10, 5))
mpl.rc('image', cmap='gray')
We also might want to use scientific Python libraries. Finally, we'll import trackpy
itself and its sister project, pims
.
import numpy as np
import pandas as pd
from pandas import DataFrame, Series # for convenience
import pims
import trackpy as tp
We use the alias tp
for brevity.
Step 1: Read the Data¶
Opening images or video¶
To get our data into Python, we use our sister project, PIMS (Python Image Sequence). PIMS makes it easy and convenient to load and process video data from many formats with one consistent interface.
Using PIMS, trackpy can read:
- a directory or zipfile of sequential images
- a multi-frame TIFF file
- a video (AVI, MOV, etc.)
- specialty formats used in microscopy and scientific video capture:
Cine
, NorPixseq
LSM
- Files supported by Bioformats
ND2
using PIMS_ND2
(Some of the formats require some extra dependencies. For a complete list, see the README page for PIMS, or the installation instructions in the documentation.)
For many formats, using pims.open
just works. Since these images are in color, we also need to set up a pipeline to convert each image to grayscale when it is read.
@pims.pipeline
def gray(image):
return image[:, :, 1] # Take just the green channel
frames = gray(pims.open('../sample_data/bulk_water/*.png'))
frames
(ImageSequence,) processed through proc_func. Original repr: <Frames> Source: /Users/nkeim/projects/trackpy/trackpy-examples/sample_data/bulk_water/*.png Length: 300 frames Frame Shape: (424, 640, 4) Pixel Datatype: uint8
We can access any frame like frames[frame_number]
. The image is represented as a numpy array of intensities. If you're using the Anaconda distribution of Python, these should be in the range [0, 255]. If you have a more custom environment they may be in the range [0, 1], in which case you'll have to experiment with the minmass
parameters below.
print(frames[0]) # the first frame
[[125 125 125 ... 120 120 121] [125 125 125 ... 120 121 121] [125 125 124 ... 121 123 124] ... [125 126 125 ... 108 98 97] [125 125 125 ... 116 109 106] [125 125 125 ... 124 119 117]]
In an IPython notebook, the frame is represented directly by displaying the image.
frames[0]
Alternatively, to make a proper plot with axes and control over scaling, use matplotlib's imshow()
method.
plt.imshow(frames[0]);
Frames behave like numpy arrays, but with a few extra properties.
frames[123].frame_no
123
frames[123].metadata # Scientific formats can pass experiment meta data here.
{}
Step 2: Locate Features¶
Start with just the first frame. Estimate the size of the features (in pixels). The size must be an odd integer, and it is better to err on the large side, as we'll see below. We estimate 11 pixels.
f = tp.locate(frames[0], 11, invert=True)
The algorithm looks for bright features; since the features in this set of images are dark, we set invert=True
.
locate
returns a spreadsheet-like object called a DataFrame. It lists
- each feature's position,
- various characterizations of its appearance, which we will use to filter out spurious features,
- the "signal" strength and an estimate of uncertainty, both derived from this paper
More information about DataFrames may be found in the pandas documentation. DataFrames can easily be exported to formats like CSV, Excel, SQL, HDF5, etc.
f.head() # shows the first few rows of data
y | x | mass | size | ecc | signal | raw_mass | ep | frame | |
---|---|---|---|---|---|---|---|---|---|
0 | 4.750000 | 103.668564 | 192.862485 | 2.106615 | 0.066390 | 10.808405 | 10714.0 | 0.073666 | 0 |
1 | 5.249231 | 585.779487 | 164.659302 | 2.962674 | 0.078936 | 4.222033 | 10702.0 | 0.075116 | 0 |
2 | 5.785986 | 294.792544 | 244.624615 | 2.244542 | 0.219217 | 15.874846 | 10686.0 | 0.077141 | 0 |
3 | 5.869369 | 338.173423 | 187.458282 | 2.046201 | 0.185333 | 13.088304 | 10554.0 | 0.099201 | 0 |
4 | 6.746377 | 310.584169 | 151.486558 | 3.103294 | 0.053342 | 4.475355 | 10403.0 | 0.147430 | 0 |
tp.annotate(f, frames[0]);
Refine parameters to elminate spurious features¶
Many of these circles are clearly wrong; they are fleeting peaks in brightness that aren't actually particles. Rejecting them often improves results and speeds up feature-finding. There are many ways to distinguish real particles from spurious ones. The most important way is to look at total brightness ("mass").
fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)
# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count');
We can then specify the minmass
parameter. If your image is especially noisy, you may also find the threshold
parameter useful.
f = tp.locate(frames[0], 11, invert=True, minmass=20)
tp.annotate(f, frames[0]);
There are more options for controling and optimizing feature-finding. You can review them in the documentation, where the most comprehensive description is in the API reference. Or, pull them up as you work by typing tp.locate?
into IPython.
Check for subpixel accuracy¶
As Eric Weeks points out in his related tutorial, a quick way to check for subpixel accuracy is to check that the decimal part of the x and/or y positions are evenly distributed. Trackpy provides a convenience plotting function for this called subpx_bias
:
tp.subpx_bias(f)
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x1c268a1490>, <matplotlib.axes._subplots.AxesSubplot object at 0x1c269a54d0>]], dtype=object)
If we use a mask size that is too small, the histogram often shows a dip in the middle.
tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=20));
Locate features in all frames¶
Or, to start, just explore a subset of the frames.
selection | syntax example |
---|---|
all the frames | frames[:] or simply frames . |
the first 10 frames | frames[:10] |
the last 10 frames | frames[-10:] |
a range of frames | frames[100:200] |
every 10th frame | frame[::10] |
a list of specific frames | frames[[100, 107, 113]] |
We'll locate features in the first 300 frames from this video. We use tp.batch
, which calls tp.locate
on each frame and collects the results.
f = tp.batch(frames[:300], 11, minmass=20, invert=True);
Frame 299: 624 features
As each frame is analyzed, tp.batch
reports the frame number and how many features were found. If this number runs unexpectedly low or high, you may wish to interrupt it and try different parameters.
If your images are small, you may find that printing this number actually slows down batch
significantly! In that case you can run tp.quiet()
to turn it off.
Step 3: Link features into particle trajectories¶
We have the locations of the particles in each frame. Next we'll track particles from frame to frame, giving each one a number for identification.
First, we must must specify a maximum displacement, the farthest a particle can travel between frames. We should choose the smallest reasonable value because a large value slows computation time considerably. In this case, 5 pixels is reasonable.
Second, we allow for the possibility that a particle might be missed for a few frames and then seen again. (Perhaps its "mass" slipped below our cutoff due to noise in the video.) Memory keeps track of disappeared particles and maintains their ID for up to some number of frames after their last appearance. Here we use 3 frames.
# tp.quiet() # Turn off progress reports for best performance
t = tp.link(f, 5, memory=3)
Frame 299: 624 trajectories present.
The result is the features DataFrame f
with an additional column, particle
, identifying each feature with a label. We denote this new DataFrame t
.
t.head()
y | x | mass | size | ecc | signal | raw_mass | ep | frame | particle | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4.750000 | 103.668564 | 192.862485 | 2.106615 | 0.066390 | 10.808405 | 10714.0 | 0.073666 | 0 | 0 |
365 | 284.118980 | 25.313881 | 149.037779 | 2.321961 | 0.031799 | 7.008575 | 10770.0 | 0.067577 | 0 | 1 |
364 | 282.753601 | 534.788476 | 222.754482 | 1.908443 | 0.103416 | 15.874846 | 10415.0 | 0.141946 | 0 | 2 |
363 | 280.010398 | 275.185353 | 186.782757 | 2.508665 | 0.183181 | 7.261897 | 10438.0 | 0.132499 | 0 | 3 |
362 | 279.134153 | 252.780324 | 151.064355 | 2.938060 | 0.253812 | 4.306474 | 10359.0 | 0.171764 | 0 | 4 |
Filter spurious trajectories¶
We have more filtering to do. Ephemeral trajectories — seen only for a few frames — are usually spurious and never useful. The convenience function filter_stubs
keeps only trajectories that last for a given number of frames.
t1 = tp.filter_stubs(t, 25)
# Compare the number of particles in the unfiltered and filtered data.
print('Before:', t['particle'].nunique())
print('After:', t1['particle'].nunique())
Before: 13715 After: 1505
We can also filter trajectories by their particles' appearance. At this stage, with trajectories linked, we can look at a feature's "average appearance" throughout its trajectory, giving a more accurate picture.
plt.figure()
tp.mass_size(t1.groupby('particle').mean()); # convenience function -- just plots size vs. mass
The particles with especially low mass, or that are especially large or non-circular (eccentric), are probably out of focus or aggregated, respectively. It is best to experiment by trial and error, filtering out regions of mass-size space and looking at the results using tp.annotate
. In the end, we need to separate the good particles from the spurious ones, and it doesn't matter how we get it done.
t2 = t1[((t1['mass'] > 50) & (t1['size'] < 2.6) &
(t1['ecc'] < 0.3))]
plt.figure()
tp.annotate(t2[t2['frame'] == 0], frames[0]);