Walkthrough
===========
This notebook follows an example project from start to finish. We take
video of micron-sized particles diffusing in water, track them, and
analyze the trajectories to obtain the viscosity of water.
At the bottom of the notebook, we very briefly survey the more advanced
features of trackpy. Browse the `full list of example
notebooks `__
to learn more.
Scientific IPython Setup
------------------------
We need Python's plotting library, matplotlib. Your environment might
load matplotlib automatically, but for this tutorial I'll load it
explicitly using this convention. If you are unfamiliar with matplotlib,
do the same as I do here, and everything that follows will work without
modification.
.. code:: python
from __future__ import division, unicode_literals, print_function # for compatibility with Python 2 and 3
import matplotlib as mpl
import matplotlib.pyplot as plt
# the following line only works in an IPython notebook
%matplotlib notebook
# Optionally, tweak styles.
mpl.rc('figure', figsize=(10, 6))
mpl.rc('image', cmap='gray')
We also might want to use scientific Python libraries. Finally, we'll
import ``trackpy`` itself and its sister project, ``pims``.
.. code:: python
import numpy as np
import pandas as pd
from pandas import DataFrame, Series # for convenience
import pims
import trackpy as tp
We use the alias ``tp`` for brevity.
Step 1: Read the Data
---------------------
Opening images or video
~~~~~~~~~~~~~~~~~~~~~~~
To get our data into Python, we use our sister project,
`PIMS `__ (Python Image Sequence).
PIMS makes is easy and convenient to load and process video data from
many formats with one consistent interface.
You can read in: \* a directory or zipfile of sequential images using
``ImageSequence`` \* a multi-frame TIFF file using ``TiffStack`` \* a
video (AVI, MOV, etc.) using ``Video`` \* specialty formats used in
microscopy and scientific video capture: \* ``Cine`` \* ``LSM`` \* Files
supported by
`Bioformats `__
\* ``ND2`` using `PIMS\_ND2 `__
``ImageSequence`` works out of the box, so we recommended trying that to
start. The others require some extra dependencies. See the README page.
.. code:: python
frames = pims.ImageSequence('../sample_data/bulk_water/*.png', as_grey=True)
.. code:: python
frames
.. parsed-literal::
Source: /Users/dallan/Documents/Repos/trackpy-examples/sample_data/bulk_water/*.png
Length: 300 frames
Frame Shape: 424 x 640
Pixel Datatype: uint8
We can access any frame like ``frames[frame_number]``. The image is
represented as a numpy array of intensities.
.. code:: python
print(frames[0]) # the first frame
.. parsed-literal::
[[124 124 124 ..., 119 120 121]
[124 124 124 ..., 119 121 121]
[124 124 123 ..., 120 122 123]
...,
[125 126 125 ..., 107 97 96]
[125 125 124 ..., 115 108 105]
[125 125 124 ..., 123 118 116]]
In an IPython notebook, the frame is represented by displaying the
image.
.. code:: python
frames[0]
.. raw:: html
Alternatively, to make a proper plot with axes and control over scaling,
use matplotlib.
.. code:: python
plt.imshow(frames[0])
.. parsed-literal::
.. raw:: html
.. parsed-literal::
Frames behave like numpy arrays with a few extra properties.
.. code:: python
frames[123].frame_no
.. parsed-literal::
123
.. code:: python
frames[123].metadata # Scientific formats can pass experiment meta data here.
.. parsed-literal::
{}
Step 2: Locate Features
-----------------------
Start with just the first frame. Estimate the size of the features (in
pixels). The size must be an odd integer, and it is better to err on the
large side, as we'll see below. I estimate 11 pixels.
.. code:: python
f = tp.locate(frames[0], 11, invert=True)
The algorithm looks for *bright* features; since my features are dark, I
set ``invert=True``.
``locate`` returns a spreadsheet-like object called a DataFrame. It
lists
- each feature's position,
- various characterizations of its appearance, which we will use to
filter out spurious features,
- the "signal" strength and an estimate of uncertainty, both derived
from this paper
You can read more about DataFrames in the `pandas
documentation `__. They can
easily be exported to formats like CSV, Excel, SQL, HDF5, etc.
.. code:: python
f.head() # shows the first few rows of data
.. raw:: html
|
x |
y |
mass |
size |
ecc |
signal |
raw_mass |
ep |
frame |
0 |
294.900375 |
5.611320 |
345.949923 |
2.558583 |
0.186067 |
17.881905 |
10719 |
0.089503 |
0 |
1 |
310.237439 |
6.915316 |
246.550508 |
3.215075 |
0.008316 |
5.494440 |
10491 |
0.154562 |
0 |
2 |
36.036426 |
8.142955 |
436.058739 |
2.927963 |
0.116840 |
12.587263 |
11232 |
0.045968 |
0 |
3 |
67.322668 |
7.721634 |
405.889269 |
2.834555 |
0.065217 |
13.885948 |
11000 |
0.058931 |
0 |
4 |
274.313019 |
8.450485 |
288.508050 |
2.819906 |
0.161817 |
9.190700 |
10594 |
0.116354 |
0 |
.. code:: python
plt.figure() # make a new figure
tp.annotate(f, frames[0])
.. parsed-literal::
.. raw:: html
.. parsed-literal::
Refine parameters to elminate spurious features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many of these circles are clearly wrong; they are fleeting peaks in
brightness that aren't actually particles. There are many ways to
distinguish real particles from spurrious ones. The most important way
is to look at total brightness ("mass").
.. code:: python
fig, ax = plt.subplots()
ax.hist(f['mass'], bins=20)
# Optionally, label the axes.
ax.set(xlabel='mass', ylabel='count')
.. parsed-literal::
.. raw:: html
.. parsed-literal::
[, ]
.. code:: python
f = tp.locate(frames[0], 11, invert=True, minmass=200)
.. code:: python
plt.figure()
tp.annotate(f, frames[0])
.. parsed-literal::
.. raw:: html
.. parsed-literal::
There are more options for controling and optimizing feature-finding.
You can review them in the
`documentation `__.
Or, pull them up as you work by typing ``tp.locate?`` into IPython.
Check for subpixel accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~
As Eric Weeks points out in his related tutorial, a quick way to check
for subpixel accuracy is to check that the decimal part of the x and/or
y positions are evenly distributed. Trackpy provides a convenience
plotting function for this:
.. code:: python
plt.figure()
tp.subpx_bias(f)
.. parsed-literal::
.. raw:: html
.. parsed-literal::
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/pandas/tools/plotting.py:3235: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
"is being cleared", UserWarning)
.. parsed-literal::
If we use a mask size that is too small, the histogram often shows a dip
in the middle.
.. code:: python
plt.figure()
tp.subpx_bias(tp.locate(frames[0], 7, invert=True, minmass=200))
.. parsed-literal::
.. raw:: html
.. parsed-literal::
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/pandas/tools/plotting.py:3235: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
"is being cleared", UserWarning)
.. parsed-literal::
Locate features in all frames
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Or, to start, just explore a subset of the frames.
+-----------------------------+---------------------------------------+
| selection | syntax example |
+=============================+=======================================+
| all the frames | ``frames[:]`` or simply ``frames``. |
+-----------------------------+---------------------------------------+
| the first 10 frames | ``frames[:10]`` |
+-----------------------------+---------------------------------------+
| the last 10 frames | ``frames[-10:]`` |
+-----------------------------+---------------------------------------+
| a range of frames | ``frames[100:200]`` |
+-----------------------------+---------------------------------------+
| every 10th frame | ``frame[::10]`` |
+-----------------------------+---------------------------------------+
| a list of specific frames | ``frames[[100, 107, 113]]`` |
+-----------------------------+---------------------------------------+
We'll locate features in the first 300 frames from this video. We use
``tp.batch``, which calls ``tp.locate`` on each frame and collects the
results.
.. code:: python
f = tp.batch(frames[:300], 11, minmass=200, invert=True)
.. parsed-literal::
Frame 299: 461 features
As each frame is analyzed, ``tp.batch`` reports the frame number and how
many features were found. If this number runs unexpectedly low or high,
you may wish to interrupt it and try different parameters.
Step 3: Link features into particle trajectories
------------------------------------------------
We have the locations of the particles in each frame. Next we'll track
particles from frame to frame, giving each one a number for
identification.
First, we must must specify a maximum displacement, the farthest a
particle can travel between frames. We should choose the smallest
reasonable value because a large value slows computation time
considerably. In this case, 5 pixels is reasonable.
Second, we allow for the possibility that a particle might be missed for
a few frames and then seen again. (Perhaps its "mass" slipped below our
cutoff due to noise in the video.) Memory keeps track of disappeared
particles and maintains their ID for up to some number of frames after
their last appearance. We'll choose 3.
.. code:: python
t = tp.link_df(f, 5, memory=3)
.. parsed-literal::
Frame 299: 461 trajectories present
The result is the features DataFrame ``f`` with an additional column,
``particle``, identifying each feature with a label.
.. code:: python
t.head()
.. raw:: html
|
x |
y |
mass |
size |
ecc |
signal |
raw_mass |
ep |
frame |
particle |
0 |
294.900375 |
5.611320 |
345.949923 |
2.558583 |
0.186067 |
17.881905 |
10719 |
0.089503 |
0 |
0 |
1 |
295.360118 |
5.316313 |
388.202575 |
2.524643 |
0.214783 |
18.883275 |
10786 |
0.078429 |
1 |
0 |
2 |
295.970101 |
5.627616 |
373.436505 |
2.630296 |
0.189075 |
17.458842 |
10710 |
0.088747 |
2 |
0 |
3 |
295.734210 |
5.843516 |
392.386204 |
2.617637 |
0.187835 |
18.511646 |
10750 |
0.080728 |
3 |
0 |
4 |
295.686010 |
5.810687 |
375.745878 |
2.543898 |
0.200468 |
18.458071 |
10737 |
0.083059 |
4 |
0 |
Filter spurious trajectories.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have more filtering to do. Empheremeral trajectories -- seen only for
a few frames -- are usually spurious and never useful. The convenience
function ``filter_stubs`` keeps only trajectories that last for a given
number of frames.
.. code:: python
t1 = tp.filter_stubs(t, 50)
# Compare the number of particles in the unfiltered and filtered data.
print('Before:', t['particle'].nunique())
print('After:', t1['particle'].nunique())
.. parsed-literal::
Before: 4195
After: 828
We can also filter trajectories by their appearance. At this stage, with
trajectories linked, we can look at a feature's "average appearance"
throughout its trajectory, giving a more accurate picture.
.. code:: python
plt.figure()
tp.mass_size(t1.groupby('particle').mean()) # convenience function -- just plots size vs. mass
.. parsed-literal::
.. raw:: html
.. parsed-literal::
The particles with especially low mass or especially large size are
probably out of focus or aggregated, respectively. It is best to
experiment by trial and error, filtering out regions of mass-size space
and looking at the results using ``tp.annotate`` and ``tp.circle``. In
the end, we need to separate the good particles from the spurious ones,
and it doesn't matter how we get it done.
.. code:: python
condition = lambda x: ((x['mass'].mean() > 250) & (x['size'].mean() < 3.0) &
(x['ecc'].mean() < 0.1))
t2 = tp.filter(t1, condition) # a wrapper for pandas' filter that works around a bug in v 0.12
.. code:: python
plt.figure()
tp.annotate(t2[t2['frame'] == 0], frames[0])
.. parsed-literal::
.. raw:: html
.. parsed-literal::
Trace the trajectories.
.. code:: python
plt.figure()
tp.plot_traj(t1)
.. parsed-literal::
.. raw:: html
.. parsed-literal::
Remove overall drift
~~~~~~~~~~~~~~~~~~~~
Compute the overall drifting motion, which we will subtract away,
adopting the reference frame of the particles' average position.
.. code:: python
d = tp.compute_drift(t1)
.. code:: python
plt.figure()
d.plot()
.. parsed-literal::
.. raw:: html
.. parsed-literal::
.. raw:: html
.. parsed-literal::
.. code:: python
tm = tp.subtract_drift(t1, d)
With the overall drifting motion subtracted out, we plot the
trajectories again. We observe nice random walks.
.. code:: python
plt.figure()
tp.plot_traj(tm)
.. parsed-literal::
.. raw:: html
.. parsed-literal::
Step 4: Analyze trajectories
----------------------------
Mean Squared Displacement of Individal Probes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compute the mean squared displacement of each particle and plot MSD vs.
lag time.
.. code:: python
im = tp.imsd(tm, 100/285., 24) # microns per pixel = 100/285., frames per second = 24
.. code:: python
fig, ax = plt.subplots()
ax.plot(im.index, im, 'k-', alpha=0.1) # black lines, semitransparent
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set_xscale('log')
ax.set_yscale('log')
.. parsed-literal::
.. raw:: html
Since we only analyzed 300 frames, the statistics are poor at large lag
times. With more frames, we can study larger lag times.
Ensemble Mean Squared Displacement
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: python
em = tp.emsd(tm, 100/285., 24)
.. code:: python
fig, ax = plt.subplots()
ax.plot(em.index, em, 'o')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set(ylabel=r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]',
xlabel='lag time $t$')
ax.set(ylim=(1e-2, 10))
.. parsed-literal::
.. raw:: html
.. parsed-literal::
[(0.01, 10)]
We can easily fit this ensemble mean-squared displacement to a power
law, :math:`At^n`, using a convenience function, ``fit_powerlaw``, that
performs a linear regression in log space.
.. code:: python
plt.figure()
tp.utils.fit_powerlaw(em) # performs linear best fit in log space, plots
plt.ylabel(r'$\langle \Delta r^2 \rangle$ [$\mu$m$^2$]')
plt.xlabel('lag time $t$')
.. parsed-literal::
.. raw:: html
.. parsed-literal::
/Users/dallan/miniconda/envs/py3/lib/python3.4/site-packages/matplotlib/artist.py:221: MatplotlibDeprecationWarning: This has been deprecated in mpl 1.5, please use the
axes property. A removal date has not been set.
warnings.warn(_get_axes_msg, mplDeprecation, stacklevel=1)
.. parsed-literal::
In water, a viscous material, the expected power-law exponent
:math:`n = 1`. The value of :math:`A = 4D`, where :math:`D` is the
particles' diffusivity. :math:`D` is related to viscosity :math:`\eta`,
particle radius :math:`a`, and temperature :math:`T` as
:math:`D = \displaystyle\frac{kT}{6\pi\eta a}`.
For particles with a 1 :math:`\mu\text{m}` diameter in room-temperature
water, :math:`A \approx 1.66`. Our value above is not far off.
**This is the end of the walkthrough. Keep reading to review the more
advanced capabilities in trackpy.**
Preview of Advanced Features
----------------------------
These are covered in greater detail in later tutorials.
Streaming
---------
The feature-finding and trajectory-linking functions ``batch`` and
``link_df`` keep all of their results in memory. This approach is
simple, but it isn't necessary. We can prcoess an unlimited number of
frames if we save the results as we go.
Trackpy includes a class to manage storing an retrieving data framewise
in an HDF5 format. The general idea is easily extensive to other
formats.
.. code:: python
with tp.PandasHDFStore('data.h5') as s:
tp.batch(frames, 11, invert=True, minmass=200, output=s)
.. parsed-literal::
Frame 299: 461 features
.. code:: python
with tp.PandasHDFStore('data.h5') as s:
for linked in tp.link_df_iter(s, 5, memory=3):
s.put(linked)
.. parsed-literal::
Frame 299: 461 trajectories present
You can get results by frame with ``s.get(frame_number)`` or, when you
have sufficient memory, retrieve them all. The results is identifical to
what you would obtained using the basic functions ``batch`` and
``link_df``.
.. code:: python
with tp.PandasHDFStore('data.h5') as s:
trajectories = pd.concat(iter(s))
Optional High Performance Component: Numba
------------------------------------------
The core, time-consuming steps in particle location and linking are
implemented in Python/numpy and also in pure Python optimized for numba.
If numba is installed, trackpy will detect it and use it by default. You
can switch it on and off to compare performance.
.. code:: python
%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='numba')
.. parsed-literal::
Frame 9: 469 features
1 loops, best of 3: 1.3 s per loop
.. code:: python
%timeit tp.batch(frames[:10], 11, invert=True, minmass=200, engine='python')
.. parsed-literal::
Frame 9: 463 features
1 loops, best of 3: 11.8 s per loop
The linking functions ``link_df`` or ``link_df_iter`` support various
options for ``link_strategy``, one of which is numba-based. Read the
their docstrings for details.
Parellelization
---------------
Feature-finding can easily be parallelized: each frame an independent
task, and the tasks can be divided among the available CPUs. IPython
parallel makes this very straightforward.
It is simplest to try this on the CPUs of the local machine. To do this
from an IPython notebook, go to File > Open, and click the "Clusters"
tab. Fill in the "engines" field with the number of available CPUs
(e.g., 4) and click start. Now you are running a cluster -- it's that
easy. More information on IPython parallel is available in `this section
of the IPython
documentation `__.
.. code:: python
from ipyparallel import Client
client = Client()
view = client.load_balanced_view()
client[:]
.. parsed-literal::
.. code:: python
curried_locate = lambda image: tp.locate(image, 11, invert=True, minmass=200)
.. code:: python
%%px
import trackpy as tp
.. code:: python
%%timeit
amr = view.map_async(curried_locate, frames[:32])
amr.wait_interactive()
.. parsed-literal::
32/32 tasks finished after 2 s
done
1 loops, best of 3: 2.14 s per loop
.. code:: python
%%timeit
serial_result = list(map(curried_locate, frames[:32]))
.. parsed-literal::
1 loops, best of 3: 4.51 s per loop