Adaptive Search: Adjusting search_range on the Fly

As discussed in the tutorial on “Advanced Linking,” a basic limitation of the tracking algorithm involves the selection of search_range. If it is too small, many valid trajectories will be unlinked or linked incorrectly. If it is too large, trackpy will become overwhelmed with candidate features, and will either be bogged down by those possibilities or will give up entirely, raising a SubnetOversizeException.
The tutorial on “Advanced Linking” identifies some ways to ease this dilemma, including prediction, which lets trackpy select candidate features more intelligently. Still, “perfect” tracking is often computationally impossible. One is then faced with the task of finding a search_range that allows linking to finish in a reasonable amount of time while merely minimizing linking errors. The usual method for optimizing search_range is to repeatedly adjust it and run the linker, searching for the largest value that does not halt or intolerably slow the algorithm. This search can be time-consuming and frustrating, especially for long movies. Perhaps worse, one must choose the search_range for all particles in all frames, even if the failure is being caused by a single large subnet spanning just two successive frames.
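As a minimal sketch of this trial-and-error loop, assuming a hypothetical DataFrame of located features named features and an arbitrary list of candidate values (note that this catches only outright failure, not intolerable slowness):

import trackpy

# Hypothetical manual search: try candidates from largest to smallest and
# keep the first search_range that links without an oversize subnet.
for sr in [1.0, 0.9, 0.8, 0.7]:
    try:
        tracks = trackpy.link_df(features, sr)  # 'features' is hypothetical
        break  # success: this is the largest workable candidate
    except trackpy.SubnetOversizeException:
        continue  # subnet too large: retry with a smaller value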
This difficulty is addressed by the “adaptive search” functionality that is built into trackpy starting with version 0.3. Instead of specifying a single search_range for all particles in all frames, one gives trackpy a maximum value. trackpy then automatically reduces search_range only when and where it encounters difficulty. When it encounters a large subnet, the algorithm repeatedly attempts to link only those particles, using smaller and smaller values of search_range. As search_range is reduced, some of the possible track assignments are eliminated from consideration. The original large subnet is thereby broken up into smaller pieces, some of which may be small enough to solve directly (by considering all possible track assignments). This process continues until no oversize subnet(s) remain, or a minimum search_range is reached. In the latter case, a SubnetOversizeException is raised and linking is aborted.
To any of the link family of functions, supply the search_range argument as usual, along with an adaptive_stop argument, and optionally an adaptive_step. When trackpy encounters a large subnet, it will locally reduce search_range by multiplying it by adaptive_step. (adaptive_step must be greater than 0 and less than 1; the default is 0.95.) This multiplication will be applied repeatedly if necessary. adaptive_stop (note the single-letter difference!) tells trackpy when to give up. When search_range has become less than or equal to adaptive_stop, trackpy aborts linking by raising a SubnetOversizeException.
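Conceptually, the reduction applied to a single oversize subnet works roughly like the following sketch. This is not trackpy’s actual implementation; solvable is a hypothetical stand-in for the real test of whether the subnet has become small enough:

import trackpy

def shrink_search_range(search_range, adaptive_stop, adaptive_step, solvable):
    # Repeatedly shrink the local search range until the subnet is solvable.
    sr = search_range
    while not solvable(sr):
        sr *= adaptive_step  # e.g. 0.95 shrinks sr by 5% per iteration
        if sr <= adaptive_stop:
            # Give up: even the smallest allowed range leaves the subnet oversize.
            raise trackpy.SubnetOversizeException('reached adaptive_stop')
    return sr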
For example, at the high end, you might consider 10.0 to be a generous but reasonable search_range for your movie. At the low end, you feel that a search_range of 2.0 — the radius of one of your particles — is unreasonably small and would be a sure sign of trouble. Finally, you decide that search_range should be reduced carefully — just 5% at a time. The values you should specify are then

{'search_range': 10.0,
 'adaptive_stop': 2.0,
 'adaptive_step': 0.95}
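In practice, these are simply arguments to the linking call; for instance, with a hypothetical DataFrame of located features named features:

tracks = trackpy.link_df(features, search_range=10.0,
                         adaptive_stop=2.0, adaptive_step=0.95)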
There is one more parameter that you may wish to adjust, but the reason is a little technical. Under normal circumstances, if you have a movie that does not cry out for adaptive search, large subnets will be rare, and to get the most accurate linking you will want the computer to solve them all, even ones that take a few minutes. This favors a default maximum subnet size, stored in trackpy.linking.Linker.MAX_SUB_NET_SIZE, that is rather generous.
This situation changes when adaptive search is enabled. As it reduces search_range, trackpy is searching for the largest computable subnet(s) embedded within the original oversize, non-computable subnet. If you are lazy (and who isn’t?), you will want to choose a generous (but reasonable) maximum search_range and let trackpy do the hard work of optimizing it downward. The result is that trackpy will frequently be solving subnets that are just below the maximum size. This favors a small maximum size, corresponding to a subnet that can be solved in well under a second.
Because a different behavior is desired in each case, there is a different default subnet limit for adaptive search, stored in trackpy.linking.Linker.MAX_SUB_NET_SIZE_ADAPTIVE. As with the regular default, you may wish to adjust this value depending on your patience and the speed of your computer.
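Both limits are ordinary class attributes, so adjusting one is a single assignment. For example, to make adaptive search even more impatient (the value 10 here is arbitrary):

trackpy.linking.Linker.MAX_SUB_NET_SIZE_ADAPTIVE = 10  # arbitrary example value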
The default values in the current version of trackpy are
import trackpy
import pandas as pd
import numpy as np
%matplotlib notebook
from matplotlib.pyplot import * # not recommended usage, but we use it for brevity here
trackpy.linking.Linker.MAX_SUB_NET_SIZE
30
trackpy.linking.Linker.MAX_SUB_NET_SIZE_ADAPTIVE
15
Since subnet computation time scales very roughly as \(N!\), this is actually a huge difference in speed:
import math
float(math.factorial(trackpy.linking.Linker.MAX_SUB_NET_SIZE) / \
math.factorial(trackpy.linking.Linker.MAX_SUB_NET_SIZE_ADAPTIVE))
2.0284320493172736e+20
The difference in maximal subnet size is the primary drawback of adaptive search. Solving a large subnet may be slow, but at least you are getting the best possible set of trajectories, ones that minimize the total displacement over all particles. (For details, see the original paper by Crocker & Grier, referenced in the introduction to the trackpy documentation.) Depending on the details of your movie, if you insist on solving smaller subnets you may instead get results that are less than fully correct.
In other words, adaptive search can abolish a lot of headaches, tedium, and gut-wrenching decisions, but it cannot magically clean up problematic data, and it cannot abolish the possibility of shooting yourself in the foot. Proceed accordingly.
This is a highly contrived example, but it illustrates how adaptive search can greatly ease the problem of choosing search_range.

As example data, we define a grid of points that has contracted in the second frame. It is visualized below, using a function we define.
def contracting_grid(scale=0.95):
    """Two frames with a grid of 441 points.

    In the second frame, the points contract, so that the outermost set
    coincides with the second-outermost set in the previous frame.
    This is a way to challenge (and/or stump) a subnet solver.
    """
    pts0x, pts0y = np.mgrid[-10:11, -10:11]
    pts0 = pd.DataFrame(dict(x=pts0x.flatten(), y=pts0y.flatten(),
                             frame=0))
    pts1 = pts0.copy()
    pts1.frame = 1
    pts1.x = pts1.x * scale
    pts1.y = pts1.y * scale
    return pd.concat([pts0, pts1], ignore_index=True)

cg = contracting_grid()
cg_original = cg.copy()
def trshow(tr, first_style='bo', last_style='g.', style='b.'):
    """Plot a DataFrame of features, with distinct styles for the first
    and last frames, and a red line along each particle's trajectory."""
    frames = list(tr.groupby('frame'))
    nframes = len(frames)
    for i, (fnum, pts) in enumerate(frames):
        if i == 0:
            sty = first_style
        elif i == nframes - 1:
            sty = last_style
        else:
            sty = style
        plot(pts.x, pts.y, sty)
    if 'particle' in tr.columns:
        # Connect each particle's positions with a red line.
        for pid in tr.particle.unique():
            plot(tr.x[tr.particle == pid],
                 tr.y[tr.particle == pid], 'r-')
    axis('equal'); ylim(-11, 11)
    xlabel('x')
    ylabel('y')

figure(figsize=(6, 6))
trshow(cg_original)
In the preceding plot, the large blue dots are the first frame, and the small green dots are the second frame.
Next, we perform linking with a “small” search_range of 0.65. The computation finishes quickly but fails to link the particles with the largest displacements, in the corners.
%%time
tracks_toosmall = trackpy.link_df(cg, 0.65)
Frame 1: 441 trajectories present
CPU times: user 308 ms, sys: 6.78 ms, total: 315 ms
Wall time: 318 ms
figure(figsize=(6, 6))
trshow(tracks_toosmall)
Seeing this problem, we increase search_range to 0.8. This computation also finishes quickly — by raising a SubnetOversizeException!
%%time
try:
    tracks_toobig = trackpy.link_df(cg, 0.8)
except trackpy.SubnetOversizeException:
    print('Failed!!')
Frame 0: 441 trajectories present
Failed!!
CPU times: user 22.6 ms, sys: 1.95 ms, total: 24.6 ms
Wall time: 23.3 ms
By trial and error (or some trigonometry), we can find a value of search_range that works for this movie. But it requires more than a little patience:
%%time
# Warning: takes several minutes!
tracks_regular = trackpy.link_df(cg, 0.75)
Frame 1: 441 trajectories present
CPU times: user 6min 11s, sys: 1.44 s, total: 6min 13s
Wall time: 6min 13s
figure(figsize=(6, 6))
trshow(tracks_regular)
As printed above, it took several minutes to track just 441 particles in 2 frames. However, the result is correct, as shown by the plot, and by checking the duration of all the computed tracks:
tracks_regular.groupby('particle').particle.count().value_counts()
2    441
Name: particle, dtype: int64
This result indicates that there are 441 tracks that last for 2 frames, and no tracks that last for 1 frame, which would represent unlinked particles. So linking was successful, but at great cost.
Since we know this movie is tricky, we choose a very fine adaptive_step — just 1% per iteration. But we can choose an irresponsibly large search_range of 0.95 — almost the spacing between particles!
%%time
tracks_adaptive = trackpy.link_df(cg, 0.95, adaptive_stop=0.56, adaptive_step=0.99)
Frame 1: 441 trajectories present
CPU times: user 155 ms, sys: 2.46 ms, total: 158 ms
Wall time: 156 ms
With adaptive search enabled, we can obtain the correct results more than an order of magnitude faster than we could with the single value of search_range we found by hand. Almost as importantly, this computation was not preceded by rounds of trial and error to select search_range.
figure(figsize=(6, 6))
trshow(tracks_adaptive)
Both the plot and our check of track lengths show that the result is correct.
tracks_adaptive.groupby('particle').particle.count().value_counts()
2    441
Name: particle, dtype: int64
We can use linking diagnostics, also new in trackpy v0.3, to see what adaptive search did. (See the separate tutorial on diagnostics for a more complete overview.) Let’s re-run linking with diagnostics, and look at the search_range that was used to link each particle in the second frame. (It’s not recorded for the first frame because those particles have no preceding frame.)
tracks_adaptive = trackpy.link_df(cg, 0.95, adaptive_stop=0.56, adaptive_step=0.99, diagnostics=True)
Frame 1: 441 trajectories present
figure()
scatter(tracks_adaptive.x, tracks_adaptive.y, c=tracks_adaptive.diag_search_range.astype(float),
        s=20, linewidth=0, cmap=cm.GnBu, vmin=0.7)
colorbar(label='diag_search_range')
axis('image')
(-10.454959021724294, 10.454959021724292, -10.456405403060545, 10.456405403060545)
As we might have guessed, adaptive search worked by reducing search_range in the trouble spots at the corners.
As discussed above, adaptive search is not magic, and it may create the false impression that bad data are trackable. Until you are confident that you’re using adaptive search properly, it’s a good idea to pay attention to diagnostic output, especially diag_search_range, to get a sense of where and how often it’s used.
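If you prefer a numerical summary to a plot, tabulating the diagnostic column directly works too; for example:

# Count how many particles were linked at each (possibly reduced) search range.
tracks_adaptive.diag_search_range.value_counts().sort_index()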