Obtaining Diagnostic Information from Linking
=============================================

Beginning with version 0.3, ``link_df()`` and ``link_df_iter()`` can
include diagnostic data in their output, to give you clues about what
the linking algorithm did while linking each particle. This information
may help to diagnose errors or bad results from linking, or to verify
that the linking algorithm is working properly.

How to use it
-------------

Pass the optional keyword argument ``diagnostics=True`` to ``link_df()``
or ``link_df_iter()``. The linker will add the "particle" column as
usual, along with extra columns whose names begin with "``diag_``\ ".
Not all types of diagnostic data pertain to every particle, so many
entries in these diagnostic columns will have the placeholder value
``NaN`` (not a number).

We'll discuss the individual columns and their meanings below, by way of
an example: the ``bulk_water`` movie that was tracked in the walkthrough
tutorial.

Details
~~~~~~~

1. Diagnostics typically slow down linking by a few percent and consume
   significant additional memory, so they are turned off by default.
2. If you already ran linking with diagnostics turned on, but no longer
   want the extra columns cluttering up your results, you can use the
   ``strip_diagnostics()`` function to automatically remove all columns
   whose names start with "``diag_``\ ".
3. The set of diagnostic columns, and the order in which they appear,
   is not fixed. Each column is added only if that diagnostic datum was
   recorded. For example, below we will encounter the
   ``diag_remembered`` column, which gives information about the
   ``memory`` feature. If you did not specify the ``memory`` option for
   linking, this column will never be present in your results.
4. For performance reasons, ``link_df()`` ordinarily does *not* make a
   copy of its input DataFrame, instead modifying it in-place. For
   practical reasons, however, with diagnostics enabled ``link_df()``
   must work on a copy.
   Depending on how it was written, your program may therefore behave
   differently when diagnostics are turned on! If you suspect a
   problem, you can instruct ``link_df()`` to behave more consistently
   by supplying ``copy_features=True``.

Example
-------

We'll track the ``bulk_water`` movie much as in the walkthrough, but
with ``diagnostics=True``. We'll then examine the data we get back.

First, we set up our environment:

.. code:: python

    %matplotlib notebook
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas import DataFrame, Series

    import pims
    import trackpy as tp

4 frames should be enough for this demonstration.

.. code:: python

    frames = pims.ImageSequence('../sample_data/bulk_water/*.png', as_grey=True)
    f = tp.batch(frames[:4], 11, minmass=66, invert=True, meta=False)

.. parsed-literal::

    Frame 3: 495 features

We'll link with a larger ``search_radius`` than in the tutorial, because
we want to generate a lot of subnetworks (more on that below).

.. code:: python

    t = tp.link_df(f, 13, memory=3, diagnostics=True)

.. parsed-literal::

    Frame 3: 495 trajectories present

When we look at the columns in the resulting tracks DataFrame, we see
some new names:

.. code:: python

    list(t.columns)

.. parsed-literal::

    ['x', 'y', 'mass', 'size', 'ecc', 'signal', 'raw_mass', 'ep',
     'frame', 'particle', 'diag_search_range', 'diag_subnet',
     'diag_subnet_iterations', 'diag_subnet_size', 'diag_remembered']

Let's look at the first 5 rows in the results. Because there are so many
columns, we'll transpose the data so they fit on the page.

.. code:: python

    t.head().T

.. parsed-literal::

                                   0          1          2          3           4
    x                          294.9     295.36     295.97    295.734     310.237
    y                        5.61132    5.31631    5.62762    5.84352     6.52064
    mass                      345.95    388.203    373.437    392.386      237.16
    size                     2.55858    2.52464     2.6303    2.61764     3.18841
    ecc                     0.186067   0.214783   0.189075   0.187835  0.00687384
    signal                   17.8819    18.8833    17.4588    18.5116     5.49444
    raw_mass                   10719      10786      10710      10750       10487
    ep                      0.089503  0.0784293  0.0887466  0.0807279    0.156558
    frame                          0          1          2          3           0
    particle                       0          0          0          0           1
    diag_search_range            NaN         13         13         13         NaN
    diag_subnet                  NaN        NaN        NaN        NaN         NaN
    diag_subnet_iterations       NaN        NaN        NaN        NaN         NaN
    diag_subnet_size             NaN        NaN        NaN        NaN         NaN
    diag_remembered              NaN        NaN        NaN        NaN         NaN
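If you later decide you don't need these extra columns, the
``strip_diagnostics()`` function mentioned above removes them. Its
effect can be sketched in plain pandas; the tiny DataFrame below is a
made-up stand-in for real linked results, just for illustration:

```python
import pandas as pd

def drop_diag_columns(tracks):
    """Return a copy of ``tracks`` without any ``diag_``-prefixed columns."""
    keep = [c for c in tracks.columns if not c.startswith('diag_')]
    return tracks[keep].copy()

# Toy stand-in for a linked DataFrame
t = pd.DataFrame({'x': [1.0], 'particle': [0], 'diag_search_range': [13]})
cleaned = drop_diag_columns(t)
print(list(cleaned.columns))  # → ['x', 'particle']
```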
``diag_search_range``
~~~~~~~~~~~~~~~~~~~~~

``trackpy`` didn't record every kind of diagnostic data for every
feature. We do notice, however, that there is a ``diag_search_range``
for everything past the first frame. This column is the value of
``search_range`` that was used to find a particle and link it. Since the
particles in the first frame did not need to be found,
``diag_search_range`` is null there. But for the subsequent frames, it
is the same as the ``search_range`` we told ``link_df()`` to use,
because those particles were successfully linked.

``diag_search_range`` would be a nearly pointless column except that it
reveals the workings of adaptive search when that feature is turned on.
See the tutorial on adaptive search for a more meaningful look at
``diag_search_range``.

``diag_remembered``
~~~~~~~~~~~~~~~~~~~

When a particle seems to disappear, ``memory`` allows its "ghost" to
stick around for a few more frames in case that particle reappears. In
that event, the ``diag_remembered`` column records the number of frames
for which the particle was missing. Let's look in the 4th frame to see
whether this actually happened. Again, we show just the first 5 entries
and transpose to fit on the page:

.. code:: python

    remembered_frame3 = t[(t.frame == 3) & (t.diag_remembered > 0)]
    print(len(remembered_frame3), 'remembered particles in frame number 3.')
    remembered_frame3.head().T

.. parsed-literal::

    13 remembered particles in frame number 3.

.. parsed-literal::

                                 242        265        310       873        1038
    x                        339.717    276.435    111.902   284.111     64.1174
    y                         58.228    64.4811    70.8655   196.545     232.489
    mass                     330.074    176.822    327.241   186.937     198.975
    size                     2.83784    3.22542    2.21055    2.6481      3.2842
    ecc                    0.0604219   0.168835   0.152832  0.089197    0.104093
    signal                   11.1272    4.55204    18.2082   6.57517     4.24857
    raw_mass                   10600      10411      10696     10271       10698
    ep                      0.107869   0.187151  0.0887687  0.410803   0.0884424
    frame                          3          3          3         3           3
    particle                      63         70         82       230         272
    diag_search_range             13         13         13        13          13
    diag_subnet                  132        111        NaN       122         116
    diag_subnet_iterations        27          3        NaN         3           3
    diag_subnet_size               4          2        NaN         2           2
    diag_remembered                1          2          1         1           1
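A trajectory with ``diag_remembered`` set will show a matching gap in
its ``frame`` values, which you can verify directly. A minimal sketch,
using made-up stand-in rows for particle 63 from the table above (real
results would have many more columns):

```python
import pandas as pd

# Stand-in: particle 63 appears in frames 0, 1, and 3 -- it went
# missing for one frame and was then remembered.
t = pd.DataFrame({'frame': [0, 1, 3], 'particle': [63, 63, 63]})

traj = t[t.particle == 63].sort_values('frame')
gaps = traj.frame.diff().dropna() - 1  # frames skipped between appearances
print(gaps.tolist())  # → [0.0, 1.0]
```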
One of these particles has a ``diag_remembered`` value of 2, which means
that it was present in the first frame, went missing for 2 frames, and
then seemingly reappeared in the fourth frame. If you are not sure
whether memory (or a specific value of the ``memory`` option) is
appropriate for your movie, ``diag_remembered`` shows you which
trajectories and frames to spot-check.

``diag_subnet``
~~~~~~~~~~~~~~~

As you may also have learned in other ``trackpy`` tutorials (including
the one devoted to the topic), how easily subnetworks arise, and how
they are resolved, can be crucial to the feasibility and correctness of
linking. ``trackpy`` records diagnostic data whenever it encounters a
subnetwork. (If there were no subnetworks at all, the columns discussed
here would not be present.) As with memory, these data can help you
focus on potential trouble spots.

In our example data, we can see which subnetworks were encountered when
linking frames 0 and 1:

.. code:: python

    subnet_particles = t[(t.frame == 1) & ~(t.diag_subnet.isnull())]
    print(len(subnet_particles), 'particles involved in subnets between frames 0 and 1.')
    subnet_particles.head().T

.. parsed-literal::

    87 particles involved in subnets between frames 0 and 1.

.. parsed-literal::

                                  35        171        183        217        221
    x                        552.664    198.428    98.5585    99.9468    328.583
    y                         11.499    34.7235    40.5719    52.6163    51.6568
    mass                     322.581      291.7    233.694    172.662    256.228
    size                     2.30788    2.54431     3.0722    3.41624    3.34206
    ecc                     0.112383  0.0819793  0.0102799  0.0974881  0.0920241
    signal                   18.6746    14.0842    6.36398    3.54713    5.11205
    raw_mass                   10594      10664      10695      10603      10448
    ep                      0.113443  0.0975633  0.0918683   0.111118   0.171748
    frame                          1          1          1          1          1
    particle                      10         44         47         57         58
    diag_search_range             13         13         13         13         13
    diag_subnet                   15         10         11         11         12
    diag_subnet_iterations         1          1          3          3         20
    diag_subnet_size               1          1          2          2          3
    diag_remembered              NaN        NaN        NaN        NaN        NaN
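Because every member of a subnet shares the same ``diag_subnet`` number,
pulling out one subnet for closer inspection is a one-line selection. A
quick sketch with a made-up stand-in for a few of the rows shown above
(particles 47 and 57 share subnet 11):

```python
import pandas as pd

# Stand-in for a few rows of the linked results
t = pd.DataFrame({'particle': [10, 44, 47, 57, 58],
                  'diag_subnet': [15.0, 10.0, 11.0, 11.0, 12.0]})

# All members of subnet 11, for spot-checking how it was resolved
members = t[t.diag_subnet == 11]
print(members.particle.tolist())  # → [47, 57]
```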
The subnet columns are as follows:

**``diag_subnet``** sequentially numbers each subnet as it is
encountered. This allows you to count subnets, and to identify particles
that are in the same subnet. Note that the numbering is sequential over
the *entire movie*. Therefore, to count the subnets in a particular
frame, use

.. code:: python

    len(subnet_particles.diag_subnet.unique())

.. parsed-literal::

    45

We can also take this opportunity to show these subnet particles. This
would be the starting point for more in-depth spot checks of how subnets
were solved.

.. code:: python

    tp.annotate(subnet_particles, frames[1], plot_style={'markersize': 10});

Like ``diag_subnet``, the columns ``diag_subnet_size`` and
``diag_subnet_iterations`` also pertain to the entire subnet; all
members of a subnet have the same values. It would be nice to
consolidate these data into a shorter table of the subnets and their
properties. To do this, we can use the powerful grouping and aggregation
features of Pandas:

.. code:: python

    subnets = subnet_particles.groupby('diag_subnet')[['diag_subnet_size',
                                                       'diag_subnet_iterations']].first()

    # Count how many particles in the *present* frame are in each subnet,
    # and include it in the results.
    subnet_current_sizes = subnet_particles.diag_subnet.value_counts()
    subnet_current_sizes.name = 'current_size'
    subnets = subnets.join(subnet_current_sizes)

    subnets.head()

.. parsed-literal::

                 diag_subnet_size  diag_subnet_iterations  current_size
    diag_subnet
    0                           3                       5             3
    1                           2                       3             2
    2                           2                       3             2
    3                           1                       1             2
    4                           2                       3             2
Each subnet involves a set of trajectories in the past (generally the
previous frame), and a set of candidate particles in the present (the
frame in which the diagnostic info was recorded).
**``diag_subnet_size``** gives the number of particles in the past, and
in this example, we computed the ``current_size`` column to see the
number in the present. You can see that some subnets necessitated the
ending of a trajectory (e.g. 3 particles to 2), while others required
the creation of one (e.g. 1 to 2).

Finally, the **``diag_subnet_iterations``** column counts the number of
possible solutions that were explored by the subnet-solving algorithm
before it decided which was best. It is therefore roughly proportional
to the time spent solving each subnet. These data are recorded by the
numba linker only (``link_strategy='numba'``), which is used by default
if ``numba`` is installed. Almost all of the subnets we're examining are
small and were solved quickly:

.. code:: python

    plt.figure()
    plt.hist(subnets.diag_subnet_iterations)
    plt.xlabel('Time to solve subnet (arbitrary units)')
    plt.ylabel('Number of occurrences');

However, a subnet of :math:`\sim` 25 particles can easily take
:math:`10^9` iterations or more to solve.