Streaming: Processing Unlimited Frames On-Disk

A key feature of trackpy is the ability to process an unlimited number of frames.

For feature-finding, this is straightforward: a frame is loaded, features are located, the locations are saved to disk, and the memory is cleared for the next frame. For linking, the problem is more challenging, but trackpy handles this complexity for you, using as little memory as possible throughout.

When data sets become large, beginner-friendly file formats like CSV or Excel become impractical. We recommend using the HDF5 file format, which trackpy can read and write out of the box. (HDF5 is widely used; you can be sure it will be around for many, many years to come.)

If you have some other format in mind, see the end of this tutorial, where we explain how to extend trackpy’s interface to support other formats.

Install PyTables

You need pytables, which you can easily install using conda. (Type this command into a Terminal or Command Prompt.)

conda install pytables
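
To confirm that the install worked, one option is to check from Python whether the package can be found. PyTables is imported under the module name tables. The helper name has_pytables below is hypothetical and not part of trackpy; this is only a minimal sketch of such a check.

```python
import importlib.util


def has_pytables():
    """Return True if the PyTables module (imported as ``tables``) can be found."""
    # find_spec looks the module up without actually importing it.
    return importlib.util.find_spec("tables") is not None


if has_pytables():
    print("PyTables is available.")
else:
    print("PyTables not found; try: conda install pytables")
```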

Locate Features, Streaming Results into an HDF5 File

import trackpy as tp
import pims

def gray(image):
    return image[:, :, 0]

images = pims.ImageSequence('../sample_data/bulk_water/*.png', process_func=gray)
images = images[:10]  # We'll take just the first 10 frames for demo purposes.

# For this demo, we'll first remove the file if it already exists.
!rm -f data.h5

We can use locate inside a loop:

with tp.PandasHDFStore('data.h5') as s:  # This opens an HDF5 file. Data will be stored and retrieved by frame number.
    for image in images:
        features = tp.locate(image, 11, invert=True)  # Find the features in a given frame.
        s.put(features)  # Save the features to the file before continuing to the next frame.

Or, equivalently, we can use batch, which accepts the open storage file as its output argument:

with tp.PandasHDFStore('data.h5') as s:
    tp.batch(images, 11, invert=True, output=s)

Frame 9: 510 features

We can get the data for a given frame:

with tp.PandasHDFStore('data.h5') as s:
    frame_2_results = s.get(2)

frame_2_results.head()  # Display the first few rows.

	x	y	mass	size	ecc	signal	raw_mass	ep	frame
0	295.886941	5.624598	372.183679	2.580235	0.189920	17.396842	8915	0.108849	2
1	68.203651	6.471969	416.980545	2.888723	0.088487	13.265092	9267	0.069054	2
2	336.888818	6.445367	340.325712	2.565562	0.028504	16.418269	8815	0.130158	2
3	432.423210	6.799269	564.962429	3.048429	0.302762	17.614302	9131	0.080412	2
4	460.570088	7.785952	359.136047	2.924729	0.124871	12.503980	8907	0.110293	2

Or dump all the data, if your machine has enough memory to hold it:

with tp.PandasHDFStore('data.h5') as s:
    all_results = s.dump()

all_results.head()  # Display the first few rows.

	x	y	mass	size	ecc	signal	raw_mass	ep
0	103.430478	5.247191	308.202079	2.738100	0.039502	11.795655	8983	0.098772
1	294.831759	5.692167	355.060049	2.574877	0.162698	17.422941	8917	0.109846
2	311.069767	7.223679	255.933257	3.321975	0.007893	5.627285	8644	0.204852
3	431.496378	7.273025	627.442294	2.872567	0.273653	19.695498	9199	0.074267
4	36.061983	8.255091	483.621872	2.973328	0.123753	13.635345	9531	0.053765

You can dump the first N frames using s.dump(N).

Link Trajectories, Streaming From and Updating the HDF5 File

with tp.PandasHDFStore('data.h5') as s:
    for linked in tp.link_df_iter(s, 3, neighbor_strategy='KDTree'):
        s.put(linked)

Frame 9: 510 trajectories present

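Because each linked frame is saved with s.put under its existing frame number, the original unlinked data is overwritten. The stored frames now include a particle column: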

with tp.PandasHDFStore('data.h5') as s:
    frame_2_results = s.get(2)

frame_2_results.head()  # Display the first few rows.

	x	y	mass	size	ecc	signal	raw_mass	ep	frame	particle
0	295.886941	5.624598	372.183679	2.580235	0.189920	17.396842	8915	0.108849	2	1
1	432.423210	6.799269	564.962429	3.048429	0.302762	17.614302	9131	0.080412	2	3
2	36.560049	8.409618	440.901203	3.004805	0.132634	12.938901	9508	0.055229	2	4
3	68.203651	6.471969	416.980545	2.888723	0.088487	13.265092	9267	0.069054	2	5
4	629.016761	7.998041	499.506812	3.298593	0.183847	9.350802	9374	0.062147	2	7

Framewise Data Interfaces

Built-in interfaces

There are three built-in interfaces. You can use them interchangeably, but each offers different performance advantages.

PandasHDFStore – fastest for a small (~100) number of frames
PandasHDFStoreBig – fastest for a medium or large number of frames
PandasHDFStoreSingleNode – optimizes HDF queries that access multiple frames (advanced)

Writing your own interface

Trackpy implements a generic interface that could be used to store and retrieve particle tracking data in any file format. We hope that it can make it easier for researchers who use different file formats to exchange data. Any in-house format could be accessed using the same simple interface demonstrated above.

At present, the interface is implemented only for HDF5 files. To extend it to any format, write a class subclassing trackpy.FramewiseData. This custom class must implement the methods put, get, close, and __iter__ and the properties max_frame and t_column. Refer to the built-in classes in framewise_data.py for examples to work from.