parallel-locate¶
Faster Feature Location Through Parallel Computation¶
Feature-finding can easily be parallelized: each frame is an independent task, and the tasks can be divided among the multiple CPU cores found in most modern computers. Instead of running in a single process as usual, your code is spread across multiple "worker" processes, each running on its own CPU core.
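To make the idea concrete, here is a rough sketch of that pattern (not how trackpy implements it) using Python's built-in multiprocessing module. find_features and frame_sequence are placeholders introduced for illustration only:
from multiprocessing import Pool
import trackpy as tp

def find_features(frame):
    # hypothetical per-frame task: locate features in a single image
    return tp.locate(frame, 13, invert=True)

if __name__ == '__main__':  # needed when worker processes are spawned from a script
    frame_sequence = []  # placeholder: any sequence of 2D images
    with Pool() as pool:  # by default, one worker process per CPU core
        per_frame_results = pool.map(find_features, frame_sequence)
In practice you rarely need to write this yourself: trackpy.batch can do the multiprocessing for you, as shown below.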
First, let's set up the movie to track:
import pims
import trackpy as tp

@pims.pipeline
def gray(image):
    # keep a single color channel so each frame is a 2D grayscale image
    return image[:, :, 1]

frames = gray(pims.ImageSequence('../sample_data/bulk_water/*.png'))
tp.quiet() # Disabling progress reports makes this a fairer comparison
Using trackpy.batch¶
Beginning with trackpy v0.4.2, use the "processes" argument to have trackpy.batch run on multiple CPU cores at once (using Python's built-in multiprocessing module). Give the number of cores you want to use, or specify 'auto' to let trackpy detect how many cores your computer has.
Let's compare the time required to process the first 100 frames:
%%timeit
features = tp.batch(frames[:100], 13, invert=True, processes='auto')
2.33 s ± 55.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
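Note that %%timeit runs the cell repeatedly and discards its variables; in ordinary use you would call batch once and keep the DataFrame of features it returns, for example:
features = tp.batch(frames[:100], 13, invert=True, processes='auto')
features.head()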
For comparison, here's the same thing running in a single process. This was run on a laptop with only 2 cores, so we should expect batch to take roughly twice as long as the parallel version:
%%timeit
features = tp.batch(frames[:100], 13, invert=True)
4.93 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using IPython Parallel¶
Using IPython parallel is a little more involved, but it gives you a lot of flexibility if you need to go beyond batch, for example by having the parallel workers run your own custom image processing (a sketch of this appears at the end of this section). It also works with all versions of trackpy.
Install ipyparallel and start a cluster¶
As of IPython 6.2 (November 2017), IPython parallel is a separate package. If you are not using a comprehensive distribution like Anaconda, you may need to install this package at the command prompt using pip install ipyparallel or conda install ipyparallel.
It is simplest to start a cluster on the CPUs of your local machine. To start one, go to a terminal and type:
ipcluster start
This automatically uses all available CPU cores, but you can also use the -n option to specify how many workers to start. Now you are running a cluster — it's that easy! More information on IPython parallel is available in the IPython parallel documentation.
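For example, to start four worker engines (four is just an illustrative number):
ipcluster start -n 4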
from ipyparallel import Client
client = Client()
view = client.load_balanced_view()
We can see that there are four cores available.
client[:]
<DirectView [0, 1, 2, 3]>
Use a little magic, %%px, to import trackpy on all cores.
%%px
import trackpy as tp
tp.quiet()
Use the workers to locate features¶
Define a function from locate with all the parameters specified, so the function's only argument is the image to be analyzed. We can map this function directly onto our collection of images. (This is called "currying" a function, hence the choice of name.)
curried_locate = lambda image: tp.locate(image, 13, invert=True)
view.map(curried_locate, frames[:4]) # Optionally, prime each engine: make it set up numba.
<AsyncMapResult: <lambda>>
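An equivalent way to build such a single-argument function, without a lambda, is partial application with functools.partial from the standard library (a small sketch of the same idea):
from functools import partial
# same effect as the lambda above: fix diameter and invert, leave the image free
curried_locate = partial(tp.locate, diameter=13, invert=True)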
Compare the time it takes to locate features in the first 100 images with and without parallelization.
%%timeit
amr = view.map_async(curried_locate, frames[:100])
amr.wait_interactive()
results = amr.get()
100/100 tasks finished after 2 s
done
2.9 s ± 195 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
serial_result = list(map(curried_locate, frames[:100]))
3.9 s ± 58.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Finally, if we want to get output similar to batch, we collect the results into a single DataFrame:
import pandas as pd
amr = view.map_async(curried_locate, frames[:100])
amr.wait_interactive()
results = amr.get()
features_ipy = pd.concat(results, ignore_index=True)
features_ipy.head()
100/100 tasks finished after 2 s
done
|   | y | x | mass | size | ecc | signal | raw_mass | ep | frame |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.728435 | 295.067222 | 297.073839 | 2.499673 | 0.230136 | 16.877187 | 14760.0 | 0.081197 | 0 |
| 1 | 5.918431 | 339.195418 | 254.571603 | 2.979975 | 0.300296 | 13.077611 | 14693.0 | 0.089778 | 0 |
| 2 | 6.782609 | 309.578502 | 219.491795 | 3.551496 | 0.137154 | 4.506474 | 14508.0 | 0.126768 | 0 |
| 3 | 7.380101 | 431.548351 | 474.240123 | 2.852436 | 0.358819 | 16.877187 | 15011.0 | 0.059789 | 0 |
| 4 | 8.202306 | 36.250343 | 321.903627 | 2.882596 | 0.173362 | 10.603468 | 15401.0 | 0.042414 | 0 |
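As mentioned earlier, the same map pattern lets the workers run your own custom image processing instead of plain locate. Below is a rough sketch under a few assumptions: locate_smoothed is a hypothetical function (not part of trackpy), and scipy is assumed to be installed on the engines.
def locate_smoothed(image):
    # hypothetical per-frame pipeline: smooth the image, then locate features
    # imports are inside the function so every engine can resolve them
    import trackpy as tp
    from scipy.ndimage import gaussian_filter
    return tp.locate(gaussian_filter(image, sigma=1), 13, invert=True)

amr = view.map_async(locate_smoothed, frames[:100])
amr.wait_interactive()
features_custom = pd.concat(amr.get(), ignore_index=True)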