A Kernel-Based Change Detection Method to Map Shifts in Phytoplankton Communities Measured by Flow Cytometry

2021 
Automated, ship-board flow cytometers provide high-resolution maps of phytoplankton composition over large swaths of the world's oceans. They therefore pave the way for understanding how environmental conditions shape community structure. Identifying community changes along a cruise transect commonly involves segmenting the data into distinct regions. However, existing segmentation methods are generally not applicable to flow cytometry data, as these data are recorded as "point cloud" data, with hundreds or thousands of particles measured during each time interval. Moreover, nonparametric segmentation methods that do not rely on prior knowledge of the number of species are desirable for mapping community shifts. We present CytoSegmenter, a kernel-based change-point estimation method for segmenting point cloud data that does not rely on parametric assumptions on the data distributions. Our method relies on a Hilbertian embedding of point clouds that allows us to work with point cloud data similarly to vectorial data. The change-point locations can be found using an efficient dynamic programming algorithm. The method can be used to automatically segment long series of underway flow cytometry data. Through an analysis of 12 cruises, we demonstrate that CytoSegmenter allows us to locate abrupt changes in phytoplankton community structure. We show that the changes in community structure generally coincide with changes in the temperature and salinity of the ocean. We also illustrate how the main parameter of CytoSegmenter can be easily calibrated using limited auxiliary annotated data. CytoSegmenter is publicly available and implemented in the programming language Python. The method is generally applicable for segmenting series of point cloud data from any domain. Moreover, it readily scales to thousands of point clouds, each containing thousands of points.
In the context of underway flow cytometry data, it does not require prior clustering of particles to define taxa labels, eliminating a potential source of error. This represents an important advance in automating the analysis of large datasets now emerging in biological oceanography and other fields. It also allows for the approach to potentially be applied during research cruises.
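The two ingredients named in the abstract, a Hilbertian embedding of point clouds and a dynamic programming search for change points, can be illustrated with a minimal sketch. This is not the CytoSegmenter code or its API. The function names, the Gaussian kernel, the `bandwidth` value, and fixing the number of change points in advance (`n_changes`) are all assumptions made for illustration. Each point cloud is represented by its kernel mean embedding, inner products between embeddings are collected in a Gram matrix, and the segmentation minimizes the total within-segment scatter of the embeddings.

```python
import numpy as np


def mean_embedding_gram(clouds, bandwidth=1.0):
    """Gram matrix G[i, j] = <mu_i, mu_j>, where mu_i is the kernel mean
    embedding of cloud i under a Gaussian kernel (an assumed kernel choice)."""
    T = len(clouds)
    G = np.zeros((T, T))
    for i in range(T):
        for j in range(i, T):
            # Squared Euclidean distances between every pair of points.
            diff = clouds[i][:, None, :] - clouds[j][None, :, :]
            d2 = (diff ** 2).sum(axis=-1)
            # Averaging the kernel over all pairs gives <mu_i, mu_j>.
            G[i, j] = G[j, i] = np.exp(-d2 / (2.0 * bandwidth ** 2)).mean()
    return G


def segment_cost_table(G):
    """cost[s, t] = sum of squared distances from embeddings s..t-1
    to their segment mean, computed from G with prefix sums."""
    T = G.shape[0]
    S = np.zeros((T + 1, T + 1))
    S[1:, 1:] = G.cumsum(axis=0).cumsum(axis=1)  # 2-D prefix sums of G
    diag = np.concatenate([[0.0], np.cumsum(np.diag(G))])
    cost = np.full((T + 1, T + 1), np.inf)
    for s in range(T):
        for t in range(s + 1, T + 1):
            block = S[t, t] - S[s, t] - S[t, s] + S[s, s]
            cost[s, t] = diag[t] - diag[s] - block / (t - s)
    return cost


def find_change_points(G, n_changes):
    """Dynamic programming over segment end points.
    Returns the indices at which a new segment starts."""
    T = G.shape[0]
    cost = segment_cost_table(G)
    K = n_changes + 1  # number of segments
    best = np.full((K + 1, T + 1), np.inf)
    arg = np.zeros((K + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for t in range(k, T + 1):
            # Last segment is s..t-1; the first s clouds form k-1 segments.
            cands = best[k - 1, k - 1:t] + cost[k - 1:t, t]
            j = int(np.argmin(cands))
            best[k, t] = cands[j]
            arg[k, t] = j + k - 1
    # Backtrack from the end of the series to recover segment starts.
    change_points, t = [], T
    for k in range(K, 1, -1):
        t = int(arg[k, t])
        change_points.append(t)
    return sorted(change_points)
```

Here `clouds` would be a list of arrays of shape `(n_points, n_features)`, one per time interval, and `find_change_points(mean_embedding_gram(clouds), n_changes)` returns the indices of the first cloud in each new segment. The pairwise kernel evaluation is quadratic in both the number of clouds and the number of points per cloud, so a sketch like this would likely need a kernel approximation to reach the scale described above.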