Data pipeline for near real-time analysis of plankton images now available to end-users
The Finnish Environment Institute (SYKE), one of the PHIDIAS consortium partners involved in the Ocean use case, has created a data pipeline for near real-time analysis of plankton images using CSC's Allas object storage and cloud computing services. A neural network classifier analyses the image data, and the results, e.g., the species composition of nuisance cyanobacterial blooms, are available to end-users with a delay of about two hours.
The marine environment is evolving continuously, and because marine observation is still expensive, observation data are unique and must be well preserved and easy to retrieve. – PHIDIAS Ocean use case team
Cyanobacteria blooms are an annual nuisance in the Baltic Sea for recreation, fisheries, and other uses, and they also have cascading effects on ecosystem functioning. Emerging plankton imaging technologies can be used to track the development of phytoplankton biomass and to provide information on phytoplankton community composition, e.g., which cyanobacteria species dominate the blooms. This information is valuable for remote sensing validation, ecosystem modelling, management of the seas, and the public, especially as some of the species are toxic.
Latest achievement: PHIDIAS Ocean use case
SYKE operates an autonomous imaging instrument, the Imaging FlowCytobot, at the Utö Marine Research Station in the Baltic Sea. The instrument produces thousands of plankton images per hour (see image below), which creates an urgent need to solve the associated data-management issues.
In PHIDIAS, the team created a near real-time data pipeline from the instrument to CSC's Allas cloud object storage, which is based on Ceph object storage technology. From Allas, the data are shared with other services on CSC's computing platform, and the subsequent neural network analysis runs on a Linux virtual machine with 6 vCPUs and 16 GB of memory, also provided by CSC's cloud computing services. The neural network is based on a pre-trained ResNet-18 and fine-tuned with a labeled Baltic Sea phytoplankton image data set.
Together, data transfer and analysis introduce a delay of about two hours between image capture and the point at which the image has been classified and the data are available to users.
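The two stages described above (instrument to object storage, then classification on a separate virtual machine) can be illustrated with a small Python sketch. It is not SYKE's implementation: the `InMemoryBucket` class, the `raw/` and `results/` key prefixes, the function names, and the placeholder classifier are all hypothetical. A real deployment would talk to Allas through one of its object-storage interfaces (Allas offers S3- and Swift-compatible access) and would call the fine-tuned ResNet-18 instead of the stub.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class InMemoryBucket:
    """Hypothetical stand-in for an object-storage bucket (e.g. in Allas).

    A real pipeline would call an S3- or Swift-compatible client here.
    """
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_keys(self, prefix: str) -> List[str]:
        # Returns a materialised, sorted list, so callers may write while iterating.
        return sorted(k for k in self.objects if k.startswith(prefix))


def upload_sample(bucket: InMemoryBucket, sample: str, images: List[bytes]) -> None:
    """Stage 1 (instrument side): push one sample's images to object storage."""
    for i, image in enumerate(images, start=1):
        bucket.put(f"raw/{sample}/{i:05d}.png", image)


def classify_new_images(
    bucket: InMemoryBucket,
    classify: Callable[[bytes], str],
    processed: Set[str],
) -> Dict[str, str]:
    """Stage 2 (analysis VM): classify raw images not seen before and write
    each label back under a results/ prefix so other services can read it."""
    labels: Dict[str, str] = {}
    for key in bucket.list_keys("raw/"):
        if key in processed:
            continue  # already classified in an earlier run
        label = classify(bucket.get(key))
        bucket.put("results/" + key[len("raw/"):] + ".txt", label.encode())
        processed.add(key)
        labels[key] = label
    return labels


if __name__ == "__main__":
    bucket = InMemoryBucket()
    upload_sample(bucket, "sample_001", [b"img-a", b"img-b"])
    seen: Set[str] = set()
    # Placeholder classifier; a real one would run the fine-tuned ResNet-18.
    first = classify_new_images(bucket, lambda data: "placeholder_class", seen)
    second = classify_new_images(bucket, lambda data: "placeholder_class", seen)
    print(len(first), len(second))  # 2 0
```

Keeping a set of already-processed keys lets the analysis stage be run repeatedly (for example on a schedule) while only new images are classified, which is one simple way to approach near real-time processing.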
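One way to monitor this end-to-end delay is to compare each sample's capture time with the time its classification becomes available. The sketch below assumes Imaging FlowCytobot-style sample names of the form `DYYYYMMDDTHHMMSS_IFCBnnn` carrying a UTC timestamp; the function names, the `IFCB000` instrument label, and the timestamps in the example are made up for illustration, and the two-hour threshold simply mirrors the typical delay reported above.

```python
from datetime import datetime, timedelta, timezone

# Typical end-to-end delay reported for the pipeline (about two hours).
TYPICAL_DELAY = timedelta(hours=2)


def capture_time(sample_name: str) -> datetime:
    """Parse the capture time from an IFCB-style sample name.

    Assumes names like "D20210701T120000_IFCB000" (UTC timestamp after "D").
    """
    stamp = sample_name.split("_")[0][1:]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S").replace(tzinfo=timezone.utc)


def delay(sample_name: str, classified_at: datetime) -> timedelta:
    """Time from image capture until the classified result is available."""
    return classified_at - capture_time(sample_name)


def within_typical_delay(d: timedelta, limit: timedelta = TYPICAL_DELAY) -> bool:
    """True if a sample's delay is no longer than the typical delay."""
    return d <= limit


if __name__ == "__main__":
    # Made-up timestamps for illustration only.
    d = delay("D20210701T120000_IFCB000",
              datetime(2021, 7, 1, 13, 45, tzinfo=timezone.utc))
    print(d, within_typical_delay(d))  # 1:45:00 True
```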
The system was tested in 2021, and the near real-time data were used in SYKE's weekly national algae reviews targeted at the public. Further development of data use and visualisation will continue within JERICO-S3 (an EU H2020 INFRAIA project). To our knowledge, this is the first near real-time application of such an image data set; normally, the classifications are done in delayed mode.
Although such near real-time results would be useful for the modelling and Earth observation communities, two bottlenecks remain. First, there is no generally agreed short- or long-term storage for plankton images from such systems, as the data volume is far too large for existing systems such as EcoTaxa. Second, no data aggregator takes up AI-classified results in its databases, which prevents wider use of the results. These issues are not for PHIDIAS to solve, but the project may help promote solutions.