USE CASE 3 - Ocean
Experts from Ifremer, Université de Liège, CNRS, CSC, MARIS and SYKE (the Finnish Environment Institute) are involved in the PHIDIAS use case on ocean studies. Its main goal is to improve the use of cloud services for marine data management, for data services to users from a FAIR perspective, and for on-demand data processing, taking into account the European Open Science Cloud (EOSC) challenge and the Copernicus Data and Information Access Services (DIAS).
"The marine environment is evolving continuously, and because marine observation is still expensive, observation data are unique and must be well preserved and easy to retrieve." (PHIDIAS Ocean use case team)
Challenges
Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing. This is why scientists say that “a measurement not made today is lost forever”. For these reasons, it is fundamental to properly store both the data and metadata, so that access to them can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Interoperable and Reusable.
Solutions
Because the marine environment evolves continuously and marine observation remains expensive, observation data are unique and must be well preserved and easy to retrieve.
The Ocean use case team is working on the following solutions:
The improvement of long-term stewardship of marine in situ data.
The SEANOE service allows users to upload, archive and publish their data. Each dataset is assigned a permanent identifier (DOI) so that it can be cited and referenced, for instance in research papers. Efforts will focus on scalability, on exchanges between the data centres in charge of related data types, and on the protection of long-term archives. Long-tail data (measurements acquired less systematically, e.g. during a scientific cruise or through manual work) are of particular interest.
The improvement of data storage for services to users.
The goal is to provide users with (1) fast and interoperable access to data from multiple sources, for visualisation and subsetting purposes; and (2) parallel processing capabilities on dedicated high-performance computing resources, using, for example, Jupyter notebooks or the Pangeo software ecosystem.
CSC Allas Cloud object storage service
As its latest development, SYKE has created a data pipeline for near real-time analysis of plankton images using the CSC Allas object storage and cloud computing services. A neural network classifier analyses the image data, and the results, e.g. the species composition of nuisance cyanobacterial blooms, are available to end users with a delay of about 2 hours.
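As a rough illustration of the last step of such a pipeline, the sketch below turns per-image class labels into a species composition. It is not SYKE's implementation: the function name, the input format (one predicted label per image) and the example labels are assumptions made for illustration only.

```python
from collections import Counter


def species_composition(predicted_labels):
    """Return the fraction of images assigned to each class label.

    `predicted_labels` is assumed to be a list with one label per image,
    as a classifier might produce (hypothetical format).
    """
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no images in this batch
    return {label: n / total for label, n in counts.items()}


# Illustrative labels only, not real classifier output.
batch = ["Aphanizomenon", "Nodularia", "Aphanizomenon", "Dolichospermum"]
composition = species_composition(batch)
```

In a near real-time setting, a summary of this kind would be recomputed for each new batch of images and published alongside the batch timestamp.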
Marine data processing workflows for on-demand processing.
The objective is for users to access data, software tools and computing resources seamlessly in order to create added-value products, for example quality-controlled, merged datasets or gridded fields. The work towards this objective is led by Ifremer, together with Europe’s leading research groups in ocean studies, such as the Université de Liège, MARIS, CNRS, CSC and the Finnish Environment Institute, with the coordination of CINES, the leading HPC centre in France.
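To give an idea of what a "gridded field" is, here is a minimal sketch that averages scattered observations into regular longitude/latitude cells. This simple bin averaging is only a stand-in chosen for brevity; it is not the interpolation method used in the use case, and the function name, tuple layout and sample values are hypothetical.

```python
import math
from collections import defaultdict


def grid_mean(observations, cell_size_deg):
    """Average observation values inside regular lon/lat cells.

    `observations` is an iterable of (lon, lat, value) tuples.
    Returns a dict mapping a cell index (i, j) to the mean value,
    where i = floor(lon / cell_size_deg) and j = floor(lat / cell_size_deg).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in observations:
        cell = (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up salinity-like values at three positions.
obs = [(20.1, 59.2, 6.0), (20.4, 59.7, 7.0), (21.5, 59.3, 5.5)]
field = grid_mean(obs, 1.0)
```

Cells without any observation are simply absent from the result, which is one reason why real products rely on proper interpolation rather than plain averaging.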
MARIS has developed a new API designed to further enhance data access for users. Based on the user's criteria, the API provides a netCDF file storing the coordinates and observations for a region and a period of interest. ULiège has worked with the API to query oceanographic observations as a first step in the preparation of climatologies.
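The kind of selection described above (a region and a period of interest) can be sketched as follows. This does not call or reproduce the MARIS API: the function, the dictionary keys and the sample records are all hypothetical and only show the selection logic on in-memory data.

```python
from datetime import date


def select_observations(observations, lon_range, lat_range, period):
    """Keep the observations inside a bounding box and a time period.

    Each observation is assumed to be a dict with the keys
    "lon", "lat", "date" and "value" (hypothetical layout).
    All bounds are inclusive.
    """
    lon_min, lon_max = lon_range
    lat_min, lat_max = lat_range
    start, end = period
    return [
        obs
        for obs in observations
        if lon_min <= obs["lon"] <= lon_max
        and lat_min <= obs["lat"] <= lat_max
        and start <= obs["date"] <= end
    ]


# Made-up records for illustration.
records = [
    {"lon": 5.0, "lat": 43.0, "date": date(2020, 6, 1), "value": 38.1},
    {"lon": 12.0, "lat": 43.5, "date": date(2020, 6, 2), "value": 37.9},
    {"lon": 5.5, "lat": 42.8, "date": date(2019, 1, 15), "value": 38.3},
]
subset = select_observations(
    records,
    lon_range=(4.0, 6.0),
    lat_range=(42.0, 44.0),
    period=(date(2020, 1, 1), date(2020, 12, 31)),
)
```

With a real service, the same criteria would be sent as request parameters and the selection would happen on the server side, so that only the requested subset is transferred.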
ULiège has also tested access to datasets stored at Ifremer using the iRODS tool (https://irods.org/). Such datasets can then be ingested into the DIVAnd interpolation tool to prepare new products such as those described above.
Impact
To fulfil the scientific goals of the use case, the work plans focus mostly on technical developments and the implementation of tools. In particular, tools for the long-term archiving of both data and metadata, and for the storage and archiving of large salinity datasets from in situ measurements (SeaDataCloud) and from satellite (SMOS mission), have to be developed or improved.
Using the MARIS API greatly speeds up the workflow with respect to the previous situation, where several intermediate steps had to be taken before getting the final dataset. The solution has been implemented with SeaDataNet data but could be expanded to any other data sources.
Learn more about the PHIDIAS use case 3 "Ocean"
Events/Webinars:
- IMDIS 2021 turned into virtual, 12-14 April 2021
- Zoo & Phytoplankton EOV Products: Big data and machine learning methods to enhance biodiversity data
- PHIDIAS: Boosting the use of cloud services for marine data management, services and processing
- PHIDIAS HPC – Building a prototype for Earth Science Data and HPC Services
News:
- Data pipeline for near real-time analysis of plankton images – Available for the end-users
- Podcast on Ocean Use Case: Addressing the issues related to the ocean research
- Highlights & Takeaways: PHIDIAS at the IMDIS 2021
- Highlights & Takeaways: PHIDIAS webinar on Ocean Use case
- PHIDIAS: Continuing to boost cloud services for marine data management, services and processing
- Combining in situ and satellite measurements in oceanography
- Using HPC to combine marine environmental data from different sources
- MARIS' contribution to the Ocean use case
- Webinar “PHIDIAS HPC – Building a prototype for Earth Science Data and HPC Services”