Highlights & Takeaways: PHIDIAS at IMDIS 2021
The International Conference on Marine Data and Information Systems (IMDIS 2021) took place on 12-14 April 2021. It aimed to provide an overview of the existing information systems that serve different users in ocean science. It also showed progress in developing efficient infrastructures for managing large and diverse data sets, standards, interoperable information systems, and services and tools for education. The conference presented systems for online access to data, metadata and products, communication standards, and adapted technologies that ensure platform interoperability. Sessions focused on infrastructures, technologies and services for different users: environmental authorities, researchers, schools, universities and others.
The PHIDIAS Ocean use case
Observing the ocean is challenging: missions at sea are costly, processes at different scales interact, and conditions are constantly changing. This is why scientists say that "a measurement not made today is lost forever". For these reasons, it is fundamental to properly store both data and metadata, so that access to them can be guaranteed for the widest community, in line with the FAIR principles (Findable, Accessible, Interoperable and Reusable). The PHIDIAS Ocean use case focuses on three aspects:
- Improvement of long-term stewardship of marine in situ data. The SEANOE service allows users to upload, archive and publish their data, including data processed via HPC. Each dataset is assigned a permanent identifier (DOI) so that it can be cited and referenced. Efforts will focus on scalability, on exchanges between the data centres in charge of related data types, and on the protection of long-term archives. Long-tail data are of particular interest. These are data acquired not routinely but during scientific missions or specific events. Because there is no automated procedure for them, they usually do not reach the data centres easily.
- Improvement of data storage for services to users. The goal is to provide users with (1) fast and interoperable access to data from multiple sources, for visualisation and sub-setting purposes; (2) parallel processing capabilities on dedicated high-performance computing resources, using, for example, Jupyter notebooks or the PANGEO software ecosystem.
- Marine data processing workflows for on-demand processing. The objective is that users can access data, software tools and computing resources in a seamless way to create added-value products, for example quality-controlled, merged datasets or gridded fields.
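To make "sub-setting" concrete, here is a minimal sketch of selecting in situ observations by region and parameter. The record layout (`Observation`), the `BoundingBox` type and the `subset` function are hypothetical illustrations. They are not part of SEANOE, PANGEO or any PHIDIAS service. The parameter codes and coordinates are likewise invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One in situ measurement (hypothetical record layout)."""
    lon: float        # degrees east
    lat: float        # degrees north
    depth: float      # metres below the surface
    parameter: str    # illustrative code, e.g. "PSAL" (salinity), "TEMP"
    value: float


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular lon/lat region (does not handle the antimeridian)."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.lon_min <= lon <= self.lon_max
                and self.lat_min <= lat <= self.lat_max)


def subset(observations, box, parameters, max_depth=None):
    """Keep observations inside `box` whose parameter is in `parameters`,
    optionally restricted to depths <= max_depth."""
    wanted = set(parameters)
    return [
        obs for obs in observations
        if obs.parameter in wanted
        and box.contains(obs.lon, obs.lat)
        and (max_depth is None or obs.depth <= max_depth)
    ]


if __name__ == "__main__":
    sample = [
        Observation(-30.0, 45.0, 5.0, "PSAL", 35.6),
        Observation(20.0, 58.0, 2.0, "PSAL", 7.1),
        Observation(-30.0, 45.0, 5.0, "TEMP", 14.2),
        Observation(-30.0, 45.0, 800.0, "PSAL", 35.1),
    ]
    # Rough, illustrative North Atlantic box (not an official definition).
    north_atlantic = BoundingBox(-60.0, 0.0, 30.0, 65.0)
    surface_salinity = subset(sample, north_atlantic, ["PSAL"], max_depth=10.0)
    print(len(surface_salinity))
```

A real service would push such filters down to the storage layer, for example through indexed or chunked formats, rather than scanning a Python list. The list-based version only shows the selection logic.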
Key results
The pilot use cases in PHIDIAS aim to improve the work of researchers and specialists in several respects:
- Data publication: with the improved capabilities of SEANOE, researchers will be able to seamlessly upload large datasets, ensure their long-term archiving (including that of processed datasets) and publish them following the standards, best practices and recommendations of data management groups. This will also facilitate the ingestion of long-tail data, which will in turn be made available to a larger community.
- Data access: thanks to fast access to the most recent data collections from different sources and providers (Euro-ARGO, SeaDataNet, EMODnet, CMEMS, imaging flow cytometer data), users will be able to demonstrate pilot operations such as sub-setting (by region or parameter), quality control, visualisation and spatial interpolation.
- Data processing: deploying cutting-edge tools such as DIVAnd (spatio-temporal interpolation, https://github.com/gher-ulg/DIVAnd.jl) in an HPC environment will allow scientists and experts to perform spatio-temporal interpolation of large datasets. In particular, this use case will focus on the North Atlantic Ocean and the Baltic Sea. Together, these regions represent 10 million observations, totalling approx. 250 GBytes. The final product will be an inter-comparison of satellite and in situ sea surface salinity data, including INSPIRE-compliant online services for data visualisation and access.
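DIVAnd (Data-Interpolating Variational Analysis in n dimensions) is a Julia package. Its spatio-temporal interpolation is based on a variational principle: it seeks the gridded field that stays close to the observations while remaining smooth. The following is a much-simplified, one-dimensional Python analogue of that idea. It is not DIVAnd's API or its exact cost function. The smoothness penalty here is a plain first-difference term, and the weight `lam` and both function names are hypothetical. The sketch minimises J(f) = ||d - H f||^2 + lam * ||D f||^2. In this cost, H linearly interpolates the gridded field f to the observation positions, and D takes differences between neighbouring grid points.

```python
import numpy as np


def interpolation_matrix(grid, x_obs):
    """Linear-interpolation operator H mapping grid values to values at x_obs.
    Assumes a regular, increasing grid with grid[0] <= x_obs <= grid[-1]."""
    n = grid.size
    dx = grid[1] - grid[0]
    H = np.zeros((x_obs.size, n))
    for j, x in enumerate(x_obs):
        i = min(int((x - grid[0]) // dx), n - 2)  # index of left neighbour
        w = (x - grid[i]) / dx                    # weight of right neighbour
        H[j, i] = 1.0 - w
        H[j, i + 1] = w
    return H


def variational_analysis_1d(grid, x_obs, d, lam):
    """Minimise ||d - H f||^2 + lam * ||D f||^2 over the gridded field f.
    The minimiser solves the normal equations (H^T H + lam D^T D) f = H^T d,
    which have a unique solution for lam > 0 and at least one observation."""
    H = interpolation_matrix(grid, x_obs)
    D = np.diff(np.eye(grid.size), axis=0)  # (n-1) x n first differences
    A = H.T @ H + lam * (D.T @ D)
    b = H.T @ d
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    grid = np.linspace(0.0, 10.0, 11)
    x_obs = np.array([2.0, 7.0])
    d = np.array([1.0, 5.0])
    field = variational_analysis_1d(grid, x_obs, d, lam=1e-6)
    print(np.round(field, 3))
```

With a small `lam`, the analysis nearly reproduces the observations and varies linearly between them. A larger `lam` trades fidelity to the data for smoothness. DIVAnd works in several dimensions, relates its smoothness constraint to a correlation length, and must solve very large sparse systems for datasets like the 10 million observations mentioned above. This is why an HPC environment matters.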
The presentation and key takeaways
PHIDIAS (Prototype of HPC/Data Infrastructure for On-demand Services) had the opportunity to take part in the poster session of the IMDIS 2021 conference.
In 2020, the poster abstract was prepared by the PHIDIAS Ocean use case team, led by IFREMER, in collaboration with the PHIDIAS communication and dissemination team, TRUST-IT Services, CINES and Neovia Innovations.
Alexander Barth and Charles Troupin from the Université de Liège presented the PHIDIAS Ocean use case, with the most recent updates on DIVAnd. This tool is one of the main contributions to PHIDIAS.
The team had the opportunity to interact with HPC users, environmental authorities, researchers, schools, universities and others interested in using the PHIDIAS tools, particularly DIVAnd.