Advanced Data Management
2nd year of course - Second semester
Attendance: not mandatory
- 6 CFU
- 48 hours
- English
- Trieste
- Optional
- Standard teaching
- Oral Exam
- SSD INF/01
(knowledge and understanding) The class aims to deepen students' knowledge of programming and of database management and access. It also introduces concepts related to Big Data, Open Science and interoperability in scientific research.
(applying knowledge and understanding) The topics discussed in class will be applied to specific domains. Using tools such as UML, XML/XSD and persistent identifiers, and concepts such as the FAIR principles, the student will design an application to manage, curate and access data.
(making judgements) The concepts of Data Science and the tools provided to the students will guide them in integrating their own domain data resources into a shared data publishing scenario.
(communication skills) In a scientific environment where data are distributed, large and generated by various projects, the student will be able to choose among alternative data structures, access methods and management solutions, and to justify her/his technological choices.
(learning skills) The lessons will be given in an interdisciplinary context. The student will autonomously apply the learned concepts to her/his own specific research domain.
Programming and the use and management of database systems are the basis on which the class contents are developed. There are no mandatory prerequisites.
The class will consist of an introduction to Big Data, Open Data and the FAIR principles, followed by two main building blocks: data and metadata models and structures, and data resource interoperability and access.
Data and Metadata Models and Structures will discuss data models, their definition and design, data structures and metadata. It will cover UML, ORM, XSD, JSON, data structure formats, tabular formats, images and hierarchical structures, including the ability to query metadata.
Data Resource Interoperability and Access will discuss the concept of interoperability; the definition and use of persistent identifiers; resource catalogues; metadata models used for discovery purposes; data curation and preservation; vocabularies and semantics; and interfaces for data access.
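As a rough illustration of how these two blocks fit together, the following minimal Python sketch models a discovery-metadata record, serialises a small catalogue to JSON, and queries it by keyword. It is not course material: the `DatasetRecord` class, its fields, the `find_by_keyword` helper and the identifier values are all hypothetical placeholders chosen for this example.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical discovery-metadata record for one dataset."""
    pid: str            # persistent identifier (placeholder values below)
    title: str
    data_format: str    # e.g. "FITS", "CSV", "HDF5"
    keywords: List[str] = field(default_factory=list)


def find_by_keyword(catalogue, keyword):
    """Return the records whose keyword list contains `keyword`."""
    return [record for record in catalogue if keyword in record.keywords]


# A tiny in-memory resource catalogue with made-up entries.
catalogue = [
    DatasetRecord("hdl:00000/example-1", "Sky survey images", "FITS",
                  ["astronomy", "images"]),
    DatasetRecord("hdl:00000/example-2", "Sensor time series", "CSV",
                  ["environment", "tabular"]),
]

# Serialise the catalogue to JSON so that other tools can read it.
serialised = json.dumps([asdict(record) for record in catalogue], indent=2)

# Round-trip: rebuild the records from JSON and query them by keyword.
restored = [DatasetRecord(**item) for item in json.loads(serialised)]
matches = find_by_keyword(restored, "astronomy")
print([record.pid for record in matches])
```

Because every record carries its identifier together with its descriptive metadata, the catalogue can be exchanged as plain JSON and still be searched on the receiving side, which is the kind of interoperability the second block is concerned with.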
The main reference books are listed below. Further details will be provided through notes and references to auxiliary materials.
"UML Database Modeling Workbook", Michael Blaha
"Python and HDF5: Unlocking Scientific Data", Andrew Collette
"Reference Model for an Open Archival Information System (OAIS)", recommendation by Consultative Committee for Space Data Systems (CCSDS)
Lectures and examples will be provided to students through a web-accessible solution (yet to be defined).
Lectures, including practical examples in the classroom.
Lecture notes/viewgraphs, software and services used during the lessons will usually be provided through a web-accessible git repository. Non-attending students are kindly requested to contact the lecturers to agree on how to prepare for and take the final test.
Supplementary material on the topics of this class can be found in:
"PID Information Types", PIT RDA Recommendation
"Research Data Collections", Research Data Collection RDA Recommendation
"20 Years of Persistent Identifiers - which systems are here to stay?", J. Klump, R. Huber
"Conceptual Interoperability (LCIM)", Wikipedia and Paper: Tolk, Diallo, Turnitsa http://www.iiisci.org/journal/cv%24/sci/pdfs/p468106.pdf
Knowledge verification will consist of preparing and presenting a small data management project that addresses the content of the lectures.
The exam will be evaluated according to the following criteria:
- completeness of the project with respect to all the course contents
- critical thinking on the pros and cons of the data management solutions adopted in the project, highlighting the critical issues, problems and peculiarities of the analysed use case
- degree of understanding of the theoretical and practical aspects of the subject
- clarity of the exposition of the project
If feasible, the preparation and presentation of a partial or full demo of the project is encouraged, but not mandatory. At presentation time there will be a discussion with Q&A.
This course explores topics closely related to one or more goals of the United Nations 2030 Agenda for Sustainable Development (SDGs).