Data Management
2° Year of course - Second semester
Frequency Not mandatory
- 6 CFU
- 60 hours
- English
- Trieste
- Opzionale
- Standard teaching
- Oral Exam
- SSD INF/01
Knowledge and understanding, Knowledge about the the most recent technologies to store, manipulate, retrieve, transfom big collections of data both in the traditional relational setting and in the wide landscape of NoSQL proposals . Understanding the challenging problems of storing and retrieving big collection of data with a particular focus on data consistency, efficient retrival, scalability, non-homegeinity.
Applying knowledge and understanding: Ability of choosing the proper data model, tuning the level of consistency and availability, choosing the proper optimization strategy depending on the resource and timeliness constraints.
Making Judgement: Ability of evaluating the functionalities of different alternative data base management systems in the wide landscape of the contemporary NoSQL proposals
Communication skills: Ability of presenting the results of big data analysis for decisional support, team work skills.
Learning skills: Ability to deal with the rapid technological evolution of Nosql systems by autonomously learning the technical innovations
Basic knowledge of file systems principles. Knowledge of programming principles. The student can be advantaged by a basic knowledge
of relational databases which will be, anyway, outlined in the initial part of the course.
The course will present traditional relational database management systems, highlighting the differences with recently proposed NoSQL paradigms. The course will recap the basic functionalities of a transactional relational system (OLTP): data definition, manipulation, and query languages; standard technologies (particularly concurrency management and query optimization). Then, OLAP systems, the multidimensional data model, and its ROLAP implementation will be considered. The issue of optimization for analytical queries (use of indexes, fragmentation, and materialization of views) will be explored in depth. The concepts will be exemplified in the PostgreSQL DBMS. Finally, in accordance with the time limits of the course, an overview of the technological choices adopted in the most recent NoSQL proposals (particularly the document-oriented DBMS MongoDB and the graph model DBMS Neo4J) will be possibly provided.
For OLTP: Avi Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts, 7th Edition, McGrawHill. For OLAP: M. Golfarelli, S. Rizzi, Datawarehouse Design, Modern Principles and methodologies, McGrawHill. For NoSQL: https://cassandra.apache.org/doc/latest/ https://www.mongodb.com/docs/manual/ https://neo4j.com/docs/
Recap of Relational DBMS. Architetcture and Functionalities of a (centralized) Data Base Management System (DBMS): Data Definition Languages, Data Manipulation Languages, View definition language. Transactional DBMS. the Relational data model (relational schemata, relations, primary e foreign keys). Encoding a conceptual data model into a relational model. SQL for defining data and consistency constraints in a relational DBMS (CREATE TABLE). SQL for manipulating data (INSERT, DELETE, UPDATE) SQL for creating views and materialized views. SQL for defining TRIGGERS. Technolgies of a (centralized) Relational DBMS. Transactions, ACID properties of transactions. Technologies for Isolation of transactions: 2PL, Strict 2PL, Timestamping. Buffer management Reliability control system: Log organization, Transaction management, Failure management Physical file organization: Heap structures, Sequential structures, Hash-based structures. Indexing: B-trees and B+trees, inverted indexes, bitmap indexes, hash indexes, multiattribute indexes. Query optimization: selection with indexes and join (nested loop join, index join, merge join, hash join), cost-based optimization. Introduction to Analytical DBMSs: OLTP versus OLAP. The multidimensional model and operations of the multidimensional model. The relational implementation of an OLAP system (ROLAP). Star schema and snowflake schema. Techniques for optimizing analytical queries: denormalization, horizontal and vertical partitioning, materialization of views. Optimization in PostgreSQL (indexes, materialized views, multiset group by, horizontal partitioning by inheritance, optimization parameters tuning) Introduction to the NoSQL paradigms. Document DBMS (XML and Json formats), Graph DBMS. Case studies: MongoDB, Neo4J.
Frontal lessons transmitted synchronously remotely. Class exercises. Home exercises with correction in class. Design team work.
Slides, exercises, previous exams can be found in the shared folder in the team channel fr the course.
The exam includes a written exam and a discussion of a project (possibly) developed in groups. The written exam consists of SQL exercises (writing commands and SQL queries) and open-ended questions regarding transaction system technologies (OLTP) and analytical systems (OLAP), with particular emphasis on query optimization techniques. Examples of exercises will be presented during the course, and examples of written exam texts will be provided throughout the course. Each exercise is assigned a maximum score (for a total sum of 32 points). The maximum score is awarded for the correct resolution of the exercise. In the presence of errors, the score is reduced according to the severity of the error. The written exam is passed if the cumulative score across the individual exercises is at least 18. There is the possibility of passing the written exam in two distinct parts: 1) SQL and design of relational databases; 2) DBMS technologies. The project work consists in optimizing a set of queries on a public data benchmark. The project is evaluated with respect to: the quality and completeness of the documentation; the appropriateness of the techniques adopted; to the results obtained. The grade of the exam is obtained by mediating the grade of the written exam and the grade of the project. Honors are awarded to those who obtain the maximum mark in both parts.
This course explores topics closely related to one or more goals of the United Nations 2030 Agenda for Sustainable Development (SDGs)