INFORMATION RETRIEVAL AND DATA VISUALIZATION
2nd year of course - Second semester
Attendance not mandatory
- 6 CFU
- 48 hours
- English
- Trieste
- Optional
- Standard teaching
- Oral Exam
- SSD INF/01
Information Retrieval
The course will provide both a theoretical and practical introduction to the concept of information retrieval. You will learn how to efficiently search among large collections of documents and how to rank the documents found.
• Knowledge and understanding: After the course you will understand how an information retrieval system works, which models are available, and how its performance can be evaluated.
• Applying knowledge and understanding: You will be able to decide how to implement and evaluate an information retrieval system for a given task.
• Making judgments: You will be trained to critically assess information and data from various sources and to synthesize them in order to make informed decisions and propose solutions to complex information retrieval problems.
• Communication skills: You will be able to describe how an information retrieval system works and explain the design choices made.
• Learning skills: You will be able to follow and learn new information retrieval techniques.
Data Visualization
This course will provide the knowledge and practical skills necessary to develop a strong foundation in data visualization. You will learn how to visualize data in order to support reasoning in data analysis and communicate information to others.
• Knowledge and understanding: You will understand the basic concepts and mechanisms of data visualization. By studying human perceptual/cognitive processing and the properties of different visual channels, you will be able to make informed decisions about visualization designs.
• Applying knowledge and understanding: You will practice deriving abstract information from data and choosing an appropriate way to represent it, in order to create an effective visualization for a given task. You will learn how to improve existing visualizations and design interactive visualizations in Python.
• Making judgments: You will be able to discern between trustworthy and misleading visualizations.
• Communication skills: You will learn how to present visualizations to the public.
• Learning skills: You will be able to upgrade your knowledge on specific aspects of data visualization using online resources and tools.
Information Retrieval
A basic understanding of linear algebra, probability, and data structures is needed. Programming knowledge, preferably in Python, is needed for the practical part of the course.
Data Visualization
Basic knowledge of Python and scientific Python is needed to be able to follow the practical demonstrations, but is not mandatory for completing the final project.
Information Retrieval
The course provides both a high-level overview of information retrieval and an in-depth look at the most important techniques and approaches.
Data structures for IR, models (Boolean, vector space, probabilistic), evaluation of IR systems, IR on the web.
Data Visualization
Foundations of data visualization, historical visualizations, Data abstraction, Task abstraction, Human visual perception, Designing a visualization, Creating interactive visualizations in Python.
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
Introduction to Information Retrieval
Cambridge University Press, 2008.
Freely available at: http://nlp.stanford.edu/IR-book/
Tamara Munzner
Visualization Analysis & Design
A K Peters Visualization Series, CRC Press, Boca Raton, 2014.
Andy Kirk
Data Visualization: A Handbook for Data Driven Design
SAGE Publications, London, 2016.
Edward R. Tufte
The Visual Display of Quantitative Information
Graphics Press, Cheshire, 2015.
Information Retrieval
The course provides both a high-level overview of information retrieval and an in-depth look at the most important techniques and approaches.
• Introduction to information retrieval and its history
• Data structures for information retrieval
– Inverted indices: how they work, compression, querying
– How to build and keep inverted indices updated
• Models: Boolean, Vector space, probabilistic
– Set-based models: Boolean and fuzzy
– The vector space model
– How to assign weights
– Probabilistic models
– Language models
• Evaluation of information retrieval systems
• Clustering
– Flat clustering
– Hierarchical clustering
– Latent Semantic Indexing
• Information retrieval on the web
– Crawling and the structure of the web
– Link analysis: PageRank and HITS
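As a rough illustration of the first topics in this list (inverted indices, Boolean querying and term weighting), the following is a minimal sketch rather than course material. The toy documents, the whitespace tokenizer and the function names are all assumptions made for the example, and the ranking uses a plain tf-idf sum without the length normalization a full vector space model would apply.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical documents, for illustration only).
DOCS = {
    0: "information retrieval on the web",
    1: "data structures for information retrieval",
    2: "link analysis on the web",
}

def tokenize(text):
    """Lowercase and split on whitespace; real systems also normalize and stem."""
    return text.lower().split()

def build_index(docs):
    """Build an inverted index mapping each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def boolean_and(index, query):
    """Return the sorted ids of documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

def rank_tfidf(index, n_docs, query):
    """Rank documents by the sum of tf * idf over query terms, best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # a term absent from the corpus contributes nothing
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ties are broken by document id to keep the output deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(DOCS)
print(boolean_and(index, "information web"))       # documents with both terms
print(rank_tfidf(index, len(DOCS), "web link"))    # documents ordered by score
```

The Boolean query only tests membership, whereas the ranked query orders documents by how rare and how frequent the matching terms are, which is the difference between the set-based and weighted models listed above.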
Data Visualization
• Foundations: defining data visualization, historical visualizations, the purposes of data visualization and the three principles of good visualization design.
• Data abstraction: dataset types, attribute types and semantics.
• Task abstraction: visualization goals and tasks, actions and targets.
• Human visual perception: attention and memory, visual encoding, visual order, color perception and color specification.
• Designing a visualization: steps of visualization design, basic charts, visualizing multivariate data, uncertainty and missing data, interactivity, storytelling and tools.
• Examples: (un)trustworthy and (in)accessible visualizations.
• Creating interactive visualizations in Python.
Information Retrieval
Theoretical lectures followed by practical demonstrations, in which the algorithms described in the theoretical lectures will be implemented and/or applied.
Data Visualization
Lectures with (non-compulsory) homework every week. The last few hours will be dedicated to hands-on sessions where you will create interactive visualizations in Python.
Lecture slides will be made available after each class.
The exam consists of two independent parts, one about information retrieval and the other about data visualization. The final mark is computed as the average of the marks of the two parts (the final mark can be rounded up/down depending on the class participation and the completion of homework assignments). The minimum grade to pass the exam is 18 and the maximum grade is 30.
Honours may be awarded only in exceptional cases, when the student or students have extended the work in the paper with original contributions.
In any type of content produced by the student for admission to or participation in an exam (projects, reports, exercises, tests), the use of Large Language Model tools (such as ChatGPT and the like) must be explicitly declared. This requirement must be met even in the case of partial use.
Regardless of the method of assessment, the teacher reserves the right to further investigate the student's actual contribution with an oral exam for any type of content produced.
Information Retrieval
The exam will consist of a short practical project, in which you design and implement an information retrieval system, followed by its presentation in class. The project can be implemented in Python; other languages can be used after discussion with the professor.
The oral presentation should concisely communicate the system's design principles, the methodologies employed, its functionalities and strengths, potential areas of improvement, and any challenges encountered during development. Evaluation criteria will emphasize the system's efficiency, accuracy, and user-friendliness, as well as the clarity, depth, and consistency of the oral presentation. Points might be deducted for systems that do not adhere to best practices of information retrieval design. To encourage iterative development, students will have an opportunity to refine their systems based on feedback and potentially achieve a higher grade. During the presentation, general questions related to the course will be asked.
Data Visualization
The exam consists of a group project where small groups of two to three students select a topic and ask three questions about it. They need to find the data and design visualizations to answer these three questions (the visualizations can be created using Python or some other language/tool). The results are presented orally in a 10-minute presentation supported by slides. After the presentation, the students answer questions about their design choices.
The evaluation focuses on the accessibility, trustworthiness and elegance of the visualizations and the clarity, comprehensiveness and consistency of the entire presentation. Points are deducted for not adhering to the principles of good visualization design. Students are given the possibility to improve on the visualizations to achieve a better mark.
This course explores topics closely related to one or more goals of the United Nations 2030 Agenda for Sustainable Development (SDGs)