Mod. B: Data Mining

[298SM]
a.a. 2025/2026

2° Year of course - First semester

Frequency Not mandatory

  • 3 CFU
  • 24 hours
  • INGLESE
  • Trieste
  • Opzionale
  • Oral Exam
  • SSD INF/01
Curricula: common
Syllabus

In this course, the students will learn advanced techniques for the design and analysis of sequential algorithms.

Knowledge and understanding: general-purpose data structures for implementing dictionaries. Advanced text-indexing data structures and pattern-matching algorithms. Sketching data structures for the analysis of massive datasets.

Applying knowledge and understanding: being able to identify the most suitable techniques to solve algorithmic problems and correctly apply them. Being able to evaluate the performance of algorithms.

Making Judgement: being capable of applying and combine algorithmic techniques in a critical way, identifying the most effective approaches to solve a given problem. Being able to critically compare different methods to evaluate their effectiveness.

Communication skills: being able to explain the basic ideas of algorithms and communicate the results to a learned public.

Learning skills: being capable of exploring literature of algorithms to find alternative approaches and combine them to solve complex problems.

Basic knowledge of algorithms and data structures. Basic knowledge of complexity analysis, as from an introductory course of algorithms.

Dictionaries: balanced search trees and hashing. Amortized analysis techniques: aggregate analysis and accounting method. String matching problem: definition and use of Suffix Tree and Suffix Array, with several applications. Approximate string matching problem: Hamming distance, k-mismatch problem and its solution using the suffix tree. Edit distance and its computation. Algorithms for massive datasets: streaming model, Bloom filter, Count-min sketch, k-th minimum value.

Cormen, Thomas; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms. MIT Press and McGraw-Hill Gusfield, Dan: Algorithms on Strings, Trees, and Sequences. Cambridge University Press Aggarwal, Charu C. Data mining: the textbook. Vol. 1. New York: springer, 2015. (freely available online)

Dictionaries: balanced search trees and hashing. Amortized analysis techniques: aggregate analysis and accounting method. String matching problem: definition and use of Suffix Tree and Suffix Array, with several applications. Approximate string matching problem: Hamming distance, k-mismatch problem and its solution using the suffix tree. Edit distance and its computation. Algorithms for massive datasets: streaming model, Bloom filter, Count-min sketch, k-th minimum value.

Frontal and interactive lectures. During the lectures, non-compulsory homework exercises will be given to help understanding of the course's topics.

-

The exam will consist of a written theory test. The students will be required to: - give formal definitions of the problems seen in class - state and justify the computational complexity of the studied algorithms - compare different solutions to the same problem - provide simple examples and/or simulate some steps of the algorithms on the examples - describe the algorithms seen in class. The written test will be given a grade on the scale of 30s. Laude can be given for an exceptional exam, in which the student demonstrates a deep understanding of the concepts and is able to exhibit them clearly and concisely.

Questo insegnamento approfondisce argomenti strettamente connessi a uno o più obiettivi dell’Agenda 2030 per lo Sviluppo Sostenibile delle Nazioni Unite

icona 4 icona  9