Computational Genomics

[327SM]
a.a. 2025/2026

2nd year of course - First semester

Attendance: not mandatory

  • 6 CFU
  • 48 hours
  • English
  • Trieste
  • Compulsory
  • Standard teaching
  • Oral Exam
  • SSD INF/01
  • Advanced concepts and skills
Curricula: DATA SCIENCE AND ARTIFICIAL INTELLIGENCE FOR HEALTH AND LIFE SCIENCES
Syllabus

Knowledge and understanding: capability of understanding current research in the application of advanced Machine Learning models to real biological problems, with a strong focus on computational oncology.

Applying knowledge and understanding: capability of identifying the best model to approach a real biological data analysis problem, putting into practice theoretical training from other courses. Capacity to implement models for real case studies.

Making judgments: understanding how to translate a biological problem into a data analysis problem, leveraging modern technologies.

Communication skills: capability of explaining to a wide audience, both in verbal and graphical terms, how to solve a real biological problem with advanced machine learning solutions, understanding the pros and cons of each model.

Learning skills: capability of running real data analysis in the context of computational oncology, using state-of-the-art machine learning tools.

Understanding of probabilistic graphical models and some practical experience with their implementation (e.g., in probabilistic programs). Keen interest in applying Machine Learning to biology. No prior knowledge of biology is required.

This course covers state-of-the-art data analysis approaches for modern computational biology applied to cancer. The required background on biological concepts will be given, and every lecture will examine a real-world biological data analysis problem for which a suitable computational model is presented. Most of the models we will see are rooted in probabilistic Bayesian graphical models, and will also be discussed from the point of view of their implementation. A practical session for every theoretical topic will be delivered in the R/Python programming languages, in order to show applications to real-world data. The knowledge acquired provides a solid basis for applying to a PhD program in computational biology, and for approaching a Master thesis in the area.

- Biology primer
- Cancer primer
- R programming refresher/tutorial
- Bioinformatics sequence alignment and somatic mutation calling
- Statistical models for Copy Number calling from bulk sequencing
- Integrative models for Quality Control of somatic calls
- Normalisation for Copy Number calling to derive Cancer Cell Fractions
- Statistical models for Subclonal Copy Number detection from multiple data types
- Non-negative matrix factorisation for Mutational signatures deconvolution
- Hierarchical Non-parametrics for Mutational signatures deconvolution
- Stochastic branching-process models for Population Genetics of allele diffusion
- Dirichlet hierarchical mixture models for tumour subclonal deconvolution
- Poisson-Cox models for Copy Number timing
- Introduction to Deep-Learning based Digital Pathology
- Mixed-effects models for single-cell Differential Expression
- Introduction to Large Language Models for DNA modelling
- Final project discussion
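
To give a flavour of the practical sessions, the mutational signatures topic above can be illustrated with a minimal Python sketch. It is not the course's own code: the function name `nmf`, the simulated counts and all parameter values are assumptions made for illustration, and it uses the classic multiplicative-update scheme for non-negative matrix factorisation rather than any specific tool taught in the course. The count matrix has 96 rows, one per trinucleotide mutation context, and one column per sample.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative matrix V (contexts x samples) as W @ H.

    W holds the signatures (one per column), H the exposures per sample.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared-error objective;
        # they keep W and H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so that each signature sums to 1, leaving W @ H unchanged.
    scale = W.sum(axis=0, keepdims=True) + eps
    return W / scale, H * scale.T

# Simulated data (hypothetical): 3 signatures, 20 samples, Poisson counts.
rng = np.random.default_rng(1)
true_W = rng.dirichlet(np.ones(96), size=3).T    # 96 x 3, columns sum to 1
true_H = rng.gamma(2.0, 500.0, size=(3, 20))     # 3 x 20 exposures
V = rng.poisson(true_W @ true_H).astype(float)   # 96 x 20 mutation counts

W, H = nmf(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.3f}")
```

In this sketch the number of signatures is assumed to be known; choosing it from the data is part of what the hierarchical non-parametric models in the following topic address.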

Notes available from the teacher, for every topic, along with scientific papers. For most lectures, R/Python source code to analyze real and simulated data provided by the instructor.

Blackboard teaching and computer coding sessions. Students may be requested to bring their own laptop. Teaching materials will be made available via GitHub during the course.

External tutors (expert researchers) may occasionally attend and deliver lectures.

A project, usually carried out in groups (3-5 people), will be assigned based on the students' interests. The project usually involves reproducing some real-world analysis, or extending some machine learning model (mathematics and implementation) to analyze real data.
The results will be presented in a seminar. As in past editions, results presented by the students may be made available online as a collective preprint or similar.

This course explores topics closely related to one or more goals of the United Nations 2030 Agenda for Sustainable Development (SDGs)