STATISTICAL LEARNING AND MACHINE LEARNING
2nd year of course - First semester
Attendance: not mandatory
- 9 CFU
- 60 hours
- Italian
- Trieste
- Compulsory
- Standard teaching
- Oral Exam
- SSD SECS-S/01
- Knowledge and understanding: students must show that they understand the essential ideas that motivate the use of supervised learning techniques and the critical aspects that limit their use.
- Applying knowledge and understanding: students must show that they know how to use the techniques they have learned to analyse real data, including by using suitable software tools.
- Making judgements: students must show that they know how to choose the most suitable analysis strategy, including when analysing a real data set.
- Communication skills: students will be able to effectively communicate the results of data analysis using appropriate tools (including modern techniques for compiling dynamic documents).
- Learning skills: at the end of the course, students will be able to consult scientific papers, theoretical or applied, that involve the use of supervised machine learning techniques.
Basic knowledge of statistics (equivalent to two undergraduate statistics courses). Ability to use R software.
The course aims to provide the essential ideas, concepts and main techniques of statistical and machine learning, in particular for supervised learning problems.
1. Introduction
2. What Is Statistical Learning?
a. Supervised Versus Unsupervised Learning
b. Regression Versus Classification Problems
c. Assessing Model Accuracy
d. The Trade-Off Between Prediction Accuracy and Model Interpretability
e. The Bias-Variance Trade-Off
f. Sample use and re-use: training and test sets, cross-validation, bootstrap
3. Regression problems
a. Linear Model Selection and Regularization
b. Ridge Regression and the Lasso
c. Moving Beyond Linearity
d. Polynomial Regression
e. Regression Splines, Smoothing Splines
f. Generalized Additive Models (GAM)
g. Tree-Based Methods: Regression Trees
4. Classification problems
a. K nearest neighbours
b. Discriminant Analysis
c. Logistic regression
d. Classification trees
e. Support vector machines
5. Ensemble methods
a. Bagging, Random Forest, Boosting
6. Deep learning and neural nets
7. Practical topics and application of supervised learning
a. Data Wrangling and feature engineering
b. Data imbalance in classification
c. Some applications to actuarial problems
James G., Witten D., Hastie T., Tibshirani R. - An Introduction to Statistical Learning, Second Edition. Springer, 2021. (Main text.) It can be freely downloaded from https://www.statlearning.com/
Hastie T., Tibshirani R., Friedman J. - The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer, 2009. It can be freely downloaded from https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12.pdf
Handouts and additional material will be provided on the moodle2/units platform.
Classroom lessons.
Laboratory activities and guided exercises.
Group work with common discussion.
The course will make use of teaching tools available on the moodle2, MS Teams and Wooclap platforms. In addition, all students are expected to use R software, so they need to own or have access to a computer.
The evaluation takes place at different times and in several ways:
- For attending students:
1. During the course, homework will be assigned, to be submitted by the set deadlines;
2. Some intermediate tests will be held during the course;
3. Students must submit a report presenting the results of a project assigned at the end of the course.
The final mark is the weighted average of the marks obtained in the three parts (with weights of 0.2, 0.4 and 0.4, respectively).
Together, the three parts of the exam make it possible to assess whether the learning objectives set out above have been achieved.
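The weighting for attending students can be sketched as a short calculation. This is only an illustration, written in Python for convenience (the course itself uses R); the function name `final_mark` and the example marks are hypothetical, and only the weights come from the rule stated above.

```python
# Weights for homework, intermediate tests and final project,
# in the order given in the assessment rule above.
WEIGHTS = (0.2, 0.4, 0.4)


def final_mark(homework, tests, project):
    """Return the weighted average of the three part marks."""
    parts = (homework, tests, project)
    return sum(w * m for w, m in zip(WEIGHTS, parts))


# Hypothetical marks, for illustration only.
print(round(final_mark(30, 25, 28), 1))
```

Because the weights sum to 1, the final mark stays on the same scale as the three part marks.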
- For non-attending students:
Students will take an oral exam in which they will also be asked to carry out some analyses using R.
This course covers some topics related to one or more objectives of the United Nations 2030 Agenda for Sustainable Development.