
Machine Learning in Python with scikit-learn

Graduated Program in Life Science, Department of Biology, ENS-PSL
BIO-AA-PG- | Machine Learning in Python with scikit-learn (ENS/Biology)
Level | Semester : PhD and Postdocs | S2
Where : Biology department, ENS
Duration : 6 weeks
Dates : April 8th – May 27th, 2024
Maximum class size : 16 students

2024 program

Coordination

Aurélien Wyngaard, Department of Biology, ENS
Denis Thieffry, Department of Biology, ENS

Credits

3 ECTS

Keywords

Python | Programming | Linux | Machine learning | scikit-learn

Course prerequisites

Basic familiarity with Linux and a solid grounding in Python (being able to handle NumPy arrays, ideally pandas DataFrames, and knowing how to make plots).
If you have no Linux/Unix background, you can check the first sections of an online course such as https://www.tutorialspoint.com/unix/index.htm

Course objectives and description
Aims

The objective of the course is to introduce early-career life scientists to the basics of machine learning and to its use in Python with the scikit-learn package.

Organization

The course consists of twelve two-hour classes (two per week) spread over six weeks of teaching, with a one-week break, in April-May 2024.
A large part of each class will be devoted to practical coding exercises.
Participants should expect a few hours of homework per week.

Assessment

• Participants will regularly be asked to explain their code in class.
• Coding exercises and quizzes will be assigned throughout the course.

Course material

The course will be based on the INRIA open online course (https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/), adapted to biological applications.

2024 program schedule

• April 8th : Introduction and tabular data exploration
• April 11th : Fitting a scikit-learn model on numerical data (1)
• April 15th : Fitting a scikit-learn model on numerical data (2) and on categorical data
• April 18th : Selecting the best model (1)
• April 22nd : Selecting the best model (2) and dealing with hyperparameters
• April 25th : Linear models (1)
• April 29th : Linear models (2)
• May 2nd : Linear models (3)

Week off (May 6th-12th)

• May 13th : Decision tree models
• May 16th : Evaluating model performance (1)
• May 21st : Evaluating model performance (2) (note: on Tuesday, not Monday)
• May 27th : Evaluating model performance (3) and ensemble of models