Machine Learning in Python with scikit-learn
Graduate Program in Life Science, Department of Biology, ENS-PSL
BIO-AA-PG- | Machine Learning in Python with scikit-learn (ENS/Biology)
Level | Semester : PhD and Postdocs | S2
Where : Biology department, ENS
Duration : 6 weeks
Dates : April 8th – May 27th, 2024
Maximum class size : 16 students
Coordination
Aurélien Wyngaard, Department of Biology, ENS
Denis Thieffry, Department of Biology, ENS
Credits
3 ECTS
Keywords
Python | Programming | Linux | Machine learning | scikit-learn
Course prerequisites
Basic familiarity with Linux and a solid foundation in Python: you should be able to handle NumPy arrays, ideally pandas DataFrames, and know how to make plots.
If you have no Linux/Unix background, you can work through the first sections of an online course such as https://www.tutorialspoint.com/unix/index.htm
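As a rough self-check of the Python prerequisites, the short sketch below shows the kind of NumPy and pandas manipulation you should be comfortable reading. The data, gene names, and condition labels are made up for illustration and are not course material; plotting is left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: expression levels of two made-up genes in six samples.
expression = np.array([
    [2.0, 5.0],
    [3.0, 7.0],
    [4.0, 6.0],
    [8.0, 1.0],
    [9.0, 2.0],
    [7.0, 3.0],
])

# NumPy: reduce along an axis and select rows with a boolean mask.
gene_means = expression.mean(axis=0)
high_gene_a = expression[expression[:, 0] > 5.0]

# pandas: build a DataFrame, add a column, and aggregate by group.
df = pd.DataFrame(expression, columns=["gene_a", "gene_b"])
df["condition"] = ["control"] * 3 + ["treated"] * 3
group_means = df.groupby("condition")[["gene_a", "gene_b"]].mean()

print(gene_means)
print(high_gene_a.shape)
print(group_means)
```

If each line here is clear to you, you most likely have the Python background the course assumes.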
Course objectives and description
Aims
The objective of the course is to introduce early-career life scientists to the basics of machine learning and to its use in Python with the scikit-learn package.
Organization
The course will include twelve classes (two per week), each two hours long, over a period of six weeks (with a one-week break), in April-May 2024.
A large part of each class will be devoted to practical coding exercises.
Expect a few hours of homework per week.
Assessment
• Participants will regularly be asked to explain their code during the classes.
• Coding exercises and quizzes will be assigned throughout the course.
Course material
The course will be based on the INRIA open online course (https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/), adapted to biology.
Schedule
• April 8th : Introduction and tabular data exploration
• April 11th : Fitting a scikit-learn model on numerical data (1)
• April 15th : Fitting a scikit-learn model on numerical data (2) and on categorical data
• April 18th : Selecting the best model (1)
• April 22nd : Selecting the best model (2) and dealing with hyperparameters
• April 25th : Linear models (1)
• April 29th : Linear models (2)
• May 2nd : Linear models (3)
Week off (May 6th-12th)
• May 13th : Decision tree models
• May 16th : Evaluating model performance (1)
• May 21st : Evaluating model performance (2) (note: on Tuesday, not Monday)
• May 27th : Evaluating model performance (3) and ensemble of models
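To give a flavor of how the topics in this schedule fit together, here is a minimal sketch that fits a linear model in a scikit-learn pipeline, evaluates it by cross-validation, and tunes one hyperparameter. It is not taken from the course material: the synthetic dataset, the choice of logistic regression, and the grid of `C` values are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic numerical data stands in for a real biological table.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Preprocessing and a linear model chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model evaluation: 5-fold cross-validation on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Hyperparameter tuning: grid search over the regularization strength C
# (the candidate values are illustrative only).
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)

# Final check on data that was held out from fitting and tuning.
test_accuracy = search.score(X_test, y_test)

print("Cross-validation accuracy per fold:", cv_scores)
print("Best C:", search.best_params_["logisticregression__C"])
print("Held-out test accuracy:", test_accuracy)
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation folds leaks into preprocessing.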