Statistical Learning

Basic description of this course

This course is part of a full series of statistics classes organized by Genotoul - Biostatistics at INRA Toulouse in 2011 and 2012. Its aim is to introduce prediction methods from machine learning. It is divided into:

  • a presentation of the topic (approximately 3 hours);
  • practical applications using the free statistical software environment R.

The course is organized as follows:

  • basic introduction to machine learning;
  • introduction to multilayer perceptrons (neural networks);
  • introduction to classification and regression trees (CART algorithm);
  • introduction to random forests.

The following R packages are used in the applications:

  • car (useful tools);
  • nnet (neural networks);
  • e1071 (machine learning);
  • rpart (classification and regression trees);
  • randomForest (random forests).
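
Before starting the practical applications, it can help to check that these packages are available. The following is a minimal sketch, not part of the original course material: it only reports which of the listed packages can be found in the current R library, and the installation line is left commented out because it assumes an internet connection and a configured CRAN mirror.

```r
# Packages named in the list above
pkgs <- c("car", "nnet", "e1071", "rpart", "randomForest")

# TRUE for each package that can be found in the current R library
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(available)

# Assumption: a CRAN mirror is reachable. Uncomment to install what is missing.
# install.packages(pkgs[!available])

# Attach the packages that are available
for (p in pkgs[available]) {
  library(p, character.only = TRUE)
}
```

The `character.only = TRUE` argument is needed because `library()` otherwise treats its argument as a literal package name rather than as a variable holding one.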

2012 material

How to use the material?

Once all the files have been downloaded, the best way to use them is to make the following directory hierarchy:

  • a folder named “data” containing the data (Rdata and csv files above);
  • a folder named “ML” containing two subdirectories:
    • one named “Cours” with the theoretical part slides;
    • one named “TP” with the slides and the R script related to the practical application.
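
The hierarchy described above can also be created from R itself. This is only a sketch: the root location (`root`, set here to a temporary directory with a made-up folder name) is an assumption and should be replaced by the folder where the files were downloaded.

```r
# Assumption: placeholder root folder; replace with your own course directory
root <- file.path(tempdir(), "statistical-learning")

# "data" for the data files, "ML/Cours" for the theory slides,
# "ML/TP" for the practical slides and R script
dirs <- file.path(root, c("data",
                          file.path("ML", "Cours"),
                          file.path("ML", "TP")))

# recursive = TRUE also creates the intermediate "ML" folder
for (d in dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# List the resulting folders relative to the root
print(list.dirs(root, full.names = FALSE, recursive = TRUE))
```

Once the folders exist, the downloaded files can be moved into them by hand or with `file.copy()`.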