Basic tools to know or learn if you work with me

Most information on computing resources at INRAE/MIAT, and on using Linux and Linux software, is available in the Livret Informatique MIAT. Feel free to report any missing or incorrect information. In addition to that tutorial, the current page gathers basic recommendations on practices, links and tools that I recommend using for your work with me. They are all written for Linux-based operating systems (mostly Ubuntu). Again, do not hesitate to report mistakes or missing information.

Between meetings, if you have a question or a problem, I prefer to be contacted by Mattermost for short questions (in a channel shared with the other supervisors, if any) or otherwise by email (with the other supervisors, if any, in CC). If I cannot solve the problem by electronic means, I will usually find a way to meet you soon to help you solve it. Be as precise as possible when explaining your problem (for computing issues, for instance, copy/paste your command and the error messages, or attach screenshots).

R

I’m mainly programming with R and RStudio IDE.

Installation

On Ubuntu, RStudio is installed by downloading the Debian package at this link and running the following commands (from the download directory):

sudo dpkg -i ...
sudo apt install -f

where ... is the name of the downloaded file. The first command usually fails because of missing dependencies, and the second one installs them and completes the installation. R can be installed in two different ways: from the Ubuntu repository, or as a fixed version installed directly from RStudio cloud. The latter is better suited for reproducible analyses.

Reproducibility in R analyses

What I strongly recommend for every project is to:

  • have a fixed version of R installed on your computer, as described on this page (install the most recent version each time you create a new project);

  • have an associated R project created with RStudio (e.g., myproject.Rproj);

  • include a file profile4R.sh that points RStudio to the proper R version, containing the following line (adapted to your R version):
    export RSTUDIO_WHICH_R=/opt/R/4.0.3/bin/R

  • always launch RStudio from a terminal:
    source profile4R.sh
    rstudio myproject.Rproj &
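
Because the R version is chosen once, when the project is created, profile4R.sh can also be generated at that moment. The bash function below is a minimal sketch of this idea and not an official tool. The name write_profile4R is hypothetical. The function assumes that fixed R versions are installed as /opt/R/<version>/bin/R, as in the path above. It also relies on GNU sort for sort -V, which is available on Ubuntu.

```shell
# Hypothetical helper (not part of R or RStudio): create profile4R.sh in a
# project directory, pointing RStudio to the most recent R found in an R root
# directory. Assumes the layout <R_ROOT>/<version>/bin/R (default: /opt/R).
write_profile4R() {
  local project_dir="$1"
  local r_root="${2:-/opt/R}"
  local latest
  # Refuse to overwrite: the R version is chosen once, at project creation.
  if [ -e "$project_dir/profile4R.sh" ]; then
    echo "$project_dir/profile4R.sh already exists; keep it unchanged" >&2
    return 1
  fi
  if [ ! -d "$r_root" ]; then
    echo "No R root directory found at $r_root" >&2
    return 1
  fi
  # Keep the highest installed version (sort -V compares version numbers).
  latest=$(ls "$r_root" | sort -V | tail -n 1)
  if [ -z "$latest" ] || [ ! -x "$r_root/$latest/bin/R" ]; then
    echo "No usable R installation found under $r_root" >&2
    return 1
  fi
  echo "export RSTUDIO_WHICH_R=$r_root/$latest/bin/R" > "$project_dir/profile4R.sh"
}
```

For instance, running write_profile4R ~/myproject once, when the project is created, writes ~/myproject/profile4R.sh for the most recent installed R. RStudio is then launched with source profile4R.sh, as above. The function refuses to overwrite an existing file, so a later R installation does not silently change the version used by the project.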
    

To ensure the maximum reproducibility (and efficiency) of your analyses:

  • set up an renv environment as described in this tutorial (if a security warning is displayed on this page, you can safely ignore it) and use it systematically;

  • organize your scripts and data properly. My project repositories are generally organized as:
    .
    ├── communications
    │   ├── 2018-01-12_workshopXXX
    │   └── 2018-03-15_SeminaireYYY
    ├── CR
    ├── data
    ├── myproject.Rproj
    ├── results
    ├── renv.lock
    ├── renv
    └── RLib
    
  • raw data files must never be edited manually. Unless stated otherwise, a data file is modified only through a script that documents and performs the edit and exports a new dataset;

  • analyses are all performed in scripts or (better) in R Markdown or Quarto files. These are fully commented in English (origin of the data, purpose of the script, interpretation of the results) and include the date of last modification and the output of sessionInfo();

  • scripts are run from the RLib directory and must contain only relative paths to data: never use read.table("/home/myname/someproject/data/mydata.csv"), use read.table("../data/mydata.csv") instead. Another (probably even more elegant) solution is to use the here package and to systematically write paths from the project root, e.g., read.table(here("data", "mydata.csv"));

  • scripts are properly formatted using standard R conventions: see the lintr website for recommendations. You can also use the R package formatr for automatic formatting;

  • in R, avoid explicit loops in your scripts: prefer vectorized operations or apply-family functions whenever possible.
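
The organization and the relative-path rule above can be partly automated. The bash functions below are a hypothetical sketch, and their names (make_project_skeleton, find_absolute_paths) are illustrative only. The first function creates the directories of the tree above. myproject.Rproj is still created by RStudio, and renv and renv.lock by renv::init(). The second function uses grep to list the lines of R, R Markdown and Quarto files that contain a quoted path starting with /home/. This is a simple heuristic, not a full check.

```shell
# Hypothetical helper: create the directories of the project tree.
# myproject.Rproj (RStudio) and renv/, renv.lock (renv::init()) are not created.
make_project_skeleton() {
  local root="$1"
  mkdir -p "$root"/communications "$root"/CR "$root"/data \
    "$root"/results "$root"/RLib
}

# Hypothetical check: print file:line:content for every line of .R, .Rmd or
# .qmd files containing a quoted absolute path under /home/.
# Returns a non-zero status (grep's 1) when no such line is found.
find_absolute_paths() {
  local dir="$1"
  grep -rn --include='*.R' --include='*.Rmd' --include='*.qmd' \
    -e '"/home/' -e "'/home/" "$dir"
}
```

For instance, running find_absolute_paths RLib from the project root prints each offending line, prefixed by its file name and line number.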

Other interesting tips for R

Finally, for quick-and-dirty analyses or package development (or whenever you don't need to control the reproducibility of your analyses), I strongly advise using the official Ubuntu release of R from the CRAN repository, along with the package repository provided by r2u:

  • R is installed with (in a terminal, for Ubuntu 22.04 LTS jammy):
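    # open a root shell: the following commands are run as root
    sudo su
    # add the signing key of the CRAN Ubuntu repository
    wget -q -O- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc \
      | tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
    # declare the CRAN repository for R 4.x on Ubuntu 22.04 (jammy)
    echo "deb [arch=amd64] https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/" \
      > /etc/apt/sources.list.d/cran_r.list
    # import the repository keys (apt-key is deprecated and may print a warning)
    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys \
      67C2D66C4B1D4339 51716619E084DAB9
    # refresh the package index and install R
    apt update
    apt install r-base-core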
    
  • r2u is configured with (in a terminal, same distribution):
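    sudo su
    # install the tools needed to fetch the repository key
    apt update -qq && apt install --yes --no-install-recommends wget \
      ca-certificates gnupg
    # add the r2u signing key
    wget -q -O- https://eddelbuettel.github.io/r2u/assets/dirk_eddelbuettel_key.asc \
      | tee -a /etc/apt/trusted.gpg.d/cranapt_key.asc
    # declare the r2u repository for Ubuntu 22.04 (jammy)
    echo "deb [arch=amd64] https://r2u.stat.illinois.edu/ubuntu jammy main" \
      > /etc/apt/sources.list.d/cranapt.list
    # refresh the package index so that r2u packages become visible to apt
    apt update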
    

    CRAN and Bioconductor packages are then simply installed with:

    sudo apt install r-cran-mixkernel
    sudo apt install r-bioc-asics
    

    respectively.
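
When many packages are needed, their r2u names can be derived from the R package names. The sketch below assumes, based on the two examples above, that names are lower-cased and prefixed with r-cran- (CRAN) or r-bioc- (Bioconductor). The function r2u_pkg_name is hypothetical, and it does not check that the package actually exists in r2u.

```shell
# Hypothetical helper: build the r2u package name of an R package.
# Assumption: lower-cased name prefixed with r-cran- or r-bioc-.
r2u_pkg_name() {
  local repo="$1"   # "cran" or "bioc"
  local pkg="$2"    # R package name, e.g. mixKernel
  case "$repo" in
    cran|bioc) ;;
    *) echo "unknown repository: $repo (expected cran or bioc)" >&2; return 1 ;;
  esac
  printf 'r-%s-%s\n' "$repo" "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
}
```

For instance, sudo apt install $(r2u_pkg_name cran mixKernel) $(r2u_pkg_name bioc ASICS) installs the same two packages as above.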

Additionally,

Git

A project is first downloaded with git clone ..., where ... is the repository URL. Once the repository is cloned, you can create an RStudio project associated with it (as explained in the previous section). This should give you access to a Git menu with easy-to-use buttons.

Good practices on git include:

  • not versioning heavy data files (which are not supposed to change anyway): these files can be sent to collaborators using FileSender;

  • not versioning result files (which can be produced from the scripts in the repository or compiled from a .tex file);

  • not versioning binary files (or only the ones that are strictly necessary and are not heavy).
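
These practices can be encoded in a .gitignore file at the root of the repository. The bash sketch below is one possible starting point. The function names write_gitignore and list_heavy_files are hypothetical. The ignored patterns are illustrative, including the !CR/report.pdf and data/raw_file.csv examples in the comments, and so is the default size threshold of 10M. Adapt all of them to each project. The second function uses find to list files above a size threshold, outside the .git directory, as candidates for FileSender rather than git.

```shell
# Hypothetical helper: write a starting .gitignore at the repository root.
# The patterns below are examples to adapt to each project.
write_gitignore() {
  cat > "$1/.gitignore" <<'EOF'
# RStudio and R session files
.Rproj.user/
.Rhistory
.RData
# results, reproducible from the scripts
results/
# LaTeX by-products and compiled PDFs (the .tex sources stay versioned);
# re-include a necessary PDF with a line such as !CR/report.pdf
*.aux
*.bbl
*.blg
*.log
*.out
*.synctex.gz
*.pdf
# heavy data files: add them one by one, e.g. data/raw_file.csv
EOF
}

# Hypothetical helper: list files larger than a threshold (find -size syntax,
# default 10M), skipping the .git directory.
list_heavy_files() {
  local repo="$1"
  local size="${2:-10M}"
  find "$repo" -path "$repo/.git" -prune -o -type f -size +"$size" -print
}
```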

Genotoul BioInfo server

To run your scripts, organize your remote directories like your local ones. Transfer files with the scp command line, with the Linux tool gftp (installed with sudo apt install gftp), or with git. Do not forget to send your renv.lock file as well, and to use the same R version (not all R versions are necessarily available on Genotoul). You will first have to reinstall your renv environment with the renv::restore() command.
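
Before restoring the environment on the server, the R version recorded in renv.lock can be checked from the command line. The bash function below is a hypothetical sketch: renv_r_version is not an renv command. It assumes the usual renv.lock layout, in which "Version" is the first field of the top-level "R" block. It extracts the version with sed rather than with a real JSON parser.

```shell
# Hypothetical helper: print the R version recorded in a renv.lock file.
# Looks for the first "Version" field between the line opening the "R" block
# and the next closing brace (a text-based heuristic, not a JSON parser).
renv_r_version() {
  sed -n '/"R": {/,/}/ s/.*"Version": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

Once this version is confirmed to be available on the server, the environment can be restored from the project root with Rscript -e 'renv::restore()'.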

LaTeX

For writing articles, and sometimes for creating slides, posters, …, I usually use LaTeX, which handles mathematical formulas perfectly. I strongly recommend that you maintain a single global .bib file for your bibliography. Be aware that BibTeX entries imported from websites or with Zotero usually need some clean-up. In particular, BibTeX files should preferably be written in pure ASCII, whereas automatic imports are usually encoded in UTF-8 (e.g., they include accents or other special characters). Automatic imports also frequently include non-standard fields.
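
Non-ASCII characters in a .bib file can be located from the command line and then replaced by their LaTeX equivalents (e.g., {\'e} for é). The bash function below is a minimal sketch with a hypothetical name, find_non_ascii. It runs grep in the C locale, where the [:print:] and [:space:] classes cover only printable ASCII characters and whitespace. Every other byte, such as the two bytes encoding é in UTF-8, is therefore reported along with its line number.

```shell
# Hypothetical helper: print the lines of a .bib file that contain non-ASCII
# (or non-printable) bytes, prefixed by their line numbers.
find_non_ascii() {
  # C locale: [:print:] and [:space:] only cover printable ASCII and
  # whitespace, so any byte >= 0x80 (e.g. UTF-8 accents) matches.
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}
```

For instance, find_non_ascii mybib.bib prints each line to clean. Like grep, it returns a non-zero status when the file is pure ASCII.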