This two-day course provides an overview of the RNA-seq analysis pipeline, as well as the downstream analysis of the resulting data using Bioconductor packages in R. The course covers the following topics:

  • The structure of an RNA-seq analysis pipeline:
    • Raw data quality check;
    • RNA-seq read alignment;
    • Gene expression level quantification and normalization by read counting;
    • De novo transcript reconstruction and differential splicing.
  • Overview of downstream analysis:
    • Differential expression analysis with R/Bioconductor packages;
    • Class discovery: using principal component analysis, clustering, heatmaps, and gene set enrichment analysis in RNA-seq analysis.
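To make the "normalization by read counting" step concrete, here is a minimal sketch in Python of counts-per-million (CPM) scaling, one simple way to make read counts comparable across libraries of different sizes. The course description does not say which normalization method is taught, so CPM is an assumption chosen for illustration, and the gene names and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no counted reads")
    # Each gene's share of the library, expressed per million reads.
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical raw counts for one sample (illustrative values only).
raw_counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
cpm = counts_per_million(raw_counts)

for gene, value in cpm.items():
    print(f"{gene}\t{value:.1f}")
```

CPM corrects only for sequencing depth; dedicated differential expression packages typically apply more elaborate normalization schemes.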

Next Generation Sequencing (NGS) techniques are not covered in this course, and neither experimental design nor the statistical methods are discussed in detail.

In this course, R programmers will learn how to create R packages, which are the best way to make R scripts reusable. Participants will learn how to design and build clear, clean and usable packages in R.

This course is recommended even for programmers who do not plan to distribute their R scripts or datasets: R packages are also useful for developers who work alone and want to keep track of their scripts and the related documentation.

The use of NGS is increasing in several biological fields due to a very rapid decrease in cost. However, it often produces hundreds of gigabytes of data, which makes the downstream analysis very challenging and requires bioinformatics skills.

In this module, we will introduce the most widely used sequencing technologies and explain the principles behind them.

We will also introduce repositories, such as the European Nucleotide Archive (ENA) and the Sequence Read Archive (SRA), from which you can retrieve raw data for specific experiments. We will practice using command-line tools to search for and fetch raw NGS data efficiently.

Finally, using different datasets, we will practice quality control screening, filtering reads for better downstream analysis, mapping reads to a reference genome, and visualizing the output.
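As an illustration of the read-filtering step, the following Python sketch keeps only reads whose mean base quality reaches a threshold. It assumes four-line FASTQ records with Phred+33 quality encoding; the function names, the threshold of 20, and the two example reads are hypothetical, and the module itself may rely on dedicated tools rather than hand-written code.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def parse_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    # Step through complete records only; a truncated tail is ignored.
    for i in range(0, len(records) - 3, 4):
        header, sequence, _separator, quality = records[i:i + 4]
        yield header, sequence, quality


def filter_reads(lines, min_mean_quality=20.0):
    """Keep reads whose mean quality is at least min_mean_quality."""
    return [
        (header, sequence, quality)
        for header, sequence, quality in parse_fastq(lines)
        if mean_phred(quality) >= min_mean_quality
    ]


# Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0.
fastq_lines = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]
kept = filter_reads(fastq_lines)
print([header for header, _, _ in kept])
```

In practice the same function could be fed an open file handle instead of a list, since it only iterates over lines.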

CAS-PMO 2018-2019