The use of next-generation sequencing (NGS) is increasing in several biological fields thanks to a very rapid decrease in cost. However, it often produces hundreds of gigabytes of data, which makes downstream analysis very challenging and requires bioinformatics skills.

In this module, we will introduce the most widely used sequencing technologies and explain the concepts behind them.

We will also introduce repositories such as the European Nucleotide Archive (ENA) and the Sequence Read Archive (SRA), from which you can retrieve raw data for specific experiments. We will practice using command-line tools to search for and fetch raw NGS data efficiently.

Finally, using different datasets, we will practice quality control screening, filtering reads for better downstream analysis, mapping reads to a reference genome, and visualizing the output.

Within the Swiss Personalized Health Network ([SPHN](https://sphn.ch)) and related national initiatives, researchers use patient data (i.e., confidential human data) in their research projects. Dealing with confidential human data requires awareness of data privacy, the relevant laws and information security. This course explains what should be done in practice to protect patients’ privacy when performing biomedical research on human data.

Do you feel unable to analyse data statistically, despite having already taken an introductory statistics course? If so, this course was created for you.
The goal of this training is to provide researchers with the practical skills required in order to analyse real biomedical data. This includes:

  • how to explore data
  • how to choose and apply an analysis method (statistical tests in particular)
  • how to manage common issues encountered during data analysis, such as outliers, batch effects, and the management of biological vs technical replicates
  • how (and when) to evaluate the power of an experiment
  • how to communicate the results

During this two-day training, you will be given datasets to analyse in small groups, using information provided by the trainers. The results will then be discussed together. The datasets will be chosen to cover the most common questions that arise during a statistical analysis, including the assumptions of tests (the requirement for normality of data in particular) and the handling of outliers and missing data.

With the constant evolution of technologies, laboratory biologists face an increasing need for bioinformatics skills to deal with high-throughput data storage, retrieval and analysis.

Although several resources developed for such tasks have a web interface (usually biologists' first choice), many operations can be handled more efficiently from the command line (CLI).

Python is an open-source, general-purpose scripting language that runs on all major operating systems. It was designed to be easy to read and write, with comparatively simple syntax, and is thus a good choice for beginners in programming. Python is applied in many disciplines and is one of the most common languages in bioinformatics. The Python community enthusiastically maintains a rich collection of libraries and modules for everything from web development to machine learning. Other programming languages such as R offer comparable functionality, but some tasks are more natural (and easier!) in Python.

CAS-PMO 2018-2019