Topic outline

  • General

    Autumn School Machine Learning applied to Systems Biology

    Schwarzenberg 19-24 November 2017

    This page is addressed to registered participants. To access the course description and application form (now closed), please click here.

    For any assistance, please contact training@sib.swiss


  • Practical Information

    Hotel

    The address of the Autumn School event is Hotel und Bildungszentrum Matt, Mattstrasse 19, 6103 Schwarzenberg, Switzerland.
    This is a nice location in the mountains, near the famous Pilatus. Schwarzenberg, a village of about 1700 inhabitants, is about 20 minutes by car from Luzern.

    The hotel website provides further information but is unfortunately only available in German: https://www.bzmatt.ch/


    Venue


    By public transportation:

    - the final stop is Schwarzenberg Ennematt. To get there:

    - from main railway stations, get to Luzern Station. Then,

    - Luzern -> Malters by regional train, then

    - Malters -> Schwarzenberg Ennematt by bus, then

    - opposite the bus stop (you have to cross the road), take Mattstrasse and walk for 5 minutes

    You can find timetables at https://www.sbb.ch/en

    On Sunday, you could take one of the following connections:

    Luzern -> Schwarzenberg Ennematt
    16h16 -> 16h43 or
    17h16 -> 17h43


    By car:

    The address of the hotel is:

    Hotel & Bildungszentrum Matt
    Mattstrasse 19
    CH-6103 Schwarzenberg / Luzern

    Car parking is free of charge.


  • Programme

    Approximate timing for a typical day is the following:
    09:00 - 12:30          lectures
    12:30 - 14:00          lunch
    14:00 - 17:00/30       practical / exercises
    17:00/30 - 19:00/30    free
    19:00/30               dinner

    Sunday 19 November: Broad introduction and welcome dinner

    Dr Frédéric Schütz, SIB Swiss Institute of Bioinformatics

    17:00 - 18:00    Arrival of the participants
    18:15            Informal welcome and presentation of the event (Grégoire Rossier, co-organizer, SIB Swiss Institute of Bioinformatics)
    18:30            Broad machine learning introduction
    19:30            "Round table" with participants' background and expectations
    ~20:15           Welcome dinner

    Monday 20 November: Introduction to machine learning

    Dr Frédéric Schütz, SIB Swiss Institute of Bioinformatics

    Morning: Lectures

      Introduction to machine learning
    • Supervised vs unsupervised learning
    • Introduction to some classification and machine learning algorithms: k-means, LDA/QDA, Random forest, etc.
    • Evaluating performance
      • generalization/overfitting
      • training, test sets
      • cross-validation, bootstrap, jackknife
      • Model selection
      • ROC curves
    Afternoon: Exercises: machine learning with R.

    Tuesday 21 November: Best practice in applied machine learning

    Dr Eric Paquet, Computational Systems Biology, EPFL

    Morning: lectures
    • Pitfalls, experimental design and batch effect
    • Diagnostic/QC plots in R
    • PCA
    • Clustering/heatmaps
    • Boxplots
    • Normalization
    • Feature selection
    • Regularization (lasso, ridge and elastic net)
    • Neural networks (perceptron)
    • Kernel trick (spectral)
    • Reproducible research, Sweave, Jupyter notebooks, git
    • Example of the MAQC II
    • Example of applied machine learning in Systems Biology
    • Cancer subtypes: how many subtypes, and how to identify them
    • Hidden Markov models (HMM)
    • image analysis (drug discovery)
    • image analysis (morphology classification)
    Afternoon: exercises

    Wednesday 22 November: Participants’ day

    Morning: "participants, the floor is yours..."
    • Lightning presentations
    • Poster session
      see a detailed list below
    Afternoon: Social activity
    • Visit to a glass factory, including fun activities.

    Thursday 23 November: Machine Learning and metagenomics to study microbial communities

    Dr Luis Pedro Coelho, EMBL, Heidelberg, Germany

    Morning: lectures**
    • Brief Introduction to microbial community wetlab technologies
    • Presentation of important questions in the field
    • Overview of raw data processing with the NGLess tool
    • Classification based on metagenomics-derived features
    • Example based on Zeller et al., 2014: http://doi.org/10.15252/msb.20145645
    • Feature normalization/filtering
    • Biomarker discovery
    **Lectures will be interactive, based on Python and Jupyter notebooks.

    Afternoon: exercises
    • Clustering for metagenomics: Metagenomic species, mOTUs, subspecies discovery…
    • Machine learning for the exploration of community/environmental links:
    • Example based on Sunagawa et al., 2015: http://doi.org/10.1126/science.1261359
    • Different forms of ordination analysis
    • Feature normalization for clustering
    • Discussion of batch effects and techniques to minimize their impact on the final analysis
    • Computer vision techniques for studying micro-eukaryotic communities

    Friday 24 November: Deep learning in single-cell analysis

    Dr María Rodríguez-Martínez, IBM Research Lab Zurich

    Morning: lectures
    • Introduction to deep learning
      • Why and how deep
      • Activation functions
      • Cost functions
      • Backpropagation
      • Regularization
      • Optimization
    • Multi-Layer Perceptron (MLP)
    • Auto-encoders (AE)
    • Convolutional Neural Networks (CNN)
    • Recurrent Neural Networks (RNN)
    Afternoon: exercises
    • Word Embeddings for molecular interaction inference (INtERAcT)
    • Deep SWATH-MS, deep and unsupervised MS processing (DeepSWATH)
    • Characterizing cell populations on single-cell data
  • Participants' presentations

    Posters

    Bulak Arpat
    Analysis of Translational Pausing by Disome Profiling
    Amel Bekkar
    Logical modeling of cardiovascular disease
    Violeta Castelo Szekely
    Sequence determinants of DENR-MCTS1 mediated translation reinitiation
    Chiara Cotroneo
    Computational prediction of clusters of bacterial genes
    Sunniva Foerster
    Pairwise drug combinations against Neisseria gonorrhoeae
    Anamarija Fofonjka
    An elastic instability generates predictable folding of the frilled dragon erectile ruff during development
    Qingyao Huang
    Integrative analysis of cancer genome profiling data to study the interplay of genetic background and molecular mechanisms in cancer
    Lidia Lacruz
    Prevision of facial morphology within the context of forensic DNA phenotyping
    Mose Manni
    Machine Learning for predictions in infectious diseases outbreaks
    Marco Meola
    Improved classification of short read sequences from dairy products for bacterial species identification using the manually curated reference database DAIRYdb
    Gautam Munglani
    Image feature recognition and quantification of tip-growing cells
    Rocío Rama Ballesteros
    Pattern recognition of relevant features involved in coevolution
    Stephan Schmeing
    ReSequenceR: Simulating more realistic high-throughput sequencing data
    Marthe Solleder
    Analysis and prediction of phosphopeptide-HLA interaction
    Daniel Spies
    De-convolution of epigenetic regulators of RNAi mutant mESC in MES and 2i media
    Christoph Stritt
    Population genomics of transposable elements in a Mediterranean grass species


    Lightning presentations

    Each presentation should last 5 minutes, followed by 2-3 minutes of questions.

    Nicolas Blöchliger
    Quantifying the uncertainty of antimicrobial susceptibility testing
    Janko Tackmann
    Inference of Microbial Interaction Networks from Massive Data Sets through Causal Knowledge Discovery
    Monica Ticlla Ccenhua
    Identifying metabolic pathways in Mycobacterium tuberculosis relevant for transmission
    Mariamawit Ashenafi
    3D Functional Organization of Plant Nucleus
    Marti Bernardo Faura
    A systems biologist's expedition: from systems biomedicine to plant biotechnology
    Adhideb Ghosh
    Prediction of gain-of-function and loss-of-function variants using in silico bioinformatics tools
    David Dreher
    Towards shapes as landmarks in development
    Athos Fiori
    Gene expression dynamic across cell cycle
    Simon Friedensohn
    Mining immune repertoires for functional antibodies
    POSTER SESSION

    Annika Gable
    Using biological networks to identify new drug targets for rare genetic diseases
    Tilman Flock
    Beyond structure-guided drug design
    Alicia Kaestli
    A software for automated arrhythmia detection in iPSC-derived cardiomyocytes
    Mattia Tomasoni
    GWAS on features extracted from the retinal images
    Lisa Lamberti
    Efficient methods for detecting genetic interactions
    Hyunjin Shim
    Feature learning of virus genome evolution with the nucleotide skip-gram neural network
    Marie Zufferey
    Comparison of TAD calling methods
    Garif Yalak
    Exoenzymes: Enzymes of the extracellular matrix


  • Prerequisites and software installation

    Knowledge / skills:
    • Active participation.
    • Ready for networking with peers and teachers.
    • Good programming skills in Python and R.
    • Basic statistical knowledge.
    • Basics of terminal (shell) usage
    Material:
    Here is the list of required software, with links to the installation pages:

    R 3.4.2 :
    and the following packages:
    gplots
    e1071
    class
    ROCR
    ggplot2
    randomForest
    caret
    nnet

    Rstudio Desktop (open source license) :

    Python :
    and the following packages:
    - scikit-learn
    - matplotlib
    - seaborn
    - numpy
    - scipy
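
    To confirm before the course that the Python packages listed above are installed, something like the following minimal sketch could be run. It is not part of the official course material: the helper name missing_packages is made up for illustration, and the import names are assumed to be the usual ones (scikit-learn is imported as "sklearn").

```python
import importlib.util

# Import names of the Python packages listed above.
# Assumption: these are the usual import names; scikit-learn is imported as "sklearn".
REQUIRED = ["sklearn", "matplotlib", "seaborn", "numpy", "scipy"]


def missing_packages(names):
    """Return the names in `names` that the import system cannot locate.

    Hypothetical helper for illustration; it only checks that each
    top-level package can be found, not which version is installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
```

    Any package reported as missing would then need to be installed with your usual package manager before the first exercise session.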

    Jupyter notebook :

    Docker :

    Weka 3.8 :