
  • General

    Autumn School Machine Learning for Bioinformatics

    Schwarzenberg 19-24 November 2017

    This page is addressed to registered participants. To access the course description and application form (now closed), please click here.

    For any assistance, please contact training@sib.swiss

    • Practical Information

      Hotel

      The address of the Autumn School event is Hotel und Bildungszentrum Matt, Mattstrasse 19, 6103 Schwarzenberg, Switzerland.
      This is a nice location in the mountains, near the famous Pilatus. Schwarzenberg, a village of about 1,700 inhabitants, is about 20 minutes by car from Luzern.

      The hotel website provides further information but is unfortunately available only in German: https://www.bzmatt.ch/


      Venue


      By public transportation:

      - the final stop is Schwarzenberg Ennematt. To get there:

      - from main railway stations, get to Luzern Station. Then,

      - Luzern -> Malters by regional train, then

      - Malters -> Schwarzenberg Ennematt by bus, then

      - from the bus stop, cross the road, take Mattstrasse and walk for about 5 minutes

      You can find timetables at https://www.sbb.ch/en

      On Sunday, you might plan the following:

      Luzern -> Schwarzenberg Ennematt
      16h16 -> 16h43 or
      17h16 -> 17h43


      By car:

      The address of the hotel is:

      Hotel & Bildungszentrum Matt
      Mattstrasse 19
      CH-6103 Schwarzenberg / Luzern

      Car parking is free of charge.


      • Programme

        The approximate timing of a typical day is as follows:
        09:00 - 12:30                lectures
        12:30 - 14:00                lunch
        14:00 - 17:00/17:30          practicals / exercises
        17:00/17:30 - 19:00/19:30    free time
        19:00/19:30                  dinner

        Sunday 19 November: Broad introduction and welcome dinner

        Dr Frédéric Schütz, SIB Swiss Institute of Bioinformatics

        17:00 - 18:00    Arrival of the participants
        18:15            Informal welcome and presentation of the event (Grégoire Rossier, co-organizer, SIB Swiss Institute of Bioinformatics)
        18:30            Broad machine learning introduction
        19:30            "Round table" on participants' backgrounds and expectations
        ~20:15           Welcome dinner

        Monday 20 November: Introduction to machine learning

        Dr Frédéric Schütz, SIB Swiss Institute of Bioinformatics

        Morning: Lectures

          Introduction to machine learning
        • Supervised vs unsupervised learning
        • Introduction to some classification and machine learning algorithms: k-means, LDA/QDA, Random forest, etc.
        • Evaluating performance
          • generalization/overfitting
          • training, test sets
          • cross-validation, bootstrap, jackknife
          • Model selection
          • ROC curves
        Afternoon: Exercises: machine learning with R.
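
        As a small illustration of the "training, test sets" and "cross-validation" topics listed above, here is a minimal sketch of how k-fold splitting works. It is not course material: the function name `k_fold_indices` and its parameters are hypothetical, and it uses only the Python standard library rather than the R packages used in the exercises.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # The training set is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(n_samples=10, k=5):
        print("train:", sorted(train), "test:", sorted(test))
```

        Each sample appears in exactly one test fold, so every observation is used once for evaluation and k-1 times for training.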

        Tuesday 21 November: Best practice in applied machine learning

        Dr Eric Paquet, Computational Systems Biology, EPFL

        Morning: lectures
        • Pitfalls, experimental design and batch effect
        • Diagnostic/QC plots in R
        • PCA
        • Clustering/heatmaps
        • Boxplots
        • Normalization
        • Feature selection
        • Regularization (lasso, ridge and elastic net)
        • Neural networks (perceptron)
        • Kernel trick (spectral)
        • Reproducible research, Sweave, Jupyter notebooks, git
        • Example of the MAQC-II project
        • Example of applied machine learning in Systems Biology
        • Cancer subtypes: how many subtypes are there, and how to identify them
        • HMM
        • image analysis (drug discovery)
        • image analysis (morphology classification)
        Afternoon: exercises

        Wednesday 22 November: Participants’ day

        Morning: "participants, the floor is yours..."
        • Lightning presentations
        • Poster session
          (see the detailed list below)
        Afternoon: Social activity
        • Visit to a glass factory, including fun activities

        Thursday 23 November: Machine Learning and metagenomics to study microbial communities

        Dr Luis Pedro Coelho, EMBL, Heidelberg, Germany

        Morning: lectures**
        • Brief introduction to microbial community wet-lab technologies
        • Presentation of important questions in the field
        • Overview of raw data processing with the NGLess tool
        • Classification based on metagenomics-derived features
        • Example based on Zeller et al., 2014: http://doi.org/10.15252/msb.20145645
        • Feature normalization/filtering
        • Biomarker discovery
        **Lectures will be interactive, based on Python and Jupyter notebooks

        Afternoon: exercises
        • Clustering for metagenomics: Metagenomic species, mOTUs, subspecies discovery…
        • Machine learning for the exploration of community/environmental links:
        • Example based on Sunagawa et al., 2015: http://doi.org/10.1126/science.1261359
        • Different forms of ordination analysis
        • Feature normalization for clustering
        • Discussion of batch effects and techniques to minimize their impact on the final analysis
        • Computer vision techniques for studying micro-eukaryotic communities

        Friday 24 November: Deep learning in single-cell analysis

        Dr María Rodríguez-Martínez, IBM Research Lab Zurich

        Morning: lectures
        • Introduction to deep learning
          • Why and how deep
          • Activation functions
          • Cost functions
          • Backpropagation
          • Regularization
          • Optimization
        • Multi-Layer Perceptron (MLP)
        • Auto-encoders (AE)
        • Convolutional Neural Networks (CNN)
        • Recurrent Neural Networks (RNN)
        Afternoon: exercises
        • Word Embeddings for molecular interaction inference (INtERAcT)
        • Deep SWATH-MS, deep and unsupervised MS processing (DeepSWATH)
        • Characterizing cell populations on single-cell data
        • Participants' presentations

          Posters

          Bulak Arpat
          Analysis of Translational Pausing by Disome Profiling
          Amel Bekkar
          Logical modeling of cardiovascular disease
          Violeta Castelo Szekely
          Sequence determinants of DENR-MCTS1 mediated translation reinitiation
          Chiara Cotroneo
          Computational prediction of clusters of bacterial genes
          Sunniva Foerster
          Pairwise drug combinations against Neisseria gonorrhoeae
          Anamarija Fofonjka
          An elastic instability generates predictable folding of the frilled dragon erectile ruff during development
          Qingyao Huang
          Integrative analysis of cancer genome profiling data to study the interplay of genetic background and molecular mechanisms in cancer
          Lidia Lacruz
          Prevision of facial morphology within the context of forensic DNA phenotyping
          Mose Manni
          Machine Learning for predictions in infectious diseases outbreaks
          Marco Meola
          Improved classification of short read sequences from dairy products for bacterial species identification using the manually curated reference database DAIRYdb
          Gautam Munglani
          Image feature recognition and quantification of tip-growing cells
          Rocío Rama Ballesteros
          Pattern recognition of relevant features involved in coevolution
          Stephan Schmeing
          ReSequenceR: Simulating more realistic high-throughput sequencing data
          Marthe Solleder
          Analysis and prediction of phosphopeptide-HLA interaction
          Daniel Spies
          De-convolution of epigenetic regulators of RNAi mutant mESC in MES and 2i media
          Christoph Stritt
          Population genomics of transposable elements in a Mediterranean grass species


          Lightning presentations

          Each presentation should last 5 minutes, followed by 2-3 minutes for questions.

          Nicolas Blöchliger
          Quantifying the uncertainty of antimicrobial susceptibility testing
          Janko Tackmann
          Inference of Microbial Interaction Networks from Massive Data Sets through Causal Knowledge Discovery
          Monica Ticlla Ccenhua
          Identifying metabolic pathways in Mycobacterium tuberculosis relevant for transmission
          Mariamawit Ashenafi
          3D Functional Organization of Plant Nucleus
          Marti Bernardo Faura
          A systems biologist's expedition: from systems biomedicine to plant biotechnology
          Adhideb Ghosh
          Prediction of gain-of-function and loss-of-function variants using in silico bioinformatics tools
          David Dreher
          Towards shapes as landmarks in development
          Athos Fiori
          Gene expression dynamic across cell cycle
          Simon Friedensohn
          Mining immune repertoires for functional antibodies
          POSTER SESSION

          Annika Gable
          Using biological networks to identify new drug targets for rare genetic diseases
          Tilman Flock
          Beyond structure-guided drug design
          Alicia Kaestli
          A software for automated arrhythmia detection in iPSC-derived cardiomyocytes
          Mattia Tomasoni
          GWAS on features extracted from the retinal images
          Lisa Lamberti
          Efficient methods for detecting genetic interactions
          Hyunjin Shim
          Feature learning of virus genome evolution with the nucleotide skip-gram neural network
          Marie Zufferey
          Comparison of TAD calling methods
          Garif Yalak
          Exoenzymes: Enzymes of the extracellular matrix


          • Prerequisites and software installation

            Knowledge / skills:
            • Active participation.
            • Ready for networking with peers and teachers.
            • Good programming skills in Python and R.
            • Basic statistical knowledge.
            • Basics of terminal (shell) usage
            Material:
            Here is the list of required software, with links to the installation pages:

            R 3.4.2 :
            and the following packages:
            gplots
            e1071
            class
            ROCR
            ggplot2
            randomForest
            caret
            nnet

            Rstudio Desktop (open source license) :

            Python :
            and the following packages:
            -Scikit-learn
            -matplotlib
            -seaborn
            -numpy
            -scipy

            Jupyter notebook :

            Docker :

            Weka 3.8 :
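
            To check that the Python packages listed above are installed before the course, something like the following sketch can be run. It is only a suggestion, not an official installation test: the helper name `missing_packages` is hypothetical, and the only assumption beyond the list above is that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Display name -> import name for the Python packages listed above.
# Assumption: scikit-learn is imported as "sklearn".
REQUIRED = {
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "numpy": "numpy",
    "scipy": "scipy",
}


def missing_packages(required):
    """Return the display names of packages that cannot be found.

    Hypothetical helper: find_spec() locates a top-level module
    without importing it, and returns None when it is absent.
    """
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed Python packages were found.")
```

            This only confirms that each package can be located. It does not check versions, and it does not cover R, Docker or Weka.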