Foundational Methods in Data Science 2024

The African Institute for Mathematical Sciences Research and Innovation Centre in Rwanda (AIMS RIC) is inviting applications for the doctoral training school “Foundational Methods in Data Science”, scheduled from March 10, 2024 to April 6, 2024 at AIMS Rwanda in Kigali. The school is organized around four (4) core courses, several special topics courses and additional training components.

This training school, funded by the Carnegie Corporation of New York, will be highly interactive. It brings together renowned international and national academics, researchers and graduate students working on topics relevant to modern data science, including machine learning and artificial intelligence, their mathematical theories and numerical implementations. The program draws on methods from statistics, optimization, functional analysis, numerical analysis and linear algebra, with a focus on practicals and hands-on problem-solving sessions. The training school will provide a platform for researchers and data scientists to interact in an interdisciplinary and trans-disciplinary environment.

How to apply

Interested candidates should submit their application using this link by 12 February 2024, 11:59 pm Kigali time.

Please make sure to complete your profile as accurately as possible. Each applicant will automatically be considered for one of our fully funded excellence scholarships covering travel and accommodation costs. Due to limited capacity, only a few applicants will be selected for in-person participation.

Scientific committee

  1. Franca Hoffmann, California Institute of Technology
  2. Cecil Ouma, AIMS Research and Innovation Centre
  3. Issa Karambal, AIMS Research and Innovation Centre
  4. Emmanuel Masabo, African Centre of Excellence in Data Science, University of Rwanda
  5. Tim Brown, Carnegie Mellon University Africa
  6. Wilfred Ndifon, AIMS Research and Innovation Centre
  7. Bubacarr Bah, Medical Research Council Unit The Gambia at the London School of Hygiene & Tropical Medicine
  8. Peter Diggle, Lancaster University
  9. Philipp Berens, University of Tübingen
  10. Sophie Dabo-Niang, University of Lille
  11. Vicky Kondi, AIMS Research and Innovation Centre

Local Organizing committee

  1. Cecil Ouma [Co-Chair]
  2. Franca Hoffmann [Co-Chair]
  3. Winnie Nakiyingi
  4. Yves Bonheur Mugiraneza
  5. Alison Karungi
  6. Isambi Sailon Mbalawata
  7. Molly Mutesi

Schedule Overview

Week 1:

  • Machine Learning Essentials

    Machine Learning Essentials will cover the basic concepts and foundations of ML. The course begins with an introductory lecture on statistical learning theory, emphasizing key supervised learning techniques such as Empirical Risk Minimization and Regularization. It then delves into the principles of widely used algorithms in data science, detailing their implementation. Throughout the course, quizzes and hands-on tutorials provide practical experience, enabling students to apply various ML methods, including Linear Models, Support Vector Machines (SVM) and (deep) neural networks. Finally, the course includes a brief overview of Generative AI, focusing on Large Language Models enabled by transformers (GPTs). To consolidate their learning, students will undertake two AI projects as homework assignments. Basic knowledge of Python is a prerequisite.
    by Habiboulaye Amadou Boubacar
  • Introduction to R
    The course introduces participants to the R software and to data analysis and visualisation with it.
    by Mutono Nyamai and Thumbi Mwangi
  • Introduction to Generative Modeling
    Characterizing probability distributions is essential for describing uncertainty and lies at the heart of machine learning and decision making. Generative models are a flexible method for approximating complex probability distributions. These models describe distributions as transformations of simple reference distributions that are easy to sample from, such as a standard Gaussian. Recent years have seen an explosion in generative modeling techniques to create realistic-looking images and to tackle complex scientific problems such as drug discovery and numerical weather prediction. This course will describe the mathematical foundations of generative models. We will introduce three of the most popular approaches in machine learning: normalizing flows, generative adversarial networks, and score-based diffusion models. Students will have an opportunity to implement these models and to observe their advantages and disadvantages using numerical experiments. Lastly, we will provide a brief overview of active research topics, with the goal of motivating additional research and applications in this rapidly growing field.
    by Ricardo Baptista
  • Presentation Skills Training

Week 2:

  • Statistics and Scientific Method
    The course gives a very broadly based introduction to statistics, covering: design of experiments; analysis of data; statistical modelling and inference. It emphasises the role of statistical thinking as an integral part of the scientific method, rather than presenting statistics as a collection of unrelated techniques. The ideas are motivated by discussion of specific examples from the biological and environmental sciences. Lab exercises use the R software environment.
    by Peter John Diggle
  • Problem Solving in Data Science

    The UN vision of the data revolution, in which data science changes the world, is one in which we are all active participants. However, it is very easy to get lost in the abstract and miss the potential impact we could have on the world by using our skills to directly solve real-world problems. This course will expose participants to skills and approaches that data scientists use to contribute to real-world problems, with a focus on highlighting the real-world impact of data science on African development problems. Most importantly, it will force participants to start with problems and data and consider what questions different data science approaches can actually answer. The course will attempt to put different types of knowledge into perspective and to communicate the limits of data science as well as its exciting potential.
    by Lily Clements and James Musyoka
  • Computational Methods for Random Effect Models

    The content will encompass Laplace approximation, Monte Carlo Maximum Likelihood, and some MCMC algorithms for Bayesian inference. These methods will be practically applied to spatial data, providing an opportunity to explore real-life spatial disease data from Africa.
    by Johnson Olatunji
  • Large Language Models
    This course aims to equip students with foundations and specific toolsets for the modelling and use of Large Language Models (LLMs). Massive pre-trained language models, derived from work in Natural Language Processing (NLP), form the basis of state-of-the-art systems across a wide range of tasks. These models display an outstanding ability to generate fluent text and perform few-shot learning. Beyond NLP, these models are now being generalised to other use cases, including scientific discovery involving small molecules and climate research using climate foundation models. These models are, however, hard to understand and give rise to novel challenges, including adaptation, miniaturization, and scalability. In this course, students will learn the fundamentals of the modeling, theory, and systems aspects of large language models, as well as gain hands-on experience working with them.
    by Mulang’ I Onando

Week 3:

Week 4:

  • Data-Driven Optimization with Machine Learning Applications
    The course is designed to teach foundations of modern optimization algorithms for the solution of data-driven problems as well as for training machine learning models.
    Besides the clarification and interpretation of theoretical results, focus is also given to Python-based implementation and solution of concrete data-driven problems using modern machine learning platforms.
    by Abebe Geletu
  • Fundamentals of Reinforcement Learning
    In this course, we introduce essential principles of reinforcement learning and the types of problems this ML approach is well suited to tackle. The emphasis is on exposure
    by Sekou Remy
  • Functional Data Analysis: Exploring Frontiers and Beyond

    In a world increasingly awash with data, the need to extract meaningful insights from data has never been more crucial. Imagine if we could look beyond conventional data points and treat data as dynamic, continuous functions, capturing the nuances and subtleties of ever-changing phenomena. Functional Data Analysis (FDA) (Bosq (2000); Ramsay and Silverman (2005); Ferraty and Vieu (2006); Ramsay, Hooker and Graves (2009); Kokoszka and Reimherr (2017)) is a captivating field that handles large-scale, high-dimensional datasets efficiently, making it a valuable tool for extracting meaningful insights from the wealth of continuous data available today.
    This course delves into the advanced aspects of FDA, exploring cutting-edge techniques and applications that extend beyond traditional methods.
    Participants will embark on a journey through the frontiers of FDA, uncovering sophisticated methodologies tailored to handle high-dimensional and complex datasets. From unsupervised to supervised learning approaches, this course offers both an introduction to and an in-depth exploration of the latest advancements in the field of FDA.
    Through a combination of theoretical guarantees and practical exercises, students will gain insights into advanced topics such as functional regression, classification, and dimensionality reduction. Real-world case studies and applications across various domains, including healthcare, finance, and engineering, will illustrate the versatility and efficacy of FDA in solving complex problems.
    Key Topics Include:
    Introduction to Functional Data Analysis (FDA)
    Functional Regression and Classification
    Dimensionality Reduction Techniques
    Dynamic Modeling and Functional Time Series Analysis
    Advanced Applications and Case Studies
    Future Directions and Emerging Trends in FDA
    Prerequisites: Basic knowledge of statistics and familiarity with the R language would be beneficial but not required.
    References:
    Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications, Volume 149. Springer Science & Business Media.
    Ferraty, F. and P. Vieu (2006). Nonparametric Functional Data Analysis. Springer Series in Statistics. Springer.
    Kokoszka, P. and M. Reimherr (2017). Introduction to Functional Data Analysis. CRC Press.
    Ramsay, J. O. and B. W. Silverman (2005). Functional Data Analysis (Second ed.). Springer Series in Statistics. Springer.
    Ramsay, J., G. Hooker and S. Graves (2009). Functional Data Analysis with R and MATLAB. Springer Series in Statistics. Springer.
    by Sophie Dabo-Niang
