Autumn School MALVIC: Machine Learning and Vision for Industrial Applications


Machine learning and computer vision have the potential to significantly improve the automation and autonomy of many industrial applications (e.g. offshore, automotive, telecommunication, gaming, and multimedia) by enhancing operational performance, reducing the cost of manual operations, increasing benefits, minimizing losses, optimizing productivity, and improving safety and security.

The goal of the MALVIC Autumn School is to bring together pioneering international scientists in machine learning and computer vision with academics and practitioners from industry, in a unique setting for the discussion and demonstration of practical, hands-on machine learning and vision research and development. Offshore industrial applications and industrial process scenarios are examples of the school's target domains.

Recordings of the given lectures

View MALVIC21 on Vimeo

Invited speakers

Thomas Bäck: Industrial Optimization and the Search for New Algorithms


Direct global optimization algorithms based on some instance of evolutionary computation have been highly successful in a wide range of application domains, for example engineering design optimization. When the problem dimensionality is small (n < 20), so-called efficient global optimization (EGO) is also a very suitable class of algorithms, and I will introduce a generalization of the concept of an acquisition function in EGO that automatically handles the exploration-exploitation trade-off.
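For reference, the classical acquisition function in EGO is expected improvement (EI). The sketch below computes EI for minimization from a surrogate's Gaussian prediction with mean `mu` and standard deviation `sigma` at a candidate point, given the best value observed so far, `f_best`. It is the standard textbook form, not the generalized acquisition function introduced in the talk, and the function name is our own.

```python
import math

def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Expected improvement over f_best (minimization), assuming the surrogate
    (e.g. a Gaussian process) predicts N(mu, sigma^2) at the candidate point."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    # First term rewards a low predicted mean (exploitation);
    # second term rewards high predictive uncertainty (exploration).
    return (f_best - mu) * cdf + sigma * pdf
```

With the predicted mean held fixed, EI grows with `sigma`. This is how the fixed EI criterion balances exploring uncertain regions against exploiting promising ones.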

In automatic machine learning, the optimization of hyperparameters (also called the algorithm configuration problem) is currently of considerable interest. I will briefly explain this problem and then provide some examples illustrating that this task can also be handled by direct global optimization algorithms. While algorithm configuration is commonly applied to machine learning algorithms, applying it to evolution strategies is a new application domain. I will give a simple example of how a combinatorial design space of 4,608 configuration variants of evolution strategies can be created and investigated using data mining. This kind of “combinatorial algorithmics” provides an opportunity to discover unexplored areas of the optimization algorithm design space. Finally, I will briefly outline an extension of EGO for the combined algorithm selection and hyperparameter optimization (CASH) task in machine learning.
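A count such as 4,608 can arise from a modular design space, where every combination of component choices is one algorithm variant. The module names below are hypothetical placeholders chosen for illustration: nine on/off options and two three-way options give 2^9 · 3^2 = 4,608 variants. The modules actually used in the talk may differ.

```python
import itertools

# Hypothetical module options for a modular evolution strategy.
# Nine binary switches and two ternary choices: 2**9 * 3**2 = 4608 variants.
MODULES = {
    "active_update": [False, True],
    "elitism": [False, True],
    "mirrored_sampling": [False, True],
    "orthogonal_sampling": [False, True],
    "sequential_selection": [False, True],
    "threshold_convergence": [False, True],
    "two_point_step_size": [False, True],
    "pairwise_selection": [False, True],
    "alternative_weights": [False, True],
    "base_sampler": ["gaussian", "sobol", "halton"],
    "restart_strategy": ["none", "ipop", "bipop"],
}

def enumerate_variants(modules):
    """Yield every configuration as a dict mapping module name to chosen option."""
    names = list(modules)
    for combo in itertools.product(*(modules[n] for n in names)):
        yield dict(zip(names, combo))

variants = list(enumerate_variants(MODULES))
```

Each variant can then be benchmarked. The resulting table of (configuration, performance) rows is what a data-mining step would analyse, for example to find module settings that tend to co-occur with good results.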

To conclude, I return to engineering design optimization tasks, one in wing design and one in ship design. Both are multi-objective, both use a variant of efficient global optimization, and the first focuses on modeling user preferences in objective space while the second learns internal models of the constraints using radial basis functions. Both aim at illustrating today’s requirements in engineering design applications.
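Below is a minimal sketch of a radial basis function surrogate of the kind mentioned for learning constraints. It uses Gaussian RBF interpolation, fitted by solving a linear system so that the model reproduces the sampled constraint values exactly. The kernel choice, the width `epsilon`, and the toy constraint are assumptions for illustration, not the models used in the ship-design study.

```python
import numpy as np

def fit_rbf(X: np.ndarray, y: np.ndarray, epsilon: float = 2.0):
    """Fit s(x) = sum_i w_i * exp(-epsilon^2 * ||x - x_i||^2) through the points (X, y)."""
    # Pairwise squared distances between training points.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    phi = np.exp(-(epsilon ** 2) * d2)
    weights = np.linalg.solve(phi, y)  # interpolation conditions: Phi w = y

    def predict(Z: np.ndarray) -> np.ndarray:
        dz2 = np.sum((Z[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-(epsilon ** 2) * dz2) @ weights

    return predict

# Toy constraint g(x) <= 0: inside the unit disc (assumed for illustration).
rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(30, 2))
g_train = np.sum(X_train ** 2, axis=1) - 1.0
g_hat = fit_rbf(X_train, g_train, epsilon=2.0)
```

In a constrained EGO loop, a candidate with a predicted `g_hat(x) > 0` would be treated as likely infeasible, which saves an expensive simulation.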

Prof. Thomas Bäck is Professor at Leiden University (The Netherlands) and Chief Scientist at NORCE. He is head of the Natural Computing Research Group and Director of Education at the Leiden Institute of Advanced Computer Science (LIACS). He received his PhD in Computer Science from Dortmund University, Germany, in 1994. He became Associate Professor of Computer Science at Leiden University in 1996 and has been full Professor for Natural Computing since 2002.

Horst Bischof: Understanding Activities in an Industrial Context


This talk will highlight some recent work in the area of understanding actions and human activities. Special emphasis will be given to sequence segmentation, recognition of complex (long-term) activities, and domain adaptation. Examples from real-world applications will illustrate the presented methods.

Prof. Horst Bischof is Vice Rector for Research and Professor at the Institute for Computer Graphics and Vision at Graz University of Technology, Austria. He has more than 750 publications, with notable work on object recognition, visual learning, online and lifelong learning, motion and tracking, visual surveillance, biometrics, and medical computer vision.

Daniel Cremers: Deep Visual SLAM


Visual Simultaneous Localization and Mapping (SLAM) is of utmost importance to autonomous systems and augmented reality. I will discuss direct methods for visual SLAM (LSD SLAM and DSO) that recover camera motion and 3D structure directly from brightness consistency thereby providing better performance in terms of precision and robustness compared to classical keypoint-based techniques.
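The following toy example illustrates the brightness-consistency idea behind direct methods. Instead of matching keypoints, the motion is chosen to minimize the photometric error between a reference image and the warped current image. Here the "motion" is only an integer 2D image translation, found by exhaustive search. LSD SLAM and DSO instead estimate full camera pose (and depth) with iterative Gauss-Newton-style optimization, so this sketches the objective only, not those systems. All names are our own.

```python
import numpy as np

def photometric_error(ref: np.ndarray, cur: np.ndarray, dx: int, dy: int, margin: int) -> float:
    """Mean squared brightness difference between ref(p) and cur(p + (dx, dy)),
    evaluated on the image interior so every shifted pixel stays inside the image."""
    h, w = ref.shape
    r = ref[margin:h - margin, margin:w - margin]
    c = cur[margin + dy:h - margin + dy, margin + dx:w - margin + dx]
    return float(np.mean((r - c) ** 2))

def estimate_translation(ref: np.ndarray, cur: np.ndarray, max_shift: int = 4):
    """Brute-force search for the (dx, dy) that best explains cur given ref."""
    shifts = [(dx, dy) for dx in range(-max_shift, max_shift + 1)
              for dy in range(-max_shift, max_shift + 1)]
    return min(shifts, key=lambda s: photometric_error(ref, cur, s[0], s[1], max_shift))

# Synthetic data: cur(x + 2, y - 1) == ref(x, y), i.e. a shift of (dx, dy) = (2, -1).
rng = np.random.default_rng(1)
ref = rng.random((40, 40))
cur = np.roll(ref, shift=(-1, 2), axis=(0, 1))
```

No features are extracted at any point. The raw intensities themselves drive the estimate, which is the defining property of direct methods.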

Moreover, I will demonstrate how we can leverage the predictive power of self-supervised deep learning to significantly boost the performance of direct SLAM methods. The resulting method, D3VO, allows us to track a single camera with a precision that is on par with state-of-the-art stereo-inertial odometry methods.

Lastly, I will introduce MonoRec, a deep network that can generate faithful dense reconstructions of the observed world from a single moving camera.

Prof. Daniel Cremers is Professor of Informatics and Mathematics at TU Munich, Germany. He is one of the leading experts in computer vision, machine learning, and deep networks, with a focus on mathematical image analysis (segmentation, motion estimation, multiview reconstruction, visual SLAM). In December 2010 he was listed among "Germany's top 40 researchers below 40" (Capital). On March 1st, 2016, Prof. Cremers received the Gottfried Wilhelm Leibniz Award, the biggest award in German academia.

Marius Leordeanu: Towards Unsupervised Learning in Space and Time: From Metric Depth Estimation and Semantic Segmentation to Complete 4D Scene Understanding


We address the exciting and very challenging problem of unsupervised visual learning in space and time, which has a tremendous impact on artificial intelligence today and on its applications to robotics and computer science. We will start by presenting our recent work on unsupervised monocular metric depth estimation and semi-supervised semantic segmentation in the context of Unmanned Aerial Vehicles (UAVs). We will introduce efficient algorithms and novel datasets from UAV flights, captured in varied and complex European scenes, and present extensive comparisons to the state of the art. Then we will move towards the more complex case of unsupervised multi-task learning of the visual 4D scene from as many interpretation views as possible.

At the core of our approach we have a self-supervised learning model based on automatically reaching consensus in a graph of neural networks. Each node in the graph is a scene interpretation layer, while each edge is a deep net that transforms one layer at one node into another from a different node. The edge networks are trained unsupervised on the pseudo-ground truth provided by consensus among multiple paths that reach the nets' start and end nodes. These paths act as ensemble teachers for any given edge and strong consensus is used for high-confidence supervisory signals. The unsupervised learning process is repeated over several generations, in which each edge becomes a "student" and also part of different ensemble "teachers" for training other students. By optimizing such consensus between different paths, the graph reaches consistency and robustness over multiple interpretations and generations, in the face of unknown labels. We will present several efficient strategies for constructing the unsupervised pseudo-labels from multiple graph paths and show that the multi-task graph structure as well as the consensus-finding procedure are essential factors.
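Below is a sketch, under our own assumptions, of one simple way to turn agreement among several paths into a pseudo-label. It takes the per-pixel median of the path predictions as the pseudo-ground truth and keeps only pixels where the paths agree closely (low spread) as high-confidence supervision for the edge network being trained. The talk presents several strategies. The median-plus-threshold rule, the `max_spread` parameter, and the function names are illustrative choices, not necessarily the authors'.

```python
import numpy as np

def consensus_pseudo_label(path_predictions: np.ndarray, max_spread: float):
    """path_predictions has shape (num_paths, H, W): one prediction of the same
    target layer (e.g. depth) per path through the graph.

    Returns (pseudo_label, confidence_mask)."""
    pseudo_label = np.median(path_predictions, axis=0)
    # Spread: median absolute deviation of the paths around the consensus.
    spread = np.median(np.abs(path_predictions - pseudo_label), axis=0)
    confidence_mask = spread <= max_spread
    return pseudo_label, confidence_mask

def masked_l1_loss(student_output: np.ndarray, pseudo_label: np.ndarray,
                   mask: np.ndarray) -> float:
    """Supervise the student edge network only where the ensemble agrees."""
    if not mask.any():
        return 0.0
    return float(np.mean(np.abs(student_output - pseudo_label)[mask]))
```

The median is robust to a single path that disagrees strongly. The spread mask additionally discards pixels where no real consensus exists.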

Throughout the presentation we will balance intuitions and theoretical insights with extensive tests in the real world of robotics and vision, such as visual learning for drones. We will show that pushing self-supervised learning towards the case of multiple tasks has the potential to improve robustness and move the field forward, towards the general case of unsupervised multi-interpretation learning in space and time.

Marius Leordeanu is Professor at the University Politehnica of Bucharest (UPB) and Senior Researcher at the Institute of Mathematics of the Romanian Academy (IMAR). Marius obtained his Bachelor's in Mathematics and Computer Science at Hunter College - City University of New York (2003) and PhD in Robotics at Carnegie Mellon University (2009). His research spans different areas in vision and learning, with main focus on unsupervised learning in the space-time domain, vision for drones and aerial scene understanding, optimization on graphs and neural nets, as well as relating vision to natural language. He coordinates several research groups, both in academia and industry, having strong collaborations on topics that range from general computer vision (e.g. Google, Bitdefender, NORCE) to specific applications for autonomous vehicles (e.g. ARNIA, Google, NORCE), the wood industry (Fordaq) and medical imaging (Siemens). For his work on graph matching and unsupervised learning he received the "Grigore Moisil Prize" in Mathematics (2014), the top award given by the Romanian Academy. In 2020 Marius published a book, Unsupervised Learning in Space and Time (Springer), which proposes a general unsupervised learning model that brings together the powers of graphs and deep neural networks.

Jürgen Schmidhuber: Modern Artificial Intelligence - 1980s-2021 and Beyond


Significant historic events appear to be occurring more frequently as time goes on. Interestingly, it seems like subsequent intervals between these events are shrinking exponentially by a factor of four. This process looks like it should converge around the year 2040.
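The convergence claim is a geometric series. Take the page's numbers at face value; this is an illustrative reading, not the speaker's own derivation. If the interval after an event is Δ and each subsequent interval is a quarter of the previous one, all remaining intervals sum to a finite amount. The sequence of events therefore has a limit point:

```latex
\sum_{k=0}^{\infty} \frac{\Delta}{4^{k}} = \frac{\Delta}{1 - \tfrac{1}{4}} = \frac{4}{3}\,\Delta ,
\qquad
\frac{4}{3}\,\Delta = 2040 - 1990 = 50 \;\Rightarrow\; \Delta = 37.5 \text{ years},
\qquad
1990 + 37.5 \approx 2028 .
```

This is consistent with the abstract's estimate of a next big event "around 2030".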

The last of these major events can be said to have occurred around 1990, when the Cold War ended, the WWW was born, mobile phones became mainstream, the first self-driving cars appeared, and modern AI with very deep neural networks came into being. In this talk, I'll focus on the latter, with emphasis on metalearning since 1987 and what I call "the miraculous year of deep learning", which saw the birth of, among other things, (1) very deep learning through unsupervised pre-training, (2) the vanishing gradient analysis that led to the LSTMs running on your smartphones and to the really deep Highway Nets/ResNets, (3) neural fast weight programmers that are formally equivalent to what’s now called linear Transformers, (4) artificial curiosity for agents that invent their own problems (familiar to many nowadays in the form of GANs), (5) the learning of sequential neural attention, (6) the distilling of teacher nets into student nets, and (7) reinforcement learning and planning with recurrent world models. I’ll discuss how, since the 2000s, much of this has begun to impact billions of human lives, how the timeline predicts the next big event to be around 2030, what the final decade until convergence might hold, and what will happen in the subsequent 40 billion years. Take all of this with a grain of salt, though.
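The stated equivalence between fast weight programmers and linear Transformers can be checked numerically in its simplest form. In unnormalized causal linear attention, output t is Σ_{i≤t} (k_i · q_t) v_i. This equals reading out a "fast weight" matrix W_t = Σ_{i≤t} v_i k_iᵀ, built by additive outer-product updates, with the query q_t. The sketch omits the feature maps and normalization used in practical linear Transformers, and the function names are our own.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal, unnormalized linear attention: y_t = sum_{i<=t} (k_i . q_t) v_i."""
    T = Q.shape[0]
    return np.stack([sum((K[i] @ Q[t]) * V[i] for i in range(t + 1)) for t in range(T)])

def fast_weight_programmer(Q, K, V):
    """Same computation as a sequence of fast-weight updates W <- W + v_t k_t^T,
    with each output read out as y_t = W q_t."""
    W = np.zeros((V.shape[1], K.shape[1]))
    outputs = []
    for q, k, v in zip(Q, K, V):
        W = W + np.outer(v, k)   # "program" the fast weights
        outputs.append(W @ q)    # query them
    return np.stack(outputs)

rng = np.random.default_rng(2)
T, d_k, d_v = 6, 4, 3
Q, K, V = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v))
```

The two views differ only in bookkeeping. The fast-weight form keeps a fixed-size matrix instead of the whole key/value history, which is why linear attention runs in constant memory per step.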

Prof. Jürgen Schmidhuber, Scientific Director of IDSIA, Switzerland, is a computer scientist most noted for his work in the field of artificial intelligence, deep learning and artificial neural networks. He is a co-director of the Dalle Molle Institute for Artificial Intelligence Research in Manno, in the district of Lugano, in Ticino in southern Switzerland.

Guy Theraulaz: The Collective Intelligence of Superorganisms


Several animal species living in groups or societies display spectacular collective behaviors. Starling flocks, for example, gather tens of thousands of individuals at dusk and engage in astonishing aerial choreographies. At another scale, social insects (ants, termites, and certain species of wasps and bees) have developed amazing abilities to coordinate their activities. For the past thirty years, scientists have sought to unravel the mysteries of this collective intelligence, which emerges from the interactions between individuals that allow these groups of animals to self-organize. Today, thanks to these studies and to the analysis and modeling of these interactions, we have a better understanding of the mechanisms that allow these social organisms to coordinate their movements, build complex nest architectures, and collectively solve multiple problems.
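Group-level order emerging from purely local interactions is often illustrated with the Vicsek model of collective motion. It serves here as a classic textbook stand-in; it is not claimed to be the speaker's model. Each agent moves at constant speed and repeatedly adopts the average heading of its neighbours within a radius, plus noise. The parameter values and function names are illustrative assumptions.

```python
import numpy as np

def vicsek_step(pos, theta, radius, noise, speed, box, rng):
    """One update of the Vicsek model in a periodic square box of side `box`."""
    # Pairwise displacements with periodic boundaries (minimum-image convention).
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box * np.round(diff / box)
    neighbours = np.sum(diff ** 2, axis=-1) <= radius ** 2  # includes the agent itself
    # Average neighbour heading, computed from unit vectors to handle angle wrap-around.
    sum_sin = (neighbours * np.sin(theta)[None, :]).sum(axis=1)
    sum_cos = (neighbours * np.cos(theta)[None, :]).sum(axis=1)
    new_theta = np.arctan2(sum_sin, sum_cos) + rng.uniform(-noise / 2, noise / 2, size=theta.shape)
    step = speed * np.stack([np.cos(new_theta), np.sin(new_theta)], axis=1)
    return (pos + step) % box, new_theta

def polarization(theta):
    """Order parameter: 1 when all agents share a heading, near 0 when disordered."""
    return float(np.hypot(np.mean(np.cos(theta)), np.mean(np.sin(theta))))
```

No agent sees the whole group. Alignment spreads only through overlapping neighbourhoods, and the polarization measures how much group-level order that produces.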

Guy Theraulaz is a senior research fellow at the French National Center for Scientific Research (CNRS) and an expert in the study of collective animal behaviors. He is also a leading researcher in the field of swarm intelligence, primarily studying social insects but also distributed algorithms directly inspired by nature, e.g. for collective robotics.

His research focuses on the understanding of a broad spectrum of collective behaviors in animal societies by quantifying and then modeling the individual level behaviors and interactions, thereby elucidating the mechanisms generating the emergent, group-level properties.

He was one of the main figures in the development of quantitative social ethology and collective intelligence in France. He has published many papers on nest construction in ant and wasp colonies, collective decision-making in ants and cockroaches, and collective motion in fish schools and pedestrian crowds. He has also coauthored five books, among them Swarm Intelligence: From Natural to Artificial Systems (Oxford University Press, 1999) and Self-Organization in Biological Systems (Princeton University Press, 2001), which are now considered reference textbooks.

René Vidal: Mathematics of Deep Learning


The past few years have seen a dramatic increase in the performance of recognition systems thanks to the introduction of deep networks for representation learning. However, the mathematical reasons for this success remain elusive. For example, a key issue is that the neural network training problem is non-convex, hence optimization algorithms are not guaranteed to return a global minimum. In addition, the regularization properties of algorithms such as dropout remain poorly understood. The first part of this tutorial will give an overview of recent work on the theory of deep learning that aims to understand how to design the network architecture, how to regularize the network weights, and how to guarantee global optimality. The second part of this tutorial will present an analysis of the optimization and regularization properties of dropout for matrix factorization. Examples from neuroscience and computer vision will also be presented.
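A small numerical check of the kind of result the second part refers to follows. Consider matrix factorization X ≈ UVᵀ trained with dropout on the r factor columns, each kept with probability θ and rescaled by 1/θ. The expected loss then splits into the plain loss plus a data-independent regularizer: E‖X − (1/θ)U diag(z)Vᵀ‖²_F = ‖X − UVᵀ‖²_F + ((1 − θ)/θ) Σ_i ‖u_i‖²‖v_i‖². The identity follows from a bias-variance split with independent Bernoulli masks. The sketch verifies it exactly by enumerating all 2^r masks. Whether the tutorial uses precisely this parametrization is our assumption.

```python
import itertools
import numpy as np

def expected_dropout_loss(X, U, V, theta):
    """Exact expectation over all Bernoulli(theta) masks z on the r factor columns."""
    r = U.shape[1]
    total = 0.0
    for z in itertools.product([0, 1], repeat=r):
        z = np.array(z, dtype=float)
        prob = np.prod(np.where(z == 1, theta, 1 - theta))
        approx = (U * z) @ V.T / theta   # U diag(z) V^T, rescaled by 1/theta
        total += prob * np.sum((X - approx) ** 2)
    return total

def loss_plus_regularizer(X, U, V, theta):
    """Plain reconstruction loss plus the induced product-of-norms regularizer."""
    reg = np.sum(np.sum(U ** 2, axis=0) * np.sum(V ** 2, axis=0))
    return np.sum((X - U @ V.T) ** 2) + (1 - theta) / theta * reg

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 4))
U, V = rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
```

As θ → 1 the regularizer vanishes and plain factorization is recovered. Smaller θ penalizes factor columns with large norm products more strongly.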

Prof. René Vidal, Professor at JHU, USA, and Chief scientist at NORCE. He is the Herschel Seder Professor of Biomedical Engineering and the Inaugural Director of the Mathematical Institute for Data Science at The Johns Hopkins University. He has secondary appointments in Computer Science, Electrical and Computer Engineering, and Mechanical Engineering.

Danny Weyns: From Self-adaptation to Lifelong Computing


With the progressing digitalisation of our society, the demands on computing systems are increasing at incredible speed, to the point where current human-based engineering of computing systems will soon simply no longer be manageable. The first part of this tutorial will elaborate on the concept of self-adaptation, initially proposed by IBM (in their program on autonomic computing) as “the only viable solution” to the manageability problems of complex computing systems that face continuous change. We explain the basic principles of self-adaptation, elaborate on engineering approaches for its realisation, and illustrate these with examples. The second part of the tutorial will then argue why self-adaptation falls short of tackling the hard problems of future computing systems. We will make a case for “lifelong computing”, a newly proposed paradigm for the design and operation of computing systems. A lifelong computing system dynamically evolves its own architecture, with design choices ultimately enacted by the system itself. This yields self-learning systems that autonomously handle changing conditions, both foreseen and unforeseen. The tutorial concludes by highlighting key challenges that we need to overcome to realize the lifelong computing paradigm.
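Self-adaptation in IBM's autonomic computing work is commonly structured as the MAPE-K feedback loop: Monitor, Analyze, Plan, and Execute over shared Knowledge. The toy sketch below adapts the number of servers of a hypothetical web service to keep utilization within a target band. The managed system, the thresholds, and all class and function names are illustrative assumptions, not material from the tutorial.

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    """Shared knowledge: adaptation goals plus observed history (assumed values)."""
    target_low: float = 0.5
    target_high: float = 0.8
    history: list = field(default_factory=list)

@dataclass
class ManagedService:
    """Hypothetical managed system: request load spread over some servers."""
    servers: int = 2
    load: float = 1.0  # in units of "fully used servers"

    def utilization(self) -> float:
        return self.load / self.servers

def mape_k_iteration(system: ManagedService, k: Knowledge) -> str:
    # Monitor: observe the managed system and record it in the knowledge base.
    u = system.utilization()
    k.history.append(u)
    # Analyze: is an adaptation goal violated?
    if u > k.target_high:
        problem = "overloaded"
    elif u < k.target_low and system.servers > 1:
        problem = "underused"
    else:
        return "no-op"
    # Plan: choose an adaptation action.
    action = "add_server" if problem == "overloaded" else "remove_server"
    # Execute: apply the action to the managed system.
    system.servers += 1 if action == "add_server" else -1
    return action
```

Starting from one server with a load of 3.0, repeated iterations add servers until utilization falls into the target band and then stop adapting. The loop only reacts within goals fixed at design time, which is the limitation the second part of the tutorial addresses.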

Danny Weyns is a professor at the Department of Computer Science at the Katholieke Universiteit Leuven, Belgium. He is also affiliated with Linnaeus University in Sweden. His main research interests are in software engineering of self-adaptive systems, with a particular focus on the use of design models and verification techniques at runtime to assure the goals of computing systems. He received his PhD from KU Leuven in 2006. Recently he authored the book “An Introduction to Self-adaptive Systems: A Contemporary Software Engineering Perspective”, published by Wiley.

Xin Yao: Ensemble Approaches to Class Imbalance Learning


Many real-world classification problems have highly imbalanced, skewed data distributions. In fault diagnosis and condition monitoring, for example, there are ample data for the normal class, yet data for faults are always very limited and costly to obtain. It is often a challenge to improve a classifier's performance on minority classes without sacrificing its performance on majority classes. This talk discusses some of the techniques and algorithms that have been developed for class imbalance learning, especially through ensemble learning.
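Below is a minimal sketch of one ensemble strategy for imbalanced data, in the spirit of undersampling-based bagging. Each ensemble member is trained on all minority examples plus an equally sized random subset of the majority class. Members thus see balanced data and differ from one another (diversity), and their predictions are combined by majority vote. The nearest-centroid base learner, the binary 0/1 labels, and all names are assumptions made to keep the example self-contained. The talk covers a broader range of methods.

```python
import numpy as np

class NearestCentroid:
    """Tiny base learner: predict the class whose training mean is closest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=-1)
        return self.classes_[np.argmin(d, axis=1)]

def fit_under_bagging(X, y, n_members=11, seed=0):
    """Binary labels 0/1 assumed; each member is trained on a balanced random subsample."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    members = []
    for _ in range(n_members):
        # A different majority subset per member provides ensemble diversity.
        sub = rng.choice(maj_idx, size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, sub])
        members.append(NearestCentroid().fit(X[idx], y[idx]))
    return members

def predict_vote(members, X):
    """Majority vote over members (0/1 labels, odd ensemble size avoids ties)."""
    votes = np.stack([m.predict(X) for m in members])  # shape (n_members, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)
```

Because every member sees balanced data, the minority class is not drowned out by the majority, while the majority information is still used across the ensemble as a whole.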

First, the motivations behind ensemble learning are introduced and the importance of diversity highlighted.

Second, some of the challenges of multi-class imbalance learning and potential solutions are presented. What works well in the binary case may no longer work for multiple classes, especially as the number of classes increases.

Third, online class imbalance learning will be discussed, which can be seen as a combination of online learning and class imbalance learning. Online class imbalance learning poses new research challenges that are still not well understood, let alone solved, especially for imbalanced data streams with concept drift.

Fourth, the natural fit of multi-objective learning to class imbalance learning is pointed out. The relationship between multi-objective learning and ensemble learning will be discussed. Finally, future research directions will be given.
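In the online setting, one useful ingredient is a time-decayed estimate of each class's share of the stream. With such an estimate, a change in imbalance (one form of concept drift) is tracked rather than averaged away: w_k ← η·w_k + (1 − η)·[y_t = k]. The decay factor η = 0.9, the uniform starting prior, and the class names are illustrative assumptions.

```python
class DecayedClassSize:
    """Tracks w_k(t) = eta * w_k(t-1) + (1 - eta) * [y_t == k] for each class k."""

    def __init__(self, classes, eta=0.9):
        self.eta = eta
        # Uniform prior; the update keeps the weights summing to 1.
        self.w = {c: 1.0 / len(classes) for c in classes}

    def update(self, label):
        for c in self.w:
            indicator = 1.0 if c == label else 0.0
            self.w[c] = self.eta * self.w[c] + (1 - self.eta) * indicator
        return dict(self.w)

    def minority(self):
        """Class with the smallest current (recency-weighted) share."""
        return min(self.w, key=self.w.get)
```

An online learner could then, for example, resample incoming examples whose class currently has a small w_k. Because old observations decay geometrically, the estimate follows the stream if the minority class changes.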

Xin Yao is a Chair Professor of Computer Science at the Southern University of Science and Technology, Shenzhen, China, and a part-time professor at the University of Birmingham, UK. His major research interests include evolutionary computation, ensemble learning, and search-based software engineering. He is an IEEE Fellow, a former (2014-15) President of the IEEE Computational Intelligence Society (CIS), and a former (2003-08) Editor-in-Chief of IEEE Transactions on Evolutionary Computation. His work won the 2001 IEEE Donald G. Fink Prize Paper Award, the 2010, 2016 and 2017 IEEE Transactions on Evolutionary Computation Outstanding Paper Awards, the 2010 BT Gordon Radley Award for Best Author of Innovation (Finalist), the 2011 IEEE Transactions on Neural Networks Outstanding Paper Award, and many other best paper awards. He received a Royal Society Wolfson Research Merit Award in 2012, the IEEE CIS Evolutionary Computation Pioneer Award in 2013, and the 2020 IEEE Frank Rosenblatt Award.

Practical talks

Gal Chechik: Reasoning About Perception


AI aims to build systems that interact with their environment, with people, and with other agents in the real world. This vision poses hard algorithmic challenges: from generalizing effectively from few or no samples, through adapting to new domains, to communicating in ways that are natural to people. I will discuss our recent research thrusts for facing these challenges: first, an approach to model the high-level structure of visual scenes; second, leveraging compositional structures in attribute space to learn from descriptions without any visual samples; and finally, an approach where agents learn new concepts without labels by using elimination to reason about their environment. Joint work with colleagues at NVIDIA and Bar-Ilan University.

Gal Chechik is a director of AI research at NVIDIA and a Professor at Bar-Ilan University, Israel. His research spans algorithms, theory, and applications of deep learning, with a focus on strong generalization: few-shot and zero-shot learning, and adaptation to novel domains, for example in personalized federated learning. He is particularly interested in perception, action, and reasoning (PAR) and their intersection for the purpose of smarter generalization.

Gal earned his PhD in 2004 from the Hebrew University, developing machine learning methods to study neural coding. He then worked at the Stanford CS department with D. Koller, studying computational principles regulating molecular pathways. In 2007 he joined Google Research as a senior research scientist, developing large-scale machine learning algorithms for machine perception. Since 2009 he has headed the learning systems lab at the Gonda center of Bar-Ilan University, and he was appointed full professor in 2019. Gal is the author of ~100 refereed publications and ~40 patents, including publications in Nature Biotechnology, Cell, and PNAS. His work has won best-paper awards at NeurIPS and ICML.

Matthias Grundmann: On-device ML solutions for Mobile and Web


In this talk, I will present several on-device Machine Learning (ML) solutions for mobile and web that are powering a wide range of impactful Google products. On-device ML has major benefits, enabling low-latency, offline, and privacy-preserving approaches. However, to ship these solutions in production, we need to overcome substantial technical challenges to deliver on-device ML in real time and with low latency. With these challenges solved, our solutions power applications such as background replacement and light adjustment in Google Meet, AR effects in YouTube and Duo, gesture control of devices, and view-finder tracking for Google Lens and Translate.

In this talk, I will cover some of the core recipes behind Google’s on-device ML solutions, from model design, through the infrastructure for building ML solutions (MediaPipe), to on-device ML inference acceleration. In particular, we will cover video segmentation, face meshes and iris tracking, hand tracking for gesture control, and body tracking to power 3D avatars. The covered solutions are also available to the research and developer community via MediaPipe, an open-source cross-platform framework for building customizable ML pipelines for mobile, web, desktop, and Python.

Matthias Grundmann is a Director of Research at Google leading a team of ~40 Applied ML and Software Engineers with focus on on-device Machine Learning solutions. His team develops high-quality, cross-platform ML solutions (MediaPipe) powered by cutting-edge, accelerated ML inference for mobile and web.

His team has productionized on-device solutions ranging from video segmentation for Google Meet and YouTube, through 2D object and calibration-free 6-DOF camera tracking, to computational video solutions powering Light Adjustment in Google Meet, Motion Photos on Pixel, and Live Photo stabilization in Google Photos.

Among the wide portfolio of Applied ML solutions his team develops are holistic methods for hand, body, 3D object, and high-fidelity facial geometry tracking. His team has advanced on-device ML technology across Google, delivering the sparsity that powers Google Meet, quantization, and on-device CPU and GPU inference.

Matthias received his Ph.D. from the Georgia Institute of Technology in 2013 for his work on Computational Video with focus on Video Stabilization and Rolling Shutter removal for YouTube. His work on Rolling Shutter removal won the best paper award at ICCP, 2012. He was the recipient of the 2011 Ph.D. Google Fellowship in Computer Vision.

Roger Kvam: UNINETT Sigma 2


UNINETT Sigma2 has the responsibility for and operates the national e-infrastructure for large-scale data- and computational science in Norway. We offer services in High-Performance Computing (HPC) and storage of scientific data. Through the e-infrastructure, Norwegian researchers and research institutions gain access to some of the world's most powerful computers. All research areas with a need for high-capacity computations and large scale data storage can apply for resources on the e-infrastructure.

Today, Sigma2 has users spanning climate and marine research, language, energy, and health. For instance, the Norwegian Institute of Public Health (NIPH) uses Sigma2's services for calculating virus spread and anticipated vaccine effect in connection with COVID-19.

Sigma2 also provides services to industry through the competence centre collaboration with NORCE and SINTEF, and coordinates Norway’s participation in international collaborations.

Sigma2 activities are jointly financed by the Research Council of Norway (RCN) and the Sigma2 consortium partners, which are the universities in Oslo, Bergen, Trondheim, and Tromsø. This collaboration goes by the name NRIS (Norwegian Research Infrastructure Services). The business is run on a non-profit basis. Sigma2 is a subsidiary of Uninett AS, with its head office in Trondheim.

Roger Kvam (UNINETT Sigma 2) is a senior project manager and leads the National Competence Centre for HPC for industry. He has experience in international IT management in Asia, Europe, and the US, in HPC architecture and operation for semiconductor engineering, and in IT management and HPC for oil and gas exploration.

Stefano Soatto: Learning Representations


Representations are functions of the data that are "useful" for a task. Of all functions, one wishes to design or learn those that contain all the "information" in the data, and none of the variability that is irrelevant to the task. Depending on how one defines and measures "useful" and "information", different notions of representations can be instantiated. What are the relationships among those? Are there common principles behind the different tools and models? Is there a common notion of "optimality" that emerges from all formalisms? If so, are such optimal representations computable? If not, can they be approximated? If such representations are learned using "past data" (training set), can we predict how well they will perform on "future data" (test set)?

These questions have nothing to do with Deep Learning, but understanding them sets the stage for the second part of the lecture. In Deep Learning, we are given a training set, and we minimize a loss function that, at least at face value, knows nothing about "future data". Just like the activations of a network in response to a test datum can be understood as a representation of future data, the parameters (weights) of a network can be understood as a representation of the past training set. What properties should the weights exhibit that can be optimized during training, which ensure that desirable properties of the activations emerge? Is there something special in deep neural networks that addresses this issue of generalization? Do these properties translate to a variational principle? Does this principle have anything to do with optimality of representations? Can they be imbued into the optimization we use to train deep networks?

In this lecture we will derive a theory of representation that is the first to address these questions for deep learning. The question the theory answers is: "What are the functions of given (past) data one can compute, so that the resulting representation of future data is best for the task at hand?" What it does not address is what happens when the task is not completely specified beforehand. Furthermore, we will dive deep into how such representations can be computed in practice, and what to do when the task is not specified at the outset.

Prof. Stefano Soatto is Professor of Computer Science at UCLA and Director of Applied Science at Amazon AI. He has notable work in computer vision and in nonlinear estimation and control theory, with a focus on enabling machines to use sensory data (vision, sound, touch) to interact with humans and the environment.

Fridtjof Stein: Looking Far Ahead … Perception Challenges in the Field of Autonomous Trucking


Trucks are special in several respects. Therefore, an existing robotaxi sensor set can only partially be transferred to an autonomous truck. In this talk I will focus on the specific sensor challenges of the different modalities, covering both hardware and software topics.

Dr. Fridtjof Stein is a senior scientist in the field of perception at Daimler Truck. He has worked for about three decades at Daimler on autonomous driving in public traffic, including real-time vision, especially stereo vision, optical flow, object detection, and ground modeling in the automotive domain.

Business talks: Success Stories – How did AI Shape Your Business?

Introduction by chair Anne Grete Ellingsen, project manager for a national candidate for a European Digital Innovation Hub, with a short presentation of the EU's program on investment in digital infrastructure and its benefits for SMEs and start-ups.

Crayon presented by Geir Gulliksen

Intelecy presented by Espen Davidsen

Aquabyte presented by Trude Jansen Hagland

Rocketfarm presented by Hallvard Haugen

Idean presented by Lars Petter Aase

Workshop Organizers

The scientific committee

  • Nabil Belbachir, NORCE, Norway
  • René Vidal, Johns Hopkins University, USA
  • Thomas Bäck, Leiden University and NORCE
  • Marius Leordeanu, Politehnica University of Bucharest

The local team

  • Tonje Holand Salgado (Eyde Cluster)
  • Christian Von der Ohe (GCE NODE)
  • Anne Grete Ellingsen (DIH Oceanopolis)
  • Magdalena Entner (NORCE)
  • Rune Rolvsjord (NORCE)
  • Inger-Lise Bergman (NORCE)



Nabil Belbachir

Research Director Smart Instrumentation and Industrial Testing - Grimstad

+47 401 08 137


From: Monday 18 October 2021
at 09.00

To: Friday 22 October 2021
at 17.00


Partly online and partly a physical event in Kristiansand, Norway


Register by September 17th for early bird rates.

Full week attendance:
5325 NOK (early bird)
7485 NOK (regular registration)

One day attendance:
1900 NOK (early bird)
2790 NOK (regular registration)

3675 NOK (early bird)
5200 NOK (regular registration)

All rates include VAT (25%).

Research Groups

Smart Instrumentation and Industrial Testing