Show Notes
About This Episode
What does it actually mean for a machine to see? Not to detect, or to classify, or to return a confidence score, but to genuinely understand what's in front of a camera the way you understand what's in front of your face. That question turns out to be harder than it sounds, and the story of how researchers have wrestled with it for decades carries a lesson that reaches far beyond computer science.
In this episode, Mikkel Svold sits down with Andreas Møgelmose, Associate Professor of AI at Aalborg University's Visual Analysis & Perception Lab, to explore the mechanics of computer vision from the ground up. The conversation moves from the basics of neural networks to the 2012 breakthrough that connected vision research with language, sound, and ultimately the large language models that now power tools millions of people use every day.
But the episode's most striking moment might be a small anecdote about skin cancer and a ruler. It reveals something important not just about AI, but about how any system, human or machine, can learn to measure the wrong thing. It's a story about correlation, causation, and what happens when a model gets very good at finding a pattern that isn't the pattern you needed it to find. That problem doesn't stay inside AI research. It shows up wherever we use data to make decisions.
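The dynamic behind that anecdote, a model latching onto a spurious shortcut instead of the real signal, is easy to reproduce in miniature. The sketch below is purely illustrative and is not from the episode: all data, feature names, and the trivially greedy one-feature "classifier" are hypothetical. In the toy training set, a ruler appears in every malignant photo, so the ruler feature predicts the label perfectly there; at deployment time rulers are uninformative, and the shortcut collapses.

```python
import random

random.seed(0)

# Hypothetical toy data: each sample is (irregular_border, ruler_in_photo).
# "irregular_border" is the real (noisy) signal; "ruler" is a confound that
# matches the label perfectly in training data but not in deployment data.
def make_samples(n, ruler_matches_label):
    samples = []
    for _ in range(n):
        label = random.random() < 0.5                            # 1 = malignant
        border = label if random.random() < 0.8 else not label   # real signal, 80% reliable
        ruler = label if ruler_matches_label else (random.random() < 0.5)
        samples.append(((int(border), int(ruler)), int(label)))
    return samples

train = make_samples(1000, ruler_matches_label=True)
test = make_samples(1000, ruler_matches_label=False)  # deployment: rulers are random

def accuracy(feature_index, samples):
    return sum(x[feature_index] == y for x, y in samples) / len(samples)

# A deliberately naive "model": pick whichever single feature best fits training data.
best = max(range(2), key=lambda i: accuracy(i, train))
print("chosen feature:", ["border", "ruler"][best])
print("train accuracy: %.2f" % accuracy(best, train))
print("test accuracy:  %.2f" % accuracy(best, test))
```

The greedy learner picks the ruler (perfect on training data, beating the 80%-reliable border feature) and then scores near chance on the test set: exactly the failure mode the episode describes, a pattern found that isn't the pattern you needed.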
Andreas brings the kind of clarity that comes from working at the intersection of theory and application. He explains why the 2012 breakthrough in vision research became the foundation of a much broader AI revolution, why "neural" networks don't actually work like brains, and what the three most pressing open problems in the field look like right now.
In This Episode
- Why vision may be the most important sense a computer can gain, and what it unlocks
- How AI's subfields spent decades apart, and why a 2012 vision paper reunited them
- The difference between training, fine-tuning, and simply giving a model context
- Why neural networks have almost nothing to do with how the brain actually works
- How convolutional networks build understanding in layers, from edges to faces to objects
- The skin cancer case where an AI learned to detect rulers, not tumours
- Why the most accurate models are often the least explainable, and what regulators are doing about it
- The three open problems that define the current frontier of computer vision
- Why ChatGPT was a UI revolution more than a research breakthrough
Chapters
- 00:01 Introduction and guest background
- 01:11 What computer vision is and why it matters
- 01:16 How AI's subfields have converged since 2012
- 03:08 Neural networks from the 1950s to 2012
- 05:31 Learning from data, backpropagation explained
- 10:39 Training, fine-tuning, and context windows
- 14:16 Convolutional networks and how vision builds in layers
- 22:27 When AI learns the wrong thing, the skin cancer case
- 24:25 The explainability trade-off and the EU AI Act
- 31:53 The three open problems in computer vision
Key Quotes
"Vision is probably our most important sense. If computers can see, they can do things that would otherwise be impossible."
"The model learned to look for rulers, not cancer. On real-world data, it failed."
"We have fully explainable models, but they're far weaker than top neural networks. There's a trade-off."
"ChatGPT was a UI revolution. Putting powerful models into a familiar chat interface broadened access."
About Andreas Møgelmose
Andreas Møgelmose is Associate Professor of AI at Aalborg University, where he works in the Visual Analysis & Perception Lab. His research focuses on computer vision, machine learning, and the practical challenges of deploying vision systems in real-world environments. He brings both theoretical rigour and applied experience to questions about how machines learn to understand visual information, and what the consequences are when they get it wrong.
Contact & Follow
Questions, topic ideas, or guest suggestions: podcast@bigideasonly.com
Find more episodes at [podcast website]