In this episode of Learning from Machine Learning, we explore the intersection of pure mathematics and modern data science with Leland McInnes, the mind behind an ecosystem of tools for unsupervised learning including UMAP, HDBSCAN, PyNNDescent and DataMapPlot. As a researcher at the Tutte Institute for Mathematics and Computing, McInnes has fundamentally shaped how we approach and understand complex data.
Leland views data through a unique geometric lens, drawing from his background in algebraic topology to uncover hidden patterns and relationships within complex datasets. This perspective led to the creation of UMAP, a breakthrough in dimensionality reduction that aims to preserve both local and global data structure, enabling striking visualizations and more effective clustering. Similarly, his clustering algorithm HDBSCAN tackles the messy reality of real-world data, handling varying densities and noise with remarkable effectiveness.
But perhaps what's most striking about Leland isn't just his technical achievements – it's his philosophy toward algorithm development. He champions the concept of "decomposing black box algorithms," advocating for transparency and understanding over blind implementation. By breaking down complex algorithms into their fundamental components, Leland argues, we gain the power to adapt and innovate rather than simply consume.
For those entering the field, Leland offers pointed advice: resist the urge to chase the hype. Instead, find your unique angle, even if it seems unconventional. His own journey – applying concepts from algebraic topology and fuzzy simplicial sets to data science – demonstrates how breakthrough innovations often emerge from unexpected connections.
Throughout our conversation, Leland’s passion for knowledge and commitment to understanding shine through. His approach reminds us that the most powerful advances in data science often come not from following the crowd, but from diving deep into fundamentals and drawing connections across disciplines.
There's immense value in understanding the tools you use, questioning established approaches, and bringing your unique perspective to the field. As Leland shows us, sometimes the most significant breakthroughs come from seeing familiar problems through a new lens.
Until next time... keep on learning.
Learning from Machine Learning is part of an ongoing series exploring the minds shaping modern machine learning. If you enjoyed this conversation, consider sharing it with a colleague or friend who might find it valuable.
Resources for Leland McInnes' Work and Libraries
UMAP
HDBSCAN
PyNNDescent
DataMapPlot
References from this Episode
Maarten Grootendorst - BERTopic
Vincent Warmerdam - Calmcode
Emily Riehl - Category Theory in Context
David Spivak - Fuzzy Simplicial Sets
Improving Mapper’s Robustness by Varying Resolution According to Lens-Space Density
Resources to learn more about Learning from Machine Learning
Chapters
00:00 Understanding Unstructured Data Challenges
03:02 The Journey into Mathematics and Machine Learning
05:57 Exploring Unsupervised Learning
09:10 The Role of Libraries in Data Processing
11:53 Advancements in Clustering Algorithms
14:46 The Breakthrough of UMAP
18:06 Evaluating Dimensionality Reduction Techniques
20:51 Unexpected Applications of UMAP
24:04 The Importance of Visualization in Data Science
28:16 Iterative Processes in Data Analysis
35:12 Decomposing Algorithms for Better Understanding
38:20 The Hype vs. Reality of AI and Machine Learning
43:13 Unanswered Questions in Machine Learning
45:35 Advice for Aspiring Data Scientists
50:13 Lessons from a Career in Research