
Leland McInnes: UMAP, HDBSCAN & the Geometry of Data | Learning from Machine Learning #10

Decomposing Black Boxes, Decoding Data's Hidden Geometry

In this episode of Learning from Machine Learning, we explore the intersection of pure mathematics and modern data science with Leland McInnes, the mind behind an ecosystem of tools for unsupervised learning, including UMAP, HDBSCAN, PyNNDescent, and DataMapPlot. As a researcher at the Tutte Institute for Mathematics and Computing, McInnes has fundamentally shaped how we approach and understand complex data.

Leland views data through a unique geometric lens, drawing from his background in algebraic topology to uncover hidden patterns and relationships within complex datasets. This perspective led to the creation of UMAP, a breakthrough in dimensionality reduction that preserves both local and global data structure, making it well suited to visualization and as a preprocessing step for clustering. Similarly, his clustering algorithm HDBSCAN tackles the messy reality of real-world data, handling varying densities and noise with remarkable effectiveness.

But perhaps what's most striking about Leland isn't just his technical achievements – it's his philosophy toward algorithm development. He champions the concept of "decomposing black box algorithms," advocating for transparency and understanding over blind implementation. By breaking down complex algorithms into their fundamental components, Leland argues, we gain the power to adapt and innovate rather than simply consume.

For those entering the field, Leland offers pointed advice: resist the urge to chase the hype. Instead, find your unique angle, even if it seems unconventional. His own journey – applying concepts from algebraic topology and fuzzy simplicial sets to data science – demonstrates how breakthrough innovations often emerge from unexpected connections.

Throughout our conversation, Leland’s passion for knowledge and commitment to understanding shine through. His approach reminds us that the most powerful advances in data science often come not from following the crowd, but from diving deep into fundamentals and drawing connections across disciplines.

There's immense value in understanding the tools you use, questioning established approaches, and bringing your unique perspective to the field. As Leland shows us, sometimes the most significant breakthroughs come from seeing familiar problems through a new lens.

Until next time... keep on learning.


Learning from Machine Learning is part of an ongoing series exploring the minds shaping modern machine learning. If you enjoyed this conversation, consider sharing it with a colleague or friend who might find it valuable.


Resources for Leland McInnes' Work and Libraries

Leland’s GitHub

UMAP
HDBSCAN
PyNNDescent
DataMapPlot

References from this Episode


Resources to learn more about Learning from Machine Learning

Chapters

00:00 Understanding Unstructured Data Challenges

03:02 The Journey into Mathematics and Machine Learning

05:57 Exploring Unsupervised Learning

09:10 The Role of Libraries in Data Processing

11:53 Advancements in Clustering Algorithms

14:46 The Breakthrough of UMAP

18:06 Evaluating Dimensionality Reduction Techniques

20:51 Unexpected Applications of UMAP

24:04 The Importance of Visualization in Data Science

28:16 Iterative Processes in Data Analysis

35:12 Decomposing Algorithms for Better Understanding

38:20 The Hype vs. Reality of AI and Machine Learning

43:13 Unanswered Questions in Machine Learning

45:35 Advice for Aspiring Data Scientists

50:13 Lessons from a Career in Research
