On this episode of Learning from Machine Learning, Lukas Biewald, co-founder and CEO of Weights & Biases, traces his journey from programming games as a kid, to being coached to take "AI" out of his pitch during the mid-2000s AI winter, to building one of the most essential tools in AI development today.
Takeaways
Timing is counterintuitive: What feels "late" to insiders is often still early to the market
Stay hands-on as you climb: Leaders must stay technical to remain capable and effective
Focus on power users: AI developers are a smaller market but wield disproportionate organizational influence by “automating the automation”
Optimize for feedback loops: Rapid iteration beats perfect planning in uncertain environments
AI is underhyped: If you zoom out, the recursive potential of machines improving machines is barely understood
Summary
You think you're late, but you're early in AI. Lukas emphasizes that despite the common feeling of being "late" to AI, especially given the rapid advances in large language models (LLMs) and robotics, the technology is still "so underhyped" and its potential is immense. He always felt "late to stuff" throughout his career, but in retrospect he was "early to everything". The technology already works "really well" and touches "so many industries"; anyone disappointed that it hasn't yet solved "every single one of their business problems" should be patient.
Executives and leaders must stay technical and prioritize execution. Lukas strongly believes that executives should remain technical and capable of doing "the IC (individual contributor) job". He has observed that executives who are good managers but "can't do the individual work" tend to fail. Carving out "huge blocks of time to stay technical" is where many of his ideas come from, and it keeps leaders grounded in the practicalities and challenges of the work, allowing for more effective guidance and problem-solving.
Feedback loops and robust evaluations are crucial for progress in ML, in business, and in life. From his experience in machine learning, Lukas has learned that feedback loops are "your unit of work". He is "obsessed with getting feedback quickly" to improve whatever he is doing, a habit that stems from needing to rapidly determine whether an ML experiment is working.
This principle extends to business: he notes that asking customers for feedback, though sometimes "annoying" or "stressful," is essential for success, because it is how you come to deeply understand their "pain and the problem and the need".
In ML, this translates to knowing "when you should stop" an experiment or what to try next, often requiring decisions with "imperfect information," much like life's complex situations.
For LLMs, while "testing by vibes" (looking at individual examples) is valuable for preventing "horrible deployments," achieving "incremental steady progress" and nudging accuracy higher necessitates "clear metrics" and robust evaluation systems.
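As a toy illustration of that shift from vibes to metrics (the ask_llm stub and the two-row eval set below are hypothetical placeholders, not anything from the episode):

```python
# Sketch: replacing "testing by vibes" with a clear, repeatable metric.

def ask_llm(question: str) -> str:
    # Stub so the sketch runs; swap in a real LLM client here.
    return "Paris" if "France" in question else "unknown"

# A small labeled evaluation set (hypothetical examples).
eval_set = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is the capital of Japan?", "expected": "Tokyo"},
]

def accuracy(dataset) -> float:
    # Exact-match scoring: crude, but it turns "feels better" into a number
    # you can compare run over run.
    correct = sum(
        ask_llm(row["question"]).strip().lower() == row["expected"].lower()
        for row in dataset
    )
    return correct / len(dataset)

print(f"accuracy: {accuracy(eval_set):.2f}")  # 0.50 with the stub above
```

Even a metric this crude enables the "incremental steady progress" Lukas describes: rerun it after every prompt or model change and you can see whether accuracy actually moved.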
Why it matters: Whether you're writing code, pitching VCs, or leading a team, Lukas' journey reminds us to keep our hands dirty, ship before we're "ready," and listen obsessively to the people actually using what we build.
Career Journey and Background
Lukas Biewald's path through AI spans from the mid-2000s "AI winter" to today's boom. His first company, CrowdFlower (later Figure 8), tackled data labeling when investors actively avoided AI-related pitches.
"I was basically building an AI labeling company, but I was coached to take AI out of the pitch because it was such an AI winter that investors did not want to invest in anything related to AI."
The pivotal moment came in 2016 when AlphaGo defeated Lee Sedol. As a serious Go player, Lukas understood the profound implications:
"I was super skeptical that AlphaGo was going to beat Lee Sedol. And then when it did, I was just like, oh my God. That's kind of what made me realize, okay, I need to get more technical again."
This led him to take an "unpaid internship at OpenAI" to re-immerse himself in cutting-edge AI research, despite running a successful company in his thirties.
Weights & Biases Mission and Challenges
The Developer Tools Strategy
Founded in 2018, W&B bet on building tools for AI developers rather than end-user applications. The core insight:
"These AI developers, even smaller market, it's like a subset of developers, but they're even more powerful within their organizations because they're automating the automation."
Evolution with LLMs
The shift from custom model building to LLM-integration workflows required the product to evolve. W&B introduced Weave for LLM workflows while maintaining its core experiment tracking product (now called "Models"); a rough sketch of the split appears after the quote below.
"…what we actually saw was this democratization of AI that everybody had talked about, where anyone can really harness the power of AI. And then they get all the kind of drawbacks of AI too."
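A minimal sketch of how the two products divide the work, assuming the wandb and weave Python packages (the project names, logged metrics, and summarize function are hypothetical placeholders, not W&B's examples):

```python
# Sketch of the two-product split, assuming the `wandb` and `weave` packages.
import wandb
import weave

# "Models": classic experiment tracking for training custom models.
run = wandb.init(project="my-experiments")     # hypothetical project name
for step in range(10):
    run.log({"train/loss": 1.0 / (step + 1)})  # log whatever metrics you train against
run.finish()

# Weave: tracing LLM-integration workflows.
weave.init("my-llm-app")  # hypothetical project name

@weave.op()  # decorated calls are traced: inputs and outputs are recorded per call
def summarize(text: str) -> str:
    # Hypothetical stand-in for a real LLM call.
    return text[:80] + "..."

summarize("Weave records this call so you can inspect it later.")
```

The design difference: experiment tracking logs scalar metrics per training step, while Weave traces individual LLM calls so their inputs and outputs can be inspected afterward.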
The Evaluation Challenge
Unlike traditional ML where practitioners expected evaluations, LLM users often need convincing:
"…at the early days of Weights and Biases… no one would be like, I shouldn't do evaluations… [more] like, how do we do evaluations? Now, I think with LLMs, you kind of actually have to convince people to do evaluations."
Key Insights and Highlights
Technical Leadership Philosophy
He explains that a technical leader has to be able to do the work of the people they manage:
"… you better be able to do the IC [individual contributor] job. And, I do not know how companies function without that mindset."
He regularly carves out substantial time for coding, inspired by the New Relic CEO who spent one week per month writing code despite running a public company.
Machine Learning as Business Philosophy
The principles of ML training apply to business decisions:
"The feedback loops are kind of like your unit of work… the faster you're getting feedback and the more feedback that you're getting, I'm sort of obsessed with getting feedback."
Customer Feedback Reality
Even successful entrepreneurs struggle with the basics:
"There's no entrepreneur that does it [asking for customer feedback] enough... it's pretty annoying to do it. It's like a little bit stressful… we just don't do it enough."
The 10x Engineer Reality
"Doesn't everyone believe in the 10x engineer? How could you not? If you've actually worked with engineers... I can't imagine not believing in the 10x engineer."
Broader Perspectives
AI is "Underhyped"
Counter to popular narrative, Lukas argues AI's potential is underestimated:
"Honestly, if you zoom out even a little bit, all this stuff is so underhyped. You can't hype it enough, right?"
The Recursive Future
"If computers can program other computers, then you could solve literally every problem that humans face, right? …when you have that sort of recursive ability, to automatically improve algorithms… that's just the most powerful technology that you could possibly build."
Impact on Developer Productivity
AI coding tools are creating inequality rather than pure democratization:
"It seems like the more dominant effect is it's making the best developers even more productive than not the best developers."
Advice to His Younger Self
The most actionable insight for anyone in emerging technologies:
"You think you're late, but you're early. I just always felt bad... I was always feeling bad that I was late to stuff, but actually I was early to everything."
References
Weights & Biases - https://wandb.ai/
CrowdFlower/Figure 8 (now part of Appen) - https://appen.com/
OpenAI - https://openai.com/
CoreWeave - https://www.coreweave.com/
Scale AI - https://scale.com/
GitHub - https://github.com/
Google - https://google.com/
Netlify - https://www.netlify.com/
Stanford University - https://stanford.edu/
Y Combinator - https://www.ycombinator.com/
Daphne Koller - Stanford Professor, Co-founder of Coursera
Lee Sedol - Professional Go player defeated by AlphaGo
Weave - W&B's LLM workflow tool - https://wandb.ai/site/weave
MLflow - https://mlflow.org/
Cursor - AI-powered code editor - https://cursor.sh/
Claude - Anthropic's AI assistant - https://claude.ai/
Windsurf - https://windsurf.io/
Lovable - AI app builder - https://lovable.dev/
Replit - https://replit.com/
V0 - Vercel's AI interface generator - https://v0.dev/
Gradient Dissent - https://www.youtube.com/@WeightsBiases/podcasts
Fully Connected - W&B's conference - https://wandb.ai/fully-connected
Weights & Biases Blog - https://wandb.ai/site/blog
Glossary of Key Terms
AI Winter: A period of reduced funding and interest in artificial intelligence research, often following periods of high expectations and hype.
Agentic Systems: AI systems that can chain together different tools and models to perform complex tasks, often involving prompting Large Language Models (LLMs) to use external functions or access information.
AlphaGo: A computer program developed by Google DeepMind that plays the board game Go. Its victory over human Go champion Lee Sedol in 2016 was a significant milestone for AI.
Computer Vision: A field of artificial intelligence that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take actions or make recommendations based on that information.
CoreWeave: A company that provides high-performance computing infrastructure, particularly GPUs. Weights and Biases was acquired by CoreWeave in 2025.
CrowdFlower (later Figure 8): Lukas Biewald's first company, founded in 2007, which focused on data labeling – a crucial process for training machine learning models.
Deep Learning: A subfield of machine learning that uses artificial neural networks with multiple layers (deep neural networks) to learn from data, capable of discovering complex patterns in large datasets.
Democratization of AI: The concept that AI technologies and their benefits should be accessible to a wider range of people and organizations, not just large tech companies.
Distributed Training: A method of training machine learning models where the computational workload is spread across multiple GPUs or machines, often used for very large models or datasets.
Evaluations: The process of assessing the performance and quality of machine learning models or AI systems using various metrics and test datasets.
Experiment Tracking: The process of systematically recording and managing the various components of machine learning experiments, including code versions, hyperparameters, datasets, and results. Weights and Biases offers this as a core feature.
Feedback Loops: A system where the output or results of a process are fed back as input, allowing for continuous adjustment and improvement. Crucial in both machine learning training and product development.
Fine-tuning: The process of taking a pre-trained machine learning model (often a large model) and further training it on a smaller, specific dataset to adapt it to a new task or domain.
Fully Connected: A conference hosted by Weights and Biases.
Generative AI: A type of artificial intelligence that can produce new content, such as text, images, or audio, rather than just analyzing or classifying existing data. Large Language Models are a prime example.
Gödel, Escher, Bach: A Pulitzer Prize-winning book by Douglas Hofstadter exploring themes of intelligence, consciousness, and mathematics, which inspired Lukas Biewald as a child.
Go: An abstract strategy board game for two players, originating in ancient China. It is known for its strategic depth and was famously conquered by Google's AlphaGo AI.
Gradient Descent: An optimization algorithm used to minimize the cost function of a machine learning model by iteratively adjusting parameters (weights and biases) in the direction of the steepest descent of the function.
Gradient Dissent: A podcast hosted by Lukas Biewald.
GPUs (Graphics Processing Units): Specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer for output to a display device. They are crucial for training deep learning models.
Hyperparameters: Parameters whose values are set before the learning process begins, influencing how the model learns (e.g., learning rate, batch size, number of layers).
Labeled Data: Data that has been tagged or annotated with meaningful information, such as classifications or attributes, which is used to train supervised machine learning models.
Large Language Models (LLMs): Advanced AI models trained on vast amounts of text data, capable of “understanding”, generating, and responding to human language in a coherent and contextually relevant way.
LeetCode: A platform for coding practice and technical interview preparation.
MLflow: An open-source platform for managing the machine learning lifecycle, developed by Databricks and often positioned as an alternative to tools like Weights and Biases.
MLOps: A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.
Non-deterministic: In the context of LLMs, refers to the characteristic that given the same input, the model might produce different outputs, making consistent testing and evaluation challenging.
OpenAI: OpenAI, Inc. is an American artificial intelligence organization founded in December 2015 and headquartered in San Francisco, California. It aims to develop "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work". Lukas Biewald briefly interned with OpenAI to get more technical after AlphaGo's success.
PageRank: An algorithm used by Google Search to rank web pages in its search results; an early example of an algorithm succeeding in practice at massive scale.
Personas: In the context of LLMs, refers to setting the style or character of the model's responses to be consistent with a particular role or identity.
Prompt Engineering: The process of designing and refining inputs (prompts) for Large Language Models to elicit desired outputs, often compared to how a product manager functions for engineers.
Reinforcement Learning: A type of machine learning where an agent learns to make decisions by performing actions in an environment and receiving rewards or penalties, aiming to maximize cumulative reward.
Reward Functions: In reinforcement learning, a function that defines the feedback (rewards or penalties) an agent receives for its actions in the environment, guiding its learning process.
Simulated Annealing: A probabilistic technique for approximating the global optimum of a given function, inspired by the annealing process in metallurgy; mentioned by Lukas as an early concept that captured his imagination, similar to gradient descent.
Tables (W&B Product): A feature within Weights and Biases that allows users to visualize and interact with their data, enabling grouping, filtering, and inspection of individual examples.
Testing by Vibes: A colloquial term used in the LLM world to describe an informal and subjective evaluation method where users try out the model and make judgments based on their "feeling" or impression of its performance, rather than rigorous metrics.
Transformer Architecture: A neural network architecture introduced in 2017, which forms the basis for most modern Large Language Models and is highly efficient at processing sequential data.
Weights and Biases (W&B): An MLOps platform founded by Lukas Biewald, Shawn Lewis, and Chris Van Pelt, providing tools for experiment tracking, model management, and collaboration in machine learning development.
Weave (W&B Product): A specific product within Weights and Biases designed for running and tracking Large Language Models, especially for tasks involving chaining LLMs and using tools (agentic systems).