Reflections on AI: The Stochastic Era

I’ve always loved jazz and improvisational music. My wife, Sarah, appreciates the perfect, tight structure of a three-minute song, and I get it. There’s a real beauty in that precision. But for me, the magic happens in the exploratory freedom of a 10, 15, or even 25-minute musical journey. It’s about letting go of a rigid plan to discover something new and amazing in the moment.

I was thinking about this recently, remembering a weekend back in August of 1996. I was standing on a decommissioned Air Force base in Plattsburgh, New York, with three good friends and a huge smile on my face. We were at The Clifford Ball, Phish’s first festival, and the band was on fire. During the second set of the second night, they launched into “Run Like An Antelope.” The jam that followed was pure improvisational genius—a high-energy, tight-but-loose exploration that broke free from the song’s structure to create something utterly unique and unrepeatable. The entire festival was like that, a masterclass in creative freedom.

I’m a firm believer in what Steve Jobs called standing at the “crossroads of technology and the liberal arts.” That Phish jam is a perfect example of the artistic side: letting go of a rigid structure can lead to something far more profound. It feels counterintuitive, but for my entire career in technology, I’ve seen the other side—a world built on perfect, deterministic machines. Now, we’re standing at a new crossroads, and the same principle of letting go is about to change everything.

Steve Jobs famously said, “… it’s technology married with liberal arts, married with the humanities, that yields us the results that make our hearts sing …”

A Jarring Shift in Thinking

For as long as I’ve been a software engineer and a technology leader, computers have been defined by their deterministic nature. They are perfect, logical calculators. Input A always produces Output B. 2 + 2 will always equal 4. But we are now entering a new era: the Stochastic Era.

The most powerful large language models today, the ones that can generate art and write poetry, and that are changing our world, are fundamentally not deterministic. At their core, they are probabilistic engines making sophisticated guesses. Letting go of rigid structure has made room for what feels like creativity. This is a massively jarring shift in thinking. How can this randomness, this seeming imperfection, be the essential ingredient for building true, human-like intelligence?

From Certainty to Probability: What is Stochastic Thinking?

To understand this shift, we need to contrast two mindsets.

  • Deterministic Thinking: This is like following a precise recipe to bake a cake. You use the exact same ingredients and instructions every time, and you get the exact same cake. It’s predictable and reliable.
  • Stochastic Thinking: This is like a skilled chef improvising a meal. They have a deep understanding of ingredients and techniques, but they create a dish based on what’s fresh and available. The meal is different every time, but it’s creative, adapted, and often brilliant.

It’s crucial to understand that this isn’t just chaos or random noise. It’s principled randomness. A stochastic system uses probability distributions to make the best possible guess based on the vast amount of data it has learned from.
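To make the contrast concrete, here is a minimal Python sketch. The function names and the menu probabilities are invented for illustration; the point is only that the stochastic version is random but not arbitrary, because its choices follow a probability distribution.

```python
import random


def deterministic_recipe(x: int) -> int:
    """Same input, same output, every time."""
    return x + x


def stochastic_chef(preferences: dict[str, float], rng: random.Random) -> str:
    """Pick one dish at random, weighted by its (hypothetical) learned probability."""
    dishes = list(preferences)
    weights = [preferences[dish] for dish in dishes]
    return rng.choices(dishes, weights=weights, k=1)[0]


# Hypothetical probabilities standing in for what a system has learned from data.
menu = {"risotto": 0.6, "paella": 0.3, "ramen": 0.1}

rng = random.Random(0)  # fixed seed so the experiment is repeatable
picks = [stochastic_chef(menu, rng) for _ in range(1000)]
counts = {dish: picks.count(dish) for dish in menu}

print(deterministic_recipe(2))  # always 4
print(counts)  # varies dish to dish, but roughly tracks 60% / 30% / 10%
```

Any single pick is unpredictable, yet over many picks the likeliest dish wins most often. That is the sense in which the randomness is principled.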

The Engine of Modern AI: How LLMs Actually Work

The generative AI revolution we are living through traces back to a single research paper. In 2017, researchers at Google published “Attention Is All You Need.” It introduced a new architecture called the Transformer, which is the blueprint for nearly every modern Large Language Model (LLM), from ChatGPT to Gemini.

Before the Transformer, the dominant language models were recurrent networks that processed text sequentially, one word at a time, and often lost the context of earlier words. The Transformer’s breakthrough was a mechanism called self-attention, which allows the model to look at all the words in a sentence at once and weigh their relevance to each other. This enabled a far deeper understanding of context and, crucially, allowed for massive parallelization in training.
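For readers who want to see the mechanics, here is a stripped-down sketch of scaled dot-product self-attention in plain Python. It is a simplification under stated assumptions: a real Transformer applies learned query, key, and value projections and uses multiple attention heads, whereas this sketch uses the raw token vectors for all three roles. The three toy vectors are made up.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    weights = []
    for query in x:
        # Compare this token with every token in the sequence, all at once.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x
        ]
        weights.append(softmax(scores))
    # Each output is a relevance-weighted blend of all the token vectors.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Three toy "word" vectors: the first two are similar, the third is different.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token attends to every other token. Because the rows do not depend on one another, they can be computed in parallel, which is the property that made large-scale training practical.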

Stochastic thinking is not just an add-on to this architecture; it is its fundamental operating principle.

  1. The Core Engine: A Probabilistic Word Predictor. At its heart, an LLM assigns a probability to every possible next token (roughly, a word or word fragment) given the text so far. Its creativity comes from the fact that it doesn’t always pick the #1 most likely candidate. Instead, it samples from that distribution, allowing for variety and novelty.
  2. Controllable Randomness: Temperature and Top-P Sampling. We can control this randomness with parameters. Temperature acts as a creativity dial: low temperature concentrates probability on the likeliest words, making output more predictable and repeatable (though not necessarily more factual), while high temperature flattens the distribution, making output more varied and surprising. Top-P (nucleus) sampling provides another lever, telling the model to consider only the smallest set of most likely words whose combined probability reaches a threshold P.
  3. The Learning Process: Stochastic Gradient Descent. Even the training process is stochastic. It would be impossible to learn from the entire internet at once. Instead, models learn using Stochastic Gradient Descent (SGD): they take a small, random batch of data, measure how wrong their predictions are on it, and nudge their parameters to reduce that error, over and over. This random sampling makes learning efficient and helps the model generalize its knowledge.
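The first two items above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's actual decoding code: the function names are hypothetical, and the four candidate words and their scores are invented to stand in for a model's raw output.

```python
import math
import random


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of likeliest tokens whose total probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize to sum to 1


def sample_next_token(logits: dict[str, float], temperature: float,
                      top_p: float, rng: random.Random) -> str:
    """Sample one token after applying temperature and top-p filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Invented scores for the word that follows "The band started to ..."
logits = {"play": 3.0, "jam": 2.5, "improvise": 1.5, "sleep": -1.0}

cold = softmax_with_temperature(logits, 0.2)  # nearly always picks "play"
hot = softmax_with_temperature(logits, 2.0)   # spreads probability more evenly
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.9)

rng = random.Random(7)
print(sample_next_token(logits, temperature=1.0, top_p=0.9, rng=rng))
```

With these made-up scores, a top-p of 0.9 drops the unlikely word “sleep” before sampling, while temperature decides how evenly the remaining candidates compete.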

The Wall of Determinism: Why Old AI Hit a Limit

For decades, AI research focused on rule-based “expert systems.” This deterministic approach could never lead to AGI for a few key reasons:

  • The Real World is Messy: The world isn’t a clean set of IF-THEN statements. It’s ambiguous, nuanced, and unpredictable.
  • Brittleness: Rule-based systems are brittle. They fail the moment they encounter a situation not explicitly covered by their hand-crafted rules.
  • The Creativity Problem: A deterministic system can only follow its programming. It can never create something truly novel or surprising.

The Bitter Lesson

In 2019, AI pioneer Rich Sutton wrote a now-famous essay called “The Bitter Lesson.” His central point was that, in the long run, general-purpose methods that leverage massive computation (like learning and search) tend to outperform systems where humans try to hand-craft their knowledge.

This is the ultimate validation of the stochastic approach. Instead of trying to teach an AI all the grammatical rules of English, we let a general learning algorithm discover the patterns for itself from trillions of words. This is exactly how LLMs work, and it’s a lesson that connects directly to the ideas in my previous post on the Law of Accelerating Returns. When you combine The Bitter Lesson (let computation do the work) with the Stochastic Engine of LLMs and place it on the exponential curve of Accelerating Returns, you get the explosive, transformative moment in AI that we are witnessing right now.

How Stochasticity Unlocks Intelligence

This new approach is the bridge to AGI because it enables capabilities that were impossible before:

  1. Creativity and Exploration: Randomness allows an AI to explore novel combinations of ideas and generate content that has never existed before.
  2. Robustness and Adaptability: A probabilistic model can handle the uncertainty of the real world, making informed guesses instead of breaking down.
  3. Efficient Learning: Random sampling of training data is what makes it practical to learn from the planet-scale datasets required to achieve general intelligence.
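The efficient-learning point rests on the Stochastic Gradient Descent idea described earlier: learn from small random batches instead of the whole dataset at once. Here is a minimal sketch that fits a straight line this way. The data is synthetic and the learning rate, batch size, and step count are arbitrary choices for the example; real LLM training uses far larger models and more elaborate optimizers.

```python
import random


def sgd_fit_line(data: list[tuple[float, float]], lr: float = 0.05,
                 batch_size: int = 8, steps: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Fit y = w*x + b by mini-batch stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Look at a small random batch, never the full dataset.
        batch = rng.sample(data, batch_size)
        # Gradients of the mean squared error over the batch.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / batch_size
        # Nudge the parameters a small step in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data drawn from y = 3x + 1 with a little noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
data = [(x, 3 * x + 1 + data_rng.gauss(0, 0.1)) for x in xs]

w, b = sgd_fit_line(data)
print(round(w, 2), round(b, 2))  # should land near 3 and 1
```

Each step sees only 8 of the 200 points, yet the estimates settle close to the true slope and intercept. The same trade, many cheap noisy steps instead of a few expensive exact ones, is what lets training scale to very large datasets.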

I saw early glimpses of this in my career. I had the incredible opportunity to be mentored by Steve Kirsch, the founder of Infoseek and a true tech pioneer. We worked together on algorithms for blocking spam for major clients like Yahoo Mail. The techniques we used were essentially early stochastic models, employing Bayesian probability to “guess” if an email was spam based on patterns, rather than relying on rigid rules. That company was later sold to Proofpoint, but the core lesson about the power of probabilistic systems stayed with me.
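As an illustration of the general idea (not the actual algorithms from that work, which are not described here), a naive Bayes classifier is the textbook way to “guess” whether a message is spam from word patterns. Everything in this sketch is an assumption for the example: the function names, the add-one smoothing, and the tiny made-up training set.

```python
import math
from collections import Counter


def train(messages: list[tuple[str, bool]]) -> tuple[dict[bool, Counter], Counter]:
    """Count word occurrences per class from (text, is_spam) pairs."""
    word_counts: dict[bool, Counter] = {True: Counter(), False: Counter()}
    class_counts: Counter = Counter()
    for text, is_spam in messages:
        class_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, class_counts


def spam_probability(text: str, word_counts: dict[bool, Counter],
                     class_counts: Counter) -> float:
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    vocab = set(word_counts[True]) | set(word_counts[False])
    log_score = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this class is overall.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.lower().split():
            # Smoothing keeps unseen words from zeroing out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_score[label] = score
    # Normalize the two log scores into a probability.
    peak = max(log_score.values())
    spam = math.exp(log_score[True] - peak)
    ham = math.exp(log_score[False] - peak)
    return spam / (spam + ham)


# Tiny invented training set; a real filter would learn from millions of messages.
training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
word_counts, class_counts = train(training)

p_spam = spam_probability("claim your free money", word_counts, class_counts)
p_ham = spam_probability("monday meeting agenda", word_counts, class_counts)
print(round(p_spam, 3), round(p_ham, 3))
```

There is no rule here that says “block any email containing the word free.” The classifier weighs the evidence from every word and returns a probability, so a message it has never seen before still gets a reasonable guess instead of falling through a gap in the rules.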

Even today, my role as CTO for O2E Brands is a stochastic exercise. I’m constantly weighing probabilities—the likelihood of a project’s success, market adoption, potential risks—to make the best strategic bets with the available data. It’s never about one certain answer.

The Art of the Guess

Looking ahead, these non-deterministic, stochastic models will power the next wave of systems on the path to AGI, from autonomous agents that can navigate unpredictable environments to scientific AIs that can form novel hypotheses.

The journey to AGI isn’t about building a faster, more powerful calculator. It’s about building a more sophisticated and intuitive “guesser.” We’ve spent a century trying to make machines perfectly logical. It turns out, to make them truly intelligent, we first have to teach them the art of probability. The messy, jarring concept of randomness is not a bug—it’s the feature that will finally get us to AGI.

Thank you for reading. Leave a comment if you have thoughts or questions.

2 thoughts on “Reflections on AI: The Stochastic Era”

  1. Rob Leiphart

    Fascinating read. Stochastics is one of many theories that exist in the stock market for technical analysts. The parallels make sense here and I enjoyed the read. Thank you.
