
Context: In my lectures and workshops I stress the promise of AI to deliver abundance, not apocalypse. While I’d prefer Yampolskiy to be proven wrong, his argument is strong and logically coherent, making it hard to dismiss outright. Let’s hope, much like the Y2K bug that fizzled out, the direst predictions prove to be fear-mongering—and that our future looks more like Gene Roddenberry’s optimistic Star Trek than Isaac Asimov’s robot doom. Phasers set to ‘progress,’ not panic.

Tea-break summary

If superintelligence arrives before we prove we can control it, we might all become the cautionary tale. The awkward bit? The economic and geopolitical race makes “slow down” feel irrational—right up until it isn’t.

The Thesis of Unsolvability: Yampolskiy’s Case for Uncontrollable AI

Dr Roman V. Yampolskiy, a computer scientist and cybersecurity scholar, argues from first principles that controlling a superintelligent AI may be formally unsolvable. This isn’t sci-fi hand-waving; it’s the kind of solvability check computer scientists insist on before anyone spends billions attempting what may be impossible. His position—across academic papers and in AI: Unexplainable, Unpredictable, Uncontrollable—sets a stark bar: prove control is even solvable before we build the thing.

The Foundational Triad: Unexplainable, Unpredictable, Uncontrollable

  • Unexplainable. Modern AIs are trained, not hand-crafted. Their inner workings are a high-dimensional tangle, not a neat flowchart. As capability scales, the cognitive gap becomes a canyon; explaining a superintelligence to us may be harder than a quantum physicist tutoring “a mentally challenged deaf and mute four-year-old raised by wolves.” Charming image, dreadful forecast. Force explainability and you likely force sub-optimality—defeating the point of superintelligence.
  • Unpredictable. If you can’t explain internal logic, don’t kid yourself you can forecast novel behaviour. Emergent properties are already a nuisance in today’s models; magnify them by several orders of magnitude and prediction becomes a category error—like asking a squirrel to model global geopolitics.
  • Uncontrollable. Put the above together and you get the punchline. Unconstrained intelligence resists control; constrained intelligence can’t innovate. We’ve no precedent for a less capable agent keeping a more capable one on a tight lead—indefinitely—without the lead snapping.

Impossibility Results and the “Perpetual Safety Machine”

Yampolskiy leans on impossibility theorems: formal results that certain guarantees can’t exist. He and collaborators map limits across deduction, indistinguishability, induction, trade-offs and intractability. The upshot: a 100% guarantee of security, explainability, or value alignment for a smarter-than-us agent looks more like a perpetual safety machine than an engineering project. In cybersecurity, breaches are recoverable. In superintelligence, one breach could be… the series finale.

Tool vs Agent: Two Very Different Failure Modes

  • AI as tool. The hammer phase. Harm scales with the human wielder’s intent: engineered pathogens, automated cyber-attacks on critical infrastructure, industrial-strength propaganda. Nasty, but conceptually familiar.
  • AI as agent. Autonomy arrives; the AI becomes an actor with its own decision-making. Now the AI can be the adversary—via mis-specified goals, instrumental sub-goals, or a “small” coding error with planetary consequences. Outcomes span X-risk (extinction), S-risk (vast suffering), and I-risk (loss of meaning—think human purpose strip-mined by automation). “Zoo animals with room service” is not the utopia we ordered.

The 99.99% Claim: Why the Odds Look So Grim

Yampolskiy’s near-certain (99.99%) century-scale extinction estimate rests on a logical chain, not a statistical model:

  1. Superintelligence is likely to arrive soon;
  2. No proven control method exists;
  3. Control may be unsolvable in principle;
  4. One failure is irreversible.

Unless a premise is wrong, continuing on the current trajectory makes disaster near-certain. The best rebuttal, he says, is simply: “I’m wrong, we figure out control in time.” Lovely if true—thin ice to skate on.
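
For readers who like the skeleton laid bare, here is a minimal propositional sketch of that chain (the labels P1 to P4 are my shorthand for the four premises above, not Yampolskiy’s own notation):

\[
(P_1 \land P_2 \land P_3 \land P_4) \;\Rightarrow\; \text{Catastrophe}
\]
or, taking the contrapositive,
\[
\lnot\,\text{Catastrophe} \;\Rightarrow\; \lnot P_1 \lor \lnot P_2 \lor \lnot P_3 \lor \lnot P_4 .
\]

In other words, the only way to escape the conclusion is to break at least one premise, which is exactly why “I’m wrong, we figure out control in time” is the strongest rebuttal on offer.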

The Wider “AI Pessimism” Canon

  • Eliezer Yudkowsky: warns of a rapid, hard take-off—recursive self-improvement compressing “human-level to super-level” into hours or days. We get one shot to align; after that, the system is re-writing the rules faster than we can blink.
  • Nick Bostrom: formalises the danger with Orthogonality (any intelligence level with any goal) and Instrumental Convergence (self-preservation, goal-content integrity, cognitive enhancement, resource acquisition). Even the innocent paperclip maximiser eats the universe, not out of malice but out of competence.

Together with Yampolskiy, they converge on five claims: superintelligence is likely; take-off could be discontinuous; values won’t converge to “nice”; instrumental goals collide with us; failure is irreversible.

The Techno-Optimist Counter: Why the Builders Keep Building

Enter the Sam Altman crowd: AGI as abundance engine. Intelligence and energy are the two great limiters; flood the world with intelligence and you unlock cures, climate solutions, scientific leaps, and a productivity boom that makes the Industrial Revolution look quaint.

Safety, they say, comes from iterative deployment: ship weaker systems, learn in the open, scale oversight as models scale. Society and AI co-evolve. This presumes a soft take-off—time to adapt, regulate, and align. If take-off is hard, the plan becomes… optimistic.

Long-term, they favour wide distribution to avoid authoritarian capture: make superintelligence cheap and broadly available once alignment is “good enough”. (Critics: define “good enough,” preferably before tea.)

The Great Gamble: Economics, Geopolitics, and Philosophy

Why press on when the warnings are coherent and loud?

  • Economics: First to AGI likely captures an absurd slice of the global economy by automating cognitive labour. In a winner-take-most market, caution is a competitive disadvantage.
  • Geopolitics: It’s an arms race. If your rival gets AGI first, you don’t just lose market share—you may lose strategic autonomy.
  • Philosophy: Techno-optimism vs the Precautionary Principle. Silicon Valley bets that progress solves problems. The safety camp counters that infinite downside demands proof before progress.

This yields the Alignment Trilemma: Speed, Safety, Openness—pick two (on a good day).

Are They Even Solving the Same Problem?

  • Yampolskiy’s frame: closed-form security logic. Variable = control. Standard = proof. Verdict: pause until solvability is proven.
  • Altman’s frame: open, utilitarian, realpolitik. Variables = progress + advantage + benefit. Verdict: pausing concedes the future—economically and strategically.

From inside the race, inaction looks riskier than action. From the safety frame, the race dynamics are exactly what make catastrophe likely. Unhelpfully, both can be “logical” given their axioms.

Conclusion: Walking the Knife-Edge

If you accept Yampolskiy’s premises, the logic is brutal: a superintelligence that is unexplainable, unpredictable and uncontrollable is not a system to iterate our way into. If you accept the builders’ premises, stopping is also reckless; it cedes the tools we may need to solve everything else.

Humanity is attempting a moon-shot while arguing about whether gravity exists. One side wants a proof of parachutes; the other wants to jump early to beat the other jumpers. Either way, we’re already on the cliff. Choose your harness accordingly.
