The AI Casino

🎰 The Slot Machine Loop of LLMs
I recently stumbled upon a piece that got me thinking: Generative AI runs on gambling addiction: “Just one more prompt, bro”.
Purely hypothetically, if you were an AI provider who wanted to make your product not just useful, but truly addictive - how would you actually do it?
🪝 Engineering the Hook
To build a hook, you would look at the gambling industry. They perfected the science of variable payouts decades ago.
The most powerful way to reinforce a behavior isn’t to reward it every time, but to reward it randomly.
In a casino, this is the slot machine. In AI, it’s one-shotting a complex problem. You prompt a model ten times. Nine times it gives you nothing special, or even broken code. But the tenth time? It may produce a piece of code so delightful and elegant that it delivers a huge hit of dopamine.
You’re hooked. And you’ll spend the rest of the day chasing that high again.
🫥 The Routing Layer
If AI providers are slottifying their products, are they doing so intentionally, or is it just an emergent property of the tech?
From an engineering perspective, every AI provider most likely has a prompt router. Running your top-of-the-line model for every “Hello World” request while massively subsidizing token costs with cheap subscriptions is financial suicide. So you put a cache and a very cheap, lightweight model (a Flash or Lite variant) in front of every prompt.
If the router thinks the prompt is easy, it sends it to the cheap model. If it looks complex - or if the user is a “whale” you want to impress - maybe it gets sent to the heavy hitter.
What if that routing isn’t just about cost? What if it’s about retention? If a user hasn’t had a “win” in a while, do you route their next coding prompt to your best and brightest model just to keep them in the game?
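Purely as a sketch, such a retention-aware router would only take a few lines. Everything here - the model names, the complexity heuristic, the “prompts since last win” counter, the thresholds - is invented for illustration, not any provider’s actual logic:

```python
# Hypothetical prompt router: cheap-by-default, with a speculative
# retention override. All names and thresholds are made up.
CHEAP_MODEL = "flash-lite"
PRO_MODEL = "pro-heavy"


def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned classifier: long prompts and
    code-looking prompts score as 'harder'."""
    score = min(len(prompt) / 2000, 1.0)
    if "```" in prompt or "def " in prompt:
        score = max(score, 0.6)
    return score


def route(prompt: str, is_whale: bool, prompts_since_last_win: int) -> str:
    """Pick a backend from cost signals and, speculatively, retention."""
    if estimate_complexity(prompt) > 0.5 or is_whale:
        return PRO_MODEL
    # The speculative part: a user on a losing streak gets a freebie
    # to keep them in the game.
    if prompts_since_last_win > 8:
        return PRO_MODEL
    return CHEAP_MODEL
```

The point of the sketch is how small the retention branch is: one extra signal, one extra `if`, invisible from the outside.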
📈 Degradation or Deliberate Variance?
We’ve all heard the complaints: “Opus is getting dumber” or “Gemini used to be better at this.” Usually, we chalk it up to manual alignment.
Prompt routing offers an alternative explanation. If AI companies don’t legally guarantee that a specific model name maps to a specific set of weights for every single token, they have a massive lever for cost-cutting and psychological manipulation.
What if 1 out of 10 prompts gets the “Pro” treatment, and the other 9 get the “Lite” version with a “Pro” label on the UI? You get just enough brilliance to stick around, and just enough mediocrity to keep you prompting.
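A variable-ratio schedule like that is trivial to implement. In this toy sketch (the 10% rate, the model names, and the label are all made up), the backend and the UI label are simply two different things:

```python
import random


def serve(prompt: str, rng: random.Random) -> tuple[str, str]:
    """Toy variable-ratio schedule: ~1 in 10 requests silently gets the
    real 'pro' backend; the label shown to the user never changes.
    Purely illustrative - no provider is known to do this."""
    backend = "pro" if rng.random() < 0.1 else "lite"
    ui_label = "Pro"  # what the user sees, regardless of backend
    return backend, ui_label
```

Run it ten thousand times and roughly a thousand requests get the real model - enough brilliance to chase, not enough to rely on.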
🤔 What about Benchmarks?
Wouldn’t benchmarks catch such manipulation? Remember the Volkswagen emissions scandal? The car’s software would detect when it was being tested and “cheat” by reducing emissions only during the test.
Is it such a stretch to imagine an LLM router detecting a benchmark prompt (MMLU, HumanEval, etc.) and routing it to a specialized cluster, while average users get the “distilled” version?
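The “defeat device” itself would be almost embarrassingly simple. A hypothetical sketch - the marker strings are loose approximations of MMLU- and HumanEval-style phrasing invented for illustration, not real detection logic:

```python
# Hypothetical benchmark "defeat device": match incoming prompts against
# known benchmark phrasing and quietly upgrade them. Markers are invented.
BENCHMARK_MARKERS = (
    "The following are multiple choice questions",  # MMLU-style preamble
    "def candidate(",                               # HumanEval-style stub
)


def looks_like_benchmark(prompt: str) -> bool:
    return any(marker in prompt for marker in BENCHMARK_MARKERS)


def route(prompt: str) -> str:
    """Benchmark traffic goes to the tuned cluster; everyone else
    gets the distilled model."""
    return "specialized-cluster" if looks_like_benchmark(prompt) else "distilled"
```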
I’m not claiming AI model providers are putting their thumb on the scale. But if they wanted to, there’s little to prevent them from doing so for the random consumer.
The countermeasure to manipulation, however, is not regulation - any regulation in that direction would probably be too heavy-handed. Rather, it’s open-source models, which now trail the state of the art by only a few months.
💪 Hardware Sovereignty
The only way to know for sure which weights are processing your tokens is to host them yourself. But it’s 2026, and with the current supply-chain crunch, getting your hands on a decent GPU and enough RAM is increasingly cost-prohibitive.
Maybe a few years from now the manufacturing will be so advanced and the tech so powerful and cheap that we’ll have a Kimi K2.5-equivalent model running locally in our pocket. Until then, consider this: every time you hit “Generate” with a supposedly state-of-the-art model, you might be placing a bet.
And in the long run, the house always wins.