What if I told you modern AI can run on a computer from 1997?
You'd probably laugh. I did.
Then EXO Labs actually did it. They took a Pentium II from the late 90s (total eBay cost: £118.88), loaded a tiny Llama 2-style model onto it, and watched it generate coherent text at 39 tokens per second.
This says something important: The future of AI doesn't actually need billion-dollar data centers.
🔧 How They Actually Did It
The Team Behind It
This was EXO Labs. Their approach, dubbed "llama98.c," is based on llama2.c by AI researcher Andrej Karpathy, whom you might know from Tesla (where he led the Autopilot team) or OpenAI (where he was a founding member). llama2.c is about 700 lines of pure, dependency-free C that handles AI inference on minimal hardware. The EXO team adapted it for Windows 98, compiled it with the late-90s Borland C++ 5.02, and made it work.
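The appeal of llama2.c is that the entire forward pass reduces to plain loops over float arrays, with no libraries to port. A minimal sketch of that style, illustrative only and not the actual EXO or Karpathy source:

```c
#include <stddef.h>

/* Sketch of the kind of dependency-free kernel llama2.c is built from:
 * a plain matrix-vector multiply, W (d x n) times x (n), written so it
 * would compile on a 1990s C compiler. Hypothetical illustration. */
static void matmul(float *out, const float *x, const float *w,
                   size_t n, size_t d)
{
    size_t i, j;
    for (i = 0; i < d; i++) {
        float val = 0.0f;
        for (j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        out[i] = val;
    }
}
```

Everything in the transformer (attention projections, feed-forward layers) bottoms out in kernels like this, which is why the whole thing ports to ancient hardware: there is nothing to link against.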
The hard part wasn't running the model. It was the infrastructure. Getting files onto a machine with no USB support meant falling back to FTP. The PS/2 ports only cooperated with the keyboard in port 2 and the mouse in port 1. Everything was broken in ways that required real engineering to fix.
But they fixed it. And it worked.
Scale Your Team Without Scaling Complexity
You got AI running on a 1997 computer. Your payroll shouldn't be harder than that.
Deel handles hiring, payroll, and compliance across 150+ countries. One dashboard. Done. Trusted by 35,000+ companies.
→ Try Deel ⤵
AI in HR? It’s happening now.
Deel's free 2026 trends report cuts through all the hype and lays out what HR teams can really expect in 2026. You’ll learn about the shifts happening now, the skill gaps you can't ignore, and resilience strategies that aren't just buzzwords. Plus you’ll get a practical toolkit that helps you implement it all without another costly and time-consuming transformation project.
Why This Actually Matters
You could dismiss this as a stunt. A fun demo that proves nothing.
Except EXO Labs is pointing to something that's about to reshape AI infrastructure.
BitNet. It's a transformer architecture that uses ternary weights instead of floating-point numbers. Every parameter in the model is one of three values: negative one, zero, or one.
Here's the impact: A 7-billion-parameter BitNet model needs only 1.38 GB of storage, versus roughly 13 GB for a standard FP16 7B model. It runs on ordinary CPUs instead of requiring GPUs, at a fraction of the energy cost of full-precision models.
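The storage figure is plain arithmetic: a ternary weight carries log2(3), roughly 1.58, bits of information, against 16 bits for FP16. A back-of-envelope helper, purely for illustration:

```c
/* Back-of-envelope storage for n parameters at b bits per parameter.
 * Ternary weights need ~1.58 bits each (log2 of three values), FP16
 * needs 16. For 7B parameters that works out to ~1.38 GB vs ~14 GB,
 * in line with the ~13 GB commonly quoted for 7B checkpoints. */
static double storage_gb(double n_params, double bits_per_param)
{
    return n_params * bits_per_param / 8.0 / 1e9;
}
```

Calling `storage_gb(7e9, 1.58)` gives about 1.38 GB; `storage_gb(7e9, 16.0)` gives 14 GB. An order of magnitude, from nothing but the bit width.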
And it runs at near full-precision accuracy on language tasks.
The math is staggering. BitNet b1.58 delivers 2.71 times faster inference and 3.55 times less memory usage than standard FP16 models. Energy consumption drops by 55 to 82% per token depending on hardware.
A 30-billion parameter BitNet model has similar performance to a standard 7-billion parameter model but with dramatically lower resource requirements.
What This Means
1. Your Device Gets Smarter
Your laptop stops feeling slow. A 100-billion-parameter BitNet model runs at conversational speed (5-7 tokens per second) on a single CPU. No waiting. No lag.
2. Your Data Stays Put
Everything runs locally. No uploading. No cloud processing. Your files, your emails, your work stays on your machine. Faster. Simpler. More secure.
3. Hardware You Own Gets Better
That 2018 MacBook or old phone? Both can now run AI locally. No need to buy anything new. Existing devices just work better.
Faster. More private. On hardware you already have. That's the shift.
The Edge Computing Shift
This connects to something bigger happening in 2026.
Hardware companies are no longer racing on clock speeds. They're racing on efficiency. Specialized AI accelerators. Neuromorphic chips. Quantum co-processors. The goal: Enable devices to run sophisticated models locally without relying on remote data centers.
That's the shift. From cloud-centric AI to device-centric AI.
Meta's building this into consumer hardware. Google's embedding it in Pixel phones. Apple's running models locally on iPhones. And OpenAI is launching its first consumer hardware device in late 2026, designed by Jony Ive, that emphasizes local inference over cloud-dependent operations.
The Windows 98 demo is a proof of concept. The actual play is happening on billions of devices that already exist. Your phone. Your laptop. Your tablet. Your old MacBook gathering dust.
Each one becomes capable of running AI locally.
The EXO Labs Vision
The team's stated goal: "Build open infrastructure to train frontier models and enable any human to run them anywhere."
That's not just engineering. That's philosophy. The alternative is a future where AI runs exclusively in mega data centers owned by a handful of companies.
EXO Labs is actively recruiting for their Discord community. They have a Retro channel where people experiment with running LLMs on old Macs, Gameboys, Raspberry Pis, and other limited devices.
While AI Gets Cheaper, Art Gets Smarter
Computing power is democratizing. But fine art? Still appreciating.
Invest in authenticated artworks from legendary artists starting at $1,000. No expertise. No hassle. Just real value that outpaces inflation.
→ Try Masterworks ⤵
What investment is rudimentary for billionaires but ‘revolutionary’ for 70,571+ investors entering 2026?
Imagine this. You open your phone to an alert. It says, “you spent $236,000,000 more this month than you did last month.”
If you were the top bidder at Sotheby’s fall auctions, it could be reality.
Sounds crazy, right? But when the ultra-wealthy spend staggering amounts on blue-chip art, it’s not just for decoration.
The scarcity of these treasured artworks has helped drive their prices, in exceptional cases, to thin-air heights, without moving in lockstep with other asset classes.
The contemporary and post-war segments have even outpaced the S&P 500 overall since 1995.*
Now, over 70,000 people have invested $1.2 billion+ across 500 iconic artworks featuring Banksy, Basquiat, Picasso, and more.
How? You don’t need Medici money to invest in multimillion dollar artworks with Masterworks.
Thousands of members have gotten annualized net returns like 14.6%, 17.6%, and 17.8% from 26 sales to date.
*Based on Masterworks data. Past performance is not indicative of future returns. Important Reg A disclosures: masterworks.com/cd
What You Should Know
The Windows 98 demo works, but it's glacially slow on larger models. A 1-billion-parameter model runs at only 0.0093 tokens per second. That's not practical.
But BitNet changes the equation. With ternary weights, smaller models perform like much larger ones. You don't need billion-parameter models running locally. You need purpose-built, fine-tuned models.
This is the edge AI opportunity. Not running giant models on your phone. Running the right models for the specific task on any device.
Resources & Credits:
Source/Credit: Futura Sciences: Someone used a 1997 processor and proved that a modern AI can run on just 128MB of RAM
Research on BitNet b1.58: BitNet: Ternary Quantization for LLMs
Andrej Karpathy's work: karpathy.ai
On edge AI in 2026: Tech Industry Forum: 2026 in AI
Turn Your Amazon Store Into a Revenue Machine
You know how to optimize. Now optimize your affiliate network.
Levanta connects you with creators on Amazon and Walmart. Recruit. Manage. Scale. Some sellers hit $100k/month in affiliate revenue within 30 days.
→ Try Levanta ⤵
Pay for Results, Stop Paying for Traffic
Your problem isn’t traffic, it’s paying for useless clicks that never convert.
Levanta helps Amazon sellers shift from ad spend to performance based affiliate marketing so you only pay when a sale happens.
Stay sharp,
Better Every Day




