
Trillion-dollar disruptor: China's DeepSeek upends AI world overnight

DeepSeek's open-source model shows that the US way is not the only AI way

The US AI giants got a wake-up call this week, when fledgling Chinese firm DeepSeek wiped a record-breaking trillion dollars off the value of publicly traded heavyweights like Nvidia and rattled the likes of OpenAI. The technology's gatekeepers have good reason to be worried, as DeepSeek's R1 model shows that the costly existing roadmap is no longer the only way forward.

This game-changing event came on the back of the company's latest AI model – DeepSeek-R1 – being released for use on smartphones across the globe, following its desktop launch on January 10.

DeepSeek has been on our radar for a few weeks, after its chatbot V3 dropped on December 26 and was reported to perform as well as the leading US GPTs (generative pre-trained transformers) – something that few news outlets, including us, covered at the time. With the AI frontrunners – all US companies – developing new features at breakneck speed, it was hard to imagine that this unheard-of large language model (LLM), even one that looked impressive on paper and was fundamentally different in many ways, could rock the boat.

But that all changed overnight on January 27, 2025. As China woke up on the day before Lunar New Year's Eve, DeepSeek had become the #1 app in the AI/GPT world and decimated the stock prices of the industry's who's who: as well as Nvidia, scalps included Meta, Google's parent company Alphabet, Nvidia partner Oracle, plus many other energy and data center firms. Elon Musk dodged this bullet – only because X is no longer listed on the market.

While the market downturn is no doubt temporary, DeepSeek has permanently altered the AI timeline. Until now, the US has been so far ahead in the field that all we really expected to see were poor imitations of the 'gold standard' models. And this is why DeepSeek is so interesting: it has forged its own path, setting up China as a new player in what some are now calling a digital arms race.

The company's LLM was built using old Nvidia chips for a fraction of the cost invested by the likes of Anthropic and OpenAI on their respective models

A number of things make it so different: it was trained on older, cheaper chips, and it cut out a few of the costly steps that have, until now, been the standard route for chatbots. Because of this, renting the hardware needed to train the model reportedly cost just US$5.6 million, compared with an estimated $60 million for Llama 3.1 405B, which also used 11 times the computing resources; GPT-4 cost more than $100 million, and Microsoft has said it plans to spend $80 billion on AI development in 2025. R1 is also open source, rather than closely guarded and proprietary, which in turn helps DeepSeek navigate regional restrictions.

Overall, this has triggered a kind of existential crisis for the US-dominated industry – because what if a model could be produced for a fraction of the cost, trained more efficiently, and still be just as good, if not better?

"There are a few things to know about this one," said Casey Newton, one of the hosts of the Hard Fork podcast on January 10. "One is that it's really big; it has more than 680 billion parameters, which makes it significantly bigger than the largest model in Meta's Llama series, which I would say up to this point has been the gold standard for open models. That one has 405 billion parameters.

"But the really, really important thing about DeepSeek is that it was trained at a cost of US$5.5 million," he continued. "And so what that means is you now have an LLM that is about as good as the state-of-the-art [AIs] that was trained for a tiny fraction of what something like Llama or ChatGPT was trained for."

To understand why DeepSeek is so significant, you have to look at where it came from. Its developer, quantitative – or 'quant' – trader Liang Wenfeng, bought up thousands of Nvidia chips back in 2021 for a 'side project' to assist with his day job at the helm of High-Flyer, one of the Chinese market's largest hedge-fund companies. The 40-year-old financier used these chips to build algorithms and mathematical models to help predict market trends and steer investments; DeepSeek itself was only established in 2023.

“When we first met him, he was this very nerdy guy with a terrible hairstyle talking about building a 10,000-chip cluster to train his own models," one of Liang’s business partners told the Financial Times. "We didn’t take him seriously. He couldn’t articulate his vision other than saying: ‘I want to build this, and it will be a game changer.’ We thought this was only possible from giants like ByteDance and Alibaba.”

Less than two years on, the maker of those chips – Nvidia – would see $593 billion wiped from its market value overnight thanks to Liang's side project, the biggest single-day loss in US market history. (Incidentally, export of Nvidia's most advanced chips to China has since been restricted – yet DeepSeek-V3 was trained on cheaper, older Nvidia H800 hardware.)

What makes DeepSeek's R1 model such a game-changer is its unorthodox training (and, in turn, the money saved in the process). This fantastic explainer covers a recent research paper released by the company, which essentially details how DeepSeek bypassed the traditional supervised fine-tuning stage of LLM development and instead focused on the AI's "self-evolution through a pure reinforcement learning process."

“We demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start," the DeepSeek researchers wrote in the January paper. "Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data."
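To make the idea concrete, here's a minimal, purely illustrative sketch of reward-only training on a toy task. None of this is DeepSeek's actual code: the tiny softmax "policy," the arithmetic questions and the REINFORCE-style update are all assumptions chosen for brevity. The point is that the only training signal is whether the sampled final answer turns out to be correct; there are no supervised examples of reasoning steps.

```python
# Toy sketch of reward-only ("pure RL") training, loosely inspired by the idea
# in DeepSeek's R1 paper. Illustrative only -- not DeepSeek's code or scale.
import math, random

random.seed(0)

# A tiny "policy": one softmax over candidate answers per question.
questions = {"2+2": ["3", "4", "5"], "3*3": ["6", "9", "12"]}
correct = {"2+2": "4", "3*3": "9"}
logits = {q: [0.0] * len(opts) for q, opts in questions.items()}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

LEARNING_RATE = 0.5
for step in range(200):
    q = random.choice(list(questions))
    probs = softmax(logits[q])
    a = sample(probs)
    # The only feedback is a rule-based check of the final answer --
    # no labelled reasoning traces, no supervised fine-tuning stage.
    reward = 1.0 if questions[q][a] == correct[q] else 0.0
    # Baseline = expected reward under the current policy.
    baseline = sum(p for p, opt in zip(probs, questions[q]) if opt == correct[q])
    advantage = reward - baseline
    # REINFORCE update: push probability toward answers that scored well.
    for i in range(len(probs)):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[q][i] += LEARNING_RATE * advantage * grad

for q in questions:
    print(q, [round(p, 2) for p in softmax(logits[q])])
```

The real R1 recipe applies this kind of loop to a full-scale LLM, with the group-based advantage estimates and largely rule-based rewards described in the paper, but the spirit is the same: sample an answer, score it, and nudge the model toward whatever scored well.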

While this is unlikely to rock the world of everyday LLM users, who are most likely casually interacting with the likes of Google's Gemini or Anthropic's Claude, it stands as a defining moment in the development of this technology. It also brings us to another aspect of DeepSeek's business model that sets it apart – and has the industry rattled: access.

As Nature's Elizabeth Gibney wrote on January 23, DeepSeek-R1 is released as "open weight," which means it can be used as a tool for researchers to study and build on. In comparison, existing market-leading models are what researchers deem a "black box" – a closed-off system controlled by its developers. Open weights pave the way for scientists to harness an existing model for their own uses, rather than build one from the ground up.

"DeepSeek hasn’t released the full cost of training R1, but it is charging people using its interface around one-30th of what [Open AI's] o1 costs to run," Gibney noted. "The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model."

However, even as DeepSeek was triggering the market crash on January 27, it was fending off cyberattacks aimed at crashing its own servers.

"Due to large-scale malicious attacks on DeepSeek's services, we are temporarily limiting registrations to ensure continued service," the company posted on its status page. "Existing users can log in as usual. Thanks for your understanding and support."

At the time of writing, DeepSeek-R1 can still be downloaded and the site accessed, but new registrations are restricted to residents of China with a local phone number.

Meanwhile, a somewhat inevitable backlash is now under way, with countless news outlets, including Forbes, noting that DeepSeek-R1 is hampered by censorship, stonewalling questions that would invite criticism of China. Silicon Valley startup Perplexity AI – which currently has its sights on a merger with ByteDance-owned TikTok's US business – was briefly hosting an "uncensored" search engine powered by DeepSeek-R1, but this too has been taken offline.

Regardless of how this plays out in the coming days and weeks, one thing is certain: DeepSeek, in a few short weeks, has singlehandedly shifted the course of AI development.

“The emergence of DeepSeek is a significant moment in the AI revolution," said Professor Geoff Webb, from the Department of Data Science & AI at Monash University in Australia. "Until now it has seemed that billion-dollar investments and access to the latest generation of specialized Nvidia processors were prerequisites for developing state-of-the-art systems. This effectively limited control to a small number of leading US-based tech corporations."

He adds that if DeepSeek's claims are all true, "it means that the US tech sector no longer has exclusive control of the AI technologies, opening them to wider competition and reducing the prices they can charge for access to and use of their systems."

Webb then makes an important point that few people are talking about: The monopolization of AI by a handful of powerful players in the US – further consolidated by government-legislated export restrictions on crucial Nvidia hardware – essentially denies the rest of the world a stake in the most significant technological advancement since the internet.

“Looking beyond the implications for the stock market, current AI technologies are US-centric and embody US values and culture," he added. "This new development has the potential to create more diversity through the development of new AI systems. It also has the potential to make AI more accessible for researchers around the world, both for developing new technologies and for applying them in diverse areas including healthcare.”

9 comments
Trylon
This is a new Cold War. In 1970, Colossus: The Forbin Project posited dueling US-Soviet supercomputers. 55 years later, Russia is only a bit player in the field of AI. China is the big threat.
jimbo92107
This is wonderful news, especially for consumers and indie developers with smaller financing. The massive shortcuts that Liang Wenfeng discovered hint that the field of LLM development is fertile ground for radical optimization. How low can you go? How many shortcuts can you find? Simplify, simplify!
Daishi
The CEO of Scale AI went on CNBC and said DeepSeek probably has a bunch of H100s they have not disclosed publicly because of export restrictions, which means the quoted training cost may not be entirely accurate. They also used GPT to help train their model, which reduced their costs, but it wouldn't have been possible without a good (and expensive) model already existing to guide training. Cost reduction is also not new to the space, as the inference cost to achieve the same performance fell almost two orders of magnitude last year. Those optimizations are necessary because test-time scaling models are demanding, and it would be too expensive to apply test-time scaling to already computationally expensive models. Some of the methods will be adopted by others and we will see the next generation of test-time-scaled reasoning models soon.
matthew4506
Why is everything always China vs US ? We all live in the same tiny blue marble rolling through a vast lonely expanse of space. Wouldn’t it be better to work together to solve the worlds problems?
PeterP
DeepSeek blows ChatGPT out of the water. Here’s an example:
Q: How many stars are there within 1 light year of earth?
A: ChatGPT: There is only 1 star within 1 light year of Earth, and that star is Proxima Centauri. Proxima Centauri is part of the Alpha Centauri star system, which is the closest star system to our own. It is located 4.24 light years away from Earth, making it the closest known star to us. No other stars are within 1 light year of Earth—Proxima Centauri is the only one in that range.
A: DeepSeek: Within 1 light-year of Earth, there is only one star: our very own Sun (Sol). The next closest star system, Proxima Centauri, is about 4.24 light-years away, so no other stars fall within the 1-light-year radius of Earth.
ChatGPT is not only entirely incorrect, but its answer simply makes no sense. DeepSeek is simply spot-on.
PeterP
I would like to add a point of reference to my previous comment - I have been asking ChatGPT the same question every week for the last year. It has got it wrong every time. After it gives me a wrong answer, I give it the correct answer. It never learns...
christopher
Face-palm moment for Liang - pulls his quant chips off their job to train an AI model - misses the trillion-dollar stock-market opportunity he created!
There's loads of global AI players and models already, many of which are open; it isn't and wasn't ever going to be, "monopolization of AI by a handful of powerful players in the US". Here's some - Falcon 180B, Falcon 40B, ERNIE 4.0, ERNIE-ViLG, PLATO-XL, Hunyuan, Tongyi Qianwen, M6, PanGu-Σ, MindSpore, GLM-130B, SenseNova, Gorilla, Grok (yes, it's English, not USA), Gemini, Mistral 7B, Mixtral, Luminous, GPT-NO, Fujitsu AI, Sakura, NTT AI, HyperCLOVA, K-GPT, Yandex GPT, GigaChat, Hanooman, Bhashini AI, GraphCast, GPT-J.
I've "cut/paste" a bunch of math and programming questions that ChatGPT got wrong (in general, ChatGPT gets *every* programming question wrong in one way or another *every time*, along with most math questions too) and deepseek got all of them right first try, every time.
Damn those jealous DDoS sore-losers who've started making deepseek hard for us to use now... we're going to have to download it and run our own local copy now. (OK, so I know the price of the GPUs needed for that - the point here is that we *can* do that. Here's the URL for it: https://github.com/deepseek-ai/DeepSeek-LLM ).
Daishi
@christopher There are distilled versions of it on Hugging Face that you can run on local hardware with decent results, such as a 32B-parameter version of R1. LM Studio simplifies running them locally. Ollama has some distilled versions as well, such as 14B and 8B versions that will run on most laptops, but I have not seen any benchmarks for how they perform in that state vs other small models (like Llama) created as 8B to begin with.
Also, for others, this chart from Guido at a16z adds a lot of useful context on trends in model pricing. Training does not equal inference, but it does show that there are exponential improvements in cost vs performance, and DeepSeek is not the first time we have seen a step function in cost reduction. Generally this is viewed as good: https://i.imgur.com/amKcGzl.jpg
Daishi
Not sure if I can post this here, but this breakdown/FAQ on DeepSeek seems like a worthy read. It seems likely they did the final training run on older H800 chips (which are permitted under export restrictions) and also that they borrowed heavily from other existing models (OpenAI) to guide training. Using other models to guide training has been a bit of an open secret for a while, but people are big mad when a Chinese company does it. https://stratechery.com/2025/deepseek-faq/