
DeepSeek marks a potential shift in the AI competitive landscape

By Richard Clode

The concerns around China’s DeepSeek play into the growing debate on AI scaling challenges as well as the ROI of AI capex spend, and ultimately, concerns around the sustainability of AI capex beneficiary earnings and the prices the market is willing to pay.

We continue to expect ongoing strong spending on artificial intelligence (AI) capital expenditure as seen recently from announcements by Meta and the Stargate AI project. But we also think we need to be more selective in those AI capex beneficiaries, as well as think about the next phases of AI investment opportunity as this new tech wave develops.

We characterise infrastructure as the first phase of a new wave followed by platforms and then the software, applications and services.

We are approaching that pivot to the platform phase led by the cloud but still see longer-term investment opportunities in AI infrastructure as well.

The market has rapidly shifted from concerns on AI capex being too high, to now worrying that AI capex is going to collapse. Both cannot happen simultaneously, and the truth likely lies in between. Ultimately, we think these developments are positive for the long-term health and development of AI.

We continue to identify selective AI infrastructure beneficiaries and build our exposure to platforms that will benefit from more efficient AI compute, training models and inferencing.

What has DeepSeek achieved in terms of LLM innovation?

DeepSeek, the Chinese AI start-up and developer of open-source large language models (LLMs), launched its third-generation model, DeepSeek-V3, in December 2024. V3 is a mixture of experts (MoE) model that benchmarks well against the best LLMs developed in the West. This month it was followed by DeepSeek-R1, a reinforcement learning reasoning model that benchmarks well against OpenAI’s o1 generative pre-trained transformer (GPT).

V3 uses an MoE architecture, in which several smaller expert models work together, with 671 billion parameters in total but only 37 billion active for any given token during inferencing. V3 adds further innovations such as multi-head latent attention to reduce cache and memory usage, mixed-precision computation in FP8, and a re-architected post-training phase. An MoE model always looks more efficient, since only a portion of the total parameters is active for each token, so some of the gain is unsurprising. Even so, V3 appears more efficient than that alone would explain: roughly 10x versus peers, with a further 3-7x attributable to the other innovations.
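
For readers less familiar with MoE, the sketch below is a toy illustration of the routing idea: a router selects a small number of experts per token, so only a fraction of the model’s total parameters is active at any moment. The layer sizes, expert counts and the PyTorch code itself are illustrative assumptions, not DeepSeek-V3’s actual architecture.

```python
# Toy mixture-of-experts (MoE) layer: a router picks top_k of n_experts per
# token, so only a fraction of the total parameters runs for each token.
# All sizes here are illustrative, not DeepSeek-V3's real configuration.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                     # simple loop, not optimised
            for w, e in zip(weights[t], idx[t]):
                out[t] = out[t] + w * self.experts[int(e)](x[t])
        return out


layer = ToyMoELayer()
tokens = torch.randn(8, 64)                            # 8 dummy token embeddings
print(layer(tokens).shape)                             # torch.Size([8, 64])
```

With top_k=2 of 16 experts, only about one-eighth of the expert parameters run per token; V3’s 37 billion active parameters out of 671 billion is the same effect at a much larger scale.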

DeepSeek-R1 is claimed to be unique in having done away with supervised fine-tuning. There appears to be some genuine innovation there, even if many of the headline improvements come from more standard techniques, and there is a wider debate over how much of the work DeepSeek did itself versus how much came from leveraging open-source third-party LLMs.
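
As a rough illustration of what forgoing supervised fine-tuning can mean in practice, the hedged sketch below contrasts an imitation target with an outcome-based reward that scores whatever the model generates against a verifiable answer. The reward rules and function names are assumptions made for illustration, not DeepSeek’s published training recipe.

```python
# Illustrative contrast between a supervised fine-tuning target and an
# outcome-based reinforcement learning reward. Purely a sketch, not the
# actual R1 recipe.
import re


def sft_target(example: dict) -> str:
    """Supervised fine-tuning: the model is trained to reproduce this label."""
    return example["human_written_answer"]          # hypothetical field name


def rl_reward(generated: str, ground_truth: str) -> float:
    """Score the model's own output against a verifiable answer, no label imitation."""
    reward = 0.0
    match = re.search(r"answer:\s*(-?\d+)", generated.lower())
    if match:
        reward += 0.1                               # small bonus for a parseable format
        if match.group(1) == ground_truth:
            reward += 1.0                           # main bonus for being correct
    return reward


print(rl_reward("thinking... so the answer: 42", "42"))   # 1.1
print(rl_reward("I think it is probably 41", "42"))       # 0.0
```

A reinforcement learning loop then updates the model to increase the expected reward, rather than imitating a human-written completion token by token.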

3 key reasons why markets are concerned with DeepSeek

1. DeepSeek appears to have significantly lower training costs

DeepSeek claims to have trained V3 on only 2,048 Nvidia H800 GPUs over two months, which at roughly US$2 per GPU-hour explains the headline total cost of about US$5 million. That is a fraction of what Western hyperscalers are throwing at their LLM training (for example, it is around 9 per cent of the compute used for Meta’s Llama 3.1 405B model).
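
A quick back-of-the-envelope check, using only the figures cited above and assuming a flat US$2 per GPU-hour rental rate, shows roughly where the headline number comes from.

```python
# Rough sanity check on the headline training cost, using the figures above.
gpus = 2_048                  # Nvidia H800s cited by DeepSeek
days = 60                     # "two months", approximated
rate_per_gpu_hour = 2.0       # US$, assumed flat rental rate

gpu_hours = gpus * days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ~US${cost / 1e6:.1f}m")
# 2,949,120 GPU-hours -> ~US$5.9m, in the ballpark of the ~US$5m headline
```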

2. China can still compete despite US restrictions

DeepSeek shows that a Chinese company can compete with the best US AI companies, despite the current restrictions on Chinese access to advanced US semiconductor technology. This evokes memories of the generation of Russian coders who, given restrictions on PC time in post-Soviet Russia, invented ingenious ways to code. Has the same thing happened in China, where semiconductor restrictions have forced greater LLM architecture innovation, versus the US, which has largely relied on throwing the compute kitchen sink at the problem?

3. AI monetisation

DeepSeek is charging significantly less than OpenAI to use its models (about 20–40x lower), which plays into the AI monetisation concern given the extraordinary amounts of capex deployed in the West.
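
To put that pricing gap in context, the sketch below works through what a 20-40x difference would imply for a hypothetical workload. The per-token price and usage volume are assumptions chosen purely for illustration, not either provider’s actual rate card.

```python
# Illustrative effect of a 20-40x API price gap on a hypothetical workload.
incumbent_price = 60.0          # US$ per million output tokens, assumed
monthly_tokens_m = 500          # hypothetical workload: 500M output tokens a month

for factor in (20, 40):         # the price multiple cited above
    challenger_price = incumbent_price / factor
    monthly_saving = (incumbent_price - challenger_price) * monthly_tokens_m
    print(f"{factor}x cheaper: US${challenger_price:.2f}/M tokens, "
          f"saving ~US${monthly_saving:,.0f} a month")
```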

A notable AI force

The global AI ecosystem is taking note of DeepSeek’s developments. Despite only being founded two years ago, in 2023, DeepSeek benefits from the pedigree and backing of the team at quantitative fund High-Flyer Capital Management, as well as the success and innovation of its prior-generation models. This is why, although V3 was launched in December and R1 earlier this month, the market is only reacting now: R1’s reasoning capabilities are viewed as cutting edge.

Over the last weekend, DeepSeek also became the top free app on Apple’s App Store, overtaking ChatGPT. Silicon Valley investor Marc Andreessen posted that DeepSeek is “one of the most amazing and impressive breakthroughs I’ve ever seen”, high praise from a credible industry veteran. Comments like that have heightened the market’s concerns for the sustainability of AI capex and for associated companies such as Nvidia.

What do we make of all this?

New technology waves require innovation. Any new technology wave needs innovation to drive down the cost curve over time and enable mass adoption. We are witnessing multiple avenues of AI innovation addressing the scaling issues in training LLMs as well as more efficient inferencing. DeepSeek appears to bring some genuine innovation to the architecture of general-purpose and reasoning models. That innovation, and the driving down of costs, is key to unlocking AI and enabling mass adoption over the longer term.

Distillation. DeepSeek’s model leverages a technique called distillation, which is being pursued more broadly in the AI industry. Distillation refers to equipping smaller models with the abilities of larger ones, by transferring the learnings of the larger, teacher model into the smaller, student one. However, it is important to note DeepSeek’s distillation techniques are reliant on the work of others. Exactly how reliant is a key question the market is grappling with currently.
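
For the technically minded, the sketch below shows the standard form of distillation described in this paragraph: a small student model is trained to match the softened output distribution of a frozen teacher. The model sizes, temperature and one-step training loop are placeholder assumptions, not any lab’s actual recipe.

```python
# Minimal knowledge-distillation sketch: a small "student" learns to match
# the output distribution of a larger, frozen "teacher". Sizes and
# hyperparameters are placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d_teacher, d_student = 1000, 512, 128

teacher = nn.Sequential(nn.Embedding(vocab, d_teacher), nn.Linear(d_teacher, vocab))
student = nn.Sequential(nn.Embedding(vocab, d_student), nn.Linear(d_student, vocab))
teacher.eval()                                   # the teacher's weights stay frozen

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
temperature = 2.0                                # softens the teacher distribution

tokens = torch.randint(0, vocab, (32,))          # a dummy batch of token ids
with torch.no_grad():
    teacher_logits = teacher(tokens)

student_logits = student(tokens)
# KL divergence between the softened teacher and student distributions is the
# "transfer of learnings" from the large model into the small one.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.3f}")
```

In a full pipeline this step runs over many batches of teacher outputs, which is why the question of whose teacher outputs were used matters to the market.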

Take the capex number with a pinch of salt. Related to the above, the capex comparisons being made are apples to oranges. The US$5 million cited relates to just one training run, ignoring any prior training runs and the training of the larger teacher models, whether at DeepSeek or in the third-party open-source LLMs they were built on.

Open source innovation. As AI luminary Yann LeCun has noted, this is a victory for the open-source model of driving community innovation, with DeepSeek leveraging Meta’s Llama and Alibaba’s Qwen open-source models. Again, this is positive for the longer-term development of AI, driving and proliferating innovation. However, given the current state of geopolitics, one would probably expect greater US government scrutiny of other countries’ access to state-of-the-art US LLMs.

LLMs commoditising? It has long been our belief that monetising LLMs will be challenging over the longer term, given the volume of competition, including from open-source developers and from competitors looking to monetise in alternative ways. The DeepSeek announcement only brings greater scrutiny to the return on investment (ROI) of the huge capex that general-purpose foundational model developers are deploying.

Richard Clode, international portfolio manager, Janus Henderson