The Times Australia

How a small Chinese AI company is shaking up US tech heavyweights

  • Written by Tongliang Liu, Associate Professor of Machine Learning and Director of the Sydney AI Centre, University of Sydney



Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves through the tech community[1], with the release of extremely efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic.

Founded in 2023, DeepSeek has achieved its results[2] with a fraction of the cash and computing power of its competitors.

DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model[3] that can work with images as well as text.

So what has DeepSeek done, and how did it do it?

What DeepSeek did

In December, DeepSeek released its V3 model[4]. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5.

While these models are prone to errors and sometimes make up their own facts[5], they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests[6] of problem-solving and mathematical reasoning, they score better than the average human.

V3 was trained at a reported cost[7] of about US$5.58 million. This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million[8] to develop.

DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by NVIDIA[9]. Again, this is far fewer than other companies have used: some may have trained on up to 16,000[10] of the more powerful H100 chips.
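To put the reported figures side by side, the short calculation below works out the rough ratios. It uses only the numbers quoted above. The GPT-4 figure is a lower bound on development cost and the 16,000-chip figure is an upper bound, so the comparison is approximate and not strictly like-for-like.

```python
# Figures as reported in the article; treat the resulting ratios as rough.
v3_cost_usd = 5.58e6     # reported V3 training cost
gpt4_cost_usd = 100e6    # "more than US$100 million" to develop GPT-4
v3_chips = 2_000         # H800 GPUs DeepSeek says it used
other_chips = 16_000     # "up to 16,000" H100 GPUs at other companies

cost_ratio = gpt4_cost_usd / v3_cost_usd
chip_ratio = other_chips / v3_chips

print(f"Cost: GPT-4 was at least {cost_ratio:.1f} times more expensive")
print(f"Chips: other companies may have used up to {chip_ratio:.0f} times as many")
```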

On January 20, DeepSeek released another model, called R1[11]. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.

The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to perform at a similar level to OpenAI’s o1[12], released last year.

DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers.

This release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app[13] and triggering a massive price crash[14] in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker NVIDIA has lost around US$600 billion[15] in value.

How DeepSeek did it

DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be adopted by AI researchers more broadly.

The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.

However, predicting which parameters will be needed isn’t easy. DeepSeek used a new technique to do this, and then trained only those parameters. As a result, its models needed far less training than they would have under a conventional approach.
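The idea of using only some parameters for each input can be illustrated with a toy example. The sketch below is not DeepSeek's method, which the article does not detail. It assumes one common way of getting sparsity, in which a small "router" scores groups of parameters (called "experts" here) and only the top-scoring few are run. All names, sizes and weights are hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(x, router_weights, k):
    """Pick the k experts the router scores highest for input x."""
    scores = [w * x for w in router_weights]
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalise so the chosen experts' weights sum to 1
    return [(i, probs[i] / norm) for i in top]

def sparse_layer(x, experts, router_weights, k=2):
    """Run only the chosen experts and blend their outputs."""
    chosen = route_top_k(x, router_weights, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, chosen

rng = random.Random(0)
# Eight toy experts, each standing in for a large block of parameters
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
router_weights = [rng.gauss(0, 1) for _ in experts]  # hypothetical router

output, chosen = sparse_layer(3.0, experts, router_weights, k=2)
print(f"used {len(chosen)} of {len(experts)} experts: {[i for i, _ in chosen]}")
```

Here six of the eight experts are never evaluated for this input, which is where the savings in computation would come from.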

The other trick has to do with how V3 stores information in computer memory. DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.
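As a rough illustration of what compressing stored data can look like, the sketch below keeps a short "latent" vector for each token instead of two full-sized vectors, and rebuilds the full vectors only when they are needed. This is an assumption made for illustration: the article does not say which data is compressed or how. The sizes and the random matrices are hypothetical, and in a real model the matrices would be learned.

```python
import random

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_latent, n_tokens = 64, 8, 100  # hypothetical sizes

down = random_matrix(d_latent, d_model, rng)      # compress: 64 -> 8 numbers
up_key = random_matrix(d_model, d_latent, rng)    # rebuild a "key" vector
up_value = random_matrix(d_model, d_latent, rng)  # rebuild a "value" vector

cache = []
for _ in range(n_tokens):
    hidden = [rng.gauss(0, 1) for _ in range(d_model)]
    cache.append(matvec(down, hidden))  # store only the small latent vector

# When the data is read back, full-sized vectors are rebuilt from the latent
key = matvec(up_key, cache[0])
value = matvec(up_value, cache[0])

full_floats = n_tokens * 2 * d_model      # storing key and value in full
compressed_floats = n_tokens * d_latent   # storing one latent per token
print(f"numbers stored: {compressed_floats} instead of {full_floats}")
```

With these made-up sizes the cache holds 16 times fewer numbers, at the price of a little extra arithmetic each time the data is read.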

DeepSeek has shaken up the multi-billion dollar AI industry. Robert Way/Shutterstock[16]

What it means

DeepSeek’s models and techniques have been released under the free MIT License[17], which means anyone can download and modify them.

While this may be bad news for some AI companies – whose profits might be eroded by the existence of freely available, powerful models – it is great news for the broader AI research community.

At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like myself who are based at universities (or anywhere except large tech companies) have had limited ability to carry out tests and experiments.

More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.

For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.

For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance overall, or simply models that are more efficient.

References

  1. ^ shockwaves through the tech community (www.theverge.com)
  2. ^ achieved its results (www.technologyreview.com)
  3. ^ a model (techcrunch.com)
  4. ^ V3 model (arxiv.org)
  5. ^ sometimes make up their own facts (arxiv.org)
  6. ^ some tests (www.anthropic.com)
  7. ^ reported cost (www.scmp.com)
  8. ^ more than US$100 million (www.wired.com)
  9. ^ H800 GPUs made by NVIDIA (www.reuters.com)
  10. ^ up to 16,000 (www.nytimes.com)
  11. ^ called R1 (arxiv.org)
  12. ^ OpenAI’s o1 (openai.com)
  13. ^ V3-powered chatbot app (www.theguardian.com)
  14. ^ massive price crash (www.ft.com)
  15. ^ has lost around US$600 billion (www.abc.net.au)
  16. ^ Robert Way/Shutterstock (www.shutterstock.com)
  17. ^ MIT License (opensource.org)

Read more https://theconversation.com/deepseek-how-a-small-chinese-ai-company-is-shaking-up-us-tech-heavyweights-248434
