The AI Race, The Stargate Project, DeepSeek Tsunami, Open Source Models and why should I care?

Caio Moreno
4 min read · Jan 28, 2025


Dear all,

First of all, the goal of this post is to explain this as simply as I can, so please do not expect deep technical details; the aim is to reach a broader audience, not just the Gen AI experts.

We have all probably seen the news over the last five days about the AI race, the Stargate Project and the DeepSeek tsunami. But why should you care?

The last few days were mind-blowing: we saw President Trump announce a $500 billion AI infrastructure investment project called the Stargate Project, we saw the Chinese AI lab DeepSeek release DeepSeek-V3 as an open-source model, and we saw the astonishing results of DeepSeek-V3 compared to the status quo, results that were enough to drag down the value of many US tech stocks.

Open Source AI Models and the AI Open Source Race

As with Meta Llama 3.1, 3.2 and 3.3, the great thing about DeepSeek-V3 is that it is open source.

Not long ago, Meta Llama 3.3 was the best open-source AI model, but now we have a new kid on the block challenging companies like Meta and OpenAI.

TL;DR

In the end, we just saw a group of amazing data scientists, AI engineers and other experts with very strong mathematical and engineering backgrounds, plus hedge fund money, a lot of GPUs and the backing of the Chinese government, find a better way to build a new AI model, and this is challenging the entire AI ecosystem and US AI dominance.

Also, they released the new AI model and its results as open source, so we now have people all around the world studying this research and trying to replicate and improve it. See the paper for more details here.

Let’s learn more about it with some super interesting videos from CNBC, BBC and CNN, and dig deeper into DeepSeek-V3.

Here are some videos you should watch to understand the impact of this on your own personal life.

How China’s New AI Model DeepSeek Is Threatening U.S. Dominance

How DeepSeek achieved its AI breakthrough, Benchmark partner Chetan Puttagunta explains

Trump announces a $500 billion AI infrastructure investment in the US

China’s DeepSeek triggers global tech sell-off

A shocking Chinese AI advancement called DeepSeek is sending US stocks plunging

Marc Andreessen warns Chinese ChatGPT rival DeepSeek is ‘AI’s Sputnik moment’

What is DeepSeek?

“We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.”

Learn more:
https://github.com/deepseek-ai/DeepSeek-V3
https://www.deepseek.com/
https://arxiv.org/html/2412.19437v1
https://arxiv.org/pdf/2412.19437v1
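
The abstract above says that only 37B of the 671B parameters are activated for each token. That is the core Mixture-of-Experts idea: a small router picks a handful of experts per token, so most of the model's weights sit idle on any single step. To make that concrete, here is a tiny, illustrative top-k routing layer in PyTorch; the sizes, the class name and the `top_k` value are made up for this example and are not DeepSeek-V3's actual architecture.

```python
# A minimal, illustrative Mixture-of-Experts (MoE) layer with top-k routing.
# Sizes and names are made up for illustration; this is NOT DeepSeek-V3's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                          # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(4, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([4, 64])
```

Even though eight experts are defined here, only two run for each token, which is why an MoE model's total parameter count and its activated parameter count can differ so dramatically.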

DeepSeek founder meets Chinese Premier

Learn more about DeepSeek founder:

https://www.theguardian.com/technology/2025/jan/28/who-is-behind-deepseek-and-how-did-it-achieve-its-ai-sputnik-moment

Try DeepSeek-V3 on Hugging Face

https://huggingface.co/deepseek-ai/DeepSeek-V3
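
If you want to experiment programmatically, the snippet below is a generic Hugging Face `transformers` loading sketch, not an official recipe: a model of this size needs serious multi-GPU serving infrastructure (the model card describes the recommended deployment options), so treat this purely as an illustration of the API shape.

```python
# Generic Hugging Face transformers sketch (illustrative only).
# DeepSeek-V3 is far too large for a single consumer GPU; check the model card
# for the recommended deployment options before trying this for real.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # the repo ships custom model code
    device_map="auto",        # shard across whatever accelerators are available
)

inputs = tokenizer("Explain Mixture-of-Experts in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```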

Try DeepSeek R1 on Ollama

https://ollama.com/library/deepseek-r1
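
If you have Ollama installed locally, one simple way to call the model from Python is the official `ollama` package. A minimal sketch, assuming the Ollama daemon is running and you have already pulled the model with `ollama pull deepseek-r1`:

```python
# Minimal sketch: chat with a locally pulled DeepSeek model through Ollama.
# Assumes the Ollama daemon is running and `ollama pull deepseek-r1` was done first.
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```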

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

Not long ago, Databricks presented MegaBlocks: Efficient Sparse Training with Mixture-of-Experts.

It is great to see that DeepSeek V3 is also a Mixture-of-Experts (MoE) language model.

Learn more about MegaBlocks:
https://arxiv.org/abs/2211.15841
https://github.com/databricks/megablocks

Is this a great example of doing more with less?

Great interview (in Spanish) with Enrique Dans, Innovation Professor at IE University, about DeepSeek and the role of innovation in environments with fewer resources.

DeepSeek fallout with Databricks CEO: TechCheck Livestream

DeepSeek, LLMs, NVIDIA, Scaling Laws and some surprising tech insights from a CEO: https://www.youtube.com/live/SGZdcL3HLrc

Generative AI / LLMs using Databricks Data Intelligence Platform

Here is a presentation about how you could use Databricks to create your own LLM from scratch, and how your company could benefit from building a new LLM with your own enterprise data in a secure, enterprise-grade way.

Learn more:
https://github.com/drcaiomoreno/Generative-AI-Using-Databricks

DeepSeek R1 on Databricks

https://www.databricks.com/blog/deepseek-r1-databricks

https://github.com/drcaiomoreno/deepseek-r1-on-databricks
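
Once a model like DeepSeek-R1 is registered and served on Databricks (the blog post above walks through that), one common way to query the serving endpoint is through its OpenAI-compatible interface. This is only a sketch: the endpoint name and workspace URL below are placeholders, so check the blog post and your own workspace for the exact values.

```python
# Hypothetical sketch: query a Databricks model serving endpoint through its
# OpenAI-compatible interface. The endpoint name and workspace URL are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

completion = client.chat.completions.create(
    model="deepseek-r1-endpoint",  # the serving endpoint name you created
    messages=[{"role": "user", "content": "Summarize what DeepSeek-R1 is."}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```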
