A year after R1 rattled Silicon Valley, DeepSeek is back with something far bigger. Meet V4-Pro and V4-Flash — the open-source models that are closing the gap with GPT-5 and Gemini, at a fraction of the price.
On Friday, April 24, 2026 — almost exactly one year after DeepSeek's R1 model upended the global AI industry — the Hangzhou-based startup did it again. DeepSeek quietly dropped two preview models on Hugging Face: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Within hours, the AI community was buzzing. The benchmarks were remarkable. The pricing was borderline shocking. And the geopolitical implications were hard to ignore.
So what exactly is DeepSeek V4, why does it matter, and should you switch from ChatGPT or Claude? Let's break it all down.
| Stat | Value |
|---|---|
| Total parameters (V4-Pro) | 1.6T — world's largest open-weight model |
| Context window | 1M tokens — fit entire codebases in one prompt |
| Price vs Claude Opus 4.7 | 7× cheaper at near-identical coding benchmarks |
What is DeepSeek V4?
DeepSeek V4 is the fourth-generation flagship model family from DeepSeek, a Hangzhou-based AI lab that first made waves in January 2025 with R1 — a reasoning model that matched OpenAI's o1 at a fraction of the cost, and briefly crashed Nvidia's stock price in the process.
V4 comes in two variants, both built on a Mixture-of-Experts (MoE) architecture and released under the permissive MIT License. This means developers can use, modify, and commercially deploy the models with almost no restrictions.
V4-Pro
1.6 trillion total parameters, 49 billion active per token. The biggest open-weight model in existence — larger than Moonshot's Kimi K2.6 (1.1T) and more than double DeepSeek V3.2 (671B).
V4-Flash
284 billion total parameters, 13 billion active. Optimized for speed, cost, and efficiency — and surprisingly competitive with Pro on most benchmarks.
Both models support a 1 million token context window, enough to process an entire software codebase, a legal document, or a full-length novel in a single prompt. That alone is a major practical leap forward.
"DeepSeek V4 is the most powerful open-source model available today — and it runs on Chinese chips."
The Breakthrough: Hybrid Attention Architecture
The headline technical innovation in V4 is the Hybrid Attention Architecture — a mechanism that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). In plain English, this allows the model to handle very long conversations and documents far more efficiently than its predecessors.
At the 1-million-token context setting, V4-Pro requires only 27% of the compute (FLOPs) and 10% of the memory (KV cache) that DeepSeek V3.2 needed for the same task. V4-Flash pushes those numbers even lower: just 10% of the FLOPs and 7% of the cache.
| Model | FLOPs vs V3.2 | KV Cache vs V3.2 |
|---|---|---|
| V4-Pro | 27% | 10% |
| V4-Flash | 10% | 7% |
This is a huge engineering achievement — making frontier-class AI dramatically more accessible for real-world deployment.
How Does It Compare to GPT-5 and Gemini?
Here's where things get interesting. DeepSeek's own benchmarks position V4-Pro as competitive with — and in some cases superior to — major closed-source rivals from OpenAI and Google.
| Model | SWE-bench Verified | Price (output/M tokens) | Open Source? |
|---|---|---|---|
| DeepSeek V4-Pro | 80.6% | $3.48 | ✅ MIT |
| Claude Opus 4.7 | ~80.8% | $25.00 | ❌ Closed |
| GPT-5.4 | ~82% | $30.00 | ❌ Closed |
| Gemini 3.1 Pro | ~81% | $18.00 | ❌ Closed |
| DeepSeek V4-Flash | 79.0% | $0.28 | ✅ MIT |
The math is staggering. V4-Pro scores 80.6% on SWE-bench Verified — a real-world software engineering benchmark — while costing $3.48 per million output tokens versus Claude Opus 4.7's $25. That's a 7× price gap at near-identical coding performance. For enterprises running large-scale AI workloads, that difference is transformative.
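That per-token gap compounds quickly at scale. As a back-of-envelope illustration using only the output-token prices from the table above (the 500M-token monthly workload is a hypothetical figure, not from the release; real bills also include input tokens and any caching discounts):

```python
# Monthly output-token cost at the per-million prices quoted above.
# Treat this as a rough floor on the cost gap, not a billing quote.
PRICE_PER_M_OUTPUT = {
    "deepseek-v4-pro": 3.48,
    "deepseek-v4-flash": 0.28,
    "claude-opus-4.7": 25.00,
}

def monthly_output_cost(model: str, output_tokens: int) -> float:
    """Dollar cost of generating `output_tokens` of output in one month."""
    return PRICE_PER_M_OUTPUT[model] * output_tokens / 1_000_000

# Hypothetical workload: 500M output tokens per month.
for model in PRICE_PER_M_OUTPUT:
    print(f"{model}: ${monthly_output_cost(model, 500_000_000):,.2f}/month")
```

At that volume, the same workload costs roughly $1,740 on V4-Pro versus $12,500 on Claude Opus 4.7 — the kind of difference that shows up on a budget line, not just a benchmark chart.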
DeepSeek acknowledges that V4 does trail the frontier on some benchmarks. On Humanity's Last Exam (HLE), an expert-level cross-domain reasoning test, V4-Pro sits at 37.7% versus Claude at 40.0% and Gemini-3.1-Pro at 44.4%. For general knowledge retrieval, Google holds a clear edge. But on coding and mathematics — arguably the highest-value use cases for most developers — V4-Pro is essentially world-class.
The Geopolitical Angle: Chinese Chips, No Nvidia
Perhaps the most significant subplot in this release isn't the benchmarks — it's the hardware. DeepSeek optimized V4 for Huawei's Ascend 950 AI chips, and notably did not give Nvidia or AMD early access for optimization. That's a reversal of standard industry practice, where Western chipmakers are typically the first to receive model weights.
Huawei has confirmed that its Ascend supernode supports DeepSeek V4 out of the box. If V4 can run at scale on Chinese-made chips without US-manufactured GPUs — which have been subject to export restrictions since October 2022 — it signals a meaningful step toward a self-contained Chinese AI stack: Chinese weights, Chinese chips, Chinese inference software.
The timing is also pointed. DeepSeek released V4 just one day after the US government accused China of stealing American AI labs' intellectual property on an "industrial scale." DeepSeek itself has been accused by Anthropic and OpenAI of "distilling" — essentially copying — their models. The race, clearly, is intensifying.
How to Access DeepSeek V4 Today
V4 is available right now through three channels.
1. Web Interface
Visit chat.deepseek.com — Expert Mode maps to V4-Pro, Instant Mode to V4-Flash.
2. API Access
Use the model strings `deepseek-v4-pro` and `deepseek-v4-flash` via DeepSeek's API.
⚠️ Migration notice: The existing `deepseek-chat` and `deepseek-reasoner` endpoints will be fully retired after July 24, 2026. Developers should migrate now to avoid disruption.
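For orientation, here is a minimal sketch of what an API request looks like. It assumes the V4 preview keeps DeepSeek's existing OpenAI-compatible chat-completions interface; the model strings are the ones above, and the endpoint path matches DeepSeek's currently documented API.

```python
import json

def build_v4_request(prompt: str, model: str = "deepseek-v4-flash") -> str:
    """Build a chat-completions request body for DeepSeek's API."""
    body = {
        "model": model,  # or "deepseek-v4-pro"
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(body)

# POST this JSON to https://api.deepseek.com/chat/completions with an
# Authorization: Bearer <your API key> header — or point any
# OpenAI-compatible client library at the same base URL.
print(build_v4_request("Explain the Hybrid Attention Architecture."))
```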
3. Self-Host via Hugging Face
Open weights are available under the MIT license for anyone who wants to run their own inference stack.
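Before committing to self-hosting, it's worth sizing the hardware. Here is a rough sketch of weight storage alone at two illustrative quantization levels — the precisions are assumptions for illustration, not figures from DeepSeek, and KV cache plus activation memory add substantial overhead on top. (Note that MoE active-parameter counts reduce compute per token, not the storage footprint of the full weights.)

```python
def weight_storage_gb(params_billion: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights at the given precision."""
    # (params_billion * 1e9 params) * (bytes/param) / (1e9 bytes/GB)
    return params_billion * bytes_per_param

MODELS = {"V4-Pro": 1600, "V4-Flash": 284}   # total parameters, in billions
PRECISIONS = {"FP8": 1.0, "INT4": 0.5}       # assumed bytes per parameter

for name, params in MODELS.items():
    for label, bpp in PRECISIONS.items():
        print(f"{name} @ {label}: ~{weight_storage_gb(params, bpp):,.0f} GB")
```

Even at aggressive INT4 quantization, V4-Pro's weights alone occupy around 800 GB — multi-node territory — while V4-Flash is far more approachable for a single well-equipped server.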
Should You Switch from GPT or Claude?
If your primary use cases are coding, math, and software engineering — and cost efficiency matters to you — DeepSeek V4 deserves serious consideration.
Switch if you:
- Primarily work on coding, math, or software engineering
- Care about cost efficiency at scale
- Want open weights you can self-host and modify
- Run large-scale API workloads where $3.48 vs $25/M tokens matters
Stick with closed models if you need:
- Deep factual knowledge retrieval
- Multimodal inputs (images, audio, video)
- Absolute frontier reasoning on expert-level tasks
- A fully validated, production-stable release (V4 is still preview)
DeepSeek itself estimates it trails state-of-the-art frontier models by "approximately 3 to 6 months," and independent benchmark evaluations are still underway. R1's claims were validated by third-party testing within days of release; whether V4 holds up to the same scrutiny should become clear soon.
"DeepSeek V4 is going to be very competitive against its US rivals."
— Lian Jye Su, Chief Analyst, Omdia
The Bottom Line
DeepSeek V4 is a landmark release for open-source AI. Whether you're a developer looking to cut costs, a researcher wanting unrestricted access to frontier-class weights, or just someone watching the US-China AI race unfold in real time — this matters. The gap between open and closed AI is narrowing fast, and DeepSeek is the biggest reason why.
One year after R1 shocked the world, DeepSeek has done it again. And this time, Silicon Valley had time to prepare — and it still may not be enough.
Our tech desk covers the latest developments in artificial intelligence, open-source models, and the global AI race. This report was compiled from DeepSeek's official release notes, Hugging Face model cards, and independent analyst commentary.
