The DeepSeek assistant arrived on the App Store on January 11 and has since climbed to first place in Apple’s store in the United States, ahead of OpenAI’s ChatGPT: a milestone for an application that has only just reached the market and is competing with a chatbot that has been the most popular for more than two years.
DeepSeek is a generative artificial intelligence assistant that the Chinese firm of the same name launched on January 11 on the App Store, where it is offered free of charge. In addition to answering questions in a conversational format, it can browse the web to offer up-to-date responses, quickly summarize text documents and use reasoning to solve complex problems.
At its base is the DeepSeek V3 model, launched in December. It was trained with 671 billion parameters using a Mixture of Experts (MoE) architecture, which divides an AI model into smaller neural networks that act separately, as if they were different experts.
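As an illustration of how such routing works, here is a minimal toy sketch of an MoE layer in PyTorch. It is not DeepSeek’s implementation (DeepSeekMoE adds refinements such as shared experts), and all names and dimensions here are arbitrary:

import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    # A router scores the experts for each token and only the top-k
    # experts run, so most parameters stay inactive for any given token.
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.k = k

    def forward(self, x):  # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)      # expert affinity per token
        weights, idx = gates.topk(self.k, dim=-1)   # keep only the top-k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoELayer()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])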
It also has 37 billion activated parameters for each token, as those responsible explain in the project’s GitHub repository. They have likewise resorted to the Multi-head Latent Attention (MLA) mechanism to “achieve efficient inference and cost-effective training.”
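MLA’s central trick is to cache a compressed latent vector per token instead of full keys and values, reconstructing them on the fly. The single-head sketch below conveys only that idea; the real mechanism is multi-headed and treats positional encoding separately, and the sizes here are illustrative:

import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    # Single-head attention that compresses keys/values into a small
    # latent per token; caching the latent instead of full K and V
    # shrinks the inference-time KV cache, which is MLA's core idea.
    def __init__(self, d_model=64, d_latent=16):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # this is what would be cached
        self.k_up = nn.Linear(d_latent, d_model)     # reconstruct keys
        self.v_up = nn.Linear(d_latent, d_model)     # reconstruct values

    def forward(self, x):  # x: (seq_len, d_model)
        q = self.q_proj(x)
        latent = self.kv_down(x)                     # (seq_len, d_latent)
        k, v = self.k_up(latent), self.v_up(latent)
        attn = (q @ k.T / k.shape[-1] ** 0.5).softmax(dim=-1)
        return attn @ v

print(LatentKVAttention()(torch.randn(12, 64)).shape)  # torch.Size([12, 64])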
The technology company employed 14.8 trillion “diverse and high-quality” tokens, together with supervised fine-tuning and reinforcement learning phases. Its engineers also state that each trillion tokens required 3.7 days of training on a cluster of 2,048 NVIDIA H800 GPUs, bringing total training to 2.788 million GPU hours at a total cost of 5.576 million dollars.
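Those figures are internally consistent. A quick back-of-the-envelope check in Python (the 2-dollar hourly GPU rental price below is simply the one implied by dividing the quoted cost by the quoted hours):

gpus = 2048                  # NVIDIA H800 cluster size
days_per_trillion = 3.7      # days of training per trillion tokens
tokens = 14.8                # trillion tokens in the corpus

pretraining_hours = gpus * days_per_trillion * 24 * tokens
print(f"{pretraining_hours / 1e6:.3f} M GPU hours")  # ~2.692 M for pre-training alone

# The reported 2.788 M GPU hours also covers later training stages;
# at the implied $2 per GPU hour this yields the quoted budget.
print(f"${2.788e6 * 2 / 1e6:.3f} M")                 # $5.576 M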
Although this remains a large sum of money, the training cost is far from the 78 million dollars of GPT-4 or the 191 million dollars of Google’s Gemini Ultra, as stated in the Artificial Intelligence Index Report 2024.
As its creators claim, the model “surpasses other open-source models and achieves performance comparable to that of the leading closed-source models.” Thus, in the evaluation of language understanding across a variety of tasks (MMLU-Pro), DeepSeek V3 reaches a score of 75.9, compared to 78.0 for Claude 3.5 Sonnet, 72.6 for GPT-4o and 73.3 for Llama 3.1 405B.
In the evaluation of the ability to answer complex postgraduate-level questions (GPQA Diamond), DeepSeek V3 obtained a score of 59.1, below Claude 3.5 Sonnet (65.0) but above GPT-4o (49.9), Qwen2.5 72B (49.0) and Llama 3.1 405B (51.1).
It also takes second position in the analysis of the ability to solve real-world software problems (SWE-bench), where it reaches a score of 42.0, behind only Claude 3.5 Sonnet and ahead of GPT-4o (38.8), Llama 3.1 405B (24.5) and Qwen2.5 72B (23.8).
On the other hand, it stands out in the mathematics problem tests (MATH-500), where it obtains 90.2 points, while Claude 3.5 Sonnet reaches 78.9; Qwen2.5 72B, 80.0; GPT-4o, 74.6; and Llama 3.1 405B, 73.8. It also leads in competition mathematics with AIME 2024, with a score of 39.2, followed by Qwen2.5 72B and Llama 3.1 405B (both 23.3), Claude 3.5 Sonnet (16.0) and GPT-4o (9.3).
In programming, in the Codeforces test, DeepSeek V3 reaches 51.6 points; Qwen2.5 72B, 24.8; Llama 3.1 405B, 25.3; GPT-4o, 23.6; and Claude 3.5 Sonnet, 20.3.
New reasoning models
Last week, the Chinese company presented a new family of reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, the first of which was trained through large-scale reinforcement learning without supervised fine-tuning as a preliminary step, as explained in the research paper published on arXiv.org.
The second, by contrast, also incorporates multi-stage training and cold-start data before reinforcement learning, to overcome problems of readability and language mixing. Thanks to this, its developers say it achieves reasoning performance comparable to that of OpenAI’s o1.
“Our goal is to explore the potential of LLMs [large language models] to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure reinforcement learning process,” they explain.
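The paper pairs this pure reinforcement learning with rule-based rewards (chiefly answer accuracy and output format) and with GRPO, an algorithm that scores each sampled answer against the others in its group rather than training a separate value model. Below is a minimal sketch of that group-relative scoring, with purely hypothetical reward values:

import numpy as np

def group_relative_advantages(rewards):
    # GRPO-style advantage: standardize each answer's reward against
    # the mean and spread of its own group of sampled answers.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Hypothetical rule-based rewards for 6 answers sampled for one math
# problem: 1.0 when the final answer is correct, 0.0 otherwise.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0]).round(2))
# [ 1. -1. -1.  1.  1. -1.] -> correct answers reinforced, wrong ones penalized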