DeepSeek R1: open source reasoning model
2025-01-29
Last week, Chinese AI company DeepSeek released its highly anticipated open-source reasoning models, dubbed DeepSeek R1. DeepSeek R1 models, both distilled and full size, are available to run locally in LM Studio on Mac, Windows, and Linux.
DeepSeek R1 distilled into Qwen 7B (MLX, 4-bit) solving an algebra question, 100% offline on an M1 Mac.
If you've gone online in the last week or so, there's little chance you missed the news about DeepSeek.
DeepSeek R1 models represent a significant and exciting milestone for openly available models: you can now run "reasoning" models, similar in style to OpenAI's o1 models, on your local system. All you need is enough RAM.
The release from DeepSeek included the full-size 671B parameter DeepSeek-R1 model as well as smaller distilled models based on Llama3 and Qwen2.5. You can run these models locally in LM Studio, provided you have enough RAM.
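As a rough back-of-the-envelope check of what "enough RAM" means, the weights of a quantized model take approximately parameter count × bits-per-weight ÷ 8 bytes. This is a sketch, not an exact figure: real usage also depends on context length, KV cache, and runtime overhead.

```python
def approx_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough memory footprint of model weights alone, in GB.

    Ignores KV cache and runtime overhead, so treat this as a lower bound.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model at 4-bit quantization needs roughly 3.5 GB for weights alone.
print(round(approx_weight_memory_gb(7, 4), 1))    # → 3.5
# The full 671B model at 4-bit needs roughly 335.5 GB.
print(round(approx_weight_memory_gb(671, 4), 1))  # → 335.5
```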
Reasoning models are trained to "think" before providing a final answer, using a technique called chain-of-thought (CoT). CoT encourages a model to break a complex problem down into smaller, more manageable steps, arriving at the final answer through a series of intermediate steps rather than attempting to solve the problem in one go. DeepSeek's CoT is contained in <think>...</think> tokens.
When asked a non-trivial question, DeepSeek models will start their response with a <think> token. The content that follows often reads like a long stream of consciousness, in which the model works through the problem step by step. After the closing </think> token, the model starts generating regular content: the final answer to the question. The content after the </think> token is directly informed by the content in the <think> section.
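The structure above can be handled with a small helper that splits a response into its reasoning trace and final answer. This is a sketch; it assumes the <think>...</think> block appears at most once, at the start of the output.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a DeepSeek R1-style response into (reasoning, answer).

    Returns an empty reasoning string if no <think>...</think> block is found.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = response.find(open_tag)
    end = response.find(close_tag)
    if start == -1 or end == -1:
        return "", response.strip()
    reasoning = response[start + len(open_tag):end].strip()
    answer = response[end + len(close_tag):].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>Botanically, a tomato develops from a flower...</think>"
    "Yes, botanically tomatoes are fruits."
)
print(answer)  # → Yes, botanically tomatoes are fruits.
```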
Below is output from DeepSeek-R1-Distill-Qwen-7B that demonstrates its ability to "think" in order to answer the question "Are tomatoes fruits?" holistically. The thinking section is wrapped in <think>...</think> tags.
User's question:
Are tomatoes fruits?
Model's response:
According to several popular reasoning benchmarks like AIME 2024, MATH-500, and CodeForces, the open-source flagship 671B parameter DeepSeek-R1 model performs comparably to OpenAI's full-sized o1 reasoning model. The smaller DeepSeek-R1 "distilled" models perform comparably to OpenAI's o1-mini reasoning models.
Source: @deepseek_ai on X
"Distilling" DeepSeek-R1 means: Taking smaller "dense models" like Llama3 and Qwen2.5, and fine-tuning them using artifacts generated by a larger model, with the intention to instill in them capabilities that resemble the larger model.
DeepSeek did this by curating around 800k (600k reasoning, 200k non-reasoning) high-quality generations from DeepSeek-R1, and training Llama3 and Qwen2.5 models on them (Source: DeepSeek's R1 publication).
This is an efficient technique to "teach" smaller, pre-existing models how to reason like DeepSeek-R1.
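A minimal sketch of the data-preparation half of that recipe: turning (prompt, teacher generation) pairs into chat-format SFT records. The `messages` JSONL layout below is the common convention for SFT datasets, not a file format DeepSeek has published.

```python
import json

def to_sft_record(prompt: str, teacher_output: str) -> str:
    """Convert one teacher generation into a chat-format SFT training line."""
    record = {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher_output},
        ]
    }
    return json.dumps(record)

# Each curated DeepSeek-R1 generation becomes one training example
# for the smaller Llama3 / Qwen2.5 student model.
line = to_sft_record("Are tomatoes fruits?", "<think>...</think>Yes, botanically.")
print(line)
```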
DeepSeek-R1 was largely trained using unsupervised reinforcement learning. This is an important achievement, because it means humans did not have to curate as much labeled supervised fine-tuning (SFT) data.
DeepSeek-R1-Zero, DeepSeek-R1's predecessor, was fine-tuned using only reinforcement learning. However, it had issues with readability and language mixing.
DeepSeek eventually arrived at a multi-stage training pipeline for R1 that mixed SFT and RL techniques, retaining the novelty and cost benefits of RL while addressing the shortcomings of DeepSeek-R1-Zero.
More detailed info on training can be found in DeepSeek's R1 publication.
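Per the publication, the RL stages rely chiefly on rule-based rewards rather than a learned reward model: an accuracy reward (is the final answer correct?) and a format reward (is the reasoning wrapped in <think> tags?). Below is a simplified sketch of what such reward functions might look like; the exact checks and weights are illustrative, not DeepSeek's.

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the response wraps its reasoning in <think>...</think>, else 0.0."""
    return 1.0 if re.search(r"<think>.*?</think>", response, re.DOTALL) else 0.0

def accuracy_reward(response: str, reference_answer: str) -> float:
    """1.0 if the text after </think> contains the reference answer, else 0.0."""
    final = response.split("</think>")[-1]
    return 1.0 if reference_answer.strip().lower() in final.lower() else 0.0

response = "<think>12 / 4 = 3, so x = 3</think>x = 3"
print(format_reward(response) + accuracy_reward(response, "x = 3"))  # → 2.0
```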
You can leverage LM Studio's APIs to call DeepSeek R1 models from your own code.
Here are some relevant documentation links:
lms: LM Studio's CLI
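As a sketch of calling a local R1 model over LM Studio's OpenAI-compatible server: this assumes the server is running on its default port (1234), and the model identifier `deepseek-r1-distill-qwen-7b` is illustrative — use whatever identifier LM Studio shows for your downloaded model.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> bytes:
    """Build an OpenAI-style chat completion payload."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }).encode()

def ask(prompt: str, model: str = "deepseek-r1-distill-qwen-7b") -> str:
    req = urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The reply includes the <think>...</think> reasoning trace inline.
    return body["choices"][0]["message"]["content"]

# Example (requires LM Studio's local server to be running):
# print(ask("Are tomatoes fruits?"))
```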