
phi-4

15.1K Downloads


Models: two variants, 2.10 GB and 8.30 GB (updated 2 days ago).

Memory Requirements

To run the smallest phi-4 variant (the 2.10 GB file), you need at least 2 GB of RAM; the largest (8.30 GB) may require up to 8 GB.
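
As a rough rule of thumb, the RAM a quantized model needs tracks the size of its weight file. The sketch below estimates that footprint from parameter count and bits per weight; the quantization levels shown are illustrative assumptions, not official phi-4 figures, and the estimate ignores KV-cache and runtime overhead:

```python
# Back-of-envelope estimate of the memory needed to hold quantized weights.
# Illustrative only -- not official phi-4 requirements.

def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and overhead)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# phi-4 has ~14B parameters; common GGUF quantizations span roughly 2-5 bits.
for bits in (2.0, 4.0, 4.5):
    print(f"{bits} bits/weight -> ~{weight_memory_gb(14, bits):.1f} GB")
```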

Capabilities

phi-4 models are available in GGUF format.
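
A GGUF file can be run locally with, for example, the llama-cpp-python package. The sketch below is a minimal illustration, not an official quick-start; the file name is a hypothetical placeholder for whichever quantization you download:

```python
# Minimal sketch: loading a phi-4 GGUF file with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-4-Q4_K_M.gguf",  # hypothetical local file name
    n_ctx=4096,                        # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain the Pythagorean theorem."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```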

About phi-4


phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
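
For readers unfamiliar with direct preference optimization, the sketch below shows the shape of the DPO objective in PyTorch. It illustrates the general technique only, not phi-4's actual training code; the tensor names and the beta value are assumptions:

```python
# Illustrative sketch of the direct preference optimization (DPO) loss.
# Inputs are per-example summed log-probs that the policy and a frozen
# reference model assign to the chosen and rejected responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratio of policy vs. reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Push the policy to prefer the chosen response over the rejected one.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```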

Technical report: https://arxiv.org/pdf/2412.08905

Primary use cases

Phi-4 is designed to accelerate research on language models and to serve as a building block for generative AI-powered features. It is intended for general-purpose AI systems and applications (primarily in English) that require:

  • Memory/compute constrained environments.
  • Latency bound scenarios.
  • Reasoning and logic.

Performance

To gauge its capabilities, Microsoft compares phi-4 with a set of models on OpenAI's simple-evals benchmark suite.

The table below gives a high-level overview of model quality on representative benchmarks; higher numbers indicate better performance:

| Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o |
|---|---|---|---|---|---|---|---|---|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1 |
| Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4 |
| Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | 90.6 |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4 |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9 |

* These scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following. Microsoft uses the simple-evals framework because it is reproducible, but Meta reports 77 for MATH and 88 for HumanEval on Llama-3.3-70B.
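
To see why strict formatting matters, the sketch below shows a generic strict-match grader. It is an illustration of the idea, not the actual simple-evals implementation; the `Answer:` pattern is an assumed convention:

```python
import re

def strict_match(completion: str, target: str) -> bool:
    """Credit only answers in the exact 'Answer: X' format (illustrative rule)."""
    m = re.search(r"Answer:\s*(.+)", completion)
    return m is not None and m.group(1).strip() == target

# A correct answer in the wrong format scores zero under strict matching:
print(strict_match("Answer: 42", "42"))        # True
print(strict_match("The answer is 42.", "42")) # False
```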

License

Phi-4 is provided under the MIT license.