Mistral 7B is a 7.3B parameter model that:
Outperforms Llama 2 13B on all benchmarks
Outperforms Llama 1 34B on many benchmarks
Approaches CodeLlama 7B performance on code, while remaining good at English tasks
Uses Grouped-query attention (GQA) for faster inference
Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
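The two attention variants above can be illustrated with a minimal numpy sketch. This is not Mistral's implementation, just the core ideas: SWA restricts each position to attend only to a fixed-size window of recent positions (the real model uses a much larger window than shown here), and GQA lets groups of query heads share a smaller number of key/value heads, shrinking the KV cache. The function names and the toy sizes are illustrative assumptions.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # SWA causal mask: position i may attend to positions j
    # with i - window < j <= i (the last `window` tokens, inclusive of i).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def expand_kv_heads(kv, n_query_heads):
    # GQA head sharing: repeat each KV head so that each group of
    # query heads reads the same key/value head.
    # kv has shape (n_kv_heads, seq_len, head_dim).
    n_kv_heads = kv.shape[0]
    assert n_query_heads % n_kv_heads == 0
    group_size = n_query_heads // n_kv_heads
    return np.repeat(kv, group_size, axis=0)  # (n_query_heads, seq_len, head_dim)

mask = sliding_window_mask(seq_len=6, window=3)
kv = np.zeros((2, 6, 4))            # 2 KV heads, toy sizes
expanded = expand_kv_heads(kv, 8)   # shared across 8 query heads
```

Because the mask is banded, each token attends to at most `window` positions regardless of sequence length, which is what keeps the cost of long sequences down.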
The benchmarks are categorized by theme:
Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
World Knowledge: 5-shot average of NaturalQuestions and TriviaQA.
Reading Comprehension: 0-shot average of BoolQ and QuAC.
Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4.
Code: Average of 0-shot HumanEval and 3-shot MBPP.
Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGIEval (English multiple-choice questions only).
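The maj@k scoring used for the math benchmarks above means sampling k completions per problem and grading only the majority answer. A minimal sketch of that voting rule, assuming the final answers have already been extracted from the sampled completions:

```python
from collections import Counter

def majority_answer(answers):
    # maj@k: given k extracted final answers for one problem,
    # return the most frequent one (ties broken by first occurrence).
    return Counter(answers).most_common(1)[0][0]

# e.g. maj@8 over eight sampled answers to one GSM8K problem:
samples = ["42", "41", "42", "42", "7", "42", "41", "42"]
voted = majority_answer(samples)  # "42"
```

The problem counts as correct if the voted answer matches the reference, so a model can recover from individual bad samples as long as its modal answer is right.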
License
Mistral 7B is provided under the Apache-2.0 license.