Setup on MacBook Pro M1, 16 GB RAM
---
Core Settings
* Temperature: 0.7
* Top K Sampling: 40
* Top P Sampling: 0.95
* Repeat Penalty: 1.1
* Min P Sampling: 0.05 (keep current)
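To make the sampling settings above concrete, here is an illustrative pure-Python sketch of how temperature, top-k, top-p, min-p, and repeat penalty interact during token selection. This is a simplified model of the pipeline, not the actual inference engine's code; function names and the toy logits are invented for the example.

```python
import math
import random

def apply_repeat_penalty(logits, recent_tokens, penalty=1.1):
    """Discourage tokens that appeared recently (penalty > 1.0)."""
    out = list(logits)
    for t in set(recent_tokens):
        # Positive logits are divided, negative ones multiplied,
        # so the penalty always pushes probability down.
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

def filter_and_sample(logits, temperature=0.7, top_k=40, top_p=0.95,
                      min_p=0.05, seed=None):
    """Apply temperature, then top-k / top-p / min-p filters, then sample."""
    # Temperature scales logits before softmax; lower = more deterministic.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidates sorted by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    kept = order[:top_k]

    # Top-p (nucleus): smallest prefix whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, []
    for i in kept:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Min-p: drop tokens below min_p times the best token's probability.
    floor = min_p * probs[nucleus[0]]
    nucleus = [i for i in nucleus if probs[i] >= floor]

    # Renormalize over survivors and draw one token.
    mass = sum(probs[i] for i in nucleus)
    rng = random.Random(seed)
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a strongly peaked distribution, e.g. `filter_and_sample([10.0, 0.0, 0.0])`, the nucleus collapses to the top token and sampling is effectively greedy; flatter logits leave more candidates in play.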
Key Adjustments
* CPU Threads: Start at 4 rather than 6, and only raise it toward 8 if you have headroom. For a 20B model, too many threads can cause context-switching overhead.
* Context Length: The model supports up to 8192 tokens. Consider increasing your context window if you need longer conversations, but monitor memory usage.
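A small sketch of the thread-count reasoning above: on an M1 (4 performance + 4 efficiency cores), starting at the performance-core count avoids scheduling inference work onto efficiency cores. The helper name and the default of 4 are assumptions for illustration, not part of any tool's API.

```python
import os

def suggested_threads(start=4, cap=8):
    """Start near the performance-core count; never exceed the cap."""
    logical = os.cpu_count() or start
    # Clamp: at least 1 thread, at most min(start, logical cores, cap).
    return max(1, min(start, logical, cap))
```

Raise `start` only after confirming there is CPU headroom during generation.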
Memory Management
* Ensure you have at least 16GB RAM available
* If experiencing slowdowns, try reducing batch size in advanced settings
* Monitor GPU VRAM if using GPU acceleration
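The memory caution above can be quantified with a rough back-of-envelope estimate. Assuming a ~4.5 bits-per-weight quantization (a typical Q4-class figure including format overhead; the exact number varies by quant), the weights of a 20B model alone consume most of a 16 GB machine:

```python
def model_memory_gb(params_b=20, bits_per_weight=4.5):
    """Approximate resident size of quantized weights in GiB."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# ~10.5 GiB for a 20B model at ~4.5 bits/weight, before the KV cache,
# the runtime, and the OS are accounted for -- hence the tight headroom.
```

This is why reducing batch size and watching memory pressure matters: the KV cache grows with context length on top of this baseline.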
Model-Specific Recommendations
* Context Overflow: "Truncate Middle" is good for maintaining conversation coherence
* Stop Strings: Add common stop tokens such as \n\n or ### if the model tends to over-generate
* Limit Response Length: enabled
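To illustrate what stop strings do, here is a minimal sketch of the truncation behavior: generation is cut at the earliest occurrence of any configured stop string. This is a simplified model of the feature, not the tool's actual implementation.

```python
def truncate_at_stop(text, stop_strings=("\n\n", "###")):
    """Cut generated text at the earliest stop string, if any occurs."""
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            # Keep the earliest cut point across all stop strings.
            cut = min(cut, idx)
    return text[:cut]
```

In practice the runtime stops decoding as soon as a stop string appears, so over-generation past a section break like ### never reaches the output.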