Setup on MacBook Pro M1, 16 GB RAM
---

Core Settings

* Temperature: 0.7
* Top K Sampling: 40
* Top P Sampling: 0.95
* Repeat Penalty: 1.1
* Min P Sampling: 0.05 (keep current)

Key Adjustments

* CPU Threads: Try 4 threads instead of 6, and increase only if you have headroom. For a 20B model, too many threads can cause context-switching overhead.
* Context Length: The model supports up to 8192 tokens. Increase your context window if you need longer conversations, but monitor memory usage.

Memory Management

* Ensure you have at least 16 GB of RAM available.
* If you experience slowdowns, try reducing the batch size in advanced settings.
* Monitor GPU VRAM if using GPU acceleration.

Model-Specific Recommendations

* Context Overflow: "Truncate Middle" works well for maintaining conversation coherence.
* Stop Strings: Add common stop tokens such as \n\n or ### if the model tends to over-generate.
* Limit Response Length: enabled
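For reference, the same settings map onto llama.cpp's command-line flags (LM Studio uses llama.cpp as its GGUF backend). Below is a minimal sketch that builds the equivalent `llama-cli` invocation; the `model.gguf` path is a hypothetical placeholder, and exact flag names may differ between llama.cpp versions:

```python
# Core settings from above, keyed by their llama.cpp flag names.
SETTINGS = {
    "temp": 0.7,            # Temperature
    "top-k": 40,            # Top K Sampling
    "top-p": 0.95,          # Top P Sampling
    "min-p": 0.05,          # Min P Sampling
    "repeat-penalty": 1.1,  # Repeat Penalty
    "threads": 4,           # CPU Threads (start low on a 20B model)
    "ctx-size": 8192,       # Context Length
}

def to_cli_args(settings):
    """Flatten a settings dict into a list of CLI flags and values."""
    args = []
    for flag, value in settings.items():
        args.extend([f"--{flag}", str(value)])
    return args

# Hypothetical invocation; substitute your actual model path.
cmd = ["llama-cli", "-m", "model.gguf"] + to_cli_args(SETTINGS)
print(" ".join(cmd))
```

This is handy when comparing LM Studio's behavior against a bare llama.cpp run with identical sampling parameters.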