What is GGUF?

GGUF (GPT-Generated Unified Format) is a file format for storing quantized large language models optimized for efficient CPU and GPU inference.

Overview

GGUF is the successor to GGML, designed by Georgi Gerganov for use with llama.cpp. It has become the standard format for running LLMs locally.
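Every GGUF file starts with a small fixed header: the magic bytes `GGUF`, a format version, a tensor count, and a metadata key-value count. A minimal sketch of reading that header in Python (the synthetic sample bytes below are illustrative, not from a real model):

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata KV count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Build a tiny synthetic header to demonstrate (values are placeholders).
sample = struct.pack("<4sIQQ", b"GGUF", 3, 291, 19)
print(read_gguf_header(sample))
# → {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 19}
```

The metadata that follows the header (architecture, tokenizer, quantization type) is what lets tools like Ollama and LM Studio load a model from a single file.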

Key Benefits

| Feature | Benefit |
| --- | --- |
| CPU Inference | Run models without expensive GPUs |
| Apple Silicon | Optimized for M1/M2/M3 chips |
| Multiple Quants | Choose quality vs. memory tradeoff |
| Wide Support | Works with Ollama, LM Studio, llama.cpp |

Quantization Levels

| Quant | Bits/Weight | Quality | Use Case |
| --- | --- | --- | --- |
| Q8_0 | 8.5 | Best | Maximum quality |
| Q6_K | 6.6 | Excellent | Great balance |
| Q5_K_M | 5.7 | Great | Recommended for most |
| Q4_K_M | 4.8 | Good | Memory constrained |
| Q3_K_M | 3.9 | Acceptable | Very low memory |
| Q2_K | 2.6 | Lossy | Extreme limits |

VRAM/RAM Requirements (7B Model)

| Quant | Memory Required |
| --- | --- |
| Q8_0 | ~8 GB |
| Q5_K_M | ~5 GB |
| Q4_K_M | ~4 GB |
| Q2_K | ~3 GB |
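These figures follow directly from the bits-per-weight numbers above: model size is roughly parameters × bits per weight ÷ 8. A minimal estimator, as a sketch (it ignores the KV cache and runtime buffers, which add more memory depending on context length):

```python
def estimate_model_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file / load size in GB: parameters * bits per weight / 8.
    Does not include KV cache or runtime overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B model at Q4_K_M (~4.8 bits/weight):
print(round(estimate_model_memory_gb(7e9, 4.8), 1))
# → 4.2
```

This is why Q4_K_M lands near 4 GB for a 7B model while Q8_0 needs roughly twice that.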

How to Use GGUF Models

With Ollama

ollama run model-name
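Ollama can also load a local GGUF file you downloaded yourself, via a Modelfile. A minimal sketch (the file path and model name below are placeholders):

```
# Modelfile — point Ollama at a local GGUF file
FROM ./model.gguf
```

Then register and run it with `ollama create my-model -f Modelfile` followed by `ollama run my-model`.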

With llama.cpp

./llama-cli -m model.gguf -p "Your prompt here"

With LM Studio

  1. Download the GGUF file
  2. Import into LM Studio
  3. Start chatting

Finding GGUF Models

On Free2AITools, look for the GGUF badge on model cards. Models with GGUF support have higher Deploy Scores.