What is GGUF?

GGUF (GPT-Generated Unified Format) is a file format for storing quantized large language models optimized for efficient CPU and GPU inference.

Overview

GGUF is the successor to GGML, designed by Georgi Gerganov for use with llama.cpp. It has become the standard format for running LLMs locally.
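Every GGUF file starts with a small fixed header: the magic bytes `GGUF`, a format version, a tensor count, and a metadata key-value count. A minimal sketch of reading that header in Python (the synthetic sample bytes below are illustrative, not from a real model):

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata KV count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}

# Build a tiny synthetic header to demonstrate (values are placeholders).
sample = struct.pack("<4sIQQ", b"GGUF", 3, 291, 19)
print(read_gguf_header(sample))
# → {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 19}
```

The metadata that follows the header (architecture, tokenizer, quantization type) is what lets tools like Ollama and LM Studio load a model from a single file.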

Key Benefits

| Feature | Benefit |
| --- | --- |
| CPU Inference | Run models without expensive GPUs |
| Apple Silicon | Optimized for M1/M2/M3 chips |
| Multiple Quants | Choose quality vs. memory tradeoff |
| Wide Support | Works with Ollama, LM Studio, llama.cpp |

Quantization Levels

| Quant | Bits/Weight | Quality | Use Case |
| --- | --- | --- | --- |
| Q8_0 | 8.5 | Best | Maximum quality |
| Q6_K | 6.6 | Excellent | Great balance |
| Q5_K_M | 5.7 | Great | Recommended for most |
| Q4_K_M | 4.8 | Good | Memory constrained |
| Q3_K_M | 3.9 | Acceptable | Very low memory |
| Q2_K | 2.6 | Lossy | Extreme limits |

VRAM/RAM Requirements (7B Model)

| Quant | Memory Required |
| --- | --- |
| Q8_0 | ~8 GB |
| Q5_K_M | ~5 GB |
| Q4_K_M | ~4 GB |
| Q2_K | ~3 GB |
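These figures follow directly from the bits-per-weight numbers above: model size is roughly parameters × bits per weight ÷ 8. A minimal estimator, as a sketch (it ignores the KV cache and runtime buffers, which add more memory depending on context length):

```python
def estimate_model_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file / load size in GB: parameters * bits per weight / 8.
    Does not include KV cache or runtime overhead."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B model at Q4_K_M (~4.8 bits/weight):
print(round(estimate_model_memory_gb(7e9, 4.8), 1))
# → 4.2
```

This is why Q4_K_M lands near 4 GB for a 7B model while Q8_0 needs roughly twice that.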

How to Use GGUF Models

With Ollama

ollama run model-name
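Ollama can also load a local GGUF file you downloaded yourself, via a Modelfile. A minimal sketch (the file path and model name below are placeholders):

```
# Modelfile — point Ollama at a local GGUF file
FROM ./model.gguf
```

Then register and run it with `ollama create my-model -f Modelfile` followed by `ollama run my-model`.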

With llama.cpp

./llama-cli -m model.gguf -p "Your prompt here"

With LM Studio

  1. Download the GGUF file
  2. Import into LM Studio
  3. Start chatting

Finding GGUF Models

On Free2AITools, look for the GGUF badge on model cards. Models with GGUF support have higher Deploy Scores.