📄 Paper

FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

by Fei Zuo · arXiv:2604.20913

Free2AITools Nexus Index

38.3

S: Semantic 50

Query-time baseline · scored live at search

A: Authority 0

P: Popularity 0

R: Recency 73

Q: Quality 60

Abstract (excerpt)

Large language models are increasingly deployed on CPU-only platforms where memory bandwidth is the primary bottleneck for autoregressive generation. Weight quantization to four bits or below reduces memory pressure, yet existing systems still dequantize weights and perform floating-point multiplications, limiting the achievable gains. Ternary weights in {-1, 0, +1} provide a more efficient alternative, replacing multiplications with conditional additions, subtractions, or no-ops. While Fairy...
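The abstract's central idea is that a ternary weight turns each multiply into an addition, a subtraction, or a no-op. A minimal Python sketch of that idea follows. It is not FairyFuse's fused kernel. The names `ternary_matvec` and `reference_matvec` are hypothetical, the weights are plain nested lists rather than a packed low-bit format, and the sketch omits the scaling factor that practical ternary schemes usually attach to the weights.

```python
from typing import List, Sequence


def ternary_matvec(weights: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Compute y = W @ x for a ternary matrix W using no multiplications.

    Hypothetical illustration: each weight in {-1, 0, +1} selects an add,
    a subtract, or a no-op on the corresponding input element.
    """
    y: List[float] = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi  # +1: conditional addition
            elif w == -1:
                acc -= xi  # -1: conditional subtraction
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
            # 0: no-op, no arithmetic is performed for this input
        y.append(acc)
    return y


def reference_matvec(weights: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Conventional multiply-accumulate path, kept only for comparison."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


if __name__ == "__main__":
    W = [[1, 0, -1, 1],
         [-1, -1, 0, 1]]
    x = [0.5, 2.0, 1.5, -1.0]
    print(ternary_matvec(W, x))    # [-2.0, -3.5]
    print(reference_matvec(W, x))  # [-2.0, -3.5]
```

Both paths produce the same result; the ternary path only ever adds or subtracts inputs. A real CPU kernel would pack the weights (a ternary value carries about 1.58 bits of information, so 2-bit packing is a common choice) and vectorize the accumulation, which this sketch does not attempt.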


Paper Information Summary

Registry ID: 2604.20913
License: arXiv
Provider: arXiv

📜 Cite this paper

Academic & Research Attribution

BibTeX

@misc{arxiv_2604_20913,
  author = {Fei Zuo},
  title = {FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels},
  year = {2026},
  howpublished = {\url{https://arxiv.org/abs/2604.20913}},
  note = {Accessed via Free2AITools.}
}

APA Style

Zuo, F. (2026). FairyFuse: Multiplication-free LLM inference on CPUs via fused ternary kernels (arXiv:2604.20913) [Preprint]. arXiv. https://arxiv.org/abs/2604.20913





🔗 Full Paper

Free2AITools indexes the abstract and factual metadata for this paper. Read the complete, authoritative paper on the official source.

Read the full paper on arXiv

📊 Research Signals

🗂️ Field: cs.LG


Source summary: Based on arXiv metadata. Not a recommendation.

