Paper

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

by Keivan Alizadeh-Vahid et al. · arXiv:2312.11514
Free2AITools Nexus Index
59.1
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 89
P: Popularity 66
R: Recency 100
Q: Quality 65

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters in flash memory, but bringing them on demand to DRAM. Our method involves constructing an infer...
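
As a rough illustration of the idea in the abstract (keeping parameters on slower storage and copying only what is needed into memory), the sketch below memory-maps a toy weight file and reads two rows on demand. This is not the paper's method, which is cut off in the truncated abstract; the file layout, the matrix shape, and the helper names `write_toy_weights`, `load_rows`, and `demo` are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 8, 4      # hypothetical toy weight matrix shape
ROW_BYTES = COLS * 4   # float32 values, 4 bytes each


def write_toy_weights(path):
    """Write a ROWS x COLS float32 matrix to disk; entry (r, c) is r + 0.25 * c."""
    with open(path, "wb") as f:
        for r in range(ROWS):
            row = [r + 0.25 * c for c in range(COLS)]
            f.write(struct.pack(f"<{COLS}f", *row))


def load_rows(path, wanted_rows):
    """Copy only the requested rows from the mapped file into an in-memory dict."""
    cache = {}
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for r in wanted_rows:
            start = r * ROW_BYTES
            # Slicing the map reads just this row's bytes from storage.
            cache[r] = struct.unpack(f"<{COLS}f", mm[start:start + ROW_BYTES])
    return cache


def demo():
    """Create a temporary weight file, load rows 1 and 6, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_toy_weights(path)
        return load_rows(path, [1, 6])
    finally:
        os.remove(path)
```

Calling `demo()` returns a dict holding only rows 1 and 6; the other six rows are never copied into the cache. A real system would also need to decide which rows to fetch and when to evict them, which this sketch does not attempt.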

Semantic Scholar 196 Citations
Paper Information Summary
Entity Passport
Registry ID 2312.11514
License ArXiv
Provider semantic_scholar

Cite this paper

Academic & Research Attribution

BibTeX
@misc{arxiv_2312_11514,
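  author = {Keivan Alizadeh-Vahid and Iman Mirzadeh and Dmitry Belenko and Karen Khatamifard and Minsik Cho and C. C. D. Mundo and Mohammad Rastegari and Mehrdad Farajtabar},
  title = {LLM in a flash: Efficient Large Language Model Inference with Limited Memory},
  year = {2023},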
  howpublished = {\url{https://arxiv.org/abs/2312.11514}},
  note = {Accessed via Free2AITools.}
}
APA Style
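Alizadeh-Vahid, K., Mirzadeh, I., Belenko, D., Khatamifard, K., Cho, M., Mundo, C. C. D., Rastegari, M., & Farajtabar, M. (2023). LLM in a flash: Efficient Large Language Model Inference with Limited Memory [Paper]. Free2AITools. https://arxiv.org/abs/2312.11514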

Technical Deep Dive


Free2AITools Nexus Index V2.0

Index Insight

FNI V2.0 for LLM in a flash: Efficient Large Language Model Inference with Limited Memory: Authority (A:89), Popularity (P:66), Recency (R:100), Quality (Q:65). Semantic (S) is a query-time baseline scored live at search.

Data Sources / Provenance

Open data Updated: Live data

❝ Cite Node

@article{Alizadeh-Vahid2023LLM,
  title={LLM in a flash: Efficient Large Language Model Inference with Limited Memory},
  author={Keivan Alizadeh-Vahid and Iman Mirzadeh and Dmitry Belenko and Karen Khatamifard and Minsik Cho and C. C. D. Mundo and Mohammad Rastegari and Mehrdad Farajtabar},
  journal={arXiv preprint arXiv:2312.11514},
  year={2023}
}

Collaborating Minds

Keivan Alizadeh-Vahid, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, C. C. D. Mundo, Mohammad Rastegari, Mehrdad Farajtabar

Full Paper

Free2AITools indexes the abstract and factual metadata for this paper. Read the complete, authoritative paper on the official source.

Read the full paper on arXiv: https://arxiv.org/abs/2312.11514

Research Signals

Citations: 196 (Semantic Scholar)
Authority: 89 (FNI pillar)
Recency: 100 (FNI pillar)
Quality: 65 (FNI pillar)
Field: infrastructure ops
Data Source: semantic_scholar
Updated daily

Source summary: Based on semantic_scholar metadata. Not a recommendation.

Paper Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: 2312.11514
slug: 2312.11514
source: semantic_scholar
author: Keivan Alizadeh-Vahid et al.
license: ArXiv
tags: paper, research, academic

Technical Specs

architecture: null
params billions: null
context length: null
pipeline tag:

Engagement & Metrics

downloads: 0
stars: 0
forks: 0
citations: 196

Data indexed from public sources. Updated daily.