moondream2
Compute Threshold
~2.7GB VRAM
* Static estimate assuming 4-bit quantization.
Cite this model
Academic & Research Attribution
@misc{hf_model__vikhyatk__moondream2,
author = {vikhyatk},
title = {moondream2 Model},
year = {2026},
howpublished = {\url{https://huggingface.co/vikhyatk/moondream2}},
note = {Accessed via Free2AITools Knowledge Fortress}
}

🔬 Technical Deep Dive
Full Specifications
⚡ Quick Commands
ollama run moondream2
huggingface-cli download vikhyatk/moondream2
pip install -U transformers

💬 Why this score?
The Nexus Index for moondream2 aggregates Popularity (P:0), Velocity (V:0), and Credibility (C:0). The Utility score (U:0) represents deployment readiness, context efficiency, and structural reliability within the Nexus ecosystem.
Technical Deep Dive
license: apache-2.0
pipeline_tag: image-text-to-text
new_version: moondream/moondream3-preview
⚠️ This repository contains the latest version of Moondream 2, our previous generation model. The latest version of Moondream is Moondream 3 (Preview).
Moondream is a small vision language model designed to run efficiently everywhere.
This repository contains the latest (2025-06-21) release of Moondream 2, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.
Usage
```python
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",  # pin a revision for production use
    trust_remote_code=True,
    device_map={"": "cuda"},  # ...or 'mps', on Apple Silicon
)

# Image used by the examples below
image = Image.open("path/to/image.jpg")
```
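The device_map comment invites a runtime choice; a minimal sketch of one way to pick it automatically based only on torch availability (`pick_device` is a hypothetical helper, not part of the Moondream or Transformers API):

```python
def pick_device() -> dict:
    """Return a device_map for from_pretrained(): CUDA if available, else MPS, else CPU."""
    try:
        import torch
        if torch.cuda.is_available():
            return {"": "cuda"}
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return {"": "mps"}
    except ImportError:
        pass  # torch not installed; fall back to CPU
    return {"": "cpu"}

print(pick_device())
```

Passing the result as `device_map=pick_device()` keeps the loading snippet portable across CUDA, Apple Silicon, and CPU-only machines.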
Captioning
```python
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("\nNormal caption:")
# Streaming generation, supported for caption() and detect()
for t in model.caption(image, length="normal", stream=True)["caption"]:
    print(t, end="", flush=True)

# Non-streaming call returns the full result at once
print(model.caption(image, length="normal"))
```
Visual Querying
```python
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])
```
Object Detection
```python
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")
```
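detect() returns boxes in normalized 0–1 image coordinates; a minimal sketch converting them to pixel coordinates, assuming each object carries x_min/y_min/x_max/y_max keys (verify the exact output shape against the revision you pin):

```python
def to_pixel_boxes(objects, width, height):
    """Convert normalized (0-1) boxes to integer pixel boxes, rounded to the nearest pixel.
    Assumes x_min/y_min/x_max/y_max keys; check against the model revision you pin."""
    return [
        (
            round(o["x_min"] * width),
            round(o["y_min"] * height),
            round(o["x_max"] * width),
            round(o["y_max"] * height),
        )
        for o in objects
    ]

# Hand-written sample result for a 640x480 image (not real model output)
sample = [{"x_min": 0.25, "y_min": 0.1, "x_max": 0.5, "y_max": 0.6}]
print(to_pixel_boxes(sample, 640, 480))  # → [(160, 48, 320, 288)]
```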
Pointing
```python
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
```
Changelog
2025-06-21 (full release notes)
- Grounded Reasoning: Introduces a new step-by-step reasoning mode that explicitly grounds reasoning in spatial positions within the image before answering, leading to more precise visual interpretation (e.g., chart median calculations, accurate counting). Enable with reasoning=True in the query skill to trade off speed vs. accuracy.
- Sharper Object Detection: Uses reinforcement learning on higher-quality bounding-box annotations to reduce object clumping and improve fine-grained detections (e.g., distinguishing “blue bottle” vs. “bottle”).
- Faster Text Generation: Yields 20–40% faster response generation via a new “superword” tokenizer and a lightweight tokenizer-transfer hypernetwork, which reduces the number of tokens emitted without loss in accuracy and eases future multilingual extensions.
- Improved UI Understanding: Boosts ScreenSpot (UI element localization) performance from an F1@0.5 of 60.3 to 80.4, making Moondream more effective for UI-focused applications.
- Reinforcement Learning Enhancements: RL fine-tuning applied across 55 vision-language tasks to reinforce grounded reasoning and detection capabilities, with a roadmap to expand to ~120 tasks in the next update.
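The reasoning flag changes only the query() call; a self-contained sketch of the call shape, using a stub class in place of the real model (StubMoondream and its canned answers are illustrative stand-ins, not part of the Moondream API):

```python
class StubMoondream:
    """Stand-in mirroring the query() signature, for illustration only."""
    def query(self, image, question, reasoning=False):
        # The real model answers from the image; reasoning=True trades speed
        # for spatially grounded, more precise answers.
        mode = "grounded" if reasoning else "fast"
        return {"answer": f"stub-{mode}"}

model = StubMoondream()
fast = model.query(None, "How many bars exceed the median?")["answer"]
precise = model.query(None, "How many bars exceed the median?", reasoning=True)["answer"]
print(fast, precise)
```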
2025-04-15 (full release notes)
- Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT)
- Added temperature and nucleus sampling to reduce repetitive outputs
- Better OCR for documents and tables (prompt with “Transcribe the text” or “Transcribe the text in natural reading order”)
- Object detection supports document layout detection (figure, formula, text, etc.)
- UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
- Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
2025-03-27 (full release notes)
- Added support for long-form captioning
- Open vocabulary image tagging
- Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
- Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
- Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
- Fixed token streaming bug affecting multi-byte unicode characters
- gpt-fast style compile() now supported in HF Transformers implementation
📝 Limitations & Considerations
- Benchmark scores may vary based on evaluation methodology and hardware configuration.
- VRAM requirements are estimates; actual usage depends on quantization and batch size.
- FNI scores are relative rankings and may change as new models are added.
- Source: Unknown
Social Proof
AI Summary: Based on Hugging Face metadata. Not a recommendation.
🛡️ Model Transparency Report
Verified data manifest for traceability and transparency.
🆔 Identity & Source
- id
- hf-model--vikhyatk--moondream2
- source
- huggingface
- author
- vikhyatk
- tags
- transformers, safetensors, moondream1, text-generation, image-text-to-text, custom_code, doi:10.57967/hf/6762, license:apache-2.0, endpoints_compatible, region:us
⚙️ Technical Specs
- architecture
- HfMoondream
- params billions
- 1.93
- context length
- 4,096
- pipeline tag
- image-text-to-text
- vram gb
- 2.7
- vram is estimated
- true
- vram formula
- VRAM ≈ (params in billions × 0.75 GB) + 0.8 GB (KV cache) + 0.5 GB (OS overhead)
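The formula can be checked directly; a small sketch (with `estimate_vram_gb` as a hypothetical helper name) reproducing the ~2.7 GB figure from the 1.93 B parameter count:

```python
def estimate_vram_gb(params_billions: float, kv_gb: float = 0.8, os_gb: float = 0.5) -> float:
    """VRAM ≈ params_billions * 0.75 GB + KV cache + OS overhead, per the formula above."""
    return round(params_billions * 0.75 + kv_gb + os_gb, 1)

print(estimate_vram_gb(1.93))  # → 2.7
```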
📊 Engagement & Metrics
- likes
- 1,348
- downloads
- 1,747,612
Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)