Dataset

Ultra Fineweb L3

by openbmb (openbmb/ultra-fineweb-l3)
Free2AITools Nexus Index
59.1
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 55
P: Popularity 57
R: Recency 93
Q: Quality 50
FNI Score: 59.1
Size: -
Rows: -
Tokens: -
Dataset Information Summary
Entity Passport
Registry ID: openbmb/ultra-fineweb-l3
License: Apache-2.0
Provider: huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset_openbmb_ultra_fineweb_l3,
  author = {openbmb},
  title = {Ultra Fineweb L3 Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3}},
  note = {Accessed via Free2AITools.}
}
APA Style
openbmb. (2026). Ultra Fineweb L3 [Dataset]. Free2AITools. https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3

Technical Deep Dive


Free2AITools Nexus Index V2.0


Index Insight

FNI V2.0 for Ultra Fineweb L3: Authority (A:55), Popularity (P:57), Recency (R:93), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data · Updated: Live data
Downloads: 62,290

Task Categories

text-generation

Data Preview

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

Field Logic

Schema not yet indexed for this dataset.

Dataset Specification

Ultra-FineWeb-L3

Ultra-FineWeb Technical Report | UltraData Collection | UltraData | MiniCPM5 Series

Introduction

Ultra-FineWeb-L3 is the L3 refined data for general high-quality web data within UltraData's L0-L4 tiered data management framework. Moving beyond L2 quality selection, it transforms high-value web corpora into structured, high-learnability training data with clearer reasoning signals and richer educational styles. Built on top of Ultra-FineWeb, it leverages MiniCPM4 and Qwen3 to perform **Q&A Pair Gen

Social Proof

HuggingFace Hub
62.3K Downloads
Updated daily

Source summary: Based on Hugging Face metadata. Not a recommendation.


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--openbmb--ultra-fineweb-l3
slug: openbmb--ultra-fineweb-l3
source: huggingface
author: openbmb
license: Apache-2.0
tags: task_categories:text-generation, language:en, language:zh, license:apache-2.0, size_categories:1b<n<10b, format:parquet, modality:text, library:datasets, library:dask, library:polars, library:mlcroissant, arxiv:2505.05427, arxiv:2602.09003, region:us, llm, pretraining, data-synthesis, data-filtering, high-quality, general-knowledge, qa-generation, multi-style-rewriting, minicpm
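
The tag list above mixes namespaced tags (such as `language:en`) with free-form ones (such as `llm`). As a minimal sketch, assuming the comma-separated string exactly as it appears on this page, the following Python groups the tags by namespace. `parse_tags` is a hypothetical helper written for illustration; it is not part of any Hugging Face library.

```python
from collections import defaultdict

# Tag string copied from the listing above.
TAGS = (
    "task_categories:text-generation, language:en, language:zh, "
    "license:apache-2.0, size_categories:1b<n<10b, format:parquet, "
    "modality:text, library:datasets, library:dask, library:polars, "
    "library:mlcroissant, arxiv:2505.05427, arxiv:2602.09003, region:us, "
    "llm, pretraining, data-synthesis, data-filtering, high-quality, "
    "general-knowledge, qa-generation, multi-style-rewriting, minicpm"
)


def parse_tags(raw):
    """Split a comma-separated tag string into namespaced and free-form tags.

    Hypothetical helper: a tag containing ':' is split on the first colon
    into (namespace, value); anything else is kept as a free-form tag.
    """
    namespaced = defaultdict(list)
    free = []
    for tag in (t.strip() for t in raw.split(",")):
        if not tag:
            continue
        key, sep, value = tag.partition(":")
        if sep:
            namespaced[key].append(value)
        else:
            free.append(tag)
    return dict(namespaced), free


namespaced, free = parse_tags(TAGS)
print(namespaced["language"])  # ['en', 'zh']
print(namespaced["arxiv"])     # ['2505.05427', '2602.09003']
print(len(free))               # 9
```

Splitting on the first colon only keeps values such as `1b<n<10b` intact, and collecting values in lists preserves repeated namespaces like `language` and `arxiv`.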

Technical Specs

architecture: null
params billions: null
context length: null
pipeline tag:

Engagement & Metrics

downloads: 62,290
stars: null
forks: null

Data indexed from public sources. Updated daily.