📊

Dataset

Forgespectrum 114k

Name: Forgespectrum 114k
Creator: hardiksharma6555
License: CC-BY-NC-4.0

Free2AITools Nexus Index

59.6

S: Semantic 50

Query-time baseline · scored live at search

A: Authority 62

P: Popularity 52

R: Recency 99

Q: Quality 50

Dataset Information Summary
Entity Passport
Registry ID	hardiksharma6555/forgespectrum-114k
License	CC-BY-NC-4.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset_hardiksharma6555_forgespectrum_114k,
  author = {hardiksharma6555},
  title = {Forgespectrum 114k Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/hardiksharma6555/forgespectrum-114k}},
  note = {Accessed via Free2AITools.}
}

APA Style

hardiksharma6555. (2026). Forgespectrum 114k [Dataset]. Free2AITools. https://huggingface.co/datasets/hardiksharma6555/forgespectrum-114k

⚖️ Free2AITools Nexus Index V2.0

💬 Index Insight

FNI V2.0 for Forgespectrum 114k: Authority (A:62), Popularity (P:52), Recency (R:99), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.

⬇️

Downloads

33,477

🎯 Task Categories

image-classification

Dataset Specification

ForgeSpectrum (v3) — AI-Generated Image Detection with Reasoning Traces

ForgeSpectrum is a multi-domain corpus for AI-generated / manipulated image detection, annotated by Gemini-2.5-Pro with structured forensic reasoning traces (<fast>/<planning>/<reasoning>/<reflection>/<conclusion> patterns) plus per-image attributes and suspicious-region notes.
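The tagged trace sections named above can be pulled apart with a small parser. This is a minimal sketch, not the dataset's own tooling: the tag names come from the description, but the paired `<tag>...</tag>` layout, the `parse_trace` helper, and the example trace text are all assumptions made for illustration.

```python
import re

# Tag names taken from the dataset description. The paired XML-like layout
# (<tag>...</tag>) is an assumption about how traces are serialized.
TRACE_TAGS = ("fast", "planning", "reasoning", "reflection", "conclusion")


def parse_trace(trace: str) -> dict[str, str]:
    """Return a mapping of tag name -> stripped inner text for tags present."""
    sections = {}
    for tag in TRACE_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", trace, flags=re.DOTALL)
        if match:
            sections[tag] = match.group(1).strip()
    return sections


# Hypothetical trace text, invented for illustration only.
example = (
    "<planning>Check skin texture and lighting.</planning>"
    "<reasoning>Specular highlights disagree across the face.</reasoning>"
    "<conclusion>fake</conclusion>"
)
parsed = parse_trace(example)
print(parsed["conclusion"])
```

Tags that are missing from a trace are simply absent from the returned dictionary, so callers can tell a short trace from a full one without special-casing.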

v3 — what changed

v3 is the cleaned, balanced release:

- 3 domains: faces, scenes, id_cards (docs and scene_text were removed; see below).
- id_cards rebalanced: real IDs are sourced from MIDV-2020 (2,938 genuine passport/ID images: Finnish ID, Latvian passport, Russian internal passport, Slovak ID), raising id_cards reals from 94 to 2,913.
- docs dropped: the synthetic tampered-document fakes were lost from source and are not redistributable; only real docs remained, so the domain was removed.
- scene_text dropped: the corpus contained no real scene-text images, which made the binary real-vs-fake task ill-posed.

Splits (`disjoint_v3/`)

Split	Images	Real	Fake	Real %	Notes
train	67,306	31,992	35,314	47.5%	-
val	8,880	4,179	4,701	47.1%	-
test_clean	8,425	4,184	4,241	49.7%	-
test_divergent	14,658	3,180	11,478	21.7%	agreement-only
test_protocol2	5,238	2,913	2,325	55.6%	leave-domain-out
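The Real % column follows from the Real and Fake counts as real / (real + fake). The sketch below recomputes it from the table; the `SPLITS` dictionary and `real_share` helper are illustrative names, not part of the dataset.

```python
# (real, fake) image counts per split, copied from the table above.
SPLITS = {
    "train": (31_992, 35_314),
    "val": (4_179, 4_701),
    "test_clean": (4_184, 4_241),
    "test_divergent": (3_180, 11_478),
    "test_protocol2": (2_913, 2_325),
}


def real_share(real: int, fake: int) -> float:
    """Percentage of real images in a split, rounded to one decimal place."""
    return round(100 * real / (real + fake), 1)


shares = {name: real_share(real, fake) for name, (real, fake) in SPLITS.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The class balance matters when reading metrics: test_divergent is roughly 78% fake, so plain accuracy on that split is not directly comparable to accuracy on the near-balanced test_clean split.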

Split protocol: fakes are generator-disjoint across train/val/test (a generator seen in train never appears in val/test) so evaluation measures cross-generator generalization. Reals are split image-level (each domain has a single real-capture source).
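One way to build a split of this kind is to assign whole generators to a split first and let each fake image follow its generator, while reals are shuffled and cut per image. This is a sketch under stated assumptions, not the authors' pipeline: the `generator_disjoint_split` function, the 80/10/10 fractions, and the toy data are all hypothetical, and the sketch keeps all three splits mutually disjoint.

```python
import random


def generator_disjoint_split(fakes, reals, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split fakes by generator and reals by image.

    fakes: list of (image_id, generator_name); reals: list of image_id.
    The fractions are illustrative and not taken from the dataset.
    """
    rng = random.Random(seed)
    names = ("train", "val", "test")

    def cut(items):
        # Shuffle, then slice into train/val/test by the given fractions.
        items = list(items)
        rng.shuffle(items)
        a = round(len(items) * fractions[0])
        b = a + round(len(items) * fractions[1])
        return dict(zip(names, (items[:a], items[a:b], items[b:])))

    # Each generator is owned by exactly one split.
    gen_split = cut(sorted({gen for _, gen in fakes}))
    owner = {gen: name for name, gens in gen_split.items() for gen in gens}

    splits = {name: [] for name in names}
    for image_id, gen in fakes:
        splits[owner[gen]].append((image_id, gen, "fake"))
    # Reals have no generator, so they are split per image.
    for name, ids in cut(reals).items():
        splits[name].extend((image_id, None, "real") for image_id in ids)
    return splits


# Toy data: 10 hypothetical generators with 5 fakes each, plus 50 reals.
fakes = [(f"fake_{g}_{i}", f"gen_{g}") for g in range(10) for i in range(5)]
reals = [f"real_{i}" for i in range(50)]
splits = generator_disjoint_split(fakes, reals)
gens = {
    name: {gen for _, gen, label in rows if label == "fake"}
    for name, rows in splits.items()
}
print({name: len(g) for name, g in gens.items()})
```

Splitting at the generator level means the number of fake images per split depends on how many images each generator contributed, so the fake share can drift between splits even when the generator fractions are fixed.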

Per-domain totals (supervised splits)

Domain	Real	Fake	Fake generators
faces	9,564	17,606	36
scenes	27,878	24,325	31
id_cards	2,913	2,325	5
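The source does not spell out which splits count as "supervised". A quick check, assuming they are train, val, and test_clean, shows that the per-domain totals add up to exactly those three splits; the dictionary and function names below are illustrative.

```python
# (real, fake) counts per domain, copied from the per-domain table.
DOMAINS = {
    "faces": (9_564, 17_606),
    "scenes": (27_878, 24_325),
    "id_cards": (2_913, 2_325),
}

# (real, fake) counts for the splits assumed to be the "supervised" ones.
SUPERVISED = {
    "train": (31_992, 35_314),
    "val": (4_179, 4_701),
    "test_clean": (4_184, 4_241),
}


def totals(table):
    """Sum the real and fake columns of a {name: (real, fake)} table."""
    real = sum(r for r, _ in table.values())
    fake = sum(f for _, f in table.values())
    return real, fake


domain_totals = totals(DOMAINS)
split_totals = totals(SUPERVISED)
print(domain_totals, split_totals, domain_totals == split_totals)
```

Both tables sum to 40,355 real and 44,256 fake images, 84,611 in total, which supports reading "supervised splits" as train, val, and test_clean.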

Files

`disjoint_v3/{train,val,test_clean,test_d

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--hardiksharma6555--forgespectrum-114k
slug: hardiksharma6555--forgespectrum-114k
source: huggingface
author: hardiksharma6555
license: CC-BY-NC-4.0
tags: task_categories:image-classification, task_categories:visual-question-answering, language:en, license:cc-by-nc-4.0, size_categories:10k<n<100k, format:imagefolder, modality:image, library:datasets, library:mlcroissant, region:us, deepfake-detection, ai-generated-image-detection, forensics, vlm, reasoning-traces
