Dataset

Vector 100k

by Alfaxad alfaxad/vector-100k
Free2AITools Nexus Index
59.1
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 29
P: Popularity 58
R: Recency 84
Q: Quality 50
Dataset Information Summary
Entity Passport
Registry ID alfaxad/vector-100k
License Other
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset_alfaxad_vector_100k,
  author = {Alfaxad},
  title = {Vector 100k Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Alfaxad/vector-100k}},
  note = {Accessed via Free2AITools.}
}
APA Style
Alfaxad. (2026). Vector 100k [Dataset]. Free2AITools. https://huggingface.co/datasets/Alfaxad/vector-100k

Technical Deep Dive


Free2AITools Nexus Index V2.0


Index Insight

FNI V2.0 for Vector 100k: Authority (A:29), Popularity (P:58), Recency (R:84), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data Updated: Live data
Downloads
104,496

Task Categories

visual-question-answering, text-generation


Dataset Specification

VectorOS Vector 100k SimSat VLM Dataset

VectorOS Vector 100k is a high-fidelity multimodal instruction dataset for fine-tuning vision-language models on geospatial epidemiology tasks. It was built for the VectorOS hackathon project and targets LiquidAI/LFM2.5-VL-450M.

The dataset contains 100,000 chat-style examples derived from 10,000 geospatial chips across 30 AOIs. Every accepted chip has a real SimSat Sentinel-2 true-color view, a real SimSat Sentinel-2 NIR-red-green false-color view, a real Mapbox satellite view, and an aligned open-layer evidence overlay.
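The counts above imply some simple ratios. The sketch below only does that arithmetic; the per-AOI figures are averages, since the description does not say whether chips are spread evenly across AOIs. The constant and variable names are illustrative.

```python
# Counts quoted in the dataset description.
TOTAL_EXAMPLES = 100_000
TOTAL_CHIPS = 10_000
TOTAL_AOIS = 30

# Chat-style examples derived from each chip.
examples_per_chip = TOTAL_EXAMPLES / TOTAL_CHIPS

# Averages only: an even split across AOIs is an assumption, not a stated fact.
mean_chips_per_aoi = TOTAL_CHIPS / TOTAL_AOIS
mean_examples_per_aoi = TOTAL_EXAMPLES / TOTAL_AOIS

print(f"examples per chip: {examples_per_chip:.0f}")
print(f"mean chips per AOI: {mean_chips_per_aoi:.1f}")
print(f"mean examples per AOI: {mean_examples_per_aoi:.1f}")
```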

Created for upload: 2026-05-06T17:28:22Z

[Image: dataset_montage]

What Is Included

Each chip includes:

  • image_packets/<aoi>/<chip>_packet.png: a 1024 x 1024 four-panel visual packet.
  • sidecars/<aoi>/<chip>_sidecar.json: numeric features, source paths, provenance pointers, quality fields, and license flags.
  • targets/<aoi>/<chip>_risk_tile.json: strict VectorOS risk-tile target JSON.
  • raw_simsat/<aoi>.tar: per-AOI tar shard containing the raw per-chip SimSat products:
    • sentinel_rgb.png
    • sentinel_false_color_nir_red_green.png
    • sentinel_bands_red_green_blue_nir.npz
    • sentinel_metadata.json
    • mapbox_satellite.png
    • mapbox_metadata.json
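The path patterns listed above can be turned into a small helper. This is a minimal sketch, assuming a local copy of the repository under some root directory; the function name `chip_paths` and the example AOI and chip identifiers are hypothetical, and the internal layout of each tar shard is not described in the card, so the sketch stops at the shard path.

```python
from pathlib import PurePosixPath

# File names listed for the raw per-chip SimSat products inside each AOI shard.
RAW_SIMSAT_MEMBERS = (
    "sentinel_rgb.png",
    "sentinel_false_color_nir_red_green.png",
    "sentinel_bands_red_green_blue_nir.npz",
    "sentinel_metadata.json",
    "mapbox_satellite.png",
    "mapbox_metadata.json",
)


def chip_paths(root, aoi, chip):
    """Build the per-chip paths from the documented patterns.

    `root`, `aoi`, and `chip` are caller-supplied strings; nothing here
    checks that the files actually exist.
    """
    base = PurePosixPath(root)
    return {
        "packet": base / "image_packets" / aoi / f"{chip}_packet.png",
        "sidecar": base / "sidecars" / aoi / f"{chip}_sidecar.json",
        "risk_tile": base / "targets" / aoi / f"{chip}_risk_tile.json",
        # One tar shard per AOI holds the raw products for all of its chips.
        "raw_shard": base / "raw_simsat" / f"{aoi}.tar",
    }


# Hypothetical identifiers, for illustration only.
paths = chip_paths("vector-100k", "example_aoi", "example_chip")
for name, path in paths.items():
    print(f"{name}: {path}")
```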

The four-panel image packet order is:

  1. top-left: SimSat Sentinel-2 true-color RGB
  2. top-right: SimSat Sentinel-2 false color NIR-red-green
  3. bottom-left: Mapbox satellite cont
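To work with one panel at a time, the packet can be split into quadrants. The sketch below assumes the 1024 x 1024 packet is divided into four equal 512 x 512 panels with no borders or gutters, which the card does not confirm. Panels are keyed by position only, and `panel_boxes` is a hypothetical helper name.

```python
def panel_boxes(size=1024):
    """Return (left, upper, right, lower) pixel boxes for each quadrant.

    Assumes four equal panels with no padding between them. The tuple
    order matches what Pillow's Image.crop accepts.
    """
    half = size // 2
    return {
        "top_left": (0, 0, half, half),
        "top_right": (half, 0, size, half),
        "bottom_left": (0, half, half, size),
        "bottom_right": (half, half, size, size),
    }


boxes = panel_boxes()
for position, box in boxes.items():
    print(position, box)
```

With Pillow installed, each box could be passed to `Image.crop` on an opened packet image to extract that panel.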

Social Proof

HuggingFace Hub
104.5K downloads
Updated daily

Source summary: Based on Hugging Face metadata. Not a recommendation.


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--alfaxad--vector-100k
slug: alfaxad--vector-100k
source: huggingface
author: Alfaxad
license: Other
tags: task_categories:visual-question-answering, task_categories:text-generation, language:en, license:other, size_categories:100k<n<1m, format:json, modality:image, modality:text, modality:geospatial, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, geospatial, remote-sensing, public-health, vector-borne-disease, sentinel-2, mapbox, multimodal, lfm2-vl, lfm25-vl, weak-supervision
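The tag list mixes namespaced entries such as `modality:image` with bare keywords such as `geospatial`. A minimal sketch of separating the two is shown below; it assumes tags are comma-separated and that the first colon divides namespace from value. The helper name `parse_tags` is hypothetical.

```python
from collections import defaultdict


def parse_tags(tag_string):
    """Split a comma-separated tag string into namespaced and plain tags."""
    namespaced = defaultdict(list)
    plain = []
    for raw in tag_string.split(","):
        tag = raw.strip()
        if not tag:
            continue
        # Split on the first colon only, so values keep any later colons.
        key, sep, value = tag.partition(":")
        if sep:
            namespaced[key].append(value)
        else:
            plain.append(tag)
    return dict(namespaced), plain


# A shortened sample of the tags listed for this dataset.
sample = (
    "task_categories:visual-question-answering, task_categories:text-generation, "
    "size_categories:100k<n<1m, modality:image, modality:text, geospatial, sentinel-2"
)
namespaced, plain = parse_tags(sample)
print(namespaced["modality"])
print(plain)
```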

Technical Specs

architecture: null
params billions: null
context length: 102,400
pipeline tag: (empty)

Engagement & Metrics

downloads: 104,496
stars: null
forks: null

Data indexed from public sources. Updated daily.