# Free2AItools

> Agent-native structured index of 561,000+ AI models, datasets, papers, tools, and benchmarks.
> Cross-source aggregator (HuggingFace, GitHub, ArXiv, Replicate, Civitai, etc.)
> ranked by FNI (Free2AItools Nexus Index). Designed for AI agents, not human browsing.

This file follows the llms.txt convention (llmstxt.org). It is the canonical
discovery surface for autonomous agents, MCP clients, and LLM-based tooling
that need to consume Free2AItools data programmatically.

## What you can do here

- Search and rank 561,000+ AI entities by FNI score across HuggingFace, GitHub, ArXiv, Ollama, Replicate, Civitai, and other sources
- Compare 2-25 entities side-by-side with technical specs and license breakdown
- Fetch full structured metadata for any entity by ID
- Get FNI badges (SVG) for any entity for README embedding
- Discover via MCP (Model Context Protocol) — drop-in for Claude Desktop, Cursor, and similar

## Primary API surface

All endpoints return JSON unless noted. CORS open. Free tier hard-cap: 5 results
per search; auth/paid tiers (TBD) raise the cap.

### Discovery

- `GET /.well-known/mcp.json` — MCP server manifest (transport, protocol version,
  tool catalog with input schemas)
- `GET /llms.txt` — this file
- `GET /sitemap.xml` — full URL index

### MCP server (JSON-RPC 2.0)

- `POST /api/mcp` — JSON-RPC dispatch
  - `method: initialize` → server info + capabilities
  - `method: tools/list` → 5 tools: free2aitools_search / _rank / _explain /
    _select_model / _compare
  - `method: tools/call` → invoke a tool with arguments
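
The JSON-RPC dispatch above can be sketched as plain request envelopes. This is a minimal sketch, not the project's client: the base URL is taken from the Contact section, the `{"name", "arguments"}` shape of `tools/call` follows the public MCP specification, and the `query` argument name is hypothetical (the real input schemas come from `tools/list` or `/.well-known/mcp.json`). The `initialize` handshake, which needs client info and a protocol version, is omitted.

```python
import itertools
import json

BASE_URL = "https://free2aitools.com"  # site root from the Contact section
MCP_ENDPOINT = BASE_URL + "/api/mcp"

_ids = itertools.count(1)  # monotonically increasing JSON-RPC request ids


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope as a plain dict."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def tool_call(tool_name, arguments):
    """Build a tools/call request; arguments must match the tool's input schema."""
    return jsonrpc_request("tools/call", {"name": tool_name, "arguments": arguments})


# "query" is a hypothetical argument name used only for illustration.
example = tool_call("free2aitools_search", {"query": "small instruct model"})
body = json.dumps(example)  # POST this string to MCP_ENDPOINT as application/json
```

Keeping the envelope builder separate from the HTTP transport lets the same helpers work with any HTTP client the agent already uses.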

### Search and lookup

- `GET /api/v1/search?q=<query>[&type=<type>][&limit=N]` — FNI-ranked search
  - `type` accepts canonical values (`model`, `paper`, `tool`, `dataset`,
    `benchmark`) AND common id-prefix aliases (`hf-model`, `arxiv-paper`,
    `gh-model`, `replicate-model`, etc.) — both resolve to the canonical type
  - `limit` clamped to 1-5 for free tier
  - Response includes FNI score breakdown (semantic / authority / popularity /
    recency / quality) per result

- `GET /api/v1/entity/<id>[?include=body]` — full structured metadata for one
  entity (search → detail Agent journey)
  - Lean default response: ~30 Agent-relevant fields grouped into identity /
    classification / fni / specs / stats / links / relations
  - `?include=body` adds readme_html (~250KB)
  - 404 with `{"error":"Entity not found: <id>"}` when no match
  - Interim: entity lookup may return a transient 503 when the cold-path probe
    budget is exhausted; the 404/503 contract is still under runtime diagnosis

- `GET /api/v1/compare?ids=<id1>,<id2>[,...]` — side-by-side comparison of 2-25
  entities. Returns FNI factors + technical specs + license per entity.
  - Interim: cold multi-paper requests near the upper end of the 2-25 range may
    return a transient 503 (retry after the indicated delay) when they hit the
    cold-shard budget or fan-out cap

- `GET /api/v1/health` — VFS layer observability snapshot: cache-hit counters
  (L0/L1/L2), short-read retry stats, isolate uptime. No auth, not cached.

- `GET /api/v1/badge/<slug-or-id>` — SVG badge with FNI score, color-coded by
  FNI signal (green ≥ 90 high signal, blue ≥ 70 medium signal, yellow ≥ 50 low
  signal, red below 50). Suitable for README embedding.
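
The HTTP endpoints above can be wrapped in a few client-side helpers. This is a sketch under stated assumptions: the base URL comes from the Contact section; percent-encoding of entity IDs is a precaution, not documented behavior; `fetch` is a hypothetical callable returning `(status, headers, body)`; and reading the retry delay from a `Retry-After` header is an assumption, since the text only says "the indicated delay".

```python
import time
from urllib.parse import quote, urlencode

BASE_URL = "https://free2aitools.com"  # site root from the Contact section
FREE_TIER_MAX = 5  # free-tier result cap


def search_url(query, entity_type=None, limit=FREE_TIER_MAX):
    """Build a search URL, clamping limit to the free-tier range 1-5."""
    params = {"q": query}
    if entity_type is not None:
        params["type"] = entity_type
    params["limit"] = max(1, min(FREE_TIER_MAX, int(limit)))
    return f"{BASE_URL}/api/v1/search?{urlencode(params)}"


def entity_url(entity_id, include_body=False):
    """Build an entity URL; include_body=True also requests readme_html."""
    url = f"{BASE_URL}/api/v1/entity/{quote(entity_id, safe='')}"
    return url + "?include=body" if include_body else url


def compare_url(entity_ids):
    """Build a compare URL for 2-25 entity ids."""
    ids = list(entity_ids)
    if not 2 <= len(ids) <= 25:
        raise ValueError("compare accepts between 2 and 25 ids")
    return f"{BASE_URL}/api/v1/compare?" + urlencode({"ids": ",".join(ids)}, safe=",")


def fetch_with_retry(fetch, url, max_attempts=3, default_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry on 503; returns the last (status, body)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = fetch(url)
        if status != 503 or attempt == max_attempts - 1:
            return status, body
        try:
            # Assumed header; falls back if missing or not a plain number.
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay
        sleep(delay)


def badge_color(fni_score):
    """Map an FNI score to the badge color bands listed above."""
    if fni_score >= 90:
        return "green"
    if fni_score >= 70:
        return "blue"
    if fni_score >= 50:
        return "yellow"
    return "red"
```

A 404 is returned to the caller unchanged, so a missing entity is never retried; only the transient 503 case loops.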

## FNI score interpretation

FNI v2.0 = `min(99.9, 0.35·S + 0.25·A + 0.15·P + 0.15·R + 0.10·Q)`

- `S` Semantic relevance (currently dormant: ranking is keyword-index based, and
  live semantic/ANN ranking is not provided; on static detail/select/compare
  surfaces this factor is reported as null plus a note, not a value)
- `A` Authority (mesh gravity / cross-source corroboration)
- `P` Popularity (log-compressed downloads / stars / citations)
- `R` Recency (exponential decay on last-modified)
- `Q` Quality (completeness + utility signals)

Scoring is deterministic per snapshot.
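
The weighted sum can be reproduced client-side to sanity-check a score breakdown. A minimal sketch, assuming each factor is on a 0-100 scale (the text does not state the scale; the 99.9 cap suggests it). How the service combines factors when `S` is null is not specified, so this sketch refuses to guess and raises instead.

```python
WEIGHTS = {"S": 0.35, "A": 0.25, "P": 0.15, "R": 0.15, "Q": 0.10}
FNI_CAP = 99.9


def fni_v2(factors):
    """Compute min(99.9, weighted sum) from a dict with keys S, A, P, R, Q."""
    missing = [key for key in WEIGHTS if factors.get(key) is None]
    if missing:
        # null means not-measured; the handling of that case is unspecified.
        raise ValueError(f"factors not measured: {missing}")
    raw = sum(weight * factors[key] for key, weight in WEIGHTS.items())
    return min(FNI_CAP, raw)
```

For example, factors of S=80, A=60, P=40, R=20, Q=100 give 28 + 15 + 6 + 3 + 10 = 62.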

## Update cadence

The index is updated daily by an automated data pipeline. Each cycle runs the
full chain (harvest → enrichment → aggregation → R2 upload → CDN purge) and
archives a registry snapshot for FNI trend analysis over time.

## Entity types

Canonical types (use in `?type=` filter):

- `model` — AI models (HF / GH / Replicate / Civitai / Kaggle / Ollama sources)
- `paper` — research papers (ArXiv, Hugging Face papers, Semantic Scholar)
- `tool` — open-source developer tools, agent frameworks, and MCP servers (GitHub + MCP registry)
- `dataset` — public datasets
- `benchmark` — evaluation benchmarks as measured by Open LLM Leaderboard v2 (IFEval, BBH, MATH Lvl 5, GPQA, MUSR, MMLU-PRO); frozen leaderboard snapshot
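
A client can normalize a type filter before sending it. The sketch below mirrors the alias behavior described for `?type=` by taking the part after the last hyphen; that rule is an assumption inferred from the listed examples (`hf-model`, `arxiv-paper`), not the service's documented resolution logic.

```python
CANONICAL_TYPES = {"model", "paper", "tool", "dataset", "benchmark"}


def canonical_type(value):
    """Return the canonical type for a canonical value or an id-prefix alias."""
    if value in CANONICAL_TYPES:
        return value
    suffix = value.rpartition("-")[2]  # text after the last hyphen
    if suffix in CANONICAL_TYPES:
        return suffix
    raise ValueError(f"unknown entity type: {value!r}")
```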

## ID format

Entity IDs use the `<source>-<type>--<author>--<name>` convention:

- `hf-model--meta-llama--Llama-3-8B-Instruct`
- `arxiv-paper--unknown--<arxiv-id>`
- `gh-tool--<org>--<repo>`
- `replicate-model--<owner>--<model>`

IDs are stable across snapshots; `slug` (lowercase normalized form) is the
canonical sharding key.
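
The ID convention can be parsed with two splits. A minimal sketch, assuming the `<source>-<type>` prefix ends in a single-word type and that the slug is simply the lowercased ID; the text says "lowercase normalized form" without defining any further normalization.

```python
from typing import NamedTuple


class EntityId(NamedTuple):
    source: str
    type: str
    author: str
    name: str


def parse_entity_id(entity_id):
    """Split '<source>-<type>--<author>--<name>' into its four parts."""
    parts = entity_id.split("--", 2)  # any later '--' stays inside the name
    if len(parts) != 3:
        raise ValueError(f"not a valid entity id: {entity_id!r}")
    prefix, author, name = parts
    source, sep, entity_type = prefix.rpartition("-")
    if not (sep and source and entity_type):
        raise ValueError(f"not a valid source-type prefix: {prefix!r}")
    return EntityId(source, entity_type, author, name)


def to_slug(entity_id):
    """Assumed slug rule: lowercase only."""
    return entity_id.lower()
```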

## Project positioning

Free2AItools is a **structured discovery, evidence, and identity layer for AI
agents**. Raw data lives at HuggingFace, GitHub, ArXiv, etc.
This project adds:

- Unified schema across sources
- FNI ranking (deterministic, time-stable, explainable)
- Mesh-graph relations between entities (base_model, datasets_used, citations)
- Daily snapshot archive (FNI trends over time)
- Agent-first API (this file, MCP, structured JSON)

Posture: not the data source, but the canonical structured query and reference
layer that downstream consumers depend on.

## What this is NOT

Discovery layer only. The calling agent reasons over the data it returns:

- Does NOT perform compatibility analysis. Hardware/framework fields
  (ollama_compatible, can_run_local, VRAM estimates) are stored heuristics from
  metadata, not verified runtime checks on the caller's hardware or framework.
- Does NOT execute, plan, or recommend workflows. No side effects; no actions
  are taken on the caller's behalf.
- Does NOT select or decide for the caller. Endpoints filter and FNI-rank the
  catalog; final model selection is the caller's responsibility.
- Does NOT currently provide live semantic/ANN ranking. The Semantic (S) factor
  is dormant (keyword-index based); on static surfaces it is null + a note.

## Contact

- Site: https://free2aitools.com
- Strategic positioning + roadmap: see Free2AItools V∞ Roadmap (internal)
- Free tier supports anonymous public API access; auth + raised limits TBD.

## Notes for agent implementers

- Prefer MCP (`POST /api/mcp`) for first-class Agent integration — JSON-RPC 2.0,
  tool catalog discoverable via `tools/list`, structured tool responses
- For HTTP-only flows: `search` → `entity` → optional `compare`
- All scoring is FNI v2.0; the `fni_version` field in responses identifies the
  exact algorithm version (forward-compatible across snapshots)
- Field semantics: `0` means measured-zero, `null` means not-measured. Treat
  them differently when scoring downstream
- Booleans (ollama_compatible, can_run_local, is_trending) are tri-state:
  `true` / `false` / `null` (unknown)
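
The null-versus-zero and tri-state rules above map directly onto Python's `None`. A short sketch of how downstream code might keep the cases apart; the helper names are hypothetical.

```python
def mean_of_measured(values):
    """Average only measured values; 0 counts as measured, None is skipped."""
    measured = [v for v in values if v is not None]
    return sum(measured) / len(measured) if measured else None


def tri_state(flag):
    """Render a tri-state boolean without collapsing unknown into false."""
    if flag is None:
        return "unknown"
    return "yes" if flag else "no"


def keep_if_not_ruled_out(entities, field):
    """Keep entities whose flag is True or unknown; drop only explicit False."""
    return [e for e in entities if e.get(field) is not False]
```

Testing with `is None` and `is not False` rather than truthiness is the point: a bare `if value:` would treat `0`, `False`, and `None` as the same thing.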

This file is updated when the API contract changes. The latest revision reflects
the MCP tool catalog and endpoint surface as of the current deploy.