Dataset

Terminal Bench 2 Leaderboard

by plaume8 plaume8/terminal-bench-2-leaderboard
Free2AITools Nexus Index
59.4
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 62
P: Popularity 55
R: Recency 81
Q: Quality 50
Dataset Information Summary
Entity Passport
Registry ID plaume8/terminal-bench-2-leaderboard
License Apache-2.0
Provider huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset_plaume8_terminal_bench_2_leaderboard,
  author = {plaume8},
  title = {Terminal Bench 2 Leaderboard Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/plaume8/terminal-bench-2-leaderboard}},
  note = {Accessed via Free2AITools.}
}
APA Style
plaume8. (2026). Terminal Bench 2 Leaderboard [Dataset]. Free2AITools. https://huggingface.co/datasets/plaume8/terminal-bench-2-leaderboard

Technical Deep Dive


Free2AITools Nexus Index V2.0


Index Insight

FNI V2.0 for Terminal Bench 2 Leaderboard: Authority (A:62), Popularity (P:55), Recency (R:81), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data Updated: Live data
Downloads
62,096


Dataset Specification

Terminal-Bench 2.0 Leaderboard Submissions

This repository accepts leaderboard submissions for Terminal-Bench 2.0.

How to Submit

  1. Fork this repository
  2. Create a new branch for your submission
  3. Add your submission (a job or folder of jobs) under submissions/terminal-bench/2.0/<agent>__<model(s)>/
  4. Open a Pull Request

Submission Structure

text
submissions/
  terminal-bench/
    2.0/
      <agent>__<model(s)>/
        metadata.yaml       # Required: agent and model info
        <job-dir>/          # One or more job directories
          config.json
          <trial-dir>/result.json
          <trial-dir>/result.json
          ...
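
The layout above can be checked locally before opening a Pull Request. The following is a minimal sketch, not the repository's actual validator: the function name `check_submission_layout` is hypothetical, and it assumes that every subdirectory of a submission is a job directory, that every subdirectory of a job is a trial directory, and that a "valid" result.json is at least parseable JSON.

```python
import json
from pathlib import Path


def check_submission_layout(submission_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    if not submission_dir.is_dir():
        return [f"{submission_dir} is not a directory"]

    problems = []
    if not (submission_dir / "metadata.yaml").is_file():
        problems.append("missing metadata.yaml")

    # Assumption: every subdirectory of the submission is a job directory.
    job_dirs = [p for p in sorted(submission_dir.iterdir()) if p.is_dir()]
    if not job_dirs:
        problems.append("no job directories found")

    for job in job_dirs:
        if not (job / "config.json").is_file():
            problems.append(f"{job.name}: missing config.json")

        # Assumption: every subdirectory of a job is a trial directory.
        trial_dirs = [p for p in sorted(job.iterdir()) if p.is_dir()]
        if not trial_dirs:
            problems.append(f"{job.name}: no trial directories found")

        for trial in trial_dirs:
            result = trial / "result.json"
            if not result.is_file():
                problems.append(f"{job.name}/{trial.name}: missing result.json")
                continue
            try:
                # Assumption: "valid" means at least parseable JSON.
                json.loads(result.read_text())
            except json.JSONDecodeError:
                problems.append(f"{job.name}/{trial.name}: result.json is not valid JSON")
    return problems
```

Calling `check_submission_layout(Path("submissions/terminal-bench/2.0/my-agent__gpt-5"))` (a hypothetical directory name) returns a list of human-readable problems to fix before submitting.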

Required: metadata.yaml

Each submission must include a metadata.yaml file with the following fields:

yaml
agent_url: https://...         # Required: link to agent repo/docs
agent_display_name: "My Agent" # Required: display name for leaderboard
agent_org_display_name: "Org"  # Required: organization name

models:                              # Required: list of models used
  - model_name: gpt-5                # Required: model identifier
    model_provider: openai           # Required: provider (openai, anthropic, etc.)
    model_display_name: "GPT-5"      # Required
    model_org_display_name: "OpenAI" # Required
  # - Other models if your agent used multiple
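
As a rough illustration of the required fields, the sketch below checks an already-parsed metadata mapping. The function name `find_metadata_problems` is hypothetical, and treating an empty value as missing is an assumption; the repository's own validator may apply different or stricter rules. Parsing the YAML file itself (for example with a YAML library) is left out so the example has no third-party dependencies.

```python
REQUIRED_TOP_LEVEL = (
    "agent_url",
    "agent_display_name",
    "agent_org_display_name",
    "models",
)
REQUIRED_MODEL_FIELDS = (
    "model_name",
    "model_provider",
    "model_display_name",
    "model_org_display_name",
)


def find_metadata_problems(metadata: dict) -> list[str]:
    """Return a list of missing-field messages; an empty list means all fields are present."""
    # Assumption: an empty string or empty list counts as missing.
    problems = [
        f"missing field: {key}" for key in REQUIRED_TOP_LEVEL if not metadata.get(key)
    ]

    models = metadata.get("models") or []
    if not isinstance(models, list):
        problems.append("models must be a list")
        return problems

    for index, model in enumerate(models):
        for key in REQUIRED_MODEL_FIELDS:
            if not isinstance(model, dict) or not model.get(key):
                problems.append(f"models[{index}]: missing field: {key}")
    return problems


# The example values from the metadata.yaml above, as a parsed mapping.
example = {
    "agent_url": "https://...",
    "agent_display_name": "My Agent",
    "agent_org_display_name": "Org",
    "models": [
        {
            "model_name": "gpt-5",
            "model_provider": "openai",
            "model_display_name": "GPT-5",
            "model_org_display_name": "OpenAI",
        }
    ],
}
print(find_metadata_problems(example))
```

For the example mapping the function reports no problems, so the printed list is empty.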

Job Directory Requirements

Each job directory must contain all of the contents of your run.

Validation Rules

Your submission will be automatically validated. To pass:

  • timeout_multiplier must equal 1.0
  • No agent timeout overrides (override_timeout_sec, max_timeout_sec)
  • No verifier timeout overrides
  • No resource overrides (override_cpus, override_memory_mb, override_storage_mb)
  • All trial directories must have valid result.json files
  • Trial directori
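
The timeout and resource rules listed above can be approximated with a recursive scan of a parsed config.json. This is only a sketch under stated assumptions: the exact config.json schema is not given here, so the code searches for the named keys at any depth, and it treats a null value as "not set". The function name `find_config_violations` is hypothetical and this is not the repository's validator.

```python
FORBIDDEN_OVERRIDE_KEYS = {
    "override_timeout_sec",
    "max_timeout_sec",
    "override_cpus",
    "override_memory_mb",
    "override_storage_mb",
}


def find_config_violations(node, path="config"):
    """Walk a parsed config and return messages for rule violations found at any depth."""
    violations = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "timeout_multiplier" and value != 1.0:
                violations.append(f"{child} must equal 1.0, got {value!r}")
            elif key in FORBIDDEN_OVERRIDE_KEYS and value is not None:
                # Assumption: a null value means the override is not set.
                violations.append(f"{child} must not be set, got {value!r}")
            else:
                violations.extend(find_config_violations(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            violations.extend(find_config_violations(item, f"{path}[{index}]"))
    return violations
```

A typical use would be `find_config_violations(json.load(open("config.json")))` for each job directory, fixing anything it reports before submitting.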


Source summary: Based on Hugging Face metadata. Not a recommendation.


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--plaume8--terminal-bench-2-leaderboard
slug: plaume8--terminal-bench-2-leaderboard
source: huggingface
author: plaume8
license: Apache-2.0
tags: license:apache-2.0, region:us


Engagement & Metrics

downloads: 62,096
stars: null
forks: null

Data indexed from public sources. Updated daily.