Dataset

Terminal Bench 2 Leaderboard

Name: Terminal Bench 2 Leaderboard
Creator: plaume8
License: Apache-2.0

by plaume8 plaume8/terminal-bench-2-leaderboard

Free2AITools Nexus Index

59.4

S: Semantic 50

Query-time baseline · scored live at search

A: Authority 62

P: Popularity 55

R: Recency 81

Q: Quality 50

Dataset Information Summary
Entity Passport
Registry ID	plaume8/terminal-bench-2-leaderboard
License	Apache-2.0
Provider	huggingface

📜

Cite this dataset

Academic & Research Attribution

BibTeX

@misc{hf_dataset_plaume8_terminal_bench_2_leaderboard,
  author = {plaume8},
  title = {Terminal Bench 2 Leaderboard Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/plaume8/terminal-bench-2-leaderboard}},
  note = {Accessed via Free2AITools.}
}

APA Style

plaume8. (2026). Terminal Bench 2 Leaderboard [Dataset]. Free2AITools. https://huggingface.co/datasets/plaume8/terminal-bench-2-leaderboard

🔬Technical Deep Dive

⚖️ Free2AITools Nexus Index V2.0

💬 Index Insight

FNI V2.0 for Terminal Bench 2 Leaderboard: Authority (A:62), Popularity (P:55), Recency (R:81), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.

Data Sources / Provenance

Hugging Face API · GitHub Metadata · arXiv Citation DB

Open data · Updated: Live data

Downloads: 62,096

Dataset Specification

Terminal-Bench 2.0 Leaderboard Submissions

This repository accepts leaderboard submissions for Terminal-Bench 2.0.

How to Submit

1. Fork this repository.
2. Create a new branch for your submission.
3. Add your submission (a job or folder of jobs) under submissions/terminal-bench/2.0/<agent>__<model(s)>/.
4. Open a Pull Request.

Submission Structure

text

submissions/
  terminal-bench/
    2.0/
      <agent>__<model(s)>/
        metadata.yaml       # Required: agent and model info
        <job-folder>/       # One or more job directories
          config.json
          <trial-1>/result.json
          <trial-2>/result.json
          ...
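
Before opening a Pull Request, the layout above can be checked locally. The following is a minimal sketch, not the repository's own validator: `check_layout` is a hypothetical helper, and it assumes that every subdirectory of the submission folder is a job directory and every subdirectory of a job is a trial directory.

```python
from pathlib import Path


def check_layout(submission_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    if not submission_dir.is_dir():
        return [f"{submission_dir} is not a directory"]

    problems = []
    if not (submission_dir / "metadata.yaml").is_file():
        problems.append("missing metadata.yaml")

    # Assumption: every subdirectory of the submission is a job directory.
    job_dirs = sorted(p for p in submission_dir.iterdir() if p.is_dir())
    if not job_dirs:
        problems.append("no job directories found")

    for job in job_dirs:
        if not (job / "config.json").is_file():
            problems.append(f"{job.name}: missing config.json")
        # Assumption: every subdirectory of a job is a trial directory.
        trials = sorted(p for p in job.iterdir() if p.is_dir())
        if not trials:
            problems.append(f"{job.name}: no trial directories found")
        for trial in trials:
            if not (trial / "result.json").is_file():
                problems.append(f"{job.name}/{trial.name}: missing result.json")
    return problems
```

Calling `check_layout(Path("submissions/terminal-bench/2.0/<agent>__<model(s)>"))` with your own folder name substituted returns the list of problems. It only checks that files exist, not that their contents are valid.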

Required: metadata.yaml

Each submission must include a metadata.yaml file with the following fields:

yaml

agent_url: https://...         # Required: link to agent repo/docs
agent_display_name: "My Agent" # Required: display name for leaderboard
agent_org_display_name: "Org"  # Required: organization name

models:                              # Required: list of models used
  - model_name: gpt-5                # Required: model identifier
    model_provider: openai           # Required: provider (openai, anthropic, etc.)
    model_display_name: "GPT-5"      # Required
    model_org_display_name: "OpenAI" # Required
  # - Other models if your agent used multiple
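
The required fields listed above can be checked once the file has been parsed into a dictionary. This is a rough sketch under stated assumptions: `missing_metadata_fields` is a hypothetical helper rather than the leaderboard's validator, and it treats an empty value as missing.

```python
REQUIRED_TOP_LEVEL = ("agent_url", "agent_display_name", "agent_org_display_name")
REQUIRED_MODEL_FIELDS = (
    "model_name",
    "model_provider",
    "model_display_name",
    "model_org_display_name",
)


def missing_metadata_fields(metadata: dict) -> list[str]:
    """List required fields that are absent or empty in parsed metadata."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not metadata.get(key)]

    models = metadata.get("models")
    if not isinstance(models, list) or not models:
        # The models list itself is required and must have at least one entry.
        missing.append("models")
        return missing

    for index, model in enumerate(models):
        if not isinstance(model, dict):
            missing.append(f"models[{index}]")
            continue
        missing.extend(
            f"models[{index}].{key}"
            for key in REQUIRED_MODEL_FIELDS
            if not model.get(key)
        )
    return missing
```

If PyYAML is installed, the dictionary could be obtained with `yaml.safe_load(Path("metadata.yaml").read_text())`. An empty return value means every field marked Required is present.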

Job Directory Requirements

Each job directory must contain all of the contents of your run.

Validation Rules

Your submission will be automatically validated. To pass:

timeout_multiplier must equal 1.0
No agent timeout overrides (override_timeout_sec, max_timeout_sec)
No verifier timeout overrides
No resource overrides (override_cpus, override_memory_mb, override_storage_mb)
All trial directories must have valid result.json files
Trial directori
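
The timeout and resource rules above can be approximated with a scan of a job's config.json. This is a sketch only: the source does not show where these keys sit inside the file, so the hypothetical `config_violations` helper searches every nesting level by key name. It also assumes that verifier overrides use the same key names as the agent ones, and that an override set to null counts as not set.

```python
import json

FORBIDDEN_OVERRIDES = {
    "override_timeout_sec",
    "max_timeout_sec",
    "override_cpus",
    "override_memory_mb",
    "override_storage_mb",
}


def find_keys(node, path=""):
    """Yield (path, key, value) for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield child, key, value
            yield from find_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keys(item, f"{path}[{index}]")


def config_violations(config: dict) -> list[str]:
    """Report timeout_multiplier values other than 1.0 and any set overrides."""
    violations = []
    for path, key, value in find_keys(config):
        if key == "timeout_multiplier" and value != 1.0:
            violations.append(f"{path} = {value!r}, expected 1.0")
        # Assumption: an override that is present but null is treated as unset.
        elif key in FORBIDDEN_OVERRIDES and value is not None:
            violations.append(f"{path} is set to {value!r}")
    return violations


def violations_in_file(path: str) -> list[str]:
    """Load a config.json file and scan it for rule violations."""
    with open(path, encoding="utf-8") as handle:
        return config_violations(json.load(handle))
```

An empty list from `violations_in_file` suggests the config meets the listed timeout and resource rules. The automated validation on the Pull Request remains the authority.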

🤗 Data Source: Hugging Face ↗

🔄 Updated daily

Source summary: Based on Hugging Face metadata. Not a recommendation.

🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id: hf-dataset--plaume8--terminal-bench-2-leaderboard
slug: plaume8--terminal-bench-2-leaderboard
source: huggingface
author: plaume8
license: Apache-2.0
tags: license:apache-2.0, region:us
