Dataset

TMPFILE

by Tuyuanpeng tuyuanpeng/tmpfile
Free2AITools Nexus Index
59.7
S: Semantic 50

Query-time baseline · scored live at search

A: Authority 61
P: Popularity 50
R: Recency 88
Q: Quality 50
Registry ID tuyuanpeng/tmpfile
Provider huggingface
📜

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset_tuyuanpeng_tmpfile,
  author = {Tuyuanpeng},
  title = {TMPFILE Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/Tuyuanpeng/TMPFILE}},
  note = {Accessed via Free2AITools.}
}
APA Style
Tuyuanpeng. (2026). TMPFILE [Dataset]. Free2AITools. https://huggingface.co/datasets/Tuyuanpeng/TMPFILE


Data Sources / Provenance

Open data Updated: Live data
⬇️
Downloads
27,507


Dataset Specification

mini-swe-agent prompt search notes

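This repository has always provided two separate layers of capability, but the entry points used to be unclear, which made it easy to assume that "the agent goes online and rewrites its own prompt within a single run".

  1. Runtime web access: src/minisweagent/config/benchmarks/swebench.yaml installs mswea-web-search and mswea-web-fetch into the container, and the prompt tells the model that it can use them to look up public documentation.

  2. Prompt iteration: scripts/search_system_prompt.py runs an offline prompt policy search. It:

  • generates a prompt override
  • runs one round of the SWE-bench canary
  • analyzes the failed trajectories
  • feeds the failure patterns back into the next round of the prompt search

This is not the agent "editing its system prompt as it works" within a single task. It is a multi-round evaluation loop driven by an external search script.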
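The four steps above can be pictured as an outer loop that lives outside the agent. The following is only a minimal sketch under stated assumptions: `RoundResult`, `search_prompt`, `toy_propose`, and `toy_canary` are hypothetical names invented for illustration and are not taken from scripts/search_system_prompt.py, whose real interface is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RoundResult:
    """Outcome of one canary round (hypothetical structure)."""
    override: str
    resolved: int
    failure_modes: List[str] = field(default_factory=list)


def search_prompt(
    propose: Callable[[List[str]], str],
    run_canary: Callable[[str], RoundResult],
    rounds: int = 3,
) -> RoundResult:
    """Propose an override, run the canary, feed failures into the next round."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    feedback: List[str] = []
    best: Optional[RoundResult] = None
    for _ in range(rounds):
        override = propose(feedback)      # generate a prompt override
        result = run_canary(override)     # run one canary round
        feedback = result.failure_modes   # failures drive the next proposal
        if best is None or result.resolved > best.resolved:
            best = result
    return best


# Toy stand-ins so the sketch runs without any model or benchmark.
def toy_propose(feedback: List[str]) -> str:
    return "base" + "".join(f"+fix:{mode}" for mode in feedback)


def toy_canary(override: str) -> RoundResult:
    # Pretend every addressed failure mode resolves one more case.
    fixed = override.count("+fix:")
    remaining = ["timeout", "wrong-file"][fixed:]
    return RoundResult(override, resolved=6 + fixed, failure_modes=remaining)


best = search_prompt(toy_propose, toy_canary, rounds=3)
print(best.override, best.resolved)
```

The point of the sketch is the separation of concerns: the agent only ever sees a fixed prompt per run, while the search loop decides which override to try next.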

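run_swebench_full.sh now supports running the prompt search first and then automatically continuing the formal generation/evaluation with the best override. It also supports a lighter 8-case validation mode for fast prompt / model iteration, which reuses existing artifacts where possible and reduces disk usage:

```bash
LEAN_VALIDATION=1 \
MODEL=openai/gpt-5.2-2025-12-11 \
EXTRA_CONFIG_FILE=prompt_opt_runs/search_20260313_144354/best_prompt_override.yaml \
bash run_swebench_full.sh
```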

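By default, this mode:

  • shrinks the generation slice to the first 8 cases (override with VALIDATION_CASES or VALIDATION_SLICE_SPEC)
  • writes output to a smaller directory (default: runs/validation_8)
  • evaluates only these 8 cases directly, with no additional second truncation
  • turns off aggressive cleanup and defaults to a single worker, avoiding pointless repeated builds and cleanups
  • does not rerun existing predictions; to force a redo, explicitly add REDO_EXISTING=1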
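To make the defaults above concrete, here is a small sketch of how they might be resolved from environment variables. It is an illustration only: the function name and the dictionary keys are hypothetical, treating VALIDATION_CASES as an integer count is an assumption, and the format of VALIDATION_SLICE_SPEC is not documented here, so it is passed through untouched. The actual logic lives in run_swebench_full.sh and may differ.

```python
from typing import Dict, Optional


def lean_validation_settings(env: Dict[str, str]) -> Optional[dict]:
    """Return the lean-validation defaults, or None when the mode is off."""
    if env.get("LEAN_VALIDATION") != "1":
        return None
    return {
        # Assumption: VALIDATION_CASES is a plain case count.
        "cases": int(env.get("VALIDATION_CASES", "8")),
        # Format unknown; forwarded as-is when provided.
        "slice_spec": env.get("VALIDATION_SLICE_SPEC"),
        "output_dir": "runs/validation_8",
        "workers": 1,                 # single worker by default
        "aggressive_cleanup": False,  # aggressive cleanup is turned off
        "redo_existing": env.get("REDO_EXISTING") == "1",
    }


defaults = lean_validation_settings({"LEAN_VALIDATION": "1"})
forced = lean_validation_settings({"LEAN_VALIDATION": "1", "REDO_EXISTING": "1"})
print(defaults)
print(forced["redo_existing"])
```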

Recommended iteration command:

```bash
LEAN_VALIDATION=1 \
MODEL=openai/gpt-5.2-2025-12-11 \
EXTRA_CONFIG_FILE=prompt_opt_runs/search_20260313_144354/best_prompt_override.yaml \
DO_GENERATE=1 DO_EVALUATE=1 \
bash run_swebench_full.sh
```

Re-evaluate using only the existing predictions:

```bash
LEAN_VALIDATION=1 DO_GENERATE=0 DO_EVALUATE=1 bash run_swebench_full.sh
```

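Stable 500-case runs

For large-batch evaluation, run_swebench_full.sh now adds storage protection automatically by default:

  • generation defaults to GEN_WORKERS=2
  • once the evaluation reaches 60 or more cases, it automatically switches to chunked cleanup mode
  • once the evaluation reaches 300 or more cases, it automatically tightens further to a more stable mode: EVAL_CHUNK_SIZE=2, EVAL_MAX_WORKERS=1, DISK_GB_THRESHOLD=15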
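The thresholds above amount to a small tiered rule. The sketch below restates it in Python for clarity. It is not the script's code: the function name and the `chunked_cleanup` key are hypothetical, and treating both thresholds as inclusive (60 and 300 cases trigger the switch) is an assumption.

```python
def storage_protection(num_cases: int) -> dict:
    """Pick storage-protection settings from the evaluation size."""
    settings = {"GEN_WORKERS": 2, "chunked_cleanup": False}
    if num_cases >= 60:
        # Medium and large runs clean up chunk by chunk.
        settings["chunked_cleanup"] = True
    if num_cases >= 300:
        # Very large runs tighten further to the more stable mode.
        settings.update(
            EVAL_CHUNK_SIZE=2,
            EVAL_MAX_WORKERS=1,
            DISK_GB_THRESHOLD=15,
        )
    return settings


for size in (8, 60, 500):
    print(size, storage_protection(size))
```

Under these assumptions, a full 500-case run lands in the tightest tier, while an 8-case validation run gets only the generation worker default.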

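Recommended direct usage:

```bash
MODEL=openai/gpt-5.4-2026-03-05 \
DO_GENERATE=1 \
DO_EVALUATE=1 \
bash run_swebench_full.sh
```

If your priority is to "solve as many cases as possible" rather than to save tokens / time, the main entry point can now switch directly to the clean profile:

```bash
MODEL=openai/gpt-5.4-2026-03-05 \
SPEED_PROFILE=clean \
HIGH_ACCURACY_PRESET=1 \
DO_GENERATE=1 \
DO_EVALUATE=1 \
bash run_swebench_full.sh
```

This profile keeps the full multi-agent prompt stack, but additionally enables a higher-recall clean overlay and turns off the automatic storage-protection switching used for large-batch evaluation. It suits the scenario where you simply want to solve as many cases as possible. In addition, `HIGH_ACCURACY_PRESET=1` …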


🛡️ Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

🆔 Identity & Source

id
hf-dataset--tuyuanpeng--tmpfile
slug
tuyuanpeng--tmpfile
source
huggingface
author
Tuyuanpeng
license
tags
region:us
