Dataset

PDB

by LiteFold (litefold/pdb)
Free2AITools Nexus Index: 59.9
- S: Semantic 50 (query-time baseline, scored live at search)
- A: Authority 33
- P: Popularity 52
- R: Recency 92
- Q: Quality 50
Dataset Information Summary
Entity Passport
Registry ID: litefold/pdb
License: CC0-1.0
Provider: huggingface

Cite this dataset

Academic & Research Attribution

BibTeX
@misc{hf_dataset_litefold_pdb,
  author = {LiteFold},
  title = {PDB Dataset},
  year = {2026},
  howpublished = {\url{https://huggingface.co/datasets/LiteFold/PDB}},
  note = {Accessed via Free2AITools.}
}
APA Style
LiteFold. (2026). PDB [Dataset]. Free2AITools. https://huggingface.co/datasets/LiteFold/PDB

FNI V2.0 for PDB: Authority (A:33), Popularity (P:52), Recency (R:92), Quality (Q:50). Semantic (S) is a query-time baseline scored live at search.


Data Sources / Provenance

Open data (updated: live data)
Downloads: 35,256


Dataset Specification

PDB mmCIF Entry Index

The Protein Data Bank is the single global archive of experimentally determined 3D structures of biological macromolecules, established in 1971 and now holding well over 230,000 entries. It stores atomic coordinates for proteins, nucleic acids, and their complexes determined by X-ray crystallography, cryo-EM, NMR, micro-electron diffraction, and integrative methods, along with the underlying experimental data (structure factors, EM maps, NMR restraints) and rich metadata covering sequence, ligands, modifications, oligomeric state, and validation reports. Every entry has a four-character PDB ID (e.g. 7PZB) and is distributed primarily in the mmCIF format, with legacy PDB-format files retained for compatibility.

Operationally, the archive is jointly managed by the wwPDB consortium: RCSB PDB at Rutgers and UCSD handles depositions from the Americas and Oceania and serves as the wwPDB Archive Keeper; PDBe at EMBL-EBI handles Europe and Africa; PDBj at Osaka University handles Asia; and BMRB hosts NMR-specific data. All wwPDB sites receive synchronized weekly updates and serve the archive free of charge under CC0.

Within structural biology and protein ML, the PDB is the canonical training and validation source for structure prediction (AlphaFold2/3, RoseTTAFold, Protenix, OpenFold), inverse folding (ProteinMPNN, ESM-IF), docking, MD setup, and template-based modelling. Time-cutoff splits on PDB release dates are the standard way to control for data leakage when benchmarking these models.
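A time-cutoff split can be sketched in a few lines once each entry carries a release date. This is a minimal illustration, not the split this dataset ships with (that one is hash-based, see Splits). It assumes records of the form (entry_id, release_date); the real column names in LiteFold/PDB are not documented on this page, and the helper `time_cutoff_split`, the toy IDs, the dates and the cutoff are all hypothetical placeholders.

```python
from datetime import date


def time_cutoff_split(records, cutoff):
    """Split (entry_id, release_date) pairs on a release-date cutoff.

    Entries released strictly before `cutoff` go to train; entries released on
    or after it go to test, so every test entry is newer than every train entry.
    """
    train, test = [], []
    for entry_id, released in records:
        (train if released < cutoff else test).append(entry_id)
    return train, test


# Toy records: IDs, dates and the cutoff are illustrative placeholders,
# not real PDB metadata or a cutoff used by any particular model.
records = [
    ("entry_a", date(1998, 5, 4)),
    ("entry_b", date(2020, 3, 11)),
    ("entry_c", date(2023, 9, 27)),
]
train_ids, test_ids = time_cutoff_split(records, cutoff=date(2021, 9, 30))
print(train_ids, test_ids)  # ['entry_a', 'entry_b'] ['entry_c']
```

Using a strict `<` puts entries released exactly on the cutoff date into test; either convention works as long as it is applied consistently across benchmarks.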

Splits

| Split | Rows |
|-------|-----:|
| train | 88,873 |
| test | 9,951 |
| total | 98,824 |

The split is deterministic: an entry goes to test when sha256(pdb_id) % 10 == 0 and to train when the remainder is 1 through 9.
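The bucket rule can be sketched as follows. This is a minimal sketch under stated assumptions: the ID is hashed as its UTF-8 bytes exactly as stored (no upper- or lower-casing), and the SHA-256 hex digest is read as one base-16 integer before taking `% 10`. Neither detail is specified here, so the sketch may not reproduce the published assignment exactly; `split_bucket` and `assign_split` are hypothetical helper names.

```python
import hashlib


def split_bucket(pdb_id: str) -> int:
    """Return the 0-9 bucket for a PDB ID.

    ASSUMPTION: UTF-8 bytes with no case normalisation, and the SHA-256
    hex digest interpreted as a single base-16 integer.
    """
    digest = hashlib.sha256(pdb_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10


def assign_split(pdb_id: str) -> str:
    # Bucket 0 -> test; buckets 1 through 9 -> train.
    return "test" if split_bucket(pdb_id) == 0 else "train"


if __name__ == "__main__":
    pdb_id = "7PZB"
    print(pdb_id, split_bucket(pdb_id), assign_split(pdb_id))
```

Because the bucket depends only on the ID, the split is reproducible, and adding new entries never moves existing ones between train and test. The row counts fit a 1-in-10 rule: 9,951 / 98,824 ≈ 0.1007.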

Dataset Statistics

| Metr


Dataset Transparency Report

Technical metadata sourced from upstream repositories.

Open Metadata

Identity & Source

id: hf-dataset--litefold--pdb
slug: litefold--pdb
source: huggingface
author: LiteFold
license: CC0-1.0
tags: license:cc0-1.0, size_categories:10k<n<100k, format:parquet, modality:tabular, modality:text, library:datasets, library:pandas, library:polars, library:mlcroissant, region:us, biology, proteins, protein-structure, pdb, rcsb, mmcif, parquet


Data indexed from public sources. Updated daily.