πŸ“Š Dataset: OmniWorld
by InternRobotics
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling


⬇️ Downloads: 27,605 Β· ❀️ Likes: 81

πŸ‘οΈ Data Preview

πŸ“Š

Row-level preview not available for this dataset.

Schema structure is shown in the Field Logic panel when available.

πŸ”— Explore Full Dataset β†—

🧬 Field Logic

🧬

Schema not yet indexed for this dataset.

Dataset Specification

OmniWorld-Game Benchmark Detailed Guide

The OmniWorld-Game Benchmark is a curated subset of test splits drawn from the OmniWorld-Game dataset, selected to serve as a challenging evaluation platform, as detailed in our paper.

| Task | Sequence Length | Duration | Key Modalities |
|---|---|---|---|
| Geometric Prediction | 384 frames | 16 seconds | RGB, Depth, Camera Poses |
| Video Generation | 81 frames | 3.4 seconds | RGB, Depth, Camera Poses, Text |

Each benchmark sequence features rich dynamics that reflect real-world complexity, and is accompanied by high-fidelity ground-truth annotations for camera poses and depth.

Data Access and Organization

The benchmark annotation data is packaged into .tar.gz files located under the OmniWorld/benchmark directory. Each archive is named in the format <UID>_<split_index>.tar.gz.
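The archives can be unpacked with Python's standard tarfile module. A minimal sketch; the archive name below is a placeholder, not a real `<UID>_<split_index>`:

```python
import tarfile

def extract_archive(archive_path: str, out_dir: str) -> None:
    """Unpack one benchmark archive (<UID>_<split_index>.tar.gz) into out_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=out_dir)

# extract_archive("OmniWorld/benchmark/UID_0.tar.gz", "benchmark_extracted")  # placeholder paths
```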

Extracted Directory Structure

<UID>_<split_index>/
β”œβ”€ depth/
β”‚  β”œβ”€ 000000.npy        # (H, W) depth map, stored using the OmniWorld-Game depth reading method
β”‚  β”œβ”€ 000001.npy
β”‚  └─ ...
β”œβ”€ image/               # High-resolution RGB frames (720Γ—1280 pixels)
β”‚  β”œβ”€ 000000.png
β”‚  β”œβ”€ 000001.png
β”‚  └─ ...
β”œβ”€ camera_poses.npy     # (num_frames, 4, 4) camera-to-world (C2W) transformation matrices
β”œβ”€ intrinsics.npy       # (num_frames, 3, 3) intrinsic camera matrices in pixel space
β”œβ”€ text_caption.json    # Structured text caption associated with the sequence
└─ video.mp4            # MP4 video corresponding to the PNG frames in image/

The depth maps are already processed and stored using the OmniWorld-Game Depth reading method.
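Since camera_poses.npy stores camera-to-world matrices, a world-to-camera transform (needed for projecting points into the image) is obtained by inverting each rigid pose. A sketch on a synthetic pose, not real annotation values:

```python
import numpy as np

# Synthetic C2W pose for illustration: camera translated to (0, 0, 2)
c2w = np.eye(4)
c2w[:3, 3] = [0.0, 0.0, 2.0]

# Invert the rigid transform analytically: rotation becomes R^T, translation -R^T t
R, t = c2w[:3, :3], c2w[:3, 3]
w2c = np.eye(4)
w2c[:3, :3] = R.T
w2c[:3, 3] = -R.T @ t
```

The analytic inverse is cheaper and numerically safer than np.linalg.inv for rigid transforms.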

OmniWorld-CityWalk Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-CityWalk dataset.

OmniWorld-CityWalk Organisation and File Structure

The OmniWorld-CityWalk dataset is a collection of re-annotated data derived from a subset of the Sekai-Real-Walking-HQ dataset. You need to download the original videos and extract the video clips yourself.

Important Note: In this repository, we only provide the annotated data (e.g., camera poses, dynamic masks), and do not include the raw RGB image files due to licensing and size constraints. Please refer to the original project for instructions on downloading and splitting the raw video data. Our annotations are designed to align with the original video frames.

Annotation Files

The camera annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-CityWalk/.

  • Naming Convention: omniworld_citywalk_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Scene and Split Specifications

  • Video Length: Each source video scene is 60 seconds long.
  • Frame Rate: 30 FPS.
  • Total Frames: 1800 frames per scene.
  • Split Strategy: Each scene is divided into 6 splits of 300 frames each for detailed annotation.
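Under these specifications, a global frame index maps to a split deterministically. A small sanity-check sketch; the authoritative grouping lives in each scene's split_info.json:

```python
FPS = 30
SCENE_SECONDS = 60
FRAMES_PER_SPLIT = 300

total_frames = FPS * SCENE_SECONDS             # 1800 frames per scene
num_splits = total_frames // FRAMES_PER_SPLIT  # 6 splits

def frame_to_split(frame_idx: int) -> tuple[int, int]:
    """Map a global frame index (0-1799) to (split_index, frame_within_split)."""
    return divmod(frame_idx, FRAMES_PER_SPLIT)

print(num_splits, frame_to_split(750))  # 6 (2, 150)
```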

Metadata Explained (omniworld_citywalk_metadata.csv)

| Field Name | Description |
|---|---|
| index | The sequential index number of the scene. |
| videoFile | The video file name, formatted as <scene_id>_<start_frame>_<end_frame>. The corresponding source video on YouTube can be accessed via https://www.youtube.com/watch?v=<scene_id>. |
| cameraFile | The directory name for the camera annotation data, named after the video file. |
| caption | The dense text description/caption for the video segment. |
| location | The geographical location where the video was filmed. |
| crowdDensity | An assessment of the crowd/people density within the video. |
| weather | The general weather condition (e.g., sunny, overcast). |
| timeOfDay | The time of day when the video was recorded (e.g., morning, afternoon). |
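A sketch of reading the metadata and reconstructing the YouTube URL from videoFile. The CSV row below is fabricated for illustration; rsplit is used because YouTube IDs may themselves contain underscores:

```python
import csv
import io

# Fabricated sample row mimicking omniworld_citywalk_metadata.csv
sample = (
    "index,videoFile,cameraFile,caption,location,crowdDensity,weather,timeOfDay\n"
    "0,xpPEhccDNak_0023550_0025350,xpPEhccDNak_0023550_0025350,"
    "A city walk.,Tokyo,medium,sunny,afternoon\n"
)

for row in csv.DictReader(io.StringIO(sample)):
    # videoFile is <scene_id>_<start_frame>_<end_frame>; split off the last two fields
    scene_id, start, end = row["videoFile"].rsplit("_", 2)
    url = f"https://www.youtube.com/watch?v={scene_id}"
    print(url, start, end)
```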

OmniWorld-CityWalk Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one omniworld_citywalk_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

xpPEhccDNak_0023550_0025350/  # Example scene name (videoFile)
β”œβ”€ gdino_mask/          # Per-frame dynamic-object masks (.png)
β”œβ”€ recon/               # Camera and 3D reconstruction data per split
β”‚  β”œβ”€ split_0/
β”‚  β”‚  β”œβ”€ extrinsics.npz   # Per-frame camera extrinsics: (frame_num, 3, 4), OpenCV world-to-camera format
β”‚  β”‚  β”œβ”€ intrinsics.npz   # Per-frame camera intrinsics: (frame_num, 3, 3), pixel units
β”‚  β”‚  └─ points3D_ba.ply  # Sparse, accurate point cloud after Bundle Adjustment (BA) for this split
β”‚  β”œβ”€ split_1/
β”‚  β”‚  └─ ...
β”‚  └─ ...
β”œβ”€ image_list.json      # Defines the frame naming convention (e.g., 000000.png to 001799.png)
└─ split_info.json      # Records how frames are grouped into 300-frame splits

2. Modality Details

2.1. Split Information (split_info.json)

Scene frames are segmented into 300-frame splits for annotation. The mapping and division information is stored in split_info.json.

2.2. Camera Poses (recon/split_<idx>/...)

Camera poses are provided as NumPy compressed files (.npz) containing the extrinsics (world-to-camera rotation and translation) and intrinsics (focal length and principal point).

Minimal Reader

```python
import numpy as np

# Load extrinsics (world-to-camera transform, OpenCV convention)
extrinsics = np.load("recon/split_0/extrinsics.npz")["extrinsics"]  # Shape: (frame_num, 3, 4)

# Load intrinsics (in pixel units)
intrinsics = np.load("recon/split_0/intrinsics.npz")["intrinsics"]  # Shape: (frame_num, 3, 3)

print("Extrinsics shape:", extrinsics.shape)
print("Intrinsics shape:", intrinsics.shape)
```
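The (frame_num, 3, 4) world-to-camera extrinsics can be promoted to full 4Γ—4 matrices and inverted to camera-to-world poses. A sketch on synthetic identity poses, not real annotation values:

```python
import numpy as np

n = 5
w2c_34 = np.tile(np.eye(4)[:3], (n, 1, 1))  # synthetic (n, 3, 4) W2C extrinsics

# Append the homogeneous bottom row [0, 0, 0, 1] to each matrix, then invert
bottom = np.tile(np.array([[0.0, 0.0, 0.0, 1.0]]), (n, 1, 1))
w2c = np.concatenate([w2c_34, bottom], axis=1)   # (n, 4, 4)
c2w = np.linalg.inv(w2c)                         # camera-to-world poses
```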

OmniWorld-HOI4D Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-HOI4D dataset.

OmniWorld-HOI4D Organisation and File Structure

The OmniWorld-HOI4D dataset is a collection of re-annotated data derived from the HOI4D dataset. You need to download the original videos yourself.

Important Note: In this repository, we only provide the annotated data (e.g., camera poses, flow, depth, text), and do not include the raw RGB image files due to licensing and size constraints. Please refer to the original project for instructions on downloading the raw video data. Our annotations are designed to align with the original video frames.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-HOI4D/.

  • Naming Convention: omniworld_hoi4d_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Scene and Split Specifications

  • Total Frames: 300 frames per scene.
  • Split Strategy: Each scene consists of a single 300-frame split for detailed annotation.

Metadata Explained (omniworld_hoi4d_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original HOI4D dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: ZY20210800001/H1/C1/N19/S100/s02/T1 |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive, generated by replacing all / in the Video Path with _. Example: ZY20210800001_H1_C1_N19_S100_s02_T1 |
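The metadata's path convention can be applied mechanically:

```python
def video_to_annotation_path(video_path: str) -> str:
    """Derive the annotation directory name by replacing every '/' with '_'."""
    return video_path.replace("/", "_")

print(video_to_annotation_path("ZY20210800001/H1/C1/N19/S100/s02/T1"))
# ZY20210800001_H1_C1_N19_S100_s02_T1
```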

OmniWorld-HOI4D Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one omniworld_hoi4d_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>
# e.g., ZY20210800001_H1_C1_N19_S100_s02_T1
|
β”œβ”€β”€ camera/
β”‚   β”œβ”€β”€ recon/
β”‚   β”‚   └── split_0/
β”‚   β”‚       └── info.json        # Camera intrinsics and extrinsics for all 300 frames.
β”‚   β”œβ”€β”€ image_list.json          # Ordered list of corresponding image filenames.
β”‚   └── split_info.json          # Defines the frame segmentation (HOI4D is one 300-frame split).
|
β”œβ”€β”€ flow/                        # Just like OmniWorld-Game.
β”‚   β”œβ”€β”€ 00000/
β”‚   β”‚   β”œβ”€β”€ flow_u_16.png        # Optical flow (horizontal component). 
β”‚   β”‚   β”œβ”€β”€ flow_v_16.png        # Optical flow (vertical component).
β”‚   β”‚   └── flow_vis.png         # Visualization of the optical flow.
β”‚   β”œβ”€β”€ 00001/
β”‚   ... (up to frame 299)
|
β”œβ”€β”€ prior_depth/
β”‚   β”œβ”€β”€ 00000.png               # Monocular depth map for frame 0.
β”‚   β”œβ”€β”€ 00001.png               # Monocular depth map for frame 1.
β”‚   ... (up to frame 299)
|
└── text/                        # Just like OmniWorld-Game.
    β”œβ”€β”€ 0_80.txt                 # Text description for frames 0-80.
    β”œβ”€β”€ 120_200.txt              # Text description for frames 120-200.
    ...
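The caption filenames encode their frame range. A small parser sketch, assuming the <start>_<end>.txt convention shown above:

```python
def caption_range(filename: str) -> tuple[int, int]:
    """Parse '<start>_<end>.txt' into an inclusive frame range."""
    start, end = filename.removesuffix(".txt").split("_")
    return int(start), int(end)

print(caption_range("120_200.txt"))  # (120, 200)
```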

2. Modality Details

2.1. Split Information (split_info.json)

Scene frames are segmented into 300-frame splits for annotation. The mapping and division information is stored in split_info.json. Each HOI4D scene consists of a single 300-frame split.

2.2 Camera Poses (info.json)

Minimal Reader

```python
import json
import torch

def load_camera_info(info_json_path: str):
    """Parses an info.json file to extract camera intrinsics and extrinsics."""
    with open(info_json_path, "r") as f:
        info_data = json.load(f)

    # Extrinsics are provided as a list of 4x4 world-to-camera matrices (OpenCV convention)
    extrinsics = torch.tensor(info_data["extrinsics"])  # Shape: (num_frames, 4, 4)

    num_frames = extrinsics.shape[0]

    # Build a single 3x3 intrinsic matrix from the stored parameters
    fx, fy, cx, cy = info_data["crop_intrinsic"].values()
    intrinsic = torch.eye(3)
    intrinsic[0, 0] = fx
    intrinsic[0, 2] = cx
    intrinsic[1, 1] = fy
    intrinsic[1, 2] = cy

    # Repeat the intrinsic matrix for each frame
    intrinsics = intrinsic.unsqueeze(0).repeat(num_frames, 1, 1)  # Shape: (num_frames, 3, 3)

    return intrinsics, extrinsics

# Example usage:
annotation_path = "ZY20210800001_H1_C1_N19_S100_s02_T1"
info_path = f"{annotation_path}/camera/recon/split_0/info.json"
intrinsics, extrinsics = load_camera_info(info_path)

print("Intrinsics shape:", intrinsics.shape)
print("Extrinsics shape:", extrinsics.shape)
```

OmniWorld-DROID Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-DROID dataset.

OmniWorld-DROID Organisation and File Structure

The OmniWorld-DROID dataset is a collection of re-annotated data derived from the DROID dataset. You need to download the original videos yourself.

Important Note: In this repository, we only provide the annotated data (e.g., flow, depth, text, mask), and do not include the raw RGB image files due to licensing and size constraints. Please refer to the original project for instructions on downloading the raw video data. Our annotations are designed to align with the original video frames.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-DROID/.

  • Naming Convention: omniworld_droid_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Metadata Explained (omniworld_droid_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original DROID dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: droid_raw/1.0.1/TRI/success/2023-10-17/Tue_Oct_17_17:20:55_2023/ |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive. Example: droid_processed/1.0.1/TRI/success/2023-10-17/Tue_Oct_17_17:20:55_2023/ |
| Img Num | The total number of image frames from one camera perspective in the scene. |

OmniWorld-DROID Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one omniworld_droid_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>/
# e.g., droid_processed/1.0.1/TRI/success/2023-10-17/Tue_Oct_17_17:20:55_2023/
|
β”œβ”€β”€ flow/                        # Just like OmniWorld-Game
β”‚   └── <camera_serial_id>/      # e.g., 18026681, 22008760, etc.
β”‚       β”œβ”€β”€ 0/
β”‚       β”‚   β”œβ”€β”€ flow_u_16.png    # Optical flow (horizontal component) for frame 0
β”‚       β”‚   β”œβ”€β”€ flow_v_16.png    # Optical flow (vertical component) for frame 0
β”‚       β”‚   └── flow_vis.png     # Visualization of the optical flow for frame 0
β”‚       β”œβ”€β”€ 1/
β”‚       ... (up to Img Num - 1)
|
β”œβ”€β”€ foundation_stereo/
β”‚   └── <camera_serial_id>/
β”‚       β”œβ”€β”€ 0.png                # Monocular depth map for frame 0
β”‚       β”œβ”€β”€ 1.png                # Monocular depth map for frame 1
β”‚       ... (up to Img Num - 1)
|
β”œβ”€β”€ robot_masks/                 # Just like OmniWorld
β”‚   └── <camera_serial_id>/
β”‚       β”œβ”€β”€ mask_prompt.json
β”‚       └── tracked_masks_coco.json
|
β”œβ”€β”€ text/
β”‚   └── <camera_name>/           # e.g., ext1_cam_serial, wrist_cam_serial
β”‚       β”œβ”€β”€ 0-161.txt            # Short caption for frames 0-161
β”‚       └── 40-201.txt           # Short caption for frames 40-201
|
β”œβ”€β”€ recordings/
β”‚   └── camera_info_dict.npy         # Camera intrinsics
|
β”œβ”€β”€ <camera_name>_totalcaption.txt   # Long-form, summary caption for the entire scene from one camera's perspective
β”œβ”€β”€ meta_info.json                   # General metadata for the scene
...


2. Modality Details

2.1. Depth

Minimal Reader

```python
import imageio.v2
import numpy as np

_MAX_DEPTH = 10.0

def load_depth(depthpath):
    """
    Returns
    -------
    depthmap : (H, W) float32
    valid : (H, W) bool
        True for reliable pixels
    """
    # Depth is stored as 16-bit PNG; rescale to metric depth in [0, _MAX_DEPTH]
    depthmap = imageio.v2.imread(depthpath).astype(np.float32) / 65535.0 * _MAX_DEPTH
    valid = (depthmap > 0) & (depthmap < _MAX_DEPTH)
    return depthmap, valid

# ---------------------------- example ----------------------------------------
if __name__ == "__main__":
    d, valid = load_depth(
        "droid/droid_processed/1.0.1/REAL/success/2023-05-27/Sat_May_27_11:22:57_2023/foundation_stereo/23960472/160.png"
    )
    print("Depth shape:", d.shape, "valid pixels:", valid.mean() * 100, "%")
```

2.2 Camera Pose

To streamline the data loading process, we have pre-extracted camera intrinsics from the official DROID metadata and consolidated them into camera_info_dict.npy. Alternatively, you may parse these parameters directly from the raw DROID metadata files.

Note on Camera Extrinsics: In the DROID dataset, the wrist camera pose data is often inaccurate. Consequently, we do not provide extrinsic loading for wrist-mounted views. For fixed-view cameras, the extrinsic matrix can be initialized as an identity matrix.

```python
import numpy as np

camera_info_dict_path = "droid/droid_processed/1.0.1/REAL/success/2023-05-27/Sat_May_27_11:22:57_2023/camera_info_dict.npy"
camera_info = np.load(camera_info_dict_path, allow_pickle=True).item()

# Example: accessing intrinsics for specific camera serials
camera_serial_ids = ["18026681", "22008760", "24400334"]
for cam_id in camera_serial_ids:
    intrinsics = camera_info[cam_id]["cam_matrix"]
    print(f"Camera {cam_id} Intrinsics Shape: {intrinsics.shape}")  # Output: (3, 3)
```
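With a 3Γ—3 intrinsic matrix in hand, projecting a camera-space point into pixel coordinates follows the standard pinhole model. The values below are illustrative, not DROID calibration:

```python
import numpy as np

# Illustrative pinhole intrinsics (fx = fy = 600, principal point at (320, 240))
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

point_cam = np.array([0.1, -0.05, 2.0])  # 3D point in camera coordinates
u, v, w = K @ point_cam
pixel = np.array([u / w, v / w])         # perspective divide
print(pixel)  # [350. 225.]
```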

OmniWorld-RH20TRobot Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-RH20TRobot dataset.

OmniWorld-RH20TRobot Organisation and File Structure

The OmniWorld-RH20TRobot dataset is a collection of re-annotated data derived from the RH20T dataset. You need to download the original videos yourself.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-RH20TRobot/.

  • Naming Convention: rh20t_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Metadata Explained (omniworld_rh20t_robot_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original RH20T dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: RH20T/RH20T_cfg1/task_0030_user_0010_scene_0004_cfg_0001/cam_035622060973/color/ |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive. Example: RH20T/RH20T_cfg1/task_0030_user_0010_scene_0004_cfg_0001/cam_035622060973/ |

OmniWorld-RH20TRobot Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one rh20t_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>/
# e.g., RH20T_cfg1/task_0030_user_0010_scene_0004_cfg_0001/cam_035622060973/
|
β”œβ”€β”€ robot_masks/                 # Read like OmniWorld
β”‚   β”œβ”€β”€ mask_prompt.json
β”‚   β”œβ”€β”€ tracked_masks_coco_v2.json
β”‚   └── tracked_masks_coco.json
|
β”œβ”€β”€ text/
β”‚   β”œβ”€β”€ 0-161.txt            # caption for frames 0-161
β”‚   └── 40-201.txt           # caption for frames 40-201
|
...
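Here the caption filenames use a <start>-<end>.txt convention (note the hyphen, versus the underscore used by OmniWorld-HOI4D). A sketch for collecting and ordering the spans:

```python
import os

def caption_span(filename: str) -> tuple[int, int]:
    """Parse '<start>-<end>.txt' into an inclusive frame span."""
    start, end = os.path.splitext(filename)[0].split("-")
    return int(start), int(end)

spans = sorted(caption_span(f) for f in ["40-201.txt", "0-161.txt"])
print(spans)  # [(0, 161), (40, 201)]
```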

OmniWorld-RH20THuman Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-RH20THuman dataset.

OmniWorld-RH20THuman Organisation and File Structure

The OmniWorld-RH20THuman dataset is a collection of re-annotated data derived from the RH20T dataset. You need to download the original videos yourself.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-RH20TTHuman/.

  • Naming Convention: rh20t_human_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Metadata Explained (omniworld_rh20t_human_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original RH20T dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: RH20T/RH20T_cfg1/task_0062_user_0001_scene_0010_cfg_0001_human/cam_035622060973/color/ |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive. Example: RH20T/RH20T_cfg1/task_0062_user_0001_scene_0010_cfg_0001_human/cam_035622060973/ |

OmniWorld-RH20THuman Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one rh20t_human_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>/
# e.g., RH20T_cfg1/task_0062_user_0001_scene_0010_cfg_0001_human/cam_035622060973/
|
β”œβ”€β”€ text/
β”‚   β”œβ”€β”€ 0-161.txt            # caption for frames 0-161
β”‚   └── 40-201.txt           # caption for frames 40-201
|
...

OmniWorld-EgoExo4D Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-EgoExo4D dataset.

OmniWorld-EgoExo4D Organisation and File Structure

The OmniWorld-EgoExo4D dataset is a collection of re-annotated data derived from the Ego-Exo4D dataset. You need to download the original videos yourself.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-EgoExo4D/.

  • Naming Convention: omniword_egoexo4d_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Metadata Explained (omniworld_egoexo4d_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original Ego-Exo4D dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: egoexo4d-processed/takes/cmu_bike01_2/frame_aligned_videos/aria01_214-1-undistorted/ |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive. Example: egoexo4d-processed/takes/cmu_bike01_2/ |

OmniWorld-EgoExo4D Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one omniworld_egoexo4d_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>/
# e.g., egoexo4d-processed/takes/cmu_bike01_2/
|
β”œβ”€β”€ text/
β”‚   β”œβ”€β”€ 0-161.txt            # caption for frames 0-161
β”‚   └── 40-201.txt           # caption for frames 40-201
|
...

OmniWorld-EgoDex Detailed Guide

This section provides detailed organization, metadata, and usage instructions specific to the OmniWorld-EgoDex dataset.

OmniWorld-EgoDex Organisation and File Structure

The OmniWorld-EgoDex dataset is a collection of re-annotated data derived from the EgoDex dataset. You need to download the original videos yourself.

Annotation Files

The annotation data is packaged in .tar.gz files located under OmniWorld/annotations/OmniWorld-EgoDex/.

  • Naming Convention: omniword_egodex_<start_scene_index>_<end_scene_index>.tar.gz, where the indices correspond to the scene index range within the metadata file.

Metadata Explained (omniworld_egodex_metadata.csv)

| Field Name | Description |
|---|---|
| Index | The sequential index number of the scene. |
| Video Path | The relative path of the scene in the original EgoDex dataset. Use this path to locate the corresponding source RGB video that you have downloaded. Example: egodex/part1/assemble_disassemble_legos/2338/ |
| Annotation Path | The directory name for this scene's annotations inside the extracted .tar.gz archive. Example: egodex/part1/assemble_disassemble_legos/2338/ |

OmniWorld-EgoDex Usage Guide

1. Quick-Start: Extracting One Scene

To access the annotations for a scene, you first need to extract the corresponding .tar.gz archive. After extracting one omniworld_egodex_<start_scene_index>_<end_scene_index>.tar.gz file, the resulting folder structure for each individual scene within the archive is as follows:

<Annotation Path>/
# e.g., egodex/part1/assemble_disassemble_legos/2338/
|
β”œβ”€β”€ text/
β”‚   β”œβ”€β”€ 0-80.txt            # caption for frames 0-80
β”‚   └── 40-120.txt           # caption for frames 40-120
|
...

License

The OmniWorld dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). By accessing or using this dataset, you agree to be bound by the terms and conditions outlined in this license, as well as the specific provisions detailed below.

  • Special Note on Third-Party Content:
    A portion of this dataset is derived from third-party game content. All intellectual property rights pertaining to these original game assets (including, but not limited to, RGB and depth images) remain with their respective original game developers and publishers.

  • Permitted Uses:
    You are hereby granted permission, free of charge, to use, reproduce, and share the OmniWorld dataset and any adaptations thereof, solely for non-commercial research and educational purposes. This includes, but is not limited to: academic publications, algorithm benchmarking, reproduction of scientific results.

Under this license, you are expressly forbidden from:

  • Using the dataset, in whole or in part, for any commercial purpose, including but not limited to its incorporation into commercial products, services, or monetized applications.

  • Redistributing the original third-party game assets contained within the dataset outside the scope of legitimate research sharing.

  • Removing or altering any copyright, license, or attribution notices.

The authors of the OmniWorld dataset provide this dataset "as is" and make no representations or warranties regarding the legality of the underlying data for any specific purpose. Users are solely responsible for ensuring that their use of the dataset complies with all applicable laws and the terms of service or license agreements of the original game publishers (sources of third-party content).

For the full legal text of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, please visit: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

Citation

If you find this dataset useful, please cite our paper:

@article{zhou2025omniworld,
      title={OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling}, 
      author={Yang Zhou and Yifan Wang and Jianjun Zhou and Wenzheng Chang and Haoyu Guo and Zizun Li and Kaijing Ma and Xinyue Li and Yating Wang and Haoyi Zhu and Mingyu Liu and Dingning Liu and Jiange Yang and Zhoujie Fu and Junyi Chen and Chunhua Shen and Jiangmiao Pang and Kaipeng Zhang and Tong He},
      journal={arXiv preprint arXiv:2509.12201},
      year={2025}
}

πŸ›‘οΈ Dataset Transparency Report

Verified data manifest for traceability and transparency.


πŸ†” Identity & Source

- id: hf-dataset--internrobotics--omniworld
- source: huggingface
- author: InternRobotics
- tags: task_categories:text-to-video, task_categories:image-to-video, task_categories:image-to-3d, task_categories:robotics, task_categories:other, language:en, license:cc-by-nc-sa-4.0, size_categories:1b, format:webdataset, modality:image, modality:text, library:datasets, library:webdataset, library:mlcroissant, arxiv:2509.12201, region:us

βš™οΈ Technical Specs

architecture
null
params billions
null
context length
null

πŸ“Š Engagement & Metrics

likes
81
downloads
27,605

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)