Egocentric Data Samples

Real-world egocentric video data captured by a distributed network of contributors using smartphones. Every clip is recorded from a first-person perspective, metadata-enriched on-device, and quality-verified before entering the dataset. Built to train the next generation of embodied AI and robotic manipulation models.

Raw clips are captured once by contributors, then passed through an evolving server-side annotation pipeline. Each annotation layer (object detection, hand pose, action segmentation) increases the dataset's value without requiring new data collection.
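
As a sketch of what that layering can look like (all names here are illustrative, not RoboX's actual schema), raw clips stay immutable while annotation layers attach to them by clip ID, so new or improved annotation passes can be appended at any time:

```python
# Hypothetical sketch of a layered annotation store; RoboX's actual
# schema is not published, so all names here are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class RawClip:
    """Immutable record for a clip as captured on-device."""
    clip_id: str
    campaign: str          # e.g. "EgoGrasp"
    duration_s: float
    device_model: str


@dataclass
class AnnotationLayer:
    """One server-side annotation pass, keyed to a raw clip."""
    clip_id: str
    layer: str             # e.g. "object_detection", "hand_pose"
    version: int           # re-running an improved model bumps this
    payload: dict          # layer-specific results


class Dataset:
    """Raw clips never change; value grows by appending layers."""

    def __init__(self):
        self.clips: dict[str, RawClip] = {}
        self.layers: list[AnnotationLayer] = []

    def add_clip(self, clip: RawClip) -> None:
        self.clips[clip.clip_id] = clip

    def annotate(self, layer: AnnotationLayer) -> None:
        # A new annotation pass requires no new data collection.
        assert layer.clip_id in self.clips
        self.layers.append(layer)
```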

RoboX Data Campaigns

Each campaign targets a specific domain of embodied intelligence. Browse samples from the five collection programs below; four are actively collecting, and EgoSocial opens soon.

EgoScene: 360-degree spatial scans

EgoDaily: Everyday household and workplace activities

EgoNav: Indoor navigation and path mapping

EgoGrasp: Object grasping and manipulation

EgoSocial (Coming Soon): Social navigation in populated spaces

EgoGrasp

First-person recordings of object grasping and manipulation in real-world settings. Contributors pick up, move, and interact with everyday objects while wearing or holding their smartphone at chest height. This is RoboX's highest-volume campaign and the foundation for robotic manipulation training data.

Technical Specifications

Resolution: 1080p egocentric video
Frame rate: 30 fps
Max clip duration: 15 seconds
Capture method: Phone at chest/waist height, natural grasp motion
On-device metadata: Timestamp, device model, quality score, lighting metadata, scene context, object metadata (an illustrative record is sketched below)
Annotation layers: Object category, interaction type, hand pose estimation (server-side)
Current volume: 1,055 clips
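
For illustration only, a clip record combining the on-device metadata and server-side annotation layers listed above might look like this (every field name is an assumption, not the published schema):

```python
# Illustrative EgoGrasp clip record; field names and values are
# assumed for illustration, not RoboX's published schema.
sample_clip = {
    "clip_id": "egograsp-000123",            # hypothetical ID
    "resolution": "1080p",
    "fps": 30,
    "duration_s": 12.4,                      # capped at 15 s
    # Captured on-device at recording time:
    "on_device": {
        "timestamp": "2025-06-01T14:03:22Z",
        "device_model": "Pixel 8",
        "quality_score": 0.91,
        "lighting": "indoor_bright",
        "scene_context": "kitchen",
        "object_metadata": {"category": "Cup / Mug"},
    },
    # Added server-side, so they can be re-run as models improve:
    "annotations": {
        "object_category": "Cup / Mug",
        "interaction_type": "pick_up",
        "hand_pose": {"model_version": 2, "keypoints_per_frame": 21},
    },
}
```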

Object Categories

490+ unique object types across household, kitchen, office, medical, personal care, and tool categories. Examples from the current dataset: Cup / Mug, Computer Mouse, Office Scissors, Wardrobe Hanger, Kitchen Utensils, Remote Control, Pen / Pencil, Plushie Toy, Water Bottle, Headphones, Phone Charger, Medical Tools, Stationery.

Sample Clips

Sample clips download (7 MB)

Use Cases

Robotic manipulation training, VLA model fine-tuning, grasp planning, object affordance learning, household robotics, sim-to-real transfer

Lab datasets typically cover 20-50 objects in controlled settings. EgoGrasp captures 490+ objects in real homes, offices, and kitchens across multiple countries, giving manipulation models the diversity they need to generalize.

EgoScene

Slow 360-degree panoramic scans of real-world environments. Contributors rotate in place to capture a full spatial view of rooms, stores, outdoor areas, and other spaces. Designed for spatial understanding, scene classification, and 3D reconstruction training.

Technical Specifications

Resolution: 1080p egocentric video
Frame rate: 30 fps
Avg clip duration: 11 seconds
Capture method: Slow panoramic rotation from a fixed position
On-device metadata: Timestamp, device model, quality score, lighting, IMU per frame, scene context (used by the rotation check sketched below)
Annotation layers: Scene classification, environment type, lighting context
Current volume: 295 clips
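
Because each frame carries IMU data, a full 360-degree scan can be verified by integrating the gyroscope yaw rate over the clip. A minimal sketch, assuming one yaw-rate sample (rad/s) per frame at 30 fps; RoboX's real validation logic is not published:

```python
import math

# Sketch of a 360-degree scan check from per-frame IMU yaw rates.
# Assumes one gyro yaw-rate sample (rad/s) per frame at 30 fps;
# thresholds are illustrative, not RoboX's validation rules.
def is_full_rotation(yaw_rates_rad_s, fps=30, tolerance_deg=30.0):
    """True if the net rotation over the clip is ~360 degrees."""
    dt = 1.0 / fps
    net_rad = sum(w * dt for w in yaw_rates_rad_s)  # signed integration
    net_deg = abs(math.degrees(net_rad))
    return abs(net_deg - 360.0) <= tolerance_deg

# An 11-second clip rotating steadily at ~33 deg/s completes a turn:
# is_full_rotation([math.radians(33)] * 330)  -> True
```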

Environment Categories

Kitchen, Bedroom, Bathroom, Living Room, Office, Supermarket, Shopping Mall, Outdoor

Sample Clips

Use Cases

Scene understanding, spatial mapping, 3D reconstruction, environment classification, indoor navigation pre-training

EgoNav

Walking-pace egocentric recordings of indoor navigation. Contributors walk naturally through homes, stores, offices, and other indoor spaces while the app captures continuous video with GPS accuracy, heading, speed, and trajectory data. Built for autonomous navigation and path planning models.

Technical Specifications

Resolution: 1080p egocentric video
Frame rate: 30 fps
Avg clip duration: 8-10 seconds
Capture method: Phone held naturally during walking-pace navigation (see the walking-pace sketch below)
On-device metadata: Timestamp, device model, camera motion, GPS accuracy, heading, speed, path context, trajectory segmentation
Annotation layers: Environment type, trajectory segmentation, obstacle context
Current volume: 167 clips
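
As a hedged sketch of how the speed metadata could support a walking-pace check, here is a simple filter over per-sample speeds; the thresholds and field conventions are assumptions, not RoboX's actual rules:

```python
# Illustrative walking-pace check over speed samples (m/s).
# Typical walking speed is roughly 0.5-2.0 m/s; thresholds here
# are assumptions, not RoboX's validation rules.
def is_walking_pace(speeds_m_s, low=0.5, high=2.0, min_fraction=0.8):
    """True if most speed samples fall within the walking band."""
    if not speeds_m_s:
        return False
    in_band = sum(1 for v in speeds_m_s if low <= v <= high)
    return in_band / len(speeds_m_s) >= min_fraction
```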

Environment Categories

Home, Store, Office, Other

Sample Clips

Use Cases

Autonomous navigation, path planning, SLAM pre-training, obstacle avoidance, indoor localization

EgoDaily

Egocentric recordings of everyday activities in homes and workplaces. Contributors record themselves performing routine tasks like cooking, cleaning, typing, organizing, and other daily actions. Designed for activity recognition, task planning, and household robotics models that need to understand how humans perform common activities.

Technical Specifications

Resolution: 1080p egocentric video
Frame rate: 30 fps
Avg clip duration: 15-18 seconds
Capture method: Phone at chest/waist height during routine activities
On-device metadata: Timestamp, device model, quality score, lighting metadata, environment context
Annotation layers: Activity type, environment classification, temporal segmentation (illustrated below)
Current volume: 220 clips
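
Temporal segmentation, one of the annotation layers listed above, typically means labeling sub-intervals of a clip with the activity being performed. A hypothetical segment record (structure and names are illustrative):

```python
# Hypothetical temporal-segmentation output for one EgoDaily clip;
# structure and field names are illustrative, not the real schema.
segments = [
    {"start_s": 0.0, "end_s": 6.2, "activity": "Cooking",
     "sub_action": "chopping"},
    {"start_s": 6.2, "end_s": 14.8, "activity": "Cooking",
     "sub_action": "stirring"},
]

def activity_at(t_s, segments):
    """Look up the labeled activity at time t_s within the clip."""
    for seg in segments:
        if seg["start_s"] <= t_s < seg["end_s"]:
            return seg["activity"], seg["sub_action"]
    return None
```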

Activity Categories

Cooking, Cleaning, Laundry, Workspace Tasks, Organizing

Sample Clips

Sample clips download (10 MB)

Use Cases

Activity recognition, task planning, household robotics, procedural learning, assistive AI

EgoSocial (Coming Soon)

Egocentric navigation through crowded, populated spaces. Contributors walk through markets, stations, malls, and public areas while the app captures crowd density, pedestrian flow, and cultural navigation context. All faces are automatically blurred on-device before data leaves the phone. Designed for social navigation, the critical unsolved problem of teaching robots implicit social rules across cultures.
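
The blur step runs on-device before any pixels leave the phone. Purely as an illustration of the idea, and not RoboX's actual mobile pipeline, a minimal face-blur pass could use OpenCV's stock Haar cascade:

```python
import cv2

# Minimal face-blur sketch using OpenCV's stock Haar cascade.
# Purely illustrative: RoboX's on-device pipeline is mobile-native
# and its detector and parameters are not published.
_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def blur_faces(frame_bgr):
    """Return a copy of the frame with detected faces blurred."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, scaleFactor=1.1,
                                       minNeighbors=5)
    out = frame_bgr.copy()
    for (x, y, w, h) in faces:
        out[y:y + h, x:x + w] = cv2.GaussianBlur(
            out[y:y + h, x:x + w], (51, 51), 0
        )
    return out
```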

Technical Specifications

Resolution: 1080p egocentric video
Frame rate: 30 fps
Target duration: 5-12 minutes per submission
Capture method: Phone held naturally while walking through populated areas
Privacy: On-device face-blur pipeline runs before upload
On-device metadata: Timestamp, GPS, crowd density estimation, heading, speed
Validation: Minimum 2 people visible, continuous forward movement, face-blur verified, 5+ minutes (sketched below)
Current volume: Collection will begin soon
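
The validation rules above can be read as a simple acceptance check. A sketch under assumed field names; the production checks are not published:

```python
# Sketch of the EgoSocial acceptance rules as code; the clip-summary
# structure and field names are assumptions for illustration.
def passes_validation(clip):
    return (
        clip["min_people_visible"] >= 2        # at least 2 people in view
        and clip["forward_motion_continuous"]  # no long stops/reversals
        and clip["face_blur_verified"]         # blur confirmed pre-upload
        and clip["duration_s"] >= 5 * 60       # 5+ minutes of footage
    )
```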

Target Environments

Street markets, train stations, shopping malls, university campuses, food courts, bus terminals, festivals, stadium exits

Use Cases

Social navigation, crowd dynamics, pedestrian prediction, cultural norm learning, public space robotics, human-robot interaction

Social norms for personal space, queuing, and crowd navigation vary dramatically across cultures. RoboX's global contributor network captures this diversity, from markets in Vietnam to stations in India to malls in Europe, with a breadth no single research lab can replicate.
