# Egocentric Data Samples

Real-world egocentric video data captured by a distributed network of contributors using smartphones. Every clip is recorded from a first-person perspective, metadata-enriched on-device, and quality-verified before entering the dataset. Built to train the next generation of embodied AI and robotic manipulation models.

Raw clips are captured once by contributors, then passed through an evolving server-side annotation pipeline. Each annotation layer (object detection, hand pose, action segmentation) increases the dataset's value without requiring new data collection.
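As an illustrative sketch of this layering model (the record fields and layer names below are hypothetical, not the actual pipeline schema), each server-side pass can attach a new annotation layer to an existing clip record without touching the raw video:

```python
# Hypothetical sketch: annotation layers accumulate on a clip record
# independently of the one-time raw capture. Field names are
# illustrative assumptions, not the real RoboX schema.

def add_layer(clip: dict, name: str, payload: dict) -> dict:
    """Attach one annotation layer to a clip record."""
    clip.setdefault("layers", {})[name] = payload
    return clip

clip = {"clip_id": "eg-0001", "video": "eg-0001.mp4"}
add_layer(clip, "object_detection", {"objects": ["cup", "hand"]})
add_layer(clip, "hand_pose", {"keypoints_per_frame": 21})
add_layer(clip, "action_segmentation",
          {"segments": [(0.0, 3.2, "reach"), (3.2, 7.8, "grasp")]})
```

Because each layer is additive, a new annotation pass (say, affordance labels) only extends existing records rather than triggering a fresh collection round.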

### RoboX Data Campaigns <a href="#campaigns" id="campaigns"></a>

Each campaign targets a specific domain of embodied intelligence. Browse samples from the five collection programs below: four are actively collecting, and EgoSocial launches soon.

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-cover data-type="image">Cover image</th></tr></thead><tbody><tr><td><strong>EgoScene</strong><br><br>360-degree spatial scans.</td><td><a href="https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FX5bPUrf5uYz7PLkMLcoH%2Fegoscene.png?alt=media&#x26;token=ab08df82-e6b5-4b78-a6c5-8e4a3a7695ce">egoscene.png</a></td></tr><tr><td><strong>EgoDaily</strong><br><br>Everyday household and workplace activities</td><td><a href="https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FckdgTLXP78VpHpGjAmno%2Fegodaily-5.png?alt=media&#x26;token=660204d6-9f5a-4836-b0b8-11f4ff391fe6">egodaily-5.png</a></td></tr><tr><td><strong>EgoNav</strong><br><br>Indoor navigation and path mapping</td><td><a href="https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FtIQmd5CFRvYwVpjqy8J9%2Fegonav.png?alt=media&#x26;token=41efa168-0981-4db6-81e7-b9a0d4ada4b9">egonav.png</a></td></tr><tr><td><strong>EgoGrasp</strong><br><br>Object grasping and manipulation</td><td><a href="https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FG0VSypExL7xAy3gfzS1q%2Fegograsp.png?alt=media&#x26;token=a3882d5c-e423-4f9b-a28c-5731f77db02d">egograsp.png</a></td></tr><tr><td><strong>EgoSocial (Coming Soon)</strong><br><br>Social navigation in populated spaces</td><td><a href="https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FB4t5YSgiq8tfiJXAiZIE%2Fegosocial.png?alt=media&#x26;token=1190304f-bd64-45d4-ba31-2e8d777e827a">egosocial.png</a></td></tr></tbody></table>

### EgoGrasp

First-person recordings of object grasping and manipulation in real-world settings. Contributors pick up, move, and interact with everyday objects while wearing or holding their smartphone at chest or waist height. This is RoboX's highest-volume campaign and the foundation for robotic manipulation training data.

#### Technical Specifications

| Resolution         | `1080p` egocentric video                                                                  |
| ------------------ | ----------------------------------------------------------------------------------------- |
| Frame rate         | `30 fps`                                                                                  |
| Max clip duration  | `15 seconds`                                                                              |
| Capture method     | Phone at chest/waist height, natural grasp motion                                         |
| On-device metadata | Timestamp, device model, quality score, lighting metadata, scene context, object metadata |
| Annotation layers  | Object category, interaction type, hand pose estimation (server-side)                     |
| Current volume     | **1055 clips**                                                                            |
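One practical use of the on-device metadata above is filtering clips into training subsets without re-annotating anything. The sketch below is illustrative only; the field names (`quality_score`, `object`) are assumptions, not the real export schema:

```python
# Illustrative only: select EgoGrasp clips for a training subset by
# on-device metadata. Field names are assumed, not the real schema.

clips = [
    {"clip_id": "a", "quality_score": 0.92, "object": "Cup / Mug"},
    {"clip_id": "b", "quality_score": 0.61, "object": "Headphones"},
    {"clip_id": "c", "quality_score": 0.88, "object": "Cup / Mug"},
]

# Keep only high-quality clips of one object category.
subset = [c for c in clips
          if c["quality_score"] >= 0.8 and c["object"] == "Cup / Mug"]
print([c["clip_id"] for c in subset])  # → ['a', 'c']
```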

#### Object Categories

490+ unique object types across household, kitchen, office, medical, personal care, and tool categories.

**Examples from the current dataset:**

`Cup / Mug` `Computer Mouse` `Office Scissors` `Wardrobe Hanger` `Kitchen Utensils` `Remote Control` `Pen / Pencil` `Plushie Toy` `Water Bottle` `Headphones` `Phone Charger` `Medical Tools` `Stationery`

#### Sample Clips

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FAGmbv404joPW541ewdzw%2FInteraction%20with%20a%20plushy%20carrot%20toy.mp4?alt=media&token=d41d24cc-068f-4577-96e2-fe48049fc3a6>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F5xwZ4ol4u4SXjwyYbu8R%2FA%20wardrobe%20hanger%20is%20grasped.mp4?alt=media&token=5493e470-a082-4e7b-b8c1-47688ab8ff5b>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2Fw8jl9u8CowB8Tg30FkTE%2FPicking%20up%20a%20glass%20of%20juice.mp4?alt=media&token=10a8c0c5-ede0-4461-9d99-9edc135a0ba7>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F02yqNZkrintFbNlnsB4h%2FInteracting%20with%20car's%20engine.mp4?alt=media&token=e4429c3c-3258-458a-8a72-84dcbcc8f058>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FDZPDIMYIQyFwdE1to3RH%2FGrasp%20a%20mug.mp4?alt=media&token=5636dabb-dc25-4243-9990-869440c22ec8>" %}

#### Use Cases

<mark style="color:$primary;background-color:purple;">Robotic manipulation training, VLA model fine-tuning, grasp planning, object affordance learning, household robotics, sim-to-real transfer</mark>

Lab datasets typically cover 20-50 objects in controlled settings. EgoGrasp captures 490+ objects in real homes, offices, and kitchens across multiple countries, giving manipulation models the diversity they need to generalize.

### EgoScene

Slow 360-degree panoramic scans of real-world environments. Contributors rotate in place to capture a full spatial view of rooms, stores, outdoor areas, and other spaces. Designed for spatial understanding, scene classification, and 3D reconstruction training.

#### Technical Specifications

| Resolution         | `1080p` egocentric video                                                       |
| ------------------ | ------------------------------------------------------------------------------ |
| Frame rate         | `30 fps`                                                                       |
| Avg clip duration  | `11 seconds`                                                                   |
| Capture method     | Slow panoramic rotation from a fixed position                                  |
| On-device metadata | Timestamp, device model, quality score, lighting, IMU per frame, scene context |
| Annotation layers  | Scene classification, environment type, lighting context                       |
| Current volume     | **295 clips**                                                                  |
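The per-frame IMU metadata listed above makes it possible to check, for example, that a panoramic scan actually covers a full rotation. This sketch accumulates signed heading deltas; the idea is generic, and the sampled headings are made-up illustration data:

```python
def rotation_coverage(headings_deg):
    """Estimate total rotation of a panoramic scan, in degrees,
    by summing signed heading deltas wrapped to [-180, 180)."""
    total = 0.0
    for prev, cur in zip(headings_deg, headings_deg[1:]):
        delta = (cur - prev + 180.0) % 360.0 - 180.0
        total += delta
    return abs(total)

# A slow clockwise scan sampled every 30 degrees (illustration data):
headings = [0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360]
print(rotation_coverage(headings) >= 350)  # True: near-full rotation
```

Wrapping each delta before summing is what keeps the estimate correct when the compass heading crosses the 0/360 boundary mid-scan.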

#### Environment Categories

`Kitchen` `Bedroom` `Bathroom` `Living Room` `Office` `Supermarket` `Shopping Mall` `Outdoor`

#### Sample Clips

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FIVYotCQWdIHAFdPdriGo%2F360-degree%20rotation%20of%20a%20children's%20toy%20store.mp4?alt=media&token=36cb8ff9-e2da-41b0-a868-f32327f8e24c>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FZ20DwCXmntScWd1OAwP4%2F360-degree%20rotation%20of%20a%20shop%20selling%20plants.mp4?alt=media&token=c5f119f7-fd23-46ad-8bdf-16b26db6bf2c>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FsiLVKWwldrJmbTRDHV0y%2F360-degree%20rotation%20of%20the%20living%20room.mp4?alt=media&token=2716f205-0499-43b5-acc9-2afece8eacd1>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FHmbrqUueJSwDFkoCXSBt%2F360-degree%20rotation%20of%20an%20outdoor%20area.mp4?alt=media&token=72c0d4c3-e676-402d-b81b-f03814414006>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FrDrLbjo7BfjpV7gVosnY%2F360-degree%20rotation%20of%20an%20outdoor%20park%20area.mp4?alt=media&token=39b676fb-54af-4f96-82ed-7cc789636f72>" %}

#### Use Cases

<mark style="color:$primary;background-color:purple;">Scene understanding, spatial mapping, 3D reconstruction, environment classification, indoor navigation pre-training</mark>

### EgoNav

Walking-pace egocentric recordings of indoor navigation. Contributors walk naturally through homes, stores, offices, and other indoor spaces while the app captures continuous video with GPS accuracy, heading, speed, and trajectory data. Built for autonomous navigation and path planning models.

#### Technical Specifications

| Resolution         | `1080p` egocentric video                                                                                    |
| ------------------ | ----------------------------------------------------------------------------------------------------------- |
| Frame rate         | `30 fps`                                                                                                    |
| Avg clip duration  | `8-10 seconds`                                                                                              |
| Capture method     | Phone held naturally during walking-pace navigation                                                         |
| On-device metadata | Timestamp, device model, camera motion, GPS accuracy, heading, speed, path context, trajectory segmentation |
| Annotation layers  | Environment type, trajectory segmentation, obstacle context                                                 |
| Current volume     | **167 clips**                                                                                               |
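The GPS, heading, and speed fields above support sanity checks such as confirming a clip was really captured at walking pace. A minimal sketch, assuming each fix is a `(timestamp_s, lat, lon)` tuple (an assumed shape, not the real export format):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def avg_speed_mps(fixes):
    """fixes: list of (timestamp_s, lat, lon). Returns mean speed in m/s."""
    dist = sum(haversine_m(a[1], a[2], b[1], b[2])
               for a, b in zip(fixes, fixes[1:]))
    elapsed = fixes[-1][0] - fixes[0][0]
    return dist / elapsed if elapsed > 0 else 0.0

# Illustration data: ~111 m due north over 80 seconds.
track = [(0.0, 52.5200, 13.4050), (80.0, 52.5210, 13.4050)]
print(round(avg_speed_mps(track), 2), "m/s")  # walking pace (~1.4 m/s)
```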

#### Environment Categories

`Home` `Store` `Office` `Other`

#### Sample Clips

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F6qcfOAYW3z2NLqDUZ1Dn%2FWalking%20indoors%20in%20a%20home%20environment.mp4?alt=media&token=a36931a1-c565-4789-8626-7418d9cf2bdf>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FkMgdMIQToNHdcgjE6DoV%2FWalking%20indoors%20in%20a%20retail%20store%20environment..mp4?alt=media&token=3de5cf6b-152c-44c1-905e-be901df76d9f>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FRqAE19GXAmwj4GzojDdQ%2FWalking%20indoors%20in%20a%20store%20environment..mp4?alt=media&token=55737ed0-0884-4f26-a6b3-c000ec689393>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FdbDNiLwxnxKISTVCum2m%2FWalking%20indoors%20in%20the%20office%20environment..mp4?alt=media&token=edd6342e-7bea-4a95-9fca-d03557dba9c6>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F3ufntKEKTr9x4W5yHV9d%2FWalking%20indoors%20in%20a%20store.mp4?alt=media&token=7dadd25e-7021-4eba-bdf6-b83b277b7c19>" %}

#### Use Cases

<mark style="color:$primary;background-color:purple;">Autonomous navigation, path planning, SLAM pre-training, obstacle avoidance, indoor localization</mark>

### EgoDaily

Egocentric recordings of everyday activities in homes and workplaces. Contributors record themselves performing routine tasks like cooking, cleaning, typing, organizing, and other daily actions. Designed for activity recognition, task planning, and household robotics models that need to understand how humans perform common activities.

#### Technical Specifications

| Resolution         | `1080p` egocentric video                                                       |
| ------------------ | ------------------------------------------------------------------------------ |
| Frame rate         | `30 fps`                                                                       |
| Avg clip duration  | `15-18 seconds`                                                                |
| Capture method     | Phone at chest/waist height during routine activities                          |
| On-device metadata | Timestamp, device model, quality score, lighting metadata, environment context |
| Annotation layers  | Activity type, environment classification, temporal segmentation               |
| Current volume     | **220 clips**                                                                  |

#### Activity Categories

`Cooking` `Cleaning` `Laundry` `Workspace Tasks` `Organizing`

#### Sample Clips

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F7oFyQLw8WKPVNQswRrdn%2FPreparing%20of%20the%20food.mp4?alt=media&token=f5f2ca9e-8b18-4ef5-a23d-f681503a202a>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2F5w6GCLxbbM56ht4xVvfB%2FTyping%20on%20keyboard.mp4?alt=media&token=243abbec-529c-4f18-8d6a-0a11438d9321>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FYUL6uKKb1gybSatXhPxL%2FWithdrawing%20money.mp4?alt=media&token=38ba8503-aa1a-4c1c-a386-33c56638275e>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FkPM1kwa7S22XPb0ILrOA%2FDriving%20a%20car.mp4?alt=media&token=7623279e-f5dc-4af8-b151-db9196fe5624>" %}

{% file src="<https://1827718397-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Ffv8UOUo8jLOE1LBgyaJx%2Fuploads%2FtabSmuDukTLs1vUQydCc%2FPreparing%20a%20drink.mp4?alt=media&token=868fd908-e2ec-4022-865a-a14d79eae154>" %}

#### Use Cases

<mark style="background-color:purple;">Activity recognition, task planning, household robotics, procedural learning, assistive AI</mark>

### EgoSocial (Coming Soon)

Egocentric navigation through crowded, populated spaces. Contributors walk through markets, stations, malls, and public areas while the app captures crowd density, pedestrian flow, and cultural navigation context. All faces are automatically blurred on-device before data leaves the phone. Designed for social navigation: the still-unsolved problem of teaching robots the implicit social rules that vary across cultures.

#### Technical Specifications

| Resolution         | `1080p` egocentric video                                                          |
| ------------------ | --------------------------------------------------------------------------------- |
| Frame rate         | `30 fps`                                                                          |
| Target duration    | `5-12 minutes` per submission                                                     |
| Capture method     | Phone held naturally during walking through populated areas                       |
| Privacy            | On-device face-blur pipeline runs before upload                                   |
| On-device metadata | Timestamp, GPS, crowd density estimation, heading, speed                          |
| Validation         | Minimum 2 people visible, continuous forward movement, face-blur verified, 5+ min |
| Current volume     | Collection will begin soon                                                        |
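The validation row above can be read as a simple acceptance gate. The sketch below restates those rules as code; the record fields (`people_visible`, `forward_motion`, `face_blur_verified`, `duration_s`) are assumed names for illustration, not the real submission schema:

```python
# Illustrative sketch of the EgoSocial validation rules listed above.
# Field names are assumptions, not the actual submission schema.

def passes_validation(sub: dict) -> bool:
    return (
        sub["people_visible"] >= 2        # minimum 2 people visible
        and sub["forward_motion"]         # continuous forward movement
        and sub["face_blur_verified"]     # on-device blur confirmed
        and sub["duration_s"] >= 5 * 60   # at least 5 minutes
    )

ok = {"people_visible": 4, "forward_motion": True,
      "face_blur_verified": True, "duration_s": 420}
print(passes_validation(ok))  # True
```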

#### Target Environments

<mark style="background-color:purple;">Street markets, train stations, shopping malls, university campuses, food courts, bus terminals, festivals, stadium exits</mark>

#### Use Cases

<mark style="background-color:purple;">Social navigation, crowd dynamics, pedestrian prediction, cultural norm learning, public space robotics, human-robot interaction</mark>

Social norms for personal space, queuing, and crowd navigation vary dramatically across cultures. RoboX's global contributor network captures this diversity, from markets in Vietnam to train stations in India to malls in Europe, at a scale no single research lab can replicate.
