RoboX Lens
RoboX Lens is a head-mounted wearable device designed for capturing high-fidelity egocentric data. While smartphones provide broad geographic coverage and accessibility, Lens addresses the limitations of phone-based collection for applications that require a precise first-person perspective.
Limits of Handheld Data for Imitation Learning
Smartphone data scales well, but it has structural limits for imitation learning.
Inconsistent device placement. Phones live in pockets, hands, bags, or dashboards. Each position produces different signals, often missing stable video, head orientation, or a true human viewpoint. That variability is useful for studying phone usage, but it breaks consistency for training humanoid robots.
Missing attention signals. A phone does not track where a person is looking. Head turns, gaze shifts, and visual scanning, which are critical cues for navigation and decision-making, are largely lost. For imitation learning, knowing what a human attended to matters as much as where they moved.
Behavior distortion. Holding a phone changes how people walk and interact. Conscious recording leads to less natural movement, reducing data quality.
RoboX Lens removes these constraints by capturing stable, head-level, hands-free egocentric data aligned with natural human behavior.
Device Specifications
Form Factor: Standard glasses frame, lightweight (~45 g)
RGB Camera: 12 MP, 120° FOV, aligned with gaze direction
IMU: 9-axis (accelerometer, gyroscope, magnetometer), 200 Hz sampling
GPS: Phone-assisted for reduced power consumption
Microphone: Dual-mic array for directional audio
Storage: 32 GB onboard, automatic sync to phone
Battery: 4+ hours of continuous recording
Connectivity: Bluetooth 5.0 to companion phone app
Data Captured
Gaze-Aligned Video
The camera points where the wearer looks. When they turn their head to check for traffic, the video captures that view. When they glance at a sign, the video shows the sign. This alignment creates natural attention labels without explicit annotation.
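As a toy illustration of how gaze alignment yields free attention labels, the sketch below treats the center of each frame as the attended region. The function name, the crop fraction, and the NumPy-array frame format are illustrative assumptions, not part of the Lens data format:

```python
import numpy as np

def attention_crop(frame: np.ndarray, frac: float = 0.4) -> np.ndarray:
    """Return the central region of a gaze-aligned frame as a weak
    attention label: because the camera points where the wearer looks,
    the frame center approximates the attended part of the scene."""
    h, w = frame.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return frame[y0:y0 + ch, x0:x0 + cw]
```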
Head Orientation & Movement
High-frequency IMU data captures:
Head rotation (yaw, pitch, roll)
Movement patterns during walking
Micro-movements during standing/observation
Vestibulo-ocular patterns
The data is particularly valuable for humanoid robot training, where natural head movement is part of realistic behavior.
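As one illustration of how a 200 Hz IMU stream can be turned into head orientation, here is a minimal complementary-filter sketch in Python. The sample layout, rate constant, and blend weight are assumptions for the example, not the device firmware:

```python
import math

DT = 1.0 / 200.0   # assumed 200 Hz sampling interval
ALPHA = 0.98       # blend weight favoring the gyro path

def update_orientation(pitch, roll, gyro, accel):
    """One complementary-filter step: integrate gyro rates for
    responsiveness, then correct drift with the gravity direction
    from the accelerometer. Yaw would additionally need the
    magnetometer and is omitted here."""
    # Integrate angular rates (rad/s) over one sample period.
    pitch_g = pitch + gyro[0] * DT
    roll_g = roll + gyro[1] * DT
    # Tilt from gravity (valid when external acceleration is small).
    ax, ay, az = accel
    pitch_a = math.atan2(-ax, math.hypot(ay, az))
    roll_a = math.atan2(ay, az)
    # Gyro handles high-frequency motion; accelerometer anchors drift.
    return (ALPHA * pitch_g + (1 - ALPHA) * pitch_a,
            ALPHA * roll_g + (1 - ALPHA) * roll_a)
```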
6DoF Trajectory
By combining IMU integration with visual odometry, Lens produces full six-degree-of-freedom (6DoF) pose tracking:
X, Y, Z position
Roll, pitch, yaw orientation
That trajectory data shows not just where someone went but how they moved through space: their gait, their pauses, their navigation decisions.
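To make this concrete, here is a hedged sketch of one derived signal, pause detection over a 6DoF trajectory. The Pose6DoF schema and both thresholds are illustrative assumptions rather than the actual data format:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose6DoF:
    t: float                               # timestamp (s)
    x: float; y: float; z: float           # position (m)
    roll: float; pitch: float; yaw: float  # orientation (rad)

def detect_pauses(traj, speed_thresh=0.2, min_duration=1.0):
    """Return (start, end) spans where horizontal speed stays below
    speed_thresh (m/s) for at least min_duration seconds."""
    pauses, start = [], None
    for prev, cur in zip(traj, traj[1:]):
        speed = math.hypot(cur.x - prev.x, cur.y - prev.y) / (cur.t - prev.t)
        if speed < speed_thresh:
            start = prev.t if start is None else start
        elif start is not None:
            if prev.t - start >= min_duration:
                pauses.append((start, prev.t))
            start = None
    if start is not None and traj[-1].t - start >= min_duration:
        pauses.append((start, traj[-1].t))
    return pauses
```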
Directional Audio
The dual-microphone array captures spatial audio information:
Sound source direction estimation
Environmental acoustic characteristics
Audio-visual correspondence (what sounds accompany what views)
This supports research into audio-based navigation and multi-modal perception.
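As a sketch of how direction can be estimated from two microphones, the snippet below cross-correlates the channels to find the time difference of arrival (TDOA) and converts it to a bearing under a far-field model. The mic spacing and sample rate are assumed values for the example, not published hardware figures:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature
MIC_SPACING = 0.14      # assumed distance between the two mics (m)
SAMPLE_RATE = 48_000    # assumed audio sample rate (Hz)

def estimate_bearing(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate sound-source bearing (degrees, 0 = straight ahead)
    from one frame of dual-mic audio via plain cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag in samples; the
    tdoa = lag / SAMPLE_RATE                  # sign indicates the side
    # Far-field model: path difference = spacing * sin(bearing).
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```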
Privacy Design
Lens incorporates privacy protections consistent with the RoboX platform:
On-Device Processing
The companion phone app processes Lens data before upload:
Face detection and blurring
License plate masking
Voice removal from audio
Location anonymization
Raw data never leaves the local Lens-and-phone system.
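A minimal sketch of the face-blurring stage, using OpenCV's bundled Haar cascade. This illustrates the flow only; it is not the companion app's actual implementation, which would likely use a stronger detector:

```python
import cv2

# Haar cascade file ships with the opencv-python package.
_FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame):
    """Detect faces in a BGR frame and Gaussian-blur each region
    in place before the frame is queued for upload."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in _FACE_CASCADE.detectMultiScale(gray, 1.1, 5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame
```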
Recording Indicators
Lens includes visible LED indicators that light whenever recording is active. This provides transparency to people in the wearer's environment.
Consent Framework
Pilot program participants agree to collection guidelines specifying appropriate recording contexts. Public spaces only: no private residences, workplaces, or sensitive locations.
Research Applications
Humanoid Robotics
Training robots to move and behave naturally requires data from human movement. Lens captures the full sensory context of human navigation: what the person saw, how their head moved, how they responded to their environment.
Research has shown significant performance improvements when training humanoid control policies on head-mounted egocentric data versus alternative data sources.
Indoor Navigation
Lens data supports research into:
Visual place recognition (identifying locations from first-person views; see the sketch after this list)
Path planning in complex environments
Obstacle avoidance and spatial reasoning
Wayfinding assistance systems
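For the first item above, a toy sketch of embedding-based place recognition: frames are embedded with any image encoder (not specified here), and a query is matched to the closest stored embedding by cosine similarity. All names and the database layout are hypothetical:

```python
import numpy as np

def nearest_place(query_emb, db_embs, db_labels):
    """Return the label of the database embedding most similar to the
    query, by cosine similarity. db_embs is a 2-D array with one
    embedding per row; db_labels is a parallel list of place names."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    return db_labels[int(np.argmax(db @ q))]
```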
Contextual AI
Understanding human activity and context benefits from first-person perspective:
Activity recognition (what is the person doing?)
Object interaction (how do people manipulate objects?)
Scene understanding (what's happening in this environment?)
Attention modeling (what do people look at and why?)
Accessibility Technology
Lens data informs development of:
Navigation assistance for visually impaired users
Spatial audio interfaces
Environmental awareness systems
Obstacle warning systems
Future Development
Beyond the initial pilot, the RoboX Lens development roadmap includes:
Hardware Iterations
Reduced weight and improved comfort
Extended battery life
Higher resolution sensors
Prescription lens compatibility
Expanded Sensors
Eye tracking (gaze direction within the frame)
Depth sensing (structured light or ToF)
Additional environmental sensors
Consumer Availability
The long-term goal is broader availability beyond research programs, pending pilot learnings and market development.