Step-By-Step Guide

1. Joining the Campaign

Each data collection effort starts with a campaign, which defines what data is needed, where it is collected, and how contributors are compensated.

Campaigns specify:

  • Target objective (e.g. indoor navigation, perception)

  • Required sensors

  • Geographic scope and duration

  • Compensation rate

Campaign types include: navigation, perception, environmental sensing, and motion capture.
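As an illustration only, the campaign fields listed above could be modeled as a small record. This is a minimal sketch, not the RoboX schema: the class name `Campaign`, every field name, the per-hour compensation unit, and the `device_qualifies` helper are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

# Campaign types named in this guide.
CAMPAIGN_TYPES = {"navigation", "perception", "environmental sensing", "motion capture"}


@dataclass(frozen=True)
class Campaign:
    """Hypothetical campaign record; all field names are assumptions."""
    objective: str                    # e.g. "indoor navigation"
    campaign_type: str                # one of CAMPAIGN_TYPES
    required_sensors: tuple[str, ...]
    region: str                       # geographic scope
    start: date                       # duration: first day
    end: date                         # duration: last day
    rate_per_hour: float              # compensation rate (unit is assumed)

    def __post_init__(self):
        if self.campaign_type not in CAMPAIGN_TYPES:
            raise ValueError(f"unknown campaign type: {self.campaign_type!r}")
        if self.end < self.start:
            raise ValueError("campaign ends before it starts")

    def device_qualifies(self, available_sensors):
        """True if the device offers every sensor the campaign requires."""
        return set(self.required_sensors) <= set(available_sensors)
```

A device with an IMU, GPS, and camera would qualify for a campaign that requires only `("imu", "gps")`, while a device with an IMU alone would not.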


2. Data Collection

Collectors join campaigns through the RoboX mobile app.

  • Data is collected passively in the background during normal activity

  • Sensor sampling rates are campaign-defined

  • Data is encrypted and buffered locally

  • Collection pauses automatically if quality thresholds aren’t met

Uploads occur automatically when device and network conditions allow.


3. On-Device Anonymization

All anonymization happens before upload.

  • Locations are coarse-grained and hashed

  • Faces, license plates, and identifying visuals are masked on-device

  • Hardware identifiers are stripped

  • Pseudonymous IDs rotate per campaign

  • Timestamps are slightly randomized to prevent correlation

Raw, identifiable data never leaves the device.
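Three of the steps above (coarse-grained hashed locations, per-campaign pseudonymous IDs, and timestamp jitter) can be sketched with the Python standard library. This is one possible approach under stated assumptions, not the app's actual implementation: the grid size, the use of HMAC-SHA256, the jitter window, and all function names are hypothetical.

```python
import hashlib
import hmac
import math
import random


def coarse_location_hash(lat, lon, salt, cell_deg=0.01):
    """Snap a coordinate to a grid cell, then hash the cell with a keyed hash.

    cell_deg=0.01 (roughly 1 km of latitude) is an assumed grid size.
    """
    cell = f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"
    return hmac.new(salt, cell.encode(), hashlib.sha256).hexdigest()


def pseudonymous_id(device_secret, campaign_id):
    """Derive an ID that is stable within a campaign but differs across campaigns."""
    digest = hmac.new(device_secret, campaign_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def jitter_timestamp(ts, max_jitter_s=2.0, rng=None):
    """Shift a Unix timestamp by a random offset within +/- max_jitter_s seconds."""
    rng = rng or random.SystemRandom()
    return ts + rng.uniform(-max_jitter_s, max_jitter_s)
```

Two readings taken a few metres apart fall into the same grid cell and so produce the same location hash, while the same device secret yields unrelated IDs for different campaign identifiers. Visual masking of faces and license plates requires an on-device vision model and is not sketched here.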


4. Upload & Validation

Anonymized data is uploaded over encrypted connections and validated automatically.

Validation checks include:

  • Completeness and continuity

  • Sensor plausibility

  • Duplicate detection

  • Cross-validation against nearby collectors

Only validated data qualifies for compensation.
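The first three checks can be illustrated on a batch of accelerometer samples. This is a minimal sketch assuming a hypothetical `Sample` type, a gap tolerance of 1.5 sample periods, and a plausibility bound of 80 m/s²; none of these values come from RoboX. Cross-validation against nearby collectors depends on other contributors' data and is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float                           # seconds since the batch started
    accel: tuple[float, float, float]  # accelerometer reading in m/s^2


def is_continuous(samples, rate_hz, tolerance=1.5):
    """Timestamps must increase, with no gap above `tolerance` sample periods."""
    max_gap = tolerance / rate_hz
    return all(0 < b.t - a.t <= max_gap for a, b in zip(samples, samples[1:]))


def is_plausible(samples, max_accel=80.0):
    """Reject readings beyond an assumed physical bound."""
    return all(abs(v) <= max_accel for s in samples for v in s.accel)


def fingerprint(samples):
    """Content hash of a batch, used to spot exact duplicates."""
    h = hashlib.sha256()
    for s in samples:
        h.update(repr((s.t, s.accel)).encode())
    return h.hexdigest()


def validate(samples, rate_hz, seen_fingerprints):
    """Return the names of failed checks; an empty list means the batch passed."""
    if not samples:
        return ["empty"]
    failures = []
    if not is_continuous(samples, rate_hz):
        failures.append("continuity")
    if not is_plausible(samples):
        failures.append("plausibility")
    fp = fingerprint(samples)
    if fp in seen_fingerprints:
        failures.append("duplicate")
    else:
        seen_fingerprints.add(fp)
    return failures
```

Submitting the same clean batch twice with a shared `seen_fingerprints` set passes the first time and reports `"duplicate"` the second time.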


5. Aggregation & Dataset Creation

Validated data is standardized, annotated, and grouped into datasets by:

  • Campaign

  • Geography

  • Sensor modality

  • Intended use case

Metadata includes collection context, device characteristics, and quality scores.
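Grouping by the four dimensions above amounts to bucketing records on a composite key. The sketch below assumes each validated record is a dictionary with hypothetical keys `campaign`, `region`, `modality`, and `use_case`; the real record layout is not described in this guide.

```python
from collections import defaultdict


def group_into_datasets(records):
    """Bucket records by (campaign, geography, sensor modality, use case)."""
    datasets = defaultdict(list)
    for record in records:
        key = (
            record["campaign"],
            record["region"],
            record["modality"],
            record["use_case"],
        )
        datasets[key].append(record)
    return dict(datasets)
```

Records that share all four values land in the same dataset; a difference in any one of them, such as sensor modality, starts a new one.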


6. Data Access

Authorized users access datasets via the RoboX API:

  • Streaming for continuous ingestion

  • Batch downloads for offline training

  • Query interface for exploration and preview

Datasets are provided in TensorFlow- or PyTorch-ready formats, or as raw data.

Each dataset includes documentation covering methodology, anonymization, limitations, and recommended use.
