Architecture

How the four pieces of VelaOS talk to each other — and what happens when the network drops.

The four components

VelaOS system topology
┌───────────────────────────────────────────────────────────────┐
│                        ADMIN'S BROWSER                        │
│                 console.velaos.ch (Next.js 16)                │
└──────────────────────────────┬────────────────────────────────┘
                               │ HTTPS + httpOnly JWT cookie
                               │
┌──────────────────────────────▼────────────────────────────────┐
│                    VelaOS CLOUD API                           │
│              api.velaos.ch (Go + Echo v5)                     │
│                                                               │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐  ┌──────────────┐  │
│  │ Supabase │  │ EMQX MQTT │  │ Upstash  │  │ Supabase     │  │
│  │ Postgres │  │  (8883)   │  │  Redis   │  │ Storage      │  │
│  └──────────┘  └─────┬─────┘  └──────────┘  └──────────────┘  │
└─────────────────────┬┴────────────────────────────────────────┘
                      │  TLS + mTLS cert per device
                      │  QoS 0 for heartbeats
                      │  QoS 1 for commands + reports
                      │
┌─────────────────────▼──────────────────────────────────────────┐
│                    VelaOS AGENT                                │
│            (Kotlin + Jetpack Compose, Device Owner)            │
│                                                                │
│    Publishes:                     Subscribes:                  │
│    /heartbeat   (60s)             /commands                    │
│    /response    (per ack)         /policy                      │
│    /policy_report  (5m)                                        │
│    /apps_report    (5m)                                        │
└────────────────────────────────────────────────────────────────┘
                      │
┌─────────────────────▼──────────────────────────────────────────┐
│                 VelaOS Base Image                              │
│   AlmaLinux 10.1 bootc · kernel 6.12 LTS · Greenboot + Podman  │
│   UEFI x86-64-v2 (primary) · aarch64 (secondary)               │
└────────────────────────────────────────────────────────────────┘
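
The topic and QoS layout in the diagram can be summarized as a small lookup table. The sketch below uses Python purely for illustration (the API itself is Go). It assumes that every topic shares the vela/{device_id}/ prefix used for heartbeats and commands, and that /response and /policy use QoS 1, which the diagram does not state explicitly. The names AGENT_TOPICS, topic, and qos are hypothetical.

```python
# Hypothetical lookup: topic suffix -> (agent direction, QoS), read off the diagram.
# Assumptions: all topics share the vela/{device_id}/ prefix; /response and
# /policy are treated as QoS 1 because they carry command and policy traffic.
AGENT_TOPICS = {
    "heartbeat":     ("publish",   0),
    "response":      ("publish",   1),
    "policy_report": ("publish",   1),
    "apps_report":   ("publish",   1),
    "commands":      ("subscribe", 1),
    "policy":        ("subscribe", 1),
}


def topic(device_id: str, suffix: str) -> str:
    """Build the full MQTT topic for one device, rejecting unknown suffixes."""
    if suffix not in AGENT_TOPICS:
        raise ValueError(f"unknown topic suffix: {suffix}")
    return f"vela/{device_id}/{suffix}"


def qos(suffix: str) -> int:
    """Return the QoS level assumed for a topic suffix."""
    return AGENT_TOPICS[suffix][1]


if __name__ == "__main__":
    for suffix in AGENT_TOPICS:
        print(topic("dev-123", suffix), qos(suffix))
```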

Data flows

Enrollment (one-time per device)

  1. Device boots, contacts api.velaos.ch/api/v1/enrollment/code over HTTPS
  2. API returns a 6-character VelaOS Code + MQTT broker address + per-device TLS cert
  3. Device shows code on screen; admin types it into the console
  4. Cloud moves device from pending to approved and pushes an "approved" command to the device
  5. Device subscribes to its MQTT topics and syncs current policy
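
The pending-to-approved bookkeeping in steps 1 to 4 can be sketched as an in-memory registry. This is an illustration in Python, not the API's Go implementation. The class EnrollmentRegistry, the code alphabet (uppercase letters and digits), and single-use codes are all assumptions; the source only says the code has six characters.

```python
import secrets
import string

# Assumption: codes are drawn from uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


class EnrollmentRegistry:
    """In-memory stand-in for the cloud's pending/approved bookkeeping."""

    def __init__(self):
        self._by_code = {}  # code -> device_id, for codes not yet redeemed
        self.status = {}    # device_id -> "pending" or "approved"

    def request_code(self, device_id: str) -> str:
        """Steps 1-2: issue a fresh 6-character code and mark the device pending."""
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(6))
            if code not in self._by_code:
                break
        self._by_code[code] = device_id
        self.status[device_id] = "pending"
        return code

    def approve(self, code: str) -> str:
        """Steps 3-4: admin enters the code; the device moves to approved."""
        device_id = self._by_code.pop(code, None)  # assumed single-use
        if device_id is None:
            raise KeyError("unknown or already used code")
        self.status[device_id] = "approved"
        return device_id


if __name__ == "__main__":
    registry = EnrollmentRegistry()
    code = registry.request_code("dev-123")
    print(code, registry.status["dev-123"])
    registry.approve(code)
    print(registry.status["dev-123"])
```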

Heartbeat (every 60 seconds, cloud-tunable)

Every device publishes to vela/{device_id}/heartbeat:

{
  "ts": 1744545600,
  "cpu_temp": 52.3,
  "cpu_usage": 18,
  "ram_used": 2100000000,
  "ram_total": 8000000000,
  "storage_used": 4100000000,
  "storage_total": 31000000000,
  "wifi_signal_dbm": -62,
  "thermal_zones": [{"name": "cpu", "temp_c": 52.3}],
  "cpu_governor": "schedutil",
  "fan_rpm": 1800,
  "power_sources": [{"source": "USB", "voltage_v": 5.1, "current_a": 1.9}],
  "i2c_devices": [...],
  "bluetooth_devices": [...],
  "usb_devices": [...]
}

The interval is configurable via policy.agent.heartbeat_interval_seconds.
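
A consumer of this payload might derive usage percentages and read the interval from the policy. The following is a minimal sketch in Python (the API itself is Go), assuming the field names shown in the sample above and assuming the policy is a nested object in which policy.agent.heartbeat_interval_seconds lives under an "agent" key. The helpers summarize and heartbeat_interval are hypothetical.

```python
import json


def summarize(payload: str) -> dict:
    """Parse one heartbeat and derive RAM and storage usage percentages."""
    hb = json.loads(payload)
    return {
        "ts": hb["ts"],
        "ram_pct": 100 * hb["ram_used"] / hb["ram_total"],
        "storage_pct": 100 * hb["storage_used"] / hb["storage_total"],
        "wifi_signal_dbm": hb.get("wifi_signal_dbm"),  # None if absent
    }


def heartbeat_interval(policy: dict, default: int = 60) -> int:
    """Read policy.agent.heartbeat_interval_seconds, falling back to 60 seconds."""
    return policy.get("agent", {}).get("heartbeat_interval_seconds", default)


if __name__ == "__main__":
    sample = json.dumps({
        "ts": 1744545600,
        "ram_used": 2100000000,
        "ram_total": 8000000000,
        "storage_used": 4100000000,
        "storage_total": 31000000000,
        "wifi_signal_dbm": -62,
    })
    print(summarize(sample))
    print(heartbeat_interval({"agent": {"heartbeat_interval_seconds": 30}}))
```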

Command dispatch (real-time)

  1. Admin clicks "Reboot" in console
  2. Console POSTs /api/v1/devices/:id/reboot
  3. API validates RBAC (device.reboot permission), writes audit log
  4. API publishes to vela/{device_id}/commands with QoS 1
  5. Agent receives, executes, publishes ack to /response
  6. Console gets ack via WebSocket within ~100ms
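
Steps 3 and 4 follow a fixed order: permission check, audit entry, then publish. The sketch below shows that ordering in Python for illustration only. The function dispatch_reboot, the injected publish callable, the audit entry fields, and the command payload shape are all hypothetical; only the device.reboot permission, the topic, and QoS 1 come from the steps above.

```python
import json
import time


class PermissionDenied(Exception):
    """Raised when the caller lacks the required RBAC permission."""


def dispatch_reboot(user_permissions, device_id, publish, audit_log):
    """Check RBAC, write the audit entry, then publish the command with QoS 1."""
    if "device.reboot" not in user_permissions:
        raise PermissionDenied("device.reboot required")
    # Audit before publishing so every dispatched command has a record.
    audit_log.append({
        "ts": int(time.time()),
        "action": "device.reboot",
        "device_id": device_id,
    })
    payload = json.dumps({"command": "reboot"})  # hypothetical payload shape
    publish(f"vela/{device_id}/commands", payload, qos=1)


if __name__ == "__main__":
    sent, audit = [], []

    def fake_publish(topic, payload, qos):
        # Stand-in for a real MQTT client; it only records the call.
        sent.append((topic, payload, qos))

    dispatch_reboot({"device.reboot"}, "dev-123", fake_publish, audit)
    print(sent, audit)
```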

Policy application

  1. Admin saves a policy in the editor
  2. API computes the effective policy for every device (tenant default → group → device override)
  3. API publishes policy_update commands with the merged JSON
  4. Agent calls PolicyEngine.applyPolicy(json), which makes 75+ DevicePolicyManager (DPM) API calls
  5. Agent publishes a compliance report within 5 seconds
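
The layering in step 2 can be illustrated with a recursive merge in which later layers win. This is a sketch in Python under the assumption that policies are nested JSON objects and that overrides apply key by key; the source states only the precedence order. The functions merge and effective_policy and the camera_disabled key are hypothetical.

```python
def merge(base: dict, override: dict) -> dict:
    """Merge override into a copy of base; nested dicts are merged key by key.

    Nested values that are not overridden are shared with base, not copied.
    """
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def effective_policy(tenant_default: dict, group: dict, device_override: dict) -> dict:
    """Apply the layers in order: tenant default, then group, then device override."""
    return merge(merge(tenant_default, group), device_override)


if __name__ == "__main__":
    tenant = {"agent": {"heartbeat_interval_seconds": 60}, "camera_disabled": False}
    group = {"camera_disabled": True}
    device = {"agent": {"heartbeat_interval_seconds": 30}}
    print(effective_policy(tenant, group, device))
```

With these inputs the device keeps the group's camera_disabled setting and takes its own 30-second heartbeat interval.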

Protocol choices

  • MQTT over TLS (port 8883) — chosen over gRPC and HTTP long-polling for its persistent connection, tiny per-message overhead, built-in last-will (the broker notifies us as soon as it detects that a device has dropped), and QoS 1 for at-least-once command delivery.
  • Per-device mTLS cert — the EMQX broker validates every connection against a tenant-specific CA. A stolen device can't impersonate another.
  • HTTPS for enrollment + file downloads — enrollment needs to work before MQTT is set up. File downloads use presigned URLs (1-hour expiry).
  • WebSocket for console live updates — the console opens one WS to the API, which fans out device events it observed on MQTT.
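
The WebSocket fan-out in the last item can be pictured as a hub that forwards each observed device event to the console connections interested in it. The sketch below is illustrative Python, not the API's Go code. Scoping subscribers by tenant is an assumption, and EventHub and its send callbacks are hypothetical stand-ins for real WebSocket connections.

```python
class EventHub:
    """Forwards device events to every console connection of the same tenant."""

    def __init__(self):
        self._subscribers = {}  # tenant_id -> list of send callbacks

    def subscribe(self, tenant_id, send):
        """Register a console connection, represented by its send callback."""
        self._subscribers.setdefault(tenant_id, []).append(send)

    def unsubscribe(self, tenant_id, send):
        """Remove a console connection when its WebSocket closes."""
        self._subscribers.get(tenant_id, []).remove(send)

    def on_mqtt_event(self, tenant_id, event):
        """Called for each event seen on MQTT; returns how many consoles got it."""
        targets = list(self._subscribers.get(tenant_id, []))
        for send in targets:
            send(event)
        return len(targets)


if __name__ == "__main__":
    hub = EventHub()
    inbox = []
    hub.subscribe("tenant-a", inbox.append)
    hub.on_mqtt_event("tenant-a", {"device_id": "dev-123", "type": "heartbeat"})
    print(inbox)
```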

Failure domains

What survives what failure

  • Agent process crashes — Android restarts it within 5s (foreground service)
  • Network drops — agent queues heartbeats, retries every 30s with backoff, applies last-known policy
  • Cloud API down — devices stay operational on cached policy; can't receive new commands
  • MQTT broker down — same as above; commands resume once broker returns
  • Postgres down — API returns 503; devices unaffected
  • Device lost entirely — 30 min offline triggers an alert; admin can remote-wipe the device or write it off
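
The network-drop behaviour (queue heartbeats, retry with backoff) might look like the sketch below. It is written in Python for illustration, whereas the agent itself is Kotlin. The doubling factor, the 600-second cap, the queue limit of 100, and the names retry_delays and HeartbeatQueue are assumptions; the source specifies only a 30-second retry with backoff.

```python
from collections import deque


def retry_delays(base=30, factor=2, cap=600):
    """Yield retry delays in seconds, growing by `factor` until capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


class HeartbeatQueue:
    """Bounded buffer for heartbeats produced while the network is down."""

    def __init__(self, max_items=100):
        # When the buffer is full, the oldest heartbeat is dropped first.
        self._items = deque(maxlen=max_items)

    def add(self, heartbeat):
        self._items.append(heartbeat)

    def flush(self, publish):
        """Send queued heartbeats oldest-first; stop at the first failure.

        `publish` returns True on success. Returns the number sent.
        """
        sent = 0
        while self._items:
            if not publish(self._items[0]):
                break
            self._items.popleft()
            sent += 1
        return sent


if __name__ == "__main__":
    delays = retry_delays()
    print([next(delays) for _ in range(6)])
```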

Where data lives

  • Postgres (Supabase, eu-central-2) — tenants, devices, groups, policies, audit log, compliance results
  • Redis (Upstash) — session cache, rate limit counters, enrollment code TTLs
  • Supabase Storage — APK files, OTA images, diagnostic bundles, screenshots, wallpapers
  • Agent SecurePrefs (on-device) — enrollment config, last applied policy, agent tunables (encrypted with AES-256-GCM)

Updated 2026-04-14