Architecture
How the four pieces of VelaOS talk to each other — and what happens when the network drops.
The four components
VelaOS system topology
┌───────────────────────────────────────────────────────────────┐
│ ADMIN'S BROWSER │
│ console.velaos.ch (Next.js 16) │
└──────────────────────────────┬────────────────────────────────┘
│ HTTPS + httpOnly JWT cookie
│
┌──────────────────────────────▼────────────────────────────────┐
│ VelaOS CLOUD API │
│ api.velaos.ch (Go + Echo v5) │
│ │
│ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Supabase │ │ EMQX MQTT │ │ Upstash │ │ Supabase │ │
│ │ Postgres │ │ (8883) │ │ Redis │ │ Storage │ │
│ └──────────┘ └─────┬─────┘ └──────────┘ └──────────────┘ │
└─────────────────────┬┴────────────────────────────────────────┘
│ TLS + mTLS cert per device
│ QoS 0 for heartbeats
│ QoS 1 for commands + reports
│
┌─────────────────────▼──────────────────────────────────────────┐
│ VelaOS AGENT │
│ (Kotlin + Jetpack Compose, Device Owner) │
│ │
│ Publishes: Subscribes: │
│ /heartbeat (60s) /commands │
│ /response (per ack) /policy │
│ /policy_report (5m) │
│ /apps_report (5m) │
└────────────────────────────────────────────────────────────────┘
│
┌─────────────────────▼──────────────────────────────────────────┐
│ VelaOS Base Image │
│ AlmaLinux 10.1 bootc · kernel 6.12 LTS · Greenboot + Podman │
│ UEFI x86-64-v2 (primary) · aarch64 (secondary) │
└────────────────────────────────────────────────────────────────┘
Data flows
Enrollment (one-time per device)
- Device boots, contacts api.velaos.ch/api/v1/enrollment/code over HTTPS
- API returns a 6-character VelaOS Code + MQTT broker address + per-device TLS cert
- Device shows code on screen; admin types it into the console
- Cloud moves the device from pending to approved and pushes an approved command
- Device subscribes to its MQTT topics and syncs current policy
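The enrollment steps above can be sketched in Go as a code generator plus a single state transition. This is a minimal illustration, not the VelaOS implementation: the code alphabet, the function names `NewEnrollmentCode` and `Approve`, and the error values are all assumptions; only the 6-character length and the pending/approved states come from the steps above.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"math/big"
)

// codeAlphabet is an assumed alphabet; look-alike characters (0/O, 1/I) are left out.
const codeAlphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

// NewEnrollmentCode returns a random code of length n drawn from codeAlphabet.
func NewEnrollmentCode(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(codeAlphabet)))
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = codeAlphabet[idx.Int64()]
	}
	return string(out), nil
}

// DeviceState names the two enrollment states mentioned in the text.
type DeviceState string

const (
	StatePending  DeviceState = "pending"
	StateApproved DeviceState = "approved"
)

// ErrBadCode is a hypothetical error for a mistyped code.
var ErrBadCode = errors.New("enrollment code does not match")

// Approve moves a pending device to approved when the admin-typed code matches.
func Approve(state DeviceState, issued, typed string) (DeviceState, error) {
	if state != StatePending {
		return state, fmt.Errorf("cannot approve device in state %q", state)
	}
	if issued != typed {
		return state, ErrBadCode
	}
	return StateApproved, nil
}

func main() {
	code, err := NewEnrollmentCode(6)
	if err != nil {
		panic(err)
	}
	next, err := Approve(StatePending, code, code)
	fmt.Println(len(code), next, err)
}
```

A real service would presumably also expire unused codes; the Redis entry under "Where data lives" suggests a TTL is involved, but its value is not stated here.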
Heartbeat (every 60 seconds, cloud-tunable)
Every device publishes to vela/{device_id}/heartbeat:
{
"ts": 1744545600,
"cpu_temp": 52.3,
"cpu_usage": 18,
"ram_used": 2100000000,
"ram_total": 8000000000,
"storage_used": 4100000000,
"storage_total": 31000000000,
"wifi_signal_dbm": -62,
"thermal_zones": [{"name": "cpu", "temp_c": 52.3}],
"cpu_governor": "schedutil",
"fan_rpm": 1800,
"power_sources": [{"source": "USB", "voltage_v": 5.1, "current_a": 1.9}],
"i2c_devices": [...],
"bluetooth_devices": [...],
"usb_devices": [...]
}
The interval is configurable via policy.agent.heartbeat_interval_seconds.
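On the receiving side, the payload above could be decoded into a typed struct. The sketch below is one possible mapping: the JSON keys come from the sample, but the Go field types, the `ParseHeartbeat` helper, and the `RAMUsedPercent` method are assumptions. The three device lists are kept as raw JSON because their element shape is elided in the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ThermalZone and PowerSource mirror the nested objects in the sample payload.
type ThermalZone struct {
	Name  string  `json:"name"`
	TempC float64 `json:"temp_c"`
}

type PowerSource struct {
	Source   string  `json:"source"`
	VoltageV float64 `json:"voltage_v"`
	CurrentA float64 `json:"current_a"`
}

// Heartbeat maps the JSON keys of the sample. The Go types are assumptions.
type Heartbeat struct {
	TS            int64         `json:"ts"`
	CPUTemp       float64       `json:"cpu_temp"`
	CPUUsage      float64       `json:"cpu_usage"`
	RAMUsed       uint64        `json:"ram_used"`
	RAMTotal      uint64        `json:"ram_total"`
	StorageUsed   uint64        `json:"storage_used"`
	StorageTotal  uint64        `json:"storage_total"`
	WifiSignalDBm int           `json:"wifi_signal_dbm"`
	ThermalZones  []ThermalZone `json:"thermal_zones"`
	CPUGovernor   string        `json:"cpu_governor"`
	FanRPM        int           `json:"fan_rpm"`
	PowerSources  []PowerSource `json:"power_sources"`
	// Element shapes for these lists are elided in the sample, so keep them raw.
	I2CDevices       []json.RawMessage `json:"i2c_devices"`
	BluetoothDevices []json.RawMessage `json:"bluetooth_devices"`
	USBDevices       []json.RawMessage `json:"usb_devices"`
}

// ParseHeartbeat decodes one MQTT heartbeat payload.
func ParseHeartbeat(payload []byte) (Heartbeat, error) {
	var hb Heartbeat
	err := json.Unmarshal(payload, &hb)
	return hb, err
}

// RAMUsedPercent derives a percentage from the two raw byte counters.
func (h Heartbeat) RAMUsedPercent() float64 {
	if h.RAMTotal == 0 {
		return 0
	}
	return 100 * float64(h.RAMUsed) / float64(h.RAMTotal)
}

func main() {
	payload := []byte(`{"ts": 1744545600, "cpu_temp": 52.3, "ram_used": 2100000000,
		"ram_total": 8000000000, "wifi_signal_dbm": -62,
		"thermal_zones": [{"name": "cpu", "temp_c": 52.3}], "usb_devices": []}`)
	hb, err := ParseHeartbeat(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ts=%d ram=%.2f%% zones=%d\n", hb.TS, hb.RAMUsedPercent(), len(hb.ThermalZones))
}
```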
Command dispatch (real-time)
- Admin clicks "Reboot" in console
- Console POSTs /api/v1/devices/:id/reboot
- API validates RBAC (device.reboot permission), writes audit log
- API publishes to vela/{device_id}/commands with QoS 1
- Agent receives, executes, publishes ack to /response
- Console gets ack via WebSocket within ~100ms
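The API-side part of this flow (permission check, audit entry, QoS 1 publish) might look roughly like the sketch below. The `Publisher` interface, the in-memory `recordingPublisher`, the permission map, and the `{"command":"reboot"}` payload are hypothetical stand-ins; only the topic layout, the device.reboot permission name, and QoS 1 are taken from the steps above.

```go
package main

import (
	"errors"
	"fmt"
)

// Publisher is a hypothetical abstraction over the MQTT client.
type Publisher interface {
	Publish(topic string, qos byte, payload []byte) error
}

// recordingPublisher stores messages in memory so the flow can run without a broker.
type recordingPublisher struct{ sent []string }

func (r *recordingPublisher) Publish(topic string, qos byte, payload []byte) error {
	r.sent = append(r.sent, fmt.Sprintf("%s qos=%d %s", topic, qos, payload))
	return nil
}

// ErrForbidden is returned when the caller lacks the device.reboot permission.
var ErrForbidden = errors.New("missing permission device.reboot")

// CommandTopic builds the per-device command topic described in the text.
func CommandTopic(deviceID string) string {
	return "vela/" + deviceID + "/commands"
}

// DispatchReboot checks RBAC, appends an audit entry, then publishes with QoS 1.
func DispatchReboot(p Publisher, perms map[string]bool, audit *[]string, deviceID string) error {
	if !perms["device.reboot"] {
		return ErrForbidden
	}
	*audit = append(*audit, "reboot requested for "+deviceID)
	// The payload shape is an assumption; the real command schema is not shown here.
	return p.Publish(CommandTopic(deviceID), 1, []byte(`{"command":"reboot"}`))
}

func main() {
	pub := &recordingPublisher{}
	var audit []string
	err := DispatchReboot(pub, map[string]bool{"device.reboot": true}, &audit, "dev-42")
	fmt.Println(err, pub.sent, audit)
}
```

Writing the audit entry before publishing means a denied request leaves no trace in this sketch; a real system may well want to audit denials too.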
Policy application
- Admin saves a policy in the editor
- API computes the effective policy for every device (tenant default → group → device override)
- API publishes policy_update commands with the merged JSON
- Agent calls PolicyEngine.applyPolicy(json), which runs 75+ DPM API calls
- Agent publishes a compliance report within 5 seconds
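The layering in step two (tenant default → group → device override) can be illustrated with a small merge function. This is a sketch under assumptions: it treats policies as nested JSON objects and merges them recursively, with later layers winning. Whether VelaOS merges deeply or replaces whole sections is not stated here, and the `camera_disabled` key is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloneValue deep-copies nested maps so merged output never aliases its inputs.
func cloneValue(v any) any {
	m, ok := v.(map[string]any)
	if !ok {
		return v
	}
	out := make(map[string]any, len(m))
	for k, val := range m {
		out[k] = cloneValue(val)
	}
	return out
}

// mergeInto overlays src on dst; nested objects merge key by key, other values are replaced.
func mergeInto(dst, src map[string]any) {
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]any)
		dstMap, dstIsMap := dst[k].(map[string]any)
		if srcIsMap && dstIsMap {
			mergeInto(dstMap, srcMap)
			continue
		}
		dst[k] = cloneValue(v)
	}
}

// EffectivePolicy applies the layers in precedence order: tenant, then group, then device.
func EffectivePolicy(tenant, group, device map[string]any) map[string]any {
	out := map[string]any{}
	for _, layer := range []map[string]any{tenant, group, device} {
		mergeInto(out, layer)
	}
	return out
}

func main() {
	tenant := map[string]any{
		"agent":           map[string]any{"heartbeat_interval_seconds": 60},
		"camera_disabled": false, // hypothetical policy key
	}
	group := map[string]any{"camera_disabled": true}
	device := map[string]any{"agent": map[string]any{"heartbeat_interval_seconds": 30}}

	merged, err := json.Marshal(EffectivePolicy(tenant, group, device))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged))
}
```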
Protocol choices
- MQTT over TLS (port 8883): chosen over gRPC and HTTP long-polling for its persistent connection, tiny per-message overhead, built-in last-will (the broker tells us as soon as it detects that a device has disconnected), and QoS 1 for at-least-once command delivery.
- Per-device mTLS cert — the EMQX broker validates every connection against a tenant-specific CA. A stolen device can't impersonate another.
- HTTPS for enrollment + file downloads — enrollment needs to work before MQTT is set up. File downloads use presigned URLs (1-hour expiry).
- WebSocket for console live updates — the console opens one WS to the API, which fans out device events it observed on MQTT.
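The fan-out described in the last bullet can be modeled as a small hub that copies each device event to every connected console. The sketch below uses plain Go channels in place of real WebSocket connections; the `Hub` and `Event` types and the policy of dropping events for slow subscribers are assumptions, not a description of the actual API server.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical envelope for something the API observed on MQTT.
type Event struct {
	DeviceID string
	Kind     string
}

// Hub fans events out to subscribers; each subscriber stands in for one console WebSocket.
type Hub struct {
	mu     sync.Mutex
	nextID int
	subs   map[int]chan Event
}

func NewHub() *Hub {
	return &Hub{subs: make(map[int]chan Event)}
}

// Subscribe registers a buffered channel and returns its id for later removal.
func (h *Hub) Subscribe(buffer int) (int, <-chan Event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.nextID
	h.nextID++
	ch := make(chan Event, buffer)
	h.subs[id] = ch
	return id, ch
}

// Unsubscribe removes the subscriber and closes its channel.
func (h *Hub) Unsubscribe(id int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if ch, ok := h.subs[id]; ok {
		delete(h.subs, id)
		close(ch)
	}
}

// Broadcast offers the event to every subscriber and reports how many accepted it.
func (h *Hub) Broadcast(ev Event) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for _, ch := range h.subs {
		select {
		case ch <- ev:
			delivered++
		default:
			// Subscriber buffer is full: drop instead of blocking the MQTT side.
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	_, a := hub.Subscribe(1)
	idB, b := hub.Subscribe(1)
	n := hub.Broadcast(Event{DeviceID: "dev-1", Kind: "response"})
	fmt.Println(n, (<-a).Kind, (<-b).DeviceID)
	hub.Unsubscribe(idB)
}
```

Dropping on a full buffer keeps one stalled browser tab from delaying events for everyone else, at the cost of that tab missing updates.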
Failure domains
What survives what failure
- Agent process crashes — Android restarts it within 5s (foreground service)
- Network drops — agent queues heartbeats, retries every 30s with backoff, applies last-known policy
- Cloud API down — devices stay operational on cached policy; can't receive new commands
- MQTT broker down — same as above; commands resume once broker returns
- Postgres down — API returns 503; devices unaffected
- Device lost entirely — 30 min offline triggers alert; admin can remote-wipe or write-off
Where data lives
- Postgres (Supabase, eu-central-2) — tenants, devices, groups, policies, audit log, compliance results
- Redis (Upstash) — session cache, rate limit counters, enrollment code TTLs
- Supabase Storage — APK files, OTA images, diagnostic bundles, screenshots, wallpapers
- Agent SecurePrefs (on-device) — enrollment config, last applied policy, agent tunables (encrypted AES-256-GCM)
Next steps
- Security model — Device Owner privileges, RBAC, certificate chain
- Device lifecycle — every state a device can be in
- MQTT topics reference — complete topic taxonomy
Updated 2026-04-14
