Boot failure & recovery

Greenboot rolled back, the device is stuck at the splash, or it won't get past the firmware. Here's the decision tree.

VelaOS Linux has three independent safety nets that usually prevent a bad update from bricking a device: cosign signature verification before image activation, Greenboot health checks after the new slot boots, and A/B rollback to the previous slot on any Greenboot failure. If all three fail, you still have an AlmaLinux minimal recovery install underneath.

Symptom: device rebooted itself and came back on an older version

That's Greenboot auto-rollback doing its job. One or more of the /etc/greenboot/check/required.d/*.sh health checks exited non-zero, so bootc reverted to the previous A/B slot.
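
For orientation, a required check is just an executable script whose exit status Greenboot inspects. The sketch below shows what a check along the lines of 10-selinux-enforcing might contain. The function name and the error message are assumptions for illustration, not the script shipped in the image.

```shell
#!/bin/sh
# Hypothetical sketch of a required.d health check; not the shipped script.
# Greenboot treats a non-zero exit status as a failed check.

# selinux_mode_ok MODE: succeed only when MODE is "Enforcing".
selinux_mode_ok() {
    if [ "$1" = "Enforcing" ]; then
        return 0
    fi
    echo "SELinux mode is '$1', expected Enforcing" >&2
    return 1
}

# On a device the script would end with something like:
#   selinux_mode_ok "$(getenforce)" || exit 1
```

Keeping the comparison in a function that takes the mode as an argument makes the check easy to exercise off-device before it goes into an image.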

Check which check failed via the console:

  1. Open Devices, find the device by VelaOS Code or hostname.
  2. Open the History tab. Look for a bootc.rollback audit entry. Its details include the name of the failing Greenboot check.
  3. If you need the full check output, choose Request diagnostics from the Actions menu. The agent uploads the output of journalctl -u greenboot-healthcheck and the last 200 lines of /var/log/messages.

The most common rollback triggers:

  • 10-selinux-enforcing — a recent policy merge put SELinux into permissive. Revert the policy bump.
  • 40-cosign-verified — your cosign verifier config drifted. Check the public key bundle on the image.
  • 50-agent-healthy — vela-agent failed to heartbeat within 60 s. Usually means the image has a broken agent build.
  • 60-luks-mounted — firmware update changed a PCR; the TPM unseal can't see the right measurements.
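
If you triage rollbacks from a script, the list above can be encoded as a small lookup. This is a minimal sketch: rollback_hint is a hypothetical helper name, and the hint strings simply paraphrase the bullets above.

```shell
# rollback_hint CHECK_NAME: print the first thing to look at for a failed check.
# Hypothetical helper; the check names come from the list above.
rollback_hint() {
    case "$1" in
        10-selinux-enforcing) echo "revert the SELinux policy bump" ;;
        40-cosign-verified)   echo "check the public key bundle on the image" ;;
        50-agent-healthy)     echo "suspect a broken vela-agent build in the image" ;;
        60-luks-mounted)      echo "a firmware update changed a PCR" ;;
        *)                    echo "unknown check; request diagnostics"; return 1 ;;
    esac
}
```

For example, rollback_hint 50-agent-healthy prints the agent-build hint, and any unrecognised name returns a non-zero status so a calling script can fall back to requesting diagnostics.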

Symptom: stuck at the Plymouth splash

That's the kernel phase hanging — most often a driver that panics on this SKU. Press Esc to dismiss Plymouth and read the console. If you see a kernel oops, capture it via the device's serial console (Wyse 5070 has a TTL header) and open a support ticket.

If pressing Esc shows nothing useful, press e at the GRUB menu to edit the kernel command line for a single boot, remove quiet splash, and continue booting. The verbose kernel log usually names the culprit module within the first two seconds of the hang.
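
To see what the edited command line should look like before you touch GRUB, the following sketch drops the two arguments from any cmdline string. strip_quiet_splash is a hypothetical helper, and it relies on ordinary sh word splitting, so it assumes the arguments contain no glob characters.

```shell
# strip_quiet_splash CMDLINE: print CMDLINE without the "quiet" and "splash" arguments.
# Hypothetical helper for previewing the edit; it does not modify the bootloader.
strip_quiet_splash() {
    out=""
    for arg in $1; do
        case "$arg" in
            quiet|splash) ;;                      # drop these two
            *) out="${out:+$out }$arg" ;;         # keep everything else, space-separated
        esac
    done
    printf '%s\n' "$out"
}
```

On a running device, strip_quiet_splash "$(cat /proc/cmdline)" prints the command line you would be left with after the edit.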

Symptom: TPM unseal prompts for recovery passphrase

A firmware update or Secure Boot policy change invalidated the PCR measurement Clevis uses. Enter the tenant recovery passphrase from the Settings → Certificates page to unlock the disk. The agent then automatically rotates the LUKS key slot on next boot so subsequent reboots skip the prompt.

Symptom: cosign signature verification failed

bootc refuses to activate the new slot. The device stays on its previous version and logs bootc.cosign.failed to the audit log. Possible causes:

  • Cosign key rotation not deployed — push the new public key via a policy drop, then retry the update.
  • Registry tampering — the image digest served by the registry doesn't match what Sigstore attests. Escalate immediately; this is a supply-chain red flag.
  • Clock skew — cosign transparency-log verification rejects signatures from the future. Check that chrony has synced; the device needs working NTP.
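
For the clock-skew case, one way to confirm that chrony has synced is to look at the Leap status line of chronyc tracking. The sketch below is an assumption about how you might script that. chrony_is_synced is a hypothetical helper, and it deliberately treats only "Normal" as synced.

```shell
# chrony_is_synced: read `chronyc tracking` output on stdin and succeed
# only when the "Leap status" line reports Normal.
# Hypothetical helper; pending leap-second states are treated as not synced.
chrony_is_synced() {
    status=$(sed -n 's/^Leap status[[:space:]]*:[[:space:]]*//p')
    [ "$status" = "Normal" ]
}
```

On a device you would run chronyc tracking | chrony_is_synced and retry the update once it succeeds.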

Symptom: device won't boot at all

Cosign and Greenboot cover every update path, so the only way left for a device to truly fail is a failing drive or a firmware fault. Workflow:

  1. Boot from the AlmaLinux recovery USB you used to provision (or any AlmaLinux 10 live ISO).
  2. At the shell:
    bootc status --json
    If bootc status reports a healthy A slot, run bootc rollback to swap back. If both slots are corrupt, reprovision with bootc switch.
  3. If the disk itself is failing (SMART errors), swap the drive and reprovision from scratch per Install.
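
For step 3, the overall SMART verdict can be checked from the recovery shell with smartctl -H. The sketch below is a hedged example that matches the wording smartctl prints for ATA and NVMe drives. smart_passed is a hypothetical helper, and SCSI devices report health with different wording that it does not cover.

```shell
# smart_passed: read `smartctl -H` output on stdin and succeed only when
# the overall self-assessment line reports PASSED.
# Hypothetical helper; matches the ATA/NVMe wording only.
smart_passed() {
    grep -q 'self-assessment test result: PASSED'
}
```

Usage would look like smartctl -H /dev/sda | smart_passed || echo "swap the drive", where /dev/sda is an assumed device path; substitute the device's actual disk.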

Useful commands

bootc status --json | jq             # current + staged deployment summary
bootc rollback                       # revert to previous slot manually
journalctl -u greenboot-healthcheck  # exact check outputs
journalctl -u vela-agent -n 200      # agent log around the failure
cat /sys/kernel/security/lockdown    # confirms lockdown=confidentiality took effect
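
The lockdown file lists every mode and marks the active one in square brackets, for example none integrity [confidentiality]. A minimal sketch for pulling out just the active mode follows; active_lockdown is a hypothetical helper name.

```shell
# active_lockdown: read the contents of /sys/kernel/security/lockdown on stdin
# and print the mode shown in square brackets (the active one).
# Hypothetical helper for scripts that need a single word to compare against.
active_lockdown() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}
```

On a device, active_lockdown < /sys/kernel/security/lockdown should print confidentiality when the setting took effect.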

When to reprovision instead of troubleshooting

VelaOS devices are cattle, not pets. If the problem isn't obviously hardware (SMART errors, fan failure) and a support round-trip hasn't produced a fix in 30 minutes, reprovision. The flow is identical to first install but takes ~4 minutes because the AlmaLinux base is already on disk: run bootc switch against a known-good tag and enrol again. The device's policy and group membership come back automatically.

Updated 2026-04-18