Operations Dashboard
Operations Dashboard is the live operational status tab for the whole Aurora stack.
What it shows
- source-host probe health
- local and remote storage pressure
- Aurora Power Supply battery voltage from DCInverterVolts, scored green above 52 V, amber from 50-52 V, and red below 50 V
- Aurora Power Supply battery state of charge from BatterySOC, scored green at or above 50 %, amber from 25-50 %, and red below 25 %
- Aurora Power Supply internal temperature from InternalTemperature, scored green below 40 C, amber from 40-45 C, and red at 45 C or above
- source sync and processing health
- HATPRO source, local mirror, GWS mirror, Zarr-build, and quicklook health alongside the other science streams
- dashboard performance-log freshness, including whether browser activity is still being written to /data/aurora/products/dashboard/dashboard_perf.jsonl
- dashboard HTTP endpoint health and response time
- dashboard and infrastructure git cleanliness and local ahead/behind counts
- recent dashboard render-performance statistics, including p50, p95, slowest timed event, and live-session counts
- root-cause grouping for source computers, network/source sync, local processing, GWS transfer, and dashboard/render behavior
- seven-day trend cards for worst storage pressure, battery SOC, battery voltage, worst source lag, and worst GWS lag
- GWS transfer status
- mirror verification and prune-readiness indicators
- per-stream archive state, including WXcam backfill progress
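The Aurora Power Supply thresholds listed above can be sketched as three small scoring functions. This is an illustration only: the function names are hypothetical, and the handling of values exactly on a boundary (for example 52 V counting as amber) is read from the wording of the list rather than taken from the dashboard code.

```python
def score_battery_voltage(volts: float) -> str:
    """Green above 52 V, amber from 50 to 52 V, red below 50 V."""
    if volts > 52.0:
        return "green"
    if volts >= 50.0:
        return "amber"
    return "red"


def score_battery_soc(percent: float) -> str:
    """Green at or above 50 %, amber from 25 to 50 %, red below 25 %."""
    if percent >= 50.0:
        return "green"
    if percent >= 25.0:
        return "amber"
    return "red"


def score_internal_temperature(celsius: float) -> str:
    """Green below 40 C, amber from 40 to 45 C, red at 45 C or above."""
    if celsius < 40.0:
        return "green"
    if celsius < 45.0:
        return "amber"
    return "red"
```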
Performance-log freshness and render-performance statistics are diagnostic signals only. They stay visible on the Operations Dashboard and in the health reports, but they do not drive the top-level Overall action state.
Most source streams are marked stale after 1.5 h without new source files.
HATPRO is deliberately marked stale only after 3 h because it lands in hourly
batches; this requires two missed batches before the Operations Dashboard calls
the source stale.
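A minimal sketch of the staleness rule, assuming each stream reports the timestamp of its newest source file. The names `DEFAULT_STALE_AFTER`, `STALE_AFTER_BY_STREAM`, and `is_source_stale` are hypothetical, and treating a gap of exactly the limit as still fresh is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Default limit for most streams; HATPRO gets a longer limit because it
# lands in hourly batches.
DEFAULT_STALE_AFTER = timedelta(hours=1.5)
STALE_AFTER_BY_STREAM = {"HATPRO": timedelta(hours=3)}


def is_source_stale(stream: str, newest_file_time: datetime, now: datetime) -> bool:
    """Return True when the newest source file is older than the stream's limit."""
    limit = STALE_AFTER_BY_STREAM.get(stream, DEFAULT_STALE_AFTER)
    return now - newest_file_time > limit


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
two_hours_ago = now - timedelta(hours=2)
print(is_source_stale("CL61", two_hours_ago, now))    # True
print(is_source_stale("HATPRO", two_hours_ago, now))  # False
```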
The storage cards are intentionally broken out as:
- CL61 root and CL61 data
- ASS data and ASS root
- APS data and APS root
- AURORA Cloud product and AURORA Cloud root
- JASMIN GWS
Each card subtitle uses the resolved pwd -P path that was actually probed for
filesystem usage.
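As a rough illustration of probing a resolved path, the sketch below resolves symlinks with `os.path.realpath` (similar in spirit to `pwd -P`) and then measures usage with `shutil.disk_usage`. The function name `probe_storage` and the returned keys are hypothetical, and the real probe may compute the percentage differently.

```python
import os
import shutil


def probe_storage(path: str) -> dict:
    """Resolve symlinks in path and report usage of the filesystem holding it."""
    resolved = os.path.realpath(path)
    usage = shutil.disk_usage(resolved)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {"resolved_path": resolved, "percent_used": percent_used}


result = probe_storage(".")
print(result["resolved_path"], round(result["percent_used"], 1))
```

Reporting the resolved path makes it clear which filesystem was measured when the configured path is a symlink onto another volume.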
The trend cards read the operations Zarr directly and are cached briefly in the
dashboard process. They are intended as quick context, not as a replacement for
the archived HK_Operations plots.
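Brief in-process caching of this kind can be sketched with a small time-to-live wrapper. The class name `BriefCache`, the 60-second lifetime, and the `load_trends` stand-in are all assumptions; the actual cache lifetime and Zarr-reading code are not described here.

```python
import time
from typing import Any, Callable, Optional


class BriefCache:
    """Cache the result of loader() for ttl_seconds, then reload on next access."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or expired: call the loader and restart the timer.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_trends() -> dict:
    # Hypothetical stand-in for reading seven days of data from the operations Zarr.
    return {"battery_soc": [80.0, 78.5, 81.2]}


trend_cache = BriefCache(load_trends, ttl_seconds=60.0)
print(trend_cache.get())
```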
Display model
This tab reads the latest operations snapshot directly rather than waiting for an archived quicklook to exist. That means a fresh deployment can show the live Operations tab before the archived operations PNGs have accumulated enough samples to plot. Archive traffic lights are based on settled mirror health, so a stream stays green when the verified GWS archive has no missing or mismatched files even if the newest just-arrived source file has not yet landed in the next transfer batch.
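The settled-mirror rule can be illustrated with a small sketch. The `MirrorVerification` fields and the function name are hypothetical; the point is only that files waiting for the next transfer batch do not affect the result.

```python
from dataclasses import dataclass


@dataclass
class MirrorVerification:
    missing_files: int         # files absent from the verified GWS archive
    mismatched_files: int      # files whose verification check failed
    pending_source_files: int  # newly arrived files awaiting the next transfer batch


def archive_is_green(v: MirrorVerification) -> bool:
    """Green when the verified archive has no missing or mismatched files.

    pending_source_files is deliberately ignored: a just-arrived file that
    has not yet been transferred does not count against the stream.
    """
    return v.missing_files == 0 and v.mismatched_files == 0


print(archive_is_green(MirrorVerification(0, 0, 1)))  # True
print(archive_is_green(MirrorVerification(2, 0, 0)))  # False
```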
Archived products
The archived operations products live under:
- /data/aurora/products/quicklooks/ops_monitor
- /data/aurora/products/ops_monitor/health
These include:
- HK_Operations summary quicklooks
- observe-only health JSON and daily Markdown reports
Email Alerts
Operations alert email is handled by send_ops_alerts.py, normally from
aurora-ops-monitor-alerts.timer. It evaluates the same latest snapshot used by
the dashboard and emails gamb2le@ncas.ac.uk for storage pressure at 80 %,
battery SOC at or below 20 %, APS internal temperature at or above 45 C,
battery voltage below 50 V, and stream-health problems that persist for
3 h.
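The alert conditions above might be evaluated along these lines. This is a sketch under stated assumptions, not the logic of send_ops_alerts.py: the snapshot layout and key names are hypothetical, and storage pressure "at 80 %" is interpreted here as at or above 80 %.

```python
STORAGE_ALERT_PERCENT = 80.0      # assumed to mean "at or above"
SOC_ALERT_PERCENT = 20.0          # alert at or below
TEMPERATURE_ALERT_C = 45.0        # alert at or above
VOLTAGE_ALERT_V = 50.0            # alert below
STREAM_PROBLEM_ALERT_HOURS = 3.0  # alert once a problem has persisted this long


def evaluate_alerts(snapshot: dict) -> list:
    """Return the alert keys that are active for a (hypothetical) snapshot dict."""
    alerts = []
    for name, percent in snapshot.get("storage_percent", {}).items():
        if percent >= STORAGE_ALERT_PERCENT:
            alerts.append(f"storage:{name}")
    soc = snapshot.get("battery_soc")
    if soc is not None and soc <= SOC_ALERT_PERCENT:
        alerts.append("battery_soc")
    temperature = snapshot.get("internal_temperature")
    if temperature is not None and temperature >= TEMPERATURE_ALERT_C:
        alerts.append("internal_temperature")
    volts = snapshot.get("battery_voltage")
    if volts is not None and volts < VOLTAGE_ALERT_V:
        alerts.append("battery_voltage")
    for stream, hours in snapshot.get("stream_problem_hours", {}).items():
        if hours >= STREAM_PROBLEM_ALERT_HOURS:
            alerts.append(f"stream:{stream}")
    return alerts


example = {
    "storage_percent": {"JASMIN GWS": 85.0, "CL61 data": 40.0},
    "battery_soc": 20.0,
    "internal_temperature": 44.9,
    "battery_voltage": 50.0,
    "stream_problem_hours": {"HATPRO": 3.5, "CL61": 1.0},
}
print(evaluate_alerts(example))
# ['storage:JASMIN GWS', 'battery_soc', 'stream:HATPRO']
```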
The service keeps state under /data/aurora/products/ops_monitor/alerts so it
can send initial, repeat, and recovery messages without spamming every timer
tick. The deployed delivery path is intended to be mailx backed by msmtp or
another sendmail-compatible outbound relay.
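One way to get initial, repeat, and recovery messages without mailing on every timer tick is to remember when each alert was last sent. The sketch below assumes a plain dict of alert key to last-sent time (which could be persisted as JSON between runs) and a hypothetical six-hour repeat interval; neither detail is confirmed for the deployed service.

```python
from typing import Optional

REPEAT_AFTER_SECONDS = 6 * 3600  # hypothetical repeat interval


def decide_message(state: dict, key: str, active: bool, now: float,
                   repeat_after: float = REPEAT_AFTER_SECONDS) -> Optional[str]:
    """Return 'initial', 'repeat', 'recovery', or None, updating state in place."""
    last_sent = state.get(key)
    if active:
        if last_sent is None:
            state[key] = now
            return "initial"
        if now - last_sent >= repeat_after:
            state[key] = now
            return "repeat"
        return None  # already reported recently: stay quiet on this tick
    if last_sent is not None:
        del state[key]
        return "recovery"
    return None


state = {}
print(decide_message(state, "battery_soc", True, now=0.0))       # initial
print(decide_message(state, "battery_soc", True, now=600.0))     # None
print(decide_message(state, "battery_soc", True, now=22000.0))   # repeat
print(decide_message(state, "battery_soc", False, now=23000.0))  # recovery
```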
Detailed product documentation: