
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Common Commands

Development

# Install dependencies
mix deps.get

# Compile the project
mix compile

# Run in development (non-halt mode)
mix run --no-halt

# Run tests
mix test

# Run specific test
mix test test/systant_test.exs

# Enter development shell (via Nix)
nix develop

# Run both server and dashboard together (recommended)
just dev
# or directly: hivemind

# Run components individually
just server      # or: cd server && mix run --no-halt
just dashboard   # or: cd dashboard && mix phx.server

# Other just commands
just deps        # Install dependencies for both projects
just compile     # Compile both projects
just test        # Run tests for both projects
just clean       # Clean both projects

Production

# Build production release
MIX_ENV=prod mix release

# Run production release
_build/prod/rel/systant/bin/systant start

Architecture Overview

This is an Elixir OTP application that runs as a systemd service and publishes system metrics over MQTT. It is designed for deployment across multiple NixOS hosts and integrates with Home Assistant.

Core Components

  • Systant.Application (server/lib/systant/application.ex): OTP application supervisor that starts the MQTT client
  • Systant.MqttClient (server/lib/systant/mqtt_client.ex): GenServer handling MQTT connection, metrics publishing, and command subscriptions
  • Systant.MqttHandler (server/lib/systant/mqtt_handler.ex): Custom Tortoise handler for processing command messages with security validation
  • Systant.CommandExecutor (server/lib/systant/command_executor.ex): Secure command execution engine with whitelist validation and audit logging
  • Systant.SystemMetrics (server/lib/systant/system_metrics.ex): Comprehensive Linux system metrics collection with configuration support
  • Systant.Config (server/lib/systant/config.ex): TOML-based configuration loader with environment variable overrides
  • Dashboard.Application (dashboard/lib/dashboard/application.ex): Phoenix LiveView dashboard application
  • Dashboard.MqttSubscriber (dashboard/lib/dashboard/mqtt_subscriber.ex): Real-time MQTT subscriber that feeds data to the LiveView dashboard

Key Libraries

  • Tortoise: MQTT client library for pub/sub functionality
  • Jason: JSON encoding/decoding for message payloads
  • Toml: TOML configuration file parsing
  • Phoenix LiveView: Real-time dashboard framework

MQTT Behavior

  • Publishes comprehensive system metrics (CPU, memory, disk, GPU, network, temperature, processes) to stats topic
  • Subscribes to commands topic for incoming events that can trigger user-customizable actions
  • Uses hostname-based randomized client ID to avoid conflicts across multiple hosts
  • Configurable startup delay (default 5 seconds) before first metrics publish
  • Real-time metrics collection with configurable intervals
  • Connection verification: Tests MQTT connectivity on startup with timeout-based validation
  • Graceful shutdown: Exits cleanly via System.stop(1) when the MQTT broker is unavailable or a publish fails, which prevents erl_crash.dump files
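
The startup check and clean exit described above can be pictured with a short sketch. It is written in Python purely for illustration and is not the project's Elixir code: it assumes a plain TCP reachability probe, whereas the real client verifies the connection through Tortoise. The names `tcp_probe` and `startup_exit_code` and the 5-second timeout are hypothetical.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def startup_exit_code(host: str, port: int, timeout: float = 5.0, probe=tcp_probe) -> int:
    """Return 0 when the broker answers within `timeout`, otherwise log and return 1.

    Returning a status (instead of letting the exception propagate) mirrors the
    idea of a controlled stop rather than a crash.
    """
    try:
        probe(host, port, timeout)
    except OSError as err:
        # OSError covers refused connections, DNS failures, and timeouts.
        print(f"MQTT broker {host}:{port} unreachable: {err}; shutting down")
        return 1
    return 0
```

A caller would pass the result to its process exit routine, for example `sys.exit(startup_exit_code("mqtt.home", 1883))`.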

Configuration System

Systant uses a TOML-based configuration system with environment variable overrides:

  • Config File: systant.toml (current dir, ~/.config/systant/, or /etc/systant/)
  • Module Control: Enable/disable metric collection modules (cpu, memory, disk, gpu, network, temperature, processes, system)
  • Filtering Options: Configurable filtering for disks, network interfaces, processes
  • Environment Overrides: MQTT_HOST, MQTT_PORT, SYSTANT_INTERVAL, SYSTANT_LOG_LEVEL
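
As a rough illustration of how file lookup and environment overrides can combine, here is a Python sketch (not the project's Elixir loader). It assumes the search order is the order listed above, and the section and key names (`mqtt.host`, `general.interval`, and so on) are hypothetical placeholders rather than the real schema.

```python
import os
from typing import Callable, Mapping, Optional

# Assumed search order: current directory, then user config, then system config.
CANDIDATES = [
    "systant.toml",
    "~/.config/systant/systant.toml",
    "/etc/systant/systant.toml",
]


def find_config(exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first candidate path that exists, or None if none do."""
    for candidate in CANDIDATES:
        path = os.path.expanduser(candidate)
        if exists(path):
            return path
    return None


def apply_env_overrides(config: dict, env: Mapping[str, str]) -> dict:
    """Return a copy of `config` with environment values layered on top."""
    merged = {section: dict(values) for section, values in config.items()}
    # (variable, section, key, converter); section and key names are hypothetical.
    overrides = [
        ("MQTT_HOST", "mqtt", "host", str),
        ("MQTT_PORT", "mqtt", "port", int),
        ("SYSTANT_INTERVAL", "general", "interval", int),
        ("SYSTANT_LOG_LEVEL", "general", "log_level", str),
    ]
    for variable, section, key, convert in overrides:
        if variable in env:
            merged.setdefault(section, {})[key] = convert(env[variable])
    return merged
```

In use, the parsed TOML would be passed through `apply_env_overrides(parsed, os.environ)` so that a variable set in the systemd unit wins over the file.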

Key Configuration Sections

  • [general]: Collection intervals, enabled modules
  • [mqtt]: Broker settings, client ID prefix, credentials
  • [commands]: Command execution settings, security options
  • [[commands.available]]: User-defined command definitions with security parameters
  • [disk]: Mount filtering, filesystem exclusions
  • [gpu]: NVIDIA/AMD GPU limits and settings
  • [network]: Interface filtering, traffic thresholds
  • [processes]: Top process limits, sorting options
  • [temperature]: CPU/sensor temperature monitoring

Default Configuration

  • MQTT Host: mqtt.home (configurable via MQTT_HOST)
  • Stats Topic: systant/${hostname}/stats (per-host topics)
  • Command Topic: systant/${hostname}/commands (per-host topics)
  • Response Topic: systant/${hostname}/responses (command responses)
  • Publish Interval: 30 seconds (configurable via SYSTANT_INTERVAL)
  • Command System: Enabled by default with example commands (restart, info, df, ps, ping)
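
The per-host topic layout can be summarized in a few lines. This Python sketch is illustrative only; the client ID format (prefix, hostname, random hex suffix) is an assumption about what "hostname-based randomized" might look like, not the format the Elixir client actually uses.

```python
import secrets


def topics_for(hostname: str) -> dict:
    """Build the three per-host topics listed above."""
    base = f"systant/{hostname}"
    return {
        "stats": f"{base}/stats",
        "commands": f"{base}/commands",
        "responses": f"{base}/responses",
    }


def client_id_for(hostname: str, prefix: str = "systant") -> str:
    """Hostname plus a random suffix, so two hosts (or restarts) never share an ID."""
    return f"{prefix}-{hostname}-{secrets.token_hex(4)}"
```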

NixOS Deployment

This project includes a complete Nix packaging and NixOS module:

  • Package: nix/package.nix - Builds the Elixir release using beamPackages.mixRelease
  • Module: nix/nixos-module.nix - Provides services.systant configuration options
  • Development: Use nix develop for development shell with Elixir/Erlang

The NixOS module supports:

  • Configurable MQTT connection settings
  • Per-host topic naming using ${config.networking.hostName}
  • Environment variable configuration for runtime settings
  • Systemd service with security hardening
  • Auto-restart and logging to systemd journal

Dashboard

The project includes a Phoenix LiveView dashboard (dashboard/) that provides real-time monitoring of all systant instances.

Dashboard Features

  • Real-time host status updates via MQTT subscription
  • LiveView interface showing all connected hosts
  • Automatic reconnection and error handling

Dashboard MQTT Configuration

  • Subscribes to systant/+/stats to receive updates from all hosts
  • Uses hostname-based client ID: systant-dashboard-${hostname} to avoid conflicts
  • Connects to mqtt.home:1883 (same broker as systant instances)
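
To make the `systant/+/stats` subscription concrete, the following Python sketch shows how MQTT's `+` (single-level) and `#` (multi-level) wildcards select topics and how a hostname can be recovered from a matching topic. It is a simplified illustration with hypothetical function names; it ignores the special handling of topics beginning with `$` and is not the dashboard's Elixir code.

```python
from typing import Optional


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` using + and # wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches the parent level and everything below it
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    return len(filter_levels) == len(topic_levels)


def host_from_stats_topic(topic: str) -> Optional[str]:
    """Return the hostname from systant/<host>/stats, or None for other topics."""
    if not topic_matches("systant/+/stats", topic):
        return None
    return topic.split("/")[1]
```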

Important Implementation Notes

  • Tortoise Handler: The handle_message/3 callback must return {:ok, state}, not []
  • Topic Parsing: Topics may arrive as either a list of levels or a single string; handlers must accept both formats
  • Client ID Conflicts: Use unique client IDs to prevent connection instability
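
The topic-parsing note amounts to normalizing the topic before comparing it. A minimal Python sketch of that idea follows; the helper names are hypothetical, and the real handler does the equivalent in Elixir inside `handle_message/3`.

```python
def normalize_topic(topic) -> list:
    """Return the topic as a list of levels, whether it arrived as a string or a list."""
    if isinstance(topic, str):
        return topic.split("/")
    return [str(level) for level in topic]


def is_command_topic(topic, hostname: str) -> bool:
    """True when the topic, in either form, is this host's commands topic."""
    return normalize_topic(topic) == ["systant", hostname, "commands"]
```

Comparing against a list of levels avoids substring mistakes such as treating `systant/orion2/commands` as a match for host `orion`.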

Development Roadmap

Phase 1: System Metrics Collection (Completed)

  • SystemMetrics Module: server/lib/systant/system_metrics.ex - Comprehensive metrics collection
  • CPU Metrics: Load averages (1/5/15min) via /proc/loadavg
  • Memory Metrics: System memory data via /proc/meminfo with usage percentages
  • Disk Metrics: Disk usage and capacity via df command with configurable filtering
  • GPU Metrics: NVIDIA (nvidia-smi) and AMD (rocm-smi) GPU monitoring with temperature, utilization, memory
  • Network Metrics: Interface statistics via /proc/net/dev with traffic filtering
  • Temperature Metrics: CPU temperature and lm-sensors data via system files and sensors command
  • Process Metrics: Top processes by CPU/memory via ps command with configurable limits
  • System Info: Uptime via /proc/uptime, kernel version, OS info, Erlang runtime data
  • MQTT Integration: Real metrics are published at configurable intervals, replacing the earlier simple messages
  • Configuration System: Complete TOML-based configuration with environment overrides
  • Dashboard Integration: Phoenix LiveView dashboard with real-time graphical metrics display
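
For readers unfamiliar with the /proc sources named above, this Python sketch shows what parsing /proc/loadavg and /proc/meminfo involves. It is illustrative only: the output key names are hypothetical, and computing usage as MemTotal minus MemAvailable is an assumption about how the usage percentage is derived.

```python
def parse_loadavg(text: str) -> dict:
    """Parse /proc/loadavg content; the first three fields are the 1/5/15 minute loads."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return {"load_1": one, "load_5": five, "load_15": fifteen}


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content and derive a usage percentage."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # kB for sizes, bare count otherwise
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    used = total - available  # assumption: "used" means not available
    return {
        "total_kb": total,
        "available_kb": available,
        "used_kb": used,
        "used_percent": round(100.0 * used / total, 1),
    }
```

On a Linux host these would be fed with `open("/proc/loadavg").read()` and `open("/proc/meminfo").read()`.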

Implementation Details

  • Uses native Linux system commands and the /proc filesystem in preference to Erlang's os_mon, for accuracy
  • Configuration-driven metric collection with per-module enable/disable capabilities
  • Advanced filtering: disk mounts/types, network interfaces, process thresholds
  • Graceful error handling with fallbacks when commands/files unavailable
  • JSON payload structure: {timestamp, hostname, cpu, memory, disk, gpu, network, temperature, processes, system}
  • Dashboard displays metrics as progress bars and cards with color-coded status indicators
  • TOML configuration with environment variable overrides for deployment flexibility
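
The combination of per-module enable/disable, graceful fallback, and the JSON payload shape can be sketched as follows. This is a Python illustration under stated assumptions, not the Elixir implementation: the integer Unix timestamp and the `{"error": ...}` fallback value are hypothetical choices, since the source does not specify either.

```python
import json
import time


def build_payload(hostname: str, collectors: dict, enabled: set, now=time.time) -> str:
    """Run each enabled collector and return the stats payload as a JSON string.

    `collectors` maps a module name (cpu, memory, ...) to a zero-argument callable.
    """
    payload = {"timestamp": int(now()), "hostname": hostname}
    for name, collect in collectors.items():
        if name not in enabled:
            continue  # module disabled in configuration
        try:
            payload[name] = collect()
        except (OSError, ValueError) as err:
            # A missing tool or unreadable file degrades one module, not the whole payload.
            payload[name] = {"error": str(err)}
    return json.dumps(payload)
```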

Phase 2: Command System (Completed)

  • Command Execution: server/lib/systant/command_executor.ex - Secure command processing with whitelist validation
  • MQTT Handler: server/lib/systant/mqtt_handler.ex - Custom Tortoise handler for command message processing
  • User Configuration: Commands fully configurable via systant.toml with security parameters
  • MQTT Integration: Commands via systant/{hostname}/commands, responses via systant/{hostname}/responses
  • Security Features: Whitelist-only execution, parameter validation, timeouts, comprehensive logging
  • Built-in Commands: list command shows all available user-defined commands

Command System Features

  • User-Configurable Commands: Define custom commands in systant.toml with triggers, allowed parameters, timeouts
  • Enterprise Security: No arbitrary shell execution, strict parameter validation, execution timeouts
  • Simple Interface: Send {"command":"trigger","params":[...]}, receive structured JSON responses
  • Request Tracking: Auto-generated request IDs for command/response correlation
  • Comprehensive Logging: Full audit trail of all command executions with timing and results
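
Whitelist-only execution with parameter validation might look like the sketch below. It is written in Python for illustration and is not the CommandExecutor module: the command table, its field names (`argv`, `allowed_params`), and the response fields are all hypothetical stand-ins for whatever `[[commands.available]]` and the real response format define.

```python
import uuid

# Hypothetical command table; in Systant these definitions come from systant.toml.
AVAILABLE = {
    "df": {"argv": ["df", "-h"], "allowed_params": ["/", "/home"]},
    "restart": {"argv": ["systemctl", "restart"], "allowed_params": ["nginx"]},
}


def plan_command(message: dict, available: dict = AVAILABLE) -> dict:
    """Validate a command message and return either an argv to run or an error."""
    request_id = message.get("request_id") or uuid.uuid4().hex
    trigger = message.get("command")
    params = message.get("params", [])

    def error(reason: str) -> dict:
        return {"request_id": request_id, "status": "error", "error": reason}

    definition = available.get(trigger)
    if definition is None:
        return error(f"unknown command: {trigger}")
    if not isinstance(params, list):
        return error("params must be a list")
    rejected = [p for p in params if p not in definition["allowed_params"]]
    if rejected:
        return error(f"parameters not allowed: {rejected}")
    # The argv list is meant to be executed directly, never through a shell.
    return {"request_id": request_id, "status": "ok", "argv": definition["argv"] + params}
```

Because only exact, pre-listed parameters are accepted and the result is an argument list rather than a shell string, input such as `"/home; rm -rf /"` is rejected instead of interpreted.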

Example Command Usage

# Send commands via MQTT
mosquitto_pub -t "systant/hostname/commands" -m '{"command":"list"}'
mosquitto_pub -t "systant/hostname/commands" -m '{"command":"info"}'
mosquitto_pub -t "systant/hostname/commands" -m '{"command":"df","params":["/home"]}'
mosquitto_pub -t "systant/hostname/commands" -m '{"command":"restart","params":["nginx"]}'

# Listen for responses
mosquitto_sub -t "systant/+/responses"
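
When commands are generated by a script rather than typed by hand, building the JSON with a serializer avoids quoting mistakes. The Python sketch below is a hypothetical helper, not part of the project; it produces the same message shape and `mosquitto_pub` invocation as the examples above.

```python
import json
import shlex


def command_message(command: str, params=None) -> str:
    """Serialize a command message in the {"command": ..., "params": [...]} shape."""
    message = {"command": command}
    if params:
        message["params"] = list(params)
    return json.dumps(message, separators=(",", ":"))


def mosquitto_pub_line(hostname: str, command: str, params=None) -> str:
    """Return a shell-safe mosquitto_pub command line for the host's commands topic."""
    topic = f"systant/{hostname}/commands"
    parts = ["mosquitto_pub", "-t", shlex.quote(topic), "-m", shlex.quote(command_message(command, params))]
    return " ".join(parts)
```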

Phase 3: Home Assistant Integration (Completed)

  • MQTT Auto-Discovery: server/lib/systant/ha_discovery.ex - Publishes HA discovery configurations for automatic device registration
  • Device Registration: Creates unified "Systant {hostname}" device in Home Assistant with comprehensive sensor suite
  • Sensor Auto-Discovery: CPU load averages, memory usage, system uptime, temperatures, GPU metrics, disk usage, network throughput
  • Configuration Integration: TOML-based enable/disable with homeassistant.discovery_enabled setting
  • Value Templates: Proper JSON path extraction for nested metrics data with error handling
  • Real-time Updates: Seamless integration with existing MQTT stats publishing - no additional topics needed

Home Assistant Integration Features

  • Automatic Discovery: No custom integration required - uses standard MQTT discovery protocol
  • Device Grouping: All sensors grouped under single "Systant {hostname}" device for clean organization
  • Comprehensive Metrics: CPU, memory, disk, GPU (NVIDIA/AMD), network throughput, temperature, and system sensors
  • Configuration Control: Enable/disable discovery via systant.toml configuration
  • Template Flexibility: Advanced Jinja2 templates handle optional/missing data gracefully
  • Topic Structure: Discovery messages are published under homeassistant/#; stats remain on systant/{hostname}/stats
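
A discovery message for one sensor could be assembled roughly as below. This Python sketch is illustrative and is not the contents of ha_discovery.ex: the `node_id` scheme, the `unique_id` format, and the JSON path `value_json.cpu.load_1` are assumptions. The topic layout and payload keys follow Home Assistant's MQTT discovery convention.

```python
import json


def discovery_message(hostname: str, object_id: str, name: str, template: str, unit=None):
    """Return (topic, payload_json) announcing one sensor that reads the stats topic."""
    node_id = f"systant_{hostname}"  # hypothetical naming scheme
    topic = f"homeassistant/sensor/{node_id}/{object_id}/config"
    payload = {
        "name": name,
        "unique_id": f"{node_id}_{object_id}",
        "state_topic": f"systant/{hostname}/stats",
        "value_template": template,
        # Sharing the same device identifiers groups every sensor under one device.
        "device": {"identifiers": [node_id], "name": f"Systant {hostname}"},
    }
    if unit is not None:
        payload["unit_of_measurement"] = unit
    return topic, json.dumps(payload)
```

For example, `discovery_message("orion", "cpu_load_1m", "CPU Load 1m", "{{ value_json.cpu.load_1 | default(0) }}")` yields a config topic under `homeassistant/sensor/systant_orion/` whose sensor reads from the existing stats topic, which is why no additional state topics are needed. Hostnames containing characters outside letters, digits, underscore, and hyphen would need sanitizing before use in the topic.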

Setup Instructions

  1. Configure MQTT Discovery: Set homeassistant.discovery_enabled = true in systant.toml
  2. Start Systant: Discovery messages are published automatically on startup (1s after the MQTT connection is established)
  3. Check Home Assistant: Device and sensors appear automatically in MQTT integration
  4. Verify Metrics: All sensors should show current values within 30 seconds

Available Sensors

  • CPU: Load averages (1m, 5m, 15m), temperature
  • Memory: Usage percentage, used/total in GB
  • Disk: Root and home filesystem usage percentages
  • GPU: NVIDIA/AMD utilization, temperature, memory usage
  • Network: RX/TX throughput in MB/s for primary interface (real-time bandwidth monitoring)
  • System: Uptime in hours, kernel version, online status
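
Throughput sensors have to be derived, because /proc/net/dev only exposes cumulative byte counters. The Python sketch below shows the arithmetic under stated assumptions: MB is taken as 10^6 bytes, counter wrap-around is ignored, and the function and key names are hypothetical rather than taken from the Elixir code.

```python
def parse_net_dev(text: str) -> dict:
    """Map interface name to (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 16:
            # Field 0 is received bytes, field 8 is transmitted bytes.
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mb_s(previous: tuple, current: tuple, seconds: float) -> dict:
    """Convert two (rx_bytes, tx_bytes) samples taken `seconds` apart into MB/s."""
    rx = (current[0] - previous[0]) / seconds / 1_000_000
    tx = (current[1] - previous[1]) / seconds / 1_000_000
    return {"rx_mb_s": round(rx, 3), "tx_mb_s": round(tx, 3)}
```

With the default 30-second publish interval, an interface that received 30,000,000 bytes between two samples reports 1.0 MB/s.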

Future Plans

  • Multi-host deployment for comprehensive system monitoring
  • Advanced alerting and threshold monitoring
  • Historical data retention and trending