diff --git a/CLAUDE.md b/CLAUDE.md
index c7cea4d..65c3f92 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -24,9 +24,19 @@ mix test test/systant_test.exs
 # Enter development shell (via Nix)
 nix develop
 
-# Run dashboard (Phoenix LiveView)
-cd dashboard && mix phx.server
-# or use justfile: just dashboard
+# Run both server and dashboard together (recommended)
+just dev
+# or directly: hivemind
+
+# Run components individually
+just server # or: cd server && mix run --no-halt
+just dashboard # or: cd dashboard && mix phx.server
+
+# Other just commands
+just deps # Install dependencies for both projects
+just compile # Compile both projects
+just test # Run tests for both projects
+just clean # Clean both projects
 ```
 
 ### Production
@@ -44,26 +54,53 @@ This is an Elixir OTP application that serves as a systemd daemon for MQTT-based
 
 ### Core Components
 - **Systant.Application** (`lib/systant/application.ex`): OTP application supervisor that starts the MQTT client
-- **Systant.MqttClient** (`lib/systant/mqtt_client.ex`): GenServer that handles MQTT connection, publishes stats every 30 seconds, and listens for commands
+- **Systant.MqttClient** (`lib/systant/mqtt_client.ex`): GenServer handling MQTT connection, metrics publishing, and command subscriptions
+- **Systant.MqttHandler** (`lib/systant/mqtt_handler.ex`): Custom Tortoise handler for processing command messages with security validation
+- **Systant.CommandExecutor** (`lib/systant/command_executor.ex`): Secure command execution engine with whitelist validation and audit logging
+- **Systant.SystemMetrics** (`lib/systant/system_metrics.ex`): Comprehensive Linux system metrics collection with configuration support
+- **Systant.Config** (`lib/systant/config.ex`): TOML-based configuration loader with environment variable overrides
 - **Dashboard.Application** (`dashboard/lib/dashboard/application.ex`): Phoenix LiveView dashboard application
 - **Dashboard.MqttSubscriber** (`dashboard/lib/dashboard/mqtt_subscriber.ex`): Real-time MQTT subscriber that feeds data to the LiveView dashboard
-- **Configuration**: MQTT settings configurable via environment variables or config files
 
 ### Key Libraries
 - **Tortoise**: MQTT client library for pub/sub functionality
 - **Jason**: JSON encoding/decoding for message payloads
+- **Toml**: TOML configuration file parsing
+- **Phoenix LiveView**: Real-time dashboard framework
 
 ### MQTT Behavior
-- Publishes "Hello from systant" messages with timestamp and hostname to stats topic every 30 seconds
+- Publishes comprehensive system metrics (CPU, memory, disk, GPU, network, temperature, processes) to stats topic
 - Subscribes to commands topic for incoming events that can trigger user-customizable actions
-- Uses randomized client ID to avoid conflicts across multiple hosts
-- Sends immediate hello message on startup
+- Uses hostname-based randomized client ID to avoid conflicts across multiple hosts
+- Configurable startup delay (default 5 seconds) before first metrics publish
+- Real-time metrics collection with configurable intervals
+
+### Configuration System
+Systant uses a TOML-based configuration system with environment variable overrides:
+
+- **Config File**: `systant.toml` (current dir, `~/.config/systant/`, or `/etc/systant/`)
+- **Module Control**: Enable/disable metric collection modules (cpu, memory, disk, gpu, network, temperature, processes, system)
+- **Filtering Options**: Configurable filtering for disks, network interfaces, processes
+- **Environment Overrides**: `MQTT_HOST`, `MQTT_PORT`, `SYSTANT_INTERVAL`, `SYSTANT_LOG_LEVEL`
+
+#### Key Configuration Sections
+- `[general]`: Collection intervals, enabled modules
+- `[mqtt]`: Broker settings, client ID prefix, credentials
+- `[commands]`: Command execution settings, security options
+- `[[commands.available]]`: User-defined command definitions with security parameters
+- `[disk]`: Mount filtering, filesystem exclusions
+- `[gpu]`: NVIDIA/AMD GPU limits and settings
+- `[network]`: Interface filtering, traffic thresholds
+- `[processes]`: Top process limits, sorting options
+- `[temperature]`: CPU/sensor temperature monitoring
 
 ### Default Configuration
-- **MQTT Host**: `mqtt.home` (not localhost)
+- **MQTT Host**: `mqtt.home` (configurable via `MQTT_HOST`)
 - **Stats Topic**: `systant/${hostname}/stats` (per-host topics)
-- **Command Topic**: `systant/${hostname}/commands` (per-host topics)
-- **Publish Interval**: 30 seconds
+- **Command Topic**: `systant/${hostname}/commands` (per-host topics)
+- **Response Topic**: `systant/${hostname}/responses` (command responses)
+- **Publish Interval**: 30 seconds (configurable via `SYSTANT_INTERVAL`)
+- **Command System**: Enabled by default with example commands (restart, info, df, ps, ping)
 
 ### NixOS Deployment
 This project includes a complete Nix packaging and NixOS module:
@@ -101,29 +138,54 @@ The project includes a Phoenix LiveView dashboard (`dashboard/`) that provides r
 ## Development Roadmap
 
 ### Phase 1: System Metrics Collection (Completed)
-- ✅ **SystemMetrics Module**: `server/lib/systant/system_metrics.ex` - Comprehensive metrics collection
-- ✅ **CPU Metrics**: Load averages (1/5/15min) and utilization via `:cpu_sup`
-- ✅ **Memory Metrics**: System memory data and monitoring via `:memsup`
-- ✅ **Disk Metrics**: Disk usage and capacity for all mounted drives via `:disksup`
-- ✅ **System Info**: Uptime, Erlang/OTP versions, scheduler info
-- ✅ **System Alarms**: Active os_mon alarms (disk_almost_full, memory_high_watermark, etc.)
-- ✅ **MQTT Integration**: Real metrics published every 30 seconds replacing simple messages
-- 🔄 **Network Metrics**: TODO - Interface statistics, bandwidth utilization
-- 🔄 **GPU Metrics**: TODO - NVIDIA/AMD GPU utilization, temperatures, memory usage
+- ✅ **SystemMetrics Module**: `server/lib/systant/system_metrics.ex` - Comprehensive metrics collection
+- ✅ **CPU Metrics**: Load averages (1/5/15min) via `/proc/loadavg`
+- ✅ **Memory Metrics**: System memory data via `/proc/meminfo` with usage percentages
+- ✅ **Disk Metrics**: Disk usage and capacity via `df` command with configurable filtering
+- ✅ **GPU Metrics**: NVIDIA (nvidia-smi) and AMD (rocm-smi) GPU monitoring with temperature, utilization, memory
+- ✅ **Network Metrics**: Interface statistics via `/proc/net/dev` with traffic filtering
+- ✅ **Temperature Metrics**: CPU temperature and lm-sensors data via system files and `sensors` command
+- ✅ **Process Metrics**: Top processes by CPU/memory via `ps` command with configurable limits
+- ✅ **System Info**: Uptime via `/proc/uptime`, kernel version, OS info, Erlang runtime data
+- ✅ **MQTT Integration**: Real metrics published with configurable intervals replacing simple messages
+- ✅ **Configuration System**: Complete TOML-based configuration with environment overrides
+- ✅ **Dashboard Integration**: Phoenix LiveView dashboard with real-time graphical metrics display
 
 #### Implementation Details
-- Uses Erlang's built-in `:os_mon` application (cpu_sup, memsup, disksup)
-- Collects active system alarms from `:alarm_handler` with structured format
-- Graceful error handling with fallbacks when metrics unavailable
-- JSON payload structure: `{timestamp, hostname, cpu, memory, disk, system, alarms}`
-- Dashboard automatically receives and displays real-time system data and alerts
-- Alarm format: `{severity, path/details, id}` for clean consumption
+- Uses Linux native system commands and `/proc` filesystem for accuracy over Erlang os_mon
+- Configuration-driven metric collection with per-module enable/disable capabilities
+- Advanced filtering: disk mounts/types, network interfaces, process thresholds
+- Graceful error handling with fallbacks when commands/files unavailable
+- JSON payload structure: `{timestamp, hostname, cpu, memory, disk, gpu, network, temperature, processes, system}`
+- Dashboard displays metrics as progress bars and cards with color-coded status indicators
+- TOML configuration with environment variable overrides for deployment flexibility
 
-### Phase 2: Command System
-- Subscribe to `systant/+/commands` in MqttClient
-- Implement secure command execution framework with validation/whitelisting
-- Support commands like: restart services, update packages, system queries
-- Response mechanism to send command results back via MQTT
+### Phase 2: Command System (Completed)
+- ✅ **Command Execution**: `server/lib/systant/command_executor.ex` - Secure command processing with whitelist validation
+- ✅ **MQTT Handler**: `server/lib/systant/mqtt_handler.ex` - Custom Tortoise handler for command message processing
+- ✅ **User Configuration**: Commands fully configurable via `systant.toml` with security parameters
+- ✅ **MQTT Integration**: Commands via `systant/{hostname}/commands`, responses via `systant/{hostname}/responses`
+- ✅ **Security Features**: Whitelist-only execution, parameter validation, timeouts, comprehensive logging
+- ✅ **Built-in Commands**: `list` command shows all available user-defined commands
+
+#### Command System Features
+- **User-Configurable Commands**: Define custom commands in `systant.toml` with triggers, allowed parameters, timeouts
+- **Enterprise Security**: No arbitrary shell execution, strict parameter validation, execution timeouts
+- **Simple Interface**: Send `{"command":"trigger","params":[...]}`, receive structured JSON responses
+- **Request Tracking**: Auto-generated request IDs for command/response correlation
+- **Comprehensive Logging**: Full audit trail of all command executions with timing and results
+
+#### Example Command Usage
+```bash
+# Send commands via MQTT
+mosquitto_pub -t "systant/hostname/commands" -m '{"command":"list"}'
+mosquitto_pub -t "systant/hostname/commands" -m '{"command":"info"}'
+mosquitto_pub -t "systant/hostname/commands" -m '{"command":"df","params":["/home"]}'
+mosquitto_pub -t "systant/hostname/commands" -m '{"command":"restart","params":["nginx"]}'
+
+# Listen for responses
+mosquitto_sub -t "systant/+/responses"
+```
 
 ### Phase 3: Home Assistant Integration
 - Custom MQTT integration following Home Assistant patterns
diff --git a/Procfile b/Procfile
new file mode 100644
index 0000000..faf3c0b
--- /dev/null
+++ b/Procfile
@@ -0,0 +1,2 @@
+server: cd server && mix run --no-halt
+dashboard: cd dashboard && mix phx.server
\ No newline at end of file
diff --git a/README.md b/README.md
index 07d8f11..d14e9cc 100644
--- a/README.md
+++ b/README.md
@@ -35,14 +35,26 @@ A comprehensive Elixir-based system monitoring solution with real-time dashboard
 # Enter Nix development shell
 nix develop
 
-# Run the server
-cd server && mix run --no-halt
+# Run both server and dashboard together (recommended)
+just dev
 
-# Run the dashboard (separate terminal)
-just dashboard
-# or: cd dashboard && mix phx.server
+# Or run components individually
+just server # Start systant server
+just dashboard # Start Phoenix LiveView dashboard
+
+# Other development commands
+just deps # Install dependencies for both projects
+just compile # Compile both projects
+just test # Run tests for both projects
+just clean # Clean both projects
 ```
 
+#### Hivemind Process Management
+The project uses Hivemind for managing multiple processes during development:
+- Server runs on MQTT publishing system metrics every 30 seconds
+- Dashboard runs on http://localhost:4000 with real-time LiveView interface
+- Color-coded logs for easy debugging (server=green, dashboard=yellow)
+
 ### Production Deployment (NixOS)
 ```bash
 # Build and install via Nix
diff --git a/justfile b/justfile
new file mode 100644
index 0000000..03502d8
--- /dev/null
+++ b/justfile
@@ -0,0 +1,33 @@
+# Systant development tasks
+
+# Start both server and dashboard
+dev:
+    hivemind
+
+# Start just the server
+server:
+    cd server && mix run --no-halt
+
+# Start just the dashboard
+dashboard:
+    cd dashboard && mix phx.server
+
+# Install dependencies for both projects
+deps:
+    cd server && mix deps.get
+    cd dashboard && mix deps.get
+
+# Compile both projects
+compile:
+    cd server && mix compile
+    cd dashboard && mix compile
+
+# Run tests for both projects
+test:
+    cd server && mix test
+    cd dashboard && mix test
+
+# Clean both projects
+clean:
+    cd server && mix clean
+    cd dashboard && mix clean
\ No newline at end of file