## System Metrics Collection
- Add SystemMetrics module with CPU, memory, disk, and system info collection
- Integrate Erlang :os_mon application (cpu_sup, memsup, disksup)
- Collect and format active system alarms with structured JSON output
- Replace simple "Hello" messages with rich system data in MQTT payloads
## MQTT Integration
- Update MqttClient to publish comprehensive metrics every 30 seconds
- Add :os_mon to application dependencies for system monitoring
- Maintain backward compatibility with existing dashboard consumption
## Documentation Updates
- Update CLAUDE.md with Phase 1 completion status and implementation details
- Completely rewrite README.md to reflect current project capabilities
- Document alarm format, architecture, and development workflow
## Technical Improvements
- Graceful error handling for metrics collection failures
- Clean alarm formatting: {severity, path/details, id}
- Dashboard automatically receives and displays real-time system data and alerts
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
137 lines
5.6 KiB
Markdown
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Commands

### Development
```bash
# Install dependencies
mix deps.get

# Compile the project
mix compile

# Run in development (non-halt mode)
mix run --no-halt

# Run tests
mix test

# Run specific test
mix test test/systant_test.exs

# Enter development shell (via Nix)
nix develop

# Run dashboard (Phoenix LiveView)
cd dashboard && mix phx.server
# or use justfile: just dashboard
```

### Production
```bash
# Build production release
MIX_ENV=prod mix release

# Run production release
_build/prod/rel/systant/bin/systant start
```

## Architecture Overview

This is an Elixir OTP application that serves as a systemd daemon for MQTT-based system monitoring, designed for deployment across multiple NixOS hosts to integrate with Home Assistant.

### Core Components
- **Systant.Application** (`lib/systant/application.ex`): OTP application supervisor that starts the MQTT client
- **Systant.MqttClient** (`lib/systant/mqtt_client.ex`): GenServer that handles MQTT connection, publishes stats every 30 seconds, and listens for commands
- **Dashboard.Application** (`dashboard/lib/dashboard/application.ex`): Phoenix LiveView dashboard application
- **Dashboard.MqttSubscriber** (`dashboard/lib/dashboard/mqtt_subscriber.ex`): Real-time MQTT subscriber that feeds data to the LiveView dashboard
- **Configuration**: MQTT settings configurable via environment variables or config files

### Key Libraries
- **Tortoise**: MQTT client library for pub/sub functionality
- **Jason**: JSON encoding/decoding for message payloads

### MQTT Behavior
- Publishes system metrics with timestamp and hostname to the stats topic every 30 seconds (these replaced the earlier "Hello from systant" messages)
- Subscribes to the commands topic for incoming events that can trigger user-customizable actions
- Uses a randomized client ID to avoid conflicts across multiple hosts
- Publishes an initial message immediately on startup

### Default Configuration
- **MQTT Host**: `mqtt.home` (not localhost)
- **Stats Topic**: `systant/${hostname}/stats` (per-host topics)
- **Command Topic**: `systant/${hostname}/commands` (per-host topics)
- **Publish Interval**: 30 seconds
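
The per-host topic scheme above can be sketched in a few lines of Elixir. This is a minimal illustration, not the project's code: the `TopicSketch` module and its function names are hypothetical, and it assumes the hostname comes from Erlang's `:inet.gethostname/0`.

```elixir
defmodule TopicSketch do
  # Hypothetical helper, not part of the project: builds the per-host
  # topic names listed above from the local hostname.
  def hostname do
    # :inet.gethostname/0 returns {:ok, charlist}
    {:ok, name} = :inet.gethostname()
    List.to_string(name)
  end

  # Defaults are evaluated at call time, so the local hostname is used
  # unless a host is passed explicitly.
  def stats_topic(host \\ hostname()), do: "systant/#{host}/stats"
  def command_topic(host \\ hostname()), do: "systant/#{host}/commands"
end

# "nixbox" is an example hostname
IO.puts(TopicSketch.stats_topic("nixbox"))
# prints: systant/nixbox/stats
```

Keeping the hostname as the middle topic level is what lets a single wildcard subscription such as `systant/+/stats` cover every host.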

### NixOS Deployment
This project includes a complete Nix packaging and NixOS module:

- **Package**: `nix/package.nix` - Builds the Elixir release using beamPackages.mixRelease
- **Module**: `nix/nixos-module.nix` - Provides `services.systant` configuration options
- **Development**: Use `nix develop` for development shell with Elixir/Erlang

The NixOS module supports:
- Configurable MQTT connection settings
- Per-host topic naming using `${config.networking.hostName}`
- Environment variable configuration for runtime settings
- Systemd service with security hardening
- Auto-restart and logging to systemd journal

## Dashboard

The project includes a Phoenix LiveView dashboard (`dashboard/`) that provides real-time monitoring of all systant instances.

### Dashboard Features
- Real-time host status updates via MQTT subscription
- LiveView interface showing all connected hosts
- Automatic reconnection and error handling

### Dashboard MQTT Configuration
- Subscribes to `systant/+/stats` to receive updates from all hosts
- Uses hostname-based client ID: `systant-dashboard-${hostname}` to avoid conflicts
- Connects to `mqtt.home:1883` (same broker as systant instances)

### Important Implementation Notes
- **Tortoise Handler**: The `handle_message/3` callback must return `{:ok, state}`, not `[]`
- **Topic Parsing**: Topics may arrive as lists or strings; handle both formats
- **Client ID Conflicts**: Use unique client IDs to prevent connection instability
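
The first two notes can be illustrated with a small, dependency-free sketch. The `TopicParseSketch` module is hypothetical, and its `handle_message/3` is a plain function that only mirrors the return shape named above; it is not wired into Tortoise. It also assumes a list topic is a list of level strings and does not handle charlists.

```elixir
defmodule TopicParseSketch do
  # Hypothetical helper: accepts a topic either as a list of levels
  # (["systant", "host1", "stats"]) or as a string ("systant/host1/stats").
  def levels(topic) when is_binary(topic), do: String.split(topic, "/")
  def levels(topic) when is_list(topic), do: Enum.map(topic, &to_string/1)

  # Extracts the host from a stats topic, or returns :error for anything else.
  def host_from_stats_topic(topic) do
    case levels(topic) do
      ["systant", host, "stats"] -> {:ok, host}
      _other -> :error
    end
  end

  # Mirrors the callback shape from the note above: always {:ok, state}.
  # The state here is assumed to be a map of host => latest payload.
  def handle_message(topic, payload, state) do
    new_state =
      case host_from_stats_topic(topic) do
        {:ok, host} -> Map.put(state, host, payload)
        :error -> state
      end

    {:ok, new_state}
  end
end
```

Normalizing the topic to a list of levels first means the pattern match on `["systant", host, "stats"]` is written once, whichever format arrives.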

## Development Roadmap

### Phase 1: System Metrics Collection (Completed)
- ✅ **SystemMetrics Module**: `server/lib/systant/system_metrics.ex` - Comprehensive metrics collection
- ✅ **CPU Metrics**: Load averages (1/5/15min) and utilization via `:cpu_sup`
- ✅ **Memory Metrics**: System memory data and monitoring via `:memsup`
- ✅ **Disk Metrics**: Disk usage and capacity for all mounted drives via `:disksup`
- ✅ **System Info**: Uptime, Erlang/OTP versions, scheduler info
- ✅ **System Alarms**: Active os_mon alarms (disk_almost_full, memory_high_watermark, etc.)
- ✅ **MQTT Integration**: Real metrics published every 30 seconds replacing simple messages
- 🔄 **Network Metrics**: TODO - Interface statistics, bandwidth utilization
- 🔄 **GPU Metrics**: TODO - NVIDIA/AMD GPU utilization, temperatures, memory usage

#### Implementation Details
- Uses Erlang's built-in `:os_mon` application (cpu_sup, memsup, disksup)
- Collects active system alarms from `:alarm_handler` in a structured format
- Graceful error handling with fallbacks when metrics are unavailable
- JSON payload structure: `{timestamp, hostname, cpu, memory, disk, system, alarms}`
- Dashboard automatically receives and displays real-time system data and alerts
- Alarm format: `{severity, path/details, id}` for clean consumption
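
As a rough sketch of the alarm handling described above, the following shows one way raw alarm tuples could be turned into a `{severity, path/details, id}` structure with a fallback on failure. Everything here is an assumption for illustration: the `AlarmFormatSketch` module, the map keys, and the `"warning"` and `"unknown"` severity values are not taken from the project. The input shapes follow the alarms that `disksup` and `memsup` set, as returned by `:alarm_handler.get_alarms/0`.

```elixir
defmodule AlarmFormatSketch do
  # Assumed output shape: the keys and severity strings below are
  # illustrative guesses, not the project's actual format.

  # disksup sets {{:disk_almost_full, mounted_on}, []}
  def format({{:disk_almost_full, mount}, _descr}) do
    %{severity: "warning", id: "disk_almost_full", details: to_string(mount)}
  end

  # memsup sets {:system_memory_high_watermark, []}
  def format({:system_memory_high_watermark, _descr}) do
    %{severity: "warning", id: "system_memory_high_watermark", details: nil}
  end

  # Any other alarm is passed through in inspected form.
  def format({id, descr}) do
    %{severity: "unknown", id: inspect(id), details: inspect(descr)}
  end

  # Falls back to an empty list if the alarm handler cannot be reached
  # (for example when SASL is not running).
  def active_alarms do
    Enum.map(:alarm_handler.get_alarms(), &format/1)
  catch
    _kind, _reason -> []
  end
end
```

Returning plain maps with string and `nil` values keeps the result straightforward to encode as JSON alongside the other payload fields.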

### Phase 2: Command System
- Subscribe to `systant/+/commands` in MqttClient
- Implement secure command execution framework with validation/whitelisting
- Support commands like: restart services, update packages, system queries
- Response mechanism to send command results back via MQTT
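
Since this phase is still planned, the following is only a sketch of what the validation/whitelisting step might look like. The `CommandWhitelistSketch` module, the command names, and the mapped programs are all hypothetical. The sketch validates only and does not execute anything.

```elixir
defmodule CommandWhitelistSketch do
  # Hypothetical whitelist: each accepted command name maps to a fixed
  # program and argument list, so callers never supply raw shell text.
  @allowed %{
    "uptime" => {"uptime", []},
    "disk_usage" => {"df", ["-h"]}
  }

  # Returns the fixed {program, args} pair for a known command name.
  def validate(name) when is_binary(name) do
    case Map.fetch(@allowed, name) do
      {:ok, cmd} -> {:ok, cmd}
      :error -> {:error, :not_allowed}
    end
  end

  # Anything that is not a string is rejected outright.
  def validate(_other), do: {:error, :invalid}
end

IO.inspect(CommandWhitelistSketch.validate("disk_usage"))
# prints: {:ok, {"df", ["-h"]}}
IO.inspect(CommandWhitelistSketch.validate("rm -rf /"))
# prints: {:error, :not_allowed}
```

A validated `{program, args}` pair could then be handed to `System.cmd/3`, which runs the program directly rather than through a shell, so the incoming MQTT payload only ever selects an entry and never contributes arguments.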

### Phase 3: Home Assistant Integration
- Custom MQTT integration following Home Assistant patterns
- Auto-discovery of systant hosts via MQTT discovery protocol
- Create entities for metrics (sensors) and commands (buttons/services)
- Dashboard cards and automation support

### Future Plans
- Multi-host deployment for comprehensive system monitoring
- Advanced alerting and threshold monitoring
- Historical data retention and trending