# Monitoring
Monitor SSH-KLM health, performance, and security metrics.
## Health Endpoints
```bash
# Basic health check
curl https://ssh-klm.example.com/health

# Detailed health
curl https://ssh-klm.example.com/health/detailed
```

Response:
```json
{
  "status": "healthy",
  "components": {
    "database": "healthy",
    "redis": "healthy",
    "vault": "healthy"
  },
  "version": "2.0.0"
}
```
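A monitoring script can treat any component whose status is not `healthy` as a failure. The sketch below assumes the detailed health response has the shape shown above (a top-level `status` plus a `components` map of name to status string). The function names `fetch_health`, `unhealthy_components`, and `is_healthy` are hypothetical, and only the Python standard library is used.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """Hypothetical helper: GET /health/detailed and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health/detailed", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_components(health: dict) -> list:
    """Return the sorted names of components whose status is not "healthy"."""
    components = health.get("components", {})
    return sorted(name for name, status in components.items() if status != "healthy")


def is_healthy(health: dict) -> bool:
    # Healthy only if the overall status and every component agree.
    return health.get("status") == "healthy" and not unhealthy_components(health)


if __name__ == "__main__":
    sample = {
        "status": "healthy",
        "components": {"database": "healthy", "redis": "healthy", "vault": "healthy"},
        "version": "2.0.0",
    }
    print(is_healthy(sample))            # True
    print(unhealthy_components(sample))  # []
```

In a cron job or readiness probe, `is_healthy(fetch_health("https://ssh-klm.example.com"))` could drive the exit code.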
## Metrics (Prometheus)

SSH-KLM exposes Prometheus metrics at the `/metrics` endpoint.
### Key Metrics
| Metric | Type | Description |
|---|---|---|
| `sshklm_hosts_total` | Gauge | Total managed hosts |
| `sshklm_keys_total` | Gauge | Total SSH keys |
| `sshklm_rotations_total` | Counter | Key rotations performed |
| `sshklm_rotations_failed_total` | Counter | Failed rotations |
| `sshklm_discoveries_duration_seconds` | Histogram | Discovery scan duration |
| `sshklm_api_requests_total` | Counter | API requests by endpoint |
| `sshklm_agents_connected` | Gauge | Connected agents |
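To spot-check these values without a Prometheus server, you can read the plain-text exposition format served at `/metrics`. The sketch below is a deliberately minimal parser: it skips `# HELP`/`# TYPE` comments, skips lines with trailing timestamps, and does not handle escaped quotes inside label values. For production use, the `prometheus_client` package ships a complete parser (`prometheus_client.parser.text_string_to_metric_families`). The sample payload and its `status` label values are invented for illustration.

```python
import re

# Matches: metric_name{label="value",...} 123.4
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str, prefix: str = "sshklm_") -> list:
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples: list, name: str) -> float:
    """Sum every series of one metric, like sum(metric) in PromQL."""
    return sum(value for n, _, value in samples if n == name)


example = """
# HELP sshklm_hosts_total Total managed hosts
# TYPE sshklm_hosts_total gauge
sshklm_hosts_total{status="active"} 120
sshklm_hosts_total{status="unreachable"} 3
sshklm_agents_connected 118
"""

samples = parse_metrics(example)
print(total(samples, "sshklm_hosts_total"))  # 123.0
```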
### Prometheus Config
```yaml
scrape_configs:
  - job_name: 'ssh-klm'
    static_configs:
      - targets: ['ssh-klm.example.com:9090']
    metrics_path: /metrics
    scheme: https
    bearer_token: 'YOUR_METRICS_TOKEN'
```
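Before pointing Prometheus at the target, it can help to confirm that the token is accepted. Prometheus sends `bearer_token` as an `Authorization: Bearer <token>` header; the sketch below builds the equivalent request with the standard library. The helper names `build_scrape_request` and `scrape` are hypothetical, and `YOUR_METRICS_TOKEN` is the placeholder from the config above.

```python
import urllib.request


def build_scrape_request(target: str, token: str,
                         metrics_path: str = "/metrics") -> urllib.request.Request:
    """Build the HTTPS GET that the scrape config above would issue."""
    # bearer_token is sent as a standard "Authorization: Bearer ..." header.
    return urllib.request.Request(
        f"https://{target}{metrics_path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def scrape(target: str, token: str, timeout: float = 10.0) -> str:
    """Fetch the metrics page as text; raises urllib.error.HTTPError on 401/403."""
    with urllib.request.urlopen(build_scrape_request(target, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_scrape_request("ssh-klm.example.com:9090", "YOUR_METRICS_TOKEN")
    print(req.full_url)                     # https://ssh-klm.example.com:9090/metrics
    print(req.get_header("Authorization"))  # Bearer YOUR_METRICS_TOKEN
```

A `401` or `403` from `scrape()` points at the token rather than at the Prometheus configuration.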
## Grafana Dashboard

Import the official dashboard by ID: `18456`.
Or create custom panels:
### Host Overview
```promql
# Total hosts by status
sum by (status) (sshklm_hosts_total)

# Keys using outdated algorithms (DSA, RSA-1024)
sum(sshklm_keys_total{algorithm=~"dsa|rsa-1024"})
```
### Rotation Metrics

```promql
# Rotation success rate (last 24h)
sum(rate(sshklm_rotations_total[24h]))
  /
(sum(rate(sshklm_rotations_total[24h])) + sum(rate(sshklm_rotations_failed_total[24h])))

# Rotations per hour
sum(rate(sshklm_rotations_total[1h])) * 3600
```
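The success-rate query divides `sshklm_rotations_total` by the sum of that counter and `sshklm_rotations_failed_total`, so it treats the first counter as successful rotations and the denominator as all attempts. Because both `rate()` calls use the same 24h window, the window length cancels and only the counter increases matter. The sketch below applies that arithmetic to two hypothetical counter readings taken 24 hours apart; unlike PromQL's `rate()`, it does not correct for counter resets.

```python
from typing import Optional


def success_rate(ok_start: float, ok_end: float,
                 failed_start: float, failed_end: float) -> Optional[float]:
    """Share of successful rotations between two counter readings.

    Mirrors rate(ok) / (rate(ok) + rate(failed)) over one window. Returns None
    when there were no attempts, where PromQL would produce NaN (0/0).
    """
    ok = ok_end - ok_start
    failed = failed_end - failed_start
    attempts = ok + failed
    return None if attempts == 0 else ok / attempts


def per_hour(per_second_rate: float) -> float:
    # rate() returns a per-second value, so multiply by 3600 for an hourly figure.
    return per_second_rate * 3600


print(success_rate(ok_start=1000, ok_end=1190, failed_start=40, failed_end=50))  # 0.95
print(per_hour(0.125))  # 450.0
```

Grafana panels may want to handle the no-attempts case explicitly, since a quiet period shows up as NaN rather than 100%.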
## Alerting

### Prometheus Alert Rules
```yaml
groups:
  - name: ssh-klm
    rules:
      - alert: SSHKLMHighFailedRotations
        expr: rate(sshklm_rotations_failed_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High rotation failure rate

      - alert: SSHKLMAgentDisconnected
        expr: sshklm_agents_connected < sshklm_agents_registered
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Agents disconnected

      - alert: SSHKLMDatabaseUnhealthy
        expr: sshklm_health_database != 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Database unhealthy
```
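The `for` clause means an alert fires only after its expression has held on every rule evaluation for that long; one false evaluation drops the pending alert. The sketch below is a simplified model of that pending/firing logic (fixed evaluation interval, no `keep_firing_for`), not Prometheus's actual implementation, applied to `SSHKLMHighFailedRotations`. Its threshold of `0.1` failures per second equals 6 failed rotations per minute. The sample series is invented.

```python
from typing import List, Optional, Tuple


def first_firing_time(evaluations: List[Tuple[int, float]], threshold: float,
                      for_seconds: int) -> Optional[int]:
    """Return the evaluation timestamp at which the alert would start firing.

    evaluations: (unix_seconds, expression_value) pairs, one per rule evaluation.
    """
    active_since = None  # when the alert entered the pending state
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts  # pending
            if ts - active_since >= for_seconds:
                return ts  # held long enough: firing
        else:
            active_since = None  # condition broke; pending alert is dropped
    return None


# One evaluation per minute; values are failure rates in failures per second.
series = [(0, 0.02), (60, 0.2), (120, 0.3), (180, 0.05), (240, 0.2)]
series += [(240 + 60 * i, 0.2) for i in range(1, 12)]

# for: 10m -> 600 seconds. The dip at t=180 resets the clock, so firing starts at 840.
print(first_firing_time(series, threshold=0.1, for_seconds=600))  # 840
```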
## Logging

### Structured Logs
```json
{
  "level": "info",
  "timestamp": "2026-01-06T10:00:00Z",
  "message": "Key rotation completed",
  "keyId": "key_abc123",
  "hostId": "host_xyz789",
  "duration_ms": 1234,
  "traceId": "abc123"
}
```
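Because each entry is a JSON object, ad-hoc analysis needs nothing beyond a JSON parser. The sketch below assumes one JSON object per line (JSON Lines) with the field names from the example above, and it matches on the `message` text shown there, which may not be a stable identifier. The 1000 ms threshold and the function names are illustrative choices, not SSH-KLM defaults.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_log_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON log line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line


def slow_rotations(entries: Iterable[dict], threshold_ms: int = 1000) -> List[Tuple[str, int]]:
    """Return (keyId, duration_ms) for completed rotations slower than threshold_ms."""
    return [
        (e["keyId"], e["duration_ms"])
        for e in entries
        if e.get("message") == "Key rotation completed"
        and e.get("duration_ms", 0) > threshold_ms
    ]


logs = [
    '{"level": "info", "timestamp": "2026-01-06T10:00:00Z", "message": "Key rotation completed",'
    ' "keyId": "key_abc123", "hostId": "host_xyz789", "duration_ms": 1234, "traceId": "abc123"}',
    "",
]
print(slow_rotations(parse_log_lines(logs)))  # [('key_abc123', 1234)]
```

The `traceId` field can then be used to pull every log line belonging to a slow rotation.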
### Log Aggregation

Datadog:
```yaml
logs:
  - type: file
    path: /var/log/ssh-klm/*.log
    service: ssh-klm
    source: ssh-klm
```

ELK Stack:
```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/ssh-klm/*.log
    json.keys_under_root: true
```
### Audit Events

SSH-KLM logs all security-relevant events:
- User authentication
- Key rotations
- Policy changes
- Agent registrations
- API key creation/deletion
Query audit logs:
```bash
ssh-klm admin audit:query \
  --start "2026-01-01" \
  --end "2026-01-07" \
  --action "key.rotate"
```
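For scheduled audit exports, the same command can be assembled from dates in a script. The sketch below uses only the `--start`, `--end`, and `--action` flags shown above, formats dates as `YYYY-MM-DD` like the example, and assumes the `ssh-klm` CLI is on `PATH`. The helper names are hypothetical, and the format of the command's output is not assumed.

```python
import subprocess
from datetime import date


def audit_query_args(start: date, end: date, action: str) -> list:
    """Build argv for `ssh-klm admin audit:query` with the documented flags."""
    if end < start:
        raise ValueError("end must not be before start")
    return [
        "ssh-klm", "admin", "audit:query",
        "--start", start.isoformat(),
        "--end", end.isoformat(),
        "--action", action,
    ]


def run_audit_query(start: date, end: date, action: str) -> str:
    # Raises CalledProcessError if the CLI exits non-zero.
    result = subprocess.run(audit_query_args(start, end, action),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(" ".join(audit_query_args(date(2026, 1, 1), date(2026, 1, 7), "key.rotate")))
    # ssh-klm admin audit:query --start 2026-01-01 --end 2026-01-07 --action key.rotate
```

Passing an argument list rather than a shell string avoids quoting problems when action names come from configuration.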