MeshMon Documentation¶
MeshMon: a distributed peer-to-peer monitoring system.
What is MeshMon?¶
MeshMon monitors services from multiple vantage points and reaches a cluster-wide consensus on their status without relying on a single central coordinator. Each node:
- Runs local checks (HTTP, pings, etc.) and publishes results to peers
- Exchanges signed state with other nodes over a lightweight mesh
- Uses a clock table and leader election to coordinate propagation and resolve disagreements
Why this design:
- Resilience: no single control-plane to fail; nodes can join/leave with minimal impact
- Trust: signatures and version gates prevent untrusted peers from poisoning the cluster
- Real-world signal: monitoring from different networks avoids false positives from a single observer
Key concepts:
- Networks: logical groups of nodes that share a config (
config/networks/<id>/config.yml) - Node config: which networks to join and optional webhooks per node
- Monitors: HTTP/ping checks run by specific nodes, with allow/block targeting
- Status convergence: nodes exchange evidence until the cluster agrees; webhooks trigger once consensus is reached
Getting Started¶
- Quick Start Guide
- Configuration: Define per-node and per-network behaviour.
- Node Configuration — which networks to join, webhooks, and runtime knobs.
- Network Configuration — topology, monitors, cluster timings, and defaults.
- Config Management — local vs Git-based workflows and keys.