mirror of
https://github.com/altstackHQ/altstack-data.git
synced 2026-04-18 03:53:14 +02:00
Initialize public data and docs repository
163
docs/app/concepts/monitoring/page.mdx
Normal file
@@ -0,0 +1,163 @@
---
title: "Monitoring & Observability"
description: "Know when things break before your users do. Uptime monitoring, disk alerts, log aggregation, and observability for self-hosters."
---

# Monitoring & Observability

You deployed 5 tools. They're running great. You go to bed. At 3 AM, the disk fills up, Postgres crashes, and everything dies. You find out at 9 AM when a user emails you.

**Monitoring prevents this.**

## The Three Layers

| Layer | What It Watches | Tool |
|---|---|---|
| **Uptime** | "Is the service responding?" | Uptime Kuma |
| **System** | CPU, RAM, disk, network | Node Exporter + Grafana |
| **Logs** | What's actually happening inside | Docker logs, Dozzle, SigNoz |

You need **at least** the first layer. The other two are for when you get serious.

## Layer 1: Uptime Monitoring (Essential)

[Uptime Kuma](/deploy/uptime-kuma) is the single best tool for self-hosters. Deploy it first, always.

```yaml
# docker-compose.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - uptime_data:/app/data

volumes:
  uptime_data:
```
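
Once the container is up, it can help to confirm by hand that a target answers before you add it as a monitor. The sketch below is one way to do that with `curl`. The helper names (`http_status`, `is_up`, `check_target`) are hypothetical and not part of Uptime Kuma, and treating any 2xx or 3xx status as "up" is an assumption you may want to tighten.

```bash
#!/bin/bash
# Hypothetical helpers for a manual check; not part of Uptime Kuma.

# Print the HTTP status code for a URL ("000" if the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Succeed when the status code is 2xx or 3xx (assumed definition of "up").
is_up() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Report a single target as UP or DOWN together with its status code.
check_target() {
  local url="$1" code
  code=$(http_status "$url")
  if is_up "$code"; then
    echo "UP   $url ($code)"
  else
    echo "DOWN $url ($code)"
  fi
}
```

For example, `check_target http://localhost:3001` checks the Uptime Kuma port published in the compose file above.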

### What to Monitor

Add a monitor for **every** service you run:

| Type | Target | Check Interval |
|---|---|---|
| HTTP(s) | `https://plausible.yourdomain.com` | 60s |
| HTTP(s) | `https://uptime.yourdomain.com` | 60s |
| TCP Port | `localhost:5432` (Postgres) | 120s |
| Docker Container | Container name | 60s |
| DNS | `yourdomain.com` | 300s |

### Notifications

Uptime Kuma supports 90+ notification channels. Set up **at least two**:

- **Email**: for non-urgent alerts
- **Telegram/Discord/Slack**: for instant mobile alerts

> 🔥 **Pro Tip:** Monitor your monitoring. Set up an external free ping service (like [UptimeRobot](https://uptimerobot.com)) to watch your Uptime Kuma instance.

## Layer 2: System Metrics

### Quick Disk Alert Script

The #1 cause of self-hosting outages is **running out of disk space**. This script sends an alert when disk usage exceeds 80%:

```bash
#!/bin/bash
# /opt/scripts/disk-alert.sh
# Requires a working `mail` command (e.g. from mailutils) to send the alert.

THRESHOLD=80
# -P forces POSIX output so a long device name cannot wrap onto a second line
USAGE=$(df -P / | tail -1 | awk '{print $5}' | sed 's/%//')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  echo "⚠️ Disk usage is at ${USAGE}% on $(hostname)" | \
    mail -s "Disk Alert: ${USAGE}%" you@yourdomain.com
fi
```

Make the script executable with `chmod +x /opt/scripts/disk-alert.sh`, then add it to cron:

```bash
# Check every hour
0 * * * * /opt/scripts/disk-alert.sh
```
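
A two-level variant of the same check is sketched here, using the 70% warning and 85% critical disk thresholds from the "What to Watch" table in this section. The function names `classify_usage` and `disk_usage` are hypothetical, and the sketch only classifies the reading; wiring the result to `mail` or another notifier is left to you.

```bash
#!/bin/bash
# Hypothetical two-level variant of disk-alert.sh (a sketch, not a drop-in replacement).

WARN=70   # warning threshold in percent
CRIT=85   # critical threshold in percent

# Map a usage percentage to ok / warning / critical.
classify_usage() {
  local usage="$1"
  if [ "$usage" -ge "$CRIT" ]; then
    echo "critical"
  elif [ "$usage" -ge "$WARN" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Print the usage of a mount point as a bare number (default: /).
disk_usage() {
  df -P "${1:-/}" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}
```

A cron job could then run `level=$(classify_usage "$(disk_usage /)")` and send an alert only when `level` is not `ok`.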

### What to Watch

| Metric | Warning Threshold | Critical Threshold |
|---|---|---|
| Disk usage | 70% | 85% |
| RAM usage | 80% | 95% |
| CPU sustained | 80% for 5 min | 95% for 5 min |
| Container restarts | 3 in 1 hour | 10 in 1 hour |

### Docker Resource Monitoring

Quick commands to check what's eating your resources:

```bash
# Live resource usage per container
docker stats

# Disk usage by images, containers, and volumes
docker system df -v

# Find large volumes (usually needs root)
du -sh /var/lib/docker/volumes/*/
```
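
When many containers are running, a sorted snapshot is often easier to read than the live `docker stats` view. The sketch below assumes input lines of the form `name percent%`, which `docker stats` can produce with a `--format` template. `rank_by_mem` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: sort "name percent%" lines by percentage, highest first,
# and keep the top N lines (default: 5).
rank_by_mem() {
  LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# Example (needs a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | rank_by_mem 3
```

The sort is numeric on the second column, so the trailing `%` does not need to be stripped first.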

## Layer 3: Log Aggregation

Docker captures all stdout/stderr from your containers. Use it:

```bash
# Live logs for a service
docker compose logs -f plausible

# Last 100 lines
docker compose logs --tail=100 plausible

# Logs since a specific time
docker compose logs --since="2h" plausible
```
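
To see which service is producing the most errors, the log stream can be summarized per service. This is a rough sketch: it assumes the `name | message` line prefix that `docker compose logs` prints by default, and it counts any line whose message contains "error" in any letter case. `errors_per_service` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: count log lines mentioning "error", grouped by service.
# Assumes each input line looks like "service-1  | message".
errors_per_service() {
  awk '
    {
      sep = index($0, "|")
      if (sep == 0) next                  # skip lines without a service prefix
      name = substr($0, 1, sep - 1)
      msg  = substr($0, sep + 1)
      sub(/[ \t]+$/, "", name)            # trim padding after the service name
      if (tolower(msg) ~ /error/) count[name]++
    }
    END { for (name in count) print count[name], name }
  ' | sort -rn
}

# Example (needs a running Compose project):
#   docker compose logs --no-color --since=24h | errors_per_service
```

The output is one line per service, with the highest error count first.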

### Dozzle (Docker Log Viewer)

For a beautiful web-based log viewer:

```yaml
services:
  dozzle:
    image: amir20/dozzle:latest
    container_name: dozzle
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
```

### For Serious Setups: SigNoz

If you need traces, metrics, **and** logs in one place, deploy [SigNoz](/deploy/signoz). It's an open-source Datadog alternative built on OpenTelemetry.

## Maintenance Routine

Set a weekly calendar reminder:

```
☐ Check Uptime Kuma: all green?
☐ Run `docker stats`: anything hogging resources?
☐ Run `df -h`: disk space OK?
☐ Run `docker system prune -f`: remove stopped containers, unused networks, dangling images, and build cache
☐ Check logs for errors: `docker compose logs --since=168h | grep -i error`
```

## Next Steps

→ [Updating & Maintaining Containers](/concepts/updates): keep your tools up to date safely
→ [Backups That Actually Work](/concepts/backups): protect your data
→ [Deploy Uptime Kuma](/deploy/uptime-kuma): set up monitoring now