Monitoring
Health checks, log monitoring, and alerting for Konsolidat.
Health Endpoints
Frappe / Konsol API
curl http://localhost:8069/api/method/konsol.api.health
# {"message": {"status": "ok", "app": "konsol",
# "clickhouse": {"status": "healthy", "clickhouse_reachable": true,
# "recent_sync_failures": [], "message": "All systems operational"}}}
- Guest-accessible — no auth needed
- Confirms the Frappe web server and Konsol app are running and checks ClickHouse (
SELECT 1) plus recent ingestion sync failures - Top-level
statusisok(ClickHouse healthy),degraded(reachable but tables have recent sync failures), ordown(ClickHouse unreachable); the nestedclickhouseobject has the detail - A single guest-accessible call therefore covers the Frappe → ClickHouse path; see Full Stack Check below only if you want to exercise the authenticated query path end-to-end
ClickHouse
curl "http://localhost:8123/?query=SELECT+1"
# 1
# Or via Docker health check:
docker inspect --format='{{.State.Health.Status}}' konsolidat_clickhouse
# healthy
Full Stack Check
Test the complete path (Frappe → ClickHouse):
# Login first
curl -c cookies.txt -X POST http://localhost:8069/api/method/login \
-H "Content-Type: application/json" \
-d '{"usr": "admin", "pwd": "password"}'
# Query via API
curl -b cookies.txt "http://localhost:8069/api/method/konsol.api.epm_value?\
entity=USMF&year=2024&period=1&account=401100"
A successful response with a numeric value confirms the entire pipeline is working.
ClickHouse System Tables
Query Performance
-- Slowest queries in the last hour
SELECT
query_start_time,
query_duration_ms,
read_rows,
formatReadableSize(read_bytes) as data_read,
query
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_start_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10
Database Sizes
SELECT
database,
formatReadableSize(sum(bytes_on_disk)) as disk_size,
formatReadableQuantity(sum(rows)) as total_rows,
count() as parts
FROM system.parts
WHERE active AND database LIKE 'epm_%'
GROUP BY database
ORDER BY sum(bytes_on_disk) DESC
Table-Level Detail
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) as size,
formatReadableQuantity(sum(rows)) as rows
FROM system.parts
WHERE active AND database = 'epm_gold'
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC
Active Connections
SELECT * FROM system.processes
Error Log
SELECT
event_time,
message
FROM system.text_log
WHERE level = 'Error'
AND event_time > now() - INTERVAL 1 DAY
ORDER BY event_time DESC
LIMIT 20
Frappe Logs
Log Files
| Log | Path | Content |
|---|---|---|
| Web | ~/frappe-bench/logs/web.log | HTTP requests to Frappe |
| Worker | ~/frappe-bench/logs/worker.log | Background job execution |
| Frappe | ~/frappe-bench/logs/frappe.log | Application errors |
| Scheduler | ~/frappe-bench/logs/scheduler.log | Scheduled tasks |
Tail API Errors
grep "konsol.api" ~/frappe-bench/logs/frappe.log | tail -20
Common Error Patterns
| Log Pattern | Meaning |
|---|---|
ClickHouse connection failed | ClickHouse unreachable from Frappe |
ClickHouse query timeout | Query exceeded 30s |
ValidationError: Invalid scenario | Client sent bad scenario value |
403 Forbidden | User not logged in or lacks permission |
Monitoring Setup
UptimeRobot / Pingdom
Point at: https://epm.yourcompany.com/api/method/konsol.api.health
- Expected: HTTP 200 with
"status": "ok" - Alert on: Non-200 or timeout
Docker Health Checks
Already configured in docker-compose.yml:
healthcheck:
test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
interval: 10s
timeout: 5s
retries: 5
dbt Test Monitoring
After each dbt build, check exit code:
dbt build && echo "SUCCESS" || echo "FAILED — check test output"
For scheduled runs, pipe test results to a log or alert system:
dbt test --output json > /var/log/dbt-test-results.json 2>&1
Alert Thresholds
| Metric | Warning | Critical |
|---|---|---|
| API response time | > 2s | > 10s |
| ClickHouse query time | > 5s | > 30s (timeout) |
| ClickHouse disk usage | > 70% | > 90% |
| dbt test failures | Any warn severity | Any error severity |
| Airbyte sync failure | — | Any failure |
VBA Debug Logging
The Excel VBA module includes a built-in logging system:
- Run
EPM_ToggleLogto enable logging - A hidden
_EPM_Logsheet captures:- Timestamp
- Level (INFO, ERROR, DEBUG)
- Message (HTTP requests, responses, parse results)
- Run
EPM_Debugfor a full diagnostic test
Next Steps
- Operations Runbook — Maintenance procedures
- Deployment Guide — Infrastructure setup
- Troubleshooting — Problem resolution