Use cases — the six monitoring workflows
Error and uptime monitoring is the most time-sensitive work you do as an engineer. You need to know immediately when a production endpoint is down, when a new error type appears in your error tracking, and whether last night's incidents have resolved. OpenClaw automates the daily check-in.
- 🟢 Endpoint health checks: ping your URLs on a schedule and alert on non-200 or slow responses.
- 🐞 Sentry error summary: group and rank errors by frequency, and flag errors seen for the first time.
- 🔕 Alert de-duplication: group similar errors to cut noise and surface root-cause candidates.
- 🚨 New error detection: alert only when an error is seen for the first time, not on every occurrence.
- ☀️ On-call morning briefing: a single summary of overnight incidents, top errors, and uptime status.
- 📉 Error trend tracking: flag error groups that are growing week over week.
Sentry API setup (4 steps)
To summarise errors and track uptime, you need a Sentry account with API access. The free tier is sufficient for most projects and includes 5,000 events per month.
Step 1: Generate an API token
Log in to Sentry, then navigate to Your avatar (top right) → Settings → API → Auth Tokens → Create New Token. Give it a name like "OpenClaw Monitoring".
Step 2: Set scopes
Select these scopes only (you don't need write access):
- project:read — read project settings and metadata
- org:read — read organisation data
- event:read — read error events and groups
Step 3: Find your organisation and project slugs
The slugs appear in your Sentry URL. For example, if your Sentry URL is https://sentry.io/organizations/my-company/projects/web-app/, your organisation slug is my-company and your project slug is web-app.
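If you want to extract the slugs programmatically (for example when wiring up several projects), the URL rule above can be sketched in a few lines. This is an illustrative helper, not part of OpenClaw or Sentry's tooling, and it assumes the classic `organizations/<org>/projects/<project>` URL layout:

```python
from urllib.parse import urlparse

def slugs_from_sentry_url(url: str) -> tuple[str, str]:
    """Extract (organisation, project) slugs from a Sentry project URL.

    Assumes the path shape /organizations/<org-slug>/projects/<project-slug>/;
    adjust if your Sentry instance uses a different layout.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    org = parts[parts.index("organizations") + 1]
    project = parts[parts.index("projects") + 1]
    return org, project

print(slugs_from_sentry_url(
    "https://sentry.io/organizations/my-company/projects/web-app/"
))  # → ('my-company', 'web-app')
```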
Step 4: Store in secrets
Create or update your secrets.env with the token and slugs:
```
SENTRY_TOKEN=sntrys_xxxxxxxxxxxx
SENTRY_ORG=your-org-slug
SENTRY_PROJECT=your-project-slug
```
Never commit secrets.env to version control. Store it locally or in a secrets management service, and reference it when running OpenClaw via openclaw run --secrets-file secrets.env.
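OpenClaw loads this file for you via --secrets-file, but it helps to know how little is in the format. Here is a minimal sketch of a KEY=value parser for secrets.env-style files; the function name is illustrative, and real dotenv loaders also handle quoting and escapes:

```python
def parse_secrets(text: str) -> dict[str, str]:
    """Parse KEY=value lines from a secrets.env-style file.

    Skips blank lines and '#' comments; the value is everything
    after the first '='. A minimal sketch, not a full dotenv parser.
    """
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip()
    return secrets

example = """\
# Sentry credentials
SENTRY_TOKEN=sntrys_xxxxxxxxxxxx
SENTRY_ORG=your-org-slug
"""
print(parse_secrets(example)["SENTRY_ORG"])  # → your-org-slug
```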
HTTP endpoint health checks
The uptime-checker agent pings your production and staging endpoints on a schedule and alerts if any return a non-200 status, time out, or respond too slowly.
Configuration
Add this agent to your AGENTS.md:
```yaml
agents:
  uptime-checker:
    description: "Ping endpoints and alert on failures or slow responses"
    tools:
      - http-checker
    config:
      endpoints:
        - url: "https://yourapp.com/health"
          name: "App health endpoint"
          expected_status: 200
          timeout_ms: 3000
        - url: "https://yourapp.com/api/v1/ping"
          name: "API ping"
          expected_status: 200
          timeout_ms: 2000
          expected_body_contains: "pong"
        - url: "https://yourapp.com"
          name: "Homepage"
          expected_status: 200
          timeout_ms: 5000
      alert_on:
        - wrong_status_code
        - timeout
        - body_mismatch
        - response_time_ms_above: 4000
      state_file: "uptime_state.json"
      alert_after_consecutive_failures: 2
    output:
      format: markdown
      include: [endpoint_name, status, response_time_ms, checked_at, failure_reason]
      only_report_problems: true
```
Key settings
- endpoints: Add as many URLs as you monitor. Each can have different expected status codes and timeouts.
- expected_body_contains: Optional. If set, the agent checks that the response body contains this string (e.g., "pong" for a health check).
- alert_after_consecutive_failures: How many consecutive failures before alerting (default 2 prevents flapping).
- only_report_problems: true means no output if all endpoints are healthy — the agent stays silent.
- state_file: The agent persists state to this file to track consecutive failures across runs.
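To see how alert_after_consecutive_failures and state_file fit together, here is a sketch of the streak-tracking logic. The function name and JSON layout are illustrative, not OpenClaw's actual internals:

```python
import json
from pathlib import Path

def should_alert(endpoint: str, check_passed: bool,
                 state_path: Path, threshold: int = 2) -> bool:
    """Track consecutive failures per endpoint in a JSON state file;
    return True only once the failure streak reaches the threshold.

    Illustrative sketch of alert_after_consecutive_failures, not
    OpenClaw's implementation.
    """
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    if check_passed:
        state[endpoint] = 0  # a success resets the streak
    else:
        state[endpoint] = state.get(endpoint, 0) + 1
    state_path.write_text(json.dumps(state))
    return state[endpoint] >= threshold

path = Path("uptime_state.json")
print(should_alert("App health endpoint", False, path))  # first failure: no alert
print(should_alert("App health endpoint", False, path))  # second failure: alert
```

Because the streak is persisted to disk, a single transient blip never alerts, while a genuine outage alerts on the second consecutive run.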
Sentry error group summarisation
The sentry-summariser agent pulls the top error groups from Sentry over a configurable time window, ranks them by frequency, and flags new errors (never seen before).
Configuration
```yaml
agents:
  sentry-summariser:
    description: "Summarise top Sentry error groups from the last 24 hours"
    tools:
      - sentry-api
    config:
      token: "${SENTRY_TOKEN}"
      organisation: "${SENTRY_ORG}"
      project: "${SENTRY_PROJECT}"
      time_window_hours: 24
      top_n: 10
      new_errors_window_hours: 24
    output:
      format: markdown
      sections:
        - new_errors_first_seen
        - top_errors_by_events
        - growing_errors
      include_fields: [title, culprit, event_count, user_count, first_seen, last_seen, assignee, is_new]
      prompt: |
        For each new error (first_seen within window), write one sentence explaining what the error is
        and where in the code it occurs (culprit field). Mark it [NEW] in bold.
        For top errors, just list them with counts — no prose needed.
```
Output sections
- new_errors_first_seen: Errors appearing in your codebase for the first time in the past 24 hours. Highest priority — these are unfamiliar failure modes.
- top_errors_by_events: Most frequent errors, ranked by event count.
- growing_errors: Errors whose event count increased by 50% or more compared to the prior day.
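The "new" and "growing" labels above reduce to simple arithmetic on per-window counts. Here is a hedged sketch; the function name and signature are illustrative, not part of OpenClaw's API:

```python
def classify_error_group(current_count: int, prior_count: int,
                         first_seen_in_window: bool) -> list[str]:
    """Label an error group as 'new' and/or 'growing'.

    'growing' mirrors the 50%-or-more increase rule described above;
    illustrative only, not OpenClaw's implementation.
    """
    labels = []
    if first_seen_in_window:
        labels.append("new")
    # Guard against dividing by zero: a group with no prior events that
    # reappears (and is not new) also counts as growing.
    if prior_count == 0:
        if current_count > 0 and not first_seen_in_window:
            labels.append("growing")
    elif (current_count - prior_count) / prior_count >= 0.5:
        labels.append("growing")
    return labels

print(classify_error_group(18, 8, False))  # ['growing']  (+125%)
print(classify_error_group(3, 0, True))    # ['new']
```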
Alert de-duplication and noise reduction
Error tracking tools can be noisy if one underlying bug causes multiple distinct error signatures. The alert-deduplicator agent groups related errors by culprit (file/function) and error type, surfacing the root cause instead of the symptom list.
Configuration
```yaml
agents:
  alert-deduplicator:
    description: "Group similar Sentry errors to reduce noise and surface root causes"
    tools:
      - sentry-api
    config:
      token: "${SENTRY_TOKEN}"
      organisation: "${SENTRY_ORG}"
      project: "${SENTRY_PROJECT}"
      time_window_hours: 24
      analysis:
        group_by_culprit: true
        group_by_error_type: true
        root_cause_hint: true
    output:
      format: markdown
      prompt: |
        Instead of listing every error individually, group them:
        - If 3+ errors share the same culprit (file/function), group them under "Possible root cause: [culprit]"
        - List the individual error types underneath, indented
        - Estimate the blast radius: how many users/events are affected by the group total
        This output replaces the raw Sentry list in the on-call briefing.
```
How it works
The agent queries Sentry for all errors in the window, clusters them by culprit and error type, and generates a grouped report. For example, if you have TypeError, AttributeError, and KeyError all originating from the same function in auth/middleware.js, the output flags that as a single root cause cluster with three error types — signalling to your on-call engineer to focus on that one location.
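The clustering step can be sketched as a small grouping pass. This is an assumption-laden illustration (the function name and record fields echo the include_fields used elsewhere in this guide, not a real OpenClaw API):

```python
from collections import defaultdict

def cluster_by_culprit(errors: list[dict]) -> dict[str, dict]:
    """Group error records by culprit and flag likely root-cause clusters.

    Mirrors the rule above: 3+ distinct error types sharing one culprit
    become a single root-cause candidate. Illustrative sketch only.
    """
    clusters = defaultdict(lambda: {"types": set(), "events": 0})
    for err in errors:
        c = clusters[err["culprit"]]
        c["types"].add(err["title"].split(":")[0])  # e.g. "TypeError"
        c["events"] += err["event_count"]
    return {
        culprit: {**c, "root_cause_candidate": len(c["types"]) >= 3}
        for culprit, c in clusters.items()
    }

errors = [
    {"title": "TypeError: Cannot read property 'user'", "culprit": "auth/middleware.js", "event_count": 5},
    {"title": "AttributeError: missing attribute",      "culprit": "auth/middleware.js", "event_count": 2},
    {"title": "KeyError: 'user'",                       "culprit": "auth/middleware.js", "event_count": 1},
]
print(cluster_by_culprit(errors)["auth/middleware.js"]["root_cause_candidate"])  # True
```

The summed events field gives the blast-radius estimate the prompt asks for.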
Morning on-call briefing
The oncall-brief agent combines uptime status, new errors, and top error trends into a single morning report. Engineers read this at standup instead of checking three separate dashboards.
Configuration
```yaml
agents:
  oncall-brief:
    description: "Daily on-call briefing combining uptime and error status"
    tools:
      - sentry-api
      - http-checker
    config:
      sentry:
        token: "${SENTRY_TOKEN}"
        organisation: "${SENTRY_ORG}"
        project: "${SENTRY_PROJECT}"
        time_window_hours: 12
      endpoints: "${MONITORED_ENDPOINTS}"
    prompt: |
      Generate a concise on-call morning briefing. Structure:

      ## 🟢/🔴 Uptime Status
      List all monitored endpoints. Use 🟢 if up, 🔴 if down or degraded.

      ## 🚨 New Errors (last 12h)
      List any errors seen for the first time. If none: "✓ No new error types overnight."

      ## 📊 Top Errors by Volume
      Top 3 errors by event count. One line each: [count] events — [title] — [culprit]

      ## 📈 Growing Issues
      Any errors that grew >50% vs the prior 12h window.

      Keep the entire briefing under 30 lines. Engineers read this in standup.
```
Typical output structure
```markdown
## 🟢/🔴 Uptime Status

🟢 App health endpoint (98ms)
🟢 API ping (156ms)
🟢 Homepage (287ms)

## 🚨 New Errors (last 12h)

[NEW] TypeError: Cannot read property 'user' of undefined — auth/middleware.js:47 — 3 events, 3 users

## 📊 Top Errors by Volume

24 events — NetworkError: Failed to fetch — services/api.js:102
12 events — ValidationError: Invalid request body — routes/webhook.js:33
8 events — TimeoutError: Request exceeded 5000ms — services/database.js:156

## 📈 Growing Issues

ValidationError growing 120% (8 → 18 events vs yesterday)
```
HEARTBEAT templates
Use these three HEARTBEAT.md configurations to automate your monitoring schedule.
Every 5 minutes: uptime check
```yaml
schedules:
  - id: uptime-check-5m
    agent: uptime-checker
    cron: "*/5 * * * *"
    output: slack://your-channel-webhook
    notify_on_failure: true
```
Weekday mornings: on-call briefing
```yaml
schedules:
  - id: oncall-brief-morning
    agent: oncall-brief
    cron: "0 8 * * 1-5"  # 8 AM, Monday to Friday
    output: slack://your-channel-webhook
    notify_on_failure: true
```
Monday mornings: weekly error trend report
```yaml
schedules:
  - id: weekly-error-trends
    agent: sentry-summariser
    cron: "0 9 * * 1"  # 9 AM, Mondays
    config:
      time_window_hours: 168  # 7 days
      top_n: 15
    output: slack://your-channel-webhook
```
Sample on-call briefing
Here's what a real morning on-call briefing looks like:
```markdown
## 🟢/🔴 Uptime Status

🟢 App health endpoint (45ms)
🟢 API v1 endpoint (123ms)
🟢 Homepage (201ms)
🟢 Documentation site (156ms)

## 🚨 New Errors (last 12h)

[NEW] TypeError: Cannot read property 'user' of undefined — app/middleware/auth.js:47 — 3 events, 3 users

## 📊 Top Errors by Volume

18 events — ValidationError: Invalid email format — app/validators/user.js:23
12 events — NetworkError: Failed to connect to payment service — app/services/payments.js:89
8 events — TimeoutError: Redis connection timeout — app/cache/redis.js:156

## 📈 Growing Issues

ValidationError growing 120% (8 → 18 events vs yesterday window)
NetworkError stable (10 → 12 events, +20%)
```
Frequently asked questions
Does OpenClaw replace PagerDuty or OpsGenie?
No. OpenClaw's uptime checker is for visibility and reporting, not incident management. It can post to a Slack channel or write a file, but it doesn't have PagerDuty's on-call rotation scheduling, escalation policies, or phone/SMS alerting. Use it alongside, not instead of, a dedicated incident tool.
How often should I run endpoint health checks?
Every 5 minutes is a reasonable starting point for production endpoints; for internal services, every 15 minutes is usually enough. Sentry summarisation is different: running it every 5 minutes would generate far too many API calls, so hourly or daily schedules are more appropriate.
Can OpenClaw monitor multiple Sentry projects?
Yes. Add multiple project slugs to the config and the agent will query each one. Output is grouped by project in the report.
What happens if my Sentry free tier is exceeded?
The Sentry API still responds but event data may be sampled or capped. OpenClaw will report whatever data the API returns. If you're hitting Sentry's free tier limits, the agent config has a note_if_data_may_be_sampled: true option that adds a warning to the output.