Build health use cases
Build pipelines are where code quality lives or dies. A single failed test or a flaky deployment can cascade into hours of debugging, yet teams often notice them only after rollback requests or escalations. OpenClaw watches your CI/CD pipelines and surfaces what matters: failed builds, tests that flip pass/fail, deployments lagging behind, and error spikes after deploys.
The six most common use cases, covered in order below:

- Detecting and summarising build failures
- Identifying flaky tests
- Tracking deployment status
- Rollback trigger alerts
- CircleCI build monitoring (for teams not on GitHub Actions)
- Scheduled build health reports (HEARTBEAT templates)
GitHub Actions API setup
GitHub Actions publishes workflow run metadata and logs to the REST API. The good news: you already have what you need if you completed Part 1. The personal access token you created then works here too — it just needs the Actions: read permission, which is usually already included in fine-grained tokens configured with read access to your repositories.
Verify your token's permissions at GitHub → Settings → Developer settings → Personal access tokens → Fine-grained tokens. Look for Actions: read under "Repository permissions". If it's not there, regenerate your token or create a new one with that permission added.
Configuration in secrets.env
Add these environment variables to your secrets.env file. Most are the same as Part 1; only the workflow list is new:
```
# Reuse from Part 1
GITHUB_TOKEN=github_pat_your_fine_grained_token_here
GITHUB_OWNER=your-github-org-or-username
MONITORED_REPOS=repo1,repo2,repo3

# New for CI/CD monitoring
GITHUB_ACTIONS_WORKFLOW_IDS=build.yml,deploy.yml,test.yml
```
GITHUB_ACTIONS_WORKFLOW_IDS is a comma-separated list of workflow file names you want to monitor. Typically this includes build.yml, test.yml, and your deployment workflows. OpenClaw will read runs from all listed workflows across all MONITORED_REPOS.
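Each (repo, workflow) pair maps to one GitHub REST call: `GET /repos/{owner}/{repo}/actions/workflows/{workflow_id}/runs`, where the workflow file name is accepted as `{workflow_id}`. A sketch of how the comma-separated variables expand into request URLs (function name assumed, not part of OpenClaw):

```python
# Expand the comma-separated config values into one runs-listing URL per
# (repo, workflow) pair. GitHub accepts the workflow file name as workflow_id.
def run_listing_urls(owner: str, repos: str, workflows: str) -> list[str]:
    urls = []
    for repo in (r.strip() for r in repos.split(",")):
        for wf in (w.strip() for w in workflows.split(",")):
            urls.append(
                f"https://api.github.com/repos/{owner}/{repo}"
                f"/actions/workflows/{wf}/runs"
            )
    return urls
```

With three repos and three workflows, that is nine listing calls per cycle, which is worth keeping in mind against the authenticated rate limit of 5,000 requests/hour.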
Detecting and summarising build failures
The build-failure-detector agent reads workflow runs from the past 24 hours (or a time window you set), identifies which ones failed, extracts the failing step, and pulls the relevant error log lines.
Add this agent to your AGENTS.md:
```yaml
agents:
  build-failure-detector:
    description: "Detect failed GitHub Actions workflow runs and extract failure details"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      workflows: "${GITHUB_ACTIONS_WORKFLOW_IDS}"
      time_window_hours: 24
      branch_filter:
        - main
        - develop
        - "release/*"
      ignore_branches:
        - "dependabot/*"
    analysis:
      extract_failure_step: true
      extract_log_lines: 20
      classify_failure_type:
        patterns:
          - keyword: "ENOENT"
            label: "File not found"
          - keyword: "npm ERR"
            label: "npm install failure"
          - keyword: "Error: Process completed with exit code 1"
            label: "Script exit error"
          - keyword: "Out of memory"
            label: "Memory limit exceeded"
    output:
      format: markdown
      include_fields: [workflow_name, branch, run_id, triggered_by, failed_step, failure_type, log_excerpt, run_url]
      sort_by: created_at_desc
```
What this agent does
- `time_window_hours: 24` — Looks back 24 hours for failed runs. Adjust to 48 or 72 for a broader window.
- `branch_filter` — Only alerts on failures in these branches (usually main, develop, release branches). Adjust to match your branch strategy.
- `ignore_branches` — Skips dependabot PRs and other noisy branches that often fail for lockfile reasons.
- `extract_log_lines: 20` — Pulls the last 20 lines of the failing step's log. This usually contains the actual error message.
- `classify_failure_type` — Pattern matching to label common failure types (npm install, out of memory, etc.). Add or update patterns based on your own failure modes.
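The keyword patterns amount to a first-match scan over the log excerpt; ordering matters, since the first keyword found wins. A minimal sketch of that classification step (illustrative, not OpenClaw's actual code):

```python
# Keyword-based failure classification, mirroring the classify_failure_type
# patterns in the config above. First matching keyword wins.
PATTERNS = [
    ("ENOENT", "File not found"),
    ("npm ERR", "npm install failure"),
    ("Error: Process completed with exit code 1", "Script exit error"),
    ("Out of memory", "Memory limit exceeded"),
]

def classify_failure(log_excerpt: str) -> str:
    """Return the label of the first matching keyword, or a fallback."""
    for keyword, label in PATTERNS:
        if keyword in log_excerpt:
            return label
    return "Unclassified"
```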
Identifying flaky tests
A flaky test is one that sometimes passes and sometimes fails, even on the same code. They are different from broken tests (always fail) or healthy tests (always pass). Flaky tests destroy CI trust and slow down deployments because engineers run the pipeline multiple times hoping for green.
The flaky-test-finder reads GitHub check annotations (test result metadata) from the last N workflow runs, tracks which test names appear in both passed and failed runs, and flags any that flip back and forth more than a threshold you set.
Add this to AGENTS.md:
```yaml
agents:
  flaky-test-finder:
    description: "Identify test names that alternate pass/fail across recent CI runs"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      runs_to_analyse: 20
      min_alternations: 3
      annotation_api: true
    analysis:
      flaky_definition: |
        A test is flaky if it appeared in both passed and failed states across the
        analysed run window, with at least min_alternations flip events.
        Consistently failing tests are NOT flaky — they are broken.
        Only tests that sometimes pass are candidates.
      include_failure_rate: true
    output:
      format: markdown_table
      columns: [test_name, workflow, failure_rate_pct, total_runs, last_failed, likely_flaky]
      sort_by: failure_rate_pct
      callout: |
        Flaky tests degrade CI trust and slow down development. Consider:
        - Adding retry logic for network-dependent tests
        - Moving slow integration tests to a separate workflow
        - Quarantining confirmed flaky tests with a skip label until fixed
    schedule: weekly
```
Key settings
- `runs_to_analyse: 20` — Examines the last 20 runs per workflow. More runs = higher confidence but slower analysis. For frequently-run tests, 20 is reasonable. For nightly tests, increase to 30 or 50.
- `min_alternations: 3` — Only flags tests that flip pass/fail at least 3 times. This filters out one-off failures. Adjust up if you want only the most obviously flaky tests.
- `annotation_api: true` — Uses GitHub check annotations to extract test names. Works with most testing frameworks (Jest, pytest, Go test, etc.) as long as they post annotations.
- `failure_rate_pct` — The output includes the percentage of runs where the test failed, helping you prioritize which flaky tests to fix first.
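The flip-counting logic can be sketched in a few lines (illustrative, not OpenClaw's actual implementation):

```python
# Flakiness by alternation count: how often did consecutive runs disagree?
# results: chronological pass/fail outcomes for one test, True = passed.
def count_alternations(results: list[bool]) -> int:
    """Number of times the outcome flipped between consecutive runs."""
    return sum(1 for a, b in zip(results, results[1:]) if a != b)

def is_flaky(results: list[bool], min_alternations: int = 3) -> bool:
    # A test that never passes is broken, not flaky.
    if not any(results):
        return False
    return count_alternations(results) >= min_alternations
```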
Tracking deployment status
Know when each environment was deployed and flag gaps. This agent reads successful deployment workflow runs and outputs the latest deploy time, commit, and triggered-by user for each environment.
Add to AGENTS.md:
```yaml
agents:
  deployment-tracker:
    description: "Track latest deployment per environment and flag deployment gaps"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      environment_workflows:
        production: "deploy-prod.yml"
        staging: "deploy-staging.yml"
        preview: "deploy-preview.yml"
      stale_deployment_alert_hours:
        production: 168
        staging: 48
    output:
      format: markdown
      include_fields: [environment, last_deployed_at, commit_sha, commit_message, triggered_by, workflow_run_url]
      note: "Deployment data sourced from successful workflow runs only."
```
Configuration details
- `environment_workflows` — Maps environment names to the workflow files that deploy to them. Adjust to match your deployment naming (e.g., `deploy-aws-prod.yml`).
- `stale_deployment_alert_hours` — Flags an environment if it hasn't been deployed in this many hours. For production, 168 hours (7 days) is typical; staging can be 48 hours (2 days).
- This agent only counts successful workflow runs, so failed deploy attempts are not counted as deployments.
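The staleness rule reduces to a timestamp comparison per environment; a minimal sketch, with thresholds mirroring the config above (function and variable names assumed):

```python
# Flag environments whose last successful deploy is older than the threshold.
from datetime import datetime, timedelta

STALE_AFTER_HOURS = {"production": 168, "staging": 48}

def stale_environments(last_deploys: dict[str, datetime],
                       now: datetime) -> list[str]:
    """Environment names whose last deploy exceeds their staleness threshold."""
    stale = []
    for env, deployed_at in last_deploys.items():
        limit = STALE_AFTER_HOURS.get(env)
        if limit is not None and now - deployed_at > timedelta(hours=limit):
            stale.append(env)
    return stale
```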
Rollback trigger alerts
Imagine this scenario: you deploy at 2 PM, and at 2:15 PM your error rate on Sentry jumps from 5 errors/hour to 50 errors/hour. That spike is a signal to investigate the deploy immediately — possibly to roll it back.
The rollback-alerter watches your production deployments and compares error rates (from Sentry) before and after the deploy. If errors spike by a threshold you set (default: 3×), it flags the deploy as a possible rollback candidate and surfaces the top new error types.
Add to AGENTS.md:
```yaml
agents:
  rollback-alerter:
    description: "Flag when error rates spike after a deploy, suggesting possible rollback"
    tools:
      - github-api
      - sentry-api
    config:
      github:
        owner: "${GITHUB_OWNER}"
        token: "${GITHUB_TOKEN}"
        deployment_workflow: "deploy-prod.yml"
      sentry:
        token: "${SENTRY_TOKEN}"
        organisation: "${SENTRY_ORG}"
        project: "${SENTRY_PROJECT}"
      spike_threshold_multiplier: 3
      window_minutes_post_deploy: 60
    prompt: |
      Check the last production deployment time. Compare Sentry error rates in the
      60 minutes before vs 60 minutes after. If error rate increased by more than
      spike_threshold_multiplier, output a ROLLBACK CONSIDERATION alert with:
      - Deploy time and commit
      - Pre-deploy error rate vs post-deploy error rate
      - Top 3 new or spiking error types
      - Recommendation: investigate or rollback
      If no spike: output "✓ No error spike detected after latest deploy."
```
Setup notes
- You need both `GITHUB_TOKEN` and `SENTRY_TOKEN` in `secrets.env`.
- `spike_threshold_multiplier: 3` — Means errors must triple (3×) before flagging. Adjust down to 2 for more sensitivity, or up to 5 for less noise.
- `window_minutes_post_deploy: 60` — Checks the 60 minutes after deploy. For rapid-iteration teams, 30 minutes may be enough. For services with slow user paths, increase to 120.
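The before/after comparison reduces to a ratio test; a minimal sketch (names assumed, not OpenClaw internals):

```python
# Spike check: did the post-deploy error rate exceed the pre-deploy rate
# by more than the configured multiplier? Rates are errors per hour.
def is_spike(pre_rate: float, post_rate: float,
             threshold_multiplier: float = 3.0) -> bool:
    if pre_rate == 0:
        # Any errors after a clean baseline count as a spike.
        return post_rate > 0
    return post_rate / pre_rate > threshold_multiplier
```

With the scenario from above (5 errors/hour before, 50 after), the ratio is 10×, well over the default threshold of 3.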
CircleCI alternative
If you use CircleCI instead of GitHub Actions, the same logic applies — only the API and config differ. CircleCI API v2 requires a personal API token from User Settings → Personal API Tokens. Store it as CIRCLECI_TOKEN in secrets.env.
Example agent config for CircleCI:
```yaml
agents:
  circleci-build-detector:
    description: "Detect failed CircleCI pipeline runs"
    tools:
      - circleci-api
    config:
      token: "${CIRCLECI_TOKEN}"
      organisation: "your-circleci-org"
      projects: ["project-a", "project-b"]
      time_window_hours: 24
    analysis:
      extract_failure_step: true
      extract_log_lines: 20
    output:
      format: markdown
      include_fields: [project, pipeline_id, failed_job, branch, created_at, workflow_url]
```
Note: CircleCI has a built-in flaky test detection feature in its test insights endpoint. If you use CircleCI, check Insights → Test Insights in the web UI first; you may not need a separate flaky-test-finder agent.
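If you script against CircleCI directly, the failure filter itself is simple. Assuming the response shape of API v2's workflow listing (`GET /api/v2/pipeline/{pipeline-id}/workflow`, which returns an `items` array whose entries carry `name` and `status` fields), a sketch:

```python
# Filter failed workflows out of a CircleCI API v2 workflow listing.
# The response dict is the parsed JSON body of the workflow endpoint.
def failed_workflows(response: dict) -> list[str]:
    """Names of workflows whose status is 'failed'."""
    return [wf["name"] for wf in response.get("items", [])
            if wf.get("status") == "failed"]
```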
HEARTBEAT templates
Three common schedules for build health monitoring:
Post-deploy check (webhook-triggered)
Runs 60 minutes after a production deploy to check for error spikes:
```yaml
schedules:
  post-deploy-rollback-check:
    trigger: webhook
    agents: [rollback-alerter]
    delay_minutes: 60
```
Add this webhook call to the end of your deploy-prod.yml workflow:
```yaml
- name: Check for rollback need
  env:
    OPENCLAW_TOKEN: ${{ secrets.OPENCLAW_TOKEN }}
  run: |
    curl -X POST https://your-openclaw-instance.com/webhook \
      -H "Authorization: Bearer ${OPENCLAW_TOKEN}" \
      -d '{"schedule": "post-deploy-rollback-check"}'
```
Daily build health (weekday mornings)
Runs Monday–Friday at 8 AM to summarise the last 24 hours of builds:
```yaml
schedules:
  daily-build-health:
    cron: "0 8 * * 1-5"
    agents: [build-failure-detector, deployment-tracker]
```
Weekly flaky test report (Monday morning)
Runs every Monday at 9 AM to surface the week's flakiest tests:
```yaml
schedules:
  weekly-flaky-tests:
    cron: "0 9 * * 1"
    agents: [flaky-test-finder]
    output_destination: slack
    channel: "#engineering"
```
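Both schedules use standard five-field cron syntax (minute, hour, day-of-month, month, day-of-week; weekdays run Sun=0 through Sat=6). A minimal matcher for just the field shapes used here, as a sketch of how such expressions are interpreted (not OpenClaw's actual parser):

```python
# Match a datetime against a five-field cron expression. Handles only the
# shapes used above: plain numbers, "*", and N-M ranges; day-of-month and
# month fields are assumed to be "*".
from datetime import datetime

def cron_matches(expr: str, when: datetime) -> bool:
    minute, hour, _dom, _month, dow = expr.split()
    # Cron weekdays: Sun=0..Sat=6; isoweekday() gives Mon=1..Sun=7.
    values = (when.minute, when.hour, when.isoweekday() % 7)
    for field, value in zip((minute, hour, dow), values):
        if field == "*":
            continue
        if "-" in field:
            lo, hi = map(int, field.split("-"))
            if not lo <= value <= hi:
                return False
        elif int(field) != value:
            return False
    return True
```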
Sample report walkthrough
Here's what a daily build health report looks like, generated by build-failure-detector, flaky-test-finder, and deployment-tracker running together. A typical failure entry carries a log excerpt like this:

```
npm ERR! syscall rename
npm ERR! path /home/runner/node_modules/.staging/@types/node-12345
```
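The surrounding report follows the fields configured in each agent's `output` section; a sketch of the shape (values, repo, user, and test names here are illustrative, not real output):

```markdown
## Build health — last 24 hours

### Failed builds (1)
- **build.yml** on `main` — failed at step `npm install`
  - Failure type: npm install failure
  - Triggered by: @alice

### Flaky test candidates (1)
| test_name | failure_rate_pct | total_runs | likely_flaky |
|---|---|---|---|
| checkout flow > applies discount | 25 | 20 | yes |

### Deployments
- production: deployed 6h ago by @bob
- staging: last deployed 52h ago — exceeds 48h threshold
```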
Notice what the report includes: which step failed (not just "the build failed"), the actual error type, and the context needed to act (branch, commit, etc.). And notice what it doesn't: false alarms from dependabot, noise from PR branches, or tests that always fail.
Frequently asked questions
Can OpenClaw trigger a rollback automatically?
No. OpenClaw identifies the conditions that suggest a rollback may be needed and flags them in the report. Triggering an actual rollback requires a write-enabled token and explicit configuration — and is a high-stakes operation that should always involve human judgement. The recommended pattern is to alert on-call and let them decide.
How does flaky test detection work?
OpenClaw reads GitHub check annotations (test results attached to workflow runs) across the last N runs and finds test names that appear in both passed and failed states. It counts alternations — how many times the result flipped — and flags tests that flip more than your configured threshold. Tests that always fail are not classified as flaky.
Does this work with GitLab CI or Jenkins?
GitLab CI has an API similar to GitHub Actions and can be adapted with the same approach using the GitLab API base URL and token. Jenkins has a REST API but the structure varies significantly by plugin. GitLab CI is straightforward to adapt; Jenkins requires more custom config.
How do I connect the rollback alerter to my deployment workflow?
Add a workflow_dispatch trigger or a webhook call to OpenClaw at the end of your deploy workflow. The rollback-alerter then uses the deploy timestamp from the GitHub API to anchor its before/after error rate comparison.