Independent community resource — not affiliated with the official OpenClaw project.
Part 4 of 5: OpenClaw for Developers

CI/CD Pipeline Monitoring with OpenClaw

Build health use cases

Build pipelines are where code quality lives or dies. A single failed test or a flaky deployment can cascade into hours of debugging, yet teams often notice these failures only after rollback requests or escalations. OpenClaw watches your CI/CD pipelines and surfaces what matters: failed builds, tests that flip pass/fail, deployments lagging behind, and error spikes after deploys.

The six most common use cases are:

- 🔴 Build failure detection: alert when any workflow run fails, with a summary of which step failed and why
- 🔍 Failure root cause: extract the failing step's log lines to surface the actual error message
- 🎲 Flaky test identification: find test names that alternate pass/fail across recent runs (not just always failing)
- 🚀 Deployment tracking: know when each environment was last deployed and from which commit
- Rollback trigger alerts: flag when error rates or failed runs spike after a deploy, suggesting a rollback
- ⏱️ Build time regression: alert when a workflow starts taking significantly longer than its baseline
💡 This guide assumes: You have a working OpenClaw instance (see Learn for setup) and completed Part 1 to set up a GitHub personal access token. If not, go there first.

GitHub Actions API setup

GitHub Actions publishes workflow run metadata and logs to the REST API. The good news: you already have what you need if you completed Part 1. The personal access token you created then works here too — it just needs an Actions: read scope, which is usually already included in fine-grained tokens configured with read access to your repositories.

Verify your token has the right scopes at GitHub → Settings → Developer settings → Personal access tokens → Fine-grained tokens. Look for Actions: read under "Repository permissions". If it's not there, regenerate your token or create a new one with that scope added.
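You can also sanity-check the token from the command line before wiring up any agents, by requesting one workflow run from GitHub's REST API (`GET /repos/{owner}/{repo}/actions/runs`). The endpoint and headers below are GitHub's documented API; `runs_request` itself is a hypothetical helper, not part of OpenClaw:

```python
import urllib.request

API = "https://api.github.com"

def runs_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build a GET request for recent workflow runs. Sending it and
    receiving a 200 confirms the token carries Actions: read; a 401/403
    means the scope is missing or the token is invalid."""
    url = f"{API}/repos/{owner}/{repo}/actions/runs?per_page=1"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })

# To actually send it: urllib.request.urlopen(runs_request(...))
# raises HTTPError on 401/403, which is your signal to fix the scope.
```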

Configuration in secrets.env

Add these environment variables to your secrets.env file. Most are the same as Part 1; only the workflow list is new:

# Reuse from Part 1
GITHUB_TOKEN=github_pat_your_fine_grained_token_here
GITHUB_OWNER=your-github-org-or-username
MONITORED_REPOS=repo1,repo2,repo3

# New for CI/CD monitoring
GITHUB_ACTIONS_WORKFLOW_IDS=build.yml,deploy.yml,test.yml

GITHUB_ACTIONS_WORKFLOW_IDS is a comma-separated list of workflow file names you want to monitor. Typically this includes build.yml, test.yml, and your deployment workflows. OpenClaw will read runs from all listed workflows across all MONITORED_REPOS.
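The "all listed workflows across all repos" expansion amounts to a cross product of the two comma-separated lists. A minimal sketch of that parsing (a hypothetical helper, not OpenClaw internals):

```python
import itertools

def monitoring_targets(repos_csv: str, workflows_csv: str) -> list[tuple[str, str]]:
    """Expand MONITORED_REPOS x GITHUB_ACTIONS_WORKFLOW_IDS into
    (repo, workflow_file) pairs, tolerating stray whitespace and blanks."""
    repos = [r.strip() for r in repos_csv.split(",") if r.strip()]
    workflows = [w.strip() for w in workflows_csv.split(",") if w.strip()]
    return list(itertools.product(repos, workflows))
```

Three repos and three workflows therefore means nine (repo, workflow) streams to poll, which is worth keeping in mind for API rate limits.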

Detecting and summarising build failures

The build-failure-detector agent reads workflow runs from the past 24 hours (or a time window you set), identifies which ones failed, extracts the failing step, and pulls the relevant error log lines.

Add this agent to your AGENTS.md:

agents:
  build-failure-detector:
    description: "Detect failed GitHub Actions workflow runs and extract failure details"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      workflows: "${GITHUB_ACTIONS_WORKFLOW_IDS}"
      time_window_hours: 24
      branch_filter:
        - main
        - develop
        - "release/*"
      ignore_branches:
        - "dependabot/*"
    analysis:
      extract_failure_step: true
      extract_log_lines: 20
      classify_failure_type:
        patterns:
          - keyword: "ENOENT"
            label: "File not found"
          - keyword: "npm ERR"
            label: "npm install failure"
          - keyword: "Error: Process completed with exit code 1"
            label: "Script exit error"
          - keyword: "Out of memory"
            label: "Memory limit exceeded"
    output:
      format: markdown
      include_fields: [workflow_name, branch, run_id, triggered_by, failed_step, failure_type, log_excerpt, run_url]
      sort_by: created_at_desc

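The classify_failure_type block above is a first-match keyword scan over the extracted log excerpt. A sketch of that logic (hypothetical helper, mirroring the patterns in the config; not OpenClaw's actual implementation):

```python
# Same keyword -> label pairs as classify_failure_type above.
PATTERNS = [
    ("ENOENT", "File not found"),
    ("npm ERR", "npm install failure"),
    ("Error: Process completed with exit code 1", "Script exit error"),
    ("Out of memory", "Memory limit exceeded"),
]

def classify_failure(log_excerpt: str) -> str:
    """Return the label of the first matching keyword. First match wins,
    so order patterns from most specific to least specific."""
    for keyword, label in PATTERNS:
        if keyword in log_excerpt:
            return label
    return "Unclassified"
```

Note the ordering matters: a log line like `npm ERR! code ENOENT` matches both the first and second patterns, and is labelled by whichever comes first in the list.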

Identifying flaky tests

A flaky test is one that sometimes passes and sometimes fails, even on the same code. They are different from broken tests (always fail) or healthy tests (always pass). Flaky tests destroy CI trust and slow down deployments because engineers run the pipeline multiple times hoping for green.

The flaky-test-finder reads GitHub check annotations (test result metadata) from the last N workflow runs, tracks which test names appear in both passed and failed runs, and flags any that flip back and forth more than a threshold you set.
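The core of that classification is counting "flip events", i.e. transitions between pass and fail across consecutive runs. A minimal sketch of the rule under the definition above (hypothetical helpers, not OpenClaw internals):

```python
def flip_count(results: list[str]) -> int:
    """Count pass<->fail transitions across consecutive runs."""
    return sum(1 for a, b in zip(results, results[1:]) if a != b)

def is_flaky(results: list[str], min_alternations: int = 3) -> bool:
    """A test is flaky only if it was seen in BOTH states AND flipped at
    least min_alternations times. A test that always fails is broken,
    not flaky; a test that always passes is healthy."""
    return (
        "pass" in results
        and "fail" in results
        and flip_count(results) >= min_alternations
    )
```

This is why `min_alternations` matters: a test that failed once during a genuine outage and then recovered flips only twice, and should not be flagged.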

Add this to AGENTS.md:

agents:
  flaky-test-finder:
    description: "Identify test names that alternate pass/fail across recent CI runs"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      runs_to_analyse: 20
      min_alternations: 3
      annotation_api: true
    analysis:
      flaky_definition: |
        A test is flaky if it appeared in both passed and failed states across the
        analysed run window, with at least min_alternations flip events.
        Consistently failing tests are NOT flaky — they are broken.
        Only tests that sometimes pass are candidates.
      include_failure_rate: true
    output:
      format: markdown_table
      columns: [test_name, workflow, failure_rate_pct, total_runs, last_failed, likely_flaky]
      sort_by: failure_rate_pct
      callout: |
        Flaky tests degrade CI trust and slow down development. Consider:
        - Adding retry logic for network-dependent tests
        - Moving slow integration tests to a separate workflow
        - Quarantining confirmed flaky tests with a skip label until fixed
    schedule: weekly


Tracking deployment status

Know when each environment was deployed and flag gaps. This agent reads successful deployment workflow runs and outputs the latest deploy time, commit, and triggered-by user for each environment.
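The staleness check itself is a simple threshold comparison per environment. A sketch, where `STALE_AFTER` mirrors the `stale_deployment_alert_hours` values used in the config (hypothetical helper, not OpenClaw internals):

```python
from datetime import datetime, timedelta, timezone

# Mirrors stale_deployment_alert_hours: production 168h, staging 48h.
STALE_AFTER = {"production": 168, "staging": 48}

def stale_environments(last_deploys: dict[str, datetime],
                       now: datetime) -> list[str]:
    """Return environments whose latest successful deploy is older than
    their threshold. Environments without a configured threshold
    (e.g. preview) are never flagged."""
    stale = []
    for env, deployed_at in last_deploys.items():
        hours = STALE_AFTER.get(env)
        if hours is not None and now - deployed_at > timedelta(hours=hours):
            stale.append(env)
    return stale
```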

Add to AGENTS.md:

agents:
  deployment-tracker:
    description: "Track latest deployment per environment and flag deployment gaps"
    tools:
      - github-api
    config:
      owner: "${GITHUB_OWNER}"
      repos: "${MONITORED_REPOS}"
      token: "${GITHUB_TOKEN}"
      environment_workflows:
        production: "deploy-prod.yml"
        staging: "deploy-staging.yml"
        preview: "deploy-preview.yml"
      stale_deployment_alert_hours:
        production: 168
        staging: 48
    output:
      format: markdown
      include_fields: [environment, last_deployed_at, commit_sha, commit_message, triggered_by, workflow_run_url]
      note: "Deployment data sourced from successful workflow runs only."


Rollback trigger alerts

Imagine this scenario: you deploy at 2 PM, and at 2:15 PM your error rate on Sentry jumps from 5 errors/hour to 50 errors/hour. That spike is a signal to investigate the deploy immediately — possibly to roll it back.

The rollback-alerter watches your production deployments and compares error rates (from Sentry) before and after the deploy. If errors spike by a threshold you set (default: 3×), it flags the deploy as a possible rollback candidate and surfaces the top new error types.
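The spike test reduces to comparing the two rates against the multiplier. A minimal sketch of that comparison (hypothetical helper; the zero-baseline handling is our own assumption, since a ratio against zero is undefined):

```python
def rollback_candidate(pre_rate: float, post_rate: float,
                       multiplier: float = 3.0) -> bool:
    """Flag the deploy when the post-deploy error rate is at least
    `multiplier` times the pre-deploy rate. A zero pre-deploy baseline
    is treated as a spike whenever any errors appear at all."""
    if pre_rate == 0:
        return post_rate > 0
    return post_rate / pre_rate >= multiplier
```

Under the default 3x threshold, the 5 errors/hour to 50 errors/hour scenario above is flagged, while a 5 to 10 bump is not.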

Add to AGENTS.md:

agents:
  rollback-alerter:
    description: "Flag when error rates spike after a deploy, suggesting possible rollback"
    tools:
      - github-api
      - sentry-api
    config:
      github:
        owner: "${GITHUB_OWNER}"
        token: "${GITHUB_TOKEN}"
        deployment_workflow: "deploy-prod.yml"
      sentry:
        token: "${SENTRY_TOKEN}"
        organisation: "${SENTRY_ORG}"
        project: "${SENTRY_PROJECT}"
      spike_threshold_multiplier: 3
      window_minutes_post_deploy: 60
    prompt: |
      Check the last production deployment time. Compare Sentry error rates in the
      60 minutes before vs 60 minutes after. If error rate increased by more than
      spike_threshold_multiplier, output a ROLLBACK CONSIDERATION alert with:
      - Deploy time and commit
      - Pre-deploy error rate vs post-deploy error rate
      - Top 3 new or spiking error types
      - Recommendation: investigate or rollback
      If no spike: output "✓ No error spike detected after latest deploy."
⚠️ Important: This agent flags potential rollback candidates but does not trigger rollbacks automatically. Rollback is always a human decision. Use this to alert the on-call engineer quickly so they can investigate and decide.


CircleCI alternative

If you use CircleCI instead of GitHub Actions, the same logic applies — only the API and config differ. CircleCI API v2 requires a personal API token from User Settings → Personal API Tokens. Store it as CIRCLECI_TOKEN in secrets.env.

Example agent config for CircleCI:

agents:
  circleci-build-detector:
    description: "Detect failed CircleCI pipeline runs"
    tools:
      - circleci-api
    config:
      token: "${CIRCLECI_TOKEN}"
      organisation: "your-circleci-org"
      projects: ["project-a", "project-b"]
      time_window_hours: 24
    analysis:
      extract_failure_step: true
      extract_log_lines: 20
    output:
      format: markdown
      include_fields: [project, pipeline_id, failed_job, branch, created_at, workflow_url]

Note: CircleCI has a built-in flaky test detection feature in its test insights endpoint. If you use CircleCI, check Insights → Test Insights in the web UI first; you may not need a separate flaky-test-finder agent.
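If you do roll your own check against CircleCI, API v2 addresses projects by slug ({vcs}/{org}/{project}) and reports per-workflow status via GET /api/v2/pipeline/{id}/workflow. A sketch of the failure filter over that response (both helpers are hypothetical; the endpoint and slug format are from CircleCI's API v2 as we understand it):

```python
def project_slug(org: str, project: str, vcs: str = "gh") -> str:
    """CircleCI project slugs take the form {vcs}/{org}/{project}."""
    return f"{vcs}/{org}/{project}"

def failed_workflows(workflow_items: list[dict]) -> list[dict]:
    """Filter the `items` array of a pipeline's workflow listing
    down to runs whose status is "failed"."""
    return [w for w in workflow_items if w.get("status") == "failed"]
```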

HEARTBEAT templates

Three common schedules for build health monitoring:

Post-deploy check (webhook-triggered)

Runs 60 minutes after a production deploy to check for error spikes:

schedules:
  post-deploy-rollback-check:
    trigger: webhook
    agents: [rollback-alerter]
    delay_minutes: 60

Add this webhook call to the end of your deploy-prod.yml workflow:

- name: Check for rollback need
  # Expose the token to the step from repository secrets; a bare
  # ${OPENCLAW_TOKEN} would otherwise be empty at run time.
  env:
    OPENCLAW_TOKEN: ${{ secrets.OPENCLAW_TOKEN }}
  run: |
    curl -X POST https://your-openclaw-instance.com/webhook \
      -H "Authorization: Bearer ${OPENCLAW_TOKEN}" \
      -H "Content-Type: application/json" \
      -d '{"schedule": "post-deploy-rollback-check"}'

Daily build health (weekday mornings)

Runs Monday–Friday at 8 AM to summarise the last 24 hours of builds:

schedules:
  daily-build-health:
    cron: "0 8 * * 1-5"
    agents: [build-failure-detector, deployment-tracker]

Weekly flaky test report (Monday morning)

Runs every Monday at 9 AM to surface the week's flakiest tests:

schedules:
  weekly-flaky-tests:
    cron: "0 9 * * 1"
    agents: [flaky-test-finder]
    output_destination: slack
    channel: "#engineering"

Sample report walkthrough

Here's what a daily build health report looks like. This is a real-world example generated by build-failure-detector, flaky-test-finder, and deployment-tracker running together:

📋 Daily Build Health — March 25, 2026
🔴 Failed Builds (last 24h)
- 1 failed run on branch main
  Workflow: CI / test
  Failed step: Run unit tests
  Failure type: npm install failure
  Error excerpt:
    npm ERR! code ENOENT
    npm ERR! syscall rename
    npm ERR! path /home/runner/node_modules/.staging/@types/node-12345

🎲 Flaky Tests (last 20 runs)
- test/auth.spec.js › should validate password — failure rate 35%
- test/api.spec.js › should handle concurrent requests — failure rate 22%

🚀 Deployments
- production: deployed 4 hours ago (commit abc123f)
- staging: deployed 12 hours ago (commit def456a)

Notice what the report includes: which step failed (not just "the build failed"), the actual error type, and the context needed to act (branch, commit, etc.). And notice what it doesn't: false alarms from dependabot, noise from PR branches, or tests that always fail.

Frequently asked questions

Can OpenClaw trigger a rollback automatically?

No. OpenClaw identifies the conditions that suggest a rollback may be needed and flags them in the report. Triggering an actual rollback requires a write-enabled token and explicit configuration — and is a high-stakes operation that should always involve human judgement. The recommended pattern is to alert on-call and let them decide.

How does flaky test detection work?

OpenClaw reads GitHub check annotations (test results attached to workflow runs) across the last N runs and finds test names that appear in both passed and failed states. It counts alternations — how many times the result flipped — and flags tests that flip more than your configured threshold. Tests that always fail are not classified as flaky.

Does this work with GitLab CI or Jenkins?

GitLab CI has an API similar to GitHub Actions and can be adapted with the same approach using the GitLab API base URL and token. Jenkins has a REST API but the structure varies significantly by plugin. GitLab CI is straightforward to adapt; Jenkins requires more custom config.
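For the GitLab case, the equivalent of the workflow-runs query is GET /api/v4/projects/:id/pipelines?status=failed, authenticated with a PRIVATE-TOKEN header, with the project path URL-encoded in place of a numeric id. A sketch of building that URL (hypothetical helper; the endpoint shape is GitLab's documented API):

```python
import urllib.parse

def failed_pipelines_url(base_url: str, project_path: str) -> str:
    """List failed pipelines for a GitLab project. When addressing a
    project by path rather than numeric id, the path must be
    URL-encoded so the slash becomes %2F."""
    project = urllib.parse.quote(project_path, safe="")
    return f"{base_url}/api/v4/projects/{project}/pipelines?status=failed"
```

Point `base_url` at your self-hosted instance if you are not on gitlab.com.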

How do I connect the rollback alerter to my deployment workflow?

Add a workflow_dispatch trigger or a webhook call to OpenClaw at the end of your deploy workflow. The rollback-alerter then uses the deploy timestamp from the GitHub API to anchor its before/after error rate comparison.