ℹ️ Informational only. Transcript sources and audio APIs change over time. Always verify data independently. Nothing here is investment advice.
🎙️ TRANSCRIPTS & AUDIO · PART 3 OF 5

Audio Transcription Pipeline

Why audio transcription matters

Not everything gets transcribed. Investor days, analyst days, roadshows, and some international earnings calls are published as audio or video only. Whisper ($0.006/min) turns any MP3 or MP4 into searchable, analyzable text. A 90-minute investor day costs about $0.54 to transcribe.

Cost breakdown

| Event type | Typical length | Whisper cost | Notes |
| --- | --- | --- | --- |
| Earnings call | 45–75 min | $0.27–$0.45 | Often transcribed free elsewhere |
| Investor day | 2–6 hours | $0.72–$2.16 | Rarely transcribed free |
| Analyst day presentation | 30–60 min | $0.18–$0.36 | Often not transcribed |
| Fed press conference | 45–60 min | $0.27–$0.36 | Fed publishes free transcript |
| Conference presentation | 20–40 min | $0.12–$0.24 | Almost never transcribed |
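Every figure in the table follows from the same arithmetic. A minimal sketch using the $0.006/min rate quoted above (the helper name is illustrative):

```python
# Illustrative helper using the $0.006/min Whisper rate quoted above.
WHISPER_RATE_PER_MIN = 0.006

def estimate_whisper_cost(duration_minutes: float) -> float:
    """Approximate Whisper transcription cost in USD for a given duration."""
    return round(duration_minutes * WHISPER_RATE_PER_MIN, 2)
```

For example, a 90-minute investor day comes out to estimate_whisper_cost(90) = $0.54, and a 6-hour one to estimate_whisper_cost(360) = $2.16.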

Pipeline architecture

IR page / audio URL → OpenClaw detects new file → download audio → chunk if >25MB → Whisper API → clean text → LLM analysis → notify

HEARTBEAT configuration

name: audio_transcription_pipeline
schedule: "0 9 * * 1-5"
steps:
  - check_ir_pages:
      companies:
        - name: "Nvidia"
          ir_url: "https://investor.nvidia.com/events-and-presentations/events/default.aspx"
          audio_patterns: ["*.mp3", "*.mp4", "*.m4a", "webcast"]
        - name: "Tesla"
          ir_url: "https://ir.tesla.com/events-and-presentations"
          audio_patterns: ["*.mp3", "*.mp4", "webcast"]
  - check_seen:
      dedup_key: audio_url
      store: seen_audio.json
  - download_audio:
      max_file_size_mb: 200
      format_preference: ["mp3", "m4a", "mp4"]
  - transcribe:
      provider: openai_whisper
      model: whisper-1
      language: en
      chunk_size_mb: 24
  - llm:
      prompt: |
        This is a transcript of {{ event_type }} for {{ company }}.
        Produce a structured brief:
        1. KEY ANNOUNCEMENTS: New products, partnerships, financial targets
        2. STRATEGIC DIRECTION: Where is management taking the company?
        3. MANAGEMENT TONE: Confident, cautious, or defensive?
        4. NOTABLE QUOTES: 3-4 direct quotes worth preserving
        5. FOLLOW-UP QUESTIONS: What would you ask in Q&A?
        Transcript: {{ transcript }}
  - save:
      path: "transcripts/{{ company }}_{{ date }}.txt"
  - notify:
      subject: "🎙️ Transcribed: {{ company }} {{ event_type }} — {{ date }}"
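The check_seen step above can be implemented as a small JSON-backed dedup store. A minimal sketch, where filter_unseen is a hypothetical helper and seen_audio.json matches the store named in the config:

```python
import json
import os

def filter_unseen(audio_urls: list, store_path: str = "seen_audio.json") -> list:
    """Return only URLs not yet recorded in the dedup store, then record them."""
    seen = set()
    if os.path.exists(store_path):
        with open(store_path) as f:
            seen = set(json.load(f))
    # Keep only URLs we have not processed before
    new = [u for u in audio_urls if u not in seen]
    # Persist the updated set so the next run skips these URLs
    with open(store_path, "w") as f:
        json.dump(sorted(seen | set(new)), f)
    return new
```

On a second run with the same URLs, the helper returns an empty list, so no audio is re-downloaded or re-transcribed.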

Whisper API — Python snippet

import openai
import os
import math

def transcribe_audio(file_path: str, language: str = "en") -> str:
    """Transcribe audio file using OpenAI Whisper API."""
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
    if file_size_mb > 24:
        return transcribe_chunked(file_path, client, language)
    with open(file_path, "rb") as f:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            language=language,
            response_format="text"
        )
    return response

def transcribe_chunked(file_path: str, client, language: str = "en") -> str:
    """Split large audio files and transcribe in chunks."""
    import subprocess, tempfile
    duration_cmd = ["ffprobe", "-v", "quiet", "-show_entries",
                    "format=duration", "-of", "csv=p=0", file_path]
    duration = float(subprocess.check_output(duration_cmd))
    chunk_duration = 1400  # ~23-minute chunks; at 64 kbps each is ~11MB, under the 25MB limit
    num_chunks = math.ceil(duration / chunk_duration)
    transcripts = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(num_chunks):
            start = i * chunk_duration
            chunk_path = os.path.join(tmp, f"chunk_{i}.mp3")
            # Re-encode at a constant 64 kbps so chunks stay under the API's 25MB cap;
            # high-quality VBR (-q:a 0) can exceed it at this chunk length.
            subprocess.run(["ffmpeg", "-ss", str(start), "-t", str(chunk_duration),
                           "-i", file_path, "-b:a", "64k", chunk_path, "-y"], check=True)
            with open(chunk_path, "rb") as f:
                result = client.audio.transcriptions.create(
                    model="whisper-1", file=f, language=language, response_format="text")
            transcripts.append(result)
    return " ".join(transcripts)
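The "clean text" stage from the pipeline diagram is not shown above. One plausible minimal version, assuming simple whitespace and filler-word normalization (clean_transcript is a hypothetical name, and the filler list is an assumption to tune per source):

```python
import re

def clean_transcript(text: str) -> str:
    """Collapse whitespace and drop common filler tokens before LLM analysis."""
    # Normalize all runs of whitespace to single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Crude filler-word filter; tune the list per speaker and source
    text = re.sub(r"\b(?:um|uh|you know)\b,?\s*", "", text, flags=re.IGNORECASE)
    return text.strip()
```

Cleaning before the LLM step trims token count slightly and makes the NOTABLE QUOTES section of the brief read better.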

IR page audio detection — Python snippet

import httpx
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import re

def find_audio_links(ir_url: str) -> list[dict]:
    """Scrape an IR page for audio/video links."""
    r = httpx.get(ir_url, headers={"User-Agent": "TranscriptBot/1.0 contact@youremail.com"},
                  timeout=15, follow_redirects=True)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "html.parser")
    audio_extensions = re.compile(r'\.(mp3|mp4|m4a|wav|ogg|webm)(\?|$)', re.IGNORECASE)
    webcast_patterns = re.compile(r'webcast|listen|audio|replay', re.IGNORECASE)
    links = []
    for a in soup.find_all("a", href=True):
        # IR pages often use relative hrefs; resolve them against the page URL
        href = urljoin(ir_url, a["href"])
        if audio_extensions.search(href) or webcast_patterns.search(a.get_text()):
            links.append({"text": a.get_text(strip=True), "url": href})
    return links
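When a page exposes the same event in several formats, the config's format_preference setting implies picking one. A sketch of that selection under the same assumptions, with pick_preferred as an illustrative helper:

```python
from typing import Optional
from urllib.parse import urlparse

def pick_preferred(links: list, preference: tuple = ("mp3", "m4a", "mp4")) -> Optional[dict]:
    """Pick the best audio link by file-extension preference order."""
    def ext(url: str) -> str:
        # Strip any query string before checking the extension
        path = urlparse(url).path.lower()
        return path.rsplit(".", 1)[-1] if "." in path else ""
    for fmt in preference:
        for link in links:
            if ext(link["url"]) == fmt:
                return link
    return None
```

Given links to both an MP4 webcast and an MP3 replay, this returns the MP3, which is smaller and transcribes identically.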

Frequently asked questions

Q: Does Whisper support languages other than English?
A: Yes — Whisper supports 99 languages. Set the language parameter, or omit it to auto-detect the language.
Q: What's the maximum file size for Whisper?
A: 25MB per request. Use the chunked transcription function above for longer files. ffmpeg is required for chunking.
Q: How accurate is Whisper on financial terminology?
A: Very accurate on earnings calls. Occasional errors on proper names and ticker symbols — consider a post-processing glossary substitution step.
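The glossary substitution step mentioned above could look like the sketch below. The GLOSSARY entries are illustrative examples of plausible mishearings, not observed errors:

```python
import re

# Illustrative glossary: map plausible Whisper mishearings to canonical terms.
GLOSSARY = {
    "in video": "NVIDIA",
    "EBIT dah": "EBITDA",
}

def apply_glossary(text: str, glossary: dict = GLOSSARY) -> str:
    """Replace known mis-transcriptions with the correct financial terms."""
    for wrong, right in glossary.items():
        # Word boundaries avoid rewriting substrings inside other words
        text = re.sub(r"\b" + re.escape(wrong) + r"\b", right, text, flags=re.IGNORECASE)
    return text
```

Build the glossary incrementally: each time you spot a mangled ticker or product name in a transcript, add a mapping and rerun the substitution.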