Why audio transcription matters
Not everything gets transcribed. Investor days, analyst days, roadshows, and some international earnings calls are published as audio or video only. Whisper ($0.006/min) turns any MP3 or MP4 into searchable, analyzable text. A 90-minute investor day costs about $0.54 to transcribe.
Cost breakdown
| Event type | Typical length | Whisper cost | Notes |
|---|---|---|---|
| Earnings call | 45–75 min | $0.27–$0.45 | Often transcribed free elsewhere |
| Investor day | 2–6 hours | $0.72–$2.16 | Rarely transcribed free |
| Analyst day presentation | 30–60 min | $0.18–$0.36 | Often not transcribed |
| Fed press conference | 45–60 min | $0.27–$0.36 | Fed publishes free transcript |
| Conference presentation | 20–40 min | $0.12–$0.24 | Almost never transcribed |
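All of the figures above follow directly from Whisper's $0.006/min rate; a tiny helper (hypothetical, not part of the pipeline) makes the arithmetic explicit:

```python
WHISPER_RATE_PER_MIN = 0.006  # USD per minute, whisper-1 API pricing

def whisper_cost(minutes: float) -> float:
    """Estimated Whisper transcription cost for an audio file, in USD."""
    return round(minutes * WHISPER_RATE_PER_MIN, 2)

print(whisper_cost(90))   # 90-minute investor day → 0.54
print(whisper_cost(360))  # 6-hour investor day → 2.16
```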
Pipeline architecture
IR page / audio URL → OpenClaw detects new file → download audio → chunk if >25MB → Whisper API → clean text → LLM analysis → notify
HEARTBEAT configuration
name: audio_transcription_pipeline
schedule: "0 9 * * 1-5"
steps:
  - check_ir_pages:
      companies:
        - name: "Nvidia"
          ir_url: "https://investor.nvidia.com/events-and-presentations/events/default.aspx"
          audio_patterns: ["*.mp3", "*.mp4", "*.m4a", "webcast"]
        - name: "Tesla"
          ir_url: "https://ir.tesla.com/events-and-presentations"
          audio_patterns: ["*.mp3", "*.mp4", "webcast"]
  - check_seen:
      dedup_key: audio_url
      store: seen_audio.json
  - download_audio:
      max_file_size_mb: 200
      format_preference: ["mp3", "m4a", "mp4"]
  - transcribe:
      provider: openai_whisper
      model: whisper-1
      language: en
      chunk_size_mb: 24
  - llm:
      prompt: |
        This is a transcript of {{ event_type }} for {{ company }}.
        Produce a structured brief:
        1. KEY ANNOUNCEMENTS: New products, partnerships, financial targets
        2. STRATEGIC DIRECTION: Where is management taking the company?
        3. MANAGEMENT TONE: Confident, cautious, or defensive?
        4. NOTABLE QUOTES: 3-4 direct quotes worth preserving
        5. FOLLOW-UP QUESTIONS: What would you ask in Q&A?
        Transcript: {{ transcript }}
  - save:
      path: "transcripts/{{ company }}_{{ date }}.txt"
  - notify:
      subject: "🎙️ Transcribed: {{ company }} {{ event_type }} — {{ date }}"
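The check_seen step amounts to a small JSON-backed dedup store keyed on audio_url. A minimal sketch of what that step does (file layout assumed from the config — a flat JSON list of URLs):

```python
import json
import os

def is_new_audio(audio_url: str, store_path: str = "seen_audio.json") -> bool:
    """Return True if this URL hasn't been seen before, and record it."""
    seen = []
    if os.path.exists(store_path):
        with open(store_path) as f:
            seen = json.load(f)
    if audio_url in seen:
        return False
    seen.append(audio_url)
    with open(store_path, "w") as f:
        json.dump(seen, f, indent=2)
    return True
```

A flat list is fine at this scale; switch to a set-backed store or SQLite if you track thousands of URLs.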
Whisper API — Python snippet
import math
import os
import subprocess
import tempfile

import openai

def transcribe_audio(file_path: str, language: str = "en") -> str:
    """Transcribe an audio file with the OpenAI Whisper API."""
    client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    file_size_mb = os.path.getsize(file_path) / (1024 * 1024)
    if file_size_mb > 24:  # API rejects files over 25MB; leave headroom
        return transcribe_chunked(file_path, client, language)
    with open(file_path, "rb") as f:
        # response_format="text" returns the transcript as a plain string
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            language=language,
            response_format="text",
        )

def transcribe_chunked(file_path: str, client, language: str = "en") -> str:
    """Split a large audio file with ffmpeg and transcribe it in chunks."""
    duration_cmd = ["ffprobe", "-v", "quiet", "-show_entries",
                    "format=duration", "-of", "csv=p=0", file_path]
    duration = float(subprocess.check_output(duration_cmd))
    chunk_duration = 1400  # seconds (~23 min per chunk)
    num_chunks = math.ceil(duration / chunk_duration)
    transcripts = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(num_chunks):
            start = i * chunk_duration
            chunk_path = os.path.join(tmp, f"chunk_{i}.mp3")
            # Re-encode at 64 kbps mono: ~23 min ≈ 11MB, safely under the
            # 25MB limit (keeping the source bitrate can exceed it).
            subprocess.run(["ffmpeg", "-ss", str(start), "-t", str(chunk_duration),
                            "-i", file_path, "-ac", "1", "-b:a", "64k",
                            chunk_path, "-y"], check=True)
            with open(chunk_path, "rb") as f:
                result = client.audio.transcriptions.create(
                    model="whisper-1", file=f, language=language,
                    response_format="text")
            transcripts.append(result)
    return " ".join(transcripts)
IR page audio detection — Python snippet
import re
from urllib.parse import urljoin

import httpx
from bs4 import BeautifulSoup

def find_audio_links(ir_url: str) -> list:
    """Scrape an IR page for audio/video links."""
    r = httpx.get(ir_url,
                  headers={"User-Agent": "TranscriptBot/1.0 contact@youremail.com"},
                  timeout=15, follow_redirects=True)
    soup = BeautifulSoup(r.text, "html.parser")
    audio_extensions = re.compile(r'\.(mp3|mp4|m4a|wav|ogg|webm)(\?|$)', re.IGNORECASE)
    webcast_patterns = re.compile(r'webcast|listen|audio|replay', re.IGNORECASE)
    links = []
    for a in soup.find_all("a", href=True):
        # Resolve relative hrefs against the page URL — IR pages often use them
        href = urljoin(ir_url, a["href"])
        if audio_extensions.search(href) or webcast_patterns.search(a.get_text()):
            links.append({"text": a.get_text(strip=True), "url": href})
    return links
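When a page exposes the same event in several formats, the pipeline's format_preference ("mp3", then "m4a", then "mp4") decides which link to download. A pure helper (name and signature are illustrative) that applies that ordering to the scraper's output:

```python
def pick_best_audio(links: list, preference: tuple = ("mp3", "m4a", "mp4")):
    """From find_audio_links() output, return the link whose file extension
    ranks highest in the preference order, or None if nothing matches."""
    def rank(link):
        url = link["url"].lower().split("?")[0]  # ignore query strings
        for i, ext in enumerate(preference):
            if url.endswith("." + ext):
                return i
        return None  # not a direct audio file (e.g. a webcast landing page)

    ranked = [(rank(link), link) for link in links]
    ranked = [(r, link) for r, link in ranked if r is not None]
    return min(ranked, key=lambda pair: pair[0])[1] if ranked else None
```

Webcast landing pages rank as None here; those still need a separate step to extract the actual media URL.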
Frequently asked questions
Q: Does Whisper support languages other than English?
A: Yes — Whisper supports 99 languages. Set the language parameter explicitly, or omit it entirely to let Whisper auto-detect.
Q: What's the maximum file size for Whisper?
A: 25MB per request. Use the chunked transcription function above for longer files. ffmpeg is required for chunking.
Q: How accurate is Whisper on financial terminology?
A: Very accurate on earnings calls. Occasional errors on proper names and ticker symbols — consider a post-processing glossary substitution step.
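That glossary substitution step can be a simple regex pass over the transcript; a minimal sketch, with an illustrative glossary (build yours from errors you actually observe):

```python
import re

def apply_glossary(transcript: str, glossary: dict) -> str:
    """Replace common Whisper mis-hearings with the correct terms.
    Matches whole words/phrases, case-insensitively."""
    for wrong, right in glossary.items():
        transcript = re.sub(rf"\b{re.escape(wrong)}\b", right, transcript,
                            flags=re.IGNORECASE)
    return transcript

# Illustrative entries — hypothetical mis-hearings, not Whisper-verified:
GLOSSARY = {"in video": "NVIDIA", "open claw": "OpenClaw"}
```

Run it between the transcribe and llm steps so the analysis prompt sees corrected names.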