ℹ️ Informational only. Transcript sources and audio APIs change over time. Always verify data independently. Nothing here is investment advice.
🧠 TRANSCRIPTS & AUDIO · PART 4 OF 5

Transcript NLP & Tone Analysis

What NLP adds

Raw transcripts are useful. Processed transcripts are a dataset. Once you're collecting transcripts systematically (Parts 1–3), you can build longitudinal metrics: Is management getting more cautious? Are they using more hedging language? Did the tone shift between the Q3 and Q4 calls? These are signals that don't exist anywhere else.

Four NLP layers

Layer 1 — Confidence scoring

Confident language: specific numbers, definitive statements, "we will," "we expect," concrete timelines

Hedging language: "potentially," "we believe," "subject to," "depending on," "uncertain," "difficult to predict"

Score: 50 + (confident_count - hedge_count) / total_sentences × 100, clamped to 0–100 (50 is neutral; above 50 skews confident, below 50 skews hedged)

Layer 2 — Management vs analyst tone

Analyze prepared remarks separately from Q&A. Management often sounds more optimistic in prepared remarks. A gap between prepared-remarks confidence and Q&A defensiveness is a signal.
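
A minimal splitter sketch for this step. It assumes the transcript marks the Q&A with some variant of a "Question-and-Answer" header; actual phrasing varies by provider, so the regex here is an assumption, not a universal rule:

```python
import re

# Covers common header variants: "Question-and-Answer Session",
# "Questions and Answers", "QUESTION AND ANSWER" (case-insensitive)
QA_HEADER = re.compile(r"questions?[-\s]and[-\s]answers?", re.IGNORECASE)

def split_transcript(text: str) -> dict:
    match = QA_HEADER.search(text)
    if not match:
        # No Q&A header found: treat the whole transcript as prepared remarks
        return {"prepared_remarks": text.strip(), "qa_section": ""}
    return {"prepared_remarks": text[:match.start()].strip(),
            "qa_section": text[match.start():].strip()}
```

Score each section separately with the confidence scorer below; the interesting metric is the gap between the two scores, not either score alone.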

Layer 3 — Quarter-over-quarter language diff

Compare word frequency across calls. Rising frequency of "tariff," "inventory," "macro" signals emerging concerns. Declining mentions of a product line signals deprioritization.
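
A minimal frequency-diff sketch using `collections.Counter`. The stopword list is a placeholder; a real pipeline would use a fuller list and likely add stemming or bigrams:

```python
import re
from collections import Counter

# Placeholder stopword list for illustration only
STOPWORDS = {"the", "and", "that", "our", "for", "with", "this", "are", "was"}

def word_frequencies(text: str) -> Counter:
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)

def frequency_diff(current: str, prior: str, top_n: int = 10) -> list:
    # Positive delta = word rising this quarter; negative = fading
    cur, prev = word_frequencies(current), word_frequencies(prior)
    deltas = {w: cur[w] - prev[w] for w in set(cur) | set(prev)}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
```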

Layer 4 — Keyword alert system

Specific phrases trigger immediate alerts regardless of overall tone: "going concern," "material weakness," "restatement," "withdraw guidance," "SEC investigation."

HEARTBEAT configuration

name: transcript_nlp_pipeline
schedule: "on_new_transcript"
steps:
  - load_transcript:
      source: "{{ transcript_path }}"
  - split_sections:
      sections:
        - name: prepared_remarks
          end_marker: "questions and answers"
        - name: qa_section
          start_marker: "questions and answers"
  - score_confidence:
      sections: [prepared_remarks, qa_section]
  - extract_keywords:
      track:
        - "guidance"
        - "tariff"
        - "inventory"
        - "AI"
        - "macro"
        - "uncertainty"
        - "pricing"
  - check_red_flags:
      phrases:
        - "going concern"
        - "material weakness"
        - "restatement"
        - "withdraw guidance"
        - "SEC"
        - "DOJ"
      alert_immediately: true
  - compare_to_prior:
      quarters: 4
      metric: confidence_score
  - llm:
      prompt: |
        Compare this quarter's transcript tone to the prior 4 quarters.
        Focus on: confidence trend, keyword frequency shifts, Q&A defensiveness.
        Current quarter data: {{ current_metrics }}
        Prior quarters: {{ historical_metrics }}
  - save:
      path: "nlp/{{ company }}_{{ quarter }}_metrics.json"
  - notify:
      subject: "🧠 NLP Brief: {{ company }} Q{{ quarter }} — Confidence {{ score }}/100"

Confidence scorer — Python snippet

import re

CONFIDENT_PHRASES = [
    "we will", "we expect", "we are confident", "we delivered",
    "record", "exceeded", "outperformed", "strong demand", "accelerating"
]
HEDGE_PHRASES = [
    "potentially", "we believe", "subject to", "depending on",
    "uncertain", "difficult to predict", "we hope", "if conditions",
    "may", "might", "could be", "we cannot guarantee"
]

def score_confidence(text: str) -> dict:
    """Score 0-100: 50 is neutral; above skews confident, below skews hedged."""
    text_lower = text.lower()
    sentences = re.split(r'[.!?]+', text_lower)
    # Match whole words/phrases so "may" doesn't count inside "maybe"
    def has_any(sentence, phrases):
        return any(re.search(rf"\b{re.escape(p)}\b", sentence) for p in phrases)
    confident = sum(1 for s in sentences if has_any(s, CONFIDENT_PHRASES))
    hedging = sum(1 for s in sentences if has_any(s, HEDGE_PHRASES))
    # Ignore short fragments when normalizing
    total = len([s for s in sentences if len(s.strip()) > 10])
    score = round(((confident - hedging) / total * 100) + 50) if total > 0 else 50
    score = max(0, min(100, score))
    return {"confidence_score": score, "confident_sentences": confident,
            "hedging_sentences": hedging, "total_sentences": total}

Red flag detector — Python snippet

RED_FLAGS = [
    "going concern", "material weakness", "restatement",
    "withdraw guidance", "sec investigation", "doj", "class action",
    "liquidity concerns", "covenant violation", "impairment charge"
]

def check_red_flags(text: str) -> list:
    """Return each red-flag phrase found, with surrounding context for review."""
    text_lower = text.lower()
    found = []
    for flag in RED_FLAGS:
        if flag in text_lower:
            # Extract context around the first occurrence only
            idx = text_lower.find(flag)
            context = text[max(0, idx - 100):idx + 200].strip()
            found.append({"flag": flag, "context": context})
    return found

Quarter-over-quarter comparison — Python snippet

import glob, json

def load_historical_metrics(company: str, quarters: int = 4) -> list:
    """Load the most recent saved metrics files for a company."""
    pattern = f"nlp/{company}_*_metrics.json"
    # Assumes quarter labels in filenames sort chronologically (e.g. 2024Q3)
    files = sorted(glob.glob(pattern), reverse=True)[:quarters]
    history = []
    for f in files:
        with open(f) as fh:
            history.append(json.load(fh))
    return history

def trend_analysis(current: dict, history: list) -> dict:
    if not history:
        return {"trend": "insufficient_data"}
    scores = [h.get("confidence_score", 50) for h in history]
    avg_prior = sum(scores) / len(scores)
    delta = current["confidence_score"] - avg_prior
    trend = "improving" if delta > 5 else "declining" if delta < -5 else "stable"
    return {"trend": trend, "current_score": current["confidence_score"],
            "prior_avg": round(avg_prior, 1), "delta": round(delta, 1)}

Frequently asked questions

Q: How do I split prepared remarks from Q&A reliably?
A: Most transcripts include a clear "Question-and-Answer Session" or "QUESTION AND ANSWER" header. Split on that string (case-insensitive).
Q: Is LLM-based analysis more accurate than keyword counting?
A: For nuance, yes — LLMs understand context better than keyword lists. Use keyword scoring for speed and consistency, LLM analysis for depth on important calls.
Q: How many quarters do I need before the trend analysis is useful?
A: Minimum 3 quarters for a trend, 6+ for statistical reliability.