I didn't plan to stress-test my own pipeline. I was catching up on a backlog. A streamer I'm working with had been live almost every day for two weeks, and I'd fallen behind on ingesting VODs. So I pointed the auto-processor at the archive and let it rip.
What followed was the most instructive disaster I've had in months.
Here's what April looked like, raw:
That's $0.03 per video. Three cents. I'll get into why that number is both impressive and misleading in a second.
But first, the thing that broke.
On April 11th, the pipeline stopped processing new VODs. No errors in the logs. No crash. Just... nothing. New streams were going live, VODs were appearing, and my system was ignoring all of them.
I spent an embarrassing amount of time checking Twitch's EventSub webhooks, restarting services, reviewing the watcher script. Everything looked healthy. The VOD watcher was detecting new streams. It was writing them to the queue. The auto-processor was running on its cron schedule every 30 minutes.
It just wasn't doing anything.
Here's the log line I eventually found:
I had hardcoded a monthly API budget ceiling of $80. A safety net from early development when I was terrified of a runaway loop burning through OpenAI credits overnight. Totally reasonable precaution. I set it in February and forgot about it.
The pipeline hit $81.15 on April 11th and silently shut itself off. No alert. No Discord notification. No email. Just a single log line in a file I don't check daily.
Five days. Five days of streams going unprocessed because past-me was worried about a $200 API bill and didn't wire up an alert when the limit triggered.
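The missing piece is small. Here is a minimal sketch of a budget ceiling that reports the moment it trips, rather than only writing a log line. It is not the pipeline's actual code: `BudgetGuard`, `record`, `allow_work`, and the `notify` hook are hypothetical names, and `notify` stands in for whatever channel you actually read (Discord, email, a pager).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BudgetGuard:
    """Pauses work at a monthly spend ceiling and alerts when it trips."""
    ceiling_usd: float
    notify: Callable[[str], None]  # hypothetical alert hook (Discord, email, ...)
    spent_usd: float = 0.0
    tripped: bool = False

    def record(self, cost_usd: float) -> None:
        """Add the cost of a completed API call to the running total."""
        self.spent_usd += cost_usd

    def allow_work(self) -> bool:
        """Return False once the ceiling is reached; alert on the first refusal only."""
        if self.spent_usd < self.ceiling_usd:
            return True
        if not self.tripped:
            self.tripped = True
            self.notify(
                f"Pipeline paused: monthly API spend ${self.spent_usd:.2f} "
                f"reached the ${self.ceiling_usd:.2f} ceiling"
            )
        return False
```

The design point is that refusing work and announcing the refusal happen in the same call, so the limit cannot trigger silently. The `tripped` flag keeps a 30-minute cron from sending the same alert on every run.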
The budget limit was the showstopper, but it wasn't the only problem. Pushing 92 VODs at the pipeline in a compressed window surfaced bugs that would've taken months to find at a normal pace.
Twitch's EventSub webhook system is supposed to notify me when a stream goes live and when a VOD becomes available. It works great. Until it doesn't. Twice during this run, EventSub simply stopped sending notifications for 6-8 hours. No error on their status page. No degradation notice. Just silence.
My backup is a polling loop that checks for new VODs every 30 minutes regardless of webhooks. But the poller was deferring to EventSub when EventSub claimed to be healthy (the subscription was still listed as "enabled"). So I had a backup that trusted the primary system's self-reporting. That's not a backup. That's two systems that fail together.
Fixed it. The poller now runs independently and deduplicates against the processing queue. If EventSub already queued a VOD, the poller skips it. If EventSub missed it, the poller catches it. They don't talk to each other.
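The shape of that fix can be sketched as two detectors writing into one queue that deduplicates by VOD ID. This is an illustration under assumptions, not the real watcher: `VodQueue`, `on_eventsub_notification`, and `poll_once` are hypothetical names, and the list of VOD IDs passed to the poller would come from whatever Twitch API call the poller uses.

```python
from typing import Dict, List, Optional


class VodQueue:
    """Processing queue keyed by VOD ID, so each detector can enqueue blindly."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}  # vod_id -> source that queued it first

    def enqueue(self, vod_id: str, source: str) -> bool:
        """Add a VOD if it is not already queued. Returns True when newly added."""
        if vod_id in self._items:
            return False
        self._items[vod_id] = source
        return True

    def source_of(self, vod_id: str) -> Optional[str]:
        return self._items.get(vod_id)

    def __len__(self) -> int:
        return len(self._items)


def on_eventsub_notification(queue: VodQueue, vod_id: str) -> bool:
    """Webhook path: low latency when EventSub is actually delivering."""
    return queue.enqueue(vod_id, "eventsub")


def poll_once(queue: VodQueue, listed_vod_ids: List[str]) -> List[str]:
    """Polling path: runs every cycle, whatever EventSub's status says.

    Returns the VOD IDs that the webhook path had not already queued.
    """
    return [v for v in listed_vod_ids if queue.enqueue(v, "poller")]
```

Neither function checks the other's health. The only shared state is the queue, which is what lets the two paths fail separately.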
Here's one that's subtle. A streamer goes live, streams for 6 hours, ends the stream. Twitch creates a VOD. My system processes it. Good. But sometimes Twitch re-encodes the VOD a few hours later (different quality tiers become available), and the VOD ID stays the same but the duration changes slightly. My dedup was checking VOD ID + exact duration. The re-encoded version had a duration 2 seconds longer. Different enough to bypass dedup. Same enough to produce identical output.
I caught this because I had 3 duplicate videos in the review queue with timestamps 4 hours apart. Three sets of wasted GPU time, wasted API calls, wasted storage. At scale, that's real money.
Fixed it with a duration tolerance window of 30 seconds. Same VOD ID + duration within 30 seconds = same VOD. Simple. Should've thought of it earlier.
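As a rough sketch, the tolerance check could look like the following. The 30-second window comes from the fix described above; `is_duplicate`, `register`, and the in-memory `seen` dict are hypothetical, standing in for however the pipeline actually stores processed VODs.

```python
from typing import Dict

DURATION_TOLERANCE_S = 30.0  # durations within 30 s count as the same VOD


def is_duplicate(seen: Dict[str, float], vod_id: str, duration_s: float,
                 tolerance_s: float = DURATION_TOLERANCE_S) -> bool:
    """True when this VOD ID was already seen with a duration inside the tolerance."""
    previous = seen.get(vod_id)
    return previous is not None and abs(duration_s - previous) <= tolerance_s


def register(seen: Dict[str, float], vod_id: str, duration_s: float) -> bool:
    """Record a VOD unless it is a duplicate. Returns True when it should be processed."""
    if is_duplicate(seen, vod_id, duration_s):
        return False
    seen[vod_id] = duration_s
    return True
```

With this rule, a 6-hour VOD (21,600 seconds) that comes back from re-encoding at 21,602 seconds is skipped, while the same ID with a substantially different duration is still treated as new.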
My workstation has a 24GB GPU. Whisper large-v3 (the transcription model) loads about 6GB. Scene detection analysis needs another 3-4GB depending on the frame batch size. If I'm processing a long VOD (8+ hours — yes, some of these streams are 8 hours), the feature extraction step can spike to 20-22GB.
At 96% VRAM utilization, the GPU starts thermal throttling. Not crashing, just getting slower. A transcription pass that normally takes 14 minutes stretches to 25. Across 74 VODs, that's hours of extra processing time.
I haven't solved this one cleanly. Right now I'm limiting concurrent processing to one VOD at a time and forcing a 60-second cooldown between jobs. It's slow, but it doesn't crash. The real fix is batching the feature extraction to cap at 18GB, but that requires rewriting the scene analyzer's memory management. It's on the list.
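One way to express that cap is an admission check: each stage declares a ceiling, and nothing starts unless the ceiling fits in free VRAM. The sketch below uses the figures from this section (24GB card, about 6GB for Whisper, up to 4GB for scene detection, an 18GB target for feature extraction). The stage names, `can_start`, and the 1GB reserve are assumptions for illustration, and the 18GB figure is the planned cap, not something the current analyzer enforces.

```python
from typing import Dict

TOTAL_VRAM_GB = 24.0  # workstation GPU from the text

# Declared per-stage ceilings in GB (hypothetical stage names).
STAGE_BUDGET_GB: Dict[str, float] = {
    "transcription": 6.0,        # Whisper large-v3, roughly
    "scene_detection": 4.0,      # upper end of the 3-4 GB range
    "feature_extraction": 18.0,  # planned cap, not current behaviour
}


def can_start(stage: str, in_use_gb: float, reserve_gb: float = 1.0) -> bool:
    """Admit a stage only if its declared ceiling fits in free VRAM minus a reserve."""
    free_gb = TOTAL_VRAM_GB - in_use_gb - reserve_gb
    return STAGE_BUDGET_GB[stage] <= free_gb
```

How `in_use_gb` gets measured is left open here. The check only works if each stage actually stays under its declared ceiling, which is the part that still needs the memory-management rewrite.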
Not everything was a disaster. Some parts of the pipeline performed better than I expected under load.
74 VODs. Hundreds of hours of audio. Whisper large-v3 nailed the transcription on almost all of it. I spot-checked maybe 30 segments across different VODs and found two distinct errors, both character names from GTA RP that Whisper had never seen before. It transcribed "Wrangler" as "Rangler" once and "Jean Pierre" as "John Pierre" twice.
For GTA RP content specifically (heavy slang, character voices, people talking over each other, in-game radio chatter), a word error rate under 3% is genuinely impressive. I've tested commercial transcription services that do worse on this kind of audio.
The arc extractor identifies narrative segments within a VOD — complete story arcs with a beginning, middle, and end. It's the core of the pipeline. If it pulls out garbage arcs, everything downstream (titles, descriptions, review scores) is garbage too.
I scored a sample of 40 extracted arcs on a 1-10 scale for narrative completeness. The average was 8.2. Thirty-one of them scored 8 or above. Six scored 7. Three scored below 6, and all three were from VODs where the streamer was mostly AFK or doing menu navigation for extended periods. The extractor tried to find a story where there wasn't one.
That's a failure mode I can live with. It means the extractor is aggressive about finding content (good) but doesn't know when to give up (fixable). I added a minimum speech density threshold — if a segment has less than 40% active speech, skip it. That killed the worst false positives.
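A sketch of that filter, assuming speech activity is available as a list of `(start, end)` spans in seconds, for example from transcript segment timestamps. The 40% threshold is from the text; `speech_density`, `keep_segment`, and the span format are hypothetical, and the spans are assumed not to overlap each other.

```python
from typing import List, Tuple

MIN_SPEECH_DENSITY = 0.40  # segments below 40% active speech are skipped


def speech_density(speech_spans: List[Tuple[float, float]],
                   start_s: float, end_s: float) -> float:
    """Fraction of [start_s, end_s] covered by speech (spans assumed non-overlapping)."""
    length = end_s - start_s
    if length <= 0:
        return 0.0
    covered = 0.0
    for span_start, span_end in speech_spans:
        # Clip each speech span to the segment before counting it.
        overlap = min(span_end, end_s) - max(span_start, start_s)
        if overlap > 0:
            covered += overlap
    return covered / length


def keep_segment(speech_spans: List[Tuple[float, float]],
                 start_s: float, end_s: float) -> bool:
    """Keep a candidate arc only if it meets the minimum speech density."""
    return speech_density(speech_spans, start_s, end_s) >= MIN_SPEECH_DENSITY
```

A 100-second candidate with 60 seconds of speech passes. One with 10 seconds of speech while the streamer is AFK does not.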
$148 total API spend. 2,664 videos produced. $0.03 per video.
That includes transcription (local Whisper, so $0 API cost), arc extraction (GPT-4o-mini for analysis), title generation (GPT-4o-mini), description writing (GPT-4o-mini), and the 7-agent review (GPT-4o-mini for most agents, GPT-4o for the brand safety agent).
Look, I need to be honest about what "$0.03 per video" means. It doesn't include GPU electricity costs, storage costs, or my time debugging the five problems I just described. If I billed my own hours at freelance rates, the real cost per video is... a lot more than three cents.
But the API cost — the variable cost that scales with volume — is three cents. That matters because it means processing 10,000 videos a month doesn't cost $10,000. It costs $300 in API fees. The economics of this work at scale in a way I didn't expect.
92 tracked. 74 processed. What happened to the other 18?
The 4 lost to the budget freeze are the ones that sting. Those VODs are gone now. I can't get them back. Content that could've been 120+ videos, lost because I hardcoded a number in February and forgot to set up an alert.
If I were running this volume again from scratch:
Alert on everything that stops the pipeline. Not just errors. Anything that causes the auto-processor to skip work — budget limits, auth failures, API rate limits, disk space thresholds. If the pipeline decides not to process something, I want to know within 5 minutes, not 5 days.
Don't trust Twitch's EventSub status. Run independent detection. Always. EventSub is a nice-to-have for low latency, but it's not reliable enough to be the sole detection mechanism for anything you care about.
Set VRAM budgets per pipeline stage. "Use whatever the GPU has available" is not a strategy. Each stage should declare its maximum memory footprint, and the scheduler should refuse to start a stage if there isn't enough headroom.
Process VODs within 48 hours or flag them red. Twitch VOD retention isn't something to rely on: depending on the account, some VODs expire in 14 days, some in 60. If a VOD has been in the queue for more than 48 hours without being processed, that should be a critical alert, because every hour increases the risk of losing it.
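The 48-hour rule is simple to check on every scheduler run. A minimal sketch, assuming the queue records when each VOD was enqueued. `stale_vods` and the `queued_at` mapping are hypothetical names, and the result would feed whatever alerting channel the pipeline uses.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_QUEUE_AGE = timedelta(hours=48)


def stale_vods(queued_at: Dict[str, datetime], now: datetime,
               max_age: timedelta = MAX_QUEUE_AGE) -> List[str]:
    """Return IDs of VODs that have waited longer than max_age, oldest first."""
    overdue = [(ts, vod_id) for vod_id, ts in queued_at.items() if now - ts > max_age]
    return [vod_id for ts, vod_id in sorted(overdue)]
```

Sorting oldest first puts the VODs closest to expiry at the top of the alert.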
Here's the thing — none of these are hard problems. They're all "I should have thought of that" problems. The pipeline itself works. The transcription works. The arc extraction works. The review system works. What doesn't work is the operational scaffolding around it, the alerting and scheduling and resource management that keeps the whole thing running when I'm not watching.
That's the unsexy part of automation. The automation itself is the easy part. Keeping it automated is the job.