About this video
Stop settling for manual captioning when you can build a custom AI engine in an afternoon. In this video, I walk you through how I built a bespoke subtitle generator using the new Veed API on Fal, Mux for hosting, and Claude for the heavy lifting. This tool has completely revolutionised my workflow for the Command AI podcast, and the best part is that you can grab the source code to use it yourself.
Key Takeaways:
- How to integrate the Veed Subtitle API for automated word-level captions.
- Using Mux as a temporary storage solution to keep your project on a free tier.
- Leveraging Claude and Warp AI to build and redesign functional apps in record time.
- A breakdown of the cost: 10 cents per minute for high-quality, styled video production.
- How to customise fonts and colours to match your brand perfectly.
Stop wasting your life on manual video captions
Most creators are still stuck in the dark ages of video editing. They spend hours scrubbing through timelines, manually adjusting word timings, and wrestling with clunky caption tools. It is a massive drain on productivity that can be solved with a bit of code and the right API.
In my latest project, I decided to automate this entire headache. By leveraging the newly launched Veed subtitle API on Fal, I built a custom subtitle generator tailored specifically for my podcast, Command AI. The goal was simple: upload a video, get perfectly styled captions back, and avoid the premium price tags of enterprise software.
The Tech Stack
To get this running, I utilised a lean but powerful stack:
- React & Vite: For a snappy, functional user interface.
- Veed Subtitle API (via Fal): The engine that handles the heavy lifting of transcription and burning subtitles into the video.
- Mux: Used as a temporary bucket for video hosting to provide the API with a public URL.
- Claude (Anthropic): My AI pair programmer that helped rattle through the boilerplate and logic.
The Workflow
The application takes a video file and uploads it to Mux, which gives the Veed API a public URL to read from. Because Mux offers a generous free tier for internal tools, it is the perfect temporary home for the files. Once the Veed API processes the video and burns in the subtitles, the final result is downloaded, and the asset is immediately deleted from Mux so that hosting stays within the free tier.
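The flow above can be sketched roughly as follows. This is a minimal illustration, not the app's actual source: the `TempHost`, `SubtitleService` and `Downloader` interfaces and the `subtitleVideo` function are hypothetical names, and real Mux and Fal clients would need to be wrapped to fit them. Deleting the asset when a job fails is also an assumption; the description above only covers the success path.

```typescript
// Hypothetical interfaces: real Mux / Fal clients would be wrapped to fit these.
interface TempHost {
  // Uploads the video and returns an asset ID plus a publicly reachable URL.
  upload(video: Uint8Array): Promise<{ assetId: string; publicUrl: string }>;
  // Deletes the uploaded asset from the host.
  remove(assetId: string): Promise<void>;
}

interface SubtitleService {
  // Takes a public video URL and resolves to the URL of the subtitled video.
  burnSubtitles(videoUrl: string): Promise<string>;
}

interface Downloader {
  // Fetches the finished video as raw bytes.
  fetchBytes(url: string): Promise<Uint8Array>;
}

async function subtitleVideo(
  video: Uint8Array,
  host: TempHost,
  subtitles: SubtitleService,
  downloader: Downloader,
): Promise<Uint8Array> {
  // 1. Park the file on the temporary host to obtain a public URL.
  const { assetId, publicUrl } = await host.upload(video);
  try {
    // 2. Ask the subtitle service to transcribe and burn in captions.
    const resultUrl = await subtitles.burnSubtitles(publicUrl);
    // 3. Download the finished video.
    return await downloader.fetchBytes(resultUrl);
  } finally {
    // 4. Always clean up, so neither a finished nor a failed job leaves
    //    a stray asset counting against the free tier.
    await host.remove(assetId);
  }
}
```

Passing the services in as parameters keeps the secret-bearing clients on the server side and makes the clean-up step easy to check with fakes.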
Customisation and Branding
One of the best features of using an API over a standard editor is the level of control. I was able to match the exact font and colour palette used in my podcast branding. By feeding these parameters directly into the API calls, every video comes out looking consistent without me having to lift a finger in Final Cut Pro.
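As a rough illustration of baking brand styling into every request, the sketch below merges a set of brand defaults with optional per-video overrides. The field names (`fontFamily`, `textColour`, `position`), the placeholder values and the `buildSubtitleRequest` helper are all hypothetical and are not taken from the Veed API; the real parameter names should be checked against the API documentation on Fal.

```typescript
// Hypothetical style shape: illustrative field names, NOT the Veed API's.
interface SubtitleStyle {
  fontFamily: string;
  textColour: string; // six-digit hex, e.g. "#FFFFFF"
  position: "top" | "centre" | "bottom";
}

// Placeholder brand values; swap in the podcast's real font and palette.
const BRAND_DEFAULTS: SubtitleStyle = {
  fontFamily: "Example Sans",
  textColour: "#FFFFFF",
  position: "bottom",
};

// Returns true for strings like "#1A2B3C".
function isHexColour(value: string): boolean {
  return /^#[0-9a-fA-F]{6}$/.test(value);
}

// Builds a request payload in which brand defaults apply unless overridden.
function buildSubtitleRequest(
  videoUrl: string,
  overrides: Partial<SubtitleStyle> = {},
): { videoUrl: string; style: SubtitleStyle } {
  const style: SubtitleStyle = { ...BRAND_DEFAULTS, ...overrides };
  if (!isHexColour(style.textColour)) {
    throw new Error(`Invalid colour: ${style.textColour}`);
  }
  return { videoUrl, style };
}

// Usage: every video gets the brand look; one-off tweaks stay explicit.
const request = buildSubtitleRequest("https://example.test/intro.mp4", {
  position: "centre",
});
```

Keeping the defaults in one object means a second project can reuse the same helper with a different preset.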
Why You Should Build Your Own Tools
The barrier to entry for building custom software has never been lower. With AI agents like Claude and the Warp terminal at your disposal, you can visualise a problem and build a solution in an afternoon. You do not need to wait for a SaaS company to build the feature you want: you can just build it yourself.
If you are interested in the source code or want to see the different styling presets available, check out the links in the video description. Let's get vibing.
Transcript
Veed have just launched the subtitle API over on Fal, and they've kindly sponsored this video, which I'm really excited about because it means we get to build something. We're going to be building a subtitle generator that I will genuinely be using for editing and creating videos for our podcast Command AI. So, you'll see exactly how I use AI to build my apps, and also how I'll integrate the Veed subtitle API in order to generate subtitles against the videos that we create.
I'll leave links to everything down below, including a link to the source code, so you can just use this app as you want. And there'll also be some useful information over on my website, which you'll learn about further on in the video. Let's get vibing.
So, to give you an insight into my current workflow, this is a Command AI edit, which is my podcast, which I host with Kabaza. And what I tend to do is edit the introduction here, command at shift C, which adds these captions inside of Final Cut, and then export these to a caption file. And then within Claude, I have a Remotion project here, which I then trigger the Command AI captions here, and I drag in that script. I export it from Remotion, and then what you'll see is it matches the style and all the rest of it. Really nice, but the problem is, again, it's not matching those individual words. So, that's what I want to achieve with the Veed API.
Okay, so looking at the playground here, I can see that we need to pass it a URL. We can use some presets here, which is quite nice. This might be a good thing to test things out with, and of course, selecting the language. So, we can leave that as default. Let's run that and see what Whisper looks like. Okay, nice. So, it's like a centralised thing. Now, I'm sure we can move that around, and then advanced settings, we've got positioning, shadows, various things like that. Going to the API here, we have the instructions with JavaScript and all the rest of it. All I'm going to do is copy that URL and use it a little bit later on.
So, I think we're going to build a basic React app that allows me to upload a video, a short video, to send to the Veed API and then get back a video with the subtitles. I think it needs a public URL, so I think I'm going to use Mux, which is a service I use for my internal tool here called Greenlit. This is a video tool that I send videos to my sponsors and they can comment. Basically, a Frame.io clone that I don't have to pay for, basically. And it's exactly what I want. Uses a service called Mux, which is a free video hosting tool. I'm probably going to get the tool to delete the video once it's been downloaded from the Veed and from Fal, just so I can maintain under the free tier. But yeah, I could use AWS, but this is just a very, very nice, easy tool. Let's get this going on.
So, let's launch Claude here. I want to build a basic React app that allows me to upload a short video, give it to the Veed API, and allow me to download the video that gets produced from the Veed API. Excellent. I want to use Mux as my video upload bucket. And once the video has been created and downloaded from Veed / Fal, delete the video from Mux. Excellent. Now, what I'm going to do, we copied the URL, so I'm going to put that in there. I'm going to say Veed API documentation. Let's go to Opus. We'll go into plan mode, and let it do its thing.
Cool. Next.js, Vite React, Express, or Vite React only. I think we could probably get away with Vite and React only because I'm not going to be doing anything special. I'm not going to be pushing this online. It's going to be your own personal tool. So, we'll do that. Auto detect and it's saying glass preset. That's fine. Delete immediately after Fal job completes: recommended. Yeah, let's do that. This is about what happens with the video that we download. I'm happy just to download that immediately and this looks good.
Okay, we've got another set of questions here. Mux direct upload creation and asset deletion both require server-side API calls with secret credentials. Pure client-only Vite app can't safely do this. How should we handle that? [snorts] Um I mean, again, we are not going to be pushing this live. It's just going to be on my machine, but let's go with the recommendation. Going to quickly rattle through this so I won't bore you.
Fresh project and Vite subtitle them to a directory, build a small app that uploads a short video, sends it to Fal's subtitles model for subtitle burning, then download the result. Mux is used as a temporary upload bucket. Pass to Fal's video URL and then the Mux asset is deleted as soon as the job completes. That's exactly what we want. Vite React TypeScript back end, tiny Express server, package manager bun, external services. That's it. I think that's kind of all we need to go for.
We're going to use the new auto-accept mode. Although, I am trying to get better at being more explicit with the permissions I'm giving it rather than using extra tokens and extra usage on this auto mode. Trying to just get better at defining the permissions that I want, but we'll do that for now. While it's doing that, inside of Mux, I'm going to create a new project here and I'm going to grab an access token. And video is fine. So, if we make the env example, paste in our secrets, obviously passwords, so you wouldn't be sharing these at all. Also pasted in. And there we go. So, built, type-checked, and clean. App is ready to run. To use it, use these: bun run dev. Let's give it a go.
Excellent. Okay, let's give this a go. So, what I'm going to do, my workflow from now on will be: I'll edit together the actual video, the intro there, and I will make that into a new compound clip, just so I can easily click in and export it. Adding with subtitles. Nice. Download subtitle video. Let's see what we've got.
'Warp, my favourite terminal, and it's slowly becoming your favourite terminal, too. Our favourite terminal has become open source, which is really exciting. But, what does it mean? It means quite a few interesting things actually that's come out of it.'
That's spot on. Now, what is important now: we should see nothing in Mux, which is good. Any logs? There we go. So, it was deleted. So, we're always going to hopefully maintain our free plan. Looking back to here, we've got a bunch of presets, which we can add. And if I go to this, I go new, add a drop-down to add additional presets to send to Fal. But, what I'm really interested in is the customisations we're allowed. So, if I go to playground here, we want to try and match exactly what we've got before. So, if I go into here, we should find the font that we're using, which I think is a Google font. In fact, let's just go match the styling to send to Veed with the following. I'll just drag that in as defaults. See how it goes with that.
Okay, so I ended up removing the presets because of course, every single video we upload will have the same style, the same everything else. So, I'll give this another go.
'Warp, my favourite terminal...'
So, it's using our font now. It's using the colour. It looks good and everything's in time. This is perfect. And just because it's super easy to do and it took me 2 seconds, I just did a quick redesign and show you how I did that. Inside of Warp, which gives me access to any agent that I want, I literally just asked it: 'Redesign the app. Change nothing about its structure. This is purely a visual change.' Because what I find, if you ask an agent to redesign an app, it will just change the structure and just add features that weren't there and things like that. So, I just made it clear: 'Change nothing about its structure.' And it went through and just made it a little bit more nice.
So, that's the new design. And you'll notice I've added a little drop-down for presets because I'll likely add this to a few of my different projects and I'll have a different preset for each of those projects. So, there's a nice little UI change there. What I would love to see is that I actually have a Veed account and we've built our own styling. We've got our own branding in here. I would love to be able to access my brand kits and things like that. Access to the Veed account or at the very least be able to access the highlighted word. So that I can say the text is white, but the spoken word is green as an example. Now, they have some settings down here which say they highlight viral-ranked words or highlighted words. And I tried this, but it wasn't doing it. I guess it's automatically highlighting certain words as opposed to every spoken word. So, yeah, I think it's more important that we highlight the spoken word rather than just arbitrary viral words, whatever this means.
What I'd like to see is actually just some examples. Like, I wasted a bunch of credits just going through these baseline presets because you're effectively setting the baseline preset and then changing the font and the colour and the various things like that. So, I'd love to be able to just see these and decide and be able to test them out before I then change them. So, what we'll try and do is: I've got a bunch of credit here from the sponsor. So, I'm going to go through and I'm just going to put a page on my website that actually shows all of the different styles so that you can make an informed decision before then you add customisations on top of that. So, links for that down below.
So, to talk about pricing, the keen eyes among you might have seen this down here. So, requests will cost 10 cents per minute with a two times multiplier for resolutions above 1080p. We don't do anything above 1080p anyway, so we don't have to worry about that. So, 10 cents per minute of video and a 2x multiplier for dynamic styling. I'm guessing the dynamic styling is these highlighted words here. So, a 2x multiplier on top of that with a minimum charge of 1 minute. So, what I can see myself doing is chunking a bunch of videos because they're only a few seconds, putting a bunch of intros together in one video to get as close to those minute blocks as possible and then running it through this upload feature.
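The pricing described here can be sketched as a small calculation. This is an estimate under stated assumptions, not the official billing logic: it assumes time beyond the one-minute minimum is billed pro rata rather than rounded up, and that the two 2x multipliers stack. The `estimateCostCents` function and `JobOptions` type are hypothetical names.

```typescript
// Figures quoted in the video: 10 cents per minute, 1-minute minimum charge.
const CENTS_PER_MINUTE = 10;
const MINIMUM_BILLED_MINUTES = 1;

interface JobOptions {
  above1080p?: boolean; // 2x multiplier for resolutions above 1080p
  dynamicStyling?: boolean; // 2x multiplier for dynamic styling
}

// Estimated cost in cents for one request.
// ASSUMPTIONS: pro-rata billing past the minimum; multipliers stack.
function estimateCostCents(durationSeconds: number, options: JobOptions = {}): number {
  const billedMinutes = Math.max(MINIMUM_BILLED_MINUTES, durationSeconds / 60);
  let multiplier = 1;
  if (options.above1080p) multiplier *= 2;
  if (options.dynamicStyling) multiplier *= 2;
  return CENTS_PER_MINUTE * billedMinutes * multiplier;
}

// Five 12-second intros sent one by one each hit the 1-minute minimum...
const separateCents = 5 * estimateCostCents(12); // 50
// ...whereas one combined 60-second upload is billed as a single minute.
const combinedCents = estimateCostCents(60); // 10
```

Under these assumptions, batching short intros into one upload is what makes the minimum charge work in your favour.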
So, that will do it. Thank you so much to Veed for sponsoring this video. It was really fun to actually build a tool. Like, I think we all need to be doing more of this. It's so easy now with AI to build your own tool. I think the Veed subtitles API is amazing. Definitely going to save a bunch of time and just make for better quality videos knowing that we've got subtitles that are working in time with the words that are actually spoken. This is music to my ears. I love this. So, once again, thanks Veed for sponsoring the video. Links to everything will be down below in the video description. Like, subscribe if you haven't already. Until next time, keep on vibing.