Use the YoutubeToText REST API to generate transcripts, SRT/WebVTT subtitles, and subtitled videos from any YouTube link. Start a job, poll until it's done, read the result.
Doing one transcript by hand is easy. Doing five hundred is not. Once transcription becomes part of a product or a pipeline (a content tool, a research dataset, an internal dashboard), you don't want a human pasting links into a website. You want code to do it.
That's what our REST API is for. The same engine behind the website is now available as a handful of endpoints, so you can generate plain text transcripts, SRT or WebVTT subtitle files, and videos with burned-in subtitles, all programmatically. This guide walks through how it works end to end.
The API mirrors the three things you can do on the site, each with its own endpoint:
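transcribe: a plain text transcript.
subtitles: subtitle files in srt and webvtt formats.
burn-subtitles: a video with the subtitles burned in.
Every job follows the same shape: you start it with a POST, then poll a single status endpoint until it finishes. That uniform pattern keeps your integration simple no matter which output you need.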
Every endpoint requires a bearer token. Generate one on your account's API page (API access is included on our paid plans), then send it in the Authorization header on every request:
Authorization: Bearer ytt_<your-token>
A missing or invalid token returns 401 Unauthorized with a WWW-Authenticate: Bearer response header. Treat the token like a password: keep it server-side and out of public code.
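If you're calling the API from Python, one way to keep the token server-side is to read it from an environment variable and attach it in a single helper. This is a minimal sketch using only the standard library. The variable name YTT_API_TOKEN and the helper names are our own choices for illustration, and the base URL is the illustrative one used in this guide.

```python
import json
import os
import urllib.request

# Illustrative base URL from this guide; copy the real one from your API page.
BASE_URL = "https://api.youtubetotext.ai/v1/api"


def load_token(env=None):
    """Read the token from an environment variable (the name is our assumption)."""
    env = os.environ if env is None else env
    token = env.get("YTT_API_TOKEN")
    if not token:
        raise RuntimeError("Set YTT_API_TOKEN to the token from your API page")
    return token


def build_request(path, token, payload=None):
    """Build an authenticated request: POST with a JSON body if payload is given, else GET."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )
```

Passing the resulting request to urllib.request.urlopen raises urllib.error.HTTPError on a 401, so a missing or invalid token surfaces as an exception you can catch in one place.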
The whole flow is two steps: start a job, then poll until it's done.
curl -X POST https://api.youtubetotext.ai/v1/api/transcribe \
-H "Authorization: Bearer ytt_<your-token>" \
-H "Content-Type: application/json" \
-d '{ "url": "https://youtube.com/watch?v=dQw4w9WgXcQ", "verbatim": false }'
The response hands back a job id:
{ "id": "uuid" }
Then check on the job by passing that id to the status endpoint:
curl https://api.youtubetotext.ai/v1/api/transcription/<id> \
  -H "Authorization: Bearer ytt_<your-token>"
Keep polling that endpoint every 2–3 seconds. When state becomes "done", read the result field that matches the endpoint you called. The status endpoint is the same for every job, whichever endpoint created it.
The base URL shown here is illustrative. Copy the exact endpoint base and a working example straight from your account's API page, which is always in sync with the live service.
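The same two steps can be sketched in Python with the standard library. Treat this as an illustration rather than an official client: the function names are ours, the base URL is the illustrative one from the curl examples, and the urlopen parameter exists only so you can swap in a fake during tests.

```python
import json
import urllib.request

# Illustrative base URL; use the one shown on your account's API page.
BASE_URL = "https://api.youtubetotext.ai/v1/api"


def _auth_headers(token):
    return {"Authorization": f"Bearer {token}"}


def start_transcript_job(video_url, token, verbatim=False,
                         urlopen=urllib.request.urlopen):
    """Step 1: POST the video URL and return the job id from the response."""
    body = json.dumps({"url": video_url, "verbatim": verbatim}).encode("utf-8")
    headers = {**_auth_headers(token), "Content-Type": "application/json"}
    request = urllib.request.Request(
        f"{BASE_URL}/transcribe", data=body, headers=headers, method="POST"
    )
    with urlopen(request) as response:
        return json.load(response)["id"]


def get_status(job_id, token, urlopen=urllib.request.urlopen):
    """Step 2: GET the job's current status as a dict."""
    request = urllib.request.Request(
        f"{BASE_URL}/transcription/{job_id}", headers=_auth_headers(token)
    )
    with urlopen(request) as response:
        return json.load(response)
```

A script would call start_transcript_job once, then call get_status with the returned id every few seconds until state is "done" or "failed".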
Transcription isn't instant, so the API is asynchronous. You create a job, get an id immediately, and the work happens in the background. Polling the status endpoint tells you where it is. A typical status response looks like this:
{
  "id": "uuid",
  "state": "processing",     // waiting | preparing | downloading
                             // | converting | uploading | processing
                             // | aligning | burning | done | failed
  "progress": 42,            // 0-100
  "error": null,
  "type": "transcript",      // transcript | subtitles | burned
  "verbatim": false,
  "quality": "default",      // default | hd | uhd
  "txt": null,
  "srt": null,
  "webvtt": null,
  "video_url": null,
  "burned_video_url": null
}
Show state and progress in your own UI if you have one, and stop polling once you reach done or failed.
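A polling loop along those lines might look like the sketch below. It assumes you already have some fetch_status callable that returns the status dict for your job (how you make the HTTP call is up to you). The overall timeout and the on_update hook are our additions for illustration, not something the API requires.

```python
import time

# The two states after which polling should stop.
TERMINAL_STATES = {"done", "failed"}


def wait_for_job(fetch_status, interval=2.5, timeout=3600.0,
                 on_update=None, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job is done or failed, then return the status.

    interval: seconds between polls (this guide suggests 2-3 seconds).
    timeout: give up after this many seconds (our own safety net).
    on_update: optional callback, e.g. to show state and progress in a UI.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if on_update is not None:
            on_update(status)
        if status["state"] in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status['state']!r} after {timeout} seconds")
        sleep(interval)
```

Returning the final status for both "done" and "failed" leaves it to the caller to decide how to treat a failure.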
There are three job-creating endpoints and one polling endpoint.
transcribe: generate a plain text transcript.
{
"url": "https://youtube.com/watch?v=...",
"type": "transcript",
"verbatim": false
}
When done, read txt.
subtitles: generate subtitles in SRT and WebVTT.
{
"url": "https://youtube.com/watch?v=...",
"type": "subtitles",
"verbatim": false
}
When done, read srt or webvtt.
burn-subtitles: generate a downloadable video with the subtitles burned in.
{
"url": "https://youtube.com/watch?v=...",
"type": "burned",
"quality": "default", // "default" (720p) | "hd" (1080p) | "uhd" (4K)
"verbatim": false
}
When done, read burned_video_url, which is a direct download link.
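Since the three request bodies differ only in type and the optional quality, you could build them with one small helper. This is a sketch based on the example bodies above. The helper name is ours, and checking the values client-side is our choice; the API may apply its own validation.

```python
# Values taken from the example request bodies in this guide.
JOB_TYPES = {"transcript", "subtitles", "burned"}
QUALITIES = {"default", "hd", "uhd"}  # 720p, 1080p, 4K


def build_payload(job_type, video_url, verbatim=False, quality="default"):
    """Return the JSON body for a job of the given type as a dict."""
    if job_type not in JOB_TYPES:
        raise ValueError(f"unknown job type: {job_type!r}")
    payload = {"url": video_url, "type": job_type, "verbatim": verbatim}
    if job_type == "burned":
        # Only burned-in video jobs carry a quality setting in the examples.
        if quality not in QUALITIES:
            raise ValueError(f"unknown quality: {quality!r}")
        payload["quality"] = quality
    return payload
```

Catching a typo in type or quality before the request goes out avoids spending a round trip on a body the service would reject.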
GET /transcription/{id}: poll the job created by any of the three endpoints above using its id. Poll every 2–3 seconds until state is "done" or "failed".
The field you read depends on the mode you started:
transcribe → txt
subtitles → srt or webvtt
burn-subtitles → burned_video_url
If state comes back "failed", the error field explains why. Queue-level problems (no credits, an invalid URL, a private video) come back as standard HTTP error responses with a JSON detail field.
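Here is one way to turn that mapping into code, assuming the status dict has the shape shown in the status example earlier. The JobFailed exception and both function names are hypothetical. We also assume an HTTP error body is JSON with a detail key, as described above, and fall back to the raw text if it is not.

```python
import json

# Result fields to read for each value of the status "type" field.
RESULT_FIELDS = {
    "transcript": ("txt",),
    "subtitles": ("srt", "webvtt"),
    "burned": ("burned_video_url",),
}


class JobFailed(Exception):
    """Raised when a job finishes with state "failed"."""


def extract_result(status):
    """Return only the result fields relevant to a finished job."""
    if status["state"] == "failed":
        raise JobFailed(status.get("error") or "job failed without an error message")
    if status["state"] != "done":
        raise ValueError(f"job is not finished yet (state={status['state']!r})")
    return {field: status.get(field) for field in RESULT_FIELDS[status["type"]]}


def error_detail(body):
    """Pull the detail field out of an HTTP error body, or return the raw text."""
    text = body.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(text)
    except ValueError:
        return text
    return parsed.get("detail", text) if isinstance(parsed, dict) else text
```

With urllib, error_detail would typically be called on the bytes returned by HTTPError.read() after a rejected request.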
Set verbatim: true to keep filler words and exact phrasing. The default trims them for a cleaner read.
Every job, including one started with POST /transcribe, is fetched by the same GET /transcription/{id}, with no separate status routes to remember.
The API fits anywhere you'd otherwise transcribe at volume or on a schedule. Batch-process a back catalogue of videos overnight. Add a "get the transcript" button to your own app. Feed clean transcripts into a search index or an LLM pipeline. Auto-generate subtitle files for every new upload. The endpoints are deliberately small, so wiring them into an existing system is quick.
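For the batch case, a simple approach is to start every job first and then poll them together. The sketch below assumes you supply two callables of your own: start_job(url), which returns a job id, and fetch_status(job_id), which returns the status dict. It submits everything at once, which may not suit your plan's credits or any rate limits, so treat it as a starting point.

```python
import time


def transcribe_batch(urls, start_job, fetch_status, interval=2.5, sleep=time.sleep):
    """Start one job per URL, poll them all, and return {url: final_status}."""
    # Map each job id back to the URL it was created for.
    pending = {start_job(url): url for url in urls}
    results = {}
    while pending:
        for job_id in list(pending):
            status = fetch_status(job_id)
            if status["state"] in ("done", "failed"):
                results[pending.pop(job_id)] = status
        if pending:
            # Wait between polling rounds, not between individual jobs.
            sleep(interval)
    return results
```

Failed jobs are returned alongside finished ones, so an overnight run can log each error field and retry only the URLs that need it.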
Generate a token on the API page in your account. API access is included on our paid plans. Check the pricing page for details. Send the token in the Authorization header as Bearer ytt_<your-token>.
The API is asynchronous because transcription takes time. Returning a job id immediately and letting you poll keeps requests fast and reliable, and means a long video never holds a connection open. Poll the status endpoint every 2–3 seconds until the job is done.
Translation is supported. The service can translate captions into other languages alongside the original. If you mainly need translated subtitles, our guides on translating YouTube videos walk through the workflow.
There's no practical length cap for most videos. We've confirmed transcripts on videos up to 10 hours, and very long videos simply take longer to process. See our long video guide for what to expect.
Ready to put transcription on autopilot? Generate a token on your YoutubeToText account and start your first job in minutes. Open the API docs or try a transcript on the web at youtubetotext.ai.
