Use the YoutubeToText REST API to generate transcripts, SRT/WebVTT subtitles, and subtitled videos from any YouTube link. Start a job, poll until it's done, read the result.

Automate YouTube Transcription with the YoutubeToText API

By YoutubeToText | Published June 23, 2026

Doing one transcript by hand is easy. Doing five hundred is not. Once transcription becomes part of a product or a pipeline (a content tool, a research dataset, an internal dashboard), you don't want a human pasting links into a website. You want code to do it.

That's what our REST API is for. The same engine behind the website is now available as a handful of endpoints, so you can generate plain text transcripts, SRT or WebVTT subtitle files, and videos with burned-in subtitles, all programmatically. This guide walks through how it works end to end.

What You Can Build

The API mirrors the three things you can do on the site, each with its own endpoint:

Transcribe: a clean, plain text transcript of any YouTube video.
Subtitles: timed caption files in both SRT and WebVTT formats.
Burn subtitles: a downloadable video with the captions rendered into the picture, at 720p, 1080p, or 4K.

Every job follows the same shape: you start it with a POST, then poll a single status endpoint until it finishes. That uniform pattern keeps your integration simple no matter which output you need.

Authentication

Every endpoint requires a bearer token. Generate one on your account's API page (API access is included on our paid plans), then send it in the Authorization header on every request:

Authorization: Bearer ytt_<your-token>

A missing or invalid token returns 401 Unauthorized with a WWW-Authenticate: Bearer response header. Treat the token like a password: keep it server-side and out of public code.
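One way to keep the token out of source code is to read it from the environment at runtime. The sketch below is a minimal illustration in Python; the variable name YTT_API_TOKEN and the helper auth_headers are hypothetical choices for this example, not part of the API.

```python
import os


def auth_headers(env_var="YTT_API_TOKEN"):
    """Build the Authorization header from an environment variable.

    YTT_API_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early instead of sending a request that would come back as 401.
        raise RuntimeError(f"Set {env_var} to your API token before calling the API")
    return {"Authorization": f"Bearer {token}"}
```

Merge the returned dictionary into the headers of every request you send.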

Quick Start

The whole flow is two steps: start a job, then poll until it's done.

1. Start a job

curl -X POST https://api.youtubetotext.ai/v1/api/transcribe \
  -H "Authorization: Bearer ytt_<your-token>" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://youtube.com/watch?v=dQw4w9WgXcQ", "verbatim": false }'

The response hands back a job id:

{ "id": "uuid" }

2. Poll until done

curl https://api.youtubetotext.ai/v1/api/transcription/<id> \
  -H "Authorization: Bearer ytt_<your-token>"

Keep polling that endpoint every 2–3 seconds. When state becomes "done", read the result field that matches the endpoint you called. The same status endpoint serves jobs started by any of the three job-creating endpoints.

The base URL shown here is illustrative. Copy the exact endpoint base and a working example straight from your account's API page, which is always in sync with the live service.
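The same two steps can be sketched in Python using only the standard library. This is an illustration under assumptions rather than an official client: BASE_URL repeats the illustrative base from the curl examples, and the names send_json and transcribe are hypothetical. The send and sleep parameters exist so the flow can be exercised without touching the network.

```python
import json
import time
import urllib.request

# Illustrative base URL; copy the real one from your account's API page.
BASE_URL = "https://api.youtubetotext.ai"


def send_json(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def transcribe(video_url, token, send=send_json, sleep=time.sleep, interval=2.5):
    """Start a transcript job, then poll until it is done or failed."""
    auth = {"Authorization": f"Bearer {token}"}

    # Step 1: start the job with a POST and keep the returned id.
    body = json.dumps({"url": video_url, "verbatim": False}).encode("utf-8")
    start = urllib.request.Request(
        f"{BASE_URL}/v1/api/transcribe",
        data=body,
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
    job_id = send(start)["id"]

    # Step 2: poll the status endpoint until the job reaches a final state.
    status_url = f"{BASE_URL}/v1/api/transcription/{job_id}"
    while True:
        status = send(urllib.request.Request(status_url, headers=auth))
        if status["state"] == "done":
            return status["txt"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error") or "job failed")
        sleep(interval)
```

This loop has no upper time limit, so in production you would likely add a deadline suited to the length of your videos.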

The Job Model

Transcription isn't instant, so the API is asynchronous. You create a job, get an id immediately, and the work happens in the background. Polling the status endpoint tells you where it is. A typical status response looks like this:

{
  "id": "uuid",
  "state": "processing",   // waiting | preparing | downloading
                           // | converting | uploading | processing
                           // | aligning | burning | done | failed
  "progress": 42,          // 0-100
  "error": null,

  "type": "transcript",    // transcript | subtitles | burned
  "verbatim": false,
  "quality": "default",    // default | hd | uhd

  "txt": null,
  "srt": null,
  "webvtt": null,
  "video_url": null,
  "burned_video_url": null
}

Show state and progress in your own UI if you have one, and stop polling once you reach done or failed.
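If you want typed access to the fields you display, a small wrapper around the status response can help. The sketch below assumes the response has already been decoded into a dictionary shaped like the sample above; JobStatus, parse_status, and progress_line are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Optional

# The two states after which polling should stop.
TERMINAL_STATES = {"done", "failed"}


@dataclass
class JobStatus:
    id: str
    state: str
    progress: int
    error: Optional[str]

    @property
    def finished(self) -> bool:
        """True once the job has reached done or failed."""
        return self.state in TERMINAL_STATES


def parse_status(payload: dict) -> JobStatus:
    """Pick the display-related fields out of a decoded status response."""
    return JobStatus(
        id=payload["id"],
        state=payload["state"],
        progress=int(payload.get("progress") or 0),
        error=payload.get("error"),
    )


def progress_line(status: JobStatus) -> str:
    """Format a one-line progress message for a UI or a log."""
    return f"{status.state}: {status.progress}%"
```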

The Endpoints

There are three job-creating endpoints and one polling endpoint.

POST /v1/api/transcribe

Generate a plain text transcript.

{
  "url": "https://youtube.com/watch?v=...",
  "type": "transcript",
  "verbatim": false
}

When done, read txt.

POST /v1/api/subtitles

Generate subtitles in SRT and WebVTT.

{
  "url": "https://youtube.com/watch?v=...",
  "type": "subtitles",
  "verbatim": false
}

When done, read srt or webvtt.

POST /v1/api/burn-subtitles

Generate a downloadable video with the subtitles burned in.

{
  "url": "https://youtube.com/watch?v=...",
  "type": "burned",
  "quality": "default",     // "default" (720p) | "hd" (1080p) | "uhd" (4K)
  "verbatim": false
}

When done, read burned_video_url, which is a direct download link.

GET /v1/api/transcription/{id}

Poll the job created by any of the three endpoints above using its id. Poll every 2–3 seconds until state is "done" or "failed".
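Because the three request bodies differ only in type and the optional quality field, one helper can prepare all of them. The following is a sketch built from the paths and fields listed above; the names ENDPOINTS, QUALITIES, and build_job are hypothetical, and the validation is a client-side convenience, not something the API requires.

```python
# Paths and type values as listed in the endpoint descriptions above.
ENDPOINTS = {
    "transcript": "/v1/api/transcribe",
    "subtitles": "/v1/api/subtitles",
    "burned": "/v1/api/burn-subtitles",
}
QUALITIES = {"default", "hd", "uhd"}  # 720p, 1080p, 4K


def build_job(job_type, video_url, verbatim=False, quality="default"):
    """Return the (path, body) pair for a job-creating request."""
    if job_type not in ENDPOINTS:
        raise ValueError(f"unknown job type: {job_type!r}")
    body = {"url": video_url, "type": job_type, "verbatim": verbatim}
    if job_type == "burned":
        # Only the burn-subtitles endpoint takes a quality setting.
        if quality not in QUALITIES:
            raise ValueError(f"unknown quality: {quality!r}")
        body["quality"] = quality
    return ENDPOINTS[job_type], body
```

Serialize the body as JSON and POST it to the returned path on your API base URL.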

Reading the Result

The field you read depends on the mode you started:

transcribe → txt
subtitles → srt or webvtt
burn-subtitles → burned_video_url

If state comes back "failed", the error field explains why. Queue-level problems (no credits, an invalid URL, a private video) come back as standard HTTP error responses with a JSON detail field.
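The mapping and the two failure paths can be handled in a few lines. This sketch assumes the status response carries the type field shown in the job model and that HTTP error bodies are JSON objects with a detail key, as described above; extract_result and error_detail are hypothetical helper names.

```python
import json
import urllib.error

# Which result fields to read for each job type.
RESULT_FIELDS = {
    "transcript": ("txt",),
    "subtitles": ("srt", "webvtt"),
    "burned": ("burned_video_url",),
}


def extract_result(status):
    """Return the result fields for a finished job, or raise if it failed."""
    if status["state"] == "failed":
        raise RuntimeError(status.get("error") or "job failed without an error message")
    if status["state"] != "done":
        raise ValueError(f"job is still {status['state']}")
    return {field: status.get(field) for field in RESULT_FIELDS[status["type"]]}


def error_detail(http_error):
    """Read the JSON detail field from an HTTPError, falling back to its reason."""
    try:
        return json.loads(http_error.read().decode("utf-8"))["detail"]
    except (ValueError, KeyError, TypeError):
        return str(http_error.reason)
```

Call error_detail inside an except urllib.error.HTTPError block around the request that starts the job.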

A Couple of Notes

Verbatim mode. Set verbatim: true to keep filler words and exact phrasing. The default trims them for a cleaner read.
One status route for every job. A job started by any of the three POST endpoints is fetched with the same GET /v1/api/transcription/{id}, with no separate status routes to remember.
Long videos take longer. The API handles long content (we've confirmed videos up to 10 hours), but a multi-hour video naturally takes more time to process, so size your polling timeouts generously.
It uses your plan minutes. API jobs draw from the same balance as the website.
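To make the timeout explicit, the polling loop can be given a deadline. The sketch below is one possible shape: fetch_status is any callable you supply that returns the decoded status response, wait_for_job is a hypothetical name, and the one-hour default is an arbitrary placeholder that you should raise for multi-hour videos.

```python
import time


def wait_for_job(fetch_status, timeout_s=3600.0, interval_s=2.5,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until the job is done or failed, or raise once the deadline passes.

    timeout_s defaults to one hour, an arbitrary value chosen for this sketch.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status["state"] in ("done", "failed"):
            return status
        # Give up if the next poll would land after the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"job still {status['state']} after {timeout_s} seconds")
        sleep(interval_s)
```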

Where the API Shines

Anywhere you'd otherwise transcribe at volume or on a schedule. Batch-process a back catalogue of videos overnight. Add a "get the transcript" button to your own app. Feed clean transcripts into a search index or an LLM pipeline. Auto-generate subtitle files for every new upload. The endpoints are deliberately small, so wiring them into an existing system is quick.
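For the batch case, a small thread pool is often enough, since each job spends most of its time waiting on the server. This is a sketch under assumptions: transcribe_one is any callable you supply that takes a video URL and returns the finished transcript, transcribe_batch is a hypothetical name, and max_workers=4 is an arbitrary default because this guide does not state rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_batch(video_urls, transcribe_one, max_workers=4):
    """Run transcribe_one over many URLs, collecting results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {url: pool.submit(transcribe_one, url) for url in video_urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failed video should not stop the rest of the batch.
                failures[url] = str(exc)
    return results, failures
```

Collecting failures separately lets an overnight run finish and leaves you a list of URLs to retry in the morning.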

Frequently Asked Questions

How do I get an API token?

Generate one on the API page in your account. API access is included on our paid plans. Check the pricing page for details. Send the token in the Authorization header as Bearer ytt_<your-token>.

Why does the API use polling instead of returning the transcript directly?

Because transcription takes time. Returning a job id immediately and letting you poll keeps requests fast and reliable, and means a long video never holds a connection open. Poll the status endpoint every 2–3 seconds until the job is done.

Can I get subtitles in another language?

Yes. The service can translate captions into other languages alongside the original. If you mainly need translated subtitles, our guides on translating YouTube videos walk through the workflow.

Is there a maximum video length?

There's no practical cap for most videos. We've confirmed transcripts on videos up to 10 hours, and very long videos simply take longer to process. See our long video guide for what to expect.

Ready to put transcription on autopilot? Generate a token on your YoutubeToText account and start your first job in minutes. Open the API docs or try a transcript on the web at youtubetotext.ai.
