microsoft/ai-audio-descriptions
Language: TypeScript
License: MIT
Stars: 44
Forks: 15
Open issues: 4
Created: 2024-08-22T22:38:17Z
Pushed: 2026-06-24T04:40:01Z
Default branch: main
Fork: no
Archived: no
README:
AI Audio Descriptions
Introduction
Audio Description is a technique for describing what is happening during a video, to benefit audience members who are blind or have low vision. This generally takes the form of a second audio track, and is available on TV, streaming services, and at movie theaters. The narration is timed to fit within silent parts of the video, so it doesn't overlap the dialog, and does not increase the length of the program (as would be the case if the video was paused to provide a description).
This project leverages Artificial Intelligence to assist in the process of generating the Audio Description track. First, a description is generated for each scene, along with a transcript of the dialog. Silences are then identified, and the descriptions rewritten to fit in the gaps. This is presented to the human AD editor as a draft to review and update. Once the script is finalized, the video can be downloaded with Audio Descriptions inserted using Text-To-Speech.
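The gap-fitting idea can be illustrated with a minimal sketch. It assumes the transcript yields dialog segments as start/end times in seconds, and it uses an assumed narration rate of 2.5 words per second and an assumed 1.5-second minimum gap. `Segment`, `Gap`, `findDescriptionGaps`, and `fitsInGap` are hypothetical names, not this project's code, and the actual rewriting of descriptions is left to the model.

```typescript
// Hypothetical shape of a transcribed dialog segment (times in seconds).
interface Segment {
  start: number;
  end: number;
}

// A silent stretch plus an estimated narration budget in words.
interface Gap {
  start: number;
  end: number;
  maxWords: number;
}

// Find silences between dialog segments (including before the first and after
// the last, up to the video duration) that are at least minGapSeconds long.
// wordsPerSecond is an assumed speaking rate used to size each gap's budget.
function findDescriptionGaps(
  dialog: Segment[],
  videoDuration: number,
  minGapSeconds = 1.5,
  wordsPerSecond = 2.5,
): Gap[] {
  const sorted = [...dialog].sort((a, b) => a.start - b.start);
  const gaps: Gap[] = [];
  let cursor = 0; // latest end time of any speech seen so far

  const pushGap = (start: number, end: number): void => {
    const length = end - start;
    if (length >= minGapSeconds) {
      gaps.push({ start, end, maxWords: Math.floor(length * wordsPerSecond) });
    }
  };

  for (const seg of sorted) {
    if (seg.start > cursor) pushGap(cursor, seg.start);
    // Math.max handles overlapping segments (e.g. two speakers at once).
    cursor = Math.max(cursor, seg.end);
  }
  if (videoDuration > cursor) pushGap(cursor, videoDuration);
  return gaps;
}

// Check whether a draft description fits a gap's word budget; a description
// that does not fit would be sent back for rewriting.
function fitsInGap(description: string, gap: Gap): boolean {
  const wordCount = description.trim().split(/\s+/).filter(Boolean).length;
  return wordCount <= gap.maxWords;
}
```

Counting words rather than synthesizing audio keeps the check cheap; a real pipeline could instead measure the Text-To-Speech output duration directly.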
We hope that making the AD authoring process faster, and thus less expensive, will result in more inclusive content being created. Providing content with AD tracks is a legal requirement in several countries, and this will also help media companies meet these requirements.
We'd love to hear what you think, especially if you deploy this solution within your organization. Email [aiad@microsoft.com](mailto:aiad@microsoft.com).
Examples
https://github.com/user-attachments/assets/c880afc3-1b5a-403b-9610-0503bccbd21c
https://github.com/user-attachments/assets/e724070a-bca9-417a-8f08-85c5e30779f7
Try It Yourself
We are providing this solution as open source to enable content creators to incorporate it into their workflows. The web app lets you upload MP4 videos, generate a draft AD script, edit the script, and generate a new video file with the audio description inserted.
While we provide an end-to-end user experience, aspects such as hosting, authentication and authorization will differ customer-to-customer.
The details below will enable a developer to run the solution on their dev box.
Setup Azure
We provide two options for setting up your Azure environment:
Option 1: Automated Setup (Recommended) - Zero to Hero in 5 Minutes! 🚀
Prerequisites:
- Azure subscription (get a free one here)
- Azure CLI installed
- Bash shell (Linux, macOS, or WSL on Windows)
Steps:
1. Log in to Azure: az login
2. Run the setup script: ./deploy/setup.sh
3. Done! All resources are created and configured automatically.
The automation creates:
- Resource Group: aiad
- Azure AI Services: Multi-service cognitive services resource with GPT-5.5 model deployment
- Storage Account: Blob storage with an audio-description container
- CORS configuration: Enabled for local development
- SAS token: Generated with 1-year validity for secure access
- Environment file: Automatic .env file creation with all configuration
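Because the SAS token has a fixed lifetime, the app loses blob access once it expires. A minimal sketch for checking how much time is left, assuming the token is the query-string format described for VITE_BLOB_SAS_TOKEN, where the `se` field holds the expiry time; `sasExpiry` and `daysUntilExpiry` are hypothetical helpers, not part of the repo.

```typescript
// Parse a SAS token such as "sp=...&st=...&se=...&sig=..." and return its
// expiry ("se", an ISO 8601 UTC timestamp), or null if absent or invalid.
function sasExpiry(sasToken: string): Date | null {
  // Tolerate a leading "?" in case the token was copied together with it.
  // URLSearchParams also decodes percent-encoded values such as "%3A".
  const params = new URLSearchParams(sasToken.replace(/^\?/, ""));
  const se = params.get("se");
  if (!se) return null;
  const expiry = new Date(se);
  return Number.isNaN(expiry.getTime()) ? null : expiry;
}

// Whole days remaining before the token expires (negative once expired).
function daysUntilExpiry(sasToken: string, now: Date = new Date()): number | null {
  const expiry = sasExpiry(sasToken);
  if (!expiry) return null;
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((expiry.getTime() - now.getTime()) / msPerDay);
}
```

Running a check like this at startup, or on a schedule, gives warning before uploads and playback start failing.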
Customization: You can modify deployment parameters directly in the deploy/setup.sh script to customize resource names, regions, and other settings.
Security: The automation follows best practices with minimal required permissions, secure SAS tokens, and no secrets in source control.
Cost Warning: ⚠️ The created resources will incur Azure costs. Monitor your usage in the Azure Portal to avoid unexpected charges.
Option 2: Manual Setup
If you prefer to create resources manually:
- Azure Subscription: If you don't already have one, you can get a free Azure subscription here.
- Azure AI Services: Provides access to Azure Content Understanding, Azure OpenAI, and Speech APIs. When creating the resource, select a region where GPT-5.5 is available (such as East US 2, Sweden Central, or one of the other regions allowed in deploy/main.bicep).
- Azure Storage Account: Used to store the videos. After creating the account, create a container named "audio-description" and generate a Shared Access Signature (SAS) for the container. You will also need to enable CORS so the app can retrieve data from blob storage: select CORS from the storage account settings and create a new rule with Allowed Origins set to the URL where the app is running, Allowed Methods set to GET, PUT, OPTIONS, and DELETE, Allowed Headers set to *, and Max Age set to 9999 (seconds).
- GPT model: Go into the AI Services resource created above, and deploy a GPT-5.5 model.
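The CORS rule described for the storage account can also be written down as data, which is handy when scripting the setup. A minimal sketch, assuming the field names of a Blob service CORS rule (`allowedOrigins`, `allowedMethods`, `allowedHeaders`, `exposedHeaders`, `maxAgeInSeconds`); `CorsRule` and `buildCorsRule` are hypothetical, and the code makes no SDK calls.

```typescript
// Plain-object shape of a Blob service CORS rule (comma-separated lists).
interface CorsRule {
  allowedOrigins: string;
  allowedMethods: string;
  allowedHeaders: string;
  exposedHeaders: string;
  maxAgeInSeconds: number;
}

// Build the rule from the manual setup steps for a given app URL,
// e.g. the local dev server.
function buildCorsRule(appUrl: string): CorsRule {
  // A CORS origin is scheme://host[:port] with no path or trailing slash,
  // so normalize whatever URL was passed in.
  const origin = new URL(appUrl).origin;
  return {
    allowedOrigins: origin,
    allowedMethods: ["GET", "PUT", "OPTIONS", "DELETE"].join(","),
    allowedHeaders: "*",
    exposedHeaders: "",
    maxAgeInSeconds: 9999,
  };
}
```

With the `@azure/storage-blob` package, a rule of this shape should be usable in the `cors` array passed to `BlobServiceClient.setProperties`; that call sets the account's full list of CORS rules, so include any existing rules you want to keep, and verify against the SDK reference before relying on it.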
Configure the Solution
If you used the automated setup:
The .env file has been created automatically with all the correct values. You can skip this section.
If you used manual setup:
After cloning this repo, create a file called .env. Add lines in the format key=value with the following entries:
- VITE_AI_SERVICES_RESOURCE: The name of the resource (not the full domain name).
- VITE_AI_SERVICES_KEY: Can be copied from the portal.
- VITE_AI_SERVICES_REGION: All one word, such as eastus2 or swedencentral.
- VITE_STORAGE_ACCOUNT: The name of the resource (not the full domain name).
- VITE_BLOB_SAS_TOKEN: The Shared Access Signature created above. This should be a set of keys and values, such as: sp=…&st=…&se=…&spr=…&sv=…&sr=…&sig=….
- VITE_GPT_DEPLOYMENT: The name you chose when creating the deployment, such as gpt-5.5.
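Most setup problems come from a missing or mistyped entry, so a quick sanity check of the .env file can save a debugging session. A minimal sketch, assuming the rules stated above (names rather than domains, a one-word region, a SAS token with a `sig` field); `parseDotEnv` and `validateEnv` are hypothetical helpers, not part of the repo.

```typescript
// The keys this README lists for the .env file.
const REQUIRED_KEYS = [
  "VITE_AI_SERVICES_RESOURCE",
  "VITE_AI_SERVICES_KEY",
  "VITE_AI_SERVICES_REGION",
  "VITE_STORAGE_ACCOUNT",
  "VITE_BLOB_SAS_TOKEN",
  "VITE_GPT_DEPLOYMENT",
] as const;

// Parse key=value lines, skipping blank lines and # comments.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    // Split on the first "=" only: the SAS token value itself contains "=".
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Return a list of problems; an empty list means the values look plausible.
function validateEnv(vars: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_KEYS) {
    if (!vars[key]) problems.push(`${key} is missing`);
  }
  // These should be resource names, not full domain names.
  for (const key of ["VITE_AI_SERVICES_RESOURCE", "VITE_STORAGE_ACCOUNT"] as const) {
    if (vars[key]?.includes(".")) problems.push(`${key} should be a name, not a domain`);
  }
  // Regions are written as one lowercase word, e.g. eastus2.
  const region = vars["VITE_AI_SERVICES_REGION"];
  if (region && !/^[a-z0-9]+$/.test(region)) {
    problems.push("VITE_AI_SERVICES_REGION should be one word, e.g. eastus2");
  }
  // A SAS token is a query string that must at least carry a signature.
  const sas = vars["VITE_BLOB_SAS_TOKEN"];
  if (sas && !new URLSearchParams(sas.replace(/^\?/, "")).has("sig")) {
    problems.push("VITE_BLOB_SAS_TOKEN has no sig field");
  }
  return problems;
}
```

Vite exposes `VITE_`-prefixed variables to client code through `import.meta.env`, so the same `validateEnv` check could presumably be run on that object inside the app as well.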
Run the App
- In the project directory, run npm install to install required packages.
- Make sure the .env file created above is in this directory too.
- Run npm run dev to run the project locally.
- The URL, such as http://localhost:5173, will be displayed in the terminal. Visit that URL in your browser to view the app.
Cleanup Azure Resources
If you...