Sora System Card
December 9, 2024
Introduction
Overview of Sora
Sora is OpenAI’s video generation model, designed to take text, image, and video inputs and generate a new video as an output. Users can create videos up to 1080p resolution (20 seconds max) in various formats, generate new content from text, or enhance, remix, and blend their own assets. Users will be able to explore the Featured and Recent feeds which showcase community creations and offer inspiration for new ideas. Sora builds on learnings from DALL·E and GPT models, and is designed to give people expanded tools for storytelling and creative expression.
Sora is a diffusion model: it generates a video by starting with a base video that looks like static noise and gradually transforming it by removing the noise over many steps. By giving the model foresight of many frames at a time, we’ve solved the challenging problem of keeping a subject the same even when it temporarily goes out of view. Similar to GPT models, Sora uses a transformer architecture, unlocking superior scaling performance.
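To make the "remove noise over many steps" idea concrete, here is a minimal toy sketch of a deterministic denoising loop over a flat list of pixel values. It is not Sora's sampler. The noise schedule `alpha_bar`, the step count, and the `oracle_noise_prediction` function are all assumptions introduced for illustration; the oracle is handed the clean signal, whereas a real model has to learn to predict the noise.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Hypothetical schedule: share of signal kept at step t (1.0 means clean)."""
    return 1.0 - t / (num_steps + 1)


def oracle_noise_prediction(x_t, clean, t, num_steps):
    """Stand-in for a trained network (assumption): it recovers the noise
    exactly because it is given the clean signal."""
    a = alpha_bar(t, num_steps)
    return [(x - math.sqrt(a) * c) / math.sqrt(1.0 - a) for x, c in zip(x_t, clean)]


def denoise(clean, num_steps=50, seed=0):
    """Start from Gaussian noise and step down to noise level zero."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in clean]  # static-like starting point
    for t in range(num_steps, 0, -1):
        a_t = alpha_bar(t, num_steps)
        a_prev = alpha_bar(t - 1, num_steps)
        eps = oracle_noise_prediction(x, clean, t, num_steps)
        # Estimate the clean signal from the current noisy sample.
        x0_hat = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
                  for xi, e in zip(x, eps)]
        # Move to the next, lower noise level using the same noise estimate.
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


if __name__ == "__main__":
    target = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(denoise(target))
```

Because the oracle is exact, the loop lands on the target signal; with a learned predictor the result is only as good as the model's noise estimates.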
Sora uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully.
In addition to being able to generate a video solely from text instructions, the model is able to take an existing still image and generate a video from it, animating the image’s contents with accuracy and attention to small detail. The model can also take an existing video and extend it or fill in missing frames. Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI.
Sora’s capabilities may also introduce novel risks, such as the potential for misuse of likeness or the generation of misleading or explicit video content. In order to safely deploy Sora in a product, we built on learnings from safety work for DALL·E’s deployment in ChatGPT and the API and safety mitigations for other OpenAI products such as ChatGPT. This system card outlines the resulting mitigation stack, external red teaming efforts, evaluations, and ongoing research to refine these safeguards further.
Model Data
As described in our technical report [1] from February 2024, Sora takes inspiration from large language models, which acquire generalist capabilities by training on internet-scale data. The success of the LLM paradigm is enabled in part by the use of tokens that elegantly unify diverse modalities of text: code, math, and various natural languages. With Sora, we considered how generative models of visual data can inherit such benefits. Whereas LLMs have text tokens, Sora has visual patches. Patches have previously been shown to be an effective representation for models of visual data. We found that patches are a highly scalable and effective representation for training generative models on diverse types of videos and images. At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.
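The decomposition step can be sketched as follows. This is an illustrative assumption, not Sora's implementation: the function name `to_spacetime_patches`, the non-overlapping layout, and the patch sizes are hypothetical, and the learned compression into a latent space is skipped, so the input stands in for an already-compressed grid indexed as `[time][height][width]`.

```python
def to_spacetime_patches(video, pt, ph, pw):
    """Split video[T][H][W] into non-overlapping patches spanning
    pt frames x ph rows x pw columns, each flattened into one list."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


if __name__ == "__main__":
    # A 4-frame, 4x4 toy grid whose values encode their own position.
    toy = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)]
           for t in range(4)]
    patches = to_spacetime_patches(toy, pt=2, ph=2, pw=2)
    print(len(patches), patches[0])
```

Each patch plays the role a text token plays in an LLM: the resulting sequence of flattened patches is what a transformer would consume.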
Sora was trained on diverse datasets, including a mix of publicly available data, proprietary data accessed through partnerships, and custom datasets developed in-house. These consist of:
- Select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls.
- Proprietary data from data partnerships. We form partnerships to access non-publicly available data. For example, we partnered with Shutterstock Pond5 on building and delivering AI-generated images. We also partner to commission and create datasets fit for our needs.
- Human data: Feedback from AI trainers, red teamers, and employees.
Pretraining Filtering and Data Preprocessing
In addition to mitigations implemented after the pre-training stage, pre-training filtering provides an additional layer of defense that, along with other safety mitigations, helps exclude unwanted and harmful data from our datasets. Before training, all datasets therefore undergo this filtering process, which removes the most explicit, violent, or otherwise sensitive content (for instance, some hate symbols). This process extends the methods used to filter the data on which we trained our other models, including DALL·E 2 and DALL·E 3.
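The system card does not describe how the filter is implemented. As a purely hypothetical sketch, one common pattern is to score each sample with content classifiers and drop anything above a per-category threshold. The `Sample` type, the category names, the scores, and the threshold values below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sample_id: str
    scores: dict  # category -> score in [0, 1] from hypothetical classifiers


# Illustrative thresholds only; not values from the system card.
THRESHOLDS = {"explicit": 0.8, "violent": 0.8, "hate_symbol": 0.5}


def filter_dataset(samples, thresholds=THRESHOLDS):
    """Keep samples whose score stays below the threshold in every category."""
    kept = []
    for sample in samples:
        flagged = any(
            sample.scores.get(category, 0.0) >= limit
            for category, limit in thresholds.items()
        )
        if not flagged:
            kept.append(sample)
    return kept


if __name__ == "__main__":
    data = [
        Sample("a", {"explicit": 0.05, "violent": 0.10}),
        Sample("b", {"explicit": 0.95}),
        Sample("c", {"hate_symbol": 0.60}),
    ]
    print([s.sample_id for s in filter_dataset(data)])
```

A stricter threshold for one category, as with `hate_symbol` here, is one way to encode that some content should be removed more aggressively than other content.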
Risk Identification And Deployment Preparation
We undertook a robust process to understand both potential misuse and real-world creative uses to help inform Sora’s designs and safety mitigations. After the Sora announcement in February of 2024, we worked with hundreds of visual artists, designers, and filmmakers from more than 60 countries to gain feedback on how to advance the model to be most helpful for creative professionals. We also crafted a number of evaluations internally and with external red-teamers to discover and assess risks and iteratively improve our safety and risk mitigations.
Our safety stack for Sora builds on these learnings and on existing safety mitigations we employ in other models and products such as DALL·E and ChatGPT, as well as custom-built mitigations specific to our video product. Because this is a powerful tool, we are taking an iterative approach to safety, particularly in areas where context is important or where we foresee novel risks related to video. Examples of our iterative approach include restricting access to users who are 18 or older, restricting the use of likeness and face uploads, and applying more conservative moderation thresholds to prompts and uploads of minors at launch. We want to continue to learn how people use Sora and iterate to best balance safety with maximizing creative potential for our users.
External Red Teaming
OpenAI worked with external red teamers located in nine different countries to test Sora, identify weaknesses in the safety mitigations, and give feedback on risks associated with Sora’s new product capabilities. Red teamers had access to the Sora product with various iterations of safety mitigations and system maturity starting in September and continuing into December 2024, testing more than 15,000 generations. This red teaming effort builds upon work in early 2024 where a Sora model without production mitigations was tested.…