OpenAI launches Sora: from text to AI video

OpenAI introduces a generative AI model that can convert text into moving videos: Sora. The model can create realistic videos up to a minute long.

OpenAI is giving Dall-E a sibling: Sora. Where Dall-E turns detailed text descriptions into an image, Sora turns your description into a video up to a minute long. Sora can handle scenes with multiple characters, specific types of movement and detailed backgrounds. “The model understands not only what the user asked in their prompt, but also how these things exist in the physical world,” OpenAI said in a blog post.

Like OpenAI’s language models, Sora has a strong grasp of language. From a single prompt, it can create a video with shots from different camera angles, in a variety of visual styles.

On Sora’s website we see examples such as a woman walking through a neon-lit city, complete with reflections in puddles. Another video shows a realistic-looking art gallery filled with AI-generated paintings. Those paintings, mere background details here, would have made headlines as Dall-E output just a year or so ago.

In principle, Sora can also edit existing videos, for example by extending existing footage or replacing the background. Tech YouTuber Marques Brownlee analyzes the footage OpenAI shared in a YouTube video we came across while researching this article. It is worth watching.

Video games

One article suggests that Sora’s abilities theoretically extend beyond video creation. The AI even appears able to create simulated digital worlds; in other words, Sora could essentially create a video game. Besides generating content creatively, as other generative models do, it also has a data-driven component that keeps track of an object’s position in the 3D world. Combine that with rudimentary physics rules and you get an algorithm that can create a world you can walk around in real time.

The model is not perfect, as OpenAI readily admits. Simulating physical cause and effect remains difficult. For example, Sora can create a video of someone biting into a cookie, but that cookie may reappear intact in subsequent frames. Other flaws crop up as well. OpenAI shows a detailed video of a man running on a treadmill: every detail appears correct and photorealistic, except that the man is running backwards on the machine.

On the way to general AI

Sora is a diffusion model. It starts with a video that looks like static noise and gradually transforms it into the desired video by removing that noise over many steps. This lets Sora generate an entire video in one go, and the same technique can also extend existing videos. Sora builds on techniques OpenAI previously developed for its GPT language models and for Dall-E 3.
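To make the "noise in, video out" idea more concrete, here is a toy Python sketch of a diffusion sampling loop. It is not Sora's actual method: OpenAI has not published code, and the noise schedule, the DDIM-style update rule and the `oracle_noise_predictor` function are illustrative assumptions of ours. The hypothetical predictor cheats by knowing the target video; in a real system, a trained neural network makes that prediction.

```python
import math
import random

# A "video" here is a list of frames, each frame a flat list of pixel values.
Video = list[list[float]]


def alpha_bar(t: int, T: int) -> float:
    """Fraction of signal (vs. noise) left at step t: 1.0 at t=0, close to 0 at t=T.

    Cosine-style schedule (an assumption), scaled by 0.98 so it never reaches
    exactly 0, which would make the clean-video estimate divide by zero.
    """
    return math.cos(0.98 * (math.pi / 2) * t / T) ** 2


def oracle_noise_predictor(x_t: Video, t: int, T: int, target: Video) -> Video:
    """HYPOTHETICAL stand-in for the trained network.

    It recovers the noise in x_t by peeking at the known target video.
    A real model has to learn this prediction from training data.
    """
    ab = alpha_bar(t, T)
    return [
        [(x - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab) for x, x0 in zip(fx, f0)]
        for fx, f0 in zip(x_t, target)
    ]


def sample(target: Video, T: int = 50, seed: int = 0) -> Video:
    """Reverse diffusion: start from pure noise and denoise over T steps."""
    rng = random.Random(seed)
    # Step T: pure Gaussian noise with the same shape as the target video.
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in target]
    for t in range(T, 0, -1):
        ab_t, ab_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        eps = oracle_noise_predictor(x, t, T, target)
        # Deterministic DDIM-style update: estimate the clean video, then
        # re-mix it with the predicted noise at the lower noise level of step t-1.
        x = [
            [
                math.sqrt(ab_prev) * (xi - math.sqrt(1.0 - ab_t) * ei) / math.sqrt(ab_t)
                + math.sqrt(1.0 - ab_prev) * ei
                for xi, ei in zip(fx, fe)
            ]
            for fx, fe in zip(x, eps)
        ]
    return x


if __name__ == "__main__":
    target = [[0.0, 0.5, 1.0], [0.1, 0.6, 0.9]]  # 2 frames x 3 "pixels"
    result = sample(target)
    err = max(abs(a - b) for fr, ft in zip(result, target) for a, b in zip(fr, ft))
    print(f"max deviation from target: {err:.2e}")
```

Because the stand-in predictor is perfect, the output lands on the target almost exactly. With a learned model, the quality of each denoising step determines how convincing the final video looks, which is where errors like the reappearing cookie can creep in.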

OpenAI sees Sora as an important foundation for models that can understand and simulate the real world. The company’s mission remains the development of artificial general intelligence (AGI): AI that is not just good at one task but at all tasks, like a human. According to OpenAI, Sora is an important step toward that goal.

Safety and abuse

This may sound dangerous, and OpenAI is aware of it. Sora is not yet available to the general public. Red teamers are currently testing the model to make sure its behavior is acceptable: bias, misinformation, and hate have no place in the finished product. At the same time, OpenAI is building tools to detect misleading content. Videos will also carry a watermark of sorts, making it theoretically easy to tell whether a video was created by Sora. As with OpenAI’s other models, Sora will refuse prompts aimed at malicious content.

OpenAI also says it is working with policymakers, educators and artists worldwide to understand their concerns and find positive applications for the new technology. There is one catch: like ChatGPT and Dall-E, Sora is a trained model. Data such as artists’ videos have already been used to build Sora without permission. Artists are thus presented with a fait accompli: they must compete with a videographer that learned from their work for free.

Milestone in AI

Sora appears to be a huge step forward in video generation. Other tools already exist, but they are far more limited. Google, for example, unveiled Lumiere, built on its own diffusion architecture, Space-Time U-Net (STUNet). STUNet is also trained on moving images, but it cannot produce videos as long as Sora’s and cannot explicitly take the position of objects in space into account.

The field of generative AI is evolving rapidly and OpenAI continues to lead the way. In January 2021, the company surprised the world with the first version of Dall-E, which produced mediocre images by today’s standards. Barely three years later, we’re seeing photorealistic, minute-long videos in Full HD resolution. OpenAI is unlikely to stand still after this success.

It is not yet known when Sora will become available to the public, or whether it will be a paid service. On the one hand, OpenAI has a habit of making its models available to the general public in at least a basic form; on the other, we suspect that Sora is extremely demanding on inference hardware. Whether OpenAI (and Microsoft) have enough hardware to handle a tsunami of prompts from curious users is therefore an open question.

Source: IT Daily

Mary

As an experienced journalist and author, Mary has been reporting on the latest news and trends for over 5 years. With a passion for uncovering the stories behind the headlines, Mary has earned a reputation as a trusted voice in the world of journalism. Her writing style is insightful, engaging and thought-provoking, as she takes a deep dive into the most pressing issues of our time.