After Microsoft’s Copilot AI gained the ability to generate music from text prompts, Google introduced VideoPoet, a large language model (LLM) that pushes the boundaries of video generation with ten-second clips containing fewer artifacts. The model supports a range of video creation tasks, including text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio.
Creates 10-second videos from text prompts and can also animate still images
Unlike its predecessors, VideoPoet stands out by producing coherent videos with large amounts of motion. The model demonstrates this by generating ten-second clips, leaving competitors such as Runway’s Gen-2 behind. Notably, VideoPoet does not rely on specific input data to create a video; this distinguishes it from other models that require detailed input for best results.
This versatility comes from a single multimodal, large-scale model, an approach that could potentially become mainstream in video production.
Google’s VideoPoet departs from the prevailing trend among video generation models, which rely heavily on diffusion-based approaches. Instead, VideoPoet leverages the power of large language models (LLMs), integrating different video creation tasks into a single LLM and eliminating the need for separately trained components for each function.
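To make the single-model idea concrete, here is a minimal Python sketch, not VideoPoet’s actual code, of how one autoregressive LLM can serve several video tasks by reducing each to next-token prediction over a shared vocabulary. All names below (Request, encode_text, build_sequence, generate) are illustrative placeholders, not a real API.

```python
# Sketch: one multi-task LLM for video. Every task is framed as a single
# flat token sequence (task tag + text tokens + optional visual tokens),
# so the same model handles text-to-video, image-to-video, and so on
# without separately trained components per task.

from dataclasses import dataclass, field

@dataclass
class Request:
    task: str                       # e.g. "text_to_video", "image_to_video"
    prompt: str                     # text conditioning
    media_tokens: list = field(default_factory=list)  # tokens for an input image/video, if any

def encode_text(prompt: str) -> list:
    # Stand-in for a real text tokenizer.
    return [f"txt:{w}" for w in prompt.lower().split()]

def build_sequence(req: Request) -> list:
    # One flat sequence: a task tag, then text tokens, then any visual
    # tokens. The LLM is trained to continue it with output video tokens.
    return [f"<task={req.task}>"] + encode_text(req.prompt) + req.media_tokens

def generate(seq: list, n_out: int = 8) -> list:
    # Placeholder for autoregressive decoding by a trained transformer;
    # here we just emit dummy video tokens to show the interface.
    return [f"vid:{i}" for i in range(n_out)]

# Text-to-video and image-to-video differ only in the prefix, not the model.
t2v = build_sequence(Request("text_to_video", "a fox running in snow"))
i2v = build_sequence(Request("image_to_video", "make the fox run", ["img:0", "img:1"]))
print(generate(t2v))
print(generate(i2v))
```

The design point this illustrates is that task identity lives in the input sequence rather than in the architecture, which is what lets one model cover the full range of tasks listed above.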
The resulting videos vary in length, action, and style depending on the input text. VideoPoet can also animate input images according to the provided prompts, demonstrating its adaptability to different input types. The launch of VideoPoet adds a new dimension to AI-powered video production and hints at what may emerge in 2024.