Meta challenges OpenAI’s Sora with Movie Gen video model: What is it?
Meta seems to be leaving no stone unturned in its bid to dominate the AI landscape. On Friday, the Mark Zuckerberg-led tech giant announced that it had created a new AI model named Movie Gen which can generate realistic video and audio clips in response to prompts. The company claims that it can rival some of the most capable AI video tools from the likes of OpenAI and ElevenLabs.
This development comes months after OpenAI introduced its video model Sora to the world. Even though Sora is not yet publicly available, demos shared by OpenAI created a frenzy on the internet, especially owing to its hyperrealistic footage and motion that seemed straight out of Hollywood.
What is Movie Gen?
Meta’s Movie Gen generates videos from text inputs, and it can even edit existing footage or still images. Reportedly, the audio added to the video is also AI-generated and synchronised with the visuals. Meta’s AI model also allows users to generate videos in different aspect ratios.
“Our latest research breakthroughs demonstrate how you can use simple text inputs to produce custom videos and sounds, edit existing videos, or transform your personal image into a unique video,” Meta wrote on its official website.
According to the research paper shared by Meta, Movie Gen is an advanced AI video creation tool designed to create high-quality 1080p HD videos from natural-language prompts, along with synchronised audio. Along with producing videos in various aspect ratios, it can perform specific edits on videos and even create personalised content from images provided by a user.
How does Movie Gen work?
Movie Gen is powered by large AI models, also known as foundation models, to create media. The two main components are Movie Gen Video and Movie Gen Audio. Movie Gen Video is a 30bn-parameter model that produces videos from text prompts. It combines text-to-image and text-to-video capabilities to generate realistic videos lasting up to 16 seconds at 16 frames per second. Movie Gen Audio, on the other hand, is a 13bn-parameter model that generates audio to match a video or text prompt. It can create realistic sound, ambient noise, or even music that fits the scene described in a prompt.
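For illustration, here is a minimal Python sketch of how such a two-model pipeline could be wired together. Meta has not released Movie Gen’s code or an API, so every class and function name below is hypothetical.

```python
# Illustrative sketch only: Meta has not published Movie Gen's code or API,
# so every class and function name here is hypothetical.

from dataclasses import dataclass

@dataclass
class GeneratedClip:
    frames: list        # 16 s x 16 fps = up to 256 frames
    audio: bytes | None = None

class MovieGenVideo:
    """Stand-in for the 30bn-parameter video foundation model."""
    def generate(self, prompt: str, seconds: int = 16, fps: int = 16) -> GeneratedClip:
        num_frames = seconds * fps  # 16 x 16 = 256 frames
        return GeneratedClip(frames=[f"frame_{i}" for i in range(num_frames)])

class MovieGenAudio:
    """Stand-in for the 13bn-parameter audio foundation model."""
    def add_audio(self, clip: GeneratedClip, prompt: str) -> GeneratedClip:
        # Audio is conditioned on both the generated frames and the prompt,
        # which is how it stays in sync with the visuals.
        clip.audio = b"\x00" * 1024  # placeholder waveform
        return clip

def text_to_clip(prompt: str) -> GeneratedClip:
    clip = MovieGenVideo().generate(prompt)
    return MovieGenAudio().add_audio(clip, prompt)

clip = text_to_clip("waves crashing on a beach at sunset")
print(len(clip.frames))  # 256
```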
According to the research paper, Movie Gen uses techniques such as temporal autoencoding to compress video information, making it easier for the model to process longer and higher-quality videos. Much like language models, Movie Gen also uses a transformer architecture, but for visual and audio data.
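To see what temporal autoencoding means in practice, here is a minimal PyTorch sketch of the idea: 3D convolutions squeeze a video along time as well as space, so the transformer later works on a much smaller latent tensor. The layer sizes and shapes below are illustrative assumptions, not Movie Gen’s actual configuration.

```python
# A toy temporal autoencoder: compresses a video along time (T) and
# space (H, W), then reconstructs it. Sizes are illustrative only.

import torch
import torch.nn as nn

class TemporalAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Stride along the time axis compresses 4 frames into 1 latent step;
        # stride along height/width compresses each 8x8 patch into 1 value.
        self.encoder = nn.Conv3d(3, 16, kernel_size=(4, 8, 8), stride=(4, 8, 8))
        self.decoder = nn.ConvTranspose3d(16, 3, kernel_size=(4, 8, 8), stride=(4, 8, 8))

    def forward(self, video):
        latent = self.encoder(video)          # much smaller representation
        return self.decoder(latent), latent

# A 16-second clip at 16 fps: (batch, channels, frames, height, width)
video = torch.randn(1, 3, 256, 64, 64)
reconstruction, latent = TemporalAutoencoder()(video)
print(latent.shape)          # torch.Size([1, 16, 64, 8, 8])
print(reconstruction.shape)  # torch.Size([1, 3, 256, 64, 64])
```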
How is it different from OpenAI Sora?
Movie Gen stands out with its superior video resolution, synchronised audio generation, ability to personalise videos based on user-provided images, and advanced video editing capabilities. Going by the paper, Movie Gen is a more dynamic and higher-quality tool for AI video generation.
When it comes to video quality and resolution, Movie Gen creates 1080p HD videos; Sora produces comparable output, but Meta claims its overall quality is not as high as Movie Gen’s. Generating synchronised audio to match visual content is a distinguishing feature of Movie Gen, as Sora so far focuses solely on video creation and lacks audio generation capabilities. We do not know if Sora will offer personalisation based on user-provided images or come with built-in editing capabilities.
On the technical side, Movie Gen uses a 30bn-parameter model for video generation and a 13bn-parameter model for audio. It should be noted that the large model size and extensive training data are what enable the model to handle complex scenes.
Sora was demoed in February this year. The AI video generation model from OpenAI can create HD videos spanning up to a minute from natural-language prompts. Sora can be described as a diffusion model that is capable of creating new videos and even extending existing ones. Much like Movie Gen, Sora uses a transformer architecture, which allows for superior scaling in performance. At the time of its announcement, OpenAI revealed that the model was built upon past research conducted for DALL-E and GPT models.
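For the curious, here is a toy Python sketch of that diffusion idea: sampling starts from pure noise, and a learned model removes a little of it at each step. The denoiser below is a trivial stand-in, since neither Sora’s architecture nor its code is public.

```python
# A toy diffusion sampling loop: start from noise, denoise step by step.
# The denoiser is a placeholder, not Sora's actual (unreleased) model.

import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)  # input: latent + timestep

    def forward(self, x, t):
        t = t[:, None]  # append the timestep as an extra feature
        return self.net(torch.cat([x, t], dim=-1))

def sample(denoiser, shape, steps=50):
    x = torch.randn(shape)  # begin with pure random noise
    for t in reversed(range(steps)):
        t_frac = torch.full((shape[0],), t / steps)
        predicted_noise = denoiser(x, t_frac)  # model predicts the noise
        x = x - predicted_noise / steps        # remove a small amount of it
    return x

latent = sample(ToyDenoiser(), shape=(1, 64))
print(latent.shape)  # torch.Size([1, 64])
```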
When it comes to availability, the paper does not offer any specific details about Movie Gen’s release. For now, the model is in the research and testing phase.