The Times Australia

What is Sora? A new generative AI tool could transform video production and amplify disinformation risks

  • Written by Vahid Pooryousef, PhD candidate in Human Computer Interaction, Monash University

Late last week, OpenAI announced a new generative AI system named Sora[1], which produces short videos from text prompts. While Sora is not yet available to the public, the high quality of the sample outputs published so far has provoked both excited[2] and concerned[3] reactions.

The sample videos[4] published by OpenAI, which the company says were created directly by Sora without modification, show outputs from prompts like “photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee” and “historical footage of California during the gold rush”.

At first glance, it is often hard to tell the videos were generated by AI, thanks to their high-quality textures, scene dynamics and camera movements, and their good level of consistency.

OpenAI chief executive Sam Altman also posted some videos to X (formerly Twitter) generated in response to user-suggested prompts, to demonstrate Sora’s capabilities.

How does Sora work?

Sora combines features of text and image generating tools in what is called a “diffusion transformer model[5]”.

Transformers are a type of neural network first introduced by Google in 2017[6]. They are best known for their use in large language models such as ChatGPT and Google Gemini.

Diffusion models, on the other hand, are the foundation of many AI image generators. They work by starting with random noise and iterating towards a “clean” image that fits an input prompt.

[Image: a series of images showing a picture of a castle emerging from static.]
Diffusion models (in this case Stable Diffusion) generate images from noise over many iterations. Stable Diffusion / Benlisquare / Wikimedia[7], CC BY-SA[8]
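
The noise-to-image iteration described above can be caricatured in a few lines of Python. This is only a toy sketch, not how Sora or Stable Diffusion is implemented: a real diffusion model uses a trained neural network, guided by the text prompt, to estimate the clean image at each step. Here a hypothetical stand-in simply knows the target, and the names `toy_denoise` and `mean_abs_error` are made up for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and step towards `target`, keeping every stage."""
    rng = random.Random(seed)
    # Begin with pure Gaussian noise, one value per "pixel".
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [list(image)]
    for step in range(steps):
        remaining = steps - step
        # ASSUMPTION: the "predicted clean image" is the target itself.
        # In a real model this prediction comes from a trained network.
        image = [x + (t - x) / remaining for x, t in zip(image, target)]
        history.append(list(image))
    return history


def mean_abs_error(image, target):
    """Average absolute difference between two equally sized images."""
    return sum(abs(x - t) for x, t in zip(image, target)) / len(target)


# A four-pixel "image" stands in for the castle picture.
target = [0.0, 0.5, 1.0, 0.5]
history = toy_denoise(target, steps=20, seed=1)
print("error at start:", mean_abs_error(history[0], target))
print("error at end:  ", mean_abs_error(history[-1], target))
```

Each pass removes a share of the remaining noise, so the image drifts from static towards the target, much as the castle emerges in the figure.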

A video can be made from a sequence of such images. However, in a video, coherence and consistency between frames are essential.

Sora uses the transformer architecture to handle how frames relate to one another. While transformers were initially designed to find patterns in tokens representing text, Sora instead uses tokens representing small patches of space and time[9].
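
To make the idea of “patches of space and time” concrete, here is a minimal sketch that cuts a tiny video into small blocks spanning a few frames and a few pixels, then flattens each block into one token. It is an illustration only: the function name `spacetime_patches`, the patch sizes and the use of raw pixel values are all assumptions, and OpenAI has not released Sora’s code.

```python
def spacetime_patches(video, pt, ph, pw):
    """Split video[t][y][x] into flattened patches of pt frames x ph rows x pw columns."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # One token gathers every pixel in this block of space and time.
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                tokens.append(patch)
    return tokens


# A tiny 4-frame, 4x4 greyscale "video" whose pixel value encodes its position.
video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
tokens = spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(tokens), len(tokens[0]))  # 8 tokens, each covering 8 pixel values
```

Because every token spans more than one frame, a transformer reading the sequence sees how each region changes over time, not just what it looks like in a single image.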

Leading the pack

Sora is not the first text-to-video model. Earlier models include Emu[10] by Meta, Gen-2[11] by Runway, Stable Video Diffusion[12] by Stability AI, and recently Lumiere[13] by Google.

Lumiere, released just a few weeks ago, claimed[14] to produce better video than its predecessors. But Sora appears to be more powerful than Lumiere in at least some respects.

Sora can generate videos with a resolution of up to 1920 × 1080 pixels, and in a variety of aspect ratios, while Lumiere is limited to 512 × 512 pixels. Lumiere’s videos are around 5 seconds long, while Sora can make videos up to 60 seconds long.

Lumiere cannot make videos composed of multiple shots, while Sora can. Sora, like other models, is also reportedly capable of video-editing tasks such as creating videos from images or other videos, combining elements from different videos, and extending videos in time.

Both models generate broadly realistic videos, but may suffer from hallucinations. Lumiere’s videos may be more easily recognised as AI-generated. Sora’s videos look more dynamic, having more interactions between elements.

However, in many of the example videos inconsistencies become apparent on close inspection.

Promising applications

Video content is currently produced either by filming the real world or by using special effects, both of which can be costly and time-consuming. If Sora becomes available at a reasonable price, people may start using it as prototyping software to visualise ideas at a much lower cost.

Based on what we know of Sora’s capabilities, it could even be used to create short videos for some applications in entertainment, advertising and education.

OpenAI’s technical paper[15] about Sora is titled “Video generation models as world simulators”. The paper argues that bigger versions of video generators like Sora may be “capable simulators of the physical and digital world, and the objects, animals and people that live within them”.

If this is correct, future versions may have scientific applications for physical, chemical, and even societal experiments. For example, one might be able to test the impact of tsunamis of different sizes on different kinds of infrastructure – and on the physical and mental health of the people nearby.

Achieving this level of simulation is highly challenging, and some experts say a system like Sora is fundamentally incapable[16] of doing it.

A complete simulator would need to calculate physical and chemical reactions at the most detailed levels of the universe. However, simulating a rough approximation of the world and making videos that look realistic to human eyes might be within reach in the coming years.

Risks and ethical concerns

The main concerns about tools like Sora centre on their societal and ethical impact. In a world already plagued by disinformation[17], tools like Sora may make things worse.

It’s easy to see how the ability to generate realistic video of any scene you can describe could be used to spread convincing fake news or throw doubt on real footage. It may endanger public health measures, be used to influence elections, or even burden the justice system with potential fake evidence[18].

Read more: Whether of politicians, pop stars or teenage girls, sexualised deepfakes are on the rise. They hold a mirror to our sexist world[19]

Video generators may also enable direct threats to targeted individuals, via deepfakes – particularly pornographic ones[20]. These may have terrible repercussions on the lives of the affected individuals and their families.

Beyond these concerns, there are also questions of copyright and intellectual property. Generative AI tools require vast amounts of data for training, and OpenAI has not revealed where Sora’s training data came from.

Large language models and image generators have also been criticised for this reason. In the United States, a group of famous authors have sued OpenAI[21] over the alleged misuse of their material. The case argues that large language models and the companies that use them are stealing the authors’ work to create new content.

Read more: Two authors are suing OpenAI for training ChatGPT with their books. Could they win?[22]

It is not the first time in recent memory that technology has run ahead of the law. For instance, the question of the obligations of social media platforms in moderating content has created heated debate in the past couple of years – much of it revolving around Section 230 of the US Code[23].

While these concerns are real, based on past experience we would not expect them to stop the development of video-generating technology. OpenAI says[24] it is “taking several important safety steps” before making Sora available to the public, including working with experts in “misinformation, hateful content, and bias” and “building tools to help detect misleading content”.

References

  1. ^ generative AI system named Sora (openai.com)
  2. ^ excited (www.aljazeera.com)
  3. ^ concerned (www.newscientist.com)
  4. ^ sample videos (openai.com)
  5. ^ diffusion transformer model (openai.com)
  6. ^ introduced by Google in 2017 (dl.acm.org)
  7. ^ Stable Diffusion / Benlisquare / Wikimedia (en.wikipedia.org)
  8. ^ CC BY-SA (creativecommons.org)
  9. ^ small patches of space and time (openai.com)
  10. ^ Emu (ai.meta.com)
  11. ^ Gen-2 (research.runwayml.com)
  12. ^ Stable Video Diffusion (stability.ai)
  13. ^ Lumiere (lumiere-video.github.io)
  14. ^ claimed (arxiv.org)
  15. ^ technical paper (openai.com)
  16. ^ fundamentally incapable (twitter.com)
  17. ^ plagued by disinformation (www.who.int)
  18. ^ potential fake evidence (www.jdsupra.com)
  19. ^ Whether of politicians, pop stars or teenage girls, sexualised deepfakes are on the rise. They hold a mirror to our sexist world (theconversation.com)
  20. ^ pornographic ones (en.wikipedia.org)
  21. ^ group of famous authors have sued OpenAI (abcnews.go.com)
  22. ^ Two authors are suing OpenAI for training ChatGPT with their books. Could they win? (theconversation.com)
  23. ^ Section 230 of the US Code (www.vox.com)
  24. ^ says (openai.com)

Read more https://theconversation.com/what-is-sora-a-new-generative-ai-tool-could-transform-video-production-and-amplify-disinformation-risks-223850
