
The next leap forward for AI animation

Discover Sora, the latest AI tool from OpenAI, and how it may impact the world of animation.

AI is everywhere these days.

Despite concerns about accuracy, the potential for disinformation, or the ethical and legal issues surrounding the source of the material used to train AI models, companies seem to be tripping over themselves in their haste to add AI features to their products.

In a sense, none of this is any great surprise. In a world which demands increasing quantities of “content”, at ever greater speeds, AI offers the potential to increase output significantly.

Whilst numerous issues remain, generative AI tools have already reached a point where, in many cases, the results are as good as those produced by humans.

But what does this have to do with animation?

Around a year ago, I wrote an article entitled “Has animation changed forever?” about the way in which generative AI was starting to be used to create animation.

At the time, this involved training an AI image-generation tool (Stable Diffusion) to essentially rotoscope live-action footage. The process was still hugely labour-intensive, but it led to a result which was far superior to any AI-generated animation or video of the time.
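
To give a rough sense of how that kind of workflow operates, here is a minimal sketch of a per-frame img2img loop using Hugging Face’s diffusers library. The model name, prompt, file paths, and strength value are all assumptions for illustration; the original project relied on custom-trained checkpoints and extensive manual cleanup that a sketch like this doesn’t capture.

```python
# A minimal sketch of per-frame img2img "rotoscoping" with Stable Diffusion.
# Assumptions: video frames have already been extracted to ./frames as PNGs,
# the diffusers library is installed, and a CUDA GPU is available.
from pathlib import Path

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # stand-in model; the project used its own
    torch_dtype=torch.float16,
).to("cuda")

prompt = "hand-drawn anime style, clean line art"  # hypothetical style prompt

out_dir = Path("stylised")
out_dir.mkdir(exist_ok=True)

for frame_path in sorted(Path("frames").glob("*.png")):
    frame = Image.open(frame_path).convert("RGB")
    # A low strength keeps each output close to the underlying live-action
    # frame, so the model restyles the footage rather than generating freely.
    result = pipe(prompt=prompt, image=frame, strength=0.45).images[0]
    result.save(out_dir / frame_path.name)
```

The low strength value is the key design choice: it makes the process behave like rotoscoping rather than free generation. Even so, keeping the style consistent from frame to frame is the hard part, which is why the workflow was so labour-intensive.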

I concluded the article by stating that the project offered “a fascinating glimpse at how these tools may be used more widely in the future.”

But that was almost a year ago and, in the world of AI, the pace of development is staggering.

The next leap forward for AI animation

Last week, OpenAI, the company behind ChatGPT and the AI model which powers Microsoft’s Copilot, announced Sora, a text-to-video AI model designed to “create realistic and imaginative scenes from text instructions”.

Sora is not yet open to the public, beyond a limited test group of visual artists, designers, and filmmakers, but OpenAI has released a series of examples which clearly show its potential.

Sora is not the first generative text-to-video tool, but the results being showcased are clearly a considerable leap forward.

Sora generates videos of up to a minute in length and is able to produce results which, at first glance, can easily be mistaken for live-action footage.

I say “at first glance” because the results are still not perfect. Every video has issues if you look for them, but the results are already good enough to convince most people if they’re not aware they’re watching something generated by AI.

Whilst the realistic results are impressive, Sora also generates imaginative scenes.

The showcase includes multiple instances of videos which look as though they were created with 3D animation.

Again, all of these videos have their issues but, when you consider that these are the first examples generated by a tool which is still in development, the results really are staggering.

The key limitation

Whilst there may still be obvious visual glitches in the current results, I can see these becoming fewer and further between as the technology develops.

For me, however, there is one key limitation to this technology as it stands: creative control.

Generative AI tools may be able to create unexpected, surprising, and impressive results from a simple text prompt, but those results are really a mash-up of the many creative sources they’ve been trained on.

It’s arguable that, in most cases, humans do exactly the same thing. We combine our various influences and references to create something new. The difference is, we combine those sources into a specific creative vision and then work within our chosen medium to get as close to that vision as possible within the limits of our ability.

Working with text-based generative AI is more like asking an artist in a different country, who has an entirely different creative vision to our own, to create an animation for us based on a brief text description. They may create something which looks impressive but ultimately has only a loose connection to our original vision.

With any video created from a text prompt alone, there is going to be a high degree of interpretation required. Of course, it will be possible to try variations of a prompt until you get closer to your vision, but the end result is simply not editable in the way that 3D animation is.

It remains to be seen how easy it might be, for example, to adjust one specific part of a generated video whilst keeping all other elements the same.

Why is creative control important?

Well, for the majority of users, it simply won’t be.

The average user creating videos for social media is likely to be more than happy with whatever is generated on the first pass.

Companies that currently purchase stock footage for use in advertising or corporate videos are likely to be ecstatic that they can now get far more specific footage for their needs without hiring a team to create it for them.

It is at the higher end that the ability to adjust the results becomes essential if these tools are to be of use.

Sora is already able to take a source image and use it to generate a video, so, in theory, concept art could be used alongside a text description to define the characters in a video. But, on a TV series or film production, shots go through multiple revisions until they are fine-tuned to the director’s liking.

This iterative process might involve adjusting the models, the animation, the textures, or the lighting in very specific ways.

The example videos which are meant to look animated simply don’t move that well at the moment. Whilst the results might be good enough for many people, they simply wouldn’t pass as high-end animation.

Without the ability to fine-tune the animation and have granular control over all of the other elements which make up a shot, it’s unlikely to be used in higher-budget TV or films, or any project which favours creative control, in the near future.

What’s the likely impact?

I think it’s safe to say that, when Sora is released to a wider audience, we’ll be seeing AI-generated videos everywhere.

Beyond the fact that AI videos will start clogging up your social media feeds, there are likely to be some more significant impacts that arise from this technology.

Any company that currently pays for stock footage is likely to find Sora extremely attractive and I can see this causing a major shift in that industry.

By its nature, stock footage is generic and is typically used where there isn’t a budget to create something more specific. If AI can generate more specific footage which looks convincing enough for the average audience, and the price is right, stock footage will be facing some stiff competition.

Likewise, there is bound to be an impact on lower-budget animation. Creating animation is time-consuming and, therefore, expensive. Any company that simply wants a dancing kangaroo to promote their product is highly likely to consider using AI in preference to hiring a small studio to create something for them.

On the positive side, I can also see Sora being of use in lower-budget independent films. Many independent filmmakers are likely to be willing to forgo a certain amount of creative control in order to add some high-end-looking visual effects shots or crowd scenes to their films. Used in a limited way, these could enhance their stories in ways which were not previously possible.

Of course, there are also serious questions about how this technology might be misused.

This is something OpenAI are well aware of, and safety is one of the areas they’re working on in their testing. Obviously, it remains to be seen how effective any measures they implement might be against those with malicious intent.

What next?

Tools such as Sora are bound to have a disruptive influence when they become more widely available.

Whilst I don’t see Sora, or tools of its nature, replacing high-end animation in the near future, I can see clear potential for AI tools increasingly being used to handle certain parts of a 3D animation pipeline.

This view is supported by a recent study, commissioned in part by The Animation Guild, called Future Unscripted: The Impact of Generative AI on Entertainment Industry Jobs.

One of the conclusions of the report is that “About 21.4% of Film, Television, and Animation jobs (or approximately 118,500 jobs) are likely to have enough tasks affected to be either consolidated, replaced, or eliminated by GenAI in the U.S. by 2026.”

OpenAI state that they are “sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon.”

Clearly, what’s “on the horizon” is already impressive and, regardless of any personal feelings about AI-generated imagery, it would be foolish to ignore the technology or its potential impact.

Do I believe that all animators are doomed to be replaced by AI? Not in the slightest! But it’s always wise to be aware of what’s “on the horizon”, so that it’s easier to chart a path forward in a changing world.
