The beautiful, hilarious surrealism of early text-to-video AIs

April 02, 2023

Will Smith eating spaghetti in a video generated by ModelScope, with a faint logo from Shutterstock revealing its source

Shutterstock


A new creative AI system called ModelScope is now pumping out short videos in response to text prompts. The early results are wonderfully bizarre and thoroughly memeworthy – but it's immediately clear how immensely powerful these tools will become.

Developed by a collaborative team at Huggingface, ModelScope is a "multi-stage text-to-video diffusion model," which takes plain English text prompts, attempts to understand what you're hoping to see, then generates and de-noises a short video for you. You can play with it online through a very simple interface. It's very early days for this sort of thing, making it the perfect time to marvel both at its incredible capabilities and at its bizarre misunderstanding of the world.

I just made my own Star Wars clip using AI (text-to-video)! pic.twitter.com/Yj8HGUl5Lf
— Victor M (@victormustar) March 19, 2023

Macron is cleaning up Paris #ai #aiart #modelscope #text2video pic.twitter.com/0xX8kR23Ls
— AI Insight (@ai_insight1) March 30, 2023

pic.twitter.com/ok8515VoiI
— Vhagar 🐲 (@edho__ala) April 2, 2023

modelscope text2video generation

"Barry Chuckle absolutely shredding in front of an erupting volcano, badass, trending on coolstation"
by reddit user u/BILL_HOBBES

reddit thread: https://t.co/dhTt0eNAPq
demo: https://t.co/DaX8fDSnlN
model: https://t.co/JYEcm4bEVV pic.twitter.com/flGgTVXlnh
— AK (@_akhaliq) March 25, 2023

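Testing whether ModelScope's text2vid can produce anime-style output.
I first tried a prompt from NAI, but that didn't work well, so I dug out a prompt from my wd days.
It looks capable of output at roughly the wd1.3 level. pic.twitter.com/fq0HdvzmMC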
— POPPIN (@POPPIN30521640) March 27, 2023

By far the most popular use of this tech at the moment seems to be making celebrities eat things, and it's easy to see why.

AI-generated video of Will Smith attempting to eat spaghetti without making a mess astounds with comedic horror

Open source "text2video" ModelScope AI made the viral sensation possible, but it seems like poor Will Smith couldn't catch a break - or a noodle. 😂🍝 pic.twitter.com/fDbUS6FlQx
— neonpulse (@neonpulsedaily) April 1, 2023

Dwayne Johnson eating rocks #ai #aiart #modelscope #text2video pic.twitter.com/m36YG4QHsD
— AI Insight (@ai_insight1) March 31, 2023

Arnold fights with pizza pic.twitter.com/vIe3ewgx4Z
— Ananth (@itsananth_) March 29, 2023

As always, this generative AI has been trained on a large dataset of existing human-created video, raising some interesting legal questions when it comes to IP owned by large copyright holders.

"The fundamental problem with generative AI and deep fakes in all of these new AI systems is that the training data that is being used is not owned by the deep fakers," says Hyperreal founder and CEO Remington Scott. "And the copyright holders aren't getting paid. It's a fundamental problem that is going to become really big in IP. Soon, people will be training AIs on all the Avatar movies, then building whole new stories using AI. That's not gonna fly. We saw how bad Napster was for the music industry; this is Napster 2.0 for the whole IP industry."

"We're in the Wild West right now, but watch how it's gonna play out," he continues. "One studio is going to take somebody to court and say 'open up the training data, let's see what you trained that on.' And if they didn't use that studio's material, every other studio will be watching to say 'ah, but you used mine.'"

Fascinating stuff. Watch how quickly this technology evolves; if image and text generation are any indication, things are about to go asymptotic.

Source: Huggingface