Discord Bot Makes Impressive AI Videos From Chat Requests
Text-to-video is AI’s next big challenge. A few weeks ago, we saw how awesome (and a little creepy) an AI-generated Pepperoni Hug Spot commercial was. The video’s creator, known as Pizza Later, made that project with Runway Gen-2, a text-to-video engine that takes simple prompts such as “happy man/woman/family eating pizza at a restaurant, TV commercial” and turns them into photorealistic content.
I recently got access to the private beta of Runway Gen-2, and I’m really impressed with how realistic its output is. The videos are short, at only four seconds each, but the image quality is impressive, and everything works by sending short requests to Runway ML’s bot on its Discord server.
By sending a few words of text to the @Gen-2 bot, I was able to make short, photorealistic (or cartoon-style) clips of everything from a family enjoying a sushi dinner to a robot with a serious drinking problem. The output often wasn’t exactly what I was looking for, but it was always interesting, and it was better than the NeuralInternet Text-to-Video Playground I wrote about last week.
Anyone can join the server, but the list of Gen-2 chat rooms only appears once you have access to the beta program (there’s a waiting list). There are several rooms where you can chat and share projects with other users, and three of them, named Generate One, Generate Two and Generate Three, let you send prompts directly to the @Gen-2 bot. Moderators encourage users to keep their prompts in a single thread to avoid cluttering the chat rooms.
Prompting Runway Gen-2
To prompt Runway Gen-2, you send the bot a message such as “@Gen-2 a drunken humanoid robot looking at the camera and spitting tiny screws out of its mouth.” The bot immediately responds by echoing your prompt along with the parameters it’s using (e.g., “upscale”), which you can change with special parameters (more on those below). After a few minutes, it posts a four-second video based on your prompt.
Here’s what my drunken robot looked like. All videos are playable from within Discord and can be downloaded as MP4 files. I converted each video sample shown in this article to an animated GIF so it can be viewed more easily (without pre-roll ads).
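If you want to do the same MP4-to-GIF conversion yourself, here’s a minimal sketch using Python’s moviepy library; the filename is hypothetical, standing in for any clip you’ve downloaded from Discord.

```python
from moviepy.editor import VideoFileClip

# Load a downloaded Gen-2 clip (hypothetical filename) and write it
# back out as an animated GIF. A lower fps keeps the GIF's file size
# manageable at the cost of some smoothness.
clip = VideoFileClip("drunk_robot.mp4")
clip.write_gif("drunk_robot.gif", fps=12)
clip.close()
```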
You’ll notice the clip above isn’t exactly what I was after. The robot isn’t spitting out screws as intended; instead, it just stares menacingly at a pint of beer. Other attempts at this prompt also weren’t what I was hoping for. When I omitted the word “drunk,” the robot opened its mouth but didn’t spit anything out.
Using Photos in Runway Gen-2 Prompts
You can also feed images to the bot, either by pasting an image into Discord along with a text prompt or by including the image’s URL in the prompt. However, Runway Gen-2 doesn’t actually use the images you upload; it just takes inspiration from them as it creates its own video. I uploaded images of myself many times and got videos of people who look a little like me but definitely aren’t me.
For example, when I uploaded a photo of myself without any accompanying text, I got a middle-aged bald man with sunglasses who wasn’t me, standing next to a river and some buildings. His mouth moved, and the water moved.
The Runway Gen-2 bot is good at picking up the sentiment and subject of the images you provide. I gave it an image of myself with a disgusted look on my face along with a prompt along the lines of “this guy looks at the camera, says ‘uh oh’ and opens his mouth.”
Many users on the Discord server generate static images with another AI tool such as Midjourney or Stable Diffusion, then run those images through Hugging Face’s CLIP Interrogator 2.1a tool, which looks at an image and suggests a prompt it thinks would reproduce it.
I experimented with the process, asking Stable Diffusion to create an image of a boy playing with a toy robot on a sidewalk in the 1980s. I then fed that image into the CLIP Interrogator and got back a fairly obvious prompt, something like “boy standing next to a robot.” Still, entering the same image along with that prompt didn’t quite get me what I wanted: I got a boy standing with two robots in front of a street, but it wasn’t the same street, or even the same boy.
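If you’d like to reproduce that image-to-prompt step locally, the Hugging Face Space is built on the open-source clip-interrogator Python package. Here’s a minimal sketch based on that package’s documented usage; the image filename is hypothetical.

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the AI-generated still image (hypothetical filename).
image = Image.open("boy_with_toy_robot.png").convert("RGB")

# ViT-L-14/openai is the CLIP model the package recommends for
# generating Stable Diffusion 1.x-style prompts.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Returns a text prompt CLIP thinks describes the image, e.g.
# something like "boy standing next to a robot".
print(ci.interrogate(image))
```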
To Move or Not to Move
The four-second limit already means there isn’t much time for movement in each clip. But on top of that, I found that many clips had very little motion at all; often it was just someone’s head bobbing, liquid flowing, or smoke rising from a fire.
A good way to get more movement is to ask for a timelapse or some kind of panning shot. I requested a timelapse of a volcano in Iceland and a panning shot of the New York subway, and the results were pretty good. When I asked for a panning view of the Taipei skyline, the clouds moved but there was no pan, and the city was definitely not Taipei.
Asking for subjects to run, chase, or ride may or may not do the trick; in one attempt, the only motion I got was something rolling downhill. And when I asked for Intel and AMD boxers fighting each other, I got a shot of two boxers who were completely motionless (and missing the Intel and AMD logos).
Pros and Cons of Runway Gen-2
Like other AI image generators, Runway Gen-2 doesn’t do a great job of recreating very specific branded characters, products, or places. When I asked for Mario and Luigi boxing, I got two characters that looked like knock-offs of the Nintendo originals. I asked for Godzilla videos many times and got giant lizards that even the most casual fan wouldn’t confuse with the King of the Monsters.
It did a bit better with Minecraft references. I asked for a Creeper and an Enderman eating pizza, and for a Creeper eating at McDonald’s, and got a decent-looking Creeper but an inaccurate Enderman. When I asked for a family of pizza-eating Creepers, though, I got a family of humanoids that merely looked like they came from Minecraft; anyone who has played Minecraft knows that Creepers are green monsters with black spots.
The tool is terrible with logos. I gave it the Tom’s Hardware logo and asked for it to be used in a commercial, and got this weird thing back.
I asked for an AMD Ryzen CPU burning and somehow got what looks like a “PCU.” You have to see the logo for yourself (below).
What Runway Gen-2 does really well is produce generic scenes of people and families eating and the like, though it may or may not show them eating exactly what you asked for. When I asked for a family eating live worms, I got a family that appeared to be eating salad. A family eating sushi in a 1970s pizza restaurant looked especially real.
I have to point out that when I asked for people without specifying their ethnicity, I almost always got white people; the only exception was when I asked for a family eating sushi. This is a well-known problem with the training data behind many generative AI models.
Special Parameters
There are a few parameters you can add to the end of a Runway Gen-2 prompt to modify the output. I didn’t mess around with these much.
- --upscale: outputs the video at a higher resolution
- --interpolate: smooths out the video
- --cfg [number]: controls how creative the AI gets; the higher the number, the closer the output sticks to your prompt
- --green_screen: outputs the video with a green-screen area you can use for editing
- --seed [number]: a number that helps determine the output. It defaults to a random value each time, but reusing the same number should produce similar results.
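Putting those flags together, a complete prompt might look something like the line below. This is a hypothetical example; the parameter values are made up for illustration.

```
@Gen-2 a timelapse of a volcano erupting in Iceland --upscale --interpolate --cfg 12 --seed 1234
```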
Tying It All Together
If you search the internet for examples of Runway Gen-2 videos, you’ll find many with audio that run well over four seconds. These videos are made by stitching multiple four-second clips together in a video editor and adding sound and music obtained elsewhere.
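The stitching itself doesn’t require anything fancy. As a minimal sketch, here’s how the assembly could be scripted with Python’s moviepy library; all filenames are hypothetical.

```python
from moviepy.editor import (AudioFileClip, VideoFileClip,
                            concatenate_videoclips)

# Join several downloaded four-second Gen-2 clips back to back
# (hypothetical filenames).
clips = [VideoFileClip(f"gen2_clip_{i}.mp4") for i in range(1, 4)]
movie = concatenate_videoclips(clips)

# Lay a separately sourced soundtrack under the combined video,
# trimmed to match its length.
music = AudioFileClip("soundtrack.mp3").subclip(0, movie.duration)
movie = movie.set_audio(music)

movie.write_videofile("stitched_commercial.mp4")
```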
One of the most famous of these Runway Gen-2 videos is the aforementioned Pepperoni Hug Spot pizza commercial, but on the Runway ML Discord I see a lot of people posting YouTube links to their work. One of my favorites is “Spaghetti Terror,” posted on Twitter by Andy McNamara. And Pizza Later’s new lawyer commercial is also worth a watch.
Conclusion
Runway Gen-2 is in private beta as I write this, but the company says it plans to make it available to everyone soon, as it has already done with its Gen-1 product. As a technology demo it’s really impressive, and I can see people using its short clips in place of stock videos and stock animated GIFs.
Even if the clip length is extended to 60 seconds, it’s unlikely that this tool will replace professional (or amateur) video anytime soon. That’s a little disappointing, but it’s a limitation we’ve seen in every image-generating AI to date. However, the technology is still young, and it could become even more impressive as the training data scales up.