Now that we’ve seen how to make a consistent character, we can try animating them. This is an area I said I wanted to delve into as a follow-up to the consistent character post. I had a lot of fun animating Stable Diffusion characters, as you can see in the animations below.
Results
The Flow
I made a script that would work with ControlNet. This script does several things.
First, it takes a video file. Then, at the press of a button (and after about a minute), it splits the file into frames, runs each frame through OpenPose, and places that pose in the center of an image with static poses on either side.
For this example, I chose a short loop of the Gangnam Style dance.
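The script itself isn’t tidy enough to share (more on that at the end), but here’s a minimal sketch of the splitting step, assuming OpenCV for frame extraction and the OpenposeDetector from the controlnet_aux package. The video file name, side-pose images, and output names are hypothetical placeholders.

```python
import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector

# Pose estimator from the controlnet_aux package.
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

# Hypothetical fixed side poses that flank the animated center pose.
side_poses = [Image.open(f"side_pose_{i}.png") for i in range(4)]

cap = cv2.VideoCapture("gangnam_style_loop.mp4")  # placeholder file name
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV reads frames as BGR; convert to RGB for PIL/OpenPose.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pose = openpose(Image.fromarray(rgb))

    # Build the five-across sheet: two fixed poses, the animated pose
    # in the center, then two more fixed poses.
    w, h = pose.size
    sheet = Image.new("RGB", (w * 5, h))
    tiles = side_poses[:2] + [pose] + side_poses[2:]
    for i, tile in enumerate(tiles):
        sheet.paste(tile.resize((w, h)), (i * w, 0))
    sheet.save(f"pose_sheet_{frame_idx:04d}.png")
    frame_idx += 1
cap.release()
```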
Making a Base
Next, we need the base image, made the same way as before with CharTurner.
Now that we have the starting image, we can take the poses and run each set through. I tried a few settings and got the best results from img2img at 0.8 denoising. I also kept the same seed across frames to help with consistency.
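My script handles this differently, but as a rough sketch of what each pass does, here’s the equivalent in the diffusers library, assuming the openpose ControlNet model; the prompt, model choices, and file names are placeholders.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

base = Image.open("charturner_base.png")  # the five-across base image

for idx in range(28):  # one pass per frame's pose sheet
    pose_sheet = Image.open(f"pose_sheet_{idx:04d}.png")
    # Re-seed every frame so the seed stays constant across the run,
    # which is what helped consistency in my tests.
    generator = torch.Generator("cuda").manual_seed(1234)
    frame = pipe(
        prompt="character turnaround, woman in a red dress",  # placeholder
        image=base,        # each frame starts from the same base image
        control_image=pose_sheet,
        strength=0.8,      # the 0.8 denoising that worked best for me
        generator=generator,
    ).images[0]
    frame.save(f"out_frame_{idx:04d}.png")
```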
I ran the pose extraction only once, producing a starting five-across set for each animation, and then ran the generation. At 512 pixels high, a 28-frame animation took about 10 minutes in total; at 1024 across, it took up to 45 minutes.
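To actually view the result, the rendered frames need to be stitched back into an animation. Here’s a minimal sketch with PIL, assuming the center panel of each five-across output is what gets animated; the frame names and timing are placeholders.

```python
from PIL import Image

frames = []
for i in range(28):
    sheet = Image.open(f"out_frame_{i:04d}.png")
    w, h = sheet.size
    # Crop the animated center panel out of the five-across sheet.
    frames.append(sheet.crop((2 * w // 5, 0, 3 * w // 5, h)))

frames[0].save(
    "animation.gif",
    save_all=True,
    append_images=frames[1:],
    duration=100,  # ms per frame (~10 fps)
    loop=0,        # 0 = loop forever
)
```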
Learnings
Some things I tried didn’t work so well, and there are clear limitations around this style of animation.
First I tried txt2img, but it had a lot of flicker, even when using the same seed.
Here is the woman in the red dress with a different seed per image.
Front-facing animations seemed more stable. There is still a lot of flicker, and you can tell the noise patterns stick from one frame to the next. This is most apparent below in the way the flower moves while the hat refuses to move away from where it started. The chain also stays in the same place at the start instead of moving with the body.
Moving over to img2img, I got more consistency, though both seed strategies had drawbacks. Keeping the seed still produced a decent amount of flicker. Changing the seed caused elements, especially in the background, to get progressively enhanced as the frames continued. I tried lowering the denoising to smooth this out, but then the poses stopped shifting.
Not quite a Stable Diffusion Animation
Denoising at 0.7
Denoising at 0.5
Since the model has no temporal knowledge, I think this method of animation is inherently limited, but it was fun to put together. If you want to use the script, reach out: it isn’t in a good enough state to post here, and since I don’t plan to continue down this path, I don’t want to put too much effort into refining it.