
Stable Diffusion Animation for Characters with ControlNet

Since we’ve seen how to make a consistent character, we can now try animating one. This is an area I said I wanted to delve into as a follow-up to the consistent character post. I had a lot of fun animating Stable Diffusion characters, as you can see in the results below.

Results

Stable Diffusion Animation of an Egyptian stone carving.
Stable Diffusion Animation of a robot pirate
Stable Diffusion Animation of a cat statue
Stable Diffusion Animation of a cat girl
Stable Diffusion Animation of a panda
Stable Diffusion Animation of the woman in the red dress

The Flow

I made a script that would work with ControlNet. This script does several things.

First, it takes a video file. Then, with the press of a button (and a minute of waiting), it splits the file into frames, runs each frame through OpenPose, and places that pose at the center of an image with poses on either side.
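The core of that step looks roughly like the sketch below. This isn’t my original script, just an approximation of the idea using OpenCV and the controlnet_aux OpenPose annotator; I’m also assuming here that the side poses stay fixed to match the base sheet while only the center pose changes per frame.

```python
import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector

# The OpenPose annotator used for ControlNet conditioning images
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

def video_to_poses(video_path):
    """Split a video into frames and run each one through OpenPose."""
    capture = cv2.VideoCapture(video_path)
    poses = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # OpenCV reads BGR arrays; convert to an RGB PIL image for the annotator
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        poses.append(openpose(Image.fromarray(rgb)))
    capture.release()
    return poses

def five_across(center_pose, side_poses, size=512):
    """Build a five-wide strip with the current frame's pose in the center
    and fixed poses on either side (matching the five-across base image)."""
    strip = Image.new("RGB", (size * 5, size))
    order = side_poses[:2] + [center_pose] + side_poses[2:]
    for slot, pose in enumerate(order):
        strip.paste(pose.resize((size, size)), (slot * size, 0))
    return strip
```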

For this example I chose to use a short loop of the Gangnam Style dance.

Video frame used in animating stable diffusion characters
ControlNet Openpose for the stable diffusion animation
Five across starting controlnet openposes for animating stable diffusion

Making a Base

Next, we need the base image, done the same way as before with CharTurner.

The five across cat statues

Now that we have the starting image, we can run each set of poses through. I tried a few settings and got the best results from img2img at 0.8 denoising. I also kept the same seed to help with consistency.
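If you want to reproduce that step outside my script, the equivalent in diffusers looks roughly like this. It’s a sketch, not the exact setup I used, and the model names are just examples; swap in whatever checkpoint your base image came from.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Example models; substitute the checkpoint used for the base character sheet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

def render_frame(prompt, base_image, pose_strip, seed=1234):
    """img2img one five-across strip at 0.8 denoising with a fixed seed."""
    generator = torch.Generator("cuda").manual_seed(seed)  # same seed every frame for consistency
    return pipe(
        prompt,
        image=base_image,          # the CharTurner-style five-across base
        control_image=pose_strip,  # the five-across OpenPose strip for this frame
        strength=0.8,              # the denoising strength that worked best for me
        generator=generator,
    ).images[0]
```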

I ran the poses only once to get a starting five-across set for each animation, and then ran the full sequence. The total run time for a 28-frame animation at 512 pixels high was about 10 minutes; at 1024 pixels across it was up to 45 minutes.
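Stitching the rendered frames back into a gif is the easy part. Something like this works with Pillow, assuming frames is the list of rendered PIL images:

```python
def save_gif(frames, path="animation.gif", fps=12):
    """Write the rendered frames out as a looping gif."""
    frames[0].save(
        path,
        save_all=True,
        append_images=frames[1:],
        duration=int(1000 / fps),  # per-frame duration in milliseconds
        loop=0,                    # loop forever
    )
```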

gif of the animation
animation applied to the cat statue

Learnings

Some things I tried didn’t work so well, and there are clear limitations around this style of animation.

First I tried using txt2img, but it had a lot of flicker, even when using the same seed.

Here is the woman in the red dress with different seeds per image.

first pass of the animation script, very rough poses and lots of character flicker

Front-facing animations seemed to be more stable. There is still a lot of flicker, and you can tell the noise pattern of the image sticks from one frame to the next. This is most apparent below in the way the flower moves while the hat doesn’t want to move away from where it started. The chain also stays in the same place at the start instead of moving with the body.

small dance moves controlnet openpose animation
stable diffusion animation of a character using the controlnet and openpose flow

Moving over to img2img, I got more consistency. Both keeping the seed and changing it had drawbacks. With a fixed seed there was still a decent amount of flicker. With a changing seed, elements got progressively exaggerated as the frames continued, especially in the background. I tried adjusting the denoising to smooth things out, but then the poses stopped shifting.
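The comparisons below came from re-running the same sequence at different denoising strengths. Building on the sketches above (and assuming pose_strips holds the five-across strips, with prompt, base_image, pipe, and save_gif defined as before), that sweep is roughly:

```python
# Re-render the same pose sequence at a few denoising strengths to compare
# flicker against how well the poses actually move.
for strength in (0.5, 0.7, 0.8):
    frames = [
        pipe(
            prompt,
            image=base_image,
            control_image=strip,
            strength=strength,
            generator=torch.Generator("cuda").manual_seed(1234),  # fixed seed across frames
        ).images[0]
        for strip in pose_strips
    ]
    save_gif(frames, f"animation_strength_{strength}.gif")
```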

Not quite a Stable Diffusion Animation

Denoising at 0.7

failed attempt at animating a character in stable diffusion

Denoising at 0.5

failed attempt at a stable diffusion character animation

Since the model has no temporal knowledge, I think this method of animation is inherently limited. Still, it was fun to put together. If you want to use the script, reach out. It isn’t in a good enough state to post here, and since I don’t plan to continue down this path, I don’t want to put too much effort into refining it.

How To Set Up ControlNet Models in Stable Diffusion
How to Train a Custom Embedding in Stable Diffusion Tutorial