Lessons Learned From Using AI – Once Upon an Algorithm

I’m going to admit this right now: using AI art generators and text writers is a lot harder than I anticipated. Part of that is because I’m still learning AI prompt parameters and settings. It also doesn’t help that I’m no expert in image editing software like Photoshop and GIMP. However, the biggest challenges for the AI generated art for Little Frog, Big Dragon were of my own making: I selected both a style and subject that are really not well suited for publicly available Stable Diffusion models.

Compositing AI Generated Art Into One Image

For Little Frog, Big Dragon, I set a goal of creating just one really good image with minimal artifacts and no uncanny valley moments like the women in Clara and the City. For that story, I used only txt2img in Stable Diffusion, but moving forward I want to use img2img to have more control over the scene.

My first attempt was overly ambitious in retrospect, but not entirely a failure. My first mistake was the art style. I’ve always been a fan of the art of Vicki Wong and Michael Murphy of Meomi, the illustrators behind the successful Octonauts series of books and TV show. While I did not use them in my prompts, I wanted to see if I could use AI art generators to replicate a somewhat similar style of art to their work, for reasons I explained in another post.

First, I drew an incredibly basic and terrible scene in Photoshop and fed it into Stable Diffusion:

Starting image for an ai art generator — Original image prompt

Image produced by an AI art generator — Subsequent AI generated art

Here is the prompt:

a smiling (((green frog))) sitting on a (wooden stump), view of a (pond) with (lily pads), background of reeds and trees. cute, happy, beautiful, pastel color, (low perspective), sharp focus, global lighting, digital vector clip art
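The parentheses in this prompt are emphasis markers. Here is a rough sketch of how they stack up, assuming the AUTOMATIC1111 web UI convention where each pair of parentheses multiplies the attention on the enclosed words by 1.1. Other front ends may use a different factor or syntax, and the function name `emphasized_terms` is just something I made up for illustration.

```python
def emphasized_terms(prompt, factor=1.1):
    """Return (term, weight) pairs for each parenthesized term in a prompt.

    Assumption: every nesting level of parentheses multiplies the weight
    by `factor` (1.1 in the AUTOMATIC1111 web UI convention).
    """
    terms = []
    depth = 0      # how many parentheses are currently open
    peak = 0       # deepest nesting reached for the current term
    current = []   # characters collected for the current term
    for ch in prompt:
        if ch == "(":
            depth += 1
            peak = max(peak, depth)
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0 and current:
                weight = round(factor ** peak, 3)
                terms.append(("".join(current).strip(), weight))
                current = []
                peak = 0
        elif depth > 0:
            current.append(ch)
    return terms


prompt = (
    "a smiling (((green frog))) sitting on a (wooden stump), "
    "view of a (pond) with (lily pads), background of reeds and trees. "
    "cute, happy, beautiful, pastel color, (low perspective), sharp focus"
)
for term, weight in emphasized_terms(prompt):
    print(f"{term}: {weight}")
```

Under that assumption, the triple parentheses around the frog give it noticeably more weight than anything else in the scene, which may be part of why the model was so eager to draw frogs.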

Not half bad, but this is where things started going really awry. Instead of making separate elements and compositing them, I initially tried to create the scene all in one go using the crop feature in img2img to add in trees and a fish. The trees came out ok, but Stable Diffusion really struggled to get the fish right.

Iterating on Fishes using an AI Art Generator

To my knowledge, at time of posting, all AI art generators are pretty bad at generating images that have two separate subjects. Instead of creating an image of a frog and a fish, Stable Diffusion wanted to make an image with two frogs. Also, I did not realize this at the time, but the Stable Diffusion model I used is terrible at rendering fish. In hindsight, I should have checked Lexica first. One quick search would have instantly told me that I was setting myself up for failure. Here’s one of the more amusing “fish” that Stable Diffusion made for me:

Scene created by an AI art generator — A “fish” peeking out of the water, according to Stable Diffusion

Extracting just the lower right corner to feed into img2img eventually got me to a usable, recognizable fish but it was a long process of manually editing each iteration to get Stable Diffusion to understand basic fish anatomy.

Deformed fish created by an AI art generator — Getting closer

Final fish used in the story — Much better
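I did the corner extraction by hand in Photoshop, but the same step could be scripted. Here is a minimal sketch that only computes the crop rectangle for the lower right corner. The function name and the `fraction` parameter are my own inventions. The box uses the (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts, so it could be passed straight to that method if Pillow is installed.

```python
def lower_right_box(width, height, fraction=0.5):
    """Return a (left, upper, right, lower) box for the lower right corner.

    `fraction` is the share of the width and height to keep,
    e.g. 0.5 keeps the lower right quadrant of the image.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    left = width - round(width * fraction)
    upper = height - round(height * fraction)
    return (left, upper, width, height)


# Lower right quadrant of a 512x512 image.
print(lower_right_box(512, 512))
```

The cropped tile can then be touched up and fed back into img2img on its own, so the model only has to worry about one subject at a time.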

Generating the Background

I also ended up generating the background separately from the frog. This was its own challenge because I needed the background to match the low perspective of the frog and fish and their clip art style. Getting the low perspective to match the frog and fish turned out to be a lot of dice rolling and hoping I’d get a couple of usable results. Eliminating the trees definitely helped.

I also quickly discovered that generating landscapes in a flat vector style is really not Stable Diffusion’s forte. It is way better at doing landscapes and scenery via digital painting, watercolor and pencil sketch. Eventually I got something that I could use:

Pond and lily pads background scene — AI generated art of a pond with lily pads and reeds

I took this picture that leans more towards digital painting and used image adjustments and filters in Photoshop to turn it into something more akin to low-detail vector clip art that matches the frog. I also increased the saturation and contrast and modified the curves to get the background to have the same vibrant color palette as the frog. In the future, what I should do is have the AI art generator make the background first and then use crop to add in the foreground subjects.
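For anyone curious what those adjustments amount to per pixel, here is a rough sketch of the general idea using only Python's standard library. It is not what Photoshop does internally, and the function names and default values are assumptions of mine: one function pushes a color's saturation up, and the other snaps each channel to a handful of levels so that gradients collapse into flat bands of color.

```python
import colorsys


def boost_saturation(rgb, factor=1.3):
    """Scale the saturation of an (R, G, B) color with 0-255 channels."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # clamp so the color stays valid
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))


def posterize(rgb, levels=4):
    """Snap each channel to one of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return tuple(round(round(c / step) * step) for c in rgb)


# A muted green becomes more vivid, then gets flattened to a few tones.
muted_green = (100, 150, 100)
vivid = boost_saturation(muted_green)
print(vivid, posterize(vivid))
```

Applying something like this to every pixel gets a painted landscape part of the way towards a clip art look, although hard edges and clean outlines still need manual work.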

Compositing Everything Together

Here is where my choice of using a vector clip art style became especially annoying. Not only did I have to convert the background image into something vector-like, but the flat nature of this style with large swaths of solid color meant that I had to thoroughly go through and eliminate all image artifacts left over from the heal and clone tools I used to make the composition look right and fix issues with the frog’s anatomy.

Eventually, after going through many a tutorial to learn about Photoshop adjustment layers, filters and editing tools, I got to my final result:

Composite image for little frog big dragon — Final composition for Little Frog, Big Dragon intro scene

In the end I got an image I am happy with, but I am unsure whether using AI art generators for this piece was really worth the time invested.

AI Text Generation

Using OpenAI’s Playground has consistently been the easiest element of creating original stories. Even with significant prompting, editing and some writing on my part, producing the text for Little Frog, Big Dragon only took about 40 minutes.

My strategy for prompting the AI was different this time than for Clara and the City. Instead of having Playground come up with the entire story and plot elements by asking it to write several paragraphs all in one go, I mostly planned out the story to follow the typical six-part story structure. Then I asked Playground to write just a few sentences at a time to describe specific characters, scenes, and conversations.

The idea of compositing AI-written story elements together came from compositing AI generated art. Also, when I first tried to get Playground to write a whole plot, it gave me something completely nonsensical:

Prompt: Write a story about a frog saving a village from a giant dragon.

Playground: There was once a small village that was constantly being terrorized by a giant dragon. The villagers had tried everything to get rid of the dragon, but nothing worked. They were about to give up hope when a frog appeared and offered to help them.

The frog told the villagers to build a large fire in the center of the village. The dragon would come, as it always did, and see the fire. But this time, the frog would be waiting for it. As the dragon flew down to the village, the frog jumped out and swallowed it whole. The village was saved and the villagers were very grateful to the frog.

That’s pretty funny and absurd in its own way, but not really a strong or interesting story. Playground did much better when asked to answer very specific questions, although it still gave me some pretty hilarious responses:

Prompt: Why did a dragon destroy a pond?

Playground: There could be many reasons why a dragon might destroy a pond. Maybe the dragon is hungry and is looking for food. Maybe the dragon is angry and is taking its frustration out on the pond. Or, maybe the dragon just doesn’t like ponds!

Wise words there, Playground. Wise words.

This section was actually halfway decent, except it wasn’t really the direction I wanted to go in, so I excluded it:

Prompt: Describe a scene in which a frog finds a knight fighting a dragon.

Playground: The frog sees a knight fighting a dragon and hops over to watch. The knight is sweating and breathing heavily, and the dragon is hissing and snapping at him. The frog croaks and the knight looks over at it. The knight is about to say something when the dragon breathes fire at him and he falls to the ground, dead.

I asked Playground to rewrite this part to involve the frog and to have the knight escape. The story flows more logically by letting the knight live so that the dragon is left hungry and therefore motivated to go for a tiny snack like the frog.
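The piece-by-piece prompting described in this section can be sketched as a small helper that turns a planned outline into one short prompt per story beat. This is only an illustration: I typed my prompts into Playground by hand, the function name `build_prompts` is hypothetical, and the sentence limit wording is an assumption rather than something I actually used.

```python
def build_prompts(beats, max_sentences=3):
    """Turn a list of story beats into one short, specific prompt per beat.

    Each beat is a question or instruction about a single character,
    scene, or conversation. A length limit is appended (hypothetical
    wording) so each response stays small enough to edit and stitch.
    """
    prompts = []
    for beat in beats:
        prompts.append(f"{beat} Answer in no more than {max_sentences} sentences.")
    return prompts


# Two of the specific prompts quoted in this post, used as example beats.
outline = [
    "Why did a dragon destroy a pond?",
    "Describe a scene in which a frog finds a knight fighting a dragon.",
]
for prompt in build_prompts(outline):
    print(prompt)
```

Each response is then edited by hand and stitched into the story, the same way the frog, fish and background were composited into one image.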

Lessons Learned

In the future, I think I can make AI art generation and story writing significantly easier by doing the following:

Selecting subjects that are well represented in the Stable Diffusion model to reduce iterations. No more fishes until further notice.
Using an art style that is conducive to hiding my fingerprints when compositing different elements together into a single image.
Generating the background of the image first and then adding in foreground elements one at a time using crop or inpainting.

Using Stable Diffusion to make composited images for Sarah in the Secret Garden.
