
Automating Inline Image Creation with GPT4 and Stable Diffusion

Using multiple types of generative AI together can be complicated, especially if you want images inline with text. The text generation and image generation need to be synced, and what is described in the text might not make for great images. Wading through all of that text to figure out what each image prompt should be is time-consuming.

Here I’ll go through a method for having GPT4 tell us which image it thinks should be generated and where it thinks that image should go. The images themselves are generated with Stable Diffusion via the runpod.io API.

Setup

You’ll want to use the script I provided with Unity integrations to make this easier. The same approach works in other languages and frameworks, but I’m most familiar with Unity, so we will use it for the examples.

Prompt Design

We then need to design our prompt for GPT4 so that it knows how to format the image prompts. For that we use the term ‘alt text,’ which refers to the text that describes the content of an image online. GPT4 is trained on a lot of data scraped from the internet, so alt text descriptions of images are something it is familiar with.

We need that alt text to be parsable by our scripts in a repeatable manner, so we have to specify a delimiter for it. I’m using square brackets, [ and ], around the alt text. This lets us tell what is and is not an image description.

We can take that and add in our other bits to give an example prompt:

Give a long and detailed character description of an NPC in a fantasy world.  Where appropriate give image descriptions of the npc in the format [alt text]. Give descriptions and images for her appearance, her shop, and her magical pet.

Prepare the Image Description for Stable Diffusion

There are three main things we want to do with the image descriptions: parsing, cleaning, and styling.

Parsing means extracting each image description from the text. We can do this by splitting the string returned by GPT4 on the square brackets we are expecting.

    string[] responseSections = response.Split(new char[] { '[', ']' });

This gives us alternating display text and image descriptions. Because each description sits inside a bracket pair, the descriptions are the strings at the odd indices (1, 3, 5, and so on) of the zero-based array.
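As a minimal sketch of that split, assuming the brackets are balanced and GPT4 uses them only for image descriptions, the pieces can be sorted into two lists. The class and method names here (AltTextParser, Parse) are made up for illustration and are not part of the Unity script.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the provided Unity script.
public static class AltTextParser
{
    // Splits a response on '[' and ']' and sorts the pieces into
    // display text (even indices) and image descriptions (odd indices).
    public static (List<string> Text, List<string> Descriptions) Parse(string response)
    {
        var text = new List<string>();
        var descriptions = new List<string>();

        string[] sections = response.Split(new char[] { '[', ']' });
        for (int i = 0; i < sections.Length; i++)
        {
            if (i % 2 == 1)
            {
                descriptions.Add(sections[i].Trim());
            }
            else
            {
                text.Add(sections[i].Trim());
            }
        }

        return (text, descriptions);
    }
}
```

If the response starts with a bracket, Split returns an empty string at index 0, so the odd/even pattern still holds. A stray, unmatched bracket in the story text would shift the pattern, which is worth checking for if your prompts allow brackets elsewhere.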

For cleaning we want to remove the references to ‘alt text’ in the image descriptions, as that will likely add garbled text to the image. We can do that with a simple string replace.
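A plain string replace is case sensitive, so it needs the text lowercased first. As an alternative sketch, a case-insensitive regular expression can strip the label while leaving the rest of the description’s casing alone. The AltTextCleaner name is hypothetical, and this variant only removes a label at the start of the description, which is where GPT4 put it in my outputs.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the provided Unity script.
public static class AltTextCleaner
{
    // Matches a leading "alt text" label with an optional colon, ignoring case.
    private static readonly Regex Label =
        new Regex(@"^\s*alt\s*text\s*:?\s*", RegexOptions.IgnoreCase);

    // Returns the description without its leading label and surrounding whitespace.
    public static string Clean(string description)
    {
        return Label.Replace(description, "").Trim();
    }
}
```

For example, AltTextCleaner.Clean("alt text: A tall, elegant woman") returns "A tall, elegant woman".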

Then finally we want to add a style. This should start with a comma or period to separate it from the image description. You can use whatever style you like, but it is best to pick one that is easy to keep consistent across images.

The final code for that looks like this:

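    // Style suffix appended to every image description.
    string imageStyle = ", gorgeous, epic lighting, high quality, 4k, digital painting, trending, artstation, concept art.";
    for (int i = 0; i < responseSections.Length; i++)
    {
        // Odd indices hold the bracketed image descriptions.
        if (i % 2 == 1)
        {
            // Strip the "alt text" label so it is not rendered into the image.
            string imageDescription = responseSections[i].ToLower().Replace("alt text: ", "").Replace("alt text", "").Trim();
            AIManager.Instance.GetImage(imageDescription + imageStyle, ImageCallback);
        }
    }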

In the callbacks for the text and images I’m just saving them as files in this example.
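A rough sketch of that saving step is below. It assumes the image arrives in the callback as raw PNG bytes, which may not match what your integration returns (it could be base64 or a texture instead), and the OutputSaver class, its method names, and the file naming scheme are all made up for illustration.

```csharp
using System.IO;

// Hypothetical helper; the real callback signatures depend on your integration.
public static class OutputSaver
{
    // Writes one text section to <folder>/section_NNN.txt and returns the path.
    public static string SaveText(string folder, int index, string text)
    {
        Directory.CreateDirectory(folder); // no-op if the folder already exists
        string path = Path.Combine(folder, $"section_{index:D3}.txt");
        File.WriteAllText(path, text);
        return path;
    }

    // Writes one image to <folder>/image_NNN.png and returns the path.
    // Assumes imageBytes already holds encoded PNG data.
    public static string SaveImage(string folder, int index, byte[] imageBytes)
    {
        Directory.CreateDirectory(folder);
        string path = Path.Combine(folder, $"image_{index:D3}.png");
        File.WriteAllBytes(path, imageBytes);
        return path;
    }
}
```

Using the section index in the file name keeps text and images in the same order as the response, so they can be stitched back together later.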

Examples

Here are some example outputs from this process, with the full alt text from the GPT4 response shown as the image caption.

NPC Shopkeeper Example

Prompt:

Give a long and detailed character description of an NPC in a fantasy world.  Where appropriate give image descriptions of the npc in the format [alt text]. Give descriptions and images for her appearance, her shop, and her magical pet.

Response:

Name: Lady Ophira Ravenshadow

Appearance:
Lady Ophira Ravenshadow is a tall, slender woman in her late forties with an air of mystery and elegance that surrounds her. Her dark raven-black hair cascades down to the middle of her back, with a few strands of silver starting to weave their way through it. Her piercing ice-blue eyes seem to see right through you, as though she knows your deepest secrets. She has high cheekbones, a sharp jawline, and an aristocratic nose that give her face a regal appearance. Her full lips are usually drawn into a knowing smile that hints at hidden knowledge.

[alt text: A tall, elegant woman with long black hair and piercing blue eyes. She wears dark clothing with intricate patterns.]

Lady Ophira’s attire is always elegant and understated. She favors long, flowing gowns in deep shades of purple, black or midnight blue that complement her pale complexion. The gowns are often adorned with intricate embroidery depicting various mystical symbols or constellations. Around her neck, she wears a silver pendant in the shape of a raven with outstretched wings clutching an amethyst crystal.

[alt text: A close-up of Lady Ophira’s silver raven pendant with an amethyst crystal.]

Her Shop – The Enchanted Reliquary:
Lady Ophira owns and operates a small shop called “The Enchanted Reliquary,” tucked away in the winding streets of an ancient city. The shop itself is housed in an old stone building covered in ivy and moss, giving it an air of enchantment even from the outside. The arched wooden door creaks softly when opened, revealing a dimly lit room filled with wonders.

[alt text: An old stone building covered in ivy and moss with an arched wooden door.]

Inside the shop, every available surface is covered with a vast array of magical items, from enchanted trinkets to powerful artifacts. The air smells of old parchment, dried herbs, and a hint of something otherworldly. Glass cases hold gleaming potions and delicate crystal orbs, while shelves are lined with leather-bound grimoires and ancient scrolls.

Lady Ophira can often be found standing behind the shop’s wooden counter, her sharp eyes watching her patrons carefully as they peruse her wares. She is always ready to offer assistance or information about any item in her shop, sharing her vast knowledge of magical lore with those who seek it.

AI Generated Stable Diffusion Image of a library
[alt text: A dimly lit room filled with magical items, including potions, crystal orbs, and leather-bound grimoires.]

Her Magical Pet – Noctis:
Lady Ophira has a magical pet named Noctis, a large raven with iridescent black feathers that shimmer with an array of colors in the right light. Noctis is no ordinary raven; he is an intelligent creature gifted with the ability to speak and understand human languages. He often perches on Lady Ophira’s shoulder or on one of the high shelves in the shop, keeping watch over customers or engaging them in witty banter.

AI Generated Stable Diffusion Image of a raven
[alt text: A large raven with iridescent black feathers perched on Lady Ophira’s shoulder.]

Noctis also serves as Lady Ophira’s familiar, a magical companion that aids her in her spellcasting and other arcane endeavors. The bond between Lady Ophira and Noctis is strong and goes beyond mere companionship- they share an unspoken understanding that transcends spoken language.

Together, Lady Ophira Ravenshadow and Noctis create an atmosphere of wonder and enchantment in their little corner of the world, sharing their love for magic and mystery with all who enter their domain.

Quick Story Example

Prompt:

Response:

Once upon a time, in a magical land called Etheria, there lived a young and adorable mage named Lily. She was small in stature, with big round glasses and a heart full of dreams. Her long, wavy purple hair always seemed to be dancing in the wind, giving her an ethereal appearance.

AI Generated Stable Diffusion Image of a purple haired mage
[alt text: A cute mage with long purple hair wearing big round glasses smiles as she holds her staff]

Lily was a student at the prestigious Arcanum Academy, where the most talented magic users from all over the kingdom came to learn and hone their skills. However, despite her natural talent for magic and unwavering determination, Lily was often underestimated by her peers due to her small size and cute appearance.

Nevertheless, she continued to persevere in her studies, determined to prove herself and become an extraordinary mage.

One day, as Lily sat reading a dusty old tome in the academy’s library, she stumbled upon an ancient prophecy. The prophecy foretold of a great darkness that would descend upon Etheria, consuming all light and leaving destruction in its wake. As she read these chilling words, Lily couldn’t help but feel a strange sense of urgency within her.

Soon enough, the prophecy began to come true. Dark clouds gathered over Etheria’s skies, casting eerie shadows on the once bright land. An evil sorcerer named Mordak had arrived seeking to claim Etheria as his own kingdom by summoning forth an army of monstrous creatures.

AI Generated Stable Diffusion Image of an evil wizard
[alt text: An evil sorcerer with dark robes and sinister eyes casts dark magic]

The mages of Arcanum Academy were called upon to protect the kingdom from this terrible threat. However, despite their best efforts, they found themselves outmatched by Mordak’s powerful dark magic and his relentless army of monsters.

As her fellow mages struggled against this seemingly unstoppable force, Lily knew that it was up to her to save the day. She remembered a particular spell in the ancient tome that she believed could turn the tide of the battle. It was an immensely powerful spell that required great courage and skill to perform, but Lily was determined to try.

With her heart pounding in her chest, she stood on the front lines of the battle, holding her staff high above her head as she began to chant the incantation. As she did so, a brilliant light began to radiate from her staff, growing brighter and brighter with each word.

[alt text: The cute mage standing confidently on the battlefield, casting a powerful light spell]

The monsters that had once seemed so terrifying now cowered in fear as they were engulfed by Lily’s radiant magic. One by one, they vanished into thin air, banished back to the dark realm from which they came.

As Lily continued to cast her spell, she felt a surge of energy like never before. The darkness that had consumed Etheria began to recede, chased away by her incredible magic.

Finally, it was Mordak himself who stood before Lily. His eyes burned with fury as he realized he had underestimated the young mage, just as her peers had done. But it was too late for him; Lily’s spell had weakened him greatly and he could no longer hold onto his power over Etheria.

With one final burst of magical energy from Lily’s staff, Mordak was defeated and banished from Etheria for good.

AI Generated Stable Diffusion Image of two characters
[alt text: The evil sorcerer Mordak being defeated by Lily’s powerful magic]

In the aftermath of the battle, Lily emerged from the battlefield victorious. Her bravery and magical prowess had saved Etheria from certain doom, and she had proven herself as a true hero.

Her fellow mages at Arcanum Academy could no longer underestimate her; they saw her for what she truly was – an extraordinary mage with an incredible heart.

From that day on, Lily was celebrated as a hero throughout Etheria. No longer a small and underestimated mage, she had become a symbol of hope and strength for all who witnessed her great power.

AI Generated Stable Diffusion Image of a character casting a spell
[alt text: The cute mage standing triumphantly with her staff, now respected and admired by all]

Conclusion

This inline image generation with GPT4 can be a great boost to any GPT4-based storytelling or world building. I’m sure chatbots could also use this for sharing cute cat images.

It is also worth noting that these images are the original attempts with no editing, so any extra effort will of course make them better and more fitting for the stories.

Next Steps

There are things that need refining. As with my other recent posts about consistent characters, this needs better control over how the characters are shown in the story. The mage and villain are not well represented, and I think we should be able to replace any names mentioned with a character description.
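I have not built this yet, but a first pass at the name replacement might look something like the sketch below. It assumes we keep a lookup from character name to a short visual description. The CharacterDescriber class and that lookup are hypothetical, and the descriptions would have to be written by hand or pulled from the story.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the idea; not an existing part of the project.
public static class CharacterDescriber
{
    // Replaces each whole-word character name with its visual description.
    public static string ReplaceNames(string imageDescription, IDictionary<string, string> characters)
    {
        string result = imageDescription;
        foreach (KeyValuePair<string, string> entry in characters)
        {
            // \b keeps "Lily" from matching inside a longer word such as "Lilypad".
            string pattern = @"\b" + Regex.Escape(entry.Key) + @"\b";
            string description = entry.Value;
            // A match evaluator inserts the description literally, even if it contains '$'.
            result = Regex.Replace(result, pattern, match => description);
        }
        return result;
    }
}
```

This does not handle possessives gracefully (“Lily’s powerful magic” keeps its “’s” after the inserted description), so the result would likely need some cleanup before it goes to Stable Diffusion.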

I want to combine this with that other work stream, so we can generate long-form stories with consistent characters.

Other Guides on Stable Diffusion

How To Set Up ControlNet Models in Stable Diffusion
How to Train a Custom Embedding in Stable Diffusion Tutorial
Stable Diffusion Denoising Strength Explained
How to integrate Stable Diffusion and GPT into Unity