
Using AI Image Generators for Children’s Books

I’ll admit that when we came up with the idea of doing a blog, using AI image generators to produce kids’ stories was not at the top of our list. I figured a travel blog would be more fun. That being said, I’ve been playing with AIs to generate images for some time now, though I initially considered it just an amusing distraction. However, as the technology exploded across the internet, we quickly realized three things:

  1. Artificial intelligence will disrupt the creative and artistic industries, which had previously been considered “safe” from automation
  2. AI can create enormous benefits for a lot of industries by increasing diversity through lower barriers to entry without necessarily compromising on quality
  3. Current AI-generated content is quite narrowly focused on just a handful of artistic styles, such as digitally painted concept art or photorealistic renders

With the above in mind, I quickly discovered a niche in AI-generated art that, to our knowledge, has been completely overlooked by others: illustrated picture books for kids.

Addressing a Real Need

I’m at an age where a lot of my friends have young children. Family gatherings are filled with little kids running underfoot. For their parents, the need to entertain and teach their kids is constant. As a result, they go through a shocking amount of new media content very quickly. Video is both free and plentiful thanks to streaming services and YouTube, but books also serve an important role in learning and cognitive development. You know, that little thing called reading. The parents I know are always on the hunt for new books and stories to read to their kids at bedtime.

Demand for children’s books has increased 2% every year since 2010 and spiked 8% in 2020. However, industry revenue is shrinking. According to Statista, the children’s book industry’s revenue decreased 7.5%, from 3.07 billion USD in 2018 to 2.84 billion USD in 2019. These two opposing trends put authors and illustrators in a real bind (pun intended): people want children’s books, but either prices are falling or parents are less willing to pay for them.

The children’s book publishing industry is dominated by just a tiny handful of huge companies, particularly Penguin Random House and Simon & Schuster. Luckily, self-publishing is becoming increasingly popular with the help of e-books and print-on-demand services. So with the need to reduce costs and the democratization of book publishing, we have an industry ripe for disruption.

an AI generated image of a woman reading a book to a child in a library
An original image produced by Stable Diffusion, an AI image generator

The Cost of Illustrating a Kids’ Book

A couple of my friends have tried writing children’s books. According to them, the upfront cost can be staggering. From what they’ve told me, an illustrated picture book can easily cost over 12,000 USD to produce and publish. One of the biggest contributors to that cost, if not THE biggest, is the illustration.

Reedsy, an online service that matches professional illustrators with aspiring authors, estimates that the average professionally illustrated book cover costs between 500 and 1,500 USD. A fully illustrated book can cost between 2,000 and 10,000 USD just for the art. Another online service charges flat fees of 115 USD per full-page illustration, 210 USD per full spread and 235 USD for a custom cover wrap. With labor and publishing costs increasing, it is difficult for the industry to reduce its prices to compete. This is where AI image generators come in.
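
To put those flat fees in perspective, here is a rough back-of-the-envelope sketch in Python. The three fees are the ones quoted above. The layout (14 full spreads plus one cover wrap) and the function name illustration_cost are purely assumptions for illustration, not figures from either service.

```python
# Flat fees quoted above (USD).
FEE_FULL_PAGE = 115
FEE_FULL_SPREAD = 210
FEE_COVER_WRAP = 235


def illustration_cost(spreads: int, single_pages: int = 0, covers: int = 1) -> int:
    """Total flat-fee art cost in USD for a hypothetical book layout."""
    return (
        spreads * FEE_FULL_SPREAD
        + single_pages * FEE_FULL_PAGE
        + covers * FEE_COVER_WRAP
    )


# Assumption: a picture book with 14 full spreads and one cover wrap.
total = illustration_cost(spreads=14)
print(f"Estimated art cost: {total:,} USD")
```

Under that assumed layout the art alone comes to 3,175 USD, which sits inside the 2,000 to 10,000 USD range quoted above.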

Professionally generated illustrations are expensive
210 USD for a full-spread illustration by a professional artist is a pretty good deal, but still out of the price range of a lot of aspiring self-published picture book authors.

Breaking Down the Barriers with AI Image Generators

Enter artificial intelligence technology. By using AI image generators like Stable Diffusion to generate illustrations, the time and cost to produce kids’ stories and picture books can go way, way down. For example, it took me about 45 minutes to produce the text and three accompanying images for Clara and the City using AI prompting. I did zero post-processing on the images and a bit of editing on the text to make it sound more cohesive. The only direct cost was from running our Stable Diffusion server on AWS. It came out to just a few cents.

Admittedly, Clara and the City is a long way from the quality people would expect from a professionally produced picture book. I am by no means a good writer, and I absolutely am not a good artist. All the portraits I’ve painted half-inebriated during Paint Nite are incredibly amateurish. The scribbles I make in MS Paint are no better. However, with almost no artistic training or talent, I still managed to produce this image in about 3 minutes using txt2img:

AI-generated image for Clara and the City
Image from Clara and the City

It’s far from perfect, but at first glance it’s not half bad for three minutes of effort. There are obviously some anatomical issues with the horse’s legs and ears and the woman’s right arm. Some massaging in Photoshop and additional img2img iterations in Stable Diffusion should quickly clean those up. With more practice over time, I hope to eventually achieve higher-quality stories and images.

A Style Underrepresented in AI-Generated Images

Based on my purely subjective and anecdotal observations on Reddit, Discord, YouTube and other online forums and pages, the most popular forms of AI-generated art that get posted tend to fall into one of the following categories:

  • Digital concept art of fantastical scenes and landscapes
  • Board or video game design elements or items
  • Portraits of ridiculously busty girls in skimpy clothing
  • Chimeras that combine different animals together
  • Funny meme-like images of absurd situations

The first one is probably because a very popular set of modifiers for txt2img prompts is “intricate, highly detailed, 4k, trending on artstation, art by greg rutkowski.” The rest, I assume, are a result of the demographics and interests of the people currently using AI image generators. Nothing wrong with that. You do you, folks. But that’s not really the kind of art that I’m personally interested in and want to blog about. It’s probably for the best too, since Greg Rutkowski hates that his work is being used by AIs.

Common subject by AI image generators
Example of a typical image generated by current Stable Diffusion users: a digital painting of a pretty girl in a sci-fi setting, in the style of Greg Rutkowski

Maybe there are other people out there making family-oriented content with AIs and I just haven’t found it. Maybe there are folks doing it, but they’re being discreet about it. I certainly understand if it’s the latter given the moral dilemmas around AI image generators. Who knows? For now, I see picture book illustrations for kids as a very underrepresented medium in the world of AI-generated art.

The Correct Reading Level

So far I have written a lot about AI image generators, but we can’t forget about text generators. That’s arguably the more important part, since the whole point of books is to be read. At the moment, the sophistication of publicly available text generators is relatively low. Or at least it’s low when I prompt them, since I’m not very experienced yet. For example, here are a few sentences written by OpenAI’s Playground:

“The young girl had always been fascinated by the ocean. She had seen pictures of it in books and on TV, but it always looked so different in person. She had always wanted to explore it for herself. Finally, she got her chance. She went on a vacation with her family to a beach house. As soon as they arrived, she ran down to the beach. She couldn’t wait to get in the water.”

Written using the text-davinci-002 model, 0.7 temperature, 0.5 Top P
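
For anyone who wants to reuse these settings outside the Playground page, the sketch below builds the kind of request body that OpenAI’s text completion API accepts. The model, prompt, temperature, top_p and max_tokens fields are real parameters of that API. The prompt text, the max_tokens value and the helper name build_completion_request are assumptions for illustration, since the exact prompt isn’t recorded here. The code only builds and prints the payload; it does not call the API.

```python
import json


def build_completion_request(prompt: str) -> dict:
    """Build a request body using the sampling settings listed above."""
    return {
        "model": "text-davinci-002",
        "prompt": prompt,
        "temperature": 0.7,  # higher values give more varied wording
        "top_p": 0.5,        # nucleus sampling: keep the top 50% of probability mass
        "max_tokens": 150,   # assumption: not one of the settings listed above
    }


# Hypothetical prompt, made up for this example.
payload = build_completion_request(
    "Write a short story about a girl who visits the ocean."
)
print(json.dumps(payload, indent=2))
```

A Top P of 0.5 trims away the less likely word choices, which may be part of why the sample reads so plainly.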

I plugged the above into a readability calculator, which gave it a Flesch-Kincaid Grade Level of 3.8. That means a child needs roughly four years of US grade-level education to understand this passage. Looking further at the results, its New Dale-Chall readability score indicates that the AI uses words that a US second grader should understand. The highest scores I’ve gotten out of Playground indicate a maximum sophistication level of about 10 years of education. Say what you will about the quality of the US education system, but this data indicates to me that Playground is well suited for writing kids’ stories.
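
For the curious, the Flesch-Kincaid Grade Level is a simple formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Below is a minimal Python sketch of it. The formula is the standard one, but the syllable counter is a crude vowel-group heuristic of my own, and all function names are made up for this example, so expect the result to differ somewhat from what an online calculator reports.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent "e" (as in "house") as non-syllabic,
    # but keep endings like "table" where the "le" is pronounced.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fk_grade(text: str) -> float:
    """Estimate the grade level of a passage using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


sample = "She ran down to the beach. She could not wait to get in the water."
print(round(fk_grade(sample), 1))
```

Short sentences and short words both pull the score down, which is why simple prose like the passage above lands at an elementary school level.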

Summary

In short, there’s a lot of potential for AI image generators to be used in the production of children’s books. The current industry is shifting towards self-publication and needs to change to compete against alternative forms of media, but is hampered by the high cost of illustration. Within the world of AI-generated art, family and kid-oriented content is also underrepresented. Perhaps with this blog, we can help change both of these industries by adding more diversity and opportunity for those who wish to enter these spaces.

But at the same time, this AI technology is young and I am a beginner in using it, so this blog is going to be a learning process. There will be mistakes. There will be room for improvement in both the stories and the images. Hopefully in time, they will get better. To that end, I leave you with an outtake that Stable Diffusion generated while I was making the first picture in this post, the woman reading a book to her son in a library, only this one came out very badly…

Terrible picture produced by an AI image generator
This is definitely not Stable Diffusion’s finest work. What happened to her hands and legs?! And her necklace. And her face.