
GPT3’s DaVinci 3 vs 2 Models

OpenAI recently released a new GPT3 model version, Davinci-3. It greatly improves on Davinci-2, formerly the most powerful model. However, it also has some oddities that are worth knowing.

You can select Davinci-3 in the Model drop down menu on the right side of the Playground web interface.

Figure: Davinci-2 vs. Davinci-3 selection in the Model dropdown menu of the GPT3 Playground

Poetry and the Concept of Rhyming

First, Davinci-3 has a much stronger grasp of rhyme, with every couplet having a rhyming pair. Davinci-2 often puts words in the rhyming position that fit the structure of the line but do not actually rhyme, such as ‘head’ and ‘asleep.’ Davinci-2 hasn’t quite grasped which words group together as rhymes.

The poem examples below show several differences between Davinci-2 and Davinci-3. For a GPT3 poem with a Stable Diffusion illustration, check out Goodnight Poem.

Example Prompt 1: Write a rhyming poem about a girl going to bed.

Davinci-2

Once upon a time there was a girl
Her name was Ann and she loved her curl.
She went to bed with teddy by her head
And dreamt of candy canes before she fell asleep.

Davinci-3

The wind is blowing outside with a gentle moan,
As sweet young girl decides it’s time to go home.
She lingers in the summer night air as she gathers her things,
Mental lists made of all tomorrow brings.

Her pretty bedroom awaits and pajamas are set out fair,
A quick brush through hair then off come their shoes for another day there.
The bedcovers turned down just so ready and neat –
As sleep invades like birds on a long summers street!

Davinci-3 Outputs Are Longer

You’ll also notice that the Davinci-3 poems are significantly longer. I’ve found this to be the case with any prompt I give to Davinci-3. With Davinci-2, I would commonly leave the length cap at 2000 and let GPT3 finish whenever it wanted. Most of the time it would get to 200 or 300 tokens and stop on its own. Davinci-3 does not do that. It will sometimes go for 600, 800, or even thousands of tokens. As with Davinci-2, the longer Davinci-3 runs on a single prompt, the less coherent it becomes. I’ve rarely had a response stay coherent past 400 tokens.
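For readers who call the model through the API rather than the Playground, the length cap described above can be set per request. The snippet below is a minimal sketch, not part of the original workflow: the helper name `build_request` and the constant `DEFAULT_CAP` are hypothetical, and the model identifier `text-davinci-003` is an assumption about how Davinci-3 is named in the API. It only builds the request parameters and sends nothing over the network.

```python
# Hypothetical helper for illustration; these names do not come from the post.
# 400 reflects the author's observation that coherence rarely lasts past
# roughly 400 tokens.
DEFAULT_CAP = 400


def build_request(prompt, model="text-davinci-003", max_tokens=DEFAULT_CAP):
    """Return keyword arguments for a completion request with a length cap.

    The model name is an assumed API identifier for Davinci-3.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


params = build_request("Write a rhyming poem about a boy and his dog.")
print(params["max_tokens"])
```

Passing a smaller `max_tokens` cuts the output off early instead of relying on the model to stop by itself, which is the behavior Davinci-2 mostly provided without help.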

Example Prompt 2: Write a rhyming poem about a boy and his dog.

Davinci-2

I have a little boy
Who loves his doggy so
He follows him around everywhere
Everywhere that he will go.

Davinci-3

A boy and his pup, they both loved to play.
One paws at a ball, the other loves it that way.
Their bond was strong as their love began to grow.
They chased each others tails around in the snow.
No matter how old he became the boys trust never faded away
He’d cuddle with Fido when times were gray

Their hearts now intertwined for eternity
Filled forever with affection’s energy
Such is life between a boy and his dog!

Different Failure Modes

Davinci-3 still fails when it goes on for too long. Interestingly, the way Davinci-3 fails is different from Davinci-2. Davinci-2 would most often go into tight repeating loops. Davinci-3, on the other hand, produces long run-on sentences that drop the small connector words and keep only the larger words. If you try to read them, they still somewhat make sense as a story, but reformatting them would be far too much effort. It is much easier to let it run for 300 or 400 tokens, have it cut off early, fix any small formatting issues, and then run it for another 300 to 400 tokens. I’ve found Davinci-3 to fail in this run-on way much more often than Davinci-2. I would estimate it to be about ten times more likely to occur in Davinci-3.
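The run, cut off, fix, and continue routine described above can be sketched as a small loop. This is only an illustration under stated assumptions: `generate_in_chunks`, `complete`, and `review` are hypothetical names, `complete(text, max_tokens)` stands in for whatever function actually calls the model, and `review` stands in for the manual cleanup step between runs.

```python
from typing import Callable


def generate_in_chunks(
    prompt: str,
    complete: Callable[[str, int], str],
    review: Callable[[str], str] = lambda chunk: chunk,
    chunk_tokens: int = 400,
    rounds: int = 2,
) -> str:
    """Generate text in short capped runs, cleaning up each chunk before continuing.

    `complete` is a placeholder for a model call that takes the text so far
    and a token cap, and returns only the newly generated continuation.
    """
    text = prompt
    for _ in range(rounds):
        chunk = complete(text, chunk_tokens)  # one capped run
        if not chunk:                         # nothing more was generated
            break
        text += review(chunk)                 # fix small issues, then continue
    return text[len(prompt):]                 # return only the generated part


# Usage with a stand-in for the model, so the sketch runs without any API access.
def fake_complete(text: str, max_tokens: int) -> str:
    return " [next chunk]"


print(generate_in_chunks("Write a story.", fake_complete, chunk_tokens=300))
```

Each round feeds the cleaned-up text back in as the new prompt, so formatting problems are corrected before they can compound in the next run.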

Vocabulary and Sentence Complexity

The complexity of the language is much higher in Davinci-3 than in Davinci-2. Let’s look at just the opening line of each poem:

Davinci-2

Lily and Logan were two cats

I have a little boy

Davinci-3

Two cats went exploring on a sunny day

A boy and his pup, they both loved to play.

Notice that the sentences are not just longer: the concepts and descriptors are more complex as well. Davinci-3 can also set up a scene before introducing its subject, as in its poem about the girl going to bed, where the girl does not appear until the second line.

Conclusion

OpenAI’s upgrade of GPT3 to Davinci-3 is a big step forward and brings a lot more creativity and complexity to an already amazing generative text model. It isn’t perfect, but the ways of mitigating issues in the previous Davinci-2 model still work, and in some cases are more necessary.