This page gives some brief technical information about Robot Burns Infinite.

The base model is GPT-3. I'm using the babbage model rather than davinci, mainly because it's much cheaper to run. The dataset is the same 6000 lines of Burns poetry that I used for the 2020 Robot Burns book, which was built on GPT-2.

The most important thing I've found is how the dataset is prepared, and the model has gone through several iterations. There are still improvements I hope to make, for example to the number of stanzas it generates and to how well it takes your themes as inspiration.

For the "themes" feature, I used text-davinci-003 to analyse each poem in the dataset for its key themes. AI-bootstrapping!

For example, for the original Burns stanza below, it said the 5 main themes were "Mountains, snow, valleys, forests, floods".

Farewell to the mountains, high-cover'd with snow,
Farewell to the straths and green vallies below;
Farewell to the forests and wild-hanging woods,
Farewell to the torrents and loud-pouring floods.
My heart's in the Highlands, etc.

The Sun had clos'd the winter day,
The curlers quat their roarin play,
And hunger'd maukin taen her way,
To kail-yards green,
While faithless snaws ilk step betray
Whare she has been.
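As a rough illustration of the theme-extraction step, here is a minimal Python sketch of how a prompt could be built for each stanza and how a comma-separated answer could be turned into a list. The prompt wording and the function names build_theme_prompt and parse_themes are hypothetical, not the ones actually used for Robot Burns Infinite, and the call to the model itself is left out.

```python
from typing import List


def build_theme_prompt(stanza: str, n_themes: int = 5) -> str:
    """Build an instruction prompt asking for a stanza's main themes.

    The wording is an assumption made for illustration only.
    """
    return (
        f"List the {n_themes} main themes of the following poem "
        "as a comma-separated list.\n\n"
        f"Poem:\n{stanza.strip()}\n\nThemes:"
    )


def parse_themes(completion: str) -> List[str]:
    """Turn a comma-separated completion into a list of lower-cased themes."""
    # Drop surrounding whitespace, then any leading/trailing full stops or quotes.
    cleaned = completion.strip().strip('."')
    return [t.strip().lower() for t in cleaned.split(",") if t.strip()]


if __name__ == "__main__":
    stanza = "Farewell to the mountains, high-cover'd with snow,"
    print(build_theme_prompt(stanza))
    print(parse_themes(" Mountains, snow, valleys, forests, floods."))
```

Keeping the prompt building and the parsing as separate small functions means the same parsing can be reused whichever model produces the theme list.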

I have explored a few different dataset structures:

I'm currently looking at generating multiple stanzas, but the dataset I collected back in 2019 doesn't delineate between poems: it's just one big text file with lots of stanzas. Stanzas of the same poem are grouped together, but nothing marks where one poem ends and another begins.
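
To make the idea of a dataset structure concrete, here is a minimal Python sketch of one possible layout. It assumes that stanzas in the text file are separated by blank lines and that the model is fine-tuned on prompt/completion pairs stored as JSON Lines, the format used by OpenAI's legacy GPT-3 fine-tuning. The "Themes:" prefix, the "###" separator, the " END" stop marker and the function names are all hypothetical choices for illustration, not the structure actually used here.

```python
import json
import re
from typing import Dict, List


def split_stanzas(text: str) -> List[str]:
    """Split raw text into stanzas, assuming blank lines separate them."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b.strip() for b in blocks if b.strip()]


def make_record(themes: List[str], stanza: str) -> Dict[str, str]:
    """Pair a theme list (prompt) with a stanza (completion).

    The separator and stop marker are illustrative assumptions.
    """
    return {
        "prompt": "Themes: " + ", ".join(themes) + "\n\n###\n\n",
        "completion": " " + stanza + " END",
    }


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialise records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    raw = "First stanza line one\nline two\n\nSecond stanza line one\nline two\n"
    stanzas = split_stanzas(raw)
    # Placeholder themes; in practice each stanza would have its own list.
    records = [make_record(["mountains", "snow"], s) for s in stanzas]
    print(to_jsonl(records))
```

With a layout like this, a multi-stanza version would only need the completion to hold several stanzas from the same poem, which is exactly where the missing poem boundaries become a problem.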

