Paint Me Like One of Your French Bots
Image-heavy issue: AI Imagery for RPGs, and how to write good prompts for bots
I have struggled with images for RPGs for most of the time I’ve run them. I can sketch objects from life reasonably well, and I was able in school to do decent technical drawing, some of which could be used to draw buildings with strict perspective or the like. But I’ve never been able to draw faces well, and figures in motion are well beyond me. Add colour, the drape of cloth, transparent things, smoke, or any other such atmospheric elements, and I am lost.
However, “artificial intelligence” image generation bots have now reached a state where I can make use of them. There’s a future career in “prompt engineering”, I think: giving exactly the right words in an AI prompt to get the results in the mind’s eye, or better still, your client’s mind’s eye. I’m not there yet, but I’d like to talk about using the MidJourney bot to generate images for games.
The image above, for instance, is one of a series I’ve worked up to represent the interiors of dwellings and workspaces in Green Bastion. The simple version of the prompt is “a druid's workshop interior”. MidJourney will give you four interpretations of your prompt, and you can then choose either to upscale an image (produce a higher-resolution version with a little further calculation) or to iterate another four images based on one of the original four. The image above is the result of several iterations, each time selecting the images I thought worked well.
But the simple prompt isn’t the full story. What I’ve given it in full is “a druid's workshop interior :: green, glimmering, sage :: Harry Wingfield --ar 16:9”. The double colons separate sections of the prompt; each section is given equal weight, and that weight is shared among the words within it. It mostly ignores “a” and “the”; I like them for readability. So “a druid's workshop interior” counts for a third of the image generation, the colours and light condition of “green, glimmering, sage” for another third, and the name of artist Harry Wingfield provides the last third. The “--ar 16:9” bit at the end tells it to use that aspect ratio; that gives me a landscape-oriented image rather than the square it’ll otherwise generate.
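If you end up assembling a lot of these prompts, a few lines of Python can keep the sections straight. This is only a sketch of the structure described above: `build_prompt` and `section_shares` are names I've made up for illustration, and the equal split is my working model of how the sections are weighted, not anything taken from MidJourney's internals.

```python
def build_prompt(sections, aspect_ratio=None):
    """Join prompt sections with ' :: ' and optionally append an --ar flag."""
    prompt = " :: ".join(section.strip() for section in sections)
    if aspect_ratio:
        prompt += f" --ar {aspect_ratio}"
    return prompt


def section_shares(prompt):
    """Return each section's share of the prompt, assuming equal weighting.

    Assumption: every '::'-separated section counts equally, as described
    in the text. Trailing parameters such as --ar are ignored.
    """
    text = prompt.split(" --", 1)[0]
    sections = [s.strip() for s in text.split("::") if s.strip()]
    share = 1 / len(sections)
    return {section: share for section in sections}


prompt = build_prompt(
    ["a druid's workshop interior", "green, glimmering, sage", "Harry Wingfield"],
    aspect_ratio="16:9",
)
print(prompt)
for section, share in section_shares(prompt).items():
    print(f"{share:.0%}  {section}")
```

Swapping the artist or the colour list then means changing one item in the list rather than retyping the whole prompt.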
Specifying the colours and lighting - and I could use very different things here, like “morning light” or “backlit” or “noon” or “cinematic lighting” - gives me a bit of control over the atmosphere of the image. I can include words like “mist” or “clear” in here too.
Here, for instance, are the four images I got back from “a victorian workshop, morning light, hyper detailed, 4k”:
Specifying an artist (or more than one) changes the style of the image. Amusingly, “unreal engine” gets very sharp, clear images, although I find them to often be a little too sharp, bringing up details that aren’t right. I’m learning a lot about 20th century illustrators and fantasy artists through working on this stuff. So far, I’ve made use of the names of Robin Wood, Warwick Goble, Harry Wingfield (as above), Brom, Frank Frazetta, and various others. Here’s one in the interiors series with Warwick Goble influencing it instead (“a druid's workshop interior :: green, glimmering, sage :: Warwick Goble --ar 16:9”).
It’s not making a huge difference there, because the other elements are the same, and honestly Wingfield and Goble aren’t all that different in style. Here’s the same thing with Brom instead, so that there are slightly clearer lines, fewer stylised edges, and so on. I’m still skewing it massively with the word “glimmering”, mind. Brom’s work does not typically glimmer.
Here are a few more of the interiors series:
(“Intricate workshop interior, inexplicable magical mystical objects :: edwardian workbenches :: dark wood, brass :: sunlight, hyperrealistic, 4K, unreal engine, highly detailed --ar 16:9”)
(“a sorcerer's workshop interior :: Edwardian :: blue, cream, glimmering, emerald :: unreal engine :: 4k, hyper detailed, photo-realistic :: Vanessa Bell”)
(“an oneiromancer's workshop interior :: Edwardian :: brown, cream, glimmering, sage :: unreal engine :: 4k, hyper detailed, photo-realistic :: Robin Wood”)
Obviously, since I’m chasing down some of the same things, these images do feel similar. However, MidJourney also does some excellent portrait work:
(“portrait of a sorceress with facial tattoos looking like Adrianne Palicki :: green, emerald, gold :: photo-realistic :: Luis Royo”)
(“a portrait of a yew elemental :: green, black, red :: photo-realistic :: Luis Royo”)
And I feel it absolutely excels on pencil sketches:
(“a pencil sketch of a tarot card --ar 9:16”)
It also does very well with random snippets of atmospheric text. I read:
undead-parasite armada of online “joke” structures, tweet formulas, overdetermined fashion-subcultural signifiers, soyjaks, chads, and “opinions” about the new J. Crew
in Blackbird Spyplane, and threw it in (again specifying my preferred aspect ratio). It didn’t like the word “parasite” (it has also objected to “blood”; it runs to an approximate PG-13 standard), so I replaced that with “symbiote”, and got this:
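The swap itself is easy to automate if you paste in a lot of found text. Here's a rough sketch, assuming you keep your own table of rejected words and stand-ins. `REPLACEMENTS` and `soften` are hypothetical names of mine, and the only entry is the one substitution I actually made; MidJourney doesn't, as far as I know, hand you a list of what it will refuse.

```python
import re

# Hypothetical personal table of rejected words and their stand-ins.
# Only "parasite" -> "symbiote" comes from actual use; add your own as you hit them.
REPLACEMENTS = {"parasite": "symbiote"}


def soften(text, replacements=REPLACEMENTS):
    """Replace whole-word matches (case-insensitive) with their stand-ins."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in replacements) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: replacements[m.group(0).lower()], text)


print(soften("undead-parasite armada of online structures --ar 16:9"))
```

Matching on whole words keeps it from mangling longer words that happen to contain a flagged one.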
… which is, frankly, pretty awesome.
Few enough of these images have been exposed to players yet, but I’ll see if they work well. For the moment, they give me clearer - and sometimes unexpected - ideas about what a place looks like, or how it operates. Or, you know, just what the vibes are like.