Neural network capable of generating (semi)realistic Minecraft screenshots.
MCGAN is a progressive growing generative adversarial network initially based on the StyleGAN paper from NVIDIA.
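The core idea of progressive growing can be sketched in a few lines: training starts at a tiny resolution, doubles it in phases, and fades each new layer in with a linear alpha blend. The function names and phase length below are illustrative assumptions, not MCGAN's actual values.

```python
# Sketch of a ProGAN/StyleGAN-style growth schedule. The resolutions are
# real (powers of two), but phase_length is a placeholder.

def resolution_schedule(start_res=4, final_res=1024):
    """Resolutions the network grows through: 4, 8, ..., final_res."""
    schedule = []
    res = start_res
    while res <= final_res:
        schedule.append(res)
        res *= 2
    return schedule

def fade_in_alpha(images_seen, phase_length=600_000):
    """Blend weight for the newest layer: 0 = old path only, 1 = new path only.
    Ramps linearly over the first `phase_length` images of a phase."""
    return min(1.0, images_seen / phase_length)

# During a transition, the output is a blend of the upsampled old-resolution
# image and the new layer's output:
#   out = (1 - alpha) * upsample(old) + alpha * new
```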
It first started as a fun on-the-side experiment to dip my toes into the world of machine learning, but quickly became one of my most passionate projects.
Tweaking the algorithm to optimize image quality while also balancing training time is extremely tricky, but also a very fun challenge.
In order to generate images of a given subject, the network must first be trained on a massive set of example images.
The raw training data alone, even compressed, takes up well over 120GB.
This was one of the driving reasons behind choosing Minecraft images as a target.
Minecraft's essentially endless randomly generated nature makes it a perfect candidate for machine learning.
Write up a screenshot script and you can relatively easily build up a dataset of hundreds of thousands of images.
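The sampling side of such a script is simple to sketch: pick random world coordinates and camera angles, teleport there, and capture the frame. Everything below (coordinate ranges, angle limits, the helper names) is an illustrative assumption, not the actual MCGAN tooling.

```python
# Illustrative sketch of dataset sampling for a Minecraft screenshot script.
import random

def sample_shot(world_radius=1_000_000, rng=random):
    """Pick a random camera position and orientation in the world."""
    return {
        "x": rng.uniform(-world_radius, world_radius),
        "z": rng.uniform(-world_radius, world_radius),
        "yaw": rng.uniform(-180, 180),   # look direction
        "pitch": rng.uniform(-10, 10),   # keep the horizon roughly level
    }

def tp_command(shot, y=90):
    """Vanilla /tp command that positions the player for one screenshot."""
    return (f"/tp @p {shot['x']:.0f} {y} {shot['z']:.0f} "
            f"{shot['yaw']:.0f} {shot['pitch']:.0f}")
```

In practice a chunk-loading delay and a screenshot keypress would follow each teleport, but the loop above is the whole trick: endless terrain means an effectively unlimited supply of unique training images.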
With the dataset problem solved, the biggest limiting factor for final image quality and realism becomes training time.
I'm currently training MCGAN on a single GTX 1080.
For reference, NVIDIA's training time for 1024x1024 resolution on their GAN is about a week, and that's using their recommended configuration of eight $10,000 Tesla V100 GPUs.
For one GPU the training time goes up to over 40 days.
This makes each training run quite a commitment: if a problem is discovered late, weeks of compute can be wasted.
Even though this iteration of MCGAN could probably be trained further, I'm very happy with the performance it's given me so far.
I was especially impressed by how quickly it learned more abstract features like distance fog and water-depth transparency along shorelines.
Some generated images contain strange blob-like artifacts, even late into training.
They seem to appear randomly, move around a little for a few thousand iterations, then eventually fade away.
I'm still not completely certain what they are, but they seem to point to a weakness somewhere in my network architecture.
They could be a quirk of how the network "fills in" new features as it learns, but a lot more time may be required to see if they ever do go away.
One thing that surprised me was how quickly the network noticed and took interest in the animals that ended up in some of the screenshots.
There weren't that many of them, so I didn't expect it to pick up on them in any meaningful way, but it really seemed to like generating them, especially cows and pigs.
In fact, it went through a whole phase around 640,000 images into training where it thought cows should be in a rather large chunk of all images.
(Although it did eventually get past this.)
Blob artifacts appearing early in training. (128px, 3.5 LOD)
Pigs (reference, and attempted), one even seems to have a primitive face.
TRY IT OUT
I have embedded the network on this website, allowing for new images to be generated on the fly.
My most recent iteration of this network is still being trained, so these won't be the highest quality currently possible, but they give a good idea of what it's capable of.
Also, thanks to feature disentanglement, even finer control over the output image should be possible in the future (choosing biomes, zoom level, tree coverage, etc.).
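The kind of control disentanglement enables can be sketched as simple latent-space arithmetic: interpolate between two latent codes, or nudge a code along a semantic direction. The "direction" vectors here are placeholders for what would, in reality, be discovered from the trained network.

```python
# Sketch of latent-space control; latent codes are plain lists of floats.

def lerp(a, b, t):
    """Linearly interpolate between two latent vectors (t in [0, 1])."""
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]

def nudge(latent, direction, strength):
    """Move a latent code along a (hypothetical) semantic direction,
    e.g. a learned 'tree coverage' or 'zoom' axis."""
    return [li + strength * di for li, di in zip(latent, direction)]
```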