What We Know About the Midjourney Model

Are you curious to know more about the exact Midjourney model?

Ever since Midjourney first caught the world’s attention, people have been wondering about what type of AI model it might be using.

Questions have also been raised regarding what sort of dataset was used to train the Midjourney model and whether any copyright may have been infringed upon.

In this guide, we’ll share everything there currently is to know about what methods and datasets it uses.

Let’s dive right in.

What’s Known About the Midjourney Model?

Infographic summarizing what is known about the Midjourney model so far.

We prefer to start with the bad news before moving on to the more optimistic part.

So let us get straight to the point and say that very little is known about the exact methods and datasets that were used to create the Midjourney model.

We can make a lot of assumptions based on what we’ve seen so far but we will never know with absolute certainty.

Nevertheless, let’s go over some of the things we believe we know about the Midjourney model.

The AI Model

We won’t bore you with an elaborate overview of all of the different AI models in existence.

While most of us are capable of understanding how these models work on a superficial level, the nitty-gritty details are primarily reserved for data scientists.

This is high-end nerd stuff.

Unfortunately, Midjourney has not shared any information regarding the specific methods or combinations of different AI models they use.

However, considering that Midjourney offers a number of experimental algorithms that its users can choose from, it’s safe to assume that they don’t necessarily limit themselves to one specific approach.

In fact, it is very likely that Midjourney is drawing from and experimenting with a wide range of different methods.

A recent amendment to the Midjourney terms of service (from Aug 28, 2022) even implies that some of its more experimental algorithms may be subject to license limitations and restrictions under the CreativeML Open RAIL-M license.

So what exactly does this mean for the Midjourney model?

Well, this is the very same permissive license (it’s not 100% open-source) under which Stable Diffusion was released.

And considering the timing, it does suggest that Midjourney is actively experimenting with Stable Diffusion itself, or with a model derived from it.

Our guess is that Midjourney has been using a combination of CLIP, which learns a shared representation of text and images, and diffusion models, which do the actual image generation.

If you don’t know what that means, that’s alright.

At its core, the Midjourney model is basically doing the same or similar things that DALL-E 2 and Stable Diffusion do as well.

Key Points (tl;dr)

Very little is known about the exact methods or datasets that were used to train the Midjourney model.
However, it’s safe to assume that a combination of CLIP and diffusion models was used, similar to what DALL-E and Stable Diffusion do.
Founder David Holz has also admitted to scraping millions of images off the internet without explicit consent.

The Dataset

So far there doesn’t seem to be anything particularly special about what Midjourney does.

But, as always, the devil is in the details.

In a recent interview in December 2022, Midjourney founder David Holz was quoted as saying that the Midjourney model’s dataset was built from:

“[..] a big scrape of the internet. […] We use the open data sets that are published and train across those. And I’d say that’s something that 100% of people do.”
David Holz, Founder of Midjourney in Forbes

To provide some extra context, when asked whether Midjourney seeks consent from living artists or work still under copyright, he responded:

“No. There isn’t really a way to get a hundred million images and know where they’re coming from. It would be cool if images had metadata embedded in them about the copyright owner or something. But that’s not a thing; there’s not a registry. There’s no way to find a picture on the Internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.”
David Holz, Founder of Midjourney in Forbes

What he’s basically saying is that they scrape millions of images from the internet and don’t ask for people’s consent.

It’s no surprise that artists are outraged by this, irrespective of whether their own work was actually impacted.

Most of them have a limited understanding of what AI models actually do, and some simply assume that the generated images are work stolen directly from a specific artist.

This is not the case; the images were simply used to train the model.

But is that really fundamentally different from what existing artists do?

Sure, they can’t ingest millions of images but they also get inspiration from others, copy their styles, and whatnot.

Anyway, we do not believe that Midjourney just took some random set of images scraped off the internet.

Let’s face it, there’s a reason why their images look so damn good and have a very distinct style.

In all likelihood, they scraped images from a very specific set of websites that are filled to the brim with high-end digital artwork.

Add to that the fact that their open community approach likely helps improve the Midjourney model further, since the feedback of their users acts as a quality signal.

So while the original dataset may have been scraped, at some point the model continues to learn from its own work and the feedback provided by its users.

How Do AI Art Generators Even Work?

Example of the Midjourney model in action.

This isn’t really the place to go into too much detail, but we want to at least provide you with a quick executive summary.

It starts off with your prompt (a text description), which is converted into a numerical representation and related to what the model learned from millions of images and their captions and alt text.

In other words, it tries to understand what you are trying to achieve and works out what kind of image would fit that description.

It’s not actually looking up any specific images per se, it’s just using encoded information in the model (in our case the Midjourney model).

Imagine throwing dozens of ingredients into a blender and mixing it all together into a pile of goo.

You can’t tell one ingredient from the other.
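To make the idea of matching a prompt against encoded image information a bit more concrete, here is a minimal sketch of how CLIP-style models compare text and images: both are turned into lists of numbers (embeddings), and the closer two lists point in the same direction, the better the match. Everything in it is an assumption for illustration: the three-number embeddings are made up, the names are hypothetical, and nothing here is taken from Midjourney itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# ASSUMPTION: made-up 3-dimensional embeddings for illustration only.
# A real text/image encoder produces vectors with hundreds of dimensions.
prompt_embedding = [0.9, 0.1, 0.3]  # stands in for "a castle at sunset"
image_embeddings = {
    "castle_painting": [0.8, 0.2, 0.4],
    "cat_photo": [0.1, 0.9, 0.2],
    "city_skyline": [0.5, 0.3, 0.7],
}

def rank_by_similarity(prompt_vec, candidates):
    """Return (name, score) pairs, best match first."""
    scores = {
        name: cosine_similarity(prompt_vec, vec)
        for name, vec in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_similarity(prompt_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same kind of score can be used at the end of the process to check how well a finished image fits the prompt.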

Then it goes through a so-called “diffusion” process.

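During training, noise is added to images and the model learns to remove it again; when generating, it starts from pure noise and removes a little of it, repeating this process over and over.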

The AI generator then tries to reverse engineer from the goo and reassembles everything into something recognizable by using your text description.
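The add-noise and remove-noise idea can be illustrated with a toy sketch. It is not a real diffusion model and certainly not Midjourney's: the "image" is just five made-up numbers, and `toy_denoiser` is a hypothetical stand-in that is simply handed the answer, whereas a real system uses a trained neural network that predicts it from the noisy input and the prompt. The step sizes and function names are likewise assumptions chosen for readability.

```python
import math
import random

def add_noise(signal, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    return [keep * x + mix * rng.gauss(0.0, 1.0) for x in signal]

def forward_process(signal, steps, beta, rng):
    """Repeatedly add noise until little of the original signal remains."""
    for _ in range(steps):
        signal = add_noise(signal, beta, rng)
    return signal

def toy_denoiser(noisy, target):
    """Stand-in for a trained network that guesses the clean signal.
    ASSUMPTION: a real model predicts this from the noisy input and the
    prompt; here we hand it the answer to keep the sketch short."""
    return target

def reverse_process(target, steps, rng, step_size=0.2, jitter=0.05):
    """Start from pure noise and move a little toward the guess each step."""
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        guess = toy_denoiser(x, target)
        x = [
            xi + step_size * (gi - xi) + jitter * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, guess)
        ]
    return x

def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

clean = [0.0, 0.5, 1.0, 0.5, 0.0]  # made-up "image" of five pixel values

noisy = forward_process(clean, steps=200, beta=0.05, rng=random.Random(0))
sample_a = reverse_process(clean, steps=50, rng=random.Random(1))
sample_b = reverse_process(clean, steps=50, rng=random.Random(2))

print("error after noising:  ", round(mean_abs_error(noisy, clean), 3))
print("error after denoising:", round(mean_abs_error(sample_a, clean), 3))
print("two runs identical?   ", sample_a == sample_b)
```

Because each run starts from different random noise, the two samples end up close to the target but never exactly the same, which mirrors why two generations from the same prompt differ.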

And finally, it tries to assess whether its new creation is consistent with what you’ve described in your prompt.

Because every image starts from fresh random noise, some results look weird and you basically never get the same image twice.

And the more people use it and give feedback, the more Midjourney has to work with when improving the model.

Frequently Asked Questions (FAQ)

Before we close off this guide, let’s quickly address some of the most common questions related to the Midjourney model.

What model does Midjourney use?

Midjourney hasn’t made any official statements regarding the exact methods, models, and datasets it has used. However, it is safe to assume that it is some form of a combination of CLIP and diffusion models, which are commonly used by its primary competitors as well.

Does Midjourney steal art?

The answer to this question largely depends on the definition of theft. Most specialized lawyers will admit that the AI industry is in uncharted territory and the existing legal frameworks weren’t made to cover such cases. It is true that many (if not all) commercial AI models have been trained using large quantities of imagery, often without asking for consent. However, the question remains whether this is fundamentally different from an artist who uses other artists’ work to improve their skills.

Why is Midjourney so good?

Very little is known about the exact methods that Midjourney uses. However, Midjourney founder David Holz has admitted to scraping millions of images from the internet. It is safe to assume that many of the sources were websites that host high-end digital artwork. This would also explain why Midjourney is particularly effective at digital art and is very popular among gaming and fantasy fans.

Conclusion

The current debate reminds us a lot of the Google Books controversy of the mid-2000s, when “fair use” was used in defense of the book-scanning project.

We could imagine that Midjourney is doing something very similar here, moving fast and potentially apologizing later with direct reference to “fair use” clauses that cover educational use.

Either way, we’re glad that Midjourney exists and hope it has a bright future ahead of it.

Here at Tokenized, we want to help you learn as much as possible about the AI software industry. We help you navigate the world of tech and the digitalization of our society at large, including the tokenization of assets and services.