By: Niklas Furberg and Victor Ahlberg
2024-02-20
Midjourney vs DALL-E
At Dynabyte, we have a group we call Competence Lead, which currently consists of Niklas Furberg and Victor Ahlberg. Last fall, we held an internal presentation on the theme of AI-generated images where we compared two different services in this area: Midjourney and DALL-E. In this post, we want to share a summary of that presentation.
First, we send the same prompt to Midjourney and DALL-E. The prompt is: “Darth Vader playing basketball”. Here are the results from both services:
Midjourney:
DALL-E:
We can see right away that Midjourney gives us a picture of Darth Vader, while DALL-E does not. Why don’t we get the result we expect from DALL-E? The answer lies in the fact that DALL-E revises the prompt before generating the image. Here’s what the revision looks like:
Original prompt:
Darth Vader playing basketball
Revised prompt:
A towering figure, draped in a flowing cloak and armed with a strange weapon resembling a futuristic sword, engaging in a game of basketball. The figure is clad in glossy black armor, with a chestplate of control switches and buttons. Its face is concealed by a sleek black helmet with a T-shaped visor. The lights from the stadium reflect off the armor, as he holds a basketball in his gloved hands, getting ready to take a shot.
Under the hood, DALL-E appears to replace the named character with a description of its basic traits and feed that description into the image generation step.
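When DALL-E 3 is called through the API, the revised prompt comes back alongside the image in a `revised_prompt` field, which is how a revision like the one above can be inspected. Here is a minimal sketch of reading it, assuming a response body shaped like the one the image generation endpoint returns. The sample payload and the helper name `revised_prompts` are our own placeholders, not real output:

```python
import json

# Hypothetical sample shaped like an image generation response for DALL-E 3.
# The values are placeholders, not output from a real request.
SAMPLE_RESPONSE = """
{
  "created": 1700000000,
  "data": [
    {
      "revised_prompt": "<revised prompt text>",
      "url": "https://example.com/image.png"
    }
  ]
}
"""


def revised_prompts(response_text: str) -> list[str]:
    """Return the revised prompt of each generated image, skipping images without one."""
    payload = json.loads(response_text)
    return [
        item["revised_prompt"]
        for item in payload.get("data", [])
        if "revised_prompt" in item
    ]


print(revised_prompts(SAMPLE_RESPONSE))
```

Comparing the returned text with the original prompt shows exactly what the model was asked to draw.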
We then try sending DALL-E’s revised prompt directly to Midjourney to get a fairer comparison:
One observation we can make here is that Midjourney’s image is more stylized and less photorealistic than DALL-E’s image. Midjourney also demonstrates a classic problem: it struggles to generate convincing human hands.
Finally, we try to get Midjourney to describe DALL-E’s image with text, which results in:
a robot dressed in black holds a basketball, in the style of epic fantasy scenes, hyper-realistic representation, imax, detailed costumes, monochromatic color schemes, princesscore, hieratic visionary.
Then we input that text into Midjourney to generate an image. This is what the image looks like:
Again, we get a stylized image from Midjourney. Interestingly, all the images share a common denominator: dark colors, cloaks, and evil eyes. We started with the specific and ended up with the generic: Darth Vader was reduced to an interpretation of his salient features.
Technically, the two services work quite differently. Midjourney is packaged more for end users, as an app/plugin in Discord, and prohibits any form of deeper integration and automation. DALL-E, on the other hand, is used through an API on the OpenAI platform. The two services may therefore cater to different use cases.
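To illustrate what using DALL-E through an API can look like, here is a rough sketch that builds a request to OpenAI's image generation endpoint using only the Python standard library. The endpoint, model name, and parameters reflect our understanding of the API at the time of writing and may have changed. The function names and the `sk-...` key are placeholders of our own:

```python
import json
import urllib.request

# Image generation endpoint on the OpenAI platform (assumed current; check the API docs).
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking DALL-E 3 for one image."""
    body = {"model": "dall-e-3", "prompt": prompt, "n": 1, "size": "1024x1024"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(prompt: str, api_key: str) -> str:
    """Send the request and return the image URL. Needs network access and a valid key."""
    with urllib.request.urlopen(build_image_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    return payload["data"][0]["url"]
```

Calling `generate_image_url("Darth Vader playing basketball", "sk-...")` with a real key would return a link to the generated image. Because this is a plain HTTP call, it can be scripted and integrated into other systems, which is precisely what Midjourney's Discord-based packaging does not allow.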