By: Niklas Furberg and Victor Ahlberg

2024-02-20

Midjourney vs DALL-E

At Dynabyte, we have a group we call Competence Lead, which currently consists of Niklas Furberg and Victor Ahlberg. Last fall, we held an internal presentation on the theme of AI-generated images where we compared two different services in this area: Midjourney and DALL-E. In this post, we want to share a summary of that presentation.

First, we send the same prompt to Midjourney and DALL-E. The prompt is: “Darth Vader playing basketball”. Here are the results from both services:

Midjourney:

DALL-E:

We can see right away that Midjourney gives us a picture of Darth Vader, while DALL-E does not. Why don’t we get the result we expect from DALL-E? The answer lies in the fact that DALL-E revises the prompt before generating the image. Here’s what the revision looks like:

Original prompt:

Darth Vader playing basketball

Revised prompt:

A towering figure, draped in a flowing cloak and armed with a strange weapon resembling a futuristic sword, engaging in a game of basketball. The figure is clad in glossy black armor, with a chestplate of control switches and buttons. Its face is concealed by a sleek black helmet with a T-shaped visor. The lights from the stadium reflect off the armor, as he holds a basketball in his gloved hands, getting ready to take a shot.

Under the hood, DALL-E appears to replace the character's name with a description of his basic visual traits, and it is that description, not the name, that reaches the image generation step.
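For readers who want to inspect the rewrite themselves, here is a minimal sketch of reading the revised prompt back. It assumes the image is requested through the OpenAI Python library with the `dall-e-3` model, which returns a `revised_prompt` field next to the image URL. We do not claim this is how the images in this post were produced. The function name `generate_and_show_revision` and the `GeneratedImage` container are our own illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GeneratedImage:
    original_prompt: str
    revised_prompt: Optional[str]  # None if the service returned no rewrite
    url: Optional[str]


def generate_and_show_revision(client: Any, prompt: str) -> GeneratedImage:
    """Request one image and keep the prompt the model actually used.

    Assumption: `client` behaves like `openai.OpenAI()`, i.e. it exposes
    `client.images.generate(...)` and returns objects with a `data` list.
    """
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        n=1,
        size="1024x1024",
    )
    image = response.data[0]
    return GeneratedImage(
        original_prompt=prompt,
        revised_prompt=getattr(image, "revised_prompt", None),
        url=getattr(image, "url", None),
    )


# Usage (needs `pip install openai` and an OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   result = generate_and_show_revision(OpenAI(), "Darth Vader playing basketball")
#   print(result.revised_prompt)
```

Comparing `original_prompt` with `revised_prompt` side by side makes it easy to see which parts of the request were rewritten before the image was generated.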

We then try sending DALL-E’s revised prompt directly to Midjourney to get a fairer comparison:

One observation we can make here is that Midjourney’s image is more stylized and less photorealistic than DALL-E’s. Midjourney’s image also shows a classic weakness: it struggles to render human hands convincingly.

Finally, we try to get Midjourney to describe DALL-E’s image with text, which results in:

a robot dressed in black holds a basketball, in the style of epic fantasy scenes, hyper-realistic representation, imax, detailed costumes, monochromatic color schemes, princesscore, hieratic visionary.

Then we input that text into Midjourney to generate an image. This is what the image looks like:

Again, we get a stylized image from Midjourney. Interestingly, all the images share a common denominator: dark colors, cloaks, and evil eyes. We started at the specific and ended up at the generic: Darth Vader was reduced to an interpretation of his salient features.

Technically, the two services work quite differently. Midjourney is packaged for end users as an app/plugin in Discord, and it prohibits any form of deeper integration and automation. DALL-E, on the other hand, is used through an API on the OpenAI platform. The two services may therefore cater to different use cases.
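To illustrate what "used through an API" means for automation, here is a small sketch that builds, without sending, the HTTPS request for OpenAI's image generation endpoint using only the Python standard library. The helper name `build_image_request` is ours, and the payload values (`dall-e-3`, one 1024x1024 image) are illustrative assumptions rather than settings taken from our presentation.

```python
import json
import urllib.request

# Image generation endpoint of the OpenAI REST API.
IMAGES_ENDPOINT = "https://api.openai.com/v1/images/generations"


def build_image_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for one image."""
    payload = {
        "model": "dall-e-3",   # assumed model choice
        "prompt": prompt,
        "n": 1,
        "size": "1024x1024",
    }
    return urllib.request.Request(
        IMAGES_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would look like this (requires a valid API key and network access):
#   with urllib.request.urlopen(build_image_request(key, "a red bicycle")) as resp:
#       body = json.load(resp)
```

Because the whole interaction is an ordinary HTTPS call, it can be scripted, scheduled, or embedded in another application, which is the kind of integration Midjourney's Discord-based workflow does not allow.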

#AI #DALL-E #Midjourney
