The Face That AI Built

My explorations in image generation

Image prompt: A cubist painting of robot hell

To begin, we must establish that I am a moron. 

My code-writing skills are so subpar that it’s an insult to the word “skill” to apply it to my abilities. GitHub is frightening, engineers are wizards, etc., etc. Combine this idiocy with my MacBook Air from 2018 (which has water damage and randomly restarts twice a day), and I shouldn’t even be able to type. Despite all that, I was able to use open-source software and 5 pictures of myself to make a self-portrait using AI. It took maybe 15 minutes.

I then decided to go to space. 

Then I wanted to see what I would look like in Cyberpunk Jersey Shore. 

Then I chose to become a neckbeard.

Then I sat down for a portrait session with Monet. 

These creations are remarkable in and of themselves—if you add in the fact that all of the code for this was released in the last 6 weeks, they become overwhelming. By combining Stable Diffusion 1.4 and a model called DreamBooth from Google, anyone can do what I did. If you are looking to follow in my footsteps, just follow the instructions in this video. The end result is that you can make a portrait of anyone doing anything. All it takes is 5 photos of their face and some free time. It doesn’t even cost money.

AI is exciting because it has the potential to remake the power structures of software. By 2020, we had pretty much established who made money in the software industry versus who got their margins squeezed to zero. Crypto promised to totally upend the power dynamics of the internet (and has thus far utterly failed in that promise)—but I think AI can do it for real. 

I will warn you, my thesis on this topic is still developing. I remain unsure of how it will all shake out. However, my dumb little portraits are useful because they are a visual reminder of what is possible. Today’s post will be a quick one where I walk through the components of how this works and the ethical implications.

AI Power Dynamics

There are 4 necessary components for my pictures.  

  1. Backend Compute: The computing power that the models run on
  2. Foundational Model: This is the trained AI model that has some sort of broad applicability
  3. Fine-Tuning: In some cases, foundational models can be tuned to specific use cases. For example, I may take a foundational language model and then use a version of it fine-tuned specifically for creating marketing copy. 
  4. Access Point: The end user will have access to the tuned model through some sort of endpoint. In some cases, it will be directly integrated into existing software and in other cases, it will be a stand-alone application. 

For my portraits this stack worked out like this:

  1. Backend Compute: These image-generation AIs require a specialized chip known as a GPU, and each model has specific requirements for which GPUs it can run on. It is much easier to run this locally, but because I have a very bad computer, I had to run mine in the cloud. In this case, I used an NVIDIA T4.
  2. Foundational Model: The foundational model used was Stable Diffusion which is an open-source AI model. 
  3. Fine-Tuning: Stable Diffusion doesn’t let you upload your own images as training data out of the box, so I used DreamBooth.
  4. Access Point: While the makers of Stable Diffusion have their own UI (confusingly called DreamStudio), it doesn’t currently let me use DreamBooth. I was reduced to something much more basic from Google called Colab, a research environment that gives free access to GPUs.
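To make the stack above concrete, here is a minimal sketch of what the generation step looks like in Python using Hugging Face’s diffusers library (an alternative route to the Colab notebook I used; the “sks” identifier token and the scene text are illustrative). DreamBooth works by binding your training photos to a rare token, and your prompts then reference that token:

```python
def build_prompt(subject_token: str, scene: str) -> str:
    """DreamBooth ties your photos to a rare identifier token (often
    'sks'); prompts reference that token to place you in a scene."""
    return f"a portrait of {subject_token} person, {scene}"


def generate(prompt: str):
    # Requires a GPU (e.g., an NVIDIA T4) and `pip install diffusers torch`.
    # Downloads the public Stable Diffusion 1.4 checkpoint on first run.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]


prompt = build_prompt("sks", "floating in outer space")
print(prompt)
# generate(prompt).save("portrait.png")  # uncomment on a GPU machine
```

In a real DreamBooth run, you would first fine-tune the checkpoint on your 5 photos so the token actually maps to your face; the sketch only shows the inference side.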

Again, I am a moron. These instructions sound scary but I promise it is stupid easy. 

Even so, it’s fascinating to consider what happens when this technology is not so intimidating. I predict that everywhere there is an “upload image” button on the internet, there will soon be a “generate image” button. When GIFs first hit the internet, they spread like wildfire. Every conversation embedded those little animated images, and they permeated the cultural zeitgeist. These AI image generators will do the same. However, it will extend far beyond consumer communication tools: generating images will occur in website design, in product design, in Photoshop, and in all sorts of B2B applications. With the current state of this technology, these use cases could occur right now. What about in 2-3 years, after the technology is 5-10x better? The cultural value of images will totally change.

This technology is exciting! However, I find myself more than a little troubled by the implications. 

Ick

As I was doing my research for this piece on AI internet forums, I found multiple guides explaining how to turn off the “Not Safe For Work” filters on Stable Diffusion. Soon after, I found discussions on how to generate photorealistic porn with just a text prompt. After apologizing to my wife for the search history I was about to create, I found even grosser forums where users were discussing how to generate porn with people’s faces without their consent. The users were freely swapping the results of using AI to put celebrities’ faces on porn. Within 6 weeks of Stable Diffusion’s release, it is already being used to exploit women. 

To put it simply, if there are 5 photos of your face on the internet, someone can now generate porn featuring you with ~30 minutes of work. Unless they upload it to the internet somewhere, you’ll never know about it. The image-generation algorithms are all open-source, the code can be run locally, and no one can stop a bad actor. To be fair, something similar has been possible since the ’90s with Photoshop. And since 2017, there has been a popular discourse around deepfakes and the role that AI could play in non-consensual porn generation. 

However, this new generation of tooling makes it possible for me, a moron, to do so. Imagine what someone with a working computer who was actually halfway decent at any of this could do. By dramatically decreasing the oversight of and barriers to entry for image generation, this technology has unleashed a whole host of ugly problems.

Other ethical concerns abound: this will decrease the total number of graphic designers needed in the global economy. Creative destruction is a net positive for society, but that doesn’t mean we can rejoice about the many people who will be hurt along the way. 

The lazy answer for these conundrums would be “ban all open-source image generation AI!” I expect that we will see takes like that in popular media/politics in the upcoming weeks or months. However, that doesn’t solve the problem either! It just consolidates the results into the hands of the powerful (big tech companies and government agencies).

There are no easy answers, but there is easy image generation. Technology doesn’t care about the answer; it cares about progress. So I expect all of this stuff to continue to progress quickly while the rest of the world scrambles to catch up.

In the next two weeks, I’ll be publishing a fairly extensive market map of AI companies and the power dynamics at play. Make sure you subscribe so you can get access to it.

