
Yesterday, OpenAI released native image generation for GPT-4o in ChatGPT. We’ve been playing around with it at Every for the past week as part of OpenAI’s early testing, and the consensus is that 4o’s image generation is awesome. It renders text and visualizes information with unmatched quality, though its main drawback is speed.
But first, here’s what’s new.
All about GPT-4o image generation
Image generation has a home
OpenAI’s native image generation lives directly in the ChatGPT interface as an innate part of GPT-4o, rather than as a separate image model, like DALL-E. You can generate images with all the context from your conversation history, and continue to iterate over an image generation in chat.
Language meets visuals. GPT-4o’s images draw directly on the real-world knowledge in its training data, taking advantage of facts that the LLM knows. Ask it to illustrate the discovery of penicillin, and instead of a generic scientist with a petri dish (like what DALL-E might give you), you'll get an infographic showing discoverer Alexander Fleming, key dates, and historical context, blending factual accuracy with visual storytelling in a way earlier models couldn’t.
Goodbye, gibberish. GPT-4o can now render precise text and symbols inside images. Want a sign, menu, or comic strip with legible writing? Just say the word(s).
It remembers visual details. Let’s say you're creating visuals for a children’s picture book with GPT-4o and designing Fido, a friendly dog (wearing jeans, as one does). With 4o, you can reuse that exact same character across multiple prompts. Easily tweak his expression or outfit—or even put him in new scenes—without losing his unique details. Other models struggle with this consistency, giving you only one shot to generate a character.
Jack of all trades, master of the prompt. GPT-4o handles complex instructions with surprising precision. Ask for a scene with upwards of 20 specific objects—say, a red balloon, a fluffy white goat, a cat in sunglasses, and so on—and it won’t blink. Where other models start to fumble after five to eight items, GPT-4o juggles the details with finesse, placing everything exactly where you asked.
Photo-aware editing, right in the chat. Want to immerse your family photo into the Studio Ghibli universe? Erase the background without opening another app? Blend photos together without Photoshop? GPT-4o lets you upload photos and tweak them directly with natural language. No extra tools or new tabs—just tell it what you want.
What everyone at Every thinks
The best image generator yet
“Its quality is unmatched for text, which is huge, and it can make great visualizations/ infographics. It’s probably the best available.”—Alex Duffy, Every consulting lead and writer
“The most impressive release of 2025.”—Lucas Crespo, creative lead
It excels in style and substance
“It is INSANE. It's incredibly good at text—it can even write out whole paragraphs without a problem. It follows instructions well—you can ask it to modify small parts of an image and it will reliably reproduce the image with the modifications you requested. It's good at capturing style—if you give it a reference, it'll reliably help you get the vibe.”—Dan Shipper, Every cofounder and CEO
It’s life-changing for small businesses
“This release makes it more evident than ever before that small businesses with small budgets will never have to hire a designer. This is equivalent to a lot of junior design work. Anybody with a small business knows they need to be posting on social media or a website, but not everybody has that techie nephew or niece that does menu designs, ads, marketing assets, prints for t-shirts, etc., for them.”—LC
It’s a premium tool—for now
“One hundred percent this will all trickle down into Canva and free tools eventually.”—LC
Images aren’t production-ready
“The quality is good but there is no upscaling yet, meaning you end up with an image you gotta throw into Magnific or some AI upscaler to use it for actual production.”—LC
It’s not built for speed
“Finally, native images in ChatGPT! This takes the crown for [the] best way to change images via chat, visualize information, put text in images… But man… is it slow.”—AD
“It feels like [OpenAI’s] deep research: slow, but once it gives you an output, it’s really, really useful.”—LC
OpenAI’s edge is in quality and control
“Google’s image size and quality are worse than OpenAI’s. There are a lot more artifacts [visual glitches that can show up when a model fumbles fine detail or text rendering] in Google’s generations as well.”—LC
It may kill another wave of AI startups
“RIP NapkinAI.”—AD
How it stacks up against the competition
To compare Google and OpenAI’s image generation, Alex ran three experiments.
1. Given text from a DeepSeek tweet, how does each model fare at visualizing the post? Is it capable of understanding that the content is from a tweet, with limited additional context, and can it understand what a tweet looks like?
Both models inferred from the “@” in the prompt that the text should be rendered as a tweet. However, OpenAI’s generation looks much closer to how tweets are actually displayed.
OpenAI (left), Google (right). Source: Every illustration.
2. Referencing the same post, this time with more artistic flair, Alex asked the models to create a beautiful artistic representation of the most important points from the text.
The results? OpenAI wins in the text generation category and shows stronger visual design capabilities than Google’s generation, which shows artifacts in its text and lacks the artistry you might expect from asking for a “beautiful artistic representation.”
OpenAI (left), Google (right). Source: Every illustration.
3. A similarly artistic representation for a post advertising the ARC Prize, a competition to get AI to solve reasoning problems it never has before.
GPT-4o’s text generation capabilities are out in full force, compared to the gibberish in Google’s generation.
OpenAI (left), Google (right). Source: Every illustration.
Still, despite OpenAI’s artistic and text generation capabilities, speed matters. Both Lucas and Alex say they’ll continue reaching for faster tools when iteration is the priority. For Alex, that’s Whisk, Google’s AI text-to-visual studio. “It’s pretty good, about 10 times faster, and great at grabbing style,” he says. In this video, he demonstrates using Whisk to generate 20 image variations in two minutes, all styled and text-specific. With GPT-4o, you’re getting one image at a time (albeit with incredible accuracy and consistency), so you’d better mean what you prompt.
Oh—and in case there was any doubt this is, at the very least, a two-horse race, Google dropped Gemini 2.5 on the same day GPT-4o’s image model launched. A not-so-subtle attempt to steal the thunder—or at least keep OpenAI from having the spotlight to itself.
So, what’s the vibe?
GPT-4o’s image generation isn’t just a new feature—it’s a shift in how we think about creating visuals and who can create them. It’s not the fastest. It’s not (yet) production-ready out of the box. But it’s precise, consistent, and deeply informed—crucially, combining the brains of an LLM with the eye and visual knowledge of a junior designer. It can follow complex instructions, generate readable text, and keep a character consistent across scenes.
For designers, it’s a serious new tool. For small businesses, it’s a game-changer. And for OpenAI, it’s a clear flex: This is what happens when you bring language, vision, memory, and reasoning all under one roof. As Lucas puts it, 4o is a huge step up from OpenAI’s previous image generation abilities: “it’s everything DALL-E never was.”
It may not replace Figma just yet, but it’s already doing a suspicious amount of the work your junior designer used to do. And while it only gives you one image at a time, it usually nails the brief.
Here’s how we’ve been experimenting with it at Every.
What's it good for?
Visualize a scene from a novel or poem.
Source: All illustrations courtesy of ChatGPT/Every.
Make a poster based on a photo you took.
Feed it a mood board or visual guidelines, and it can create new assets that follow the same styles.
It’s phenomenal at making infographics.
Interior design and remodeling just got 10 times easier.
It can generate different points of view from different angles.
Turn drawings into high-fidelity wireframes.
Prompt full comic strips in one shot.
Make visual instructions.
Combine multiple elements into a completely new image.
Create high-quality mockups.
Change backgrounds or add green screens.
Generate an ad by uploading a product image.
Make derivative images of a subject in an image.
Make a comic and give the characters the expression Jim from The Office has when he stares into the camera.
Reimagine Every’s brand in the style of John Mayer’s Born and Raised album art.
Read Every's previous vibe checks on Claude 3.7 Sonnet and Claude Code and OpenAI's Sora.
Vivian Meng is a producer and operator who produces the Every podcast AI & I. You can follow her on X at @vivnettes and on LinkedIn, and Every on X at @every and on LinkedIn.
Comments
Great overview and amazing examples. Thanks for sharing! Did you discover more cool ways to use it by now?
@jpforr It's really really great at visualizing text—this means great info graphics, but also there are some really interesting ways to explore its latent space / brain. My favorite right now is:
"write an unhinged monologue of your real thoughts in fountain pen blue ink, on it scrawl corrections in marker pen, they are unhinged, there are doodles and weird oddities scrawled, you cut out and stick on lots and lots of photo extracts from magazines to show the point!!! sometimes you write on them too"
"a google images search result for the query "meme"" is pretty good too
The fact that it can take frontend code and make pretty great representation of it is fascinating as well!
How about you??