The Ur-model Cometh

Hello, and happy Sunday! This past week was like Christmas in February. Anthropic and OpenAI both dropped significantly improved models that moved us closer to peak general-purpose AI, and it was all-hands here at Every to share what our advance testing revealed about each of them. The result was Vibe Checks on both Opus 4.6 and Codex 5.3, the inevitable head-to-head showdown, and a livestream featuring Sam Altman himself.—Kate Lee

Courtesy of Rachel Braun.

Knowledge base

“GPT-5.3 Codex vs. Opus 4.6: The Great Convergence” by Dan Shipper/Vibe Check: Opus picked up Codex’s precision. Codex gained Opus’s warmth and willingness. Every CEO Dan Shipper and the team tested both extensively, and the verdict is that these models are—in a good way—beginning to resemble each other. Most of the Every team is now using both. Read this for the full head-to-head breakdown, including which one has the higher ceiling and which delivers steadier, faster autonomous execution.

“Vibe Check: Opus 4.6—The Best Coding Model We’ve Tested (With Some Maddening Habits)” by Dan Shipper and Katie Parrott/Vibe Check: In 15 minutes, Opus 4.6 solved a Monologue iOS problem that stumped both Codex and Opus 4.5, researching competitors and open-source repos to find the perfect solution. Put simply: It’s extremely smart. As proof, it set the high score on Cora general manager Kieran Klaassen’s LFG benchmark. Some trade-offs exist, though: It’s slower and occasionally confabulates, and the team preferred Opus 4.5’s prose in blind tests. But for vibe coders? Switch now. Read this to learn why.

“GPT-5.3 Codex: The 10x Engineer, Now More Fun at Parties” by Dan Shipper and Katie Parrott/Vibe Check: Codex has always been brilliant but rigid, like a senior engineer who only speaks in implementation details. GPT-5.3 loosens up. It’s faster, warmer, more creative, and finally stops asking permission. Dan ran it overnight on difficult bugs and watched it execute full test loops autonomously with great results. Read this to see how its benchmark scores stack up and to learn why even Kieran—Every’s most devoted Claude Code user—now reaches for Codex sometimes.

“Vibe Check: OpenAI’s Codex App Gains Ground on Claude Code” by Dan Shipper and Katie Parrott/Vibe Check: OpenAI’s new Codex desktop app is the first graphical user interface that’s pulled Dan out of his terminal since Claude Code launched. The Mac app serves as a “command center for agents” with cloud-to-local sync, a skills library, automations, and one-click YOLO mode. Most of the Every engineering team went green on this release. Read this to see why it’s competitive for professional programmers orchestrating agents, and why vibe coders should stick with Claude.

“The Next Chapter of Every Consulting” by Natalia Quintero/On Every: Every’s consulting practice unveils specialized AI playbooks for tech and finance companies, drawing from work with hedge funds managing over $100 billion in combined assets. Head of consulting Natalia Quintero shares a four-level AI maturity framework, from basic ChatGPT usage to agents that take over your entire workflow. The results: One hedge fund now screens companies in minutes instead of a week, and an investment firm saves 50 hours per memo. Read this for Every’s four-step consulting process, including a DIY roadmap any team can use now.

🎧 🖥 “Every’s Head of Consulting Just Automated Her Job” by Tom Matsuda/AI & I: Natalia wakes at 6 a.m. daily to vibe code—and now calls herself “a bonafide vibe code addict.” The result: Claudie, a project manager that slashed her weekly admin work from 15 hours to one. In this conversation with Dan, she shares what she’s learned from working with companies like the New York Times and Walleye Capital: AI success requires top-down commitment, empowered internal champions, and creative space to experiment. 🎧 🖥 Listen on Spotify or Apple Podcasts, or watch on X or YouTube.

“We Trained an AI on a Board Game. It Became a Better Customer Support Agent.” by Alex Duffy/Playtesting: When Good Start Labs fine-tuned an open-source model on thousands of rounds of Diplomacy—the WWI strategy game favored by JFK and Henry Kissinger—it didn’t just get better at games. The model improved over 10 percent on customer support and industrial operations benchmarks. The reason: Diplomacy rewards context-tracking, shifting priorities, and strategic communication, and models learn more from playing games with feedback than from scraping the web. Read this for why games are becoming AI’s new curriculum.

“What Is Taste, Really?” by Jack Cheng: When we talk about taste in the AI age, we’re conflating two things: personal taste (what you like) and “good taste” (what’s culturally valued). Every contributing editor Jack Cheng unpacks both, drawing from his early days at a SoHo ad agency and Steve Jobs’s family dinner debates over which laundry machines to buy. The key insight: Learn to articulate why you like something, because that articulation builds your toolbox. Read this for a framework to sharpen your creative edge.

Alignment

The intentionality engine. I was never someone who tracked things. Every thought or feeling lived inside my head, which is a polite way of saying it lived nowhere. I once had a killer idea for an essay while out on a walk and vowed I’d still remember it when I got home. Of course, I didn’t. This has happened so many times over the past few years that it stopped being funny and started feeling like self-sabotage. I’ve always admired people who keep meticulous journals and develop “second brains” for ideas, but I couldn’t be bothered.

Then over the holidays I started using Obsidian, a knowledge management system built on simple markdown files, alongside Claude Code and Monologue, a voice-note app that lets me talk instead of type. What makes this setup special is that every idea links to everything else. A half-formed idea during a walk can link to an essay draft I’ve been chewing on, which connects to a tweet I bookmarked three weeks ago. Over time these separate pieces of information become an intricate web of patterns that start to surface on their own.

Courtesy of Ashwin Sharma.

But what started as a way to hold onto ideas has become something more. Every day has become a tiny experiment informed by the last: When I know yesterday was a low-mood day, because I logged it, I wake up knowing I need to get outside or call a friend. I’m overflowing with intention and focus.

There’s another big reason to care about this right now. As AI agents get more powerful, the most important thing you can do is give them context about who you are. These notes aren’t scattered diary entries or half-remembered preferences, but structured, comprehensive self-knowledge. I want my system to know everything about me, because that’s how I’ll get the most from AI. Your personal knowledge base is the training data for your future agent.

Obsidian used to intimidate me. I assumed it was for hardcore developers, but vibe coding has made it so accessible that I now can’t imagine life without it. And what it’s really taught me is that when you’re intentional about recording your days, you become intentional about living them.—Ashwin Sharma

That’s all for this week! Be sure to follow Every on X at @every and on LinkedIn.

