I Spent a Week With Gemini Pro 1.5

Why size matters (when it comes to a context window)

I’ve been reading Chaim Potok’s 1967 novel, The Chosen. It features a classic enemies-to-lovers storyline about two Brooklyn Jews who find friendship and personal growth in the midst of a horrible softball accident. (As a Jew, let me say that yes, “horrible softball accident” is the most Jewish inciting incident in a book since Moses parted the Red Sea.)

In the book, Reuven Malter and his Orthodox yeshiva softball team are playing against a Hasidic team led by Danny Saunders, the son of the rebbe. In a pivotal early scene, Danny is at bat and full of rage. He hits a line drive toward Reuven, who catches the ball with his face. It smashes his glasses, spraying shards of glass into his eye and nearly blinding him. Despite his injury, Reuven catches the ball. The first thing his teammates care about is not his eye or the traumatic head injury he just suffered—it’s that he made the catch.

If you’re a writer like me and you’re typing an anecdote like the one I just wrote, you might want to make it come alive by quoting what one of Reuven’s teammates says right after he catches the ball.

If you go to ChatGPT for help, it’s not going to do a good job initially:

This is wrong because, as I said, Sydney Goldberg did not care about Reuven’s injury; he cared about the game! But all is not lost. If you give ChatGPT a plain text version of The Chosen and ask the same question, it’ll return a great answer:

This is correct! (It also confirms for us that Sydney Goldberg has his priorities straight.) So what happened?

ChatGPT behaved as if I’d given it an open-book test. We can improve its responses by handing it, along with the question, a little notecard of extra information that it can use to answer.

In this case we gave it an entire book to read through. But you’ll notice a problem: The entire book can’t fit into ChatGPT’s context window. So how does it work?

In order to answer my question, there’s a lot of code in ChatGPT that performs retrieval: It divides The Chosen up into small chunks through which it searches to find ones that seem relevant to the query. The retrieval code passes the original question, “What’s the first thing that Sydney Goldberg says to Reuven after he gets hit in the eye by the baseball?” and the most relevant sections of text it can find in the book to GPT-4, which produces an answer. (For a more detailed explanation, read this piece.)
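To make the retrieval step concrete, here’s a minimal sketch in Python. It is not ChatGPT’s actual code, which isn’t public. The function names (`chunk_text`, `retrieve`, `build_prompt`) are hypothetical, and the word-overlap score is a simple stand-in I’m assuming for illustration; real systems typically rank chunks with embeddings instead.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def tokenize(text: str) -> Counter:
    """Lowercased bag of words. An assumed stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def score(query: str, chunk: str) -> int:
    """Count how many query words (with multiplicity) also occur in the chunk."""
    q, c = tokenize(query), tokenize(chunk)
    return sum(min(q[w], c[w]) for w in q)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:top_k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n---\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In use, you would call `chunk_text` on the book, pass the question and the chunks to `retrieve`, and send the result of `build_prompt` to the model. The model only ever sees the chunks that `retrieve` returns.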

Again, we have to pass GPT-4 chunks of text, not the whole book, because GPT-4 can only fit so much text into its context window. If you’re paying attention, you’ll see the problem: Because the context window is so small, the model’s performance on certain kinds of queries is bottlenecked by how good we are at searching for relevant pieces of information to give it. (I wrote about this phenomenon about a year ago in this piece.)

If our search functionality doesn’t turn up relevant text chunks, well, GPT-4’s answer won’t be good. It doesn’t matter how smart GPT-4 is—it’s only as good as the chunks we turn up.

Let’s say we’re picking up The Chosen after a few weeks. We’ve read the first two sections, and before we begin the third we want to get a summary of what’s already happened in the book. We upload it to ChatGPT and ask it to summarize:

ChatGPT gives us a vague answer that’s correct, but it’s not very detailed because it can’t fit enough of the book into its context window to output a great one.

Let’s see what happens when we don’t have to divide the book up into chunks. Instead, we use Gemini, which can read through the entire book at once:

You’ll notice that Gemini’s answer is significantly more detailed and provides key plot points from the book that ChatGPT can’t give. (Technically, we could probably get a similar summary out of GPT-4 if we devised a clever system for chunking and summarizing it, but it would take a lot of work, and Gemini makes that work unnecessary.)
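For a sense of what such a “clever system for chunking and summarizing” might look like, here’s a rough sketch. It assumes you supply your own `summarize` function (for example, a wrapper around a model call); that function, `split_into_chunks`, and `summarize_long_text` are all hypothetical names, and the word limit is a crude proxy for a real token limit.

```python
from typing import Callable


def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 2000,
) -> str:
    """Summarize each chunk, then summarize the joined summaries, until it fits.

    `summarize` is supplied by the caller and is assumed to return
    something shorter than its input.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    if len(text.split()) <= max_words:
        return summarize(text)
    partials = [summarize(chunk) for chunk in split_into_chunks(text, max_words)]
    combined = " ".join(partials)
    # Guard against a summarizer that never shrinks its input,
    # which would otherwise recurse forever.
    if len(combined.split()) >= len(text.split()):
        raise ValueError("summarize() did not shorten the text")
    return summarize_long_text(combined, summarize, max_words)
```

Every pass throws away detail, which is one reason a summary built this way tends to be vaguer than one from a model that can read the whole book at once.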

Gemini’s use cases aren’t limited to reading novels of self-discovery through softball accidents. It unlocks hundreds of others that were previously difficult to do with ChatGPT or with a custom solution.

For example, at Every, we’re incubating a software product that can help you organize your files with AI. I wrote the original code for the file organizer, and our lead engineer, Avishek, wrote a GPT-4 integration. He wanted to know where to hook the GPT-4 integration into the existing codebase. So we uploaded the codebase to Gemini and asked:

It found the right place in the code and wrote the code Avishek needed in order to complete the integration. This is something just short of magic, dramatically accelerating developer productivity, especially on larger projects.

It doesn’t stop there, either. I’ve been writing for a long time about how transformer models might become copilots for the mind and end our need to organize our notes forever. Gemini Pro 1.5 is a step in that direction. For example, recently I was writing a piece about an effect I’ve noticed that I’m calling “I can do it myself” syndrome, where people tend not to use ChatGPT and similar tools because they feel like they can get the same task done more quickly, and at better quality, if they do it themselves. It’s like inexperienced managers who micromanage their reports to the point of doing most of the work themselves: They guarantee it’s done the way they want, but they sacrifice a lot of leverage in the process.

I wanted an anecdote to open the essay with, so I asked Gemini to find one in my reading highlights. It came up with something perfect:

I could not have found a better anecdote, and it’s not a generic one—it’s from my own reading history and taste.

Except: I later learned that the anecdote is made up. The general thrust of the idea is true (Henry Luce did run both the editorial and business sides of Time), so it is pointing me in the right direction. But after I reviewed my Readwise highlights, I couldn’t find the exact quote Gemini came up with. (I only figured this out after Gwern and other savvy Hacker News commenters pointed it out in a previous version of this article.)

So Gemini is not perfect. You do need to check its work. But if you're careful, it's a powerful tool.

Again, all of this comes back to the context window. This kind of performance is only possible because with Gemini we don’t need to search for or sort relevant pieces of information before we hand it to the model. We just feed it everything we have and let the model do the rest.

It’s much easier to work with large context windows, and they can deliver far more consistent and powerful results without extra retrieval code. The question is: What’s next?

The future of large context models

About a year ago I wrote:

“People have been saying that data is the new oil for a long time. But I do think, in this case, if you’ve spent a lot of time collecting and curating your own personal set of notes, articles, books, and highlights it’ll be the equivalent of having a topped-off oil drum in your bedroom during an OPEC crisis.”

Gemini is the perfect example of why this is true. With its large context window, all of the personal data you’ve been collecting is at your fingertips, ready to be deployed at the right place and the right time, in whatever task you need it for. The more personal data you have, even if it’s disorganized, the better.

There are a few important caveats to note, though:

First, this is a private beta that I can use for free. These models often perform differently (read: worse) when they are released publicly, and we don’t know how Gemini will perform when it’s tasked with operating at Google scale. There’s also no telling how much pumping 1 million tokens into Gemini is going to cost when it’s live. Over time the cost of using it will likely decrease significantly, but that will take a while.

Second, Gemini is pretty slow. Many requests took a minute or more to return, so it’s not a drop-in replacement for every LLM use case. It’s for the heavy lifting that you can’t get done with ChatGPT, which you probably don’t need to do on a regular basis. I would expect speed to increase significantly over time as well, but it’s still not there yet.

OpenAI has some catching up to do, and I’ll be watching to see how they respond. But I’m also thinking about other players: companies like LangChain, LlamaIndex (where I’m an investor), Pinecone, and Weaviate. They are to some degree betting on retrieval being an important component of LLM usage. They either provide the library that does the chunking and searching for information to pass to the LLM, or the datastore that keeps the information searchable and safe. As I mentioned earlier, retrieval is less relevant when you have a large context window, because you can input all of your information into each request.

You might think those companies are in trouble. Gemini’s huge context window does make some of what they’re building less important for basic queries. But I think retrieval will still be important long-term.

If there’s one thing we know about humanity, it’s that our ambition scales with the tools we have available to satisfy it. If 1 million token context models become the norm, we’ll learn to fill them. Every chat prompt will include all of our emails, and all of our journal entries, and maybe a book or two for good measure. Retrieval will still be used to figure out which 1 million tokens are the most relevant, rather than what it’s used for now: to find which 1,000 tokens are the most relevant.
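Here’s a small sketch of what that might look like: given documents already ranked by relevance, keep as many as fit in the model’s token budget. The names `estimate_tokens` and `pack_context` are hypothetical, and the four-characters-per-token estimate is only a rough rule of thumb I’m assuming; a real system would use the model’s own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def pack_context(ranked_docs: list[str], budget_tokens: int) -> list[str]:
    """Greedily keep documents, in ranked order, that still fit in the budget.

    ranked_docs is assumed to be sorted from most to least relevant.
    A document that would overflow the budget is skipped, and smaller
    documents further down the ranking may still be included.
    """
    selected: list[str] = []
    used = 0
    for doc in ranked_docs:
        cost = estimate_tokens(doc)
        if used + cost <= budget_tokens:
            selected.append(doc)
            used += cost
    return selected
```

The logic is the same whether `budget_tokens` is 1,000 or 1 million. What changes with a bigger window is how much the ranking step is allowed to let through.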

It’s an exciting time. Expect more experiments from me in the weeks to come!

Dan Shipper is the co-founder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast How Do You Use ChatGPT? You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

Correction: An earlier version of this article did not note that the Henry Luce quote was hallucinated. It has been updated with that information.

Comments

@doogiesjunkdrawer over 1 year ago

Great article... stimulating. The phrase “context is everything” comes to mind. While that’s an oversimplification, the more complete the context, usually the better the answer, decision or outcome. But that depends heavily on the quality and precision of the question. E.g. the hologram in “I Robot”. AI can’t fix a poor prompt. That relies on us. We need to know why and what value we are pursuing (and even its use) when we ask a question and seek its answer. To use AI/GPT effectively, we need to level up our ability to ask good questions.

BTW, that applies to today, right now. In general, we need a renewed effort on improving critical thinking and reasoning skills focused around the why of our what's if we expect to get effective how's for the given Job-To-Be-Done (JTBD). Don't get me started about "Prompt Engineering"... so far that sounds like the job we should be doing right now. Crappy inputs, crappy outputs.

Thanks again for the thought provoking article.

Colleen Cole over 1 year ago

I just signed up for Gemini 1.5 yesterday and my first run at it was initially disappointing. I fed it a transcript of a podcast episode and asked it to sum up the frameworks discussed during the episode. Strangely it kept telling me it didn’t know the people on the podcast and couldn’t respond as them. I rejigged my prompt and basically got the same answer. The third time, I got a high school level answer. Finally, I fed it an AI answer from another tool, and at that point, it apologized and then listed the steps it would take to do a better job in the future. Methinks I need to sort out my prompts with it a bit.

That said, I moved on asking it to use some of the frameworks I outlined to brainstorm about a launch, and at that point, it started to shine. I’m looking forward to see how it develops as a tool.

@julianbeggs over 1 year ago

Of course you get better pattern recognition from a larger sample of patterns. We'll all be uploading our second brains into Google before long.
