This past week, a federal judge ruled in favor of Anthropic in a copyright case brought by a group of authors. At the center of the case was a creative, almost analog act: Anthropic purchased millions of physical books, ripped off their bindings, scanned each page, and used the resulting digital files to train its AI models. In a summary judgment, the court called this act “transformative” and ruled that it was protected under the principle of fair use.
While explaining his rationale, Judge William Alsup said, “They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable.” (The ruling didn’t greenlight everything Anthropic did. The court took issue with another set of books: pirated files, downloaded en masse and stored in Anthropic’s systems even though the company decided not to train on them. That part of the case will go to trial.)
This case underscores that data is the nexus of AI’s value. But once the data is in hand, the real work begins—making it useful for LLMs as they take on increasingly complex tasks.
Teaching AI to hunt: New methods of reinforcement learning
One way to do that is reinforcement learning. In simple terms, reinforcement learning (RL) is like training a puppy: The model tries different actions, and you reward it for good ones and not for bad ones. Over time, it figures out which actions earn the most reward and does more of them.
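Here’s a minimal sketch of that loop in Python, using a toy “bandit” problem (everything here is illustrative, not any lab’s actual training code): the agent tries three actions with hidden payoff rates and gradually learns to prefer the one that pays off most.

```python
import random

# Toy RL loop: a 3-armed bandit. Each action has a hidden
# probability of paying off; the agent learns which one is best.
REWARD_PROBS = [0.2, 0.5, 0.8]  # hidden from the agent
value_estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: value_estimates[a])

    # The environment rewards good actions and not bad ones.
    reward = 1.0 if random.random() < REWARD_PROBS[action] else 0.0

    # Update the running average reward for the chosen action.
    counts[action] += 1
    value_estimates[action] += (reward - value_estimates[action]) / counts[action]

print(value_estimates)  # converges toward [0.2, 0.5, 0.8]: action 2 wins
```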
Machine learning researcher Nathan Lambert has found that OpenAI’s reasoning model o3 is incredible for search. In particular, Lambert noted its relentlessness in finding an obscure piece of information, comparing it to a “trained hunting dog on the scent.” This is a big deal in RL, where models are known to give up quickly if a tool—in this case, the search engine that the model is accessing—isn’t immediately helpful. According to Lambert, o3’s persistence suggests that OpenAI has figured out how to get AI not to quit prematurely, turning it into a more effective learner.
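One hypothetical way to encourage that persistence (a sketch of the general idea, not OpenAI’s actual method): score the whole search episode on whether it eventually finds the answer, rather than grading each tool call in isolation, so one fruitless search doesn’t discourage the model from trying again.

```python
def episode_reward(tool_calls: list, found_answer: bool) -> float:
    """Hypothetical reward shaping for multi-step tool use.

    A per-call reward would punish every unhelpful search,
    teaching the model to quit early. An outcome-based reward
    with a small per-call cost keeps retrying worthwhile as
    long as the answer is found eventually.
    """
    if found_answer:
        return 1.0 - 0.01 * len(tool_calls)  # success, minus a tiny cost per call
    return 0.0  # no partial credit for giving up

# A 12-call episode that succeeds beats a 2-call episode that quits.
print(episode_reward(tool_calls=list(range(12)), found_answer=True))   # 0.88
print(episode_reward(tool_calls=list(range(2)), found_answer=False))   # 0.0
```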
Meanwhile, at Japanese research lab Sakana AI, a team is rethinking reinforcement learning from the ground up. Instead of traditional RL methods that reward models for their ability to solve problems, Sakana is training models to teach. The models are given problems—along with the correct solution—and evaluated on their ability to explain the solution in a clear, helpful way. If you can train small, efficient models to teach well, you can use them to educate larger, more capable models far faster and more cheaply than before. And long term, you might even get models that teach themselves.
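A rough sketch of how that training signal could work, given the setup described above (the `LM` interface and scoring rule here are assumptions for illustration): the teacher sees both the problem and its solution, produces an explanation, and is rewarded by how much that explanation raises a student model’s probability of reproducing the correct solution.

```python
class LM:
    """Stand-in for a language model API (hypothetical interface)."""
    def generate(self, prompt: str) -> str: ...
    def log_prob(self, text: str, context: str) -> float: ...

def teacher_reward(problem: str, solution: str, teacher: LM, student: LM) -> float:
    # The teacher sees the problem *and* the correct solution,
    # so it is scored only on how well it explains, not on solving.
    explanation = teacher.generate(
        f"Problem: {problem}\nSolution: {solution}\nExplain step by step:"
    )

    # How likely is the student to produce the correct solution,
    # with and without the teacher's explanation?
    baseline = student.log_prob(solution, context=problem)
    assisted = student.log_prob(solution, context=problem + "\n" + explanation)

    # Reward the improvement the explanation provides.
    return assisted - baseline
```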
Why setting the stage is everything