Week 2

Week 2: The interesting problems moved to the input

May 30, 20264 min read

Three things crossed my desk this week and not one of them was about which model is best. The model questions feel settled enough to be boring: they're all good, they all get cheaper every quarter, you pick one and move on. What kept pulling my attention was on the input side: what kind of media should you feed an agent? How do agents talk to each other? And what do they drag in from the open web when you ask them to search?

Feed the agent better, not just more

The first one is small and immediately useful: when something looks wrong in a UI I'm building with an agent I usually simply type (or dictate) the correction. A tip from Ben Tossell made me rethink my approach: screen record the session with a voiceover instead, and hand the agent the video.

The agent transcribes the recording, pulls keyframes from the exact moments I narrate, and writes an HTML doc with timestamped cards and a checklist of actions. Jacob Samuelson turned the idea into a portable skill that installs in Claude Code, Cursor, and Codex.

The reason this matters beyond the convenience: most of the conversation about context has been about restraint, about feeding the model less and cleaner. This is the opposite move, feeding it a richer input. A video can say in twenty seconds what three paragraphs of "no, the button on the left, when you hover" often struggles to achieve. Tokens aren't free, so it's not the move for every back-and-forth, but for visual work it pays for itself instantly.

A hand offering a film play-button card to a small robot, a checklist on the robot's body

Agents that stop talking in words

The second one is stranger. There's a recent paper where the agents stop passing text to each other and pass their internal representations instead, the raw numbers, not the human readable text. Skipping the round trip through language gets them up to 75% more efficient on the tasks tested.

You can only do this if you have access to the model's internals, which means it works on open-weight models but not with the frontier ones.

I don't know yet whether passing numbers between agents will be the way of the future, but the direction is worth watching, the communication channel between two agents might be where the low hanging fruit is, where the gains are hiding.

Two figures passing a stream of numbers between them, an empty speech bubble crossed out above

Treat the web like it's poisoned

The third one is the one with stakes. I research with AI agents constantly, the first thing I do on most tasks is send an agent to find out what the community or the experts are saying about the topic of it. I did that for a long time without thinking about it, until I realized just how naive it was. Four sources agreeing on something means nothing if all four are copying one original blog post. A vendor's own docs are the best source for "how does this work" and the worst possible source for "is this the best tool." And somewhere out there is a popular tutorial with a quietly malicious line in it, waiting for an agent to paste it into something that runs in production.

So I treat retrieved content as suspect by default now. The piece I published yesterday is the full version, but the one habit worth stealing even if you read nothing else is this: when an agent finds a fix online and then plans to apply it, it's good practice to have a second agent, which never sees the original pages, audit the writeups as if it might be hostile. I tested it on a Python retry pattern that looked fine on the surface, and the second agent caught a real bug the first one had written.

This week on the blog: Treating web search as poisoned, the long version of the third story, including the two other failure modes I skipped here and what I plan to do next with it.

Thinking about building something? If you want a technical partner to figure it out with, reply to this email and tell me about it, that's the kind of conversation I like having.

If someone forwarded this and you want it directly, sign up at christianvismara.com/newsletter. Past issues live in the archive.

Christian

The newsletter

Get the next one in your inbox

One email a week on what the AI labs shipped and what it means. Free, and you can leave whenever.

I'm a fractional CTO and AI product builder in New York. If you're working on something and want a technical partner to think it through with, get in touch.

Previous issues

Week 3

Week 3: Big deal, and only a big deal

Week 1

Week 1: The labs drew lines, the products grew personalities

Read every issue in the archive