Two people are building the same project, each with an AI coding agent open on their own machine. The agents cannot see each other. One decides the auth layer will use one scheme; the other, an hour later and a hundred kilometres away, builds against a different one. Nobody notices until the branches meet and the merge is a mess. I wanted a way for the two agents to keep each other honest as they work, to surface that kind of drift at the start of a turn instead of at merge time. That tool is agent-doublethink, and this is how it got built, including the part where I was confident about something and turned out to be wrong.
The first decision was the one that mattered most, and it was about what not to build. The obvious pitch is "an MCP that detects when your partner is deviating and keeps you both aligned." But detection is a judgement: deciding whether your partner has drifted means a model reads their decisions against the shared plan and forms an opinion. A pipe cannot do that, and the moment you put that judgement inside the message broker you have broken the one property worth having, which is that the broker cannot read the traffic. So I split the idea in two and kept them strictly apart. The transport is dumb and blind: it carries sealed messages between the two agents and never sees inside them. The judgement lives in the agents themselves, where the intelligence already is. The tool transports a "divergence flag"; it never decides one. Selling automatic deviation detection as a feature would have been selling something the design cannot honestly do.
The transport itself did not need inventing. It rides on doublethink, a small broker I had already built whose whole point is end-to-end-encrypted channels the operator cannot read. agent-doublethink is a thin client on top: each agent runs it locally, the two share one private channel, and they exchange short structured beats as they work, a plan, a decision, a blocker, a question, a divergence flag, and nothing else. The closed list is deliberate. You cannot post free narration, because a coordination channel that carries chatter is a coordination channel nobody reads.
Then I got something wrong, and the only reason I caught it is that someone pushed back and asked me to check instead of assume. The goal was for this to work with any coding agent, not just one. My instinct was that the nice part, a one-line digest of your partner's state injected automatically at the start of each turn, would only work in one agent, because that was the only one I knew had the right hook, and the others would have to fall back to checking manually. I wrote that down as a limitation. Then I actually looked. All three of the agents I was targeting turned out to support both a one-command way to register the tool and a per-turn hook that can inject a line of context before the model thinks. One of them even ships a command to import the others' hooks. The "limitation" was me not having read the documentation. The automatic digest works everywhere; one agent just names the hook differently. I deleted the wrong claim and wrote the right one. The lesson is old and I relearn it often: an assumption stated confidently is still an assumption.
The role problem looked harder than it was. The two agents use a symmetric scheme: there is a role A and a role B, and if both pick the same one they encrypt with the same key and cannot read each other, they just go silent at each other. So the roles have to be assigned reliably, with no coin flip and no manual step. The answer was hiding in the broker's own behaviour. Creating a channel either succeeds, because you are the first, or comes back saying it already exists. That answer is atomic: if both agents try at the same instant, exactly one is told "created" and the other "already exists." So the one who creates the channel is A, the one who finds it already there is B, and a simultaneous race still produces one of each. The agents discover their own roles from a fact the broker was already telling them. No new mechanism, no guessing.
The bugs that mattered were the ones I could not see by reading the code, only by running it against the real broker over the real internet. The first time I wired the whole thing up and had agent A post a message and agent B check its inbox, B's inbox was empty. The code looked correct. The broker, it turned out, rejects any message that is not wrapped in its specific envelope shape, and I had been sending the sealed bytes raw. The fix was to wrap each sealed beat the way the broker expects, exactly as the reference client does, a detail you only find by comparing against something that works. The second bug was subtler: even with messages flowing, the inbox would sometimes come back empty because the read was racing a background reader that had not finished folding the caught-up messages yet. I replaced the race with a deterministic drain: pull everything the broker has, fold it, then read. Both bugs are the kind that a unit test with a fake broker passes happily and a real two-machine round trip exposes in seconds. I trust the green checkmark less than I trust the thing actually working.
There was a smaller, more embarrassing one at the very end. The publish went out, the script said it had pushed, and I checked anyway: the push had silently failed because my local branch was named the old default and the remote expected the new one. Nothing had been published at all. A minute later, the release binaries built with the version baked into their filenames, which broke the stable download link the install instructions told agents to use. Both were caught by checking the live result instead of believing the success message. "It says it worked" is not the same as "it worked," and on the steps that face a user, the difference is the whole job.
What shipped is deliberately small. A single static binary, no runtime to install, that any of the three agents can set up by being handed one URL: it reads the install prompt, downloads itself, registers as a tool in the agent's own config, installs the per-turn digest hook, asks the user for the channel details, and connects. Retention defaults to the public maximum, and only if you want longer does it ask for your own broker and a key. The shared plan of record lives in a plain file you commit to the repo; the channel carries the deltas against it and is a rolling window, not an archive, which is stated plainly rather than hidden. The judgement of whether your partner is drifting stays with you, the human and the model; the tool only makes their state cheap to see.
That is the project as it was built: one decision about what not to build, an assumption I checked and corrected, a role problem solved by a fact the broker already knew, and a handful of bugs that only a live run could surface. agent-doublethink is open source, ships as a single binary, and carries a plain no-warranty notice, because a tool that sits between two people's work and their private traffic should be honest that it is one person's work, tested but not infallible. The most useful thing I did while building it was to stop trusting messages that said "done" and go look.
