Ramazan Yavuz: Articles

Long-form writing on local LLMs, retrieval, Linux tools, AI agents, and the engineering behind each. https://ramazan-yavuz.tr/ 2026-06-25T10:00:00+02:00 Ramazan Yavuz https://ramazan-yavuz.tr/

The Token Meter: Why AI Agents Are Rewarded for Wasting Tokens https://ramazan-yavuz.tr/articles/the-token-meter-why-ai-agents-are-paid-to-waste-your-tokens.html 2026-06-25T10:00:00+02:00 2026-06-25T10:00:00+02:00

Coding agents are billed by the token, output costs five times more than input, and the training rewards length. Whether or not anyone planned it, the incentives point one way: more output. A look at the numbers, with the math laid out so you can redo it.

Ramazan Yavuz Two Agents, One Channel: Building agent-doublethink https://ramazan-yavuz.tr/articles/agent-doublethink-two-agents-one-channel.html 2026-06-20T18:00:00+02:00 2026-06-20T18:00:00+02:00

How I built agent-doublethink, an agent-agnostic MCP server that lets two coding agents on two machines coordinate over an end-to-end-encrypted channel: the design, the assumption I got wrong, and the bugs only a live test could catch.

Ramazan Yavuz doublethink, Told in Commits: How the Broker Got Built https://ramazan-yavuz.tr/articles/doublethink-a-broker-told-in-commits.html 2026-06-19T14:00:00+02:00 2026-06-19T14:00:00+02:00

The development of doublethink read straight from its git history: the first commit, the milestones, the over-engineered keypair design I deleted, the Redis swap, going public, and a live demo you can run in your browser.

Ramazan Yavuz The Web Revival: the Internet Didn't Die, I Just Wasn't On It https://ramazan-yavuz.tr/articles/web-revival-the-internet-didnt-die.html 2026-06-19T10:00:00+02:00 2026-06-19T10:00:00+02:00

A video sent me down a rabbit hole into the web revival: thousands of hand-made personal sites, the XXIIVV webring, the Merveilles community, and a quieter, more intentional internet that has been growing since 2016.

Ramazan Yavuz doublethink: ntfy You Can Trust With Private Data https://ramazan-yavuz.tr/articles/doublethink-ntfy-you-can-trust-with-private-data.html 2026-06-17T20:30:00+02:00 2026-06-17T20:30:00+02:00

Why I built doublethink: a pub/sub broker as easy to stand up as ntfy, but with genuinely private, end-to-end-encrypted channels. The stepping stones, the over-engineered dead end I threw away, and how I use it to debug Android APKs and wire tools and agents together.

Ramazan Yavuz Making Claude Code Talk Back https://ramazan-yavuz.tr/articles/claude-can-speak-making-claude-code-talk-back.html 2026-06-04T20:00:00+03:00 2026-06-04T20:00:00+03:00

How I built claude-can-speak: speech-out for Claude Code. A concept-to-release story, including the TTS engine bake-off (Kokoro vs Piper vs XTTS), why I dropped multilingual for naturalness, the two-modes design, and shipping it on npm.

Ramazan Yavuz Substance over polish: a guide to the DACH Lebenslauf https://ramazan-yavuz.tr/articles/cv-that-actually-works-in-dach.html 2026-05-11T10:00:00+02:00 2026-05-11T10:00:00+02:00

What a Lebenslauf in the DACH market really gets judged on in 2026, for both salaried applicants (Bewerbungsmappe with Anschreiben and Arbeitszeugnisse) and freelance contractors (Projektliste-led Profil). Covers scanning, typography, cognitive load, ATS and AI matching, anti-patterns, plus a downloadable template. Substance over polish.

Ramazan Yavuz Retrieval-Augmented Generation: What RAG Is and How It Actually Works https://ramazan-yavuz.tr/articles/retrieval-augmented-generation-how-rag-works.html 2026-05-03T11:00:00+02:00 2026-05-03T11:00:00+02:00

A clear, in-depth introduction to retrieval-augmented generation (RAG): how it differs from plain LLM inference, how embeddings, vector search, and prompt assembly fit together, and how a local stack like hydra-llm puts the whole loop on your own machine.

Ramazan Yavuz lillycoder: A Local-First Coder REPL With Permission Gates https://ramazan-yavuz.tr/articles/lillycoder-a-local-coder-repl-with-permission-gates.html 2026-04-19T10:00:00+02:00 2026-04-19T10:00:00+02:00

How lillycoder turns any local LLM into a coding agent: file and shell tools, a per-tool permission prompt, a hard-deny safety classifier, and an OpenAI-compatible /v1 endpoint as the only thing it talks to.

Ramazan Yavuz LLM Quantization Explained: What Q4_K_M, Q5_K_S, and IQ3_XXS Actually Mean https://ramazan-yavuz.tr/articles/quantization-what-the-numbers-mean.html 2026-04-04T14:30:00+02:00 2026-04-04T14:30:00+02:00

A practical guide to LLM quantization: what the numbers and letters mean (Q4_K_M, Q5_K_S, IQ3_XXS, F16), how to pick a quantization for your hardware, and what trade-offs you are actually making.

Ramazan Yavuz Why Clauding Was Built: Enabling Real Workflows for AI Agents Without Compromising Control https://ramazan-yavuz.tr/articles/clauding-enabling-real-workflows-for-ai-agents-without-compromising-control.html 2026-03-21T19:33:00+02:00 2026-03-21T19:33:00+02:00

How Clauding solves the gap between what AI agents can do and what they can safely be allowed to do, by isolating execution inside containers.

Ramazan Yavuz How a Stubborn AI Agent Led Me to Redesign How APIs Talk to Machines https://ramazan-yavuz.tr/articles/tekir-how-a-stubborn-ai-agent-led-me-to-redesign-how-apis-talk-to-machines.html 2026-03-08T08:45:00+02:00 2026-03-08T08:45:00+02:00

The story behind TEKIR: how a misbehaving AI agent exposed a fundamental gap in API design and led to an open standard for agent-friendly HTTP responses.

Ramazan Yavuz The ThinkPad That Changed My Mind About Local LLMs https://ramazan-yavuz.tr/articles/thinkpad-strix-halo-the-laptop-that-changed-my-mind.html 2026-02-28T17:20:00+02:00 2026-02-28T17:20:00+02:00

A review of a Strix Halo ThinkPad with unified memory, written from the perspective of someone who used to think local LLMs needed a discrete GPU. They don't, and the laptop costs less than I expected.

Ramazan Yavuz A Quote at the Top of Every Terminal https://ramazan-yavuz.tr/articles/herald-a-quote-at-the-top-of-every-terminal.html 2026-02-15T13:55:00+02:00 2026-02-15T13:55:00+02:00

Notes on herald: a small Linux daemon that prints a daily quote at the top of every new terminal and at login. Cache, fallback pool, and the boring decisions that make it pleasant to live with.

Ramazan Yavuz An Emoji That Gossips About Your Machine https://ramazan-yavuz.tr/articles/meowtrics-an-emoji-that-gossips-about-your-machine.html 2026-02-07T22:08:00+02:00 2026-02-07T22:08:00+02:00

Notes on meowtrics: a tiny Linux tray daemon that turns system telemetry into an animated emoji and a hover-line, with a debounced state machine doing the unglamorous work.

Ramazan Yavuz Parking a Laptop Battery Properly https://ramazan-yavuz.tr/articles/inhibit-charge-parking-a-laptop-battery-properly.html 2026-01-25T15:30:00+02:00 2026-01-25T15:30:00+02:00

Notes on inhibit-charge: a small Linux daemon that uses the kernel's inhibit-charge mode to hold a laptop battery at a target percentage instead of cycling it.

Ramazan Yavuz Running Local LLMs Without the Rituals https://ramazan-yavuz.tr/articles/hydra-llm-running-local-llms-without-the-rituals.html 2026-01-11T09:15:00+02:00 2026-01-11T09:15:00+02:00

Notes on hydra-llm: a small CLI and Plasma widget that wraps llama.cpp in Docker, picks a model that fits your machine, and stays out of the way.

Ramazan Yavuz Text Adventures with an AI Parser https://ramazan-yavuz.tr/articles/baseline-engine-text-adventures-with-an-ai-parser.html 2026-01-03T11:42:00+02:00 2026-01-03T11:42:00+02:00

Notes on baseline-engine: an interactive fiction engine that uses LLMs to map free-form player input to structured story intents, with a node-based editor and a player-driven correction loop.

Ramazan Yavuz