Optimize Claude Token Usage
Friday talks to Claude through ACP – the same protocol Claude Code runs on. That means subscription billing, not per-token metering. So token usage shouldn't matter, right?
It turns out it does. Claude subscriptions come with a usage window. Burn through tokens fast enough and you'll hit a cooldown. The same task that takes 12k input tokens in the CLI chews through noticeably more when run through Friday. The difference isn't the model or the prompt – it's the plumbing.
The Session class called sdkQuery() with minimal options:
```typescript
const options: Record<string, unknown> = {
  canUseTool,
  persistSession: true,
  settingSources: ['user', 'project', 'local'],
};
const sessionId = this.#state.get(SESSION_ID_KEY);
if (sessionId) {
  options.resume = sessionId;
} else {
  options.cwd = this.#options.cwd;
  if (this.#options.model) options.model = this.#options.model;
  if (this.#options.systemPrompt)
    options.systemPrompt = this.#options.systemPrompt;
}
```
This is from src/acp/session.ts (before).
No tools restriction – so the SDK loads every Claude Code tool on every call. Things like NotebookEdit, TodoWrite, and CronCreate that Friday never touches. Each one sent as input tokens.
No systemPrompt preset – so the model isn't getting the optimised instructions the CLI uses.
No resumeSessionAt – so when a session resumes, the full conversation history replays from the beginning.
The invisible cost wasn't money. It was capacity – the 5-hour usage window draining faster than it should've been.
Where the Tokens Go
Every API call carries unnecessary weight from three sources.
Tool definitions. The ACP SDK sends every registered tool's JSON schema as part of the input on every call. Claude Code has over 30 built-in tools. Friday uses 11. The other 20+ are dead weight – bytes counting against the usage window on every single turn.
No system prompt preset. The Claude Code CLI uses { type: 'preset', preset: 'claude_code' } – an optimised system prompt that tells the model how to use its tools efficiently. It's not prominently documented in the SDK. The types hint at it, but I found it by reading the CLI source. Without it, the model is flying blind – no instructions to make tool use cheaper.
Full history replay on resume. When we resume a session, the SDK replays the conversation history. If the session auto-compacted at some point – collapsing earlier turns into a summary – there's a UUID marking that boundary. Without passing that UUID as resumeSessionAt, the SDK replays everything before the compaction too. Redundant context, paid for again.
None of this shows up as a line item. There's no "you sent 20 unused tool definitions" warning. The SDK does what we ask. Don't restrict tools? It sends all of them. Don't set a resume point? It replays everything. The defaults are correct – they're just expensive when we don't need them.
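For reference, passing the preset would look roughly like this – a sketch based on the SDK's type hints, not a verified snippet from the project:

```typescript
// Sketch only: the option shape comes from the SDK's type hints, and
// 'claude_code' is the preset the CLI itself uses.
const options: Record<string, unknown> = {
  persistSession: true,
  systemPrompt: { type: 'preset', preset: 'claude_code' },
};
```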
The Allowlist
The SDK accepts a tools option as a string[]. Pass it an array of tool names and only those get sent as input tokens.
```typescript
const ALLOWED_TOOLS = [
  'Bash',
  'Read',
  'Write',
  'Edit',
  'Glob',
  'Grep',
  'WebSearch',
  'WebFetch',
  'Agent',
  'Skill',
  'AskUserQuestion',
];
```
This is from src/acp/session.ts.
Eleven tools. Everything Friday actually uses – file operations, search, web access, sub-agents, skills, and the ability to ask a question. Everything else – NotebookEdit, TodoWrite, CronCreate, and the rest – gone from the input payload.
We went with an allowlist, not a blocklist. A blocklist would need updating every time the SDK adds a new tool. An allowlist only changes when Friday genuinely needs a new capability – and that's a deliberate decision, not a default.
It's a constant, not a config value. Moving it to config would imply it's something we tune per-environment. It isn't. Adding a tool to this list means adding a capability to Friday. That's a code change, not a setting.
The constant gets passed straight to the SDK:
```typescript
const options: Record<string, unknown> = {
  canUseTool,
  persistSession: true,
  settingSources: ['user', 'project', 'local'],
  tools: ALLOWED_TOOLS,
};
```
One line. The saving compounds on every turn of every session.
Compaction Boundaries
When a session's context grows large enough, the SDK auto-compacts – it summarises earlier turns and emits a system message with subtype: 'compact_boundary'. That message includes a UUID.
If we store that UUID and pass it as resumeSessionAt on the next resume, the SDK skips everything before the compaction point. The summary's already in the context. Replaying the pre-summary turns is redundant – we're paying for context the model already has in compressed form.
Here's the detection in #streamTurn:
```typescript
if (
  message.type === 'system' &&
  'subtype' in message &&
  message.subtype === 'compact_boundary'
) {
  const uuid = (message as { uuid?: string }).uuid;
  if (uuid) {
    this.#state.set(COMPACT_BOUNDARY_KEY, uuid);
  }
  this.#buffer.push({
    type: 'compact',
  });
  continue;
}
```
This is from src/acp/session.ts.
And the resume logic reads it back:
```typescript
const sessionId = this.#state.get(SESSION_ID_KEY);
if (sessionId) {
  options.resume = sessionId;
  const compactBoundary = this.#state.get(COMPACT_BOUNDARY_KEY);
  if (compactBoundary) {
    options.resumeSessionAt = compactBoundary;
  }
}
```
Both values live in the same StateStore – a key-value store backed by SQLite. We were already persisting the session ID. The compaction boundary is one more key alongside it.
When the user runs /new to reset a session, both get cleared:
```typescript
resetSession(): void {
  this.#state.delete(SESSION_ID_KEY);
  this.#state.delete(COMPACT_BOUNDARY_KEY);
  this.resetUsage();
}
```
Free savings. No behaviour change, no user-visible difference. Sessions that resume after a compaction just skip the redundant prefix.
Visibility, Not Automation
The first implementation showed per-turn token counts in every Telegram message: Done (1.2s · $0.04 · 12k in / 2k out). It was noise. You don't care that a single turn cost four cents. You care whether the session's getting large enough to cause problems.
Visibility surfaces at three levels in the final design.
Context size warning. A one-time alert when input tokens exceed a configurable threshold – 200k by default. At that point the context is full and compaction's imminent, so starting a new session is worth considering. The warning fires once per session, not on every turn.
The /cost command. On-demand session stats – total input tokens, output tokens, cache hits, cumulative cost, turn count. Pull, not push. It's there when we want it and invisible when we don't.
JSONL usage log. Every turn appends a line to meta/friday-usage.log:
```typescript
export interface UsageLogEntry {
  timestamp: string;
  prompt: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
  costUsd: number;
  numTurns: number;
  durationMs: number;
  model: string;
  status: 'success' | 'error';
}
```
This is from src/lib/usage-log.ts.
The prompt field gets truncated to 100 characters – enough to identify the task, not enough to leak anything sensitive. The log's append-only, one JSON object per line, trivial to grep or pipe into analysis later.
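The append path can be sketched in a few lines – the helper name is illustrative and the entry shape is trimmed for brevity, but the truncate-then-append behaviour matches what's described above:

```typescript
import { appendFileSync } from 'node:fs';

// Trimmed version of the entry shape, for illustration.
interface UsageLogEntry {
  timestamp: string;
  prompt: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Hypothetical helper: truncate the prompt so the log identifies the task
// without leaking content, then append one JSON object per line (JSONL).
function appendUsageLog(path: string, entry: UsageLogEntry): void {
  const safe = { ...entry, prompt: entry.prompt.slice(0, 100) };
  appendFileSync(path, JSON.stringify(safe) + '\n');
}
```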
The session also tracks cumulative stats in memory:
```typescript
export interface UsageStats {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUsd: number;
  turnCount: number;
}
```
When session cost crosses $1.50, Friday warns once: "Session cost: $X.XX – consider /new to reset." It doesn't auto-reset. Auto-reset would lose context mid-task, and that's worse than the cost it saves. The user decides when to start fresh.
The pattern's consistent: show the information, don't act on it. You know your workload. The system's job is to make the invisible visible – not to make decisions with incomplete context.
The Display Detour
This wasn't in the plan.
While I was touching the Telegram display code for usage stats, it became obvious that Claude's text responses contain markdown – bold, italic, inline code, fenced code blocks – and the existing escaping turns all of it into flat text. A **bold** becomes literal asterisks. A code block loses its formatting entirely.
The fix was a markdownToTelegramHtml() function:
````typescript
export function markdownToTelegramHtml(text: string): string {
  const blocks: string[] = [];
  let result = text.replace(
    /```(?:\w*)\n?([\s\S]*?)```/g,
    (_match, code) => {
      const idx = blocks.length;
      blocks.push(`<pre>${escapeHtml(code.replace(/\n$/, ''))}</pre>`);
      return `\x00BLOCK${idx}\x00`;
    }
  );
  const inlines: string[] = [];
  result = result.replace(/`([^`]+)`/g, (_match, code) => {
    const idx = inlines.length;
    inlines.push(`<code>${escapeHtml(code)}</code>`);
    return `\x00INLINE${idx}\x00`;
  });
  result = escapeHtml(result);
  result = result.replace(/\*\*(.+?)\*\*/g, '<b>$1</b>');
  result = result.replace(/__(.+?)__/g, '<b>$1</b>');
  result = result.replace(/\*(.+?)\*/g, '<i>$1</i>');
  result = result.replace(
    /\x00INLINE(\d+)\x00/g,
    (_m, idx) => inlines[+idx]
  );
  result = result.replace(
    /\x00BLOCK(\d+)\x00/g,
    (_m, idx) => blocks[+idx]
  );
  return result;
}
````
This is from src/telegram/message-builder.ts.
The approach is extract-then-restore. Fenced code blocks get pulled out first and replaced with sentinel tokens. Then inline code. Then the remaining text gets HTML-escaped and bold/italic markers get converted. Finally the sentinels are restored with their pre-formatted HTML. This prevents code content from being mangled by the inline rules.
It's not a full markdown parser and doesn't try to be. It handles the four things Claude actually produces in conversational output: fenced code, inline code, bold, and italic. That covers most of what shows up in practice.
The same pass through this code surfaced another small thing. When a new turn starts and the first poll fires before any content has arrived, the Telegram message is empty – or shows stale content from the previous turn. The fix was a "Thinking..." placeholder that renders when no segments exist yet. Small, but it got rid of a confusing visual glitch where the message would flash empty before content appeared.
Neither of these was in the feature doc. They fell out of working in the same area. That's the nature of infrastructure work – we go in to change one thing and the adjacent problems become impossible to ignore.
Conclusion
Before: every API call sent 30+ tool definitions. No system prompt preset. Session resumptions replayed full history. No visibility into token usage, context size, or session cost. The only signal that something was wrong was when the usage window started to push back – and by then the damage was done.
After: every call sends 11 tool definitions. The claude_code preset is set on new sessions. Resumptions skip pre-compaction history when a boundary exists. A context warning fires at 200k input tokens. /cost shows session stats on demand. A JSONL log records every turn for post-hoc analysis. A cost warning fires once when the session crosses $1.50.
The practical effect? The same tasks that were burning through the usage window now leave more headroom. Not because the model's doing less work – the output is identical. The input is just smaller. Fewer tool schemas, less redundant history, better instructions.
The markdown-to-HTML conversion and the "Thinking..." placeholder weren't about tokens at all. They were about the experience of checking Telegram – which is, ultimately, the same concern. An assistant that's expensive to run and unpleasant to read isn't one I'll keep open. An assistant that's lean and legible is one I'll leave running.
The lesson here is that defaults are expensive. Not wrong – the SDK's defaults are correct and safe. But correct defaults optimise for the general case. Friday isn't the general case. It's a specific system that uses specific tools in a specific way. Every place where "just use the defaults" got replaced with "use exactly what we need" saved tokens that compound across every turn of every session of every day.
The invisible became visible. The expensive became cheap. And the code that did it was, in every case, less than ten lines.
Corrections
resumeSessionAt does not mean "start from here". The Compaction Boundaries section gets this backwards. resumeSessionAt tells the SDK to discard everything after the specified message, not everything before it. Passing the compact boundary UUID meant every resumed session lost all post-compaction context. The fix was to remove resumeSessionAt entirely and let the SDK handle compaction internally.