LogoSteve
Prompt Caching vs Tool Result Replacement
2026/03/30

Prompt caching cuts the price of repeated prefixes. Tool-result replacement shrinks the prompt itself. Both can save money, but they push on different parts of the bill.

When people try to optimize agent cost, they often reach for two ideas at the same time:

  1. use prompt caching
  2. replace large tool outputs with short placeholders

The problem is that these two strategies are not naturally aligned.

Cache Savings Depend on an Unchanged Prefix

For prefix-based prompt caching, as offered for Claude and OpenAI models, the core rule is simple: the prefix generally has to stay byte-for-byte identical for the cache to be reused.

Request A:
[system][msg1][msg2][msg3][msg4]
-> first request, full price, cache write

Request B:
[system][msg1][msg2][msg3][msg4][msg5][msg6]
-> prefix hit; only the new part is typically charged at full price
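A minimal sketch of the prefix rule, assuming the cache compares whole messages from the start of the request. Real providers compare tokens and add their own rules (minimum lengths, cache breakpoints, expiry), so `cached_prefix_len` and `split_billing` are hypothetical helpers for reasoning, not a provider API.

```python
# Toy model: each string stands in for one whole message. Real caches
# compare tokens, not messages; this only illustrates the prefix rule.

def cached_prefix_len(cached: list[str], request: list[str]) -> int:
    """Count leading messages of `request` that are identical to `cached`."""
    n = 0
    for old, new in zip(cached, request):
        if old != new:
            break
        n += 1
    return n


def split_billing(cached: list[str], request: list[str]) -> tuple[int, int]:
    """Return (messages served from cache, messages billed at full price)."""
    hit = cached_prefix_len(cached, request)
    return hit, len(request) - hit


request_a = ["system", "msg1", "msg2", "msg3", "msg4"]
request_b = request_a + ["msg5", "msg6"]

print(split_billing([], request_a))         # (0, 5): nothing cached yet
print(split_billing(request_a, request_b))  # (5, 2): only msg5 and msg6 are new
```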

Normal Chat History Already Helps Cache

A normal conversation appends new messages to the end of the existing history.

Turn 1: [system][user1]
Turn 4: [system][user1][assistant2][tool3][assistant4]
Turn 5: [system][user1][assistant2][tool3][assistant4][user5]
Turn 8: [system][user1][assistant2][tool3][assistant4][user5][assistant6][tool7]

Each request preserves the full prefix of the previous one, so the precondition for cache hits is often naturally satisfied.
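To make the append-only property concrete, here is a small check over the turns listed above. Each string stands in for a whole message (a simplification), and the names `turns` and `is_prefix` are illustrative, not from any library.

```python
# Each string stands in for one full message; the lists mirror the turns above.
turns = {
    1: ["system", "user1"],
    4: ["system", "user1", "assistant2", "tool3", "assistant4"],
    5: ["system", "user1", "assistant2", "tool3", "assistant4", "user5"],
    8: ["system", "user1", "assistant2", "tool3", "assistant4", "user5",
        "assistant6", "tool7"],
}


def is_prefix(prev: list[str], curr: list[str]) -> bool:
    """True if `curr` begins with every message of `prev`, unchanged."""
    return curr[: len(prev)] == prev


order = sorted(turns)
for a, b in zip(order, order[1:]):
    print(f"turn {a} -> turn {b}: prefix preserved = {is_prefix(turns[a], turns[b])}")
```

Because the history only grows at the end, every check prints `True`.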

Replacing Tool Results Changes the Old Prefix

Replacement rewrites history instead of appending to it.

If a large tool output is replaced with a short placeholder, the historical prefix changes:

Before replacement:
[system][user1][assistant2][tool3_full_output][assistant4][user5]

After replacement:
[system][user1][assistant2][tool3_short_placeholder][assistant4][user5]

From the replacement point onward, the byte sequence is different, so later requests share only [system][user1][assistant2] with the older cached version. The placeholder and every message after it are typically billed at full price again.

That breaks the old cache path.

[Figure: Replacing historical tool output breaks the cache prefix]
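A toy check of what survives the rewrite, assuming the cache compares whole messages from the start of the request (real caches compare tokens). The message names are made up for illustration.

```python
from itertools import takewhile

before = ["system", "user1", "assistant2", "tool3_full_output", "assistant4", "user5"]

# Rewrite history: swap the large tool output for a short placeholder.
after = list(before)
after[3] = "tool3_short_placeholder"

# Length of the run of leading messages that are still identical.
hit = sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(before, after)))

print("still cached:", after[:hit])  # ['system', 'user1', 'assistant2']
print("billed again:", after[hit:])  # ['tool3_short_placeholder', 'assistant4', 'user5']
```

Only the messages before the replaced tool result still match; everything from the placeholder onward has to be sent again without a cache hit.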

They Reduce Different Costs

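The two strategies target different parts of the bill: caching lowers the price of repeated tokens, while replacement lowers how many tokens are sent.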

Strategy A: Keep History Intact and Rely on Prompt Caching

  • upside: stable prefix and high cache hit potential
  • downside: long tool outputs remain in context and token volume keeps growing

Strategy B: Compress or Replace Tool Results

  • upside: smaller prompt size and lower raw token volume
  • downside: the cache prefix breaks at the replaced message, so everything after it is likely billed at full price on the next round

So the two strategies often pull in opposite directions.
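A toy simulation makes the tension visible. It assumes every turn adds one user message and one tool result, uses made-up token sizes, and models Strategy B as the aggressive policy of replacing every older tool result before each request; none of it reflects a specific provider's API.

```python
# Toy model, not a provider API. Assumptions: whole-message cache matching,
# one user message plus one tool result per turn, made-up token sizes.
SYSTEM_TOKENS = 500
USER_TOKENS = 100
TOOL_TOKENS = 8_000
PLACEHOLDER_TOKENS = 50

Message = tuple[str, int]  # (label, token count)


def shared_prefix(a: list[Message], b: list[Message]) -> int:
    """Number of leading messages identical in both requests."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulate(replace_old_results: bool, turns: int) -> list[tuple[int, int]]:
    """Return (prompt_tokens, cached_tokens) for each request."""
    history: list[Message] = [("system", SYSTEM_TOKENS)]
    previous: list[Message] = []
    stats = []
    for i in range(1, turns + 1):
        if replace_old_results:
            # Strategy B: rewrite every full tool result already in history.
            history = [
                (f"{label}_placeholder", PLACEHOLDER_TOKENS)
                if label.startswith("tool") and not label.endswith("_placeholder")
                else (label, size)
                for label, size in history
            ]
        # Both strategies append this turn's new messages at the end.
        history = history + [(f"user{i}", USER_TOKENS), (f"tool{i}", TOOL_TOKENS)]
        hit = shared_prefix(previous, history)
        stats.append((sum(s for _, s in history), sum(s for _, s in history[:hit])))
        previous = history
    return stats


for name, flag in (("A: keep history", False), ("B: replace old results", True)):
    print(name)
    for n, (prompt, cached) in enumerate(simulate(flag, turns=3), start=1):
        print(f"  request {n}: prompt={prompt:,} tokens, cached={cached:,}")
```

With these assumed sizes, Strategy A's prompt grows by 8,100 tokens per request but most of it is served from cache, while Strategy B's prompt grows by only 150 tokens per request and each rewrite leaves just a short cached prefix.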

Which One Wins Depends on Size and Remaining Turns

The deciding variables are:

  1. how large the tool output is
  2. how many more turns the conversation is likely to have

A rough rule of thumb:

| Situation | Better strategy |
| --- | --- |
| Large tool output and many future turns | Prefer replacement |
| Small tool output but many future turns | Prefer keeping it and using cache |
| Conversation is nearly over | Prefer keeping it and using cache |
| Tool calls are extremely frequent and history grows fast | Prefer replacement |

[Figure: A decision matrix for choosing between caching and replacement]

If the tool output is huge and the conversation is going to continue for a while, replacement often wins. If the output is small or the chat is nearly over, leaving the history intact usually gets more out of caching.
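A rough break-even estimate can put numbers on this rule of thumb. Counting only the tokens that differ between the two strategies, keeping a tool result costs its extra size at the cache-read rate on every remaining request, while replacing it re-bills the placeholder and every message after it (the "tail") once at the uncached rate. Replacement wins roughly when remaining_requests × (full − placeholder) × cache_read > (placeholder + tail) × (uncached − cache_read). The sketch assumes a cache-read multiplier of 0.1 and ignores cache-write surcharges and cache expiry; `replacement_saves` is a hypothetical helper, so plug in your provider's actual rates.

```python
def replacement_saves(full_tokens: int, placeholder_tokens: int, tail_tokens: int,
                      remaining_requests: int, cache_read: float = 0.1,
                      uncached: float = 1.0) -> float:
    """Estimated input-cost saving, in base-price token units, from replacing
    one tool result now. Positive means replacement is cheaper.

    tail_tokens: tokens already in history after the tool result.
    cache_read / uncached: assumed price multipliers relative to base input.
    """
    # Keeping: the full output is read from cache on every remaining request.
    keep_extra = remaining_requests * (full_tokens - placeholder_tokens) * cache_read
    # Replacing: placeholder + tail lose their cache hit once and are billed uncached.
    rebuild = (placeholder_tokens + tail_tokens) * (uncached - cache_read)
    return keep_extra - rebuild


# (full, placeholder, tail, remaining requests); all sizes are made up.
scenarios = {
    "large output, many turns left": (30_000, 200, 1_000, 10),
    "small output, many turns left": (500, 200, 20_000, 10),
    "large output, no requests left": (30_000, 200, 1_000, 0),
}
for name, args in scenarios.items():
    saving = replacement_saves(*args)
    print(f"{name}: {'replace' if saving > 0 else 'keep'} ({saving:+,.0f} token-units)")
```

As `remaining_requests` approaches zero, the keep-side term vanishes and the one-time rebuild cost dominates, which is the "nearly over" row. The model also shows why replacing a result deep in history is expensive: the longer the tail after it, the larger the rebuild cost.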

Author: Steve
Category: AI

Steve's Blog

© 2026 Steve