What Is the Scope of Prompt Caching?
2026/03/30

Prompt caching is driven by prefix matching, not by session identity. If the same prefix shows up again, reuse can happen across conversations and sometimes across users.

One of the most common misunderstandings about prompt caching is that it only works inside one session. That mental model is too narrow.

It Matches Prefixes, Not Sessions

Prompt caching is about the prefix itself.

Usually, the cache key is tied to the hash of the prefix content, not to which user or which chat thread produced it.

Conversation A:
[system][msg1][msg2] -> cache write

Conversation B:
[system][msg1][msg2] -> same prefix, cache hit

Conversation C:
[system][msg3] -> different prefix, no hit
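A toy model makes the point concrete. The sketch below is illustrative only. It assumes a cache keyed purely by a SHA-256 hash of the ordered prefix blocks, and it accepts a session id only to show that the id never enters the key. `PrefixCache` and `lookup_or_write` are hypothetical names, not any provider's API.

```python
import hashlib
import json


class PrefixCache:
    """Toy prompt cache keyed only by a hash of the prefix content (hypothetical)."""

    def __init__(self):
        self._keys = set()

    @staticmethod
    def key(prefix):
        # Serialize the ordered blocks deterministically, then hash the bytes.
        data = json.dumps(prefix, separators=(",", ":")).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def lookup_or_write(self, prefix, session_id=None):
        # session_id is accepted but deliberately ignored: it is not part of the key.
        k = self.key(prefix)
        if k in self._keys:
            return "hit"
        self._keys.add(k)  # miss: store the prefix so later requests can reuse it
        return "write"


cache = PrefixCache()
print(cache.lookup_or_write(["system", "msg1", "msg2"], session_id="A"))  # write
print(cache.lookup_or_write(["system", "msg1", "msg2"], session_id="B"))  # hit
print(cache.lookup_or_write(["system", "msg3"], session_id="C"))          # write (no hit)
```

Conversation B hits even though it has a different session id, while Conversation C misses because its content differs.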

The Provider Differences Are Mostly About Control

The exact implementation varies across vendors, but the general pattern is consistent:

  • Anthropic Claude matches prefixes within the same organization, up to cache breakpoints marked with cache_control, with a short TTL (five minutes by default) that refreshes on each hit
  • OpenAI applies prefix matching automatically to prompts of 1,024 tokens or more, although its retention window is less precisely specified
  • Google Gemini exposes a more explicit model where developers create cache objects and reuse them by reference
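The TTL behavior described for Anthropic, a short window that refreshes on each hit, amounts to a sliding expiry. The sketch below models it under stated assumptions: a 300-second TTL by default, and a caller-supplied clock (`now`, in seconds) so the behavior is deterministic. `TTLPrefixCache` and `access` are hypothetical names.

```python
class TTLPrefixCache:
    """Toy sliding-TTL cache: every hit pushes the entry's expiry forward (hypothetical)."""

    def __init__(self, ttl_seconds=300.0):
        # Assumed default of 300 s; real TTLs vary by provider and settings.
        self.ttl = ttl_seconds
        self._expiry = {}  # prefix key -> absolute expiry time in seconds

    def access(self, key, now):
        expires_at = self._expiry.get(key)
        if expires_at is not None and now < expires_at:
            self._expiry[key] = now + self.ttl  # hit: refresh the window
            return "hit"
        self._expiry[key] = now + self.ttl  # miss or expired: write a fresh entry
        return "write"


cache = TTLPrefixCache(ttl_seconds=300)
print(cache.access("system-prompt", now=0))     # write, expires at 300
print(cache.access("system-prompt", now=240))   # hit, window slides to 540
print(cache.access("system-prompt", now=480))   # hit only because the previous hit refreshed it
print(cache.access("system-prompt", now=1000))  # write: idle longer than the TTL
```

The request at t=480 would have missed under a fixed expiry. Steady traffic can therefore keep a shared prefix warm indefinitely, while a quiet prefix drops out.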

Prompt caching often operates on two layers at the same time:

  1. reuse across separate conversations
  2. reuse within the same conversation as the history grows turn by turn

Long System Prompts Benefit the Most

If every user request begins with the same long system prompt, then that shared prefix becomes a strong caching candidate.

User A: [8000-token system prompt][user message...]
User B: [8000-token system prompt][user message...]
User C: [8000-token system prompt][user message...]

The shared 8000-token prefix can potentially be reused again and again

The leverage of caching a long shared system prompt

That is how products like AI coding assistants can carry very long system prompts without paying full price on every request. The deciding factor is how broadly that prefix is shared.
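A rough calculation shows where the leverage comes from. The function below is a hypothetical sketch that prices input tokens in "base-price token" units. It assumes the first request writes the prefix and every later request reads it within the TTL. The default multipliers (a 1.25x write premium, reads at 0.1x the base input price) are assumptions modeled on Anthropic's published 5-minute cache pricing, so substitute your provider's current numbers. Output tokens are ignored.

```python
def prefix_cost(num_requests, prefix_tokens, suffix_tokens,
                write_multiplier=1.25, read_multiplier=0.1):
    """Return (uncached, cached) input cost in base-price token units.

    Hypothetical model: request 1 writes the shared prefix, requests 2..N
    read it, and the per-request suffix is always billed at the base price.
    """
    if num_requests < 1:
        raise ValueError("num_requests must be at least 1")
    uncached = num_requests * (prefix_tokens + suffix_tokens)
    cached = (
        write_multiplier * prefix_tokens                      # one cache write
        + (num_requests - 1) * read_multiplier * prefix_tokens  # later cache reads
        + num_requests * suffix_tokens                        # unique user messages
    )
    return uncached, cached


uncached, cached = prefix_cost(num_requests=100, prefix_tokens=8000, suffix_tokens=500)
print(uncached, round(cached), f"{cached / uncached:.1%}")  # 850000 139200 16.4%
```

With a single request, the write premium makes caching slightly more expensive than not caching. The savings only appear once the prefix is reused, which is why breadth of sharing is the deciding factor.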

Both Kinds of Reuse Can Happen Together

Inside a multi-turn conversation, every new request usually appends to the previous history:

Request 1: [system][user1]
Request 2: [system][user1][assistant2][tool3]
Request 3: [system][user1][assistant2][tool3][assistant4][user5]

So you can get both of these effects at once:

  • the same system prompt reused across different conversations
  • the same growing history reused inside one conversation

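They are not mutually exclusive. Because the system prompt sits at the head of every conversation's prefix, a brand-new conversation can still reuse the cached system prompt, while later turns of an existing conversation reuse the longer history that already includes it.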
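Both effects appear as soon as lookups return the longest cached prefix rather than requiring an exact full match. The sketch below is a simplification. It caches every message-boundary prefix of each request, whereas real providers cache at coarser or explicitly marked boundaries. `BoundaryPrefixCache` and `request` are hypothetical names. The return value is the number of leading blocks served from cache.

```python
import hashlib
import json


def prefix_key(blocks):
    # Hash the ordered blocks; user and session identity play no part.
    data = json.dumps(blocks, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class BoundaryPrefixCache:
    """Toy cache storing every message-boundary prefix (hypothetical simplification)."""

    def __init__(self):
        self._keys = set()

    def request(self, blocks):
        # Look for the longest previously cached prefix, longest first.
        hit_len = 0
        for n in range(len(blocks), 0, -1):
            if prefix_key(blocks[:n]) in self._keys:
                hit_len = n
                break
        # Write every boundary prefix of this request for future reuse.
        for n in range(1, len(blocks) + 1):
            self._keys.add(prefix_key(blocks[:n]))
        return hit_len


cache = BoundaryPrefixCache()
print(cache.request(["system", "userA1"]))                         # 0: cold cache
print(cache.request(["system", "userB1"]))                         # 1: system prompt shared across conversations
print(cache.request(["system", "userB1", "assistantB2", "userB3"]))  # 2: growing history inside conversation B
```

The second call benefits from cross-conversation reuse, and the third benefits from within-conversation reuse.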


Author

Steve

Categories

  • AI

Steve's Blog

© 2026 Steve