
What Is the Scope of Prompt Caching?
Prompt caching is driven by prefix matching, not by session identity. If the same prefix shows up again, reuse can happen across conversations and sometimes across users.
One of the most common misunderstandings about prompt caching is that it only works inside one session. That mental model is too narrow.
It Matches Prefixes, Not Sessions
Prompt caching is about the prefix itself.
Usually, the cache key is tied to the hash of the prefix content, not to which user or which chat thread produced it.
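To make the idea concrete, here is a minimal sketch of a cache keyed only by prefix content. It is a toy model, not any provider's implementation: `prefix_key` and `ToyPrefixCache` are hypothetical names, and the sketch treats the whole block list as the prefix instead of searching for the longest matching one.

```python
import hashlib


def prefix_key(blocks):
    """Hash the ordered prefix content. No user or session ID is involved."""
    h = hashlib.sha256()
    for block in blocks:
        data = block.encode("utf-8")
        # Length-prefix each block so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


class ToyPrefixCache:
    """Hypothetical cache that remembers which prefixes it has already seen."""

    def __init__(self):
        self._keys = set()

    def check(self, blocks):
        """Return True on a hit; on a miss, record the prefix and return False."""
        key = prefix_key(blocks)
        if key in self._keys:
            return True
        self._keys.add(key)
        return False


cache = ToyPrefixCache()
hit_a = cache.check(["system", "msg1", "msg2"])  # first sighting: miss, then write
hit_b = cache.check(["system", "msg1", "msg2"])  # same content, different caller
hit_c = cache.check(["system", "msg3"])          # different content
print(hit_a, hit_b, hit_c)
```

Nothing in the key identifies who sent the request, which is why the second call hits even though it could come from a different conversation.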
Conversation A:
[system][msg1][msg2] -> cache write
Conversation B:
[system][msg1][msg2] -> same prefix, cache hit
Conversation C:
[system][msg3] -> different prefix, no hit
The Provider Differences Are Mostly About Control
The exact implementation varies across vendors, but the general pattern is consistent:
- Anthropic Claude commonly works through prefix matching within the same organization, with short TTL windows that refresh on hits
- OpenAI also centers on automatic prefix matching, although TTL details are more opaque
- Google Gemini exposes a more explicit model where developers create cache objects and reuse them by reference
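The "TTL that refreshes on hits" behavior in the list above can be sketched with a toy model. This is an illustration under stated assumptions, not a provider's code: `ToyTTLCache` is a hypothetical class, the 300-second TTL is an assumed value, and time is passed in explicitly so the behavior is easy to follow.

```python
class ToyTTLCache:
    """Toy model of a cache entry whose expiry is pushed back on every hit."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._expires_at = {}  # prefix key -> expiry time in seconds

    def request(self, key, now):
        """Return True on a hit. Either way, the entry then lives until now + ttl."""
        expires = self._expires_at.get(key)
        hit = expires is not None and now < expires
        self._expires_at[key] = now + self.ttl  # write on miss, refresh on hit
        return hit


ttl_cache = ToyTTLCache(ttl_seconds=300)  # assumed TTL, for illustration only
timeline = [0, 200, 450, 800]             # request times in seconds
hits = [ttl_cache.request("system-prompt", t) for t in timeline]
print(hits)
```

In this sketch the request at 450 seconds still hits, because the hit at 200 seconds moved the expiry from 300 to 500. The request at 800 seconds misses, since nothing arrived to refresh the entry before it expired at 750.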
Prompt caching often operates on two layers at the same time:
- reuse across separate conversations
- reuse within the same conversation as the history grows turn by turn
Long System Prompts Benefit the Most
If every user request begins with the same long system prompt, then that shared prefix becomes a strong caching candidate.
User A: [8000-token system prompt][user message...]
User B: [8000-token system prompt][user message...]
User C: [8000-token system prompt][user message...]
In each case, the shared 8000-token prefix is a candidate for reuse, and only the user message that follows it is new.
That is how products like AI coding assistants can carry very long system prompts without paying full price on every request. The deciding factor is how broadly that prefix is shared.
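A rough back-of-the-envelope sketch shows why the shared prefix matters so much. The multipliers below are assumptions chosen for illustration, not any vendor's published pricing: cached reads are assumed to cost 0.1x a normal input token and cache writes 1.25x. The function `input_cost` is hypothetical, and it assumes the first request writes the prefix and every later request hits it.

```python
def input_cost(prefix_tokens, suffix_tokens, requests,
               read_multiplier=0.1, write_multiplier=1.25):
    """Relative input cost (1.0 = one uncached input token), without and with caching.

    Assumes one cache write on the first request and a hit on every later one.
    """
    uncached = requests * (prefix_tokens + suffix_tokens)
    first = prefix_tokens * write_multiplier + suffix_tokens
    later = (requests - 1) * (prefix_tokens * read_multiplier + suffix_tokens)
    return uncached, first + later


# 100 requests, each with an 8000-token system prompt and a 200-token user message.
uncached, cached = input_cost(prefix_tokens=8000, suffix_tokens=200, requests=100)
print(f"uncached: {uncached:.0f}  cached: {cached:.0f}  ratio: {cached / uncached:.2f}")
```

Under these assumed multipliers, the cached total comes to roughly 13% of the uncached total. The more requests share the prefix while it stays cached, the closer the per-request cost gets to the cached-read rate.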
Both Kinds of Reuse Can Happen Together
Inside a multi-turn conversation, every new request usually appends to the previous history:
Request 1: [system][user1]
Request 2: [system][user1][assistant2][tool3][assistant4]
Request 3: [system][user1][assistant2][tool3][assistant4][user5]
So you can get both of these effects at once:
- the same system prompt reused across different conversations
- the same growing history reused inside one conversation
The two are not mutually exclusive. A single request can hit on the shared system prompt and, when the earlier turns are still cached, on its own conversation history as well.
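As a small illustration of both kinds of reuse, the sketch below counts how many leading blocks two requests share. It models messages as plain strings and uses a hypothetical helper, `shared_prefix_len`. Real systems match on tokens and may apply minimum lengths or block boundaries, which this sketch ignores.

```python
def shared_prefix_len(previous, current):
    """Number of leading blocks that two requests have in common."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


# The three requests from the multi-turn conversation shown earlier.
turns = [
    ["system", "user1"],
    ["system", "user1", "assistant2", "tool3", "assistant4"],
    ["system", "user1", "assistant2", "tool3", "assistant4", "user5"],
]

# Reuse inside one conversation: each request extends the previous one.
within = [shared_prefix_len(p, c) for p, c in zip(turns, turns[1:])]

# Reuse across conversations: only the system prompt is shared.
other_conversation = ["system", "other_user1"]
across = shared_prefix_len(turns[0], other_conversation)

print(within, across)
```

Within the conversation the reusable prefix grows with every turn, while the unrelated conversation shares only the leading system block.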


