What Is the Scope of Prompt Caching?

One of the most common misunderstandings about prompt caching is that it only works inside one session. That mental model is too narrow.

It Matches Prefixes, Not Sessions

Prompt caching is keyed on the prefix itself.

Usually, the cache key is derived from a hash of the prefix content, not from which user or which chat thread produced it. Any request that begins with the same content can therefore reuse the same cache entry.

Conversation A:
[system][msg1][msg2] -> cache write

Conversation B:
[system][msg1][msg2] -> same prefix, cache hit

Conversation C:
[system][msg3] -> different prefix, no hit
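The three conversations above can be mimicked with a small sketch. It is a toy model, not any provider's implementation: the `prefix_key` and `lookup` helpers are hypothetical names, and the sketch hashes the whole prefix as one unit, whereas a real service may also match a shorter shared prefix.

```python
import hashlib

def prefix_key(blocks):
    """Hash an ordered list of prompt blocks into a cache key (hypothetical helper)."""
    h = hashlib.sha256()
    for block in blocks:
        data = block.encode("utf-8")
        # Length-prefix each block so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()

cache = set()

def lookup(blocks):
    """Return "hit" if this exact prefix was seen before; otherwise store it and return "miss"."""
    key = prefix_key(blocks)
    if key in cache:
        return "hit"
    cache.add(key)  # a miss doubles as the cache write
    return "miss"

# No user or session identifier is part of the key, only the content.
conversation_a = ["system", "msg1", "msg2"]
conversation_b = ["system", "msg1", "msg2"]
conversation_c = ["system", "msg3"]

results = [lookup(conversation_a), lookup(conversation_b), lookup(conversation_c)]
print(results)
```

Conversation B hits only because its content is identical to A's. Nothing about who sent it enters the key.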

The Provider Differences Are Mostly About Control

The exact implementation varies across vendors, but the general pattern is consistent:

Anthropic Claude commonly works through prefix matching within the same organization, with short TTL windows that refresh on hits
OpenAI also centers on automatic prefix matching, although TTL details are more opaque
Google Gemini exposes a more explicit model where developers create cache objects and reuse them by reference
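The phrase "short TTL windows that refresh on hits" can be made concrete with a minimal sketch. The `TTLCache` class, the 300-second window, and the request times are all illustrative assumptions, not documented values for any vendor.

```python
class TTLCache:
    """Toy prefix cache whose entries expire unless they keep getting hit."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.expires_at = {}  # prefix key -> expiry time in seconds

    def lookup(self, key, now):
        """Return True on a hit. Both hits and writes restart the TTL window."""
        expiry = self.expires_at.get(key)
        hit = expiry is not None and now < expiry
        self.expires_at[key] = now + self.ttl
        return hit

cache = TTLCache(ttl_seconds=300)  # assumed window, for illustration only
timeline = [0, 200, 450, 800]      # request times in seconds
hits = [cache.lookup("system-prompt-v1", now=t) for t in timeline]
print(hits)
```

The request at 450 seconds still hits because the hit at 200 seconds pushed the expiry out to 500. The request at 800 seconds arrives after the refreshed window has lapsed, so it pays for a fresh write.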

Prompt caching often operates on two layers at the same time:

reuse across separate conversations
reuse within the same conversation as the history grows turn by turn

Long System Prompts Benefit the Most

If every user request begins with the same long system prompt, then that shared prefix becomes a strong caching candidate.

User A: [8000-token system prompt][user message...]
User B: [8000-token system prompt][user message...]
User C: [8000-token system prompt][user message...]

The shared 8000-token prefix can potentially be reused again and again.

Figure: The leverage of caching a long shared system prompt

That is how products like AI coding assistants can carry very long system prompts without paying full price on every request. The deciding factor is how broadly that prefix is shared.
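A rough worked calculation shows why breadth of sharing matters. The multipliers below are assumptions chosen for illustration (a cache write billed at 1.25 times the normal input price, a cache read at 0.10 times); actual pricing differs by provider and model, and the `relative_cost` function is a hypothetical helper.

```python
def relative_cost(prefix_tokens, requests, write_multiplier=1.25, read_multiplier=0.10):
    """Cost of the shared prefix with caching, as a fraction of the uncached cost.

    Assumes the first request writes the cache and every later request hits it.
    Both multipliers are illustrative assumptions, not any vendor's price list.
    """
    uncached = prefix_tokens * requests
    cached = (prefix_tokens * write_multiplier
              + prefix_tokens * read_multiplier * (requests - 1))
    return cached / uncached

for n in (1, 10, 100):
    print(n, round(relative_cost(8000, n), 2))
```

Under these assumptions a single request costs more than it would without caching, because the write carries a premium. By ten requests the prefix costs roughly a fifth of the uncached price, and the ratio keeps falling toward the read multiplier as more requests share it.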

Both Kinds of Reuse Can Happen Together

Inside a multi-turn conversation, every new request usually appends to the previous history:

Request 1: [system][user1]
Request 2: [system][user1][assistant2][tool3][assistant4]
Request 3: [system][user1][assistant2][tool3][assistant4][user5]

So you can get both of these effects at once:

the same system prompt reused across different conversations
the same growing history reused inside one conversation

They are not mutually exclusive: a single request can benefit from the shared system prompt and from its own conversation history.
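The growth pattern in the three requests above can be checked with a short sketch that counts how many leading blocks each request shares with the one before it. The `shared_prefix_len` helper is a hypothetical name, and treating each bracketed item as one comparable block is a simplification.

```python
def shared_prefix_len(previous, current):
    """Count how many leading blocks two requests have in common."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n

requests = [
    ["system", "user1"],
    ["system", "user1", "assistant2", "tool3", "assistant4"],
    ["system", "user1", "assistant2", "tool3", "assistant4", "user5"],
]

# Compare each request with the one before it in the same conversation.
for prev, cur in zip(requests, requests[1:]):
    reused = shared_prefix_len(prev, cur)
    print(f"{reused} of {len(cur)} blocks reusable, {len(cur) - reused} new")

# A different conversation with the same system prompt still shares one block.
other_conversation = ["system", "user9"]
print(shared_prefix_len(requests[0], other_conversation))
```

Within the conversation the reusable share grows with every turn, while the comparison with `other_conversation` shows the cross-conversation case: only the system prompt is shared.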

