Module 09 · Lesson 02
The Context Window — Why It Changes Everything
Reading time: 18 minutes Track: Claude Fluency for Teams · Required for all learners
The most important thing you're probably not thinking about
Every interaction with Claude happens inside a context window — the total amount of text the model can "see" at one time. This includes your system prompt (if any), the full conversation history, any documents you've pasted in, and Claude's own responses.
Claude's context window is large — hundreds of thousands of tokens, where a token is roughly three-quarters of a word. That sounds enormous until you paste a 200-page document and a long conversation history and notice Claude starting to lose the thread of early instructions.
Understanding how the context window works changes how you structure prompts, how you manage long tasks, and why Claude sometimes seems to "forget" something you told it earlier.
What lives in the context window
Think of the context window as a single, very long document that Claude reads from top to bottom before generating each response. Everything in it influences the output:
- System prompt (set by the application or operator): Background instructions, persona, constraints
- Conversation history: Every message in the current session, both yours and Claude's
- Pasted content: Documents, code, data you've inserted inline
- Claude's prior responses: Also consume context — long responses shrink the available space for new input
When the total length approaches the window limit, older content is typically truncated or summarized. This is why long sessions can feel like Claude is "forgetting" earlier instructions — it may literally no longer be reading them.
Practical implications
1. Put the most important instructions at the start and end.
Research on attention in language models consistently shows that models are most sensitive to content at the beginning and end of the context. If you have a critical constraint — "never suggest solutions that require admin access" — say it near the top of your prompt, and optionally repeat it at the end. Instructions buried in the middle of a long prompt are more likely to be underweighted.
2. For long documents, extract before pasting.
Pasting a 50-page PDF uses enormous context. Better: extract the specific sections relevant to your task and paste only those. "Here is Section 4.2 of the report, which covers the budget allocation..." uses far less context and focuses Claude's attention.
3. Start a new conversation for new tasks.
Many people run long, multipurpose conversations with Claude — ask about email drafts, then pivot to code review, then back to strategy. The accumulated history from earlier tasks consumes context and can subtly influence Claude's framing of later ones. For distinct tasks, a fresh session with focused context outperforms a sprawling one.
4. Long responses consume context too.
If you ask Claude to write a 2,000-word document and then ask follow-up questions in the same session, that 2,000-word response is now in the context, competing for attention with your follow-up instructions. For iterative editing tasks, consider copying your working draft into a new session rather than continuing in the same thread.
5. The context window is not a memory system.
Context window ≠ memory. Memory implies persistence across sessions. The context window is ephemeral — it exists for one conversation and disappears when the session ends. If you need Claude to remember something across sessions, you need either a system that manages that state (like Claude.ai Projects with memory enabled) or you need to re-provide the information at the start of each session.
Working with long documents
When you do need to work with long material, a few patterns work well:
Chunking: Break the document into sections and process them separately, aggregating results yourself. "Summarize this section:" repeated across chunks, then "Now combine these summaries into a single executive summary."
Targeted extraction: Ask Claude to extract specific things rather than summarize broadly. "List every action item and its assigned owner from this transcript" is better than "summarize this transcript" when you only need action items.
Map-reduce patterns: For analysis tasks, process chunks independently (map), then feed the outputs back to Claude for synthesis (reduce). This keeps any single context invocation focused.
The key takeaway
The context window is not infinite, not persistent, and not uniform in its attention. Treat it like working memory — finite, temporary, and most reliable for what's freshest and most prominent. Structure your prompts accordingly.