Exercises — Module 03: Context engineering
3.1 — Compaction bake-off
Generate a 40-turn conversation (a coding agent on a small task is ideal). Implement three compaction policies:
- Sliding window: keep only the last 8 turns.
- Summarize-and-keep: summarize older turns, keep the last 8 verbatim.
- Hierarchical: rolling summary + per-turn tags + selective retention.
For each policy, measure: final-answer quality (rubric-judged), input tokens per turn, and total cost.
Deliverable: table of results + one-paragraph recommendation.
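A minimal sketch of the three policies, assuming each turn is a dict with a role, content, and an optional tag; `summarize` here is a hypothetical stand-in for an LLM summarization call:

```python
def summarize(turns):
    # Placeholder: a real implementation would call a model to summarize.
    return {"role": "system", "content": f"[summary of {len(turns)} turns]"}

def sliding_window(history, keep=8):
    # Drop everything except the most recent turns.
    return history[-keep:]

def summarize_older(history, keep=8):
    # One summary turn for the old span, recent turns verbatim.
    older, recent = history[:-keep], history[-keep:]
    return ([summarize(older)] if older else []) + recent

def hierarchical(history, keep=8):
    # Rolling summary, plus selected tagged turns retained verbatim
    # (e.g. decisions), plus the recent window.
    older, recent = history[:-keep], history[-keep:]
    rolling = [summarize(older)] if older else []
    retained = [t for t in older if t.get("tag") == "decision"]
    return rolling + retained + recent
```

Run each policy at every turn of the generated conversation and log the resulting context length alongside the judged answer quality.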
3.2 — Caching
Profile your harness’s cache hit rate. Reorganize the prompt so static content (system prompt, tool definitions) sits at the front and volatile content at the end. Re-measure.
Deliverable: before/after cache hit rate and cost per call.
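If your harness does not expose cache metrics, you can approximate the hit rate offline. The sketch below (hypothetical helper names) scores consecutive prompts, treated as token lists, by shared-prefix length, since most prefix caches only match from the start of the prompt:

```python
def common_prefix_tokens(a, b):
    # Length of the longest shared prefix of two token sequences.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def estimated_hit_rate(prompts):
    # Fraction of tokens in each prompt (after the first) that match the
    # previous prompt's prefix -- a rough proxy for prefix-cache hits.
    if len(prompts) < 2:
        return 0.0
    hits = total = 0
    for prev, cur in zip(prompts, prompts[1:]):
        hits += common_prefix_tokens(prev, cur)
        total += len(cur)
    return hits / total
```

Run it on the prompts before and after the reorganization; the delta should track the cost change you observe.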
3.3 — Agentic retrieval
Build a small corpus (say, 100 chunks of docs). Compare:
- Unconditional RAG: always inject top-K chunks.
- Agentic retrieval: a `search_docs` tool the model decides when to call.
Use a 20-case eval. Compare cost, latency, and answer quality.
Deliverable: numbers + recommendation.
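One way to set this up: expose a `search_docs` tool through your API's function-calling interface and back it with your retriever. The tool spec below uses the common JSON-schema shape; the word-overlap scorer is a toy stand-in for a real index:

```python
SEARCH_DOCS_TOOL = {
    "name": "search_docs",
    "description": "Search the docs corpus. Call only when the answer needs it.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "k": {"type": "integer", "description": "number of chunks to return"},
        },
        "required": ["query"],
    },
}

def search_docs(corpus, query, k=5):
    # Toy lexical scorer; swap in BM25 or embeddings for the real exercise.
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda chunk: len(terms & set(chunk.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The unconditional-RAG arm skips the tool and always injects `search_docs(corpus, question, k=K)` into the prompt, which makes the cost comparison direct.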
3.4 — External state
Replace the in-prompt conversation history with an external file the agent can read and write via tools. Measure the context size at each turn.
- How does this interact with compaction?
Deliverable: code + reflection.
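A minimal file-backed store the agent could use through read/write tools (class and method names are hypothetical); the prompt then carries only a pointer to the file plus whatever the agent chooses to read back:

```python
import json
import os

class FileMemory:
    """External conversation state: the agent appends turns and reads slices
    back via tools instead of carrying the full history in the prompt."""

    def __init__(self, path):
        self.path = path

    def read(self, last_n=None):
        # Return the stored turns, optionally only the most recent last_n.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            turns = json.load(f)
        return turns[-last_n:] if last_n else turns

    def append(self, turn):
        # Read-modify-write; fine for a single-agent exercise.
        turns = self.read()
        turns.append(turn)
        with open(self.path, "w") as f:
            json.dump(turns, f)
```

With this in place, the compaction question becomes concrete: a `read(last_n=8)` tool call is effectively a sliding window the agent controls itself.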
3.5 — Hybrid search
For a corpus of your choice, build three retrieval setups:
- Pure BM25.
- Pure vector search.
- Hybrid, with reranking.
Measure recall@10 against a hand-labeled ground truth of 30 queries.
Deliverable: a chart.
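For the hybrid setup, Reciprocal Rank Fusion is a simple fusion baseline to try before adding a reranker. A sketch of RRF plus the recall@10 metric, assuming each system returns a ranked list of doc ids:

```python
def rrf(rankings, k=60):
    # Reciprocal Rank Fusion: sum 1/(k + rank) across the input rankings.
    # k=60 is the commonly used constant from the original RRF paper.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def recall_at_k(retrieved, relevant, k=10):
    # Fraction of the labeled-relevant docs found in the top k results.
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)
```

Average `recall_at_k` over the 30 labeled queries for each of the three setups; that gives the three bars (or lines) for the chart.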