From RAPTOR Hierarchical Trees to RL-Trained Policy Learning — The Complete Implementation Playbook
Part I established the foundations: the six-layer taxonomy, hot/cold separation, and the extract → consolidate pipeline. Part II addresses the harder question: which algorithms make each layer perform at production quality?
A structured failure record, of the kind a task-retry agent persists between attempts:

```json
{
  "task_type": "oauth_implementation",
  "attempt_summary": "Used implicit flow...",
  "failure_reason": "Missing PKCE verifier",
  "constraint_violated": "RFC 7636 §4.2",
  "corrective_insight": "Always generate code_verifier before redirect",
  "timestamp": "2024-09-14T11:22:00Z"
}
```

The agent edits its own memory through three tool calls: `memory_insert()`, `memory_search()`, `memory_replace()`. This is the direct precursor to today's Memory-as-a-Tool pattern.

Point-in-time retrieval over a bitemporal store filters on both event validity and transaction time:

```sql
SELECT fact, source, confidence
FROM memory
WHERE valid_at <= '2024-03-01'
  AND (valid_to IS NULL OR valid_to > '2024-03-01')
  AND recorded_at <= NOW();
```
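A minimal sketch of that self-editing interface, assuming a plain list-backed store. The three method names follow the tool calls above; the `MemoryStore` class and its internals are illustrative, not any specific library's API:

```python
# Illustrative Memory-as-a-Tool sketch: a list-backed store exposing
# insert/search/replace as tool calls the agent can invoke.
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: list[str] = field(default_factory=list)

    def memory_insert(self, text: str) -> int:
        """Append a new memory; return its index."""
        self.entries.append(text)
        return len(self.entries) - 1

    def memory_search(self, query: str) -> list[str]:
        """Naive case-insensitive substring match; production systems
        would use embedding similarity instead."""
        q = query.lower()
        return [e for e in self.entries if q in e.lower()]

    def memory_replace(self, old: str, new: str) -> bool:
        """Overwrite an existing memory in place; False if not found."""
        try:
            self.entries[self.entries.index(old)] = new
            return True
        except ValueError:
            return False

store = MemoryStore()
store.memory_insert("User prefers OAuth with PKCE")
store.memory_replace("User prefers OAuth with PKCE",
                     "User requires OAuth with PKCE (RFC 7636)")
print(store.memory_search("pkce"))
# → ['User requires OAuth with PKCE (RFC 7636)']
```

The key design point is that the model, not the framework, decides when to call each function, which is what distinguishes this pattern from fixed retrieval pipelines.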
Neither proactive-only nor reactive-only retrieval is optimal at production scale. The correct pattern is a hybrid: small proactive retrieval for high-value always-relevant context, plus reactive tool calls for deep archives.
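The hybrid can be sketched as follows. All names here (`PINNED`, `archive_search_tool`, `build_prompt`) are hypothetical, chosen to illustrate the split between a small always-injected slice and an on-demand archive lookup:

```python
# Hybrid retrieval sketch (illustrative names, not a specific library's API):
# a small pinned context is injected into every prompt proactively,
# while the deep archive is exposed only as a reactive tool call.

PINNED = ["user timezone: UTC-5", "project language: Python"]  # high-value, always relevant

ARCHIVE = {
    "oauth": "Past failure: implicit flow without PKCE; always send code_verifier.",
    "deploy": "Past note: staging deploys require manual approval.",
}

def get_pinned_context() -> str:
    """Proactive path: cheap, loaded on every turn."""
    return "\n".join(PINNED)

def archive_search_tool(query: str) -> str:
    """Reactive path: invoked by the model only when it needs depth."""
    return ARCHIVE.get(query.lower(), "no match")

def build_prompt(user_msg: str) -> str:
    # Only the pinned slice is paid for up front; archive tokens are
    # spent only on turns where the agent actually calls the tool.
    return f"[context]\n{get_pinned_context()}\n[user]\n{user_msg}"

print(build_prompt("Set up OAuth for the API"))
print(archive_search_tool("oauth"))
```

The token-cost asymmetry is the point: the pinned slice stays small and constant per turn, while archive cost is incurred only when the model judges it necessary.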
Use this matrix to select the right techniques for your use case. Production systems typically compose 3–4 techniques, not just one.
| Technique | Multi-Hop | Temporal | Token Cost | Build Cost | Primary Use Case |
|---|---|---|---|---|---|
| Standard RAG | ✗ | ✗ | Low | Low | Simple document QA |
| RAPTOR | ⚠ | ✗ | Low | High | Hierarchical document corpora |
| HippoRAG | ✓ | ✗ | Med | High | Entity-rich knowledge bases |
| Reflexion | N/A | ⚠ | Low | Low | Task-retry learning agents |
| MemGPT | ✓ | ⚠ | High | Low | Multi-session conversational |
| A-Mem | ✓ | ⚠ | Very Low | Med | Personal assistant · copilot |
| Zep | ✓ | ✓ | Med | Med | Healthcare · finance · enterprise |
| Memory-R1 | ✓ | ✓ | Variable | High (RL) | Learned policy deployment |
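As a worked example of reading the matrix, here is a hypothetical selector that filters techniques by hard capability requirements. The flags are transcribed from the table above (`True` = ✓, `"partial"` = ⚠, `False` = ✗, `None` = N/A); only full ✓ support is treated as satisfying a hard requirement:

```python
# Capability flags transcribed from the comparison matrix above.
MATRIX = {
    "Standard RAG": {"multi_hop": False,     "temporal": False},
    "RAPTOR":       {"multi_hop": "partial", "temporal": False},
    "HippoRAG":     {"multi_hop": True,      "temporal": False},
    "Reflexion":    {"multi_hop": None,      "temporal": "partial"},
    "MemGPT":       {"multi_hop": True,      "temporal": "partial"},
    "A-Mem":        {"multi_hop": True,      "temporal": "partial"},
    "Zep":          {"multi_hop": True,      "temporal": True},
    "Memory-R1":    {"multi_hop": True,      "temporal": True},
}

def candidates(need_multi_hop: bool, need_temporal: bool) -> list[str]:
    """Return techniques whose full (checkmark) capabilities cover the
    stated hard requirements; partial support does not qualify."""
    out = []
    for name, caps in MATRIX.items():
        if need_multi_hop and caps["multi_hop"] is not True:
            continue
        if need_temporal and caps["temporal"] is not True:
            continue
        out.append(name)
    return out

print(candidates(need_multi_hop=True, need_temporal=True))
# → ['Zep', 'Memory-R1']
```

In practice this shortlist is only the first cut; the token-cost and build-cost columns then decide which 3–4 shortlisted techniques to compose.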