
PageIndex Integration

PageIndex is a vectorless reasoning-based RAG system that navigates document trees to find relevant content. Unlike embedding-based retrieval, PageIndex performs tree traversal — each node selection is a distinct decision point.

Briefcase provides two components for PageIndex observability:

  • PageIndexTracer — wraps PageIndexClient directly; captures chat_completions calls and enriches records with tree structure metadata
  • PageIndexMCPObserver — post-processes MCP tool call records when PageIndex is accessed via Model Context Protocol

Installation

pip install briefcase-ai[pageindex]

PageIndexMCPObserver does not require the pageindex package.

PageIndexTracer

Constructor

from briefcase.integrations.frameworks import PageIndexTracer

tracer = PageIndexTracer(
    api_key="your-pageindex-api-key",  # used if client not provided
    context_version=None,              # version tag for all records
    async_capture=True,                # export in background thread
    client=None,                       # existing PageIndexClient (overrides api_key)
    fetch_tree_metadata=True,          # call get_tree() after each chat_completions
)

Parameter | Default | Description
api_key | None | PageIndex API key. Creates a PageIndexClient internally.
context_version | None | Version tag added to every decision record.
async_capture | True | Export to BriefcaseConfig.exporter in a daemon thread.
client | None | Supply an existing PageIndexClient (takes precedence over api_key).
fetch_tree_metadata | True | Call get_tree() after chat_completions to compute depth/path.
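
The async_capture option exports records on a daemon thread. As a rough illustration of that pattern only, the sketch below queues records and drains them on a background worker. BackgroundExporter, submit, and flush are hypothetical names invented for this example; they are not part of Briefcase, and the library's actual export path may work differently.

```python
import queue
import threading
from typing import Any, Callable, Dict


class BackgroundExporter:
    """Hypothetical helper: passes records to an export callable on a daemon thread."""

    def __init__(self, export: Callable[[Dict[str, Any]], None]) -> None:
        self._export = export
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        # daemon=True means this thread never blocks interpreter shutdown.
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            try:
                self._export(record)
            except Exception:
                # A failed export is dropped so the worker keeps running.
                pass
            finally:
                self._queue.task_done()

    def submit(self, record: Dict[str, Any]) -> None:
        """Enqueue a record and return immediately."""
        self._queue.put(record)

    def flush(self) -> None:
        """Block until every queued record has been handed to the exporter."""
        self._queue.join()
```

Daemon threads are stopped abruptly when the interpreter exits, so a design like this needs an explicit flush (or synchronous capture) in short-lived scripts if no records may be lost.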

Basic Usage

from briefcase.integrations.frameworks import PageIndexTracer

tracer = PageIndexTracer(api_key="pi-key-abc123", context_version="v2.1")

response = tracer.chat_completions(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    doc_id="pi-abc123",
)

print(response["choices"][0]["message"]["content"])

# Inspect captured records
for record in tracer.get_records():
    print(f"doc_id: {record['pageindex.doc_id']}")
    print(f"depth: {record['pageindex.tree.depth']}")
    print(f"nodes_visited: {record['pageindex.tree.nodes_visited']}")
    print(f"path: {record['pageindex.tree.path']}")
    print(f"execution_time_ms: {record['execution_time_ms']:.1f}")

Using an Existing Client

from briefcase.integrations.frameworks import PageIndexTracer
from pageindex import PageIndexClient

client = PageIndexClient(api_key="pi-key-abc123")
tracer = PageIndexTracer(client=client, context_version="prod-v1")

response = tracer.chat_completions(
    messages=[{"role": "user", "content": "Explain section 3.2"}],
    doc_id="doc-789",
)

get_tree() Pass-Through

tree = tracer.get_tree("doc-789")
print(tree) # {"tree": {"title": "...", "nodes": [...]}}

Public API

tracer.get_records() -> List[Dict[str, Any]]   # all captured records
tracer.clear()                                 # reset captured records

PageIndex Decision Record

Every chat_completions call produces a record with these fields:

{
    "decision_id": "uuid-...",
    "decision_type": "pageindex_retrieval",
    "function_name": "PageIndexTracer.chat_completions",
    "inputs": {
        "messages": [...],
        "doc_id": "pi-abc123"
    },
    "outputs": {
        "content": "Paris is the capital of France."
    },
    "started_at": "2026-02-26T10:00:00Z",
    "ended_at": "2026-02-26T10:00:00.843Z",
    "execution_time_ms": 843.2,
    "context_version": "v2.1",

    # PageIndex-specific attributes
    "pageindex.doc_id": "pi-abc123",
    "pageindex.retrieval_method": "tree_search",
    "pageindex.tree.depth": 3,
    "pageindex.tree.nodes_visited": 12,
    "pageindex.tree.path": "root > Chapter 1 > Section 1.2 > ... (4 more)",
    "pageindex.tree.path": "root > Chapter 1 > Section 1.2 > ... (4 more)",
    "pageindex.tree.backtrack_count": 0
}
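
The pageindex.tree.path value in the record above is a truncated string of node titles. The sketch below shows one way a string in that shape could be produced. format_path and its max_shown parameter are hypothetical and chosen only to reproduce the example value; the source does not describe how Briefcase builds or truncates the path.

```python
from typing import List


def format_path(titles: List[str], max_shown: int = 3, sep: str = " > ") -> str:
    """Join node titles, collapsing everything past max_shown into a count.

    Hypothetical helper: the cutoff of three titles is an assumption.
    """
    if len(titles) <= max_shown:
        return sep.join(titles)
    hidden = len(titles) - max_shown
    return sep.join(titles[:max_shown]) + f"{sep}... ({hidden} more)"


titles = ["root", "Chapter 1", "Section 1.2", "A", "B", "C", "D"]
print(format_path(titles))  # root > Chapter 1 > Section 1.2 > ... (4 more)
```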

Note on nodes_visited: This is the total node count of the fetched tree, used as an upper-bound proxy for traversal. PageIndex does not expose per-query traversal paths via the API.

Note on backtrack_count: Always 0 — backtracking is server-side only and not exposed in the API response.
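
To make the nodes_visited proxy concrete, the sketch below counts every node in a fetched tree and measures its maximum depth. It assumes the shape shown in the get_tree() comment, {"tree": {"title": ..., "nodes": [...]}}, and further assumes that children nest under the same "nodes" key at every level and that the root sits at depth 0. Both are assumptions for illustration, and count_nodes and max_depth are hypothetical helpers, not Briefcase functions.

```python
from typing import Any, Dict


def count_nodes(node: Dict[str, Any]) -> int:
    """Total number of nodes in the subtree, including the node itself."""
    return 1 + sum(count_nodes(child) for child in node.get("nodes", []))


def max_depth(node: Dict[str, Any], depth: int = 0) -> int:
    """Depth of the deepest node, with the starting node at the given depth."""
    children = node.get("nodes", [])
    if not children:
        return depth
    return max(max_depth(child, depth + 1) for child in children)


# Hand-written sample tree in the assumed shape.
sample = {
    "tree": {
        "title": "root",
        "nodes": [
            {
                "title": "Chapter 1",
                "nodes": [{"title": "Section 1.1"}, {"title": "Section 1.2"}],
            },
            {"title": "Chapter 2"},
        ],
    }
}

root = sample["tree"]
print(count_nodes(root))  # 5
print(max_depth(root))    # 2
```

Because the count covers the whole tree, it can only overstate how many nodes a single query touched, which is why the record treats it as an upper bound.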

PageIndexMCPObserver

When a LangChain agent or OpenAI agent uses PageIndex as an MCP tool, the tool call appears in handler records with an opaque JSON output string. PageIndexMCPObserver parses that output and adds pageindex.* attributes in-place.

Constructor

from briefcase.integrations.frameworks import PageIndexMCPObserver

observer = PageIndexMCPObserver()
# No configuration needed. No pageindex package required.

Detection Logic

The observer identifies PageIndex MCP records by (in order):

  1. Tool/function name contains any of: pageindex, page_index, pi_search, pi_chat, pi_retrieve (case-insensitive)
  2. The output JSON contains doc_id or retrieval_id keys
  3. The output JSON contains a nodes array at root level
  4. The output JSON contains a tree key with a dict value
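
The four checks above can be sketched as a single predicate. This is an illustration of the listed rules, not the observer's actual implementation: looks_like_pageindex is a hypothetical function, and it assumes that the first matching check is sufficient and that the tool output arrives either as a JSON string or as an already-parsed dict.

```python
import json
from typing import Any, Optional

NAME_HINTS = ("pageindex", "page_index", "pi_search", "pi_chat", "pi_retrieve")


def looks_like_pageindex(tool_name: Optional[str], output: Any) -> bool:
    """Apply the four detection checks in order; the first match wins."""
    # 1. Case-insensitive substring match on the tool/function name.
    if tool_name and any(hint in tool_name.lower() for hint in NAME_HINTS):
        return True

    # The remaining checks need the output as a parsed JSON object.
    if isinstance(output, str):
        try:
            output = json.loads(output)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            return False
    if not isinstance(output, dict):
        return False

    # 2. doc_id or retrieval_id key present.
    if "doc_id" in output or "retrieval_id" in output:
        return True
    # 3. A nodes array at root level.
    if isinstance(output.get("nodes"), list):
        return True
    # 4. A tree key holding a dict.
    return isinstance(output.get("tree"), dict)
```

Checking the name first means a matching tool is recognized even when its output is not valid JSON; the structural checks then catch PageIndex responses behind tools with unrelated names.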

Usage with LangChain Handler

from briefcase.integrations.frameworks import (
    BriefcaseLangChainHandler,
    PageIndexMCPObserver,
)

handler = BriefcaseLangChainHandler(engagement_id="my-project")
observer = PageIndexMCPObserver()

# After running the chain...
for record in handler.get_decisions_as_dicts():
    enriched = observer.observe(record)  # mutates record in-place
    if enriched:
        print(f"PageIndex call: doc={record['pageindex.doc_id']}")

print(f"Observed: {observer.observed_count}, Enriched: {observer.enriched_count}")

Usage with OpenAI Agents Tracer

from briefcase.integrations.frameworks import OpenAIAgentsTracer, PageIndexMCPObserver

tracer = OpenAIAgentsTracer()
observer = PageIndexMCPObserver()

# After the agent run...
for trace in tracer.get_records():
    for span in trace.get("spans", []):
        if observer.observe(span):
            print(f"PageIndex span: doc={span['pageindex.doc_id']}")

Public API

observer.observe(record: Dict) -> bool        # True if enriched
observer.is_pageindex_mcp_response(record)    # check without mutating
observer.observed_count                       # total records seen
observer.enriched_count                       # total records enriched

Choosing PageIndexTracer vs PageIndexMCPObserver

Scenario | Use
Direct PageIndex SDK calls | PageIndexTracer
PageIndex accessed as a LangChain tool via MCP | PageIndexMCPObserver on LangChain records
PageIndex accessed by an OpenAI agent via MCP | PageIndexMCPObserver on tracer spans
Both (mixed architecture) | Both

See Also