LlamaIndex Integration
BriefcaseLlamaIndexHandler is a LlamaIndex callback handler that captures
AI decisions for every LLM call, retrieval, query engine operation, embedding,
and synthesis step in your LlamaIndex application.
Installation
pip install briefcase-ai
pip install llama-index-core # or your llama_index provider packages
Quick Setup with auto()
The simplest way to instrument LlamaIndex is one line:
import briefcase_ai
briefcase_ai.auto("llamaindex")
# That's it — all LlamaIndex query engine calls are now traced.
Constructor
from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler
handler = BriefcaseLlamaIndexHandler(
    engagement_id="",              # project identifier
    workstream_id="",              # workflow identifier
    capture_llm=True,              # capture LLM calls
    capture_embeddings=True,       # capture embedding operations
    capture_retrievals=True,       # capture retriever queries
    capture_queries=True,          # capture query engine + synthesizer calls
    max_input_chars=10000,         # truncation limit for inputs
    max_output_chars=10000,        # truncation limit for outputs
    event_starts_to_ignore=None,   # list of EventType strings to skip on start
    event_ends_to_ignore=None,     # list of EventType strings to skip on end
    exporter=None,                 # per-instance BaseExporter override
    async_capture=True,            # export runs in background thread
)
Basic Usage
Register the handler via Settings.callback_manager for global instrumentation:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler
handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    workstream_id="rag-pipeline",
)
Settings.callback_manager.add_handler(handler)
# Build your index and run queries — handler captures automatically
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)
for decision in handler.get_decisions():
    print(f"{decision.decision_type}: {decision.function_name}")
    if decision.execution_time_ms is not None:
        print(f"  execution: {decision.execution_time_ms:.1f}ms")
    if decision.token_usage:
        print(f"  tokens: {decision.token_usage}")
Retrieving Decisions
# Get CapturedDecision objects
decisions = handler.get_decisions()
# Get serializable dicts (for storage or logging)
dicts = handler.get_decisions_as_dicts()
print(f"Captured {handler.decision_count} decisions")
# Reset for next request
handler.clear()
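Because get_decisions_as_dicts() returns plain dicts, per-request summaries are straightforward. A sketch using hand-built sample dicts in the documented shape (the values here are made up, standing in for real handler output):

```python
from collections import Counter

# Stand-in for handler.get_decisions_as_dicts() output (sample data).
decisions = [
    {"decision_type": "retriever", "function_name": "retriever"},
    {"decision_type": "llm", "function_name": "gpt-4o-mini"},
    {"decision_type": "query", "function_name": "query_engine"},
]

# Count captured decisions per type before clearing the handler.
counts = Counter(d["decision_type"] for d in decisions)
print(dict(counts))  # {'retriever': 1, 'llm': 1, 'query': 1}
```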
CapturedDecision Fields
BriefcaseLlamaIndexHandler produces CapturedDecision objects with these fields:
@dataclass
class CapturedDecision:
    decision_id: str                       # UUID string (matches LlamaIndex event_id)
    decision_type: str                     # "llm", "embedding", "retriever", "query", "synthesize"
    function_name: str                     # model name or "retriever" / "query_engine" / "synthesizer"
    inputs: Dict[str, Any]                 # truncated input data
    outputs: Dict[str, Any]                # truncated output data
    model_parameters: Dict[str, Any]       # temperature, max_tokens, etc. (LLM only)
    error: Optional[str]                   # set if an exception event fired
    started_at: Optional[datetime]
    ended_at: Optional[datetime]
    execution_time_ms: Optional[float]
    parent_run_id: Optional[str]           # parent event_id for nesting
    engagement_id: str
    workstream_id: str
    token_usage: Optional[Dict[str, int]]  # prompt/completion/total (LLM only)
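execution_time_ms is presumably just the elapsed wall-clock time between started_at and ended_at. A sketch of that relationship, constructing the two timestamps by hand:

```python
from datetime import datetime, timedelta

# Hand-built timestamps standing in for a captured decision's fields.
started_at = datetime(2024, 1, 1, 12, 0, 0)
ended_at = started_at + timedelta(milliseconds=340)

# Derive the elapsed time in milliseconds from the two datetimes.
execution_time_ms = (ended_at - started_at).total_seconds() * 1000
print(execution_time_ms)  # 340.0
```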
What Each Event Type Records
LLM calls (decision_type = "llm"):
- inputs.messages — chat messages (role + content, truncated)
- inputs.template — prompt template string if present
- outputs.text — generated completion text
- token_usage — {"prompt_tokens": N, "completion_tokens": N, "total_tokens": N}
- model_parameters — temperature, max_tokens, top_p from model config
Embedding operations (decision_type = "embedding"):
- inputs.text_count — number of texts embedded
- outputs.embedding_count — number of embeddings returned
- outputs.dimensions — dimensionality of the embedding vectors
Retrieval events (decision_type = "retriever"):
- inputs.query — the retrieval query string
- outputs.document_count — number of nodes returned
- outputs.documents — list of {content_preview, score} dicts (200-character content preview)
Query engine calls (decision_type = "query"):
- inputs.query — the user query string
- outputs.response — the final response text
Synthesis steps (decision_type = "synthesize"):
- inputs.query — the query being synthesized
- outputs.response — the synthesized response text
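Because token_usage is populated only for "llm" decisions, aggregating token spend means filtering on decision_type first. A sketch over sample records (field shapes follow the lists above; the numbers are made up):

```python
# Sample captured-decision dicts; only LLM decisions carry token_usage.
decisions = [
    {"decision_type": "retriever", "token_usage": None},
    {"decision_type": "llm",
     "token_usage": {"prompt_tokens": 120, "completion_tokens": 35, "total_tokens": 155}},
    {"decision_type": "llm",
     "token_usage": {"prompt_tokens": 80, "completion_tokens": 20, "total_tokens": 100}},
]

# Sum total_tokens across LLM decisions, skipping records without usage data.
total_tokens = sum(
    d["token_usage"]["total_tokens"]
    for d in decisions
    if d["decision_type"] == "llm" and d["token_usage"]
)
print(total_tokens)  # 255
```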
Ignoring Event Types
Use event_starts_to_ignore and event_ends_to_ignore to skip specific
LlamaIndex event types:
handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    event_starts_to_ignore=["chunking", "templating"],
    event_ends_to_ignore=["chunking", "templating"],
)
Valid event type strings: "llm", "embedding", "retrieve", "query",
"synthesize", "chunking", "reranking", "exception", "templating",
"sub_question", "tree", "agent_step".
Advanced: Per-Instance Exporter
Override the global exporter for this handler:
from briefcase_ai.exporters import SplunkHECExporter
handler = BriefcaseLlamaIndexHandler(
    engagement_id="acme",
    workstream_id="rag",
    exporter=SplunkHECExporter(
        url="https://splunk.example.com:8088",
        token="your-hec-token",
    ),
)
Advanced: Export on Query Completion
When a top-level query event ends, the handler calls _trigger_export, which
passes the decision record to the configured exporter. The export runs in a
background daemon thread when async_capture=True (the default).
To configure a global exporter:
from briefcase_ai.config import setup
from briefcase_ai.exporters import OTelExporter
setup(
    exporter=OTelExporter(
        endpoint="http://localhost:4317",
        service_name="my-rag-service",
    )
)
Troubleshooting
No decisions captured: Confirm you called Settings.callback_manager.add_handler(handler)
before building the index or running queries. The callback manager must be set
before LlamaIndex initialises its components.
Missing token usage: Token usage is populated only when the LLM response
payload includes a token_count object. This depends on your LLM provider and
the llama_index integration package version.
event_id mismatch: LlamaIndex generates a UUID event_id on on_event_start
and passes the same ID to on_event_end. If on_event_start is called with an
empty event_id, the handler generates one internally — ensure you pass the
returned ID to on_event_end if calling the handler manually.
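The start/end ID handoff can be illustrated with a stripped-down stand-in for the handler (a toy class, not the real BriefcaseLlamaIndexHandler; it only mimics the event_id contract described above):

```python
import uuid

class IdEchoHandler:
    """Toy stand-in mimicking the event_id handoff described above."""

    def __init__(self):
        self.open_events = {}

    def on_event_start(self, event_type, payload=None, event_id=""):
        # Generate an ID internally if the caller passed an empty one.
        event_id = event_id or str(uuid.uuid4())
        self.open_events[event_id] = event_type
        return event_id

    def on_event_end(self, event_type, payload=None, event_id=""):
        # The ID returned from on_event_start must come back here.
        if event_id not in self.open_events:
            raise KeyError(f"unknown event_id: {event_id}")
        del self.open_events[event_id]

handler = IdEchoHandler()
eid = handler.on_event_start("llm", event_id="")  # empty -> generated internally
handler.on_event_end("llm", event_id=eid)         # pass the returned ID back
print(len(handler.open_events))  # 0
```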
See Also
- Integrations Overview — comparison table
- LangChain Integration — if you use LangChain wrappers around LlamaIndex
- Infrastructure — Exporters — all export targets