LlamaIndex Integration

BriefcaseLlamaIndexHandler is a LlamaIndex callback handler that captures AI decisions for every LLM call, retrieval, query engine operation, embedding, and synthesis step in your LlamaIndex application.

Installation

pip install briefcase-ai
pip install llama-index-core # or your llama_index provider packages

Quick Setup with auto()

The simplest way to instrument LlamaIndex is one line:

import briefcase_ai

briefcase_ai.auto("llamaindex")
# That's it — all LlamaIndex query engine calls are now traced.

Constructor

from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler

handler = BriefcaseLlamaIndexHandler(
    engagement_id="",               # project identifier
    workstream_id="",               # workflow identifier
    capture_llm=True,               # capture LLM calls
    capture_embeddings=True,        # capture embedding operations
    capture_retrievals=True,        # capture retriever queries
    capture_queries=True,           # capture query engine + synthesizer calls
    max_input_chars=10000,          # truncation limit for inputs
    max_output_chars=10000,         # truncation limit for outputs
    event_starts_to_ignore=None,    # list of EventType strings to skip on start
    event_ends_to_ignore=None,      # list of EventType strings to skip on end
    exporter=None,                  # per-instance BaseExporter override
    async_capture=True,             # export runs in background thread
)
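The two `max_*_chars` limits cap how much input and output text is stored on each decision. As a rough sketch of the kind of clipping this implies (illustrative only — `truncate` is a hypothetical helper, not the handler's actual implementation):

```python
def truncate(text: str, limit: int, marker: str = "... [truncated]") -> str:
    """Clip text to `limit` characters, appending a marker when clipped."""
    if len(text) <= limit:
        return text
    return text[:limit] + marker

# Short prompts pass through untouched:
assert truncate("short prompt", 10_000) == "short prompt"
# Oversized payloads are clipped before being stored on the decision:
clipped = truncate("x" * 20_000, 10_000)
assert clipped.endswith("... [truncated]")
```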

Basic Usage

Register the handler via Settings.callback_manager for global instrumentation:

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler

handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    workstream_id="rag-pipeline",
)

Settings.callback_manager.add_handler(handler)

# Build your index and run queries — handler captures automatically
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

response = query_engine.query("What are the key findings?")
print(response)

for decision in handler.get_decisions():
    print(f"{decision.decision_type}: {decision.function_name}")
    print(f"  execution: {decision.execution_time_ms:.1f}ms")
    if decision.token_usage:
        print(f"  tokens: {decision.token_usage}")

Retrieving Decisions

# Get CapturedDecision objects
decisions = handler.get_decisions()

# Get serializable dicts (for storage or logging)
dicts = handler.get_decisions_as_dicts()

print(f"Captured {handler.decision_count} decisions")

# Reset for next request
handler.clear()
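Because get_decisions_as_dicts() returns plain dicts, you can aggregate over a request before calling clear(). A sketch of summing token usage across a batch (assuming the dict shape shown in the CapturedDecision fields; `total_tokens` is a hypothetical helper):

```python
def total_tokens(decision_dicts):
    """Sum token usage across all decisions that carry a token_usage dict."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for d in decision_dicts:
        usage = d.get("token_usage") or {}  # non-LLM decisions carry None
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

sample = [
    {"decision_type": "llm",
     "token_usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}},
    {"decision_type": "retriever", "token_usage": None},
]
assert total_tokens(sample)["total_tokens"] == 150
```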

CapturedDecision Fields

BriefcaseLlamaIndexHandler produces CapturedDecision objects with these fields:

@dataclass
class CapturedDecision:
    decision_id: str                        # UUID string (matches LlamaIndex event_id)
    decision_type: str                      # "llm", "embedding", "retriever", "query", "synthesize"
    function_name: str                      # model name or "retriever" / "query_engine" / "synthesizer"
    inputs: Dict[str, Any]                  # truncated input data
    outputs: Dict[str, Any]                 # truncated output data
    model_parameters: Dict[str, Any]        # temperature, max_tokens, etc. (LLM only)
    error: Optional[str]                    # set if an exception event fired
    started_at: Optional[datetime]
    ended_at: Optional[datetime]
    execution_time_ms: Optional[float]
    parent_run_id: Optional[str]            # parent event_id for nesting
    engagement_id: str
    workstream_id: str
    token_usage: Optional[Dict[str, int]]   # prompt/completion/total (LLM only)

What Each Event Type Records

LLM calls (decision_type = "llm"):

  • inputs.messages — chat messages (role + content, truncated)
  • inputs.template — prompt template string if present
  • outputs.text — generated completion text
  • token_usage — {"prompt_tokens": N, "completion_tokens": N, "total_tokens": N}
  • model_parameters — temperature, max_tokens, top_p from model config

Embedding operations (decision_type = "embedding"):

  • inputs.text_count — number of texts embedded
  • outputs.embedding_count — number of embeddings returned
  • outputs.dimensions — dimensionality of the embedding vectors

Retrieval events (decision_type = "retriever"):

  • inputs.query — the retrieval query string
  • outputs.document_count — number of nodes returned
  • outputs.documents — list of {content_preview, score} dicts (200 char preview)

Query engine calls (decision_type = "query"):

  • inputs.query — the user query string
  • outputs.response — the final response text

Synthesis steps (decision_type = "synthesize"):

  • inputs.query — the query being synthesized
  • outputs.response — the synthesized response text
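The decision_type field makes it straightforward to slice a captured trace by pipeline stage. For example (a sketch over the dict form; `summarize_by_type` is a hypothetical helper, not part of the handler API):

```python
from collections import Counter

def summarize_by_type(decision_dicts):
    """Count captured decisions per decision_type."""
    return Counter(d["decision_type"] for d in decision_dicts)

sample = [
    {"decision_type": "retriever"},
    {"decision_type": "llm"},
    {"decision_type": "llm"},
    {"decision_type": "query"},
]
counts = summarize_by_type(sample)
assert counts["llm"] == 2 and counts["retriever"] == 1
```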

Event Types Ignored

Use event_starts_to_ignore and event_ends_to_ignore to skip specific LlamaIndex event types:

handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    event_starts_to_ignore=["chunking", "templating"],
    event_ends_to_ignore=["chunking", "templating"],
)

Valid event type strings: "llm", "embedding", "retrieve", "query", "synthesize", "chunking", "reranking", "exception", "templating", "sub_question", "tree", "agent_step".

Advanced: Per-Instance Exporter

Override the global exporter for this handler:

from briefcase_ai.exporters import SplunkHECExporter

handler = BriefcaseLlamaIndexHandler(
    engagement_id="acme",
    workstream_id="rag",
    exporter=SplunkHECExporter(
        url="https://splunk.example.com:8088",
        token="your-hec-token",
    ),
)

Advanced: Export on Query Completion

When a top-level query event ends, the handler calls _trigger_export, which passes the decision record to the configured exporter. When async_capture=True (the default), the export runs in a background daemon thread so it does not block the query path.
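A minimal sketch of this background-export pattern (illustrative only, not the library's actual code; `export_async` and `_ListExporter` are hypothetical):

```python
import threading

def export_async(exporter, record):
    """Run exporter.export(record) in a daemon thread so the caller isn't blocked."""
    t = threading.Thread(target=exporter.export, args=(record,), daemon=True)
    t.start()
    return t

class _ListExporter:
    """Toy exporter that just collects records, for demonstration."""
    def __init__(self):
        self.exported = []
    def export(self, record):
        self.exported.append(record)

exp = _ListExporter()
t = export_async(exp, {"decision_type": "query"})
t.join()  # in the real handler the thread completes in the background
assert exp.exported == [{"decision_type": "query"}]
```

Daemon threads keep export latency off the query path, at the cost that in-flight exports may be dropped if the process exits abruptly.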

To configure a global exporter:

from briefcase_ai.config import setup
from briefcase_ai.exporters import OTelExporter

setup(
    exporter=OTelExporter(
        endpoint="http://localhost:4317",
        service_name="my-rag-service",
    )
)

Troubleshooting

No decisions captured: Confirm you called Settings.callback_manager.add_handler(handler) before building the index or running queries. The callback manager must be set before LlamaIndex initialises its components.

Missing token usage: Token usage is populated only when the LLM response payload includes a token_count object. This depends on your LLM provider and the llama_index integration package version.

event_id mismatch: LlamaIndex generates a UUID event_id on on_event_start and passes the same ID to on_event_end. If on_event_start is called with an empty event_id, the handler generates one internally — ensure you pass the returned ID to on_event_end if calling the handler manually.
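Since each decision carries its decision_id and the parent_run_id of its enclosing event, you can reconstruct the event nesting from a flat decision list. A sketch (`build_tree` is a hypothetical helper, not part of the handler API):

```python
def build_tree(decision_dicts):
    """Group child decision_ids under their parent_run_id."""
    children = {}
    for d in decision_dicts:
        parent = d.get("parent_run_id")
        if parent is not None:  # top-level events have no parent
            children.setdefault(parent, []).append(d["decision_id"])
    return children

flat = [
    {"decision_id": "q1", "parent_run_id": None, "decision_type": "query"},
    {"decision_id": "r1", "parent_run_id": "q1", "decision_type": "retriever"},
    {"decision_id": "l1", "parent_run_id": "q1", "decision_type": "llm"},
]
assert build_tree(flat) == {"q1": ["r1", "l1"]}
```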

See Also