LlamaIndex Integration
BriefcaseLlamaIndexHandler is a LlamaIndex callback handler that captures
AI decisions for every LLM call, retrieval, query engine operation, embedding,
and synthesis step in your LlamaIndex application.
Installation
pip install briefcase-ai
pip install llama-index-core # or your llama_index provider packages
Quick Setup with auto()
The simplest way to instrument LlamaIndex is one line:
import briefcase_ai
briefcase_ai.auto("llamaindex")
# That's it — all LlamaIndex query engine calls are now traced.
Constructor
from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler
handler = BriefcaseLlamaIndexHandler(
    engagement_id="",              # project identifier
    workstream_id="",              # workflow identifier
    capture_llm=True,              # capture LLM calls
    capture_embeddings=True,       # capture embedding operations
    capture_retrievals=True,       # capture retriever queries
    capture_queries=True,          # capture query engine + synthesizer calls
    max_input_chars=10000,         # truncation limit for inputs
    max_output_chars=10000,        # truncation limit for outputs
    event_starts_to_ignore=None,   # list of EventType strings to skip on start
    event_ends_to_ignore=None,     # list of EventType strings to skip on end
    exporter=None,                 # per-instance BaseExporter override
    async_capture=True,            # export runs in background thread
)
Basic Usage
Register the handler via Settings.callback_manager for global instrumentation:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from briefcase_ai.integrations.frameworks import BriefcaseLlamaIndexHandler
handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    workstream_id="rag-pipeline",
)
Settings.callback_manager.add_handler(handler)
# Build your index and run queries — handler captures automatically
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings?")
print(response)
for decision in handler.get_decisions():
    print(f"{decision.decision_type}: {decision.function_name}")
    if decision.execution_time_ms is not None:
        print(f"  execution: {decision.execution_time_ms:.1f}ms")
    if decision.token_usage:
        print(f"  tokens: {decision.token_usage}")
Retrieving Decisions
# Get CapturedDecision objects
decisions = handler.get_decisions()
# Get serializable dicts (for storage or logging)
dicts = handler.get_decisions_as_dicts()
print(f"Captured {handler.decision_count} decisions")
# Reset for next request
handler.clear()
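Because get_decisions_as_dicts() returns plain dicts, per-request summaries are straightforward. A sketch using hand-built sample dicts in the documented shape (the values here are made up, standing in for real handler output):

```python
from collections import Counter

# Stand-in for handler.get_decisions_as_dicts() output (sample data).
decisions = [
    {"decision_type": "retriever", "function_name": "retriever"},
    {"decision_type": "llm", "function_name": "gpt-4o-mini"},
    {"decision_type": "query", "function_name": "query_engine"},
]

# Count captured decisions per type before clearing the handler.
counts = Counter(d["decision_type"] for d in decisions)
print(dict(counts))  # {'retriever': 1, 'llm': 1, 'query': 1}
```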
CapturedDecision Fields
BriefcaseLlamaIndexHandler produces CapturedDecision objects with these fields:
@dataclass
class CapturedDecision:
    decision_id: str                       # UUID string (matches LlamaIndex event_id)
    decision_type: str                     # "llm", "embedding", "retriever", "query", "synthesize"
    function_name: str                     # model name or "retriever" / "query_engine" / "synthesizer"
    inputs: Dict[str, Any]                 # truncated input data
    outputs: Dict[str, Any]                # truncated output data
    model_parameters: Dict[str, Any]       # temperature, max_tokens, etc. (LLM only)
    error: Optional[str]                   # set if an exception event fired
    started_at: Optional[datetime]
    ended_at: Optional[datetime]
    execution_time_ms: Optional[float]
    parent_run_id: Optional[str]           # parent event_id for nesting
    engagement_id: str
    workstream_id: str
    token_usage: Optional[Dict[str, int]]  # prompt/completion/total (LLM only)
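execution_time_ms is presumably just the elapsed wall-clock time between started_at and ended_at. A sketch of that relationship, constructing the two timestamps by hand:

```python
from datetime import datetime, timedelta

# Hand-built timestamps standing in for a captured decision's fields.
started_at = datetime(2024, 1, 1, 12, 0, 0)
ended_at = started_at + timedelta(milliseconds=340)

# Derive the elapsed time in milliseconds from the two datetimes.
execution_time_ms = (ended_at - started_at).total_seconds() * 1000
print(execution_time_ms)  # 340.0
```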
What Each Event Type Records
LLM calls (decision_type = "llm"):
- inputs.messages — chat messages (role + content, truncated)
- inputs.template — prompt template string if present
- outputs.text — generated completion text
- token_usage — {"prompt_tokens": N, "completion_tokens": N, "total_tokens": N}
- model_parameters — temperature, max_tokens, top_p from model config
Embedding operations (decision_type = "embedding"):
- inputs.text_count — number of texts embedded
- outputs.embedding_count — number of embeddings returned
- outputs.dimensions — dimensionality of the embedding vectors
Retrieval events (decision_type = "retriever"):
- inputs.query — the retrieval query string
- outputs.document_count — number of nodes returned
- outputs.documents — list of {content_preview, score} dicts (200-character content preview)
Query engine calls (decision_type = "query"):
- inputs.query — the user query string
- outputs.response — the final response text
Synthesis steps (decision_type = "synthesize"):
- inputs.query — the query being synthesized
- outputs.response — the synthesized response text
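Because token_usage is populated only for "llm" decisions, aggregating token spend means filtering on decision_type first. A sketch over sample records (field shapes follow the lists above; the numbers are made up):

```python
# Sample captured-decision dicts; only LLM decisions carry token_usage.
decisions = [
    {"decision_type": "retriever", "token_usage": None},
    {"decision_type": "llm",
     "token_usage": {"prompt_tokens": 120, "completion_tokens": 35, "total_tokens": 155}},
    {"decision_type": "llm",
     "token_usage": {"prompt_tokens": 80, "completion_tokens": 20, "total_tokens": 100}},
]

# Sum total_tokens across LLM decisions, skipping records without usage data.
total_tokens = sum(
    d["token_usage"]["total_tokens"]
    for d in decisions
    if d["decision_type"] == "llm" and d["token_usage"]
)
print(total_tokens)  # 255
```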
Ignoring Event Types
Use event_starts_to_ignore and event_ends_to_ignore to skip specific
LlamaIndex event types:
handler = BriefcaseLlamaIndexHandler(
    engagement_id="my-project",
    event_starts_to_ignore=["chunking", "templating"],
    event_ends_to_ignore=["chunking", "templating"],
)
Valid event type strings: "llm", "embedding", "retrieve", "query",
"synthesize", "chunking", "reranking", "exception", "templating",
"sub_question", "tree", "agent_step".
Advanced: Per-Instance Exporter
Override the global exporter for this handler:
from briefcase_ai.exporters import SplunkHECExporter
handler = BriefcaseLlamaIndexHandler(
    engagement_id="acme",
    workstream_id="rag",
    exporter=SplunkHECExporter(
        url="https://splunk.example.com:8088",
        token="your-hec-token",
    ),
)
Advanced: Export on Query Completion
When a top-level query event ends, the handler calls _trigger_export, which
passes the decision record to the configured exporter. The export runs in a
background daemon thread when async_capture=True (the default).
To configure a global exporter:
from briefcase_ai.config import setup
from briefcase_ai.exporters import OTelExporter
setup(
    exporter=OTelExporter(
        endpoint="http://localhost:4317",
        service_name="my-rag-service",
    )
)
Troubleshooting
No decisions captured: Confirm you called Settings.callback_manager.add_handler(handler)
before building the index or running queries. The callback manager must be set
before LlamaIndex initialises its components.
Missing token usage: Token usage is populated only when the LLM response
payload includes a token_count object. This depends on your LLM provider and
the llama_index integration package version.
event_id mismatch: LlamaIndex generates a UUID event_id on on_event_start
and passes the same ID to on_event_end. If on_event_start is called with an
empty event_id, the handler generates one internally — ensure you pass the
returned ID to on_event_end if calling the handler manually.
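The start/end ID handoff can be illustrated with a stripped-down stand-in for the handler (a toy class, not the real BriefcaseLlamaIndexHandler; it only mimics the event_id contract described above):

```python
import uuid

class IdEchoHandler:
    """Toy stand-in mimicking the event_id handoff described above."""

    def __init__(self):
        self.open_events = {}

    def on_event_start(self, event_type, payload=None, event_id=""):
        # Generate an ID internally if the caller passed an empty one.
        event_id = event_id or str(uuid.uuid4())
        self.open_events[event_id] = event_type
        return event_id

    def on_event_end(self, event_type, payload=None, event_id=""):
        # The ID returned from on_event_start must come back here.
        if event_id not in self.open_events:
            raise KeyError(f"unknown event_id: {event_id}")
        del self.open_events[event_id]

handler = IdEchoHandler()
eid = handler.on_event_start("llm", event_id="")  # empty -> generated internally
handler.on_event_end("llm", event_id=eid)         # pass the returned ID back
print(len(handler.open_events))  # 0
```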
See Also
- Integrations Overview — comparison table
- LangChain Integration — if you use LangChain wrappers around LlamaIndex
- Infrastructure — Exporters — all export targets