LLM Interaction Playbook

This playbook provides deterministic, SDK-aware patterns for LLM workflows that need reproducible behavior, traceable decisions, and operational review loops.

1) Prompt Contract Template

Define a strict contract for each task:

task_id: claim_adjudication_v1
objective: Decide approve/deny with rationale
inputs:
  - claim_text: string
  - policy_refs: [string]
constraints:
  - output must be valid JSON
  - rationale must cite at least 1 policy reference
output_schema:
  decision: enum[approve, deny]
  confidence: float[0,1]
  rationale: string
  citations: [string]

Reuse the contract's task_id as the DecisionSnapshot(function_name=...) value so replay and analytics stay joinable.
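
The task_id/function_name pairing can be sketched as a small in-process registry (the registry itself is illustrative, not part of the SDK):

```python
# Hypothetical registry: one canonical contract per task_id, and the same
# string is later passed to DecisionSnapshot as the function_name.
CONTRACTS = {
    "claim_adjudication_v1": {
        "objective": "Decide approve/deny with rationale",
        "required_fields": ["decision", "confidence", "rationale", "citations"],
    },
}

def function_name_for(task_id: str) -> str:
    # Refuse to capture decisions for tasks that have no registered contract.
    if task_id not in CONTRACTS:
        raise KeyError(f"no contract registered for {task_id!r}")
    return task_id
```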

2) Schema-First Output Enforcement

Always validate model output structure before business use.

import json
import briefcase_ai

briefcase_ai.init()

def parse_model_output(raw: str) -> dict:
    payload = json.loads(raw)
    required = {"decision", "confidence", "rationale", "citations"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload
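
Field presence is only half the contract; the enum and range constraints from the output_schema can be enforced with equally plain checks. A sketch (constants and error messages are illustrative):

```python
ALLOWED_DECISIONS = {"approve", "deny"}

def check_contract(payload: dict) -> dict:
    # Enforce output_schema constraints: decision enum, confidence range,
    # and the at-least-one-citation rule from the prompt contract.
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"invalid decision: {payload['decision']!r}")
    if not 0.0 <= float(payload["confidence"]) <= 1.0:
        raise ValueError(f"confidence out of range: {payload['confidence']}")
    if not payload["citations"]:
        raise ValueError("at least one policy citation is required")
    return payload
```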

3) Validation-Before-Execution Pattern

If prompts reference versioned knowledge, validate those references before execution.

from briefcase.validation import PromptValidationEngine

validator = PromptValidationEngine()
report = validator.validate(prompt)
if report.status != "passed":
    raise RuntimeError({"status": report.status, "errors": [e.message for e in report.errors]})
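
The same fail-fast pattern can be sketched without the validation engine, assuming references resolve to files under a local root (paths and the error shape here are assumptions, not SDK behavior):

```python
from pathlib import Path

def validate_references(refs: list[str], root: str = "policies") -> None:
    # Fail fast before execution if any referenced document cannot be resolved.
    missing = [r for r in refs if not (Path(root) / r).exists()]
    if missing:
        raise RuntimeError({
            "status": "failed",
            "errors": [f"unresolved reference: {m}" for m in missing],
        })
```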

4) Capture Every Decision Snapshot

decision = briefcase_ai.DecisionSnapshot("claim_adjudication_v1")
decision.add_input(briefcase_ai.Input("prompt", prompt, "string"))
decision.add_input(briefcase_ai.Input("model", "gpt-4", "string"))
output = briefcase_ai.Output("response", raw_response, "string")
output.with_confidence(0.92)
decision.add_output(output)
decision.add_tag("workflow", "prior_auth")
decision.add_tag("tenant", "example_tenant")

Store it:

store = briefcase_ai.SqliteBackend.in_memory()
decision_id = store.save_decision(decision)

5) Correlation IDs and Tracing

For multi-step chains, attach one correlation ID across all steps.

from briefcase.correlation import briefcase_workflow

with briefcase_workflow("prior_auth", client) as workflow:
    # agent A -> agent B -> decision capture
    pass

Recommended tags/attributes:

  • workflow
  • correlation_id
  • tenant
  • model
  • validation.mode
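
Outside the workflow context manager, correlation-ID propagation can be sketched with stdlib contextvars: mint one ID per workflow and have every step read the same value (function names here are illustrative, not SDK API):

```python
import contextvars
import uuid

# One correlation ID per workflow run, visible to every step in the chain.
_correlation_id = contextvars.ContextVar("correlation_id", default=None)

def start_workflow(name: str) -> str:
    cid = f"{name}-{uuid.uuid4().hex}"
    _correlation_id.set(cid)
    return cid

def step_tags(workflow: str, tenant: str, model: str, validation_mode: str) -> dict:
    # Every step emits the full recommended tag set with the shared ID.
    return {
        "workflow": workflow,
        "correlation_id": _correlation_id.get(),
        "tenant": tenant,
        "model": model,
        "validation.mode": validation_mode,
    }
```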

6) Retry and Error Envelope Standard

Return structured failures, not free-form strings:

{
  "ok": false,
  "error_type": "validation_failed",
  "retryable": false,
  "details": {
    "code": 409,
    "reference": "policies/medicare.pdf"
  }
}
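
A small constructor (a sketch; the helper name is an assumption) keeps every failure in this shape so callers branch on fields, never on message text:

```python
def error_envelope(error_type: str, retryable: bool, **details) -> dict:
    # Structured failure envelope: callers inspect error_type/retryable,
    # and details carries machine-readable context for escalation.
    return {
        "ok": False,
        "error_type": error_type,
        "retryable": retryable,
        "details": details,
    }
```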

Retry policy defaults:

  • transport timeouts: retry with exponential backoff
  • schema validation failures: do not blindly retry; regenerate with stricter system prompt
  • knowledge resolution failures: fail fast and escalate
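
The transport-timeout default above can be sketched with a plain backoff helper (names and defaults are assumptions, not SDK API); note that non-retryable error types simply propagate:

```python
import time

def with_retries(fn, *, max_attempts=3, base_delay=0.5, retryable=(TimeoutError,)):
    # Retry only transport-style errors with exponential backoff;
    # validation and resolution failures surface immediately.
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```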

7) Deterministic Replay Regression Loop

Use replay after prompt/model changes:

engine = briefcase_ai.ReplayEngine(store)
result = engine.replay(decision_id, "strict")

For change analysis across revisions:

  • replay representative snapshots
  • compare outputs and confidence
  • gate rollout if drift exceeds threshold
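
The gating step can be sketched as a plain comparison over replayed snapshots (field names and the drift threshold are assumptions):

```python
def gate_rollout(baseline: dict, candidate: dict, max_drift: float = 0.05) -> bool:
    # Block rollout on any output change, or when confidence drifts
    # beyond the allowed threshold between revisions.
    if baseline["response"] != candidate["response"]:
        return False
    return abs(baseline["confidence"] - candidate["confidence"]) <= max_drift
```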

8) Minimal Server API Flow

  1. POST /api/v1/decisions
  2. POST /api/v1/replay/{id}
  3. POST /api/v1/diff

Use stable IDs and tags in all requests for joinability.

9) Minimal WASM Flow

import { init, JsDecisionSnapshot, JsMemoryStorage } from "briefcase-wasm";

await init();
const d = new JsDecisionSnapshot("chat_completion");
d.add_input("prompt", "Summarize this", "string");
d.add_output("response", "Summary", "string");
const mem = new JsMemoryStorage();
const id = mem.save_decision(d);

10) Safe Defaults

  • Use explicit schemas for every LLM output.
  • Validate prompt references before execution when using external knowledge.
  • Record every request/response pair with a snapshot.
  • Use strict replay mode for release candidates.
  • Keep one canonical prompt contract per function name.

Anti-Patterns

  • Parsing natural-language output without schema checks.
  • Mixing multiple task intents under one function_name.
  • Running retries that hide deterministic validation failures.
  • Using transient identifiers that prevent replay correlation.