LLM Interaction Playbook
This playbook provides deterministic, SDK-aware patterns for LLM workflows that need reproducible behavior, traceable decisions, and operational review loops.
1) Prompt Contract Template
Define a strict contract for each task:
```yaml
task_id: claim_adjudication_v1
objective: Decide approve/deny with rationale
inputs:
  - claim_text: string
  - policy_refs: [string]
constraints:
  - output must be valid JSON
  - rationale must cite at least 1 policy reference
output_schema:
  decision: enum[approve, deny]
  confidence: float[0,1]
  rationale: string
  citations: [string]
```
Use the same task_id as DecisionSnapshot(function_name=...) so replay and analytics are stable.
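To keep the contract and the snapshot in sync, one option is to define the task_id once and reference it from both places. A minimal sketch in plain Python; `TASK_ID` and `CONTRACT` are hypothetical names, an in-code mirror of the YAML above rather than an SDK construct:

```python
# Single source of truth for the task identifier; reuse it for
# DecisionSnapshot(function_name=...) so replay queries stay joinable.
TASK_ID = "claim_adjudication_v1"

# Hypothetical in-code mirror of the prompt contract.
CONTRACT = {
    "task_id": TASK_ID,
    "objective": "Decide approve/deny with rationale",
    "required_output_fields": {"decision", "confidence", "rationale", "citations"},
}
```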
2) Schema-First Output Enforcement
Always validate model output structure before business use.
```python
import json

import briefcase_ai

briefcase_ai.init()

def parse_model_output(raw: str) -> dict:
    payload = json.loads(raw)
    required = {"decision", "confidence", "rationale", "citations"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return payload
```
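Structural checks catch missing keys but not invalid values. A hedged extension, in plain Python and independent of any SDK, that also enforces the enum, confidence bounds, and citation constraint from the contract:

```python
ALLOWED_DECISIONS = {"approve", "deny"}

def validate_output_values(payload: dict) -> dict:
    """Enforce value-level constraints from the output_schema."""
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"decision must be one of {sorted(ALLOWED_DECISIONS)}")
    if not 0.0 <= payload["confidence"] <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if not payload["citations"]:
        raise ValueError("rationale must cite at least one policy reference")
    return payload
```

Run it after the structural parse so a syntactically valid but semantically wrong response is rejected before business use.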
3) Validation-Before-Execution Pattern
If prompts reference versioned knowledge, validate first.
```python
from briefcase.validation import PromptValidationEngine

validator = PromptValidationEngine()
report = validator.validate(prompt)
if report.status != "passed":
    raise RuntimeError(
        {"status": report.status, "errors": [e.message for e in report.errors]}
    )
```
4) Capture Every Decision Snapshot
```python
decision = briefcase_ai.DecisionSnapshot("claim_adjudication_v1")
decision.add_input(briefcase_ai.Input("prompt", prompt, "string"))
decision.add_input(briefcase_ai.Input("model", "gpt-4", "string"))

output = briefcase_ai.Output("response", raw_response, "string")
output.with_confidence(0.92)
decision.add_output(output)

decision.add_tag("workflow", "prior_auth")
decision.add_tag("tenant", "example_tenant")
```
Store it:
```python
store = briefcase_ai.SqliteBackend.in_memory()
decision_id = store.save_decision(decision)
```
5) Correlation IDs and Tracing
For multi-step chains, attach one correlation ID across all steps.
```python
from briefcase.correlation import briefcase_workflow

with briefcase_workflow("prior_auth", client) as workflow:
    # agent A -> agent B -> decision capture
    pass
```
Recommended tags/attributes:
- workflow
- correlation_id
- tenant
- model
- validation.mode
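Outside the SDK, the same propagation pattern can be sketched with the standard library: generate one correlation ID per workflow run and make it implicitly available to every step. A minimal illustration using `contextvars`; the helper names (`workflow_scope`, `current_correlation_id`) are hypothetical:

```python
import uuid
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the correlation ID for the current workflow run.
_correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

@contextmanager
def workflow_scope(name: str):
    """Assign one correlation ID shared by all steps inside the scope."""
    token = _correlation_id.set(f"{name}-{uuid.uuid4()}")
    try:
        yield _correlation_id.get()
    finally:
        _correlation_id.reset(token)

def current_correlation_id() -> str:
    """Read the correlation ID from anywhere inside the scope."""
    return _correlation_id.get()
```

Each step (agent call, validation, decision capture) reads `current_correlation_id()` and attaches it as a tag, so the chain stays joinable without threading an ID through every function signature.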
6) Retry and Error Envelope Standard
Return structured failures, not free-form strings:
```json
{
  "ok": false,
  "error_type": "validation_failed",
  "retryable": false,
  "details": {
    "code": 409,
    "reference": "policies/medicare.pdf"
  }
}
```
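A small helper keeps envelopes uniform across call sites. A sketch only; the field names follow the example above, and `error_envelope` is a hypothetical name:

```python
def error_envelope(error_type: str, retryable: bool, **details) -> dict:
    """Build a structured failure instead of a free-form string."""
    return {
        "ok": False,
        "error_type": error_type,
        "retryable": retryable,
        "details": details,
    }
```

Example: `error_envelope("validation_failed", retryable=False, code=409, reference="policies/medicare.pdf")`.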
Retry policy defaults:
- transport timeouts: retry with exponential backoff
- schema validation failures: do not blindly retry; regenerate with stricter system prompt
- knowledge resolution failures: fail fast and escalate
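The defaults above can be sketched as a small wrapper: transport timeouts are retried with exponential backoff, while deterministic failures surface immediately. A minimal illustration; `ValidationError` here is a hypothetical stand-in for whatever exception your validation layer raises:

```python
import time

class ValidationError(Exception):
    """Stand-in for a deterministic schema/knowledge failure."""

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry transport timeouts with exponential backoff; never retry
    deterministic validation failures."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ValidationError:
            raise  # deterministic: retrying would only hide the failure
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

On a validation failure, regenerate with a stricter system prompt instead of re-invoking the same call.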
7) Deterministic Replay Regression Loop
Use replay after prompt/model changes:
```python
engine = briefcase_ai.ReplayEngine(store)
result = engine.replay(decision_id, "strict")
```
For change analysis across revisions:
- replay representative snapshots
- compare outputs and confidence
- gate rollout if drift exceeds threshold
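The gate itself can be sketched without the SDK: compare the replayed output against the recorded baseline and block rollout when drift exceeds a threshold. Illustrative only; the decision/confidence dict shape is an assumption, not the replay engine's actual result type:

```python
def drift_gate(baseline: dict, replayed: dict,
               max_confidence_drift: float = 0.05) -> bool:
    """Return True when the replayed decision is safe to roll out."""
    if baseline["decision"] != replayed["decision"]:
        return False  # a flipped decision is always a blocking drift
    drift = abs(baseline["confidence"] - replayed["confidence"])
    return drift <= max_confidence_drift
```

Run this over the representative snapshot set and require every snapshot to pass before promoting a prompt or model change.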
8) Minimal Server API Flow
- POST /api/v1/decisions
- POST /api/v1/replay/{id}
- POST /api/v1/diff
Use stable IDs and tags in all requests for joinability.
9) Minimal WASM Flow
```javascript
import { init, JsDecisionSnapshot, JsMemoryStorage } from "briefcase-wasm";

await init();

const d = new JsDecisionSnapshot("chat_completion");
d.add_input("prompt", "Summarize this", "string");
d.add_output("response", "Summary", "string");

const mem = new JsMemoryStorage();
const id = mem.save_decision(d);
```
10) Safe Defaults
- Use explicit schemas for every LLM output.
- Validate prompt references before execution when using external knowledge.
- Record every request/response pair with a snapshot.
- Use strict replay mode for release candidates.
- Keep one canonical prompt contract per function name.
Anti-Patterns
- Parsing natural-language output without schema checks.
- Mixing multiple task intents under one function_name.
- Running retries that hide deterministic validation failures.
- Using transient identifiers that prevent replay correlation.