lakeFS Integration
Use the lakeFS integration to attach versioned artifact lineage to AI decisions.
Overview
The lakeFS integration instruments knowledge reads and artifact writes with commit-linked metadata. This makes decisions reproducible and traceable to the exact artifact state used at execution time.
Note: lakeFS is one of 8 supported VCS backends in Briefcase AI. For an overview of all available version control systems and how to choose the right one for your use case, see the Version Control Systems guide.
What Engineers Use It For
- Attach commit references to policy and data reads
- Persist generated artifacts with lineage metadata
- Query decision traces with artifact context attached
- Reconstruct historical behavior from versioned object state
Features
- Automatic SHA Capture: No manual tracking required
- Multiple Patterns: Context managers, decorators, or direct client
- OpenTelemetry Integration: Seamless span attribute capture
- Low Operational Friction: Works in existing lakeFS-backed workflows
- Mock Mode: Works without lakeFS for development
- Artifact Lineage Commits: Upload + commit generated artifacts with commit IDs and URIs
Usage Patterns
Context Manager (Recommended)
from briefcase_ai import versioned_context
with versioned_context(client, "example-policy-repo", "main") as lakefs:
    policy = lakefs.read_object("policies/policy.pdf")
    # Automatically tracked with commit SHA
Decorator
from briefcase_ai import versioned
@versioned(repository="example-policy-repo")
def evaluate_claim(claim_data, versioned_client):
    policy = versioned_client.read_object("policies/policy.pdf")
    return evaluate(claim_data, policy)
Direct Client
from briefcase_ai.integrations.lakefs import VersionedClient
lakefs = VersionedClient(
    repository="example-policy-repo",
    branch="main",
    briefcase_client=client,
)
policy = lakefs.read_object("policies/policy.pdf")
Artifact Lineage Client (Upload + Commit)
Use ArtifactLineageClient when you need to version generated artifacts (not just read objects):
from pathlib import Path
from briefcase_ai.integrations.lakefs import ArtifactLineageClient
lineage = ArtifactLineageClient.from_env(
    repository="regulated-ai-demo",
    branch="main",
)
commit = lineage.version_files(
    files={
        "AuditArtifacts/decision_memo.md": Path("output/decision_memo.md"),
        "AuditArtifacts/result.json": Path("output/result.json"),
    },
    message="Publish onboarding decision artifacts",
    metadata={"workflow": "broker_dealer_onboarding"},
)
print(commit.commit_id)
print(lineage.object_uri("AuditArtifacts/result.json", commit.commit_id))
Configuration
The SDK supports multiple configuration methods with the following priority:
- Explicit parameters (highest priority)
- briefcase_client.config dict
- Environment variables (fallback)
- Default endpoint (lowest priority)
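The precedence above can be sketched as a small resolver. This is an illustration of the documented ordering, not the SDK's actual code; the function and the `lakefs_endpoint` config key are hypothetical names:

```python
import os

def resolve_endpoint(explicit=None, config=None,
                     default="https://lakefs.example.com/api/v1"):
    """Sketch of the documented resolution order (illustrative only)."""
    # 1. An explicit parameter wins outright.
    if explicit:
        return explicit
    # 2. Then the briefcase_client.config dict, if it carries an endpoint.
    if config and config.get("lakefs_endpoint"):
        return config["lakefs_endpoint"]
    # 3. Then the LAKEFS_ENDPOINT environment variable.
    env = os.environ.get("LAKEFS_ENDPOINT")
    if env:
        return env
    # 4. Finally the built-in default.
    return default
```

Each source is consulted only when every higher-priority source is absent, so setting environment variables never overrides values passed in code.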
Environment Variables
export LAKEFS_ENDPOINT="https://lakefs.example.com/api/v1"
export LAKEFS_ACCESS_KEY="your_access_key"
export LAKEFS_SECRET_KEY="your_secret_key"
With environment variables set, you can create clients without explicit credentials:
from briefcase_ai.integrations.lakefs import VersionedClient
# Automatically uses LAKEFS_* environment variables
client = VersionedClient(
    repository="example-policy-repo",
    branch="main",
    briefcase_client=briefcase_client,
)
Captured Metadata
All file accesses capture:
- lakefs.commit.sha - Full commit SHA
- lakefs.commit.branch - Branch name
- lakefs.file.path - File path accessed
- lakefs.file.size - File size in bytes
- lakefs.artifact.{path} - Per-file commit SHA
Artifact lineage commits additionally return:
- commit_id - 64-char hash (in live or simulated mode)
- repository, branch, mode
- committed object path mapping
- metadata payload attached to the commit
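Put together, the captured attributes for a single tracked read might look like the following dictionary. All values here are fabricated for illustration; they show the shape of the payload, not real output:

```python
# Hypothetical span-attribute payload for one tracked file read.
commit_sha = "ab12" * 16  # stands in for a real 64-char commit SHA

attributes = {
    "lakefs.commit.sha": commit_sha,
    "lakefs.commit.branch": "main",
    "lakefs.file.path": "policies/policy.pdf",
    "lakefs.file.size": 48213,
    # Per-file commit SHA, keyed by the accessed path.
    "lakefs.artifact.policies/policy.pdf": commit_sha,
}
```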
Rust Backend (LakeFSBackend)
The LakeFSBackend is a first-class StorageBackend implementation in briefcase-core that persists decision snapshots directly to lakeFS. It is separate from the Python VersionedClient/ArtifactLineageClient integrations described above—those operate at the knowledge-base read layer, while LakeFSBackend is the snapshot storage layer for the Rust SDK.
Environment Variables
LakeFSBackend reads configuration from a distinct set of environment variables prefixed with BRIEFCASE_LAKEFS_ (not the LAKEFS_* variables used by the Python SDK):
export BRIEFCASE_LAKEFS_ENDPOINT="https://lakefs.example.com" # no /api/v1 suffix
export BRIEFCASE_LAKEFS_ACCESS_KEY="your_access_key"
export BRIEFCASE_LAKEFS_SECRET_KEY="your_secret_key"
export BRIEFCASE_LAKEFS_REPOSITORY="my-repo"
export BRIEFCASE_LAKEFS_BRANCH="main"
Note: do not include /api/v1 in BRIEFCASE_LAKEFS_ENDPOINT—the client appends it internally.
Programmatic Construction
use briefcase_core::storage::lakefs::{LakeFSBackend, LakeFSConfig};
let config = LakeFSConfig::new(
    "https://lakefs.example.com", // endpoint, no /api/v1 suffix
    "my-repo",
    "main",
    "your_access_key",
    "your_secret_key",
);
let backend = LakeFSBackend::new(config);
Feature Flags
LakeFSBackend is gated behind the lakefs-storage feature, which is enabled by the combined storage + networking features:
[dependencies]
briefcase-core = { version = "*", features = ["async", "storage", "networking"] }
Upload Flow (lakeFS Cloud)
lakeFS Cloud requires a 3-step presigned staging flow for all object uploads. Direct PUT with a body is rejected. The client handles this transparently:
1. GET /repositories/{repo}/branches/{branch}/staging/backing?path=...&presign=true - obtain a presigned S3 URL
2. PUT {presigned_url} - upload bytes directly to S3 (no lakeFS auth header)
3. PUT /repositories/{repo}/branches/{branch}/objects?path=... - link the staged object with { physical_address, checksum, size_bytes }
This is handled automatically by upload_object and is invisible to callers.
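As a rough illustration of the protocol, the three steps can be sketched with the standard library as below. This is not the client's implementation; the response field names and header handling follow the steps above and should be treated as assumptions:

```python
import hashlib
import json
import urllib.parse
import urllib.request

def upload_via_staging(base, repo, branch, path, data, headers):
    """Sketch of the 3-step presigned staging flow (illustrative only)."""
    # Step 1: ask lakeFS for a presigned staging location.
    q = urllib.parse.urlencode({"path": path, "presign": "true"})
    req = urllib.request.Request(
        f"{base}/repositories/{repo}/branches/{branch}/staging/backing?{q}",
        headers=headers,
    )
    with urllib.request.urlopen(req) as resp:
        staging = json.load(resp)

    # Step 2: upload the bytes straight to S3 (no lakeFS auth header).
    put = urllib.request.Request(
        staging["presigned_url"], data=data, method="PUT"
    )
    urllib.request.urlopen(put)

    # Step 3: link the staged object to the branch.
    q = urllib.parse.urlencode({"path": path})
    body = json.dumps({
        "physical_address": staging["physical_address"],
        "checksum": hashlib.md5(data).hexdigest(),
        "size_bytes": len(data),
    }).encode()
    link = urllib.request.Request(
        f"{base}/repositories/{repo}/branches/{branch}/objects?{q}",
        data=body, method="PUT",
        headers={**headers, "Content-Type": "application/json"},
    )
    urllib.request.urlopen(link)
```

Note how only steps 1 and 3 carry lakeFS credentials; step 2 authenticates solely through the presigned URL.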
Integration Testing
Integration tests live in crates/core/src/storage/lakefs_integration_tests.rs and are marked #[ignore] so they are skipped in unit test runs. Run them against a real lakeFS instance:
# Reads credentials from ~/.lakectl.yaml (or ~/lakectl.yaml)
./scripts/test-lakefs-integration.sh <repository> [branch]
# Or via Make
make test-lakefs REPO=my-repo BRANCH=main
# Or with explicit env overrides
BRIEFCASE_LAKEFS_ENDPOINT=https://... \
BRIEFCASE_LAKEFS_ACCESS_KEY=... \
BRIEFCASE_LAKEFS_SECRET_KEY=... \
./scripts/test-lakefs-integration.sh my-repo main
The script strips the /api/v1 suffix from the lakectl endpoint automatically.
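That stripping behavior is easy to mirror if you need to normalize a lakectl endpoint yourself; the helper name below is ours, not part of any SDK:

```python
def normalize_endpoint(url: str) -> str:
    """Drop a trailing /api/v1 (and any trailing slash) from a lakectl endpoint."""
    url = url.rstrip("/")
    if url.endswith("/api/v1"):
        url = url[: -len("/api/v1")]
    return url
```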
CI: The lakefs-integration GitHub Actions job runs only on pushes to main (not on PRs) and requires five repository secrets: BRIEFCASE_LAKEFS_ENDPOINT, BRIEFCASE_LAKEFS_ACCESS_KEY, BRIEFCASE_LAKEFS_SECRET_KEY, BRIEFCASE_LAKEFS_REPOSITORY, and BRIEFCASE_LAKEFS_BRANCH. Use ./scripts/set-lakefs-secrets.sh <repository> to push all five from your local ~/.lakectl.yaml in one step.
API Reference
See Integrations API for detailed API documentation.
Migration Note
The briefcase package name remains available as a compatibility alias in 2.1.30; use briefcase_ai imports for all new code. The alias is scheduled for removal in 2.1.31.