lakeFS Integration

Use the lakeFS integration to attach versioned artifact lineage to AI decisions.

Overview

The lakeFS integration instruments knowledge reads and artifact writes with commit-linked metadata. This makes decisions reproducible and traceable to the exact artifact state used at execution time.

Note: lakeFS is one of 8 supported VCS backends in Briefcase AI. For an overview of all available version control systems and how to choose the right one for your use case, see the Version Control Systems guide.

What Engineers Use It For

  • Attach commit references to policy and data reads
  • Persist generated artifacts with lineage metadata
  • Query decision traces with artifact context attached
  • Reconstruct historical behavior from versioned object state

Features

  • Automatic SHA Capture: No manual tracking required
  • Multiple Patterns: Context managers, decorators, or direct client
  • OpenTelemetry Integration: Seamless span attribute capture
  • Low Operational Friction: Works in existing lakeFS-backed workflows
  • Mock Mode: Works without lakeFS for development
  • Artifact Lineage Commits: Upload + commit generated artifacts with commit IDs and URIs

Usage Patterns

Context Manager

from briefcase_ai import versioned_context

with versioned_context(client, "example-policy-repo", "main") as lakefs:
    policy = lakefs.read_object("policies/policy.pdf")
    # Automatically tracked with commit SHA

Decorator

from briefcase_ai import versioned

@versioned(repository="example-policy-repo")
def evaluate_claim(claim_data, versioned_client):
    policy = versioned_client.read_object("policies/policy.pdf")
    return evaluate(claim_data, policy)

Direct Client

from briefcase_ai.integrations.lakefs import VersionedClient

lakefs = VersionedClient(
    repository="example-policy-repo",
    branch="main",
    briefcase_client=client,
)

policy = lakefs.read_object("policies/policy.pdf")

Artifact Lineage Client (Upload + Commit)

Use ArtifactLineageClient when you need to version generated artifacts (not just read objects):

from pathlib import Path
from briefcase_ai.integrations.lakefs import ArtifactLineageClient

lineage = ArtifactLineageClient.from_env(
    repository="regulated-ai-demo",
    branch="main",
)

commit = lineage.version_files(
    files={
        "AuditArtifacts/decision_memo.md": Path("output/decision_memo.md"),
        "AuditArtifacts/result.json": Path("output/result.json"),
    },
    message="Publish onboarding decision artifacts",
    metadata={"workflow": "broker_dealer_onboarding"},
)

print(commit.commit_id)
print(lineage.object_uri("AuditArtifacts/result.json", commit.commit_id))

Configuration

The SDK supports multiple configuration methods with the following priority:

  1. Explicit parameters (highest priority)
  2. briefcase_client.config dict
  3. Environment variables (fallback)
  4. Default endpoint (lowest priority)
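The resolution order above can be sketched as a small helper. This is illustrative only: `resolve_endpoint`, its parameters, the `lakefs_endpoint` config key, and the default URL are assumptions, not the SDK's actual internals.

```python
import os

# Hypothetical built-in default; the real SDK's default may differ
DEFAULT_ENDPOINT = "https://lakefs.example.com/api/v1"

def resolve_endpoint(explicit=None, client_config=None):
    """Resolve the lakeFS endpoint using the documented priority order."""
    # 1. Explicit parameter (highest priority)
    if explicit:
        return explicit
    # 2. briefcase_client.config dict
    if client_config and client_config.get("lakefs_endpoint"):
        return client_config["lakefs_endpoint"]
    # 3. Environment variable fallback
    env = os.environ.get("LAKEFS_ENDPOINT")
    if env:
        return env
    # 4. Default endpoint (lowest priority)
    return DEFAULT_ENDPOINT
```

The same first-match-wins cascade would apply to the access and secret keys.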

Environment Variables

export LAKEFS_ENDPOINT="https://lakefs.example.com/api/v1"
export LAKEFS_ACCESS_KEY="your_access_key"
export LAKEFS_SECRET_KEY="your_secret_key"

With environment variables set, you can create clients without explicit credentials:

from briefcase_ai.integrations.lakefs import VersionedClient

# Automatically uses LAKEFS_* environment variables
client = VersionedClient(
    repository="example-policy-repo",
    branch="main",
    briefcase_client=briefcase_client,
)

Captured Metadata

All file accesses capture:

  • lakefs.commit.sha - Full commit SHA
  • lakefs.commit.branch - Branch name
  • lakefs.file.path - File path accessed
  • lakefs.file.size - File size in bytes
  • lakefs.artifact.{path} - Per-file commit SHA

Artifact lineage commits additionally return:

  • commit_id (64-char hash in live or simulated mode)
  • repository, branch, mode
  • committed object path mapping
  • metadata payload attached to the commit
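To make the attribute schema concrete, here is a minimal sketch of how one file access might be flattened into span attributes. The helper `read_attributes` is hypothetical; only the attribute key names come from the list above.

```python
def read_attributes(sha, branch, path, size_bytes):
    """Flatten one file access into the documented span-attribute keys."""
    return {
        "lakefs.commit.sha": sha,          # full commit SHA
        "lakefs.commit.branch": branch,    # branch name
        "lakefs.file.path": path,          # file path accessed
        "lakefs.file.size": size_bytes,    # file size in bytes
        # Per-file commit SHA, keyed by the accessed path
        f"lakefs.artifact.{path}": sha,
    }
```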

Rust Backend (LakeFSBackend)

The LakeFSBackend is a first-class StorageBackend implementation in briefcase-core that persists decision snapshots directly to lakeFS. It is separate from the Python VersionedClient/ArtifactLineageClient integrations described above—those operate at the knowledge-base read layer, while LakeFSBackend is the snapshot storage layer for the Rust SDK.

Environment Variables

LakeFSBackend reads configuration from a distinct set of environment variables prefixed with BRIEFCASE_LAKEFS_ (not the LAKEFS_* variables used by the Python SDK):

export BRIEFCASE_LAKEFS_ENDPOINT="https://lakefs.example.com"   # no /api/v1 suffix
export BRIEFCASE_LAKEFS_ACCESS_KEY="your_access_key"
export BRIEFCASE_LAKEFS_SECRET_KEY="your_secret_key"
export BRIEFCASE_LAKEFS_REPOSITORY="my-repo"
export BRIEFCASE_LAKEFS_BRANCH="main"

Note: do not include /api/v1 in BRIEFCASE_LAKEFS_ENDPOINT—the client appends it internally.
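The two endpoint conventions are easy to mix up: LAKEFS_ENDPOINT includes /api/v1, while BRIEFCASE_LAKEFS_ENDPOINT must not. A sketch of the normalization described above, assuming the client appends the suffix (the helper name is made up, and the tolerance for an already-suffixed endpoint is an extra convenience not promised by the docs):

```python
def normalize_endpoint(base: str) -> str:
    """Append /api/v1 exactly once, tolerating a trailing slash or an
    endpoint that already carries the suffix."""
    base = base.rstrip("/")
    if base.endswith("/api/v1"):
        return base
    return base + "/api/v1"
```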

Programmatic Construction

use briefcase_core::storage::lakefs::{LakeFSBackend, LakeFSConfig};

let config = LakeFSConfig::new(
    "https://lakefs.example.com", // endpoint (no /api/v1 suffix)
    "my-repo",
    "main",
    "your_access_key",
    "your_secret_key",
);

let backend = LakeFSBackend::new(config);

Feature Flags

LakeFSBackend is gated behind the lakefs-storage feature, which is enabled by the combined storage + networking features:

[dependencies]
briefcase-core = { version = "*", features = ["async", "storage", "networking"] }

Upload Flow (lakeFS Cloud)

lakeFS Cloud requires a 3-step presigned staging flow for all object uploads. Direct PUT with a body is rejected. The client handles this transparently:

  1. GET /repositories/{repo}/branches/{branch}/staging/backing?path=...&presign=true — obtain a presigned S3 URL
  2. PUT {presigned_url} — upload bytes directly to S3 (no lakeFS auth header)
  3. PUT /repositories/{repo}/branches/{branch}/objects?path=... — link the staged object with { physical_address, checksum, size_bytes }

This is handled automatically by upload_object and is invisible to callers.
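For reference, the three requests can be sketched as pure URL construction. No HTTP is performed here; the helper name, the query encoding, and the placeholder for the presigned URL are illustrative assumptions based on the steps above.

```python
from urllib.parse import quote

def upload_plan(endpoint, repo, branch, path):
    """Return (method, url) pairs for the 3-step presigned staging flow.

    Step 2's URL is the presigned S3 address returned by step 1, so it is
    represented by a placeholder here.
    """
    q = quote(path, safe="")
    base = f"{endpoint}/repositories/{repo}/branches/{branch}"
    return [
        # 1. Obtain a presigned S3 URL for the object's backing location
        ("GET", f"{base}/staging/backing?path={q}&presign=true"),
        # 2. Upload the bytes directly to S3 (no lakeFS auth header)
        ("PUT", "<presigned_url from step 1>"),
        # 3. Link the staged object on the branch
        ("PUT", f"{base}/objects?path={q}"),
    ]
```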

Integration Testing

Integration tests live in crates/core/src/storage/lakefs_integration_tests.rs and are marked #[ignore] so they are skipped in unit test runs. Run them against a real lakeFS instance:

# Reads credentials from ~/.lakectl.yaml (or ~/lakectl.yaml)
./scripts/test-lakefs-integration.sh <repository> [branch]

# Or via Make
make test-lakefs REPO=my-repo BRANCH=main

# Or with explicit env overrides
BRIEFCASE_LAKEFS_ENDPOINT=https://... \
BRIEFCASE_LAKEFS_ACCESS_KEY=... \
BRIEFCASE_LAKEFS_SECRET_KEY=... \
./scripts/test-lakefs-integration.sh my-repo main

The script strips the /api/v1 suffix from the lakectl endpoint automatically.

CI: The lakefs-integration GitHub Actions job runs only on pushes to main (not on PRs) and requires five repository secrets: BRIEFCASE_LAKEFS_ENDPOINT, BRIEFCASE_LAKEFS_ACCESS_KEY, BRIEFCASE_LAKEFS_SECRET_KEY, BRIEFCASE_LAKEFS_REPOSITORY, and BRIEFCASE_LAKEFS_BRANCH. Use ./scripts/set-lakefs-secrets.sh <repository> to push all five from your local ~/.lakectl.yaml in one step.

API Reference

See Integrations API for detailed API documentation.

Migration Note

briefcase remains available as a compatibility alias in 2.1.30. Use briefcase_ai imports for all new code. Alias removal is planned for 2.1.31.