Version Control Systems
Use this feature to plug regulated AI workflows into the version-control backend that fits your environment.
Overview
Briefcase AI provides native support for multiple version-control and data-versioning systems through a unified abstraction layer. Engineers can switch providers without changing application-level decision capture and replay logic.
What Engineers Use It For
- Store decision context in the backend already used by platform teams
- Keep one storage API while changing provider implementations
- Run the same replay and debug workflow across local and remote backends
- Standardize artifact lineage across heterogeneous environments
The supported VCS backends include:
- SQLite — Lightweight, file-based versioning (default)
- lakeFS — Data lake versioning with Iceberg/Delta support
- DVC — Data Version Control for ML workflows
- Nessie — REST API-based data lake versioning (Project Nessie)
- Pachyderm — ML pipeline versioning and lineage
- ArtiVC — Simple artifact versioning
- DuckLake — DuckDB-based analytics versioning
- Iceberg — Apache Iceberg REST Catalog with time travel
- Git+LFS — Git with Large File Storage for binary artifacts
How It Works
VCS Abstraction Model
Briefcase AI uses a two-layer abstraction:
- VcsProvider trait: interface for raw object I/O (write_object, read_object, list_objects, delete_object, create_version).
- VcsStorageBackend<P: VcsProvider>: generic adapter implementing full StorageBackend behavior (snapshot/decision CRUD, query filtering, batching) on top of any provider.
Briefcase AI separates application storage behavior from provider-specific APIs.
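The layering can be illustrated with a minimal Python sketch. It is not the Briefcase AI implementation: the class names ObjectProvider, InMemoryProvider, and SnapshotStore are hypothetical, as are the `snapshots/` path layout and JSON encoding. Only the five provider method names are taken from the description above.

```python
import json
from typing import Dict, List, Protocol


class ObjectProvider(Protocol):
    """Hypothetical stand-in for the raw object I/O layer."""

    def write_object(self, path: str, data: bytes) -> None: ...
    def read_object(self, path: str) -> bytes: ...
    def list_objects(self, prefix: str) -> List[str]: ...
    def delete_object(self, path: str) -> None: ...
    def create_version(self, message: str) -> str: ...


class InMemoryProvider:
    """Toy provider that keeps objects in a dict; for illustration only."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._versions: List[str] = []

    def write_object(self, path, data):
        self._objects[path] = data

    def read_object(self, path):
        return self._objects[path]

    def list_objects(self, prefix):
        return sorted(p for p in self._objects if p.startswith(prefix))

    def delete_object(self, path):
        del self._objects[path]

    def create_version(self, message):
        # Record the message and return a simple sequential version label.
        self._versions.append(message)
        return f"v{len(self._versions)}"


class SnapshotStore:
    """Hypothetical adapter: snapshot operations built only on provider calls."""

    def __init__(self, provider: ObjectProvider) -> None:
        self._provider = provider

    def save_snapshot(self, snapshot_id: str, snapshot: dict) -> str:
        payload = json.dumps(snapshot, sort_keys=True).encode("utf-8")
        self._provider.write_object(f"snapshots/{snapshot_id}.json", payload)
        return self._provider.create_version(f"save {snapshot_id}")

    def load_snapshot(self, snapshot_id: str) -> dict:
        raw = self._provider.read_object(f"snapshots/{snapshot_id}.json")
        return json.loads(raw.decode("utf-8"))

    def list_snapshot_ids(self) -> List[str]:
        paths = self._provider.list_objects("snapshots/")
        return [p[len("snapshots/"):-len(".json")] for p in paths]


store = SnapshotStore(InMemoryProvider())
version = store.save_snapshot("run-1", {"model": "demo", "decision": "approve"})
```

Because SnapshotStore touches the provider only through the five raw methods, swapping InMemoryProvider for another object with the same methods requires no change to the snapshot logic.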
Core Concepts
- VcsProvider Trait: each backend implements a small set of raw byte I/O and versioning methods.
- VcsProviderConfig: unified config with builder pattern and from_env() loader.
- Feature Flags: compile-time backend selection (for example vcs-nessie, vcs-all).
- Environment Variables: runtime configuration via BRIEFCASE_{PROVIDER}_{KEY}.
Provider Comparison
| Provider | Protocol | Best For | Auth Methods | Operational Fit |
|---|---|---|---|---|
| DVC | Git + filesystem | Local data science workflows | Local Git identity | Developer workstations, local CI |
| Nessie | REST API (v2) | Data lake versioning (Iceberg/Delta) | Service-specific token/config | Managed catalog environments |
| Pachyderm | REST API | ML pipeline versioning and lineage | Service-specific token/config | Pipeline orchestration stacks |
| ArtiVC | Service/SDK dependent | Artifact versioning | Service-specific token/config | Artifact registry integrations |
| DuckLake | Local DuckDB | DuckDB-based analytics | None (local DB file) | Local analytics + embedded workloads |
| Iceberg | Catalog API | Table-format time travel queries | Catalog-specific auth/config | Warehouse and lakehouse platforms |
| Git+LFS | Git + LFS | Large-file versioning | Local Git identity | Repo-centric artifact workflows |
| lakeFS | REST API | Multi-cloud data lake versioning | Access key + secret key | Data platform and governance pipelines |
| SQLite | Embedded DB | Local development and testing | File-based | Lightweight single-node scenarios |
Quick Setup
Feature Flags
Enable the VCS backends you need in Cargo.toml:
[dependencies]
briefcase-core = { version = "*", features = ["vcs-nessie"] }
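# Or, instead of the line above, enable all available VCS providers
# (a dependency key may appear only once per table, so keep just one of the two):
# briefcase-core = { version = "*", features = ["vcs-all"] }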
Individual feature flags: vcs-dvc, vcs-nessie, vcs-pachyderm, vcs-artivc, vcs-ducklake, vcs-iceberg, vcs-gitlfs. All depend on vcs-storage (which pulls in async and networking).
Environment Variables
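Each provider reads configuration from BRIEFCASE_{PROVIDER}_{KEY} variables via VcsProviderConfig::from_env(), where {PROVIDER} is the upper-case provider name (for example NESSIE). Standard keys: ENDPOINT, TOKEN, ACCESS_KEY, SECRET_KEY, REPOSITORY, BRANCH. Note that the lakeFS example below uses LAKEFS_-prefixed variables rather than this pattern.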
DVC
export BRIEFCASE_DVC_ENDPOINT="/path/to/dvc/repo"
export BRIEFCASE_DVC_REPOSITORY="my-repo"
export BRIEFCASE_DVC_BRANCH="main"
Nessie
export BRIEFCASE_NESSIE_ENDPOINT="https://nessie.example.com/api/v2"
export BRIEFCASE_NESSIE_TOKEN="your-bearer-token"
export BRIEFCASE_NESSIE_BRANCH="main"
Pachyderm
export BRIEFCASE_PACHYDERM_ENDPOINT="https://pachd.example.com:30650"
export BRIEFCASE_PACHYDERM_TOKEN="your-bearer-token"
export BRIEFCASE_PACHYDERM_REPOSITORY="my-project"
ArtiVC
export BRIEFCASE_ARTIVC_ENDPOINT="/path/to/artivc/repo"
export BRIEFCASE_ARTIVC_REPOSITORY="my-artifacts"
DuckLake
export BRIEFCASE_DUCKLAKE_ENDPOINT="/path/to/storage/root"
export BRIEFCASE_DUCKLAKE_REPOSITORY="my-analytics"
Iceberg
export BRIEFCASE_ICEBERG_ENDPOINT="https://iceberg-catalog.example.com/v1"
export BRIEFCASE_ICEBERG_TOKEN="your-bearer-token"
export BRIEFCASE_ICEBERG_REPOSITORY="my-warehouse"
export BRIEFCASE_ICEBERG_BRANCH="main"
Git+LFS
export BRIEFCASE_GITLFS_ENDPOINT="/path/to/git/repo"
export BRIEFCASE_GITLFS_BRANCH="main"
lakeFS
export LAKEFS_ENDPOINT="https://lakefs.example.com/api/v1"
export LAKEFS_ACCESS_KEY="your_access_key"
export LAKEFS_SECRET_KEY="your_secret_key"
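The BRIEFCASE_{PROVIDER}_{KEY} naming convention can be sketched in a few lines of Python. This is an illustration of the convention only, not the behavior of VcsProviderConfig::from_env(): ProviderSettings and settings_from_env are hypothetical names, and the sketch does not handle the LAKEFS_-prefixed variables shown for lakeFS.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Standard keys listed in this section.
STANDARD_KEYS = ("ENDPOINT", "TOKEN", "ACCESS_KEY", "SECRET_KEY", "REPOSITORY", "BRANCH")


@dataclass
class ProviderSettings:
    """Hypothetical container for one provider's settings."""

    provider: str
    endpoint: Optional[str] = None
    token: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    repository: Optional[str] = None
    branch: Optional[str] = None


def settings_from_env(provider: str, env: Mapping[str, str] = os.environ) -> ProviderSettings:
    """Collect BRIEFCASE_{PROVIDER}_{KEY} values; unset keys stay None."""
    prefix = f"BRIEFCASE_{provider.upper()}_"
    values = {key.lower(): env.get(prefix + key) for key in STANDARD_KEYS}
    return ProviderSettings(provider=provider, **values)


# A plain dict stands in for the process environment in this example.
example_env = {
    "BRIEFCASE_NESSIE_ENDPOINT": "https://nessie.example.com/api/v2",
    "BRIEFCASE_NESSIE_BRANCH": "main",
}
settings = settings_from_env("nessie", example_env)
```

Passing the environment as a mapping keeps the loader easy to test; calling settings_from_env("nessie") with no second argument reads the real process environment.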
Rust Usage
use briefcase_core::storage::vcs::{create_vcs_backend, create_vcs_backend_from_env, VcsProviderConfig};
use briefcase_core::storage::StorageBackend;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Option 1: from environment variables (BRIEFCASE_NESSIE_*)
    let backend = create_vcs_backend_from_env("nessie").await?;

    // Option 2: explicit config (shadows the backend from Option 1; use one or the other)
    let config = VcsProviderConfig::new("nessie")
        .with_endpoint("https://nessie.example.com/api/v2")
        .with_token("my-token")
        .with_branch("main");
    let backend = create_vcs_backend(config).await?;

    // `snapshot` and `query` are assumed to be constructed elsewhere in your application.
    let snapshot_id = backend.save_snapshot(&snapshot).await?;
    let loaded = backend.load_snapshot(&snapshot_id).await?;
    let results = backend.query_snapshots(&query).await?;

    // Flush pending writes before exiting.
    backend.flush().await?;
    Ok(())
}
Dynamic Provider Selection
Select the provider at runtime without recompilation:
let provider_name = std::env::var("VCS_PROVIDER").unwrap_or_else(|_| "sqlite".to_string());
let backend = create_vcs_backend_from_env(&provider_name).await?;
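The same pattern, reading a provider name from VCS_PROVIDER and falling back to sqlite, can be sketched in Python with a small registry. Everything here is hypothetical (FACTORIES, select_backend, and the placeholder strings standing in for backend objects); it shows the selection logic only, including an explicit error for unknown names.

```python
import os
from typing import Callable, Dict, Mapping

# Placeholder factories: a real application would return backend objects here.
FACTORIES: Dict[str, Callable[[], str]] = {
    "sqlite": lambda: "sqlite-backend",
    "nessie": lambda: "nessie-backend",
}


def select_backend(env: Mapping[str, str] = os.environ) -> str:
    """Pick a factory by the VCS_PROVIDER variable, defaulting to sqlite."""
    name = env.get("VCS_PROVIDER", "sqlite").lower()
    try:
        factory = FACTORIES[name]
    except KeyError:
        known = ", ".join(sorted(FACTORIES))
        raise ValueError(f"unknown VCS provider {name!r}; expected one of: {known}") from None
    return factory()


default_backend = select_backend({})
nessie_backend = select_backend({"VCS_PROVIDER": "Nessie"})
```

Failing fast on an unrecognized name surfaces a typo in deployment configuration at startup rather than at the first storage call.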
Python Usage
from briefcase_ai.integrations.vcs import DvcClient, NessieClient

nessie = NessieClient(
    endpoint="https://nessie.example.com/api/v2",
    token="your-bearer-token",
    repository="my-catalog",
    branch="main",
)

with nessie:
    data = nessie.read_object("data/training.csv")
    # result_bytes: the bytes payload produced by your own processing step
    nessie.write_object("results/output.json", result_bytes)
    nessie.create_version("Updated results")

# DVC client for local workflows (assumes `client` is an existing Briefcase client)
# dvc = DvcClient(repository="my-ml-repo", branch="main", briefcase_client=client)
Available clients: DvcClient, NessieClient, PachydermClient, ArtiVCClient, DuckLakeClient, IcebergClient, GitLFSClient. All inherit from VcsClientBase in briefcase_ai.integrations.vcs.base.
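For unit tests that should not contact a real server, code written against these clients can be exercised with an in-memory test double. The sketch below is an assumption-laden illustration: FakeVcsClient is not part of briefcase_ai, it does not inherit from VcsClientBase, and its open/closed tracking is a guess at reasonable context-manager behavior rather than a description of the real clients. It mirrors only the method names used in the example above.

```python
from typing import Dict, List, Tuple


class FakeVcsClient:
    """Hypothetical test double; not part of briefcase_ai."""

    def __init__(self) -> None:
        self._staged: Dict[str, bytes] = {}
        # Each version stores its message and a copy of the objects at that point.
        self.versions: List[Tuple[str, Dict[str, bytes]]] = []
        self.is_open = False

    def __enter__(self) -> "FakeVcsClient":
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.is_open = False
        return False  # never swallow exceptions

    def write_object(self, path: str, data: bytes) -> None:
        self._require_open()
        self._staged[path] = data

    def read_object(self, path: str) -> bytes:
        self._require_open()
        return self._staged[path]

    def create_version(self, message: str) -> int:
        self._require_open()
        self.versions.append((message, dict(self._staged)))
        return len(self.versions)

    def _require_open(self) -> None:
        if not self.is_open:
            raise RuntimeError("client used outside of a 'with' block")


client = FakeVcsClient()
with client:
    client.write_object("results/output.json", b'{"ok": true}')
    version_number = client.create_version("Updated results")
```

Tests can then assert on client.versions to check what the code under test would have committed.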
Adding Custom Providers
See Writing a Custom VCS Provider for a step-by-step guide to implementing your own backend.
API Reference
See Integrations API for detailed API documentation on each provider interface.