
Version Control Systems

Use this feature to plug regulated AI workflows into the version-control backend that fits your environment.

Overview

Briefcase AI provides native support for multiple version-control and data-versioning systems through a unified abstraction layer. Engineers can switch providers without changing application-level decision capture and replay logic.

What Engineers Use It For

  • Store decision context in the backend already used by platform teams
  • Keep one storage API while changing provider implementations
  • Run the same replay and debug workflow across local and remote backends
  • Standardize artifact lineage across heterogeneous environments

The supported VCS backends include:

  • SQLite — Lightweight, file-based versioning (default)
  • lakeFS — Data lake versioning with Iceberg/Delta support
  • DVC — Data Version Control for ML workflows
  • Nessie — REST API-based data lake versioning (Project Nessie)
  • Pachyderm — ML pipeline versioning and lineage
  • ArtiVC — Simple artifact versioning
  • DuckLake — DuckDB-based analytics versioning
  • Iceberg — Apache Iceberg REST Catalog with time travel
  • Git+LFS — Git with Large File Storage for binary artifacts

How It Works

VCS Abstraction Model

Briefcase AI uses a two-layer abstraction:

  1. VcsProvider trait — Interface for raw object I/O (write_object, read_object, list_objects, delete_object, create_version).
  2. VcsStorageBackend<P: VcsProvider> — Generic adapter implementing full StorageBackend behavior (snapshot/decision CRUD, query filtering, batching) on top of any provider.
Figure: VCS abstraction flow from application code through VcsStorageBackend to the VCS providers.

Briefcase AI separates application storage behavior from provider-specific APIs.
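The two layers can be sketched in a few dozen lines of synchronous Rust. This is a minimal illustration, not the briefcase-core source: the real VcsProvider methods are async and their exact signatures are not documented on this page, so the trait `RawProvider`, the in-memory `MemoryProvider`, the generic `SnapshotStore<P>`, the `String` error type, and the `snapshots/{id}.json` key layout are all hypothetical.

```rust
use std::collections::BTreeMap;

// Hypothetical, synchronous stand-in for the provider layer:
// raw byte I/O plus versioning, nothing storage-specific.
pub trait RawProvider {
    fn write_object(&mut self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn read_object(&self, path: &str) -> Result<Vec<u8>, String>;
    fn list_objects(&self, prefix: &str) -> Result<Vec<String>, String>;
    fn delete_object(&mut self, path: &str) -> Result<(), String>;
    fn create_version(&mut self, message: &str) -> Result<String, String>;
}

// In-memory provider: objects live in a sorted map, versions are numbered commits.
#[derive(Default)]
pub struct MemoryProvider {
    objects: BTreeMap<String, Vec<u8>>,
    versions: Vec<String>,
}

impl RawProvider for MemoryProvider {
    fn write_object(&mut self, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.objects.insert(path.to_string(), data);
        Ok(())
    }
    fn read_object(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects.get(path).cloned().ok_or_else(|| format!("not found: {path}"))
    }
    fn list_objects(&self, prefix: &str) -> Result<Vec<String>, String> {
        Ok(self.objects.keys().filter(|k| k.starts_with(prefix)).cloned().collect())
    }
    fn delete_object(&mut self, path: &str) -> Result<(), String> {
        self.objects.remove(path).map(|_| ()).ok_or_else(|| format!("not found: {path}"))
    }
    fn create_version(&mut self, message: &str) -> Result<String, String> {
        self.versions.push(message.to_string());
        Ok(format!("v{}", self.versions.len()))
    }
}

// Generic adapter: storage-level behavior written once, for any provider P.
pub struct SnapshotStore<P: RawProvider> {
    provider: P,
}

impl<P: RawProvider> SnapshotStore<P> {
    pub fn new(provider: P) -> Self {
        Self { provider }
    }

    // Maps a snapshot ID to an object path, writes it, then records a version.
    pub fn save_snapshot(&mut self, id: &str, json: &str) -> Result<String, String> {
        self.provider
            .write_object(&format!("snapshots/{id}.json"), json.as_bytes().to_vec())?;
        self.provider.create_version(&format!("save snapshot {id}"))
    }

    pub fn load_snapshot(&self, id: &str) -> Result<String, String> {
        let bytes = self.provider.read_object(&format!("snapshots/{id}.json"))?;
        String::from_utf8(bytes).map_err(|e| e.to_string())
    }

    // Lists snapshot IDs by stripping the path layout back off each object key.
    pub fn list_snapshot_ids(&self) -> Result<Vec<String>, String> {
        Ok(self
            .provider
            .list_objects("snapshots/")?
            .iter()
            .filter_map(|p| p.strip_prefix("snapshots/")?.strip_suffix(".json").map(str::to_string))
            .collect())
    }
}
```

The generic parameter is the point of the design: `SnapshotStore` is written once, and replacing `MemoryProvider` with a Nessie- or DVC-backed type would change nothing above the provider layer.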

Core Concepts

  • VcsProvider Trait: each backend implements a small set of raw byte I/O and versioning methods.
  • VcsProviderConfig: unified config with builder pattern and from_env() loader.
  • Feature Flags: compile-time backend selection (for example vcs-nessie, vcs-all).
  • Environment Variables: runtime configuration via BRIEFCASE_{PROVIDER}_{KEY}.
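The builder style used by VcsProviderConfig can be illustrated with a stand-in struct. `ProviderConfig`, its field list, `with_repository`, and the `main` fallback in `branch_or_default` are assumptions for this sketch; only `new`, `with_endpoint`, `with_token`, and `with_branch` appear in the Rust usage example on this page.

```rust
// Hypothetical stand-in for a provider config built with the builder pattern.
// The fields mirror the standard keys; this is not the briefcase-core definition.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ProviderConfig {
    pub provider: String,
    pub endpoint: Option<String>,
    pub token: Option<String>,
    pub repository: Option<String>,
    pub branch: Option<String>,
}

impl ProviderConfig {
    // Starts a config for one provider with every optional field unset.
    pub fn new(provider: &str) -> Self {
        Self {
            provider: provider.to_string(),
            ..Self::default()
        }
    }

    // Each `with_*` method takes the builder by value and returns it, so calls chain.
    pub fn with_endpoint(mut self, endpoint: &str) -> Self {
        self.endpoint = Some(endpoint.to_string());
        self
    }

    pub fn with_token(mut self, token: &str) -> Self {
        self.token = Some(token.to_string());
        self
    }

    pub fn with_repository(mut self, repository: &str) -> Self {
        self.repository = Some(repository.to_string());
        self
    }

    pub fn with_branch(mut self, branch: &str) -> Self {
        self.branch = Some(branch.to_string());
        self
    }

    // Assumed default for this sketch: use "main" when no branch was configured.
    pub fn branch_or_default(&self) -> &str {
        self.branch.as_deref().unwrap_or("main")
    }
}
```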

Provider Comparison

| Provider | Protocol | Best For | Auth Methods | Operational Fit |
|---|---|---|---|---|
| DVC | Git + filesystem | Local data science workflows | Local Git identity | Developer workstations, local CI |
| Nessie | REST API (v2) | Data lake versioning (Iceberg/Delta) | Service-specific token/config | Managed catalog environments |
| Pachyderm | REST API | ML pipeline versioning and lineage | Service-specific token/config | Pipeline orchestration stacks |
| ArtiVC | Service/SDK dependent | Artifact versioning | Service-specific token/config | Artifact registry integrations |
| DuckLake | Local DuckDB | DuckDB-based analytics | None (local DB file) | Local analytics and embedded workloads |
| Iceberg | Catalog API | Table-format time travel queries | Catalog-specific auth/config | Warehouse and lakehouse platforms |
| Git+LFS | Git + LFS | Large-file versioning | Local Git identity | Repo-centric artifact workflows |
| lakeFS | REST API | Multi-cloud data lake versioning | Access key + secret key | Data platform and governance pipelines |
| SQLite | Embedded DB | Local development and testing | File-based | Lightweight single-node scenarios |

Quick Setup

Feature Flags

Enable the VCS backends you need in Cargo.toml:

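[dependencies]
briefcase-core = { version = "*", features = ["vcs-nessie"] }

# Or, instead of the entry above, enable all available VCS providers.
# A crate can appear only once in [dependencies], so keep just one entry:
# briefcase-core = { version = "*", features = ["vcs-all"] }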

Individual feature flags: vcs-dvc, vcs-nessie, vcs-pachyderm, vcs-artivc, vcs-ducklake, vcs-iceberg, vcs-gitlfs. Each of these depends on vcs-storage, which pulls in the async and networking dependencies.

Environment Variables

Each provider reads configuration from BRIEFCASE_{PROVIDER}_{KEY} variables via VcsProviderConfig::from_env(). Standard keys: ENDPOINT, TOKEN, ACCESS_KEY, SECRET_KEY, REPOSITORY, BRANCH.
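A sketch of how `BRIEFCASE_{PROVIDER}_{KEY}` names can be resolved for the standard keys. The functions `env_var_name`, `load_settings`, and `load_settings_from_env` are hypothetical and are not the from_env() implementation; `load_settings` takes a lookup closure so the logic can be tested without modifying the process environment. The lakeFS example below uses un-prefixed `LAKEFS_*` names, which this sketch does not model.

```rust
use std::collections::HashMap;

// The standard keys listed above, read for every provider.
pub const STANDARD_KEYS: [&str; 6] =
    ["ENDPOINT", "TOKEN", "ACCESS_KEY", "SECRET_KEY", "REPOSITORY", "BRANCH"];

// Builds one variable name, e.g. ("nessie", "TOKEN") -> "BRIEFCASE_NESSIE_TOKEN".
pub fn env_var_name(provider: &str, key: &str) -> String {
    format!("BRIEFCASE_{}_{}", provider.to_ascii_uppercase(), key)
}

// Collects every standard key that has a value; unset keys are simply absent.
pub fn load_settings<F>(provider: &str, lookup: F) -> HashMap<&'static str, String>
where
    F: Fn(&str) -> Option<String>,
{
    STANDARD_KEYS
        .iter()
        .filter_map(|&key| lookup(&env_var_name(provider, key)).map(|value| (key, value)))
        .collect()
}

// Production entry point: read from the real process environment.
pub fn load_settings_from_env(provider: &str) -> HashMap<&'static str, String> {
    load_settings(provider, |name| std::env::var(name).ok())
}
```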

DVC

export BRIEFCASE_DVC_ENDPOINT="/path/to/dvc/repo"
export BRIEFCASE_DVC_REPOSITORY="my-repo"
export BRIEFCASE_DVC_BRANCH="main"

Nessie

export BRIEFCASE_NESSIE_ENDPOINT="https://nessie.example.com/api/v2"
export BRIEFCASE_NESSIE_TOKEN="your-bearer-token"
export BRIEFCASE_NESSIE_BRANCH="main"

Pachyderm

export BRIEFCASE_PACHYDERM_ENDPOINT="https://pachd.example.com:30650"
export BRIEFCASE_PACHYDERM_TOKEN="your-bearer-token"
export BRIEFCASE_PACHYDERM_REPOSITORY="my-project"

ArtiVC

export BRIEFCASE_ARTIVC_ENDPOINT="/path/to/artivc/repo"
export BRIEFCASE_ARTIVC_REPOSITORY="my-artifacts"

DuckLake

export BRIEFCASE_DUCKLAKE_ENDPOINT="/path/to/storage/root"
export BRIEFCASE_DUCKLAKE_REPOSITORY="my-analytics"

Iceberg

export BRIEFCASE_ICEBERG_ENDPOINT="https://iceberg-catalog.example.com/v1"
export BRIEFCASE_ICEBERG_TOKEN="your-bearer-token"
export BRIEFCASE_ICEBERG_REPOSITORY="my-warehouse"
export BRIEFCASE_ICEBERG_BRANCH="main"

Git+LFS

export BRIEFCASE_GITLFS_ENDPOINT="/path/to/git/repo"
export BRIEFCASE_GITLFS_BRANCH="main"

lakeFS

export LAKEFS_ENDPOINT="https://lakefs.example.com/api/v1"
export LAKEFS_ACCESS_KEY="your_access_key"
export LAKEFS_SECRET_KEY="your_secret_key"

Rust Usage

use briefcase_core::storage::vcs::{create_vcs_backend, create_vcs_backend_from_env, VcsProviderConfig};
use briefcase_core::storage::StorageBackend;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Option 1: build the backend from BRIEFCASE_NESSIE_* environment variables
    // (unused below; shown for comparison with Option 2)
    let _backend_from_env = create_vcs_backend_from_env("nessie").await?;

    // Option 2: build the backend from an explicit config
    let config = VcsProviderConfig::new("nessie")
        .with_endpoint("https://nessie.example.com/api/v2")
        .with_token("my-token")
        .with_branch("main");
    let backend = create_vcs_backend(config).await?;

    // `snapshot` and `query` are placeholders: construct them with your
    // application's snapshot and query values before these calls.
    let snapshot_id = backend.save_snapshot(&snapshot).await?;
    let loaded = backend.load_snapshot(&snapshot_id).await?;
    let results = backend.query_snapshots(&query).await?;

    // Push any pending batched writes to the provider before exiting
    backend.flush().await?;

    Ok(())
}
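The `flush()` call matters because VcsStorageBackend batches work. This page does not describe its buffering rules, so the following is only a hypothetical model of batching: `BufferedStore` stages saves in memory, serves reads from staged data first, and turns each `flush` into a single version. None of these names are part of briefcase-core.

```rust
use std::collections::BTreeMap;

// Hypothetical write-behind buffer: saves are staged in memory and only reach
// the committed store, as one version per batch, when `flush` is called.
#[derive(Default)]
pub struct BufferedStore {
    pending: Vec<(String, Vec<u8>)>,
    committed: BTreeMap<String, Vec<u8>>,
    versions: Vec<usize>, // number of objects written by each version
}

impl BufferedStore {
    // Stages a write; nothing is committed yet.
    pub fn save(&mut self, path: &str, data: &[u8]) {
        self.pending.push((path.to_string(), data.to_vec()));
    }

    // Reads check the newest staged write first, then committed data.
    pub fn load(&self, path: &str) -> Option<&[u8]> {
        self.pending
            .iter()
            .rev()
            .find(|(p, _)| p == path)
            .map(|(_, d)| d.as_slice())
            .or_else(|| self.committed.get(path).map(Vec::as_slice))
    }

    // Commits every staged object, then records one version for the batch.
    // Returns the new version number, or None when nothing was pending.
    pub fn flush(&mut self) -> Option<usize> {
        if self.pending.is_empty() {
            return None;
        }
        let count = self.pending.len();
        for (path, data) in self.pending.drain(..) {
            self.committed.insert(path, data);
        }
        self.versions.push(count);
        Some(self.versions.len())
    }

    pub fn version_count(&self) -> usize {
        self.versions.len()
    }
}
```

Committing a batch as one version keeps the history readable: one logical save operation maps to one entry in the provider's log rather than one per object.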

Dynamic Provider Selection

Select the provider at runtime without recompilation:

// Fall back to SQLite when VCS_PROVIDER is unset
let provider_name = std::env::var("VCS_PROVIDER").unwrap_or_else(|_| "sqlite".to_string());
let backend = create_vcs_backend_from_env(&provider_name).await?;
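Runtime selection only works for providers that were compiled in. A hypothetical helper can validate the requested name before calling create_vcs_backend_from_env: an unset VCS_PROVIDER falls back to `sqlite`, as in the snippet above, while an unknown or disabled provider returns an error instead of silently switching backends. `compiled_providers` and `select_provider` are illustrative names. Because `cfg!(feature = ...)` checks the features of the crate it appears in, this assumes your application forwards same-named features, for example `vcs-nessie = ["briefcase-core/vcs-nessie"]` in its `[features]` table. lakeFS is omitted because no feature flag for it is listed on this page, `sqlite` is assumed to be always available, and lowercase names other than `nessie` and `sqlite` are assumed to match what create_vcs_backend_from_env accepts.

```rust
// Hypothetical: provider names compiled into *this* crate. Assumes the app
// forwards same-named features to briefcase-core; SQLite is assumed always present.
pub fn compiled_providers() -> Vec<&'static str> {
    let optional = [
        (cfg!(feature = "vcs-dvc"), "dvc"),
        (cfg!(feature = "vcs-nessie"), "nessie"),
        (cfg!(feature = "vcs-pachyderm"), "pachyderm"),
        (cfg!(feature = "vcs-artivc"), "artivc"),
        (cfg!(feature = "vcs-ducklake"), "ducklake"),
        (cfg!(feature = "vcs-iceberg"), "iceberg"),
        (cfg!(feature = "vcs-gitlfs"), "gitlfs"),
    ];
    let mut names = vec!["sqlite"];
    names.extend(optional.iter().filter(|(on, _)| *on).map(|(_, name)| *name));
    names
}

// Unset -> "sqlite"; set but not compiled in -> error rather than a silent fallback.
pub fn select_provider(requested: Option<&str>, available: &[&str]) -> Result<String, String> {
    let name = requested.unwrap_or("sqlite").trim().to_ascii_lowercase();
    if available.contains(&name.as_str()) {
        Ok(name)
    } else {
        Err(format!("provider '{name}' is not compiled in; enable its vcs-* feature"))
    }
}
```

In an async `main`, this would be used as `let name = select_provider(std::env::var("VCS_PROVIDER").ok().as_deref(), &compiled_providers())?;` followed by `create_vcs_backend_from_env(&name).await?`.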

Python Usage

from briefcase_ai.integrations.vcs import DvcClient, NessieClient

nessie = NessieClient(
    endpoint="https://nessie.example.com/api/v2",
    token="your-bearer-token",
    repository="my-catalog",
    branch="main",
)

# Placeholder payload; replace with your own serialized results.
result_bytes = b'{"status": "ok"}'

# The client is used as a context manager.
with nessie:
    data = nessie.read_object("data/training.csv")
    nessie.write_object("results/output.json", result_bytes)
    nessie.create_version("Updated results")

# DVC client for local workflows; `client` is an existing Briefcase client instance.
# dvc = DvcClient(repository="my-ml-repo", branch="main", briefcase_client=client)

Available clients: DvcClient, NessieClient, PachydermClient, ArtiVCClient, DuckLakeClient, IcebergClient, GitLFSClient. All inherit from VcsClientBase in briefcase_ai.integrations.vcs.base.

Adding Custom Providers

See Writing a Custom VCS Provider for a step-by-step guide to implementing your own backend.

API Reference

See Integrations API for detailed API documentation on each provider interface.