Provider Development Guide

Step-by-step reference for writing new built-in providers. Built-in providers live in src/provider/ and are compiled into the daemon. For a lower-effort path using shell scripts, see §6.


1. The Provider Trait

Every provider implements this trait (defined in src/provider/mod.rs):

pub trait Provider: Send + Sync {
    fn metadata(&self) -> ProviderMetadata;
    fn execute(&self, path: Option<&str>) -> Option<ProviderResult>;
}

metadata() is called at registration time and on every comb list request. It must be fast and allocation-light (it currently allocates; a future optimisation may switch to Cow<'static, str>). Return a ProviderMetadata describing:

  • name: the provider's key used in comb get <name>.<field>
  • fields: a list of FieldSchema { name, field_type } describing what fields execute() will populate
  • invalidation: when the cached value should be refreshed (see §3)
  • global: true if the provider ignores the path argument (e.g., hostname, user); false if it is path-scoped (e.g., git, terraform)

execute(path) runs the provider and returns the result. It is called on a blocking thread pool (tokio::task::spawn_blocking), so it may safely call std::process::Command, std::fs::read_to_string, and other blocking operations. Return None to indicate that no value is available (the cache will not be updated). Return Some(ProviderResult) on success.

ProviderResult is a HashMap<String, Value> wrapper. Insert fields with result.insert("fieldname", Value::String("...")).


2. Step-by-Step: Writing a "docker context" Provider

This section builds a complete provider that reports the current Docker context name and endpoint.

Docker stores its active context in ~/.docker/config.json (field "currentContext") and context details in ~/.docker/contexts/meta/<hash>/meta.json. Reading these files directly is ~1µs, versus ~30ms for docker context inspect.

2.1 Create the file

Create src/provider/dockercontext.rs:

use crate::provider::{
    FieldSchema, FieldType, InvalidationStrategy, Provider, ProviderMetadata,
    ProviderResult, Value,
};
use std::path::PathBuf;

pub struct DockerContextProvider;

impl Provider for DockerContextProvider {
    fn metadata(&self) -> ProviderMetadata {
        ProviderMetadata {
            name: "dockercontext".to_string(),
            fields: vec![
                FieldSchema { name: "name".to_string(), field_type: FieldType::String },
                FieldSchema { name: "endpoint".to_string(), field_type: FieldType::String },
            ],
            invalidation: InvalidationStrategy::Watch {
                patterns: vec![
                    home_subpath(".docker/config.json"),
                    home_subpath(".docker/contexts"),
                ],
                fallback_poll_secs: Some(60),
            },
            global: true,
        }
    }

    fn execute(&self, _path: Option<&str>) -> Option<ProviderResult> {
        let home = std::env::var("HOME").ok()?;
        let config_path = PathBuf::from(&home).join(".docker").join("config.json");

        let config_text = std::fs::read_to_string(&config_path).ok()?;
        let config: serde_json::Value = serde_json::from_str(&config_text).ok()?;

        let context_name = config
            .get("currentContext")
            .and_then(|v| v.as_str())
            .unwrap_or("default")
            .to_string();

        // Look up the endpoint from the context metadata.
        let endpoint = read_context_endpoint(&home, &context_name)
            .unwrap_or_else(|| "unix:///var/run/docker.sock".to_string());

        let mut result = ProviderResult::new();
        result.insert("name", Value::String(context_name));
        result.insert("endpoint", Value::String(endpoint));
        Some(result)
    }
}

fn home_subpath(rel: &str) -> String {
    std::env::var("HOME")
        .map(|h| format!("{}/{}", h, rel))
        .unwrap_or_else(|_| rel.to_string())
}

fn read_context_endpoint(home: &str, context_name: &str) -> Option<String> {
    if context_name == "default" {
        return None;
    }

    // Docker names context directories by the SHA256 of the context name;
    // iterate the meta directory and match on the "Name" field instead.
    let meta_dir = PathBuf::from(home).join(".docker").join("contexts").join("meta");
    for entry in std::fs::read_dir(&meta_dir).ok()? {
        // Skip unreadable entries rather than aborting the whole lookup.
        let Ok(entry) = entry else { continue };
        let meta_path = entry.path().join("meta.json");
        let Ok(text) = std::fs::read_to_string(&meta_path) else { continue };
        let Ok(meta) = serde_json::from_str::<serde_json::Value>(&text) else { continue };

        if meta.get("Name").and_then(|v| v.as_str()) == Some(context_name) {
            return meta
                .pointer("/Endpoints/docker/Host")
                .and_then(|v| v.as_str())
                .map(|s| s.to_string());
        }
    }
    None
}

2.2 Register the provider

Add the module to src/provider/mod.rs:

pub mod dockercontext;

Add the import and registration to src/provider/registry.rs:

use crate::provider::dockercontext::DockerContextProvider;

// In with_defaults() and in the builtins vec inside with_config():
("dockercontext", Box::new(DockerContextProvider)),

2.3 Config (optional, for disabling)

No config entry is required. Users can disable it via ~/.config/beachcomber/config.toml:

[providers.dockercontext]
enabled = false

2.4 Use it

comb get dockercontext.name
comb get dockercontext.endpoint

2.5 Write a test

Add a test module at the bottom of src/provider/dockercontext.rs:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn metadata_is_valid() {
        let provider = DockerContextProvider;
        let meta = provider.metadata();
        assert_eq!(meta.name, "dockercontext");
        assert!(meta.global);
        assert_eq!(meta.fields.len(), 2);
        assert!(meta.fields.iter().any(|f| f.name == "name"));
        assert!(meta.fields.iter().any(|f| f.name == "endpoint"));
    }

    #[test]
    fn returns_none_without_docker_config() {
        // Point HOME at a temp directory with no .docker/ directory.
        let dir = tempfile::tempdir().unwrap();
        std::env::set_var("HOME", dir.path());
        let provider = DockerContextProvider;
        let result = provider.execute(None);
        assert!(result.is_none());
        // Restore HOME to avoid contaminating other tests.
        std::env::remove_var("HOME");
    }

    #[test]
    fn reads_default_context() {
        let dir = tempfile::tempdir().unwrap();
        let docker_dir = dir.path().join(".docker");
        std::fs::create_dir_all(&docker_dir).unwrap();
        std::fs::write(
            docker_dir.join("config.json"),
            r#"{"auths": {}, "currentContext": "default"}"#,
        ).unwrap();

        std::env::set_var("HOME", dir.path());
        let provider = DockerContextProvider;
        let result = provider.execute(None).unwrap();
        assert_eq!(
            result.get("name"),
            Some(&Value::String("default".to_string()))
        );
        std::env::remove_var("HOME");
    }
}

Run with:

cargo test -p beachcomber provider::dockercontext

3. InvalidationStrategy: Choosing the Right Variant

pub enum InvalidationStrategy {
    Once,
    Poll { interval_secs: u64, floor_secs: u64 },
    Watch { patterns: Vec<String>, fallback_poll_secs: Option<u64> },
    WatchAndPoll { patterns: Vec<String>, interval_secs: u64, floor_secs: u64 },
}

Once — compute once at daemon startup, never again. Use for values that cannot change without a daemon restart: hostname, current user, static environment facts. Cost: one execution at startup, zero ongoing overhead.

// hostname: never changes while daemon is running
invalidation: InvalidationStrategy::Once,

Poll { interval_secs, floor_secs } — re-execute on a timer. Use when there is no file to watch that reliably reflects state changes. floor_secs prevents consumer-requested poll intervals from going below a minimum (usually 1). The interval is in seconds; interval_secs: 30 means re-run every 30 seconds.

// battery level: no file to watch reliably, poll every 30s
invalidation: InvalidationStrategy::Poll {
    interval_secs: 30,
    floor_secs: 1,
},

Watch { patterns, fallback_poll_secs } — re-execute when the filesystem paths in patterns change. Use when there is a file or directory that is written whenever the state changes. fallback_poll_secs is used as a poll interval on systems where file watching fails or is unavailable. Set it to Some(60) unless freshness is critical.

// kubecontext: re-run when kubeconfig is written
invalidation: InvalidationStrategy::Watch {
    patterns: vec!["/home/user/.kube/config".to_string()],
    fallback_poll_secs: Some(60),
},

In practice, patterns should use absolute paths where possible. For paths relative to $HOME, expand them in metadata() using std::env::var("HOME") (see the dockercontext example above).

WatchAndPoll { patterns, interval_secs, floor_secs } — watch files AND poll on a timer. Use when file watching catches most changes quickly but some changes don't touch a watchable file (e.g., network-propagated git changes that arrive via git fetch). The git provider uses this: it watches .git for local operations and polls every 60 seconds to catch remote state.

// git: watch .git for local commits/checkouts, poll every 60s for remote changes
invalidation: InvalidationStrategy::WatchAndPoll {
    patterns: vec![".git".to_string()],
    interval_secs: 60,
    floor_secs: 1,
},

Note that for path-scoped providers (e.g., git), patterns like ".git" are relative to the queried path and the FsWatcher receives the resolved absolute path when demand is first registered. For global providers, patterns should be absolute paths.
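The resolution rule above can be sketched with stdlib path handling; resolve_pattern is an illustrative helper showing the intent, not the daemon's actual watcher code:

```rust
use std::path::{Path, PathBuf};

// For path-scoped providers, a relative pattern like ".git" is resolved
// against the queried path; an absolute pattern passes through unchanged.
fn resolve_pattern(queried: &Path, pattern: &str) -> PathBuf {
    let p = Path::new(pattern);
    if p.is_absolute() {
        p.to_path_buf()
    } else {
        queried.join(p)
    }
}

fn main() {
    assert_eq!(
        resolve_pattern(Path::new("/home/user/proj"), ".git"),
        PathBuf::from("/home/user/proj/.git")
    );
    assert_eq!(
        resolve_pattern(Path::new("/home/user/proj"), "/etc/vpn/state"),
        PathBuf::from("/etc/vpn/state")
    );
}
```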


4. Performance Guidelines

Provider execution happens on tokio's blocking thread pool. Slow providers delay cache freshness but do not block the scheduler loop. Still, keep providers fast. The tier list from docs/performance.md:

Tier                  | Providers                                    | Method
Nanosecond (<1µs)     | user, hostname, kubecontext, gcloud, aws     | libc calls, env vars, file reads + line scan
Microsecond (1-100µs) | terraform, python, direnv (no binary)        | File existence checks + small reads
Millisecond (1-10ms)  | git, network, battery                        | At most one process spawn
Slow (10-50ms)        | mise, direnv (with binary), script providers | Multiple spawns or interpreted CLI

Rule 1: Never fork a process when you can read a file.

Process spawns cost 2-6ms minimum. File reads cost nanoseconds. Before using Command::new(...), ask: does this tool write its state to a file I can parse?

// Bad: ~5ms to spawn git just to count stashes
let output = Command::new("git").args(["stash", "list"]).output().ok()?;
let count = String::from_utf8_lossy(&output.stdout).lines().count();

// Good: ~1µs to read the stash log file directly
let stash_log = dir.join(".git").join("logs").join("refs").join("stash");
let count = std::fs::read_to_string(&stash_log)
    .map(|s| s.lines().count() as i64)
    .unwrap_or(0);

Real examples from docs/performance.md:

  • gcloud: reading ~/.config/gcloud/properties instead of spawning the Python CLI — 500ms to 1µs (~500,000x)
  • kubecontext: reading ~/.kube/config instead of running kubectl — 60ms to 749ns (~80,000x)
  • git stash: reading .git/logs/refs/stash instead of git stash list — 5ms to 1µs

Rule 2: If you must spawn a process, spawn exactly one.

If a file read is truly not feasible, cap the provider at one process spawn. The git provider spawns one (git status). The network provider spawns one (airport for SSID; everything else uses libc::getifaddrs()).

Rule 3: Providers that poll frequently must be fast.

A provider polling every 5 seconds and taking 50ms per execution consumes 1% of a blocking thread slot continuously. Use Poll { interval_secs } values that match the provider's actual cost:

  • Sub-microsecond providers: can poll every 5-10s safely
  • Millisecond providers: 30s minimum
  • Slow providers (>10ms): 60s minimum or use Watch instead
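The budget arithmetic behind Rule 3 is simple: execution time divided by poll interval is the fraction of one blocking-thread slot the provider holds. A sketch (blocking_utilisation is an illustrative name, not daemon code):

```rust
// Fraction of a single blocking-thread slot a polling provider occupies:
// execution time per run divided by the poll interval.
fn blocking_utilisation(exec_ms: f64, interval_secs: f64) -> f64 {
    (exec_ms / 1000.0) / interval_secs
}

fn main() {
    // 50ms every 5s = 1% of a thread slot, as stated above.
    assert!((blocking_utilisation(50.0, 5.0) - 0.01).abs() < 1e-9);
    // The same provider polled every 60s drops below 0.1%.
    assert!(blocking_utilisation(50.0, 60.0) < 0.001);
}
```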

Rule 4: Providers must be stateless.

execute() receives no mutable state. Do not use Mutex-wrapped fields inside your provider struct to cache intermediate results — this adds contention and complexity. If two concurrent calls to execute() are needed (different paths), they must be independent.

See docs/performance.md for the full performance profile, benchmark commands, and the regression checklist.


5. Testing Patterns

Basic structure

Every provider file should have a #[cfg(test)] module. At minimum, test:

  1. metadata() returns valid, expected values
  2. execute() returns None when the required tool/file is absent
  3. execute() returns the expected fields when given a valid fixture

Using tempdir

For providers that read files, use tempfile::tempdir() to create a controlled environment:

#[test]
fn detects_git_repo() {
    let dir = tempfile::tempdir().unwrap();
    // Create a minimal .git directory
    std::fs::create_dir(dir.path().join(".git")).unwrap();
    std::fs::write(dir.path().join(".git").join("HEAD"), "ref: refs/heads/main\n").unwrap();

    let provider = GitProvider;
    // execute() returns None for a bare .git dir without a valid git repo state,
    // but it should not panic.
    let _ = provider.execute(Some(dir.path().to_str().unwrap()));
}

Testing with real git repos

For providers that shell out (like git), test against a real initialized repo:

#[test]
fn git_status_on_empty_repo() {
    let dir = tempfile::tempdir().unwrap();
    // Pin the branch name so the assertion below holds regardless of the
    // user's init.defaultBranch setting (requires git >= 2.28).
    std::process::Command::new("git")
        .args(["init", "--initial-branch=main"])
        .current_dir(dir.path())
        .output()
        .unwrap();
    std::process::Command::new("git")
        .args(["commit", "--allow-empty", "-m", "init"])
        .current_dir(dir.path())
        .env("GIT_AUTHOR_NAME", "test")
        .env("GIT_AUTHOR_EMAIL", "test@test")
        .env("GIT_COMMITTER_NAME", "test")
        .env("GIT_COMMITTER_EMAIL", "test@test")
        .output()
        .unwrap();

    let provider = GitProvider;
    let result = provider.execute(Some(dir.path().to_str().unwrap()));
    assert!(result.is_some());
    let result = result.unwrap();
    assert_eq!(result.get("branch"), Some(&Value::String("main".to_string())));
    assert_eq!(result.get("dirty"), Some(&Value::Bool(false)));
}

Testing when the external tool is not installed

Providers that depend on optional tools (docker, kubectl, aws) must return None gracefully when the tool is absent or when the relevant config files do not exist. Test this by pointing HOME to a clean tempdir:

#[test]
fn returns_none_without_kubeconfig() {
    let dir = tempfile::tempdir().unwrap();
    std::env::set_var("HOME", dir.path());
    std::env::remove_var("KUBECONFIG");

    let provider = KubecontextProvider;
    assert!(provider.execute(None).is_none());

    std::env::remove_var("HOME");
}

Avoid std::env::set_var in parallel tests — it mutates global state. Either mark such tests #[serial] (via the serial_test crate) or use a single-threaded test binary: cargo test -- --test-threads=1.
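A third option is a guard that serialises HOME mutation behind a lock and restores the previous value on drop, so a panicking test cannot leak state. This is a stdlib-only sketch; HomeGuard is an illustrative helper, not part of the codebase:

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

// Holds a process-wide lock while HOME is overridden, and restores the
// previous value when dropped (even on panic, via unwinding).
struct HomeGuard {
    _lock: MutexGuard<'static, ()>,
    previous: Option<String>,
}

impl HomeGuard {
    fn set(new_home: &str) -> Self {
        static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
        let lock = LOCK.get_or_init(|| Mutex::new(())).lock().unwrap();
        let previous = std::env::var("HOME").ok();
        // set_var/remove_var are unsafe as of the Rust 2024 edition because
        // they mutate process-global state; the lock is what makes this sound.
        unsafe { std::env::set_var("HOME", new_home) };
        HomeGuard { _lock: lock, previous }
    }
}

impl Drop for HomeGuard {
    fn drop(&mut self) {
        match &self.previous {
            Some(v) => unsafe { std::env::set_var("HOME", v) },
            None => unsafe { std::env::remove_var("HOME") },
        }
    }
}

fn main() {
    let before = std::env::var("HOME").ok();
    {
        let _guard = HomeGuard::set("/tmp/fake-home");
        assert_eq!(std::env::var("HOME").unwrap(), "/tmp/fake-home");
    }
    // HOME is restored once the guard is dropped.
    assert_eq!(std::env::var("HOME").ok(), before);
}
```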

Testing metadata() completeness

A quick structural test catches registration bugs early:

#[test]
fn metadata_fields_match_execute_output() {
    let dir = tempfile::tempdir().unwrap();
    // ... set up fixture ...
    let provider = DockerContextProvider;
    let meta = provider.metadata();
    let result = provider.execute(None).unwrap();

    for field in &meta.fields {
        assert!(
            result.get(&field.name).is_some(),
            "metadata declares field '{}' but execute() did not populate it",
            field.name
        );
    }
}

6. Script Providers vs Built-in Providers

When to use a script provider

Script providers are defined in ~/.config/beachcomber/config.toml without writing any Rust. Use them when:

  • The logic is simple or already exists as a shell script
  • The tool does not have a file-based state representation (forced to shell out)
  • The data changes infrequently so the performance cost is acceptable
  • You need something working today and can write a built-in later

How script providers work

A script provider entry in config:

[providers.my_vpn]
command = "vpn-status --json"
output = "json"

[providers.my_vpn.invalidation]
poll = "10s"
watch = ["/etc/vpn/state"]

This creates a ScriptProvider instance (see src/provider/script.rs) that:

  1. Runs sh -c "vpn-status --json" when executed
  2. Parses stdout as JSON (field: output = "json") or key=value pairs (output = "kv")
  3. Returns the parsed fields as a ProviderResult
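The output = "kv" parsing step can be sketched as follows; parse_kv is illustrative, and the exact rules (trimming, handling of malformed lines) live in src/provider/script.rs and may differ:

```rust
use std::collections::HashMap;

// Parse `key=value` lines from a script's stdout into a field map.
// Blank lines and lines without '=' are skipped.
fn parse_kv(stdout: &str) -> HashMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| {
            let (k, v) = line.trim().split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect()
}

fn main() {
    let fields = parse_kv("version=1.2.3\nstatus=connected\n\nnot a pair\n");
    assert_eq!(fields.get("version").map(String::as_str), Some("1.2.3"));
    assert_eq!(fields.get("status").map(String::as_str), Some("connected"));
    assert_eq!(fields.len(), 2); // malformed/blank lines dropped
}
```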

The invalidation config maps directly to InvalidationStrategy:

  • poll only -> Poll { interval_secs }
  • watch only -> Watch { patterns, fallback_poll_secs: Some(60) }
  • Both -> WatchAndPoll { patterns, interval_secs }
  • Neither -> Poll { interval_secs: 30 } (default)
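Duration strings like "10s" map to interval_secs roughly as below; this is a sketch with a hypothetical helper name, and the actual parser in src/provider/script.rs may accept additional forms:

```rust
// Parse a short duration string ("10s", "5m", "1h") into seconds.
// Returns None for unrecognised units or non-numeric input.
fn parse_duration_secs(s: &str) -> Option<u64> {
    let s = s.trim();
    let (num, unit) = s.split_at(s.len().checked_sub(1)?);
    let n: u64 = num.parse().ok()?;
    match unit {
        "s" => Some(n),
        "m" => Some(n * 60),
        "h" => Some(n * 3600),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_duration_secs("10s"), Some(10));
    assert_eq!(parse_duration_secs("5m"), Some(300));
    assert_eq!(parse_duration_secs("1h"), Some(3600));
    assert_eq!(parse_duration_secs("soon"), None);
}
```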

Set scope = "path" to make the provider path-scoped (the script will be run with its working directory set to the queried path):

[providers.project_version]
command = "cat VERSION 2>/dev/null || echo unknown"
output = "kv" # stdout format: "version=1.2.3"
scope = "path"

[providers.project_version.invalidation]
watch = ["."]

When to write a built-in

Prefer a built-in provider when:

  • Performance matters: The provider will be queried frequently (prompt, tmux, status bar) and spawn_blocking a process every 5-30s adds up
  • File parsing is required: The tool stores state in a structured file (INI, TOML, plain text) that you can parse directly without spawning the tool
  • Cross-platform behaviour: Shell semantics differ between sh and cmd.exe; Rust handles this uniformly
  • The provider will be broadly useful: If most beachcomber users would want it, it belongs in the binary

The performance break-even point: if reading a file directly brings execution from over 1ms to under 100µs, write a built-in. If the tool must be shelled out to anyway and the data changes slowly, a script provider is fine.

Migrating a script provider to built-in

  1. Identify what the script does — which file does it read, or which binary does it call?
  2. Check docs/performance.md to see if the tool has already been handled as a file read
  3. Write the built-in following §2 above, matching the field names your existing config consumers expect
  4. Remove the script entry from config and register the built-in in registry.rs
  5. Run cargo bench --bench providers before and after to verify the improvement