Provider Development Guide
Step-by-step reference for writing new built-in providers. Built-in providers live in src/provider/ and are compiled into the daemon. For a lower-effort path using shell scripts, see §6.
1. The Provider Trait
Every provider implements this trait (defined in src/provider/mod.rs):
pub trait Provider: Send + Sync {
fn metadata(&self) -> ProviderMetadata;
fn execute(&self, path: Option<&str>) -> Option<ProviderResult>;
}
metadata() is called at registration time and on every comb list request. It must be fast and allocation-light (it currently allocates; a future optimisation may switch to Cow<'static, str>). Return a ProviderMetadata describing:
- name: the provider's key used in comb get <name>.<field>
- fields: a list of FieldSchema { name, field_type } describing what fields execute() will populate
- invalidation: when the cached value should be refreshed (see §3)
- global: true if the provider ignores the path argument (e.g., hostname, user); false if it is path-scoped (e.g., git, terraform)
execute(path) runs the provider and returns the result. It is called on a blocking thread pool (tokio::task::spawn_blocking), so it may safely call std::process::Command, std::fs::read_to_string, and other blocking operations. Return None to indicate that no value is available (the cache will not be updated). Return Some(ProviderResult) on success.
ProviderResult is a HashMap<String, Value> wrapper. Insert fields with result.insert("fieldname", Value::String("...")).
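To make the insert API concrete, here is a minimal std-only stand-in for ProviderResult and Value (the real types live in src/provider/mod.rs and may differ in detail; this sketch only mirrors the HashMap-wrapper shape described above):

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the crate's Value and ProviderResult types.
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
    Bool(bool),
}

struct ProviderResult(HashMap<String, Value>);

impl ProviderResult {
    fn new() -> Self {
        ProviderResult(HashMap::new())
    }
    // Mirrors the `result.insert("fieldname", Value::String("..."))` call shape.
    fn insert(&mut self, key: &str, value: Value) {
        self.0.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&Value> {
        self.0.get(key)
    }
}

fn main() {
    let mut result = ProviderResult::new();
    result.insert("name", Value::String("default".to_string()));
    result.insert("dirty", Value::Bool(false));
    assert_eq!(result.get("name"), Some(&Value::String("default".to_string())));
    assert!(result.get("missing").is_none());
}
```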
2. Step-by-Step: Writing a "docker context" Provider
This section builds a complete provider that reports the current Docker context name and endpoint.
Docker stores its active context in ~/.docker/config.json (field "currentContext") and context details in ~/.docker/contexts/meta/<hash>/meta.json. Reading these files directly is ~1µs, versus ~30ms for docker context inspect.
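As an illustration of how cheap this file read is, a std-only sketch can pull currentContext out of the config text with a plain string scan. This is illustrative only (it assumes well-formed JSON with the key spelled at most once); the full provider in §2.1 uses serde_json for robustness:

```rust
// Naive extraction of "currentContext" from ~/.docker/config.json text.
// A sketch only — the real provider parses with serde_json.
fn current_context(config_text: &str) -> String {
    config_text
        .split("\"currentContext\"")
        .nth(1)                                  // text after the key, e.g. `: "colima"}`
        .and_then(|rest| rest.split('"').nth(1)) // first quoted token after the key
        .map(str::to_string)
        .unwrap_or_else(|| "default".to_string()) // Docker's implicit default
}

fn main() {
    let json = r#"{"auths": {}, "currentContext": "colima"}"#;
    assert_eq!(current_context(json), "colima");
    assert_eq!(current_context(r#"{"auths": {}}"#), "default");
}
```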
2.1 Create the file
Create src/provider/dockercontext.rs:
use crate::provider::{
FieldSchema, FieldType, InvalidationStrategy, Provider, ProviderMetadata,
ProviderResult, Value,
};
use std::path::PathBuf;
pub struct DockerContextProvider;
impl Provider for DockerContextProvider {
fn metadata(&self) -> ProviderMetadata {
ProviderMetadata {
name: "dockercontext".to_string(),
fields: vec![
FieldSchema { name: "name".to_string(), field_type: FieldType::String },
FieldSchema { name: "endpoint".to_string(), field_type: FieldType::String },
],
invalidation: InvalidationStrategy::Watch {
patterns: vec![
home_subpath(".docker/config.json"),
home_subpath(".docker/contexts"),
],
fallback_poll_secs: Some(60),
},
global: true,
}
}
fn execute(&self, _path: Option<&str>) -> Option<ProviderResult> {
let home = std::env::var("HOME").ok()?;
let config_path = PathBuf::from(&home).join(".docker").join("config.json");
let config_text = std::fs::read_to_string(&config_path).ok()?;
let config: serde_json::Value = serde_json::from_str(&config_text).ok()?;
let context_name = config
.get("currentContext")
.and_then(|v| v.as_str())
.unwrap_or("default")
.to_string();
// Look up the endpoint from the context metadata.
let endpoint = read_context_endpoint(&home, &context_name)
.unwrap_or_else(|| "unix:///var/run/docker.sock".to_string());
let mut result = ProviderResult::new();
result.insert("name", Value::String(context_name));
result.insert("endpoint", Value::String(endpoint));
Some(result)
}
}
fn home_subpath(rel: &str) -> String {
std::env::var("HOME")
.map(|h| format!("{}/{}", h, rel))
.unwrap_or_else(|_| rel.to_string())
}
fn read_context_endpoint(home: &str, context_name: &str) -> Option<String> {
if context_name == "default" {
return None;
}
// Docker names contexts by SHA256 of the name; iterate the meta directory.
let meta_dir = PathBuf::from(home).join(".docker").join("contexts").join("meta");
for entry in std::fs::read_dir(&meta_dir).ok()? {
// Skip unreadable entries rather than aborting the whole lookup:
// `.ok()?` here would return None as soon as one meta dir lacks meta.json.
let Ok(entry) = entry else { continue };
let meta_path = entry.path().join("meta.json");
let Ok(text) = std::fs::read_to_string(&meta_path) else { continue };
let Ok(meta) = serde_json::from_str::<serde_json::Value>(&text) else { continue };
if meta.get("Name").and_then(|v| v.as_str()) == Some(context_name) {
return meta
.pointer("/Endpoints/docker/Host")
.and_then(|v| v.as_str())
.map(|s| s.to_string());
}
}
None
}
2.2 Register the provider
Add the module to src/provider/mod.rs:
pub mod dockercontext;
Add the import and registration to src/provider/registry.rs:
use crate::provider::dockercontext::DockerContextProvider;
// In with_defaults() and in the builtins vec inside with_config():
("dockercontext", Box::new(DockerContextProvider)),
2.3 Config (optional, for disabling)
No config entry is required. Users can disable it via ~/.config/beachcomber/config.toml:
[providers.dockercontext]
enabled = false
2.4 Use it
comb get dockercontext.name
comb get dockercontext.endpoint
2.5 Write a test
Add a test module at the bottom of src/provider/dockercontext.rs:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn metadata_is_valid() {
let provider = DockerContextProvider;
let meta = provider.metadata();
assert_eq!(meta.name, "dockercontext");
assert!(meta.global);
assert_eq!(meta.fields.len(), 2);
assert!(meta.fields.iter().any(|f| f.name == "name"));
assert!(meta.fields.iter().any(|f| f.name == "endpoint"));
}
#[test]
fn returns_none_without_docker_config() {
// Point HOME at a temp directory with no .docker/ directory.
let dir = tempfile::tempdir().unwrap();
std::env::set_var("HOME", dir.path());
let provider = DockerContextProvider;
let result = provider.execute(None);
assert!(result.is_none());
// Unset the temporary HOME so it does not leak into other tests.
std::env::remove_var("HOME");
}
#[test]
fn reads_default_context() {
let dir = tempfile::tempdir().unwrap();
let docker_dir = dir.path().join(".docker");
std::fs::create_dir_all(&docker_dir).unwrap();
std::fs::write(
docker_dir.join("config.json"),
r#"{"auths": {}, "currentContext": "default"}"#,
).unwrap();
std::env::set_var("HOME", dir.path());
let provider = DockerContextProvider;
let result = provider.execute(None).unwrap();
assert_eq!(
result.get("name"),
Some(&Value::String("default".to_string()))
);
std::env::remove_var("HOME");
}
}
Run with:
cargo test -p beachcomber provider::dockercontext
3. InvalidationStrategy: Choosing the Right Variant
pub enum InvalidationStrategy {
Once,
Poll { interval_secs: u64, floor_secs: u64 },
Watch { patterns: Vec<String>, fallback_poll_secs: Option<u64> },
WatchAndPoll { patterns: Vec<String>, interval_secs: u64, floor_secs: u64 },
}
Once — compute once at daemon startup, never again. Use for values that cannot change without a daemon restart: hostname, current user, static environment facts. Cost: one execution at startup, zero ongoing overhead.
// hostname: never changes while daemon is running
invalidation: InvalidationStrategy::Once,
Poll { interval_secs, floor_secs } — re-execute on a timer. Use when there is no file to watch that reliably reflects state changes. floor_secs prevents consumer-requested poll intervals from going below a minimum (usually 1). The interval is in seconds; interval_secs: 30 means re-run every 30 seconds.
// battery level: no file to watch reliably, poll every 30s
invalidation: InvalidationStrategy::Poll {
interval_secs: 30,
floor_secs: 1,
},
Watch { patterns, fallback_poll_secs } — re-execute when the filesystem paths in patterns change. Use when there is a file or directory that is written whenever the state changes. fallback_poll_secs is used as a poll interval on systems where file watching fails or is unavailable. Set it to Some(60) unless freshness is critical.
// kubecontext: re-run when kubeconfig is written
invalidation: InvalidationStrategy::Watch {
patterns: vec!["/home/user/.kube/config".to_string()],
fallback_poll_secs: Some(60),
},
In practice, patterns should use absolute paths where possible. For paths relative to $HOME, expand them in metadata() using std::env::var("HOME") (see the dockercontext example above).
WatchAndPoll { patterns, interval_secs, floor_secs } — watch files AND poll on a timer. Use when file watching catches most changes quickly but some changes don't touch a watchable file (e.g., network-propagated git changes that arrive via git fetch). The git provider uses this: it watches .git for local operations and polls every 60 seconds to catch remote state.
// git: watch .git for local commits/checkouts, poll every 60s for remote changes
invalidation: InvalidationStrategy::WatchAndPoll {
patterns: vec![".git".to_string()],
interval_secs: 60,
floor_secs: 1,
},
Note that for path-scoped providers (e.g., git), patterns like ".git" are relative to the queried path and the FsWatcher receives the resolved absolute path when demand is first registered. For global providers, patterns should be absolute paths.
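The resolution rule above can be sketched in a few lines. The function name is illustrative (the real wiring lives in the daemon's FsWatcher registration), but the rule is the one just described: relative patterns are anchored at the queried path, absolute patterns pass through:

```rust
use std::path::{Path, PathBuf};

// Illustrative sketch of watch-pattern resolution for path-scoped providers.
fn resolve_pattern(pattern: &str, queried_path: Option<&str>) -> PathBuf {
    let p = Path::new(pattern);
    if p.is_absolute() {
        // Global providers: absolute patterns are used as-is.
        p.to_path_buf()
    } else {
        // Path-scoped providers: anchor the pattern at the queried directory.
        Path::new(queried_path.unwrap_or(".")).join(p)
    }
}

fn main() {
    assert_eq!(
        resolve_pattern(".git", Some("/home/user/project")),
        PathBuf::from("/home/user/project/.git")
    );
    assert_eq!(
        resolve_pattern("/etc/vpn/state", None),
        PathBuf::from("/etc/vpn/state")
    );
}
```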
4. Performance Guidelines
Provider execution happens on tokio's blocking thread pool. Slow providers delay cache freshness but do not block the scheduler loop. Still, keep providers fast. The tier list from docs/performance.md:
| Tier | Providers | Method |
|---|---|---|
| Nanosecond (<1µs) | user, hostname, kubecontext, gcloud, aws | libc calls, env vars, file reads + line scan |
| Microsecond (1-100µs) | terraform, python, direnv (no binary) | File existence checks + small reads |
| Millisecond (1-10ms) | git, network, battery | At most one process spawn |
| Slow (10-50ms) | mise, direnv (with binary), script providers | Multiple spawns or interpreted CLI |
Rule 1: Never fork a process when you can read a file.
Process spawns cost 2-6ms minimum. File reads cost nanoseconds. Before using Command::new(...), ask: does this tool write its state to a file I can parse?
// Bad: 5ms to spawn git just to count stashes
let output = Command::new("git").args(["stash", "list"]).output().ok()?;
let count = String::from_utf8_lossy(&output.stdout).lines().count();
// Good: ~1µs to read the stash log file directly
let stash_log = dir.join(".git").join("logs").join("refs").join("stash");
let count = std::fs::read_to_string(&stash_log)
.map(|s| s.lines().count() as i64)
.unwrap_or(0);
Real examples from docs/performance.md:
- gcloud: reading ~/.config/gcloud/properties instead of spawning the Python CLI — 500ms to 1µs (~500,000x)
- kubecontext: reading ~/.kube/config instead of running kubectl — 60ms to 749ns (~80,000x)
- git stash: reading .git/logs/refs/stash instead of git stash list — 5ms to 1µs
Rule 2: If you must spawn a process, spawn exactly one.
If a file read is truly not feasible, cap the provider at one process spawn. The git provider spawns one (git status). The network provider spawns one (airport for SSID; everything else uses libc::getifaddrs()).
Rule 3: Providers that poll frequently must be fast.
A provider polling every 5 seconds and taking 50ms per execution consumes 1% of a blocking thread slot continuously. Use Poll { interval_secs } values that match the provider's actual cost:
- Sub-microsecond providers: can poll every 5-10s safely
- Millisecond providers: 30s minimum
- Slow providers (>10ms): 60s minimum, or use Watch instead
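Rule 3 is just arithmetic: the fraction of one blocking-thread slot a polled provider occupies is execution time divided by poll interval. A tiny illustrative helper (the function is not part of the codebase) reproduces the 1% figure from the text:

```rust
// Duty cycle of a polled provider as a percentage of one blocking-thread slot.
fn duty_cycle_percent(exec_ms: f64, interval_secs: f64) -> f64 {
    (exec_ms / 1000.0) / interval_secs * 100.0
}

fn main() {
    // The example from the text: 50ms of work every 5s is a continuous 1% load.
    assert!((duty_cycle_percent(50.0, 5.0) - 1.0).abs() < 1e-9);
    // The same provider at a 60s interval drops below 0.1%.
    assert!(duty_cycle_percent(50.0, 60.0) < 0.1);
}
```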
Rule 4: Providers must be stateless.
execute() receives no mutable state. Do not use Mutex-wrapped fields inside your provider struct to cache intermediate results — this adds contention and complexity. If two concurrent calls to execute() are needed (different paths), they must be independent.
See docs/performance.md for the full performance profile, benchmark commands, and the regression checklist.
5. Testing Patterns
Basic structure
Every provider file should have a #[cfg(test)] module. At minimum, test:
- metadata() returns valid, expected values
- execute() returns None when the required tool/file is absent
- execute() returns the expected fields when given a valid fixture
Using tempdir
For providers that read files, use tempfile::tempdir() to create a controlled environment:
#[test]
fn detects_git_repo() {
let dir = tempfile::tempdir().unwrap();
// Create a minimal .git directory
std::fs::create_dir(dir.path().join(".git")).unwrap();
std::fs::write(dir.path().join(".git").join("HEAD"), "ref: refs/heads/main\n").unwrap();
let provider = GitProvider;
// execute() returns None for a bare .git dir without a valid git repo state,
// but it should not panic.
let _ = provider.execute(Some(dir.path().to_str().unwrap()));
}
Testing with real git repos
For providers that shell out (like git), test against a real initialized repo:
#[test]
fn git_status_on_empty_repo() {
let dir = tempfile::tempdir().unwrap();
std::process::Command::new("git")
.args(["init"])
.current_dir(dir.path())
.output()
.unwrap();
std::process::Command::new("git")
.args(["commit", "--allow-empty", "-m", "init"])
.current_dir(dir.path())
.env("GIT_AUTHOR_NAME", "test")
.env("GIT_AUTHOR_EMAIL", "test@test")
.env("GIT_COMMITTER_NAME", "test")
.env("GIT_COMMITTER_EMAIL", "test@test")
.output()
.unwrap();
let provider = GitProvider;
let result = provider.execute(Some(dir.path().to_str().unwrap()));
assert!(result.is_some());
let result = result.unwrap();
assert_eq!(result.get("branch"), Some(&Value::String("main".to_string())));
assert_eq!(result.get("dirty"), Some(&Value::Bool(false)));
}
Testing when the external tool is not installed
Providers that depend on optional tools (docker, kubectl, aws) must return None gracefully when the tool is absent or when the relevant config files do not exist. Test this by pointing HOME to a clean tempdir:
#[test]
fn returns_none_without_kubeconfig() {
let dir = tempfile::tempdir().unwrap();
std::env::set_var("HOME", dir.path());
std::env::remove_var("KUBECONFIG");
let provider = KubecontextProvider;
assert!(provider.execute(None).is_none());
std::env::remove_var("HOME");
}
Avoid std::env::set_var in parallel tests — it mutates global state. Either mark such tests #[serial] (via the serial_test crate) or use a single-threaded test binary: cargo test -- --test-threads=1.
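One way to keep the cleanup from being skipped on a failed assertion is an RAII guard that restores the previous HOME on drop. This is a pattern sketch, not a helper that exists in the codebase, and it does not remove the need for serialization since env vars are process-global:

```rust
use std::env;

// Restores (or removes) HOME when dropped, even if the test panics.
struct HomeGuard(Option<String>);

impl HomeGuard {
    fn set(new_home: &str) -> Self {
        let old = env::var("HOME").ok();
        env::set_var("HOME", new_home);
        HomeGuard(old)
    }
}

impl Drop for HomeGuard {
    fn drop(&mut self) {
        match &self.0 {
            Some(old) => env::set_var("HOME", old),
            None => env::remove_var("HOME"),
        }
    }
}

fn main() {
    let before = env::var("HOME").ok();
    {
        let _guard = HomeGuard::set("/tmp/home-guard-demo");
        assert_eq!(env::var("HOME").unwrap(), "/tmp/home-guard-demo");
    } // guard dropped here; HOME restored
    assert_eq!(env::var("HOME").ok(), before);
}
```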
Testing metadata() completeness
A quick structural test catches registration bugs early:
#[test]
fn metadata_fields_match_execute_output() {
let dir = tempfile::tempdir().unwrap();
// ... set up fixture ...
let provider = DockerContextProvider;
let meta = provider.metadata();
let result = provider.execute(None).unwrap();
for field in &meta.fields {
assert!(
result.get(&field.name).is_some(),
"metadata declares field '{}' but execute() did not populate it",
field.name
);
}
}
6. Script Providers vs Built-in Providers
When to use a script provider
Script providers are defined in ~/.config/beachcomber/config.toml without writing any Rust. Use them when:
- The logic is simple or already exists as a shell script
- The tool does not have a file-based state representation (forced to shell out)
- The data changes infrequently so the performance cost is acceptable
- You need something working today and can write a built-in later
How script providers work
A script provider entry in config:
[providers.my_vpn]
command = "vpn-status --json"
output = "json"
[providers.my_vpn.invalidation]
poll = "10s"
watch = ["/etc/vpn/state"]
This creates a ScriptProvider instance (see src/provider/script.rs) that:
- Runs sh -c "vpn-status --json" when executed
- Parses stdout as JSON (output = "json") or key=value pairs (output = "kv")
- Returns the parsed fields as a ProviderResult
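The key=value path is simple enough to sketch. The following is an illustrative approximation of what output = "kv" parsing might look like, not the exact implementation in src/provider/script.rs:

```rust
use std::collections::HashMap;

// Parse `key=value` lines from a script's stdout; lines without '=' are skipped.
fn parse_kv(stdout: &str) -> HashMap<String, String> {
    stdout
        .lines()
        .filter_map(|line| {
            let (k, v) = line.split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect()
}

fn main() {
    let parsed = parse_kv("version=1.2.3\nchannel=stable\nnot a pair\n");
    assert_eq!(parsed.get("version").map(String::as_str), Some("1.2.3"));
    assert_eq!(parsed.get("channel").map(String::as_str), Some("stable"));
    assert!(!parsed.contains_key("not a pair"));
}
```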
The invalidation config maps directly to InvalidationStrategy:
- poll only -> Poll { interval_secs }
- watch only -> Watch { patterns, fallback_poll_secs: Some(60) }
- Both -> WatchAndPoll { patterns, interval_secs }
- Neither -> Poll { interval_secs: 30 } (default)
Set scope = "path" to make the provider path-scoped (the script will be run with its working directory set to the queried path):
[providers.project_version]
command = "cat VERSION 2>/dev/null || echo unknown"
output = "kv" # stdout format: "version=1.2.3"
scope = "path"
[providers.project_version.invalidation]
watch = ["."]
When to write a built-in
Prefer a built-in provider when:
- Performance matters: the provider will be queried frequently (prompt, tmux, status bar), and spawning a process via spawn_blocking every 5-30s adds up
- File parsing is required: the tool stores state in a structured file (INI, TOML, plain text) that you can parse directly without spawning the tool
- Cross-platform behaviour: shell semantics differ between sh and cmd.exe; Rust handles this uniformly
- The provider will be broadly useful: if most beachcomber users would want it, it belongs in the binary
The performance break-even point: if direct file reading brings execution from >1ms to <100µs, write a built-in. If the tool must be shelled out anyway and the data changes slowly, a script provider is fine.
Migrating a script provider to built-in
- Identify what the script does — which file does it read, or which binary does it call?
- Check docs/performance.md to see if the tool has already been handled as a file read
- Write the built-in following §2 above, matching the field names your existing config consumers expect
- Remove the script entry from config and register the built-in in registry.rs
- Run cargo bench --bench providers before and after to verify the improvement