Configuration & Runtime Engine
PADAS Core is the runtime engine of the data plane: a single process that owns runtime state, execution scheduling, and persistence projection for streams, tasks, and connectors. It is not “just an API server”—the embedded control plane (padas-api on Axum + Rustls, /api/v1/*) orchestrates the same runtime topology the hot path executes. StreamRouter moves events through buffers; an optional WAL subscriber provides runtime durability; tasks are execution units on that graph; connectors are ingress/egress boundaries. Operators reason in API-driven orchestration terms (streams.json, tasks.json, connectors.json under [api.persistence].config_dir) while the engine reconciles the persisted registry with live runtime state after restarts.
Related: REST API Reference · Runtime configurations · Monitoring · Stream configuration · Security — runtime operational security reference · Troubleshooting & Logs · Glossary · Cores
Overview
Core is the long-running runtime engine that materializes an execution graph from registry objects:
| Subsystem | Implementation-aware role |
|---|---|
| StreamRouter | Per-stream routing fabric: buffer transitions, backpressure modes, and fan-out to subscribers—including the optional WAL subscriber path. |
| Runtime scheduling | Workers, virtual shards, and task/connector lifecycle map to OS-level execution; there is no out-of-process “pipeline scheduler” for product pipelines. |
| WAL subscriber behavior | WAL is a router subscriber writing segmented data under [core.wal].path; lagging consumers can hybrid-read from WAL per [core.subscriber.lag] (see Glossary → Consumer lag). |
| Runtime execution graph | Sources → stream buffers → (optional WAL) → task PDL → sink streams / connector sinks; one single-process runtime graph. |
| Task orchestration | Tasks subscribe to source streams, run PDL in processing or detection mode, manage aggregation state, and route outputs—execution unit semantics. |
| Connector lifecycle | Start/stop/restart via /api/v1/connectors maps to connector runtime threads/async work—runtime lifecycle tied to the same process. |
| Embedded API orchestration | Axum router under [api].prefix (default /api/v1) mutates runtime state and persists registry JSON without a sidecar control plane. |
| Runtime persistence projection | streams.json, tasks.json, connectors.json are the on-disk persistence projection of API-managed objects; startup registry reconciliation reloads them into the runtime engine. |
Why Core is not just an API server: the REST surface is the embedded control plane for runtime coordination; disabling or mis-sizing the engine does not leave a “headless API”—it leaves a broken streaming runtime.
Why pipelines are runtime compositions: a pipeline in product language is not a standalone service—it is realized as runtime objects (streams, tasks, connectors) under one scheduler inside this single-process runtime graph.
Runtime state synchronization: UI and automation drive desired state through /api/v1/*; the process applies changes, persists registry files, and reflects status on …/status routes—runtime state and config state (see Runtime persistence model) must stay aligned across rolling restarts and reloads.
Runtime role
In runtime topology terms, Core is the single-process runtime engine that executes workloads described by:
- Streams — stream transport: named channels with buffers, retention, optional WAL, and HTTP produce/consume surfaces. Stream topology realization is “which routers exist and how they connect.”
- Tasks — execution units: subscriptions to source streams, PDL queries, concurrency limits, aggregation windows, and sinks (processing vs detection modes).
- Connectors — ingress/egress boundaries: sources pump events into streams; sinks drain tasks or streams per connector class semantics (class-specific backpressure / drop policies—Glossary → Connector class).
Runtime graph realization: there is no separate “pipeline microservice”; a pipeline is realized as runtime objects sharing runtime coordination, IO, and persistence ownership in one address space.
Runtime object ownership: the process owns buffers, WAL segments under [core.wal], task internal state, and connector handles—execution isolation is cooperative (caps, modes), not separate JVMs per pipeline.
Persistence recovery: on start, registry JSON plus padas.toml / padas.default.toml drive startup orchestration: restore routers, replay or resume consumers per offset and WAL policy, then serve /api/v1/*.
Shutdown guarantees: bounded by [core].shutdown_timeout_secs; for clean runtime lifecycle, drain or pause ingest before SIGTERM in production so sink delivery and WAL flush windows can complete.
Core responsibilities
| Responsibility | Where it lives | Operational notes |
|---|---|---|
| Runtime coordination | Engine + [core] | Workers, shards, startup/shutdown timeouts; virtual shard count affects hashing and parallelism—profile before raising. |
| Stream lifecycle | Engine + streams.json | Create/update/delete, pause/resume/restart; per-stream WAL and buffer overlays via API or persisted registry. |
| Stream durability behavior | [core.stream.wal], [core.wal], WAL subscriber | WAL segments + retention; sync_writes trades latency for durability; high-throughput profiles sometimes disable per-stream WAL—explicit runtime durability trade-off (Glossary → WAL). |
| Consumer recovery | StreamRouter + offsets + [core.subscriber.lag] | Execution scheduling resumes from persisted offsets; hybrid WAL reads when lag exceeds threshold / timeout_ms. |
| Task execution | [core.task], tasks.json | Execution state management for PDL, aggregation cleanup/watermarks ([core.task.aggregation.*]); overlays in task JSON vs padas.toml must be reconciled before change windows (Glossary → Watermark, Tasks). |
| Connector execution | connectors.json + connector runtime | Start/stop/restart mirrors tasks; runtime lifecycle and backpressure are class-specific. |
| Execution state management | In-process task/agg state + registry | Restart clears hot state; recovery depends on replay behavior and window definitions. |
| Persistence ownership | [api.persistence].config_dir, core.data_dir, WAL path | Registry vs WAL vs instance UUID paths—stateful paths must stay stable across VM migration. |
| Embedded control plane | [api], Axum, /api/v1/* | API-driven orchestration; POST /api/v1/system/reload applies supported padas.toml deltas where implemented—runtime reload semantics vs full restart differ by key. |
| Runtime reload semantics | POST /api/v1/system/reload | Prefer validate_only in automation before applying; unsupported keys still need process restart. |
| Metrics / internal streams | [observability], GET /api/v1/metrics | Internal/system streams (e.g. _padas_metrics) feed observability; higher metrics levels add hot-path cost. |
| Runtime API protection | [api.auth], service-account JSON | Axum runtime auth middleware when enabled—see Service-account authentication and Security. |
Configuration file
Hierarchy
Config layering is fixed: padas.default.toml (immutable vendor defaults) → operator padas.toml → environment overrides (e.g. PADAS_HOME). Runtime precedence: later layers override earlier ones for the same keys. Treat padas.default.toml as a read-only baseline; keep deployment-specific overrides in padas.toml beside it under $PADAS_HOME/etc/. Stateful paths (data_dir, WAL, [api.persistence].config_dir) belong in operator config so reload behavior and backups stay consistent.
Reload vs restart implications: POST /api/v1/system/reload validates and applies a subset of changes; anything outside that set requires a process restart—plan maintenance windows accordingly. Full doc: Runtime configurations.
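A minimal sketch of an operator padas.toml, assuming the path layout used in the examples below; only the keys a deployment actually changes go here, and everything else falls through to padas.default.toml:
# $PADAS_HOME/etc/padas.toml : operator overrides only (illustrative paths)
[core]
data_dir = "/var/lib/padas/data"
[core.wal]
path = "/var/lib/padas/data/wal"
[api.persistence]
config_dir = "/var/lib/padas/data/registry"
[api.tls]
enabled = true
cert_file = "/var/lib/padas/etc/api.crt"
key_file = "/var/lib/padas/etc/api.key"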
Minimal lab profile (TLS and auth off)
Use only on trusted localhost. Auth-disabled operational risks: with [api.auth].enabled = false, /api/v1/* is exposed without authentication—never run this profile on shared networks (Security).
[api]
host = "127.0.0.1"
[api.tls]
enabled = false
[api.auth]
enabled = false
Production-oriented API bind and transport
[api]
enabled = true
host = "0.0.0.0"
port = 8999
prefix = "/api/v1"
[api.tls]
enabled = true
cert_file = "/var/lib/padas/etc/api.crt"
key_file = "/var/lib/padas/etc/api.key"
[api.cors]
enabled = true
allowed_origins = "https://padas-ui.example.com"
allow_credentials = false
allowed_methods = "GET,POST,PUT,DELETE,OPTIONS"
allowed_headers = "Content-Type,Authorization"
preflight_max_age_secs = 3600
Service-account authentication (summary)
When [api.auth].enabled = true, Axum runtime auth middleware requires Authorization: Bearer <token> on every embedded route under prefix, including /api/v1/health and /metrics—full runtime API protection for the surface. Token material lives in JSON at [api.auth].service_account_token_file (default ./data/security/service-account.token relative to PADAS_HOME). Service token lifecycle (rotation, grace, lockout, X-Forwarded-For) and UI/Core trust model are documented in Security — runtime operational security reference. Rustls terminates TLS per [api.tls] in padas-api.
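A hedged sketch of the auth block, assuming the default token-file location described above (any other path is deployment-specific):
[api.auth]
enabled = true
# Default resolves relative to PADAS_HOME; keep it on a stable, backed-up path in production.
service_account_token_file = "./data/security/service-account.token"
Once enabled, every route under the prefix requires the Bearer header (health probes included), so monitoring must carry the token too.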
Registry persistence (API-managed objects)
[api.persistence]
config_dir = "/var/lib/padas/data/registry"
streams_registry_file = "streams.json"
tasks_registry_file = "tasks.json"
connectors_registry_file = "connectors.json"
backup_enabled = true
backup_dir = "/var/lib/padas/data/registry/backups"
max_backups = 10
Runtime projection: config_dir holds the authoritative persistence projection of API-managed streams/tasks/connectors—the registry files operators back up. Reconciliation behavior: on startup the engine loads the JSON and aligns runtime state with disk; runtime drift appears when manual file edits disagree with the last API write or when restores are partial.
Backup strategy: snapshot config_dir via backup_enabled rotation plus external volume backups; restores require restarting Core so registry reconciliation runs against a consistent trio of files. Persistence ownership is split: registry JSON for config state, WAL/data dirs for runtime durability and task windows—restore one without the other at your own risk.
Engine, data paths, and timeouts
[core]
workers = 0
virtual_shards = 4096
data_dir = "/var/lib/padas/data"
startup_timeout_secs = 30
shutdown_timeout_secs = 30
instance_uuid is normally auto-generated on first start; keep data_dir and WAL paths stable so offset persistence and segment files remain consistent across rolling restarts.
Streams, buffers, and WAL defaults
Per-stream buffers default under [core.stream.buffer] (max_events, mode = timeout | drop | block, timeout_ms)—they define stream backpressure on the hot path.
[core.stream.buffer]
max_events = 25000
max_bytes = 0
timeout_ms = 10
mode = "timeout"
Per-stream WAL enablement and append batching default under [core.stream.wal]:
[core.stream.wal]
enabled = false
[core.stream.wal.batch]
max_events = 100
max_timeout_ms = 100
Global WAL storage and retention:
[core.wal]
path = "/var/lib/padas/data/wal"
retention_ms = 3600000
retention_bytes = 1073741824
max_segment_size = 104857600
max_segments = 20
sync_writes = false
[core.wal.compression]
enabled = true
algorithm = "zstd"
level = 1
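As a rough sizing check (the ingest rate here is an assumption, not a measured figure): at ~10 MB/s sustained ingest, the 1 GiB retention_bytes default above holds well under two minutes of history, so a one-hour replay window needs retention and segment capacity sized together:
[core.wal]
# Illustrative sizing for a 1-hour replay budget at ~10 MB/s sustained ingest.
# 10 MB/s * 3600 s ≈ 36 GB of WAL data to retain.
retention_ms = 3600000           # 1 hour
retention_bytes = 38654705664    # 36 GiB (size this from your own EPS and event size)
max_segment_size = 104857600     # 100 MiB per segment (shipped default)
max_segments = 400               # 400 * 100 MiB ≈ 39 GiB of segment capacity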
WAL / stream relationship:
- With [core.stream.wal].enabled = true, appended events are durably segmented under [core.wal].path subject to retention and sync_writes.
- High-throughput modes may disable stream WAL entirely—explicit runtime durability trade-off vs latency (Glossary → WAL).
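A hedged durability-leaning sketch combining both layers (whether sync_writes is worth the added append latency depends on ingest volume):
[core.stream.wal]
enabled = true       # per-stream WAL on: appended events are segmented under [core.wal].path
[core.wal]
sync_writes = true   # trade write latency for stronger durability on each append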
Subscriber lag and hybrid reads
Controls when consumers transition from StreamRouter buffers to WAL-backed reads—replay lag and consumer lag tuning (Glossary → Consumer lag):
[core.subscriber.lag]
threshold = 50
timeout_ms = 100
[core.subscriber.batch]
max_events = 500
[core.subscriber.offset.batch]
max_events = 1000
[core.subscriber]
enable_adaptive_capacity = true
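A hedged tuning sketch (illustrative values): smaller offset batches commit more often, which narrows how far a consumer may be rewound and re-read after a restart, at the cost of more frequent offset writes.
[core.subscriber.lag]
threshold = 50        # switch to WAL-backed reads once a consumer falls this far behind
timeout_ms = 100
[core.subscriber.offset.batch]
max_events = 250      # commit offsets more often than the 1000-event default (illustrative)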
Tasks and aggregation (runtime execution)
[core.task]
# Values from shipped defaults; raise workers/thread caps only after profiling.
[core.task.aggregation.cleanup]
# Window state retention — align with PDL windows and disk budget.
[core.task.aggregation.watermark]
# Idle / shutdown flush behaviour for tumbling windows (watermark / cleanup).
Task REST payloads may supply execution overlays mapping to these tables—resolve which layer wins (TOML vs task JSON) before production edits (Tasks).
Observability hooks
Prefer [observability.logging].format = "Json" for centralized logging; [observability.metrics_collection].level increases runtime observability cost on the hot path. GET /api/v1/metrics exposes Prometheus text when enabled.
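A hedged sketch, assuming the key names above; the accepted metrics levels are not listed here, so the value is left as a placeholder (see Runtime configurations):
[observability.logging]
format = "Json"              # structured logs for centralized collection
[observability.metrics_collection]
# Higher levels add hot-path cost; keep modest unless actively debugging.
# level = "…"                # accepted values documented in Runtime configurations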
Streams, tasks, and connectors inside one runtime engine
Event lifecycle on the runtime execution graph:
- Ingestion — HTTP produce, connector sources, or internal generators append to a stream’s StreamRouter buffer (buffer transitions).
- WAL handoff — the optional WAL subscriber persists copies per stream/WAL policy before downstream visibility; delivery guarantees are defined by mode.
- Task execution — tasks subscribe to source streams; task execution ordering is defined by the engine’s subscription and shard assignment—task saturation shows as lag, drops, or error status.
- Sink routing — PDL outputs land on sink streams or connector sinks; sink delivery respects connector-specific backpressure.
- Consumer lag behavior — subscribers past [core.subscriber.lag] thresholds read from WAL segments (execution isolation is not cross-process; it is per-task/connector caps).
Runtime state transitions: streams move created → active → paused/stopped; tasks/connectors expose running, stopped, error—use …/status routes for runtime diagnostics.
Control plane: REST CRUD on /api/v1/streams, /tasks, /connectors updates live runtime state and atomically persists streams.json / tasks.json / connectors.json. Lifecycle start / stop / restart maps to in-process runtime orchestration—not external Kubernetes job semantics unless you wrap them yourself.
Runtime execution model
| Stage | Runtime behavior |
|---|---|
| Ingestion | Events enter StreamRouter buffers from HTTP, connectors, or internal producers. |
| Routing | Shards/virtual shards map keys to internal queues; fan-out to subscribers and WAL path. |
| Buffering | [core.stream.buffer] modes (timeout, drop, block) define stream backpressure before drops or blocking producers. |
| WAL persistence | Optional subscriber writes batches to [core.wal].path with retention and compression. |
| Task execution | PDL runs per execution unit; aggregation windows hold runtime state until cleanup/watermark. |
| Sink delivery | Tasks and connector sinks consume from router and/or WAL hybrid reads. |
| Offset management | Offset persistence enables consumer recovery; batching under [core.subscriber.offset.batch] tunes commit frequency. |
| Replay behavior | After restart or lag, consumers may replay from WAL per policy—coordinate with task execution windows to avoid duplicate side effects in sinks. |
Runtime persistence model
| Concern | Behavior |
|---|---|
| WAL vs registry persistence | WAL = runtime durability for stream event history (policy-dependent). Registry JSON = config state for streams/tasks/connectors the API last wrote. |
| Runtime state vs config state | Runtime state includes hot buffers, in-flight connector handles, aggregation windows; config state is the persisted registry + padas.toml. |
| Recovery model | Startup reloads registry + merges padas.toml layers; WAL segments inform replay semantics for lagging subscribers. |
| Replay semantics | Defined by offset + WAL retention—if retention is shorter than outage, you lose replay range. |
| Stream durability | Tied to WAL enablement, sync_writes, and retention bytes/time—size for WAL growth worst case. |
| Offset persistence | Drives consumer recovery after rolling restarts—validate that backups include offset stores if your deployment keeps them on disk under data_dir. |
| Registry restore expectations | Restoring only streams.json without matching tasks/connectors yields partial runtime topology—expect reload failures or manual cleanup. |
Runtime orchestration boundaries
| Boundary | Responsibility |
|---|---|
| UI | Operator desired state and Core pairing (account_token); does not execute PDL or hold StreamRouter. |
| Core runtime | Runtime engine: StreamRouter, WAL, tasks, connectors, offsets, execution scheduling. |
| Registry | streams.json, tasks.json, connectors.json—persistence projection and runtime reconciliation input. |
| REST API | Embedded control plane: /api/v1/*, POST /api/v1/system/reload, auth middleware—orchestration boundary to automation. |
| Connectors | External IO; translate network/files/syslog into stream events or drain to downstream systems. |
| Stream engine | In-process stream transport and buffer/WAL stack shared by all pipelines. |
Operational notes
| Topic | Guidance |
|---|---|
| Rolling restart implications | One node at a time still drops runtime state for that process—expect replay lag and brief connector stalls; upstream should tolerate redelivery where at-least-once applies. |
| WAL sizing | Plan retention_bytes, max_segments, and disk IO for burst ingest; WAL growth without retention tuning fills disks. |
| State recovery | Validate startup_timeout_secs against large registries + WAL replay; startup hangs often trace here. |
| Runtime reload safety | Use validate_only on POST /api/v1/system/reload in CI/CD; bad TOML can still risk reload failures depending on validation coverage. |
| Persistence corruption considerations | Truncated JSON in config_dir or corrupt WAL segments—restore from backup; expect persistence inconsistencies until a clean snapshot is mounted. |
| Deployment synchronization | padas.toml on disk must match what automation assumes; UI Core rows only store reachability + Bearer—deployment synchronization is still file + registry ops. |
| API orchestration behavior | High-frequency CRUD without backpressure can saturate the control plane threadpool—pace automation. |
| Runtime drift | Manual edits to JSON while Core is running can diverge from in-memory runtime state—prefer API or controlled restarts. |
| Observability implications | Verbose metrics/logging increase CPU; correlate with Monitoring during incidents. |
| Maintenance window considerations | Coordinate TLS rotation, service-account.token rotation, and secret.json-less Core changes with UI token refresh (Security). |
Runtime troubleshooting checkpoints
| Symptom | Where to look first |
|---|---|
| Startup hangs | startup_timeout_secs, registry size, WAL replay volume, TLS cert load (Rustls). |
| WAL growth | Retention, max_segments, ingest rate vs sink throughput; disk free space. |
| Stalled consumers | [core.subscriber.lag], offset commits, downstream sink health. |
| Replay lag | WAL read path vs buffer path; segment count; consumer lag behavior. |
| Reload failures | POST /api/v1/system/reload response body; unsupported keys still need restart. |
| Stream backpressure | Buffer mode, max_events, drop metrics vs ingress EPS. |
| Task saturation | [core.task] worker/thread caps, PDL cost, aggregation windows. |
| Connector stalls | Connector status, external dependency timeouts, sink rate limits. |
| Persistence inconsistencies | Mtime skew across streams.json / tasks.json / connectors.json; partial restore. |
Deeper playbooks: Troubleshooting & Logs.
Performance Tuning
TBD
Related pages
- REST API Reference — /api/v1/*, curl, auth headers
- Monitoring — metrics, EPS, incident correlation
- Stream configuration — operator topology language vs runtime topology
- Security — runtime operational security reference — Bearer, TLS, runtime API exposure
- Runtime configurations — padas.toml / padas.default.toml, PADAS_HOME, load order
- Troubleshooting & Logs — reachability, TLS, auth triage
- Glossary — runtime vocabulary (streams, WAL, tasks, operations)
- Cores — UI pairing and account_token