Core concepts
Events
An event is the unit of data in PADAS. Events carry a payload (JSON or text), plus metadata like timestamps and optional keys for partitioning and ordering.
Event payload shapes
- JSON: structured objects for parsed or enriched data
- Text: raw lines (e.g. syslog in raw mode) when you want to preserve the original message exactly
Events are immutable once written to a stream. Transformation produces new events on a downstream stream — the source is never modified.
Streams
A stream is an append-only sequence of events. Streams are the backbone of PADAS: connectors write to streams; tasks read from a source stream and write to sink streams; API clients consume from streams.
Durability (WAL)
Streams can optionally be backed by a write-ahead log (WAL) for durability and historical reads:
- WAL-backed streams: data is persisted to disk, survives process restarts, and can be replayed from any offset. Segments are sealed, indexed, and subject to configurable retention (by time or size).
- In-memory streams: data lives only in the process buffer. Faster and lower overhead, but events are lost on restart.
Choose WAL-backed streams when downstream consumers need reliable delivery or replay. Use in-memory streams for transient fan-out and high-volume intermediate stages where durability overhead outweighs the benefit.
Positions and offsets
When you read from a stream, you choose a starting position:
- `earliest` — replay all retained events from the beginning (WAL-backed only)
- Offset-based — seek to a specific offset (WAL-backed only)
- `latest` — start from the next new event (both WAL and in-memory)
The API is stateless for consumption: clients track their own offsets.
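The offset model is easiest to see in a small sketch. This is plain Python, not the PADAS client API: the stream is modeled as an append-only list, and the client carries its own offset between reads, just as the stateless consumption model requires.

```python
# Minimal illustration of stateless, offset-based consumption.
# The "stream" is an append-only list; the server keeps no consumer
# state, so the client remembers its own offset between calls.

def consume(stream, offset, max_events=10):
    """Return (events, next_offset)."""
    batch = stream[offset:offset + max_events]
    return batch, offset + len(batch)

stream = ["e0", "e1", "e2", "e3", "e4"]

events, offset = consume(stream, offset=0, max_events=3)  # start at earliest
print(events)          # ['e0', 'e1', 'e2']

events, offset = consume(stream, offset)                  # client resumes itself
print(events, offset)  # ['e3', 'e4'] 5
```

Because the client supplies the offset, two independent consumers can read the same stream at different positions without coordinating.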
Backpressure and buffering
Streams have configurable bounded buffers that absorb bursts. When a buffer fills, PADAS applies one of three modes:
- Block — the producer pauses until space is available
- Drop — events are discarded and counted in metrics
- Timeout — the producer waits briefly, then drops if still full
Buffer sizing and backpressure mode are key latency and throughput tuning levers.
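The three modes can be sketched with a bounded queue in plain Python (illustrative only; PADAS applies these modes internally per stream):

```python
import queue

buf = queue.Queue(maxsize=2)
buf.put("a")
buf.put("b")                        # buffer is now full

# Block mode: a plain buf.put("c") would pause the producer until a
# consumer frees space (not run here, since nothing is consuming).

dropped = 0

# Drop mode: discard immediately and count the drop.
try:
    buf.put_nowait("c")
except queue.Full:
    dropped += 1                    # surfaced as a drop metric

# Timeout mode: wait briefly, then drop if the buffer is still full.
try:
    buf.put("d", timeout=0.05)
except queue.Full:
    dropped += 1

print(dropped)  # 2
```

The trade-off mirrors the tuning note above: blocking preserves every event at the cost of producer latency, while drop and timeout bound latency at the cost of loss under sustained overload.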
Tasks
A task subscribes to a source stream, runs a PDL pipeline over each event, and publishes results to one or more output streams.
Processing mode
A processing task runs a single PDL pipeline: filter → transform → enrich → (optional) aggregate. Every matching event produces output. Use processing tasks for normalization, field extraction, enrichment, and format conversion.
Detection mode
A detection task holds one or more named PDL queries. Each query is evaluated independently against incoming events. When a query matches, the task emits a detection event (including the query name and any extracted fields). Detection tasks can be configured to stop at the first match or evaluate all queries.
Detection mode is suited for signature-based alerting, anomaly flags, and policy checks running against a normalized event stream.
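The evaluation loop of a detection task can be sketched as follows. The predicates here are plain Python stand-ins for named PDL queries (the query names and fields are made up for illustration):

```python
# Each named query is a predicate; a match emits a detection event
# carrying the query name, per the description above.

QUERIES = {
    "failed_login_burst": lambda e: e.get("action") == "login"
                                    and e.get("result") == "failure",
    "blocked_outbound":   lambda e: e.get("action") == "blocked"
                                    and e.get("direction") == "out",
}

def detect(event, stop_at_first_match=False):
    detections = []
    for name, predicate in QUERIES.items():
        if predicate(event):
            detections.append({"query": name, "event": event})
            if stop_at_first_match:
                break                # first-match mode stops here
    return detections                # otherwise all queries are evaluated

hits = detect({"action": "blocked", "direction": "out"})
print([h["query"] for h in hits])  # ['blocked_outbound']
```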
PDL (PADAS Domain Language)
PDL is the query and processing language for tasks and the ad-hoc query API. Key capabilities:
- Filtering — boolean expressions over event fields
- Transformation — field assignment, renaming, type coercion, string manipulation
- Extraction — regex, grok, and key-value parsing from raw text fields
- Enrichment — field lookups against the lookup service
- Aggregation — windowed counts, sums, and grouping over time windows
- Conditional logic — branching output based on field values
PDL queries run inline at stream processing speed — no separate query engine to operate.
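Actual PDL syntax is covered in its own reference; as a rough feel for what an extract-and-transform step does, here is the equivalent logic written in plain Python (the raw line and field names are invented for the example):

```python
import re

# Extract fields from a raw text line with a regex, then rename fields,
# coerce types, and apply conditional logic -- the same steps a PDL
# pipeline performs inline on each event.
RAW = "2024-05-01T12:00:00Z src=10.0.0.5 dport=443 action=allow"

match = re.search(
    r"src=(?P<src_ip>\S+) dport=(?P<dport>\d+) action=(?P<action>\w+)", RAW
)
fields = match.groupdict()

event = {
    "source_ip": fields["src_ip"],               # rename
    "dest_port": int(fields["dport"]),           # type coercion
    "allowed": fields["action"] == "allow",      # conditional logic
}
print(event)  # {'source_ip': '10.0.0.5', 'dest_port': 443, 'allowed': True}
```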
Connectors
A connector integrates an external system with a PADAS stream. Each connector has a lifecycle (stopped → starting → running → stopping) managed via the REST API.
Source connectors
Source connectors ingest data from the outside world and publish events to a target stream.
| Connector | What it ingests |
|---|---|
| Syslog | UDP/TCP syslog (RFC 3164/5424), structured or raw mode |
| HTTP | REST endpoint polling or push ingestion, JSON payloads |
| Kafka | Consumer from Kafka topics (SASL/SSL, configurable group) |
| File | File tail (continuous) or batch read |
| PADAS TCP | High-throughput cross-node forwarding (MessagePack) |
Sink connectors
Sink connectors subscribe to a stream and deliver events to an external destination.
| Connector | What it delivers to |
|---|---|
| Syslog | External syslog receivers over UDP/TCP |
| HTTP | REST endpoints (POST JSON) |
| Kafka | Kafka topics (SASL/SSL) |
| Splunk | Splunk HEC |
| Object Storage | S3-compatible stores — Parquet or JSON Lines for analytics |
| PADAS TCP | Another PADAS node (MessagePack) |
Multiple sink connectors can subscribe to the same stream independently.
The control plane (REST API)
PADAS exposes a REST API for:
- Streams — create, configure, start/stop, produce events, consume events, query via PDL
- Tasks — create, configure, start/stop, check status and last error
- Connectors — create, configure, start/stop, check status and last error
- Metrics — Prometheus-compatible scrape endpoint
The API is designed to be stateless for consumption: clients supply their own offsets when reading from streams. Authentication uses Bearer tokens via service accounts.
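A consume call then looks roughly like the sketch below. The route, query parameter name, and port are hypothetical placeholders, not the documented PADAS API; only the shape (client-supplied offset, Bearer token from a service account) follows the description above.

```python
import urllib.request

def build_consume_request(base_url, stream, offset, token):
    # HYPOTHETICAL route and parameter name -- consult the PADAS API
    # reference for the real endpoints.
    url = f"{base_url}/streams/{stream}/events?offset={offset}"
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = build_consume_request("http://localhost:8080", "alerts", 42, "TOKEN")
print(req.full_url)  # http://localhost:8080/streams/alerts/events?offset=42
```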
Building blocks vs systems
It helps to think in two layers:
- Building blocks: events, streams, tasks, connectors
- Systems you assemble: ingestion pipelines, normalization pipelines, detection pipelines, forwarding and replication between nodes
A typical security telemetry pipeline, for example:
- A syslog source connector ingests raw firewall logs → `raw_firewall` stream
- A processing task parses and normalizes fields (OCSF-aligned) → `normalized_firewall` stream
- A detection task runs signature queries → `alerts` stream
- A Kafka sink connector forwards alerts to a downstream SIEM
- An object storage sink connector archives normalized events to S3 for long-term search
Each step is independently configurable and observable.
Observability and system streams
PADAS emits operational data as streams:
- `_padas_metrics` — metrics events (throughput, drop rates, latency), exposed as Prometheus metrics and consumable via the API
- `_padas_internal` — structured lifecycle and runtime events (connector starts, task errors, WAL segment rotations) for debugging and automation
Schema-on-read and normalization
PADAS is built to ingest heterogeneous real-world data without forcing a single schema at the edge.
Typical normalization workflow:
- Ingest raw events into a stream (text or vendor-specific JSON) — no schema contract required
- Parse with PDL tasks — extract fields using grok, regex, or key-value parsing
- Normalize — map vendor-specific field names to whatever shape downstream systems expect: OCSF for security analytics, OpenTelemetry semantic conventions for observability, or your own internal model
- Enrich — resolve IP addresses, asset owners, or threat intelligence via the lookup service
- Route — fan out normalized, enriched events to detection tasks, sinks, and downstream systems
This approach keeps ingest reliable — a malformed or unexpected event format doesn't break the pipeline. Normalization happens downstream where it can be versioned and updated without touching the ingest configuration.
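The normalize step in the workflow above amounts to a field mapping applied downstream of ingest. A minimal sketch, with a made-up vendor event and an illustrative (not authoritative) set of OCSF-style target names:

```python
# Map vendor-specific field names onto a target model; unmapped fields
# pass through untouched, so an unexpected event never breaks the step.
FIELD_MAP = {
    "srcip": "src_endpoint.ip",
    "dstip": "dst_endpoint.ip",
    "act":   "disposition",
}

def normalize(vendor_event):
    return {FIELD_MAP.get(key, key): value
            for key, value in vendor_event.items()}

print(normalize({"srcip": "10.0.0.5", "act": "deny", "rule": 12}))
# {'src_endpoint.ip': '10.0.0.5', 'disposition': 'deny', 'rule': 12}
```

Because the mapping lives in a task rather than in the connector, it can be versioned and redeployed without touching the ingest configuration, which is exactly the property the paragraph above describes.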