Core concepts
Events
An event is the unit of data in PADAS. Events carry a payload (JSON or text), plus metadata like timestamps and optional keys for partitioning and ordering.
Event payload shapes
- JSON: structured objects for parsed or enriched data
- Text: raw lines (e.g. syslog in raw mode) when you want to preserve the original message exactly
Events are immutable once written to a stream. Transformation produces new events on a downstream stream — the source is never modified.
Streams
A stream is an append-only sequence of events. Streams are the backbone of PADAS: connectors write to streams; tasks read from a source stream and write to sink streams; API clients consume from streams.
Durability (WAL)
Streams can optionally be backed by a write-ahead log (WAL) for durability and historical reads:
- WAL-backed streams: data is persisted to disk, survives process restarts, and can be replayed from any offset. Segments are sealed, indexed, and subject to configurable retention (by time or size).
- In-memory streams: data lives only in the process buffer. Faster and lower overhead, but events are lost on restart.
Choose WAL-backed streams when downstream consumers need reliable delivery or replay. Use in-memory streams for transient fan-out and high-volume intermediate stages where durability overhead outweighs the benefit.
Positions and offsets
When you read from a stream, you choose a starting position:
- `earliest` — replay all retained events from the beginning (WAL-backed only)
- Offset-based — seek to a specific offset (WAL-backed only)
- `latest` — start from the next new event (both WAL and in-memory)
The API is stateless for consumption: clients track their own offsets.
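The offset model is easiest to see in a small sketch. This is plain Python, not the PADAS client API: the stream is modeled as an append-only list, and the client carries its own offset between reads, just as the stateless consumption model requires.

```python
# Minimal illustration of stateless, offset-based consumption.
# The "stream" is an append-only list; the server keeps no consumer
# state, so the client remembers its own offset between calls.

def consume(stream, offset, max_events=10):
    """Return (events, next_offset)."""
    batch = stream[offset:offset + max_events]
    return batch, offset + len(batch)

stream = ["e0", "e1", "e2", "e3", "e4"]

events, offset = consume(stream, offset=0, max_events=3)  # start at earliest
print(events)          # ['e0', 'e1', 'e2']

events, offset = consume(stream, offset)                  # client resumes itself
print(events, offset)  # ['e3', 'e4'] 5
```

Because the client supplies the offset, two independent consumers can read the same stream at different positions without coordinating.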
Backpressure and buffering
Streams have configurable bounded buffers that absorb bursts. When a buffer fills, PADAS applies one of three modes:
- Block — the producer pauses until space is available
- Drop — events are discarded and counted in metrics
- Timeout — the producer waits briefly, then drops if still full
Buffer sizing and backpressure mode are key latency and throughput tuning levers.
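The three modes can be sketched with a bounded queue in plain Python (illustrative only; PADAS applies these modes internally per stream):

```python
import queue

buf = queue.Queue(maxsize=2)
buf.put("a")
buf.put("b")                        # buffer is now full

# Block mode: a plain buf.put("c") would pause the producer until a
# consumer frees space (not run here, since nothing is consuming).

dropped = 0

# Drop mode: discard immediately and count the drop.
try:
    buf.put_nowait("c")
except queue.Full:
    dropped += 1                    # surfaced as a drop metric

# Timeout mode: wait briefly, then drop if the buffer is still full.
try:
    buf.put("d", timeout=0.05)
except queue.Full:
    dropped += 1

print(dropped)  # 2
```

The trade-off mirrors the tuning note above: blocking preserves every event at the cost of producer latency, while drop and timeout bound latency at the cost of loss under sustained overload.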
Tasks
A task subscribes to a source stream, runs a PDL pipeline over each event, and publishes results to one or more output streams.
Processing mode
A processing task runs a single PDL pipeline: filter → transform → enrich → (optional) aggregate. Every matching event produces output. Use processing tasks for normalization, field extraction, enrichment, and format conversion.
Detection mode
A detection task holds one or more named PDL queries. Each query is evaluated independently against incoming events. When a query matches, the task emits a detection event (including the query name and any extracted fields). Detection tasks can be configured to stop at the first match or evaluate all queries.
Detection mode is suited for signature-based alerting, anomaly flags, and policy checks running against a normalized event stream.
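The evaluation loop of a detection task can be sketched as follows. The predicates here are plain Python stand-ins for named PDL queries (the query names and fields are made up for illustration):

```python
# Each named query is a predicate; a match emits a detection event
# carrying the query name, per the description above.

QUERIES = {
    "failed_login_burst": lambda e: e.get("action") == "login"
                                    and e.get("result") == "failure",
    "blocked_outbound":   lambda e: e.get("action") == "blocked"
                                    and e.get("direction") == "out",
}

def detect(event, stop_at_first_match=False):
    detections = []
    for name, predicate in QUERIES.items():
        if predicate(event):
            detections.append({"query": name, "event": event})
            if stop_at_first_match:
                break                # first-match mode stops here
    return detections                # otherwise all queries are evaluated

hits = detect({"action": "blocked", "direction": "out"})
print([h["query"] for h in hits])  # ['blocked_outbound']
```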
PDL (PADAS Domain Language)
PDL is the query and processing language for tasks and the ad-hoc query API. Key capabilities:
- Filtering — boolean expressions over event fields
- Transformation — field assignment, renaming, type coercion, string manipulation
- Extraction — regex, grok, and key-value parsing from raw text fields
- Enrichment — field lookups against the lookup service
- Aggregation — windowed counts, sums, and grouping over time windows
- Conditional logic — branching output based on field values
PDL queries run inline at stream processing speed — no separate query engine to operate.
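Actual PDL syntax is covered in its own reference; as a rough feel for what an extract-and-transform step does, here is the equivalent logic written in plain Python (the raw line and field names are invented for the example):

```python
import re

# Extract fields from a raw text line with a regex, then rename fields,
# coerce types, and apply conditional logic -- the same steps a PDL
# pipeline performs inline on each event.
RAW = "2024-05-01T12:00:00Z src=10.0.0.5 dport=443 action=allow"

match = re.search(
    r"src=(?P<src_ip>\S+) dport=(?P<dport>\d+) action=(?P<action>\w+)", RAW
)
fields = match.groupdict()

event = {
    "source_ip": fields["src_ip"],               # rename
    "dest_port": int(fields["dport"]),           # type coercion
    "allowed": fields["action"] == "allow",      # conditional logic
}
print(event)  # {'source_ip': '10.0.0.5', 'dest_port': 443, 'allowed': True}
```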
Connectors
A connector integrates an external system with a PADAS stream. Each connector has a lifecycle (stopped → starting → running → stopping) managed via the REST API.
Source connectors
Source connectors ingest data from the outside world and publish events to a target stream.
| Connector | What it ingests |
|---|---|
| Syslog | UDP/TCP syslog (RFC 3164/5424), structured or raw mode |
| HTTP | REST endpoint polling or push ingestion, JSON payloads |
| Kafka | Consumer from Kafka topics (SASL/SSL, configurable group) |
| File | File tail (continuous) or batch read |
| PADAS TCP | High-throughput cross-node forwarding (MessagePack) |
Sink connectors
Sink connectors subscribe to a stream and deliver events to an external destination.
| Connector | What it delivers to |
|---|---|
| Syslog | External syslog receivers over UDP/TCP |
| HTTP | REST endpoints (POST JSON) |
| Kafka | Kafka topics (SASL/SSL) |
| Splunk | Splunk HEC |
| Object Storage | S3-compatible stores — Parquet or JSON Lines for analytics |
| PADAS TCP | Another PADAS node (MessagePack) |
Multiple sink connectors can subscribe to the same stream independently.
The control plane (REST API)
PADAS exposes a REST API for:
- Streams — create, configure, start/stop, produce events, consume events, query via PDL
- Tasks — create, configure, start/stop, check status and last error
- Connectors — create, configure, start/stop, check status and last error
- Metrics — Prometheus-compatible scrape endpoint
The API is designed to be stateless for consumption: clients supply their own offsets when reading from streams. Authentication uses Bearer tokens via service accounts.
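A consume call then looks roughly like the sketch below. The route, query parameter name, and port are hypothetical placeholders, not the documented PADAS API; only the shape (client-supplied offset, Bearer token from a service account) follows the description above.

```python
import urllib.request

def build_consume_request(base_url, stream, offset, token):
    # HYPOTHETICAL route and parameter name -- consult the PADAS API
    # reference for the real endpoints.
    url = f"{base_url}/streams/{stream}/events?offset={offset}"
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    return req

req = build_consume_request("http://localhost:8080", "alerts", 42, "TOKEN")
print(req.full_url)  # http://localhost:8080/streams/alerts/events?offset=42
```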
Building blocks vs systems
It helps to think in two layers:
- Building blocks: events, streams, tasks, connectors
- Systems you assemble: ingestion pipelines, normalization pipelines, detection pipelines, forwarding and replication between nodes
A typical security telemetry pipeline, for example:
- A syslog source connector ingests raw firewall logs → `raw_firewall` stream
- A processing task parses and normalizes fields (OCSF-aligned) → `normalized_firewall` stream
- A detection task runs signature queries → `alerts` stream
- A Kafka sink connector forwards alerts to a downstream SIEM
- An object storage sink connector archives normalized events to S3 for long-term search
Each step is independently configurable and observable.
Observability and system streams
PADAS emits operational data as streams:
- `_padas_metrics` — metrics events (throughput, drop rates, latency), exposed as Prometheus metrics and consumable via the API
- `_padas_internal` — structured lifecycle and runtime events (connector starts, task errors, WAL segment rotations) for debugging and automation
Schema-on-read and normalization
PADAS is built to ingest heterogeneous real-world data without forcing a single schema at the edge.
Typical normalization workflow:
- Ingest raw events into a stream (text or vendor-specific JSON) — no schema contract required
- Parse with PDL tasks — extract fields using grok, regex, or key-value parsing
- Normalize — map vendor-specific field names to whatever shape downstream systems expect: OCSF for security analytics, OpenTelemetry semantic conventions for observability, or your own internal model
- Enrich — resolve IP addresses, asset owners, or threat intelligence via the lookup service
- Route — fan out normalized, enriched events to detection tasks, sinks, and downstream systems
This approach keeps ingest reliable — a malformed or unexpected event format doesn't break the pipeline. Normalization happens downstream where it can be versioned and updated without touching the ingest configuration.
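The normalize step in the workflow above amounts to a field mapping applied downstream of ingest. A minimal sketch, with a made-up vendor event and an illustrative (not authoritative) set of OCSF-style target names:

```python
# Map vendor-specific field names onto a target model; unmapped fields
# pass through untouched, so an unexpected event never breaks the step.
FIELD_MAP = {
    "srcip": "src_endpoint.ip",
    "dstip": "dst_endpoint.ip",
    "act":   "disposition",
}

def normalize(vendor_event):
    return {FIELD_MAP.get(key, key): value
            for key, value in vendor_event.items()}

print(normalize({"srcip": "10.0.0.5", "act": "deny", "rule": 12}))
# {'src_endpoint.ip': '10.0.0.5', 'disposition': 'deny', 'rule': 12}
```

Because the mapping lives in a task rather than in the connector, it can be versioned and redeployed without touching the ingest configuration, which is exactly the property the paragraph above describes.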