Version: 2.0.0 (Latest)

What is PADAS?

PADAS is a high-performance streaming platform for ingesting, transforming, detecting, and routing events. It's designed for security and operational data (syslog, APIs, Kafka topics, files), but the core model is general: events in → processing → events out.

Throughput scales with pipeline complexity — from simple filter-and-forward to multi-stage windowed detection — and with available hardware. Data moves through the system as a continuous stream: connectors bring data in, tasks transform and detect patterns, connectors send results out.

How to read these docs

If you're new, start with:

  • This page (what PADAS is, where it fits)
  • Core concepts (the vocabulary: events, streams, tasks, connectors)
  • Architecture (how the pieces fit together)

What you can do with it

Ingest from many sources

  • Syslog over UDP/TCP (RFC 3164/5424, or raw lines)
  • HTTP polling or push-style ingestion
  • Kafka topics
  • Files (tail / batch read)
  • PADAS-to-PADAS TCP (MessagePack) for cross-node forwarding
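To make the syslog ingestion path concrete, here is a minimal RFC 5424 header parse in plain Python. This is an illustrative sketch, not PADAS connector code, and the output field names are assumptions; it shows the parse-or-fall-back-to-raw behavior a syslog source implies:

```python
import re

# Minimal RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID MSG
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d) "
    r"(?P<timestamp>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$"
)

def parse_syslog(line: str) -> dict:
    """Parse an RFC 5424 header into an event dict; fall back to a raw event."""
    m = RFC5424.match(line)
    if not m:
        return {"raw": line}            # raw-line fallback for non-conforming input
    event = m.groupdict()
    pri = int(event.pop("pri"))
    # PRI encodes facility and severity: PRI = facility * 8 + severity
    event["facility"], event["severity"] = pri // 8, pri % 8
    return event

evt = parse_syslog("<34>1 2003-10-11T22:14:15.003Z mymachine su - ID47 su root failed")
```

A real connector also has to handle RFC 3164's looser format and framing over TCP, which is why "or raw lines" matters as an escape hatch.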

Build pipelines with streams and tasks

  • Create streams as durable (WAL-backed) or in-memory channels for events
  • Attach tasks that run PDL queries to filter, transform, enrich, and aggregate
  • Run detection tasks that match events against one or more patterns and emit alerts
  • Send results to one or more downstream streams
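The stream/task wiring above can be sketched as a toy model in Python. This is a conceptual illustration of the publish/consume loop, not the PADAS API; class and method names are invented for the sketch:

```python
from collections import defaultdict
from typing import Callable, Optional

class Broker:
    """Toy model: named streams, tasks attached to streams, fan-out to outputs."""
    def __init__(self):
        self.tasks = defaultdict(list)   # stream name -> [(task_fn, output streams)]
        self.sink = defaultdict(list)    # events observed on each stream

    def attach(self, stream: str, task: Callable[[dict], Optional[dict]], outputs: list):
        self.tasks[stream].append((task, outputs))

    def publish(self, stream: str, event: dict):
        self.sink[stream].append(event)
        for task, outputs in self.tasks[stream]:
            result = task(event)          # filter/transform; None drops the event
            if result is not None:
                for out in outputs:       # fan-out to one or more downstream streams
                    self.publish(out, result)

broker = Broker()
# A "task" that keeps only failed logins and fans out to two downstream streams
broker.attach("auth_raw",
              lambda e: e if e.get("action") == "fail" else None,
              ["alerts", "archive"])
broker.publish("auth_raw", {"user": "root", "action": "fail"})
broker.publish("auth_raw", {"user": "bob", "action": "ok"})
```

In PADAS the filtering and transformation logic is expressed in PDL rather than Python, but the topology is the same: tasks consume from a stream and emit to one or more downstream streams.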

Deliver data to destinations

  • Syslog, HTTP, Kafka, Splunk — forward to external systems and SIEMs
  • Object storage (S3-compatible) — write Parquet or JSON Lines for downstream analytics
  • PADAS REST API — consume streams directly from applications and dashboards
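For the object-storage path, the JSON Lines format is simple enough to sketch directly: one JSON object per line, batched into a single object for upload. The sketch below is generic Python, not PADAS code; how batches are sized and named is deployment-specific:

```python
import io
import json

def to_jsonl(events: list) -> bytes:
    """Serialize a batch of events as JSON Lines (one object per line),
    ready to upload as a single S3-compatible object."""
    buf = io.StringIO()
    for e in events:
        buf.write(json.dumps(e, separators=(",", ":")) + "\n")
    return buf.getvalue().encode("utf-8")

batch = to_jsonl([
    {"src": "10.0.0.1", "action": "deny"},
    {"src": "10.0.0.2", "action": "allow"},
])
```

JSON Lines keeps each record independently parseable, which is what makes it convenient for downstream analytics engines that scan objects line by line; Parquet trades that simplicity for columnar compression and faster scans.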

Enrich with context (optional)

  • Attach a lookup service to resolve fields (IP → geo, asset → owner, hash → threat intel) without schema changes to the core engine
  • PDL tasks apply enrichment inline during stream processing
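The shape of inline enrichment can be shown with a toy lookup table. This is a Python sketch of the idea, not the lookup service API; the table contents and field names are invented:

```python
# Toy lookup table standing in for a lookup service (geo, asset owner, threat intel)
ASSET_OWNERS = {"10.0.0.5": "db-team", "10.0.0.9": "web-team"}

def enrich(event: dict, lookup: dict, key: str, field: str) -> dict:
    """Return a copy of the event with a resolved field added inline.
    Events whose key is unknown pass through unchanged (schema-agnostic)."""
    out = dict(event)
    value = lookup.get(event.get(key))
    if value is not None:
        out[field] = value
    return out

evt = enrich({"src_ip": "10.0.0.5", "action": "login"},
             ASSET_OWNERS, key="src_ip", field="owner")
```

The important property is the pass-through behavior: enrichment adds fields when context exists and leaves the event intact when it does not, so the core engine never needs a schema change.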

How PADAS fits into a platform

PADAS Core is the streaming engine. It can run standalone or alongside optional context services:

  Layer                     Purpose
  PADAS Core                Ingestion, processing, detection, routing
  Lookup service            Field enrichment (geo, asset, threat intel)
  Historical search         Long-term query over stored Parquet data
  Entity / intel / vector   Planned enrichment context services

The core engine stays schema-agnostic. Context services add richness without requiring a centralized schema at the edge.

Mental model

Key design ideas

  • Schema-on-read: ingest raw data first; normalize and enrich with PDL tasks later.
  • Streams are the backbone: connectors and tasks all publish to and consume from named streams.
  • Durability is configurable: per-stream WAL enables crash recovery and historical reads; in-memory streams minimize overhead for transient data.
  • Operational visibility: metrics and lifecycle events are published as streams (_padas_metrics, _padas_internal), exposable as Prometheus metrics.
  • Normalize to any schema: PDL tasks map vendor-specific fields to whatever shape downstream systems expect — OCSF for security analytics, OpenTelemetry semantic conventions for observability pipelines, or your own internal model. Raw data stays intact upstream.
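The "normalize to any schema" idea can be sketched with a vendor-to-target field map. In PADAS this mapping would be written as a PDL task; the Python below, with an invented OCSF-flavored map, only illustrates the mechanics of late, schema-on-read normalization:

```python
# Hypothetical vendor-to-OCSF-style field map; the dotted names become nested objects.
FIELD_MAP = {
    "srcip": "src_endpoint.ip",
    "dstip": "dst_endpoint.ip",
    "act":   "activity_name",
}

def normalize(raw: dict, field_map: dict) -> dict:
    """Map vendor fields onto a target shape; unmapped fields are preserved
    so nothing from the raw record is lost downstream."""
    out = {}
    for key, value in raw.items():
        parts = field_map.get(key, key).split(".")
        cursor = out
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = value
    return out

evt = normalize({"srcip": "10.0.0.1", "act": "blocked", "extra": "kept"}, FIELD_MAP)
```

Because normalization happens in a task rather than at ingestion, the same raw stream can feed an OCSF-shaped output for security analytics and a differently shaped one for observability, side by side.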

A note on scope

Some ecosystems present streaming as a suite of loosely coupled products (brokers, schema registry, stream processors, connectors, governance layers). PADAS integrates the core streaming loop — ingest, process, detect, route — into a single runtime. Higher-level services (lookup/context, long-term storage and search) are optional additions, not prerequisites.

This means less operational surface area and faster time-to-value for the common case, with a clear extension path for larger deployments.

When to use PADAS (and when not to)

Good fits

  • High-throughput log and event ingestion, from simple routing to complex windowed detection
  • Real-time transformation, normalization, and field enrichment
  • Detection and windowed aggregation with stateful processing
  • Forwarding and fan-out to multiple destinations (Kafka, Splunk, S3, syslog)
  • Replacing or augmenting heavyweight streaming stacks for security telemetry use cases
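"Windowed aggregation with stateful processing" can be pictured as a tumbling-window count. The sketch below is plain Python over a finished batch, whereas a real detection task maintains this state incrementally over a live stream; field names and the window size are assumptions:

```python
from collections import defaultdict

def tumbling_counts(events, window_s: int = 60) -> dict:
    """Count failed logins per user in fixed (tumbling) time windows.
    Keys are (window_start, user); state is just the running counters."""
    counts = defaultdict(int)
    for e in events:
        if e.get("action") != "fail":
            continue
        bucket = e["ts"] - (e["ts"] % window_s)   # start of the window this event falls in
        counts[(bucket, e["user"])] += 1
    return dict(counts)

stats = tumbling_counts([
    {"ts": 10, "user": "root", "action": "fail"},
    {"ts": 50, "user": "root", "action": "fail"},
    {"ts": 70, "user": "root", "action": "fail"},   # lands in the next window
    {"ts": 20, "user": "bob",  "action": "ok"},
])
```

A detection rule like "N failures by one user within a window" is then a threshold check over these counters as each window closes.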

Not a fit by itself

  • Long-term analytics over months of data — use the optional historical search layer (object storage + query service) alongside PADAS
  • Full schema governance enforced at ingestion — PADAS intentionally accepts heterogeneous inputs; schema validation is a downstream or application concern

Next steps