Reference
Padas Domain Language (PDL) defines stream-processing expressions over JSON event records: boolean queries, parse stages, transformations (eval, rename, fields, output), partitioning, and windowed aggregations. Stages in a pipeline string are separated by | and execute sequentially in source order; each stage consumes the output projection of the previous stage and emits the next projection downstream.
Per-event stages (query, parse, eval, rename, fields, output, partition_by) evaluate against the current record (and, for aggregations, advance window state as defined by the runtime). Stateful windowed aggregations maintain buffers and partial reducers until a window closes or emits per every_event and windowing rules; their output shape differs from single-event transforms (see Aggregations).
Condensed syntax summary: Quick Reference. Worked examples: Comprehensive examples.
Table of Contents
- Execution model
- Command categories
- Query operations
- Transformation commands
- Parse commands
- Partitioning
- Aggregations
- Data types
- Field access
- Pipeline composition
- Examples
- Runtime considerations
- Errors
Execution model
Each input record moves left-to-right through pipeline stages; coloring matches the architecture data-plane legend (streams vs task-style processing).
| Concern | Semantics |
|---|---|
| Ordering | Pipeline tokens apply left-to-right; the event object (projection) is threaded stage by stage. |
| Filtering | A boolean query stage retains or discards the entire event for that branch when the expression is true or false. |
| Mutation | eval, rename, fields, output, and parse_* materialize or remove fields on the in-flight record. |
| Partitioning | partition_by derives a routing key used for stream-scoped partitioning and worker affinity. |
| Aggregation | Windowed reducers consume a stream of timestamps and values, maintain state until emission, and emit aggregate rows as flat JSON per Output shape (Glossary → Aggregation). |
| Single evaluation | For processing tasks, each pipeline string is evaluated once per input event on the hot path; aggregation state must not advance twice for the same logical event (Glossary → Processing task). |
Command categories
| Family | Role | Representative forms |
|---|---|---|
| Query | Boolean filter on the current record | field = "x", NOT (a AND b) |
| Parse | String field → structured fields | parse_json, parse_csv, parse_regex, … |
| Transform | Compute, rename, project, scalar output | eval, rename, fields, output |
| Partition | Key extraction for routing | partition_by f1, f2 |
| Aggregate | Windowed, possibly grouped, reducers | sum(x) AS s timespan=5m group_by k |
Detailed syntax, parameters, and edge cases follow in the sections below.
Query operations
A query is a boolean expression over the event JSON. If it evaluates to true, the event is retained for subsequent stages (unless a later stage drops it); if false, it is discarded for that branch.
Comparison operators
| Operator | Semantics | Example |
|---|---|---|
= | Equality; string RHS may include a single * wildcard (see Wildcards) | status = "active" |
!= | Inequality; wildcard strings supported | tier != "free" |
> < >= <= | Ordered comparison on numeric or comparable scalars | score >= 80 |
~= | Regex match on string | path ~= "^/api/v[0-9]+/" |
?= | String: RHS substring occurs in field value. Array: field array contains the scalar RHS | msg ?= "timeout", tags ?= "prod" |
IN | Field value equals any element of RHS array literal (RHS elements must be one uniform type) | action IN ["login", "logout"] |
?= versus IN
| Form | LHS | RHS | True when |
|---|---|---|---|
?= | String or array | Single scalar / literal | String: substring match. Array: membership of that value. |
IN | Any comparable field | Array literal | Field value equals one of the listed values. |
Logical operators and precedence
| Operator | Role |
|---|---|
NOT | Unary negation of the following comparison or parenthesized subquery. |
AND | Binary conjunction. |
OR | Binary disjunction. |
Precedence: NOT binds tightest (to its operand). AND binds tighter than OR. Thus a AND b OR c groups as (a AND b) OR c. Parentheses override defaults; mixed AND/OR without parentheses is discouraged in production definitions.
Evaluation: Parenthesized subexpressions evaluate as a unit. Function predicates (isnull, regex, …) evaluate to booleans like comparisons.
Wildcards
Wildcards apply to string comparisons with = and != only; other operators with wildcard patterns are rejected (Wildcard patterns only support = and != operators).
| Pattern | Meaning | Cost note |
|---|---|---|
* alone | Matches all events (true()); highest volume | Avoid on hot paths |
field = "*" | Field exists and is non-null | Cheaper than pattern translation |
field = "pre*" | Prefix match | Preferred among patterns |
field = "*suf" | Suffix match | Higher scan cost than prefix |
field = "*mid*" | Substring / embedded wildcards | Highest relative cost |
Patterns are translated for matching (implementation may cache compiled forms). Prefer explicit field = "*" for existence checks instead of bare *.
Regex (~=)
The RHS is a regular expression applied to the string field. Unbounded repetition and deep alternation increase backtracking risk and CPU cost; anchor patterns where possible.
Query functions
| Function | Semantics |
|---|---|
regex(field, pattern) | Regex match |
cidrmatch(field, cidr) | CIDR membership |
isnull / isempty / isnumber / isstring / isarray | Type / presence predicates |
true() / false() | Constant booleans |
Query examples
user.role = "admin"
event.data.user.id > 1000
categories IN ["security", "monitoring"]
email ~= "^admin@.*\\.com$"
(status = "active" AND priority >= 5) OR (type = "emergency")
(user.role = "admin" OR user.role = "superuser") AND NOT (status = "disabled")
isnull(field1) AND field2 > 0
Transformation commands
eval
Purpose: Evaluate expressions and assign one or more fields on the current event.
Syntax:
eval field = expression
eval f1 = e1, f2 = e2, ...
Runtime behavior: Assignments in one eval run in source order; later assignments may read fields produced earlier in the same statement. Expressions may invoke functions (math, string, coercion, conditional). Type mismatches or invalid arity surface as runtime or validation errors per engine rules.
Arithmetic operators
| Operator | Description | Example |
|---|---|---|
+ | Addition | total = price + tax |
- | Subtraction | net = gross - fee |
* | Multiplication | total = price * quantity |
/ | Division | average = sum / count |
Conditionals and coercion
eval status = if(score > 80, "pass", "fail")
eval grade = case(score >= 90, "A", score >= 80, "B", "C")
eval display_name = coalesce(nickname, first_name, "Anonymous")
eval str_val = to_string(id)
eval num_val = to_number(price_string)
Supported eval functions (selected)
| Category | Functions |
|---|---|
| Conditional | if, case, coalesce |
| Coercion | to_number, to_string, to_boolean |
| String | to_upper, to_lower, trim, substring, replace, split, join |
| Date/time | now, format_date, parse_date |
| Math | abs, round, floor, ceil, sqrt, pow, log, log10 |
| Hash / encoding | md5, sha1, sha256, base64_encode, base64_decode |
| Arrays | length, index, slice |
| Utility | random, uuid, url_encode, url_decode |
Exhaustive signatures and edge cases: Eval functions.
rename
Purpose: Map existing field paths to new names without transforming values.
Syntax:
rename old AS new
rename a AS x, b AS y
Runtime behavior: Path resolution uses the same field-access rules as queries and eval. Invalid paths or collisions follow engine validation.
rename user_id AS id
rename user.profile.first_name AS first_name, user.profile.last_name AS last_name
rename data[0] AS first_item
fields
Purpose: Project the event to a whitelist of fields, or remove listed fields and retain the rest.
Syntax:
fields f1, f2, f3
fields - f1, f2
fields remove f1, f2
Runtime behavior: Default form keeps only listed keys. - or remove drops listed keys. Order of operations matters relative to downstream sinks and aggregations.
output
Purpose: Select a single field as the pipeline result for consumers that expect a scalar or explicitly typed string payload.
Syntax:
output field
output field type=string
Runtime behavior:
| Aspect | Semantics |
|---|---|
| Selection | Exactly one field path; nested paths use dot / bracket notation. |
| Missing field | If the path does not resolve, the engine errors (no silent null emission). |
| Raw type | Without type=string, the value’s JSON type is preserved (number, boolean, array, object). |
type=string | Coerces the value to a string for wire formats; objects and arrays serialize as JSON text. |
| Downstream | The result becomes the payload / result passed to the stage consumer (sink or task binding) per task configuration. |
output score
output event.payload
output count type=string
Parse commands
Purpose: Parse a string-typed field and attach extracted keys to the current event.
Common form:
parse_<kind> <source_field> [options]
Command summary
| Command | Input | Notable options |
|---|---|---|
parse_csv | Delimited row | delimiter, header |
parse_kv | k=v tokens | delimiter |
parse_cef | CEF string | — |
parse_leef | LEEF string | — |
parse_xml | XML string | xpath |
parse_json | JSON text | — |
parse_regex | Free text | flags, named captures (?P<name>…) |
Field conflict resolution
When a parse would create a field name that already exists on the event, the runtime renames the new key by appending incrementing numeric suffixes so collisions do not overwrite existing values silently.
Per-command notes
parse_json: Merges parsed object keys into the projection (nested target paths supported where grammar allows).parse_csv: Header row may be inferred or supplied; delimiter defaults are engine-defined unless overridden.parse_kv: Supports multiple pairs per line and quoted values spanning spaces.parse_regex: Named groups become field names; optionalflags(e.g.i). Patterns with catastrophic backtracking are a runtime cost risk.parse_xml: XPath selects extracted fragment; useful for narrow fields inside larger XML.parse_cef/parse_leef: Normalise vendor encodings to first-class fields.
parse_csv log_data delimiter="|" header="ts,user,action"
parse_kv message delimiter=":"
parse_json envelope.body
parse_regex raw "(?P<ip>\\S+) \\[(?P<ts>[^\\]]+)\\]" flags="i"
Partitioning
Purpose: Derive a partition key from one or more fields for key-based routing, stream-scoped isolation, and coherent keyed execution (including alignment with grouped aggregates when combined with rekey).
Syntax:
partition_by field
partition_by f1, f2, ...
Runtime semantics:
| Topic | Behavior |
|---|---|
| Single field | Key is the field’s stringified value (per engine casting rules). |
| Composite key | Multiple fields are joined with **` |
| Missing / null | Missing or null components are represented as the literal token null in the composite key string. |
| Stream scope | The same key value in different streams does not share partition state. |
| Worker assignment | The key drives write-queue / worker affinity so related events cohere for ordering-sensitive processing. |
| Downstream | Partition key influences how partitioned sinks and grouped aggregates route; pair with aggregate rekey=true when post-aggregate routing must track group_by dimensions. |
parse_json | partition_by user_id | count timespan=5m group_by user_id
partition_by tenant_id, user_id | sum(amount) timespan=1h group_by user_id
Aggregations
Aggregations reduce many events within a time window to one or more emit records, optionally per group_by key. All functions in one statement share the same window, where, and grouping clauses.
Syntax
<fn1> [AS a1], <fn2> [AS a2], ...
[window=<type>]
timespan=<n><unit>
[hop=<n><unit>] [slide=<n><unit>] [gap=<n><unit>] [max_duration=<n><unit>]
[group_by f1, f2, ...]
[where <query>]
[rekey=true|false]
[every_event=true|false]
Parameters (selected):
| Parameter | Role |
|---|---|
timespan | Window length (required). |
window | tumbling (default), sliding, hopping, session. |
slide | Sliding step; default is engine-defined (bounded, often ~1s or a fraction of timespan). |
hop | Hopping period (required for hopping). |
gap / max_duration | Session inactivity gap and optional cap. |
group_by | Distinct key set; each key maintains isolated reducer state. |
where | Predicate on ingested events entering the window. |
rekey | When true, outgoing aggregate events adopt routing key from group_by fields. |
every_event | false (default): emit when the window closes (event time passes window_end). true: emit on every contributing event (debug / high-volume warning). |
Aggregation state semantics
| Topic | Semantics |
|---|---|
| Stateful execution | Windowed aggregations maintain state (partial sums, buffers, session clocks) until emission or window close. |
| Retention | State lives for the lifetime of open windows; larger timespan, slide, or session gap increases retention duration. |
every_event=false | Reduces downstream event volume by emitting on window boundaries only. |
every_event=true | Materializes partial aggregates per event; suitable for debugging or low-rate streams; increases CPU and sink load. |
group_by cardinality | Each distinct key holds separate partial state; high cardinality grows memory proportionally. |
collect | Retains full JSON bodies of matched events in-window → large memory footprint vs scalar reducers. |
Output shape
Windowed aggregations emit flat rows (not one nested object per group under arbitrary parent keys):
- Without
group_by: one object withwindow_start,window_end, and metric fields at the root. - With
group_by: one object per group per emission, each withgroup_key(pipe-separated composite, same order asgroup_by),window_start,window_end, and metrics at the root.
Multi-group results appear as a JSON array of those objects in the PDL API; processing tasks fan out one sink event per array element where configured—see Glossary → Aggregation.
{
"window_start": 1704067200000,
"window_end": 1704067500000,
"event_count": 150
}
[
{"group_key": "ok", "window_start": 1704067200000, "window_end": 1704067500000, "c": 100},
{"group_key": "err", "window_start": 1704067200000, "window_end": 1704067500000, "c": 3}
]
Multi-function aggregation
Multiple reducers may share one window specification in a single statement—one pass maintains consistent window boundaries and where filtering across metrics.
count AS n, sum(bytes) AS total, dc(user_id) AS users timespan=5m group_by tenant where status = "ok"
Aggregation functions
| Function | Description |
|---|---|
count / count(field) | Event count / non-null count |
sum, avg, min, max | Numeric reducers |
dc | Distinct count |
median, stddev, variance | Statistical |
values | Multiset of distinct values (engine serialisation rules apply) |
first / last | First / last non-null field value in processing order within window/group |
earliest / latest | Field value from event with minimum / maximum canonical envelope timestamp; ties keep first stabilised winner |
collect(query) | Array of full events matching query inside window |
collect
collect(score > 80) AS high_scores timespan=5m group_by user_id
Output: Emits window_start, window_end, and an array field (default name collected_events if no alias) containing full event objects. Empty arrays are valid when no matches occur.
Windowing strategies
| Type | Required params | Notes |
|---|---|---|
tumbling | timespan | Fixed, non-overlapping buckets. |
sliding | timespan | Optional slide ≤ timespan; overlapping windows. |
hopping | timespan, hop | hop ≤ timespan. |
session | timespan, gap | Optional max_duration ≥ timespan when set. |
Validation (typical): timespan > 0; hop > 0; slide <= timespan; gap > 0; invalid combinations fail at validate or plan time with explicit errors.
Representative aggregation examples
count timespan=5m
count AS c, sum(latency) AS total_ms timespan=1m group_by region where ok = true
sum(amount) AS rev timespan=1h group_by user_id, sku rekey=true
count AS events window=session timespan=1h gap=5m max_duration=2h
Data types
PDL expressions operate on JSON-aligned types.
| Type | Literal / form | Notes |
|---|---|---|
| String | "..." | Comparisons may use wildcards with = / != only. |
| Number | 123, 45.67 | Arithmetic and ordered comparisons. |
| Boolean | true, false | Logical and equality. |
| Array | [...] | ?=, IN, indexing. |
| Object | Via field paths | Dot and bracket access. |
Coercion: Mixed-type arithmetic or concatenation may trigger implicit coercion; prefer explicit to_number / to_string for deterministic pipelines.
Field access
| Style | Form | Example |
|---|---|---|
| Dot | Nested objects | user.profile.id |
| Bracket index | Arrays | items[0] |
| Bracket key | Maps / objects | headers["content-type"] |
| Mixed | rows[0].cells["id"] |
Field names allow letters, digits, _, ., and may start with @ (e.g. @timestamp). Wildcard performance guidance lives under Query operations → Wildcards; avoid duplicating wildcard tables here.
Pipeline composition
query | parse_json body | eval x = y + 1 | rename x AS z | fields z
| Rule | Semantics |
|---|---|
| Pipe order | Stages evaluate strictly in listed order. |
| At most one windowed aggregation | A single pipeline string may contain one windowed aggregation segment (timespan= with aggregate functions) unless grammar explicitly allows otherwise; the padas-pdl parser / validator is authoritative. |
| Trailing query | A boolean expression after the aggregate may filter aggregate output when supported by grammar (e.g. threshold on emitted metrics). |
| Hot path | Processing tasks: one PDL evaluation per input event; aggregation state advances accordingly (Glossary → Processing task). |
parse_kv message
| eval ok = response = 200
| count timespan=5s group_by user, ok
| count > 5
Use validate_pdl_syntax and engine tests for version-specific acceptance.
Examples
status = "error" | parse_json detail | eval sev = to_upper(code) | fields ts, sev, detail
parse_csv line | eval total = qty * price | rename total AS order_total | fields order_id, order_total
partition_by tenant | sum(bytes) AS b timespan=5m group_by svc rekey=true
Additional patterns: Quick Reference, Comprehensive examples.
Runtime considerations
| Concern | Guidance |
|---|---|
| Filter early | Apply cheap boolean queries before parse_* when predicates do not depend on parsed fields. |
| Project early | Use fields to drop large blobs before heavy eval or aggregations to cut memory and serialization cost. |
| Parse cost | Structured parsers and regex extraction dominate CPU on wide or high-rate streams. |
| Regex | Favour anchored, bounded patterns; avoid deep nesting and unbounded */+ on hot paths. |
| Coercion | Repeated implicit string/number mixing in eval adds overhead; coerce once. |
| Aggregation state | group_by cardinality, timespan, slide, collect, and every_event=true directly impact heap usage and emit rate. |
| Windows | Long timespan defers emission and increases per-window buffer size; short slide on sliding windows multiplies overlapping state. |
Errors
| Error class | Typical cause | Recommended action |
|---|---|---|
| Syntax | Invalid token order, unknown function, malformed window clause | Run validate_pdl_syntax; compare against grammar in padas-pdl; fix spelling / punctuation. |
| Type mismatch | String compared to number without coercion, illegal operator for type | Insert to_number / to_string / to_boolean; narrow where predicates. |
| Missing field | Path absent on event, or output target missing | Correct path; guard with coalesce or query predicates before use. |
| Parse | Input string not valid for selected parse_* | Inspect raw field; add upstream where; tighten pattern. |
| Window parameters | hop > timespan, slide > timespan, invalid gap / max_duration | Consult Windowing strategies validation table; adjust literals. |
| Wildcard / operator | Wildcard with > / < / ~= / etc. | Use = / != only for wildcard patterns, or switch to regex ~=. |
| Resource pressure | High-cardinality group_by, large collect, every_event=true on flood traffic | Reduce cardinality, shrink window, disable every_event, or scale consumers. |
The parser and runtime implementation for your deployed engine version are authoritative for version-specific acceptance and error strings. Validate production PDL against the same engine version used at runtime.