Version: 2.0.0 (Latest)

Quick Reference

Padas Domain Language (PDL) defines stream-processing expressions over JSON events: filtering (boolean queries that retain or discard a record), parsing (string-to-field extraction), transformation (eval, type coercion, conditionals), routing (partition_by, aggregate rekey), and aggregation (windowed stateful reduction). Normative syntax and edge cases: Reference.

A pipeline is a linear chain of stages separated by | (or an equivalent stage list in the task configuration). Stages execute sequentially in source order. Each stage consumes the event projection produced by the previous stage and emits the next projection downstream; query stages filter without mutating retained rows unless combined with mutation stages in the same task definition.

Execution semantics

| Concept | Behavior |
| --- | --- |
| Stage chaining | Stages apply in order; there is no implicit parallelism inside a single PDL pipeline unless the runtime maps partitions independently. |
| Event flow | One inbound JSON record enters the chain; each stage reads the current field tree; parsers and eval materialize or overwrite fields; fields projects a subset; output may reduce the payload to a scalar for specialized sinks. |
| Filtering | A query stage evaluates a boolean expression; false drops the event for that branch; true forwards the unchanged projection unless a later stage mutates it. |
| Aggregation state | Windowed aggregations maintain state until the window closes and the engine emits one or more aggregate records per window (and per group_by key); see Aggregation. |
| Routing | partition_by and aggregate rekey influence how the runtime routes keyed work and sink partitions; see Partitioning. |
| Windows | timespan bounds the window lifecycle; tumbling, sliding, and session modes control overlap and gap handling; open windows retain buffers and partial aggregates until emission. |

Query expressions

Queries filter whole events: the expression evaluates to a boolean; true retains the event for subsequent stages, false discards it for that processing branch (unless the enclosing task type documents alternate behavior).

Comparison syntax

Field paths use dot notation for nested JSON. Operators combine a path with a literal or comparable value.

field = value
field != value
field > value
field >= value
field < value
field <= value
field ?= value
field ~= pattern
field IN [v1, v2, v3]
| Operator | Semantics |
| --- | --- |
| = / != | Equality / inequality on scalars; string = / != may use a single * wildcard in the pattern (see Wildcards). |
| > / < / >= / <= | Ordered comparison on numeric or otherwise comparable scalars; not defined for wildcard string patterns. |
| ?= | String: substring contains the right-hand literal. Array: true if the array contains the scalar element (membership). |
| ~= | Regex match on string values; pattern syntax follows the engine’s regex implementation. |
| IN | True if the field value equals any element of the right-hand array literal; array elements must be a uniform type (String or Integer) per query definition rules. |
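A minimal Python sketch of the `?=` membership semantics above (illustrative only; the behavior on types other than strings and arrays is an assumption of this sketch, not documented engine behavior):

```python
def contains_op(field_value, rhs) -> bool:
    """Illustrative `?=`: substring test on strings, element
    membership on arrays (assumed fallback: false otherwise)."""
    if isinstance(field_value, str):
        return str(rhs) in field_value      # substring contains
    if isinstance(field_value, list):
        return rhs in field_value           # array membership
    return False

assert contains_op("timeout=30s, retries=3", "retries")  # substring
assert contains_op([80, 90, 95], 90)                     # membership
assert not contains_op([80, 95], 90)
```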

Logical operators and precedence

NOT predicate
left AND right
left OR right
(query1 AND query2) OR query3
| Construct | Semantics |
| --- | --- |
| NOT | Unary negation of the immediately following comparison or parenthesized subquery. |
| AND / OR | Binary conjunction / disjunction; operands are comparisons or parenthesized queries. |

Precedence: NOT binds tightest (to its operand). AND binds tighter than OR. Therefore a AND b OR c groups as (a AND b) OR c. OR chains associate left-to-right at the same precedence level. Parentheses override defaults and should be used wherever mixing AND and OR would otherwise be ambiguous.

Evaluation order: Subexpressions inside parentheses evaluate as a unit before their result participates in outer operators. For deterministic matching and auditability, prefer explicit parentheses over reliance on default precedence.
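The grouping rule can be cross-checked in Python, whose `and`/`or` share the same relative precedence (a sanity check of the precedence claim, not PDL itself):

```python
from itertools import product

def pdl_like(a: bool, b: bool, c: bool) -> bool:
    return a and b or c          # parsed as (a and b) or c

def explicit(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or c

# The two forms agree on every combination of truth values,
# confirming that AND binds tighter than OR.
assert all(pdl_like(a, b, c) == explicit(a, b, c)
           for a, b, c in product([True, False], repeat=3))
```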

Boolean and null semantics

Comparisons evaluate against the resolved field value and literal; missing paths or type mismatches surface as runtime or validation errors depending on stage configuration—see Errors. AND and OR use ordinary boolean truth; short-circuiting follows typical boolean evaluation in the engine implementation.

Wildcards

With = / != on string JSON, a single * wildcard is permitted in the pattern. Wildcard patterns are translated internally for matching; leading and embedded * patterns can increase scan cost versus trailing * prefix forms. field = "*" denotes field existence (non-null) semantics per deployment. Standalone * matches all events and should be treated as a last-resort predicate in high-volume streams.
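One plausible internal translation of a single-`*` pattern is an anchored regex; the sketch below is an assumption about how such matching could be implemented, not the documented PDL internals:

```python
import re

def wildcard_match(pattern: str, value: str) -> bool:
    """Escape regex metacharacters, turn the one permitted `*`
    into `.*`, and require a full (anchored) match."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, value) is not None

assert wildcard_match("err*", "error")       # trailing * (cheap prefix form)
assert wildcard_match("*.com", "example.com")  # leading * (scan-heavier)
assert not wildcard_match("err*", "warning")
```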

Regex (~=)

The right-hand side is a regular expression applied to the string field. Patterns may be cached by the runtime; unbounded quantifiers and nested alternation increase backtracking risk and CPU cost. Prefer anchored, bounded patterns for hot paths.

Query examples

user.age > 25
user.name = "Alice"
user.premium = true
scores ?= 90
scores.length > 3
scores[0] > 80
user.age > 25 AND user.premium = true
user.department = "Engineering" OR user.department = "Sales"
status IN ["active", "pending"]
email ~= "^[^@]+@example\\.com$"

Parse commands

Parse stages read a string field (raw line, embedded JSON text, CEF/LEEF, etc.), parse the payload, and attach structured fields to the current event.

Parse semantics

| Topic | Behavior |
| --- | --- |
| Extracted fields | Successful parses materialize new keys on the event object (or nested target where the command supports a path). |
| Collision / overwrite | New keys produced by a parse coexist with prior fields; if a generated key collides with an existing name, the effective value is last writer for that stage chain position—confirm collision rules for your engine version in Reference. |
| Output structure | parse_json merges object keys into the projection; parse_csv / parse_kv / parse_regex / parse_cef / parse_leef / parse_xml emit flat or path-scoped fields per command grammar. |
| Field attachment model | Parses transform the in-flight record in place for the remainder of the pipeline unless a later stage renames, projects (fields), or replaces the payload (output). |
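The last-writer collision behavior can be sketched with a plain dictionary merge (illustrative only; not the engine's actual merge code, and the field names are invented for the example):

```python
import json

# An event with a raw JSON string and a pre-existing "level" field.
event = {"raw": '{"level": "ERROR", "host": "db-prod-01"}', "level": "INFO"}
parsed = json.loads(event["raw"])

# Last writer wins: the parsed "level" overwrites the earlier value.
merged = {**event, **parsed}
assert merged["level"] == "ERROR"
assert merged["host"] == "db-prod-01"
```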

Command forms

JSON — Parses a string field as JSON and merges object fields.

parse_json field_name
parse_json field_name.subfield

CSV — Splits delimiter-separated values; optional header= defines or overrides column names.

parse_csv field_name
parse_csv field_name header="col1,col2,col3"
parse_csv field_name delimiter=","

XML — Extracts via XPath for legacy or XML-embedded payloads.

parse_xml field_name
parse_xml field_name xpath="//user/name"

Key–value — Tokenizes key=value or key:value forms.

parse_kv field_name
parse_kv field_name delimiter="="

Regex — Named capture groups become output field names.

parse_regex field_name "(?P<level>\w+) (?P<msg>.*)"
parse_regex field_name "(?P<level>\w+) (?P<msg>.*)" flags="i"

CEF / LEEF — Normalizes ArcSight-style CEF and LEEF into standard fields.

parse_cef field_name
parse_leef field_name

Transformations

eval

eval evaluates one or more expressions and materializes fields. Assignments execute in source order within a single eval statement; later assignments may read fields produced earlier in the same statement.

eval field = expression
eval field1 = expr1, field2 = expr2
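The in-order assignment model can be sketched imperatively; Python stands in for the engine here, mirroring `eval total = price * quantity, final = total * 0.5` (the 50% discount is an invented example value):

```python
event = {"price": 10.0, "quantity": 3}

# Assignments execute in source order within one eval statement.
event["total"] = event["price"] * event["quantity"]  # first assignment
event["final"] = event["total"] * 0.5                # reads `total` just written

assert event["total"] == 30.0
assert event["final"] == 15.0
```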

Arithmetic — Numeric operators and parentheses follow conventional precedence; coercion may occur when types differ—normalize with to_number / to_string to control cost.

eval total = price * quantity
eval discount = price * 0.1
eval final = (price * quantity) * (1 - discount)

Mathematical functions — Unary/binary numeric helpers (sqrt, abs, round, floor, ceil, pow, log, log10).

eval sqrt_val = sqrt(value)
eval abs_val = abs(value)
eval round_val = round(value)

String functions — Concatenation, case, length, substring, replace.

eval full_name = name + " " + surname
eval upper_name = to_upper(name)
eval lower_name = to_lower(name)
eval name_len = length(name)
eval substr = substring(text, start, length)
eval replaced = replace(text, "old", "new")

Type conversion — Explicit coercion reduces ambiguity and downstream serialization surprises.

eval str_val = to_string(number)
eval num_val = to_number(string)
eval bool_val = to_boolean(value)

Conditionals — if, case, coalesce evaluate branches and return the first matching or non-null value per function semantics.

eval status = if(condition, true_value, false_value)
eval grade = case(age >= 65, "senior", age >= 18, "adult", "minor")
eval result = coalesce(field1, field2, "default")
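Sketches of the coalesce and case semantics (the null-handling details are assumptions of this illustration, not the engine source):

```python
def coalesce(*values):
    """First non-null argument wins; null if all are null."""
    for v in values:
        if v is not None:
            return v
    return None

def grade(age: int) -> str:
    """Mirrors case(age >= 65, "senior", age >= 18, "adult", "minor"):
    condition/value pairs checked in order, last argument is the fallback."""
    if age >= 65:
        return "senior"
    if age >= 18:
        return "adult"
    return "minor"

assert coalesce(None, "Alice", "default") == "Alice"
assert coalesce(None, None) is None
assert (grade(70), grade(30), grade(10)) == ("senior", "adult", "minor")
```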

Aggregation

Aggregates consume a stream of events within a time window (timespan=…) and emit summarized records. AS names output metrics. Exact JSON shapes and multi-group emission: Reference → Output shape, Glossary → Aggregation.

Runtime and state

| Topic | Semantics |
| --- | --- |
| Runtime state | Windowed aggregations maintain state (partial sums, counts, buffers, session clocks) until the window closes or the session expires. |
| Window lifecycle | timespan defines the window length; window=tumbling, window=sliding, and window=session select the mode, controlling overlap and gap handling. |
| State retention | State exists for the duration of open windows; larger timespan and higher cardinality group_by increase memory footprint. |
| Grouped output | group_by emits one aggregate row per distinct key per window; multiple groups may serialize as a JSON array; downstream tasks may fan out one sink event per row. |
| Filtering into the window | where restricts which events enter the aggregate computation. |
| rekey=true | Rewrites the routing key from group_by fields so partitioned sinks route consistently with aggregate keys. |

Forms

sum(field) AS alias timespan=5m
avg(field) AS alias timespan=5m
count AS alias timespan=5m
min(field) AS alias timespan=5m
max(field) AS alias timespan=5m
first(field) AS alias timespan=5m
last(field) AS alias timespan=5m
earliest(field) AS alias timespan=5m
latest(field) AS alias timespan=5m
dc(field) AS alias timespan=5m
sum(field1) AS total, avg(field2) AS average timespan=5m
sum(field) AS total group_by group_field timespan=5m
avg(field) AS average group_by field1, field2 timespan=5m
sum(field) AS total window=tumbling timespan=5m
sum(field) AS total window=sliding timespan=5m slide=1m
sum(field) AS total window=session timespan=5m gap=2m
sum(field) AS total where condition timespan=5m
sum(amount) AS total timespan=1h group_by user_id, department rekey=true
count AS events timespan=5m group_by user_id rekey=true
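The tumbling form, e.g. `sum(amount) AS total timespan=5m group_by user_id`, can be sketched as a batch simulation (illustrative semantics only; a real engine aggregates incrementally and emits when each window closes):

```python
from collections import defaultdict

def tumbling_sum(events, window_secs=300):
    """Sum `amount` per (5-minute window, user_id) group."""
    state = defaultdict(float)  # (window_start, key) -> partial sum
    for ev in events:
        window_start = (ev["timestamp"] // window_secs) * window_secs
        state[(window_start, ev["user_id"])] += ev["amount"]
    # One aggregate row per window per distinct group_by key.
    return [{"window_start": w, "user_id": k, "total": t}
            for (w, k), t in sorted(state.items())]

events = [
    {"timestamp": 0,   "user_id": "u1", "amount": 10.0},
    {"timestamp": 120, "user_id": "u1", "amount": 5.0},
    {"timestamp": 400, "user_id": "u1", "amount": 1.0},  # next 5m window
]
rows = tumbling_sum(events)
assert rows[0]["total"] == 15.0 and rows[1]["total"] == 1.0
```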

Partitioning

partition_by extracts one or more fields that form the partition key for keyed execution, scaling, and sink routing. It routes logical work to a stable key derived from the event.

partition_by user_id
partition_by user_id, department
parse_json | partition_by user_id | count timespan=5m
partition_by tenant_id, user_id | sum(amount) timespan=1h group_by user_id

Downstream implications: The key influences which downstream operator instance consumes the event and how aggregates align with sink partitions; combine with aggregate rekey when the post-aggregate key must match the partition scheme.
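One plausible routing scheme looks like the sketch below; the hash function, key encoding, and partition count are assumptions for illustration, since real engines differ:

```python
import hashlib

def partition_for(event, key_fields, num_partitions=8):
    """Derive a stable partition from the concatenated key fields."""
    key = "|".join(str(event[f]) for f in key_fields)
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

e1 = {"tenant_id": "t1", "user_id": 123, "amount": 99.99}
e2 = {"tenant_id": "t1", "user_id": 123, "amount": 5.00}

# Same key fields -> same partition, regardless of other payload fields.
assert (partition_for(e1, ["tenant_id", "user_id"])
        == partition_for(e2, ["tenant_id", "user_id"]))
```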

Output shaping

fields

fields projects the event to a subset of keys (whitelist) or removes listed keys.

fields field1, field2, field3
fields remove field1, field2
fields - field1, field2

Reducing payload size before heavy eval or sinks lowers memory and serialization cost.
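Both projection forms reduce to simple dictionary filters (a sketch of the semantics, not the implementation; the example fields are invented):

```python
def project(event: dict, keep: list) -> dict:
    """Whitelist form, as in `fields field1, field2`."""
    return {k: event[k] for k in keep if k in event}

def drop(event: dict, remove: list) -> dict:
    """Removal form, as in `fields - field1, field2`."""
    return {k: v for k, v in event.items() if k not in remove}

ev = {"name": "Alice", "status": "vip", "raw_blob": "x" * 10_000}
assert project(ev, ["name", "status"]) == {"name": "Alice", "status": "vip"}
assert "raw_blob" not in drop(ev, ["raw_blob"])
```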

rename

rename maps existing field paths to new names without transforming values.

rename old_field AS new_field
rename field1 AS new1, field2 AS new2

output

output selects a single field and exposes its value as the pipeline result for stages that expect a scalar or explicitly typed text payload (certain sink encodings).

output field_name
output field_name type=string
| Behavior | Semantics |
| --- | --- |
| Scalar extraction | The engine projects one field’s value as the primary emission for the stage result. |
| Payload replacement | The downstream record serializes around that scalar (or typed string) rather than the full JSON object unless the task merges metadata separately. |
| Downstream serialization | type= hints string coercion for wire formats that require text. |
| Single-field emission | Multiple output stages in one logical pipeline are invalid or last-wins per task grammar—see Reference. |

Examples

Pipeline compositions

parse_json raw_data | eval total = price * quantity | fields total
parse_csv data |
eval total = price * quantity |
eval tax = total * 0.08 |
eval final = total + tax |
rename final AS order_total |
fields order_total
user.age > 25 |
eval status = if(premium, "vip", "regular") |
fields name, status

Predicate, transform, and window patterns

user.name != null AND user.email != null AND user.age > 0
eval full_name = first_name + " " + last_name
eval age_group = case(age < 18, "minor", age < 65, "adult", "senior")
eval is_high_value = amount > 1000
sum(amount) AS revenue timespan=1d group_by date
count AS action_count timespan=1h group_by user_id where action = "purchase"
eval ratio = if(divisor != 0, dividend / divisor, 0)
eval name = coalesce(user.name, "Unknown")

Non-normative sample payloads

The JSON below is illustrative only; it does not define schema requirements. Use for manual tests or parse_json fixtures.

E-commerce order

{
  "order_id": "ORD-123",
  "customer": {
    "name": "Alice",
    "email": "alice@example.com",
    "tier": "premium"
  },
  "items": [
    {"name": "Laptop", "price": 999.99, "quantity": 1},
    {"name": "Mouse", "price": 29.99, "quantity": 2}
  ],
  "discount_code": "SAVE10"
}

Log entry

{
  "timestamp": "2024-01-20T14:30:25Z",
  "level": "ERROR",
  "message": "Database connection failed",
  "details": "timeout=30s, retries=3, host=db-prod-01"
}

User event

{
  "user_id": 123,
  "action": "purchase",
  "timestamp": 1640995200,
  "amount": 99.99,
  "product": "Laptop",
  "category": "Electronics"
}

Performance and runtime considerations

| Concern | Guidance |
| --- | --- |
| Parse cost | parse_regex, parse_xml, and large parse_json on wide strings dominate CPU; filter before parse when the predicate does not depend on parsed fields. |
| Regex backtracking | ~= and parse_regex patterns with nested quantifiers risk exponential backtracking; prefer bounded classes and anchors. |
| Memory / state | Long timespan, high-cardinality group_by, and session windows retain more in-flight state. |
| Aggregation cost | More functions per window and more keys increase merge work at emit time. |
| Projection | Early fields drops large blobs before eval and aggregations, reducing per-event memory and serialization volume. |
| Type coercion | Repeated implicit coercion in eval adds overhead; coerce once with to_number / to_string. |

Errors

Failures fall into overlapping categories below; the exact error code and message depend on the engine build.

| Category | Description |
| --- | --- |
| Validation failure | Pipeline or query fails static checks (syntax, unknown function, illegal token order) before execution. |
| Execution-time errors | A stage evaluates at runtime and encounters an illegal value (for example division by zero, missing path where required). |
| Parse-time errors | A parse_* stage receives input that does not match the expected format. |
| Runtime failure model | The task may drop the event, retry per connector policy, or surface the error to observability depending on task type—see task and stream documentation. |

| Message / code (typical) | Cause | Mitigation |
| --- | --- | --- |
| FieldNotFound | Resolved path missing on the event | Correct the path; use coalesce or guards |
| InvalidSyntax | Token order or spelling | Compare with Reference |
| TypeMismatch | String vs number, etc. | Insert to_string / to_number / to_boolean |
| DivisionByZero | Divisor evaluates to zero | Guard with if |
| ParseError | Input not valid for parse_* | Inspect raw field; where before parse |