zpars

A grammar parser toolkit written in Zig - ABNF, BNF, PEG, CFG, ERE, and S-expressions
with comptime parser combinators, a bytecode VM, native JIT compilers, and AOT compilation for AArch64 and x86_64.

Features

Multi-format parsing - ABNF, BNF, PEG, CFG, ERE, and S-expression grammars, each with tokenizer, parser, and error diagnostics.
Validation - Detects duplicate rules, undefined references, unused rules, and unproductive cycles.
Runtime matching - Match input strings against any rule in a dynamically loaded grammar.
Formatting - Pretty-print grammars back to canonical form with aligned operators.
Comptime combinators - Zero-overhead parser combinator library resolved entirely at comptime.
ABNF compiler - Compile ABNF grammar strings into combinator types at comptime - define your grammar in standard ABNF and get a parser for free.
Grammar VM - Bytecode virtual machine inspired by LPeg. Compiles any grammar to instructions and executes with backtracking and capture groups.
Peephole optimizer - Optimizes compiled bytecode with charset-to-char reduction, consecutive char fusion into string instructions, common prefix factoring, and optional char fusion.
JIT compilers - Compiles bytecode to native machine code (AArch64 and x86_64), eliminating interpreter dispatch overhead.
AOT compilation - Compile grammars ahead of time to portable .zpar binary blobs containing native machine code. Load and execute them without the grammar compiler.
Parse trees - Run any multi-rule grammar with auto-rule-captures to produce a hierarchical tree (S-expression or JSON), with each rule call becoming a node typed by rule name. Works on both the VM and the JIT, with optional packrat memoization and Warth-style left recursion.
Tree queries - Tree-sitter-style structural queries over parse trees, with @captures, alternation, quantifiers, anchors, field selectors (name: (pattern), declared via backwards-compatible #@ field directives), anonymous-token literal matches (#@ tokens "..." or --tokens=all), and #eq?/#match? predicates (regex compiled at query-compile time via the bundled ERE engine).
Benchmarks - 1M-iteration benchmarks comparing comptime vs runtime, optimized vs unoptimized VM, and interpreter vs JIT.

Comptime ABNF Compiler

Define a grammar in ABNF and compile it to a parser type at comptime - zero runtime overhead:

const zpars = @import("zpars");

const HttpVersion = zpars.abnf.Compiler.Compile(
   \\version = "HTTP/" 1*DIGIT "." 1*DIGIT
, "version");

test "parse HTTP version" {
   const r = HttpVersion.parse("HTTP/1.1 OK").?;
   try std.testing.expectEqualStrings("HTTP/1.1", r.value);
   try std.testing.expectEqualStrings(" OK", r.rest);
}

Multi-rule grammars with cross-references work too:

const Pair = zpars.abnf.Compiler.Compile(
   \\number = 1*DIGIT
   \\pair   = number "," number
, "pair");

const r = Pair.parse("42,7!").?;
// r.value == "42,7", r.rest == "!"

All 16 RFC 5234 core rules (ALPHA, DIGIT, SP, CRLF, etc.) are available implicitly.

Parser Combinators

The combinator library can also be used directly:

const c = zpars.combinators;

const Digit = c.CharRange('0', '9');
const Number = c.Capture(c.Many(Digit, .{ .min = 1 }));
const P = c.Seq(.{ Number, c.Literal(","), Number });

const r = c.Capture(P).parse("42,7!").?;
// r.value == "42,7"

Available primitives: Literal, Char, CharRange, AnyOf, NoneOf, ByteLiteral, CaseInsensitiveLiteral, Any, Eof.

Available combinators: Seq, Alt, Many, Optional, Not, Map, Capture.

Runtime Matcher

For grammars loaded at runtime, use the Matcher:

const zpars = @import("zpars");

var scanner = zpars.abnf.Scanner.init(grammar);
const tokens = scanner.scanTokens();
var parser = zpars.abnf.Parser.init(tokens, grammar);
const rules = try parser.parse();
var validator = zpars.Validator.init(allocator, rules);
const merged = try validator.validate();
const matcher = try zpars.Matcher.init(allocator, merged);

const r = matcher.match("version", "HTTP/1.1 OK").?;
// r.value == "HTTP/1.1", r.rest == " OK"

Grammar VM

A bytecode virtual machine that compiles grammar ASTs into instructions and executes them with backtracking. Inspired by the LPeg parsing machine.

Disassemble a grammar to see the generated bytecode:

$ zpars vm grammar.peg
   0: call    -> 2
   1: match
   2: call    -> 8
   3: choice  -> 7
   4: char    '+'
   ...

Run a match with capture groups (ERE):

$ zpars vm email.ere "user@example.com"
match: 16 bytes "user@example.com"
  group 0: "user"
  group 1: "example"
  group 2: "com"

Trace execution step-by-step with -t:

$ zpars vm -t pattern.ere "abcbd"
   0: sp=0   pos=0   "abcbd" char 'a'
   1: sp=0   pos=1   "bcbd" choice -> 7
   2: sp=1   pos=1   "bcbd" choice -> 5
   3: sp=2   pos=1   "bcbd" char 'b'
      backtrack -> pc=5 pos=2
   ...

The VM supports all grammar formats (ABNF, BNF, PEG, ERE).

Peephole Optimizer

The optimizer runs after compilation, rewriting bytecode in-place before execution:

Charset-to-char - Replaces set instructions with a single-element charset by a cheaper char instruction.
Char fusion - Fuses runs of consecutive char instructions into a single string instruction.
Common prefix factoring - Factors shared prefixes out of alternation branches (e.g. https|http shares http).
Optional char fusion - Collapses choice/char/commit sequences for a? patterns into a single optional_char instruction.

JIT Compilers

The JIT compiler translates VM bytecode to native machine code, removing interpreter dispatch overhead entirely. Both AArch64 (ARM64) and x86_64 backends are supported, selected automatically at compile time. The JIT uses the same API as the interpreter VM:

const zpars = @import("zpars");

var compiler = zpars.vm.Compiler.compile(rules);
var jit = try zpars.vm.Jit.init(
    compiler.getCode(),
    compiler.getCharsets(),
    compiler.getStringData(),
    input,
);
defer jit.deinit();

const result = jit.execute();

AOT Compilation

The AOT compiler produces standalone .zpar binary blobs containing native machine code and all static data. Unlike the JIT (which compiles at runtime), AOT compiles once and produces a portable blob that can be loaded and executed without the grammar compiler:

$ zpars compile examples/calc.peg -o calc.zpar
$ zpars run calc.zpar "1+2*3"
match: 5 bytes "1+2*3"

The blob format includes the native code, charset tables, string data, and jump tables. At runtime, the code is mmap'd as executable and called directly -- the only overhead is a single mmap syscall at load time.

Parse Trees

Run any multi-rule grammar with rules_as_captures to produce a hierarchical parse tree where every rule call is a node typed by its rule name. The compiler emits event_open / event_close instructions around each rule body; the VM/JIT records them on a side channel so they survive backtracking, and a post-pass folds the events into a tree:

$ zpars tree examples/calc.peg "1+2*3"
(Expr [0,5]
  (Term [0,1]
    (Factor [0,1]))
  (Term [2,5]
    (Factor [2,3])
    (Factor [4,5])))

-j switches to compact JSON, --jit runs through the native JIT, -p enables packrat memoization (which transparently caches and replays the per-rule events on cache hits). The output format mirrors tree-sitter's S-expressions for easy inspection and diffing.

Error recovery (PEG, experimental)

PEG grammars can opt into labeled-failure recovery via backwards-compatible #@ comment directives that other PEG tools see as plain # comments:

Program <- Stmt*
Stmt    <- Expr ";"
         / Expr     #@ throw missing_semi
Expr    <- [a-z]+

#@ rule Stmt catches missing_semi -> recover_missing

#@ throw <label> ends an alternative with a labeled failure that propagates past ordered-choice points. #@ rule <name> catches <label> -> <handler> wraps the rule's body in a catch frame; recover_missing is a builtin handler that emits a zero-width MISSING(label) marker. zpars tree runs the resulting bytecode through the VM with capture events on and renders synthesized ERROR / MISSING nodes and partial rule subtrees:

$ zpars tree examples/recovery.peg "foo;bar"
(Program [0,7]
  (Stmt [0,4]
    (Expr [0,3]))
  (Stmt [4,7]
    (Expr [4,7])
    (MISSING [7,7] missing_semi)))

Recovery directives can't be mixed with PEG () capture groups in the same grammar.

Tree queries (experimental)

A tree-sitter-style query language runs structural patterns against parse trees. Queries are S-expression-shaped with @captures, [alternation], ?/*/+ quantifiers, . anchors, and #predicates?:

; Capture every Factor.
(Factor) @factor

; Capture Factors whose text matches a regex.
((Factor) @odd
 (#match? @odd "[13579]"))

; Capture Terms that contain at least two Factors.
(Term (Factor) (Factor)) @mul

; Capture the first Term in an Expr.
(Expr . (Term) @first-term)

Run with zpars query:

$ zpars query examples/calc.peg examples/calc.scm "1+2*3"
pattern: 3
  capture: name=first-term, range=[0,1], text='1'
pattern: 0
  capture: name=factor, range=[0,1], text='1'
pattern: 1
  capture: name=odd, range=[0,1], text='1'
pattern: 2
  capture: name=mul, range=[2,5], text='2*3'
pattern: 0
  capture: name=factor, range=[2,3], text='2'
pattern: 0
  capture: name=factor, range=[4,5], text='3'
pattern: 1
  capture: name=odd, range=[4,5], text='3'

The query layer also recognises the recovery-era node kinds: (ERROR) @e matches error_nodes, (MISSING) @m matches missing_nodes, and (Rule partial) filters to rule_partial nodes specifically.

Built-in predicates: #eq?, #match? (ERE, compiled at query-compile time), #any-of?, #contains?, each with a #not- form. #vim-match? / #lua-match? are aliased to #match? for nvim-treesitter compatibility. Unknown predicates pass through (tree-sitter convention).

Anonymous tokens

Bare string atoms in queries match anonymous-token nodes (literal-text matches in the parse tree). Token emission is opt-in via --tokens=<mode> on the CLI (or Compiler.Options.token_events programmatically). Three modes:

--tokens=off -- no token nodes (default for non-PEG grammars).
--tokens=tagged (default for PEG) -- only literals listed in #@ tokens "..." directives become token nodes. Backwards-compatible: other PEG tools see the directives as plain comments.
--tokens=all -- every literal-matching opcode emits a token node. Most ecosystem-compatible, but inflates the event log.

#@ tokens "+" "*"
Expr   <- Term ("+" Term)*
Term   <- Factor ("*" Factor)*
Factor <- "(" Expr ")" / [0-9]+

; operators.scm -- captures every "+" and "*" anonymous-token node.
"+" @plus
"*" @star

$ zpars query examples/calc-tokens.peg examples/operators.scm "1+2*3"
pattern: 0
  capture: name=plus, range=[1,2], text='+'
pattern: 1
  capture: name=star, range=[3,4], text='*'

Both the VM and JIT emit event_token natively.

Fields

Field annotations let queries select children by role rather than position. zpars stays PEG-compliant: fields are declared via backwards-compatible #@ field comment directives (other PEG tools see them as plain comments), then referenced from queries with the standard tree-sitter name: (pattern) syntax.

#@ tokens "function"
#@ field Func name = Ident
#@ field Func body = Body
Func  <- "function" _ Ident _ Body
Ident <- [a-z]+
Body  <- "{" _ "}"
_     <- " "*

When the same target appears multiple times in a rule body, an explicit ordinal disambiguates: #@ field Bin left = Expr#1, #@ field Bin right = Expr#2. Targets can also be quoted literals (= "function") for tagging anonymous-token children.

; func.scm
(Func
  name: (Ident) @fname
  body: (Body) @fbody)

$ zpars query examples/func.peg examples/func.scm "function foo{}"
pattern: 0
  capture: name=fname, range=[9,12], text='foo'
  capture: name=fbody, range=[12,14], text='{}'

Field events are auto-enabled when the PEG parser collects at least one #@ field directive. Both the VM and JIT emit event_field natively.

Multi-pattern groupings are supported: ((a) (b) (#pred? ...)) matches when the visited node has children matching a and b in order (gap-allowed by default; use . between them for strict adjacency). Predicates fire after the full sequence binds, so a failed predicate triggers backtracking through quantifiers naturally. Single-pattern groupings (((Rule (Factor)+ @f) (#eq? @f "x"))) get the same treatment: the group's predicates fold into the inner's child-sequence match so a rejected greedy bind retries with a shorter count.

The CLI prints one entry per pattern match by default; pass -c (or --captures) to flatten the output into one entry per capture, ordered by source position. JSON forms are available for both modes via -j. An empty result set prints no matches to stderr.

See examples/ for sample grammars (calc, email, JSON, URI, recovery, queries) and inputs.

CLI

zpars check   <file>                          # validate a grammar
zpars compile <file> -o <output>              # compile grammar to native .zpar blob
zpars fmt     <file>                          # format a grammar
zpars match   -r <rule> <file> <input>        # match input against a rule
zpars run     <blob> <input>                  # run a compiled .zpar blob
zpars tree    [-j] [-p|--jit] <file> <input>  # print parse tree (S-exp default, -j JSON)
zpars vm      [-t] [-p] <file> [<input>]      # disassemble and run via VM
zpars query   [-j] [-c] [--tokens=off|all|tagged] <grammar> <query-file> <input>  # run a tree-sitter-style query

check

Validate an ABNF grammar, reporting syntax errors and semantic issues:

$ zpars check grammar.abnf
grammar.abnf:1:12: error: expected element, found ')'
   foo = (a / )
              ^

$ zpars check grammar.abnf
grammar.abnf: warning: rule 'helper' is defined but never referenced
grammar.abnf: error: rule 'start' references undefined rule 'missing'

fmt

Parse and reformat a grammar with aligned = signs:

$ zpars fmt grammar.abnf
number = 1*DIGIT
pair   = number "," number

match

Match an input string against a rule:

$ zpars match -r version grammar.abnf "HTTP/1.1 OK"
HTTP/1.1

Editor Support

The VSCode extension provides semantic highlighting for ABNF, BNF, PEG, CFG, and S-expression grammars, powered by the zpars WASM build.

Install from Open VSX.

Building

Requires Zig 0.16.0+.

zig build                      # build the CLI executable
zig build test                 # run all tests
zig build bench                # run benchmarks (ReleaseFast, 1M iterations)
zig build lsp                  # build the LSP server
zig build vsx                  # build the Open VSX extension (WASM + TypeScript)

References

RFC 5234 - Augmented BNF for Syntax Specifications: ABNF
RFC 7405 - Case-Sensitive String Support in ABNF
Report on the Algorithmic Language ALGOL 60 (1960) - original BNF definition (Section 1.1)
Parsing Expression Grammars: A Recognition-Based Syntactic Foundation (2004) - Bryan Ford's PEG paper
IEEE Std 1003.1 - Extended Regular Expressions (ERE) - POSIX ERE specification (Section 9.4)
RFC 9804 - S-Expressions
A Text Pattern-Matching Tool based on Parsing Expression Grammars (2009) - Ierusalimschy's LPeg parsing machine
Packrat Parsers Can Support Left Recursion (2008) - Warth et al.'s seed-growing algorithm for left recursion in PEGs
Maidl, A. M., Mascarenhas, F., Medeiros, S., & Ierusalimschy, R. (2016). Error Reporting in Parsing Expression Grammars - labeled-failure model
Medeiros, F., & Ierusalimschy, R. (2018). Syntax Error Recovery in Parsing Expression Grammars - the recovery model zpars's #@ directives implement

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
.vscode		.vscode
editors		editors
examples		examples
src		src
web		web
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bench.zig		bench.zig
build.zig		build.zig
build.zig.zon		build.zig.zon
flake.lock		flake.lock
flake.nix		flake.nix
icon.png		icon.png
icon.svg		icon.svg
test_edge_cases		test_edge_cases

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

zpars

Features

Comptime ABNF Compiler

Parser Combinators

Runtime Matcher

Grammar VM

Peephole Optimizer

JIT Compilers

AOT Compilation

Parse Trees

Error recovery (PEG, experimental)

Tree queries (experimental)

Anonymous tokens

Fields

CLI

check

fmt

match

Editor Support

Building

References

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

zpars

Features

Comptime ABNF Compiler

Parser Combinators

Runtime Matcher

Grammar VM

Peephole Optimizer

JIT Compilers

AOT Compilation

Parse Trees

Error recovery (PEG, experimental)

Tree queries (experimental)

Anonymous tokens

Fields

CLI

check

fmt

match

Editor Support

Building

References

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages