PXF: Proto eXpressive Format

Concrete syntax.

PXF is the human-readable text format in the ProtoWire family. The grammar below is written in ISO/IEC 14977 EBNF and matches the canonical reference parser. Whitespace and comments are insignificant between tokens; comments may appear wherever whitespace may appear.

Document

A PXF document is an optional @type directive followed by zero or more entries. The directive pins the document to a fully-qualified message type, and parsers reject a document whose entries do not match that type.

document       = [ type_directive ] , { entry } ;
type_directive = '@type' , identifier ;
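
As a rough illustration of the optional directive, this Go sketch strips an @type directive from the front of a document and returns the type name. The helpers splitTypeDirective and identLen are hypothetical. "letter" is assumed to mean an ASCII letter, and whitespace is assumed to be required after @type. Comments before the directive are not skipped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// identLen returns the length of the identifier at the start of s, or 0 if
// s does not start with one. "letter" is assumed to mean an ASCII letter.
func identLen(s string) int {
	i := 0
	for i < len(s) {
		c := s[i]
		letter := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		digit := c >= '0' && c <= '9'
		if letter || c == '_' || (i > 0 && (digit || c == '.')) {
			i++
			continue
		}
		break
	}
	return i
}

// splitTypeDirective (hypothetical) reports whether src opens with an @type
// directive and, if so, returns the type name and the remaining entries.
// Assumptions: whitespace must follow @type, and leading comments are not
// skipped.
func splitTypeDirective(src string) (typeName, rest string, ok bool) {
	s := strings.TrimLeftFunc(src, unicode.IsSpace)
	if !strings.HasPrefix(s, "@type") {
		return "", src, false
	}
	after := s[len("@type"):]
	name := strings.TrimLeftFunc(after, unicode.IsSpace)
	if len(name) == len(after) {
		return "", src, false // no whitespace after @type
	}
	n := identLen(name)
	if n == 0 {
		return "", src, false // @type without an identifier
	}
	return name[:n], name[n:], true
}

func main() {
	name, rest, ok := splitTypeDirective("@type infra.v1.ServerConfig\nport = 8080\n")
	fmt.Println(ok, name)    // true infra.v1.ServerConfig
	fmt.Printf("%q\n", rest) // "\nport = 8080\n"
}
```

A caller that expects a particular message type would compare the returned name against it before reading any entries.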

Entries

Every entry starts with a key, and the token that follows the key selects one of three shapes: = assigns a value (a scalar, a list, or a block value), : binds a map entry, and a bare { … } opens a nested message block.

entry           = key , ( assignment_tail | map_tail | block_tail ) ;

assignment_tail = '=' , value ;
map_tail        = ':' , value ;
block_tail      = '{' , { entry } , '}' ;

key             = identifier | string | integer ;
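
The token after the key determines the entry's shape, so a recursive-descent parser can branch on a single character once the key has been read. This Go sketch shows only that branching step. Shape and classifyTail are hypothetical names, and comments between the key and the operator are not skipped.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// Shape names the three entry forms distinguished by the token after the key.
type Shape int

const (
	Assignment Shape = iota // key = value
	MapEntry                // key : value
	Block                   // key { entries }
)

func (s Shape) String() string {
	return [...]string{"assignment", "map entry", "block"}[s]
}

// classifyTail inspects the text that follows an already-consumed key and
// reports which tail production it begins.
func classifyTail(afterKey string) (Shape, error) {
	s := strings.TrimLeftFunc(afterKey, unicode.IsSpace)
	if s == "" {
		return 0, errors.New("unexpected end of input after key")
	}
	switch s[0] {
	case '=':
		return Assignment, nil
	case ':':
		return MapEntry, nil
	case '{':
		return Block, nil
	}
	return 0, fmt.Errorf("expected '=', ':' or '{' after key, found %q", s[0])
}

func main() {
	for _, tail := range []string{" = 8080", ` : "value"`, " { port = 8080 }"} {
		shape, err := classifyTail(tail)
		fmt.Println(shape, err)
	}
}
```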

Values

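A value is a scalar, a list, or a block value. Inside a list the separating comma is optional, so elements may be separated by commas, by whitespace alone (typically newlines), or by any mix of the two. As the production is written, a comma may appear only between two elements, never before the first or after the last.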

value       = string
            | integer
            | float
            | bool
            | null
            | bytes
            | timestamp
            | duration
            | identifier
            | list
            | block_value ;

list        = '[' , [ value , { [ ',' ] , value } ] , ']' ;
block_value = '{' , { entry } , '}' ;
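
To illustrate the optional comma, here is a Go sketch of the list production restricted to integer elements. parseIntList is a hypothetical helper. It follows the production literally, so it rejects a leading, doubled, or trailing comma, and it does not handle comments or any other value kinds.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isSpace covers the ASCII whitespace a list is likely to contain.
func isSpace(c byte) bool {
	return c == ' ' || c == '\t' || c == '\n' || c == '\r'
}

// parseIntList sketches list = '[' , [ value , { [ ',' ] , value } ] , ']' ;
// for integer elements only. Whitespace alone also separates elements.
func parseIntList(src string) ([]int64, error) {
	s := strings.TrimSpace(src)
	if len(s) < 2 || s[0] != '[' || s[len(s)-1] != ']' {
		return nil, fmt.Errorf("list must be enclosed in [ and ]")
	}
	s = s[1 : len(s)-1]
	var out []int64
	sawComma := false // true right after a separating comma
	i := 0
	for {
		for i < len(s) && isSpace(s[i]) {
			i++
		}
		if i == len(s) {
			break
		}
		if s[i] == ',' {
			// The production allows at most one comma, and only between values.
			if len(out) == 0 || sawComma {
				return nil, fmt.Errorf("unexpected ',' at offset %d", i+1)
			}
			sawComma = true
			i++
			continue
		}
		// Scan an optional '-' and a run of digits, then convert.
		j := i
		if s[j] == '-' {
			j++
		}
		for j < len(s) && s[j] >= '0' && s[j] <= '9' {
			j++
		}
		n, err := strconv.ParseInt(s[i:j], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad list element at offset %d: %v", i+1, err)
		}
		out = append(out, n)
		sawComma = false
		i = j
	}
	if sawComma {
		return nil, fmt.Errorf("trailing ',' is not part of the list production")
	}
	return out, nil
}

func main() {
	got, err := parseIntList("[1, 2\n 3 4]")
	fmt.Println(got, err) // [1 2 3 4] <nil>
	_, err = parseIntList("[1, 2,]")
	fmt.Println(err)
}
```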

Identifiers

Identifiers carry enum values, message-type names, and bare keys. They begin with a letter or underscore and may contain dots, which is useful for fully-qualified type names like infra.v1.ServerConfig.

identifier  = ident_start , { ident_part } ;
ident_start = letter | '_' ;
ident_part  = letter | digit | '_' | '.' ;

bool        = 'true' | 'false' ;
null        = 'null' ;
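
The identifier rule can be checked one character at a time. In this Go sketch, isIdentifier is a hypothetical name. Because the grammar leaves letter undefined here, ASCII letters are assumed. Following the production literally, it also accepts forms such as a..b or a trailing dot.

```go
package main

import "fmt"

// isIdentifier reports whether s matches
//   identifier = ident_start , { ident_part } ;
// "letter" is assumed to mean an ASCII letter.
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		letter := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
		digit := c >= '0' && c <= '9'
		switch {
		case letter || c == '_':
			// legal in any position
		case i > 0 && (digit || c == '.'):
			// digits and dots are legal only after the first character
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"infra.v1.ServerConfig", "_private", "v1", "1v", ".x"} {
		fmt.Println(s, isIdentifier(s))
	}
}
```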

Numbers

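Integers are written in decimal. Floats are IEEE-754 values written either with a decimal point, optionally followed by an exponent, or with an exponent alone. Both forms require at least one digit before the point or exponent and accept no leading +, so 1. and 1e9 are valid floats while .5 and +1.0 are not.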

integer  = [ '-' ] , digit , { digit } ;

float    = [ '-' ] , digit , { digit } ,
           ( '.' , { digit } , [ exponent ]
           | exponent ) ;

exponent = ( 'e' | 'E' ) , [ '+' | '-' ] , digit , { digit } ;
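
One way to check literals against these productions is to transcribe them into regular expressions and then pass the matched text to strconv. This Go sketch is illustrative. parseNumber is a hypothetical name, and the int64 and float64 widths are assumptions because the grammar does not specify them.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Regular expressions transcribed from the integer and float productions.
var (
	integerRe = regexp.MustCompile(`^-?[0-9]+$`)
	floatRe   = regexp.MustCompile(`^-?[0-9]+(\.[0-9]*([eE][+-]?[0-9]+)?|[eE][+-]?[0-9]+)$`)
)

// parseNumber classifies a literal by production and converts it. The
// int64/float64 widths are an assumption, not part of the grammar.
func parseNumber(lit string) (any, error) {
	switch {
	case integerRe.MatchString(lit):
		return strconv.ParseInt(lit, 10, 64)
	case floatRe.MatchString(lit):
		return strconv.ParseFloat(lit, 64)
	}
	return nil, fmt.Errorf("%q is not a PXF number", lit)
}

func main() {
	for _, lit := range []string{"42", "-7", "3.14", "1.", "6.02e23", "1e-9", ".5", "+1"} {
		v, err := parseNumber(lit)
		fmt.Printf("%-8s %T %v %v\n", lit, v, v, err)
	}
}
```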

Timestamps & durations

The lexer treats four digits followed by a hyphen (the year in a date such as 2024-01-15) as the start of an RFC 3339 timestamp, and a run of digits followed by a time unit as a Go-style duration. Negative integers and identifiers that begin with a letter take precedence over both forms.

(* RFC 3339 date-time. e.g. 2024-01-15T10:30:00Z, 2024-01-15T10:30:00.123456789+02:00 *)
timestamp        = ?RFC 3339 date-time? ;

(* Go time.ParseDuration. e.g. 30s, 1h30m, 500ms, 1.5h *)
duration         = duration_segment , { duration_segment } ;
duration_segment = digit , { digit } ,
                   [ '.' , digit , { digit } ] , time_unit ;
time_unit        = 'ns' | 'us' | 'µs' | 'ms' | 's' | 'm' | 'h' ;

Strings

Simple strings, delimited by a single pair of double quotes, honor C-style escapes plus 2-digit hex, 3-digit octal, and 4- or 8-digit Unicode escapes. Triple-quoted strings preserve their raw content with no escape interpretation: the leading newline is stripped, and the closing line's indentation is removed from each preceding line.

string        = simple_string | triple_string ;

simple_string = '"' , { string_char | escape_seq } , '"' ;
triple_string = '"""' , ?any text not containing """? , '"""' ;

escape_seq    = '\' , ( simple_escape
                      | hex_escape
                      | octal_escape
                      | unicode_4_escape
                      | unicode_8_escape ) ;
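
The two string forms can be sketched in Go as follows. decodeSimple relies on strconv.Unquote, whose escape set (\n, \t, \xHH, \ooo, \uHHHH, \UHHHHHHHH, and the rest) covers the escape kinds described above. That PXF spells these escapes the same way Go does is an assumption, and Go also rejects \' inside a double-quoted string. dedentTriple is a hypothetical helper that applies the layout rules to the text between the """ delimiters. Its handling of the final newline and of under-indented lines is an assumption, not behavior taken from the reference parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeSimple decodes a double-quoted simple string, assuming PXF escapes
// are spelled like Go's (\xHH, \ooo, \uHHHH, \UHHHHHHHH, \n, ...).
func decodeSimple(lit string) (string, error) {
	return strconv.Unquote(lit)
}

// dedentTriple applies the triple-quoted layout rules to raw, the text
// between the opening and closing """ (no escapes are interpreted):
//  1. a leading newline is stripped;
//  2. if the last line holds only spaces/tabs (the closing line's indent),
//     it is dropped and that indent is removed from each preceding line.
// Assumptions: the newline before the closing line is dropped as well, and
// lines that do not start with the indent are left unchanged.
func dedentTriple(raw string) string {
	s := strings.TrimPrefix(raw, "\n")
	cut := strings.LastIndex(s, "\n")
	if cut < 0 {
		return s // single line: nothing to dedent
	}
	indent := s[cut+1:]
	if strings.Trim(indent, " \t") != "" {
		return s // closing """ shares a line with content
	}
	lines := strings.Split(s[:cut], "\n")
	for i, line := range lines {
		lines[i] = strings.TrimPrefix(line, indent)
	}
	return strings.Join(lines, "\n")
}

func main() {
	s, err := decodeSimple(`"tab:\t hex:\x41 oct:\101 uni:\u00e9"`)
	fmt.Printf("%q %v\n", s, err)

	raw := "\n    line one\n      indented\n    "
	fmt.Printf("%q\n", dedentTriple(raw)) // "line one\n  indented"
}
```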

Bytes

Byte literals hold base64 text in the standard alphabet, with or without = padding (that is, standard or raw encoding). Backslashes inside b"…" are not interpreted as escapes.

bytes       = 'b' , '"' , { base64_char } , '"' ;
base64_char = letter | digit | '+' | '/' | '=' ;
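
Since padding is optional, a decoder can strip any trailing = characters and then decode with Go's unpadded standard alphabet, which handles both spellings. decodeBytesLiteral is a hypothetical name. This approach is a little more lenient than a strict decoder because it tolerates extra = characters.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBytesLiteral decodes the payload of a b"…" literal (delimiters
// already removed). Dropping '=' padding and decoding with the unpadded
// standard alphabet accepts both padded and raw input. A backslash is not
// a base64 character, so it produces a decode error rather than an escape.
func decodeBytesLiteral(payload string) ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(strings.TrimRight(payload, "="))
}

func main() {
	for _, p := range []string{"aGVsbG8=", "aGVsbG8", `aGVs\bG8`} {
		b, err := decodeBytesLiteral(p)
		fmt.Printf("%q %q %v\n", p, b, err)
	}
}
```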

Comments

PXF has three comment flavors, which may be mixed freely: # and // line comments, and /* … */ block comments. Block comments do not nest.

comment       = line_comment | block_comment ;
line_comment  = ( '#' | '//' ) , { ?any byte except LF? } ;
block_comment = '/*' , ?any text not containing */? , '*/' ;
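
Since block comments do not nest, a lexer can close one at the first */ it encounters. This Go sketch shows that rule along with the two line-comment markers. skipComment is a hypothetical helper that expects its input to begin at a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// skipComment returns the text after the comment at the start of src.
// Line comments ('#' or '//') stop before the next LF; block comments end
// at the FIRST "*/" because they do not nest.
func skipComment(src string) (rest string, err error) {
	switch {
	case strings.HasPrefix(src, "#"), strings.HasPrefix(src, "//"):
		if i := strings.IndexByte(src, '\n'); i >= 0 {
			return src[i:], nil
		}
		return "", nil // comment runs to end of input
	case strings.HasPrefix(src, "/*"):
		// Search after the opening "/*" so that "/*/" is not treated as closed.
		if i := strings.Index(src[2:], "*/"); i >= 0 {
			return src[2+i+2:], nil
		}
		return "", fmt.Errorf("unterminated block comment")
	}
	return src, fmt.Errorf("no comment at start of input")
}

func main() {
	rest, err := skipComment("/* a /* b */ c */")
	fmt.Printf("%q %v\n", rest, err) // " c */" <nil>
}
```

With nested input such as /* a /* b */ c */, the comment ends right after b */, and the remaining c */ is lexed as ordinary tokens.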

Full railroad diagram

The full diagram covers every production above and a few lexical helpers (character classes, hex/octal digits).

ProtoWire PXF grammar railroad diagram