Network Working Group B. Franco Jr.
Internet-Draft TrendVidia, LLC
Intended status: Informational 4 May 2026
Expires: 5 November 2026
The Proto eXpressive Format (PXF) and the protowire Encoding
Family
draft-trendvidia-protowire-00
Abstract
This document specifies the Proto eXpressive Format (PXF), a UTF-8
text serialization for messages defined by Protocol Buffers
schemas, together with three companion encodings: PB (the existing
Protocol Buffers binary wire format, with PXF-specific annotations
that constrain how it is produced and consumed), SBE (a fixed-
layout binary encoding derived from FIX Simple Binary Encoding),
and a response envelope. Collectively these are referred to as
the protowire family. This document defines the wire surface
sufficient for independent interoperable implementations. It does
not define library APIs.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
This Internet-Draft will expire on 5 November 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.
Franco Jr. Expires 5 November 2026 [Page 1]
Internet-Draft protowire May 2026
Table of Contents
1. Introduction ................................................. 3
1.1. Scope ................................................... 3
1.2. Terminology ............................................. 4
2. The protowire Family ......................................... 4
2.1. Common Schema Layer ..................................... 5
2.2. Wire Equivalence ........................................ 5
3. PXF Text Format .............................................. 5
3.1. Character Set ........................................... 5
3.2. Whitespace and Comments ................................. 6
3.3. ABNF Grammar ............................................ 6
3.4. Documents and Type Directives ........................... 8
3.5. Entries and Keys ........................................ 9
3.6. String Literals ......................................... 9
3.7. Bytes Literals ......................................... 11
3.8. Numeric Literals ....................................... 11
3.9. Booleans, Null, and Identifier Values .................. 12
3.10. Timestamps and Durations .............................. 12
3.11. Lists ................................................. 13
3.12. Blocks and Map Tails .................................. 13
4. PB Binary Encoding .......................................... 14
5. SBE Binary Encoding ......................................... 14
5.1. Header and Block Length ................................ 14
5.2. Repeating Groups ....................................... 15
5.3. Variable-Length Data ................................... 15
6. Annotation Extensions ....................................... 15
6.1. PXF Annotations ........................................ 15
6.2. SBE Annotations ........................................ 16
7. Response Envelope ........................................... 16
8. Decoder Conformance ......................................... 17
8.1. Mandatory Limits ....................................... 17
8.2. Recursion .............................................. 18
8.3. UTF-8 Enforcement ...................................... 18
8.4. SBE Bounds Checking .................................... 19
8.5. Map Keys ............................................... 19
9. Media Types ................................................. 19
10. IANA Considerations ........................................ 20
10.1. Media Type Registrations .............................. 20
10.2. Annotation Field Number Range ......................... 21
11. Security Considerations .................................... 21
12. References ................................................. 23
12.1. Normative References .................................. 23
12.2. Informative References ................................ 24
Authors' Addresses ............................................. 24
1. Introduction
Protocol Buffers [PROTOBUF] is widely deployed for binary
serialization of structured data, but its standard text format
("text format", or "prototext") is targeted at debugging and
does not address the requirements of human-edited configuration,
API integration, or fixed-layout binary streaming. The protowire
family covers those requirements while reusing Protocol Buffers
schemas as the single source of truth for field identity, types,
and numbering.
The family comprises four encodings:
PXF A UTF-8 text format with a small, regular grammar; intended
for human authoring and machine consumption alike.
PB The existing Protocol Buffers binary wire format
[PROTOBUF-WIRE]. This document does not redefine PB; it
specifies the constraints that a protowire implementation
applies on top of PB and the annotation extensions it uses.
SBE A fixed-layout binary encoding derived from FIX Simple
Binary Encoding [FIX-SBE], driven from Protocol Buffers
schemas annotated with SBE template metadata.
Envelope A versioned response wrapper carrying transport status,
application errors, field-level errors, and an opaque
payload.
All four are driven from a single set of .proto schemas. PXF and
SBE add field-level and message-level annotations expressed as
Protocol Buffers extension options, defined in Section 6.
Status of the underlying Protocol Buffers specification.
Protocol Buffers does not currently have a finalized IETF
standard. The canonical specification is published by Google
[PROTOBUF] and, in practice, the protoc reference
compiler acts as the de facto specification for behavior the
published documentation does not pin down. An IETF effort is in
progress to register Protocol Buffers media types
[I-D.ietf-dispatch-mime-protobuf]; that draft registers both
"application/protobuf" and "application/x-protobuf" (the latter
reflecting historical deployment under the experimental "x-"
prefix that predates RFC 6648), but does not redefine the wire
format. Section 10.1 of this document registers a separate
"application/protowire-pb" type that signals the additional
conformance and annotation requirements specified here.
Relationship to ProtoJSON. Protocol Buffers also defines a JSON
mapping commonly called "ProtoJSON" [PROTOJSON], which serializes
messages as JSON [RFC8259] objects, represents
google.protobuf.Timestamp as an RFC 3339 [RFC3339] string, and
represents google.protobuf.Duration as a decimal-seconds string
with an "s" suffix. PXF (Section 3) is intentionally distinct
from ProtoJSON: PXF targets human authoring and review, supports
comments, multi-line strings, and bare-identifier enum values,
and uses an entry-list document shape rather than JSON's
object-and-comma shape. Where the two formats overlap on
semantics — most visibly the use of RFC 3339 for timestamps — PXF
intentionally adopts the ProtoJSON convention to reduce the
number of disjoint time formats a tooling chain has to support.
Implementations are not required to provide ProtoJSON; this
document does not specify it.
This document treats both [PROTOBUF] (the language and feature
spec) and [PROTOBUF-WIRE] (the binary wire format) as normative
references in their protobuf.dev form, and inherits whatever
stability properties those documents and the protoc
implementation provide. A future revision of this document
SHOULD migrate to an IETF Protocol Buffers reference if and when
one is published as an RFC.
1.1. Scope
This document defines:
* the lexical and syntactic structure of PXF text (Section 3);
* the constraints that protowire applies to PB binary
encoding (Section 4);
* the SBE wire framing used by protowire (Section 5);
* the schema-level annotation extensions (Section 6);
* the response envelope (Section 7);
* the conformance requirements that any decoder operating on
untrusted input MUST satisfy (Section 8).
This document does not define library APIs, programming-language
bindings, code generation strategies, or performance
characteristics.
1.2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 [RFC2119] [RFC8174] when, and only when, they
appear in all capitals, as shown here.
The following terms are used throughout this document:
schema A Protocol Buffers FileDescriptorSet [PROTOBUF],
together with any annotation extensions defined in
Section 6.
port An independent implementation of the protowire
encodings, typically targeting a single programming
language.
document A complete PXF input: a sequence of bytes that satisfies
the production "document" in Section 3.3.
value A PXF construct that denotes a single scalar, list, or
nested block; see Section 3.3.
entry A key together with an assignment, map, or block tail;
see Section 3.5.
well-known type One of the Protocol Buffers types listed in
Section 6 of [PROTOBUF] (Timestamp, Duration, the
*Value wrappers, etc.) plus the protowire-defined
arbitrary-precision numeric types (Section 6.1).
port-trusted bytes Bytes whose origin is the calling
application; not under attacker control.
attacker-controlled bytes Bytes whose origin is, or may be,
under the control of a party with a goal adverse to the
host of the decoding process.
2. The protowire Family
A protowire implementation accepts and emits four wire forms for
any message defined in a schema: PXF text, PB binary, SBE binary
(when the message carries SBE annotations), and Envelope binary
(which is itself a PB message).
2.1. Common Schema Layer
For any schema S, the four encodings represent the same logical
value space. Implementations MUST produce wire output that
round-trips through every encoding for which the schema is well
defined; concretely, for a value v of type T defined in S:
* decode_PB(encode_PB(v)) == v
* decode_PXF(encode_PXF(v)) == v
* decode_SBE(encode_SBE(v)) == v if T carries SBE annotations
* encode_PB(decode_PXF(t)) == encode_PB(v)
if t = encode_PXF(v)
Equality is defined per [PROTOBUF] Section "Message equality"
(field-by-field, with proto3 default semantics applied).
2.2. Wire Equivalence
Two ports are wire-equivalent for schema S if, for every value v
of every message type in S, both ports produce byte-identical
PB and Envelope output and parse each other's PXF and SBE output
to equal values. Wire equivalence is the contract this document
specifies; it is not an API-stability or performance contract.
3. PXF Text Format
PXF is a UTF-8 text format. A PXF document is a sequence of
entries denoting a single message value, optionally preceded by a
type directive that identifies the schema type the document
represents.
3.1. Character Set
A PXF document is a sequence of bytes that, when interpreted as
UTF-8 [RFC3629], yields a sequence of Unicode scalar values.
* A document MUST be valid UTF-8.
* Decoders MUST reject documents containing invalid UTF-8 byte
sequences with an error.
* Decoders MUST reject documents containing Unicode surrogate
code points (U+D800 through U+DFFF) or code points above
U+10FFFF, however they are expressed. Raw byte encodings of
such values are already excluded by the valid-UTF-8 requirement
above; \uHHHH and \UHHHHHHHH escapes that produce them are
excluded by Section 3.6.
* A leading UTF-8 byte order mark (U+FEFF, encoded as %xEF.BB.BF)
MAY be present and MUST be ignored. Subsequent occurrences of
U+FEFF are interpreted as ordinary characters and have no
special meaning.
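As a non-normative illustration, the character-set rules above can
be sketched in Python. The function name decode_document is
hypothetical and not part of this specification; the sketch relies
on Python's strict UTF-8 codec, which rejects malformed sequences,
encoded surrogates, and values above U+10FFFF.

```python
def decode_document(data: bytes) -> str:
    """Validate the bytes of a PXF document and return its text.

    Illustrative only; the name and error type are assumptions.
    """
    try:
        # Strict decoding rejects malformed sequences, encoded
        # surrogates, and code points above U+10FFFF.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte {exc.start}") from exc
    # Only a leading U+FEFF is a byte order mark; later occurrences
    # are ordinary characters and are left in place.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

A port written in a language whose UTF-8 decoder is more lenient
would need explicit surrogate and range checks instead.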
3.2. Whitespace and Comments
The following Unicode scalar values are whitespace: U+0009
(HT), U+000A (LF), U+000D (CR), and U+0020 (SPACE). Whitespace
separates tokens but is otherwise insignificant.
PXF supports two comment forms:
* Line comments begin with "#" or "//" and extend to (but do not
include) the next U+000A.
* Block comments begin with "/*" and end at the next occurrence
of "*/". Block comments do not nest.
Comments MAY appear anywhere whitespace MAY appear. A comment is
treated as a single whitespace character for the purposes of
tokenization.
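The whitespace and comment rules can be sketched as a single
"skip" routine that a lexer calls between tokens. This is a
non-normative Python sketch; the name skip_trivia is hypothetical,
and rejecting an unterminated block comment is an assumption, since
this section does not state how that case is handled.

```python
WHITESPACE = "\t\n\r "  # U+0009, U+000A, U+000D, U+0020


def skip_trivia(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after pos that
    is neither whitespace nor part of a comment."""
    n = len(text)
    while pos < n:
        if text[pos] in WHITESPACE:
            pos += 1
        elif text[pos] == "#" or text.startswith("//", pos):
            # Line comment: runs up to, but not including, the next LF.
            end = text.find("\n", pos)
            pos = n if end == -1 else end
        elif text.startswith("/*", pos):
            # Block comment: ends at the first "*/"; no nesting.
            end = text.find("*/", pos + 2)
            if end == -1:
                raise ValueError("unterminated block comment")  # assumption
            pos = end + 2
        else:
            break
    return pos
```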
3.3. ABNF Grammar
The PXF surface grammar is given in ABNF [RFC5234] [RFC7405].
The grammar describes a token stream; whitespace and comments
per Section 3.2 MAY appear between any two adjacent tokens and
are not shown.
document = [ type-directive ] *field-entry
type-directive = %s"@type" 1*WSP identifier
entry = field-entry / map-entry
field-entry = identifier ( assignment-tail / block-tail )
map-entry = map-key map-tail
assignment-tail = "=" value
map-tail = ":" value
block-tail = "{" *entry "}"
map-key = identifier / string / integer
value = string
/ number
/ bool
/ null
/ bytes
/ timestamp
/ duration
/ identifier
/ list
/ block-value
list = "[" [ value *( [","] value ) [","] ] "]"
block-value = "{" *entry "}"
number = float / integer
integer = [ "-" ] 1*DIGIT
float = [ "-" ] 1*DIGIT
( "." *DIGIT [ exponent ] / exponent )
exponent = ( "e" / "E" ) [ "+" / "-" ] 1*DIGIT
bool = %s"true" / %s"false"
null = %s"null"
identifier = ident-start *ident-part
ident-start = ALPHA / "_"
ident-part = ALPHA / DIGIT / "_" / "."
string = triple-string / simple-string
simple-string = DQUOTE *( string-char / escape-seq ) DQUOTE
string-char = %x20-21 / %x23-5B / %x5D-7F
/ utf8-non-ascii ; LF, ", \ excluded
triple-string = 3DQUOTE triple-content 3DQUOTE
triple-content = *( %x00-21 / %x23-7F / utf8-non-ascii )
; any UTF-8 sequence not containing 3DQUOTE
escape-seq = "\" ( simple-escape
/ hex-escape
/ octal-escape
/ unicode-4-escape
/ unicode-8-escape )
simple-escape = DQUOTE / "\" / "'" / "?"
/ %x61 / %x62 / %x66 / %x6E
/ %x72 / %x74 / %x76 ; a b f n r t v
hex-escape = "x" 2HEXDIG
octal-escape = oct-lead 2OCT-DIGIT ; value <= 0xFF
unicode-4-escape = "u" 4HEXDIG
unicode-8-escape = "U" 8HEXDIG
bytes = %x62 DQUOTE *base64-char DQUOTE ; 'b' "..."
base64-char = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="
timestamp = date-time ; per RFC 3339, Section 5.6
duration = 1*duration-segment
duration-segment = 1*DIGIT [ "." 1*DIGIT ] time-unit
time-unit = "ns" / "us" / micro-us / "ms"
/ "s" / "m" / "h"
micro-us = %xC2.B5 %x73 ; UTF-8 of "µs"
OCT-DIGIT = %x30-37
oct-lead = %x30-33 ; ensures \NNN <= 0xFF
3DQUOTE = DQUOTE DQUOTE DQUOTE
utf8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
; multi-byte sequences per [RFC3629], Section 4
ABNF core productions ALPHA, DIGIT, HEXDIG, DQUOTE, and WSP are
imported from [RFC5234] Appendix B.1. The case-sensitive
string-prefix notation %s is from [RFC7405]; PXF identifiers,
keywords ("true", "false", "null"), and the "@type" directive
are case-sensitive.
The grammar is LL(1) modulo the lexical disambiguation rules in
Section 3.8 and Section 3.10:
* An input that begins with four DIGIT followed by "-" is
tokenized as a timestamp, not as an integer or an identifier.
* An input matching 1*DIGIT [ "." 1*DIGIT ] time-unit, where the
time-unit is one of the literal strings in the grammar above,
is tokenized as a duration. An identifier whose initial
characters happen to match a duration prefix (for example
"5seconds") is tokenized as an identifier because identifier
productions extend through ALPHA / "_".
* Numeric literals take precedence over identifiers: a leading
DIGIT or "-" forces the numeric branch.
3.4. Documents and Type Directives
A PXF document represents a single message value. The optional
type directive, of the form
@type identifier
names the fully-qualified message type. Decoders MAY require the
type directive when the calling application has not pre-bound a
target type. When a target type has been pre-bound, decoders MUST
ignore a directive that matches it and MUST reject the document if
the directive names a different type.
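The directive-handling rule can be sketched as follows. This is a
non-normative Python sketch; resolve_target_type is a hypothetical
name, and the choice to require the directive when no type is
pre-bound is one permitted behavior (the text above says MAY), not
the only one.

```python
from typing import Optional


def resolve_target_type(directive: Optional[str],
                        pre_bound: Optional[str]) -> str:
    """Pick the message type a document is decoded against."""
    if pre_bound is not None:
        # A matching directive is ignored; a mismatch is an error.
        if directive is not None and directive != pre_bound:
            raise ValueError(
                f"@type {directive} does not match bound type {pre_bound}")
        return pre_bound
    if directive is None:
        # Assumption: this decoder opts to require the directive.
        raise ValueError("no @type directive and no pre-bound type")
    return directive
```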
A document with no entries denotes the message-typed default
value (all fields unset).
3.5. Entries and Keys
An entry binds a key to a value. The key is a field name within
the surrounding message type, with the following rules:
* An identifier key matches a field by its proto field name
(the lowerCamelCase or snake_case name in the schema, as
written; both forms are accepted, and emitters SHOULD use
whichever the schema declares).
* A string or integer key is permitted only inside a map
literal (i.e. the *map-entry* production of Section 3.3).
A string key MUST be a UTF-8 string; for map fields with
non-string K, the string is parsed as a literal of K's type.
An integer key matches a map field whose K is one of
the protobuf scalar integer types (int32, int64, sint32,
sint64, uint32, uint64, fixed32, fixed64, sfixed32, sfixed64,
bool encoded as 0/1).
The three entry tails are NOT interchangeable; the grammar in
Section 3.3 splits them across two productions:
* An assignment-tail "=" binds a scalar, list, or block to a
field of an enclosing *message* type. It is the right-hand
side of *field-entry* and REQUIRES an identifier key (= proto
field name). Parsers MUST reject "=" with a non-identifier
key (string or integer) at parse time.
* A map-tail ":" binds a value to a key of an enclosing *map*
type. It is the right-hand side of *map-entry*. *map-entry*
MUST NOT appear at document top level (the document represents
a proto message, never a map); parsers MUST reject ":"
at the top level with an error indicating that field
assignments use "=".
* A block-tail "{ ... }" with no preceding "=" or ":" is
permitted only in message context, where the bound field is
message-typed; it is equivalent to "= { ... }". Like
assignment-tail, it REQUIRES an identifier key, and parsers
MUST reject a block-tail with a non-identifier key. Map
values that are themselves messages MUST use the explicit
form "key: { ... }"; the bare-block form is not accepted in
map context.
Inside a "{ ... }" block the parser cannot statically tell
whether the surrounding field is message-typed or map-typed;
both *field-entry* and *map-entry* are accepted in that position
and the message-vs-map disambiguation is performed by the
schema-resolution step that runs after parsing.
Repeated fields MAY be expressed either as a single key with a
list-typed value, or as multiple entries with the same key and
scalar (or block) values; the two forms denote the same field
value, with elements concatenated in document order.
3.6. String Literals
PXF supports two string forms. Both denote sequences of Unicode
scalar values.
Simple strings use double-quote delimiters and recognize escape
sequences:
"Hello, world\n"
Within a simple-string, U+000A (LF) MUST NOT appear unescaped:
line continuations are not supported. Decoders MUST reject a
simple-string containing a literal LF.
The defined simple escapes are:
\" U+0022 QUOTATION MARK
\\ U+005C REVERSE SOLIDUS
\' U+0027 APOSTROPHE
\? U+003F QUESTION MARK
\a U+0007 BELL
\b U+0008 BACKSPACE
\f U+000C FORM FEED
\n U+000A LINE FEED
\r U+000D CARRIAGE RETURN
\t U+0009 CHARACTER TABULATION
\v U+000B LINE TABULATION
Numeric escapes:
\xHH two hex digits, denotes the byte 0xHH
\NNN three octal digits, value MUST be <= 0xFF;
denotes the byte whose octal value is NNN
\uHHHH four hex digits, denotes Unicode scalar U+HHHH
\UHHHHHHHH eight hex digits, denotes Unicode scalar
U+HHHHHHHH
The \uHHHH and \UHHHHHHHH forms MUST denote a Unicode scalar
value: the code point MUST be in U+0000..U+10FFFF and MUST NOT
be a surrogate (U+D800..U+DFFF). Decoders MUST reject otherwise.
The interpretation of \xHH and octal escapes depends on the
target field type:
* When the surrounding string literal is bound to a proto3
string-typed field, the result of escape expansion MUST be
valid UTF-8. Decoders MUST reject a string-typed value whose
escape-expanded byte sequence is not valid UTF-8 (Section 8.3).
* When the surrounding string literal is bound to a proto3
bytes-typed field, no UTF-8 constraint applies.
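The escape rules of this section can be sketched in Python as
shown below. This is a non-normative sketch: expand_escapes and its
bytes_field parameter are hypothetical names, the input is assumed
to be the body of a simple-string with the delimiters already
removed, and emitting \u and \U escapes as UTF-8 when the target is
a bytes field is an assumption.

```python
import re

# Code points for the simple escapes listed above.
SIMPLE = {'"': 0x22, "\\": 0x5C, "'": 0x27, "?": 0x3F, "a": 0x07,
          "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09,
          "v": 0x0B}

# One escape sequence anchored at a backslash. The last group catches
# any other single character so it can be checked against SIMPLE.
ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|([0-3][0-7]{2})|u([0-9A-Fa-f]{4})"
    r"|U([0-9A-Fa-f]{8})|(.))",
    re.DOTALL,
)


def expand_escapes(body: str, *, bytes_field: bool = False):
    """Expand escapes; return str for string fields, bytes otherwise."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\n":
            raise ValueError("literal LF in simple-string")
        if body[i] != "\\":
            out += body[i].encode("utf-8")
            i += 1
            continue
        m = ESCAPE.match(body, i)
        if m is None:
            raise ValueError("dangling backslash")
        hx, octal, u4, u8, simple = m.groups()
        if hx is not None:
            out.append(int(hx, 16))
        elif octal is not None:
            out.append(int(octal, 8))  # leading digit 0-3 keeps this <= 0xFF
        elif u4 is not None or u8 is not None:
            cp = int(u4 or u8, 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("escape is not a Unicode scalar value")
            out += chr(cp).encode("utf-8")
        elif simple in SIMPLE:
            out.append(SIMPLE[simple])
        else:
            raise ValueError("unknown escape sequence")
        i = m.end()
    if bytes_field:
        return bytes(out)
    # String-typed target: the expanded bytes must be valid UTF-8.
    # UnicodeDecodeError is a subclass of ValueError.
    return out.decode("utf-8")
```

Expanding into a byte buffer first, and validating UTF-8 only at
the end, lets a pair such as \xC3\xA9 form one valid character in
a string field while a lone \xFF is rejected.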
Triple-quoted strings ("""...""") begin with three U+0022
characters and end at the next occurrence of three consecutive
U+0022 characters. Inside a triple-quoted string:
* Escape sequences are NOT interpreted; backslashes are literal.
* If the byte immediately following the opening """ is U+000A,
that LF MUST be removed by the decoder.
* After leading-LF stripping, if every non-empty line preceding
the closing """ shares a common leading-whitespace prefix, that
prefix MUST be removed from each such line. The "leading
whitespace" is the longest sequence of U+0020 and U+0009
characters at the start of a line; lines that consist
exclusively of whitespace do not constrain the prefix.
The output of triple-quote processing MUST be valid UTF-8 when
bound to a string-typed field; the same UTF-8 conformance rule
in Section 8.3 applies.
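The triple-quote processing steps can be sketched as follows. This
is a non-normative Python sketch; dedent_triple is a hypothetical
name, the input is the raw content between the delimiters, and the
treatment of whitespace-only lines shorter than the common prefix
(they are emptied) is an assumption that the text above does not
settle.

```python
import os


def dedent_triple(raw: str) -> str:
    """Apply leading-LF stripping and common-prefix removal."""
    # Step 1: drop a single LF directly after the opening delimiter.
    if raw.startswith("\n"):
        raw = raw[1:]
    lines = raw.split("\n")
    # Step 2: collect the leading whitespace (U+0020 / U+0009) of every
    # line that has other content; blank lines do not constrain it.
    prefixes = []
    for line in lines:
        stripped = line.lstrip(" \t")
        if stripped:
            prefixes.append(line[: len(line) - len(stripped)])
    common = os.path.commonprefix(prefixes) if prefixes else ""
    # Step 3: remove the shared prefix from each line.
    return "\n".join(
        line[len(common):] if line.startswith(common)
        else line.lstrip(" \t")  # assumption: short blank lines are emptied
        for line in lines
    )
```

For example, content whose lines are indented by four and six
spaces loses four spaces from each line, so the relative
indentation of two spaces is preserved.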
3.7. Bytes Literals
A bytes literal is the lowercase letter "b" immediately followed
by a double-quoted body containing only base64 characters
[RFC4648]:
b"SGVsbG8sIHdvcmxkIQ=="
Decoders MUST accept both the standard base64 alphabet and the
URL-safe alphabet [RFC4648] Section 5; padding ("=") is OPTIONAL
and MUST be tolerated whether present or absent. Decoders MUST
reject input containing characters outside both alphabets and
"=". Whitespace is NOT permitted inside the body.
Backslashes inside b"..." are NOT interpreted as escape
introducers.
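The acceptance rules for the body of a bytes literal can be
sketched as follows. This is a non-normative Python sketch;
decode_bytes_literal is a hypothetical name. Restricting "=" to at
most two trailing characters, and accepting a body that mixes the
two alphabets, are assumptions.

```python
import base64
import re

# Characters of both alphabets, then optional trailing padding.
_BODY = re.compile(r"[A-Za-z0-9+/_-]*={0,2}")


def decode_bytes_literal(body: str) -> bytes:
    """Decode the text between b" and " into raw bytes."""
    if _BODY.fullmatch(body) is None:
        raise ValueError("invalid character in bytes literal")
    # Normalize to the standard alphabet and recompute the padding so
    # that padded and unpadded inputs are handled alike.
    data = body.rstrip("=").replace("-", "+").replace("_", "/")
    if len(data) % 4 == 1:
        raise ValueError("truncated base64 data")
    data += "=" * (-len(data) % 4)
    return base64.b64decode(data, validate=True)
```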
3.8. Numeric Literals
Integers are sequences of decimal digits, optionally preceded by
"-". Hexadecimal, octal, and binary integer literals are NOT
defined in this version.
Floats are decimal-point or exponent forms; see the ABNF in
Section 3.3. The literal "1." is a valid float (integer part 1,
empty fractional part); the literal ".5" is NOT (no integer
part). This restriction is intentional: an unprefixed "." is
reserved for future qualified-key syntax.
The target field type determines the numeric domain:
* For fixed-width integer fields (int32, int64, uint32, uint64,
sint32, sint64, fixed32, fixed64, sfixed32, sfixed64),
decoders MUST reject literals whose value falls outside the
field's representable range.
* For float and double fields, decoders MUST accept literals in
the IEEE 754 [IEEE754] representable range; literals whose value
rounds to +Inf or -Inf are rejected. Infinities and NaN are
written explicitly as the identifiers "inf", "+inf", "-inf", and
"nan".
* For fields of the well-known types pxf.BigInt, pxf.Decimal,
and pxf.BigFloat (Section 6.1), decoders MUST preserve the
literal's exact value, including, for Decimal, the exact
scale: the literal "1.00" decodes to a Decimal with unscaled
value 100 and scale 2, distinct from "1.0" or "1".
The number of digits in any single numeric literal is bounded;
see Section 8.1.
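Two of the rules above, range checking for fixed-width integers
and scale preservation for pxf.Decimal, can be sketched as
follows. This is a non-normative Python sketch; the function names
and the (unscaled, scale) tuple are hypothetical and do not
describe the pxf.Decimal wire layout. The Decimal helper accepts
only plain decimal literals without an exponent, which is a
simplification.

```python
import re
from decimal import Decimal

_S32 = (-2**31, 2**31 - 1)
_S64 = (-2**63, 2**63 - 1)
_U32 = (0, 2**32 - 1)
_U64 = (0, 2**64 - 1)
INT_RANGES = {
    "int32": _S32, "sint32": _S32, "sfixed32": _S32,
    "int64": _S64, "sint64": _S64, "sfixed64": _S64,
    "uint32": _U32, "fixed32": _U32,
    "uint64": _U64, "fixed64": _U64,
}


def parse_integer(literal: str, field_type: str) -> int:
    """Parse an integer literal and enforce the field's range."""
    if re.fullmatch(r"-?[0-9]+", literal) is None:
        raise ValueError("not an integer literal")
    value = int(literal)
    low, high = INT_RANGES[field_type]
    if not low <= value <= high:
        raise ValueError(f"{literal} is out of range for {field_type}")
    return value


def decimal_parts(literal: str):
    """Return (unscaled value, scale) without normalizing the scale."""
    if re.fullmatch(r"-?[0-9]+(\.[0-9]*)?", literal) is None:
        raise ValueError("not a plain decimal literal")
    sign, digits, exponent = Decimal(literal).as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    return (-unscaled if sign else unscaled, -exponent)
```

With these definitions, "1.00" yields (100, 2) while "1.0" yields
(10, 1), which matches the distinction drawn in the text.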
3.9. Booleans, Null, and Identifier Values
The literals "true" and "false" denote the boolean values; "null"
denotes the null value. All three are case-sensitive. A "null"
literal:
* When bound to a singular message-typed field, MUST clear the
field.
* When bound to a singular scalar field, MUST be rejected
unless the schema marks the field as nullable via a wrapper
type (e.g. google.protobuf.StringValue) or via a future
annotation.
* When bound to a list-typed (repeated) field, MUST be rejected:
list elements are not nullable.
An identifier appearing as a value denotes an enum value name
when the target field is enum-typed. Decoders MUST reject
unknown enum names unless the schema declares the field with
open enum semantics, in which case the unknown name is preserved
in the message's unknown-fields set.
3.10. Timestamps and Durations
A timestamp literal is an RFC 3339 [RFC3339] date-time
production. The lexer recognizes a timestamp by lookahead: an
input matching the regular expression /^[0-9]{4}-/ at a position
where a value is expected MUST be tokenized as a timestamp.
This rule disambiguates against the identifier and integer
productions.
The choice of RFC 3339 aligns with the ProtoJSON [PROTOJSON]
serialization of google.protobuf.Timestamp, which uses the same
format. A PXF timestamp literal and the ProtoJSON string form
of the same Timestamp value are byte-identical except for the
surrounding ProtoJSON quote characters. PXF duration literals
are NOT byte-compatible with ProtoJSON's Duration form (a
decimal-seconds string with an "s" suffix); PXF retains the
multi-segment unit form ("1h30m500ms") because it is materially
easier to read in human-authored documents.
Decoders MUST accept timestamps with arbitrary fractional-second
precision up to nanoseconds. Decoders that target a
fixed-precision representation (typically
google.protobuf.Timestamp, which has nanosecond resolution) MUST
reject literals exceeding that precision rather than silently
truncating.
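The timestamp lookahead and the precision rule can be sketched as
follows. This is a non-normative Python sketch; the function names
and the (seconds, nanos) result shape are assumptions modeled on
google.protobuf.Timestamp, and the sketch rejects leap seconds
(":60") because Python's datetime cannot represent them.

```python
import re
from datetime import datetime, timedelta, timezone

LOOKAHEAD = re.compile(r"[0-9]{4}-")
RFC3339 = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]"
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]+))?"
    r"([Zz]|[+-][0-9]{2}:[0-9]{2})"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def looks_like_timestamp(text: str, pos: int = 0) -> bool:
    """Lexer lookahead: four digits followed by '-'."""
    return LOOKAHEAD.match(text, pos) is not None


def parse_timestamp(literal: str):
    """Return (seconds since the epoch, nanoseconds) for a literal."""
    m = RFC3339.fullmatch(literal)
    if m is None:
        raise ValueError("not an RFC 3339 date-time")
    year, month, day, hour, minute, second, frac, zone = m.groups()
    frac = frac or ""
    if len(frac) > 9:
        # Reject rather than silently truncate.
        raise ValueError("precision finer than nanoseconds")
    if zone in ("Z", "z"):
        offset = timedelta(0)
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]),
                                  minutes=int(zone[4:6]))
    moment = datetime(int(year), int(month), int(day), int(hour),
                      int(minute), int(second), tzinfo=timezone(offset))
    seconds = (moment - EPOCH) // timedelta(seconds=1)
    nanos = int(frac.ljust(9, "0")) if frac else 0
    return seconds, nanos
```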
A duration literal is a sequence of one or more segments, each
consisting of a numeric magnitude followed by a unit suffix:
30s ; 30 seconds
1h30m ; 1 hour 30 minutes
500ms ; 500 milliseconds
1.5h ; 1.5 hours
2µs ; 2 microseconds (alternative form: 2us)
The defined units are "ns" (nanoseconds), "us" and "µs"
(microseconds; the two spellings are equivalent), "ms"
(milliseconds), "s" (seconds), "m" (minutes), and "h" (hours).
Day, week, month, and year units are NOT defined and MUST NOT be
inferred.
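Duration parsing can be sketched as follows. This is a
non-normative Python sketch; parse_duration is a hypothetical
name, the result is a total in nanoseconds, and rejecting a value
that is not a whole number of nanoseconds is an assumption. Exact
rational arithmetic is used so that fractional magnitudes such as
"1.5h" do not pick up floating-point error.

```python
import re
from fractions import Fraction

# Nanoseconds per unit; "\u00b5s" is the micro-sign spelling.
UNIT_NS = {
    "ns": 1, "us": 1_000, "\u00b5s": 1_000, "ms": 1_000_000,
    "s": 1_000_000_000, "m": 60 * 10**9, "h": 3600 * 10**9,
}
# Two-letter units are listed before "s" and "m" so that "ms" is not
# read as "m" followed by a stray "s".
SEGMENT = re.compile(r"([0-9]+(?:\.[0-9]+)?)(ns|us|\u00b5s|ms|s|m|h)")


def parse_duration(literal: str) -> int:
    """Return the total length of a duration literal in nanoseconds."""
    pos = 0
    total = Fraction(0)
    while pos < len(literal):
        m = SEGMENT.match(literal, pos)
        if m is None:
            raise ValueError(f"bad duration segment at offset {pos}")
        total += Fraction(m.group(1)) * UNIT_NS[m.group(2)]
        pos = m.end()
    if pos == 0:
        raise ValueError("empty duration")
    if total.denominator != 1:
        raise ValueError("finer than one nanosecond")  # assumption
    return int(total)
```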
3.11. Lists
A list value is a sequence of values delimited by "[" and "]".
Elements MAY be separated by "," or by intervening whitespace
alone (including newlines), or by both:
[1, 2, 3]
[1 2 3]
[
"alpha"
"beta"
]
A trailing comma is permitted.
List values bind only to repeated fields, to fields whose
Protocol Buffers type is a list-shaped well-known type, and to
list elements within other lists.
3.12. Blocks and Map Tails
A block value "{ entry-list }" denotes a nested message when the
surrounding context is message-typed, and a map literal when the
surrounding context is map-typed. Within the block, keys are
interpreted against the nested message's or map's schema. Blocks
nest to arbitrary depth subject to the limit in Section 8.1.
A nested message field is bound with "=" (or with the bare-block
form, which is its abbreviation):
address = { city = "Berlin"; zip = "10115" }
address { city = "Berlin"; zip = "10115" } # same value
A map field is bound with "=" between the field name and the
block, but the entries inside the block use ":" because each
entry is a key-of-the-map-to-value pair:
headers = {
"X-Request-ID": "abc123"
"Content-Type": "application/pxf"
}
Decoders MUST reject "=" inside a map block and ":" inside a
message block (Section 3.5).
Within either kind of block, entries MAY be separated by ";" or
by newlines or by both; this is consistent with the ABNF in
Section 3.3, where entries are juxtaposed without an explicit
separator and the optional ";" is consumed as whitespace.
4. PB Binary Encoding
The protowire PB encoding is the Protocol Buffers binary wire
format [PROTOBUF-WIRE] with no protowire-specific changes to the
wire grammar. This document does not redefine PB.
Constraints applied by protowire:
* Annotation extensions defined in Section 6 are encoded as
Protocol Buffers field options on the relevant FieldOptions,
MessageOptions, and FileOptions messages. Their field
numbers are reserved (Section 10.2).
* Decoders MUST enforce the limits in Section 8.1 on PB inputs.
In particular, the depth limit applies to nested submessages,
groups, and map entries; the length-prefix limit MUST be
enforced before allocation.
* A protowire emitter SHOULD produce fields in field-number
order; decoders MUST accept any field order, per the existing
PB rules.
5. SBE Binary Encoding
The protowire SBE encoding is a subset of FIX Simple Binary
Encoding [FIX-SBE]. An SBE message is generated for any
Protocol Buffers message annotated with sbe.template_id; the
schema-level identifiers come from sbe.schema_id and sbe.version
on the file (Section 6.2).
Field-level annotations sbe.length and sbe.encoding control
fixed-byte-length string and bytes fields and primitive-type
narrowing respectively (Section 6.2).
5.1. Header and Block Length
Every SBE message on the wire is preceded by an 8-byte
message-header carrying the schema's blockLength, templateId,
schemaId, and version, in little-endian byte order. Decoders
MUST validate, before reading any field of the body:
1. data.length >= HEADER_SIZE + wire_block_length
2. wire_block_length >= template_block_length, where
template_block_length is the value the decoder's compiled
schema specifies for this template. A wire block strictly
smaller than the template block length MUST be rejected. A
wire block strictly larger MUST be accepted (forward
compatibility for additive schema evolution).
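The two header checks can be sketched as follows. This is a
non-normative Python sketch; check_message_header is a hypothetical
name, and the layout of the 8-byte header as four unsigned 16-bit
fields in the order listed above is an assumption taken from the
usual FIX SBE message header.

```python
import struct

# Assumed layout: blockLength, templateId, schemaId, version (uint16 each).
HEADER = struct.Struct("<HHHH")
HEADER_SIZE = HEADER.size  # 8 bytes


def check_message_header(data: bytes, template_block_length: int):
    """Validate the header before any body field is read."""
    if len(data) < HEADER_SIZE:
        raise ValueError("input shorter than the message header")
    block_length, template_id, schema_id, version = HEADER.unpack_from(data, 0)
    # Check 1: the declared block must fit in the input.
    if len(data) < HEADER_SIZE + block_length:
        raise ValueError("input shorter than the declared block length")
    # Check 2: a smaller wire block is rejected; a larger one is
    # accepted so that newer schemas can append fields.
    if block_length < template_block_length:
        raise ValueError("wire block smaller than the template block")
    return block_length, template_id, schema_id, version
```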
5.2. Repeating Groups
Each repeating group is preceded by a group-header carrying
blockLength and numInGroup. Before iterating any group, decoders
MUST validate:
1. pos + GROUP_HEADER_SIZE <= data.length
2. count multiplied by wire_block_length does not overflow
64-bit unsigned arithmetic
3. pos + GROUP_HEADER_SIZE + count * wire_block_length <=
data.length
4. If count > 0, then wire_block_length > 0. A wire_block_length
of 0 with a non-zero count MUST be rejected before any
per-element allocation.
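The four group checks can be sketched as follows. This is a
non-normative Python sketch; check_group is a hypothetical name,
and the 4-byte group header of two unsigned 16-bit fields
(blockLength, then numInGroup) is an assumption, since this
section does not fix the field widths. Python integers do not
overflow, so the overflow check is written as an explicit
comparison to show where a fixed-width port needs it.

```python
import struct

# Assumed layout: blockLength, numInGroup (uint16 each).
GROUP_HEADER = struct.Struct("<HH")
GROUP_HEADER_SIZE = GROUP_HEADER.size
U64_MAX = 2**64 - 1


def check_group(data: bytes, pos: int):
    """Validate a repeating group at pos before iterating it."""
    # Check 1: the group header itself must be in bounds.
    if pos + GROUP_HEADER_SIZE > len(data):
        raise ValueError("group header out of bounds")
    block_length, count = GROUP_HEADER.unpack_from(data, pos)
    # Check 4, done early: zero-length elements with a non-zero count
    # are rejected before any per-element allocation.
    if count > 0 and block_length == 0:
        raise ValueError("zero block length with non-zero count")
    # Check 2: the product must fit in 64-bit unsigned arithmetic.
    total = count * block_length
    if total > U64_MAX:
        raise ValueError("group size overflows 64 bits")
    # Check 3: all elements must be in bounds.
    if pos + GROUP_HEADER_SIZE + total > len(data):
        raise ValueError("group body out of bounds")
    return block_length, count
```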
5.3. Variable-Length Data
Variable-length data ("varData") fields follow the fixed block,
each preceded by a length prefix per the schema's varDataEncoding.
Decoders MUST validate that pos + LENGTH_PREFIX_SIZE +
length <= data.length before reading the data.
6. Annotation Extensions
protowire defines extensions on Protocol Buffers FileOptions,
MessageOptions, and FieldOptions. Field numbers are allocated
from the reserved range described in Section 10.2.
6.1. PXF Annotations
The PXF annotations apply to FieldOptions:
extend google.protobuf.FieldOptions {
bool required = 50000;
string default = 50001;
}
pxf.required: when true, decoders MUST reject a PXF document in
which the annotated field is absent. A field bound to "null" is
considered present for the purpose of this check; null-rejection
for non-nullable types is governed by Section 3.9.
pxf.default: when set, the value is a PXF literal (parsed by the
same rules as a value in any entry). Decoders MUST treat an
absent annotated field as if the document had supplied the
default literal. The default applies only to absent fields, not
to fields explicitly set to "null" or to the proto3 zero value.
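The interaction of pxf.required and pxf.default with absent and
null fields can be sketched as follows. This is a non-normative
Python sketch; apply_field_rules and the dictionary shapes are
hypothetical, defaults are shown as already-parsed values rather
than PXF literals, and letting a default satisfy a required field
is an assumption that follows from treating the default as if the
document had supplied it.

```python
def apply_field_rules(present: dict, rules: dict) -> dict:
    """Apply defaults and the required check after parsing.

    present: field name -> decoded value; None stands for an
             explicit "null" in the document.
    rules:   field name -> {"required": bool, "default": value}
    """
    result = dict(present)
    for name, rule in rules.items():
        if name in present:
            # Present, possibly as null or as the zero value: the
            # default does not apply and the required check passes.
            continue
        if "default" in rule:
            result[name] = rule["default"]
        elif rule.get("required", False):
            raise ValueError(f"required field {name} is absent")
    return result
```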
The well-known types pxf.BigInt, pxf.Decimal, and pxf.BigFloat
are defined in [PROTOWIRE-BIGNUM] (this is the
proto/pxf/bignum.proto file in the canonical repository). They
provide arbitrary-precision signed integer, exact decimal, and
binary floating-point representations respectively, encoded
over PB as length-delimited unsigned big-endian magnitudes plus
sign and scale fields.
6.2. SBE Annotations
SBE annotations apply at three scopes:
extend google.protobuf.FileOptions {
uint32 schema_id = 50100;
uint32 version = 50101;
}
extend google.protobuf.MessageOptions {
uint32 template_id = 50200;
}
extend google.protobuf.FieldOptions {
uint32 length = 50300;
string encoding = 50301;
}
sbe.schema_id and sbe.version identify the schema as a whole.
sbe.template_id MUST be unique among messages within a schema.
sbe.length specifies a fixed byte length for string- or
bytes-typed fields. Values longer than the limit MUST be
truncated by emitters; values shorter MUST be padded with NUL
(0x00) bytes; decoders MUST trim trailing NUL bytes when
populating string fields and MUST NOT trim them when populating
bytes fields.
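A short sketch of the sbe.length rules, operating on already
encoded bytes. The function names are hypothetical. Byte-level
truncation can split a multi-byte UTF-8 sequence; the text above
does not say how emitters should handle that case, and the sketch
does not attempt to.

```python
def encode_fixed(value, length):
    """Truncate or zero-pad value (bytes) to exactly length bytes."""
    return value[:length].ljust(length, b"\x00")


def decode_fixed_string(raw):
    """Trim trailing zero bytes, then decode strictly as UTF-8."""
    # bytes.decode defaults to errors="strict", so invalid UTF-8 raises
    # UnicodeDecodeError instead of being replaced.
    return raw.rstrip(b"\x00").decode("utf-8")


def decode_fixed_bytes(raw):
    """Bytes fields keep their trailing zero bytes untouched."""
    return raw
```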
sbe.encoding narrows a Protocol Buffers numeric type to a smaller
SBE primitive. The defined values are: "int8", "int16",
"int32", "int64", "uint8", "uint16", "uint32", "uint64",
"float", "double". Emitters MUST reject values that fall outside
the narrowed type's range; decoders MUST sign-extend or
zero-extend per the SBE primitive when populating the wider
Protocol Buffers field.
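The narrowing rule for integer encodings can be sketched with
Python's struct module, which raises on out-of-range values and
sign-extends or zero-extends on unpack according to the format
code. Little-endian byte order is an assumption for the example,
the function names are hypothetical, and "float" and "double" are
omitted for brevity.

```python
import struct

# sbe.encoding value -> struct format (little-endian assumed)
_FORMATS = {
    "int8": "<b", "int16": "<h", "int32": "<i", "int64": "<q",
    "uint8": "<B", "uint16": "<H", "uint32": "<I", "uint64": "<Q",
}


def emit_narrowed(value, encoding):
    """Encode value as the narrowed primitive; reject out-of-range values."""
    try:
        return struct.pack(_FORMATS[encoding], value)
    except struct.error as exc:
        raise ValueError("%d out of range for %s" % (value, encoding)) from exc


def read_narrowed(raw, encoding):
    """Decode the primitive; signed codes sign-extend, unsigned zero-extend."""
    (value,) = struct.unpack(_FORMATS[encoding], raw)
    return value
```

The same two bytes FE FF read back as -2 under "int16" and as
65534 under "uint16", which is the sign-extension versus
zero-extension distinction described above.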
7. Response Envelope
The Envelope message provides a uniform response carrier across
wire formats. Its schema is defined in package envelope.v1:
message Envelope {
int32 status = 1;
string transport_error = 2;
bytes data = 3;
AppError error = 4;
}
message AppError {
string code = 1;
string message = 2;
repeated string args = 3;
repeated FieldError details = 4;
map<string, string> metadata = 5;
}
message FieldError {
string field = 1;
string code = 2;
string message = 3;
repeated string args = 4;
}
Semantics:
* status carries an HTTP [RFC9110] or gRPC [GRPC] status code.
* transport_error is set when no application-layer response was
produced (network error, timeout, connection refused).
Implementations MUST NOT set transport_error and error
simultaneously.
* data is the success payload, encoded in whichever wire form
the surrounding transport selects (PXF, PB, JSON).
Decoders MUST treat data as opaque bytes; it is parsed only
after the envelope itself is parsed.
* error.code and details[*].code are machine-readable
identifiers; clients perform localization by consulting a
string table keyed by "error." and "field." and
substituting args positionally.
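The mutual-exclusion rule for transport_error and error can be
checked after the envelope is parsed. The sketch below models
only the fields needed for the check; the Python classes and the
check_envelope function are hypothetical stand-ins for generated
message types, and treating a non-empty transport_error string as
"set" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AppError:
    code: str = ""
    message: str = ""
    args: list = field(default_factory=list)


@dataclass
class Envelope:
    status: int = 0
    transport_error: str = ""
    data: bytes = b""          # opaque; parsed only after this check
    error: Optional[AppError] = None


def check_envelope(env):
    """Reject envelopes that set both transport_error and error."""
    if env.transport_error and env.error is not None:
        raise ValueError("transport_error and error are both set")
    return env
```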
The package path "envelope.v1" is part of the wire contract.
Incompatible changes MUST bump the package to "envelope.v2";
"v1" and "v2" MAY coexist indefinitely.
8. Decoder Conformance
This section specifies the requirements that any protowire
decoder operating on attacker-controlled bytes MUST satisfy.
The threat model assumes:
* the attacker controls every byte of the input,
* the attacker controls the input length, up to a configured
maximum,
* the attacker MAY submit many inputs concurrently from many
sources,
* the schema (.proto descriptor, SBE template) and the
configured limits are NOT attacker-controlled.
Under these conditions, a conforming decoder MUST terminate in
time and memory bounded by the input length and the configured
limits only. It MUST NOT crash, abort, or unwind the host
process; it MUST NOT allocate beyond the configured budget; it
MUST NOT return string-typed fields whose contents are not valid
UTF-8.
8.1. Mandatory Limits
Conforming decoders MUST enforce the following limits. Defaults
are recommended; the values are configurable per call by the
calling application except where noted.
+========================+============+=========================+
| Limit                  | Default    | Applies to              |
+========================+============+=========================+
| MaxNestingDepth        | 100        | PXF block/list nesting; |
|                        |            | PB submessage / group / |
|                        |            | map-entry nesting       |
+------------------------+------------+-------------------------+
| MaxMessageSize         | 67108864   | Total input length to a |
|                        | (64 MiB)   | single decode call      |
+------------------------+------------+-------------------------+
| MaxNumericLiteralDigits| 4096       | Digit count of any PXF  |
|                        |            | numeric literal         |
+------------------------+------------+-------------------------+
| MaxBytesLiteralLength  | =          | Decoded byte length of  |
|                        | MaxMessage | any PXF b"..." literal  |
|                        | Size       |                         |
+------------------------+------------+-------------------------+
| MaxVarintBytes         | 10         | Every PB varint read.   |
|                        | (fixed)    | NOT configurable.       |
+------------------------+------------+-------------------------+
| MaxRepeatedCount       | =          | Element count of any    |
|                        | MaxMessage | repeated, map, or SBE   |
|                        | Size       | group field             |
+------------------------+------------+-------------------------+
A decoder presented with input that requires exceeding any limit
MUST return an error before allocating memory proportional to the
violating quantity. It MUST NOT abort, panic, raise an
uncatchable exception, or unwind into a state from which the
caller cannot recover.
Length arithmetic MUST be performed in 64-bit unsigned arithmetic
and checked against the buffer length before any narrowing
conversion to a native integer width and before any allocation.
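As an illustration of these limits, the sketch below reads a PB
varint with the fixed 10-byte cap and then validates a
length-delimited field against both MaxMessageSize and the buffer
length before slicing. The Limits class and function names are
hypothetical. Rejecting a varint whose value exceeds 64 bits is a
choice made for the sketch; this section does not say whether
excess bits are rejected or discarded.

```python
from dataclasses import dataclass

MAX_VARINT_BYTES = 10          # fixed, not configurable
U64_MAX = (1 << 64) - 1


@dataclass(frozen=True)
class Limits:
    max_nesting_depth: int = 100
    max_message_size: int = 64 * 1024 * 1024
    max_numeric_literal_digits: int = 4096


def read_varint(data, pos):
    """Return (value, next_pos); never reads more than 10 bytes."""
    result = 0
    for i in range(MAX_VARINT_BYTES):
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            if result > U64_MAX:
                raise ValueError("varint exceeds 64 bits")
            return result, pos + i + 1
    raise ValueError("varint longer than 10 bytes")


def read_length_delimited(data, pos, limits):
    """Return (payload, next_pos), checking the length before slicing."""
    length, pos = read_varint(data, pos)
    if length > limits.max_message_size:
        raise ValueError("length exceeds MaxMessageSize")
    if pos + length > len(data):
        raise ValueError("length exceeds remaining input")
    return data[pos:pos + length], pos + length
```

Python integers are unbounded, so the 64-bit bound is tested
explicitly here; in a language with fixed-width integers the same
checks would be done with overflow-checked 64-bit unsigned
operations.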
8.2. Recursion
PXF parsers descend recursively into "{...}" blocks and "[...]"
lists; PB parsers descend into submessages, groups, and map
entries. Conforming decoders MUST:
1. maintain a depth counter incremented at every recursive
descent;
2. reject input that would cause the counter to exceed
MaxNestingDepth;
3. thread the depth counter through inner decoder instances
constructed mid-stream. In particular, when a nested
submessage is decoded by handing its bytes to a freshly
constructed input-stream object, the depth counter MUST be
passed in rather than reset to zero.
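The three requirements above can be sketched with a toy
recursive-descent parser for nested "[...]" lists. The grammar
(lists containing only other lists) is a deliberate
simplification and is not PXF; parse_list is a hypothetical name.
The point illustrated is that the depth counter is a parameter
threaded through every descent, so a nested call cannot restart
from zero.

```python
MAX_NESTING_DEPTH = 100  # default from the limits table


def parse_list(text, pos=0, depth=0, max_depth=MAX_NESTING_DEPTH):
    """Parse a bracketed list at pos; return (nested_lists, next_pos)."""
    # Increment at every descent and reject before doing any work.
    depth += 1
    if depth > max_depth:
        raise ValueError("nesting deeper than MaxNestingDepth")

    if pos >= len(text) or text[pos] != "[":
        raise ValueError("expected '['")
    pos += 1

    items = []
    while pos < len(text) and text[pos] == "[":
        # The current depth is passed in, not reset, for the inner parse.
        item, pos = parse_list(text, pos, depth, max_depth)
        items.append(item)

    if pos >= len(text) or text[pos] != "]":
        raise ValueError("expected ']'")
    return items, pos + 1
```

Because the depth check fires at level 101 under the default
limit, adversarially deep input is rejected long before the host
language's own recursion limit or native stack is reached.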
8.3. UTF-8 Enforcement
Proto3 string fields are sequences of valid UTF-8 [RFC3629].
Conforming decoders:
* MUST validate UTF-8 strictly when populating any string-typed
field, regardless of the source encoding (PB length-delimited
bytes, SBE char-array fields, SBE varData, PXF simple-string
or triple-string).
* MUST NOT use UTF-8 decoders that substitute U+FFFD for invalid
sequences when populating string-typed fields.
* MUST reject PXF \xHH and \NNN (octal) escapes that produce
invalid UTF-8 when the surrounding literal is bound to a
string-typed field. The same byte sequences are permitted
inside b"..." (bytes literal) and inside string literals
bound to bytes-typed fields.
* MUST reject PXF \uHHHH and \UHHHHHHHH escapes that encode a
surrogate code point or a code point above U+10FFFF.
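A minimal sketch of strict UTF-8 handling, using Python's
built-in codec in its default strict mode, which raises on
invalid sequences (including encoded surrogates) rather than
substituting U+FFFD. The function names are hypothetical.

```python
def decode_string_field(raw):
    """Decode bytes destined for a string-typed field, strictly."""
    try:
        # errors="strict" is the default; it is spelled out to make
        # clear that errors="replace" must not be used here.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("invalid UTF-8 in string field") from exc


def check_unicode_escape(code_point):
    """Validate the code point of a \\uHHHH or \\UHHHHHHHH escape."""
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("escape encodes a surrogate code point")
    if code_point > 0x10FFFF:
        raise ValueError("escape is above U+10FFFF")
    return chr(code_point)
```

A decoder would apply decode_string_field to the bytes assembled
from a literal (including any \xHH or octal escapes) only when the
literal is bound to a string-typed field; bytes-typed fields skip
the check.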
8.4. SBE Bounds Checking
The bounds-checking obligations in Section 5 are conformance
requirements, restated here for emphasis. Specifically:
* Wire block length less than template block length MUST be
rejected (Section 5.1).
* Group count multiplied by group block length MUST be 64-bit
checked against the buffer length before iteration
(Section 5.2).
* A group with count > 0 and wire_block_length == 0 MUST be
rejected before any per-element allocation (Section 5.2).
8.5. Map Keys
In implementation languages where dynamic property assignment
walks a prototype chain (notably JavaScript / TypeScript), a
conforming decoder MUST NOT use a plain object literal as the
container for attacker-keyed maps. Such decoders MUST use a
prototype-free object (Object.create(null)) or a Map, or MUST
explicitly reject the keys "__proto__", "constructor", and
"prototype". The same obligation applies in any other
implementation language that exhibits prototype-mutation
semantics for reserved string keys.
9. Media Types
This document defines the following media types.
application/pxf
PXF text format (Section 3). The schema-type association is
carried either by the document's @type directive or
out-of-band (e.g. an HTTP Content-Schema parameter).
Charset: UTF-8 (fixed; the format MUST be UTF-8 per
Section 3.1).
application/protowire-pb
Protocol Buffers binary, with the protowire constraints
(Section 4).
application/protowire-sbe
SBE binary, with the protowire constraints (Section 5).
application/protowire-envelope
The Envelope message (Section 7), encoded as Protocol Buffers
binary. The data field's content type is carried in the
envelope-data-type parameter of the media type or, for
transports that lack media-type parameters, in a transport-
level header.
10. IANA Considerations
10.1. Media Type Registrations
Relationship to "application/protobuf" and
"application/x-protobuf". Prior to
[I-D.ietf-dispatch-mime-protobuf], no IETF-registered media type
existed for Protocol Buffers binary, and deployments converged
informally on "application/protobuf" and (less preferably, per
[RFC6648]) "application/x-protobuf". The dispatch draft
registers both. Neither carries protowire's additional
decoder-conformance and annotation-extension constraints. This
document registers "application/protowire-pb" as a distinct type
rather than layering a parameter on "application/protobuf"
because (a) the conformance requirements in Section 8 are
mandatory for protowire payloads and (b) protowire payloads are
tied to a schema that uses the annotation extensions in
Section 6. A recipient that handles "application/protobuf" but
not "application/protowire-pb" will, in the absence of those
annotations and limits, parse the bytes correctly but will not
provide the protowire conformance guarantees. Servers
negotiating with a client that advertises only
"application/protobuf" SHOULD downgrade to that media type and
accept the loss of protowire-specific guarantees rather than
refuse the request.
IANA is requested to register the following media types in the
"Media Types" registry [RFC6838]:
Type name: application
Subtype name: pxf
Required parameters: none
Optional parameters: charset (fixed value: utf-8)
Encoding considerations: 8bit; UTF-8 text.
Security considerations: See Section 11 of this document.
Interoperability considerations: See Section 8.
Published specification: This document.
Applications that use this media type: Configuration
tooling, API integration, schema-driven editors.
Fragment identifier considerations: none defined.
Author/Change controller: IETF.
Provisional registration: yes.
Type name: application
Subtype name: protowire-pb
Required parameters: none
Optional parameters: schema (URI of the FileDescriptorSet)
Encoding considerations: binary.
Security considerations: See Section 11.
Interoperability considerations: See Section 8.
Published specification: This document.
Applications that use this media type: API integration.
Fragment identifier considerations: none defined.
Author/Change controller: IETF.
Provisional registration: yes.
Type name: application
Subtype name: protowire-sbe
Required parameters: none
Optional parameters: schema (URI of the SBE schema XML)
Encoding considerations: binary.
Security considerations: See Section 11.
Interoperability considerations: See Section 8.
Published specification: This document.
Applications that use this media type: Low-latency message
streaming, market-data fan-out.
Fragment identifier considerations: none defined.
Author/Change controller: IETF.
Provisional registration: yes.
Type name: application
Subtype name: protowire-envelope
Required parameters: none
Optional parameters: envelope-data-type (media type of the
"data" field's contents).
Encoding considerations: binary.
Security considerations: See Section 11.
Interoperability considerations: See Section 8.
Published specification: This document.
Applications that use this media type: API integration.
Fragment identifier considerations: none defined.
Author/Change controller: IETF.
Provisional registration: yes.
10.2. Annotation Field Number Range
This document allocates Protocol Buffers extension field numbers
in the range 50000-59999 to the protowire family. Field numbers
in this range are reserved for the extensions defined in
Section 6 and for future extensions of this document; they
MUST NOT be reused for unrelated extensions. The currently
assigned numbers are:
pxf.required 50000
pxf.default 50001
sbe.schema_id 50100
sbe.version 50101
sbe.template_id 50200
sbe.length 50300
sbe.encoding 50301
Future protowire extensions SHOULD allocate within this range
and document the assignment in a successor of this document.
11. Security Considerations
The protowire family is designed to be parsed safely on
attacker-controlled bytes. Section 8 specifies the conformance
requirements that follow from this objective. This section
addresses the considerations that do not reduce to a single
conformance requirement.
Resource exhaustion. Without the limits in Section 8.1, every
protowire encoding admits trivial denial-of-service inputs:
deeply nested PXF blocks overflow native call stacks; large PB length
prefixes drive allocator pressure; SBE group counts multiplied by
element sizes overflow length arithmetic; long PXF numeric
literals drive quadratic big-number parsers. Implementers MUST
apply Section 8.1.
Length-arithmetic overflow. All offset, length, and count
arithmetic on attacker-supplied quantities MUST use 64-bit
unsigned operations and MUST be checked against the input
length before any narrowing. Implementations in languages whose
default integer width on the host platform is 32 bits MUST use
explicit 64-bit types for these quantities.
Trapping conversions. Several implementation languages provide
integer conversions that trap, throw, or silently truncate on
out-of-range input: Swift Int(_:) traps at run time; Java
Math.toIntExact throws ArithmeticException, which unwinds the
caller if uncaught; Rust "as" and C++ static_cast narrow
silently, so that a later index or allocation uses a wrong
value. When converting attacker-supplied lengths or counts to
native integer widths, implementations MUST use fallible
conversion forms.
UTF-8 substitution. Many platform string APIs silently
substitute U+FFFD for invalid UTF-8. When such substitution is
applied to a string-typed field, the resulting message violates
the proto3 invariant that string fields contain valid UTF-8 and
may differ from the producer's intent. Implementations MUST use
strict, error-returning UTF-8 decoders on string fields
(Section 8.3).
Schema input. An implementation that accepts a
FileDescriptorSet or SBE schema XML at runtime, where the
descriptor or XML is, or may be, attacker-controlled, MUST apply
the limits of Section 8.1 to the schema parser as well. XML
schema parsers MUST disable DTDs and external entities to
mitigate XXE attacks (e.g. defusedxml [DEFUSEDXML] in Python;
XmlReaderSettings.DtdProcessing = Prohibit in .NET; the
feature flags "disallow-doctype-decl",
"external-general-entities", and "external-parameter-entities"
in Java).
Prototype pollution. Decoders implemented in JavaScript,
TypeScript, or any language with prototype-mutation semantics
for reserved string keys MUST avoid plain object literals as
the storage for attacker-keyed maps; see Section 8.5.
Cryptographic transport. This document specifies an
encoding family. It does not provide confidentiality,
integrity, or origin authentication; transports carrying
protowire payloads SHOULD use TLS [RFC8446] or an equivalent.
Application-level error metadata. AppError.metadata
(Section 7) is a free-form string-to-string map. Servers SHOULD
NOT place sensitive information (credentials, raw user input)
in metadata. Clients SHOULD treat metadata values as untrusted
and apply context-appropriate sanitization before display or
logging.
12. References
12.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the
Internet: Timestamps", RFC 3339,
DOI 10.17487/RFC3339, July 2002.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of
ISO 10646", STD 63, RFC 3629,
DOI 10.17487/RFC3629, November 2003.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", RFC 4648, DOI 10.17487/RFC4648,
October 2006.
[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for
Syntax Specifications: ABNF", STD 68, RFC 5234,
DOI 10.17487/RFC5234, January 2008.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures",
BCP 13, RFC 6838, DOI 10.17487/RFC6838,
January 2013.
[RFC7405] Kyzivat, P., "Case-Sensitive String Support in
ABNF", RFC 7405, DOI 10.17487/RFC7405,
December 2014.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in
RFC 2119 Key Words", BCP 14, RFC 8174,
DOI 10.17487/RFC8174, May 2017.
[RFC8259] Bray, T., Ed., "The JavaScript Object Notation
(JSON) Data Interchange Format", STD 90, RFC 8259,
DOI 10.17487/RFC8259, December 2017.
[RFC8446] Rescorla, E., "The Transport Layer Security (TLS)
Protocol Version 1.3", RFC 8446,
DOI 10.17487/RFC8446, August 2018.
[RFC9110] Fielding, R., Ed., Nottingham, M., Ed., and J.
Reschke, Ed., "HTTP Semantics", STD 97, RFC 9110,
DOI 10.17487/RFC9110, June 2022.
[IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic",
IEEE 754-2019, July 2019.
[PROTOBUF] Google, "Protocol Buffers Language Specification
(proto3)", .
[PROTOBUF-WIRE]
Google, "Protocol Buffers Encoding",
.
[FIX-SBE] FIX Trading Community, "Simple Binary Encoding,
Version 2.0", FIX Protocol Ltd., 2020,
.
12.2. Informative References
[I-D.ietf-dispatch-mime-protobuf]
Borenstein, N. and S. Vaucher, "The application/
protobuf and application/x-protobuf Media Types",
Work in Progress, Internet-Draft,
draft-ietf-dispatch-mime-protobuf-07.
[RFC6648] Saint-Andre, P., Crocker, D., and M. Nottingham,
"Deprecating the 'X-' Prefix and Similar Constructs
in Application Protocols", BCP 178, RFC 6648,
DOI 10.17487/RFC6648, June 2012.
[PROTOJSON] Google, "Protocol Buffers JSON Mapping (ProtoJSON)",
.
[GRPC] gRPC Authors, "gRPC over HTTP/2",
.
[DEFUSEDXML]
Heimes, C., "defusedxml: Defuses XML bombs and
other exploits",
.
[PROTOWIRE-BIGNUM]
TrendVidia LLC, "PXF arbitrary-precision numeric
types", proto/pxf/bignum.proto in the protowire
canonical repository,
.
Author's Address
B. Franco Jr.
TrendVidia, LLC
Email: contact@trendvidia.com