Network Working Group                                      B. Franco Jr.
Internet-Draft                                           TrendVidia, LLC
Intended status: Informational                                4 May 2026
Expires: 5 November 2026


     The Proto eXpressive Format (PXF) and the protowire Encoding Family
                      draft-trendvidia-protowire-00

Abstract

   This document specifies the Proto eXpressive Format (PXF), a UTF-8 text serialization for messages defined by Protocol Buffers schemas, together with three companion encodings: PB (the existing Protocol Buffers binary wire format, with protowire-specific constraints and annotation extensions that govern how it is produced and consumed), SBE (a fixed-layout binary encoding derived from FIX Simple Binary Encoding), and a response envelope.  Collectively these are referred to as the protowire family.  This document defines the wire surface sufficient for independent interoperable implementations.  It does not define library APIs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 November 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.

Franco Jr.               Expires 5 November 2026                [Page 1]

Internet-Draft                  protowire                       May 2026

Table of Contents

   1.  Introduction
     1.1.  Scope
     1.2.  Terminology
   2.  The protowire Family
     2.1.  Common Schema Layer
     2.2.  Wire Equivalence
   3.  PXF Text Format
     3.1.  Character Set
     3.2.  Whitespace and Comments
     3.3.  ABNF Grammar
     3.4.  Documents and Type Directives
     3.5.  Entries and Keys
     3.6.  String Literals
     3.7.  Bytes Literals
     3.8.  Numeric Literals
     3.9.  Booleans, Null, and Identifier Values
     3.10. Timestamps and Durations
     3.11. Lists
     3.12. Blocks and Map Tails
   4.  PB Binary Encoding
   5.  SBE Binary Encoding
     5.1.  Header and Block Length
     5.2.  Repeating Groups
     5.3.  Variable-Length Data
   6.  Annotation Extensions
     6.1.  PXF Annotations
     6.2.  SBE Annotations
   7.  Response Envelope
   8.  Decoder Conformance
     8.1.  Mandatory Limits
     8.2.  Recursion
     8.3.  UTF-8 Enforcement
     8.4.  SBE Bounds Checking
     8.5.  Map Keys
   9.  Media Types
   10. IANA Considerations
     10.1. Media Type Registrations
     10.2. Annotation Field Number Range
   11. Security Considerations
   12. References
     12.1. Normative References
     12.2. Informative References
   Authors' Addresses

1.  Introduction

   Protocol Buffers [PROTOBUF] is widely deployed for binary serialization of structured data, but its standard text format ("text format", or "prototext") is aimed at debugging and does not address the requirements of human-edited configuration, API integration, or fixed-layout binary streaming.  The protowire family covers those requirements while reusing Protocol Buffers schemas as the single source of truth for field identity, types, and numbering.

   The family comprises four encodings:

   PXF       A UTF-8 text format with a small, regular grammar, intended for human authoring and machine consumption alike.

   PB        The existing Protocol Buffers binary wire format [PROTOBUF-WIRE].  This document does not redefine PB; it specifies the constraints that a protowire implementation applies on top of PB and the annotation extensions it uses.

   SBE       A fixed-layout binary encoding derived from FIX Simple Binary Encoding [FIX-SBE], driven from Protocol Buffers schemas annotated with SBE template metadata.
   Envelope  A versioned response wrapper carrying transport status, application errors, field-level errors, and an opaque payload.

   All four are driven from a single set of .proto schemas.  PXF and SBE add annotations expressed as Protocol Buffers custom options (extensions of FileOptions, MessageOptions, and FieldOptions), defined in Section 6.

   Status of the underlying Protocol Buffers specification.  Protocol Buffers does not currently have a finalized IETF standard.  The canonical specification is published by Google at and, in practice, the protoc reference compiler acts as the de facto specification for behavior the published documentation does not pin down.  An IETF effort is in progress to register Protocol Buffers media types [I-D.ietf-dispatch-mime-protobuf]; that draft registers both "application/protobuf" and "application/x-protobuf" (the latter reflecting historical deployment under the experimental "x-" prefix, which predates RFC 6648), but does not redefine the wire format.  Section 10.1 of this document registers a separate "application/protowire-pb" type that signals the additional conformance and annotation requirements specified here.

   Relationship to ProtoJSON.  Protocol Buffers also defines a JSON mapping commonly called "ProtoJSON" [PROTOJSON], which serializes messages as JSON [RFC8259] objects, represents google.protobuf.Timestamp as an RFC 3339 [RFC3339] string, and represents google.protobuf.Duration as a decimal-seconds string with an "s" suffix.  PXF (Section 3) is intentionally distinct from ProtoJSON: PXF targets human authoring and review, supports comments, multi-line strings, and bare-identifier enum values, and uses an entry-list document shape rather than JSON's object-and-comma shape.  Where the two formats overlap on semantics (most visibly the use of RFC 3339 for timestamps), PXF adopts the ProtoJSON convention to reduce the number of disjoint time formats a tooling chain has to support.
   Implementations are not required to provide ProtoJSON; this document does not specify it.

   This document treats both [PROTOBUF] (the language and feature specification) and [PROTOBUF-WIRE] (the binary wire format) as normative references in their protobuf.dev form, and inherits whatever stability properties those documents and the protoc implementation provide.  A future revision of this document should migrate to an IETF Protocol Buffers reference if and when one is published as an RFC.

1.1.  Scope

   This document defines:

   *  the lexical and syntactic structure of PXF text (Section 3);

   *  the constraints that protowire applies to PB binary encoding (Section 4);

   *  the SBE wire framing used by protowire (Section 5);

   *  the schema-level annotation extensions (Section 6);

   *  the response envelope (Section 7);

   *  the conformance requirements that any decoder operating on untrusted input MUST satisfy (Section 8).

   This document does not define library APIs, programming-language bindings, code generation strategies, or performance characteristics.

1.2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

   The following terms are used throughout this document:

   schema
      A Protocol Buffers FileDescriptorSet [PROTOBUF], together with any annotation extensions defined in Section 6.

   port
      An independent implementation of the protowire encodings, typically targeting a single programming language.

   document
      A complete PXF input: a sequence of bytes that satisfies the production "document" in Section 3.3.

   value
      A PXF construct that denotes a single scalar, list, or nested block; see Section 3.3.

   entry
      A key together with an assignment, map, or block tail; see Section 3.5.
   well-known type
      One of the Protocol Buffers types listed in Section 6 of [PROTOBUF] (Timestamp, Duration, the *Value wrappers, etc.), plus the protowire-defined arbitrary-precision numeric types (Section 6.1).

   port-trusted bytes
      Bytes whose origin is the calling application, and which are therefore not under attacker control.

   attacker-controlled bytes
      Bytes whose origin is, or may be, under the control of a party with a goal adverse to the host of the decoding process.

2.  The protowire Family

   A protowire implementation accepts and emits four wire forms for any message defined in a schema: PXF text, PB binary, SBE binary (when the message carries SBE annotations), and Envelope binary (which is itself a PB message).

2.1.  Common Schema Layer

   For any schema S, the four encodings represent the same logical value space.  Implementations MUST produce wire output that round-trips through every encoding for which the schema is well defined; concretely, for a value v of type T defined in S:

   *  decode_PB(encode_PB(v)) == v

   *  decode_PXF(encode_PXF(v)) == v

   *  decode_SBE(encode_SBE(v)) == v, if T carries SBE annotations

   *  encode_PB(decode_PXF(t)) == encode_PB(v), if t = encode_PXF(v)

   Equality is defined per [PROTOBUF], Section "Message equality" (field-by-field, with proto3 default semantics applied).

2.2.  Wire Equivalence

   Two ports are wire-equivalent for schema S if, for every value v of every message type in S, both ports produce byte-identical PB and Envelope output and each parses the other's PXF and SBE output to equal values.  Wire equivalence is the contract this document specifies; it is not an API-stability or performance contract.

3.  PXF Text Format

   PXF is a UTF-8 text format.  A PXF document is a sequence of entries denoting a single message value, optionally preceded by a type directive that identifies the schema type the document represents.

3.1.  Character Set

   A PXF document is a sequence of bytes that, when interpreted as UTF-8 [RFC3629], yields a sequence of Unicode scalar values.

   *  A document MUST be valid UTF-8.

   *  Decoders MUST reject documents containing invalid UTF-8 byte sequences with an error.

   *  Decoders MUST reject documents containing Unicode surrogate code points (U+D800 through U+DFFF) or code points above U+10FFFF, regardless of how the value is expressed.  Raw encodings of such code points are already excluded by the valid-UTF-8 requirement, because UTF-8 as defined in [RFC3629] cannot encode them; \uHHHH and \UHHHHHHHH escapes that would produce them are excluded by Section 3.6.

   *  A leading UTF-8 byte order mark (U+FEFF, encoded as %xEF.BB.BF) MAY be present and MUST be ignored.  Subsequent occurrences of U+FEFF are interpreted as ordinary characters and have no special meaning.

3.2.  Whitespace and Comments

   The following Unicode scalar values are whitespace: U+0009 (HT), U+000A (LF), U+000D (CR), and U+0020 (SPACE).  Whitespace separates tokens but is otherwise insignificant.

   PXF supports two comment forms:

   *  Line comments begin with "#" or "//" and extend to (but do not include) the next U+000A.

   *  Block comments begin with "/*" and end at the next occurrence of "*/".  Block comments do not nest.

   Comments MAY appear anywhere whitespace MAY appear.  A comment is treated as a single whitespace character for the purposes of tokenization.

3.3.  ABNF Grammar

   The PXF surface grammar is given in ABNF [RFC5234] [RFC7405].  The grammar describes a token stream; whitespace and comments per Section 3.2 MAY appear between any two adjacent tokens and are not shown.
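   Because the grammar omits whitespace and comments, a lexer has to skip them between tokens.  The following Python function is a minimal sketch of that step under the rules of Section 3.2, not a normative implementation.  The name skip_trivia is hypothetical, and raising ValueError for an unterminated block comment is an assumption of this sketch (Section 3.2 does not address that case).  It is meant to be called only between tokens, never inside a string or bytes literal.

```python
WHITESPACE = "\t\n\r "  # HT, LF, CR, SPACE per Section 3.2


def skip_trivia(text: str, pos: int) -> int:
    """Return the offset of the next token at or after pos.

    Skips whitespace, line comments ("#" or "//"), and non-nesting
    block comments ("/* ... */"). Hypothetical helper for illustration.
    """
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == "#" or text.startswith("//", pos):
            # Line comment: runs up to, but not including, the next LF.
            nl = text.find("\n", pos)
            pos = n if nl == -1 else nl
        elif text.startswith("/*", pos):
            # Block comment: the first "*/" closes it; an inner "/*" is
            # ordinary comment text because block comments do not nest.
            end = text.find("*/", pos + 2)
            if end == -1:
                # Assumption: treat an unterminated comment as an error.
                raise ValueError(f"unterminated block comment at offset {pos}")
            pos = end + 2
        else:
            break
    return pos
```

   For example, on the input "  # note\n /* a /* b */ x = 1" the function returns the offset of "x": the inner "/*" does not open a nested comment, so the first "*/" closes the block.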
     document         = [ type-directive ] *field-entry
     type-directive   = %s"@type" 1*WSP identifier

     entry            = field-entry / map-entry
     field-entry      = identifier ( assignment-tail / block-tail )
     map-entry        = map-key map-tail
     assignment-tail  = "=" value
     map-tail         = ":" value
     block-tail       = "{" *entry "}"
     map-key          = identifier / string / integer

     value            = string / number / bool / null / bytes
                      / timestamp / duration / identifier
                      / list / block-value
     list             = "[" [ value *( [","] value ) [","] ] "]"
     block-value      = "{" *entry "}"

     number           = float / integer
     integer          = [ "-" ] 1*DIGIT
     float            = [ "-" ] 1*DIGIT
                        ( "." *DIGIT [ exponent ] / exponent )
     exponent         = ( "e" / "E" ) [ "+" / "-" ] 1*DIGIT

     bool             = %s"true" / %s"false"
     null             = %s"null"

     identifier       = ident-start *ident-part
     ident-start      = ALPHA / "_"
     ident-part       = ALPHA / DIGIT / "_" / "."

     string           = triple-string / simple-string
     simple-string    = DQUOTE *( string-char / escape-seq ) DQUOTE
     string-char      = %x20-21 / %x23-5B / %x5D-7F / utf8-non-ascii
                        ; excludes C0 controls (including LF),
                        ; DQUOTE, and "\"
     triple-string    = 3DQUOTE triple-content 3DQUOTE
     triple-content   = *( triple-char / DQUOTE triple-char
                         / 2DQUOTE triple-char )
                        ; any UTF-8 sequence that neither contains
                        ; 3DQUOTE nor ends in DQUOTE
     triple-char      = %x00-21 / %x23-7F / utf8-non-ascii

     escape-seq       = "\" ( simple-escape / hex-escape / octal-escape
                        / unicode-4-escape / unicode-8-escape )
     simple-escape    = DQUOTE / "\" / "'" / "?"
                      / %x61 / %x62 / %x66 / %x6E / %x72 / %x74 / %x76
                        ; a b f n r t v
     hex-escape       = %s"x" 2HEXDIG
     octal-escape     = oct-lead 2OCT-DIGIT   ; value <= 0xFF
     unicode-4-escape = %s"u" 4HEXDIG
     unicode-8-escape = %s"U" 8HEXDIG

     bytes            = %x62 DQUOTE *base64-char DQUOTE   ; 'b' "..."
     base64-char      = ALPHA / DIGIT / "+" / "/" / "-" / "_" / "="
                        ; standard and URL-safe alphabets (Section 3.7)

     timestamp        = date-time   ; per RFC 3339, Section 5.6
     duration         = 1*duration-segment
     duration-segment = 1*DIGIT [ "." 1*DIGIT ] time-unit
     time-unit        = "ns" / "us" / micro-us / "ms" / "s" / "m" / "h"
     micro-us         = %xC2.B5 %x73   ; UTF-8 of "µs"

     OCT-DIGIT        = %x30-37
     oct-lead         = %x30-33        ; ensures \NNN <= 0xFF
     utf8-non-ascii   =

   The ABNF core rules ALPHA, DIGIT, HEXDIG, DQUOTE, and WSP are imported from [RFC5234], Appendix B.1.  The case-sensitive string notation %s is from [RFC7405]; PXF identifiers, keywords ("true", "false", "null"), escape introducers, and the "@type" directive are case-sensitive.  The repetitions 2DQUOTE and 3DQUOTE denote exactly two and three consecutive DQUOTE characters.

   The grammar is LL(1) modulo the lexical disambiguation rules in Section 3.8 and Section 3.10:

   *  An input that begins with four DIGIT followed by "-" is tokenized as a timestamp, not as an integer followed by a negative integer.

   *  An input matching 1*DIGIT [ "." 1*DIGIT ] time-unit, where the time-unit is one of the literal strings in the grammar above, is tokenized as a duration.  An identifier whose initial characters happen to match a duration prefix (for example "5seconds") is tokenized as an identifier because identifier productions extend through ALPHA / "_".

   *  Numeric literals take precedence over identifiers: a leading DIGIT or "-" forces the numeric branch.

3.4.  Documents and Type Directives

   A PXF document represents a single message value.  The optional type directive, of the form

      @type identifier

   names the fully-qualified message type.  Decoders MAY require the type directive when the calling application has not pre-bound a target type.  When a target type has been pre-bound, decoders MUST ignore a directive that matches it and MUST reject the document if the directive names a different type.

   A document with no entries denotes the default value of its message type (all fields unset).

3.5.  Entries and Keys

   An entry binds a key to a value.
   In message context the key names a field of the surrounding message type; in map context it is a map key.  The following rules apply:

   *  An identifier key matches a field by its proto field name.  Both the name as written in the schema (typically snake_case) and its lowerCamelCase form are accepted; emitters SHOULD use the form the schema declares.

   *  A string or integer key is permitted only inside a map literal (i.e., the *map-entry* production of Section 3.3).  A string key MUST be a valid UTF-8 string; for a map field of type map<K, V> where K is not string, the string is parsed as a literal of type K.  An integer key matches a map field whose K is one of the Protocol Buffers integer types (int32, int64, sint32, sint64, uint32, uint64, fixed32, fixed64, sfixed32, sfixed64), or whose K is bool, with 0 and 1 denoting false and true.

   The three entry tails are NOT interchangeable; the grammar in Section 3.3 splits them across two productions:

   *  An assignment-tail ("=") binds a scalar, list, or block to a field of an enclosing *message* type.  It is the right-hand side of *field-entry*, and its key MUST be an identifier (a proto field name).  Parsers MUST reject "=" with a string or integer key at parse time.

   *  A map-tail (":") binds a value to a key of an enclosing *map* type.  It is the right-hand side of *map-entry*.  A *map-entry* MUST NOT appear at document top level (the document represents a proto message, never a map); parsers MUST reject ":" at the top level with an error indicating that field assignments use "=".

   *  A block-tail ("{ ... }" with no preceding "=" or ":") is permitted only in message context, where the bound field is message-typed; it is equivalent to "= { ... }".  As with an assignment-tail, its key MUST be an identifier, and parsers MUST reject a block-tail with a non-identifier key.  Map values that are themselves messages MUST use the explicit form "key: { ... }"; the bare-block form is not accepted in map context.

   Inside a "{ ... }" block the parser cannot tell from syntax alone whether the surrounding field is message-typed or map-typed, so both *field-entry* and *map-entry* are accepted in that position; the message-versus-map distinction is made by the schema-resolution step that runs after parsing.

   A repeated field MAY be expressed either as a single key with a list value or as multiple entries with the same key and scalar (or block) values; the two forms denote the same field value, with elements concatenated in document order.

3.6.  String Literals

   PXF supports two string forms.  Both denote sequences of Unicode scalar values.

   Simple strings use double-quote delimiters and recognize escape sequences:

      "Hello, world\n"

   Within a simple-string, U+000A (LF) MUST NOT appear unescaped: line continuations are not supported.  Decoders MUST reject a simple-string containing a literal LF.

   The defined simple escapes are:

      \"    U+0022 QUOTATION MARK
      \\    U+005C REVERSE SOLIDUS
      \'    U+0027 APOSTROPHE
      \?    U+003F QUESTION MARK
      \a    U+0007 BELL
      \b    U+0008 BACKSPACE
      \f    U+000C FORM FEED
      \n    U+000A LINE FEED
      \r    U+000D CARRIAGE RETURN
      \t    U+0009 CHARACTER TABULATION
      \v    U+000B LINE TABULATION

   Numeric escapes:

      \xHH        two hex digits; denotes the byte 0xHH
      \NNN        three octal digits, whose value MUST be <= 0xFF
                  (octal 377); denotes the byte with octal value NNN
      \uHHHH      four hex digits; denotes the Unicode scalar U+HHHH
      \UHHHHHHHH  eight hex digits; denotes the Unicode scalar
                  U+HHHHHHHH

   The \uHHHH and \UHHHHHHHH forms MUST denote a Unicode scalar value: the code point MUST be in U+0000..U+10FFFF and MUST NOT be a surrogate (U+D800..U+DFFF).  Decoders MUST reject any other value.

   The interpretation of \xHH and octal escapes depends on the target field type:

   *  When the surrounding string literal is bound to a proto3 string-typed field, the result of escape expansion MUST be valid UTF-8.  Decoders MUST reject a string-typed value whose escape-expanded byte sequence is not valid UTF-8 (Section 8.3).
   *  When the surrounding string literal is bound to a proto3 bytes-typed field, no UTF-8 constraint applies.

   Triple-quoted strings ("""...""") begin with three U+0022 characters and end at the next occurrence of three consecutive U+0022 characters.  Inside a triple-quoted string:

   *  Escape sequences are NOT interpreted; backslashes are literal.

   *  If the character immediately following the opening """ is U+000A (LF), the decoder MUST remove that LF.

   *  After leading-LF stripping, the longest leading-whitespace prefix common to every line before the closing """ that contains a non-whitespace character MUST be removed from each such line.  The "leading whitespace" of a line is the longest run of U+0020 and U+0009 characters at its start; lines that consist only of whitespace do not constrain the prefix.

   The output of triple-quote processing MUST be valid UTF-8 when bound to a string-typed field; the UTF-8 conformance rule of Section 8.3 applies.

3.7.  Bytes Literals

   A bytes literal is the lowercase letter "b" immediately followed by a double-quoted body containing only base64 characters [RFC4648]:

      b"SGVsbG8sIHdvcmxkIQ=="

   Decoders MUST accept both the standard base64 alphabet and the URL-safe alphabet ([RFC4648], Section 5).  Padding ("=") is OPTIONAL; decoders MUST accept a body whether padding is present or absent.  Decoders MUST reject a body containing any character outside the two alphabets and "=".  Whitespace is NOT permitted inside the body, and backslashes inside b"..." are NOT interpreted as escape introducers.

3.8.  Numeric Literals

   Integers are sequences of decimal digits, optionally preceded by "-".  Hexadecimal, octal, and binary integer literals are NOT defined in this version.

   Floats use a decimal point, an exponent, or both; see the ABNF in Section 3.3.  The literal "1." is a valid float (integer part 1, empty fractional part); the literal ".5" is NOT (no integer part).  This restriction is intentional: an unprefixed "." is reserved for future qualified-key syntax.

   The target field type determines the numeric domain:

   *  For fixed-width integer fields (int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64), decoders MUST reject literals whose value falls outside the field's representable range.

   *  For float and double fields, decoders MUST accept literals within the IEEE 754 [IEEE754] representable range and MUST reject literals whose value rounds to +Inf or -Inf.  Infinities and NaN can be written only as the special values "inf", "+inf", "-inf", and "nan".

   *  For fields of the well-known types pxf.BigInt, pxf.Decimal, and pxf.BigFloat (Section 6.1), decoders MUST preserve the literal's exact value, including, for Decimal, the exact scale: the literal "1.00" decodes to a Decimal with unscaled value 100 and scale 2, distinct from "1.0" or "1".

   The number of digits in any single numeric literal is bounded; see Section 8.1.

3.9.  Booleans, Null, and Identifier Values

   The literals "true" and "false" denote the boolean values; "null" denotes the null value.  All three are case-sensitive.

   A "null" literal:

   *  When bound to a singular message-typed field, MUST clear the field.

   *  When bound to a singular scalar field, MUST be rejected unless the schema marks the field as nullable via a wrapper type (e.g., google.protobuf.StringValue) or via a future annotation.

   *  When bound to a repeated (list-typed) field, MUST be rejected: list elements are not nullable.

   An identifier appearing as a value denotes an enum value name when the target field is enum-typed.  Decoders MUST reject unknown enum names unless the schema declares the field with open enum semantics, in which case the unknown name is preserved in the message's unknown-fields set.

3.10.  Timestamps and Durations

   A timestamp literal is a date-time as defined in RFC 3339 [RFC3339], Section 5.6.
   The lexer recognizes a timestamp by lookahead: an input matching the regular expression /^[0-9]{4}-/ at a position where a value is expected MUST be tokenized as a timestamp.  This rule resolves the ambiguity with the integer production.

   The choice of RFC 3339 aligns with the ProtoJSON [PROTOJSON] serialization of google.protobuf.Timestamp, which uses the same format: the ProtoJSON string form of a Timestamp value, with its surrounding quote characters removed, is a valid PXF timestamp literal for the same instant.  PXF duration literals are NOT byte-compatible with ProtoJSON's Duration form (a decimal-seconds string with an "s" suffix); PXF retains the multi-segment unit form ("1h30m500ms") because it is easier to read in human-authored documents.

   Decoders MUST accept fractional seconds with any number of digits up to nine (nanosecond precision).  A decoder that targets a fixed-precision representation (typically google.protobuf.Timestamp, which has nanosecond resolution) MUST reject a literal whose precision exceeds that representation rather than silently truncating it.

   A duration literal is a sequence of one or more segments, each consisting of a numeric magnitude followed by a unit suffix:

      30s       # 30 seconds
      1h30m     # 1 hour 30 minutes
      500ms     # 500 milliseconds
      1.5h      # 1.5 hours
      2µs       # 2 microseconds (alternative form: 2us)

   The defined units are "ns" (nanoseconds), "us" and "µs" (microseconds; the two spellings are equivalent), "ms" (milliseconds), "s" (seconds), "m" (minutes), and "h" (hours).  Day, week, month, and year units are NOT defined and MUST NOT be inferred.

3.11.  Lists

   A list value is a sequence of values delimited by "[" and "]".  Elements MAY be separated by ",", by whitespace alone (including newlines), or by both:

      [1, 2, 3]
      [1 2 3]
      [
        "alpha"
        "beta"
      ]

   A trailing comma is permitted.
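   The separator rules above can be sketched as a small recursive parser over an already-lexed token sequence.  The sketch is illustrative only: the Punct token type and the parse_list name are hypothetical, and scalar values are represented by arbitrary Python objects.  Following the prose of this section, a comma is optional between elements and permitted after the last one; as in the ABNF of Section 3.3, at most one comma may appear between elements and none before the first.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Punct:
    """Hypothetical punctuation token; any other token is a scalar value."""
    ch: str


LBRACK, RBRACK, COMMA = Punct("["), Punct("]"), Punct(",")


def parse_list(tokens: Sequence[Any], pos: int = 0) -> Tuple[List[Any], int]:
    """Parse a list starting at tokens[pos]; return (elements, next_pos)."""
    if pos >= len(tokens) or tokens[pos] != LBRACK:
        raise ValueError("expected '['")
    pos += 1
    elements: List[Any] = []
    comma_allowed = False  # False right after "[" and right after ","
    while True:
        if pos >= len(tokens):
            raise ValueError("unterminated list")
        tok = tokens[pos]
        if tok == RBRACK:
            # Valid after "[", after a value, or after a trailing comma.
            return elements, pos + 1
        if tok == COMMA:
            if not comma_allowed:
                raise ValueError(f"unexpected ',' at token {pos}")
            comma_allowed = False
            pos += 1
            continue
        if tok == LBRACK:
            value, pos = parse_list(tokens, pos)  # nested list
        elif isinstance(tok, Punct):
            raise ValueError(f"unexpected {tok.ch!r} at token {pos}")
        else:
            value, pos = tok, pos + 1
        elements.append(value)
        comma_allowed = True  # a comma or another value may follow
```

   For example, the token sequences for [1, 2, 3], [1 2 3], and [1, 2, 3,] all yield the elements [1, 2, 3].  A real decoder would also enforce the depth limit of Section 8.1 on nested lists; this sketch does not.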
   List values bind only to repeated fields, to fields whose Protocol Buffers type is a list-shaped well-known type (such as google.protobuf.ListValue), and to elements of other lists.

3.12.  Blocks and Map Tails

   A block value ("{" *entry "}") denotes a nested message when the surrounding context is message-typed, and a map literal when the surrounding context is map-typed.  Within the block, keys are interpreted against the nested message's or map's schema.  Blocks nest to arbitrary depth, subject to the limit in Section 8.1.

   A nested message field is bound with "=" (or with the bare-block form, which abbreviates it):

      address = { city = "Berlin"; zip = "10115" }
      address { city = "Berlin"; zip = "10115" }     # same value

   A map field is bound with "=" between the field name and the block, but the entries inside the block use ":", because each entry pairs a map key with a value:

      headers = {
        "X-Request-ID": "abc123"
        "Content-Type": "application/pxf"
      }

   Decoders MUST reject "=" inside a map block and ":" inside a message block (Section 3.5).  Within either kind of block, entries MAY be separated by ";", by newlines, or by both.  This is consistent with the ABNF in Section 3.3, where entries are juxtaposed without an explicit separator and the optional ";" is consumed as whitespace.

4.  PB Binary Encoding

   The protowire PB encoding is the Protocol Buffers binary wire format [PROTOBUF-WIRE] with no protowire-specific changes to the wire grammar; this document does not redefine PB.  Constraints applied by protowire:

   *  Annotation extensions defined in Section 6 are encoded as Protocol Buffers custom options on the relevant FieldOptions, MessageOptions, and FileOptions messages.  Their field numbers are reserved (Section 10.2).

   *  Decoders MUST enforce the limits in Section 8.1 on PB inputs.
      In particular, the depth limit applies to nested submessages, groups, and map entries, and the length-prefix limit MUST be enforced before allocation.

   *  A protowire emitter SHOULD emit fields in field-number order; decoders MUST accept any field order, per the existing PB rules.

5.  SBE Binary Encoding

   The protowire SBE encoding is a subset of FIX Simple Binary Encoding [FIX-SBE].  An SBE message is generated for every Protocol Buffers message annotated with sbe.template_id; the schema-level identifiers come from the sbe.schema_id and sbe.version file options (Section 6.2).  The field-level annotation sbe.length fixes the byte length of string and bytes fields, and sbe.encoding narrows a numeric field to a smaller SBE primitive (Section 6.2).

5.1.  Header and Block Length

   Every SBE message on the wire is preceded by an 8-byte message header carrying the message's root block length (blockLength), templateId, schemaId, and version, in little-endian byte order.  Before reading any field of the body, decoders MUST validate the following, where HEADER_SIZE is 8 and wire_block_length is the blockLength carried in the header:

   1.  data.length >= HEADER_SIZE + wire_block_length

   2.  wire_block_length >= template_block_length, where template_block_length is the block length that the decoder's compiled schema specifies for this template.

   A wire block strictly smaller than the template block length MUST be rejected.  A wire block strictly larger MUST be accepted (forward compatibility for additive schema evolution).

5.2.  Repeating Groups

   Each repeating group is preceded by a group header carrying blockLength and numInGroup.  Before iterating any group, decoders MUST validate the following, where pos is the offset of the group header, count is numInGroup, and wire_block_length is the group header's blockLength:

   1.  pos + GROUP_HEADER_SIZE <= data.length

   2.  count * wire_block_length does not overflow 64-bit unsigned arithmetic

   3.  pos + GROUP_HEADER_SIZE + count * wire_block_length <= data.length

   4.  If count > 0, then wire_block_length > 0.  A wire_block_length of 0 with a non-zero count MUST be rejected before any per-element allocation.

5.3.  Variable-Length Data

   Variable-length data ("varData") fields follow the fixed block, each preceded by a length prefix per the schema's varDataEncoding.  Decoders MUST validate that pos + LENGTH_PREFIX_SIZE + length <= data.length, where pos is the offset of the length prefix, before reading the data.

6.  Annotation Extensions

   protowire defines extensions on the Protocol Buffers FileOptions, MessageOptions, and FieldOptions messages.  Field numbers are allocated from the reserved range described in Section 10.2.

6.1.  PXF Annotations

   The PXF annotations apply to FieldOptions:

      extend google.protobuf.FieldOptions {
        bool   required = 50000;
        string default  = 50001;
      }

   pxf.required:  When true, decoders MUST reject a PXF document in which the annotated field is absent.  A field bound to "null" is considered present for the purpose of this check; null rejection for non-nullable types is governed by Section 3.9.

   pxf.default:  When set, its value is a PXF literal, parsed by the same rules as the value of any entry.  Decoders MUST treat an absent annotated field as if the document had supplied the default literal.  The default applies only to absent fields, not to fields explicitly set to "null" or to the proto3 zero value.

   The well-known types pxf.BigInt, pxf.Decimal, and pxf.BigFloat are defined in [PROTOWIRE-BIGNUM] (the proto/pxf/bignum.proto file in the canonical repository).  They provide arbitrary-precision signed integer, exact decimal, and binary floating-point representations respectively, encoded over PB as length-delimited unsigned big-endian magnitudes plus sign and scale fields.

6.2.  SBE Annotations

   SBE annotations apply at three scopes:

      extend google.protobuf.FileOptions {
        uint32 schema_id = 50100;
        uint32 version   = 50101;
      }

      extend google.protobuf.MessageOptions {
        uint32 template_id = 50200;
      }

      extend google.protobuf.FieldOptions {
        uint32 length   = 50300;
        string encoding = 50301;
      }

   sbe.schema_id and sbe.version identify the schema as a whole.
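   On the wire, these identifiers travel in the 8-byte message header of Section 5.1.  The following Python sketch reads that header and applies the two length checks of Section 5.1; it is an illustration, not a reference decoder.  It assumes the conventional FIX SBE header layout of four little-endian uint16 fields in the order blockLength, templateId, schemaId, version.  The function name, the template_block_lengths mapping, the rejection of unknown templateIds, and the schemaId comparison are all assumptions of this sketch; Section 5.1 mandates only the two length checks.

```python
import struct
from typing import Mapping, NamedTuple

HEADER_SIZE = 8
# Assumed layout: blockLength, templateId, schemaId, version (uint16, LE).
_HEADER = struct.Struct("<HHHH")


class SbeHeader(NamedTuple):
    block_length: int
    template_id: int
    schema_id: int
    version: int


def read_header(data: bytes, expected_schema_id: int,
                template_block_lengths: Mapping[int, int]) -> SbeHeader:
    """Decode and validate the message header before any body field is read.

    template_block_lengths (hypothetical) maps templateId to the block length
    the decoder's compiled schema specifies for that template.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated message header")
    hdr = SbeHeader(*_HEADER.unpack_from(data, 0))
    if hdr.schema_id != expected_schema_id:  # assumption, not required by 5.1
        raise ValueError(f"unexpected schemaId {hdr.schema_id}")
    template_block_length = template_block_lengths.get(hdr.template_id)
    if template_block_length is None:  # assumption: no block length to compare
        raise ValueError(f"unknown templateId {hdr.template_id}")
    # Check 1: the buffer must contain the entire root block.
    if len(data) < HEADER_SIZE + hdr.block_length:
        raise ValueError("root block extends past end of buffer")
    # Check 2: a shorter wire block is rejected; a longer one is accepted.
    if hdr.block_length < template_block_length:
        raise ValueError("wire blockLength smaller than template blockLength")
    return hdr
```

   A longer wire block passes both checks, which is what lets a decoder compiled against an older schema version read messages produced under a newer, additively extended one.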
   sbe.template_id MUST be unique among messages within a schema.

   sbe.length specifies a fixed byte length for string- or bytes-typed
   fields.  Emitters MUST truncate values longer than sbe.length bytes
   and MUST pad shorter values with zero (0x00) bytes.  Decoders MUST
   trim trailing 0x00 bytes when populating string fields and MUST NOT
   trim them when populating bytes fields.

   sbe.encoding selects the SBE primitive type used to encode a Protocol
   Buffers numeric field; that type may be narrower than the field's
   own type.  The defined values are: "int8", "int16", "int32",
   "int64", "uint8", "uint16", "uint32", "uint64", "float", and
   "double".  Emitters MUST reject values that fall outside the range
   of the selected type.  Decoders MUST sign-extend or zero-extend, as
   the SBE primitive requires, when populating the wider Protocol
   Buffers field.

7.  Response Envelope

   The Envelope message provides a uniform response carrier across wire
   formats.  Its schema is defined in package envelope.v1:

      message Envelope {
        int32    status          = 1;
        string   transport_error = 2;
        bytes    data            = 3;
        AppError error           = 4;
      }

      message AppError {
        string              code     = 1;
        string              message  = 2;
        repeated string     args     = 3;
        repeated FieldError details  = 4;
        map<string, string> metadata = 5;
      }

      message FieldError {
        string          field   = 1;
        string          code    = 2;
        string          message = 3;
        repeated string args    = 4;
      }

   Semantics:

   *  status carries an HTTP [RFC9110] or gRPC [GRPC] status code.

   *  transport_error is set when no application-layer response was
      produced (network error, timeout, connection refused).
      Implementations MUST NOT set transport_error and error
      simultaneously.

   *  data is the success payload, encoded in whichever wire form the
      surrounding transport selects (PXF, PB, or JSON).  Decoders MUST
      treat data as opaque bytes; it is parsed only after the envelope
      itself has been parsed.

   *  error.code and details[*].code are machine-readable identifiers.
      Clients perform localization by consulting a string table keyed
      by "error." and "field." and substituting args positionally.
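   The envelope semantics above can be illustrated with a small Python
   sketch that operates on an already-decoded Envelope.  The
   dataclasses mirror the envelope.v1 messages but are hypothetical
   stand-ins for generated code.  The "{0}"-style placeholder syntax in
   localize() is an assumption, because the text above requires only
   positional substitution.  The string-table key is left to the
   caller.

```python
import dataclasses
from typing import Dict, List, Optional


@dataclasses.dataclass
class FieldError:
    field: str
    code: str
    message: str = ""
    args: List[str] = dataclasses.field(default_factory=list)


@dataclasses.dataclass
class AppError:
    code: str
    message: str = ""
    args: List[str] = dataclasses.field(default_factory=list)
    details: List[FieldError] = dataclasses.field(default_factory=list)
    metadata: Dict[str, str] = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class Envelope:
    status: int = 0
    transport_error: str = ""
    data: bytes = b""                  # opaque; parsed only after the envelope
    error: Optional[AppError] = None


def outcome(env: Envelope) -> str:
    """Classify a decoded envelope as "transport", "application", or "success"."""
    if env.transport_error and env.error is not None:
        raise ValueError("transport_error and error must not both be set")
    if env.transport_error:
        return "transport"     # no application-layer response was produced
    if env.error is not None:
        return "application"   # error.code / details[*].code drive localization
    return "success"           # env.data still holds the undecoded payload


def localize(table: Dict[str, str], key: str, args: List[str]) -> str:
    """Look up a message template and substitute args positionally.

    Assumes "{0}", "{1}", ... placeholders; falls back to the key itself.
    """
    template = table.get(key)
    if template is None:
        return key
    return template.format(*args)
```

   The sketch decides the outcome from the envelope alone and never
   inspects data.  This follows the rule that data is parsed only after
   the envelope.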
   The package path "envelope.v1" is part of the wire contract.
   Incompatible changes MUST bump the package to "envelope.v2"; "v1"
   and "v2" MAY coexist indefinitely.

8.  Decoder Conformance

   This section specifies the requirements that any protowire decoder
   operating on attacker-controlled bytes MUST satisfy.  The threat
   model assumes that:

   *  the attacker controls every byte of the input;

   *  the attacker controls the input length, up to a configured
      maximum;

   *  the attacker may submit many inputs concurrently from many
      sources; and

   *  the schema (.proto descriptor, SBE template) and the configured
      limits are not attacker-controlled.

   Under these conditions, a conforming decoder MUST terminate in time
   and memory bounded only by the input length and the configured
   limits.  It MUST NOT crash, abort, or unwind the host process.  It
   MUST NOT allocate beyond the configured budget.  It MUST NOT return
   string-typed fields whose contents are not valid UTF-8.

8.1.  Mandatory Limits

   Conforming decoders MUST enforce the following limits.  The defaults
   shown are recommended values.  Except where noted, each limit is
   configurable per call by the calling application.

   +=========================+==================+======================+
   | Limit                   | Default          | Applies to           |
   +=========================+==================+======================+
   | MaxNestingDepth         | 100              | PXF block/list       |
   |                         |                  | nesting; PB          |
   |                         |                  | submessage, group,   |
   |                         |                  | and map-entry        |
   |                         |                  | nesting              |
   +-------------------------+------------------+----------------------+
   | MaxMessageSize          | 67108864         | Total input length   |
   |                         | (64 MiB)         | to a single decode   |
   |                         |                  | call                 |
   +-------------------------+------------------+----------------------+
   | MaxNumericLiteralDigits | 4096             | Digit count of any   |
   |                         |                  | PXF numeric literal  |
   +-------------------------+------------------+----------------------+
   | MaxBytesLiteralLength   | = MaxMessageSize | Decoded byte length  |
   |                         |                  | of any PXF b"..."    |
   |                         |                  | literal              |
   +-------------------------+------------------+----------------------+
   | MaxVarintBytes          | 10 (fixed)       | Every PB varint      |
   |                         |                  | read. NOT            |
   |                         |                  | configurable.        |
   +-------------------------+------------------+----------------------+
   | MaxRepeatedCount        | = MaxMessageSize | Element count of any |
   |                         |                  | repeated, map, or    |
   |                         |                  | SBE group field      |
   +-------------------------+------------------+----------------------+

   A decoder presented with input that requires exceeding any limit
   MUST return an error before allocating memory proportional to the
   violating quantity.  It MUST NOT abort, panic, raise an uncatchable
   exception, or unwind into a state from which the caller cannot
   recover.

   Length computations MUST use 64-bit unsigned arithmetic.  They MUST
   be checked against the buffer length before any narrowing conversion
   to a native integer width and before any allocation.

8.2.  Recursion

   PXF parsers descend recursively into "{...}" blocks and "[...]"
   lists; PB parsers descend into submessages, groups, and map entries.
   Conforming decoders MUST:

   1.  maintain a depth counter that is incremented at every recursive
       descent;

   2.  reject input that would cause the counter to exceed
       MaxNestingDepth; and

   3.  thread the depth counter through inner decoder instances
       constructed mid-stream.

   In particular, when a nested submessage is decoded by handing its
   bytes to a freshly constructed input-stream object, the depth counter
   MUST be passed in rather than reset to zero.

8.3.  UTF-8 Enforcement

   Proto3 string fields are sequences of valid UTF-8 [RFC3629].
   Conforming decoders:

   *  MUST validate UTF-8 strictly when populating any string-typed
      field, regardless of the source encoding (PB length-delimited
      bytes, SBE char-array fields, SBE varData, PXF simple-string or
      triple-string).
   *  MUST NOT use UTF-8 decoders that substitute U+FFFD for invalid
      sequences when populating string-typed fields.

   *  MUST reject PXF \xHH and \NNN (octal) escapes that produce
      invalid UTF-8 when the surrounding literal is bound to a
      string-typed field.  The same byte sequences are permitted inside
      b"..." (bytes literals) and inside string literals bound to
      bytes-typed fields.

   *  MUST reject PXF \uHHHH and \UHHHHHHHH escapes that encode a
      surrogate code point or a code point above U+10FFFF.

8.4.  SBE Bounds Checking

   The bounds-checking obligations in Section 5 are conformance
   requirements.  They are restated here for emphasis:

   *  A wire block length less than the template block length MUST be
      rejected (Section 5.1).

   *  The product of a group's count and its block length MUST be
      computed in 64-bit unsigned arithmetic and checked against the
      buffer length before iteration (Section 5.2).

   *  A group with count > 0 and wire_block_length == 0 MUST be
      rejected before any per-element allocation (Section 5.2).

8.5.  Map Keys

   In implementation languages where dynamic property assignment walks
   a prototype chain (notably JavaScript and TypeScript), a conforming
   decoder MUST NOT use a plain object literal as the container for
   attacker-keyed maps.  Such decoders MUST use a prototype-free object
   (Object.create(null)) or a Map, or MUST explicitly reject the keys
   "__proto__", "constructor", and "prototype".  The same obligation
   applies in any other implementation language that exhibits
   prototype-mutation semantics for reserved string keys.

9.  Media Types

   This document defines the following media types.

   application/pxf
      PXF text format (Section 3).  The schema-type association is
      carried either by the document's @type directive or out of band
      (e.g., an HTTP Content-Schema parameter).  Charset: UTF-8 (fixed;
      the format MUST be UTF-8 per Section 3.1).
   application/protowire-pb
      Protocol Buffers binary, with the protowire constraints
      (Section 4).

   application/protowire-sbe
      SBE binary, with the protowire constraints (Section 5).

   application/protowire-envelope
      The Envelope message (Section 7), encoded as Protocol Buffers
      binary.  The content type of the data field is carried in the
      envelope-data-type parameter of the media type.  For transports
      that lack media-type parameters, it is carried in a
      transport-level header.

10.  IANA Considerations

10.1.  Media Type Registrations

   Relationship to "application/protobuf" and "application/x-protobuf".
   Before [I-D.ietf-dispatch-mime-protobuf], no IETF-registered media
   type existed for Protocol Buffers binary.  Deployments converged
   informally on "application/protobuf" and, less preferably per
   [RFC6648], "application/x-protobuf".  The dispatch draft registers
   both.  Neither carries protowire's additional decoder-conformance
   and annotation-extension constraints.

   This document registers "application/protowire-pb" as a distinct
   type rather than adding a parameter to "application/protobuf", for
   two reasons.  First, the conformance requirements in Section 8 are
   mandatory for protowire payloads.  Second, protowire payloads are
   tied to a schema that uses the annotation extensions in Section 6.
   A recipient that handles "application/protobuf" but not
   "application/protowire-pb" can still parse the bytes as ordinary
   Protocol Buffers.  However, it is not bound by the annotation
   semantics of Section 6 or the limits of Section 8, and so it does
   not provide the protowire conformance guarantees.  Servers
   negotiating with a client that advertises only
   "application/protobuf" SHOULD downgrade to that media type and
   accept the loss of protowire-specific guarantees rather than refuse
   the request.
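   The downgrade rule can be sketched as a server-side choice between
   the two PB media types.  This is a simplified, hypothetical Python
   helper.  It honors q=0 exclusions, but it applies the server's own
   preference order rather than full HTTP content negotiation
   ([RFC9110], Section 12).  It also does not handle wildcards such as
   "application/*", or quoted parameter values that contain commas or
   semicolons.

```python
from typing import Optional, Tuple

PROTOWIRE_PB = "application/protowire-pb"
PROTOBUF = "application/protobuf"


def _media_range(entry: str) -> Tuple[str, float]:
    """Split 'type/subtype; q=0.5' into (lowercased type, q-value)."""
    parts = [p.strip() for p in entry.split(";")]
    q = 1.0
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
    return parts[0].lower(), q


def choose_pb_response_type(accept_header: str) -> Optional[str]:
    """Return the media type to answer with, or None if neither is accepted."""
    acceptable = {
        media for media, q in map(_media_range, accept_header.split(","))
        if q > 0
    }
    if PROTOWIRE_PB in acceptable:
        return PROTOWIRE_PB
    if PROTOBUF in acceptable:
        # Downgrade: the same PB bytes, without the protowire guarantees.
        return PROTOBUF
    return None
```

   A server preferring protowire would call choose_pb_response_type()
   on the request's Accept header.  It would fall back to other formats
   only when the helper returns None.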
   IANA is requested to register the following media types in the
   "Media Types" registry [RFC6838]:

   Type name: application
   Subtype name: pxf
   Required parameters: none
   Optional parameters: charset (fixed value: utf-8)
   Encoding considerations: 8bit; UTF-8 text.
   Security considerations: See Section 11 of this document.
   Interoperability considerations: See Section 8.
   Published specification: This document.
   Applications that use this media type: Configuration tooling, API
      integration, schema-driven editors.
   Fragment identifier considerations: none defined.
   Author/Change controller: IETF.
   Provisional registration: yes.

   Type name: application
   Subtype name: protowire-pb
   Required parameters: none
   Optional parameters: schema (URI of the FileDescriptorSet)
   Encoding considerations: binary.
   Security considerations: See Section 11 of this document.
   Interoperability considerations: See Section 8.
   Published specification: This document.
   Applications that use this media type: API integration.
   Fragment identifier considerations: none defined.
   Author/Change controller: IETF.
   Provisional registration: yes.

   Type name: application
   Subtype name: protowire-sbe
   Required parameters: none
   Optional parameters: schema (URI of the SBE schema XML)
   Encoding considerations: binary.
   Security considerations: See Section 11 of this document.
   Interoperability considerations: See Section 8.
   Published specification: This document.
   Applications that use this media type: Low-latency message
      streaming, market-data fan-out.
   Fragment identifier considerations: none defined.
   Author/Change controller: IETF.
   Provisional registration: yes.

   Type name: application
   Subtype name: protowire-envelope
   Required parameters: none
   Optional parameters: envelope-data-type (media type of the "data"
      field's contents)
   Encoding considerations: binary.
   Security considerations: See Section 11 of this document.
   Interoperability considerations: See Section 8.
   Published specification: This document.
   Applications that use this media type: API integration.
   Fragment identifier considerations: none defined.
   Author/Change controller: IETF.
   Provisional registration: yes.

10.2.  Annotation Field Number Range

   This document allocates Protocol Buffers extension field numbers in
   the range 50000-59999 to the protowire family.  Field numbers in this
   range are reserved for the extensions defined in Section 6 and for
   future extensions of this document.  They MUST NOT be reused for
   unrelated extensions.  The currently assigned numbers are:

      pxf.required      50000
      pxf.default       50001
      sbe.schema_id     50100
      sbe.version       50101
      sbe.template_id   50200
      sbe.length        50300
      sbe.encoding      50301

   Future protowire extensions SHOULD allocate numbers within this range
   and document the assignment in a successor of this document.

11.  Security Considerations

   The protowire family is designed to be parsed safely on
   attacker-controlled bytes.  Section 8 specifies the conformance
   requirements that follow from this objective.  This section addresses
   the considerations that do not reduce to a single conformance
   requirement.

   Resource exhaustion.  Without the limits in Section 8.1, every
   protowire encoding admits trivial denial-of-service inputs:

   *  deeply nested PXF blocks exhaust native call stacks;

   *  large PB length prefixes drive allocator pressure;

   *  SBE group counts multiplied by element sizes overflow length
      arithmetic; and

   *  long PXF numeric literals trigger quadratic-time behavior in
      big-number parsers.

   Implementers MUST apply Section 8.1.

   Length-arithmetic overflow.  All offset, length, and count arithmetic
   on attacker-supplied quantities MUST use 64-bit unsigned operations
   and MUST be checked against the input length before any narrowing.
   Implementations in languages whose default integer width on the host
   platform is 32 bits need to take particular care to use explicit
   64-bit types.

   Trapping conversions.
   Several implementation languages provide integer conversions that
   either trap on out-of-range input, such as Swift's Int(_:)
   initializer, or silently truncate or wrap.  Examples of the second
   kind are Rust's "as" casts and C++ static_cast between integer
   types, whose wrapped result can lead to undefined behavior when it
   is later used as a length or index.  Java's Math.toIntExact throws
   an ArithmeticException that the caller must handle.  When converting
   attacker-supplied lengths or counts to native integer widths,
   implementations MUST use fallible conversion forms (for example,
   Swift's Int(exactly:) or Rust's TryFrom), so that an out-of-range
   value surfaces as a decode error.

   UTF-8 substitution.  Many platform string APIs silently substitute
   U+FFFD for invalid UTF-8.  Applied to a string-typed field, such
   substitution hides the fact that the input violated the proto3
   invariant that string fields contain valid UTF-8.  The decoded value
   may also differ from the producer's intent, because distinct invalid
   inputs can decode to the same string.  Implementations MUST use
   strict, error-returning UTF-8 decoders on string fields
   (Section 8.3).

   Schema input.  This paragraph applies to an implementation that
   accepts a FileDescriptorSet or SBE schema XML at runtime, where the
   descriptor or XML is, or may be, attacker-controlled.  Such an
   implementation MUST apply the limits of Section 8.1 to the schema
   parser as well.  XML schema parsers MUST disable DTDs and external
   entities to mitigate XML external entity (XXE) attacks.  Examples
   are defusedxml [DEFUSEDXML] in Python;
   XmlReaderSettings.DtdProcessing = Prohibit in .NET; and the feature
   flags "disallow-doctype-decl", "external-general-entities", and
   "external-parameter-entities" in Java.

   Prototype pollution.  Decoders implemented in JavaScript, TypeScript,
   or any language with prototype-mutation semantics for reserved string
   keys MUST avoid plain object literals as the storage for
   attacker-keyed maps; see Section 8.5.

   Cryptographic transport.  This document specifies an encoding family.
   It does not provide confidentiality, integrity, or origin
   authentication.  Transports carrying protowire payloads SHOULD use
   TLS [RFC8446] or an equivalent.

   Application-level error metadata.  AppError.metadata (Section 7) is a
   free-form string-to-string map.  Servers SHOULD NOT place sensitive
   information (credentials, raw user input) in metadata.
   Clients SHOULD treat metadata values as untrusted and apply
   context-appropriate sanitization before display or logging.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003.

   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
              Specifications and Registration Procedures", BCP 13,
              RFC 6838, DOI 10.17487/RFC6838, January 2013.

   [RFC7405]  Kyzivat, P., "Case-Sensitive String Support in ABNF",
              RFC 7405, DOI 10.17487/RFC7405, December 2014.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017.

   [RFC8446]  Rescorla, E., "The Transport Layer Security (TLS) Protocol
              Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018.

   [RFC9110]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "HTTP Semantics", STD 97, RFC 9110,
              DOI 10.17487/RFC9110, June 2022.

   [IEEE754]  IEEE, "IEEE Standard for Floating-Point Arithmetic",
              IEEE 754-2019, July 2019.

   [PROTOBUF] Google, "Protocol Buffers Language Specification
              (proto3)", .

   [PROTOBUF-WIRE]
              Google, "Protocol Buffers Encoding", .
   [FIX-SBE]  FIX Trading Community, "Simple Binary Encoding, Version
              2.0", FIX Protocol Ltd., 2020, .

12.2.  Informative References

   [I-D.ietf-dispatch-mime-protobuf]
              Borenstein, N. and S. Vaucher, "The application/protobuf
              and application/x-protobuf Media Types", Work in Progress,
              Internet-Draft, draft-ietf-dispatch-mime-protobuf-07.

   [RFC6648]  Saint-Andre, P., Crocker, D., and M. Nottingham,
              "Deprecating the 'X-' Prefix and Similar Constructs in
              Application Protocols", BCP 178, RFC 6648,
              DOI 10.17487/RFC6648, June 2012.

   [PROTOJSON]
              Google, "Protocol Buffers JSON Mapping (ProtoJSON)", .

   [GRPC]     gRPC Authors, "gRPC over HTTP/2", .

   [DEFUSEDXML]
              Heimes, C., "defusedxml: Defuses XML bombs and other
              exploits", .

   [PROTOWIRE-BIGNUM]
              TrendVidia LLC, "PXF arbitrary-precision numeric types",
              proto/pxf/bignum.proto in the protowire canonical
              repository, .

Authors' Addresses

   B. Franco Jr.
   TrendVidia, LLC
   Email: contact@trendvidia.com