Schema-backed serialization

One .proto, three wire formats. Built for humans and machines.

ProtoWire is a serialization family driven by your existing Protocol Buffer schemas. Read and write the same message as PXF (Proto eXpressive Format) for humans, pb (protobuf) for the wire, or SBE (Simple Binary Encoding) when latency matters. No new schema language to learn.

ProtoWire logo

Motivation

Configs and messages deserve types.

JSON and YAML are easy to read but tell you nothing about shape. Textproto is tied to one tool. FlatBuffers, MessagePack, and CBOR drop the schema at the edge. ProtoWire reuses the .proto definitions you already have, so every byte across every encoding stays type-checked, versioned, and wire-compatible.

Schema-backed

The parser always knows each field's type. No ambiguity, no out-of-band validation, no surprise null.

Three formats, one schema

PXF (text), pb (protobuf wire), and SBE (FIX Simple Binary Encoding) are generated from the same .proto.

Three field states

Distinguish set, null, and absent, which is essential for PATCH semantics and config inheritance.

Well-known type sugar

Inline RFC 3339 timestamps and Go-style durations. Write 2024-01-15T10:30:00Z or 1h30m directly, no nested blocks required.

PXF in practice

Reads like config. Validates like protobuf.

The text format mixes assignments, blocks, lists, and maps in one consistent syntax. Comments use #, //, or /* */. Triple-quoted strings preserve raw multi-line content.

Server config: assignments and blocks
// Schema-pinned to infra.v1.ServerConfig
@type infra.v1.ServerConfig

hostname = "web-01.prod.example.com"
port     = 8443
enabled  = true
status   = STATUS_SERVING

created_at = 2024-01-15T10:30:00Z
timeout    = 30s

tls {
  cert_file = "/etc/ssl/cert.pem"
  key_file  = "/etc/ssl/key.pem"
  verify    = true
}

tags = ["production", "us-east", "frontend"]

labels = {
  env: "production"
  team: "platform"
  "hello world": "quoted keys supported"
}
syntax = "proto3";
package infra.v1;

import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

message ServerConfig {
  string hostname = 1;
  int32 port = 2;
  bool enabled = 3;
  Status status = 4;
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Duration timeout = 6;
  TLS tls = 7;
  repeated string tags = 8;
  map<string, string> labels = 9;
}

enum Status {
  STATUS_UNSPECIFIED = 0;
  STATUS_SERVING = 1;
}

message TLS {
  string cert_file = 1;
  string key_file = 2;
  bool verify = 3;
}
Repeated messages and integer maps
endpoints = [
  {
    path   = "/api/v1/users"
    method = GET
  }
  {
    path   = "/health"
    method = GET
  }
]

error_codes = {
  404: "Not Found"
  500: "Internal Error"
}

nullable_name = "present"
message ServerConfig {
  // ... fields from previous example ...
  repeated Endpoint endpoints = 10;
  map<int32, string> error_codes = 11;
  optional string nullable_name = 12;
}

message Endpoint {
  string path = 1;
  Method method = 2;
}

enum Method {
  METHOD_UNSPECIFIED = 0;
  GET = 1;
  POST = 2;
  PUT = 3;
  DELETE = 4;
}
Oneof with nested choices
event_id = "evt-456"

user {
  user_id = "u-123"
  login {
    ip  = "192.168.1.1"
    mfa = true
  }
}
message Event {
  string event_id = 1;
  User user = 2;
}

message User {
  string user_id = 1;
  oneof action {
    Login login = 10;
    Logout logout = 11;
  }
}

message Login {
  string ip = 1;
  bool mfa = 2;
}

message Logout {
  string reason = 1;
}
Required and default annotations

Field options (pxf.required) = true and (pxf.default) = "…" travel with the schema. A decoder rejects any document that omits a required field, and fills in the literal default when an annotated field is absent. Defaults apply to absent fields only, never to fields explicitly set to null or to the proto3 zero value.

@type infra.v1.ServerConfig

hostname = "web-01.prod.example.com"
region   = "us-east-1"

# port, enabled, log_level are all omitted.
# The decoder fills them with the schema's defaults:
#   port      = 8080
#   enabled   = true
#   log_level = "info"
#
# Drop the `region =` line and the same decoder rejects
# the document: required field "region" is missing.
syntax = "proto3";
package infra.v1;

import "pxf/annotations.proto";

message ServerConfig {
  string hostname  = 1 [(pxf.required) = true];
  string region    = 2 [(pxf.required) = true];

  int32  port      = 3 [(pxf.default)  = "8080"];
  bool   enabled   = 4 [(pxf.default)  = "true"];
  string log_level = 5 [(pxf.default)  = "\"info\""];
}

Data transfer & system integration

The CSV problem, fixed at the type system.

CSV is the lingua franca of cross-team data handoff — and the source of every cross-team data bug. Is 1 a string or a number? Is the empty cell missing, null, or zero? Did the producer mean ISO date or US date? Is TRUE a bool or just a string that looks like one? PXF's @dataset directive carries row-oriented bulk data with declared types; pair it with @proto and the same file declares its own schema, so producers and consumers agree before the first row is parsed.

Same trades export: CSV vs PXF
symbol,price,qty,active,created_at
AAPL,188.42,100,true,2024-01-15
MSFT,415.10,50,false,2024-01-15
AAPL,188.55,,,2024-01-15
// Schema travels with the data: no -p flag, no out-of-band .proto file
@proto trades.v1.Trade {
  string symbol     = 1;
  double price      = 2;
  int64  qty        = 3;
  bool   active     = 4;
  google.protobuf.Timestamp created_at = 5;
}
@dataset trades.v1.Trade ( symbol, price, qty, active, created_at )
( "AAPL", 188.42, 100, true,  2024-01-15T00:00:00Z )
( "MSFT", 415.10,  50, false, 2024-01-15T00:00:00Z )
( "AAPL", 188.55,    ,      , 2024-01-15T00:00:00Z )

Types travel with the data

The column header declares the schema's field name; the schema declares the type. Producers can't accidentally emit "100" where a number is expected — the encoder won't let them. Consumers don't guess.

Empty cell ≠ null ≠ zero

Three distinct states the consumer can act on: the empty cell is absent; null is present, intentionally null; 0 is present, zero. Patch semantics, default fallbacks, and audit logs all rely on this distinction; CSV collapses it.

Schema travels with the data

The @proto directive embeds the .proto definitions inline. No out-of-band schema sharing, no version-drift bugs, no Slack threads asking what column 5 means — the file is fully interpretable by anyone who reads it.

Out of the box

One binary, plus editors that speak PXF.

The pxf CLI ships with every operation the stack supports — encode and decode against protobuf binary, validate and pretty-print, lint schemas, query with jq syntax, infer a .proto from a sample CSV. Editor support for VS Code and JetBrains IDEs shares the same TextMate grammar, with inline parse-error squiggles powered by the canonical parser. No separate tools, no plugin churn.

# Install (one binary; same package every language port also ships):
go install github.com/trendvidia/protowire/cmd/pxf@latest
pxf query: jq-style queries against self-describing PXF
# trades.pxf carries an @proto schema header + @dataset rows.
# No -p flag needed; the schema travels with the data.

# Count rows where the symbol is AAPL:
pxf query 'pxf_directive("dataset")[0].rows
            | map(select(.symbol == "AAPL"))
            | length' trades.pxf

# Build a typed object from a different file and emit valid PXF:
pxf query 'pxf_proto("trades.v1.Trade";
            { symbol: .ticker, price: .last, qty: .size })' raw.json

# Adapt CSV / JSON / YAML on the way in; output is always PXF:
pxf query '.endpoints[0].path' config.yaml
# Count query:
value = 2

# pxf_proto constructor — emits a typed PXF document:
@type trades.v1.Trade
symbol = "AAPL"
price  = 188.42
qty    = 100

# YAML field extraction:
value = "/api/v1/users"
// Self-describing PXF: the @proto header lets readers parse the
// @dataset rows without a separate .proto file on disk.
@proto trades.v1.Trade {
  string symbol = 1;
  double price  = 2;
  int64  qty    = 3;
}
@dataset trades.v1.Trade ( symbol, price, qty )
( "AAPL", 188.42, 100 )
( "MSFT", 415.10,  50 )
( "AAPL", 188.55,  75 )
{
  "ticker": "AAPL",
  "last":   188.42,
  "size":   100,
  "venue":  "NASDAQ",
  "flags":  ["odd_lot"]
}
endpoints:
  - path: /api/v1/users
    method: GET
    timeout_ms: 5000
  - path: /api/v1/orders
    method: POST
    timeout_ms: 10000
  - path: /health
    method: GET
    timeout_ms: 1000

Encode & decode

pxf encode writes protobuf binary; pxf decode reads it back as PXF. Round-trip preserves every field, including the three-state presence channel.

Validate & format

pxf validate checks a document against its schema before publishing. pxf fmt applies the canonical writer — idempotent, diff-friendly, no stylistic bikeshedding.

Lint schemas

pxf lint walks every message field, oneof, and enum value in your .proto sources and rejects names that collide with PXF value keywords (null, true, false).

Infer a schema from CSV

pxf infer-schema walks a CSV (or @dataset) sample and emits a .proto with per-column types. Fail-fast on contradictions or --full-scan to collect every type mismatch in one pass.

Canonical schemas (pxf/*, sbe/*, envelope/v1/*) are bundled into every pxf binary. Resolution chain: bundled → in-document @proto-p schema.protoprotoregistry via -s/-n/--schema. Later sources shadow earlier ones, so a self-describing PXF file Just Works, and an explicit -p still wins when you need it.

Editors

VS Code and JetBrains IDEs (IntelliJ, GoLand, PyCharm, RustRover, …) share the same TextMate grammar so syntax highlighting is identical across editors. The JetBrains plugin embeds protowire-java's parser and the VS Code extension embeds protowire-typescript's, both surfacing parse errors inline with the same diagnostics the canonical parser emits.

Neither extension is published to a marketplace yet — install from the prebuilt artifacts in the repo. The raw pxf.tmbundle directory is also still available for TextMate / Sublime Text users. Schema-aware validation in the editor (field / type checking against a descriptor set) is on the roadmap.

Performance

Built for the hot path.

ProtoWire is engineered for speed at every level of the stack. PXF reads about twice as fast as JSON and roughly five times as fast as YAML on the same payload, and the binary tier is a different conversation entirely. The numbers below are from a single representative Config message (one TLS block, three endpoints, four labels, and a timestamp + duration) on the Go reference implementation; every port runs the same harness against the same canonical testdata.

Encoded size

Lower is better. pb is the baseline.

Format Bytes vs pb
pb (protobuf wire) binary 301   1.00×
JSON 606   2.01×
YAML 606   2.01×
PXF 662   2.20×

Hover any format name to peek at the actual payload. PXF carries a small premium over JSON because it spells out enum names, type directives, and timestamps as text. That cost buys you a file you can review, comment, and edit by hand.

Encode and decode time

Lower is better. Per-message, single thread.

FormatEncodeDecodeAllocs (encode / decode)
PXF4.5 µs6.1 µs14 / 80
pb6.5 µs7.4 µs49 / 115
JSON8.2 µs10.9 µs75 / 151
YAML32.0 µs35.5 µs192 / 460

PXF beats JSON by roughly 1.8× on both encode and decode, and YAML by an order of magnitude. Allocations matter at scale: PXF cuts encode allocations by 5× against JSON and 13× against YAML.

Same harness, every port

The cross-port harness (scripts/cross_pxf_bench.sh and cross_sbe_bench.sh) decodes the same canonical fixtures into descriptor-driven dynamic messages, so the comparison reflects codec dispatch rather than generated-message ergonomics. C++ leads on PXF and SBE; Go and Rust trail by a small constant; the JVM and TypeScript ports are within a 2× band on encode.

Port PXF unmarshal PXF marshal SBE unmarshal SBE marshal
C++ 3.78 µs 3.13 µs 382 ns 236 ns
Go 5.52 µs 3.48 µs 1.05 µs 378 ns
Rust 5.88 µs 5.26 µs 576 ns 432 ns
Java 9.32 µs 3.32 µs 865 ns 276 ns
TypeScript 11.78 µs 4.65 µs 1.59 µs 946 ns

PXF fixture: 624-byte bench.v1.Config (mixed scalars, lists, maps, Timestamp, Duration). SBE fixture: 94-byte bench.v1.Order (10 scalars + a 2-entry repeating group). Apple M1, 3-second window per op. Numbers are not yet collected for Python, C#, Swift, and Dart.

Sub-microsecond reads with the SBE tier

When latency is the budget, SBE drops the parser entirely. The Order payload encodes in 378 ns in Go and 236 ns in C++; a typed View reads fields with zero allocations. Same data, same schema, traded for read-side cost.

SBE is wire-compatible with FIX SBE 1.0 and shares the same .proto definitions as the rest of ProtoWire. Switching tiers is a flag, not a rewrite.

Cross-format numbers (top tables) are from the Go reference implementation; cross-port numbers are aggregated from scripts/cross_pxf_bench.sh and cross_sbe_bench.sh. All measurements on Apple M1, single core, ProtoWire v1.0.0 (the frozen wire format, post-hardening with strict UTF-8 validation and MaxNestingDepth=100 enforced on decode). Allocation counts include the dynamic-message scaffolding used for parity across encodings; production deployments with generated code will see further improvements on the binary tiers.

Beyond the codecs

protoregistry: where your schemas live.

The codecs read and write bytes; protoregistry stores and serves the .proto files those bytes are typed against. It is a multi-namespace schema registry with versioning, two-phase staging, and backward-compatibility enforcement, built for the same stack as the rest of ProtoWire.

Multi-namespace by design

Each namespace is a self-contained scope. Imports resolve only inside the namespace, so two teams shipping different versions of common.proto never collide.

Two-phase publish & promote

Publish compiles and stages a candidate set; promote atomically swaps every staged version to current, so coordinated multi-schema changes land together.

Compatibility enforced at promote time

Field deletion, type changes, cardinality changes are rejected before traffic sees them. Rollback is a pointer move with zero data duplication.

Hot-swap, lock-free

Readers grab compiled descriptors via atomic.Pointer. Schema swaps are instant; no draining, no restarts.

gRPC + CLI + Go SDK

The gRPC service in proto/protoregistry/v1/ is the durable integration point. A protoregistry CLI runs the server and manages namespaces; the Go client exposes a remote-backed protoreflect.MessageTypeResolver.

Custom built-in types

Extend Google's well-known types with your own shared protos via the reserved __builtins__ namespace. Shadowing real WKTs is rejected unless you opt in.

Stability

v1.0.0 froze the wire format across every port.

ProtoWire's wire format is frozen at v1.0.0; backwards-compatible changes only across the v1.x line. Every published language port shipped v1.0.0 in lockstep against the same canonical testdata. Read the full contract in STABILITY.md, or jump straight to the EBNF and railroad diagrams.