Encoding Message Formats
Format strings bridge the gap between abstract Tamarin terms and concrete byte-level message formats. They enable unified models to work with real protocol implementations.
The abstraction gap
Section titled “The abstraction gap”When verifying protocols with Tamarin, you work with symbolic terms:
// Abstract: a tuple of symbolic valueshandshake(sender_index, ephemeral_key, encrypted_payload)But real protocol implementations work with bytes:
| Type (1) | Reserved (3) | Sender (4) | Ephemeral Key (32) | Payload (48) |Format strings bridge this gap by defining exactly how symbolic terms map to byte sequences.
How format strings work
Section titled “How format strings work”Format strings are defined as macros inside #ifdef SPECMON blocks, making them available only during monitoring while verification uses abstract representations:
#ifdef SPECMONmacros: // Monitoring: concrete byte layout handshake(sender, eph_key, payload) = cat(byte('0x01', '1'), // Message type: 1 byte byte('0x000000', '3'), // Reserved: 3 bytes byte(sender, '4'), // Sender index: 4 bytes byte(eph_key, '32'), // Ephemeral key: 32 bytes byte(payload, '48')) // Encrypted payload: 48 bytes#elsemacros: // Verification: abstract tuple handshake(sender, eph_key, payload) = <sender, eph_key, payload>#endifDuring verification, Tamarin sees handshake(x, y, z) as an abstract tuple <x, y, z> and reasons symbolically about its contents.
During monitoring, SpecMon uses the format string to:
- Parse incoming bytes into symbolic terms (pattern matching)
- Construct outgoing bytes from symbolic terms (message building)
Building format strings
Section titled “Building format strings”Format strings are built using three core functions combined with cat:
| Function | Purpose | Example |
|---|---|---|
byte(data, len) | Raw bytes | byte(key, '32') — 32-byte key |
int(data, len) | Integer (little-endian) | int(counter, '4') — 4-byte integer |
string(data, len) | UTF-8 string | string(name, '16') — 16-char string |
cat(...) | Concatenate fields | cat(byte(...), int(...)) |
The length parameter is required for parsing (to know field boundaries) but optional when only constructing messages.
Fixed-length fields
Section titled “Fixed-length fields”Most protocol fields have fixed sizes:
macros: header(version, flags, length) = cat(byte(version, '1'), // 1 byte byte(flags, '1'), // 1 byte int(length, '2')) // 2 bytes (little-endian)Variable-length fields
Section titled “Variable-length fields”The last field in a format string can omit the length to consume remaining bytes:
macros: message(header, payload) = cat(byte(header, '8'), // Fixed 8-byte header byte(payload)) // Variable: rest of messageNested format strings
Section titled “Nested format strings”Format strings can reference other format strings for complex structures:
macros: header(type, len) = cat(byte(type, '1'), int(len, '2')),
packet(type, len, data) = cat(header(type, len), // Reuse header format byte(data)) // PayloadPattern matching and parsing
Section titled “Pattern matching and parsing”When SpecMon receives bytes, it uses format strings to extract symbolic values. Given the format:
macros: msg(type, id, payload) = cat(byte(type, '1'), int(id, '4'), byte(payload))And incoming bytes 0x02 | 0x01000000 | 0xAABBCC...:
- First byte
0x02binds totype - Next 4 bytes
0x01000000bind toid(interpreted as integer 1) - Remaining bytes bind to
payload
This allows rules to match on symbolic values:
rule Process_Request: [ In(msg('0x01', id, payload)) ] // Match type = 0x01 --> [ Request(id, payload) ]Real-world example: WireGuard handshake
Section titled “Real-world example: WireGuard handshake”The WireGuard protocol defines a handshake initiation message with this structure:
| Field | Size | Description |
|---|---|---|
| Type | 1 | Message type (0x01) |
| Reserved | 3 | Zero padding |
| Sender Index | 4 | Sender’s session ID |
| Ephemeral | 32 | Ephemeral public key |
| Static | 48 | Encrypted static key |
| Timestamp | 28 | Encrypted timestamp |
| MAC1 | 16 | First MAC |
| MAC2 | 16 | Second MAC |
The format string captures this exactly:
#ifdef SPECMONmacros: handshake_init(sender, eph, static, ts, mac1, mac2) = cat(byte('0x01', '1'), // Type byte('0x000000', '3'), // Reserved byte(sender, '4'), // Sender Index byte(eph, '32'), // Ephemeral Key byte(static, '48'), // Encrypted Static byte(ts, '28'), // Encrypted Timestamp byte(mac1, '16'), // MAC1 byte(mac2, '16')) // MAC2#endifThis ensures:
- Verification: Reasons about abstract
handshake_init(...)terms - Monitoring: Parses real WireGuard packets with exact byte offsets
Best practices
Section titled “Best practices”Match the wire format exactly
Section titled “Match the wire format exactly”Format strings must precisely match the protocol’s byte layout. Misalignment causes parsing failures or incorrect matches.
// ❌ Wrong: missing reserved byteshandshake(sender, key) = cat(byte(sender, '4'), byte(key, '32'))
// ✅ Correct: includes all fieldshandshake(sender, key) = cat(byte('0x01', '1'), // Don't forget type byte byte('0x000000', '3'), // Don't forget padding byte(sender, '4'), byte(key, '32'))Use constants for fixed values
Section titled “Use constants for fixed values”Protocol-defined constants (magic bytes, version numbers) should be literals:
macros: request(data) = cat(byte('0x01', '1'), byte(data)) // Type 0x01 response(data) = cat(byte('0x02', '1'), byte(data)) // Type 0x02Handle endianness explicitly
Section titled “Handle endianness explicitly”SpecMon uses little-endian by default. Use reverse() for big-endian fields:
macros: // Big-endian 4-byte length field header(length, data) = cat(byte(reverse(length), '4'), byte(data))The CCS paper describes format strings with big-endian integers by default. The current implementation uses little-endian, so apply reverse() when you need big-endian fields.
Document field layouts
Section titled “Document field layouts”Complex format strings benefit from inline comments:
macros: tls_record(type, version, length, fragment) = cat(byte(type, '1'), // ContentType: 1 byte int(version, '2'), // ProtocolVersion: 2 bytes int(length, '2'), // Length: 2 bytes (max 16384) byte(fragment)) // Fragment: variableRelationship to built-in functions
Section titled “Relationship to built-in functions”Format strings use the same functions documented in Built-in Functions:
cat(),byte(),int(),string()— for building format stringsslice()— for extracting parts of parsed datareverse()— for endianness conversion
Next steps
Section titled “Next steps”- Creating Unified Models — Use format strings in unified specifications
- Event Format — How events are structured