Every industrial system speaks a different binary dialect. Modbus encodes register values in big-endian 16-bit words. BACnet/MSTP wraps floating-point sensor data in ASHRAE-standardized PDUs. PROFINET uses cyclic I/O frames at 31.25μs intervals. SWIFT MT103 wire transfers pack routing data in tagged blocks of ASCII text.
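For a flavor of what these dialects look like in code: Modbus register payloads are just big-endian 16-bit words, so decoding reduces to `u16::from_be_bytes` over byte pairs. A minimal sketch (the `decode_registers` helper is illustrative, not from any real driver):

```rust
/// Decode a sequence of big-endian 16-bit Modbus register values.
/// Illustrative helper, not part of any shipping driver.
fn decode_registers(payload: &[u8]) -> Vec<u16> {
    payload
        .chunks_exact(2) // any trailing odd byte is ignored
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}

fn main() {
    // 0x12F0 = 4848, 0x0001 = 1
    let regs = decode_registers(&[0x12, 0xF0, 0x00, 0x01]);
    assert_eq!(regs, vec![0x12F0, 0x0001]);
}
```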
If you're building data infrastructure for industrial IoT, fintech, or supply chain — you've already discovered the painful truth: there is no universal parser. You write a new one for every protocol. Every integration starts from scratch.
We built INGELT to fix this. One API call parses any of 33 industrial, financial, and enterprise protocols into clean JSON. Sub-100μs latency. Crash-proof. This is how we did it.
Why Wasm? Why not just… Rust binaries?
The obvious approach is to compile each protocol parser as a native Rust library and link them into a single server process. We tried this first. It doesn't scale — for two reasons:
- Blast radius. A panic in one parser (say, a malformed PROFINET frame with an unexpected length field) brings down the entire process. Every tenant. Every protocol. At once.
- Hot deployment. Adding or updating a single driver requires recompiling the entire server, re-deploying, and restarting all connections. For an API that customers depend on, this is unacceptable downtime.
WebAssembly solves both. Each driver compiles to an isolated .wasm module with its own linear
memory, its own stack, and strict resource limits enforced by the host runtime. A panic in the Modbus driver
traps — and returns an error to that one request. The SWIFT MT parser keeps humming.
Key insight: Wasm gives us process-level isolation at function-call cost. No fork(). No container spin-up. Just a memory-bounds-checked function call that completes in microseconds.
The ABI: Raw Linear Memory
We deliberately chose a raw linear memory ABI over higher-level bindings like wit-bindgen or
wasm-bindgen. The interface between host and guest is just three exported functions:
| Export | Signature | Purpose |
|---|---|---|
| `alloc` | `(i32) → i32` | Guest allocates `len` bytes, returns pointer |
| `parse` | `(i32, i32) → i64` | Guest parses payload at `(ptr, len)` |
| `dealloc` | `(i32, i32) → ()` | Host tells guest to free `(ptr, len)` |
The parse function returns a packed i64: the upper 32 bits hold the result length,
the lower 32 bits hold the pointer to the JSON output in linear memory. The host reads the JSON directly
from guest memory — zero copy on the serialization path.
```rust
// Host-side: call guest `parse`, read result from linear memory
let packed = instance.call_parse(ptr, len)?;
let result_len = (packed >> 32) as usize;
let result_ptr = (packed & 0xFFFF_FFFF) as usize;
let json_bytes = &memory.data(&store)[result_ptr..result_ptr + result_len];
```
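The shift-and-mask is easy to get backwards, so a round-trip check is worth having. A sketch with illustrative `pack`/`unpack` helpers mirroring this layout (length in the high half, pointer in the low half):

```rust
/// Pack (ptr, len) into one i64: length in the upper 32 bits,
/// pointer in the lower 32 bits. Illustrative helpers only.
fn pack(ptr: u32, len: u32) -> i64 {
    ((len as i64) << 32) | (ptr as i64)
}

/// Invert `pack`: returns (ptr, len).
fn unpack(packed: i64) -> (u32, u32) {
    ((packed & 0xFFFF_FFFF) as u32, (packed >> 32) as u32)
}

fn main() {
    let packed = pack(0x0010_0000, 512);
    assert_eq!(unpack(packed), (0x0010_0000, 512));
}
```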
This design means every driver developer writes one function — parse — that takes raw bytes and
returns serialized JSON. No framework overhead. No trait bloat. Just bytes in, JSON out.
The Guest SDK
To reduce boilerplate, we provide ingelt-guest-sdk — a tiny Rust crate that handles the
low-level ABI mechanics:
```rust
// Safe deallocation — no double-free hazard
pub fn guest_dealloc(ptr: *mut u8, len: usize) {
    unsafe {
        // Length 0, capacity = len → Drop frees the backing
        // allocation without reading any elements
        drop(Vec::from_raw_parts(ptr, 0, len));
    }
}

// Result packing macro
macro_rules! pack_result {
    ($json:expr) => {{
        let j = serde_json::to_vec(&$json).unwrap();
        let len = j.len();
        let ptr = Box::leak(j.into_boxed_slice()).as_ptr();
        ((len as i64) << 32) | (ptr as usize as i64)
    }};
}
```
The guest_dealloc function deserves attention. The naive approach —
Vec::from_raw_parts(ptr, len, len) — creates a Vec that thinks it owns len
initialized elements. If the element type implemented Drop (u8 doesn't, but the principle matters),
this would attempt to drop uninitialized memory. Our version sets length to 0, capacity to
len. The Vec's Drop implementation frees the backing allocation without touching the
contents.
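This allocate/free pairing can be exercised natively, outside Wasm. A minimal sketch (the `alloc_and_free` helper is illustrative) that allocates the way `pack_result!` does and frees the way `guest_dealloc` does:

```rust
/// Allocate `len` bytes as a leaked boxed slice (as `pack_result!` does),
/// then free them via a zero-length Vec (as `guest_dealloc` does).
/// Returns the number of bytes released. Illustrative helper only.
fn alloc_and_free(len: usize) -> usize {
    let buf = vec![0u8; len].into_boxed_slice();
    let ptr = Box::leak(buf).as_mut_ptr();
    // Length 0, capacity `len`: Drop releases the allocation without
    // reading any (possibly uninitialized) elements.
    unsafe { drop(Vec::from_raw_parts(ptr, 0, len)) };
    len
}

fn main() {
    assert_eq!(alloc_and_free(64), 64);
}
```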
8 Layers of Resource Containment
Running untrusted binary parsers on production infrastructure requires defense in depth. We implement eight layers of resource containment:
| Layer | Mechanism | Limit |
|---|---|---|
| L1: Per-Instance Memory | Wasmtime `memory_size` limit | 64 MB |
| L2: Per-Instance Compute | Fuel metering | 10M instructions |
| L3: Per-Instance Timeout | `tokio::time::timeout` | 5 seconds |
| L4: Per-Instance Lifecycle | Pool memory ceiling | 4 MB recycle threshold |
| L5: Global Memory | `AtomicUsize` guard | 1.5 GB total Wasm memory |
| L6: Per-Tenant Concurrency | Semaphore | 8 concurrent requests |
| L7: Per-Tenant Rate | Token bucket | 5,000 req/min |
| L8: Per-IP Rate | Token bucket | 30 req/min (unauth) |
This means a malicious or buggy driver cannot: allocate more than 64MB, run for more than 5 seconds, exhaust the host's memory, or block other tenants. If any layer trips, the request fails gracefully with a structured error — the server stays up.
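Layers L7 and L8 are plain token buckets. A minimal sketch of the refill-and-take logic (names are illustrative; time is passed in explicitly to keep the sketch deterministic, where production code would sample a monotonic clock):

```rust
/// Minimal token bucket: bursts up to `capacity`, refills at `rate`
/// tokens per second. Illustrative, single-threaded sketch.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64,
}

impl TokenBucket {
    fn new(capacity: f64, rate: f64) -> Self {
        Self { capacity, tokens: capacity, rate }
    }

    /// Refill for `elapsed_secs`, then try to consume one token.
    fn try_take(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.rate).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Layer L8: 30 req/min for unauthenticated IPs => 0.5 tokens/sec.
    let mut bucket = TokenBucket::new(30.0, 0.5);
    // A burst of 40 back-to-back requests: the first 30 pass, 10 are limited.
    let allowed = (0..40).filter(|_| bucket.try_take(0.0)).count();
    assert_eq!(allowed, 30);
    // Two seconds later, one token has refilled.
    assert!(bucket.try_take(2.0));
}
```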
The Driver Catalog: 33 Protocols and Counting
Each driver ships as a pre-compiled .cwasm file — Wasmtime's ahead-of-time compiled format. Cold
start from pre-compiled: under 200ms. Warm path (pooled instance): sub-100μs per parse.
| Category | Protocols |
|---|---|
| Industrial / SCADA | Modbus, BACnet, S7-COMM, PROFINET, EtherCAT, DNP3, EIP/CIP, HART, IEC 61850 |
| Financial / EDI | X12 EDI, EDIFACT, FIX, SWIFT MT, NACHA ACH, ISO 8583 |
| IoT / Telemetry | LoRaWAN, CAN bus (J1939), NMEA 2000, Wiegand, ARINC 429 |
| Enterprise / Data | CSV, TSV, XML, Multi-Schema flat files, HL7v2, DICOM |
| Laboratory / Specialty | SECS/GEM, GPIB, Bioreactor serial, Goodwe inverter, CANopen |
But driver coverage isn't the hard part — driver quality is. Each driver includes:
- Golden fixture tests (real-world captured frames, byte-for-byte)
- ABI round-trip tests (`alloc` → `parse` → verify JSON → `dealloc`)
- Edge case coverage (incomplete frames, zero-length input, malformed headers)
- Kaitai Struct schema compilation where applicable (automated `.ksy` → Rust parsers)
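A golden fixture test pins a driver's output byte-for-byte to a captured expectation. A sketch with a stub parser standing in for a real driver (`parse_frame`, the frame bytes, and the expected JSON are all illustrative):

```rust
/// Stub driver: pretend-parse a two-byte frame into JSON.
/// Stands in for a real driver's `parse` entry point.
fn parse_frame(input: &[u8]) -> String {
    format!(
        "{{\"device_address\":{},\"function_code\":{}}}",
        input[0], input[1]
    )
}

fn main() {
    // Golden fixture: captured frame + byte-for-byte expected output.
    let frame: &[u8] = &[0x01, 0x03];
    let golden = "{\"device_address\":1,\"function_code\":3}";
    assert_eq!(parse_frame(frame), golden);
}
```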
Streaming: Breaking the 64MB Wall
The standard parse ABI has a hard ceiling: the 64MB per-instance memory limit. For parsing a 500MB
CSV datalake file or a multi-gigabyte EDI batch, we need a different approach.
The streaming ABI introduces a three-phase lifecycle:
```rust
// Phase 1: Initialize parser state
let handle = instance.call_parse_init()?;

// Phase 2: Feed chunks (1MB at a time); each chunk is first copied
// into guest linear memory, yielding (chunk_ptr, chunk_len)
for chunk in stream.chunks(1_048_576) {
    instance.call_parse_chunk(chunk_ptr, chunk_len)?;
}

// Phase 3: Finalize and retrieve result
let packed = instance.call_parse_finalize()?;
```
The guest's streaming parser maintains a small state machine — accumulating summary statistics, schema information, and row counts — without ever holding the full dataset in memory. This means a 500MB CSV file is processed with a constant ~2MB memory footprint.
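The per-chunk state machine can be sketched as a row counter that survives records split across chunk boundaries (a simplified, illustrative stand-in for a real streaming CSV driver):

```rust
/// Minimal streaming state: counts newline-terminated rows across chunks
/// without buffering more than the current partial line.
#[derive(Default)]
struct CsvStream {
    rows: u64,
    partial: Vec<u8>, // bytes of an unterminated final line
}

impl CsvStream {
    /// Phase 2: absorb one chunk.
    fn feed(&mut self, chunk: &[u8]) {
        for &b in chunk {
            if b == b'\n' {
                self.rows += 1;
                self.partial.clear();
            } else {
                self.partial.push(b);
            }
        }
    }

    /// Phase 3: finalize, counting a trailing unterminated row.
    fn finalize(mut self) -> u64 {
        if !self.partial.is_empty() {
            self.rows += 1;
        }
        self.rows
    }
}

fn main() {
    let mut s = CsvStream::default();
    // A row split across two chunks still counts exactly once.
    s.feed(b"a,b\nc,");
    s.feed(b"d\ne,f");
    assert_eq!(s.finalize(), 3);
}
```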
What We Ship
INGELT is a single API endpoint. You send raw protocol bytes, you get structured JSON back.
```bash
curl -X POST \
  -H "Authorization: Bearer ig_live_..." \
  --data-binary @modbus_frame.bin \
  https://ingelt.com/v1/ingest/modbus-sunspec
```
Response:
```json
{
  "protocol": "modbus",
  "status": "ok",
  "device_address": 1,
  "function_code": 3,
  "ac_power_watts": 4850.0,
  "dc_voltage": 380.5,
  "energy_lifetime_kwh": 128456.7,
  "bytes_read": 35
}
```
$1/month. $0.05/MB ingested. First 100 MB free. No vendor lock-in — you own your data.
Try INGELT
Parse any of 33 industrial protocols with one API call. $1/mo, first 100 MB free.
Get API Key →