Performance Optimization
This page explains how OpenWit achieves speed by design and how the core choices translate into low latency and high throughput in real workloads.
Performance in OpenWit comes from the architecture, not ad-hoc tuning. The system moves columnar data end to end, processes batches instead of single rows, and relies on lightweight async actors for concurrency. The result is fewer copies, fewer syscalls and predictable latency.
Zero-copy data movement
OpenWit passes data across components as Arrow RecordBatches. Arrow preserves the same memory layout across process boundaries, which avoids serialization and reduces CPU overhead when data crosses the wire or an actor boundary. This is a primary reason the ingest-to-storage path stays fast as volume grows.
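As a rough illustration, cloning an Arrow RecordBatch in Rust only bumps reference counts on the underlying buffers. The sketch below uses the arrow crate with an illustrative `ts` column to show that handing a batch to another component copies pointers, not row data.

```rust
use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() {
    // A batch with a single illustrative `ts` column.
    let schema = Arc::new(Schema::new(vec![Field::new("ts", DataType::Int64, false)]));
    let col: ArrayRef = Arc::new(Int64Array::from(vec![1_i64, 2, 3]));
    let batch = RecordBatch::try_new(schema, vec![col]).unwrap();

    // "Handing off" the batch is a shallow clone: Arrow buffers are
    // reference-counted, so no row data is copied across the boundary.
    let handed_off = batch.clone();
    assert_eq!(handed_off.num_rows(), 3);
}
```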
Batch-oriented I/O
The ingestion path is designed for batches, not per-row inserts. Gateways group events by a configured size or time threshold, then send a single bulk protobuf to the Ingest node over gRPC. The default batch target is 8 MB or 10k records, which amortizes syscall and disk overhead and increases throughput.
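A minimal sketch of this flush policy, using the documented defaults but with illustrative struct and field names rather than OpenWit's actual types:

```rust
use std::time::{Duration, Instant};

// Gateway-side batcher: accumulate events and flush when any threshold trips.
struct Batcher {
    buf: Vec<Vec<u8>>,  // encoded events waiting for the next bulk send
    bytes: usize,
    opened_at: Instant,
    max_bytes: usize,   // e.g. 8 * 1024 * 1024 (8 MB)
    max_records: usize, // e.g. 10_000
    max_age: Duration,  // configured time threshold
}

impl Batcher {
    /// Returns Some(batch) when a size, count, or time threshold is hit;
    /// the caller then makes one bulk gRPC send to the Ingest node.
    fn push(&mut self, event: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        self.bytes += event.len();
        self.buf.push(event);
        if self.bytes >= self.max_bytes
            || self.buf.len() >= self.max_records
            || self.opened_at.elapsed() >= self.max_age
        {
            self.bytes = 0;
            self.opened_at = Instant::now();
            return Some(std::mem::take(&mut self.buf));
        }
        None
    }
}
```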
Why batches matter in this design
- Fewer network round trips between Gateway and Ingest
- Fewer fsync boundaries before short-term WAL acknowledgment
- Larger Arrow RecordBatches for more efficient Parquet writes and compression
These outcomes follow directly from batching before the cluster accepts work.
Async actor system in Rust
Every subsystem runs as async actors on Rust’s Tokio runtime. Actors communicate over non-blocking channels, which keeps cores busy without expensive context switches. Combined with Rust’s ownership model and Arrow buffers, this avoids GC pauses and heap fragmentation and leads to predictable latency at high concurrency.
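A minimal Tokio actor in this style, with illustrative message types: the task owns its state and drains a bounded channel, so no locks are shared across cores.

```rust
use tokio::sync::mpsc;

// Illustrative messages; real OpenWit actors carry Arrow batches.
enum Msg {
    Ingest(Vec<u8>),
    Flush,
}

// The actor: exclusive owner of its state, driven entirely by its mailbox.
async fn run_actor(mut rx: mpsc::Receiver<Msg>) {
    let mut pending: usize = 0;
    while let Some(msg) = rx.recv().await {
        match msg {
            Msg::Ingest(bytes) => pending += bytes.len(),
            Msg::Flush => {
                // ... hand `pending` bytes downstream here ...
                pending = 0;
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(1024);
    let handle = tokio::spawn(run_actor(rx));
    tx.send(Msg::Ingest(vec![0u8; 64])).await.unwrap();
    tx.send(Msg::Flush).await.unwrap();
    drop(tx); // closing the channel lets the actor exit cleanly
    handle.await.unwrap();
}
```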
Columnar compression and layout
OpenWit writes Parquet with Arrow semantics, so storage is columnar. Dictionary encoding, run-length encoding and delta encoding improve compression ratios on telemetry data and reduce bytes scanned during queries. This is why stable Parquet files remain small and fast to scan even when retention is long.
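A hedged sketch of such a write path with the parquet crate's Arrow writer; the column name, the Snappy compression choice and the output path are assumptions, not OpenWit's actual settings.

```rust
use std::fs::File;
use std::sync::Arc;
use arrow::array::{ArrayRef, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("service", DataType::Utf8, false)]));
    let col: ArrayRef = Arc::new(StringArray::from(vec!["api", "api", "worker"]));
    let batch = RecordBatch::try_new(schema.clone(), vec![col])?;

    // Dictionary encoding shines on low-cardinality telemetry columns;
    // compression shrinks the file while keeping scans cheap.
    let props = WriterProperties::builder()
        .set_dictionary_enabled(true)
        .set_compression(Compression::SNAPPY)
        .build();

    let file = File::create("stable.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```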
Tiered caching for read performance
Hot and warm data lives close to the Search node. The tiers are named Electro for RAM, Pyro for SSD and Cryo for the object store. The intent is simple: serve the hottest data from memory, keep popular Parquet on disk and fall back to cloud storage only when needed. Expected access times are nanoseconds for Electro, microseconds for Pyro and seconds for Cryo.
Cache tiers at a glance

| Tier | Medium | Typical access | Used for |
|---|---|---|---|
| Electro | RAM | ns | Arrow batches for hottest windows |
| Pyro | SSD | µs | Frequently scanned Parquet |
| Cryo | Object store | s | Durable cold data and overflow |
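A sketch of the lookup order the table implies; the type and method names are illustrative, not OpenWit's API.

```rust
// Which tier served the request; useful for cache_hit_ratio-style metrics.
enum Tier {
    Electro, // RAM
    Pyro,    // local SSD
    Cryo,    // object store
}

struct TieredCache {}

impl TieredCache {
    /// Check RAM first, then SSD, then fall back to the object store.
    fn get(&self, file_id: &str) -> (Tier, Vec<u8>) {
        if let Some(bytes) = self.from_ram(file_id) {
            return (Tier::Electro, bytes); // ns-scale hit
        }
        if let Some(bytes) = self.from_ssd(file_id) {
            return (Tier::Pyro, bytes); // µs-scale hit
        }
        (Tier::Cryo, self.from_object_store(file_id)) // seconds-scale fallback
    }

    // Stubs standing in for the real tier backends.
    fn from_ram(&self, _: &str) -> Option<Vec<u8>> { None }
    fn from_ssd(&self, _: &str) -> Option<Vec<u8>> { None }
    fn from_object_store(&self, _: &str) -> Vec<u8> { Vec::new() }
}
```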
Smart pruning before scan
Search prunes aggressively before reading Parquet. It narrows by time range using metadata from Postgres, applies index ranges using zonemaps and bloom or loom filters, and lets DataFusion’s SQL planner optimize predicates from the query AST. Only matching files are fetched, which keeps I/O minimal and yields sub-second responses for common queries.
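An illustrative pruning pass over file metadata, run before any Parquet bytes are read; the field names here are assumptions:

```rust
// Per-file metadata as it might be recorded in Postgres.
struct FileMeta {
    min_ts: i64,
    max_ts: i64,
    zonemap_min: i64, // per-column min from the zonemap index
    zonemap_max: i64, // per-column max from the zonemap index
}

/// Keep only files whose time range and zonemap could possibly match.
fn prune(files: &[FileMeta], query_start: i64, query_end: i64, value: i64) -> Vec<&FileMeta> {
    files
        .iter()
        // time-range pruning from Postgres metadata
        .filter(|f| f.max_ts >= query_start && f.min_ts <= query_end)
        // zonemap pruning: skip files whose value range cannot contain `value`
        .filter(|f| value >= f.zonemap_min && value <= f.zonemap_max)
        .collect()
}
```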
Dual WALs for steady ingest and quick recovery
The short-term WAL accepts the batch and guarantees durability before acknowledgment. The long-term WAL aggregates by time, which helps recovery and background work. This protects throughput during failures and eliminates re-ingest work after a crash, since the pipeline can replay from the WAL and continue with Arrow conversion and Parquet writes.
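A sketch of the durability-before-ack contract for the short-term WAL, with an assumed file name and length-prefixed framing; the long-term aggregation is elided.

```rust
use std::fs::OpenOptions;
use std::io::Write;

/// Append the batch to the short-term WAL and fsync before acknowledging.
fn accept_batch(payload: &[u8]) -> std::io::Result<()> {
    let mut wal = OpenOptions::new()
        .create(true)
        .append(true)
        .open("short_term.wal")?;
    wal.write_all(&(payload.len() as u64).to_le_bytes())?; // length-prefixed record
    wal.write_all(payload)?;
    wal.sync_data()?; // the fsync boundary: durable before we acknowledge
    // ack the gateway here; Arrow conversion and Parquet writes proceed async
    Ok(())
}
```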
Storage path that keeps writes large and sequential
The Storage node merges incoming Arrow RecordBatches into an active Parquet file, rolls it to a stable Parquet file when it reaches the configured size and uploads it through OpenDAL. Large, sequential Parquet writes are friendlier to disks and cloud object stores, which helps both cost and speed. Metadata is updated so downstream components can discover files without scanning directories.
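A sketch of the roll decision, with an assumed threshold field; the actual OpenDAL upload and metadata update are elided.

```rust
// Tracks the active Parquet file as batches are merged into it.
struct ActiveFile {
    bytes_written: u64,
    roll_at_bytes: u64, // configured active -> stable threshold
}

impl ActiveFile {
    /// Record an appended batch; returns true when it is time to roll.
    fn append(&mut self, batch_bytes: u64) -> bool {
        self.bytes_written += batch_bytes;
        self.bytes_written >= self.roll_at_bytes
    }
}

fn on_batch(active: &mut ActiveFile, batch_bytes: u64) {
    if active.append(batch_bytes) {
        // close the active Parquet, upload via OpenDAL, record in metadata,
        // then start a fresh active file
        active.bytes_written = 0;
    }
}
```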
Indexes that reduce work
The Indexer builds artifacts per Parquet file: bitmap indexes for equality filters, bloom or loom filters for membership tests, zonemaps for numeric and time pruning and Tantivy for full-text search. Each index is uploaded and recorded in Postgres so the Search node can skip files that will not match, which reduces how much Parquet is read during a query.
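One plausible shape for these per-file artifacts; the variant names follow the list above, while the payloads are stand-ins.

```rust
// Illustrative index artifact kinds and what each one accelerates.
enum IndexArtifact {
    Bitmap { column: String },       // equality filters
    Bloom { column: String },        // membership tests
    Zonemap { column: String },      // numeric and time pruning
    Tantivy { fields: Vec<String> }, // full-text search
}

/// Built per Parquet file, then uploaded and recorded in Postgres so
/// the Search node can skip non-matching files.
fn artifacts_for(parquet_file: &str) -> Vec<IndexArtifact> {
    let _ = parquet_file;
    vec![
        IndexArtifact::Zonemap { column: "ts".into() },
        IndexArtifact::Bloom { column: "trace_id".into() },
    ]
}
```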
Operator knobs that influence performance
OpenWit is config-driven from one YAML file. Batching, deduplication thresholds, WAL behavior, Parquet size, indexing mode and cache tiers are all controlled from openwit-unified-control.yaml. Keep these aligned with your workload so the pipeline sees fewer small units and more large, compressible batches.
Examples of these settings
- Batch size and time window at the Gateway so gRPC handoffs remain large and predictable
- Short-term and long-term WAL thresholds so durability and aggregation match ingest rate
- Active to stable Parquet size so uploaded files are right sized for the object store
- Index types per dataset so pruning matches query patterns
- Cache tier sizing so hot working sets fit in Electro or Pyro
These levers map directly to throughput and latency; a sketch of how they might deserialize into typed configuration follows.
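The sketch below assumes serde-style field names modeled on the levers above, not the actual openwit-unified-control.yaml schema.

```rust
use serde::Deserialize;

// Hypothetical typed view of the performance-related YAML knobs.
#[derive(Deserialize)]
struct PerfConfig {
    batch_max_bytes: usize,         // e.g. 8388608 (8 MB)
    batch_max_records: usize,       // e.g. 10000
    batch_max_wait_ms: u64,         // Gateway time threshold
    wal_short_term_fsync: bool,     // durability before ack
    parquet_stable_size_bytes: u64, // active -> stable roll point
    index_types: Vec<String>,       // per-dataset pruning indexes
    cache_electro_bytes: u64,       // RAM tier sizing
    cache_pyro_bytes: u64,          // SSD tier sizing
}

fn main() {
    let yaml = r#"
batch_max_bytes: 8388608
batch_max_records: 10000
batch_max_wait_ms: 500
wal_short_term_fsync: true
parquet_stable_size_bytes: 268435456
index_types: [zonemap, bloom]
cache_electro_bytes: 4294967296
cache_pyro_bytes: 107374182400
"#;
    let cfg: PerfConfig = serde_yaml::from_str(yaml).expect("valid config");
    assert_eq!(cfg.batch_max_records, 10_000);
}
```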
Metrics that reflect performance
Each node exports Prometheus-compatible metrics. Watching these together gives a clear picture of throughput, backlog and query speed.
| Area | Metric name (example) | Indicates |
|---|---|---|
| Ingest | openwit_ingest_batches_total | Incoming batch volume |
| Ingest | wal_write_latency_ms | Durability cost per batch |
| Storage | openwit_storage_upload_latency_ms | Time to upload stable Parquet |
| Storage | active_file_size_bytes | Rolling size toward stable threshold |
| Indexer | openwit_index_build_duration_seconds | Build time per index artifact |
| Search | openwit_query_latency_ms | End-to-end query latency |
| Search | cache_hit_ratio | Effectiveness of Electro and Pyro |
| Control | openwit_nodes_healthy_total | Available capacity to route to |
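A sketch of exporting one of these counters with the prometheus crate; the metric name matches the table above, while the HTTP scrape endpoint is elided.

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() {
    let registry = Registry::new();
    let batches = IntCounter::new(
        "openwit_ingest_batches_total",
        "Incoming batch volume",
    )
    .unwrap();
    registry.register(Box::new(batches.clone())).unwrap();

    batches.inc(); // bump on every accepted batch

    // Render in the Prometheus text exposition format for scraping.
    let mut out = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut out).unwrap();
    println!("{}", String::from_utf8(out).unwrap());
}
```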
Quick reference table
| Optimization | Where it applies | How it helps |
|---|---|---|
| Zero-copy Arrow batches | Ingest, Storage | Avoids serialization and extra copies |
| Batch-oriented handoff (8 MB or 10k) | Gateway, Ingest | Fewer syscalls and fewer network calls |
| Async actors on Tokio | All nodes | Concurrency without blocking |
| Columnar compression in Parquet | Storage | Smaller files and faster scans |
| Tiered cache: Electro, Pyro, Cryo | Search | Serve hot data from RAM or SSD |
| Pruning by time, index and predicates | Search | Skip irrelevant files and reduce I/O |