Performance Optimization
This page explains how OpenWit achieves speed by design and how the core choices translate into low latency and high throughput in real workloads.
Performance in OpenWit comes from the architecture, not ad-hoc tuning. The system moves columnar data end to end, processes batches instead of single rows, and relies on lightweight async actors for concurrency. The result is fewer copies, fewer syscalls and predictable latency.
Zero-copy data movement
OpenWit passes data across components as Arrow RecordBatches. Arrow preserves the same memory layout across process boundaries, which avoids serialization and reduces CPU overhead when data crosses the wire or an actor boundary. This is a primary reason the ingest-to-storage path stays fast as volume grows.
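As a rough illustration, cloning an Arrow RecordBatch in Rust only bumps reference counts on the underlying buffers. The sketch below uses the arrow crate with an illustrative `ts` column to show that handing a batch to another component copies pointers, not row data.

```rust
use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn main() {
    // A batch with a single illustrative `ts` column.
    let schema = Arc::new(Schema::new(vec![Field::new("ts", DataType::Int64, false)]));
    let col: ArrayRef = Arc::new(Int64Array::from(vec![1_i64, 2, 3]));
    let batch = RecordBatch::try_new(schema, vec![col]).unwrap();

    // "Handing off" the batch is a shallow clone: Arrow buffers are
    // reference-counted, so no row data is copied across the boundary.
    let handed_off = batch.clone();
    assert_eq!(handed_off.num_rows(), 3);
}
```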
Batch-oriented I/O
The ingestion path is designed for batches, not per-row inserts. Gateways group events by a configured size or time threshold, then send a single bulk protobuf to the Ingest node over gRPC. The default batch target is 8 MB or 10k records, which amortizes syscall and disk overhead and increases throughput.
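A minimal sketch of this flush policy, using the documented defaults but with illustrative struct and field names rather than OpenWit's actual types:

```rust
use std::time::{Duration, Instant};

// Gateway-side batcher: accumulate events and flush when any threshold trips.
struct Batcher {
    buf: Vec<Vec<u8>>,  // encoded events waiting for the next bulk send
    bytes: usize,
    opened_at: Instant,
    max_bytes: usize,   // e.g. 8 * 1024 * 1024 (8 MB)
    max_records: usize, // e.g. 10_000
    max_age: Duration,  // configured time threshold
}

impl Batcher {
    /// Returns Some(batch) when a size, count, or time threshold is hit;
    /// the caller then makes one bulk gRPC send to the Ingest node.
    fn push(&mut self, event: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        self.bytes += event.len();
        self.buf.push(event);
        if self.bytes >= self.max_bytes
            || self.buf.len() >= self.max_records
            || self.opened_at.elapsed() >= self.max_age
        {
            self.bytes = 0;
            self.opened_at = Instant::now();
            return Some(std::mem::take(&mut self.buf));
        }
        None
    }
}
```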
Why batches matter in this design
- Fewer network round trips between Gateway and Ingest
- Fewer fsync boundaries before short-term WAL acknowledgment
- Larger Arrow RecordBatches for more efficient Parquet writes and compression
These outcomes follow directly from batching before the cluster accepts work.
Async actor system in Rust
Every subsystem runs as async actors on Rust’s Tokio runtime. Actors communicate over non-blocking channels, which keeps cores busy without expensive context switches. Combined with Rust’s ownership model and Arrow buffers, this avoids GC pauses and heap fragmentation and leads to predictable latency at high concurrency.
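A minimal Tokio actor in this style, with illustrative message types: the task owns its state and drains a bounded channel, so no locks are shared across cores.

```rust
use tokio::sync::mpsc;

// Illustrative messages; real OpenWit actors carry Arrow batches.
enum Msg {
    Ingest(Vec<u8>),
    Flush,
}

// The actor: exclusive owner of its state, driven entirely by its mailbox.
async fn run_actor(mut rx: mpsc::Receiver<Msg>) {
    let mut pending: usize = 0;
    while let Some(msg) = rx.recv().await {
        match msg {
            Msg::Ingest(bytes) => pending += bytes.len(),
            Msg::Flush => {
                // ... hand `pending` bytes downstream here ...
                pending = 0;
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(1024);
    let handle = tokio::spawn(run_actor(rx));
    tx.send(Msg::Ingest(vec![0u8; 64])).await.unwrap();
    tx.send(Msg::Flush).await.unwrap();
    drop(tx); // closing the channel lets the actor exit cleanly
    handle.await.unwrap();
}
```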
Columnar compression and layout
OpenWit writes Parquet with Arrow semantics, so storage is columnar. Dictionary encoding, run-length encoding and delta encoding improve compression ratios on telemetry data and reduce bytes scanned during queries. This is why stable Parquet files remain small and fast to scan even when retention is long.
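A hedged sketch of such a write path with the parquet crate's Arrow writer; the column name, the Snappy compression choice and the output path are assumptions, not OpenWit's actual settings.

```rust
use std::fs::File;
use std::sync::Arc;
use arrow::array::{ArrayRef, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::Compression;
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("service", DataType::Utf8, false)]));
    let col: ArrayRef = Arc::new(StringArray::from(vec!["api", "api", "worker"]));
    let batch = RecordBatch::try_new(schema.clone(), vec![col])?;

    // Dictionary encoding shines on low-cardinality telemetry columns;
    // compression shrinks the file while keeping scans cheap.
    let props = WriterProperties::builder()
        .set_dictionary_enabled(true)
        .set_compression(Compression::SNAPPY)
        .build();

    let file = File::create("stable.parquet")?;
    let mut writer = ArrowWriter::try_new(file, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```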
Tiered caching for read performance
Hot and warm data lives close to the Search node. The tiers are named Electro for RAM, Pyro for SSD and Cryo for the object store. The intent is simple: serve the hottest data from memory, keep popular Parquet on disk and fall back to cloud storage only when needed. Expected access times are nanoseconds for Electro, microseconds for Pyro and seconds for Cryo.
Cache tiers at a glance

| Tier | Medium | Typical access | Used for |
|---|---|---|---|
| Electro | RAM | ns | Arrow batches for hottest windows |
| Pyro | SSD | µs | Frequently scanned Parquet |
| Cryo | Object store | s | Durable cold data and overflow |
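A sketch of the lookup order the table implies; the type and method names are illustrative, not OpenWit's API.

```rust
// Which tier served the request; useful for cache_hit_ratio-style metrics.
enum Tier {
    Electro, // RAM
    Pyro,    // local SSD
    Cryo,    // object store
}

struct TieredCache {}

impl TieredCache {
    /// Check RAM first, then SSD, then fall back to the object store.
    fn get(&self, file_id: &str) -> (Tier, Vec<u8>) {
        if let Some(bytes) = self.from_ram(file_id) {
            return (Tier::Electro, bytes); // ns-scale hit
        }
        if let Some(bytes) = self.from_ssd(file_id) {
            return (Tier::Pyro, bytes); // µs-scale hit
        }
        (Tier::Cryo, self.from_object_store(file_id)) // seconds-scale fallback
    }

    // Stubs standing in for the real tier backends.
    fn from_ram(&self, _: &str) -> Option<Vec<u8>> { None }
    fn from_ssd(&self, _: &str) -> Option<Vec<u8>> { None }
    fn from_object_store(&self, _: &str) -> Vec<u8> { Vec::new() }
}
```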
Smart pruning before scan
Search prunes aggressively before reading Parquet. It narrows by time range using metadata from Postgres, applies index ranges using zonemaps and bloom or loom filters, and lets DataFusion’s SQL planner optimize predicates from the query AST. Only matching files are fetched, which keeps I/O minimal and yields sub-second responses for common queries.
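An illustrative pruning pass over file metadata, run before any Parquet bytes are read; the field names here are assumptions:

```rust
// Per-file metadata as it might be recorded in Postgres.
struct FileMeta {
    min_ts: i64,
    max_ts: i64,
    zonemap_min: i64, // per-column min from the zonemap index
    zonemap_max: i64, // per-column max from the zonemap index
}

/// Keep only files whose time range and zonemap could possibly match.
fn prune(files: &[FileMeta], query_start: i64, query_end: i64, value: i64) -> Vec<&FileMeta> {
    files
        .iter()
        // time-range pruning from Postgres metadata
        .filter(|f| f.max_ts >= query_start && f.min_ts <= query_end)
        // zonemap pruning: skip files whose value range cannot contain `value`
        .filter(|f| value >= f.zonemap_min && value <= f.zonemap_max)
        .collect()
}
```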
Dual WALs for steady ingest and quick recovery
The short-term WAL accepts the batch and guarantees durability before acknowledgment. The long-term WAL aggregates by time, which helps recovery and background work. This protects throughput during failures and eliminates re-ingest work after a crash, since the pipeline can replay from the WAL and continue with Arrow conversion and Parquet writes.
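A sketch of the durability-before-ack contract for the short-term WAL, with an assumed file name and length-prefixed framing; the long-term aggregation is elided.

```rust
use std::fs::OpenOptions;
use std::io::Write;

/// Append the batch to the short-term WAL and fsync before acknowledging.
fn accept_batch(payload: &[u8]) -> std::io::Result<()> {
    let mut wal = OpenOptions::new()
        .create(true)
        .append(true)
        .open("short_term.wal")?;
    wal.write_all(&(payload.len() as u64).to_le_bytes())?; // length-prefixed record
    wal.write_all(payload)?;
    wal.sync_data()?; // the fsync boundary: durable before we acknowledge
    // ack the gateway here; Arrow conversion and Parquet writes proceed async
    Ok(())
}
```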
Storage path that keeps writes large and sequential
The Storage node merges incoming Arrow RecordBatches into an active Parquet file, rolls it to a stable Parquet file when it reaches the configured size and uploads it through OpenDAL. Large, sequential Parquet writes are friendlier to disks and cloud object stores, which helps both cost and speed. Metadata is updated so downstream components can discover files without scanning directories.
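A sketch of the roll decision, with an assumed threshold field; the actual OpenDAL upload and metadata update are elided.

```rust
// Tracks the active Parquet file as batches are merged into it.
struct ActiveFile {
    bytes_written: u64,
    roll_at_bytes: u64, // configured active -> stable threshold
}

impl ActiveFile {
    /// Record an appended batch; returns true when it is time to roll.
    fn append(&mut self, batch_bytes: u64) -> bool {
        self.bytes_written += batch_bytes;
        self.bytes_written >= self.roll_at_bytes
    }
}

fn on_batch(active: &mut ActiveFile, batch_bytes: u64) {
    if active.append(batch_bytes) {
        // close the active Parquet, upload via OpenDAL, record in metadata,
        // then start a fresh active file
        active.bytes_written = 0;
    }
}
```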
Indexes that reduce work
The Indexer builds artifacts per Parquet file: bitmap indexes for equality filters, bloom or loom filters for membership tests, zonemaps for numeric and time pruning and Tantivy for full-text search. Each index is uploaded and recorded in Postgres so the Search node can skip files that will not match, which reduces how much Parquet is read during a query.
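One plausible shape for these per-file artifacts; the variant names follow the list above, while the payloads are stand-ins.

```rust
// Illustrative index artifact kinds and what each one accelerates.
enum IndexArtifact {
    Bitmap { column: String },       // equality filters
    Bloom { column: String },        // membership tests
    Zonemap { column: String },      // numeric and time pruning
    Tantivy { fields: Vec<String> }, // full-text search
}

/// Built per Parquet file, then uploaded and recorded in Postgres so
/// the Search node can skip non-matching files.
fn artifacts_for(parquet_file: &str) -> Vec<IndexArtifact> {
    let _ = parquet_file;
    vec![
        IndexArtifact::Zonemap { column: "ts".into() },
        IndexArtifact::Bloom { column: "trace_id".into() },
    ]
}
```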
Operator knobs that influence performance
OpenWit is config-driven from one YAML file. Batching, deduplication thresholds, WAL behavior, Parquet size, indexing mode and cache tiers are all controlled from openwit-unified-control.yaml. Keep these aligned with your workload so the pipeline sees fewer small units and more large, compressible batches.
Examples of these settings
- Batch size and time window at the Gateway so gRPC handoffs remain large and predictable
- Short-term and long-term WAL thresholds so durability and aggregation match ingest rate
- Active to stable Parquet size so uploaded files are right sized for the object store
- Index types per dataset so pruning matches query patterns
- Cache tier sizing so hot working sets fit in Electro or Pyro
These levers map directly to throughput and latency; a sketch of how they might deserialize into typed configuration follows.
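The sketch below assumes serde-style field names modeled on the levers above, not the actual openwit-unified-control.yaml schema.

```rust
use serde::Deserialize;

// Hypothetical typed view of the performance-related YAML knobs.
#[derive(Deserialize)]
struct PerfConfig {
    batch_max_bytes: usize,         // e.g. 8388608 (8 MB)
    batch_max_records: usize,       // e.g. 10000
    batch_max_wait_ms: u64,         // Gateway time threshold
    wal_short_term_fsync: bool,     // durability before ack
    parquet_stable_size_bytes: u64, // active -> stable roll point
    index_types: Vec<String>,       // per-dataset pruning indexes
    cache_electro_bytes: u64,       // RAM tier sizing
    cache_pyro_bytes: u64,          // SSD tier sizing
}

fn main() {
    let yaml = r#"
batch_max_bytes: 8388608
batch_max_records: 10000
batch_max_wait_ms: 500
wal_short_term_fsync: true
parquet_stable_size_bytes: 268435456
index_types: [zonemap, bloom]
cache_electro_bytes: 4294967296
cache_pyro_bytes: 107374182400
"#;
    let cfg: PerfConfig = serde_yaml::from_str(yaml).expect("valid config");
    assert_eq!(cfg.batch_max_records, 10_000);
}
```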
Metrics that reflect performance
Each node exports Prometheus-compatible metrics. Watching these together gives a clear picture of throughput, backlog and query speed.
| Area | Metric name (example) | Indicates |
|---|---|---|
| Ingest | openwit_ingest_batches_total | Incoming batch volume |
| Ingest | wal_write_latency_ms | Durability cost per batch |
| Storage | openwit_storage_upload_latency_ms | Time to upload stable Parquet |
| Storage | active_file_size_bytes | Rolling size toward stable threshold |
| Indexer | openwit_index_build_duration_seconds | Build time per index artifact |
| Search | openwit_query_latency_ms | End-to-end query latency |
| Search | cache_hit_ratio | Effectiveness of Electro and Pyro |
| Control | openwit_nodes_healthy_total | Available capacity to route to |
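A sketch of exporting one of these counters with the prometheus crate; the metric name matches the table above, while the HTTP scrape endpoint is elided.

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() {
    let registry = Registry::new();
    let batches = IntCounter::new(
        "openwit_ingest_batches_total",
        "Incoming batch volume",
    )
    .unwrap();
    registry.register(Box::new(batches.clone())).unwrap();

    batches.inc(); // bump on every accepted batch

    // Render in the Prometheus text exposition format for scraping.
    let mut out = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut out).unwrap();
    println!("{}", String::from_utf8(out).unwrap());
}
```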
Quick reference table
| Optimization | Where it applies | How it helps |
|---|---|---|
| Zero-copy Arrow batches | Ingest, Storage | Avoids serialization and extra copies |
| Batch-oriented handoff (8 MB or 10k) | Gateway, Ingest | Fewer syscalls and fewer network calls |
| Async actors on Tokio | All nodes | Concurrency without blocking |
| Columnar compression in Parquet | Storage | Smaller files and faster scans |
| Tiered cache: Electro, Pyro, Cryo | Search | Serve hot data from RAM or SSD |
| Pruning by time, index and predicates | Search | Skip irrelevant files and reduce I/O |