Getting Started — Distributed Mode

Distributed Mode is the production-grade deployment: horizontally scalable and fault tolerant. Each functional component runs as a separate node or container, and nodes communicate over Gossip, gRPC, and Arrow Flight. You scale roles independently, apply health checks, and use service discovery.

What Distributed Mode is

In Distributed Mode, OpenWit becomes a self-orchestrating cluster of specialized nodes. The Control Node is the brain of the cluster: it coordinates, monitors, and routes. Every node runs from the same image and reads the same openwit-unified-control.yaml, then decides its role at startup via a --role flag or an environment variable.
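A minimal sketch of what that role selection could look like at startup, assuming a hypothetical Python entrypoint, a --role flag, and an OPENWIT_ROLE environment variable (the flag and variable names are illustrative; the shipped image defines the real ones):

  import argparse
  import os

  # Roles described in this guide; the authoritative set is defined by the OpenWit image.
  KNOWN_ROLES = {"control", "ingest", "storage", "indexer", "search", "proxy", "cache", "janitor"}

  def resolve_role() -> str:
      """Pick the node role: the --role flag wins, then the environment variable."""
      parser = argparse.ArgumentParser(description="OpenWit node entrypoint (illustrative)")
      parser.add_argument("--role", help="role this node should assume")
      args = parser.parse_args()
      role = args.role or os.environ.get("OPENWIT_ROLE", "")
      if role not in KNOWN_ROLES:
          raise SystemExit(f"unknown or missing role: {role!r}")
      return role

  if __name__ == "__main__":
      print(f"starting node as: {resolve_role()}")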

How it compares to Standalone Mode

In Standalone Mode every role runs inside a single process; in Distributed Mode the same roles run as separate nodes or containers, so each role can be scaled, monitored, and restarted independently.

Core Principles

Horizontal scalability: Scale each role independently to match workload.

  • Add Ingest nodes for higher ingest rates
  • Add Storage nodes for WAL throughput and upload bandwidth
  • Add Search nodes for query concurrency

Service specialization: Each role has a focused purpose.

  • Control manages cluster health and routing
  • Ingest receives data and writes WAL
  • Storage writes Parquet and uploads to object storage
  • Indexer builds and uploads index files
  • Search executes SQL with DataFusion and integrates Tantivy for text search
  • Proxy routes client queries
  • Cache keeps hot data in RAM or on disk
  • Janitor prunes expired data and reconciles metadata

Resilient communication fabric

  • Gossip via Chitchat for lightweight health and metadata
  • gRPC for control, node commands and metadata queries
  • Arrow Flight for bulk transfer of Arrow RecordBatches
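To get a feel for the bulk-transfer path, here is a minimal pyarrow.flight sketch that streams one Arrow RecordBatch to a Flight server. The endpoint, descriptor path, and schema are illustrative; the real Ingest-to-Storage protocol is internal to OpenWit:

  import pyarrow as pa
  import pyarrow.flight as flight

  # Illustrative endpoint; the real Storage address comes from service discovery.
  client = flight.FlightClient("grpc://storage-node:8815")

  schema = pa.schema([
      ("ts", pa.timestamp("us")),
      ("level", pa.string()),
      ("message", pa.string()),
  ])
  batch = pa.record_batch(
      [pa.array([1_700_000_000_000_000], pa.timestamp("us")),
       pa.array(["INFO"]),
       pa.array(["service started"])],
      schema=schema,
  )

  # do_put opens a streaming upload; Ingest would push many batches per flight.
  descriptor = flight.FlightDescriptor.for_path("logs", "2024-01-01")
  writer, _ = client.do_put(descriptor, schema)
  writer.write_batch(batch)
  writer.close()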

Centralized truth with decentralized execution

  • Postgres is the single source of truth for metadata such as WAL segments, Parquet files, index files, file versions and time ranges
  • Nodes keep local caches and state but reconcile with the catalog
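As an illustration of how nodes consult the catalog, the sketch below asks Postgres which Parquet files overlap a time window. The table and column names are hypothetical; the real schema comes from OpenWit's catalog:

  import psycopg2

  # Hypothetical catalog table: parquet_files(file_path, min_ts, max_ts).
  conn = psycopg2.connect("dbname=openwit host=postgres user=openwit")
  with conn, conn.cursor() as cur:
      cur.execute(
          "SELECT file_path, min_ts, max_ts "
          "FROM parquet_files "
          "WHERE max_ts >= %s AND min_ts <= %s",
          ("2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"),
      )
      for file_path, min_ts, max_ts in cur.fetchall():
          print(file_path, min_ts, max_ts)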

Durability through dual WALs

  • Short-term WAL provides immediate durability and safe offset commit
  • Long-term WAL aggregates by time for restore, reindexing and compaction
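The durability claim of the short-term WAL boils down to appending a record and fsyncing before the offset is acknowledged. A sketch of that discipline, with an illustrative record framing rather than OpenWit's actual on-disk format:

  import os
  import struct

  def wal_append(path: str, payload: bytes) -> None:
      """Append one length-prefixed record and fsync before acknowledging the write."""
      with open(path, "ab") as f:
          f.write(struct.pack("<I", len(payload)))  # 4-byte little-endian length prefix
          f.write(payload)
          f.flush()
          os.fsync(f.fileno())  # data reaches stable storage before the offset is committed

  wal_append("short-term.wal", b'{"level":"INFO","msg":"hello"}')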

Unified configuration and orchestration

  • All nodes ship from the same image
  • openwit-unified-control.yaml drives behavior and role selection
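A sketch of how a node might load the shared file and pick out its role-specific settings, assuming PyYAML and hypothetical top-level keys (the real key names are whatever openwit-unified-control.yaml defines):

  import os
  import yaml

  with open("openwit-unified-control.yaml") as f:
      config = yaml.safe_load(f)

  # Hypothetical layout: shared cluster settings plus one section per role.
  role = os.environ.get("OPENWIT_ROLE", "control")
  cluster = config.get("cluster", {})
  role_config = config.get("roles", {}).get(role, {})

  print("node role:", role)
  print("gossip seeds:", cluster.get("gossip_seeds", []))
  print("role settings:", role_config)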

High-level flow in the cluster

  1. Ingestion: Clients send logs, metrics and traces via Kafka, gRPC or HTTP. The gateway authenticates, validates schema, normalizes fields and batches events
  2. WAL and dedup: Ingest deduplicates events within a time window, writes the short-term WAL, records metadata, and aggregates into the long-term WAL for heavy tasks (see the dedup sketch after this list)
  3. Bulk transfer: Ingest converts batches to Arrow and streams them to Storage over Arrow Flight
  4. Materialization: Storage appends to the active Parquet file, rolls it to a stable Parquet file at the target size, uploads it to object storage, then records the file and its time range in Postgres
  5. Indexing: Indexer builds bitmap, bloom or loom, zonemap and Tantivy indexes for each file, uploads them, and updates the metadata
  6. Query: Proxy routes a query to Search. Search uses metadata and indexes to prune the file list, fetches index files first, then downloads only the required Parquet from cache or cloud and executes the query with DataFusion (see the pruning sketch after this list)
  7. Caching and retention: Cache holds hot data in RAM or on disk. Janitor enforces TTL and reconciles the catalog with object storage
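The dedup sketch referenced in step 2: duplicates inside a time window can be dropped by remembering a fingerprint of each recently seen event. The hashing scheme and window size here are illustrative, not OpenWit's implementation:

  import hashlib
  import time

  class WindowDeduper:
      """Drop events whose fingerprint was already seen inside the time window."""

      def __init__(self, window_seconds: float = 60.0):
          self.window = window_seconds
          self.seen = {}  # fingerprint -> first-seen monotonic timestamp

      def admit(self, event_bytes: bytes) -> bool:
          now = time.monotonic()
          # Evict fingerprints that have aged out of the window.
          self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
          fp = hashlib.sha256(event_bytes).hexdigest()
          if fp in self.seen:
              return False  # duplicate inside the window, skip the WAL write
          self.seen[fp] = now
          return True

  dedup = WindowDeduper(window_seconds=60.0)
  print(dedup.admit(b'{"msg":"hello"}'))  # True, first occurrence
  print(dedup.admit(b'{"msg":"hello"}'))  # False, duplicate inside the window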
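And the pruning sketch referenced in step 6: once metadata and indexes have narrowed the query to a handful of files, Search only has to run SQL over those. The sketch below uses the DataFusion Python binding with an illustrative cached file path:

  from datafusion import SessionContext

  # Suppose catalog and index lookups already pruned the query down to one
  # Parquet file sitting in the local cache (path and columns are illustrative).
  ctx = SessionContext()
  ctx.register_parquet("logs", "/cache/logs/2024-01-01-00.parquet")

  # DataFusion executes the SQL over only the pruned file.
  df = ctx.sql("SELECT level, count(*) AS n FROM logs GROUP BY level ORDER BY n DESC")
  for batch in df.collect():
      print(batch.to_pydict())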

Planning Tips

  • Co-locate Cache near Search for locality and lower latency
  • Place Storage where object storage egress is cheap and bandwidth is high
  • Scale roles by SLO. Watch ingest queue depth, WAL backlog, indexing lag, CPU and query latency percentiles