Getting Started — Distributed Mode
Distributed Mode is production grade, horizontally scalable and fault tolerant. Each functional component runs as a separate node or container. Nodes communicate over Gossip, gRPC and Arrow Flight. You scale roles independently, apply health checks and use service discovery.
What Distributed Mode is
OpenWit becomes a self-orchestrating cluster of specialized nodes. The Control Node is the brain that coordinates, monitors and routes. Every node runs from the same image and reads the same openwit-unified-control.yaml, then decides its role at startup with a --role flag or an environment variable.
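As a minimal sketch of that role-selection pattern (the --role flag and the shared YAML file come from the description above; the OPENWIT_ROLE environment variable name and the role set used here are illustrative assumptions, not the actual entrypoint):

```python
import argparse
import os

# Roles named elsewhere in this guide; the real entrypoints live inside the OpenWit image.
ROLES = {"control", "ingest", "storage", "indexer", "search", "proxy", "cache", "janitor"}

def main() -> None:
    parser = argparse.ArgumentParser(description="OpenWit node entrypoint (illustrative sketch)")
    # --role is documented above; OPENWIT_ROLE is an assumed fallback env var name.
    parser.add_argument("--role", default=os.environ.get("OPENWIT_ROLE"))
    parser.add_argument("--config", default="openwit-unified-control.yaml")
    args = parser.parse_args()

    if args.role not in ROLES:
        raise SystemExit(f"unknown or missing role: {args.role!r}")

    # A real node would now load the unified config and start the services for this role.
    print(f"starting {args.role} node with config {args.config}")

if __name__ == "__main__":
    main()
```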
How it compares to Standalone Mode
Core Principles
Horizontal scalability: Scale each role independently to match workload.
- Add Ingest nodes for higher ingest rates
- Add Storage nodes for WAL throughput and upload bandwidth
- Add Search nodes for query concurrency
Service specialization: Each role has a focused purpose.
- Control manages cluster health and routing
- Ingest receives data and writes WAL
- Storage writes Parquet and uploads to object storage
- Indexer builds and uploads index files
- Search executes SQL with DataFusion and integrates Tantivy for text search
- Proxy routes client queries
- Cache keeps hot data in RAM or on disk
- Janitor prunes expired data and reconciles metadata
Resilient communication fabric
- Gossip via Chitchat for lightweight health and metadata
- gRPC for control, node commands and metadata queries
- Arrow Flight for bulk transfer of Arrow RecordBatches
Centralized truth with decentralized execution
- Postgres is the single source of truth for metadata such as WAL, Parquet, indexes, file versions and time ranges
- Nodes keep local caches and state but reconcile with the catalog
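To make the catalog-as-source-of-truth idea concrete, here is a hedged sketch of a node asking Postgres for Parquet files that overlap a query's time range. The table and column names (parquet_files, file_path, min_ts, max_ts, version) and the connection string are assumptions for the example, not OpenWit's actual schema:

```python
from datetime import datetime, timedelta, timezone

import psycopg2  # any Postgres client works; psycopg2 is used for this sketch

query_end = datetime.now(timezone.utc)
query_start = query_end - timedelta(hours=1)

# Connection string and schema are illustrative only.
conn = psycopg2.connect("dbname=openwit_catalog user=openwit host=postgres")
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT file_path, min_ts, max_ts, version
        FROM parquet_files
        WHERE max_ts >= %s AND min_ts <= %s
        ORDER BY min_ts
        """,
        (query_start, query_end),
    )
    candidate_files = cur.fetchall()  # only these files need index or Parquet downloads
conn.close()
```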
Durability through dual WALs
- Short-term WAL provides immediate durability and safe offset commit
- Long-term WAL aggregates by time for restore, reindexing and compaction
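The sketch below shows the short-term WAL pattern described above: append a batch, fsync, and only then treat the source offset as committable. The newline-delimited JSON layout is an assumption chosen for readability, not OpenWit's on-disk format:

```python
import json
import os

def append_batch(wal_path: str, events: list[dict]) -> None:
    """Append a batch to the short-term WAL and fsync before the offset is committed."""
    with open(wal_path, "ab") as wal:
        for event in events:
            wal.write(json.dumps(event).encode("utf-8") + b"\n")
        wal.flush()
        os.fsync(wal.fileno())  # durability point: safe to commit the Kafka/gRPC offset after this

append_batch("short_term.wal", [{"ts": "2024-06-01T12:00:00Z", "level": "info", "msg": "hello"}])
```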
Unified configuration and orchestration
- All nodes ship from the same image
- openwit-unified-control.yaml drives behavior and role selection
High-level flow in the cluster
1. Ingestion: Clients send logs, metrics and traces via Kafka, gRPC or HTTP. The gateway authenticates, validates schema, normalizes fields and batches events
2. WAL and dedup: Ingest deduplicates within a time window. It writes the short-term WAL, records metadata, and aggregates to the long-term WAL for heavy tasks
3. Bulk transfer: Ingest converts to Arrow and streams batches to Storage over Arrow Flight (see the Flight sketch after this list)
4. Materialization: Storage appends to active Parquet, rolls to stable Parquet at target size, uploads to object storage, then records the file and time range in Postgres
5. Indexing: Indexer builds bitmap, bloom or loom, zonemap and Tantivy indexes for each file, uploads them, and updates metadata
6. Query: Proxy routes a query to Search. Search uses metadata and indexes to prune the file list, fetches index files first, then downloads only the required Parquet from cache or cloud, and executes with DataFusion (see the DataFusion sketch after this list)
7. Caching and retention: Cache holds hot data in RAM or on disk. Janitor enforces TTL and reconciles the catalog with object storage
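Step 3 moves Arrow RecordBatches from Ingest to Storage. The sketch below uses the standard pyarrow.flight client to show what such a do_put stream looks like; the endpoint address, descriptor path and columns are hypothetical and would come from the unified config and schema in practice:

```python
import pyarrow as pa
import pyarrow.flight as flight

# Hypothetical Storage-node Flight endpoint.
client = flight.connect("grpc://storage-node:50051")

batch = pa.RecordBatch.from_pydict({
    "ts": pa.array([1717243200000, 1717243201000], type=pa.int64()),
    "level": pa.array(["info", "error"]),
    "message": pa.array(["started", "disk full"]),
})

# The descriptor identifies the logical stream; the path here is illustrative.
descriptor = flight.FlightDescriptor.for_path("logs", "2024-06-01")
writer, _ = client.do_put(descriptor, batch.schema)
writer.write_batch(batch)
writer.close()
```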
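And for step 6, a minimal sketch of executing SQL over a locally cached Parquet file with the DataFusion Python bindings. The file path and table name are assumptions, and a real Search node runs DataFusion against the pruned file list from the catalog rather than a single hard-coded file:

```python
from datafusion import SessionContext

ctx = SessionContext()
# Register a Parquet file that the pruning step decided is relevant (path is illustrative).
ctx.register_parquet("logs", "/var/cache/openwit/logs-2024-06-01.parquet")

df = ctx.sql("SELECT level, count(*) AS n FROM logs GROUP BY level ORDER BY n DESC")
print(df.collect())  # list of Arrow RecordBatches with the aggregated result
```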
Planning Tips
- Co-locate Cache near Search for locality and lower latency
- Place Storage where object storage egress is cheap and bandwidth is high
- Scale roles by SLO. Watch ingest queue depth, WAL backlog, indexing lag, CPU and query latency percentiles