Getting Started — Distributed Mode

Distributed Mode is the production-grade deployment: horizontally scalable and fault tolerant. Each functional component runs as a separate node or container, and nodes communicate over Gossip, gRPC, and Arrow Flight. You scale roles independently, apply health checks, and use service discovery.

What Distributed Mode is

In Distributed Mode, OpenWit becomes a self-orchestrating cluster of specialized nodes. The Control Node is the brain of the cluster: it coordinates, monitors, and routes. Every node runs from the same image and reads the same openwit-unified-control.yaml, then decides its role at startup via a --role flag or an environment variable.
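A minimal sketch of what that role selection could look like at startup, assuming a hypothetical Python entrypoint, a --role flag, and an OPENWIT_ROLE environment variable (the flag and variable names are illustrative; the shipped image defines the real ones):

  import argparse
  import os

  # Roles described in this guide; the authoritative set is defined by the OpenWit image.
  KNOWN_ROLES = {"control", "ingest", "storage", "indexer", "search", "proxy", "cache", "janitor"}

  def resolve_role() -> str:
      """Pick the node role: the --role flag wins, then the environment variable."""
      parser = argparse.ArgumentParser(description="OpenWit node entrypoint (illustrative)")
      parser.add_argument("--role", help="role this node should assume")
      args = parser.parse_args()
      role = args.role or os.environ.get("OPENWIT_ROLE", "")
      if role not in KNOWN_ROLES:
          raise SystemExit(f"unknown or missing role: {role!r}")
      return role

  if __name__ == "__main__":
      print(f"starting node as: {resolve_role()}")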

How it compares to Standalone Mode

In Standalone Mode every role runs inside a single process; in Distributed Mode the same roles run as separate nodes or containers, so each role can be scaled, monitored, and restarted independently.

Core Principles

Horizontal scalability: Scale each role independently to match workload.

  • Add Ingest nodes for higher ingest rates
  • Add Storage nodes for WAL throughput and upload bandwidth
  • Add Search nodes for query concurrency

Service specialization: Each role has a focused purpose.

  • Control manages cluster health and routing
  • Ingest receives data and writes WAL
  • Storage writes Parquet and uploads to object storage
  • Indexer builds and uploads index files
  • Search executes SQL with DataFusion and integrates Tantivy for text search
  • Proxy routes client queries
  • Cache keeps hot data in RAM or on disk
  • Janitor prunes expired data and reconciles metadata

Resilient communication fabric

  • Gossip via Chitchat for lightweight health and metadata
  • gRPC for control, node commands and metadata queries
  • Arrow Flight for bulk transfer of Arrow RecordBatches
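To get a feel for the bulk-transfer path, here is a minimal pyarrow.flight sketch that streams one Arrow RecordBatch to a Flight server. The endpoint, descriptor path, and schema are illustrative; the real Ingest-to-Storage protocol is internal to OpenWit:

  import pyarrow as pa
  import pyarrow.flight as flight

  # Illustrative endpoint; the real Storage address comes from service discovery.
  client = flight.FlightClient("grpc://storage-node:8815")

  schema = pa.schema([
      ("ts", pa.timestamp("us")),
      ("level", pa.string()),
      ("message", pa.string()),
  ])
  batch = pa.record_batch(
      [pa.array([1_700_000_000_000_000], pa.timestamp("us")),
       pa.array(["INFO"]),
       pa.array(["service started"])],
      schema=schema,
  )

  # do_put opens a streaming upload; Ingest would push many batches per flight.
  descriptor = flight.FlightDescriptor.for_path("logs", "2024-01-01")
  writer, _ = client.do_put(descriptor, schema)
  writer.write_batch(batch)
  writer.close()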

Centralized truth with decentralized execution

  • Postgres is the single source of truth for metadata such as WAL segments, Parquet files, index files, file versions and time ranges
  • Nodes keep local caches and state but reconcile with the catalog
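As an illustration of how nodes consult the catalog, the sketch below asks Postgres which Parquet files overlap a time window. The table and column names are hypothetical; the real schema comes from OpenWit's catalog:

  import psycopg2

  # Hypothetical catalog table: parquet_files(file_path, min_ts, max_ts).
  conn = psycopg2.connect("dbname=openwit host=postgres user=openwit")
  with conn, conn.cursor() as cur:
      cur.execute(
          "SELECT file_path, min_ts, max_ts "
          "FROM parquet_files "
          "WHERE max_ts >= %s AND min_ts <= %s",
          ("2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z"),
      )
      for file_path, min_ts, max_ts in cur.fetchall():
          print(file_path, min_ts, max_ts)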

Durability through dual WALs

  • Short-term WAL provides immediate durability and safe offset commit
  • Long-term WAL aggregates by time for restore, reindexing and compaction
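The durability claim of the short-term WAL boils down to appending a record and fsyncing before the offset is acknowledged. A sketch of that discipline, with an illustrative record framing rather than OpenWit's actual on-disk format:

  import os
  import struct

  def wal_append(path: str, payload: bytes) -> None:
      """Append one length-prefixed record and fsync before acknowledging the write."""
      with open(path, "ab") as f:
          f.write(struct.pack("<I", len(payload)))  # 4-byte little-endian length prefix
          f.write(payload)
          f.flush()
          os.fsync(f.fileno())  # data reaches stable storage before the offset is committed

  wal_append("short-term.wal", b'{"level":"INFO","msg":"hello"}')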

Unified configuration and orchestration

  • All nodes ship from the same image
  • openwit-unified-control.yaml drives behavior and role selection
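A sketch of how a node might load the shared file and pick out its role-specific settings, assuming PyYAML and hypothetical top-level keys (the real key names are whatever openwit-unified-control.yaml defines):

  import os
  import yaml

  with open("openwit-unified-control.yaml") as f:
      config = yaml.safe_load(f)

  # Hypothetical layout: shared cluster settings plus one section per role.
  role = os.environ.get("OPENWIT_ROLE", "control")
  cluster = config.get("cluster", {})
  role_config = config.get("roles", {}).get(role, {})

  print("node role:", role)
  print("gossip seeds:", cluster.get("gossip_seeds", []))
  print("role settings:", role_config)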

High-level flow in the cluster

  1. Ingestion: Clients send logs, metrics and traces via Kafka, gRPC or HTTP. The gateway authenticates, validates schema, normalizes fields and batches events
  2. WAL and dedup: Ingest deduplicates events within a time window, writes the short-term WAL, records metadata, and aggregates into the long-term WAL for heavy tasks (see the dedup sketch after this list)
  3. Bulk transfer: Ingest converts batches to Arrow and streams them to Storage over Arrow Flight
  4. Materialization: Storage appends to the active Parquet file, rolls it to a stable Parquet file at the target size, uploads it to object storage, then records the file and its time range in Postgres
  5. Indexing: Indexer builds bitmap, bloom or loom, zonemap and Tantivy indexes for each file, uploads them, and updates the metadata
  6. Query: Proxy routes a query to Search. Search uses metadata and indexes to prune the file list, fetches index files first, then downloads only the required Parquet from cache or cloud and executes the query with DataFusion (see the pruning sketch after this list)
  7. Caching and retention: Cache holds hot data in RAM or on disk. Janitor enforces TTL and reconciles the catalog with object storage
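The dedup sketch referenced in step 2: duplicates inside a time window can be dropped by remembering a fingerprint of each recently seen event. The hashing scheme and window size here are illustrative, not OpenWit's implementation:

  import hashlib
  import time

  class WindowDeduper:
      """Drop events whose fingerprint was already seen inside the time window."""

      def __init__(self, window_seconds: float = 60.0):
          self.window = window_seconds
          self.seen = {}  # fingerprint -> first-seen monotonic timestamp

      def admit(self, event_bytes: bytes) -> bool:
          now = time.monotonic()
          # Evict fingerprints that have aged out of the window.
          self.seen = {k: t for k, t in self.seen.items() if now - t < self.window}
          fp = hashlib.sha256(event_bytes).hexdigest()
          if fp in self.seen:
              return False  # duplicate inside the window, skip the WAL write
          self.seen[fp] = now
          return True

  dedup = WindowDeduper(window_seconds=60.0)
  print(dedup.admit(b'{"msg":"hello"}'))  # True, first occurrence
  print(dedup.admit(b'{"msg":"hello"}'))  # False, duplicate inside the window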
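And the pruning sketch referenced in step 6: once metadata and indexes have narrowed the query to a handful of files, Search only has to run SQL over those. The sketch below uses the DataFusion Python binding with an illustrative cached file path:

  from datafusion import SessionContext

  # Suppose catalog and index lookups already pruned the query down to one
  # Parquet file sitting in the local cache (path and columns are illustrative).
  ctx = SessionContext()
  ctx.register_parquet("logs", "/cache/logs/2024-01-01-00.parquet")

  # DataFusion executes the SQL over only the pruned file.
  df = ctx.sql("SELECT level, count(*) AS n FROM logs GROUP BY level ORDER BY n DESC")
  for batch in df.collect():
      print(batch.to_pydict())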

Planning Tips

  • Co-locate Cache near Search for locality and lower latency
  • Place Storage where object storage egress is cheap and bandwidth is high
  • Scale roles by SLO. Watch ingest queue depth, WAL backlog, indexing lag, CPU and query latency percentiles