Executive Summary

The Query Pipeline is the read path that turns a client request into results. It accepts SQL over REST or gRPC at the Proxy, validates and routes the request, plans the query on the Search node, and uses metadata and index artifacts to prune irrelevant data. It then fetches only the needed Parquet fragments from cache or cloud, executes with DataFusion locally or with Ballista in distributed mode, and streams Arrow RecordBatches or JSON back to the client. The design targets low latency and low I/O by pruning early, fetching precisely, and running in parallel.

Scope, goals, and constraints

Scope covers the full read path: ingress at Proxy, planning, index-driven pruning, fetch through tiered caches or cloud, execution, and result streaming.

The goals are fast answers for time-bounded queries, minimal bytes scanned through index and metadata pruning, and a single path that serves both SQL analytics and full-text search.

Constraints that set expectations include many large files (so per-file lookup must be cheap), possible indexing lag before index artifacts become available, and multi-tenant spikes that require fairness and limits; predictable performance under these conditions relies on caching and autoscaling.

Routing and ingress

The Proxy authenticates, applies light validation, and forwards to the Search node. It can route by query type (for example, SQL versus full-text) and by cache warmness so that hot requests land on warm nodes. The pipeline explicitly supports SQL over REST and gRPC as entry points.
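
A minimal routing sketch under stated assumptions: QueryKind, SearchNode, NodeRegistry, and the warmth flag are hypothetical names used only to illustrate the "route by query type, prefer warm nodes" decision; they are not part of the documented API.

    // Illustrative only: these types are hypothetical stand-ins for the
    // Proxy's real routing state.
    enum QueryKind {
        Sql,
        FullText,
    }

    struct SearchNode {
        id: String,
        warm_for_tenant: bool, // true if this node likely has the tenant's data cached
    }

    struct NodeRegistry {
        sql_nodes: Vec<SearchNode>,
        text_nodes: Vec<SearchNode>,
    }

    // Pick a target Search node: first by query type, then prefer a warm node,
    // falling back to any node in the pool.
    fn route(kind: QueryKind, registry: &NodeRegistry) -> Option<&SearchNode> {
        let pool = match kind {
            QueryKind::Sql => &registry.sql_nodes,
            QueryKind::FullText => &registry.text_nodes,
        };
        pool.iter()
            .find(|n| n.warm_for_tenant)
            .or_else(|| pool.first())
    }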

High-level architecture

Inside the Search node the Planner parses SQL to AST, builds a logical plan, and applies rewrites like predicate and projection pushdown. A Metastore client queries Postgres for candidate files in the requested time window and returns file metadata with row-group boundaries.
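
A sketch of the candidate-file lookup, assuming a hypothetical files table with min_ts, max_ts, and row-group boundary columns; the actual Metastore schema is not specified here, only that Postgres returns file metadata with row-group boundaries for the requested time window.

    // Hypothetical schema: table and column names are illustrative, not the
    // documented Metastore layout. The key idea is the time-window overlap
    // predicate, which the time-bound index can serve cheaply.
    const CANDIDATE_FILES_SQL: &str = "
        SELECT file_uri, min_ts, max_ts, row_group_offsets
        FROM files
        WHERE min_ts <= $2   -- file starts before the window ends
          AND max_ts >= $1   -- file ends after the window starts
    ";

    // Candidate file metadata handed back to the planner.
    struct CandidateFile {
        file_uri: String,
        min_ts: i64,                 // e.g. microseconds since epoch
        max_ts: i64,
        row_group_offsets: Vec<u64>, // byte offsets of row-group boundaries
    }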

The Index Loader obtains zone maps, bloom or bitmap artifacts, and Tantivy segments from cache or cloud. The Fetcher shapes reads into precise file-part requests and pulls only the required byte ranges through cache tiers or object storage. Execution runs on DataFusion or on Ballista when distributed compute is enabled.

Results are assembled and streamed back to the client. Conceptually: Client → Proxy → Search (Planner) → Metastore → Index Loader → Fetcher → Execution → Result.

End-to-end flow

  1. Ingress. Client sends SQL or a search expression to Proxy. Proxy authenticates, applies rate limits, and forwards to Search. Routing can prefer warm nodes.
  2. Plan build. Planner parses SQL to AST, builds a DataFusion logical plan, and runs rewrites such as predicate and projection pushdown. If no time filter is present, it applies a configurable default lookback so the system does not scan unbounded history.
  3. Candidate discovery. A Metastore query to Postgres, indexed on time bounds, lists the files that overlap the window and returns their row-group boundaries so reads can be precise.
  4. Pruning cascade. The planner applies pruning in order of increasing cost: zone maps for coarse min-max elimination, then bloom or bitmap artifacts for equality or small IN predicates, then Tantivy for text predicates. This sequence drops files and row groups before heavy reads.
  5. Read shaping. After pruning, the planner builds FilePartRequest entries with the file URI, row-group indices, byte offsets, and the exact columns needed; a sketch of this structure follows the list.
  6. Fetch via cache or cloud. The Fetcher checks the tiered cache in order: Electro for Arrow in RAM, Pyro for local Parquet on disk, Cryo for cloud object storage. It uses Arrow Flight for cached reads, otherwise streams from cloud with ranged reads, bounded concurrency, and optional prefetch, and it applies per-tenant and per-node read limits to protect disks and networks.
  7. Execution. The query runs on DataFusion with projection and filter pushdown, or is submitted to Ballista for distributed execution where workers reach the cache or object store. Text queries prefer the pre-filter approach: Tantivy returns candidate file or row ids, which are intersected with the other predicates before the scan.
  8. Results. Results are assembled and streamed back to the client as Arrow RecordBatches or JSON, with support for pagination. Hot results or loaded batches can optionally be kept in cache for reuse.
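
A minimal sketch of the read-shaping output from step 5. Only the listed fields (file URI, row-group indices, byte offsets, columns) come from the description above; the exact field types and the helper function are assumptions.

    // One precise read request produced after pruning. Field names beyond the
    // description above are illustrative.
    #[derive(Clone)]
    struct ByteRange {
        start: u64,
        end: u64, // exclusive
    }

    struct FilePartRequest {
        file_uri: String,            // object-store or local path
        row_groups: Vec<usize>,      // surviving row-group indices after pruning
        byte_ranges: Vec<ByteRange>, // exact ranges to fetch for those groups
        columns: Vec<String>,        // projected columns only
    }

    // Read shaping: turn surviving (file, row groups, ranges) tuples into
    // precise fetch requests carrying only the projected columns.
    fn shape_reads(
        survivors: &[(String, Vec<usize>, Vec<ByteRange>)],
        columns: &[String],
    ) -> Vec<FilePartRequest> {
        survivors
            .iter()
            .map(|(uri, groups, ranges)| FilePartRequest {
                file_uri: uri.clone(),
                row_groups: groups.clone(),
                byte_ranges: ranges.clone(),
                columns: columns.to_vec(),
            })
            .collect()
    }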

Index-driven optimization

  • Zone maps are stored per row group and provide min and max for selected columns. The planner can drop row groups that cannot match a range filter without opening files.
  • Bloom or bitmap artifacts help skip files or row groups for equality and small IN filters. Bitmap results also identify the matching groups or rows, which further reduces bytes read.
  • Tantivy serves full-text predicates and returns candidate document, row, or file references that the planner intersects with other filters to prune early.

The planner combines these signals with time pruning and SQL predicate pushdown to keep bytes scanned low and latencies predictable.
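
A self-contained sketch of the zone-map check for a range filter. The struct layout is an assumption; the overlap test itself is the standard min/max elimination described above, which decides skippability without opening any file.

    // Per-row-group zone map for one column: min and max values.
    #[derive(Clone, Copy)]
    struct ZoneMap {
        min: i64,
        max: i64,
    }

    // A row group can be skipped when its [min, max] range cannot intersect
    // the filter range [lo, hi].
    fn row_group_may_match(zone: ZoneMap, lo: i64, hi: i64) -> bool {
        zone.max >= lo && zone.min <= hi
    }

    // Keep only the row-group indices whose zone maps can satisfy the filter.
    fn prune_row_groups(zones: &[ZoneMap], lo: i64, hi: i64) -> Vec<usize> {
        zones
            .iter()
            .enumerate()
            .filter(|(_, z)| row_group_may_match(**z, lo, hi))
            .map(|(i, _)| i)
            .collect()
    }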

Tiered cache and fetch strategy

Cache tiers are explicit and ordered. Electro holds Arrow RecordBatches in RAM for the fastest access. Pyro keeps pre-downloaded Parquet on local disk for fast reads. Cryo is the object store for cold data. The Fetcher uses Arrow Flight for cached reads and otherwise performs ranged downloads from cloud. It fetches parts in parallel across files and row groups with a bounded worker pool, can prefetch adjacent groups when locality is likely, and respects per-tenant and per-node concurrency limits.
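
A sketch of the tier order, under the assumption of simplified stand-in interfaces; the real Electro, Pyro, and Cryo APIs are not shown in this document, only their ordering and roles.

    // Simplified stand-ins: Electro returns in-memory Arrow batches, Pyro
    // returns a local Parquet path, Cryo performs a ranged read from object
    // storage. The real interfaces are not specified here.
    struct ArrowBatches;         // placeholder for Vec<RecordBatch>
    struct LocalParquet(String); // placeholder for a local file path

    enum Fetched {
        FromElectro(ArrowBatches),
        FromPyro(LocalParquet),
        FromCryo(Vec<u8>),       // raw bytes from a ranged read
    }

    fn electro_get(_key: &str) -> Option<ArrowBatches> { None }  // RAM tier
    fn pyro_get(_key: &str) -> Option<LocalParquet> { None }     // local disk tier
    fn cryo_ranged_read(_uri: &str, _start: u64, _end: u64) -> Vec<u8> { Vec::new() }

    // Check tiers in order: Electro (RAM) -> Pyro (disk) -> Cryo (object store).
    fn fetch_part(key: &str, uri: &str, start: u64, end: u64) -> Fetched {
        if let Some(batches) = electro_get(key) {
            return Fetched::FromElectro(batches);
        }
        if let Some(parquet) = pyro_get(key) {
            return Fetched::FromPyro(parquet);
        }
        Fetched::FromCryo(cryo_ranged_read(uri, start, end))
    }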

Execution specifics and full-text integration

Local runs build a DataFusion SessionContext, register only the selected row groups, and push projections and filters into the Parquet readers. Heavy queries are translated into a Ballista plan and run on a cluster where workers can read from cache or cloud. Full-text search integrates as a pre-filter, which reduces scan sizes before DataFusion executes the plan.
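
A minimal local-execution sketch using DataFusion's public API. It registers the pruned Parquet files rather than individual row groups, which is a simplification of the row-group-level registration described above; table names and the function signature are illustrative.

    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::error::Result;
    use datafusion::prelude::*;

    // Simplified: register the surviving Parquet files and let DataFusion push
    // projections and filters into the Parquet readers. The real pipeline goes
    // further and restricts reads to the surviving row groups.
    async fn run_local(pruned_files: &[String], sql: &str) -> Result<Vec<RecordBatch>> {
        let ctx = SessionContext::new();
        for (i, path) in pruned_files.iter().enumerate() {
            // One table per pruned file; a single table over all files would
            // also work if they share a schema.
            let name = format!("t{i}");
            ctx.register_parquet(name.as_str(), path.as_str(), ParquetReadOptions::default())
                .await?;
        }
        let df = ctx.sql(sql).await?; // logical plan with pushdown rewrites
        df.collect().await            // execute and collect Arrow RecordBatches
    }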

Observability and performance posture

The module emits traces and metrics for plan build, index prune, fetch, execution, and result assembly. Example Search metrics include openwit_query_latency_ms and cache_hit_ratio. The performance posture is consistent: prune early using metadata and indexes, fetch only what is needed via Electro or Pyro or Cryo, and execute in parallel on a columnar engine.
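
A sketch of stage-level instrumentation using the tracing crate. The stage names mirror the list above; the span and field names are assumptions, since the document names the metrics but not the emission code.

    use std::time::Instant;
    use tracing::{info, info_span};

    // Wrap each pipeline stage (plan build, index prune, fetch, execution,
    // result assembly) in a span so traces show where time is spent.
    fn instrumented_stage<T>(stage: &str, f: impl FnOnce() -> T) -> T {
        let span = info_span!("query_stage", stage = stage);
        let _guard = span.enter();
        let start = Instant::now();
        let out = f();
        // Emit a latency event; a metrics backend could also record this value
        // under a histogram such as openwit_query_latency_ms.
        info!(stage = stage, latency_ms = start.elapsed().as_millis() as u64, "stage complete");
        out
    }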

Configuration touch-points

Relevant knobs exist for routing and cache behavior in the search tier, and the docs call out that Parquet statistics and row-group sizing materially affect pruning efficiency. These settings tune query cost and latency without changing upstream ingestion or storage behavior.