Executive Summary
HTTP is one of the three entry points into the OpenWit Ingestion Pipeline, alongside Kafka and gRPC. It accepts incoming POST requests, runs them through the same gateway checks as the other sources, batches events by size or time, forwards them to the Ingest node, persists them to the dual WAL, converts them to Arrow, and streams them to Storage via Arrow Flight.
1. Where HTTP fits
- Role: Optional REST-based ingestion path for telemetry. Producers send HTTP POST requests to the gateway.
- Primary use: Canaries and validation of the downstream path before ramping up with gRPC or Kafka.
2. End-to-end flow
A. Client → Gateway (HTTP POST)
The gateway authenticates requests, enforces quotas, validates schema, normalizes field names, and buffers events into per-tenant or per-source buffers. If validation fails the gateway returns a 4xx. If overloaded it applies backpressure, which can surface as rate limiting. Batching is triggered by batch size or timeout.
Important: batches accumulated by the gateway are held as JSON or Arrow buffers, then serialized to gRPC for the handoff to the Ingest node. This is why the general gRPC message-size guardrail still applies to HTTP ingestion.
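A minimal client-side sketch of this handoff, assuming a JSON body and a hypothetical /v1/ingest route on the documented port 4318 (the path, auth header, and field names are illustrative, not OpenWit's published API):

```python
import requests

# Hypothetical endpoint and payload shape; adjust to the gateway's actual
# ingest route, auth scheme, and event schema.
GATEWAY_URL = "http://gateway.example.com:4318/v1/ingest"

events = [
    {"tenant": "acme", "source": "checkout", "ts": 1718000000000,
     "level": "INFO", "message": "order created"},
]

resp = requests.post(
    GATEWAY_URL,
    json=events,
    headers={"Authorization": "Bearer <token>"},  # gateway authenticates the request
    timeout=5,
)

# The gateway returns 4xx on validation failures and may surface backpressure
# as rate limiting (e.g. 429) when it is overloaded.
resp.raise_for_status()
```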
B. Gateway → Ingest (bulk handoff)
Once a batch reaches the configured threshold, the gateway forwards it to the Ingest node. The Ingest node assembles MessageBatch objects, enforces buffer ceilings on messages and bytes, and applies windowed deduplication. If buffer limits are reached, it signals backpressure upstream.
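A rough sketch of how buffer ceilings and windowed deduplication could interact, assuming events carry a unique id and a time-based dedup window (the class name, limits, and eviction policy are illustrative, not the actual Ingest node code):

```python
import time
from collections import OrderedDict

class IngestBuffer:
    """Illustrative in-memory batch buffer with message/byte ceilings and a
    sliding dedup window. Not OpenWit's actual implementation."""

    def __init__(self, max_messages=10_000, max_bytes=8 * 1024 * 1024, dedup_window_s=60):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.dedup_window_s = dedup_window_s
        self.seen = OrderedDict()   # event_id -> arrival time, oldest first
        self.batch = []
        self.batch_bytes = 0

    def offer(self, event_id: str, payload: bytes) -> bool:
        """Return False to signal backpressure upstream when a ceiling is hit."""
        now = time.monotonic()
        # Evict dedup entries that have fallen out of the window.
        while self.seen and next(iter(self.seen.values())) < now - self.dedup_window_s:
            self.seen.popitem(last=False)
        if event_id in self.seen:
            return True   # duplicate within the window: accept and drop silently
        if len(self.batch) >= self.max_messages or self.batch_bytes + len(payload) > self.max_bytes:
            return False  # buffer ceiling reached: backpressure upstream
        self.seen[event_id] = now
        self.batch.append(payload)
        self.batch_bytes += len(payload)
        return True
```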
C. Durability with dual WAL
OpenWit writes the batch to both the short-term WAL and the long-term WAL. The short-term write must succeed before the request is acknowledged, which gives at-least-once durability at the ingestion boundary. WAL rotation, retention, and the sync_on_write trade-off are configured in the processing and WAL sections.
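A simplified sketch of the write-before-ack contract on the short-term WAL, assuming length-prefixed appends to a segment file and a sync_on_write flag (the framing and function are illustrative):

```python
import os

def append_and_ack(segment_path: str, record: bytes, sync_on_write: bool = True) -> bool:
    """Append one batch record to the short-term WAL segment; only return True
    (i.e. acknowledge to the producer) once the write is durable."""
    try:
        with open(segment_path, "ab") as seg:
            seg.write(len(record).to_bytes(4, "big"))  # length-prefixed framing
            seg.write(record)
            seg.flush()
            if sync_on_write:
                os.fsync(seg.fileno())  # trade latency for durability
    except OSError:
        return False  # no ack: the gateway sees this as backpressure
    return True       # safe to acknowledge the HTTP request
```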
D. Arrow conversion and Arrow Flight to Storage
After WAL success, the Ingest node converts the batch to an Arrow RecordBatch and sends it to a Storage endpoint chosen by the control plane. Arrow Flight is the transport. Keep batch_size × average event size well below the configured gRPC max message size to avoid oversize errors and leave headroom for Arrow metadata.
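A minimal sketch of that handoff using pyarrow's Flight client, with a placeholder storage address and descriptor path standing in for whatever the control plane returns:

```python
import pyarrow as pa
import pyarrow.flight as flight

# Build a RecordBatch from the flushed batch (columns are illustrative).
batch = pa.record_batch({
    "tenant": pa.array(["acme", "acme"]),
    "ts": pa.array([1718000000000, 1718000000123], type=pa.int64()),
    "message": pa.array(["order created", "order shipped"]),
})

# Storage endpoint chosen by the control plane; the address is a placeholder.
client = flight.FlightClient("grpc://storage-node-1:50051")
descriptor = flight.FlightDescriptor.for_path("ingest/acme")

writer, _ = client.do_put(descriptor, batch.schema)
writer.write_batch(batch)  # streamed over gRPC; keep batches under the max message size
writer.close()
```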
E. Storage
The Storage node appends Arrow data to an active Parquet file, rolls it to a stable Parquet file by size or time, uploads the file to object storage via OpenDAL, and then records the file metadata in Postgres. Indexing is triggered afterwards.
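To make the roll decision concrete, here is an illustrative sketch using pyarrow's Parquet writer; the thresholds, class, and paths are assumptions, and the real node performs the upload via OpenDAL and the Postgres metadata insert after the roll:

```python
import time
import pyarrow as pa
import pyarrow.parquet as pq

MAX_ACTIVE_BYTES = 256 * 1024 * 1024  # illustrative: roll at ~256 MB
MAX_ACTIVE_AGE_S = 300                # illustrative: or after 5 minutes

class ActiveParquet:
    """Illustrative active-file wrapper that rolls by size or time."""

    def __init__(self, path: str, schema: pa.Schema):
        self.writer = pq.ParquetWriter(path, schema)
        self.opened_at = time.monotonic()
        self.bytes_written = 0

    def append(self, batch: pa.RecordBatch) -> bool:
        """Append a batch; return True when the file should roll to stable."""
        self.writer.write_batch(batch)
        self.bytes_written += batch.nbytes
        return (self.bytes_written >= MAX_ACTIVE_BYTES
                or time.monotonic() - self.opened_at >= MAX_ACTIVE_AGE_S)

    def roll(self):
        """Close the active file; upload and metadata registration happen next."""
        self.writer.close()
```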
3. HTTP receiver configuration
Your configuration PDF calls out the HTTP block explicitly. These settings control where the HTTP server listens, how much concurrency it allows, the request timeout, and the maximum payload size. A separate proxy.http section repeats these controls when the proxy is the external entry point.
| Group | Keys | Explanation |
|---|---|---|
| Enablement & port | enabled, port (default shown as 4318 in your doc) | Toggle HTTP ingestion and define the listening port. |
| Concurrency | max_concurrent_requests (doc shows 10k) | Upper bound on the number of in-flight HTTP requests. High values need enough threads and memory. |
| Timeouts | request_timeout_ms | Per-request timeout at the HTTP layer. Tune based on upstream speed and batch flushing. |
| Payload limits | max_payload_size_mb | Hard cap on HTTP body size to protect memory. Keep this aligned with downstream gRPC and batching limits. |
Note: The HTTP frontend is good for canaries, and gRPC/Kafka are preferred for sustained high-throughput ingestion. If you do enable high concurrency on HTTP, ensure node threads and memory are sized appropriately.
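For orientation, the keys from the table above could be pictured as the following structure; the port and concurrency values mirror the documented defaults, while the timeout and payload cap are placeholders to be tuned per the guidance above:

```python
# Illustrative shape of the HTTP receiver block; key names follow the table
# above, values are the documented defaults or placeholders.
http_receiver = {
    "enabled": True,
    "port": 4318,                       # documented default listening port
    "max_concurrent_requests": 10_000,  # needs matching threads/memory on the node
    "request_timeout_ms": 30_000,       # placeholder; tune to upstream flush speed
    "max_payload_size_mb": 10,          # placeholder; align with gRPC/batching limits
}
```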
4. Batching and size guardrails
The same sizing rule called out in the ingestion pipeline applies here. The gateway flushes by batch size or timeout, then serializes the batch to gRPC for the Ingest node. Keep batch_size × average event size < gRPC max message size, and ensure the active batches fit within the processing.buffer ceilings so the system does not thrash or reject under load.
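A quick way to sanity-check that rule before enabling HTTP, using an illustrative 4 MB gRPC cap (a common gRPC default, not necessarily OpenWit's configured value) and 25% headroom for Arrow metadata:

```python
def batch_fits(batch_size: int, avg_event_bytes: int,
               grpc_max_bytes: int = 4 * 1024 * 1024,
               headroom: float = 0.25) -> bool:
    """True if batch_size x average event size stays under the gRPC cap
    with the requested headroom for Arrow metadata and serialization."""
    return batch_size * avg_event_bytes <= grpc_max_bytes * (1 - headroom)

# Example: 2,000 events averaging 1 KiB each is ~2 MiB, which fits under a
# 4 MiB cap with 25% headroom (3 MiB budget); 8,000 such events do not.
print(batch_fits(2_000, 1_024))  # True
print(batch_fits(8_000, 1_024))  # False
```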
5. Operational signals to watch
- Gateway pressure: buffer depth, memory usage, and backpressure events when reaching limits.
- WAL health: WAL write and fsync latency, backlog, segment counts, disk usage in the WAL directory.
- Arrow Flight: send latency, serialization overhead, endpoint selection errors.
6. Failure modes and recovery
| Scenario | Behavior | Recovery |
|---|---|---|
| Short-term WAL write fails | Request is not acknowledged; backpressure grows at the gateway | Fix disk or WAL path; ingestion resumes and acks once WAL succeeds |
| Arrow Flight send failure | Batch remains in WAL until transfer succeeds | Router retries and may select a different storage endpoint |
| Downstream slow or stalled | Gateway and Ingest apply backpressure to protect buffers | Watch buffer_full and memory gauges; reduce concurrency or batch size until stable |
7. Quick checklist before enabling HTTP
- Confirm limits: set max_payload_size_mb to a safe value and align it with downstream gRPC limits.
- Right-size concurrency: max_concurrent_requests should match the threads and memory on the node.
- Prefer gRPC/Kafka for load: keep HTTP for canaries and validation.
- Keep batching safe: batch size × average event size must stay below the gRPC max; leave headroom for Arrow metadata.