Executive Summary
HTTP is one of the three entry points into the OpenWit Ingestion Pipeline, alongside Kafka and gRPC. It accepts incoming POST requests, runs them through the same gateway checks as the other sources, batches events by size or time, forwards them to the Ingest node, persists them to the dual WAL, converts them to Arrow, and streams them to Storage via Arrow Flight.
1. Where HTTP fits
- Role: Optional REST-based ingestion path for telemetry. Producers send HTTP POST requests to the gateway.
- Primary use: Canaries and validation of the downstream path before ramping up with gRPC or Kafka.
2. End-to-end flow
A. Client → Gateway (HTTP POST)
The gateway authenticates requests, enforces quotas, validates schema, normalizes field names, and buffers events into per-tenant or per-source buffers. If validation fails the gateway returns a 4xx. If overloaded it applies backpressure, which can surface as rate limiting. Batching is triggered by batch size or timeout.
Important: batches accumulated by the gateway are held as JSON or Arrow buffers, then serialized to gRPC for the handoff to the Ingest node. This is why the general gRPC message-size guardrail still applies to HTTP ingestion.
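A minimal client-side sketch of this handoff, assuming a JSON body and a hypothetical /v1/ingest route on the documented port 4318 (the path, auth header, and field names are illustrative, not OpenWit's published API):

```python
import requests

# Hypothetical endpoint and payload shape; adjust to the gateway's actual
# ingest route, auth scheme, and event schema.
GATEWAY_URL = "http://gateway.example.com:4318/v1/ingest"

events = [
    {"tenant": "acme", "source": "checkout", "ts": 1718000000000,
     "level": "INFO", "message": "order created"},
]

resp = requests.post(
    GATEWAY_URL,
    json=events,
    headers={"Authorization": "Bearer <token>"},  # gateway authenticates the request
    timeout=5,
)

# The gateway returns 4xx on validation failures and may surface backpressure
# as rate limiting (e.g. 429) when it is overloaded.
resp.raise_for_status()
```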
B. Gateway → Ingest (bulk handoff)
Once a batch reaches the configured threshold, the gateway forwards it to the Ingest node. The Ingest node assembles MessageBatch objects, enforces buffer ceilings on messages and bytes, and applies windowed deduplication. If buffer limits are reached, it signals backpressure upstream.
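A rough sketch of how buffer ceilings and windowed deduplication could interact, assuming events carry a unique id and a time-based dedup window (the class name, limits, and eviction policy are illustrative, not the actual Ingest node code):

```python
import time
from collections import OrderedDict

class IngestBuffer:
    """Illustrative in-memory batch buffer with message/byte ceilings and a
    sliding dedup window. Not OpenWit's actual implementation."""

    def __init__(self, max_messages=10_000, max_bytes=8 * 1024 * 1024, dedup_window_s=60):
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.dedup_window_s = dedup_window_s
        self.seen = OrderedDict()   # event_id -> arrival time, oldest first
        self.batch = []
        self.batch_bytes = 0

    def offer(self, event_id: str, payload: bytes) -> bool:
        """Return False to signal backpressure upstream when a ceiling is hit."""
        now = time.monotonic()
        # Evict dedup entries that have fallen out of the window.
        while self.seen and next(iter(self.seen.values())) < now - self.dedup_window_s:
            self.seen.popitem(last=False)
        if event_id in self.seen:
            return True   # duplicate within the window: accept and drop silently
        if len(self.batch) >= self.max_messages or self.batch_bytes + len(payload) > self.max_bytes:
            return False  # buffer ceiling reached: backpressure upstream
        self.seen[event_id] = now
        self.batch.append(payload)
        self.batch_bytes += len(payload)
        return True
```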
C. Durability with dual WAL
OpenWit writes the batch to both the short-term WAL and the long-term WAL. The short-term write must succeed before the request is acknowledged, which gives at-least-once durability at the ingestion boundary. WAL rotation, retention, and the sync_on_write trade-off are configured in the processing and WAL sections.
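A simplified sketch of the write-before-ack contract on the short-term WAL, assuming length-prefixed appends to a segment file and a sync_on_write flag (the framing and function are illustrative):

```python
import os

def append_and_ack(segment_path: str, record: bytes, sync_on_write: bool = True) -> bool:
    """Append one batch record to the short-term WAL segment; only return True
    (i.e. acknowledge to the producer) once the write is durable."""
    try:
        with open(segment_path, "ab") as seg:
            seg.write(len(record).to_bytes(4, "big"))  # length-prefixed framing
            seg.write(record)
            seg.flush()
            if sync_on_write:
                os.fsync(seg.fileno())  # trade latency for durability
    except OSError:
        return False  # no ack: the gateway sees this as backpressure
    return True       # safe to acknowledge the HTTP request
```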
D. Arrow conversion and Arrow Flight to Storage
After WAL success, the Ingest node converts the batch to an Arrow RecordBatch and sends it to a Storage endpoint chosen by the control plane. Arrow Flight is the transport. Keep batch_size × average event size well below the configured gRPC max message size to avoid oversize errors and leave headroom for Arrow metadata.
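A minimal sketch of that handoff using pyarrow's Flight client, with a placeholder storage address and descriptor path standing in for whatever the control plane returns:

```python
import pyarrow as pa
import pyarrow.flight as flight

# Build a RecordBatch from the flushed batch (columns are illustrative).
batch = pa.record_batch({
    "tenant": pa.array(["acme", "acme"]),
    "ts": pa.array([1718000000000, 1718000000123], type=pa.int64()),
    "message": pa.array(["order created", "order shipped"]),
})

# Storage endpoint chosen by the control plane; the address is a placeholder.
client = flight.FlightClient("grpc://storage-node-1:50051")
descriptor = flight.FlightDescriptor.for_path("ingest/acme")

writer, _ = client.do_put(descriptor, batch.schema)
writer.write_batch(batch)  # streamed over gRPC; keep batches under the max message size
writer.close()
```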
E. Storage
The Storage node appends Arrow data to an active Parquet file, rolls it to a stable Parquet file by size or time, uploads the file to object storage via OpenDAL, and then records the file metadata in Postgres. Indexing is triggered afterwards.
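To make the roll decision concrete, here is an illustrative sketch using pyarrow's Parquet writer; the thresholds, class, and paths are assumptions, and the real node performs the upload via OpenDAL and the Postgres metadata insert after the roll:

```python
import time
import pyarrow as pa
import pyarrow.parquet as pq

MAX_ACTIVE_BYTES = 256 * 1024 * 1024  # illustrative: roll at ~256 MB
MAX_ACTIVE_AGE_S = 300                # illustrative: or after 5 minutes

class ActiveParquet:
    """Illustrative active-file wrapper that rolls by size or time."""

    def __init__(self, path: str, schema: pa.Schema):
        self.writer = pq.ParquetWriter(path, schema)
        self.opened_at = time.monotonic()
        self.bytes_written = 0

    def append(self, batch: pa.RecordBatch) -> bool:
        """Append a batch; return True when the file should roll to stable."""
        self.writer.write_batch(batch)
        self.bytes_written += batch.nbytes
        return (self.bytes_written >= MAX_ACTIVE_BYTES
                or time.monotonic() - self.opened_at >= MAX_ACTIVE_AGE_S)

    def roll(self):
        """Close the active file; upload and metadata registration happen next."""
        self.writer.close()
```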
3. HTTP receiver configuration
Your configuration PDF calls out the HTTP block explicitly. These settings control where the HTTP server listens, how much concurrency it allows, the request timeout, and the maximum payload size. A separate proxy.http section repeats these controls when the proxy is the external entry point.
| Group | Keys | Explanation |
|---|---|---|
| Enablement & port | enabled, port (default shown as 4318 in your doc) | Toggle HTTP ingestion and define the listening port. |
| Concurrency | max_concurrent_requests (doc shows 10k) | Upper bound on the number of in-flight HTTP requests. High values need enough threads and memory. |
| Timeouts | request_timeout_ms | Per-request timeout at the HTTP layer. Tune based on upstream speed and batch flushing. |
| Payload limits | max_payload_size_mb | Hard cap on HTTP body size to protect memory. Keep this aligned with downstream gRPC and batching limits. |
Note: The HTTP frontend is good for canaries, and gRPC/Kafka are preferred for sustained high-throughput ingestion. If you do enable high concurrency on HTTP, ensure node threads and memory are sized appropriately.
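For orientation, the keys from the table above could be pictured as the following structure; the port and concurrency values mirror the documented defaults, while the timeout and payload cap are placeholders to be tuned per the guidance above:

```python
# Illustrative shape of the HTTP receiver block; key names follow the table
# above, values are the documented defaults or placeholders.
http_receiver = {
    "enabled": True,
    "port": 4318,                       # documented default listening port
    "max_concurrent_requests": 10_000,  # needs matching threads/memory on the node
    "request_timeout_ms": 30_000,       # placeholder; tune to upstream flush speed
    "max_payload_size_mb": 10,          # placeholder; align with gRPC/batching limits
}
```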
4. Batching and size guardrails
The same sizing rule called out in the ingestion pipeline applies here. The gateway flushes by batch size or timeout, then serializes the batch to gRPC for the Ingest node. Keep batch_size × average event size < gRPC max message size, and ensure the active batches fit within the processing.buffer ceilings so the system does not thrash or reject under load.
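A quick way to sanity-check that rule before enabling HTTP, using an illustrative 4 MB gRPC cap (a common gRPC default, not necessarily OpenWit's configured value) and 25% headroom for Arrow metadata:

```python
def batch_fits(batch_size: int, avg_event_bytes: int,
               grpc_max_bytes: int = 4 * 1024 * 1024,
               headroom: float = 0.25) -> bool:
    """True if batch_size x average event size stays under the gRPC cap
    with the requested headroom for Arrow metadata and serialization."""
    return batch_size * avg_event_bytes <= grpc_max_bytes * (1 - headroom)

# Example: 2,000 events averaging 1 KiB each is ~2 MiB, which fits under a
# 4 MiB cap with 25% headroom (3 MiB budget); 8,000 such events do not.
print(batch_fits(2_000, 1_024))  # True
print(batch_fits(8_000, 1_024))  # False
```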
5. Operational signals to watch
- Gateway pressure: buffer depth, memory usage, and backpressure events when reaching limits.
- WAL health: WAL write and fsync latency, backlog, segment counts, disk usage in the WAL directory.
- Arrow Flight: send latency, serialization overhead, endpoint selection errors.
6. Failure modes and recovery
| Scenario | Behavior | Recovery |
|---|---|---|
| Short-term WAL write fails | Request is not acknowledged; backpressure grows at the gateway | Fix disk or WAL path; ingestion resumes and acks once WAL succeeds |
| Arrow Flight send failure | Batch remains in WAL until transfer succeeds | Router retries and may select a different storage endpoint |
| Downstream slow or stalled | Gateway and Ingest apply backpressure to protect buffers | Watch buffer_full and memory gauges; reduce concurrency or batch size until stable |
7. Quick checklist before enabling HTTP
- Confirm limits: set max_payload_size_mb to a safe value and align it with downstream gRPC limits.
- Right-size concurrency: max_concurrent_requests should match the threads and memory on the node.
- Prefer gRPC/Kafka for load: keep HTTP for canaries and validation.
- Keep batching safe: batch size × average event size must stay below the gRPC max; leave headroom for Arrow metadata.