We build stream processing systems that ingest tens of millions of events per day, transform them in real time, and surface actionable insights with sub-second latency — using Kafka, ClickHouse, and event-driven microservices.
Every pipeline starts with a schema registry. We define event contracts (using Avro or Protocol Buffers) before writing producers or consumers. Schema evolution rules (backward/forward compatibility) are enforced at the registry level, preventing breaking changes from propagating silently.
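One backward-compatibility rule the registry enforces can be sketched in a few lines: a new reader schema can still decode old data only if every field it adds carries a default. This is a simplified illustration, not the registry's full algorithm, and the `OrderPlaced` schema is a made-up example.

```python
def added_fields_have_defaults(old_schema: dict, new_schema: dict) -> bool:
    """Check one Avro backward-compatibility rule: new fields need defaults."""
    old_names = {f["name"] for f in old_schema["fields"]}
    return all(
        "default" in f
        for f in new_schema["fields"]
        if f["name"] not in old_names
    )

v1 = {"type": "record", "name": "OrderPlaced",
      "fields": [{"name": "order_id", "type": "string"}]}

# Compatible evolution: the added field has a default value.
v2 = {"type": "record", "name": "OrderPlaced",
      "fields": [{"name": "order_id", "type": "string"},
                 {"name": "currency", "type": "string", "default": "USD"}]}

# Breaking change: no default, so old events cannot be decoded.
v3 = {"type": "record", "name": "OrderPlaced",
      "fields": [{"name": "order_id", "type": "string"},
                 {"name": "currency", "type": "string"}]}
```

A real registry checks many more rules (type promotions, aliases, union changes), but this is the one that catches most accidental breakage.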
Stream processors must handle duplicates gracefully. We implement idempotent writes using event IDs as deduplication keys, and leverage Kafka's transactional API for exactly-once semantics where business correctness demands it — financial calculations, inventory updates.
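The deduplication half of this is simple to sketch: the event ID acts as the key, and a redelivered event becomes a no-op. Here a plain dict stands in for any keyed sink (a ClickHouse ReplacingMergeTree table, a relational upsert); the names are illustrative, not from a specific codebase.

```python
def apply_event(store: dict, event: dict) -> bool:
    """Apply an event at most once; return False if it was a duplicate."""
    event_id = event["event_id"]
    if event_id in store:
        return False            # redelivery: already applied, ignore
    store[event_id] = event["payload"]
    return True

store = {}
evt = {"event_id": "e-1", "payload": {"amount": 42}}
apply_event(store, evt)   # first delivery: applied
apply_event(store, evt)   # redelivery after a consumer restart: ignored
```

Idempotent writes like this cover at-least-once delivery; the transactional API is reserved for the cases where the write and the offset commit must succeed or fail together.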
Mobile and distributed systems produce events that arrive late — sometimes hours after the fact. We design windowing strategies with configurable watermarks and late-arrival handling, ensuring that historical aggregates are updated correctly when delayed events arrive.
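The key design choice is that windows are keyed by their start time, so a delayed event simply updates the aggregate for the historical window it belongs to. A minimal sketch, with an assumed 60-second tumbling window and a count as the aggregate:

```python
from collections import defaultdict

WINDOW_SECONDS = 60

def window_start(event_ts: int) -> int:
    """Map an event timestamp to the start of its tumbling window."""
    return event_ts - (event_ts % WINDOW_SECONDS)

counts = defaultdict(int)

def ingest(event_ts: int) -> None:
    counts[window_start(event_ts)] += 1

ingest(120)   # lands in window [120, 180)
ingest(500)   # lands in window [480, 540)
ingest(130)   # late arrival for [120, 180): the historical window is updated
```

A production processor adds watermarks on top of this, deciding when a window is "closed enough" to emit, and how long after that to keep accepting stragglers.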
Kafka's configurable retention means any consumer can replay historical events. We design pipelines with replay in mind from the start — separating raw event storage from derived state so schema changes or bug fixes can be applied retroactively.
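The raw-versus-derived split can be illustrated as: keep the append-only event log untouched, make derived state a pure function of it, and a bug fix becomes a replay from offset zero. The "revenue by day" transform below is a made-up example of such a fix.

```python
from collections import defaultdict

raw_log = [  # append-only record of what actually happened, never rewritten
    {"day": "2024-01-01", "cents": 1000},
    {"day": "2024-01-01", "cents": 250},
    {"day": "2024-01-02", "cents": 400},
]

def rebuild(transform):
    """Recompute derived state by replaying the full raw log."""
    state = defaultdict(float)
    for event in raw_log:
        day, value = transform(event)
        state[day] += value
    return dict(state)

buggy = rebuild(lambda e: (e["day"], e["cents"]))        # forgot unit conversion
fixed = rebuild(lambda e: (e["day"], e["cents"] / 100))  # replay with the fix
```

Because the raw log is the source of truth, the fix requires no backfill scripts: deploy the corrected transform, reset the consumer offset, and replay.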
We map all events in the system, define their schemas, and agree on ownership boundaries. Each event type lives in one topic, owned by one service.
Producers are built and deployed first, with consumers reading from day one in parallel — allowing us to validate event shape before building downstream logic.
We set up consumer lag dashboards from the first deployment. Lag is the most important health signal — it tells us immediately if processing has fallen behind producers.
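Per-partition lag is the broker's log-end offset minus the consumer group's last committed offset. In production these numbers come from the Kafka admin API; the hard-coded values below only show the computation a lag dashboard plots.

```python
def consumer_lag(log_end_offsets: dict, committed: dict) -> dict:
    """Lag per partition: how far the consumer trails the latest events."""
    return {
        partition: log_end_offsets[partition] - committed.get(partition, 0)
        for partition in log_end_offsets
    }

lag = consumer_lag(
    log_end_offsets={0: 1_500, 1: 1_480},
    committed={0: 1_500, 1: 900},
)
# Partition 1 is 580 events behind while partition 0 is fully caught up:
# exactly the asymmetry a lag dashboard surfaces immediately.
```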
We replicate production-scale event volumes in staging using a traffic generator and verify that consumer throughput, ClickHouse ingestion rate, and dashboard latency all meet SLAs.
We instrument events with a source timestamp and measure time-to-dashboard for a sample of events continuously. Alerting fires if p99 exceeds the agreed SLA (typically 500ms for operational dashboards, 5s for analytics).
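The check itself reduces to: latency is dashboard-visible time minus the event's source timestamp, and alerting compares the sampled p99 against the SLA. A sketch with the 500ms operational threshold quoted above; the sample values and the nearest-rank p99 estimator are illustrative.

```python
import math

def p99(samples_ms: list) -> float:
    """Nearest-rank 99th percentile of observed latencies."""
    ordered = sorted(samples_ms)
    index = math.ceil(len(ordered) * 0.99) - 1
    return ordered[index]

def breaches_sla(samples_ms: list, sla_ms: int = 500) -> bool:
    return p99(samples_ms) > sla_ms

breaches_sla([120, 180, 240, 95, 310, 450, 200])  # tail within the 500ms SLA
breaches_sla([120, 180, 900])                     # tail latency blown: alert
```

Measuring from the source timestamp (not broker receive time) is what makes the number honest: it includes producer batching, network hops, and ingestion delay, not just processing time.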
We simulate Kafka broker loss, consumer crashes, and ClickHouse write failures. Key assertions: no data loss, lag recovers within defined time, downstream dashboards show stale-data indicators rather than incorrect data.
Aggregated metrics from the pipeline are compared against ground truth from the source database on a scheduled basis. Discrepancies above 0.1% trigger an alert and investigation.
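The reconciliation check can be sketched as a relative-error comparison against the 0.1% threshold mentioned above; the metric values here are invented for illustration.

```python
def discrepancy_ratio(pipeline_value: float, source_value: float) -> float:
    """Relative discrepancy between the pipeline aggregate and ground truth."""
    if source_value == 0:
        return 0.0 if pipeline_value == 0 else float("inf")
    return abs(pipeline_value - source_value) / source_value

def needs_investigation(pipeline_value: float, source_value: float,
                        threshold: float = 0.001) -> bool:
    return discrepancy_ratio(pipeline_value, source_value) > threshold

needs_investigation(998_000, 1_000_000)  # 0.2% off: alert and investigate
needs_investigation(999_500, 1_000_000)  # 0.05% off: within tolerance
```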
All schema changes are tested against schema registry compatibility rules before deployment. Backward-incompatible changes require a multi-step migration: add new field → dual-write → remove old field.
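The dual-write step looks like this in producer code, using a hypothetical rename of a `user` field to `user_id` as the example. During the migration window both fields are emitted so old and new consumers keep working; the legacy field is dropped only once no consumer reads it.

```python
def to_wire_format(user_id: str, dual_write: bool = True) -> dict:
    """Serialize an event during a field rename migration."""
    event = {"user_id": user_id}      # new contract
    if dual_write:
        event["user"] = user_id       # legacy field, kept until consumers migrate
    return event

to_wire_format("u-42")                    # migration window: both fields present
to_wire_format("u-42", dual_write=False)  # final shape after the old field is removed
```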
Tell us your event volume, latency requirements, and what insights you need to surface. We'll propose an architecture that fits.