Data Engineer · Bengaluru, India

Building pipelines that move data, reliably.

I build developer-grade data systems across streaming, ETL, analytics APIs, and lakehouse exploration, with an emphasis on clean architecture and usable downstream outputs.

View work · Resume

38+ public repos
6 featured builds
3 pipeline patterns
On GitHub since 2021

Selected work

Pipelines, streams, and the systems around them.

End-to-end builds across ingestion, processing, storage, and serving. Each links to its repository.

Featured

Lakehouse MCP Server

A serverless MCP server that lets tools and LLMs query the Iceberg lakehouse through Amazon Athena — no always-on infrastructure to manage.

Exposes the lakehouse to any MCP-compatible client
Runs queries serverless via Athena, pay-per-query
Turns the Iceberg POC into a usable, queryable interface

MCP · Amazon Athena · Apache Iceberg · Python · Serverless

Repository
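
To make the Athena-backed query path of the Lakehouse MCP Server concrete, here is a minimal sketch of running one SQL statement and reading the rows back. It is an illustration under stated assumptions, not the repository's code: the function name `run_athena_query`, its parameters, and the polling limits are hypothetical. The three client methods it calls (`start_query_execution`, `get_query_execution`, `get_query_results`) are the boto3 Athena client calls, and the client is passed in so the sketch runs without AWS access.

```python
import time
from typing import Any


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, max_polls: int = 60) -> list[dict[str, str]]:
    """Run one SQL statement through Athena and return rows as dicts.

    `client` is expected to behave like a boto3 Athena client. The function
    name and defaults are hypothetical, chosen only for this sketch.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena is asynchronous: poll until the query reaches a terminal state.
    for _ in range(max_polls):
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Athena query {query_id} did not finish in time")

    # Only the first page of results is read here; larger results need pagination.
    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []

    # For SELECT statements the first row carries the column names.
    header = [cell.get("VarCharValue", "") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue", "") for cell in row["Data"])))
        for row in rows[1:]
    ]
```

With boto3 installed and credentials configured, `boto3.client("athena")` could be passed as `client`. An MCP tool handler would then wrap a call like this and return the rows to the calling client.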

Featured

TransitFlow Realtime Event Stream

A real-time transit-data pipeline that simulates events, streams through Kafka, processes with Spark, and lands analytics-ready datasets in AWS.

Combines Kafka, Spark, Glue, Athena, and Redshift in one architecture
Frames data engineering as an end-to-end flow from ingestion to query
The strongest systems-style project in the portfolio

Kafka · Spark · AWS Glue · Athena · Redshift · Docker

Repository

Featured

Reddit Sentiment ETL Pipeline

An orchestrated ETL workflow that extracts Reddit data, transforms it with PySpark sentiment analysis, and loads it for analytics and dashboarding.

Uses Airflow for orchestration and PySpark for transformation
Separates raw and transformed storage across MySQL and PostgreSQL
Connects engineering work to downstream BI consumption

Airflow · PySpark · MySQL · PostgreSQL · Power BI

Repository
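
The extract, transform, load split in the Reddit Sentiment ETL Pipeline, with raw and transformed data kept in separate stores, can be sketched in plain Python. This is a simplified illustration, not the project's code: the real pipeline uses Airflow and PySpark, while the word-list scorer, the `ScoredPost` type, the record fields `id` and `text`, and the in-memory lists standing in for the two databases are all assumptions made for the example.

```python
from dataclasses import dataclass

# Toy word lists, purely illustrative (assumption: not the project's sentiment model).
POSITIVE = {"good", "great", "love", "reliable"}
NEGATIVE = {"bad", "broken", "hate", "slow"}


@dataclass
class ScoredPost:
    post_id: str
    text: str
    sentiment: str


def extract(source):
    """Copy raw records unchanged so the raw layer can be replayed later."""
    return [dict(post) for post in source]


def score(text):
    """Label text by comparing counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def transform(raw_posts):
    """Drop empty posts and attach a sentiment label to the rest."""
    return [
        ScoredPost(post["id"], post["text"], score(post["text"]))
        for post in raw_posts
        if post.get("text", "").strip()
    ]


def run_pipeline(source, raw_store, curated_store):
    """Land raw records first, then write scored records to a separate store."""
    raw = extract(source)
    raw_store.extend(raw)
    curated_store.extend(transform(raw))
```

Keeping the raw copy untouched means the transform step can be re-run with a different scorer without going back to the source.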

Featured

Analytics API

A lightweight analytics-serving layer designed to expose data stored in a time-series oriented backend for downstream use.

Shows an API mindset beyond pipeline construction
Useful to signal consumption patterns, not just storage
Good bridge between engineering and analytics enablement

Python · API Design · Analytics · TimescaleDB

Repository

Featured

Realtime Voting System

A streaming aggregation project that processes live voting events and surfaces real-time counts through a dashboard-oriented flow.

Demonstrates real-time processing patterns with PySpark streaming
Adds another event-driven project to support the portfolio narrative
Pairs well with TransitFlow as a second streaming proof point

PySpark · Streaming · Dashboarding · Python

Repository
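
The core idea behind the Realtime Voting System, grouping live events into time windows and counting per candidate, can be shown with a small batch sketch in plain Python. It is an assumption-laden illustration rather than the project's PySpark streaming job: the function name `count_votes_by_window`, the `(timestamp, candidate)` event shape, and the 10-second default window are hypothetical.

```python
from collections import Counter, defaultdict


def count_votes_by_window(events, window_seconds=10):
    """Count votes per candidate in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, candidate) pairs.
    Returns {window_start: {candidate: count}} ordered by window start.
    """
    windows = defaultdict(Counter)
    for timestamp, candidate in events:
        # Floor each timestamp to the start of the window it falls in.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][candidate] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}
```

A streaming engine applies the same grouping incrementally as events arrive, and additionally has to decide how long to wait for late events before closing a window.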

Live Demo

WhatsApp Chat Analyzer

A deployed analytics app that transforms exported chats into interactive insights, giving the portfolio a user-facing project alongside systems work.

Shows I can package analysis into a usable interface
Adds visible demo value alongside backend-heavy projects
Useful as a lighter, more approachable project in the lineup

Python · Streamlit · Text Analytics · Visualization

Repository · Live demo

Work in Progress

Iceberg Lakehouse POC

An exploratory lakehouse proof of concept focused on Apache Iceberg and modern table-format thinking for scalable analytical data systems.

Explicitly positioned as ongoing work rather than a finished build
Signals active learning in lakehouse architecture and open table formats
Presented as experimental and forward-looking

Iceberg · Lakehouse · Jupyter · Data Architecture

Repository

Toolbox

The stack, mapped to the pipeline.

From the first event to the final query — the tools I reach for at each stage.

01 Ingestion

Kafka · APIs · Reddit API · Batch + streaming inputs

02 Processing

PySpark · Spark Streaming · Python · SQL

03 Orchestration

Airflow · Docker · Workflow automation

04 Storage

S3 · PostgreSQL · MySQL · Redshift · Iceberg POC

05 Analytics

Athena · MCP · Power BI · Timescale-oriented APIs

About

Engineer behind the pipelines.

I focus on reliable pipelines, practical system design, and turning raw data into dependable interfaces for analytics and product teams.

Based in Bengaluru, India · currently at Moody's Ratings.

Focus: Streaming, ETL, APIs, lakehouse exploration
Working style: Practical systems thinking over decorative dashboards
Portfolio intent: Show engineering depth with a stronger developer-facing presentation

pipeline.sh

$ airflow dags trigger reddit-sentiment

[ok] dag run created

$ spark-submit jobs/spark-job.py

[stream] kafka -> spark -> s3 -> athena

$ lakehouse status

[wip] iceberg poc under active development

Experience · 4+ years

From analytics to data engineering.

Current
Data Engineer · Moody's Ratings
2025 - Present
Moved into building data systems, improving pipeline reliability, and doing analytics-facing engineering work with a stronger platform and developer mindset.

Earlier
Business Analyst · Axis Bank
2022 - 2025
Started on the analytics side of the stack, working on reporting, data quality, stakeholder requirements, and problem framing, all of which later translated well into data engineering.

Contact

Open to data engineering work and technical collaboration.

GitHub shows the real depth — start with the repos above, then reach out if there's a fit.

r.kumar01@hotmail.com · GitHub · LinkedIn · Resume
