Rohit Kumar

Data Engineer

Building pipelines, streaming systems, and analytics backends for real downstream use.

I build developer-grade data systems across streaming, ETL, analytics APIs, and lakehouse exploration, with an emphasis on clean architecture and usable downstream outputs.

Focus

Streaming, ETL, APIs, lakehouse exploration

Working style

Practical systems thinking over decorative dashboards

Portfolio intent

Show engineering depth with a stronger developer-facing presentation

About

A data engineer focused on the systems behind reliable products.

I focus on reliable pipelines, practical system design, and turning raw data into dependable interfaces for analytics and product teams.

I am a data engineer based in Bengaluru, India, currently at Moody's Ratings. I enjoy building reliable flows from source to storage to consumption, especially when the architecture is as interesting as the output.

I would rather show shipped pipelines, real project depth, and the technical direction I am growing into than fill the page with generic portfolio noise.

Streaming and event-driven pipelines

Spark, PySpark, Airflow, and SQL workflows

Warehouse and lakehouse-oriented thinking

Backend-minded data products for downstream consumption

Experience

Data engineering experience across banking and ratings.

4+ years of total experience, progressing from business analysis into data engineering.

Current

Data Engineer

Moody's Ratings · 2025 - Present

Progressed into building data systems, pipeline reliability, and analytics-facing engineering work with a stronger platform and developer mindset.

Earlier

Business Analyst

Axis Bank · 2022 - 2025

Started on the analytics side of the stack, working with reporting, data quality, stakeholder requirements, and problem framing that later translated well into data engineering.

Projects

Selected builds across streaming, orchestration, APIs, analytics, and lakehouse exploration.

The Iceberg proof of concept is explicitly shown as work in progress.

Featured

TransitFlow Realtime Event Stream

A real-time transit-data pipeline that simulates events, streams through Kafka, processes with Spark, and lands analytics-ready datasets in AWS.

  • Combines Kafka, Spark, Glue, Athena, and Redshift in one architecture
  • Frames data engineering as an end-to-end flow from ingestion to query
  • The most systems-oriented project in this portfolio
Kafka · Spark · AWS Glue · Athena · Redshift · Docker
Featured

Reddit Sentiment ETL Pipeline

An orchestrated ETL workflow that extracts Reddit data, transforms it with PySpark sentiment analysis, and loads it for analytics and dashboarding.

  • Uses Airflow for orchestration and PySpark for transformation
  • Separates raw and transformed storage across MySQL and PostgreSQL
  • Connects engineering work to downstream BI consumption
Airflow · PySpark · MySQL · PostgreSQL · Power BI
Featured

Analytics API

A lightweight analytics-serving layer designed to expose data stored in a time-series oriented backend for downstream use.

  • Shows an API mindset beyond pipeline construction
  • Useful to signal consumption patterns, not just storage
  • Good bridge between engineering and analytics enablement
Python · API Design · Analytics · TimescaleDB
Featured

Realtime Voting System

A streaming aggregation project that processes live voting events and surfaces real-time counts through a dashboard-oriented flow.

  • Demonstrates real-time processing patterns with PySpark streaming
  • Adds another event-driven project to support the portfolio narrative
  • Pairs well with TransitFlow as a second streaming proof point
PySpark · Streaming · Dashboarding · Python
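The aggregation idea behind this project can be illustrated without a cluster. The sketch below is not the project's code: it is a minimal pure-Python stand-in for a tumbling-window count, and the event shape `(epoch_seconds, candidate)`, the 60-second window, and the function name `count_votes_by_window` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical window size; a real streaming job would make this configurable.
WINDOW_SECONDS = 60


def count_votes_by_window(
    events: Iterable[Tuple[int, str]],
    window_seconds: int = WINDOW_SECONDS,
) -> Dict[int, Counter]:
    """Group (epoch_seconds, candidate) events into tumbling windows
    and count votes per candidate inside each window."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, candidate in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][candidate] += 1
    return dict(windows)


# Made-up sample events, purely for illustration.
sample_events = [
    (0, "alice"),
    (15, "bob"),
    (59, "alice"),
    (60, "bob"),
    (61, "bob"),
    (130, "alice"),
]

counts = count_votes_by_window(sample_events)
for start in sorted(counts):
    print(start, dict(counts[start]))
```

A streaming engine such as Spark would apply the same grouping continuously over unbounded input; this batch version only shows the windowing arithmetic.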
Live Demo

WhatsApp Chat Analyzer

A deployed analytics app that transforms exported chats into interactive insights, giving the portfolio a user-facing project alongside systems work.

  • Shows I can package analysis into a usable interface
  • Adds visible demo value alongside backend-heavy projects
  • Useful as a lighter, more approachable project in the lineup
Python · Streamlit · Text Analytics · Visualization
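As a rough illustration of the first step such an analyzer needs, the sketch below parses exported chat lines and counts messages per sender. It is not the deployed app's code. The line format `12/31/21, 10:15 PM - Alice: Hello` is an assumption (export formats vary by device and locale), and `LINE_PATTERN`, `parse_line`, and `messages_per_sender` are hypothetical names.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Assumed export line format: "12/31/21, 10:15 PM - Alice: Hello".
# Real exports differ by device and locale, so treat this as illustrative.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>[^-]+?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (sender, text) for a user message, or None for lines that
    do not match the assumed format (for example, system notices)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.group("sender"), match.group("text")


def messages_per_sender(lines: Iterable[str]) -> Counter:
    """Count how many parsed messages each sender contributed."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


# Made-up sample export, purely for illustration.
sample_export = [
    "12/31/21, 10:15 PM - Alice: Happy new year!",
    "12/31/21, 10:16 PM - Bob: Same to you",
    "12/31/21, 10:17 PM - Alice: See you tomorrow",
    "12/31/21, 10:18 PM - Messages are end-to-end encrypted.",
]

sender_counts = messages_per_sender(sample_export)
print(sender_counts.most_common())
```

The per-sender counts are the kind of table a Streamlit front end could chart directly. Multi-line messages are skipped here for brevity.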
Work in Progress

Iceberg Lakehouse POC

An exploratory lakehouse proof of concept focused on Apache Iceberg and modern table-format thinking for scalable analytical data systems.

  • Explicitly positioned as ongoing work rather than a finished build
  • Signals active learning in lakehouse architecture and open table formats
  • Experimental and forward-looking by design
Iceberg · Lakehouse · Jupyter · Data Architecture

Stack

The tools behind ingestion, processing, orchestration, storage, and serving.

Ingestion

Kafka · APIs · Reddit API · Batch + streaming inputs

Processing

PySpark · Spark Streaming · Python · SQL

Orchestration

Airflow · Docker · Workflow automation

Storage

S3 · PostgreSQL · MySQL · Redshift · Iceberg POC

Analytics

Athena · Power BI · Timescale-oriented APIs

Contact

Open to interesting data engineering work and technical collaboration.

GitHub is the best place to inspect the actual project depth. Use the repo links above, then reach out if there is a fit.