NLP-Driven Speech Analytics Platform
NLP · PySpark · Databricks · Whisper · Azure

Senior Data Engineer | Azure & AWS Lakehouse Architect | PySpark • Python • NLP
I am a senior data engineer and analyst with more than 6 years of experience, currently working as a Data Engineer at Optum. I design scalable analytics platforms and machine-learning pipelines across healthcare, telecom, and retail. I have delivered end-to-end cloud data solutions on Azure and AWS, including a Medallion lakehouse architecture and automated feature-engineering pipelines that cut data-prep latency by 30%. I build production-grade NLP and predictive models and interactive BI dashboards that translate complex data into actionable insights. I aim to apply this expertise to accelerate data-driven decision-making and improve outcomes for a forward-thinking organization.
# Example PySpark Transformation Pipeline
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

def process_bronze_to_silver(df):
    """
    Cleans raw data and writes to Silver zone
    in Delta Lake format.
    """
    clean_df = df.filter(col("status") == "active") \
        .dropDuplicates(["transaction_id"])
    clean_df.write.format("delta") \
        .mode("append") \
        .save("/mnt/datalake/silver/transactions")
    return "Data processing complete"

NLP · PySpark · Databricks · Whisper · Azure
Azure · Medallion · Airflow · Power BI
Deep Learning · NLP · Machine Learning
Computer Vision · U-Net · Deep Learning
AWS · PySpark · Power BI · MLOps
Email: khajamujahiddin@gmail.com
Phone: +1 (347) 736-5812
LinkedIn: /in/mujahiddin-md-6b21b1390/
GitHub: github.com/khajamujahid
Twitter: @Khaja7262
Instagram: @khaja_2310