Khaja Mujahiddin Mohammed

Hi, I'm Khaja Mujahiddin Mohammed

Entry-Level Data Scientist | SQL • Python • Power BI | MS Data Science (May 2025) | AWS | Gen AI

NLP · R Programming · Deep Learning · ML · Python

Recent M.S. Data Science graduate with a 4-month US internship.

  • 85% accuracy · Multimodal NLP
  • 30% faster ETL · AWS + Airflow
  • 20–30% faster decisions · Power BI
  • 4 mo · US Internship
  • 8–10 · Production Dashboards
  • 50+ · Active Users
  • 85% · Fake-News Accuracy
  • 20–30% · Ops Speed

About

Recent Master's in Data Science graduate (University of New Haven, May 2025, GPA 3.29) with hands-on experience building real-time AWS data pipelines (Kinesis → Glue → Lambda → S3), interactive Power BI dashboards with DAX, and Python automation (pandas, SQLAlchemy). That experience comes from a 4-month Data Scientist internship at Innover Global (Aug 2025 – Present) and a 2.5-year full-time Jr. Data Analyst role at SRIK Consulting Services.

I delivered 8–10 production Power BI dashboards used by 50+ stakeholders across finance and HR teams and automated monthly reporting workflows, saving ~8 hours per analyst per week. For my capstone project I developed a multimodal fake-news detection pipeline (Airflow + Hugging Face Transformers) that achieved 85% accuracy and is deployed live on Hugging Face Spaces.

Proficient in SQL, Python, Power BI (DAX, Power Query, RLS), PySpark, Flask APIs, and AWS cloud services. Additional academic projects include AdventureWorks sales analytics, Titanic survival prediction, and Microsoft Fabric real-time intelligence with KQL and Eventstream.

Actively seeking Junior Data Analyst, BI Analyst, or entry-level Data Engineer roles in the USA. Available immediately. Let's connect!

Core Strengths

  • End-to-end ML & MLOps (AWS SageMaker, Airflow, MLflow)
  • ETL/Data Engineering with Spark, Kafka & AWS Glue
  • Interactive BI & Dashboards (Power BI, Tableau, R Shiny)
  • Agile delivery & cross-functional stakeholder alignment

Domains

  • Natural Language Processing (Sentiment, Summarization)
  • Computer Vision (Segmentation, Detection)
  • Forecasting & Predictive Analytics (ARIMA, Prophet, LSTM)
  • Real-time Streaming Data Pipelines (Kinesis, Kafka)

Live Snippet

# Example Airflow DAG: simple daily retraining job
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def retrain():
    # load data, retrain model, log metrics
    print("Retraining pipeline executed")

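# `schedule` is the Airflow 2.4+ argument; older releases use `schedule_interval`
with DAG(
    "daily_retrain",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,  # do not backfill runs between start_date and today
) as dag:
    PythonOperator(
        task_id="retrain_model",
        python_callable=retrain,
    )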
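
The retrain callable above is only a stub. As a rough illustration of the three steps named in its comment (load data, retrain model, log metrics), here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the synthetic data, the single-feature least-squares "model", and the helper names load_data and fit_line are hypothetical and do not come from any of the projects listed on this page.

```python
import json
import random
import statistics
from datetime import datetime, timezone


def load_data(n=200, seed=0):
    """Hypothetical loader: synthetic (x, y) pairs with y = 3x + 2 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def retrain():
    """Load data, refit the model, and log metrics as a single JSON line."""
    xs, ys = load_data()
    slope, intercept = fit_line(xs, ys)
    # Mean squared error of the fitted line on the training data
    mse = statistics.fmean(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)
    )
    metrics = {
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "slope": round(slope, 3),
        "intercept": round(intercept, 3),
        "mse": round(mse, 4),
    }
    print(json.dumps(metrics))  # stdout ends up in the Airflow task log
    return metrics
```

Used as the python_callable of the PythonOperator above, the returned dictionary would also be pushed to XCom under Airflow's default settings, so a downstream task could read the metrics.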

Projects

Real-Time Customer Segmentation

AWS · PySpark · Power BI · MLOps

AdventureWorks Sales Dashboard

Power BI · DAX · SQL

Multimodal Fake News Detector

NLP · Airflow · Hugging Face

Microsoft Fabric Real-Time Intelligence

Fabric · KQL · Real-Time

Experience

  1. Data Scientist Intern

    Innover Global • Aurora, IL • Aug 2025 – Present

    • Building prototype real-time pipeline: Kinesis → Glue → Lambda → S3 (100K+ events/day)
    • K-Means + PCA customer segmentation model in PySpark
    • Interactive Power BI dashboards with DAX slicers for stakeholders
  2. Jr. Data Analyst (started as intern)

    SRIK Consulting • Hyderabad, India • Sep 2020 – Jul 2023

    • Delivered 8–10 production Power BI dashboards (50+ users) with Row-Level Security
    • Automated weekly refreshes using Python + SQLAlchemy → saved 8 hrs/analyst/week
    • Built star-schema models from MySQL, SQL Server, Excel, REST APIs

Skills

  • Programming & Scripting: Python, SQL, PySpark, R, Bash, PowerShell
  • Data Engineering & Big Data: ETL Pipelines, Apache Spark, Kafka, Hadoop, Databricks, Snowflake, Redshift, SSIS, Talend, Informatica
  • Cloud Platforms: AWS (EC2, S3, RDS, Glue, Lambda, Kinesis, SNS, IAM, SageMaker), Azure (Data Factory, Databricks, Synapse, Power BI Gateway), GCP (BigQuery)
  • Databases: SQL Server, Oracle, MySQL, PostgreSQL, MongoDB, Teradata, Cassandra
  • Machine Learning & AI: Scikit-learn, TensorFlow, PyTorch, ARIMA, Prophet, LSTM, NLP (spaCy, NLTK, Transformers)
  • GenAI & LLMs: OpenAI GPT-4, LangChain, Hugging Face Transformers, RAG, Llama Index, Semantic Search, Chatbot Development
  • Visualization & BI: Power BI (DAX, RLS, Power Query), Tableau, Excel (Power Pivot, Power View), R Shiny
  • DevOps & Orchestration: Docker, Kubernetes, Apache Airflow, Jenkins CI/CD, Terraform, Control-M
  • Version Control & Collaboration: Git, GitHub, JIRA, Agile/Scrum

Education

Master's in Data Science

University of New Haven, CT, USA · Aug 2023 – May 2025

  • Coursework: Advanced Machine Learning, Deep Learning, NLP, Cloud-Based MLOps, Big Data Analytics.
  • Hands-on: Spark ETL on AWS EMR, model retraining with SageMaker Pipelines.
  • Key Projects: Multimodal Fake News Detection (85% accuracy), Pedestrian & Cyclist Segmentation (U-Net).

Certifications

  • AWS Cloud Practitioner · EduBridge
  • Google Data Analytics · Coursera
  • HackerRank SQL · 5 Star
  • MySQL Developer · Udemy
  • PGDCA (Post Graduate Diploma in Computer Applications)
  • Data Analytics & Visualization Virtual Experience · Forage

Contact

The form opens your mail client with a prefilled email.