Job Title: Senior Data Engineer (SME & Interview Trainer)
Company: Sovereign IT Solutions Pvt Ltd
Location: Belgaum, Karnataka
Created: 2025-10-05
Job Type: Full Time
Job Description:
We are seeking a Senior Data Engineer with 8+ years of real-world experience to act as a Subject Matter Expert (SME) and train students specifically for Data Engineering, Big Data, and AI/ML Data Platform interviews. The ideal candidate will not only be hands-on with the latest data and AI/ML technologies but will also excel at coaching, mentoring, and conducting mock interviews to prepare students for success in top-tier companies. This role is perfect for someone who can bridge deep technical expertise with interview preparation and training.

Key Responsibilities
- Act as SME & trainer, guiding students to crack Data Engineering, AI/ML, and Cloud Data Platform interviews.
- Conduct mock interviews, Q&A sessions, and technical deep-dives.
- Train students on real-world interview scenarios (illustrative sketches of two such exercises appear at the end of this posting):
  - End-to-end ETL/ELT pipelines
  - Data modelling & warehousing
  - Data for AI/ML use cases (feature pipelines, vector databases, embeddings)
  - Streaming & batch processing at scale
  - Data governance, lineage, and security-first architectures

Core Data Engineering + AI/ML Pillars to Cover
1. Data Warehousing & Modelling: Star, Snowflake, and Data Vault modelling; SCD Types 1–6; OLTP vs OLAP vs Lakehouse; schema evolution & data versioning
2. ETL/ELT & Orchestration:
  - Orchestration: Apache Airflow, Prefect, Dagster, Azure Data Factory
  - Transformation: dbt, Spark SQL, Pandas, PySpark
  - Batch & streaming workflows (Kafka, Flink, Spark Structured Streaming, Kinesis)
3. Big Data & Distributed Processing: Spark (PySpark, Scala, Delta Lake), Hive, Presto/Trino, Iceberg, Hudi; partitioning, bucketing, caching & shuffle optimisation; Lakehouse & Data Mesh architectures
4. Cloud Data Platforms: Snowflake, Databricks, BigQuery, Redshift, Synapse; multi-cloud (AWS, Azure, GCP) plus hybrid/on-prem migration
5. Data Storage & Ingestion: data lakes (S3, ADLS, GCS); semi-/unstructured data (Parquet, ORC, Avro, JSON, XML, multimedia); real-time ingestion (Kafka, Pulsar, Debezium, CDC pipelines)
6. Observability & Monitoring:
  - Pipeline observability (Prometheus, Grafana, ELK, CloudWatch, Datadog)
  - Data quality & reliability (Great Expectations, Soda, Deequ)
  - End-to-end lineage & metadata (Apache Atlas, DataHub, Purview, Collibra)
7. Security & Governance:
  - IAM, RBAC, ABAC, fine-grained access controls
  - Data masking, tokenization, PII handling
  - GDPR, HIPAA, CCPA compliance
  - Secrets management (Vault, Key Vault, KMS)
8. AI/ML Enablement:
  - Feature Engineering Pipelines: scalable pipelines for ML models
  - Feature Stores: Feast, Tecton, Databricks Feature Store
  - MLOps Practices: MLflow, SageMaker Pipelines, Vertex AI, Azure ML
  - Vector Databases & RAG: Pinecone, Weaviate, Milvus, Chroma for LLM apps (see the retrieval sketch at the end of this posting)
  - Model Serving: TensorFlow Serving, TorchServe, Kubernetes-based serving
  - AI Workflows: data prep for NLP, embeddings, recommendation systems
  - LLM Integration: prompt engineering, embedding pipelines, data optimisation for GenAI workloads

Required Skills & Experience
- 8+ years in Data Engineering/Big Data/Cloud roles, with hands-on AI/ML data enablement.
- Proven expertise with ETL/ELT, Spark, data lakes, data warehouses, and distributed systems.
- Strong programming skills in Python, SQL, and PySpark (Scala/R optional).
- Expertise in cloud-native data and AI/ML platforms (AWS SageMaker, Azure ML, GCP Vertex AI, Databricks).
- Hands-on experience with MLOps, feature stores, vector databases, and ML model integration.
- Deep understanding of pipeline optimisation, governance, and cost efficiency.
- Excellent communication & teaching skills; ability to simplify complex data & AI concepts.
- Prior mentoring, training, or interview-prep experience strongly preferred.
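To give applicants a concrete sense of the hands-on depth expected, below is a minimal PySpark sketch of one of the interview drills referenced under Key Responsibilities: keeping only the latest version of each record from a raw change-data-capture feed before publishing it to a curated zone. The paths and column names (customer_id, updated_at, event_date) are illustrative assumptions, not part of any particular client stack.

# Illustrative interview exercise: deduplicate CDC records so only the most
# recent version of each business key is published. Paths and column names are
# placeholders; in practice the source and target would be S3/ADLS/GCS URIs.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-dedup-drill").getOrCreate()

# Raw extracts landed by an ingestion tool (e.g. Debezium), one row per change event.
raw = spark.read.parquet("/data/lake/raw/customers")

# Keep the newest record per key, the question that motivates SCD and MERGE
# discussions in interviews.
latest_per_key = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
deduped = (
    raw.withColumn("rn", F.row_number().over(latest_per_key))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Partition the curated output by event date to keep downstream scans cheap.
deduped.write.mode("overwrite").partitionBy("event_date").parquet("/data/lake/curated/customers")

On Delta Lake or Iceberg the same logic would usually be expressed as a MERGE/upsert, which is a natural follow-up question in a mock interview.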
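Similarly, the sketch below illustrates the retrieval idea behind the Vector Databases & RAG topic in pillar 8, using only NumPy so it runs anywhere. The hash-based fake_embed function is a stand-in for a real embedding model, and the in-memory list is a stand-in for a vector store such as Pinecone, Weaviate, Milvus, or Chroma; both are assumptions made purely for illustration, not part of the required stack.

# Illustrative retrieval step behind RAG: embed documents and a query, then rank
# documents by cosine similarity. A deterministic hash-based placeholder replaces
# a real embedding model, and a plain Python list replaces a vector database.
import hashlib

import numpy as np


def fake_embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: hash the text into a reproducible unit vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)


documents = [
    "Airflow schedules the nightly batch ETL jobs",
    "Kafka ingests real-time click events",
    "dbt models the warehouse star schema",
]
index = [(doc, fake_embed(doc)) for doc in documents]  # stand-in for a vector store

query_vec = fake_embed("how do we ingest streaming data?")
# With unit-normalised vectors, cosine similarity reduces to a dot product.
ranked = sorted(index, key=lambda item: float(item[1] @ query_vec), reverse=True)
print("top match:", ranked[0][0])

In production the placeholder encoder and in-memory list give way to a managed embedding model and a vector database, but the ranking logic candidates are asked to explain is the same.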