Job Title: Lead Data Engineer
Company: ITC Infotech
Location: Pune, Maharashtra
Created: 2025-11-16
Job Type: Full Time
Job Description:
Lead Data Engineer
Location: Bangalore/Pune (Hybrid)
Mode: Hybrid
Shift Timing: 2 PM to 11 PM
Experience: 7-12 years, with 3+ years in cloud data platforms and Databricks.

Purpose:
We are seeking a hands-on Technical Lead to drive the ingestion of high-volume mainframe RPC data into Databricks, enabling scalable machine learning workflows. This role is critical to building a robust data foundation for training thousands of AI models that detect anomalous behavior across applications, services, and functions.

Key Responsibilities
- Mainframe Data Ingestion:
  - Design and implement scalable pipelines to extract, parse, and ingest RPC logs and technical attributes from mainframe systems into Delta Lake on Databricks.
- ML-Ready Data Engineering:
  - Transform and structure data for time-series modelling and anomaly detection across thousands of models.
- ML Workflow Integration:
  - Collaborate with ML engineers to ensure data pipelines support SARIMA, ANN, and other model types; enable automated retraining and scoring.
- Performance Optimization:
  - Tune Spark jobs, Delta Lake storage, and cluster configurations to handle billions of records and real-time aggregation.
- FinOps & Cost Control:
  - Monitor and optimize Databricks resource usage; implement auto-scaling and cost-aware job scheduling.
- Monitoring & Alerting:
  - Integrate Databricks-native alerting for pipeline health, data anomalies, and job failures.

Required Skills
- Strong hands-on experience with Databricks, Apache Spark, Delta Lake, and MLflow
- Proven expertise in mainframe data integration (e.g., SMF, RPC logs, VSAM, DB2)
- Strong Python and PySpark programming skills

Preferred Skills
- Experience using CI/CD tools such as GitHub Actions with Databricks
- Knowledge of FinOps principles for cloud cost optimization