
Job Title:

AI/ML Engineer

Company: CloudHire

Location: Gurgaon, Haryana

Created: 2025-12-25

Job Type: Full Time

Job Description:

Data Engineer / ML Engineer — Job Description

Location: Gurugram (Onsite)
Salary Budget: Up to 18 LPA

Key Responsibilities
- Design, build, and maintain scalable data pipelines (batch and streaming) using Spark, Hadoop, and other Apache ecosystem tools.
- Develop robust ETL workflows for large-scale data ingestion, transformation, and validation.
- Work with Cassandra, data lakes, and distributed storage systems to handle large-volume datasets.
- Write clean, optimized, and modular Python code for data processing, automation, and machine learning tasks.
- Use Linux environments for scripting, performance tuning, and data workflow orchestration.
- Build and manage web scraping pipelines to extract structured and unstructured data from diverse sources.
- Collaborate with ML/AI teams to prepare training datasets, manage feature stores, and support the model lifecycle.
- Implement and experiment with LLMs, LangChain, RAG pipelines, and vector database integrations.
- Assist in fine-tuning models, evaluating model performance, and deploying ML models into production.
- Optimize data workflows for performance, scalability, and fault tolerance.
- Document data flows, transformation logic, and machine learning processes.
- Work cross-functionally with engineering, product, and DevOps teams to ensure reliable, production-grade data systems.

Requirements
- 3–6 years of experience as a Data Engineer, ML Engineer, or in a similar role.
- Strong expertise in advanced Python (data structures, multiprocessing, async, clean architecture).
- Solid experience with:
  - Apache Spark / PySpark
  - Hadoop ecosystem (HDFS, Hive, YARN, HBase, etc.)
  - Cassandra or similar distributed databases
  - Linux (CLI tools, shell scripting, environment management)
- Proven ability to design and implement ETL pipelines and scalable data processing systems.
- Hands-on experience with data lakes, large-scale storage, and distributed systems.
- Experience with web scraping frameworks (BeautifulSoup, Scrapy, Playwright, etc.).
- Familiarity with LangChain, LLMs, RAG, vector stores (FAISS, Pinecone, Milvus), and ML workflow tools.
- Understanding of model training, fine-tuning, and evaluation workflows.
- Strong problem-solving skills, the ability to dig into complex data issues, and write production-ready code.
- Experience with cloud environments (AWS/GCP/Azure) is a plus.
