Job Title:
Data Engineer (Python) – Data Processing
Company: Confidential
Location: Republic of India
Created: 2025-11-19
Job Type: Full Time
Job Description:
About Us
MyRemoteTeam, Inc. is a fast-growing distributed-workforce enabler, helping companies scale with top global talent. We empower businesses by providing world-class software engineers, operations support, and infrastructure to help them grow faster and better.

Position: Python Coder – Data Processing
Client: Wipro (Google Project)
Location: Remote – India
Commitment: 40 hrs/week | Contract: 3–6 Months
Experience: 8+ Years

Job Description
We are looking for an experienced Python professional with strong expertise in large-scale data processing. This role involves building and maintaining automated data pipelines that process massive text datasets used for AI and LLM training. The ideal candidate will have deep hands-on experience in Python, strong data engineering skills, and the ability to work closely with ML and AI teams.

Key Responsibilities
- Design and develop scalable ETL/ELT pipelines using Python.
- Ingest, process, clean, deduplicate, and normalize large text datasets.
- Work with diverse data formats such as JSON, CSV, XML, and Parquet.
- Ensure high data quality and establish quality-check standards.
- Optimize pipelines for speed, cost efficiency, and reliability.
- Collaborate with AI/ML teams on data requirements and training workflows.
- Support model training by investigating data-related issues when required.

Required Skills
- 8+ years in data engineering, backend engineering, or data processing roles.
- Strong expertise in Python and libraries like Pandas, NumPy, Dask, and Polars.
- Experience building large-scale data pipelines.
- Strong understanding of data structures, data modeling, and coding best practices.
- Hands-on experience with JSON/CSV/XML/Parquet formats.
- Excellent debugging and problem-solving skills.

Good to Have
- Experience in LLM/AI data preprocessing (LLaMA, GPT, BERT, etc.).
- Knowledge of big data frameworks (Spark, Ray).
- Experience with Hugging Face libraries (Transformers, Datasets, Tokenizers).
- Familiarity with PyTorch or TensorFlow.
- Experience working on cloud platforms (AWS, GCP, Azure).
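For illustration only, the clean/deduplicate/normalize work this role describes might look like the minimal pandas sketch below. It is a hypothetical example, not part of the actual project: the records, the field names, and the `clean_and_dedupe` helper are all invented here to show the general shape of such a step.

```python
import pandas as pd

# Hypothetical raw text records, such as a pipeline might ingest
# from mixed JSON/CSV sources before LLM-training preprocessing.
raw = pd.DataFrame({
    "text": ["  Hello World ", "hello world", "Data pipelines scale.", None],
    "source": ["a.json", "b.csv", "a.json", "b.csv"],
})

def clean_and_dedupe(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize whitespace and case, drop empty rows, deduplicate on text."""
    out = df.dropna(subset=["text"]).copy()
    out["text"] = out["text"].str.strip().str.lower()
    out = out[out["text"] != ""]
    return out.drop_duplicates(subset="text").reset_index(drop=True)

cleaned = clean_and_dedupe(raw)
print(cleaned["text"].tolist())  # → ['hello world', 'data pipelines scale.']
```

In practice the same normalize-then-deduplicate pattern would be applied at scale with Dask, Polars, or Spark rather than a single in-memory DataFrame.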