Job Title:
Data Engineer (Scraping & Pipeline Stabilization)
Company: ISITCA PRIVATE LIMITED
Location: New Delhi, Delhi
Created: 2025-09-24
Job Type: Full Time
Job Description:
About the Role:
We are looking for a hands-on Data Engineer to join our team and take full ownership of our scraping pipelines and data quality. You'll work with data from 60+ websites, much of it in PDFs that are processed via OCR and stored in MySQL/PostgreSQL. You'll build robust, self-healing pipelines and fix common data issues such as missing fields, duplicate records, and formatting errors.

Responsibilities:
- Own and optimize the Airflow scraping DAGs for 60+ sites
- Implement validation checks, retry logic, and error alerts
- Build pre-processing routines to clean OCR'd text
- Create data normalization and deduplication workflows
- Maintain data integrity across MySQL and PostgreSQL
- Collaborate with the ML team on downstream AI use cases

Requirements:
- 2–5 years of experience in Python-based data engineering
- Experience with Airflow, Pandas, and OCR (Tesseract or AWS Textract)
- Solid SQL and schema design skills (MySQL/PostgreSQL)
- Familiarity with CSV processing and data pipelines
- Bonus: experience with scraping using Scrapy or Selenium

Location: Delhi (in-office only)

Experience: Minimum 3 years

Education: Must be a graduate (B.Tech preferred; BCA, MCA, BSc, or MSc)

Mandatory skills: scraping, Python, Selenium, NumPy, Pandas

Good-to-have skills: Beautiful Soup, MySQL, Large Language Models (LLMs), Machine Learning, Natural Language Processing (NLP), GitHub, Django