Job Title: Web Crawling Engineer
Company: Forage AI
Location: Ludhiana, Punjab
Created: 2026-03-06
Job Type: Full Time
Job Description:
We are seeking an experienced Web Crawling Engineer to design, build, and maintain robust data extraction systems at scale. You'll work on developing sophisticated web scraping infrastructure that handles high-volume data collection while ensuring reliability, efficiency, and compliance.

Experience:
● 3+ years of professional experience in web scraping and data extraction

Technical Skills:
● Strong proficiency in Python, with extensive experience in web scraping frameworks (Scrapy, BeautifulSoup, Selenium, or similar)
● Deep understanding of HTML, CSS, JavaScript, and DOM manipulation for effective data extraction
● Hands-on experience with PostgreSQL for data storage and management
● Proficiency with Redis for caching, queue management, and session handling
● Experience with RabbitMQ for distributed task management and message queuing
● Solid knowledge of AWS EC2 for deploying and managing crawling infrastructure
● Proven experience implementing and managing residential and rotating proxy solutions to handle rate limiting and geo-restrictions
● Understanding of anti-bot mechanisms and of techniques for working within website terms of service

Core Competencies:
● Ability to analyze website structures and develop efficient extraction strategies
● Experience handling dynamic content, AJAX requests, and JavaScript-rendered pages
● Strong debugging skills for troubleshooting scraping issues and proxy failures
● Knowledge of data quality validation and cleaning techniques
● Understanding of ethical scraping practices and robots.txt compliance

Responsibilities:
● Design and implement scalable web crawling systems using Python-based frameworks
● Develop and maintain distributed scraping pipelines using RabbitMQ for task distribution
● Manage proxy rotation strategies to ensure uninterrupted data collection
● Optimize crawler performance and resource utilization on AWS EC2 instances
● Implement data storage solutions using PostgreSQL and caching layers with Redis
● Monitor crawler health, handle errors, and implement retry mechanisms
● Ensure data quality through validation and normalization processes
● Collaborate with data engineering and analytics teams to meet data requirements
● Stay updated on changes to target websites and adapt scrapers accordingly

Nice to Have:
● Experience with containerization (Docker) and orchestration tools
● Knowledge of additional AWS services (S3, Lambda, SQS)
● Familiarity with API development and reverse engineering
● Experience with cloud-based scraping services or platforms
● Understanding of legal and ethical considerations in web scraping

Other Infrastructure Requirements:
Since this is a completely work-from-home position, you will also need the following:
● High-speed internet connectivity for video calls and efficient work.
● A capable business-grade computer (e.g., modern processor, 8 GB+ of RAM, and no other obstacles to uninterrupted, efficient work).
● Headphones with clear audio quality.
● A stable power connection, with backups in case of internet or power failure.