Job Title:
Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)
Company: Sixteen Alpha AI
Location: New Delhi, Delhi
Created: 2025-12-04
Job Type: Full Time
Job Description:
About the Project
We're developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources, including sites behind authentication, infinite scrolls, and JavaScript-heavy pages. The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation. We're looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI or NLP/LLM-based components for contextual data extraction.

Key Responsibilities
- Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
- Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
- Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs.
- Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
- Optimize crawling logic for speed, reliability, and stealth across distributed environments.
- Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.

Required Expertise
- Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
- Strong understanding of browser automation, session management, and anti-detection mechanisms.
- Experience integrating AI/ML/NLP pipelines, e.g., text classification, entity recognition, or embedding-based similarity.
- Skilled in asynchronous Python (asyncio, aiohttp, Playwright async API).
- Familiar with database and pipeline systems: PostgreSQL, MongoDB, Elasticsearch, or similar.
- Ability to design robust data flows that connect crawling to AI inference to storage/visualization (a rough sketch of this flow follows at the end of this posting).

Nice to Have
- Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
- Experience with data cleaning, deduplication, and normalization pipelines.
- Familiarity with distributed crawling frameworks (Ray, Celery, Kafka).
- Prior experience integrating real-time analytics dashboards or monitoring tools.

What We Offer
- Competitive freelance pay based on expertise and delivery.
- Flexible, async-first remote collaboration.
- Opportunity to shape an AI-first data platform from the ground up.
- Potential for a long-term partnership if the collaboration is successful.
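
To illustrate the crawling-to-inference-to-storage flow mentioned under Required Expertise, here is a minimal, non-authoritative sketch in asynchronous Python. It assumes Playwright's async API for rendering JavaScript-heavy pages; the classify() helper is a hypothetical stand-in for whatever AI inference stage the project adopts, and example.com is a placeholder URL.

import asyncio
from playwright.async_api import async_playwright

async def crawl(url: str) -> str:
    # Render a JavaScript-heavy page with a headless browser and return its visible text.
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        text = await page.inner_text("body")
        await browser.close()
        return text

def classify(text: str) -> str:
    # Hypothetical AI inference stage (e.g., an LLM or embedding-based
    # classifier); returns a category label for the crawled text.
    return "uncategorized"

async def pipeline(urls: list[str]) -> list[dict]:
    # Crawl each URL, run inference, and build storage-ready records.
    records = []
    for url in urls:
        text = await crawl(url)
        records.append({"url": url, "category": classify(text), "length": len(text)})
    return records

if __name__ == "__main__":
    results = asyncio.run(pipeline(["https://example.com"]))
    print(results)

In a production setting the per-URL work would typically be fanned out concurrently (e.g., asyncio.gather with a semaphore) and the records written to a datastore such as PostgreSQL or Elasticsearch rather than printed.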