
Job Title: Freelance Deep Web Crawler Engineer (AI-Integrated Data Pipeline)

Company: Sixteen Alpha AI

Location: Thane, Maharashtra

Created: 2025-11-19

Job Type: Full Time

Job Description:

About the Project

We're developing a next-generation intelligent web crawling system capable of exploring deep and dynamic web data sources, including sites behind authentication, infinite scrolls, and JavaScript-heavy pages. The crawler will be integrated with an AI-driven pipeline for automated data understanding, classification, and transformation.

We're looking for a highly experienced engineer who has previously built large-scale, distributed crawling frameworks and integrated AI or NLP/LLM-based components for contextual data extraction.

Key Responsibilities

- Design, develop, and deploy scalable deep web crawlers capable of bypassing common anti-bot mechanisms.
- Implement AI-integrated pipelines for data processing, entity extraction, and semantic categorization.
- Develop dynamic scraping systems for sites that rely on JavaScript, infinite scrolling, or APIs (see the first sketch below).
- Integrate with vector databases, LLM-based data labeling, or automated content enrichment modules.
- Optimize crawling logic for speed, reliability, and stealth across distributed environments.
- Collaborate on data pipeline orchestration using tools like Airflow, Prefect, or custom async architectures.

Required Expertise

- Proven experience building deep or dark web crawlers (Playwright, Scrapy, Puppeteer, or custom async frameworks).
- Strong understanding of browser automation, session management, and anti-detection mechanisms.
- Experience integrating AI/ML/NLP pipelines, e.g. text classification, entity recognition, or embedding-based similarity.
- Skill in asynchronous Python (asyncio, aiohttp, the Playwright async API).
- Familiarity with database and pipeline systems: PostgreSQL, MongoDB, Elasticsearch, or similar.
- Ability to design robust data flows that connect crawling → AI inference → storage/visualization (see the second sketch below).

Nice to Have

- Knowledge of LLMs (OpenAI, Hugging Face, LangChain, or custom fine-tuned models).
- Experience with data cleaning, deduplication, and normalization pipelines.
- Familiarity with distributed crawling frameworks (Ray, Celery, Kafka).
- Prior experience integrating real-time analytics dashboards or monitoring tools.

What We Offer

- Competitive freelance pay based on expertise and delivery.
- Flexible, async-first remote collaboration.
- The opportunity to shape an AI-first data platform from the ground up.
- Potential for a long-term partnership if the collaboration is successful.
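
For candidates sizing up the dynamic-scraping work above, here is a minimal illustrative sketch, not project code, of an asynchronous Playwright crawl that keeps scrolling until an infinite-scroll page stops growing. The target URL is a hypothetical placeholder, and a real deployment would add session handling, retries, and anti-detection measures.

    import asyncio
    from playwright.async_api import async_playwright

    async def crawl(url: str) -> str:
        """Load a JavaScript-heavy page and exhaust its infinite scroll."""
        async with async_playwright() as pw:
            browser = await pw.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            previous_height = 0
            while True:
                height = await page.evaluate("document.body.scrollHeight")
                if height == previous_height:
                    break  # no new content appeared; the feed is exhausted
                previous_height = height
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                await page.wait_for_timeout(1500)  # give lazy-loaded items time to render
            html = await page.content()
            await browser.close()
            return html

    if __name__ == "__main__":
        # Placeholder URL; substitute a real infinite-scroll page.
        print(len(asyncio.run(crawl("https://example.com/feed"))))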
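
Likewise, the crawling → AI inference → storage flow can be pictured as three asyncio stages joined by queues. This sketch assumes aiohttp for fetching and uses a toy heuristic as a stand-in for the AI classification step; it is an architectural illustration, not the project's actual pipeline.

    import asyncio
    import aiohttp

    async def crawl_stage(session: aiohttp.ClientSession,
                          urls: list, out: asyncio.Queue) -> None:
        """Stage 1: fetch raw pages and hand them downstream."""
        for url in urls:
            async with session.get(url) as resp:
                await out.put((url, await resp.text()))
        await out.put(None)  # sentinel: crawling finished

    async def inference_stage(inp: asyncio.Queue, out: asyncio.Queue) -> None:
        """Stage 2: stand-in for an AI model (classification, NER, embeddings)."""
        while (item := await inp.get()) is not None:
            url, html = item
            label = "article" if "<article" in html else "other"  # toy heuristic, not a real model
            await out.put((url, label))
        await out.put(None)

    async def storage_stage(inp: asyncio.Queue) -> None:
        """Stage 3: persist results; printed here, a database in practice."""
        while (item := await inp.get()) is not None:
            print(item)

    async def main(urls: list) -> None:
        raw, labeled = asyncio.Queue(), asyncio.Queue()
        async with aiohttp.ClientSession() as session:
            await asyncio.gather(
                crawl_stage(session, urls, raw),
                inference_stage(raw, labeled),
                storage_stage(labeled),
            )

    if __name__ == "__main__":
        asyncio.run(main(["https://example.com"]))  # placeholder URL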
