Job Title: AI Data Engineer
Company: Peak Trust Global Real Estate
Location: Bharatpur, Rajasthan
Created: 2025-11-19
Job Type: Full Time
Job Description:
Location: Remote
Type: Full-time
Experience: 3+ years
Salary: Up to 60K/month

Role Summary
We are looking for a hands-on AI Data Engineer who can independently manage end-to-end data workflows, including data collection, document processing, dataset preparation, retrieval pipelines, model fine-tuning, and data visualization. This role requires strong technical skills across Python, automation, ML tooling, and analytical reporting.

Key Responsibilities (Technical)

1. Data Acquisition & Automation
- Build automated data collection workflows using tools such as Firecrawl, Playwright, Scrapy, or similar frameworks
- Extract multi-format documents (PDFs, HTML, text, images)
- Handle large-scale crawling, rate limits, error handling, and scheduling

2. Document Processing & Transformation
- Clean and process unstructured documents
- Apply OCR (Tesseract, PaddleOCR) to scanned files
- Convert and structure data using PyPDF2, pymupdf, BeautifulSoup, etc.
- Prepare data in formats such as JSON, JSONL, or CSV

3. Dataset Preparation
- Segment and structure text for ML training
- Create Q&A datasets, summaries, instruction-response pairs, and labeled text
- Build high-quality datasets compatible with fine-tuning frameworks

4. Retrieval & Indexing Pipelines
- Implement document chunking strategies
- Generate embeddings and manage vector databases (Qdrant, Pinecone, Weaviate)
- Build retrieval workflows using LangChain or LlamaIndex
- Optimize retrieval accuracy and latency

5. Model Training & Fine-Tuning
- Run fine-tuning jobs using HuggingFace Transformers, LoRA/QLoRA, or similar methods
- Monitor training performance and refine datasets
- Package and deploy fine-tuned models

6. Data Visualization & Analytics
- Create analytical charts, trends, and insights using Pandas, Matplotlib, Seaborn, and Plotly
- Build simple internal dashboards or visual summaries for reports
- Transform raw datasets into meaningful visual insights

7. Automation & Infrastructure
- Write modular, maintainable Python scripts
- Containerize workflows with Docker
- Maintain version control with Git
- Ensure reproducibility and pipeline stability

(Illustrative code sketches for several of these workflows appear at the end of this posting.)

Required Technical Skills
- Strong proficiency in Python
- Experience with Firecrawl, Playwright, Scrapy, or similar tools
- Strong background in document parsing, text processing, and OCR
- Familiarity with LangChain or LlamaIndex
- Experience with vector databases
- Hands-on experience with HuggingFace, Transformer models, and fine-tuning
- Ability to write clean, efficient data pipelines
- Experience with Matplotlib, Seaborn, Plotly, or other visualization tools
- Comfort using Docker and Git

Nice to Have
- Experience serving models or building small APIs (FastAPI)
- Exposure to GPU training environments
- Background in large-scale unstructured data work
- Ability to create lightweight dashboards (Plotly Dash, Streamlit)

Ideal Candidate
- Comfortable owning full pipelines independently
- Detail-oriented and analytical
- Strong problem-solving ability
- Can work with minimal supervision
- Enjoys building structured systems from scratch
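
Illustrative Sketches (for reference only)

The sketches below are not part of the role requirements; they are minimal, hedged examples of the kind of work each responsibility area describes. First, data acquisition: a small Scrapy spider with basic rate limiting and retries. The start URL and CSS selectors are hypothetical placeholders, not a real target site.

```python
# Minimal sketch of an automated collection job with Scrapy.
# URL and selectors are placeholders for illustration only.
import scrapy


class DocumentSpider(scrapy.Spider):
    name = "documents"
    start_urls = ["https://example.com/reports"]  # hypothetical listing page
    custom_settings = {
        "DOWNLOAD_DELAY": 1.0,         # polite rate limiting
        "RETRY_TIMES": 3,              # simple error handling via retries
        "AUTOTHROTTLE_ENABLED": True,  # back off automatically under load
    }

    def parse(self, response):
        # Yield one record per linked PDF (selector is a placeholder).
        for href in response.css("a::attr(href)").getall():
            if href.endswith(".pdf"):
                yield {"url": response.urljoin(href)}
        # Follow pagination if present (selector is a placeholder).
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```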
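
Next, document processing: a sketch that pulls text from a PDF with pymupdf and falls back to Tesseract OCR (via pytesseract) for pages without an embedded text layer. The file path is a placeholder, and exact pymupdf method signatures vary slightly between versions, so treat this as a sketch under those assumptions.

```python
# Sketch: PDF text extraction with an OCR fallback for scanned pages.
import io

import fitz  # pymupdf
import pytesseract
from PIL import Image


def extract_text(pdf_path: str) -> str:
    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        text = page.get_text().strip()
        if not text:
            # No text layer: render the page to an image and OCR it.
            pix = page.get_pixmap(dpi=200)  # dpi kwarg assumes a recent pymupdf
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            text = pytesseract.image_to_string(image)
        pages.append(text)
    doc.close()
    return "\n\n".join(pages)
```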
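
For dataset preparation, a sketch of turning cleaned sections into instruction-response records written as JSONL, a format most fine-tuning frameworks accept. The input structure and field names here are assumptions, not a fixed schema.

```python
# Sketch: write instruction-response pairs as JSONL for fine-tuning.
import json

sections = [
    {"title": "Lease terms", "body": "Cleaned section text goes here..."},
    {"title": "Stamp duty", "body": "Another cleaned section goes here..."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for section in sections:
        record = {
            "instruction": f"Summarise the section titled '{section['title']}'.",
            "input": section["body"],
            "output": "",  # filled in by annotation or a labelling pass
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```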
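
For retrieval and indexing, a sketch of fixed-size chunking with overlap, embedding with sentence-transformers (an assumed choice of embedding library; it is not named in this posting), and a brute-force cosine-similarity search standing in for a vector database such as Qdrant or Pinecone. Model name, chunk sizes, and the query are illustrative assumptions.

```python
# Sketch: chunk -> embed -> retrieve, with cosine similarity in place of a vector DB.
import numpy as np
from sentence_transformers import SentenceTransformer


def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    # Simple character-based chunking with overlap between neighbours.
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]


model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
documents = ["...long cleaned document text..."]
chunks = [c for doc in documents for c in chunk(doc)]

embeddings = model.encode(chunks, normalize_embeddings=True)   # shape (n, d)
query = model.encode(["What does the document say about payment terms?"],
                     normalize_embeddings=True)

sims = (embeddings @ query.T).ravel()   # cosine similarity (vectors are normalised)
top_k = np.argsort(-sims)[:3]           # indices of the three best-matching chunks
for i in top_k:
    print(round(float(sims[i]), 3), chunks[i][:80])
```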
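
For fine-tuning, a sketch of attaching LoRA adapters to a small causal language model with HuggingFace transformers and peft. The base model, target modules, and hyperparameters are assumptions; a real run would also need a tokenised dataset and a training loop (for example transformers Trainer), which are omitted for brevity.

```python
# Sketch: wrap a base model with LoRA adapters so only adapter weights train.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed base model for the example
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # depends on the model architecture
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # sanity check before launching training
```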
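
Finally, visualization and analytics: a sketch that aggregates a processed dataset with Pandas and plots a simple daily trend with Matplotlib. The CSV path and column names are assumptions.

```python
# Sketch: aggregate processing logs and plot documents processed per day.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("documents_processed.csv", parse_dates=["processed_at"])
daily = df.groupby(df["processed_at"].dt.date).size()

fig, ax = plt.subplots(figsize=(8, 4))
daily.plot(ax=ax, marker="o")
ax.set_title("Documents processed per day")
ax.set_xlabel("Date")
ax.set_ylabel("Documents")
fig.tight_layout()
fig.savefig("documents_per_day.png", dpi=150)
```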