Job Title: NLP Engineer
Company: BigStep Technologies
Location: Udaipur, Rajasthan
Created: 2026-03-31
Job Type: Full Time
Job Description:
ROLE SUMMARY
We are hiring a hands-on NLP Engineer to build robust pipelines that convert policy, regulatory, fintech, and healthcare documents into structured, graph-ready data. You will own the full extraction lifecycle, from raw text to clean, schema-validated outputs, using classical NLP, deep learning, and LLM APIs.

KEY RESPONSIBILITIES
- Pipeline Development: Design and build end-to-end text extraction pipelines for policy, regulatory, fintech, and healthcare documents
- Entity & Clause Extraction: Extract key entities (countries, companies, minerals) and structure policy clauses and obligations
- Deep Learning & Transformers: Fine-tune BERT / RoBERTa for NER, text classification, and relation extraction tasks
- LLM Integration: Leverage LLM APIs with structured output extraction, prompt engineering, and tool/function calling
- Data Engineering: Build scalable Python pipelines for high-volume document processing with robust pre-processing for PDF, DOCX, and HTML
- Schema & Graph Readiness: Define and enforce JSON schemas; ensure outputs are clean and compatible with knowledge graph ingestion
- Accuracy Improvement: Evaluate model performance, track metrics, and implement feedback loops to improve extraction quality over time

REQUIRED SKILLS
- 3–5 years of hands-on NLP engineering: real production pipelines, not just model experiments
- Strong Python skills: OOP, async programming, packaging, and testing
- NLP frameworks: spaCy, HuggingFace Transformers, NLTK
- Deep learning: fine-tuning transformer models for sequence labeling and classification
- LLM API integration: prompt engineering, structured outputs, and function/tool calling
- Data pipeline experience: ETL, batch processing, and text pre-processing at scale
- JSON schema design and validation using pydantic or json schema

GOOD TO HAVE
- Experience with legal, regulatory, or policy documents (contracts, compliance filings, government publications)
- Familiarity with knowledge graphs or graph databases (Neo4j, RDF)
- Document parsing tools: pdfplumber, Docling, Apache Tika
- Domain knowledge in fintech or healthcare NLP
- Exposure to information extraction benchmarks (CoNLL, DocRED, SciERC)