Job Title:

AI & ML Engineer

Company: Blessing Softtech

Location: Pune, Maharashtra

Created: 2026-04-02

Job Type: Full Time

Job Description:

REQUIREMENTS:

Highly motivated, well-read, and a prodigy.

• Deep experience with LLM application development: prompt engineering, RAG pipelines, tool/function calling, agent architectures
• Hands-on experience with at least 2 of: OpenAI, Anthropic, Google Gemini, Groq, Mistral APIs
• Strong understanding of embedding models, vector databases, and retrieval evaluation (precision, recall, MRR, NDCG)
• Experience building evaluation frameworks for AI systems: not just accuracy metrics but conversation-level quality assessment
• Python proficiency with async programming (asyncio, aiohttp)
• Familiarity with real-time audio/voice systems is a strong plus
• Experience with LangChain/LangGraph agent patterns is a strong plus

ROLE/RESPONSIBILITIES:

• Build and maintain the evaluation framework for voice and chat agent quality: hallucination rate, tool selection accuracy, conversation success metrics, retrieval precision/recall, and end-to-end task completion rates
• Upgrade the RAG pipeline from basic FAISS flat index + bge-small-en-v1.5 to a production-grade retrieval system with hybrid search (semantic + BM25), cross-encoder re-ranking, multi-document support, chunk quality scoring, and dynamic index updates
• Design and implement LLM routing intelligence: choosing between 5 configured providers (OpenAI, Groq, Anthropic, Google Gemini, Mistral) based on query complexity, latency requirements, cost constraints, and tool-calling capability
• Harden the guardrails system beyond current regex + Llama Guard 3: add topic boundary enforcement, PII detection/redaction, hallucination detection on RAG responses, and output quality scoring
• Optimize voice pipeline latency end-to-end: STT TTFB, LLM TTFB, TTS TTFB, total round-trip. Profile each provider combination and tune VAD parameters (start/stop thresholds, confidence, min volume) per language
• Build prompt engineering infrastructure: version-controlled prompt registry, A/B testing framework for system prompts, and systematic optimization based on eval results
• Develop conversation analytics: real-time sentiment tracking, intent classification, conversation outcome scoring, topic drift detection, and customer satisfaction prediction
• Implement human handoff intelligence: frustration detection, repeated failure patterns, scope-boundary detection, handoff summary generation

Tech stack you will work with:

• Pipecat AI (real-time voice pipeline with frame processors, VAD, barge-in)
• LangChain + LangGraph (chat agent executor, tool calling, multi-agent orchestration)
• FAISS + FastEmbed (vector search, local embeddings with BAAI/bge-small-en-v1.5)
• Deepgram Nova-3, Google Cloud STT, AssemblyAI (speech-to-text)
• ElevenLabs, Cartesia Sonic-3, Google TTS, Deepgram TTS (text-to-speech)
• OpenAI GPT-4o, Groq Llama 3.3-70B, Anthropic Claude, Google Gemini 2.0 Flash, Mistral Small (LLMs)
• Llama Guard 3 via Groq (content safety), confusables library (homoglyph detection)
• MCP (Model Context Protocol) via Pipedream for external tool integration

Compensation: CTC 10L ++
