Job Title:
Artificial Intelligence Engineer
Company: TerraGiG
Location: Pune, Maharashtra
Created: 2026-01-28
Job Type: Full Time
Job Description:
Role: GenAI Engineer
Experience: 5+ years
Work Mode: Onsite

The client is looking for a GenAI Engineer who can design and deploy end-to-end solutions. Key focus areas include vector embeddings, RAG pipelines, and FastAPI integration. Please review the full job description below and focus your preparation on the "Deep-Dive" topics provided.

Full Job Description

Role Overview
We are seeking a GenAI Engineer to design, develop, and deploy Generative AI solutions that enhance business workflows and user experiences. The ideal candidate will have strong expertise in Large Language Models (LLMs), prompt engineering, and the integration of AI services into scalable applications.

Key Responsibilities
- Model Integration: Implement and fine-tune LLMs; build APIs and microservices for GenAI features.
- Prompt Engineering: Design, optimize, and evaluate prompts for safety and accuracy.
- RAG (Retrieval-Augmented Generation): Develop pipelines for document ingestion, vector embeddings, and semantic search.
- App Development: Integrate GenAI into web/mobile apps using FastAPI, Streamlit, or React.
- Optimization: Monitor token usage, latency, and inference costs.
- Safety: Implement moderation, bias detection, and responsible AI guidelines.

Required Skills
- Python (FastAPI, Flask, Django), LLM APIs (OpenAI, Azure), vector DBs (Pinecone, Weaviate, FAISS).
- Cloud (AWS/Azure/GCP), Docker/Kubernetes, ML fundamentals (embeddings, tokenization).
- Real-time AI (SSE/WebSockets).

Preferred Skills
- LangChain, LlamaIndex, image models (Stable Diffusion), MLOps, CI/CD.

Technical Deep-Dive: Vector Embeddings
Since the JD specifically asks for knowledge of embeddings and vector databases, your engineers should be prepared to answer the following:

1. Conceptual Understanding
- What are they? Embeddings are high-dimensional numerical representations of data (text, images, audio).
Unlike keyword search, embeddings capture semantic meaning.
- Dimensionality: Be familiar with common sizes (e.g., OpenAI's text-embedding-3-small produces 1536-dimensional vectors).
- Distance Metrics: Know when to use cosine similarity (directional), Euclidean distance (magnitude-sensitive), or the dot product.

2. Implementation Challenges
- Chunking: How to break a 100-page PDF into chunks so each embedding captures enough context without losing detail.
- Normalization: Why vectors are normalized to unit length before storage (crucial for cosine-similarity performance).
- Matryoshka Embeddings: (Advanced 2026 topic) Be able to explain how to shorten vectors (e.g., from 3072 to 256 dimensions) without losing significant accuracy, to save on storage costs.

Suggested Preparation Topics

Pillar 1: The RAG Pipeline
- Indexing: The flow from Document -> Chunking -> Embedding -> Vector DB.
- Retrieval: Explain Top-K retrieval and how re-ranking models (such as Cohere Rerank) improve the quality of the top results.

Pillar 2: Engineering (The "Developer" Part)
- FastAPI: Be ready to code a basic endpoint that takes a user query and returns a streamed response using StreamingResponse.
- Streaming (SSE): Explain why SSE is used for LLMs (to reduce perceived latency for the user).

Pillar 3: Evaluation & Operations
- LLM-as-a-Judge: Using a stronger model (e.g., GPT-4o) to grade the outputs of a smaller model.
- Token Management: How to implement a sliding-window or summary-based memory to keep context without hitting token limits or high costs.
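The three distance metrics from the "Conceptual Understanding" section can be compared in a few lines of dependency-free Python. The toy vectors below are illustrative only, not outputs of any real embedding model:

```python
import math

def cosine_similarity(a, b):
    # Directional similarity: only the angle between vectors matters.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Magnitude-sensitive: scaling one vector changes the result.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Cheapest metric; equals cosine similarity when both vectors are unit-length.
    return sum(x * y for x, y in zip(a, b))

# Two vectors pointing the same way but with different magnitudes:
a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(cosine_similarity(a, b))   # ~1.0: identical direction
print(euclidean_distance(a, b))  # ~3.74: magnitudes still differ
```

This is why cosine similarity is the default for text embeddings: two documents about the same topic should score as similar even if one is "longer" in vector magnitude.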
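For the chunking challenge, a minimal character-based chunker with overlap is a reasonable starting point. The sizes here are assumed defaults; production pipelines typically split on sentence or token boundaries instead of raw characters:

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks that overlap, so a sentence cut
    at one chunk's boundary still appears intact at the start of the next."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

chunks = chunk_text("x" * 1200)
print([len(c) for c in chunks])  # [500, 500, 300]
```

The overlap is the key design choice: without it, a fact split across a chunk boundary is lost to both embeddings; with too much of it, storage and embedding costs grow for no retrieval gain.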
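The normalization point is worth being able to demonstrate: once vectors are unit-length, the cheap dot product gives exactly the cosine score, which is why many vector DBs want normalized vectors. A small sketch:

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

a = normalize([3.0, 4.0])   # [0.6, 0.8]
b = normalize([6.0, 8.0])   # same direction, different original magnitude
# On unit vectors, the dot product is the cosine score directly:
print(sum(x * y for x, y in zip(a, b)))  # ~1.0
```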
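Matryoshka-style shortening can be sketched in two steps, assuming the embedding model was trained so that the leading dimensions carry the most information (as with OpenAI's text-embedding-3 family): truncate to the first N dimensions, then re-normalize so similarity scoring still works:

```python
import math

def shorten(vec, dim):
    """Keep the first `dim` coordinates of a Matryoshka-trained embedding,
    then re-normalize so dot-product/cosine scoring remains valid."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.5, 0.5, 0.5, 0.5]   # stand-in for a real 3072-dim embedding
print(shorten(full, 2))       # ~[0.707, 0.707], back to unit length
```

Note this only preserves accuracy for models trained with the Matryoshka objective; naively truncating an ordinary embedding discards information spread across all dimensions.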
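Top-K retrieval over unit-normalized vectors reduces to sorting by dot product. A brute-force sketch with hypothetical document IDs (a real vector DB replaces the sort with an approximate-nearest-neighbor index):

```python
def top_k(query_vec, index, k=2):
    """Brute-force Top-K: score every (doc_id, vector) pair by dot product
    and keep the k best. Assumes all vectors are unit-normalized, so the
    dot product is the cosine score."""
    scored = sorted(
        index,
        key=lambda item: sum(q * x for q, x in zip(query_vec, item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 2-dimensional "index" (illustrative IDs and vectors):
index = [
    ("doc_a", [1.0, 0.0]),
    ("doc_b", [0.0, 1.0]),
    ("doc_c", [0.707, 0.707]),
]
print(top_k([1.0, 0.0], index, k=2))  # ['doc_a', 'doc_c']
```

A re-ranker (such as Cohere Rerank) would then re-score just these k candidates with a heavier cross-encoder model before they reach the LLM.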
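For the SSE question, the core is formatting each token as a `data: ...` frame the moment it arrives, so the user sees text immediately instead of waiting for the full reply. A minimal generator sketch; the FastAPI wiring shown in the comment is the assumed integration point, not included in the runnable part:

```python
def sse_events(tokens):
    """Yield tokens as Server-Sent Events frames ('data: <payload>\n\n').
    Streaming per-token cuts perceived latency for the user."""
    for tok in tokens:
        yield f"data: {tok}\n\n"
    yield "data: [DONE]\n\n"   # sentinel so the client knows the stream ended

# In a FastAPI endpoint, this generator would be returned as:
#   StreamingResponse(sse_events(llm_stream(query)),
#                     media_type="text/event-stream")
# where llm_stream is a hypothetical wrapper around a streaming LLM API call.

frames = list(sse_events(["Hello", " world"]))
print(frames[0])  # "data: Hello\n\n"
```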
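The sliding-window memory idea can be sketched as: walk the chat history from newest to oldest and keep only what fits in the budget. The word-count token estimator here is a stand-in; a real implementation would use the model's tokenizer (e.g., tiktoken):

```python
def sliding_window(messages, max_tokens,
                   count_tokens=lambda m: len(m.split())):
    """Keep only the most recent messages that fit under max_tokens.
    count_tokens is a crude stand-in for a real tokenizer."""
    window, total = [], 0
    for msg in reversed(messages):          # newest first
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                           # older messages are dropped
        window.append(msg)
        total += cost
    return list(reversed(window))           # restore chronological order

history = ["first long message here", "second message", "latest user turn"]
print(sliding_window(history, max_tokens=6))
# ['second message', 'latest user turn']
```

Summary-based memory is the usual complement: instead of dropping the oldest turns outright, they are condensed by the LLM into a short summary that stays in the window.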