
Job Title:

Director of AI Data

Company: CiteWorks Studio

Location: Pune, Maharashtra

Created: 2026-03-15

Job Type: Full Time

Job Description:

Role - Director of AI Data

CiteWorks Studio is hiring a Director of AI Data to lead the development of datasets and data infrastructure used to study how large language models retrieve information, generate answers, and cite sources.

This leadership role focuses on building large-scale data pipelines that collect and analyze AI responses across systems such as ChatGPT, Claude, Gemini, Perplexity, and open-source large language models.

What is AI Data Infrastructure?

AI data infrastructure refers to the systems used to collect, process, organize, and analyze the data that powers machine learning and artificial intelligence models.

For large language models, AI data infrastructure may include:
• prompt-response datasets
• model evaluation datasets
• citation extraction pipelines
• retrieval benchmarking datasets
• large-scale training data collections

These systems allow researchers to study how AI models generate answers and retrieve knowledge.

What Does a Director of AI Data Do?

A Director of AI Data leads the strategy and development of data systems used for machine learning research and AI analysis.

The role focuses on building the datasets and pipelines required to analyze the behavior of large language models.

This includes developing systems that collect and structure:
• AI-generated responses
• prompt testing datasets
• citation data
• entity recognition signals
• generative search outputs

The Director ensures that researchers and engineers have the data needed to analyze how AI systems retrieve, synthesize, and cite information.

About CiteWorks Studio

CiteWorks Studio is an AI research and generative engine optimization (GEO) firm focused on understanding how large language models retrieve and cite information.

Modern AI systems such as ChatGPT, Gemini, Claude, and Perplexity increasingly function as the primary interface for information discovery. Instead of ranking links like traditional search engines, these systems generate answers by retrieving and synthesizing knowledge from multiple sources.

CiteWorks Studio studies this transformation and helps organizations understand:
• how AI systems determine trusted sources
• how citation patterns appear inside AI-generated answers
• how knowledge graphs influence model responses
• how organizations become trusted references in generative search systems

Our research focuses on AI citation intelligence, generative search benchmarking, and LLM retrieval systems.

Key Responsibilities

The Director of AI Data will lead the development of large-scale datasets used to analyze how generative AI systems behave.

Responsibilities include:
• building data pipelines that collect AI responses across multiple LLM platforms
• designing datasets used to benchmark generative AI systems
• developing systems that extract citations from AI-generated answers
• creating structured datasets used to analyze retrieval patterns
• managing prompt testing datasets used in AI evaluation
• collaborating with machine learning researchers and engineers to support AI benchmarking systems

The role also involves developing the data infrastructure needed to analyze AI citation behavior and generative search systems at scale.

Why AI Data Infrastructure Matters

Large language models generate answers by retrieving and synthesizing information from large datasets and external knowledge sources.

Understanding how these systems behave requires structured datasets that capture:
• model responses across prompts
• citations included in AI answers
• variability between models
• hallucination patterns
• knowledge retrieval behavior

AI data infrastructure enables researchers to analyze how generative AI systems retrieve and use information.

Data Systems This Role Will Build

The Director will help design data systems used to analyze the behavior of AI models.

Prompt Response Datasets
Large collections of prompts and AI-generated answers used to study model behavior.

Citation Extraction Systems
Pipelines that identify and record sources cited inside AI-generated responses.

Retrieval Benchmark Datasets
Datasets used to analyze how AI models retrieve information from different sources.

Cross-Model Comparison Data
Data used to compare outputs from multiple AI systems.

Knowledge Graph Signal Datasets
Structured datasets used to analyze how entities and sources appear in AI responses.

Qualifications

Required
• 8+ years experience in data engineering, machine learning infrastructure, or AI systems
• experience building large-scale data pipelines or ML datasets
• strong understanding of large language models and AI systems
• experience working with distributed data systems and large datasets
• ability to lead technical data teams and collaborate with researchers

Preferred
• experience building datasets for machine learning evaluation or benchmarking
• familiarity with retrieval augmented generation (RAG) systems
• experience analyzing large language model outputs or AI-generated responses
• background in NLP or information retrieval systems

Why Join CiteWorks Studio

This role sits at the frontier of AI search research and generative AI systems.

The Director of AI Data will build the infrastructure needed to analyze millions of AI-generated responses and study how models retrieve and cite information.

As generative AI becomes the primary interface for information discovery, understanding AI data pipelines and retrieval behavior will become increasingly important.

Key Terms

Large Language Model (LLM)
A machine learning model trained on massive datasets that can generate text, answer questions, and perform reasoning tasks.

AI Data Infrastructure
The systems used to collect, process, and organize data used by machine learning models and AI research.

Generative Search
A form of search where AI systems generate answers by synthesizing information instead of returning ranked links.

AI Citation Intelligence
The analysis of how frequently specific sources appear in AI-generated responses.
