Job Title:

Security Expert - AI Benchmark & Vulnerability Annotation

Company: MillionLogics

Location: Belgaum, Karnataka

Created: 2026-05-16

Job Type: Full Time

Job Description:

Company Description

MillionLogics, a trusted Oracle Partner, is a global IT solutions leader with offices in London, UK, and a development hub in Hyderabad, India. We specialise in driving digital transformation through scalable and innovative solutions, focusing on Data & AI, Cloud, IT consulting, security, and Oracle technologies. Backed by a team of over 55 AI experts, we prioritise delivering tailored, result-driven solutions that empower enterprises to excel. Committed to innovation and excellence, MillionLogics is at the forefront of helping businesses adapt and thrive in a rapidly evolving digital landscape. For more insights into our services and leadership, visit our website: MillionLogics.

Role Description

We are looking for a Senior Security Expert (LLM Benchmark & AI Safety) to help design, build, and validate a high-difficulty cybersecurity benchmark targeting frontier AI model evaluation.

This is a dual-mandate role: you will both architect challenging, real-world security scenarios for the benchmark and serve as a human annotator, verifying that included vulnerabilities are technically sound, genuinely exploitable, and represent a high-value signal for leading AI labs such as Anthropic, Google DeepMind, and OpenAI.

This role sits at the intersection of offensive security, security research, and AI safety evaluation.
You will work closely with ML and data teams to ensure the benchmark reflects the complexity, nuance, and adversarial depth required for evaluating frontier models.

Offer Details
Mode of work: Fully remote
Pay: INR 2 lakhs to 2.25 lakhs per month (net/take-home)
Duration of contract: 12 months
Number of positions: 5
Experience: 7+ years

What does the day-to-day look like?

Benchmark Design & Example Creation
- Design and develop complex, multi-step security challenges across:
  - Application security
  - Cloud misconfigurations
  - Binary exploitation
  - Cryptographic weaknesses
  - API abuse
  - Supply chain attacks
- Map scenarios to frameworks such as MITRE ATT&CK, OWASP Top 10, and OWASP ASVS
- Create challenges that distinguish surface-level pattern matching from deep security reasoning in LLMs
- Develop grading rubrics and ground-truth solutions, including partial-credit logic
- Ensure coverage across multiple difficulty tiers to benchmark model capability progression

Human Annotation & Vulnerability Verification
- Review and annotate benchmark samples across:
  - Validity: Is the vulnerability technically correct?
  - Reachability: Can it realistically be triggered?
  - Exploitability: What effort and primitives are required?
  - Clarity: Is the challenge unambiguous yet non-trivial?
- Flag unrealistic assumptions, inaccuracies, or oversimplifications
- Evaluate whether samples provide a high-value signal for AI safety and capability evaluations

Quality & Safety Review for Lab Submission
- Apply dual-use risk filtering to prevent real-world misuse while maintaining technical depth
- Produce structured metadata:
  - Difficulty rating
  - Domain category
  - Required attacker knowledge
  - Recommended use (capability eval, red-teaming, safety eval, fine-tuning)
- Collaborate with AI lab evaluation teams to refine benchmark quality and coverage

Security Architecture Input
- Advise on secure infrastructure for:
  - Benchmark hosting
  - Sample storage
  - Model evaluation pipelines
- Review tooling and agentic evaluation frameworks from a security perspective

Required Skills and Experience
- 7+ years in offensive or applied security roles (penetration testing, red teaming, vulnerability research, application security)
- Proven ability to identify, reproduce, and document real vulnerabilities across:
  - Web applications
  - Cloud environments
  - APIs
  - Systems-level software
- Strong knowledge of:
  - MITRE ATT&CK
  - CVE/CVSS methodologies
  - Exploit development fundamentals
- Deep understanding of what makes a security challenge genuinely difficult rather than superficially complex
- Experience writing structured technical documentation (vulnerability reports, threat models, risk assessments)
- Ability to work in high-volume annotation/review pipelines with consistent judgment

Additional Details
Commitment required: 40 hours per week, with 4 hours of overlap with PST
Engagement type: Contractor assignment (no medical/paid leave)
Duration of contract: 12 months; expected start date is next week

Evaluation Process
Technical interview (60 mins)

How to Apply?
Please send us your updated CV to with email subject: Security Expert - AI
