Job Title:

Security Expert - AI Benchmark & Vulnerability Annotation

Company: MillionLogics

Location: Anantapur, Andhra Pradesh

Created: 2026-05-17

Job Type: Full Time

Job Description:

Company Description

MillionLogics, a trusted Oracle Partner, is a global IT solutions leader with offices in London, UK, and a development hub in Hyderabad, India. We specialise in driving digital transformation through scalable and innovative solutions, focusing on Data & AI, Cloud, IT consulting, security, and Oracle technologies. Backed by a team of over 55 AI experts, we prioritise delivering tailored, result-driven solutions that empower enterprises to excel. Committed to innovation and excellence, MillionLogics is at the forefront of helping businesses adapt and thrive in a rapidly evolving digital landscape. For more insights into our services and leadership, visit our website: MillionLogics.

Role Description

We are looking for a Senior Security Expert (LLM Benchmark & AI Safety) to help design, build, and validate a high-difficulty cybersecurity benchmark targeting frontier AI model evaluation. This is a dual-mandate role: you will both architect challenging, real-world security scenarios for the benchmark and serve as a human annotator, verifying that included vulnerabilities are technically sound, genuinely exploitable, and represent a high-value signal for leading AI labs such as Anthropic, Google DeepMind, and OpenAI.

This role sits at the intersection of offensive security, security research, and AI safety evaluation. You will work closely with ML and data teams to ensure the benchmark reflects the complexity, nuance, and adversarial depth required for evaluating frontier models.

Offer Details:

- Mode of work: Fully Remote
- Pay: INR 2 lakhs to 2.25 lakhs per month (net/take-home)
- Duration of contract: 12 months
- Number of positions: 5
- Experience: 7+ years

What does day-to-day look like

Benchmark Design & Example Creation

- Design and develop complex, multi-step security challenges across:
  - Application security
  - Cloud misconfigurations
  - Binary exploitation
  - Cryptographic weaknesses
  - API abuse
  - Supply chain attacks
- Map scenarios to frameworks such as MITRE ATT&CK, OWASP Top 10, and OWASP ASVS
- Create challenges that distinguish surface-level pattern matching from deep security reasoning in LLMs
- Develop grading rubrics and ground-truth solutions, including partial-credit logic
- Ensure coverage across multiple difficulty tiers to benchmark model capability progression

Human Annotation & Vulnerability Verification

- Review and annotate benchmark samples across:
  - Validity: Is the vulnerability technically correct?
  - Reachability: Can it realistically be triggered?
  - Exploitability: What effort and primitives are required?
  - Clarity: Is the challenge unambiguous yet non-trivial?
- Flag unrealistic assumptions, inaccuracies, or oversimplifications
- Evaluate whether samples provide a high-value signal for AI safety and capability evaluations

Quality & Safety Review for Lab Submission

- Apply dual-use risk filtering to prevent real-world misuse while maintaining technical depth
- Produce structured metadata:
  - Difficulty rating
  - Domain category
  - Required attacker knowledge
  - Recommended use (capability eval, red-teaming, safety eval, fine-tuning)
- Collaborate with AI lab evaluation teams to refine benchmark quality and coverage

Security Architecture Input

- Advise on secure infrastructure for:
  - Benchmark hosting
  - Sample storage
  - Model evaluation pipelines
- Review tooling and agentic evaluation frameworks from a security perspective

Required Skills and Experience

- 7+ years in offensive or applied security roles (penetration testing, red teaming, vulnerability research, application security)
- Proven ability to identify, reproduce, and document real vulnerabilities across:
  - Web applications
  - Cloud environments
  - APIs
  - Systems-level software
- Strong knowledge of MITRE ATT&CK, CVE/CVSS methodologies, and exploit development fundamentals
- Deep understanding of what makes a security challenge genuinely difficult vs superficially complex
- Experience writing structured technical documentation (vulnerability reports, threat models, risk assessments)
- Ability to work in high-volume annotation/review pipelines with consistent judgment

Additional Details

- Commitments required: 40 hours per week with an overlap of 4 hours with PST
- Engagement type: Contractor assignment (no medical/paid leave)
- Duration of contract: 12 months; expected start date is next week

Evaluation Process

- Technical Interview (60 mins)

How to Apply?

Please send us your updated CV with email subject: Security Expert - AI
