Job Title: Principal Site Reliability Engineer (Principal SRE)
Experience: 7+ Years
Location: Hyderabad
Employment Type: Full-Time

About the Role

We are seeking an experienced Principal Site Reliability Engineer (SRE) to provide technical leadership and strategic direction for reliability, scalability, and operational excellence across our technology platforms. This role combines deep technical expertise, people leadership, and operational strategy, and serves as a key bridge between SRE teams and broader engineering, product, and business units.

As a Principal SRE, you will champion reliability engineering best practices, lead high-impact initiatives, mentor senior engineers, and drive long-term improvements in system availability, performance, and resilience.

Key Responsibilities

Technical Leadership & Reliability Engineering
- Provide hands-on technical leadership across reliability, availability, scalability, and performance engineering initiatives.
- Define and evolve SRE best practices, standards, and operational playbooks.
- Lead initiatives to improve system reliability, uptime, latency, and efficiency across platforms.
- Guide architectural decisions to ensure systems are resilient, observable, and fault-tolerant.

Operational Excellence
- Champion operational excellence by driving improvements in monitoring, alerting, incident response, and capacity planning.
- Establish and track SLIs, SLOs, and error budgets to balance reliability with feature delivery.
- Lead incident management, root cause analysis (RCA), and post-incident reviews to prevent recurrence.
- Drive automation initiatives to reduce toil and improve operational efficiency.

Leadership & People Development
- Provide mentorship, coaching, and career guidance to SRE Engineers and Senior SRE Engineers.
- Foster a culture of accountability, learning, and engineering excellence.
- Partner with engineering managers to support team growth, performance, and succession planning.

Cross-Functional Collaboration
- Act as a diplomatic liaison between the SRE organization and application engineering, platform, security, and product teams.
- Align reliability goals with broader organizational priorities and business outcomes.
- Influence stakeholders through strong communication, data-driven insights, and technical credibility.

Risk Management & Crisis Response
- Lead risk assessment and proactive identification of reliability and operational risks.
- Own crisis management during high-severity incidents, ensuring calm, structured, and effective response.
- Drive preventative strategies through chaos engineering, resilience testing, and failure simulations.

Strategy & Long-Term Planning
- Apply strategic thinking to define long-term reliability roadmaps and operational improvements.
- Partner with leadership to align SRE investments with long-term platform and business goals.
- Continuously evaluate tools, technologies, and processes to support scalable growth.

Required Skills & Qualifications

Experience
- 7+ years of professional experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles.
- Proven experience leading large-scale, distributed systems in production environments.

Technical Expertise
- Exceptional technical proficiency within modern cloud-native and enterprise technology stacks.
- Strong knowledge of system design, observability, incident management, and automation.
- Experience with monitoring, logging, alerting, and reliability tooling.
- Strong understanding of CI/CD pipelines, infrastructure automation, and operational workflows.

Leadership & Soft Skills
- Strong leadership and people management skills.
- Excellent communication, collaboration, and stakeholder management abilities.
- Proven ability to influence without authority and drive cross-team alignment.
- Adept at risk assessment, decision-making, and crisis management under pressure.

Project & Program Management
- Advanced project and initiative management capabilities.
- Ability to lead multiple high-impact initiatives in parallel while maintaining operational stability.

Preferred / Nice-to-Have
- Experience implementing SRE practices at enterprise scale.
- Familiarity with compliance, security, and governance requirements in large organizations.
- Experience driving cultural transformation toward reliability-first engineering.