
Job Title:

Site Reliability Engineer

Company: Xebia

Location: Jodhpur, Rajasthan

Created: 2025-09-04

Job Type: Full Time

Job Description:

We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency of critical systems.

Key Responsibilities:

Cloud Engineering (AWS):
- Architect, implement, and manage secure, scalable, and cost-efficient AWS infrastructure (EC2, Lambda, EKS, S3, RDS, IAM, CloudFront, etc.).
- Automate infrastructure provisioning and configuration using Terraform / CloudFormation and AWS SDKs.
- Manage containerized workloads (Docker, Kubernetes, EKS).

Python Development:
- Build automation scripts, deployment utilities, and infrastructure tooling using Python (Boto3, Flask, FastAPI, etc.).
- Develop custom monitoring/alerting integrations with APIs, SDKs, and third-party observability platforms.
- Implement self-healing and resilience-focused automation scripts.

Chaos Engineering & Resiliency:
- Design and execute chaos experiments (fault injection, latency, outages, resource failures) to validate system resilience.
- Use tools like Gremlin, Litmus, Chaos Mesh, or AWS Fault Injection Simulator.
- Partner with SRE and development teams to define SLIs, SLOs, and error budgets.
- Document learnings from chaos tests and improve incident response & recovery playbooks.

DevOps & Observability:
- Build and maintain CI/CD pipelines for automated deployments (Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline).
- Integrate observability frameworks (Prometheus, Grafana, ELK/EFK, CloudWatch, Datadog) for monitoring and tracing.
- Ensure proactive alerting and real-time visibility into system health.

Security & Compliance:
- Apply AWS security best practices for IAM, networking, and data protection.
- Ensure compliance with internal and external regulatory frameworks (SOC2, ISO, GDPR, etc.).

Required Skills & Qualifications:
- 6–10 years of experience in Cloud, DevOps, or SRE roles.
- Strong hands-on expertise in AWS Cloud (certifications preferred: AWS DevOps Engineer / Solutions Architect).
- Advanced Python development skills for automation and tooling (Boto3 a must).
- Experience designing and running chaos experiments (Gremlin, AWS FIS, Litmus, Chaos Mesh, or custom Python-based fault injection).
- Solid knowledge of IaC (Terraform / CloudFormation).
- Proficiency in containers & orchestration (Docker, Kubernetes, EKS).
- Strong background in monitoring, observability, and incident management.
- Familiarity with DevOps toolchain (CI/CD, Git, Jenkins, GitLab, CodePipeline).
- Good understanding of resilient architectures, reliability principles, and disaster recovery.

Preferred Skills:
- Knowledge of Go / Shell scripting in addition to Python.
- Experience with chaos testing in production-like environments.
- Exposure to multi-cloud or hybrid-cloud environments.
- Strong problem-solving and debugging skills.

What We Offer:
- Opportunity to lead cloud reliability & chaos engineering initiatives.
- Culture focused on automation, resilience, and continuous improvement.
- Growth opportunities through certifications, R&D projects, and leadership roles.
