Job Title:
Site Reliability Engineer
Company: Xebia
Location: Gurgaon, Haryana
Created: 2025-12-18
Job Type: Full Time
Job Description:
Performance & Reliability Engineer (Senior, Lead, Principal & Manager)
Location: Pune, Chennai, Bangalore & Gurgaon (Hybrid)
Immediate joiners only.

Job Overview:
We are seeking a highly skilled and motivated Performance & Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications. You will use tools such as Dynatrace, CloudWatch, and Python to monitor and optimize system performance, troubleshoot issues, and improve the overall reliability of our infrastructure in line with SRE best practices.

Key Responsibilities:
Performance Monitoring & Optimization:
- Use Dynatrace and CloudWatch to monitor system performance and availability.
- Apply performance tuning techniques to ensure high availability and optimal system performance.
- Identify performance bottlenecks and optimize applications and infrastructure for scalability.
System Observability (AppDynamics and monitoring dashboards):
- Collaborate with development and operations teams to troubleshoot incidents and recommend performance improvements.
- Proactively identify areas of risk and implement preventive measures.
Automation & Scripting:
- Develop Python automation scripts to enhance monitoring, incident response, and reporting processes.
- Write and maintain Python-based tools for proactive monitoring, alerting, and issue resolution.
Cloud Monitoring & Alerts:
- Configure CloudWatch for real-time monitoring and alerting of cloud infrastructure.
- Develop and manage dashboards that visualize system health and performance metrics.
- Prepare and present performance reports, incident post-mortems, and improvement recommendations to senior leadership.
Chaos Engineering:
- Fault management, vulnerability identification, failure simulation, and stress management.

Required Skills and Experience:
- Strong experience with Dynatrace for application performance monitoring and root cause analysis.
- Proficiency in CloudWatch for monitoring AWS cloud infrastructure, configuring alerts, and visualizing metrics.
- Solid understanding of Python for automating tasks, building performance tools, and writing scripts to enhance operations.
- Experience analyzing system logs, troubleshooting performance issues, and providing technical recommendations.
- Hands-on experience with cloud environments (AWS preferred), including development knowledge.
- Experience with load testing and performance benchmarking.

About Xebia: