
Job Title:

Level 3 AWS Infrastructure Support Engineer

Company: Electronikmedia (EM)

Location: Kochi, Kerala

Created: 2025-12-22

Job Type: Full Time

Job Description:

Role Overview

As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and response for the AWS-based production environment of Electronikmedia's clients. You will:

- Monitor system health using Datadog and AWS-native tools
- Investigate alerts and anomalies using established runbooks
- Resolve production incidents when possible
- Escalate complex issues quickly and accurately
- Maintain clean, auditable incident documentation

This role is ideal for someone who thrives in high-trust, high-impact operational environments.

Key Responsibilities

On-Call & Incident Response

- Provide an initial response within 15 minutes for all high-priority production alerts
- Investigate, mitigate, and resolve production outages when feasible
- Escalate unresolved or complex issues using the defined escalation matrix
- Act as the owner of production system stability

Monitoring, Alerting & Observability

- Analyze and respond to Datadog monitor alerts across infrastructure and application layers
- Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
- Proactively notify stakeholders of significant performance or stability concerns
- Contribute insights for preventive and corrective actions

Root Cause & Trend Analysis

- Track recurring alerts and incidents
- Provide analysis and recommendations to reduce alert noise and improve system resilience
- Participate in weekly validation of Datadog alert configurations and thresholds

Communication & Documentation

- Maintain clear, concise, and timely communication during incidents
- Document all incidents, alarms, and observations in Jira during each shift
- Ensure handoff notes are complete and actionable for daytime engineering teams

Technical Environment

Core AWS Services

- ECS (Fargate)
- RDS
- ElastiCache
- EC2
- Lambda
- API Gateway
- S3

Tooling

- Datadog (monitoring, alerts, dashboards)
- Jira (incident tracking and documentation)

Qualifications

Experience

- 5+ years of hands-on AWS infrastructure administration and support
- Proven experience supporting production-grade, high-availability systems
- Strong background in incident response within enterprise or scale-up environments

Skills

- Deep operational knowledge of AWS services and distributed systems
- Strong troubleshooting and root-cause analysis skills under tight SLAs
- Ability to follow runbooks while also knowing when to think beyond them
- Calm, structured decision-making during production incidents

Certifications (Preferred)

- AWS Certified Solutions Architect – Associate or Professional
- AWS Certified DevOps Engineer – Professional (Nice to Have)

Service Level Expectations

- Alert Escalation SLA: ≤ 15 minutes for high-priority alarms
- Availability: Consistent overnight coverage (IST Day Shift)
- Reliability: Zero missed critical alerts during assigned coverage windows

Deliverables

- Monthly Service Performance Report, including:
  - Alerts monitored
  - Incidents resolved
  - Escalations
  - SLA adherence metrics
- Weekly Datadog Validation, ensuring alert accuracy and functionality
