Job Title:
Infrastructure Diagnostics & Support Engineer (Remote, US EST) — XBE
Company: XBE
Location: Nashik, Maharashtra
Created: 2026-05-15
Job Type: Full Time
Job Description:
About XBE
XBE builds business-operations software for heavy-materials, logistics, and construction companies that should be generating far more profit than they do today. We are a US-based company with a fully remote, global team built on ownership, collaboration, and clear outcomes.

The role
You will be the engineer responsible for the integrity of our production system during US business hours. You will own the front line of operational issues: triaging incoming tickets, diagnosing infrastructure and platform problems, driving them to resolution, and converting recurring pain points into permanent fixes.

This is not a junior support role. It is an engineering position for someone who is faster, calmer, and more curious than the average operator under pressure, and who would rather eliminate a recurring class of issues than respond to the same ticket twice.

Schedule and coverage
Full-time role aligned to US Eastern Time business hours. You are the primary responder for production issues and customer-impacting tickets during your window.

What you will own
- Ticket triage and diagnosis: receiving issues from customers, internal teams, and monitoring systems, and arriving at root cause quickly.
- Production incident response: primary responder during US hours, with the judgment to escalate appropriately and the discipline to author thorough post-incident reviews.
- Infrastructure diagnostics: application logs, database health, queue backlogs, deploy and rollback state, third-party integrations, network, and DNS.
- Operational hygiene: runbooks, dashboards, alert tuning, and the unglamorous work that shortens every subsequent incident.
- Recurrence prevention: converting recurring tickets into code fixes, monitoring improvements, or process changes that keep them from returning.
- Customer-visible communication: clear status updates, accurate ETAs, and straightforward language.

You are a fit if
- You have operated production systems and can walk through real incidents you handled: what broke, how you isolated it, how you resolved it, and what you changed afterward.
- You are at ease in the shell, in application logs, and at a database with a SQL prompt open.
- You read stack traces and query plans with confidence.
- You have a strong instinct for whether the cause is in the application, the database, the queue, the deploy, or the network, and you can demonstrate it.
- You communicate clearly under pressure. Customers and teammates understand the situation from a single update.
- You prefer resolving root cause to closing the ticket.

Nice to have
- Experience with PaaS environments (Render, Heroku, AWS, or similar) and container-based deployments.
- Postgres operations: slow query triage, indexes, replication, connection pooling.
- Background-job and queue debugging: throughput, failures, worker scaling.
- Observability stacks: error tracking, APM, structured logs, alerting.
- Ruby on Rails or Node.js production experience.
- Prior experience as an on-call engineer, SRE, or production support engineer at a SaaS company.

How we work: AI-first, rigorously verified
- You will use AI agents to accelerate investigation, draft runbooks, and propose fixes, and you will verify every claim against logs, queries, and runtime behavior.
- Operational playbooks and rigorous diagnostics are treated as core engineering work, not afterthoughts.

What success looks like
- Mean time to diagnosis and resolution drops measurably during your coverage window.
- Recurring ticket categories shrink because you addressed the cause, not just the symptom.
- Customers and internal teams trust your updates because they are accurate, timely, and free of jargon.
- On-call quality of life improves measurably for the rest of the team.

What you will get
- Stability and long-term runway: an established, growing company with a steady platform and clear priorities.
- High ownership and trust: clear outcomes, minimal bureaucracy, and no micromanagement.
- An AI-forward engineering culture: we expect you to use AI coding agents as a force multiplier for investigation, diagnosis, and operational playbooks. If you find the shift to AI-augmented engineering energizing rather than concerning, you will thrive here.
- Substantive technical growth: you will deepen your expertise in reliability, observability, and AI-native operational workflows alongside engineers who take craft seriously.