Job Title:

AIOps Engineer

Company: Imaging IQ

Location: Gurgaon, Haryana

Created: 2025-12-19

Job Type: Full Time

Job Description:

DevOps/AIOps Engineer (Platform)

Experience: 3–5 Years

About the Company

We aim to bring about a new paradigm in medical image diagnostics: intelligent, holistic, ethical, explainable, and patient-centric. We're looking for innovative problem-solvers who empathize with clinicians and patients, understand business problems, and can design and deliver reliable, intelligent products.

Key Responsibilities

· CI/CD for services & models: Own pipelines (GitHub Actions/GitLab CI), environment gates, artifact/version governance (containers, models, SBOMs), safe rollouts & instant rollbacks.
· Kubernetes platform (EKS preferred): Operate multi-env clusters; Helm/Kustomize; GitOps (Argo CD/Flux); progressive delivery (canary/blue-green/Argo Rollouts/Flagger).
· Serving & APIs: Deploy and tune FastAPI services and Triton/ONNX/TensorRT inference; traffic shaping, runtime config, autoscaling signals.
· Event-driven orchestration: Build robust consumers/producers on RabbitMQ/ActiveMQ/Kafka with back-pressure, dead-lettering, idempotency, and retry patterns.
· Observability & AIOps: Define SLIs/SLOs and error budgets; metrics/logs/traces (Prometheus/Grafana/Loki/Tempo/ELK); intelligent alerting & noise reduction; basic model/data drift hooks.
· Security in SDLC: Supply-chain security (image signing/provenance, SBOM scans), SAST/DAST/IaC scanning, policy-as-code (OPA/Gatekeeper), secrets hygiene in pipelines/workloads.
· Data/Model platform integration: S3/MinIO for artifacts; integrate model registry (MLflow or similar) into CD; immutable, traceable releases.
· Resilience & performance: Capacity planning (incl. GPU), autoscaling (HPA/VPA/KEDA), caching/queue tuning; chaos/game-days; write runbooks and own incident response for platform services.
· Developer experience: Golden paths, starter repos, internal Helm charts, docs & enablement to make shipping boring and fast.
· FinOps mindset: Cost dashboards, right-sizing, bin-packing, GPU utilization policies, spot vs on-demand strategy.

Skills and Qualifications (Required)

· 3+ years in DevOps/SRE/MLOps with strong Docker & Kubernetes fundamentals.
· Production CI/CD expertise; canary/blue-green; artifact & version management.
· IaC (Terraform) and GitOps workflows (Argo CD/Flux).
· Observability: Prometheus/Grafana; logs/traces with Loki/Tempo/ELK.
· Production message queues (RabbitMQ/ActiveMQ/Kafka) with back-pressure & retries.
· Cloud experience (AWS/GCP/Azure), EKS preferred; object storage (S3/MinIO); model registries (MLflow or similar).
· Security in SDLC and compliance guardrails for PHI-like data (least-privilege IAM, secrets, auditability).
· Incident response experience; writing SLIs/SLOs, runbooks, and operating to error budgets.
· Scripting for platform tasks (Python/Bash).

Preferred

· Triton Inference Server, ONNX/TensorRT optimizations; GPU scheduling on K8s (NVIDIA device plugin, MIG, node pools).
· Argo Rollouts/Flagger, Karpenter, KEDA; caching layers (Redis/NVCache patterns).
· Policy-as-code (OPA/Gatekeeper), image signing (cosign), SBOM tools (syft/grype).
· Network savvy for app delivery (ingress, service meshes, egress policies).

Education

BE/B.Tech (MS/M.Tech a bonus) or equivalent experience.

Location & Work Setup

On-site - Gurugram
