Job Title:
MLOps Engineer - Billion-Dollar US Enterprise Software - Hiring in India!
Company: CareerXperts Consulting
Location: Belgaum, Karnataka
Created: 2025-10-27
Job Type: Full Time
Job Description:
Role Focus: Production ML Systems | GPU Orchestration | Inference at Scale

What You'll Actually Do (Not Buzzwords)

Infrastructure That Doesn't Break
- Design and maintain the backbone for training, fine-tuning, and deploying ML models that actually work in production
- Orchestrate GPU workloads on Kubernetes (EKS) with node autoscaling, intelligent bin-packing, and cost-aware scheduling (spot instances, preemptibles; you know the drill)
- Build CI/CD pipelines that handle ML code, data versioning, and model artifacts like a well-oiled machine (GitHub Actions, Argo Workflows, Terraform)

Production ML, Not Science Projects
- Partner with Data Scientists and ML Engineers to turn Jupyter notebooks into production-grade systems
- Deploy and scale inference backends (vLLM, Hugging Face, NVIDIA Triton) that serve real traffic
- Optimize GPU utilization, because every idle A100 hour is money burning
- Build observability that actually tells you why things broke (Prometheus, Grafana, OpenTelemetry)

Ship Fast, Sleep Well
- Create tooling for seamless model deployment, instant rollback, and A/B testing
- Lead incident response when production AI systems decide to have opinions
- Work with security and compliance teams to implement best practices without slowing down innovation

What We're Really Looking For

Must-Haves (No Negotiation)
- 5+ years in MLOps, infrastructure, or platform engineering; you've been in the trenches
- Production ML experience: at least one project that's serving real users, not a Kaggle competition
- Kubernetes expertise with GPUs: you understand taints, tolerations, affinity rules, and why GPU scheduling is its own special hell
- Cloud-native architecture (AWS preferred): you think in VPCs, IAM roles, and cost optimization
- Training pipeline experience: you've set up or scaled training/fine-tuning for ML models in production (PyTorch Lightning, Hugging Face Accelerate, DeepSpeed)
- IaC fluency: Terraform, Helm, and Kustomize are second nature
- Python engineering skills: you can debug a distributed training failure and fix it
- Inference scaling: you've deployed and scaled inference workloads and lived to tell the tale

The "We're Very Interested" Signals
- You mention scaling inference and we can see the fire in your eyes
- You've used MLflow, W&B, or SageMaker Experiments and have opinions on which is best
- You understand CI/CD for ML and why it's different from regular software
- You've built monitoring systems that caught issues before users did

Nice to Have (But Seriously Nice)
- GPU scheduling wizardry in Kubernetes
- Model drift monitoring and versioning tools
- Low-latency inference optimization (quantization, FP8, TensorRT: the good stuff)
- Experience in compliance or regulated industries where "just ship it" isn't an option

What Makes This Role Different

Ownership. You're not a ticket-taker or a consultant passing through. You'll own infrastructure that powers real AI products, make architectural decisions that matter, and have the autonomy to build things the right way.

Impact. Your work directly affects model training speed, inference latency, GPU costs, and system reliability. You'll see the results of your optimizations in dollars saved and milliseconds gained.

Quality over speed. We value security, operational excellence, and sustainable systems. No "move fast and break things" chaos here; we move deliberately and build things that last.

The Reality Check

This role is not for you if:
- You prefer working on proofs-of-concept over production systems
- You think "it works on my machine" is an acceptable answer
- You haven't shipped ML systems to production
- You're looking for pure research or pure DevOps (this is the intersection)

This role is for you if:
- You get excited about making GPUs go brrr efficiently
- You've been on call for ML systems and learned hard lessons
- You believe infrastructure is a product, not an afterthought
- You want to build the foundation for AI that actually works

Write to MLOps@ to get connected!
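
For candidates gauging the "Kubernetes expertise with GPUs" bar (taints, tolerations, affinity rules), here is a minimal sketch of the kind of pod spec involved. The pod name, container image, and instance type are hypothetical; the `nvidia.com/gpu` taint and resource assume the standard NVIDIA device plugin setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-job            # hypothetical name
spec:
  tolerations:
    - key: nvidia.com/gpu       # tolerate the taint commonly applied to GPU nodes
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    node.kubernetes.io/instance-type: p4d.24xlarge  # example AWS GPU instance type
  containers:
    - name: trainer
      image: trainer:latest     # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1     # request one GPU via the NVIDIA device plugin
```

The toleration lets the pod land on tainted GPU nodes, while the node selector and GPU resource limit keep non-GPU workloads off that capacity; this is the baseline before bin-packing and cost-aware scheduling come into play.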