Job Title:

AI Ops / MLOps Engineer

Company: TnT Techies Guide

Location: Kochi, Kerala

Created: 2026-03-05

Job Type: Full Time

Job Description:

Company Description

TnT Techies Guide is a leading training and consulting firm specializing in technology-focused solutions. Known for its comprehensive guides, hands-on training, and strategic consulting services, the company empowers tech enthusiasts, professionals, and businesses to navigate the evolving digital landscape. With tailored programs spanning various technology domains, TnT Techies Guide equips clients with practical skills and up-to-date insights. Dedicated to excellence and innovation, the firm is a trusted partner for those seeking to thrive in the fast-paced world of technology.

Role Description

This is a full-time remote role for an AI Ops / MLOps Engineer. The AI Ops / MLOps Engineer will be responsible for deploying and managing machine learning models, maintaining and troubleshooting ML pipelines, and ensuring systems are running smoothly and efficiently. Daily tasks include monitoring infrastructure, identifying performance issues, collaborating with cross-functional teams, and optimizing systems to improve scalability and reliability.

Key Responsibilities

• Design and manage scalable ML infrastructure on AWS, Azure, or GCP to support model training, experimentation, and production inference workloads.
• Build and maintain automated ML pipelines for data ingestion, feature engineering, model training, validation, and deployment using CI/CD and GitOps principles.
• Deploy and operate machine learning models using Kubernetes, Kubeflow, MLflow, SageMaker, Vertex AI, or Azure ML in production environments.
• Implement model versioning, model registry, artifact management, and experiment tracking to enable traceability and reproducibility.
• Develop monitoring solutions for model performance, drift detection, data quality validation, and inference latency tracking.
• Implement observability for AI workloads using Prometheus, Grafana, OpenTelemetry, and centralized logging platforms.
• Optimize GPU and CPU resource utilization, autoscaling strategies, and distributed training or inference workloads.
• Enforce secure deployment practices including IAM controls, secrets management, encryption, network segmentation, and compliance alignment.
• Automate ML infrastructure provisioning using Terraform or Infrastructure as Code frameworks.
• Collaborate with data scientists, AI researchers, and platform teams to transition models from experimentation to reliable, highly available production systems.

Required Qualifications

• 5+ years of experience in Cloud Engineering, DevOps, SRE, or MLOps roles.
• Strong hands-on experience with AWS, Azure, or GCP cloud environments.
• Production-level Kubernetes experience with containerized ML workloads.
• Experience with ML lifecycle tools such as MLflow, Kubeflow, SageMaker, Vertex AI, or Azure ML.
• Proficiency in Python and familiarity with ML frameworks such as TensorFlow, PyTorch, or Scikit-learn.
• Experience building CI/CD pipelines for ML workflows and automation.
• Strong understanding of distributed systems, APIs, and cloud-native architecture.

Preferred Qualifications

• Experience operating LLM or Generative AI platforms in production environments.
• Knowledge of vector databases such as Pinecone, Weaviate, or FAISS.
• Experience managing GPU-based workloads and NVIDIA environments.
• Familiarity with feature stores and data platforms such as Snowflake, BigQuery, or Databricks.
