Job Title:

DevOps Architect

Company: Yotta Data Services Private Limited

Location: Panvel, Maharashtra

Created: 2025-12-25

Job Type: Full Time

Job Description:

About Yotta

Yotta Data Services is powering Digital Transformation with Scalable Cloud, Colocation, and Managed Services.

Yotta Data Services offers a comprehensive suite of cloud, data center, and managed services designed to accelerate digital transformation for businesses of all sizes. With state-of-the-art infrastructure, cutting-edge AI capabilities, and a commitment to data sovereignty, we empower organisations to innovate securely and efficiently.

Job Scope

Yotta is looking for a Senior Cloud Infrastructure/DevOps Solutions Architect to join its Infrastructure Specialist Team. Join the team building many of the largest and fastest AI/HPC systems in the world! We are looking for someone who can work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role will interact with customers, partners and internal teams to analyze, define and implement large-scale Networking projects. The scope of these efforts includes a combination of Networking, System Design and Automation, and acting as the face of the team to the customer.

Total/Relevant Experience

- BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
- At least 5-8 years of professional experience in networking fundamentals, TCP/IP stack, and data center architecture.

Key Responsibilities

- Maintain large-scale HPC/AI clusters with monitoring, logging and alerting.
- Manage Linux job/workload schedulers and orchestration tools.
- Develop and maintain continuous integration and delivery pipelines.
- Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
- Deploy monitoring solutions for the servers, network and storage.
- Perform troubleshooting bottom up from bare metal, operating system, software stack and application level.
- Being a technical resource, develop, re-define and document standard methodologies to share with internal teams.
- Support Research & Development activities and engage in POCs/POVs for future improvements.
- Knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
- Extensive knowledge and hands-on experience with Kubernetes, including container orchestration for AI/ML workloads, resource scheduling, scaling, and integration with HPC environments.
- Experience in managing and installing HPC clusters, including deployment, optimization, and troubleshooting.
- Excellent knowledge of Linux systems (Red Hat/CentOS and Ubuntu), including internals, ACLs, OS-level security protections, and common protocols like TCP, DHCP, DNS, etc.
- Experience with multiple storage solutions, including Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies is a plus.
- Proficiency in Python programming and bash scripting.
- Comfortable with automation and configuration management tools, including Jenkins, Ansible, Puppet/Chef, etc.

Good-to-Have Skills

- Knowledge of CI/CD pipelines for software deployment and automation.
- Knowledge of Kubernetes and container-related microservice technologies.
- Experience with GPU-focused hardware/software (DGX, CUDA).
- Background with RDMA (InfiniBand or RoCE) fabrics.
- K8s and Cloud Certifications would be a bonus.

Qualifications Criteria

- BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
