
Job Title:

Senior Linux Administrator – AI/ML & Data Center Networking

Company: DC Tech Consulting

Location: Tumkur, Karnataka

Created: 2025-12-18

Job Type: Full Time

Job Description:

Location: Remote
Experience: 7+ Years
Type: Full-time

Role Overview

We are seeking a highly skilled Senior Linux Administrator with strong Data Center Networking expertise to join our AI/ML infrastructure team. The role focuses on designing, deploying, and operating on-premises Linux and Kubernetes environments optimized for AI/ML and high-performance computing (HPC) workloads, with a significant emphasis on data center networking architectures.

The ideal candidate will bring hands-on experience across Linux systems, Kubernetes, and modern data center networks, including high-speed Ethernet or InfiniBand fabrics used for AI/ML and GPU clusters. This role requires a deep understanding of how network design, latency, throughput, and reliability directly impact AI/ML performance.

Key Responsibilities

- Deploy, configure, and manage on-premises Linux servers supporting AI/ML and GPU-accelerated workloads.
- Design, implement, and operate data center networking for AI/ML infrastructure, including:
  - High-speed Ethernet (25G/40G/100G/400G) or InfiniBand fabrics
  - Spine-leaf architectures and low-latency network designs
- Configure and troubleshoot Kubernetes networking, including CNI plugins (Calico, Cilium, Flannel), service networking, ingress, and network policies.
- Optimize network performance, latency, and throughput for distributed training, storage access, and HPC workloads.
- Work closely with network teams to integrate switching, routing, VLAN/VXLAN, BGP, and load balancing into Kubernetes and AI platforms.
- (Desirable) Automate infrastructure and network provisioning using Ansible, Terraform, and scripting (Bash/Python).
- Administer and monitor data center components such as compute servers, network switches, storage systems, and virtualization platforms.
- Troubleshoot end-to-end issues spanning Linux OS, Kubernetes, and network layers.
- Ensure security, segmentation, and compliance across compute and network environments.
- Plan and implement scalable, highly available architectures for AI/ML platforms.
- Maintain accurate documentation including network diagrams, IP plans, topology maps, and runbooks.

Required Skills & Qualifications

- 7+ years of experience in Linux system administration (RHEL, Ubuntu, CentOS).
- Strong hands-on experience with data center networking, including:
  - L2/L3 networking fundamentals (VLANs, routing, BGP, VXLAN)
  - Spine-leaf architectures and modern DC network designs
  - High-bandwidth, low-latency networks for AI/HPC workloads
- Proven experience managing Kubernetes clusters, with solid understanding of Kubernetes networking concepts.
- Experience integrating compute, storage, and networking for large-scale on-prem or hybrid data centers.
- Working knowledge of network performance tuning, packet flow, and troubleshooting tools (tcpdump, iperf, ethtool, etc.).
- Experience with automation tools such as Ansible, Terraform, and CI/CD pipelines.
- Proficiency in Bash and Python scripting.
- Strong understanding of system and network performance optimization.
- Excellent problem-solving and cross-team collaboration skills.

Preferred / Good to Have

- Experience with NVIDIA GPU networking, GPUDirect, RDMA, or InfiniBand environments.
- Familiarity with HPC and distributed AI training frameworks.
- Exposure to data center switches from vendors such as Cisco, Arista, Juniper, NVIDIA (Spectrum), etc.
- Experience with monitoring and observability tools (Prometheus, Grafana).
- Knowledge of hybrid cloud networking and on-prem to cloud connectivity.
- CKA (Certified Kubernetes Administrator) or networking certifications (CCNA/CCNP or equivalent) are a plus.
