About the Company:
UptimeAI is leading the way in predictive analytics and AI-driven solutions that optimize operational uptime and reduce downtime for industrial and enterprise clients. Our innovative platform harnesses cutting-edge data science to deliver actionable insights, ensuring maximum efficiency and reliability. UptimeAI uniquely combines artificial intelligence with subject-matter knowledge drawn from 200+ years of cumulative experience. This lets the platform explain interrelations across upstream and downstream equipment, adapt to changes, identify problems, and give prescriptive diagnoses the way a human expert would.

About the Role:
We're looking for a highly skilled, hands-on ML End-to-End Engineer. You'll need deep practical experience across the entire machine learning lifecycle, from data ingestion and model development to robust backend integration and scalable production deployment. We want someone who not only understands the theory but can also demonstrate significant real-world application and problem-solving at every stage of ML product development.

Responsibilities:

ML Model Development & Optimization:
Algorithm Proficiency: Proven experience designing, training, and optimizing diverse ML models. This includes strong expertise in supervised learning and significant practical experience with unsupervised learning (e.g., clustering, dimensionality reduction, anomaly detection) and reinforcement learning algorithms (e.g., Q-learning, policy gradients). You should be able to discuss specific challenges you encountered during model development across these paradigms and how you resolved them.
Frameworks: Hands-on expertise with PyTorch, TensorFlow, and scikit-learn.
Libraries: Strong proficiency with NumPy, Pandas, SciPy, Seaborn, and Plotly for data manipulation, analysis, and visualization.
Feature Engineering: Demonstrable experience with feature engineering, selection, and transformation techniques.

Data Engineering & Management for ML:
Data Pipelining: Proven experience building and managing robust data pipelines for ML, including data ingestion, cleaning, transformation, and validation.
Database Proficiency: Strong command of SQL and NoSQL databases (e.g., PostgreSQL, MongoDB) for storing and retrieving data used by ML models.
Real-time Data Streams: Expertise with Apache Kafka for building and managing real-time data ingestion and processing pipelines.

Backend Development for ML Applications:
API Development: Demonstrable experience designing, building, and maintaining RESTful APIs that serve ML model predictions.
Programming Language & Frameworks: Strong proficiency in Python, with practical experience using Flask or FastAPI for backend service development.
System Design: Ability to design scalable, fault-tolerant, high-performance backend systems that support ML inference.

MLOps & Production Deployment:
Containerization: Expertise in containerization technologies (e.g., Docker) for packaging ML models and their dependencies.
Orchestration: Experience with container orchestration tools (e.g., Kubernetes) for deploying and managing ML services at scale.
CI/CD for ML: Proven ability to set up and manage CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions) specifically for ML model training, testing, and deployment.
Monitoring & Logging: Experience implementing robust monitoring, alerting, and logging for production ML systems to ensure performance and reliability and to detect data drift.

Model Performance & Reliability:
Performance Tuning: Proven ability to identify and resolve performance bottlenecks in ML models and backend services. This includes experience fine-tuning models and applying techniques that extract maximum performance from them, such as quantization, pruning, or model compression.
Model Versioning & Experiment Tracking: Experience with tools and practices for model versioning, experiment tracking, and reproducibility (e.g., MLflow, DVC).

General Engineering & Problem Solving:
Competitive Coding / Algorithmic Problem Solving: Demonstrated proficiency on competitive coding platforms (e.g., LeetCode, HackerRank, TopCoder, Codeforces) or a strong, demonstrable foundation in algorithms and data structures that showcases exceptional problem-solving ability.
Security Best Practices for ML Systems: Understanding and implementation of security best practices for ML models, data, and APIs.

Qualifications:
5+ years of experience as an ML Engineer in high-growth SaaS or product startups.
Strong problem-solving and engineering mindset, with a keen eye for scalability, reliability, and efficiency.
Excellent communication skills for conveying complex technical information to both technical and non-technical stakeholders.
Adaptable and enthusiastic about working in a fast-paced, product-driven environment.
Proactive in learning new ML technologies, backend frameworks, and deployment methodologies.
Comfortable working in a fast-paced, ambiguous startup environment.

Why Join UptimeAI:
Impact Industry-Wide Change: Contribute to transformative solutions that significantly improve operational efficiency and reliability for global clients.
Collaborative and Growth-Oriented Environment: Join a talented, passionate team that values innovation, continuous learning, and professional growth.
Opportunities for Leadership and Innovation: Lead pioneering projects, influence product development, and shape the future of industrial AI solutions.

Pay Range and Compensation Package: (Pay range or salary or compensation)

Equal Opportunity Statement: (Include a statement on commitment to diversity and inclusivity.)
Job Title
Machine Learning Engineer