
Job Title: Machine Learning Engineer

Company: THG Ingenuity

Location: Pune, Maharashtra

Created: 2026-04-26

Job Type: Full Time

Job Description

About THG Ingenuity

THG Ingenuity is a fully integrated digital commerce ecosystem designed to power brands without limits. Our global end-to-end tech platform comprises three products: THG Commerce, THG Studios, and THG Fulfilment. Each represents a single, unified solution that overcomes challenges and takes brands direct-to-consumer. Our client portfolio includes globally recognised brands such as Coca-Cola, Nestlé, Elemis, Homebase, and Procter & Gamble.

About the Team

You will join one of the largest AI teams in retail and e-commerce, operating at global scale across the full retail stack: from on-site product discovery and personalisation through to demand forecasting, pricing, fraud prevention, and warehouse fulfilment. We work across every modality (tabular, text, image, video) and combine classical techniques with state-of-the-art deep learning, NLP, and generative AI to ship solutions that are sustainable, optimised, and commercially valuable.

We are an AI-first team. That means we don’t just build AI products; we build with AI. Coding agents are a core part of how our engineers work, and we expect everyone on the team to use them well: to move faster, ship higher-quality code, and spend more time on the problems that genuinely require human judgement.

Software Engineers sit at the heart of how our ML estate reaches production. You will partner closely with Machine Learning Engineers, Data Scientists, and Platform teams to turn models and pipelines into reliable, observable, low-latency services that the wider business depends on every day.

The Role

As a Software Engineer in the ML team, you will own the systems that take ML artefacts (models, features, embeddings, decisioning logic) and make them production-grade. You will design and operate the real-time serving layer, the streaming and event pipelines that feed it, the observability that keeps it healthy, and the internal tooling that lets ML engineers ship safely and quickly.
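To make the "production-grade artefacts with rollback paths" idea concrete, here is a minimal, hedged sketch of versioned model serving with rollback. All names here (ModelRegistry, its methods) are illustrative assumptions for this posting, not THG's actual tooling:

```python
"""Illustrative sketch: versioned model serving with a rollback path."""
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ModelRegistry:
    """Tracks deployed model versions so a bad release can be rolled back."""
    _versions: dict = field(default_factory=dict)   # version -> predict fn
    _history: list = field(default_factory=list)    # deployment order

    def deploy(self, version: str, predict: Callable[[Any], Any]) -> None:
        # Register a new version and make it the live one.
        self._versions[version] = predict
        self._history.append(version)

    @property
    def live(self) -> str:
        if not self._history:
            raise RuntimeError("no model deployed")
        return self._history[-1]

    def rollback(self) -> str:
        """Retire the latest deployment and serve the previous version."""
        if len(self._history) < 2:
            raise RuntimeError("nothing to roll back to")
        retired = self._history.pop()
        self._versions.pop(retired, None)
        return self._history[-1]

    def predict(self, x: Any) -> Any:
        # Route all traffic to whichever version is currently live.
        return self._versions[self.live](x)

# Usage: ship v2, discover a regression, roll back to v1.
registry = ModelRegistry()
registry.deploy("v1", lambda x: x * 2)
registry.deploy("v2", lambda x: x * 3)
registry.rollback()
```

A real serving layer would add explicit request/response contracts, metrics, and gradual traffic shifting; the point here is only that every deployment keeps a known-good predecessor to fall back to.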
This is a hands-on backend engineering role with a strong SRE flavour. You will write production-grade code, debug latency and reliability issues in live services, carry pager rotations for the systems you build, and raise the engineering bar for how ML is delivered at THG. You will partner with ML Engineers to turn research-quality work into business-critical artefacts.

Key Responsibilities

- Productionise ML artefacts. Take models, features, and pipelines from notebooks and offline jobs into reliable, versioned, well-tested production services, with clear contracts, rollback paths, and ownership.
- Own the real-time serving layer. Design, build, and operate low-latency inference services (gRPC and REST) on GCP, with explicit latency, throughput, and cost SLOs, autoscaling, graceful degradation, and safe rollout patterns (canary, shadow, A/B).
- Build streaming and event pipelines. Develop event-driven data and feature plumbing using Pub/Sub, Kafka, and Dataflow/Beam to power real-time features, online/offline parity, and downstream decisioning.
- Build internal tooling and developer experience. Ship the SDKs, CLIs, service templates, and golden paths that let ML engineers go from trained model to deployed service in hours, not weeks, with safety, observability, and compliance built in by default.
- Write production-quality code. Ship clean, reliable, fault-tolerant, well-tested Python (and, where appropriate, Go or Java) and SQL, and champion best practices including code review, pair programming, TDD, and internal knowledge-sharing.
- Partner across the ML lifecycle. Work hand-in-hand with ML Engineers and Data Scientists on feature pipelines, model packaging, evaluation harnesses, A/B and shadow testing, drift detection, and retraining triggers.
- Tackle technical debt. Modernise legacy services, reduce inference latency and cost, harden flaky pipelines, and improve reproducibility, observability, and governance across the ML estate.
- Set technical direction. Contribute to coding standards, the ML platform roadmap, and architectural decisions on serving and streaming infrastructure, and mentor junior engineers.

What We’re Looking For

Essential

- BSc in Computer Science, Software Engineering, or a related discipline, or equivalent practical experience.
- Proven track record as a backend / production software engineer shipping and operating services at scale, with a clear understanding of reliability, performance, and cost trade-offs.
- Strong foundations in data structures, algorithms, distributed systems, API design, and software architecture.
- Hands-on experience designing, building, and running real-time services (gRPC and/or REST) with explicit latency SLOs, autoscaling, and safe rollout patterns (canary, shadow, blue/green, A/B).
- Production experience with streaming and event-driven pipelines using Pub/Sub, Kafka, and Dataflow/Beam (or close equivalents).
- Advanced Python skills for production services, plus fluent SQL. Comfort with at least one additional production language (Go, Java, Scala, or similar) is a strong plus.
- Hands-on experience with at least one major cloud platform; Google Cloud Platform (Cloud Run/Functions, GKE, Pub/Sub, Dataflow, BigQuery, Vertex AI) is strongly preferred.
- Practical experience with containerisation (Docker), orchestration (Kubernetes), and CI/CD pipelines for production services.
- SRE-style ownership: defining SLIs/SLOs and error budgets, instrumenting services with metrics, logs, and traces (Prometheus, Grafana, OpenTelemetry, Cloud Monitoring), carrying on-call, and leading incident response and post-mortems.
- Experience building internal tooling, SDKs, CLIs, or service templates that improve developer experience and shorten time-to-production for other engineers.
- AI-first mindset and hands-on experience with coding agents (Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, or similar) as part of your daily workflow.
You should be able to describe, with concrete examples, how you use agents to plan, write, test, refactor, and review code, and how you manage their limitations.

- Excellent communication and stakeholder-management skills, with the ability to collaborate effectively with ML Engineers, Data Scientists, Product Managers, and commercial stakeholders.

Desirable

- Hands-on experience with ML serving frameworks such as Vertex AI, KServe, Triton Inference Server, TorchServe, BentoML, or Ray Serve.
- Experience integrating with feature stores (Feast, Vertex Feature Store, Tecton) and managing online/offline feature parity.
- ML-specific observability: prediction logging, drift and data-quality monitoring, model performance dashboards, and shadow and A/B evaluation harnesses.
- Experience with model registries, experiment tracking (MLflow, Weights & Biases), and CI/CD pipelines tailored to ML workflows.
- Exposure to retail or e-commerce production systems: recommendations, search and ranking, personalisation, demand forecasting, pricing, fraud, or warehouse optimisation.
- Experience with agent frameworks such as the Google Agent Development Kit (ADK) and Vertex AI Agent Builder, particularly in productionising agentic systems.
- Internal developer platform, paved-road, or golden-path work in a previous role.
- Open-source contributions or a strong public engineering portfolio.
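As a hedged illustration of the safe-rollout patterns this role involves (canary, shadow, A/B), deterministic hash-based traffic splitting is one common building block. The function below is a sketch under assumed names and percentages, not THG's implementation:

```python
"""Illustrative sketch: deterministic canary routing for safe rollouts."""
import hashlib

def route(request_id: str, canary_fraction: float = 0.05) -> str:
    """Hash the request id into [0, 1) and send a stable slice to the canary.

    Hash-based bucketing keeps a given id on the same variant across
    repeated requests, which makes canary metrics comparable over time
    and lets the fraction be widened without reshuffling existing users.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

# A 5% canary receives roughly 5% of uniformly distributed request ids.
share = sum(route(f"req-{i}") == "canary" for i in range(10_000)) / 10_000
```

In production this decision would typically sit behind a config flag with monitoring on both slices, so the canary fraction can be raised gradually or dropped to zero on an SLO breach.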