Job Title:
Founding Engineer - Remote India
Company: shyva AI
Location: Thane, Maharashtra
Created: 2026-03-14
Job Type: Full Time
Job Description:
SHYVA | Founding Engineer
Stealth · Enterprise AI · Remote

About
We are building something that should have existed a decade ago — in a market where the data is fragmented, unverified, and nobody has fixed it yet. The founder has been the customer for 25 years and knows exactly what is broken. Six Fortune 500 enterprises are already committed as design partners. We are in stealth and will stay there for now.

The Role
You will be one of the first engineering hires, working directly with the founder to build the core platform — a large-scale data intelligence system with an AI-native interface. The hard problems are data, not models: ingestion at volume, entity resolution across heterogeneous sources, auditability of every output, and a graph-based data model built to compound over time.

No platform team. No DevOps org. No PM handing you specs. Full architectural ownership from day one.

Must-Have

Full-Stack Engineering
- Python backend (FastAPI/Django) and React/Next.js frontend — you own the entire stack
- Cloud-native: AWS or GCP, Docker/Kubernetes

Large-Scale Data Engineering
- ETL/ELT pipelines at 10M+ record scale — Spark, dbt, Kafka, Airflow
- Experience ingesting and normalising licensed third-party commercial data feeds — bulk files, schema inconsistency, freshness tracking, provenance management
- Data lineage and auditability: every output traceable to a source record, timestamp, and confidence level
- Batch and event-driven ingestion patterns

Graph Data Modeling
- Neo4j or graph layers on relational DBs: node/edge schema design, relationship versioning, provenance preservation
- Graph traversal for network analysis and entity influence ranking

Entity Resolution & Deduplication
- Probabilistic record linkage, fuzzy matching, multi-attribute scoring at volume
- Blocking strategies for large record pools (LSH, phonetic encoding, prefix blocking)
- Canonical entity management with merge history and audit trail

LLM & Agent Orchestration
- LangChain, LangGraph, CrewAI or custom orchestrators — shipped multi-step agent workflows in production
- RAG pipelines: hybrid retrieval, chunking, reranking
- Guardrail architecture: post-generation validation, uncertainty flagging, stale-data detection

Document Extraction
- OCR pipelines and structured extraction from complex business documents
- Field normalisation across currencies, date formats, and units of measure

Semantic & Vector Search
- Elasticsearch, pgvector, Weaviate, or Pinecone — hybrid retrieval at scale

Background
- CS or Electrical Engineering degree from a strong institution
- 6–10 years hands-on; at least one role with genuine end-to-end ownership

Strong Plus
- Startup or early-stage experience — comfortable without guardrails
- Supply chain, procurement, or trade finance domain knowledge
- Multi-source data reconciliation across heterogeneous commercial providers
- Enterprise system connectors (SAP Ariba, Oracle, or similar)

What We Offer
- Founding engineer equity
- Direct collaboration with a domain expert founder — no translation layer between you and the customer problem
- Real customers from day one — six Fortune 500 design partners already committed
- Full architectural ownership
- India remote

How to Apply
Skip the cover letter. Answer three questions:
1. What is the most technically complex data system you have built? What made it hard?
2. Describe an architectural decision you made with incomplete information. What did you decide and why?
3. What draws you to a role where the hardest problems are data quality and trust, not model performance?

Include a link to something you built that is running right now.