About the Role
This role is for a hands-on ML Engineer who can design, train, and productionize models powering search relevance, retrieval, personalization, and LLM-based conversational experiences at massive scale.
You will work closely with backend, platform, and catalog enrichment teams to deliver high-quality ML components under tight performance and latency constraints.
Key Responsibilities
- Build and improve search ranking, retrieval, and query understanding models.
- Develop ML components for Conversational Search:
  - Multi-turn context handling
  - Query intent detection and classification
  - Retrieval-augmented generation (RAG) pipelines
  - Reasoning workflows (ReAct, static and dynamic agent flows)
- Design and optimize embedding models, vector stores, and similarity search systems.
- Build personalized ranking and recommendation models using deep learning.
- Work on large-scale ML systems optimized for:
  - Low latency
  - High throughput
  - Cost-efficient inference
- Implement ML pipeline best practices (versioning, monitoring, A/B testing, observability).
- Collaborate with platform teams to integrate ML services across search, recommendations, and conversational agents.
- Develop caching strategies (prompt caching, vector caching, similarity caching) to meet strict SLA targets.
- Contribute to the long-term roadmap: foundational retrieval models, multi-objective optimization, and user lifecycle modeling.
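To give candidates a concrete sense of the similarity-search and caching work described above, here is a minimal, purely illustrative sketch: a toy in-memory vector store queried by cosine similarity, with `functools.lru_cache` standing in for a similarity cache. All names, vectors, and the cache choice are hypothetical examples, not part of the actual production stack.

```python
import math
from functools import lru_cache

# Toy in-memory "vector store": doc id -> embedding.
# These 3-d vectors are hand-made purely for illustration.
DOC_VECTORS = {
    "doc_shoes":  [0.9, 0.1, 0.0],
    "doc_boots":  [0.8, 0.2, 0.1],
    "doc_phones": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

@lru_cache(maxsize=1024)
def retrieve(query_vec, top_k=2):
    """Return the top_k doc ids ranked by cosine similarity.

    lru_cache stands in for a similarity cache: repeated identical
    queries skip the brute-force scan entirely. query_vec must be
    hashable (a tuple) for the cache to work.
    """
    scored = sorted(
        DOC_VECTORS.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return tuple(doc_id for doc_id, _ in scored[:top_k])

# A "footwear-like" query vector ranks the shoe/boot docs first.
print(retrieve((1.0, 0.2, 0.0)))  # → ('doc_shoes', 'doc_boots')
```

In production, the brute-force scan would be replaced by an approximate nearest-neighbor index and the cache keyed on normalized query embeddings, but the retrieve-rank-cache shape is the same.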