Job Description
Research at Joko
At Joko, we believe that today’s online shopping experience is fundamentally flawed, and we are putting a lot of effort into disrupting it. We are crafting a new experience that enables users to find their desired products in the smoothest way possible, to effortlessly compare all their characteristics, and to obtain transparent and clear information on both their price and environmental cost.
To achieve our goal, we are building the world's largest product catalog, a universal catalog composed of all the products sold by all e-commerce sites in the world. For this, we need to understand any web page in order to extract structured information from it, and to clean and standardize information from multiple sources in near real time. We have developed LLM-based approaches to address all these challenges. One of the major challenges is to scale these approaches to colossal volumes of data (we have to process hundreds of millions of products several times a day). We have developed state-of-the-art approaches that rely on fine-tuning relatively small LLMs, but there is still a lot of research needed to optimize their performance and resource efficiency, or to find more efficient approaches.
We are also developing an AI copilot that helps users find the right products in our gigantic product catalog. Developing this conversational experience is a huge challenge that goes beyond traditional RAG systems: it requires a deep understanding of search engines (to use combinations of traditional full-text search and vector search on huge volumes of data), and a mastery of LLMs to deliver a reliable experience with low latency and controlled costs. We are constantly iterating on this product, and have many associated research problems to improve search accuracy, reduce latency, and better capture the user's intent.
What You Will Do
Joko has been offering research internships in Machine Learning for several years. All our internships are closely tied to our engineering teams to maximize their tangible impact. Almost all previous interns joined Joko in full-time positions after their internship.
As a Machine Learning Research Intern, you will work on one of the following research subjects:
Improve the performance and the scalability of our LLM-based data processing pipeline for our universal product catalog. For this project, it will be necessary to explore fine-tuning LLMs for specialized tasks as well as various techniques aimed at reducing model size (such as quantization, pruning, or distillation). Rigorous evaluation of model performance (notably using LLMs as judges) will represent one of the challenges.
Improve the search performance in our universal product catalog. For this project, it will be necessary to benchmark the performance of different search techniques, combining full-text search and vector search, and to identify the most effective LLM-based embedding methods.
Improve the performance and the latency of our AI copilot. For this project, it will be necessary to benchmark numerous models, explore their fine-tuning, work on reducing their size, as well as work on ML Ops topics to ensure the best possible latency in a production context. Here again, rigorous evaluation of model performance will represent an important challenge.
Exploration will represent an important part of the internship, through experiments, literature reviews, and theoretical developments. You will have full ownership of your projects and the freedom to steer the research direction of your internship based on your results and on what you consider most promising among the directions we have identified. Your goal will be to deploy your work in production and monitor its impact on hundreds of thousands of users.
Your responsibilities:
Research: You will work on all steps of the research process – you will formalize the objectives of your work, conduct literature reviews to have a deep understanding of the problems, design new algorithms, analyze them both theoretically and experimentally, and collect and transform relevant data for your experiments.
Exploration & ownership: You will participate in orienting the internship towards research directions you deem valuable to our users.
Implementation, deployment & monitoring in production: With the help of the engineering team, you will be responsible for integrating the most scalable and robust algorithms you have developed into our product, and then for monitoring their impact on our users.