Yulin Shi

Selected Projects

Building innovative solutions at the intersection of AI and real-world applications.

01 // ENGINEERING

Legomnia

I developed Legomnia, a search engine for complex legal documents built on Elasticsearch 8.9. The platform stores documents alongside their vector embeddings and uses a hybrid retriever that combines lexical (keyword) and semantic (vector) search. Advanced query techniques improve the retrieval of relevant passages from dense legal corpora, and an automated pipeline extracts and indexes document metadata to support filtering and document management.

Python · Elasticsearch 8.9 · Vector Search · Hybrid Retrieval
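The description does not say how Legomnia's hybrid retriever merges its lexical and semantic results. Elasticsearch 8.x can run a lexical `query` and a `knn` clause in the same search request; a common alternative is to run both searches and merge the ranked lists. As a minimal sketch of that second approach, assuming Reciprocal Rank Fusion (RRF) as the merging rule (an assumption, not necessarily what Legomnia does), the hypothetical `reciprocal_rank_fusion` function below fuses two hit lists of made-up document IDs:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion (RRF).

    A document scores sum(1 / (k + rank)) over every list it appears in
    (rank is 1-based), so items ranked well by both retrievers rise to the
    top without having to normalize BM25 scores against vector similarities.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hit lists for one query; in practice these would come from a
# keyword (BM25) search and a kNN vector search over the same index.
lexical_hits = ["doc-a", "doc-b", "doc-c"]
semantic_hits = ["doc-b", "doc-d", "doc-a"]

for doc_id, score in reciprocal_rank_fusion([lexical_hits, semantic_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Here `doc-b` comes out first because both retrievers rank it highly, while `doc-c` and `doc-d`, each found by only one retriever, trail. Because RRF uses only ranks, it avoids comparing scores that live on different scales; `k = 60` is a commonly used default.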

NLP

FeedPaper

I architected FeedPaper, an automated multi-agent research pipeline that streamlines the daily discovery of new academic papers. It ingests new listings from arXiv RSS feeds, indexes the articles in a vector database, and uses hybrid search to surface the papers most relevant to each user's chosen topics. Its key feature is an LLM stage that explains precisely why each selected paper is relevant to the user's interests. A fully automated Celery backend manages the whole lifecycle, from data collection and indexing to the delivery of personalized daily email reports.

Python · Celery · LLM Integration · Vector Database · Hybrid Search · Multi-Agent Systems
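A minimal sketch of the ingestion and explanation steps, assuming the feed is standard RSS 2.0 (`<item>` elements with `<title>`, `<link>` and `<description>` children). The feed snippet, the topic list, and the names `Paper`, `parse_feed`, `matches_topics` and `build_relevance_prompt` are all hypothetical; the substring filter is only a placeholder for the vector and hybrid search stage, and the actual LLM call is left out. In a Celery deployment, each stage could plausibly run as its own task.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, trimmed RSS 2.0 snippet shaped like a daily arXiv listing feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>cs.CL updates</title>
  <item>
    <title>Hybrid Retrieval for Long Legal Documents</title>
    <link>https://example.org/paper-1</link>
    <description>We combine sparse and dense retrieval for statutes.</description>
  </item>
  <item>
    <title>A Survey of Protein Folding Models</title>
    <link>https://example.org/paper-2</link>
    <description>We review deep learning approaches to structure prediction.</description>
  </item>
</channel></rss>"""


@dataclass
class Paper:
    title: str
    link: str
    abstract: str


def parse_feed(xml_text: str) -> list[Paper]:
    """Turn every <item> of an RSS 2.0 document into a Paper record."""
    root = ET.fromstring(xml_text)
    return [
        Paper(
            title=(item.findtext("title") or "").strip(),
            link=(item.findtext("link") or "").strip(),
            abstract=(item.findtext("description") or "").strip(),
        )
        for item in root.iter("item")
    ]


def matches_topics(paper: Paper, topics: list[str]) -> bool:
    """Crude substring filter standing in for the hybrid search stage."""
    text = f"{paper.title} {paper.abstract}".lower()
    return any(topic.lower() in text for topic in topics)


def build_relevance_prompt(paper: Paper, topics: list[str]) -> str:
    """Prompt asking an LLM to explain why this paper matters to the reader."""
    return (
        f"The reader follows these topics: {', '.join(topics)}.\n"
        f"Title: {paper.title}\n"
        f"Abstract: {paper.abstract}\n"
        "In two sentences, explain specifically how this paper relates to "
        "the reader's topics."
    )


topics = ["hybrid retrieval", "legal NLP"]
for paper in parse_feed(SAMPLE_FEED):
    if matches_topics(paper, topics):
        print(build_relevance_prompt(paper, topics))
```

Keeping parsing, filtering and prompt building as separate functions mirrors the staged lifecycle described above: each step can be retried or scheduled independently.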

NLP

Legal Chatbot

I developed Oliver.legal, a chatbot built specifically for the French legal system. The project was full-stack: I implemented the server-side logic with FastAPI and built an intuitive user interface in React. For data collection, I wrote web scrapers that gather and structure content from French legal websites. At the core of the chatbot is a custom Retrieval-Augmented Generation (RAG) framework that integrates multiple retrieval methods and supports various LLMs, so that answers are grounded in the retrieved context. It combines hybrid retrieval with rerankers to improve the relevance and precision of responses.

FastAPI · React · RAG · LLM
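The description does not specify which reranker or prompt Oliver.legal uses. The sketch below assumes the candidate passages have already come out of the hybrid retriever. It reranks them and packs the best ones into a numbered, citation-friendly prompt under a character budget. A word-overlap score stands in for a real reranker model (typically a cross-encoder). `Chunk`, `overlap_score`, `rerank`, `build_prompt` and the sample passages are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    source: str  # hypothetical identifier of the scraped page the text came from
    text: str


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: share of query words that also appear in the text.

    Stands in for a real reranker such as a cross-encoder model.
    """
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: list[Chunk],
           score_fn: Callable[[str, str], float], top_n: int) -> list[Chunk]:
    """Re-score first-stage candidates against the query and keep the top_n."""
    return sorted(candidates, key=lambda c: score_fn(query, c.text), reverse=True)[:top_n]


def build_prompt(question: str, chunks: list[Chunk], max_chars: int = 2000) -> str:
    """Pack numbered sources into the prompt until the character budget runs out."""
    parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk.source}) {chunk.text}"
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        "Answer the question about French law using only the sources below, "
        "citing them by number. If they are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


question = "What notice period applies to a residential lease?"
chunks = [
    Chunk("page-1", "Rules on fixed-term employment contracts."),
    Chunk("page-2", "The notice period for ending a residential lease."),
    Chunk("page-3", "Deadline for refunding a lease deposit."),
]
top = rerank(question, chunks, overlap_score, top_n=2)
print(build_prompt(question, top))
```

Reranking a short candidate list is cheap compared with retrieval over the whole corpus, which is why a slower but more accurate scorer is usually applied only at this second stage.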

Written Synapses

Deconstructing algorithms, ethics, and the philosophy of intelligence.

02 // THOUGHTS

November 15, 2025

Engineering · AI Architecture

Zero-Cost AI MVP

The backbone of this architecture is Supabase, a Backend-as-a-Service platform whose free tier provides not only a relational database but also essential services such as user authentication and file storage.

