uncloseai.
Reverse Retrieval Augmented Generation
Client-Side Context Injection for Small Language Models
How live DOM extraction makes 8B models punch above their weight class.
russell@unturf, cthegray, TimeHexOn, foxhop
Abstract
Traditional Retrieval Augmented Generation (RAG) requires a server-side pipeline: chunk documents, embed them into vectors, store them in a database, and retrieve by similarity search at query time. This architecture requires dedicated infrastructure, incurs indexing latency, and demands ongoing maintenance of embedding models and vector stores.
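For contrast, the traditional pipeline can be sketched in a few dozen lines. This is a toy illustration, not production code: hashEmbed is a hashed bag-of-words stand-in for a real embedding model, and the "vector store" is a plain array, but the chunk/embed/store/retrieve steps are the ones the abstract describes.

```javascript
// Toy stand-in for an embedding model: hash each word into a small vector.
function hashEmbed(text, dims = 16) {
  const v = new Array(dims).fill(0);
  for (const word of text.toLowerCase().match(/\w+/g) ?? []) {
    let h = 0;
    for (const c of word) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    v[h % dims] += 1;
  }
  return v;
}

// Cosine similarity between two vectors of equal length.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Step 1: chunk documents into fixed-size word windows.
function chunk(doc, size = 40) {
  const words = doc.split(/\s+/);
  const chunks = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Steps 2-3: embed every chunk and store text + vector together
// (the "vector database", here just an in-memory array).
function buildIndex(docs) {
  return docs.flatMap((d) => chunk(d)).map((text) => ({ text, vec: hashEmbed(text) }));
}

// Step 4: at query time, rank stored chunks by similarity to the query.
function retrieve(index, query, k = 2) {
  const qv = hashEmbed(query);
  return index
    .map((e) => ({ ...e, score: cosine(e.vec, qv) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.text);
}
```

Every step above runs on the server before the model ever sees the prompt; Reverse RAG removes all four.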
Reverse Retrieval Augmented Generation (Reverse RAG) inverts this entirely. Instead of the server fetching documents to augment the prompt, the client extracts live content from the page the user is currently viewing and injects it directly into the conversation context. The data comes to the model. No vector database. No embeddings. No indexing pipeline. No server-side retrieval.
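A minimal sketch of the client side follows. The function names are illustrative, not the uncloseai.js API: in a browser the page text would come from something like document.body.innerText; here the extraction is written over a raw HTML string so the idea stands on its own.

```javascript
// Strip scripts and tags from raw HTML and collapse whitespace.
// In a real browser context this is simply document.body.innerText.
function extractPageText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Build the chat request: the live page content is injected as a
// system message, so the model needs no server-side retrieval at all.
function buildMessages(pageText, userQuestion, maxChars = 8000) {
  return [
    {
      role: "system",
      content:
        "Answer using the content of the page the user is viewing:\n\n" +
        pageText.slice(0, maxChars),
    },
    { role: "user", content: userQuestion },
  ];
}
```

The resulting messages array is what a client would send to an OpenAI-compatible chat endpoint: the small model receives the full, fresh page text rather than pre-indexed chunks.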
uncloseai.js implements this technique under the AGPL-3.0-only license. It serves as the entrypoint of a modular application that adds a machine learning chat interface to any webpage. By feeding the model the full, fresh content of whatever page the user visits, small 8B-parameter models produce answers that rival those of much larger models on page-specific questions.
Read the full whitepaper (PDF)
Citation
russell@unturf, cthegray, TimeHexOn, foxhop. "Reverse Retrieval Augmented
Generation: Client-Side Context Injection for Small Language Models."
uncloseai.com, 2026. https://uncloseai.com/reverse-retrieval-augmented-generations-rag.html