Greg Robison

Retrieval Augmented Generation (RAG)

NATURAL LANGUAGE UNDERSTANDING + REAL KNOWLEDGE


It’s the difference between an open-book and a closed-book exam. In a RAG system, you are asking the model to respond to a question by browsing through the content in a book, as opposed to trying to remember facts from memory. -- Luis Lastras, director of language technologies at IBM Research

Introduction

We’ve all heard stories of how Large Language Models (LLMs) like ChatGPT can make things up, hallucinate, or be flat-out wrong. Wouldn’t it be great if there were a way to leverage these models’ natural language abilities to understand our questions without relying on their “knowledge”? What if I told you there was one? Retrieval-Augmented Generation (RAG) is a big step forward in knowledge management. The technique blends the power of generative AI models with the precision of information retrieval systems. It works by creating a database of pertinent information from a set of documents, a website, a database, etc., and then integrating that information into the generative process, ensuring that the output is accurate and relevant. It overcomes the limitations of current LLMs by reducing hallucinations (aka making stuff up and being wrong), adding relevant knowledge, and enabling meta-analysis across documents and datasets.


The impact of retrieval-augmented generation on knowledge management is profound. Traditional knowledge management systems are inflexible, require expertise in database management, or force a hard split between search and analysis. We can now create a more seamless and interactive flow of information, increasing the efficiency of knowledge retrieval and improving the quality of generated content while reducing errors. By enabling natural-language interactions with knowledge bases, companies can gain even more insights from the data they already have. At F’inn, our relationship with data is constantly evolving, as are our uses of AI - we’ll share some of the ways we are using the technology.


The Benefits in Knowledge Management

How does it all work? First, content such as web pages, articles, books, datasets, and transcripts is ingested and indexed into a searchable database of information. When you ask a question about the documents, the database is searched, and the system finds the most relevant documents or passages using some kind of similarity search technique. The retrieved info is then inserted into the conversation context of your discussion with the LLM. The model pays attention to both your question and the relevant info when generating a response, whether a summary or a specific answer. Again, we are no longer relying on the model’s knowledge of the subject; we are providing that knowledge as ground truth. We are simply leveraging the LLM’s natural language abilities to understand the question, review the retrieved material, and come up with an appropriate response.
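
To make that flow concrete, here is a minimal sketch of the pipeline in Python. It assumes the sentence-transformers library for embeddings (the model name is just a common default), and call_llm() is a placeholder for whatever chat-model API you use (OpenAI, Anthropic, a local model) - an assumption, not a real function:

# Minimal RAG sketch: chunk -> embed -> index -> retrieve -> generate.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text, size=500):
    # Split a document into fixed-size passages for indexing.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents):
    # Embed every chunk once; the matrix of vectors is our "database".
    chunks = [c for doc in documents for c in chunk(doc)]
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def retrieve(question, chunks, vectors, k=3):
    # Cosine similarity search: with normalized vectors it is a dot product.
    q = model.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(vectors @ q)[::-1][:k]
    return [chunks[i] for i in top]

def answer(question, chunks, vectors):
    context = "\n---\n".join(retrieve(question, chunks, vectors))
    prompt = (f"Answer using only the context below.\n\nContext:\n{context}"
              f"\n\nQuestion: {question}")
    return call_llm(prompt)  # placeholder for your chat-model call

A production system would swap the in-memory matrix for a vector database and add smarter chunking and reranking, but the shape is the same.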


Accuracy and relevance of information retrieval are of utmost importance in knowledge management, and retrieval-augmented generation comes to the rescue of LLMs. LLMs excel at understanding natural-language requests like “Help me understand innovation” because they have been trained on large amounts of text and conditioned to provide correct answers. But reliability is still an issue. LLMs can “hallucinate” answers they think are correct but are not. They often make factual mistakes while sounding confident in their errors, so it can be hard to tell what the truth really is. Grounding LLMs with a database of “truth” turns them from “hopefully-fact-generators” into “language-understanders-who-find-the-right-facts-helpers”. Retrieval-augmented generation can take a request like “Find the statistical differences in this report”, run it against a long document, and turn the result into a chart, with accuracy and reliability.
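
The “grounding” usually lives in the prompt itself. Here is a sketch of the kind of template we mean; the exact wording is our own illustration, not a fixed standard:

# A hypothetical grounding prompt: retrieved passages are presented as the
# only source of truth, which is what curbs hallucinated answers.
GROUNDED_PROMPT = """You are a careful analyst. Answer the question using
only the numbered sources below, and cite the source number for each claim.
If the sources do not contain the answer, say "Not found in the documents."

Sources:
{sources}

Question: {question}
Answer:"""

def build_prompt(question, passages):
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return GROUNDED_PROMPT.format(sources=sources, question=question)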


GPT-4 read the document, correctly quoted the appropriate statistic, and explained the result.

In terms of knowledge management, we can now search and access a wide array of information sources, ensuring that generated content is not only current but also highly relevant to the query. Imagine searching the latest versions of documentation and codebases, having chatbots retrieve up-to-date factual information, or analyzing sales data across departments. This dynamic access means the system can be continually updated with new information, providing more accurate and better-aligned responses. The value of this transformation cannot be overstated: combining data retrieval with content generation makes mining data far more efficient. Information is integrated into the generation process in real time, allowing rapid iteration to quickly find insights among disparate data.
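
That “continually updated” property is cheap in practice. Reusing the helpers from the first sketch, adding new material just means embedding the new chunks and appending them to the index; no retraining of the model is involved (again, a sketch under the same assumptions):

def add_documents(new_docs, chunks, vectors):
    # Append new material without rebuilding the whole index.
    new_chunks = [c for doc in new_docs for c in chunk(doc)]
    new_vectors = model.encode(new_chunks, normalize_embeddings=True)
    return chunks + new_chunks, np.vstack([vectors, new_vectors])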


Democratic access to insights is another key advantage – no longer do you need to be a SQL expert to query your database and find what you need. You can leverage LLMs’ impressive natural language understanding to interpret the question and then find the relevant information. If you can write out your question, you can find an answer. This works for small datasets, like a sales report, as well as for a warehouse of interdisciplinary data in various forms. The same flexibility covers textual documents and more structured data, making the capability relevant to anyone working with information and letting everyone dig deeper into their data.
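
For structured data, the usual pattern is to let the LLM translate the question into a query and let the database do the actual retrieval. A sketch of that idea, where the table name, columns, and sales.db file are all made up for illustration and call_llm() is the same placeholder as above:

import sqlite3

def ask_database(question, db_path="sales.db"):
    schema = "sales(region TEXT, product TEXT, revenue REAL, month TEXT)"
    prompt = (f"Given the SQLite table {schema}, write one query that "
              f"answers: {question}\nReturn only the SQL.")
    sql = call_llm(prompt)  # placeholder chat-model call
    # In production you would validate or sandbox LLM-generated SQL
    # before executing it.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

# e.g. ask_database("Which region had the highest revenue in March?")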


F'inn Use Cases

At F’inn, we’ve found a few key use cases for retrieval-augmented generation in our workflow. First, when we have long transcripts from qualitative interviews or copious open-end data from survey respondents, we can easily examine all of that info for themes, with references to the relevant sections, or generate summaries of the information. I use it all the time to understand the latest AI research from arXiv (because I can’t understand it otherwise). But it doesn’t stop there - being able to ask follow-up questions and effectively interview the dataset is where the real insights come from. Being able to do all of this with natural language means we can all get in there, build and test hypotheses, and dig into the data. The system interprets these queries, retrieves relevant information from the database, and presents it in an easily digestible format. Having a conversation with your data is already yielding insights we wouldn’t have seen otherwise or that would have taken hours of digging. This shift not only democratizes access to complex databases but also streamlines the information retrieval process, making it more intuitive and user-friendly. We can get insights from our data quickly to inform our analysis plan and dig further into our data.
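
When transcripts are too long for one context window, the theming step can work in two passes: summarize each chunk, then merge the notes. A rough sketch, with call_llm() again standing in for your chat-model API:

def extract_themes(transcript_chunks):
    # Pass 1: pull themes from each excerpt independently.
    notes = [call_llm(f"List the key themes in this interview excerpt, "
                      f"with one short supporting quote each:\n{c}")
             for c in transcript_chunks]
    # Pass 2: merge the per-excerpt notes into one set of overall themes.
    return call_llm("Merge these notes into 5-7 overall themes, keeping "
                    "one supporting quote per theme:\n\n" + "\n\n".join(notes))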


GPT-4 summarizes the recent paper “Mamba: Linear-Time Sequence Modeling with Selective State Spaces”.

Retrieval-augmented generation is changing how we search for documents – with a document repository, we can find relevant proposals, SOWs, reports, etc. using natural language. It’s one thing to search your documents for the word “price”; it’s another to find any reference to price, fees, cost, value, and so on. We can easily find questions we’ve written in past surveys to make sure we’re consistent from wave to wave, and a questionnaire locker means we don’t have to reinvent questions anew, speeding up questionnaire development. We no longer need to manually tag files to make searches more flexible; the searchable database effectively creates all the tags needed.
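
A tiny demonstration of why that works, reusing the embedding model from the first sketch. The passages are invented; the point is that none of them contains the word “price”, yet the relevant ones still rank highest:

passages = [
    "Our standard fee is $12,000 per wave of research.",
    "The cost of fielding rose 8% year over year.",
    "Respondents said the product felt like good value for money.",
    "The kickoff meeting is scheduled for Tuesday.",
]
vecs = model.encode(passages, normalize_embeddings=True)
q = model.encode(["What do we charge, i.e. the price?"],
                 normalize_embeddings=True)[0]
for score, text in sorted(zip(vecs @ q, passages), reverse=True):
    print(f"{score:.2f}  {text}")
# The scheduling sentence should rank last: similarity here is about
# meaning, not exact keyword matches.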


The benefits extend beyond search into meta-analysis. We can now search across documents, whether various reports or waves of data, and find common themes and trends that would be harder to see without a 20,000 ft. view over the whole database of documents. Being grounded in knowledge means we can get links to the references supporting any conclusion – all of these AI outputs need to be verified by expert humans, and getting links helps! Whether we’re critically evaluating the AI’s response or citing references in our documents, source verification is key.
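
In code, this is a small extension of the earlier sketches: keep the retrieved passages alongside the answer so a human can check every claim against its source.

def answer_with_sources(question, chunks, vectors, k=3):
    # Return the answer plus the passages that support it, for human review.
    passages = retrieve(question, chunks, vectors, k)
    reply = call_llm(build_prompt(question, passages))
    return reply, passages  # keep the evidence next to the claim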


Future of Knowledge Management

LLMs will continue to get smarter and better able to synthesize vast amounts of data, and search/retrieval techniques will also improve, making retrieval-augmented generation an even more powerful tool. Once models are capable of real reasoning, they will not only synthesize information but extend it into new areas via deduction. More sophisticated semantic indexing will create more powerful databases of your data, able to find both detailed and high-level information. We are working towards systems that can understand complex queries and provide nuanced results, which should mean more accurate information retrieval. Imagine huge databases of knowledge like Wikipedia ready to be integrated into relevant results.


These capabilities have the potential to revolutionize many industries, not just research. In healthcare, powerful semantic search could drive decision-support systems that give doctors instant access to patient data, the latest research, and treatment effectiveness. Doctors’ diagnostic accuracy and treatment effectiveness could go through the roof. In education, adaptive learning platforms could provide personalized content and resources based on learning styles and needs, giving students factual information at the level they need to encourage learning. Across businesses, more efficient knowledge management can help people make better decisions, streamline operations, and foster innovation by democratizing access to knowledge, from product development to advertising.


Conclusion

Retrieval-augmented generation solves a big issue with current LLMs by adding factual knowledge. Removing the burden of recalling facts means we can instead focus on LLMs’ exceptional ability to understand natural language and context and to summarize text. Organizations are starting to use this ability to manage and leverage their knowledge bases and are already reaping the benefits. Putting data insights at more people’s fingertips has the potential to transform innovation.


Retrieval-augmented generation is available in chatbots such as ChatGPT, Claude, and h2o via the ability to upload files. Other options, including the open-source LangChain and PrivateGPT and NVIDIA’s Chat with RTX, let you chat with your local documents, although these require advanced setup and/or special hardware to get the most out of them. The future should bring even more options to everyone.

