# Ragpipe
Ragpipe helps you build tools that quickly extract insights from large document repositories by assembling fast RAG pipelines.
Ragpipe makes it easy to tweak the components of your RAG pipeline, so you can iterate rapidly until you get accurate responses.
Instead of the usual `chunk-embed-match-rank` flow, Ragpipe adopts a holistic, end-to-end view of the pipeline, consisting of:
- building the data model,
- choosing representations for document parts,
- specifying the correct bridges among representations,
- merging the retrieved docs across bridges,
- and using the retrieved docs to compute the query response.
The `represent-bridge-merge` pattern is very powerful and allows us to build all kinds of complex retrieval engines with `retrieve-rank-rerank` patterns.
## Key Ideas
**Representations.** Choose which query/document fields to use, and how to represent each chosen field, to aid similarity/relevance computation (bridges) over the entire document repository. Representations can be text strings, dense/sparse vector embeddings, or arbitrary data objects, and help bridge the gap between the query and the documents.

**Bridges.** Choose a pair of query and document representations to bridge. A bridge serves as a relevance indicator: one of several criteria for identifying the documents relevant to a query. In practice, several bridges together determine the degree to which a document is relevant to a query. Computing each bridge produces a unique ranked list of documents.

**Merges.** Specify how to combine the bridges, e.g., combine multiple ranked lists of documents into a single ranked list.

**Data Model.** A hierarchical data structure that consists of all the (nested) documents. The data model is created from the original document files and is retained over the entire pipeline. We compute representations for arbitrary nested fields of the data, without flattening the data tree.
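For intuition, here is a toy picture of such a data model. The plain dicts below are purely illustrative, not ragpipe's actual data structures; the field names mirror the insurance example used later in this README:

```python
# Illustrative only: a toy stand-in for a hierarchical data model.
D = {
    'query': {'text': 'an example query string'},
    'sections': [
        {'node': 'first document snippet ...'},
        {'node': 'second document snippet ...'},
    ],
}
# Representations attach to arbitrary nested field paths, e.g. 'query.text'
# or '.sections[].node', without flattening the tree into a chunk list.
```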
To query over a data repository, we:

- compute the data model over the original data repository,
- specify the document fields and the (multiple) representations to be computed for each field,
- specify which representations to compute for the query,
- specify bridges: which pairs of query and document field representations should be matched,
- specify merges: how to combine multiple bridges, sequentially or in parallel, to yield a curated ranked list of relevant documents (see the illustrative sketch below),
- and specify gen-response: how to generate a response to the query using the relevant document list and a large language model.
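For concreteness, here is a purely illustrative sketch of what such a configuration might look like. The key names and nesting are assumptions for exposition, not ragpipe's actual schema; see `examples/insurance/insurance.yml` for a real configuration. The representation `#dense`, bridge `b1`, and merge `c1` are the ones discussed in the Quick Start walkthrough below.

```yaml
# Illustrative sketch only -- not ragpipe's actual config schema.
representations:
  query.text:
    dense:
      encoder: colbert-ir/colbertv2.0
  .sections[].node:
    dense:
      encoder: colbert-ir/colbertv2.0
bridges:
  b1:
    query: query.text#dense
    doc: .sections[].node#dense
merges:
  c1:
    bridges: [b1]
```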
## Quick Start
See the example in the `ragpipe/examples/insurance` directory. The `main` function from `insurance.py` is inlined below.
```python
def main(respond_flag=False):
    # read the config: representations, bridges, merges, prompts
    config = load_config('examples/insurance/insurance.yml', show=True) #L1
    # build the hierarchical data model from the data file
    D = build_data_model('examples/data/insurance/niva-short.mmd') #L2
    # pick one of the queries listed in the config
    query_text = config.queries[1] #L3

    from ragpipe.bridge import bridge_query_doc
    # compute representations, evaluate bridges and merges, return ranked docs
    docs_retrieved = bridge_query_doc(query_text, D, config) #L4

    for doc in docs_retrieved: doc.show() #L5

    if respond_flag:
        # generate the final response with an LLM, using prompt 'qa2'
        return respond(query_text, docs_retrieved, config.prompts['qa2']) #L6
    else:
        return docs_retrieved
```
In `main`, we implement the following steps:

- `#L1` reads the config file `insurance.yml`, which specifies representations, bridges, and merges. Go through the `insurance.yml` file to understand the definitions.
- `#L2` reads the data file (`niva-short.mmd`) to build a doc model `D` with the following nested fields: `query.text`, which contains the query string, and `.sections[].node`, which contains the document snippets.
- `#L4` is where all the heavy lifting happens, in ragpipe's `bridge_query_doc` function, based on the definitions from `insurance.yml`:
  - compute `#dense` representations for the `query.text` and `.sections[].node` fields using `colbert-ir/colbertv2.0`,
  - rank documents according to the bridge `b1` across the two representations `query.text#dense` and `.sections[].node#dense`,
  - compute a merge `c1` to obtain the final ranked list. In this case, the merge is trivial since only a single bridge is defined.
- `#L6` sends the retrieved documents as context to the LLM, along with a QA prompt defined in `insurance.yml`, and generates the final cohesive response.
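To try the example end to end, call `main` from the repository root (so that the relative paths in the snippet resolve):

```python
docs = main()                     # retrieval only: ranked documents
answer = main(respond_flag=True)  # retrieval + LLM-generated response
```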
## FAQ
### How / where do you chunk the long documents?
Chunking happens while building the data model. Unlike conventional chunking, we do not create a flat "list of text fields" data model. Arbitrary parts of the document tree can be 'chunked' dynamically without losing the original document hierarchy.
This makes it easy to explore different chunking strategies while retaining the unaffected parts of the downstream represent-bridge-merge pipeline.
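As a toy illustration of this idea (plain Python with a made-up `chunk_text` helper, not ragpipe's actual API), a long field deep in the tree can be chunked in place while the hierarchy around it stays intact:

```python
# Illustrative only: a toy nested document tree, not ragpipe's data model.
doc_tree = {
    'title': 'Policy Document',
    'sections': [
        {'header': 'Coverage', 'text': 'a very long passage about coverage ...'},
        {'header': 'Exclusions', 'text': 'another long passage ...'},
    ],
}

def chunk_text(text, size=200):
    """Split a long string into fixed-size pieces (a stand-in for any chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Chunk each section in place: the chunks live alongside the section's other
# fields, so no parent or sibling context is lost by flattening.
for section in doc_tree['sections']:
    section['chunks'] = chunk_text(section['text'])
```

Swapping in a different chunking strategy only changes this one step; the representations, bridges, and merges defined downstream are untouched.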
### Which representations are supported?
Ragpipe already includes a few popular dense and sparse vector encoders to get your pipeline started quickly, for example `BAAI/bge-small-en-v1.5`, BM25, and `colbert-ir/colbertv2.0`. The flexible configuration allows adding new external encoders by simply adding an `Encoder` class for the new encoder. This also allows us to keep the core of Ragpipe very lean.
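As a rough sketch of what wrapping a new encoder might look like — the class shape and method name here are assumptions for illustration, not ragpipe's actual `Encoder` interface — an external embedding model could be adapted like this:

```python
# Sketch only: the interface below is assumed for illustration; consult
# ragpipe's source for the actual Encoder contract.
from sentence_transformers import SentenceTransformer

class MyDenseEncoder:
    """Adapts an external embedding model behind a single encode() call."""
    def __init__(self, model_name='BAAI/bge-small-en-v1.5'):
        self.model = SentenceTransformer(model_name)

    def encode(self, texts):
        # returns one dense vector per input text
        return self.model.encode(texts)
```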