My learning documentation for RAG
<aside> 📖
Contents
</aside>
Imagine we have a robot that knows a little about everything. Now we want it to answer questions about my best friend. If we just ask the robot, it can only give a generic answer, the kind most people would give. So we “plug in” a USB drive that stores information about my best friend, and the robot can look things up on it before answering.
<aside> 🤖
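The combination of “the robot” (the language model) + “the USB” (an external knowledge source it retrieves from) = RAG (Retrieval-Augmented Generation)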
</aside>
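To make the analogy concrete, here is a toy, library-free sketch of the retrieve-then-generate idea. The knowledge base, the word-overlap retriever, and the function names are all hypothetical. A real RAG system would typically use embeddings for retrieval and send the final prompt to an LLM, and this sketch does neither.

```python
import re

# Hypothetical knowledge base: the facts stored on the "USB".
KNOWLEDGE_BASE = [
    "My best friend's favourite food is ramen.",
    "My best friend plays the violin.",
    "My best friend was born in Toronto.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase set of words, with punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Prepend the retrieved facts so the model can answer from them."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )

question = "What food does my best friend like?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)  # this prompt is what would be sent to the "robot" (the LLM)
```

The ramen fact shares the most words with the question, so it is the one placed in the context. Replacing the word-overlap score with embedding similarity is a common next step.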
One approach I came across for parsing documents: Unstructured.io
<aside> 🧑‍🤝‍🧑
Another approach, and the one I chose: LlamaParse
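```python
from llama_parse import LlamaParse  # expects LLAMA_CLOUD_API_KEY in the environment

instruction = "..."  # placeholder: the parsing instruction passed to LlamaParse
parser = LlamaParse(
    result_type="markdown",           # return the parsed PDF as Markdown
    parsing_instruction=instruction,
    max_timeout=5000,                 # max seconds to wait for the parsing job
)
documents = parser.load_data("path_to_pdf")  # keep the parsed documents
```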
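The loader in the next step reads from a file path, so LlamaParse's Markdown output has to be written to disk first. This is a minimal sketch. It assumes the parsed result is a list of documents whose Markdown is available as `.text`. Depending on the LlamaParse version and settings, that may be one document per page. `save_markdown` is a hypothetical helper name.

```python
from pathlib import Path

def save_markdown(page_texts: list[str], path: str) -> Path:
    """Join parsed pages into one Markdown file and return its path (hypothetical helper)."""
    out = Path(path)
    # A blank line between pages keeps Markdown blocks from running together.
    out.write_text("\n\n".join(page_texts), encoding="utf-8")
    return out

# With LlamaParse, the texts would come from the parsed documents, e.g.:
#   save_markdown([doc.text for doc in parser.load_data("path_to_pdf")], "path_to_md")
```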
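After saving the Markdown output to a file, I loaded it with LangChain's UnstructuredMarkdownLoader.
Why? Because the next step uses a LangChain text splitter, and loading through a LangChain loader gives LangChain Document objects that the splitter can work on directly.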
Code Example
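```python
from langchain_community.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader(
    "path_to_md",
    mode="single",    # load the whole file as one Document
    strategy="fast",  # passed through to unstructured's partitioner
)
loaded_doc = loader.load()[0]  # "single" mode yields exactly one Document
```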
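The subsequent step is splitting loaded_doc into chunks, which the LangChain text splitter handles. This library-free sketch illustrates the idea by cutting text into fixed-size character chunks that overlap. It is a simplified stand-in, not LangChain's recursive algorithm, and `split_text` and the chunk sizes are hypothetical.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks overlapping by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks

# e.g. chunks = split_text(loaded_doc.page_content, chunk_size=1000, chunk_overlap=100)
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk. With LangChain itself, the equivalent would be along the lines of `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents([loaded_doc])`, imported from `langchain_text_splitters`. That splitter tries paragraph, line, and word boundaries before falling back to single characters.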