TNS
VOXPOP
What’s Slowing You Down?
What is your biggest inhibitor to shipping software faster?
Complicated codebase and technical debt.
0%
QA, writing tests, and debugging.
0%
Waiting for PR review or stakeholder approval.
0%
I'm always waiting due to long build times.
0%
Rework due to unclear or incomplete specifications.
0%
Inadequate tooling or infrastructure.
0%
Other.
0%
AI / Data / Software Development

Building GPT Applications on Open Source LangChain, Part 2

We’ll use the fast-rising LLM application framework for a practical example of how to use a GPT to help answer a question from a PDF document.
Jun 16th, 2023 8:15am by
Featued image for: Building GPT Applications on Open Source LangChain, Part 2

This is the second of two articles.

In the previous article, we discussed three considerations for developers when building GPT applications with an open source stack, such as LangChain. Let’s now use LangChain for a practical example where we want to store and analyze PDF documents.

We’ll obtain a PDF document, divide it into smaller parts, save the document text and its vector representations (embeddings*) in a database system and then query it. We’ll also use a GPT to help answer a question.

*In a GPT, an embedding is simply a numerical representation of a word or phrase. Vectors represent the semantic meaning of words and phrases in a way that a machine-learning model can understand.

Create a SingleStoreDB Cloud Account

First, sign up for a free SingleStoreDB Cloud account. Once logged in, select CLOUD > Create new workspace group from the left-hand navigation pane. Next, choose Create Workspace and just work through the wizard. Here are the recommended settings for this example:

Create Workspace Group

Workspace Group Name: LangChain Demo Group
Cloud Provider: AWS
Region: US East 1 (N. Virginia)

Click Next.

Create Workspace

Workspace Name: langchain-demo
Size: S-00

Click Create Workspace.

Once the workspace is created and available, from the left-hand navigation pane, select DEVELOP > SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS pdf_db;

Create a Notebook

From the left-hand navigation pane, select DEVELOP > Notebooks. In the top right of the web page, select New Notebook > New Notebook, as shown in Figure 1 below.

We’ll call the notebook langchain_demo. Select a Blank notebook template from the available options.

We’ll also select the Connection and Database using the drop-down menus above the notebook, as shown in Figure 2.

Figure 2. Connection and Database

Fill out the Notebook

First, we’ll import some libraries:


Next, we’ll read in a PDF document. This is an article by Neal Leavitt titled “Whatever Happened to Object-Oriented Databases?” OODBs were an emerging technology during the late 1980s and early 1990s. We’ll add leavcom.com to the firewall by selecting the Edit Firewall option in the top right. Once the address has been added to the firewall, we’ll read the PDF file:


We can use LangChain’s OnlinePDFLoader, which makes reading a PDF file easier.

Next, we’ll get some data on the document:


The output should be:


We’ll now split the document into pages containing 2,000 characters each, giving us seven pages:


Next, we’ll create a table to store the text and embeddings. We can do this directly using the %%sql magic command:


To use Python code to connect to our database, we can use the built-in connection_url, as follows:


We’ll set our OpenAI API Key:


and use LangChain’s OpenAIEmbeddings:


Now we are ready to obtain the vector embeddings and store them in the database system:


We truncate the table to ensure that we start with an empty table. Then we iterate through the pages of text, obtain the embeddings from OpenAI, and store the text and embeddings in the database table.

We can now ask a question, as follows:


Here we convert the question into vector embeddings, perform a DOT_PRODUCT and return only the highest-scoring value.

Finally, we can use a GPT to provide an answer, based on the earlier question:


Here is some example output:

Based on the information provided in the document, it seems that object-oriented databases are not expected to be commercially successful in the near future. While they are gaining some popularity in niche markets such as CAD and telecommunications, relational databases continue to dominate the market and are expected to do so for the foreseeable future. IDC predicts that the growth rate for relational databases will be significantly higher than that of OO databases through 2004. However, OO databases still have their place in certain niche markets.

Summary

In this example, we saw the benefits of LangChain in the application development process. We also saw how easily we can convert documents from one format to another, store the content in a database system, generate vector embeddings and ask questions about the data stored in the database system. We also have the full power of SQL available if we are interested in performing additional query operations on the data.

I will host a workshop on June 22 and will go through building a ChatGPT application using LangChain. I hope you can join. Sign up here.

Group Created with Sketch.
TNS owner Insight Partners is an investor in: Pragma, SingleStore.
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.