In the two previous lessons, you used the LangChain Neo4jVector and Neo4jGraph classes to create nodes in the graph.
Using Neo4jVector and Neo4jGraph is an efficient and easy way to get started.
To create a graph where you can also understand the relationships within the data, you must incorporate the metadata into the data model.
In this lesson, you will create a graph of the course content using the neo4j Python driver and OpenAI API.
Data Model
The data model you will create is a simplified version of the course content model you saw earlier in this module.
The graph will contain the following nodes, properties, and relationships:
- Course, Module, and Lesson nodes with a name property
- A url property on Lesson nodes will hold the GraphAcademy URL for the lesson
- Paragraph nodes will have text and embedding properties
- The HAS_MODULE, HAS_LESSON, and CONTAINS relationships will connect the nodes
You can extract the name properties and url metadata from the directory structure of the lesson files.
For example, the first lesson of the Neo4j & LLM Fundamentals course has the following path:
courses\llm-fundamentals\modules\1-introduction\lessons\1-neo4j-and-genai\lesson.adoc
You can extract the following metadata from the path:
- Course.name - llm-fundamentals
- Module.name - 1-introduction
- Lesson.name - 1-neo4j-and-genai
- Lesson.url - graphacademy.neo4j.com/courses/{Course.name}/{Module.name}/{Lesson.name}
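As a quick illustration of the mapping above, the following sketch parses the example path with the standard library's `pathlib`. It is not part of the starter code; the variable names are chosen for illustration, and `PureWindowsPath` is used only because the example path is written with backslashes.

```python
from pathlib import PureWindowsPath

# Example path from the lesson text. PureWindowsPath parses
# backslash-separated paths on any operating system.
source = PureWindowsPath(
    r"courses\llm-fundamentals\modules\1-introduction\lessons\1-neo4j-and-genai\lesson.adoc"
)

parts = source.parts
# Counting from the end: [-1] is lesson.adoc, [-2] the lesson directory,
# [-4] the module directory, and [-6] the course directory.
course, module, lesson = parts[-6], parts[-4], parts[-2]
url = f"https://graphacademy.neo4j.com/courses/{course}/{module}/{lesson}"

print(course, module, lesson)
print(url)
```

Indexing from the end of the path means the extraction does not depend on where the course directory sits on disk.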
Extracting the data
Open the llm-vectors-unstructured\build_graph.py file in your code editor.
This starter code loads and chunks the course content.
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from openai import OpenAI
from neo4j import GraphDatabase
COURSES_PATH = "llm-vectors-unstructured/data/asciidoc"
loader = DirectoryLoader(COURSES_PATH, glob="**/lesson.adoc", loader_cls=TextLoader)
docs = loader.load()
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1500,
    chunk_overlap=200,
)
chunks = text_splitter.split_documents(docs)
# Create a function to get the embedding
# Create a function to get the course data
# Create OpenAI object
# Connect to Neo4j
# Create a function to run the Cypher query
# Iterate through the chunks and create the graph
# Close the neo4j driver
For each chunk, you have to create an embedding of the text and extract the metadata.
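To make the chunking step in the starter code more concrete, here is a simplified sketch of what splitting on a separator with a size limit means. It is an illustration only, not LangChain's implementation: the `split_text` function is hypothetical and it ignores `chunk_overlap` entirely.

```python
def split_text(text, separator="\n\n", chunk_size=1500):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Keep growing the current chunk (a single oversized piece is kept whole).
            current = candidate
        else:
            # The next piece would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha\n\nbeta\n\ngamma"
print(split_text(sample, chunk_size=12))
```

With a limit of 12 characters, the first two paragraphs fit in one chunk and the third starts a new one.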
Create a function to create and return an embedding using the OpenAI API:
def get_embedding(llm, text):
    response = llm.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding
Create a second function to extract the data from the chunk:
def get_course_data(llm, chunk):
    data = {}
    path = chunk.metadata['source'].split(os.path.sep)
    data['course'] = path[-6]
    data['module'] = path[-4]
    data['lesson'] = path[-2]
    data['url'] = f"https://graphacademy.neo4j.com/courses/{data['course']}/{data['module']}/{data['lesson']}"
    data['text'] = chunk.page_content
    data['embedding'] = get_embedding(llm, data['text'])
    return data
The get_course_data function:
- Splits the document source path to extract the course, module, and lesson names
- Constructs the url using the extracted names
- Extracts the text from the chunk
- Creates an embedding using the get_embedding function
- Returns a dictionary containing the extracted data
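If you want to check the steps listed above without calling the OpenAI API, one option is to pass in stand-in objects. The sketch below repeats the two functions from this lesson and exercises them with a hypothetical `FakeEmbeddings` class and a fake chunk; both stand-ins are assumptions made for illustration and are not part of the course code.

```python
import os
from types import SimpleNamespace


class FakeEmbeddings:
    """Hypothetical stand-in for llm.embeddings; returns a fixed vector."""

    def create(self, input, model):
        item = SimpleNamespace(embedding=[0.1, 0.2, 0.3])
        return SimpleNamespace(data=[item])


def get_embedding(llm, text):
    response = llm.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding


def get_course_data(llm, chunk):
    # Same logic as the lesson's function: names come from the source path.
    data = {}
    path = chunk.metadata['source'].split(os.path.sep)
    data['course'] = path[-6]
    data['module'] = path[-4]
    data['lesson'] = path[-2]
    data['url'] = f"https://graphacademy.neo4j.com/courses/{data['course']}/{data['module']}/{data['lesson']}"
    data['text'] = chunk.page_content
    data['embedding'] = get_embedding(llm, data['text'])
    return data


# Fake client and chunk with the same attributes the functions rely on.
fake_llm = SimpleNamespace(embeddings=FakeEmbeddings())
fake_chunk = SimpleNamespace(
    metadata={"source": os.path.join(
        "courses", "llm-fundamentals", "modules", "1-introduction",
        "lessons", "1-neo4j-and-genai", "lesson.adoc")},
    page_content="Example paragraph text.",
)

data = get_course_data(fake_llm, fake_chunk)
print(sorted(data))
```

The path is built with `os.path.join` so that it uses the same separator that `get_course_data` splits on.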
 
Create the graph
To create the graph, you will need to:
- Create an OpenAI object to generate the embeddings
- Connect to the Neo4j database
- Iterate through the chunks
- Extract the course data from each chunk
- Create the nodes and relationships in the graph
 
Create the OpenAI object:
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
Connect to the Neo4j sandbox:
driver = GraphDatabase.driver(
    os.getenv('NEO4J_URI'),
    auth=(
        os.getenv('NEO4J_USERNAME'),
        os.getenv('NEO4J_PASSWORD')
    )
)
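driver.verify_connectivity()
Test the connection
You can run your code now to check the connection to the Neo4j sandbox; driver.verify_connectivity() raises an exception if the database cannot be reached. Creating the OpenAI object does not contact the OpenAI API, so the API key is only exercised when the first embedding is requested.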
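Connection problems are often caused by a missing setting rather than the network. As a small pre-flight check, the sketch below reports which of the environment variables used in this lesson are unset. The `missing_settings` helper is hypothetical, not part of the starter code.

```python
import os

# The four environment variables read by the code in this lesson.
REQUIRED = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(environ=os.environ, required=REQUIRED):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_settings()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

In the lesson's script, this check would run after `load_dotenv()` so that values from the `.env` file are included.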
To create the data in the graph, you will need a function that incorporates the course data into a Cypher statement and runs it in a transaction.
def create_chunk(tx, data):
    tx.run("""
        MERGE (c:Course {name: $course})
        MERGE (c)-[:HAS_MODULE]->(m:Module{name: $module})
        MERGE (m)-[:HAS_LESSON]->(l:Lesson{name: $lesson, url: $url})
        MERGE (l)-[:CONTAINS]->(p:Paragraph{text: $text})
        WITH p
        CALL db.create.setNodeVectorProperty(p, "embedding", $embedding)
        """, 
        data
        )
The create_chunk function will accept the data dictionary created by the get_course_data function.
You should be able to identify the $course, $module, $lesson, $url, $text, and $embedding parameters in the Cypher statement.
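One way to confirm that the dictionary keys and the Cypher parameters line up is to compare them programmatically. The sketch below is a rough check, assuming a simple regular expression is good enough to find `$name` parameters; the `cypher_parameters` helper is hypothetical and would also match a `$` that appears inside a string literal.

```python
import re

# The Cypher statement used by create_chunk in this lesson.
CYPHER = """
MERGE (c:Course {name: $course})
MERGE (c)-[:HAS_MODULE]->(m:Module{name: $module})
MERGE (m)-[:HAS_LESSON]->(l:Lesson{name: $lesson, url: $url})
MERGE (l)-[:CONTAINS]->(p:Paragraph{text: $text})
WITH p
CALL db.create.setNodeVectorProperty(p, "embedding", $embedding)
"""


def cypher_parameters(query):
    """Return the set of $parameter names found in a Cypher string."""
    return set(re.findall(r"\$(\w+)", query))


# Keys of the dictionary returned by get_course_data.
data_keys = {"course", "module", "lesson", "url", "text", "embedding"}

unmatched = cypher_parameters(CYPHER) - data_keys
print("Parameters without a matching key:", sorted(unmatched))
```

An empty list means every parameter in the statement has a value in the dictionary.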
Iterate through the chunks and execute the create_chunk function:
for chunk in chunks:
    with driver.session(database="neo4j") as session:
        
        session.execute_write(
            create_chunk,
            get_course_data(llm, chunk)
        )
A new session is created for each chunk. The execute_write method calls the create_chunk function, passing the data dictionary created by the get_course_data function.
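Opening a session per chunk keeps the loop simple. As an alternative design, the sketch below reuses one session for all chunks while still running one write transaction per chunk. The `load_all` helper is hypothetical; it assumes only that the driver object provides `session(database=...)` and that the session provides `execute_write`, as used above.

```python
def load_all(driver, rows, work, database="neo4j"):
    """Run work(tx, row) in its own write transaction for each row, reusing one session."""
    count = 0
    with driver.session(database=database) as session:
        for row in rows:
            # Each call is still a separate managed transaction.
            session.execute_write(work, row)
            count += 1
    return count
```

With the names from this lesson, the call could look like `load_all(driver, (get_course_data(llm, chunk) for chunk in chunks), create_chunk)`.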
Finally, close the driver.
driver.close()
The complete code:
import os
from dotenv import load_dotenv
load_dotenv()
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from openai import OpenAI
from neo4j import GraphDatabase
COURSES_PATH = "llm-vectors-unstructured/data/asciidoc"
loader = DirectoryLoader(COURSES_PATH, glob="**/lesson.adoc", loader_cls=TextLoader)
docs = loader.load()
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1500,
    chunk_overlap=200,
)
chunks = text_splitter.split_documents(docs)
def get_embedding(llm, text):
    response = llm.embeddings.create(
            input=text,
            model="text-embedding-ada-002"
        )
    return response.data[0].embedding
def get_course_data(llm, chunk):
    data = {}
    path = chunk.metadata['source'].split(os.path.sep)
    data['course'] = path[-6]
    data['module'] = path[-4]
    data['lesson'] = path[-2]
    data['url'] = f"https://graphacademy.neo4j.com/courses/{data['course']}/{data['module']}/{data['lesson']}"
    data['text'] = chunk.page_content
    data['embedding'] = get_embedding(llm, data['text'])
    return data
def create_chunk(tx, data):
    tx.run("""
        MERGE (c:Course {name: $course})
        MERGE (c)-[:HAS_MODULE]->(m:Module{name: $module})
        MERGE (m)-[:HAS_LESSON]->(l:Lesson{name: $lesson, url: $url})
        MERGE (l)-[:CONTAINS]->(p:Paragraph{text: $text})
        WITH p
        CALL db.create.setNodeVectorProperty(p, "embedding", $embedding)
        """, 
        data
        )
llm = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
driver = GraphDatabase.driver(
    os.getenv('NEO4J_URI'),
    auth=(
        os.getenv('NEO4J_USERNAME'),
        os.getenv('NEO4J_PASSWORD')
    )
)
driver.verify_connectivity()
for chunk in chunks:
    with driver.session(database="neo4j") as session:
        
        session.execute_write(
            create_chunk,
            get_course_data(llm, chunk)
        )
driver.close()
Explore the graph
Run the code to create the graph. It will take a minute or two to complete as it creates the embeddings for each paragraph.
View the graph by running the following Cypher:
MATCH (c:Course)-[:HAS_MODULE]->(m:Module)-[:HAS_LESSON]->(l:Lesson)-[:CONTAINS]->(p:Paragraph)
RETURN *
You will need to create a vector index to query the paragraph embeddings.
CREATE VECTOR INDEX paragraphs IF NOT EXISTS
FOR (p:Paragraph)
ON p.embedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
You can use the vector index and the graph to find a lesson to help with specific questions:
WITH genai.vector.encode(
    "How does RAG help ground an LLM?",
    "OpenAI",
    { token: "sk-..." }) AS userEmbedding
CALL db.index.vector.queryNodes('paragraphs', 6, userEmbedding)
YIELD node, score
MATCH (l:Lesson)-[:CONTAINS]->(node)
RETURN l.name, l.url, score
Explore the graph and see how the relationships between the nodes can bring additional meaning to the unstructured data.
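The vector index above is configured with the cosine similarity function. As a rough illustration of what that measure computes, here is a minimal pure-Python sketch using toy three-dimensional vectors rather than real 1536-dimensional embeddings. The vectors and names are made up for the example, and the score Neo4j returns may be scaled differently from the raw value computed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "paragraph embeddings" and a toy "question embedding".
paragraphs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.7, 0.7, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
question = [1.0, 0.1, 0.0]

# Rank paragraphs from most to least similar to the question.
ranked = sorted(
    paragraphs,
    key=lambda name: cosine_similarity(paragraphs[name], question),
    reverse=True,
)
print(ranked)
```

Vectors pointing in nearly the same direction rank highest, regardless of their length.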
When you are ready to move on, click Continue.
Lesson Summary
In this lesson, you created a graph of course content.
In the next lesson, you will learn how to add topics to the graph.