Query a Vector Index

In this lesson, you will learn how to query a vector index. You will use the Question and Answer embeddings to find similar responses.

Querying with Embeddings

When querying a vector index, you have to query with an embedding.

For example, suppose you want to use the vector index to find questions similar to the text "What are examples of good open-source projects?". You would first generate an embedding of the text, and then use that embedding to query the vector index.
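To make the idea of querying with an embedding more concrete, here is a minimal Python sketch of a brute-force similarity search. It is an illustration only, not how Neo4j implements its vector index: the function names are hypothetical, the 3-dimensional vectors are made up (real embeddings have many more dimensions), and cosine similarity is assumed as the similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k):
    """Return the k stored texts most similar to query_vector, best first."""
    scored = [
        (text, cosine_similarity(query_vector, vector))
        for text, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for stored question embeddings.
stored_questions = {
    "What are good open-source projects?": [0.9, 0.1, 0.0],
    "Which open-source tools should I try?": [0.8, 0.2, 0.1],
    "What are the most touristic countries?": [0.0, 0.1, 0.9],
}

# Stand-in for the embedding of the user's search text.
query_vector = [0.9, 0.1, 0.05]

results = top_k(query_vector, stored_questions, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The search never sees the original text, only its vector, which is why the text has to be embedded before it can be used in a query. A real vector index is designed to avoid comparing the query against every stored vector, but the inputs and outputs have the same shape: a vector goes in, and ranked matches with scores come out.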

You are going to explore two scenarios:

  1. A user views an existing question and wants to see similar questions.

  2. A user submits a new question and receives answers to a similar question.

In the first scenario, you will use existing question embeddings to find similar questions. In the second scenario, you will generate a new embedding for the user’s question to find similar questions and answers.

These scenarios will help you understand how to query the vector index to find similar questions and answers.

Finding similar questions

You can use the questions and answers vector indexes to find questions that are similar to each other.

Consider the first scenario: a user views an existing question and wants to see similar questions.

The following Cypher query finds questions similar to "What are the most touristic countries in the world?".

Review the query before running it and observing the results.

cypher
MATCH (q:Question {text: "What are the most touristic countries in the world?"})

MATCH (node:Question)
SEARCH node IN (
    VECTOR INDEX questions
    FOR q.embedding
    LIMIT 6
) SCORE AS score

RETURN node.text, score

Breaking down the query, you can identify the following:

  1. The MATCH clause finds the specific Question node.

  2. The SEARCH clause queries the questions vector index with the Question node’s embedding (q.embedding) and LIMIT 6 restricts the result to the 6 most similar nodes. Because the original question is itself in the index, expect it to appear among the results.

  3. SCORE AS score makes the similarity score of each matched node available as score.

  4. The query returns the text property of each matched node along with its similarity score.

You can extend this query to return the answers to the most similar questions:

cypher
MATCH (q:Question {text: "What are the most touristic countries in the world?"})
MATCH (node:Question)
SEARCH node IN (
    VECTOR INDEX questions
    FOR q.embedding
    LIMIT 6
) SCORE AS score
MATCH (node)-[:ANSWERED_BY]->(a)
RETURN a.text, score

The additional MATCH clause follows the ANSWERED_BY relationship from each matched node to its answer.

Run the query and observe the results. You will notice that the top answers are relevant to the original question. Further down the list, the similarity score decreases and so does the relevance of the answers.
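Because relevance drops along with the score, an application may want to discard weak matches before showing them to a user. The following Python sketch shows one way to do that once the results have been fetched. The function name, the cutoff of 0.9, and the scores are all made up for illustration; a sensible cutoff depends on your data and embedding model.

```python
def filter_by_score(results, min_score):
    """Keep only (text, score) pairs whose score is at least min_score."""
    return [(text, score) for text, score in results if score >= min_score]


# Made-up scores for illustration only; real values come from the query results.
results = [
    ("Answer A", 0.97),
    ("Answer B", 0.93),
    ("Answer C", 0.88),
    ("Answer D", 0.71),
]

relevant = filter_by_score(results, min_score=0.9)
print(relevant)
```

Filtering this way keeps the order of the original results, so the best match still comes first.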

Finding answers to a similar question

To improve the user’s experience when asking a new question, you could use the vector index to find similar questions and answers.

To achieve this, you need to generate an embedding for the user’s new question and use it to query the vector index.

You can generate a new embedding in Cypher using the ai.text.embed function:

cypher
// ai.text.embed syntax
ai.text.embed(
    resource,
    provider,
    configuration = {}
) :: VECTOR

You pass the text you want to embed as the first parameter (resource) and the name of the provider as the second.

You can use embedding models from different providers, such as OpenAI, Vertex AI, and Amazon Bedrock. Provider-specific details, such as API keys, are passed in the configuration map.

For example, you can use the OpenAI provider to generate an embedding by passing the API key as token and the model name as model in the configuration map:

cypher
WITH ai.text.embed(
    "Test",
    "OpenAI",
    { token: "sk-...", model: "text-embedding-ada-002" }
) AS embedding
RETURN toFloatList(embedding)

OpenAI API key

To run this query, you must replace the token value with your OpenAI API key.

You can incorporate the embedding into your query to find similar questions:

cypher
WITH ai.text.embed(
    "What are good open source projects",
    "OpenAI",
    { token: "sk-...", model: "text-embedding-ada-002" }
) AS userEmbedding

MATCH (node:Question)
SEARCH node IN (
    VECTOR INDEX questions
    FOR userEmbedding
    LIMIT 6
) SCORE AS score
RETURN node.text, score

This query creates an embedding using ai.text.embed and then uses that embedding to query the questions vector index.

Try changing the text and observe the results.

Can you modify this query so that, like the earlier query, it returns the answers to the most similar questions?

Check your understanding

Query Vector Index

True or False - you can pass the text you wish to search for directly to a SEARCH clause over a vector index.

  • ❏ True

  • ✓ False

Hint

A vector index can only be searched with an embedding, not with raw text.

Solution

The statement is False. SEARCH over a vector index requires an embedding to be passed to it, not text.

Lesson Summary

In this lesson, you learned how to query a vector index and generate embeddings using Cypher.

In the next module, you will learn how to import unstructured data into Neo4j using Python.
