Challenge: Weighted graph projection and analysis

Introduction

You’ve learned how to project graphs with undirected relationships and weights. You’ve used Node Similarity in the previous lesson to find similar movies and write relationships back to the database.

Now it’s time to take Node Similarity further by using it in stream mode on a directed, weighted, bipartite graph to analyze user similarity without writing back to the database.

In this challenge, you’ll:

  1. Project a user-movie bipartite graph with weighted relationships

  2. Use Node Similarity in stream mode to find users with similar taste profiles

  3. Analyze the results to understand rating patterns

Your task

Step 1: Project the graph

First, open the sandbox panel to get started.

Project a graph called 'user-movie-ratings' that connects users to movies through their ratings, using rating scores as relationship weights.

Your projection should:

  • Include both User and Movie nodes

  • Preserve node labels (User and Movie)

  • Use the rating property as the relationship weight

Think about the pattern: User → RATED → Movie

Write the complete projection query, including:

  • The correct MATCH pattern

  • Configuration to preserve both User and Movie labels

  • Relationship properties configuration for the rating weight

Verify your projection was created successfully using gds.graph.list().
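As a rough sketch of that verification step, the query below lists the projection by name and returns its size and schema. It assumes the projection was created with the name 'user-movie-ratings'; the column aliases are illustrative only.

```cypher
// List the named projection and inspect its size and schema.
// Assumes a projection called 'user-movie-ratings' already exists.
CALL gds.graph.list('user-movie-ratings')
YIELD graphName, nodeCount, relationshipCount, schema
RETURN graphName AS graph,
       nodeCount AS nodes,
       relationshipCount AS rels,
       schema
```

If the projection worked as intended, the schema column should show both the User and Movie labels and a rating property on the relationships.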

Step 2: Review node similarity

Visit the Node Similarity documentation.

Review the configuration parameters, especially:

  • How to use weights with relationshipWeightProperty

  • How topK controls the number of similar nodes returned

  • The difference between stream, mutate, and write modes

Step 3: Run node similarity

Run Node Similarity on your projected graph in stream mode using the rating weights to find users with similar taste profiles.

Return the top 20 most similar user pairs with their similarity scores and the movies they’ve both rated.

Hints:

  • Use relationshipWeightProperty: 'rating' to tell Node Similarity to consider rating weights

  • Set topK: 3 to limit to the top 3 most similar users for each user

  • Convert node IDs to user names using gds.util.asNode(nodeId).name

  • Order by similarity score to see the strongest matches

  • To find shared movies, you can query the database using the user names

When you’ve completed your analysis, answer the questions below.

Solution approach

Step 1: Project the user-movie graph

cypher
Solution: Project weighted user-movie bipartite graph
MATCH (source:User)-[r:RATED]->(target:Movie) // (1)
WITH gds.graph.project( // (2)
  'user-movie-ratings', // (3)
  source, // (4)
  target, // (5)
  {
    sourceNodeLabels: labels(source), // (6)
    targetNodeLabels: labels(target), // (7)
    relationshipProperties: r { .rating } // (8)
  }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels // (9)

Projection breakdown

  1. Match User nodes connected to Movie nodes through RATED relationships

  2. Call the GDS projection function

  3. Name the projection 'user-movie-ratings'

  4. Include source (User) nodes

  5. Include target (Movie) nodes

  6. Preserve User labels

  7. Preserve Movie labels

  8. Include the rating property from relationships as weight

  9. Return projection statistics

Key components:

  • The MATCH pattern connects users directly to movies they rated

  • sourceNodeLabels and targetNodeLabels preserve the bipartite structure

  • relationshipProperties: r { .rating } captures the rating weight (1-5 stars)

  • Because RATED relationships point from User to Movie, only User nodes have outgoing relationships, so Node Similarity compares users with users (never users with movies)

  • The result is a weighted, directed bipartite user-movie network
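If you need to re-run the projection (for example, after correcting the configuration), a projection with the same name must not already exist. A minimal sketch, assuming the graph name used above, is to drop it first and pass false for failIfMissing so the call does not error when nothing is there:

```cypher
// Drop the projection if it exists; the second argument (failIfMissing = false)
// makes this a no-op instead of an error when the graph is absent.
CALL gds.graph.drop('user-movie-ratings', false)
YIELD graphName
RETURN graphName AS dropped
```

This only removes the in-memory projection. The User, Movie, and RATED data in the database are left untouched.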


Step 2: Run Node Similarity

cypher
Solution: Run weighted Node Similarity and analyze user pairs
CALL gds.nodeSimilarity.stream( // (1)
  'user-movie-ratings', // (2)
  {
    relationshipWeightProperty: 'rating', // (3)
    topK: 3 // (4)
  }
)
YIELD node1, node2, similarity // (5)
RETURN
  gds.util.asNode(node1).name AS user1, // (6)
  gds.util.asNode(node2).name AS user2, // (7)
  similarity // (8)
ORDER BY similarity DESC // (9)
LIMIT 20 // (10)

Algorithm breakdown

  1. Call Node Similarity in stream mode

  2. Run on 'user-movie-ratings' projection

  3. Configure algorithm to use rating weights when calculating similarity

  4. Limit to top 3 most similar users for each user

  5. Yield pairs of similar node IDs and their similarity scores

  6. Convert first node ID to user name

  7. Convert second node ID to user name

  8. Return the similarity score

  9. Sort by similarity in descending order to see strongest matches

  10. Limit results to top 20 user pairs
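Because topK is applied per node, the same pair can show up twice in the stream, once as (A, B) and once as (B, A). The following variant is one possible way to collapse those mirrored rows; it relies on the similarity score being the same in both directions, and the aliases a and b are illustrative.

```cypher
CALL gds.nodeSimilarity.stream(
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3
  }
)
YIELD node1, node2, similarity
// Put the smaller node ID first so (A, B) and (B, A) become the same row
WITH
  CASE WHEN node1 < node2 THEN node1 ELSE node2 END AS a,
  CASE WHEN node1 < node2 THEN node2 ELSE node1 END AS b,
  similarity
WITH DISTINCT a, b, similarity
RETURN
  gds.util.asNode(a).name AS user1,
  gds.util.asNode(b).name AS user2,
  similarity
ORDER BY similarity DESC
LIMIT 20
```

Normalizing the pair order, rather than filtering with node1 < node2, keeps pairs that only appear in one direction of the top-K lists.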

Optional: Find shared movies

You can extend the analysis to see which movies similar users both rated:

cypher
Find movies that similar users both rated highly
CALL gds.nodeSimilarity.stream( // (1)
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3
  }
)
YIELD node1, node2, similarity // (2)
WITH gds.util.asNode(node1).name AS user1,
     gds.util.asNode(node2).name AS user2,
     similarity
ORDER BY similarity DESC
LIMIT 5 // (3)
MATCH (u1:User {name: user1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2:User {name: user2}) // (4)
WHERE r1.rating >= 4 AND r2.rating >= 4 // (5)
RETURN user1, user2, similarity, collect(m.title)[0..5] AS shared_high_rated_movies // (6)

What the results mean:

  • High similarity scores (close to 1.0) indicate users with very similar rating patterns

  • The rating weights ensure that higher ratings (4-5 stars) contribute more to similarity

  • Similar users can be used for personalized recommendations

  • Shared highly-rated movies reveal common taste profiles

  • This is the foundation of collaborative filtering
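To illustrate how similar users could feed a recommendation, here is a hedged sketch rather than a production recommender. It assumes a query parameter $userName (hypothetical, supplied by you) and suggests movies that the user's most similar peers rated 4 or higher and that the user has not rated yet.

```cypher
CALL gds.nodeSimilarity.stream(
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3
  }
)
YIELD node1, node2, similarity
// Keep only the rows where the first node is the user of interest
WITH gds.util.asNode(node1) AS user,
     gds.util.asNode(node2) AS peer,
     similarity
WHERE user.name = $userName   // $userName is a hypothetical parameter
// Look up what the similar users rated highly in the database
MATCH (peer)-[r:RATED]->(m:Movie)
WHERE r.rating >= 4
  AND NOT (user)-[:RATED]->(m)   // skip movies the user already rated
RETURN m.title AS recommendation,
       count(DISTINCT peer) AS supporters,
       max(similarity) AS bestSimilarity
ORDER BY supporters DESC, bestSimilarity DESC
LIMIT 10
```

The rating threshold of 4 and the ordering by number of supporting peers are arbitrary choices for this sketch; a real system would tune both.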

Check your understanding

What Does Node Similarity Do?

What is the primary purpose of the Node Similarity algorithm?

  • ❏ It finds the shortest path between similar nodes

  • ✓ It computes similarity scores between nodes based on their shared neighbors

  • ❏ It detects communities by grouping similar nodes together

  • ❏ It ranks nodes by their importance in the network

Hint

Think about what "similarity" means in a graph—how do you measure if two nodes are alike?

Solution

Node Similarity computes similarity scores between nodes based on their shared neighbors.

The algorithm works by:

  1. Comparing each node to others in the graph

  2. Counting how many neighbors they have in common

  3. Calculating a similarity score (typically Jaccard similarity or overlap coefficient)

  4. Optionally filtering to keep only the most similar pairs (using topK)

In a user-movie bipartite graph, users are similar if they rated many of the same movies. When using relationshipWeightProperty, the algorithm considers rating values when calculating similarity—users who gave similar ratings to the same movies are more similar.

Stream vs. Mutate Mode

What is the key difference between using Node Similarity in stream mode versus mutate mode?

  • ✓ Stream returns results for analysis; mutate creates relationships in the in-memory projection

  • ❏ Stream is faster than mutate mode

  • ❏ Stream can use weights; mutate cannot

  • ❏ Stream finds more similar pairs than mutate

Hint

Think about what happens to the similarity results in each mode—where do they go?

Solution

Stream returns results for analysis; mutate creates relationships in the in-memory projection.

Stream mode (gds.nodeSimilarity.stream):

  • Returns similarity pairs as query results

  • Use for analysis, reporting, or further Cypher processing

  • Does not modify the projection or database

  • Perfect for exploration and one-time analysis

Mutate mode (gds.nodeSimilarity.mutate):

  • Creates new relationships in the in-memory projection (called SIMILAR here; the type is set with mutateRelationshipType)

  • Use when you want to run additional algorithms on the similarity network

  • Allows chaining algorithms (e.g., similarity → community detection)

  • Still doesn’t touch the database

Write mode computes the same similarity relationships but persists them to the database instead of adding them to the in-memory projection.

In this challenge, you used stream mode because you only needed to analyze similarity pairs, not create a similarity network for further algorithms.
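For comparison, a mutate call on the same projection might look like the sketch below. The relationship type SIMILAR and the property name score are illustrative choices, not values required by the challenge.

```cypher
// Add similarity relationships to the in-memory projection only;
// nothing is written to the database.
CALL gds.nodeSimilarity.mutate(
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3,
    mutateRelationshipType: 'SIMILAR',  // illustrative type name
    mutateProperty: 'score'             // illustrative property name
  }
)
YIELD nodesCompared, relationshipsWritten
RETURN nodesCompared, relationshipsWritten
```

Unlike stream mode, this returns summary statistics rather than the pairs themselves. The new relationships then become available to other algorithms run on the same projection.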

How Do Weights Affect Node Similarity?

How does using relationshipWeightProperty: 'rating' affect the Node Similarity calculation in a user-movie graph?

  • ❏ It makes the algorithm run faster

  • ❏ It filters out low-rated movies before calculating similarity

  • ✓ It gives more importance to movies with higher ratings when computing similarity scores

  • ❏ It only compares users who rated at least 5 movies

Hint

Think about two users who both rated the same movies—one gave 5 stars, the other gave 2 stars. Should they be considered similar?

Solution

It gives more importance to movies with higher ratings when computing similarity scores.

Without weights, Node Similarity only considers which movies users rated in common—not how they rated them.

With relationshipWeightProperty: 'rating':

  • Movies with higher ratings (4-5 stars) contribute more to similarity

  • If two users both love the same movies (high ratings), they’re more similar

  • If two users both disliked the same movies (low ratings), they’re still somewhat similar

  • If one user loved a movie (5 stars) and another hated it (1 star), they’re less similar

Example:

  • User A rates: Movie1=5, Movie2=5, Movie3=5

  • User B rates: Movie1=5, Movie2=5, Movie3=2

  • User C rates: Movie1=2, Movie2=3, Movie3=1

Users A and B are more similar (they both loved Movies 1 and 2) than A and C (who rated the same movies very differently).
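To make the example concrete, the sketch below reproduces the arithmetic in plain Cypher (no GDS call), assuming the weighted Jaccard form: the sum of the per-movie minimum ratings divided by the sum of the per-movie maximum ratings. The list names a, b, and c stand for the three users' ratings of Movie1 to Movie3.

```cypher
// Ratings for Movie1..Movie3 by users A, B, and C (values from the example)
WITH [5, 5, 5] AS a, [5, 5, 2] AS b, [2, 3, 1] AS c
WITH a, b, c, range(0, size(a) - 1) AS idx
RETURN
  // A vs B: (5 + 5 + 2) / (5 + 5 + 5)
  toFloat(reduce(s = 0, i IN idx | s + CASE WHEN a[i] < b[i] THEN a[i] ELSE b[i] END))
    / reduce(s = 0, i IN idx | s + CASE WHEN a[i] > b[i] THEN a[i] ELSE b[i] END) AS simAB,
  // A vs C: (2 + 3 + 1) / (5 + 5 + 5)
  toFloat(reduce(s = 0, i IN idx | s + CASE WHEN a[i] < c[i] THEN a[i] ELSE c[i] END))
    / reduce(s = 0, i IN idx | s + CASE WHEN a[i] > c[i] THEN a[i] ELSE c[i] END) AS simAC
```

Under that assumption, A and B score 12 / 15 = 0.8 while A and C score 6 / 15 = 0.4, which matches the ordering described above.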

This weighted approach is essential for recommendation systems—you want to recommend based on similar preferences, not just similar viewing history.

What Does the topK Parameter Control?

What does setting topK: 3 do in Node Similarity?

  • ❏ It only analyzes the top 3 most connected nodes

  • ❏ It limits the algorithm to 3 iterations

  • ✓ It returns only the 3 most similar nodes for each node in the graph

  • ❏ It requires nodes to have at least 3 neighbors to be included

Hint

The "K" in topK refers to "K nearest neighbors"—how many similar matches do you want for each node?

Solution

It returns only the 3 most similar nodes for each node in the graph.

topK controls how many similarity relationships are kept per node. With topK: 3:

  • For each user, only the 3 most similar users are returned

  • This prevents overwhelming results (with no per-node limit, you could get a similarity for every pair of users who share a movie; GDS defaults topK to 10)

  • It’s memory-efficient and practical for recommendations

Example:

If you have 100 users:

  • Without a per-node limit: up to 4,950 similarity pairs could be returned (100 × 99 ÷ 2)

  • With topK: 3: a maximum of 300 pairs (100 users × 3 similar users each)

When to adjust topK:

  • Smaller topK (1-3): For "best match" recommendations, faster processing

  • Larger topK (10-20): For more comprehensive similarity networks, broader recommendations

  • Threshold-based filtering: when you need every similarity score above a threshold, set similarityCutoff and raise topK high enough that it doesn't truncate the results

For recommendation systems, topK: 3-10 is typical—you want the most similar users, not all somewhat-similar users.
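As a sketch of combining the two settings, the query below keeps up to 10 neighbors per user but discards any pair scoring below 0.5. Both values are arbitrary examples, not recommendations from the course.

```cypher
CALL gds.nodeSimilarity.stream(
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 10,               // keep at most 10 similar users per user
    similarityCutoff: 0.5   // drop pairs with a score below 0.5
  }
)
YIELD node1, node2, similarity
RETURN
  gds.util.asNode(node1).name AS user1,
  gds.util.asNode(node2).name AS user2,
  similarity
ORDER BY similarity DESC
```

With a cutoff in place, some users may return fewer than topK matches, or none at all, if few of their pairs clear the threshold.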

Interpreting Similarity Scores

What does a similarity score of 0.85 between two users indicate?

  • ❏ They rated exactly 85% of the same movies

  • ✓ They have highly overlapping rating patterns with similar preferences

  • ❏ One user copied 85% of the other user’s ratings

  • ❏ They are in the same community with 85% certainty

Hint

Similarity scores range from 0 (completely different) to 1.0 (identical). What does 0.85 tell you about their rating overlap?

Solution

They have highly overlapping rating patterns with similar preferences.

Similarity scores in Node Similarity represent how "alike" two nodes are based on their connections. A score of 0.85 is high and indicates:

What it means:

  • The users rated many of the same movies

  • When weights are used, they gave similar ratings (both liked or both disliked)

  • Their taste profiles are very compatible

  • Recommendations based on one user’s ratings would likely work well for the other

Score interpretation:

  • 0.9-1.0: Nearly identical preferences (rare)

  • 0.7-0.9: Very similar, strong match for recommendations

  • 0.5-0.7: Moderately similar, some overlap

  • 0.3-0.5: Weakly similar, little overlap

  • < 0.3: Very different preferences

In practice:

For recommendation systems, you typically:

  • Use scores > 0.7 for confident recommendations

  • Use scores 0.5-0.7 for "you might also like" suggestions

  • Ignore scores < 0.5 as too dissimilar

The exact score depends on the similarity metric used (Jaccard, overlap, etc.), but higher always means more similar.
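If your GDS version supports the similarityMetric parameter (check the Node Similarity documentation for your release), switching metrics could look like the sketch below. It uses the overlap coefficient instead of the default Jaccard; everything else mirrors the solution query.

```cypher
CALL gds.nodeSimilarity.stream(
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3,
    similarityMetric: 'OVERLAP'  // assumption: supported by your GDS version
  }
)
YIELD node1, node2, similarity
RETURN
  gds.util.asNode(node1).name AS user1,
  gds.util.asNode(node2).name AS user2,
  similarity
ORDER BY similarity DESC
LIMIT 20
```

Scores from different metrics are not directly comparable, so thresholds such as 0.7 should be revisited whenever the metric changes.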

Summary

You’ve successfully projected a weighted bipartite graph connecting users and movies through ratings and applied Node Similarity in stream mode to find users with similar taste profiles.

This challenge demonstrates a core GDS workflow: understanding your data model, designing the right projection, configuring relationship weights, and applying algorithms in stream mode to analyze patterns without modifying the database.

Node Similarity on weighted bipartite graphs is the foundation of collaborative filtering—finding users with similar preferences to make personalized recommendations. By using rating weights, you ensured that highly-rated movies contribute more to similarity calculations. The directed relationships ensured the algorithm only compared users based on their movie ratings.

You’ve now completed Module 3. In the next module, you’ll learn advanced projection techniques including relationship aggregation.
