Introduction
You’ve learned how to project graphs with undirected relationships and weights. You’ve used Node Similarity in the previous lesson to find similar movies and write relationships back to the database.
Now it’s time to take Node Similarity further by using it in stream mode on a directed, weighted, bipartite graph to analyze user similarity without writing back to the database.
In this challenge, you’ll:
- Project a user-movie bipartite graph with weighted relationships
- Use Node Similarity in stream mode to find users with similar taste profiles
- Analyze the results to understand rating patterns
Your task
Step 1: Project the graph
First, open the sandbox panel to get started.
Project a graph called 'user-movie-ratings' that connects users to movies through their ratings, using rating scores as relationship weights.
Your projection should:
- Include both User and Movie nodes
- Preserve node labels (User and Movie)
- Use the rating property as the relationship weight
Think about the pattern: User → RATED → Movie
Write the complete projection query, including:
- The correct MATCH pattern
- Configuration to preserve both User and Movie labels
- Relationship properties configuration for the rating weight
Verify your projection was created successfully using gds.graph.list().
Step 2: Review node similarity
Visit the Node Similarity documentation.
Review the configuration parameters, especially:
- How to use weights with relationshipWeightProperty
- How topK controls the number of similar nodes returned
- The difference between stream, mutate, and write modes
Step 3: Run node similarity
Run Node Similarity on your projected graph in stream mode using the rating weights to find users with similar taste profiles.
Return the top 20 most similar user pairs with their similarity scores and the movies they’ve both rated.
Hints:
- Use relationshipWeightProperty: 'rating' to tell Node Similarity to consider rating weights
- Set topK: 3 to limit to the top 3 most similar users for each user
- Convert node IDs to user names using gds.util.asNode(nodeId).name
- Order by similarity score to see the strongest matches
- To find shared movies, you can query the database using the user names
When you’ve completed your analysis, answer the questions below.
Solution approach
Step 1: Project the user-movie graph
MATCH (source:User)-[r:RATED]->(target:Movie) // (1)
WITH gds.graph.project( // (2)
  'user-movie-ratings', // (3)
  source, // (4)
  target, // (5)
  {
    sourceNodeLabels: labels(source), // (6)
    targetNodeLabels: labels(target), // (7)
    relationshipProperties: r { .rating } // (8)
  }
) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels // (9)
Projection breakdown
(1) Match User nodes connected to Movie nodes through RATED relationships
(2) Call the GDS projection function
(3) Name the projection 'user-movie-ratings'
(4) Include source (User) nodes
(5) Include target (Movie) nodes
(6) Preserve User labels
(7) Preserve Movie labels
(8) Include the rating property from relationships as weight
(9) Return projection statistics
Key components:
- The MATCH pattern connects users directly to movies they rated
- sourceNodeLabels and targetNodeLabels preserve the bipartite structure
- relationshipProperties: r { .rating } captures the rating weight (1-5 stars)
- Because only User nodes have outgoing relationships in this directed projection, Node Similarity compares users with other users (not users with movies)
- The result is a weighted, directed bipartite user-movie network
Step 2: Run Node Similarity
CALL gds.nodeSimilarity.stream( // (1)
  'user-movie-ratings', // (2)
  {
    relationshipWeightProperty: 'rating', // (3)
    topK: 3 // (4)
  }
)
YIELD node1, node2, similarity // (5)
RETURN
  gds.util.asNode(node1).name AS user1, // (6)
  gds.util.asNode(node2).name AS user2, // (7)
  similarity // (8)
ORDER BY similarity DESC // (9)
LIMIT 20 // (10)
Algorithm breakdown
(1) Call Node Similarity in stream mode
(2) Run on 'user-movie-ratings' projection
(3) Configure algorithm to use rating weights when calculating similarity
(4) Limit to top 3 most similar users for each user
(5) Yield pairs of similar node IDs and their similarity scores
(6) Convert first node ID to user name
(7) Convert second node ID to user name
(8) Return the similarity score
(9) Sort by similarity in descending order to see strongest matches
(10) Limit results to top 20 user pairs
Optional: Find shared movies
You can extend the analysis to see which movies similar users both rated:
CALL gds.nodeSimilarity.stream( // (1)
  'user-movie-ratings',
  {
    relationshipWeightProperty: 'rating',
    topK: 3
  }
)
YIELD node1, node2, similarity // (2)
WITH gds.util.asNode(node1).name AS user1,
     gds.util.asNode(node2).name AS user2,
     similarity
ORDER BY similarity DESC
LIMIT 5 // (3)
MATCH (u1:User {name: user1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2:User {name: user2}) // (4)
WHERE r1.rating >= 4 AND r2.rating >= 4 // (5)
RETURN user1, user2, similarity, collect(m.title)[0..5] AS shared_high_rated_movies // (6)
What the results mean:
- High similarity scores (close to 1.0) indicate users with very similar rating patterns
- The rating weights ensure that higher ratings (4-5 stars) contribute more to similarity
- Similar users can be used for personalized recommendations
- Shared highly-rated movies reveal common taste profiles
- This is the foundation of collaborative filtering
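As a rough illustration of how a similar-user pair can feed a recommendation, here is a minimal Python sketch. The user names, ratings, and the `recommend_from_neighbor` helper are all hypothetical; this is not part of GDS, only one simple way to act on a pair that Node Similarity has returned.

```python
def recommend_from_neighbor(target_ratings, neighbor_ratings, min_rating=4):
    """Return movies the neighbor rated highly that the target has not rated."""
    return sorted(
        movie
        for movie, rating in neighbor_ratings.items()
        if rating >= min_rating and movie not in target_ratings
    )


# Hypothetical ratings for two users that Node Similarity paired together.
alice = {"Heat": 5, "Casino": 4, "Toy Story": 3}
bob = {"Heat": 5, "Casino": 5, "Goodfellas": 5, "Sabrina": 2}

suggestions = recommend_from_neighbor(alice, bob)
print(suggestions)  # ['Goodfellas']
```

The threshold of 4 mirrors the `rating >= 4` filter used in the shared-movies query; a real recommender would usually combine several neighbors and weight them by similarity.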
Check your understanding
What Does Node Similarity Do?
What is the primary purpose of the Node Similarity algorithm?
- ❏ It finds the shortest path between similar nodes
- ✓ It computes similarity scores between nodes based on their shared neighbors
- ❏ It detects communities by grouping similar nodes together
- ❏ It ranks nodes by their importance in the network
Hint
Think about what "similarity" means in a graph—how do you measure if two nodes are alike?
Solution
Node Similarity computes similarity scores between nodes based on their shared neighbors.
The algorithm works by:
- Comparing each node to others in the graph
- Counting how many neighbors they have in common
- Calculating a similarity score (typically Jaccard similarity or overlap coefficient)
- Optionally filtering to keep only the most similar pairs (using topK)
In a user-movie bipartite graph, users are similar if they rated many of the same movies. When using relationshipWeightProperty, the algorithm considers rating values when calculating similarity—users who gave similar ratings to the same movies are more similar.
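To make the shared-neighbor idea concrete, here is a minimal Python sketch of unweighted Jaccard similarity on two users' sets of rated movies. The users, movie titles, and the `jaccard` function are made up for illustration, and this is not the GDS implementation.

```python
def jaccard(a: set, b: set) -> float:
    """Shared neighbors divided by all distinct neighbors of either node."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sets of movies each user has rated.
alice = {"Toy Story", "Heat", "Casino", "Jumanji"}
bob = {"Toy Story", "Heat", "Casino", "Sabrina"}

score = jaccard(alice, bob)
print(score)  # 3 shared movies / 5 distinct movies = 0.6
```

Here the two users share three of five distinct movies, so the unweighted score ignores how each movie was rated and depends only on the overlap.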
Stream vs. Mutate Mode
What is the key difference between using Node Similarity in stream mode versus mutate mode?
- ✓ Stream returns results for analysis; mutate creates relationships in the in-memory projection
- ❏ Stream is faster than mutate mode
- ❏ Stream can use weights; mutate cannot
- ❏ Stream finds more similar pairs than mutate
Hint
Think about what happens to the similarity results in each mode—where do they go?
Solution
Stream returns results for analysis; mutate creates relationships in the in-memory projection.
Stream mode (gds.nodeSimilarity.stream):
- Returns similarity pairs as query results
- Use for analysis, reporting, or further Cypher processing
- Does not modify the projection or database
- Perfect for exploration and one-time analysis
Mutate mode (gds.nodeSimilarity.mutate):
- Creates new SIMILAR relationships in the in-memory projection
- Use when you want to run additional algorithms on the similarity network
- Allows chaining algorithms (e.g., similarity → community detection)
- Still doesn’t touch the database
Write mode does the same as mutate but persists relationships to the database.
In this challenge, you used stream mode because you only needed to analyze similarity pairs, not create a similarity network for further algorithms.
How Do Weights Affect Node Similarity?
How does using relationshipWeightProperty: 'rating' affect the Node Similarity calculation in a user-movie graph?
- ❏ It makes the algorithm run faster
- ❏ It filters out low-rated movies before calculating similarity
- ✓ It gives more importance to movies with higher ratings when computing similarity scores
- ❏ It only compares users who rated at least 5 movies
Hint
Think about two users who both rated the same movies—one gave 5 stars, the other gave 2 stars. Should they be considered similar?
Solution
It gives more importance to movies with higher ratings when computing similarity scores.
Without weights, Node Similarity only considers which movies users rated in common—not how they rated them.
With relationshipWeightProperty: 'rating':
- Movies with higher ratings (4-5 stars) contribute more to similarity
- If two users both love the same movies (high ratings), they’re more similar
- If two users both disliked the same movies (low ratings), they’re still somewhat similar
- If one user loved a movie (5 stars) and another hated it (1 star), they’re less similar
Example:
- User A rates: Movie1=5, Movie2=5, Movie3=5
- User B rates: Movie1=5, Movie2=5, Movie3=2
- User C rates: Movie1=2, Movie2=3, Movie3=1
Users A and B are more similar (they both loved Movies 1 and 2) than A and C (who rated the same movies very differently).
This weighted approach is essential for recommendation systems—you want to recommend based on similar preferences, not just similar viewing history.
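The A, B, C example can be checked numerically. The sketch below assumes the weighted Jaccard form (sum of per-movie minimum weights divided by sum of per-movie maximum weights); the `weighted_jaccard` function is an illustrative stand-in, not the GDS code, and treats a movie a user has not rated as weight 0.

```python
def weighted_jaccard(a: dict, b: dict) -> float:
    """Sum of per-movie minimum ratings over sum of per-movie maximum ratings."""
    movies = set(a) | set(b)
    numerator = sum(min(a.get(m, 0), b.get(m, 0)) for m in movies)
    denominator = sum(max(a.get(m, 0), b.get(m, 0)) for m in movies)
    return numerator / denominator if denominator else 0.0


user_a = {"Movie1": 5, "Movie2": 5, "Movie3": 5}
user_b = {"Movie1": 5, "Movie2": 5, "Movie3": 2}
user_c = {"Movie1": 2, "Movie2": 3, "Movie3": 1}

sim_ab = weighted_jaccard(user_a, user_b)  # (5 + 5 + 2) / (5 + 5 + 5)
sim_ac = weighted_jaccard(user_a, user_c)  # (2 + 3 + 1) / (5 + 5 + 5)
print(sim_ab, sim_ac)  # 0.8 0.4
```

Under this formula A and B score 0.8 while A and C score 0.4, even though all three users rated the same three movies. The score tracks how closely the rating magnitudes match, so two users with identical ratings score 1.0 whether those ratings are high or low.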
What Does the topK Parameter Control?
What does setting topK: 3 do in Node Similarity?
- ❏ It only analyzes the top 3 most connected nodes
- ❏ It limits the algorithm to 3 iterations
- ✓ It returns only the 3 most similar nodes for each node in the graph
- ❏ It requires nodes to have at least 3 neighbors to be included
Hint
The "K" in topK refers to "K nearest neighbors"—how many similar matches do you want for each node?
Solution
It returns only the 3 most similar nodes for each node in the graph.
topK controls how many similarity relationships are kept per node. With topK: 3:
- For each user, only the 3 most similar users are returned
- This prevents overwhelming results (GDS defaults to topK: 10 when the parameter is omitted; a much larger value would let nearly every pair with a non-zero score through)
- It’s memory-efficient and practical for recommendations
Example:
If you have 100 users:
- With no per-node limit: up to 4,950 distinct user pairs could qualify (100 × 99 ÷ 2)
- With topK: 3: at most 300 result rows (100 users × 3 similar users each)
When to adjust topK:
- Smaller topK (1-3): For "best match" recommendations, faster processing
- Larger topK (10-20): For more comprehensive similarity networks, broader recommendations
- Very large topK: When you need all similarity scores above a threshold (set similarityCutoff to define that threshold)
For recommendation systems, topK: 3-10 is typical—you want the most similar users, not all somewhat-similar users.
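The arithmetic in the 100-user example can be checked with a short Python sketch. The scores below are arbitrary placeholders and `top_k_neighbors` is a hypothetical helper that only mimics the per-node cut; it says nothing about how GDS computes or stores results internally.

```python
from itertools import combinations


def top_k_neighbors(scores: dict, k: int) -> dict:
    """Keep the k highest-scoring neighbors for each node."""
    per_node = {}
    for (u, v), s in scores.items():
        per_node.setdefault(u, []).append((v, s))
        per_node.setdefault(v, []).append((u, s))
    return {
        node: sorted(neighbors, key=lambda pair: pair[1], reverse=True)[:k]
        for node, neighbors in per_node.items()
    }


users = [f"user{i}" for i in range(100)]
pairs = list(combinations(users, 2))  # every unordered user pair

# Placeholder scores, assumed non-zero for every pair (worst case).
scores = {pair: 1.0 / (1 + idx) for idx, pair in enumerate(pairs)}

kept = top_k_neighbors(scores, k=3)
total_rows = sum(len(neighbors) for neighbors in kept.values())
print(len(pairs), total_rows)  # 4950 300
```

Note that the 300 figure counts one row per (user, similar user) combination, so the same two users can appear twice, once from each side.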
Interpreting Similarity Scores
What does a similarity score of 0.85 between two users indicate?
- ❏ They rated exactly 85% of the same movies
- ✓ They have highly overlapping rating patterns with similar preferences
- ❏ One user copied 85% of the other user’s ratings
- ❏ They are in the same community with 85% certainty
Hint
Similarity scores range from 0 (completely different) to 1.0 (identical). What does 0.85 tell you about their rating overlap?
Solution
They have highly overlapping rating patterns with similar preferences.
Similarity scores in Node Similarity represent how "alike" two nodes are based on their connections. A score of 0.85 is high and indicates:
What it means:
- The users rated many of the same movies
- When weights are used, they gave similar ratings (both liked or both disliked)
- Their taste profiles are very compatible
- Recommendations based on one user’s ratings would likely work well for the other
Score interpretation:
- 0.9-1.0: Nearly identical preferences (rare)
- 0.7-0.9: Very similar, strong match for recommendations
- 0.5-0.7: Moderately similar, some overlap
- 0.3-0.5: Weakly similar, little overlap
- < 0.3: Very different preferences
In practice:
For recommendation systems, you typically:
- Use scores > 0.7 for confident recommendations
- Use scores 0.5-0.7 for "you might also like" suggestions
- Ignore scores < 0.5 as too dissimilar
The exact score depends on the similarity metric used (Jaccard, overlap, etc.), but higher always means more similar.
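If you want to apply the score bands in code, a small Python sketch might look like the following. The `describe_similarity` name and the choice to treat each lower bound as inclusive are assumptions made for illustration; the bands themselves are the rule-of-thumb ranges listed above, not thresholds defined by GDS.

```python
def describe_similarity(score: float) -> str:
    """Map a similarity score in [0, 1] to a rough descriptive band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if score >= 0.9:
        return "nearly identical"
    if score >= 0.7:
        return "very similar"
    if score >= 0.5:
        return "moderately similar"
    if score >= 0.3:
        return "weakly similar"
    return "very different"


print(describe_similarity(0.85))  # very similar
```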
Summary
You’ve successfully projected a weighted bipartite graph connecting users and movies through ratings and applied Node Similarity in stream mode to find users with similar taste profiles.
This challenge demonstrates a core GDS workflow: understanding your data model, designing the right projection, configuring relationship weights, and applying algorithms in stream mode to analyze patterns without modifying the database.
Node Similarity on weighted bipartite graphs is the foundation of collaborative filtering—finding users with similar preferences to make personalized recommendations. By using rating weights, you ensured that highly-rated movies contribute more to similarity calculations. The directed relationships ensured the algorithm only compared users based on their movie ratings.
You’ve now completed Module 3. In the next module, you’ll learn advanced projection techniques including relationship aggregation.