This lesson shows how to handle: NULL values, data type mismatches, self-referencing relationships, large datasets, duplicate data, and many-to-many with extra properties.
Running Cypher
The techniques in this lesson apply to Neo4j Aura. For Cypher-based solutions, use the Query tool in your AuraDB instance.
Handling NULL Values
Relational databases use NULL to represent missing or unknown values. When importing into Neo4j:
-
Properties with NULL values are not created - Neo4j does not store NULL properties
-
Use COALESCE in Cypher to provide default values if needed
-
Decide what NULL means — "Unknown" vs "not applicable" affects whether you omit, use a default, or use conditional logic
Example: Handling NULL Regions
In the Northwind database, the region column is often NULL for non-US customers:
When importing, choose one approach:
-
Skip the property - Let Neo4j omit it naturally (recommended if NULL means "not applicable")
-
Use a default value - Use COALESCE in Cypher to set region to "N/A" when NULL
-
Use conditional logic - Use CASE WHEN row.region IS NOT NULL THEN row.region END to omit the property when NULL
Handling Data Type Mismatches
Sometimes source data does not match expected types:
Mixed Data Types in Columns
If a column contains mixed types (e.g., numbers and text):
Fix before or during import:
-
Import as String, then convert with toInteger() or toFloat() in follow-up queries
-
Clean in SQL (e.g. remove non-numeric rows) before export
-
Use CASE WHEN to route values to different properties
Dates may be stored in different formats:
Convert to ISO format (YYYY-MM-DD) in SQL before export, or use datetime(row.order_date) in Cypher during import.
Handling Self-Referencing Relationships
The employees table has a self-referencing reports_to column for the management hierarchy.
Challenge: Order of Operations
When creating REPORTS_TO relationships:
-
All Employee nodes must exist first
-
Then create the relationships
This requires a two-pass import:
Pass 1: Create Employee nodes
Pass 2: Create REPORTS_TO relationships
Handling Large Datasets
For large imports, use batch processing:
Batch Processing
Instead of importing all data at once, process in batches:
Periodic Commit (Legacy)
For older Neo4j versions:
Handling Duplicate Data
Relational data may contain duplicates that need handling:
Strategies for Duplicates
-
Merge duplicates - Combine into a single node
-
Keep all records - Use a unique identifier to distinguish
-
Clean before import - Remove duplicates in the source
Handling Many-to-Many with Additional Data
When junction tables have extra columns beyond foreign keys, choose:
-
Use relationship properties - For simple additional data
-
Create intermediate nodes - For complex data or when you need to query the junction independently
Example: Order Details with Complex Data
If order_details had shipment tracking, return status, and other complex data:
Best Practices for Complex Data Import
Best practices from real-world import projects:
Bookmark the best practices checklist
Bookmark this section. These nine best practices serve as a reference checklist for any data import project you undertake in the future.
Best Practice 1: Create Constraints Before Import
Always create unique constraints before importing any data.
Why this matters:
-
Prevents duplicate nodes during import
-
Enables efficient MERGE operations
-
Creates indexes automatically for faster lookups
-
Catches data quality issues early
Verify constraints before importing
Run SHOW CONSTRAINTS after creating them to verify they are in place before starting the import.
Best Practice 2: Import in the Correct Order
Follow this sequence for reliable imports:
| Step |
Action |
Reason |
1 |
Create constraints |
Enable MERGE and prevent duplicates |
2 |
Import independent nodes first |
Categories, Suppliers, Shippers (no foreign keys to other entities) |
3 |
Import dependent nodes |
Products (needs Categories, Suppliers), Customers, Employees |
4 |
Import transaction nodes |
Orders (needs Customers, Employees, Shippers) |
5 |
Create relationships |
All referenced nodes must exist |
Northwind Import Order:
Best Practice 3: Use MERGE for Idempotent Imports
Make your imports re-runnable without creating duplicates:
When to use MERGE and CREATE:
| Command |
Use When |
Example |
CREATE |
First-time import with guaranteed unique data |
Initial bulk load with clean data |
MERGE |
Re-runnable imports, incremental updates |
Regular sync from source system |
MERGE + ON CREATE/ON MATCH |
Different behavior for new and existing |
Update timestamps only on changes |
Best Practice 4: Handle NULL Values Explicitly
Do not let NULL values cause silent failures:
NULL Handling Strategies:
| Strategy |
When to Use |
Implementation |
Omit property |
Value is truly optional |
Use CASE to return null |
Default value |
Need consistent property for queries |
Use COALESCE(row.field, 'Unknown') |
Separate label |
NULL indicates a category |
Add label like :NoRegion |
Best Practice 5: Validate Foreign Keys Before Creating Relationships
Avoid creating relationships to non-existent nodes:
Pre-import validation query:
Keep related data consistent:
Best Practice 7: Log and Monitor Import Progress
Track what is happening during import:
Create an import log:
Best Practice 8: Clean Up After Failed Imports
If an import fails partway through:
Backup before running DELETE operations
Always backup your database before running DELETE operations, especially DETACH DELETE which removes nodes and all their relationships.
Best Practice 9: Document Your Import Process
Create a runbook for your import:
Organizing Your Queries
In Neo4j Aura
In the Aura Query tool, create folders to organize the import queries from this lesson. The folder names reflect the execution order:
-
Folder: 01-Setup-Constraints - Save constraint creation queries (from Best Practice 1)
-
Folder: 02-Import-Independent-Nodes - Save queries for Categories, Suppliers, Shippers
-
Folder: 03-Import-Dependent-Nodes - Save queries for Products, Customers, Employees
-
Folder: 04-Import-Transaction-Nodes - Save queries for Orders
-
Folder: 05-Create-Relationships - Save relationship creation queries
-
Folder: 06-Validation - Save count verification and integrity check queries
-
Folder: Maintenance-Cleanup - Save cleanup queries for failed imports (from Best Practice 8)
-
Folder: Maintenance-Monitoring - Save progress monitoring queries (from Best Practice 7)
To create folders in Aura:
-
Open the Query tool
-
Click the + button next to "Saved Cypher"
-
Select New Folder
-
Name the folder and save queries inside it
In AuraDB, use the Query tool’s saved queries with descriptive names and prefixes (e.g., "01-Constraints", "02-Categories") to execute queries in the correct dependency order.
Best Practices Summary Checklist
Use this checklist for every import:
Check Your Understanding
NULL in Neo4j imports
What happens when you import a property with a NULL value into Neo4j?
-
❏ Neo4j stores the property with a NULL value
-
❏ Neo4j throws an error and stops the import
-
✓ Neo4j does not create the property at all
-
❏ Neo4j converts NULL to an empty string
Hint
In Neo4j, a property that has no value is omitted from the node; there is no stored NULL. Relational databases store NULL explicitly.
Solution
When importing data with NULL values, Neo4j does not create the property at all.
Unlike relational databases that explicitly store NULL values, Neo4j simply omits properties that have no value. This means nodes of the same label can have different sets of properties.
If you need to distinguish between "unknown" and "not applicable," use a specific value such as "N/A" or "Unknown" instead of relying on NULL.
Import Best Practices
What is the correct order for importing relational data into Neo4j?
-
❏ Import all nodes first, then create all constraints, then create relationships
-
❏ Import relationships first, then nodes, then constraints
-
✓ Create constraints first, then import independent nodes, then dependent nodes, then relationships
-
❏ The order does not matter as long as all data is imported
Hint
Relationships connect two nodes, so those nodes must exist first. Constraints prevent duplicates and speed MERGE; the safe order is constraints, then independent nodes, then dependent nodes, then relationships.
Solution
The correct order is:
-
Create constraints first - Enables MERGE operations and prevents duplicates
-
Import independent nodes — Nodes with no foreign keys such as Categories, Suppliers, Shippers
-
Import dependent nodes — Nodes that reference other nodes such as Products, Customers, Employees, Orders
-
Create relationships - All referenced nodes must exist first
This order ensures:
* No duplicate nodes are created
* All foreign key references can be resolved
* Relationships connect to existing nodes
* Import is idempotent and can be re-run safely
Summary
In this lesson, you learned strategies for handling:
-
NULL values in source data
-
Data type mismatches
-
Self-referencing relationships
-
Large dataset imports
-
Duplicate data
-
Complex many-to-many relationships
These techniques address the complexities of production data imports that simplified examples often omit.