Prepare data for import

You have learned how to create and connect to your Aura database instance.

This lesson covers how to prepare your data and how the Data Importer tool works before you create your graph model. You will use the Movies dataset as an example to learn data preparation best practices.

In this lesson, you will learn how to:

  • Understand the Data Importer service and how it works

  • Prepare your CSV data for import

  • Add your data source to Aura

  • Review your data structure before modeling

Data Importer service

The Data Importer provides a visual interface for loading CSV data into your Neo4j instance.

Instead of writing complex import scripts, you visually map your CSV columns to graph nodes and relationships, which makes it easy to transform tabular data into a graph structure.

How it works in the background:

The Data Importer:

  1. Reads your CSV file and analyzes its structure

  2. Lets you map CSV columns to node properties (e.g., movieId → Movie node ID, title → Movie property)

  3. Creates relationships between nodes based on your mappings (e.g., Person ACTED_IN Movie)

  4. Generates Cypher statements that insert your data efficiently

  5. Executes the import and verifies all nodes and relationships were created
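The mapping and statement-generation steps can be pictured with a small Python sketch. It is not the Data Importer's actual output: the function names `build_import_statement` and `row_to_parameters` and the exact Cypher text are assumptions made for illustration, using the columns of the Movies dataset.

```python
# Illustrative only: the Cypher that the Data Importer really generates may differ.

def build_import_statement():
    """Return one parameterized Cypher statement that loads a single CSV row."""
    return (
        "MERGE (m:Movie {movieId: $movieId}) "
        "SET m.title = $title "
        "MERGE (p:Person {personId: $personId}) "
        "SET p.name = $name "
        "MERGE (p)-[r:ACTED_IN]->(m) "
        "SET r.characters = $characters"
    )


def row_to_parameters(row):
    """Pick the CSV columns that the statement expects as parameters."""
    columns = ["movieId", "title", "personId", "name", "characters"]
    return {column: row[column] for column in columns}


# One row from the example dataset, as a CSV reader would deliver it.
row = {
    "movieId": "123",
    "title": "The Matrix",
    "personId": "456",
    "name": "Keanu Reeves",
    "characters": "Neo",
}

statement = build_import_statement()
parameters = row_to_parameters(row)
print(statement)
print(parameters)
```

MERGE is used in this sketch rather than CREATE so that a movie appearing in several rows is matched on its ID instead of being created again.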

The diagram shows the complete import process from source files to your Neo4j database.

Import process diagram showing the steps from CSV to Neo4j

Step 1: Prepare your movie dataset

The Movies dataset contains information about movies, actors, and their relationships in a tabular format. You will learn how to transform this CSV data into a graph structure. Proper data preparation ensures a smooth import and an effective graph model.

How to prepare your data before importing:

  1. Download the sample movie data: movies.csv

  2. Save it to your local machine (e.g., Downloads folder)

  3. Open the file to preview its structure—you’ll see columns like movieId, title, personId, name, and characters

Data preparation checklist:

Before importing, verify your CSV data:

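  • Unique identifiers exist: Ensure each movie is identified by exactly one movieId and each person by exactly one personId. In this file an ID can repeat across rows, because each row is one role; that is expected. The problem is two different movies sharing an ID, or one movie appearing under two IDs, which can merge distinct entities, create duplicate nodes, or cause import errors.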

  • Data types are consistent: Check that movieId and personId are consistently formatted (all numbers or all strings). Mixed types can cause mapping issues.

  • Missing values are handled: Identify any empty cells. Decide whether to skip rows with missing data or use default values. For example, rows with missing actor names might break relationship creation.

  • Special characters are properly encoded: Ensure fields that contain quotes, commas, or newlines are properly quoted and escaped, or use a different delimiter.

  • Column headers are clear: Verify column names are descriptive and don’t contain spaces or special characters (use movieId not Movie ID).

  • Relationships are identifiable: Confirm which columns connect entities (e.g., personId and movieId together indicate an ACTED_IN relationship).
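Several items on this checklist can be checked with a short script before you upload the file. The following is a minimal sketch using Python's standard `csv` module. The function name `check_csv`, the rule that IDs must be digits only, and the deliberately broken sample rows are assumptions for illustration and are not part of the course dataset.

```python
import csv
import io

EXPECTED_COLUMNS = ["movieId", "title", "personId", "name", "characters"]


def check_csv(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []

    # Column headers: every expected column is present and has no spaces.
    for column in EXPECTED_COLUMNS:
        if column not in headers:
            problems.append(f"missing column: {column}")
    for header in headers:
        if " " in header:
            problems.append(f"header contains a space: {header!r}")

    labels_by_id = {"movieId": {}, "personId": {}}
    label_column = {"movieId": "title", "personId": "name"}

    # Line 1 is the header; this assumes one record per line.
    for line_number, row in enumerate(reader, start=2):
        # Missing values: report every empty cell.
        for column in headers:
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_number}: empty {column}")

        for id_column, seen in labels_by_id.items():
            entity_id = (row.get(id_column) or "").strip()
            label = (row.get(label_column[id_column]) or "").strip()
            # Unique identifiers: one ID must always describe the same entity.
            if entity_id and label and seen.setdefault(entity_id, label) != label:
                problems.append(
                    f"line {line_number}: {id_column} {entity_id} has conflicting values"
                )
            # Consistent types: this sketch assumes IDs are digits only.
            if entity_id and not entity_id.isdigit():
                problems.append(
                    f"line {line_number}: {id_column} {entity_id!r} is not numeric"
                )
    return problems


# Hypothetical sample with three deliberate problems.
broken_sample = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,,"Morpheus"
123,Matrix,x789,Laurence Fishburne,"Morpheus"
"""

problems = check_csv(broken_sample)
for problem in problems:
    print(problem)
```

The script only reports problems. Whether to skip a row, fill in a default value, or fix the source file remains your decision.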

What’s in the dataset: This CSV contains information about movies and the actors who appeared in them. Each row represents an actor’s role in a movie, which we’ll model as a relationship in the graph. This structure enables recommendation queries like "Find all movies with Tom Hanks" or "Find actors who worked together."

Example data structure:

movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"

This structure shows that both Keanu Reeves and Laurence Fishburne acted in The Matrix, creating two ACTED_IN relationships in your graph.
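To see why two rows produce one movie, two people, and two relationships, here is a minimal Python sketch that groups the example rows by their identifiers. The dictionaries stand in for graph nodes and the list stands in for relationships; the variable names are illustrative, and this is not how the Data Importer stores data internally.

```python
import csv
import io

csv_text = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""

movies = {}    # movieId -> Movie node properties
people = {}    # personId -> Person node properties
acted_in = []  # (personId, movieId, relationship properties)

for row in csv.DictReader(io.StringIO(csv_text)):
    # Keying by ID means a repeated movieId updates the same entry
    # instead of creating a second Movie.
    movies[row["movieId"]] = {"title": row["title"]}
    people[row["personId"]] = {"name": row["name"]}
    # Every row is one role, so every row adds one relationship.
    acted_in.append(
        (row["personId"], row["movieId"], {"characters": row["characters"]})
    )

print(len(movies), "Movie node")                # 1 Movie node
print(len(people), "Person nodes")              # 2 Person nodes
print(len(acted_in), "ACTED_IN relationships")  # 2 ACTED_IN relationships
```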

Step 2: Add your data source to Aura

How to do it:

  1. In the Aura Console, navigate to your instance

  2. Click on Data Importer in the left sidebar

  3. Click the New data source button

  4. Select CSV as the data source type (since your movie data is in CSV format)

  5. Click Upload CSV to open the file dialog

  6. Select the movies.csv file from your local machine

CSV files are easy to work with and commonly used for data imports. The Data Importer reads the CSV structure and helps you map it to graph nodes and relationships.

Step 3: Review your data structure

Once the file is uploaded, you’ll see the Data Importer interface showing your CSV structure.

The Data Importer displays your CSV columns (movieId, title, personId, name, characters) and sample data rows. This preview helps you understand what data you’re working with before creating your graph model.

Understanding your data structure before modeling helps you make better decisions about:

  • Which columns become node properties

  • Which columns connect entities (relationships)

  • What data types to use

  • How to handle missing or duplicate values
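You can run a similar review locally before uploading. The following minimal sketch summarizes each column's apparent type and its number of empty cells. The function name `summarize_columns` and the simple rule that digit-only values count as integers are assumptions for illustration; the Data Importer's own preview may detect types differently.

```python
import csv
import io


def summarize_columns(csv_text):
    """Return {column: {"type": ..., "empty": ...}} for the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    summary = {
        name: {"type": "unknown", "empty": 0} for name in (reader.fieldnames or [])
    }
    for row in reader:
        for name, info in summary.items():
            value = (row.get(name) or "").strip()
            if not value:
                info["empty"] += 1
            elif value.isdigit() and info["type"] != "string":
                # Only unsigned whole numbers count as integers in this sketch.
                info["type"] = "integer"
            else:
                # A single non-numeric value makes the whole column a string.
                info["type"] = "string"
    return summary


csv_text = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""

summary = summarize_columns(csv_text)
for name, info in summary.items():
    print(name, info["type"], "empty cells:", info["empty"])
```

Columns reported as integers with no empty cells, such as movieId and personId here, are natural candidates for node identifiers.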

Summary

In this lesson, you prepared your movie dataset for import into your Aura instance. You:

  • Learned about the Data Importer: Understood how it transforms CSV data into graph structures

  • Prepared your CSV data: Verified unique identifiers, data types, and proper formatting

  • Added your data source: Uploaded your movie CSV to the Aura Console

  • Reviewed your data structure: Examined the CSV columns and sample data before modeling

Proper data preparation ensures you have clean, well-structured data for import. The CSV structure (movieId, title, personId, name, characters) will become nodes and relationships in your graph.

Check your understanding

Data Preparation

What should you verify in your CSV data before importing?

  • ❏ All columns must have the same data type

  • ✓ Unique identifiers exist for each entity (like movieId, personId)

  • ❏ The CSV must have exactly 5 columns

  • ❏ All rows must have values in every column

Hint

Unique identifiers ensure that each movie or person becomes a single node in the graph, preventing duplicates.

Solution

Unique identifiers exist for each entity (like movieId, personId).

Unique identifiers (like movieId for movies, personId for actors) ensure that each entity becomes a single node in the graph. Without unique identifiers, you might create duplicate nodes or cause import errors.

What’s next: In the next lesson, you’ll create your graph data model by mapping your CSV columns to Movie and Person nodes and ACTED_IN relationships.

For more information on data preparation, see the Neo4j Aura Import documentation.
