Understanding the source data

Before importing, assess your source data:

  • Format and structure — CSV/JSON vs proprietary; normalized vs denormalized

  • Update frequency — One-time, recurring, incremental, or real-time

  • Data quality — Accuracy, completeness, consistency

  • Unique identifiers — Does each entity have a stable ID for MERGE and relationships?

Choosing Your Import Environment

Use Neo4j Aura for importing data in this course:

Neo4j Aura

Neo4j Aura provides a cloud-hosted database with the Data Importer built in.

  1. Log in to Neo4j Aura at console.neo4j.io/graphacademy

  2. Create a new AuraDB Professional instance (free tier available, no credit card required)

  3. Once the instance is running, click Import in the left sidebar

  4. The Data Importer opens, ready to upload CSV files and design your graph model

Why AuraDB Professional?

AuraDB Professional provides additional features like Graph Data Science algorithms that you may want to explore after importing your data. The free tier is sufficient for this course.

Data Structure and Format

Check your source format before importing:

  • CSV or JSON — Import directly with Data Importer or LOAD CSV.

  • Proprietary format — Export to CSV or JSON first, or write a custom extraction script.

  • Normalized — Map each table to nodes/relationships; verify entities match your graph design.

  • Denormalized — Plan how to split joined columns into separate nodes or relationship properties.
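As a rough illustration of the denormalized case, the sketch below splits joined rows into separate customer records, product records, and one relationship record per row. The column names (customer_id, product_name, quantity, and so on) and the sample data are hypothetical and only there to show the idea; your own export will look different.

```python
import csv
import io

# Hypothetical denormalized export: customer and product details are
# repeated on every order row.
SAMPLE = """customer_id,customer_name,product_id,product_name,quantity
c1,Alice,p1,Keyboard,2
c1,Alice,p2,Mouse,1
c2,Bob,p1,Keyboard,1
"""


def split_rows(rows):
    """Split joined rows into customer nodes, product nodes, and relationships."""
    customers = {}  # keyed by ID, so repeated rows collapse into one node
    products = {}
    purchases = []  # one relationship per source row
    for row in rows:
        customers[row["customer_id"]] = {
            "id": row["customer_id"],
            "name": row["customer_name"],
        }
        products[row["product_id"]] = {
            "id": row["product_id"],
            "name": row["product_name"],
        }
        purchases.append({
            "customer_id": row["customer_id"],
            "product_id": row["product_id"],
            "quantity": int(row["quantity"]),  # becomes a relationship property
        })
    return customers, products, purchases


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
customers, products, purchases = split_rows(rows)
print(len(customers), "customers,", len(products), "products,", len(purchases), "purchases")
```

Keying the node dictionaries by ID is what removes the repetition, which is one reason stable identifiers matter for this kind of source.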

Frequency

Decide before importing:

  • One-time or recurring? One-time imports can use a single batch import; recurring imports need a sync strategy.

  • Incremental or full? Incremental imports require unique IDs and MERGE; full imports can use CREATE to load everything again and replace the existing data.

  • Real-time or batch? Real-time needs event streaming or CDC; batch uses scheduled LOAD CSV or ETL.
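To see why incremental imports depend on unique IDs, consider this in-memory analogy. It is not Neo4j code: full_load and incremental_load are hypothetical helpers that only mimic the difference between always creating records and matching on an ID first, which is the role MERGE plays in an incremental import.

```python
def full_load(rows):
    """CREATE-style load: every row becomes a new record, with no matching."""
    return [dict(row) for row in rows]


def incremental_load(store, rows, id_key="id"):
    """MERGE-style load: match on the ID, update if present, insert otherwise."""
    for row in rows:
        store.setdefault(row[id_key], {}).update(row)
    return store


batch1 = [{"id": "c1", "name": "Alice"}, {"id": "c2", "name": "Bob"}]
batch2 = [{"id": "c2", "name": "Robert"}, {"id": "c3", "name": "Carol"}]

# Incremental: applying a batch twice does not create duplicates.
store = {}
incremental_load(store, batch1)
incremental_load(store, batch2)
incremental_load(store, batch2)  # rerun of the same batch is harmless

# Create-only: the same data loaded without matching duplicates c2.
appended = full_load(batch1 + batch2)

print(len(store), "records after incremental loads")
print(len(appended), "records after create-only load")
```

Without a stable ID there is nothing to match on, so every rerun behaves like the create-only case.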

Data Quality

Assess your source data before importing. Check these five criteria:

  • Accuracy - Verify that the data is correct and error-free by cross-referencing with reliable sources.

  • Validity - Ensure the data is applicable and suitable for the intended use or context.

  • Completeness - Check that all necessary data is included and that no elements are missing.

  • Reliability - Ascertain that the data comes from a credible and dependable source.

  • Consistency - Confirm that the data does not show discrepancies when compared over time or with similar datasets.

Also check for these common data format issues:

  • Are quotes used correctly?

  • Are entities and values of the correct data type?

  • Are UTF-8 prefixes used (for example \uc)?

  • Do some fields have trailing spaces?

  • Do the fields contain binary zeros?

  • Are lists formed correctly?

  • Are there any obvious typos?
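Some of these checks are easy to automate before you import. The sketch below scans CSV text for binary zeros, leading or trailing whitespace, and values that do not parse as numbers. The sample data, the price column, and the check_lines helper are assumptions made for illustration, and the sketch assumes no field contains an embedded line break.

```python
import csv

# Hypothetical sample with three deliberate problems:
# a trailing space, a non-numeric price, and a binary zero.
SAMPLE = "id,name,price\n1,Keyboard ,49.90\n2,Mouse,abc\n3,Moni\x00tor,199\n"


def check_lines(text, numeric_columns=("price",)):
    """Return a list of (line_number, column, problem) tuples."""
    problems = []
    lines = text.splitlines()
    header = next(csv.reader([lines[0]]))
    for line_number, line in enumerate(lines[1:], start=2):  # line 1 is the header
        if "\x00" in line:
            problems.append((line_number, None, "binary zero"))
            # Remove it so parsing works on Python versions whose csv module rejects NUL.
            line = line.replace("\x00", "")
        values = next(csv.reader([line]))
        if len(values) != len(header):
            problems.append((line_number, None, "wrong number of fields"))
            continue
        row = dict(zip(header, values))
        for column, value in row.items():
            if value != value.strip():
                problems.append((line_number, column, "leading or trailing whitespace"))
        for column in numeric_columns:
            try:
                float(row[column])
            except ValueError:
                problems.append((line_number, column, "not a number"))
    return problems


problems = check_lines(SAMPLE)
for problem in problems:
    print(problem)
```

A report like this does not fix anything by itself, but it tells you which rows to clean in the source before you start the import.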

Uniquely identifying data

A Neo4j best practice is to use an ID as a unique property value for each node. Do all the entities in the source data have a unique identifier?

For example, if you are importing sales data into Customer and Product nodes, is there a unique identifier (ID) for each customer and product?

If an ID in your source data does not uniquely identify a single entity (node), you will have problems loading the data and creating relationships between existing nodes.
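One way to answer this question before importing is to scan the ID column for duplicate and missing values. The following is a minimal sketch; the customer_id column, the sample rows, and the check_ids helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical customer export: c1 appears twice and one row has no ID.
SAMPLE = """customer_id,name
c1,Alice
c2,Bob
c1,Alicia
,Carol
"""


def check_ids(rows, id_column):
    """Return (sorted duplicate IDs, number of rows with a missing ID)."""
    ids = [row[id_column].strip() for row in rows]
    missing = sum(1 for value in ids if not value)
    counts = Counter(value for value in ids if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return duplicates, missing


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates, missing = check_ids(rows, "customer_id")
print("duplicate IDs:", duplicates)
print("rows without an ID:", missing)
```

If either result is non-empty, decide how to resolve those rows in the source before you rely on the ID for MERGE or for creating relationships.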

Check Your Understanding

Data quality criteria

Which aspects of data quality should you assess before importing your data? (select all that apply)

  • ✓ Accuracy

  • ✓ Validity

  • ✓ Completeness

  • ✓ Reliability

  • ✓ Consistency

Hint

It is important you assess data quality against multiple factors.

Solution

Accuracy, validity, completeness, reliability, and consistency are all important aspects of data quality.

Summary

In this lesson, you learned the importance of understanding the source data before importing it into Neo4j.

In the next lesson, you will explore the implications of the graph data model you will implement during the import.
