
A quick introduction to Neo4j

Rens Verhage
7 min read · Nov 17, 2024


AI-generated depiction of a graph database

Over the past few years, graph databases have become increasingly popular. Many large companies like Facebook, LinkedIn and Twitter use them heavily for their social networks. However, when selecting a database solution for a project, this category of NoSQL databases is often overlooked in favor of the traditional relational database management system, or RDBMS for short.

While an RDBMS will be a good fit for our projects most of the time, it does come with downsides that might only show over time.

In this article, we’ll take our first steps in the world of graph databases using Neo4j. Neo4j is the most popular graph database solution to date. It’s open source and written in Java. The source code is hosted on GitHub. All examples in this article are based on the movie recommendation dataset you can find in Neo4j’s online sandbox environment.

Before we go into Neo4j, let’s talk about relational databases for a bit and see where they fall short. This will give us a better understanding of when a graph database might be a better fit.

Downsides of the relational model

The relational database model is based on a ledger-like table structure. Each table represents one type of real-world entity, and each individual entity is stored in its own table row. Rows have a primary key, a unique identifier within their table.

We define a relationship between two tables by creating a column in one table in which we store the primary key values of the rows in the other table. These key values are foreign to the table on the owning side of the relationship, hence the name foreign key. We put a foreign key constraint on the column holding these values, to protect the relationships from becoming invalid.
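To make the foreign key mechanism concrete, here is a minimal sketch using Python's built-in sqlite3 module. The director and movie tables, their columns and the sample rows are made up for illustration and are not taken from the Neo4j dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign key constraints when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: movie is the owning side and stores director ids.
conn.executescript("""
CREATE TABLE director (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE movie (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    director_id INTEGER NOT NULL REFERENCES director(id)
);
""")

conn.execute("INSERT INTO director (id, name) VALUES (1, 'Director A')")
conn.execute("INSERT INTO movie (id, title, director_id) VALUES (1, 'Movie A', 1)")

# Director 99 does not exist, so the constraint should reject this row.
try:
    conn.execute("INSERT INTO movie (id, title, director_id) VALUES (2, 'Movie B', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling reference rejected:", rejected)
```

The constraint is what keeps a movie from pointing at a director that isn't there. Without it, the database would happily store the dangling reference.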

When writing SQL to query data, we have to join our tables together using these foreign key relations. Depending on the number of relations, queries can grow complex fairly quickly. This is especially the case in highly normalised databases.

Furthermore, joining a lot of tables can become a performance issue. We can mitigate this by using indices, but only to some extent. Queries with multiple (nested) joins are not only prone to bad performance, they’re more often than not quite difficult to read.
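As a rough illustration of how quickly joins pile up, the sketch below asks a simple question: which genres has a given user rated? The normalised schema, table names and rows are hypothetical, loosely modelled on a movie recommendation domain, and run against an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalised schema: two link tables connect three entity tables.
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie    (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE genre    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rating (
    user_id  INTEGER REFERENCES app_user(id),
    movie_id INTEGER REFERENCES movie(id),
    stars    INTEGER
);
CREATE TABLE movie_genre (
    movie_id INTEGER REFERENCES movie(id),
    genre_id INTEGER REFERENCES genre(id)
);
INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO movie VALUES (1, 'Movie A'), (2, 'Movie B');
INSERT INTO genre VALUES (1, 'Drama'), (2, 'Comedy');
INSERT INTO rating VALUES (1, 1, 5), (2, 2, 3);
INSERT INTO movie_genre VALUES (1, 1), (2, 2);
""")

# Four joins are needed to walk from a user to the genres they rated.
query = """
SELECT DISTINCT g.name
FROM app_user u
JOIN rating r       ON r.user_id = u.id
JOIN movie m        ON m.id = r.movie_id
JOIN movie_genre mg ON mg.movie_id = m.id
JOIN genre g        ON g.id = mg.genre_id
WHERE u.name = ?
ORDER BY g.name
"""
genres = [row[0] for row in conn.execute(query, ("alice",))]
print(genres)
```

Every extra hop in the question, for example "other users who rated movies in those genres", adds at least one more join to the query.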
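The effect of an index can be observed with a small sketch like the one below, which asks SQLite for its query plan before and after indexing a foreign key column. The rating table and the index name idx_rating_user are assumptions made for this example, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical link table; user_id plays the role of a foreign key column.
conn.execute("CREATE TABLE rating (user_id INTEGER, movie_id INTEGER, stars INTEGER)")

def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

# A lookup on the foreign key column, the step a join repeats for each row.
lookup = "SELECT movie_id FROM rating WHERE user_id = 1"

plan_before = plan(lookup)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_rating_user ON rating (user_id)")
plan_after = plan(lookup)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The index speeds up each individual lookup, but the number of lookups still grows with every table added to the join, which is why indexing only helps up to a point.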
