Graph Databases: Neo4j

Neo4j is a powerful graph database management system that can store and manage multiple graphs contained in databases. It uses a query language called Cypher, which offers a visual and logical way of pattern matching nodes and relationships in a graph. I used Neo4j for a few assignments back in 2019 when I took CSCC01. One of our assignments was building a REST API for accessing IMDb data, and one of the endpoints computed the Kevin Bacon degree. I have been thinking about bringing up Neo4j sometime during the lecture, but I guess now is the best time.

Below is a simple example of a Cypher query that returns a graph of Person nodes with a height property greater than 1.8, connected to Country nodes.

MATCH (p:Person)-[:FROM]->(c:Country)
WHERE p.height > 1.8
RETURN p, c;

I decided to write about this DBMS because I personally use the Neo4j Sandbox when I need some visualization of class topics. They offer an online sandbox for free at https://neo4j.com/sandbox/ with many prebuilt datasets such as movies, the 2019 Women’s World Cup, US Congress, movie reviews… You can even generate your own graph of tweets and mentions if you connect your Twitter account!

We can also compute things like the in-degree and out-degree of nodes. Suppose we have a Twitter-esque network structure stored, where User nodes are linked by FOLLOWS relationships.

We can compute the in-degree and out-degree of Alice with the query below:

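// Neo4j 5+ no longer accepts size() on a pattern; there, use COUNT { (u)-[:FOLLOWS]->() } instead.
MATCH (u:User)
WHERE u.id = 'Alice'
RETURN u.id AS name,
       // out-degree: accounts Alice follows
       size((u)-[:FOLLOWS]->()) AS follows,
       // in-degree: accounts that follow Alice
       size((u)<-[:FOLLOWS]-()) AS followers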

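The output is a single row with Alice's name, the number of accounts she follows, and the number of followers she has.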

Finally, we can also compute things such as the clustering coefficient, but I forgot how and it also wasn’t the first result on Google. Anyhow, Neo4j offers a great way to visualize various topics covered in class. I hope that you will all play around with the sandbox, and maybe we could even use it for future demos in class!
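
On the clustering coefficient mentioned above: this is not a Neo4j query, just a minimal plain-Python sketch of the usual definition of the local clustering coefficient, so treat it as an illustration rather than the way Neo4j does it. It assumes the FOLLOWS edges are treated as undirected, and the names `to_undirected`, `local_clustering`, and the tiny `follows` edge list are all made up for this example.

```python
from itertools import combinations


def to_undirected(edges):
    """Build an adjacency map {node: set(neighbours)}, ignoring edge direction."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of pairs of `node`'s neighbours that are themselves connected."""
    neighbours = adjacency.get(node, set()) - {node}
    k = len(neighbours)
    if k < 2:
        # Assumption: nodes with fewer than two neighbours are reported as 0.
        return 0.0
    linked = sum(
        1 for a, b in combinations(neighbours, 2) if b in adjacency.get(a, set())
    )
    # k * (k - 1) / 2 is the number of possible neighbour pairs.
    return 2 * linked / (k * (k - 1))


# Hypothetical FOLLOWS edges, invented for illustration only.
follows = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Alice", "Dave"),
    ("Bob", "Carol"),
]
graph = to_undirected(follows)
alice_cc = local_clustering(graph, "Alice")
```

In this toy graph Alice has three neighbours (Bob, Carol, Dave) and only one of the three possible pairs among them is connected (Bob and Carol), so `alice_cc` works out to 1/3.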

Links: https://neo4j.com/sandbox/