Categories
Uncategorized

Analyzing the NBA with PageRank

Sports leagues such as the NBA are entering a new age of statistical analysis, with teams developing new analytical techniques to gain an edge over their competitors. The Houston Rockets have led the way in taking basketball analytics to the next level. The effort has been spearheaded by Daryl Morey, the Rockets' GM, an MIT Sloan graduate and former consultant who did not play basketball; his prior work as a statistical consultant gave him a deeper understanding of basketball and of how teams operate inefficiently. Morey's fundamental insight was to take a tremendous number of three-point shots: although they are more difficult than midrange shots or drives into the paint, they carry a slightly higher expected value. Morey also recognized that three-point attempts from the corner go in at a higher percentage, because the shape of the three-point line puts the corner slightly closer to the basket, so many of the Rockets' set plays are specifically designed to get strong three-point shooters open for these corner threes. The Rockets are shooting more 3s than should be humanly possible, and it's working.
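
The expected-value argument is simple arithmetic: expected points per attempt equal the probability of making the shot times the points it is worth. So a three breaks even with a two when it goes in two-thirds as often (3p = 2q). The percentages below are hypothetical, picked only to illustrate the comparison; they are not real league averages.

```python
def expected_points(make_probability: float, points_per_make: int) -> float:
    """Expected points per attempt = P(make) * points awarded for a make."""
    return make_probability * points_per_make

# Hypothetical shooting percentages, for illustration only.
three_pointer = expected_points(0.35, 3)  # harder shot, but worth 3 points
two_pointer = expected_points(0.50, 2)    # easier shot, but worth 2 points

print(f"three-pointer: {three_pointer:.2f} points per attempt")
print(f"two-pointer:   {two_pointer:.2f} points per attempt")
```

With these made-up numbers, a 35% three is worth slightly more per attempt than a 50% two.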

The Rockets are one of many examples of statistics taking over sports leagues around the world. One of the biggest problems in sports is producing an accurate ranking of how good the teams are.

A directed graph of NBA teams.

This graph was constructed so that for every win team X has over team Y, the weight of the arrow between them is increased by 1. Using this graph, we can run the Scaled PageRank algorithm until convergence to produce a PageRank score for each team.
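
Scaled PageRank can be sketched as a power iteration over the weighted graph: each step, every team passes its current score along its outgoing edges in proportion to edge weight, then a fraction s of the result is kept and the remaining (1 - s) is spread evenly over all teams. The function name, the edge-dictionary input, and s = 0.85 are assumptions of this sketch, not taken from the linked repository. Because PageRank sends score along the direction of the edges, this sketch assumes each win is stored as an edge from the losing team to the winning team; a graph stored winner to loser would need its edges reversed first, otherwise the scores would reward losing.

```python
from collections import defaultdict

def scaled_pagerank(edges, s=0.85, tol=1e-10, max_iter=1000):
    """Weighted Scaled PageRank by power iteration (a sketch).

    edges: dict mapping (source, target) -> positive weight. Rank flows from
    source to target, so for ranking teams the source is assumed to be the
    LOSER and the target the WINNER.
    s: scaling factor; keep a fraction s of the flowed rank and spread the
    remaining (1 - s) evenly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new_rank = {v: 0.0 for v in nodes}
        # Basic update: each node splits its rank across out-edges by weight.
        for (src, dst), w in edges.items():
            new_rank[dst] += rank[src] * w / out_weight[src]
        # Nodes with no out-edges (e.g. an undefeated team) spread rank evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        for v in nodes:
            new_rank[v] += dangling / n
        # Scaling step: keep fraction s, redistribute 1 - s uniformly.
        new_rank = {v: s * r + (1 - s) / n for v, r in new_rank.items()}
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank

# Hypothetical mini-season: A beat B, A beat C, B beat C (loser -> winner).
ranks = scaled_pagerank({("B", "A"): 1.0, ("C", "A"): 1.0, ("C", "B"): 1.0})
for team, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 4))
```

The scores always sum to 1, so with 30 teams an average team sits near 1/30.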

NBA PageRank Rankings
Ranking obtained through PageRank in comparison with the NBA standings.

This is a simple and elegant way to rank NBA teams. With the Lakers winning the 2020 championship last month, I have heard a lot of talk about how they had a relatively easy path to the Finals and didn't face much competition. I thought this was quite interesting, so I decided to make a YouTube video in which I use the PageRank algorithm to rank the teams over the last six seasons and find out which championship runs were more competitive than others.

In the video, I constructed a directed graph of the NBA teams. For every win team X has over team Y, the weight of the directed edge (X, Y) was incremented by (final score for X / final score for Y); conversely, whenever team Y won against team X, the opposite edge (Y, X) was incremented by (final score for Y / final score for X). I hoped this would create a more accurate depiction of how the teams fare against each other, because many wins are competitive, going to multiple overtimes, whereas others are complete blowouts. For example, on Sunday, Nov 3rd, 2019, the Miami Heat smacked the Houston Rockets with a 129-100 victory. The next day, the Rockets beat the Grizzlies 107-100. I thought these wins should be treated differently to create a more accurate depiction of how the teams match up.
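
The score-ratio weighting can be sketched as follows, using the two games above. The function name and the (team_a, score_a, team_b, score_b) input format are hypothetical. Edges are keyed (winner, loser) to match the (X, Y) notation in the post; before running PageRank, these keys may need to be reversed so that rank flows toward winners.

```python
from collections import defaultdict

def build_weighted_edges(games):
    """Accumulate score-ratio edge weights from game results.

    games: iterable of (team_a, score_a, team_b, score_b) tuples.
    Each win adds winner_score / loser_score to the (winner, loser) edge.
    NBA games cannot end in a tie, so one team always outscores the other.
    """
    weights = defaultdict(float)
    for team_a, score_a, team_b, score_b in games:
        if score_a > score_b:
            winner, w_score, loser, l_score = team_a, score_a, team_b, score_b
        else:
            winner, w_score, loser, l_score = team_b, score_b, team_a, score_a
        weights[(winner, loser)] += w_score / l_score
    return dict(weights)

games = [
    ("MIA", 129, "HOU", 100),  # Nov 3, 2019: Heat 129, Rockets 100
    ("HOU", 107, "MEM", 100),  # next day: Rockets 107, Grizzlies 100
]
print(build_weighted_edges(games))
# {('MIA', 'HOU'): 1.29, ('HOU', 'MEM'): 1.07}
```

The blowout adds 1.29 to its edge while the closer game adds only 1.07, whereas the unweighted graph would add 1 to both.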

After running the PageRank algorithm on all 30 teams for each season starting with 2014-2015, I plotted the highest-ranking team of each season (green) and, in seasons where that team did not win, the team that won the championship (red). This shows that the best team doesn't always win, and that many other random factors determine which team ends up with the championship.

Green: Highest PageRank of each season, Red: Champions of each season.

I'll skip over some plots; the most important one is the following. By looking at the average PageRank score of each champion's playoff opponents, we can see how 'hard' it really was for them to win.
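
The metric itself is just a mean over the teams a champion faced. This sketch assumes one entry per playoff series (weighting by games played would be another reasonable choice), and the team names and scores are made up for illustration.

```python
from statistics import mean

def avg_playoff_opponent_rank(playoff_opponents, pagerank):
    """Mean PageRank of the teams a champion faced in the playoffs.

    playoff_opponents: opponent names, one entry per series (an assumption).
    pagerank: dict mapping team name -> PageRank score for that season.
    """
    return mean(pagerank[team] for team in playoff_opponents)

# Hypothetical scores for a made-up four-round playoff run.
scores = {"T1": 0.030, "T2": 0.035, "T3": 0.040, "T4": 0.045}
print(avg_playoff_opponent_rank(["T1", "T2", "T3", "T4"], scores))
```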

Through this graph, we see that the claim that the Lakers had an easy championship run is false (injuries aside). The only champion in the last six seasons that faced better playoff opponents was the 2016 Cavs, and that was mainly because they played the Warriors in the Finals, who had an astounding PageRank of 0.0650, the highest of any team in the last six seasons by roughly 0.005.

Source: https://www.samford.edu/sports-analytics/fans/2018/Google-PageRank-A-New-Metric-for-Gauging-NBA-Team-Quality
Source code: https://github.com/H-Richard/NBA-pagerank

Graph Databases: Neo4j

Neo4j is a powerful database management system that is capable of storing and managing multiple graphs contained in databases. It uses a query language called Cypher, which offers a visual and logical way of pattern matching nodes and relationships in a graph. I used Neo4j for a few assignments back in 2019 when I took CSCC01. One of those assignments was building a REST API for accessing IMDb data, and one of its endpoints computed an actor's Kevin Bacon degree. I have been thinking about bringing up Neo4j sometime during lecture, and I guess now is the best time.
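
The Kevin Bacon degree is a shortest-path problem: the number of movies on the shortest chain of co-star links between an actor and Kevin Bacon. This plain-Python breadth-first search is a sketch over a small in-memory dictionary of hypothetical movie casts; it is not the assignment's implementation and does not use Neo4j (in Cypher, the same idea is usually expressed with shortestPath).

```python
from collections import defaultdict, deque

def bacon_number(cast, start, target="Kevin Bacon"):
    """Breadth-first search over co-star links.

    cast: dict mapping movie title -> set of actors in it (hypothetical data).
    Returns the number of movies on the shortest chain from `start` to
    `target`, or None if the two actors are not connected.
    """
    movies_by_actor = defaultdict(set)
    for movie, actors in cast.items():
        for actor in actors:
            movies_by_actor[actor].add(movie)

    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for movie in movies_by_actor.get(actor, ()):
            for costar in cast[movie]:
                if costar == target:
                    return dist + 1
                if costar not in seen:
                    seen.add(costar)
                    queue.append((costar, dist + 1))
    return None

cast = {
    "Movie 1": {"Actor A", "Kevin Bacon"},
    "Movie 2": {"Actor A", "Actor B"},
    "Movie 3": {"Actor B", "Actor C"},
}
print(bacon_number(cast, "Actor C"))  # 3
```

Breadth-first search explores actors in order of distance, so the first time Kevin Bacon is reached is along a shortest chain.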

Below is a simple example of a Cypher query that returns a graph of Person nodes whose height property is greater than 1.8, along with the Country nodes they are connected to.

MATCH (p: Person)-[:FROM]->(c:Country)
WHERE p.height > 1.8
RETURN p, c;

I decided to write about this DBMS because I personally use the Neo4j Sandbox when I need to visualize class topics. Neo4j offers an online sandbox for free at https://neo4j.com/sandbox/ with many prebuilt datasets such as movies, the 2019 Women's World Cup, the US Congress, movie reviews… You can even generate your own graph of tweets and mentions if you connect your Twitter account!

We can also compute things like the in-degree and out-degree of nodes. Suppose we have a Twitter-esque network stored, where a FOLLOWS relationship points from a user to someone they follow.

We can compute Alice's out-degree (the users she follows) and in-degree (her followers) with the example below:

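MATCH (u:User)
WHERE u.id = 'Alice'
// out-degree: outgoing FOLLOWS relationships (count is 0 if there are none)
OPTIONAL MATCH (u)-[:FOLLOWS]->(followed)
WITH u, count(followed) AS follows
// in-degree: incoming FOLLOWS relationships
OPTIONAL MATCH (u)<-[:FOLLOWS]-(follower)
RETURN u.id AS name, follows, count(follower) AS followers;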

Then our output:

Finally, we can also compute things such as the clustering coefficient, but I forgot how, and it also wasn't the first result on Google. Anyhow, Neo4j offers a great way to visualize various topics covered in class. I hope you will all play around with the sandbox, and maybe we could even use it for future demos in class!
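
For reference, a node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves connected. This is a plain-Python sketch rather than a Neo4j query; it assumes an undirected adjacency dictionary with hypothetical user names, so a directed FOLLOWS graph would first have to be treated as undirected.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns 0.0 when the node has fewer than two neighbours.
    """
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Count neighbour pairs that share an edge.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj.get(u, set()))
    return links / (k * (k - 1) / 2)

adj = {
    "Alice": {"Bob", "Carol", "Dave"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Alice", "Bob"},
    "Dave": {"Alice"},
}
print(clustering_coefficient(adj, "Alice"))
```

Alice has three neighbours, so three possible pairs, and only Bob and Carol are linked, giving 1/3.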

Links: https://neo4j.com/sandbox/