Categories
Uncategorized

What to do without PageRank

When we use Google, we are often impressed by how it manages to return so many relevant results. One reason behind this is its ranking algorithm, PageRank. PageRank scores a page by its importance on the network, and the public score ran from 0 to 10. The logic behind it is that a page’s score depends on the quality of its inbound links, and the page in turn spreads its own score equally across its outbound links. This algorithm is one of the main reasons why Google became such a big company. When Google first started, the PageRank score was public, but as Google grew, the public PageRank was made private and is no longer visible to users. One of the main factors behind this decision was link spam, which consists of creating unnatural links to increase the PageRank of some pages. Although Google found a way to mitigate it (the nofollow attribute), it makes sense to keep the way they rank pages private so that no one can manipulate the results of the ranking [1].
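To make the "spread the score equally across outbound links" idea concrete, here is a minimal sketch of the textbook iteration. It is not Google's production code: the damping factor of 0.85, the fixed iteration count, the handling of pages without outbound links, and the toy pages A to D are all assumptions made for illustration. It returns raw scores that sum to 1, not the public 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank sketch.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outbound links spreads its
                # score evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C receives links from A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, page C collects links from three pages and ends up on top, while D, which nobody links to, keeps only the baseline share.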

PageRank - Wikipedia
Figure 1: Example of the rank of different pages.

Although the public PageRank is no longer available, there are some good alternative metrics nowadays. One example is URL Rating (UR) by Ahrefs. It also estimates the importance of a webpage on the network by measuring its backlinks (links pointing to the page, both external and internal): the more backlinks the better, and pages are ranked with a score from 0 to 100. The rating depends not only on backlinks from other sites but also on inlinks (links within the domain, e.g. from the home page to the contact page). Ahrefs gives you both a page-level and a domain-level metric: the page-level metric (UR) rates a single page, such as the homepage, without its subpages, whereas the domain-level metric (DR) lets people check the rank of the whole domain, including all its subpages. The UR scale is logarithmic, so it is easier to move up at low ratings and much more difficult at high ratings [2][3].

Figure 2: Example of a result showing UR (page level) and DR (domain level).

Another alternative is a tool named InRank, developed by Oncrawl, which works similarly to the PageRank algorithm. Unlike the original PageRank algorithm, it disregards external links and self-links and only accounts for links made with the <a href> tag [4]. InRank is used more for finding out how popularity is distributed within your website, so it measures how popular a specific page is within your site. It rates each page from 0 to 10, where at least one page scores 10, which is usually the homepage. It does not just give each webpage a single rating but also analyses other aspects of the website, including the probability that a random user might see the page, the flow of InRank, and the average InRank by depth.
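Oncrawl does not publish the exact InRank formula, so the following is only a rough sketch of the two ideas described above: dropping external links and self-links before scoring, and rescaling so that the best page gets 10. The function names, the example.com URLs, and the raw scores are all hypothetical.

```python
from urllib.parse import urlparse


def internal_links(links, site_host):
    """Keep only links between two different pages on site_host.

    links maps a page URL to the list of URLs it links to.
    External links and self-links are dropped.
    """
    kept = {}
    for source, targets in links.items():
        if urlparse(source).netloc != site_host:
            continue
        kept[source] = [
            target
            for target in targets
            if urlparse(target).netloc == site_host and target != source
        ]
    return kept


def scale_to_ten(scores):
    """Rescale raw scores so that the highest-scoring page gets 10."""
    top = max(scores.values())
    return {page: 10.0 * score / top for page, score in scores.items()}


# Hypothetical crawl of a tiny site.
crawl = {
    "https://example.com/": [
        "https://example.com/",         # self-link, dropped
        "https://example.com/contact",  # internal, kept
        "https://other.org/",           # external, dropped
    ],
}
print(internal_links(crawl, "example.com"))

# Hypothetical raw scores from any PageRank-style computation.
print(scale_to_ten({"home": 0.5, "contact": 0.25}))
```

Any PageRank-style computation could sit between the two steps. The rescaling is what guarantees that at least one page, usually the homepage, ends up at 10.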

Figure 3: Example showing the InRank flow of a page.

In conclusion, the tools mentioned can help us determine the importance of a webpage or, in other words, the exposure of the website in the network. Beyond ranking, they analyse and provide statistics about how the page is performing. As we can see, although PageRank is no longer publicly available, it has influenced and shaped other tools that we can use today as alternatives to Google’s PageRank.

Reference:
[1] Darren. (2020, March 21). PageRank is dead, what are the alternatives? 2020. Retrieved November 05, 2020, from https://cwtadvertising.co.uk/pagerank-is-dead-what-are-the-alternatives-2020
[2] Soulo, T. (2020, February 22). Google PageRank is NOT Dead: Why It Still Matters. Retrieved November 05, 2020, from https://ahrefs.com/blog/google-pagerank
[3] What is URL Rating (UR)? (n.d.). Retrieved November 05, 2020, from https://help.ahrefs.com/en/articles/72658-what-is-url-rating-ur
[4] InRank. (n.d.). Retrieved November 05, 2020, from http://help.oncrawl.com/en/articles/404228-inrank


Analyzing the NBA with PageRank

Sports leagues such as the NBA are entering a new age of statistical analysis, with teams trying to come up with new analytical techniques to gain an edge over their competitors. The Houston Rockets have been leading the way in taking basketball analytics to the next level. The effort was spearheaded by Daryl Morey, the Rockets’ GM, a former consultant and MIT Sloan graduate who did not play basketball. His prior work as a statistical consultant helped him gain a deeper understanding of basketball and of where teams operate inefficiently. Morey’s fundamental insight involved taking a tremendous number of three-point shots: although they are more difficult than midrange shots or drives into the paint, they boast a slightly higher expected value. Morey also recognized that three-point attempts from the corner go in at a higher percentage because the shape of the three-point line puts the corner slightly closer to the basket, so many of the Rockets’ set plays are specifically designed to get strong three-point shooters open for these corner threes. The Rockets are shooting more 3s than should be humanly possible, and it’s working.

The Rockets are just one of many examples of statistics taking over sports leagues around the world. One of the biggest problems in sports analytics is coming up with an accurate ranking of how good the teams are.

A directed graph of NBA teams.

This graph was constructed so that for every win team X has over team Y, the weight of the arrow between them is increased by 1. Using this graph, we can apply the Scaled PageRank algorithm until convergence to produce a PageRank score for each team.
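The construction above can be sketched in a few lines. This is an illustration rather than the code behind the figure, and it rests on several assumptions: each arrow points from the loser to the winner so that rank flows toward teams that win, the scaling factor is 0.85, an unbeaten team spreads its score evenly, and the three teams and their results are made up.

```python
from collections import defaultdict


def build_win_graph(games):
    """games is a list of (winner, loser) pairs.

    Returns weights[loser][winner] = number of wins by winner over loser.
    Assumption: the arrow points from the loser to the winner.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for winner, loser in games:
        weights[loser][winner] += 1.0
    return weights


def scaled_pagerank(weights, teams, s=0.85, tol=1e-10, max_iter=1000):
    """Weighted Scaled PageRank, iterated until the scores stop changing."""
    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(max_iter):
        new_rank = {team: (1.0 - s) / n for team in teams}
        for team in teams:
            out = weights.get(team, {})
            total = sum(out.values())
            if total == 0:
                # Assumption: a team with no losses has no outgoing
                # arrows, so its score is spread evenly over all teams.
                for other in teams:
                    new_rank[other] += s * rank[team] / n
            else:
                # Pass score along each arrow in proportion to its weight.
                for target, weight in out.items():
                    new_rank[target] += s * rank[team] * weight / total
        change = sum(abs(new_rank[t] - rank[t]) for t in teams)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical results: A beats B twice and C once, B beats C once.
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
ranks = scaled_pagerank(build_win_graph(games), teams)
for team in sorted(ranks, key=ranks.get, reverse=True):
    print(team, round(ranks[team], 4))
```

With these made-up results the unbeaten team A comes out on top, followed by B and then C.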

NBA PageRank Rankings
Ranking obtained through PageRank in comparison with the NBA standings.

This is a simple and elegant way to rank NBA teams. With the Lakers winning the 2020 championship last month, I have heard a lot of talk about how they had a relatively easy path to the finals and didn’t face much competition. I thought this was quite interesting, so I decided to make a YouTube video where I use the PageRank algorithm to rank the teams in the last 6 seasons and find out which championship runs were more competitive than others.

In the video, I constructed a directed graph of the NBA teams. For every win team X has over team Y, the weight of the directed edge (X, Y) is incremented by (final score for X / final score for Y); likewise, when team Y wins against team X, the opposite directed edge (Y, X) is incremented by (final score for Y / final score for X). I was hoping that this would create a more accurate depiction of how the teams fare against each other, because many wins are competitive, going to multiple overtimes, whereas other wins are complete blowouts. For example, on Sunday, Nov 3rd, 2019, the Miami Heat smacked the Houston Rockets with a 129-100 victory. The next day, the Rockets beat the Grizzlies 107-100. Under this scheme the first win adds 1.29 to its edge while the second adds only 1.07. I thought that these wins should be treated differently to create a more accurate depiction of how the teams match up.
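As a small illustration of the score-ratio weighting, the sketch below applies it to the two games mentioned above. The function name and the team abbreviations (MIA, HOU, MEM) are labels chosen for this example, and the code is not taken from the linked repository.

```python
from collections import defaultdict


def add_game(weights, winner, winner_score, loser, loser_score):
    """Increment the (winner, loser) edge by the ratio of final scores."""
    weights[(winner, loser)] += winner_score / loser_score


weights = defaultdict(float)
add_game(weights, "MIA", 129, "HOU", 100)  # blowout win
add_game(weights, "HOU", 107, "MEM", 100)  # closer win

for (winner, loser), weight in weights.items():
    print(f"{winner} over {loser}: {weight:.2f}")
```

A blowout therefore counts for more than a narrow win, but every win still contributes more than 1 to its edge, since the winner’s score is always the larger of the two.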

After running the PageRank algorithm on all 30 teams for each season since 2014-2015, I plotted the highest-ranking team (green) and the team that won the championship when the highest-ranked team didn’t (red). This goes to show that the best team doesn’t always win and that there are many other random factors that determine which team ends up with the championship.

Green: Highest PageRank of each season, Red: Champions of each season.

Skipping over some plots, the most important one is the following. Here, by looking at the average PageRank score of each championship team’s playoff opponents, we get to see how ‘hard’ it really was for them to win.

Through this graph, we see that the claims of the Lakers having an easy championship run are false (injuries aside). The only championship team that faced better opponents in the playoffs in the last 6 seasons was the Cavs back in 2016, and that was mainly because in the finals they played the Warriors, who had an astounding PageRank of 0.0650, the highest of any team in the last 6 years or so by roughly 0.005.

Source: https://www.samford.edu/sports-analytics/fans/2018/Google-PageRank-A-New-Metric-for-Gauging-NBA-Team-Quality
Source code: https://github.com/H-Richard/NBA-pagerank


After Page Rank Is No Longer Visible

Google has been using the PageRank algorithm to serve its users the most relevant search results. There was a time when PageRank scores were visible to all users, but that ended after 2010, when Google hid all of them. However, not being visible does not mean that Google has stopped using it. According to Erika, not only did Google keep the PageRank algorithm after 2010, it has also been updated in recent years and still plays a very important role in serving users the best search results in 2020. [1]

Many companies have tried to guess Google’s latest page ranking algorithm. Some of them have even developed their own alternative algorithms. One example is SEO PowerSuite. Their proprietary Domain InLink Rank provides an alternative way to rank the most valuable pages. Similar to Google’s old PageRank algorithm, it takes the number of incoming links and their weights into account to calculate a page’s rank. However, no detailed formula can be found online that explains exactly how these factors work together. Instead, this blog post looks at one of the experiments SEO PowerSuite conducted last year on how well their Domain InLink Rank algorithm performed compared to Google’s SERPs (Search Engine Result Pages). [2][3]

The experiment targeted around 33,500 keywords and their search results. Only the first 30 results from each keyword search were kept, which gives just over 1 million pages (33,500 × 30 = 1,005,000). After comparing each page’s Domain InLink Rank with its position in Google’s results, it turned out that the two are positively correlated, with a correlation coefficient of 0.128. This indicates that a page with a higher InLink Rank tends to be ranked higher in Google’s search results. However, by a common rule of thumb, a correlation coefficient under 0.3 is considered “weak”. Therefore, a coefficient of 0.128 on its own indicates only a weak relationship.
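For readers who want to see what such a coefficient measures, here is a minimal sketch. It assumes a Pearson correlation, which may not be the coefficient the experiment actually used, and the five data points are invented for illustration. The 0.3 cut-off is the rule of thumb mentioned above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strength(r):
    """Rule-of-thumb label: an absolute value under 0.3 counts as weak."""
    return "weak" if abs(r) < 0.3 else "not weak"


# Invented example: a link-based score and a search-visibility score
# for five pages.
link_score = [9, 7, 8, 3, 5]
visibility = [30, 22, 28, 15, 10]
r = pearson(link_score, visibility)
print(round(r, 3), strength(r))

# The coefficient reported in the experiment.
print(strength(0.128))
```

The coefficient ranges from -1 to 1: values near 1 mean the two scores rise together almost perfectly, and values near 0 mean one score says little about the other.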

Despite that, after comparing with experiment results from other page ranking algorithms, SEO PowerSuite makes a fair point that its InLink Rank algorithm performs better than the alternatives. Moz, the “next best competitor” after SEO PowerSuite, has published experiment results from a similar setup. Its highest correlation coefficient (0.12076) was weaker still, by about 6% in relative terms.

(figure. 1) Comparing the performance of InLink Rank with the four products by Moz, in terms of correlation coefficients.
Image source: https://cdn1.link-assistant.com/images/news/google-page-rank-2019/screen-07.png

Aside from that, it is interesting to find that SEO PowerSuite has been working on detecting spamming hub pages and on giving web page owners practical instructions for improving their page rank. The top two approaches are qualifying backlinks and making use of internal links.

On the one hand, backlinks are the links from other pages that point to the website. Under this InLink Rank model, all websites are authorities and hubs at the same time. Frequently checking whether any of the linking pages has a low rank score and removing the links that come from low-quality sites can prevent a loss of page rank in the next round of page rank updates. A tool named “SEO SpyGlass” checks the InLink Rank scores of those backlinks, as well as potential risks and errors affecting the backlink pages’ authority.

(figure. 2) An example of using the SEO SpyGlass tool to analyze the InLink page rank for backlink pages.
Image source: https://cdn1.link-assistant.com/images/news/google-page-rank-2019/screen-10.png

On the other hand, making good use of internal links can save a lot of time. Internal links are described as acting like a “page rank storage” under the InLink model. To get the most out of internal links, it is important to make sure the website has no orphan pages (pages that no other page on the site links to), because they are a waste of resources. Having pages link to each other within a website makes sure page rank flows between pages. A tool named “WebSite Auditor” visualizes this structure and makes it easier to find any orphan pages.
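Finding orphan pages is a simple graph check. The sketch below is not how WebSite Auditor works internally, which is not documented here. It only assumes that we already have a map of internal links, and the page paths are hypothetical.

```python
def find_orphans(internal_links, home):
    """Return pages that no other page on the site links to.

    internal_links maps each page to the list of pages it links to.
    The home page is excluded because visitors reach it directly.
    """
    pages = set(internal_links)
    for targets in internal_links.values():
        pages.update(targets)

    # A page counts as linked only if a different page points to it.
    linked = {
        target
        for source, targets in internal_links.items()
        for target in targets
        if target != source
    }
    return sorted(pages - linked - {home})


# Hypothetical site: nothing links to /old-promo.
site = {
    "/": ["/about", "/contact"],
    "/about": ["/"],
    "/contact": [],
    "/old-promo": ["/"],
}
print(find_orphans(site, home="/"))
```

In this toy site only /old-promo is reported, since no other page links to it and so no page rank can flow to it from the rest of the site.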

(figure. 3) An example of using the WebSite Auditor tool to analyze the structure of a website and to detect if there are any orphan pages.
Image source: https://cdn1.link-assistant.com/images/news/google-page-rank-2019/screen-16.png

It is exciting to see the material we just covered in lecture (3 days ago) doing some work in real-world industry. The lecture was also important in helping me understand these articles and diagrams, since they are so closely related to what we learned. All sources are listed under “Reference” below; please feel free to dig in and read more!

Reference
1. Some description of Google PageRank and why it is still important:
https://www.semrush.com/blog/pagerank/
2. The experiment on Domain InLink Rank:
https://www.link-assistant.com/news/inlink-rank-correlation.html
3. The analysis of the experiment, and more related materials:
https://www.link-assistant.com/news/google-page-rank-2019.html