The Majority of Domains on the Web Have Low PageRank

It is easy to assume that most websites on the internet have a low PageRank: there are hundreds of millions of websites, and only a small minority of them are known to most users. To be sure, however, the links between websites need to be analyzed.

To reach a definite conclusion, an analysis was performed on 2 billion edges between 90 million hosts, where each edge is a link from one host to another. It is important to note that a host's PageRank depends on the number and quality of the links it receives, not on the number of links it sends to other hosts.
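To make the "incoming links, weighted by quality" point concrete, here is a minimal sketch of the standard PageRank power iteration. It assumes the common damping factor of 0.85 and uses a tiny, made-up host graph (all host names are hypothetical); the analysis described above may compute ranks differently at web scale.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Compute PageRank for a directed graph given as (source, target) host pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Hosts with no outgoing links spread their rank evenly over all hosts.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Each host splits its current rank equally among the hosts it links to,
        # so a host's score depends only on the links it RECEIVES.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical host graph: "popular.example" receives links from every other host.
edges = [
    ("a.example", "popular.example"),
    ("b.example", "popular.example"),
    ("c.example", "popular.example"),
    ("popular.example", "a.example"),
    ("c.example", "b.example"),
]
ranks = pagerank(edges)
for host, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{host:16s} {score:.3f}")
```

In this toy graph, popular.example ends up with the highest score. Both a.example and b.example receive exactly one link, yet a.example ranks far higher because its single link comes from the highly ranked popular.example, which illustrates why link quality matters and not just link count.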

Figure 1: Scatter plot of the number of hosts linking in vs. PageRank. As the number of hosts that link to a host increases, the PageRank of that host also increases.

Some interesting details about the Common Crawl data are that the host that receives links from the most hosts is googleapis.com, the host that sends the most links is blogspot.com, and the host with the most subdomains (hosts that are part of a larger host) is wordpress.com.
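Statistics like these come from simple counts over the edge list. The sketch below assumes the edges are available as (source_host, target_host) pairs; the sample hosts are invented for illustration. It also assumes, as a simplification, that a host's parent domain is its last two labels, which is wrong for suffixes such as .co.uk (a public suffix list would be needed there).

```python
from collections import Counter, defaultdict


def host_stats(edges):
    """Return (host linked from the most distinct hosts, host sending the most links)."""
    linking_hosts = defaultdict(set)  # target -> set of distinct hosts linking to it
    links_sent = Counter()            # source -> number of outgoing links
    for src, dst in edges:
        linking_hosts[dst].add(src)
        links_sent[src] += 1
    most_linked = max(linking_hosts, key=lambda host: len(linking_hosts[host]))
    top_sender = links_sent.most_common(1)[0][0]
    return most_linked, top_sender


def count_subdomains(hosts):
    """Count subdomains per parent domain.

    Simplification (assumption): the parent domain is the last two labels.
    """
    counts = Counter()
    for host in hosts:
        labels = host.split(".")
        if len(labels) > 2:
            counts[".".join(labels[-2:])] += 1
    return counts


# Hypothetical edge list.
edges = [
    ("a.blog.example", "cdn.example"),
    ("b.blog.example", "cdn.example"),
    ("c.blog.example", "cdn.example"),
    ("news.example", "cdn.example"),
    ("news.example", "a.blog.example"),
    ("news.example", "b.blog.example"),
    ("news.example", "shop.example"),
]
hosts = {host for edge in edges for host in edge}
print(host_stats(edges))
print(count_subdomains(hosts).most_common(1))
```

Counting distinct linking hosts and counting raw outgoing links are different measures, which matches the distinction above between the host that receives links from the most hosts and the host that sends the most links.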

This topic relates to the course because it shows that a domain and its subdomains can be considered a strongly connected component: every subdomain has links to and from the main domain, so any subdomain can reach any other by passing through the domain.
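This argument can be checked on a toy graph with a standard strongly connected components algorithm. The sketch below uses Kosaraju's two-pass algorithm; the domain, subdomains, and links are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm: return a list of SCCs, each a set of nodes."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: iterative DFS on the original graph, recording finishing order.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in reverse finishing order;
    # each tree found is one strongly connected component.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical domain whose subdomains link to and from it.
edges = [
    ("blog.example", "alice.blog.example"),
    ("alice.blog.example", "blog.example"),
    ("blog.example", "bob.blog.example"),
    ("bob.blog.example", "blog.example"),
    ("bob.blog.example", "outside.example"),  # one-way link out of the group
]
for component in strongly_connected_components(edges):
    print(sorted(component))
```

The domain and both subdomains land in one component because each can reach the others through blog.example, while outside.example is reachable from the group but never links back, so it forms a component of its own.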

https://searchengineland.com/crawl-data-analysis-of-2-billion-edges-from-90-million-domains-offers-glimpse-into-todays-web-323417
