Majority of Domains on the Web have Low Page Rank

It is easy to assume that most websites on the internet have a low page rank, since there are hundreds of millions of websites and only a small minority of them are known to a majority of users. Confirming this, however, requires an analysis of the links between websites.

To test this assumption, data analysis was performed on 2 billion edges among 90 million hosts, where each edge is a link from one host to another. It is important to note that the page rank of a host depends on the number and quality of the links it receives, not on the number of links it sends to other hosts.
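The dependence on incoming rather than outgoing links can be illustrated with a minimal power-iteration sketch of PageRank. This is not the method used in the analysis; the host names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict mapping host -> list of hosts it links to."""
    # Collect every host that appears as a source or a target.
    hosts = set(links)
    for targets in links.values():
        hosts.update(targets)
    hosts = sorted(hosts)
    n = len(hosts)

    # Start from a uniform distribution.
    rank = {h: 1.0 / n for h in hosts}
    for _ in range(iterations):
        new_rank = {h: (1.0 - damping) / n for h in hosts}
        for h in hosts:
            targets = links.get(h, [])
            if targets:
                # A host splits its rank evenly among the hosts it links to.
                share = damping * rank[h] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A host with no outgoing links spreads its rank over all hosts.
                share = damping * rank[h] / n
                for t in hosts:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: hub.example sends the most links,
# popular.example receives the most links.
toy_links = {
    "hub.example": ["popular.example", "a.example", "b.example", "c.example"],
    "a.example": ["popular.example"],
    "b.example": ["popular.example"],
    "c.example": ["popular.example"],
    "popular.example": [],
}

ranks = pagerank(toy_links)
for host, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{host:18s} {score:.4f}")
```

In this toy graph, popular.example ends up with the highest score even though it links to nothing, while hub.example, which sends the most links, ranks below it.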

Figure 1: Scatter plot of number of incoming hosts vs. page rank. This shows that as the number of hosts that link to a host increases, the page rank of that host also increases.

Some interesting details from the Common Crawl data are that the host linked to by the most hosts is googleapis.com, the host that sends the most links is blogspot.com, and the host that has the most subdomains (hosts that are part of a larger host) is wordpress.com.

This topic is related to the course because it shows that the subdomains of a domain can be considered a strongly connected component: when every subdomain has links going to and from the domain, each subdomain can reach every other subdomain through the domain, so all of them are connected together.
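Whether a domain and its subdomains really form one strongly connected component depends on links existing in both directions. A minimal sketch of that check follows; the host names and the two example graphs are hypothetical and are not taken from the Common Crawl data.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from start by following directed links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def in_same_scc(links, nodes):
    """True if every node in nodes can reach, and be reached from, every other."""
    nodes = list(nodes)
    if not nodes:
        return True
    # Build the reversed graph so that "who can reach start" becomes a forward search.
    reversed_links = {}
    for src, targets in links.items():
        for dst in targets:
            reversed_links.setdefault(dst, []).append(src)
    start = nodes[0]
    forward = reachable(links, start)
    backward = reachable(reversed_links, start)
    return all(n in forward and n in backward for n in nodes)


# Hypothetical domain whose subdomains all link back to it.
two_way = {
    "example.com": ["blog.example.com", "shop.example.com"],
    "blog.example.com": ["example.com"],
    "shop.example.com": ["example.com"],
}

# Same domain, but shop.example.com never links back.
one_way = {
    "example.com": ["blog.example.com", "shop.example.com"],
    "blog.example.com": ["example.com"],
    "shop.example.com": [],
}

print(in_same_scc(two_way, two_way))  # True
print(in_same_scc(one_way, one_way))  # False
```

The second graph shows why the links must go both to and from the domain: a subdomain that is linked to but never links back falls outside the component.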

https://searchengineland.com/crawl-data-analysis-of-2-billion-edges-from-90-million-domains-offers-glimpse-into-todays-web-323417

Social Networks altered by growing Information Network

As the internet makes the world more interconnected, people can share their ideologies with a significant number of people without meeting them in person. Barriers to communication such as distance and time differences are removed, and discussions become more accessible to the general public.

Because social media and online discussion boards are not controlled by regulations and allow users to remain non-transparent, specific ideals can be broadcast throughout the social network without any accountability. This leads to

Natural barriers such as distance prevent individuals in different areas from intermingling, and so act as natural filters that regulate the flow of information. Without these barriers, location is no longer a main factor when analyzing the social network.

Figure 1: The structure of the social network affecting the voters’ perceptions of information

As a result, the analysis of social and information networks cannot rely solely on the geographic location of an individual node, as the nodes its edges connect to may be in significantly different areas.

This is interesting with respect to the course because a simple graph with edges connecting nodes that communicate with one another does not give a complete picture of the situation, as the actual network has location as part of the structure and the channels of communication can be unpredictable.