
Finding structure in Wikipedia Edit Activity

Information cascades are a fascinating phenomenon, especially when they occur in a free and public system in which one would assume that actions and reactions would be more random. One could be forgiven for thinking that, with so many variables and occasional random failures or successes, a cascade would eventually stop or simply have a short life span. However, a recent study, “Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach”, gives another example of an information cascade on, ironically, a free source of information for the whole world: Wikipedia.

Wikipedia is an inherently untrustworthy source, as anyone is allowed to change information on any article or page. Yet the same free and public nature also turns it from an untrustworthy resource into a trustworthy one, because so many different users, including experts, have verified the information. However, when a malicious, misinformed, or even valid actor changes a fact on Wikipedia, an information cascade can occur: an initial update to one page results in another page adopting the same information, and so on.

The research team found interesting results regarding the cascade structure and properties. The study collected edits made to the English Wikipedia from January 1st 2015 to March 31st 2015, including the text that was changed and which user changed it.

Cascades were often small: the result summary gives 13.7 as the average cascade length, with the smallest being 2 and the largest being 8068. The average duration of a cascade was also much closer to the shortest duration than to the longest. Together, the short cascade lengths and durations imply that these blindly trusted claims do not spread too far.

From this graph it can be said that information cascades on Wikipedia tend to form strongly connected components and are likely to group together based on the information being cascaded. This makes sense, as pages that share information are more likely to be related to one another.

This finding could potentially be used by Wikipedia to check that accurate information is being spread and to stop inaccurate information. For example, after a newsworthy incident happens, many Wikipedia pages are likely to be updated. By recognizing this kind of cascade, Wikipedia can fact-check the change and then officially support or prevent the spread of the information.

https://wikiworkshop.org/2016/papers/Wiki_Workshop__WWW_2016_paper_2.pdf

Measuring Ethereum-based ERC20 Token Networks

Ethereum is a blockchain very much like Bitcoin that allows for secure, pseudonymous, and decentralized transfer of funds. However, on top of allowing currency exchange, Ethereum also allows for “smart contracts”, or code that you run on the Ethereum network. This allows for the creation of ERC20 tokens, which are basically cryptocurrencies that piggyback on the Ethereum network using the feature of executable code.

At the height of the cryptocurrency bubble in 2017-2018, many startups used ICOs (initial coin offerings) to raise funds. This led to the creation of new cryptocurrencies on a massive scale. Victor and Lüders looked at the first 6.3 million blocks on the Ethereum blockchain and gave an overview of 64,000 ERC20 tokens, finding some interesting facts about the movement and transfer of funds along the way.

In this paper, vertices (nodes) are individual accounts. An account in the crypto world is simply a public/private key pair: a public key (an address to which funds can be sent) and a private key (a string that allows verification that the legitimate owner sent the funds). Edges are transfers of funds between two nodes.
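To make the graph model concrete, here is a minimal sketch of how such a transfer network could be built and its in- and out-degrees counted. The addresses, amounts, and the `degree_counts` helper are made up for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical transfer records: (sender address, receiver address, amount).
transfers = [
    ("0xA", "0xB", 10),
    ("0xA", "0xC", 5),
    ("0xB", "0xC", 2),
    ("0xD", "0xC", 7),
]

def degree_counts(transfers):
    """Return (in_degree, out_degree) dicts counting distinct neighbours per address."""
    successors = defaultdict(set)    # address -> addresses it sent tokens to
    predecessors = defaultdict(set)  # address -> addresses it received tokens from
    for sender, receiver, _amount in transfers:
        successors[sender].add(receiver)
        predecessors[receiver].add(sender)
    nodes = set(successors) | set(predecessors)
    in_degree = {n: len(predecessors[n]) for n in nodes}
    out_degree = {n: len(successors[n]) for n in nodes}
    return in_degree, out_degree

in_degree, out_degree = degree_counts(transfers)
print(in_degree["0xC"], out_degree["0xA"])
```

Here repeated transfers between the same two addresses count as a single edge; counting every transfer separately would be an equally reasonable choice.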

The first item of note is that even though there were 64,000 different ERC20 tokens on the Ethereum network, most of the network transfers (over 90%) involved the top 1,000 token contracts. This indicates that a power law exists among the contracts themselves: a few tokens are much more popular than the others, and most are sparsely used.
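The concentration described here is just a ratio, which the sketch below computes for a toy example. The token names and transfer counts are invented, and `top_k_share` is a hypothetical helper, not something from the paper.

```python
def top_k_share(transfer_counts, k):
    """Fraction of all transfers that belong to the k most-used tokens."""
    counts = sorted(transfer_counts.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0

# Hypothetical number of transfers per token contract.
transfer_counts = {
    "TOKEN_A": 900,
    "TOKEN_B": 60,
    "TOKEN_C": 25,
    "TOKEN_D": 10,
    "TOKEN_E": 5,
}

share = top_k_share(transfer_counts, 2)
print(f"Top 2 tokens account for {share:.0%} of transfers")
```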

The second item of note is that the degree distributions for each of the individual token networks within the Ethereum Network more or less follow a power-law. In an ironic twist the distribution of the alpha (exponent value) follows what seems to be a normal distribution for the top 1,000 tokens.

For both the indegree and the outdegree, the alpha of the top 1,000 tokens clusters around the high-4 to low-5 range. This is high considering that most alphas fall within the 2-3 range. External factors may cause this, such as a lack of trust between entities and the fact that, unlike in a social network, services, stores, and middlemen for transfers are common.
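For readers wondering where an alpha value comes from, one common way to estimate it from a degree sequence is the maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)) over the n values with x_i >= x_min. The sketch below uses this continuous approximation on made-up degrees; it is not necessarily the fitting procedure Victor and Lüders used, and `estimate_alpha` is a hypothetical helper.

```python
import math

def estimate_alpha(degrees, x_min=1):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only values >= x_min are used. This is an approximation for discrete
    degree data, chosen here for simplicity.
    """
    tail = [d for d in degrees if d >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all degrees equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum

# Hypothetical degree sequence: many small degrees, a few large ones.
degrees = [1, 1, 1, 2, 2, 4, 8]
alpha = estimate_alpha(degrees)
print(round(alpha, 3))
```

A larger alpha means the distribution falls off faster, so very high-degree nodes are rarer relative to low-degree ones.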

This leads naturally into the third item of note: clustering coefficients. The global clustering coefficient for the entire network of token transfers is 0.0001062, while its local clustering coefficient is 0.3042.
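The two coefficients measure different things, which is why they can differ so much. The sketch below computes both on a small made-up graph: one hub with many leaves plus a single triangle. It treats the graph as undirected and averages the local coefficient over all nodes (counting nodes with fewer than two neighbours as 0), which are simplifying assumptions and may not match the exact definitions used in the paper.

```python
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def linked_pairs(adj, node):
    """Number of neighbour pairs of `node` that are themselves connected."""
    return sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])

def local_clustering(adj, node):
    k = len(adj[node])
    if k < 2:
        return 0.0
    return 2.0 * linked_pairs(adj, node) / (k * (k - 1))

def average_local_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def global_clustering(adj):
    """Closed triplets divided by all connected triplets (transitivity)."""
    closed = sum(linked_pairs(adj, n) for n in adj)
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0

# Hypothetical network: hub "A" with 20 leaves, plus one triangle A-B-C.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
edges += [("A", f"leaf{i}") for i in range(20)]
adj = build_graph(edges)

print(global_clustering(adj), average_local_clustering(adj))
```

In this toy graph the hub contributes a huge number of unconnected neighbour pairs, which drags the global value down, while the two small nodes in the triangle keep the average local value comparatively high.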

In general, the network of ERC20 tokens on the Ethereum network gives us insight into how fiat currencies (centralized, government-backed currencies) may move. In addition, these networks tend to have a large number of in and out edges concentrated around very few nodes, which are exchanges or other large institutions that provide services for the given token. Most entities interact with these high-degree nodes and only sparsely with other individual entities.

Source: https://fc19.ifca.ai/preproceedings/130-preproceedings.pdf