Leveraging Community Detection Algorithms for Machine Learning

Hey everyone.

Today I want to discuss an interesting new study that came out recently involving the usage of social networks as a tool to group datasets for machine learning models. A paper named The Power of Communities: A Text Classification Model with Automated Labeling Process Using Network Community Detection published by Minjun Kim and Hiroki Sayama on September 25^th, 2019 highlights a useful application of network logic and analysis as it relates to training machine learning text classification models.

If anyone has worked in data science before, you’ll understand the enormous amount of time that is spent on ETL – extract, transform, and load. On top of that, the data needs to be labelled and feature engineered to be able to extract useful insights from it. These two researchers describe how supervised and semi-supervised data are often associated with pre-defined keywords or data which impacts classification. The other clustering algorithms, such as k-means relies biases towards words which repeatedly appear in different contexts, biasing the model and introducing unnecessary ambiguity. The paper explains Kim and Sayama’s methods on how to apply a network community detection algorithm in grouping the preprocessed sentences into different communities and trying to extract insights from that.

Particularly interesting, is that the method for network detection in their paper is the Louvain modularity algorithm for network community detection. This algorithm is based on evaluating density of network links. This relates well to our class discussions on strong and weak ties, as the Louvain algorithm measures modularity as a value between (-1, 1) of the density of links inside communities compared to links between communities. This Louvain method actually relates to the Girvan-Newman algorithm as it was based on that algorithm, instead introducing an aspect of heuristic analysis and local optimization on top of the original algorithm.

I found this topic interesting because it showcases the various applications of network theory. By vectorizing sentences, we can discern mathematical properties that relate sentences semantically with each other and draw out communities without using natural language processing. This application is especially interesting as it shows a concrete application of community detection as we saw in class, and how it relates to cutting-edge modern academia research. For anyone interested, I have included a diagram of the communities as detected in the paper by Kim and Sayama below.

Citation:

Kim, M., & Sayama, H. (2019, September 25). The Power of Communities: A Text Classification Model with Automated Labeling Process Using Network Community Detection. Retrieved September 30, 2019, from https://arxiv.org/abs/1909.11706v1.

Needham, M. (n.d.). 6.1. The Louvain algorithm. Retrieved September 30, 2019, from https://neo4j.com/docs/graph-algorithms/current/algorithms/louvain/.

Leave a Reply Cancel reply