Clustering

Clustering is used to analyze which objects are similar to each other by assigning similar items to the same cluster.
In this project, information about each character on the Star Wars Wikia has been gathered and used as the dataset. After gathering the data for each character, very frequent and very rare terms were removed, since they do not help distinguish one character from another; only terms with a frequency between 10% and 80% are kept.
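The frequency filter can be sketched in plain Python. This is a minimal sketch that assumes the 10% and 80% thresholds refer to document frequency (the share of character pages a term appears in) and that each page has already been tokenized into a list of terms; the function name filter_by_document_frequency is hypothetical. In scikit-learn, passing min_df=0.1 and max_df=0.8 to CountVectorizer or TfidfVectorizer applies the same kind of proportional cut-off.

```python
from collections import Counter

def filter_by_document_frequency(docs, min_df=0.10, max_df=0.80):
    """Keep only terms whose document frequency lies in [min_df, max_df].

    docs: list of token lists, one per character page (assumed pre-tokenized).
    Returns the filtered documents and the set of kept terms.
    """
    n_docs = len(docs)
    # Count in how many documents each term occurs, not how often it occurs overall.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {t for t, count in df.items() if min_df <= count / n_docs <= max_df}
    # Drop every occurrence of a term that fell outside the frequency band.
    filtered = [[t for t in doc if t in vocab] for doc in docs]
    return filtered, vocab
```

A term such as "the", which appears on every page, exceeds the 80% ceiling and is removed; a term found on only a handful of pages falls below the 10% floor.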

Afterwards, the dataset was transformed using TF-IDF (term frequency-inverse document frequency), which gives the highest weights to terms that occur often on one character's page but rarely across the rest of the dataset, i.e. the most relevant and descriptive terms.
However, since anyone can edit the content of the wiki, the resulting clusters may not always match the groupings that would normally be expected.
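The TF-IDF weighting can be illustrated with a small sketch. It assumes one common variant, term frequency as count divided by document length multiplied by idf = log(N / df); the exact weighting used in this project is not stated, and the function name tf_idf is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (list of tokens).

    Weight = (count of term in doc / doc length) * log(N / df), where N is
    the number of documents and df the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing, while a term concentrated on a single page receives a high weight. This is why TF-IDF highlights the terms that describe a particular character.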


K-means clustering

In essence, k-means clustering starts from k randomly generated centroids. Each node is assigned to its nearest centroid, each centroid is then moved to the mean of the nodes assigned to it, and these two steps repeat until the assignments no longer change. The number of clusters is set by k: if k = 3, then 3 clusters emerge, so different values of k produce different results.
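The assign-and-update loop described above (Lloyd's algorithm) can be sketched in plain Python. This is a minimal sketch, not the project's implementation: it assumes the initial centroids are k distinct data points picked at random (one common initialization) and that each node is a tuple of numbers, such as a TF-IDF vector. The function name kmeans and its parameters are hypothetical.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initial centroids are k points sampled at
    random, so different seeds may give different clusterings.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids
```

Calling kmeans(vectors, 3) yields three clusters, and changing k changes how many groups the nodes are split into. scikit-learn's sklearn.cluster.KMeans(n_clusters=k) provides a production implementation with a smarter default initialization (k-means++).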

You can change the value of k yourself using the drop-down menu below to see which results each value produces.