Clustering is used to analyze which objects are similar to one another by assigning the items to clusters.
In this project, information about each character on the Starwars Wikia has been gathered and used as the dataset. After the data for each character had been gathered, terms with very high or very low frequency were removed, since these are of little use for classification purposes; only terms with a frequency between 10% and 80% are kept.
Afterwards, the dataset has been transformed using TF-IDF, which weights each term so that the most relevant and descriptive terms in the dataset stand out.
However, since anyone can change the content of the wiki, the resulting clusters may not always match the clusters that would otherwise emerge.
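The term filtering and TF-IDF weighting described above can be sketched as follows. This is a minimal illustration and not the project's actual code: it assumes that the 10% and 80% bounds refer to the share of documents a term appears in, it uses plain whitespace tokenization, and it uses the unsmoothed variant of the weight, tf * log(N / df). The function name tfidf, its parameters, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs, min_df=0.10, max_df=0.80):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Keep only terms whose share of documents lies within [min_df, max_df]
    # (assumed interpretation of the 10% to 80% rule).
    vocab = sorted(t for t, c in df.items() if min_df <= c / n_docs <= max_df)

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        rows.append([
            (counts[t] / total) * math.log(n_docs / df[t])
            for t in vocab
        ])
    return vocab, rows


# Hypothetical toy documents, not taken from the wiki.
docs = [
    "jedi master of the force",
    "sith lord of the dark side",
    "jedi knight and pilot of the rebellion",
    "smuggler and pilot of the falcon",
]
# Wider lower bound than the default because the toy corpus is tiny.
vocab, rows = tfidf(docs, min_df=0.30, max_df=0.80)
print(vocab)
```

With these toy documents, "of" and "the" are dropped because they occur in every document, and terms that occur in only one document are dropped as too rare, which is the effect the frequency bounds are meant to have.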
Multidimensional scaling (MDS) places the items on a plane so that the distance between any two items reflects how dissimilar they are. This makes it easy to obtain an overview of the clusters in the dataset: the closer two items are, the more similar they are.
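To make the idea concrete, the sketch below implements classical (Torgerson) MDS in plain Python. It is only one variant of multidimensional scaling, and the text does not state which variant the project used. The function name classical_mds, the use of power iteration to find the eigenvectors, and the toy distance matrix are assumptions made for illustration; in the project the distances would presumably be derived from the TF-IDF vectors.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed items in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)

    # Double-center the squared distances to obtain the Gram matrix B.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant remaining eigenvector of B.
        v = [float(i + 1) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]

        # Rayleigh quotient gives the eigenvalue that belongs to v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))

        # Coordinates along this axis are the eigenvector scaled by sqrt(eigenvalue).
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]

        # Deflate B so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical distances between four items that form a 3-by-4 rectangle.
dist = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
points = classical_mds(dist)
for p in points:
    print(p)
```

Because the toy distances come from points that really lie on a plane, the pairwise distances between the returned coordinates match the input matrix up to rotation and reflection. For real text data the two-dimensional layout is only an approximation of the original distances.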