Introduction

This document explains how to use the code that accompanies the subspace clustering project (SuClust) to perform the related tasks. For more detailed information, refer to the reports (thesis or slides).

Overview

The project consists of 3 main parts, each of which corresponds to a phase in the workflow:

  • Preprocessing: a set of scripts, mostly written in R, that handle dataset processing (cleaning, normalization, etc.).
  • Clustering: implementations of several subspace clustering algorithms (in KNIME, Python, etc.). They perform the cluster analysis on the data produced by the preprocessing phase and write the results to text files in a specific format.
  • Post-processing: Python programs that perform the following tasks on the files generated by the clustering phase: redundancy filtering and measure scoring/ranking.
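
To give a rough idea of what the post-processing phase involves, here is a minimal sketch of redundancy filtering followed by ranking. It is not the SuClust implementation: the SubspaceCluster type, the containment-based redundancy rule, the score field, and the function names are all assumptions made for illustration, and the real programs read their input from the clustering output files rather than from an in-memory list.

```python
from typing import FrozenSet, List, NamedTuple


class SubspaceCluster(NamedTuple):
    """Hypothetical in-memory representation of one subspace cluster."""
    dims: FrozenSet[int]      # indices of the relevant dimensions
    objects: FrozenSet[int]   # row indices of the member objects
    score: float              # assumed quality measure, higher is better


def filter_redundant(clusters: List[SubspaceCluster]) -> List[SubspaceCluster]:
    """Drop every cluster whose dimensions and objects are both contained
    in a cluster that has already been kept (an assumed redundancy rule)."""
    kept: List[SubspaceCluster] = []
    # Visit larger clusters first so that a contained cluster is always
    # compared against the cluster that contains it.
    ordered = sorted(
        clusters,
        key=lambda c: (len(c.objects), len(c.dims)),
        reverse=True,
    )
    for c in ordered:
        if not any(c.dims <= k.dims and c.objects <= k.objects for k in kept):
            kept.append(c)
    return kept


def rank(clusters: List[SubspaceCluster]) -> List[SubspaceCluster]:
    """Order clusters by score, best first."""
    return sorted(clusters, key=lambda c: c.score, reverse=True)


# Made-up example data, for illustration only.
clusters = [
    SubspaceCluster(frozenset({0, 1}), frozenset({1, 2, 3, 4}), 0.8),
    SubspaceCluster(frozenset({0}), frozenset({1, 2}), 0.9),  # contained in the first
    SubspaceCluster(frozenset({2}), frozenset({1, 2, 5}), 0.5),
]
ranked = rank(filter_redundant(clusters))
```

In this example the second cluster is discarded because both its dimensions and its objects are subsets of the first one, so only two clusters remain to be ranked. Other redundancy criteria (for instance, overlap thresholds) would fit the same structure.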

Source code

The up-to-date source code is hosted on GitHub, in the repository subspace_clustering.

Prerequisites

To get started with SuClust, one must have the following installed:

  • For executing Python scripts (pre-processing, post-processing, clustering):
  • For performing cluster analysis with OpenSubspace (clustering):
  • For running R scripts (pre-processing):