This part introduces several big data processing techniques. Association rule mining finds relationships in large data sets, and frequent pattern mining is an important technique for it. The Apriori algorithm is a classic algorithm that detects these frequent item sets and generates rules from them. Compared with the Apriori algorithm, the FP-Growth algorithm needs fewer scans of the database to extract frequent item sets and
performs the mining even faster. Besides rule mining, cluster analysis is another technique for analyzing big data sets. The most common algorithm for this is K-Means, which minimizes the squared Euclidean distance between the entities of a cluster and the cluster center. Furthermore, the K-Means++ algorithm will be discussed, which extends K-Means with a seeding step that chooses the initial cluster centers. With this preprocessing, K-Means converges much faster and reaches a solution that is, in expectation, O(log k)-competitive with the optimal K-Means solution.
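To illustrate the seeding idea, the following is a minimal sketch in plain Python. It assumes that points are tuples of numbers and that the data contains at least k distinct points. The function names (`squared_distance`, `kmeans_pp_seeds`, `kmeans`) and the fixed iteration count are illustrative assumptions and are not taken from the paper.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng):
    """Choose k initial centers with the K-Means++ seeding rule.

    Assumes the data contains at least k distinct points.
    """
    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # The next center is drawn with probability proportional to that distance,
        # so points far from all current centers are preferred.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, rng, iterations=20):
    """Run standard K-Means iterations starting from K-Means++ seeds."""
    centers = kmeans_pp_seeds(points, k, rng)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers
```

A call such as `kmeans(points, 2, random.Random(0))` returns two cluster centers as tuples. Passing the random number generator explicitly is a design choice of this sketch that keeps runs reproducible.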
Finally, Google’s MapReduce will be discussed. It is a parallel and distributed
computing framework for processing big data on a cluster of hundreds or thousands of computers.
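The programming model behind MapReduce can be sketched with a word count that runs in a single Python process. This is only an illustration of the map, shuffle, and reduce steps. The function names are hypothetical and are not part of Google's MapReduce interface, and a real deployment would distribute the map and reduce tasks across the machines of the cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word of one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of input documents."""
    # In a real cluster, each document (split) would be mapped on a different worker.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Each key group could likewise be reduced on a different worker.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because every call to `map_phase` and `reduce_phase` depends only on its own input, the calls are independent of each other, which is the property that lets the framework run them in parallel.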
The full paper can be accessed here.