Since the Business Intelligence area is an important topic for the decision making and reporting of companies, things have changed. Today it is possible to get storage units for less than $600 to store all of the worlds music . The ability to store and to work with data and after that the usage of results to perform further analysis becomes even more accessible as with trends such as Moore’s Law .
Big Data is a term defining data that has three main characteristics. First, it involves a great volume of data. Second, the data cannot be structured into regular database tables because of variety and third, the data is produced with great velocity and must be captured and processed rapidly . In other literature there is one additional keyword, Veracity . The use of Big Data is needed in order to discover trends or hidden data over a course of activities. In addition to transactional data, there is data from the web, e.g. social networks or sensor networks. Having a closer look to changes in our environment during the last
years, we should also mention radio-frequency identification (RFID) tags, geodata (GPS), data from satellites or further medical aspects, which all creates a variety of data. Because of this high volume of data, it is not possible to insert this in traditional databases which already contain terabytes or petabytes of data. Furthermore the use of sensors often leads to a high velocity of new data.
New information is created every second and might be computed in real time. With this there is also a challenge of veracity, because there can be wrong values, which have to be detected. The main purpose of this paper is to present different techniques for processing Big Data. The remainder of this paper is organized in this manner. Section II discusses some issues based on the research at CERN and shows some problems which are related to Big Data. Section III discusses advanced and related techniques of Big Data. Section IV discusses techniques of the area Data Mining presenting the most important algorithms and methods to analyze large data sets. Finally section V concludes the aspects described in this paper.