Big Data Analytics
Big Data refers to datasets whose size and nature exceed the capabilities of traditional database systems for management, storage, and analysis. It combines several features, including data variety, velocity, volume, and veracity, that give institutions a chance to gain competitive advantage in today's market. This flood of data affects every existing domain, including science, business, art, and government.
Big Data Characteristics
Big data is commonly described by three main characteristics, Volume, Variety, and Velocity, often extended with Value, Veracity, and Complexity. Volume refers to the massive amount of data produced daily; technologies such as social media, smartphones, computers, sensors, and websites generate data at scales that can reach tens of zettabytes. Variety means that data exists in various formats, which can be structured, semi-structured, or unstructured. Velocity refers to the pace at which data is produced, combined, ingested, and processed; with big data, data is generated very rapidly, and velocity also covers the speed of data availability and delivery. Value reflects the usefulness of the data and whether it can be turned into insights. Veracity refers to the trustworthiness (quality) of the data, including its accuracy, certainty, and precision; it covers both the reliability of the source and the suitability of the data for its intended users, and data typically goes through quality testing before processing. Finally, Complexity refers to the degree of interdependence and interconnectedness in the structures of big data.
Big Data Analytics Techniques
With big data, the data takes complex forms that include structured, semi-structured, and unstructured formats. Thus, unlike regular data analysis, more sophisticated techniques are needed. For big data analytics, various techniques are used, such as machine learning, text analytics, data mining, statistical analytics, visualization, predictive analytics, and deep learning.
Machine learning (ML) approaches provide different techniques to detect relationships and make predictions by modeling patterns and correlations in big data. ML algorithms are categorized by learning form into supervised learning, unsupervised learning, and reinforcement learning. ML approaches can also be classified into clustering approaches, regression techniques, density estimation methods, and dimensionality reduction approaches. Examples of ML approaches include artificial neural networks (ANN), decision trees, support vector machines (SVM), deep learning (DL), clustering, and classification.
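To make the supervised-learning category concrete, the following is a minimal, illustrative sketch (not drawn from the text, and deliberately simpler than the approaches listed above): a nearest-centroid classifier that learns one centroid per class from labeled examples and then predicts the class whose centroid is closest. The data, labels, and function names are invented for the example.

```python
import math

def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Toy labeled training data: two well-separated clusters.
X = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.8, 8.3]]
y = ["low", "low", "high", "high"]

centroids = fit_centroids(X, y)
print(predict(centroids, [1.1, 1.0]))  # a point near the "low" cluster
print(predict(centroids, [8.1, 7.9]))  # a point near the "high" cluster
```

The key supervised-learning ingredients are visible even at this scale: a training phase that fits a model from labeled data, and a prediction phase that applies the model to unseen inputs.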
1. Computational Intelligence (CI) is one of the approaches under ML. CI provides the ability to process complex and uncertain data sources by imitating human processing and reasoning. It comprises a number of methodologies for addressing problems that traditional models cannot solve because they are too complex and uncertain to process. The basic approaches of CI are Fuzzy Logic (FL), Artificial Neural Networks (ANN), and Evolutionary Algorithms (EA).
a) Fuzzy Logic (FL) is an approach designed to analyze data that is uncertain and imprecise in nature. Based on fuzzy sets used for inference and decision making, FL provides reasoning and modeling for qualitative data. Previous research has demonstrated the effectiveness of fuzzy logic in handling data uncertainty. For example, fuzzy-logic-based systems have proven capable of processing human emotions, which contain considerable uncertainty, with satisfactory accuracy and performance. Fuzzy Logic has been utilized in different application areas of Big Data analytics, such as social networks and medicine.
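The core idea of fuzzy sets can be sketched in a few lines. The following is an illustrative example (the variable, set names, and temperature thresholds are invented, not taken from the text): triangular membership functions assign each temperature a degree of membership between 0 and 1 in the fuzzy sets "cold", "warm", and "hot", so a single value can partially belong to several sets at once.

```python
def triangular(x, a, b, c):
    """Membership rises from a to a peak at b, then falls back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp_celsius):
    """Map a crisp temperature to degrees of membership in fuzzy sets."""
    return {
        "cold": triangular(temp_celsius, -10, 0, 15),
        "warm": triangular(temp_celsius, 10, 20, 30),
        "hot":  triangular(temp_celsius, 25, 35, 50),
    }

# 13 °C is partly "cold" and partly "warm" at the same time -- exactly the
# kind of gradual, uncertain boundary that fuzzy logic is built to model.
print(fuzzify(13))
```

A fuzzy inference system would then combine such membership degrees through rules (e.g., "IF warm THEN fan speed medium") and defuzzify the result into a crisp decision.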
b) Evolutionary Algorithms (EA) are another CI approach that satisfies the requirements of big data. They have the ability to deal with the high sparseness and dimensionality of Big Data with good performance. They are also proven methods for machine learning problems such as feature selection, clustering, and others.
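As a hedged sketch of how an evolutionary algorithm tackles feature selection, the toy genetic algorithm below encodes each candidate feature subset as a bit string (1 = feature kept) and evolves a population with selection, crossover, and mutation. The fitness function here is invented purely for illustration; in practice it would be a model's validation accuracy on the selected features.

```python
import random

random.seed(0)

USEFUL = {0, 3, 7}   # hypothetical indices of truly informative features
N_FEATURES = 10

def fitness(bits):
    """Toy fitness: reward keeping useful features, penalize subset size."""
    kept = {i for i, b in enumerate(bits) if b}
    return len(kept & USEFUL) - 0.1 * len(kept)

def evolve(pop_size=30, generations=40, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if random.random() < mutation_rate else b
                     for b in child]               # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print("selected features:", [i for i, b in enumerate(best) if b])
```

Because the best individuals survive each generation unchanged, the best fitness never decreases; over the generations the population should drift toward subsets containing the informative features while shedding the rest.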
c) Artificial neural networks (ANNs) are a CI approach designed to imitate the structure and functionality of the biological brain (a neural network). They are statistical, nonlinear data modeling tools that can model complicated connections between inputs and outputs and discover available patterns. They are essential algorithms for image analysis, pattern recognition, and other tasks. However, with Big Data, ANNs suffer from poor performance because their learning process is complex and consumes considerable time and memory. Accordingly, ANNs have been adapted to distributed and parallel settings such as Hadoop and Map/Reduce in order to reduce the consumption of both time and memory. Combinations of CI methods are adopted in various application domains. This integration provides smart systems for data analysis and decision support, and is required by applications characterized by huge amounts of complicated information, where analysis is necessary for reaching practical decisions at low cost.
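The building block of an ANN can be shown at minimal scale. The sketch below (illustrative only, and nothing like the distributed Hadoop/MapReduce setting mentioned above) trains a single sigmoid neuron by gradient descent to learn the logical AND function; real ANNs stack many such units into layers.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training data for AND: the output is 1 only when both inputs are 1.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

# One neuron: two weights and a bias, randomly initialized.
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
lr = 0.5  # learning rate

for _ in range(5000):
    for x, target in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Gradient of squared error through the sigmoid: d(out)/dz = out*(1-out)
        grad = (out - target) * out * (1 - out)
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad

for x, target in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)))
```

After training, the rounded outputs reproduce the AND truth table. The time and memory cost the text mentions comes from scaling this same update loop to millions of weights and samples, which is what motivates the distributed implementations.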
2. Deep Learning (DL) is another popular ML approach used to discover correlations and extract relevant information from complex datasets. It performs deep analysis to discover hidden hierarchies, unobserved correlations, and unnoticed parameters, and to capture the complicated distribution of variables. It supports various types of data, such as text, images, and sound. It is founded on ANNs with many hidden layers, each containing numerous neurons (processing units) with a nonlinear activation function such as the sigmoid, which makes it a very powerful tool for modeling complex data. It can use unsupervised learning to generate representations that are then used to train a classifier with supervised learning. It has been utilized in different Big Data applications such as pattern recognition, image analysis, genomic medicine, and text mining. Many analytics techniques for big data are built on deep learning, as it combines classification, optimization, and statistical estimation.
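Why hidden layers matter can be illustrated with XOR, a function no single neuron can represent. The following is a hedged, minimal sketch (all sizes and hyperparameters are invented for the example, and real deep networks use far more layers and units): a small feed-forward network with one hidden layer of sigmoid units, trained by backpropagation.

```python
import math
import random

random.seed(42)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR: not linearly separable, so a hidden layer is required.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4  # number of hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = random.uniform(-1, 1)
lr = 0.8

def forward(x):
    """Hidden-layer activations and final output for input x."""
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
         for j in range(H)]
    out = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, out

for _ in range(30000):
    for x, target in data:
        h, out = forward(x)
        # Backpropagation: the output error flows back through the hidden layer.
        d_out = (out - target) * out * (1 - out)
        for j in range(H):
            d_h = d_out * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * d_out * h[j]
            for i in range(2):
                w1[j][i] -= lr * d_h * x[i]
            b1[j] -= lr * d_h
        b2 -= lr * d_out

for x, target in data:
    print(x, round(forward(x)[1]))
```

After training, the rounded outputs should typically reproduce the XOR truth table. Deep learning applies this same layered, gradient-based idea at much greater depth, which is what lets it discover the hidden hierarchies described above.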