Machine learning is a branch of data analysis in which model building is automated. Algorithms learn from previously analyzed data and improve their models based on what they find. It is an application of artificial intelligence in which computers (machines) learn from the data and adapt automatically. It is not a new technique, but it is gaining fresh momentum as big data and data analytics gain prominence.
There are essentially two types of machine learning: supervised learning and unsupervised learning.
In supervised learning, the algorithm receives a set of inputs along with the corresponding correct outputs. The machine learns by comparing its actual output with the correct output and adjusting the model accordingly. It is commonly used where historical data is available, and the trained model is then used to predict future outcomes.
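The compare-and-adjust loop described above can be sketched with a perceptron, a simple learning rule chosen here purely for illustration (the article does not name a specific algorithm). The function names, the toy AND data set and the learning rate are all assumptions made for this example.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights for a linear threshold unit from labelled examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            # Compare the actual output with the correct output.
            error = target - predicted
            # Adjust the model; nothing changes when the prediction was right.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Apply the learned threshold unit to one input."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical training data: inputs paired with the correct output (logical AND).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

On this small, linearly separable data set the rule settles on weights that reproduce the labels. Real supervised learners follow the same pattern of predicting, measuring the error and adjusting, usually with more elaborate update rules.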
In unsupervised learning, the data carries no labels saying what the correct outcome is, so the machine learns from its own analysis of the data. The objective is to explore the data and find some structure within it. It tends to work well on transactional data.
Types of Algorithms in Machine Learning
Regression Algorithms
Regression is concerned with modelling the relationship between two or more variables. The model is refined iteratively using a measure of the error in its predictions. Regression methods are an important part of statistics and have been adopted into statistical machine learning. There are several regression algorithms. Some popular ones are Least Squares Regression, Linear Regression, Stepwise Regression, etc.
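As a minimal sketch of refining a model by its prediction error, the example below fits a straight line by gradient descent on the mean squared error. The function name fit_line, the learning rate, the step count and the toy data are assumptions made for illustration; library implementations of least squares typically solve the problem in closed form instead.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Error of the current model on every training point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2 * sum(errors) / n
        # Move each parameter a small step in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

With these settings the estimates should come out very close to a slope of 2 and an intercept of 1. A learning rate that is too large makes the updates diverge, so it usually needs tuning for other data.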
Instance-Based Algorithms
Instance-based learning models make decisions using the instances, or examples, of training data that are deemed important for the model. The model builds up a database of example data and compares new data against it to find similar examples and make predictions. These models are also called “winner-take-all” or “memory-based learning” methods. Popular instance-based algorithms are k-nearest neighbour (kNN) and learning vector quantization (LVQ).
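A minimal k-nearest neighbour sketch shows the idea of storing examples and comparing new data against them. The function name knn_predict, the two-dimensional points and the labels are hypothetical, and Euclidean distance is just one possible similarity measure.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Label a query point by majority vote among its k closest stored examples."""
    # Compare the new point with every stored example.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical "database" of stored examples.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (6.0, 7.0)))
```

Nothing is learned ahead of time here: all the work happens at prediction time, which is why these methods are described as memory-based.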
Decision Tree Algorithms
Decision tree methods construct a model of decisions based on the actual values of attributes in the data. Starting at the root, the tree forks at each node until a prediction is made for a given data record. The model is trained on data for classification and regression problems. Decision trees are popular because they are often fast and accurate. Some important decision tree algorithms are Classification and Regression Tree (CART), Iterative Dichotomiser 3 (ID3) and Chi-squared Automatic Interaction Detection (CHAID).
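The forking process can be sketched with a small greedy tree builder that splits on numeric thresholds using Gini impurity, loosely in the spirit of CART. It is a simplified illustration rather than a faithful implementation of any of the algorithms named above, and the function names and the toy data (hours studied, hours slept) are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when they are mixed."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels):
    """Fork recursively until no split improves purity, then emit a leaf label."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [lab for _, lab in left]),
        "right": build_tree([r for r, _ in right], [lab for _, lab in right]),
    }


def predict(tree, row):
    """Walk from the root down to a leaf for one data record."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical records: (hours studied, hours slept) with an exam outcome.
rows = [(2, 9), (3, 5), (4, 6), (6, 4), (7, 8), (8, 5)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (5, 7)))
```

On this data a single fork on the first attribute separates the two outcomes, so the tree has one decision node and two leaves. Production implementations add stopping rules such as a maximum depth to avoid overfitting.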
Bayesian Algorithms
Bayesian methods apply probability to decision making and inferential statistics, using knowledge of prior events to predict future events. They provide a mathematical model for predicting the occurrence of a target based on how often an event occurred in prior trials. This gives us a way to quantify the likelihood of an uncertain outcome by determining its probability. Popular Bayesian algorithms are Naïve Bayes, Gaussian Naïve Bayes and Bayesian Networks.
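To illustrate predicting from prior occurrences, here is a minimal sketch of a Naïve Bayes text classifier (the multinomial variant with add-one smoothing, chosen for brevity). The function names, the tiny "spam"/"ham" corpus and the smoothing constant are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count how often each label occurs and how often each word occurs under it."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, words):
    """Pick the label with the highest posterior score for the given words."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        # Prior: how often this label occurred in the training data.
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Likelihood of the word under this label, with add-one smoothing
            # so that unseen words do not force the probability to zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training corpus.
documents = [
    ["win", "money", "now"],
    ["cheap", "money", "offer"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(documents, labels)
print(classify(model, ["money", "offer"]))
print(classify(model, ["lunch", "meeting"]))
```

The "naïve" part is the assumption that words are independent of each other given the label, which is why the per-word likelihoods can simply be multiplied (added, in log space).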
Clustering Algorithms
Clustering techniques use inherent structures or attributes within the data to organize it into clusters, or groups, with maximum commonality. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. Several data mining applications use clustering techniques. However, the choice of clustering algorithm depends on the type of data set, and no single algorithm works for all cases. Popular clustering algorithms are k-means clustering, fuzzy c-means clustering, hierarchical clustering, etc.
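A minimal k-means sketch shows how similar objects end up grouped together. The function name k_means, the fixed starting centroids and the sample points are assumptions; practical implementations usually pick starting centroids randomly and stop once the assignments no longer change.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
initial_centroids = [(0.0, 0.0), (10.0, 10.0)]

centroids, clusters = k_means(points, initial_centroids)
print(centroids)
print(clusters)
```

No labels are supplied anywhere in this example, which is what makes it unsupervised: the grouping comes entirely from the distances between the points.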