Machine Learning is sometimes used as a synonym for Artificial Intelligence and is being adopted across the technology industry. Machine Learning is a set of algorithms that learn to make decisions from an input dataset and improve those decisions as they gain experience over time. In this blog, we will discuss the most commonly used basic machine learning terminologies that are a must-know before delving into the Machine Learning world. Here are ten basic machine learning terminologies.
Ten basic machine learning terminologies
Algorithm: An algorithm is a set of steps for solving a problem. The problem can be as simple as adding two numbers or as complex as mapping the stars in our Milky Way. For example, consider an algorithm to add two numbers. Its steps are as follows:
Step 1: Read the first number.
Step 2: Read the second number.
Step 3: Find the sum of the first and second numbers.
Step 4: Print the sum.
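The four steps above can be sketched in Python. This is only an illustration: the function name add_two_numbers is our own, and the two numbers are passed in as arguments instead of being read from the keyboard so that the example runs without user input.

```python
def add_two_numbers(first: float, second: float) -> float:
    """Steps 1 to 3: take the first and second numbers and find their sum."""
    return first + second


# Step 4: print the sum (the inputs 2 and 3 are arbitrary example values).
result = add_two_numbers(2, 3)
print(result)
```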
Machine Learning Algorithm: Machine learning algorithms are a category of algorithms that allow a software application to learn from its inputs and predict future outcomes without being explicitly programmed. They have two important goals. First, the algorithm must learn from its own experience and predict future outcomes with increasing accuracy. Second, the algorithm must learn without human intervention and adjust its actions accordingly.
Dataset: Data is the basic input on which any machine learning algorithm thrives. Data can represent just about anything, from something as simple as the height and weight of a person to something as complex as the movement of a butterfly. A collection of data is called a dataset and can take the form of a table, a collection of images, etc. At a basic level, a dataset is divided into two sets, with the data for each set selected at random. The larger set, the training dataset, is used to train the learning algorithm and generate the model. The remaining data, known as the test dataset, is used to test the predictions and accuracy of the generated model.
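A random split into a training dataset and a test dataset could look like the sketch below. The helper split_dataset, the 80/20 ratio, the fixed seed and the height and weight values are all assumptions made for this illustration, not a prescribed recipe.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (training set, test set)."""
    shuffled = list(rows)                   # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)   # random selection, repeatable via seed
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up dataset: each row is (height in cm, weight in kg).
dataset = [(150 + i, 50 + i) for i in range(10)]

train_set, test_set = split_dataset(dataset)
print(len(train_set), len(test_set))
```

The larger part goes to training and the remainder to testing. Fixing the seed only makes the example repeatable; in practice the ratio and the seed are choices left to the user.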
Features: A single column of data in a dataset is called a feature. It is also known as an attribute of a data instance (a single row of the dataset). In a dataset, some features serve as inputs to the learning algorithm, while others represent the output, that is, the features to be predicted. Features are the most important building blocks of any dataset, and the features selected as inputs directly affect the predictions of the generated model. Careful feature selection is therefore a must for a good prediction model, and it is one of the most difficult tasks in machine learning.
Target: A target is the feature in the dataset that the user wants to understand or predict. For example, if a dataset contains the historical prices of stocks, an algorithm can be used to predict the price in the near future. Here the price is the target feature that the user wants to predict based on the dataset.
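To make the difference between input features and the target concrete, here is a small sketch that separates the two. The column names (open, volume, close) and all the values are invented for this example. Using close as the target is simply our assumption about what the user wants to predict.

```python
# Made-up stock dataset: each dictionary is one data instance (one row).
dataset = [
    {"open": 101.0, "volume": 1200, "close": 103.5},
    {"open": 103.5, "volume": 900, "close": 102.0},
    {"open": 102.0, "volume": 1500, "close": 105.0},
]

feature_names = ["open", "volume"]  # input features (columns) given to the algorithm
target_name = "close"               # the feature we want to predict

# X holds the input features of every row, y holds the matching target values.
X = [[row[name] for name in feature_names] for row in dataset]
y = [row[target_name] for row in dataset]

print(X)
print(y)
```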
Prediction: Prediction is the output of the trained model for a given input. The model is generated using a training dataset and validated for accuracy using the test dataset. Once the model reaches the desired level of accuracy, it is used to predict outcomes for new inputs. For example, a trained weather forecasting model can be used to predict the level of humidity during the rainy season.
Model: A model is generated as a result of training a machine learning algorithm. The generated model can predict the target feature for given input features. The accuracy of the predictions depends heavily on the quality of the dataset used to train the model.
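The relationship between training, model and prediction can be sketched with a deliberately tiny learning algorithm. Everything here is an assumption made for illustration: the functions train, predict and accuracy are our own names, the "hours studied" data is made up, and the algorithm (remember the average feature value of each class, then pick the closest average) is just one simple choice among many.

```python
def train(examples):
    """Learning algorithm: the returned model maps each label to the
    average feature value seen for that label in the training data."""
    totals = {}
    for value, label in examples:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


def predict(model, value):
    """Prediction: the label whose learned average is closest to the input."""
    return min(model, key=lambda label: abs(value - model[label]))


def accuracy(model, examples):
    """Fraction of examples for which the prediction matches the known label."""
    correct = sum(predict(model, value) == label for value, label in examples)
    return correct / len(examples)


# Made-up data: (hours studied, exam result).
training_set = [(1.0, "fail"), (2.0, "fail"), (3.0, "fail"),
                (6.0, "pass"), (7.0, "pass"), (8.0, "pass")]
test_set = [(2.5, "fail"), (6.5, "pass"), (9.0, "pass")]

model = train(training_set)       # training generates the model
print(model)
print(predict(model, 5.5))        # prediction for a new input
print(accuracy(model, test_set))  # validation on the test dataset
```

The model here is nothing more than the values learned from the training dataset, which is why the quality of that dataset matters so much.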
Classification: Classification means assigning the prediction of the algorithm to a class or group. Based on the number of classes, classification can be of two types:
- Binary Classification: When the prediction of an algorithm is classified into two sets or groups, it is known as Binary Classification. For example, customer feedback can be either positive or negative, and a candidate attending an interview is either selected or not.
- Multiclass Classification: When the prediction of an algorithm is classified into more than two sets or groups, it is known as Multiclass Classification. For example, based on a person’s emotion, he or she can be categorized as happy, surprised, sad or angry. Another example is predicting whether today’s weather will be sunny, stormy, windy or rainy.
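The two types of classification can be sketched as the final step that turns a model's numeric output into a class. We assume here that some trained model has already produced a score. The function names, the 0.5 threshold and the score values are all hypothetical.

```python
def classify_binary(score, threshold=0.5):
    """Binary classification: exactly two possible classes."""
    return "positive" if score >= threshold else "negative"


def classify_multiclass(scores):
    """Multiclass classification: pick the class with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical score for a piece of customer feedback.
print(classify_binary(0.83))

# Hypothetical scores for today's weather.
weather_scores = {"sunny": 0.10, "stormy": 0.20, "windy": 0.15, "rainy": 0.55}
print(classify_multiclass(weather_scores))
```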
Regression: A regression algorithm is used to predict values that are continuous in nature, such as the agricultural output for a season or the price of a stock in the coming day or month. Depending on the algorithm used, regression can be of the following types: Simple Linear, Polynomial, Support Vector, Decision Tree and Random Forest. Simple Linear Regression is among the most commonly used techniques.
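As a rough illustration of Simple Linear Regression, the sketch below fits a straight line to a handful of points using the ordinary least squares formulas and then predicts the next value. The helper fit_line and the seasonal output figures are made up for this example. A real project would normally rely on an existing library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: agricultural output (in tonnes) for five past seasons.
seasons = [1, 2, 3, 4, 5]
output_tonnes = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(seasons, output_tonnes)

# Predict a continuous value for season 6.
predicted = slope * 6 + intercept
print(predicted)
```

Unlike classification, the prediction here is a number on a continuous scale and not one of a fixed set of classes.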
Overfitting: Overfitting occurs when a learning algorithm learns the details and noise in the training dataset to such an extent that it harms the predictions of the model on unseen data. The algorithm treats noise and random fluctuations in the training data as patterns, and because these do not carry over to new data, the model generalizes poorly.
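A small sketch can make overfitting visible. Below, one model simply memorizes the training points (noise included), while the other fits a straight line. All names and numbers are made up for illustration: the training targets are roughly 2x plus some noise, and we assume the unseen data follows 2x exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and known targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data: roughly y = 2x, with noise added by hand.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.5, 3.4, 6.8, 7.3, 10.9, 11.2]

# Made-up unseen data, assumed to follow the underlying trend y = 2x.
test_x = [1.4, 2.6, 3.4, 4.6, 5.4]
test_y = [2.8, 5.2, 6.8, 9.2, 10.8]


def memorizer(x):
    """Overfit model: return the stored target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


slope, intercept = fit_line(train_x, train_y)


def line(x):
    """Simpler model: the fitted straight line."""
    return slope * x + intercept


for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name,
          mean_squared_error(model, train_x, train_y),
          mean_squared_error(model, test_x, test_y))
```

The memorizing model makes no error at all on the training data, yet its error on the unseen points is larger than that of the straight line, because it has learned the noise as if it were a pattern.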
These were a few of the basic machine learning terminologies that are a must-know.