There is a result known as the "No Free Lunch" theorem. Its essence is that no single algorithm is the best choice for every task, which is especially true for supervised learning.
For example, it cannot be said that neural networks always perform better than decision trees, or vice versa. The effectiveness of an algorithm depends on many factors, such as the size and structure of the data set.
For this reason, you have to try many different algorithms, test the effectiveness of each on a held-out test set, and then choose the best option. Of course, you should choose among algorithms that suit your task. To draw an analogy: when cleaning the house, you will most likely use a vacuum cleaner, a broom or a mop, but not a shovel.
Machine learning algorithms can be described as learning an objective function f that best maps input variables X to output variable Y: Y = f(X).
We don't know what the function f is. After all, if we knew it, we would use it directly rather than trying to learn it from data using various algorithms.
The most common task in machine learning is predicting values of Y for new values of X. This is called predictive modeling, and our goal is to make the most accurate prediction possible.
Below is a brief overview of the top 10 most popular machine learning algorithms.
1. Linear regression
Linear regression is perhaps one of the most well-known and understood algorithms in statistics and machine learning.
Predictive modeling is primarily concerned with minimizing model error, or in other words, making predictions as accurate as possible. We will borrow algorithms from various fields, including statistics, and use them for these purposes.
Linear regression can be represented as an equation that describes a straight line that most accurately shows the relationship between input variables X and output variables Y. To construct this equation, you need to find certain coefficients B for the input variables.
For example: Y = B0 + B1 * X
Knowing X, we must find Y, and the goal of linear regression is to find the values of the coefficients B0 and B1.
Various methods, such as the linear algebra solution (ordinary least squares) and gradient descent optimization, are used to estimate the coefficients of the regression model.
Linear regression has existed for more than 200 years, and during this time it has been thoroughly studied. So here are a couple of rules of thumb: remove similar (correlated) variables and remove noise from the data if possible. Linear regression is a fast and simple algorithm that makes a good first algorithm to learn.
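To make this concrete, here is a minimal sketch of fitting a simple linear regression with scikit-learn; the data points are invented purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: X is the input variable, y the output variable (values invented for the example)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

model = LinearRegression()
model.fit(X, y)

# B0 (intercept) and B1 (slope) estimated by least squares
print(model.intercept_, model.coef_[0])

# Predict Y for a new value of X
print(model.predict([[6]]))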
Singular value decomposition (SVD)
A square matrix is called orthogonal if all its columns are orthonormal: the norm of each column is equal to one and the columns are pairwise orthogonal, that is, they form an orthonormal basis. Orthogonal matrices have the following property:

A · Aᵀ = Aᵀ · A = I, and therefore A⁻¹ = Aᵀ
The singular value decomposition of a matrix is introduced by the following theorem of linear algebra: any rectangular matrix A of size m×n can be represented as the product of three matrices,

A = U · E · Vᵀ,

where U (m×m) and V (n×n) are orthogonal matrices, and E (m×n) is a rectangular diagonal matrix in which all elements except the diagonal ones (the singular values) are equal to zero.
Singular value decomposition is widely used in recommender systems. It allows you to find the bases of the space of rows and the space of columns, that is, the elementary characteristics of both spaces. For example, if the rows of the matrix correspond to readers, the columns correspond to books, and the matrix itself contains the ratings that users gave to books, then the singular value decomposition of the matrix will identify “typical readers” and “typical books.” Every real reader and every real book can be represented by a linear combination of “typical”, and then it is quite easy to calculate the expected rating of any book by any reader.
There are very few methods that allow modern computers to process huge sparse matrices of user ratings in an acceptable time, so singular value decomposition of matrices is used very widely.
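As a rough illustration, here is how a small "readers x books" rating matrix can be decomposed with NumPy; the ratings are invented for the example.

import numpy as np

# Toy "readers x books" rating matrix (values invented for the example)
A = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# A = U · E · V^T, where U and V are orthogonal and E holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the two largest singular values ("typical readers" / "typical books")
k = 2
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_approx, 2))  # low-rank approximation used to estimate expected ratings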
2. Logistic regression
Logistic regression is another algorithm that came to machine learning straight from statistics. It is good to use for binary classification problems (these are problems in which the output is one of two classes).
Logistic regression is similar to linear regression in that it also requires finding the values of the coefficients for the input variables. The difference is that the output value is transformed using a nonlinear or logistic function.
The logistic function looks like a capital S and converts any value into a number between 0 and 1. This is useful because we can apply a rule to the output of the logistic function to snap values to 0 and 1 (for example, if the output is less than 0.5, the prediction is class 0, otherwise class 1) and thus obtain class predictions.
Because of the way the model is trained, logistic regression predictions can be used to show the probability of a sample being in class 0 or 1. This is useful in cases where you want to have more rationale for making a prediction.
As with linear regression, logistic regression performs its task better if redundant and similar variables are removed. The logistic regression model is fast to train and well suited for binary classification problems.
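A minimal sketch with scikit-learn; the one-dimensional data here is invented for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary classification data (values invented for the example)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# predict_proba returns the probabilities of class 0 and class 1
print(clf.predict_proba([[3.5]]))

# predict applies the 0.5 threshold to that probability
print(clf.predict([[3.5]]))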
Machine learning tasks
Machine learning is based on the idea that analytical systems can learn to identify patterns and make decisions with minimal human intervention.
There are five key tasks in machine learning:
- regression - predicting numerical values of characteristics, for example, predicting future sales volumes based on known past sales data;
- classification - predicting which of the known classes an object belongs to, for example, predicting whether a borrower will repay a loan, based on data on how borrowers have repaid loans in the past;
- clustering - dividing a large set of objects into clusters, i.e. classes within which the objects are similar to each other, for example market segmentation: dividing all consumers into classes so that within each class consumers are similar to each other but differ from consumers in other classes;
- dimensionality reduction – reduction of a large number of features to a smaller number (usually 2–3) for the convenience of their subsequent visualization (for example, data compression);
- search for anomalies - search for rare and unusual objects that differ significantly from the bulk, for example, search for fraudulent transactions.
3. Linear Discriminant Analysis (LDA)
Logistic regression is used when you want to assign a sample to one of two classes. If there are more than two classes, then it is better to use the LDA (Linear discriminant analysis) algorithm.
The representation of LDA is quite simple. It consists of statistical properties of the data calculated for each class. For each input variable this includes:
- Average value for each class;
- Variance calculated for all classes.
Predictions are made by calculating the discriminant value for each class and selecting the class with the highest value. The data is assumed to be normally distributed, so it is recommended that you remove anomalous values from the data before you begin. It is a simple and efficient algorithm for classification problems.
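A short sketch with scikit-learn's LinearDiscriminantAnalysis on the built-in iris data:

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# A discriminant value is computed for each class; the class with the largest value wins
print(lda.predict(X[:3]))
print(lda.score(X, y))  # accuracy on the training data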
4. Decision Trees
A decision tree can be represented as a binary tree, familiar to many from algorithms and data structures. Each node represents an input variable and a split point for that variable (assuming the variable is a number).
Leaf nodes are the output variable that is used for prediction. Predictions are made by traversing the tree to a leaf node and outputting the class value at that node.
Trees are quick to learn and quick to make predictions with. In addition, they are accurate for a wide range of tasks and do not require special data preparation.
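A minimal sketch with scikit-learn's DecisionTreeClassifier on the built-in iris data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each internal node splits on one input variable; leaf nodes store the predicted class
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))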
5. Naive Bayes classifier
Naive Bayes is a simple but surprisingly effective algorithm.
The model consists of two types of probabilities that are calculated using the training data:
- Probability of each class.
- The conditional probability of each input value given each class.
Once a probabilistic model is calculated, it can be used to make predictions with new data using Bayes' theorem. If you have real data, then, assuming a normal distribution, calculating these probabilities is not particularly difficult.
Naive Bayes is called naive because the algorithm assumes that each input variable is independent. This is a strong assumption that does not correspond to real data. Nevertheless, this algorithm is very effective for a number of complex tasks such as spam classification or handwritten digit recognition.
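A minimal sketch with scikit-learn's GaussianNB, which assumes the normal distribution mentioned above:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# GaussianNB estimates the probability of each class and, assuming a normal
# distribution, the conditional probability of each feature value given each class
nb = GaussianNB()
nb.fit(X, y)
print(nb.predict(X[:3]))
print(nb.predict_proba(X[:1]))  # posterior probabilities obtained via Bayes' theorem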
Basic terms
In machine learning systems or neural network systems, there are inputs and outputs. What is fed to the inputs is usually called features.
Features are essentially the same as variables in a scientific experiment: they characterize some observed phenomenon and can be measured quantitatively.
When features are fed to the inputs of a machine learning system, the system tries to find matches, to notice a pattern between the features. The output is the result of this work.
This result is usually called a label, since the outputs have a certain label given to them by the system, i.e., an assumption (prediction) about what category the output falls into after classification.
In the context of machine learning, classification is a form of supervised learning. This type of learning implies that the data fed to the system's inputs is already labeled, with the important features already divided into separate categories or classes, so the system knows in advance what the correct answers look like and can check its own conclusions against them. An example of classification is sorting different plants into groups such as "ferns" and "angiosperms." A similar task can be accomplished using a Decision Tree, one of the classifier types in Scikit-Learn.
In unsupervised learning, the system is fed unlabeled data and must try to categorize the data itself. Since classification is a form of supervised learning, unsupervised methods will not be considered in this article.
Training a model means feeding data to the algorithm so that it learns patterns in that data. In supervised training, both features and labels are fed to the model; when predicting, only features are fed to the classifier.
The data received by the network is divided into two groups: a training set and a testing set. You should not test the network on the same set of data on which it was trained, since the model will already be “tailored” to this set.
6. K-Nearest Neighbors (KNN)
K-nearest neighbors is a very simple and very efficient algorithm. The KNN (K-nearest neighbors) model is represented by the entire training data set. Pretty simple, right?
A prediction for a new point is made by finding the K nearest neighbors in the data set and summarizing the output variable for these K instances: the mean for regression, or the most common class for classification.
The only question is how to determine the similarity between data instances. If all the features are on the same scale (centimeters, for example), the simplest way is to use the Euclidean distance, a number calculated from the differences in each input variable.
KNN may require a lot of memory to store all the data, but it will make a prediction quickly. The training data can also be updated to ensure predictions remain accurate over time.
The idea of nearest neighbors may not work well with high-dimensional data (many input variables), which will negatively affect the efficiency of the algorithm in solving the problem. This is called the curse of dimensionality. In other words, you should use only the variables that are most important for prediction.
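As a small sketch, the Euclidean distance mentioned above can be computed directly with NumPy, and KNeighborsClassifier handles the rest; the points below are invented for the example (the classifier appears again in the worked example later in the article).

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Euclidean distance: square root of the sum of squared differences per variable
a = np.array([5.1, 3.5, 1.4])
b = np.array([6.2, 2.9, 4.3])
print(np.sqrt(np.sum((a - b) ** 2)))

# KNN itself: the "model" is simply the stored training data
X = np.array([[1, 1], [1, 2], [6, 5], [7, 6]])
y = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[2, 2]]))  # majority class among the 3 nearest neighbors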
The essence of machine learning technology
Generally speaking, machine learning is the practice of teaching a computer program or algorithm to gradually improve the performance of a given task.
Machine learning refers to a variety of mathematical, statistical and computational methods for developing algorithms that can solve a problem not in a direct way, but by searching for patterns in a variety of input data.
The solution is calculated not according to a clear formula, but according to the established dependence of the results on a specific set of characteristics and their values. For example, if every day for a week the ground is covered with snow and the air temperature is significantly below zero, then most likely winter has come. Therefore, machine learning is used for diagnosis, prediction, recognition and decision-making in various applied areas: from medicine to banking.
Machine learning is not only a mathematical, but also a practical engineering discipline. Pure theory, as a rule, does not immediately lead to methods and algorithms applicable in practice. To make them work well, it is necessary to invent additional heuristics that compensate for the discrepancy between the assumptions made in the theory and the conditions of real problems. Almost no research in machine learning is complete without an experiment on model or real data that confirms the practical performance of the method.
7. Learning Vector Quantization (LVQ) networks
The disadvantage of KNN is that it requires storing the entire training data set. If KNN has shown itself well, then it makes sense to try the LVQ (Learning vector quantization) algorithm, which does not have this drawback.
LVQ is a set of code vectors. They are selected at random at the beginning and, over a certain number of iterations, adapted to best generalize the entire data set. Once trained, these vectors can be used for prediction in the same way as in KNN. The algorithm finds the nearest neighbor (the best-fitting code vector) by calculating the distance between each code vector and the new data instance. For the best-fit vector, the class (or number in the case of regression) is then returned as a prediction. A better result can be achieved if all data is in the same range, for example from 0 to 1.
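Scikit-Learn does not ship an LVQ implementation, so below is a minimal LVQ1 sketch in NumPy; the function names, the toy data and the parameter values are all invented for the example.

import numpy as np

def train_lvq1(X, y, n_codebooks_per_class=2, lr=0.3, epochs=20, seed=0):
    # Minimal LVQ1: code vectors are pulled toward samples of their own class
    # and pushed away from samples of other classes, with a decaying learning rate
    rng = np.random.default_rng(seed)
    codebooks, labels = [], []
    for c in np.unique(y):
        idx = rng.choice(np.where(y == c)[0], n_codebooks_per_class, replace=False)
        codebooks.append(X[idx].astype(float))
        labels += [c] * n_codebooks_per_class
    C, c_labels = np.vstack(codebooks), np.array(labels)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)
        for xi, yi in zip(X, y):
            j = np.argmin(np.linalg.norm(C - xi, axis=1))  # best-matching code vector
            step = rate * (xi - C[j])
            C[j] += step if c_labels[j] == yi else -step
    return C, c_labels

def predict_lvq(C, c_labels, X_new):
    # Prediction works like KNN with K=1 over the learned code vectors
    return np.array([c_labels[np.argmin(np.linalg.norm(C - x, axis=1))] for x in X_new])

# Toy data scaled to the 0..1 range, as recommended above
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.8, 0.9], [0.9, 0.8], [0.7, 0.85]])
y = np.array([0, 0, 0, 1, 1, 1])
C, c_labels = train_lvq1(X, y)
print(predict_lvq(C, c_labels, np.array([[0.2, 0.2], [0.85, 0.9]])))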
Implementing a classification example
# Import all necessary libraries
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
The iris dataset is quite common, so it is already included in Scikit-Learn and can be loaded with the following command:
sklearn.datasets.load_iris
In this example, however, we will load the data from a CSV file, which can be downloaded here.
This file must be placed in the same folder as the Python file. The Pandas library has a read_csv() function that works great for loading data.
data = pd.read_csv('iris.csv')

# Check if everything loaded correctly
print(data.head(5))
Because the data has already been prepared, it does not require lengthy preprocessing. The only thing you might need is to remove unnecessary columns (for example, Id), like this:
data.drop('Id', axis=1, inplace=True)
Now we need to define the features and labels. With the Pandas library, you can easily "slice" the table and select specific rows/columns using the iloc indexer:
# ".iloc" accepts row_indexer, column_indexer X = data.iloc[:,:-1].values # Now select the desired column y = data['Species']
The code above selects every row and every column except the last one, which contains the labels.
You can also select only the features of the data set you are interested in by passing the column names in brackets:
# Alternative way to select the desired columns:
X = data[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm']]
Once you have selected the desired features and labels, they can be split into training and test sets using the train_test_split() function:
# test_size shows how much data to allocate to the test set
# random_state is simply a seed for random generation
# This parameter can be used to recreate a specific result:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)
To ensure that the data is processed correctly, use:
print(X_train)
print(y_train)
Now you can instantiate the classifiers, for example a support vector machine and k-nearest neighbors:
SVC_model = SVC()

# In the KNN model, you need to specify the n_neighbors parameter
# This is the number of points that the classifier will look at
# to determine which class a new point belongs to
KNN_model = KNeighborsClassifier(n_neighbors=5)
Now we need to train these two classifiers:
SVC_model.fit(X_train, y_train)
KNN_model.fit(X_train, y_train)
These calls train the models; the classifiers can now make predictions, with the results stored in variables.
SVC_prediction = SVC_model.predict(X_test)
KNN_prediction = KNN_model.predict(X_test)
Now it's time to evaluate the accuracy of the classifier. There are several ways to do this.
You need to pass the predictions along with the actual correct labels, whose values were saved earlier.
# Accuracy score is the simplest way to evaluate a classifier's performance
print(accuracy_score(y_test, SVC_prediction))
print(accuracy_score(y_test, KNN_prediction))

# But the confusion matrix and classification report give more information about performance
print(confusion_matrix(y_test, SVC_prediction))
print(classification_report(y_test, KNN_prediction))
Here, for example, is the output of these metrics:
SVC accuracy: 0.9333333333333333
KNN accuracy: 0.9666666666666667
At first glance, KNN appears more accurate. Here is the confusion matrix for SVC:
[[ 7  0  0]
 [ 0 10  1]
 [ 0  1 11]]
The correct predictions run along the diagonal from the upper left corner to the lower right. For comparison, here is the classification report for KNN:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         7
Iris-versicolor       0.91      0.91      0.91        11
 Iris-virginica       0.92      0.92      0.92        12

      micro avg       0.93      0.93      0.93        30
      macro avg       0.94      0.94      0.94        30
   weighted avg       0.93      0.93      0.93        30
8. Support Vector Machine (SVM)
Support Vector Machine is probably one of the most popular and discussed machine learning algorithms.
A hyperplane is a line dividing the space of input variables. In a support vector machine, the hyperplane is chosen so that it best separates the points in the space of input variables by their class: 0 or 1. In two dimensions, this can be pictured as a line that completely separates the points of the two classes. During training, the algorithm looks for the coefficients that define the hyperplane which best separates the classes.
The distance between the hyperplane and the nearest data points is called the margin. The best, or optimal, hyperplane separating the two classes is the one with the largest margin. Only these nearest points matter for determining the hyperplane and constructing the classifier; they are called support vectors. Special optimization algorithms are used to find the coefficient values that maximize the margin.
Support Vector Machine is probably one of the most effective classical classifiers, which is definitely worth paying attention to.
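As a small addition to the worked example above, the support vectors that SVC finds can be inspected directly; this sketch uses the built-in iris data and a linear kernel.

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

svc = SVC(kernel='linear')
svc.fit(X, y)

# The training points that define the separating hyperplanes
print(svc.support_vectors_.shape)
print(svc.n_support_)  # number of support vectors per class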
Where can you get education in machine learning?
GeekUniversity together with Mail.ru Group opened the first Artificial Intelligence faculty in Russia teaching machine learning.
School-level knowledge is enough to start studying. You will have all the necessary resources and tools, plus a full course in higher mathematics - not abstract, as in ordinary universities, but built around practice. The training will introduce you to machine learning technologies and neural networks and teach you how to solve real business problems.
After studying you will be able to work in the following specialties:
- Artificial intelligence,
- Machine learning,
- Neural networks,
- Big data analysis.
Features of studying at GeekUniversity
After a year and a half of practical training, you will master modern Data Science technologies and acquire the competencies needed to work in a large IT company, receiving a professional retraining diploma and a certificate.
Training is conducted on the basis of state license No. 040485. Based on the results of successful completion of training, we issue graduates with a diploma of professional retraining and an electronic certificate on the GeekBrains and Mail.ru Group portals.
Project-based learning
Training takes place in practice; programs are developed jointly with specialists from market leading companies. You will solve four data science project problems and apply the skills you learn in practice. A year and a half of training at GeekUniversity = a year and a half of real world big data experience for your resume.
Mentor
During the entire training you will have a personal assistant-curator. With their help, you can quickly sort out problems that would otherwise take weeks. Working with a mentor doubles the speed and quality of learning.
Thorough mathematical training
Professionalism in Data Science is 50% ability to build mathematical models and another 50% ability to work with data. GeekUniversity will improve your knowledge in mathematical analysis, which will definitely be tested during an interview at any serious company.
9. Bagging and random forest
Random Forest is a very popular and effective machine learning algorithm. This is a type of ensemble algorithm called bagging.
Bootstrapping is an effective statistical method for estimating a quantity such as the mean: you take many subsamples from your data, calculate the mean of each, and then average the results to get a better estimate of the true mean.
Bagging uses the same approach, but for estimating entire statistical models, most often decision trees. The training data is split into many samples, and a model is built for each of them. When a prediction needs to be made, each model makes its own, and the predictions are then averaged to give a better estimate of the output value.
In the random forest algorithm, decision trees are built on bootstrap samples of the training data, and when constructing each tree, a random subset of features is considered for each node. Individually, the resulting models are not very accurate, but when combined, the quality of prediction improves significantly.
If a high variance algorithm, such as decision trees, performs well on your data, that result can often be improved by using bagging.
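A brief sketch comparing bagged decision trees with a random forest in scikit-learn, again on the built-in iris data:

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Bagging: many trees, each trained on a bootstrap sample of the data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Random forest: bagging plus a random subset of features at each split
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())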
Machine learning methods
Machine learning is divided into three main types:
- With a teacher (Supervised machine learning).
- Without a teacher (Unsupervised machine learning).
- Deep learning.
Let's take a closer look at each of the methods and their fundamental differences.
With a teacher (Supervised machine learning)
For convenience, we will consider this method using a hypothetical example of analyzing students' aptitude for school subjects: data about students and the results they achieve is entered into the program.
The teacher is the person who enters the data into the computer. Let's say they added the following table to the database:
| Student name | Class | IQ | Gender | Mentality | Age | Top performing subject |
|---|---|---|---|---|---|---|
| Oleg | 8 | 120 | Male | Technical | 15 | Geometry |
| Victoria | 8 | 100 | Female | Creative | 15 | Literature |
| Ivan | 8 | 110 | Male | Humanitarian | 14 | History |
| Igor | 8 | 105 | Male | Technical | 15 | Physics |
| Maria | 8 | 120 | Female | Humanitarian | 14 | Literature |
Based on this data, the program can build cause-and-effect relationships and help students with career guidance. For example, it may suggest that Maria could enter a philology faculty, because she received the highest grade in literature and has a humanitarian mindset, while Oleg, with his aptitude for technical sciences and good results in geometry, could look towards the profession of design engineer.
That is, the teacher gives the computer a dataset: the input information (gender, age, IQ, mentality, class) together with the data on study results, in effect asking: "here is the data, it influences the future profession - how do you think it does?" And the more input data there is, the more accurate the analysis will be.
For example, programs are taught to recognize objects in photographs: the program looks through millions of images with descriptions of what is depicted in them (a tree or a cloud). It finds common features and learns to describe images on its own. Then the teacher shows an image without a description, and the program asks: "Is this a tree?" If the person answers in the affirmative, the program understands that it has drawn the right conclusions. A good example of such a system is Vision, a cloud service for embedding computer vision into applications on the Mail.Ru Cloud Solutions platform.
An object recognition system can also be used to support self-driving cars: data is collected from the vehicle's sensors and passed to people who, for example, mark the cars in the images.
Unsupervised machine learning
At the beginning of the article there was a video about how AI learned to walk. This program received a task from the developer - to get to point B. But it did not know how to do this - it was not even shown what walking looks like, but this did not stop the AI from completing the task.
For this reason, learning through games is one of the most effective approaches to machine learning. Here is a simpler example: the program receives data about how far away objects are from it and chooses how best to move in the game Snake to score more points:
Returning to the career guidance example, we can say that the program receives data about students and their performance but does not know that there is a connection between them. After processing a large amount of information, it notices that the data influence each other and draws conclusions: for example, that mentality matters more than IQ, and age matters more than gender, and so on.
This approach is used for tasks where the solution is not obvious, for example in marketing: the AI sees nothing illogical in offering a similar product to a person who does not need it, as long as it brings in money.
Neural networks can also learn not on their own but in pairs. This is how a generative adversarial network (GAN) works. It consists of two networks, G and D: the first generates samples based on real images, and the second tries to distinguish the genuine samples from the generated ones.
The technology is used to create photographs that are indistinguishable from the real thing, as well as to restore damaged or unclear images. One company that uses GAN is Facebook.
Deep learning
Deep learning can be either supervised or unsupervised, but it involves the analysis of Big Data - such a large amount of information that one computer will not be enough. Therefore, Deep Learning uses neural networks to operate.
Neural networks allow you to divide one large task into several small ones and delegate them to other devices. For example, one processor collects information and transmits it to two others. They, in turn, analyze it and pass it on to four more, who perform some more tasks and pass it on to the next processors.
This can be considered using the example of object recognition systems:
- image acquisition;
- identifying all points;
- finding lines constructed from points;
- constructing simple figures using lines;
- creating complex figures from simple ones, and so on.
That is, when receiving an image of a person, the neural network first sees points, then lines, and then circles and triangles that make up the face:
Deep learning can be used for the most unexpected purposes. For example, there is an artificial intelligence named Norman that was sent to study the most disturbing sections of Reddit: footage of dismembered people, photographs from crime scenes, creepy stories, and so on.
Norman was then asked to take a Rorschach test so that its answers could be compared with those of other AIs: where others saw flowers, animals and umbrellas, Norman saw dead men and women killed in a variety of ways.
Working with it shows how important the information that the program receives in the first stages of work is. Now the developers are conducting research that will help “cure” Norman.
A similar situation occurred with Microsoft's Tay chatbot, which communicated with people on Twitter. In just 24 hours it began posting Nazi, misogynistic and other offensive remarks, and the company later took it offline.
10. Boosting and AdaBoost
Boosting is a family of ensemble algorithms, the essence of which is to create a strong classifier based on several weak ones. To do this, first one model is created, then another model, which tries to correct errors in the first. Models are added until the training data is predicted perfectly or until the maximum number of models is exceeded.
AdaBoost was the first truly successful boosting algorithm designed for binary classification. This is the best place to start getting acquainted with boosting. Modern methods like stochastic gradient boosting are based on AdaBoost.
AdaBoost is used in conjunction with short decision trees. After the first tree is created, its performance is evaluated on each training object to determine how much attention the next tree should pay to each one. Data that is difficult to predict is given more weight, and data that is easy to predict is given less. The models are created sequentially, one after another, and each one updates the weights for the next tree. Once all the trees are built, predictions are made on new data, and each tree's contribution is weighted by how accurate it was on the training data.
Since this algorithm places a lot of emphasis on correcting model errors, it is important that there are no anomalies in the data.
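A minimal AdaBoost sketch with scikit-learn, using short decision trees (stumps) as the weak learners; the iris data is used here only for illustration.

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Weak learner: a decision stump (a tree of depth 1); each new stump focuses on
# the samples the previous ones got wrong
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0)
ada.fit(X_train, y_train)
print(ada.score(X_test, y_test))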
Examples of real-life use of machine learning
Want to see how machine learning is applied in real life? Below we will give examples of the effective use of this technology in real companies.
Google - neural networks
Google has impressive technology ambitions. It is difficult to imagine an area of scientific research to which this corporation (or its parent company Alphabet) would not contribute.
For example, in recent years, Google has been developing technologies that slow down aging, medical devices and neural networks.
The company's most significant achievement is the creation of machines at DeepMind that can dream and create unusual images.
Google is committed to exploring all aspects of machine learning, which helps the company improve classic algorithms, as well as more efficiently process and translate natural speech, improve rankings and predictive systems.
Twitter - news feed
One of the biggest changes to Twitter in recent memory is the move to an algorithm-driven news feed.
Now users of the social network can sort the displayed content by popularity or by time of publication.
At the heart of these changes is the use of machine learning. Artificial intelligence analyzes each tweet in real time and evaluates it based on several indicators.
The Twitter algorithm prioritizes the posts a user is most likely to like, based on his or her personal preferences.
Facebook - army of chatbots
Facebook Messenger is one of the most interesting products of the largest social platform in the world. This is because the messenger has become a kind of chatbot laboratory. When communicating with some of them, it is difficult to understand that you are not talking to a person.
Any developer can create their own chatbot for Facebook Messenger. Thanks to this, even small companies are able to offer excellent customer service.
Of course, this is not the only application of machine learning at Facebook. AI applications are used to filter out spam and low-quality content, and the company is also developing computer vision algorithms that allow computers to “read” images.
Baidu - the future of voice search
Google isn't the only search giant embracing machine learning. The Chinese search engine Baidu is also actively investing in the development of AI.
One of the company's most interesting developments is Deep Voice, a neural network capable of generating synthetic human voices that are almost impossible to distinguish from real ones. The system can imitate features of intonation, pronunciation, stress and pitch.
Baidu's latest invention, Deep Voice 2, will have a major impact on natural language processing, voice search and speech recognition systems. The new technology can also be applied in other areas, such as simultaneous interpretation and biometric security systems.
IBM - Next Generation Healthcare
The largest technology corporation IBM is abandoning its outdated business model and is actively exploring new directions. The brand's most famous product today is artificial intelligence Watson.
Over the past few years, Watson has been used in hospitals and medical centers, where it diagnoses certain types of cancer much more efficiently than oncologists.
Watson also has huge potential in retail, where it can serve as a consultant. IBM offers its product on a license basis, making it unique and more affordable.