Contact Us
SaaS Log InXRay Login
Machine Learning

10 Best Machine Learning Algorithms

Nastel Technologies®
February 20, 2022

Though we’re living through a time of extraordinary innovation in GPU-accelerated machine learning, the latest research papers frequently (and prominently) feature algorithms that are decades, in certain cases 70 years old.


Some might contend that many of these older methods fall into the camp of ‘statistical analysis’ rather than machine learning, and prefer to date the advent of the sector back only so far as 1957, with the invention of the Perceptron.


Given the extent to which these older algorithms support and are enmeshed in the latest trends and headline-grabbing developments in machine learning, it’s a contestable stance. So let’s take a look at some of the ‘classic’ building blocks underpinning the latest innovations, as well as some newer entries that are making an early bid for the AI hall of fame.


1: Transformers

In 2017 Google Research led a research collaboration culminating in the paper Attention Is All You Need. The work outlined a novel architecture that promoted attention mechanisms from ‘piping’ in encoder/decoder and recurrent network models to a central transformational technology in their own right.


The approach was dubbed Transformer, and has since become a revolutionary methodology in Natural Language Processing (NLP), powering, amongst many other examples, the autoregressive language model and AI poster-child GPT-3.


Transformers elegantly solved the problem of sequence transduction, also called ‘transformation’, which is occupied with the processing of input sequences into output sequences. A transformer also receives and manages data in a continuous manner, rather than in sequential batches, allowing a ‘persistence of memory’ which RNN architectures are not designed to obtain. For a more detailed overview of transformers, take a look at our reference article.


In contrast to the Recurrent Neural Networks (RNNs) that had begun to dominate ML research in the CUDA era, Transformer architecture could also be easily parallelized, opening the way to productively address a far larger corpus of data than RNNs.


Popular Usage

Transformers captured the public imagination in 2020 with the release of OpenAI’s GPT-3, which boasted a then record-breaking 175 billion parameters. This apparently staggering achievement was eventually overshadowed by later projects, such as the 2021 release of Microsoft’s Megatron-Turing NLG 530B, which (as the name suggests) features over 530 billion parameters.


Transformer architecture has also crossed over from NLP to computer vision, powering a new generation of image synthesis frameworks such as OpenAI’s CLIP and DALL-E, which use text>image domain mapping to finish incomplete images and synthesize novel images from trained domains, among a growing number of related applications.


2: Generative Adversarial Networks (GANs)

Though transformers have gained extraordinary media coverage through the release and adoption of GPT-3, the Generative Adversarial Network (GAN) has become a recognizable brand in its own right, and may eventually join deepfake as a verb.


First proposed in 2014 and primarily used for image synthesis, a Generative Adversarial Network architecture is composed of a Generator and a Discriminator. The Generator cycles through thousands of images in a dataset, iteratively attempting to reconstruct them. For each attempt, the Discriminator grades the Generator’s work, and sends the Generator back to do better, but without any insight into the way that the previous reconstruction erred.


This forces the Generator to explore a multiplicity of avenues, instead of following the potential blind alleys that would have resulted if the Discriminator had told it where it was going wrong (see #8 below). By the time the training is over, the Generator has a detailed and comprehensive map of relationships between points in the dataset.


By analogy, this is the difference between learning a single humdrum commute to central London, or painstakingly acquiring The Knowledge.


The result is a high-level collection of features in the latent space of the trained model. The semantic indicator for a high level feature could be ‘person’, whilst a descent through specificity related to the feature may unearth other learned characteristics, such as ‘male’ and ‘female’. At lower levels the sub-features can break down to, ‘blonde’, ‘Caucasian’, et al.


Entanglement is a notable issue in the latent space of GANs and encoder/decoder frameworks: is the smile on a GAN-generated female face an entangled feature of her ‘identity’ in the latent space, or is it a parallel branch?


The past couple of years have brought forth a growing number of new research initiatives in this respect, perhaps paving the way for feature-level, Photoshop-style editing for the latent space of a GAN, but at the moment, many transformations are effectively ‘all or nothing’ packages. Notably, NVIDIA’s EditGAN release of late 2021 achieves a high level of interpretability in the latent space by using semantic segmentation masks.


Popular Usage

Beside their (actually fairly limited) involvement in popular deepfake videos, image/video-centric GANs have proliferated over the last four years, enthralling researchers and the public alike. Keeping up with the dizzying rate and frequency of new releases is a challenge, though the GitHub repository Awesome GAN Applications aims to provide a comprehensive list.


Generative Adversarial Networks can in theory derive features from any well-framed domain, including text.


3: SVM

Originated in 1963Support Vector Machine (SVM) is a core algorithm that crops up frequently in new research. Under SVM, vectors map the relative disposition of data points in a dataset, while support vectors delineate the boundaries between different groups, features, or traits.


The derived boundary is called a hyperplane.


At low feature levels, the SVM is two-dimensional (image above), but where there’s a higher recognized number of groups or types, it becomes three-dimensional.


Popular Usage

Since support Vector Machines can effectively and agnostically address high-dimensional data of many kinds, they crop up widely across a variety of machine learning sectors, including deepfake detectionimage classificationhate speech classificationDNA analysis and population structure prediction, among many others.


4: K-Means Clustering

Clustering in general is an unsupervised learning approach that seeks to categorize data points through density estimation, creating a map of the distribution of the data being studied.


K-Means Clustering has become the most popular implementation of this approach, shepherding data points into distinctive ‘K Groups’, which may indicate demographic sectors, online communities, or any other possible secret aggregation waiting to be discovered in raw statistical data.


The K value itself is the determinant factor in the utility of the process, and in establishing an optimal value for a cluster. Initially, the K value is randomly assigned, and its features and vector characteristics compared to its neighbors. Those neighbors that most closely resemble the data point with the randomly assigned value get assigned to its cluster iteratively until the data has yielded all the groupings that the process permits.


The plot for the squared error, or ‘cost’ of differing values among the clusters will reveal an elbow point for the data:


The elbow point is similar in concept to the way that loss flattens out to diminishing returns at the end of a training session for a dataset. It represents the point at which no further distinctions between groups is going to become apparent, indicating the moment to move on to subsequent phases in the data pipeline, or else to report findings.


Popular Usage

K-Means Clustering, for obvious reasons, is a primary technology in customer analysis, since it offers a clear and explainable methodology to translate large quantities of commercial records into demographic insights and ‘leads’.


Outside of this application, K-Means Clustering is also employed for landslide predictionmedical image segmentationimage synthesis with GANsdocument classification, and city planning, among many other potential and actual uses.


5: Random Forest

Random Forest is an ensemble learning method that averages the result from an array of decision trees to establish an overall prediction for the outcome.


This article originally appeared on, to read the full article, click here.

Nastel Technologies is the global leader in Integration Infrastructure Management (i2M). It helps companies achieve flawless delivery of digital services powered by integration infrastructure by delivering tools for Middleware Management, Monitoring, Tracking, and Analytics to detect anomalies, accelerate decisions, and enable customers to constantly innovate, to answer business-centric questions, and provide actionable guidance for decision-makers. It is particularly focused on IBM MQ, Apache Kafka, Solace, TIBCO EMS, ACE/IIB and also supports RabbitMQ, ActiveMQ, Blockchain, IOT, DataPower, MFT, IBM Cloud Pak for Integration and many more.


The Nastel i2M Platform provides:


Write a comment
Leave a Reply
Your email address will not be published. Required fields are marked *
Comment * This field is required!
First name * This field is required!
Email * Please, enter valid email address!

Schedule your Meeting


Schedule your Meeting


Schedule a Meeting to Learn More