RemNote Community

Machine learning - Learning Paradigms and Algorithms

Learn the core learning paradigms (supervised, unsupervised, reinforcement) and key algorithms such as decision‑tree ensembles, support‑vector machines, and autoencoders.

Summary

Machine Learning Approaches

Introduction

Machine learning is a subset of artificial intelligence that enables computers to learn from data without being explicitly programmed. Rather than following predetermined rules, machine learning algorithms discover patterns in data and use those patterns to make predictions or decisions on new, unseen data.

At its core, machine learning can be divided into several major paradigms, each suited to different types of problems and data structures. This guide covers the fundamental approaches you need to understand: supervised learning, unsupervised learning, semi-supervised and weakly supervised learning, and reinforcement learning. We'll also explore key algorithms and techniques used within each paradigm.

Supervised Learning

Supervised learning is the most common machine learning approach. It works by learning from training data that contains both inputs (features) and correct outputs (labels).

How Supervised Learning Works

In supervised learning, you provide the algorithm with examples of input-output pairs. The training data is typically organized as a feature matrix, where each row represents one observation and each column represents a feature (a measurable property of the input). The learning process involves iteratively optimizing an objective function—a mathematical function that measures how well the model's predictions match the true outputs. By minimizing this error, the algorithm learns a function that can predict outputs for new, unseen inputs.

Two Main Types of Supervised Learning

Supervised learning problems fall into two categories:

Classification predicts categorical labels—discrete categories that an input belongs to. For example, determining whether an email is spam or not spam, or classifying images of handwritten digits as the numbers 0–9. The output is one of a finite set of classes.

Regression predicts continuous numerical values. For example, predicting house prices, stock values, or temperature.
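To make the input-output-pair idea concrete, here is a minimal sketch of supervised classification using a 1-nearest-neighbour rule. The dataset and function name are illustrative, not from the source: the model simply memorizes labelled examples and predicts the label of the closest one.

```python
import math

def nearest_neighbor_classify(train, query):
    """Predict the label of `query` as the label of the closest
    training example (1-nearest-neighbour, Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    features, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy training set: (features, label) pairs, e.g. (height_cm, weight_kg).
train = [
    ((150, 45), "small"),
    ((160, 55), "small"),
    ((180, 85), "large"),
    ((190, 95), "large"),
]

print(nearest_neighbor_classify(train, (155, 50)))  # -> small
print(nearest_neighbor_classify(train, (185, 90)))  # -> large
```

Real systems would instead fit a parameterized model by minimizing an objective function, but the contract is the same: labelled pairs in, a prediction function out.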
The output is a real-valued number rather than a category.

Similarity Learning

A specialized approach within supervised learning is similarity learning, which learns a similarity function that measures how alike two objects are. Rather than predicting a label for a single input, similarity learning predicts the degree of similarity between two inputs. This approach is commonly used in ranking systems (determining which results are most relevant), recommendation systems (suggesting items similar to ones a user liked), and verification tasks (confirming whether two items are the same, such as facial verification).

Unsupervised Learning

Unsupervised learning tackles a fundamentally different problem: discovering hidden structure in data without any labelled examples. You provide only the raw input data, and the algorithm finds meaningful patterns or groupings on its own.

Core Unsupervised Learning Tasks

Clustering groups observations into subsets called clusters, where observations within each cluster are internally similar to each other and externally dissimilar to observations in other clusters. Imagine automatically grouping customers into segments based on purchasing behavior—the algorithm discovers these groupings without being told what the segments should be.

Dimensionality reduction reduces the number of features (variables) in your data while preserving important information. Real-world datasets often contain hundreds or thousands of features, many of which may be redundant or uninformative. Dimensionality reduction extracts the principal variables—the ones that capture the most important variation in the data.

Principal Component Analysis (PCA) is one of the most widely used dimensionality reduction techniques. It works by finding new directions (called principal components) along which the data varies the most, allowing you to represent the data in fewer dimensions without losing essential information.
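The clustering idea can be sketched with a minimal k-means implementation (illustrative data, pure Python, no libraries): points are assigned to their nearest centroid, and each centroid is then moved to the mean of its assigned points.

```python
def kmeans(points, k, iterations=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its assigned points."""
    centroids = points[:k]  # deterministic init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Two obvious groups of 2-D points; the algorithm recovers them without labels.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
```

No label ever appears: the grouping emerges purely from distances between inputs, which is the defining trait of unsupervised learning.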
Density estimation learns the underlying probability distribution that generated the observed data. This is useful for detecting outliers (points that are very unlikely under the estimated distribution) or generating new synthetic data.

Self-Supervised Learning

An interesting hybrid approach is self-supervised learning, which generates its own supervisory signal from the unlabelled data itself. Rather than requiring human labels, the algorithm creates tasks from the data—for example, predicting missing parts of an input. This allows algorithms to learn useful representations from massive amounts of unlabelled data.

Semi-Supervised and Weakly Supervised Learning

These intermediate paradigms address practical situations where obtaining perfect labels is difficult or expensive.

Semi-Supervised Learning

Semi-supervised learning combines a small amount of labelled data with a much larger pool of unlabelled data to improve prediction accuracy. This is valuable because labelling data often requires human experts and is expensive, while unlabelled data is abundant and cheap to collect. The algorithm uses the labelled examples to learn initial patterns and leverages the unlabelled data to refine these patterns, often achieving better results than using the labelled data alone.

Weakly Supervised Learning

Weakly supervised learning works with noisy, limited, or imprecise labels that are cheaper to obtain than perfect labels. For example, instead of carefully labelled examples, you might have approximate labels, partial labels (where only some aspects are labelled), or noisy crowd-sourced labels from non-experts. Weakly supervised methods are designed to still learn effectively despite this label imperfection.

Reinforcement Learning

Reinforcement learning addresses a fundamentally different problem: training an agent to interact with an environment and learn an optimal strategy for maximizing cumulative reward over time.
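One common semi-supervised strategy is self-training: pseudo-label the unlabelled point the model is most confident about, add it to the labelled set, and repeat. A minimal sketch (illustrative data and function names, using nearest-distance as the confidence measure):

```python
import math

def self_train(labelled, unlabelled):
    """Self-training sketch: repeatedly pseudo-label the unlabelled point
    closest to any labelled point, then treat it as labelled."""
    labelled = list(labelled)
    unlabelled = list(unlabelled)
    while unlabelled:
        # Find the (unlabelled, labelled) pair with the smallest distance.
        point, (_, label) = min(
            ((u, l) for u in unlabelled for l in labelled),
            key=lambda pair: math.dist(pair[0], pair[1][0]),
        )
        labelled.append((point, label))   # propagate the neighbour's label
        unlabelled.remove(point)
    return labelled

# Two labelled seeds plus unlabelled points forming two chains.
labelled = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabelled = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (8.0, 8.0)]
result = dict(self_train(labelled, unlabelled))
```

With only two labelled examples, the labels spread outward through the unlabelled points, which is exactly the leverage semi-supervised learning aims for.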
The Reinforcement Learning Setup

In reinforcement learning, an agent takes actions in an environment. After each action, the environment provides two things: a new state (describing the current situation) and a reward (a numerical signal indicating how good or bad the action was). The agent's goal is to learn a policy—a strategy for choosing actions—that maximizes the total reward accumulated over many time steps.

This is different from supervised learning, where the correct answer is provided for each example. In reinforcement learning, the agent must explore different actions, observe rewards, and learn which actions lead to better outcomes in which situations.

Markov Decision Processes

Reinforcement learning environments are typically formalized as Markov decision processes (MDPs). An MDP assumes that the future state depends only on the current state and the action taken, not on the entire history. This mathematical framework provides a clean way to model the environment and solve the agent's learning problem.

Model-Based vs. Model-Free Learning

Reinforcement learning algorithms come in two main varieties:

Model-based algorithms learn an explicit model of the environment—a function that predicts what state you'll be in and what reward you'll receive given your current state and action. The agent then uses this model to plan ahead, like thinking through the consequences of future actions. This can be sample-efficient but requires the agent to learn an accurate model.

Model-free algorithms skip learning an explicit model and instead learn directly from experience. The agent takes actions, observes outcomes, and gradually learns which actions tend to produce good rewards in which situations. Examples include Q-learning and policy gradient methods. This approach requires more samples but avoids the complexity of learning an accurate environment model.
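A minimal model-free example is tabular Q-learning on a toy corridor MDP (the environment and hyperparameters here are illustrative): the agent never sees a model of the environment, only states, actions, and rewards, yet it learns to walk toward the goal.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning on a 1-D corridor: states 0..n_states-1,
    actions 0 (left) and 1 (right); reward 1 for reaching the last state."""
    rng = random.Random(0)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: mostly exploit the current Q-values, sometimes explore
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[s][act])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # model-free TD update toward observed reward plus bootstrapped value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy policy after training: action 1 (right) in every non-terminal state.
policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(5)]
```

The update rule is the temporal-difference idea mentioned later in this guide: the estimate Q[s][a] is nudged toward the reward actually observed plus the discounted value of the next state.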
Applications

Reinforcement learning has powered remarkable advances in autonomous vehicles (learning to navigate safely), game playing (achieving superhuman performance in chess, Go, and video games), and robotics (learning complex manipulation tasks).

Learning Methods and Algorithms

Now that we've covered the major learning paradigms, let's examine specific algorithms and techniques that implement these approaches.

Supervised Learning Methods

Decision tree ensembles combine predictions from multiple decision trees to create more robust and accurate models. Individual decision trees can be prone to overfitting (learning the training data too well, including its noise), but when many trees are combined—each trained slightly differently—their aggregate predictions tend to generalize better to new data. Tree ensembles are particularly effective for both classification and regression problems.

Support vector machines (SVMs) are powerful supervised learning algorithms that work by finding the optimal separating hyperplane—a decision boundary that maximizes the margin (the distance between the boundary and the nearest training examples). By maximizing this margin, SVMs find a decision boundary that separates the two classes as cleanly as possible while still being as far as possible from any individual training point. This geometric approach often produces excellent generalization to new data.

Random forest regressors apply the ensemble idea specifically to regression problems. Multiple decision trees are trained on different subsets of the data, and their predictions are aggregated (typically averaged) to produce a final prediction. This aggregation reduces variance and often yields more robust regression estimates than individual trees.

Unsupervised Feature Learning

Autoencoders are neural network architectures that learn compressed representations of data.
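The bagging idea behind random forest regressors can be sketched with depth-1 trees ("stumps") in pure Python (illustrative data and names; a real random forest also subsamples features and grows deeper trees): each stump is trained on a bootstrap resample, and predictions are averaged.

```python
import random

def fit_stump(data):
    """Fit a depth-1 regression tree: choose the split threshold that
    minimises squared error, predicting the mean on each side."""
    best = None
    for t in sorted(x for x, _ in data):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_regressor(data, n_trees=25, seed=0):
    """Train stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(s(x) for s in stumps) / n_trees

# Noisy-free step function: y = 0 for x < 5, y = 10 for x >= 5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
predict = bagged_regressor(data)
```

Averaging over resampled trees is what reduces variance: any one stump may pick an odd threshold for its bootstrap sample, but the ensemble's mean prediction is far more stable.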
An autoencoder has a bottleneck structure: input data flows through layers that progressively compress it into a small representation (the code), then through layers that decompress it back to the original input size. By forcing the network to reconstruct the input from this compressed representation, it learns which aspects of the input are most important to preserve—automatically discovering useful features without labels.

Contrastive self-supervised learning trains models to tell the difference between similar and dissimilar data pairs, all without requiring manual labels. The algorithm learns to create representations where similar items are close together and dissimilar items are far apart in representation space. This approach leverages the vast amount of unlabelled data available and has become increasingly important in modern machine learning.

Reinforcement Learning Foundations

Markov decision processes formalize the problem setting, as described earlier. They provide the mathematical framework within which reinforcement learning algorithms operate.

Temporal-difference learning is a fundamental reinforcement learning technique for estimating the value of different states (how much reward an agent can expect starting from that state). It updates value estimates by comparing the reward actually observed with the reward that was predicted, allowing the agent to gradually learn accurate value estimates through experience.

<extrainfo>

Other Approaches

Feature Learning

Feature learning (also called representation learning) aims to automatically discover useful representations of inputs, replacing the traditional process of manual feature engineering where human experts manually decide which features to compute. This is particularly important because the choice of features often determines the success or failure of a machine learning system.
Representation learning can be either:

Supervised representation learning: learning useful representations from labelled data.

Unsupervised representation learning: learning useful representations from unlabelled data.

Sparse coding is a feature learning technique that learns representations containing many zeros. Sparsity (lots of zeros) encourages the representation to be compact and efficient, using fewer non-zero elements to represent the data.

Anomaly Detection

Anomaly detection identifies rare items or events that differ significantly from the majority of the data. Applications include fraud detection (finding unusual credit card transactions), network intrusion detection (identifying suspicious network activity), and quality control (detecting defective products).

Different approaches exist depending on what data you have available:

Unsupervised anomaly detection: assumes that most data are normal and anomalies are rare. The algorithm learns the normal pattern and flags significant deviations.

Supervised anomaly detection: trains on labelled examples of both normal and abnormal cases, learning to classify new examples as one or the other.

Semi-supervised anomaly detection: models normal behaviour using unlabelled or labelled normal data, then tests whether new observations deviate significantly from this learned normal pattern.

Association Rules

Association rule learning discovers relationships and dependencies between variables in large databases. For example, it might discover that customers who buy bread often also buy milk. These relationships are evaluated using measures of "interestingness" that balance how frequently the pattern occurs with how surprising or informative it is. This approach is commonly used in market basket analysis and recommendation systems.

Rule-Based Learning

Rule-based learning identifies, learns, or evolves relational rules that capture domain knowledge.
These symbolic rules (often in the form "if X then Y") are interpretable and can be directly applied to make decisions or generate explanations. Rule-based learning bridges machine learning with knowledge representation and is particularly valued when interpretability is important.

</extrainfo>
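Two standard "interestingness" measures for association rules are support (how often the pattern occurs) and confidence (how often the consequent follows the antecedent). A minimal sketch with illustrative basket data:

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n                           # P(antecedent and consequent)
    confidence = both / ante if ante else 0.0    # P(consequent | antecedent)
    return support, confidence

# Toy market-basket data: each transaction is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
# "bread" appears in 3 of 4 baskets; "bread" and "milk" together in 2.
support, confidence = rule_stats(baskets, {"bread"}, {"milk"})
```

Here the rule "bread -> milk" has support 0.5 and confidence 2/3; practical miners such as Apriori search for all rules exceeding chosen support and confidence thresholds.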
Flashcards
What kind of data is used to build a model in supervised learning?
Training data containing both inputs and desired outputs.
In supervised learning, what is the difference between classification and regression?
Classification predicts categorical labels, while regression predicts continuous numerical values.
What is the primary goal of unsupervised learning?
To discover structure in data without labelled examples.
What are the central tasks of unsupervised learning?
Clustering, dimensionality reduction, and density estimation.
How does clustering organize observations?
Into subsets (clusters) that are internally similar and externally dissimilar.
How does self-supervised learning obtain a supervisory signal?
It generates its own signal from the data itself.
How does semi-supervised learning improve accuracy?
By using a small set of labelled examples together with many unlabelled examples.
What type of labels does weakly supervised learning typically work with?
Noisy, limited, or imprecise labels.
What is the primary objective of a reinforcement learning agent?
To maximise cumulative reward while interacting with an environment.
What mathematical framework is typically used to model environments in reinforcement learning?
Markov decision processes (MDPs).
What is the difference between model-based and model-free reinforcement learning algorithms?
Model-based algorithms use an explicit model of the environment, while model-free algorithms learn directly from experience.
What is the main benefit of feature learning over traditional methods?
It discovers useful representations of inputs, replacing manual feature engineering.
In representation learning, what determines if the process is supervised or unsupervised?
Whether it uses labelled data (supervised) or unlabelled data (unsupervised).
What is the defining characteristic of sparse coding representations?
They contain many zeros to encourage compactness.
What are the three main technical approaches to anomaly detection?
Unsupervised methods (assuming most data are normal); supervised methods (training on labelled normal/abnormal examples); semi-supervised methods (modelling normal behaviour and testing for deviations).
Why are multiple decision trees combined into an ensemble?
To improve predictive accuracy and reduce overfitting.
How do support-vector machines (SVMs) construct optimal separating hyperplanes?
By maximizing the margin between classes.
How do random forest regressors perform robust regression?
By aggregating predictions from many individual decision trees.
How do autoencoders learn compressed representations of data?
By reconstructing input data through a bottleneck layer.
What is the objective of contrastive self-supervised learning?
To train models to differentiate between similar and dissimilar data pairs without labels.
In a Markov decision process (MDP), what determines the future state?
The future state depends only on the current state and action.

Key Concepts
Learning Paradigms
Supervised learning
Unsupervised learning
Reinforcement learning
Semi‑supervised learning
Self‑supervised learning
Data Processing Techniques
Feature learning
Anomaly detection
Association rule learning
Decision tree ensemble
Support‑vector machine
Autoencoder
Decision-Making Models
Markov decision process