{"id":42761,"date":"2026-03-14T11:04:18","date_gmt":"2026-03-14T06:04:18","guid":{"rendered":"https:\/\/afzalbadshah.com\/?p=42761"},"modified":"2026-03-14T11:04:20","modified_gmt":"2026-03-14T06:04:20","slug":"mathematical-foundations-and-data-representation-in-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/afzalbadshah.com\/index.php\/2026\/03\/14\/mathematical-foundations-and-data-representation-in-artificial-intelligence\/","title":{"rendered":"Mathematical Foundations and Data Representation in Artificial Intelligence"},"content":{"rendered":"\n<p>Artificial Intelligence systems appear intelligent because they can recognize patterns, make predictions, and support decision-making. However, AI does not directly understand images, speech, language, or human behavior. Instead, these forms of information must first be converted into <strong>numerical representations<\/strong> that machines can process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Mathematics as the Language of Artificial Intelligence<\/h2>\n\n\n\n<p>Mathematics is therefore the fundamental language through which AI systems operate. Every AI model represents data numerically, transforms it, compares patterns, and improves its predictions through mathematical operations.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Artificial Intelligence relies on mathematics to represent data, measure relationships, handle uncertainty, and learn from experience.<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">1.1 Why AI Systems Depend on Mathematics<\/h3>\n\n\n\n<p>Computers cannot interpret real-world information directly. They operate on numbers according to well-defined rules. 
For this reason, every intelligent system must rely on mathematics to perform its tasks.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>AI systems depend on mathematics because machines can only process information when it is expressed in numerical form.<\/p>\n<\/blockquote>\n\n\n\n<p>Consider a simple example. When a smartphone unlocks using facial recognition, the system does not &#8220;see a face&#8221; in the human sense. Instead, it analyzes numerical patterns extracted from the image, such as distances between facial landmarks or pixel intensity values. These numbers are compared mathematically with stored patterns to determine whether the face belongs to the authorized user.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"427\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?resize=640%2C427&#038;ssl=1\" alt=\"\" class=\"wp-image-42782\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?resize=1024%2C683&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?resize=300%2C200&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?resize=768%2C512&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?resize=405%2C270&amp;ssl=1 405w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?w=1536&amp;ssl=1 1536w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-8.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">How AI recognizes your face<\/figcaption><\/figure>\n\n\n\n<p>AI systems depend on mathematics mainly for four purposes: representing information, measuring relationships between data, transforming 
inputs into meaningful outputs, and improving predictions through learning.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.2 From Data to Mathematical Representation<\/h3>\n\n\n\n<p>Before an AI system can learn from information, the data must first be converted into a numerical form.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Data representation is the process of transforming real-world information into structured numerical form so that it can be processed by an AI system.<\/p>\n<\/blockquote>\n\n\n\n<p>Different types of data require different representation methods.<\/p>\n\n\n\n<p>Text data is typically converted into numbers by assigning numerical identifiers to words or by counting their occurrences. For example, if a spam detection system monitors the words <strong>\u201cfree\u201d<\/strong>, <strong>\u201coffer\u201d<\/strong>, and <strong>\u201cwinner\u201d<\/strong>, a message containing the phrase <strong>\u201cfree offer today\u201d<\/strong> may be represented as the vector:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"226\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?resize=640%2C226&#038;ssl=1\" alt=\"\" class=\"wp-image-42787\" style=\"width:688px;height:auto\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?resize=1024%2C362&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?resize=300%2C106&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?resize=768%2C272&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?resize=604%2C214&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/text-to-matrix.png?w=1066&amp;ssl=1 
1066w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Text to Vector<\/figcaption><\/figure>\n\n\n\n<p><strong>x = [1, 1, 0]<\/strong><\/p>\n\n\n\n<p>where each value indicates whether the corresponding tracked word (<strong>free<\/strong>, <strong>offer<\/strong>, <strong>winner<\/strong>, in that order) appears in the message: 1 if it does and 0 if it does not. The word \u201ctoday\u201d is not tracked, so it does not affect the vector.<\/p>\n\n\n\n<p>Images provide another clear example of numerical representation. A digital image consists of pixels, and each pixel stores a number representing color intensity. In a grayscale image, pixel values are typically integers in the range<\/p>\n\n\n\n<p><strong>0 \u2264 p\u1d62\u2c7c \u2264 255<\/strong><\/p>\n\n\n\n<p>where <strong>p\u1d62\u2c7c<\/strong> represents the intensity of the pixel located at row <strong>i<\/strong> and column <strong>j<\/strong>, with 0 corresponding to black and 255 to white.<\/p>\n\n\n\n<p>A small grayscale image may therefore appear mathematically as the matrix <strong>I<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"208\" height=\"82\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image.png?resize=208%2C82&#038;ssl=1\" alt=\"\" class=\"wp-image-42764\"\/><\/figure>\n\n\n\n<p>This matrix representation allows algorithms to analyze patterns within the image using linear algebra operations.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"254\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?resize=640%2C254&#038;ssl=1\" alt=\"\" class=\"wp-image-42788\" style=\"width:657px;height:auto\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?resize=1024%2C406&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?resize=300%2C119&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?resize=768%2C304&amp;ssl=1 768w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?resize=604%2C239&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figur-to-metrix.png?w=1113&amp;ssl=1 1113w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Image to Matrix<\/figcaption><\/figure>\n\n\n\n<p>Sensor data is also represented numerically. For instance, an autonomous vehicle may receive values such as speed, distance to the nearest obstacle, steering angle, acceleration, and GPS coordinates (latitude and longitude). These measurements can be written as a vector:<\/p>\n\n\n\n<p><strong>s = [72, 3.5, 0.2, 15, 31.5204, 74.3587]<\/strong><\/p>\n\n\n\n<p>Through such representations, complex real-world information becomes suitable for computational analysis.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"124\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?resize=640%2C124&#038;ssl=1\" alt=\"\" class=\"wp-image-42789\" style=\"width:770px;height:auto\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?resize=1024%2C198&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?resize=300%2C58&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?resize=768%2C149&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?resize=604%2C117&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/sensor-data.png?w=1271&amp;ssl=1 1271w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Sensor data to Vector<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Examples of Data Representation in Artificial Intelligence<\/h3>\n\n\n\n<figure 
class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Data Type<\/th><th>Real World Example<\/th><th>Mathematical Representation<\/th><th>Example<\/th><\/tr><\/thead><tbody><tr><td>Text<\/td><td>Spam detection message<\/td><td>Vector<\/td><td>x = [1, 1, 0]<\/td><\/tr><tr><td>Image (grayscale)<\/td><td>3\u00d73 pixel image<\/td><td>Matrix<\/td><td>See matrix I above<\/td><\/tr><tr><td>Sensor data<\/td><td>Autonomous vehicle sensors<\/td><td>Feature vector<\/td><td>s = [72, 3.5, 0.2, 15, 31.5204, 74.3587]<\/td><\/tr><tr><td>Dataset<\/td><td>Student performance records<\/td><td>Data matrix<\/td><td>See matrix X below<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Example dataset matrix <strong>X<\/strong>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"138\" height=\"86\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-2.png?resize=138%2C86&#038;ssl=1\" alt=\"\" class=\"wp-image-42767\" style=\"width:375px;height:auto\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"239\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?resize=640%2C239&#038;ssl=1\" alt=\"\" class=\"wp-image-42790\" style=\"width:664px;height:auto\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?resize=1024%2C383&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?resize=300%2C112&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?resize=768%2C287&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?resize=604%2C226&amp;ssl=1 604w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/audio-to-metrix.png?w=1076&amp;ssl=1 1076w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Audio to vector<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">1.3 How AI Models Use Mathematical Operations<\/h3>\n\n\n\n<p>Once data has been represented numerically, AI models apply mathematical operations to discover patterns and generate predictions.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>An AI model can be viewed as a mathematical function that transforms input data into useful outputs.<\/p>\n<\/blockquote>\n\n\n\n<p>For example, a house price prediction model may take inputs such as house size and number of bedrooms. The model estimates the price using a function:<\/p>\n\n\n\n<p><strong>Price = f(Size, Bedrooms)<\/strong><\/p>\n\n\n\n<p>In many machine learning systems, predictions are produced using weighted combinations of input features. A simple linear model can be expressed as:<\/p>\n\n\n\n<p><strong>z = w\u2081x\u2081 + w\u2082x\u2082 + w\u2083x\u2083 + b<\/strong><\/p>\n\n\n\n<p>where <strong>x\u2081, x\u2082, x\u2083<\/strong> represent input features, <strong>w\u2081, w\u2082, w\u2083<\/strong> are learned weights, and <strong>b<\/strong> is a bias parameter.<\/p>\n\n\n\n<p>Neural networks extend this idea further using matrix operations. 
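<\/p>\n\n\n\n<p>The weighted sum <strong>z = w\u2081x\u2081 + w\u2082x\u2082 + w\u2083x\u2083 + b<\/strong> can be checked numerically. The short Python sketch below is only an illustration. It uses the NumPy library and a hypothetical house described by size, number of bedrooms, and location score. The weights and bias are invented values, not learned ones. The sketch computes <strong>z<\/strong> term by term and again as a dot product to show that the two forms agree.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\n\n# Hypothetical house: size (square feet), bedrooms, location score\nx = np.array([2000.0, 3.0, 8.0])\n# Hypothetical weights and bias; a trained model would learn these from data\nw = np.array([150.0, 10000.0, 5000.0])\nb = 20000.0\n\n# z = w1*x1 + w2*x2 + w3*x3 + b, written out term by term\nz_explicit = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + b\n\n# The same weighted sum expressed as a dot product\nz = np.dot(w, x) + b\n\nprint(z_explicit, z)  # 390000.0 390000.0<\/code><\/pre>\n\n\n\n<p>The dot-product form works unchanged for any number of features. Stacking several weight vectors as the rows of a matrix gives the layer form used in neural networks.<\/p>\n\n\n\n<p>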
A typical neural network layer performs the transformation:<\/p>\n\n\n\n<p><strong>z = Wx + b<\/strong><\/p>\n\n\n\n<p>where <strong>W<\/strong> is a weight matrix and <strong>x<\/strong> is the input vector.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"343\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?resize=640%2C343&#038;ssl=1\" alt=\"\" class=\"wp-image-42792\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?resize=1024%2C548&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?resize=300%2C161&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?resize=768%2C411&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?resize=504%2C270&amp;ssl=1 504w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?w=1412&amp;ssl=1 1412w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/scalar-and-vector.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Linear and Neural Network Layer<\/figcaption><\/figure>\n\n\n\n<p>Probability calculations are also frequently used. A classification model may produce an estimate such as:<\/p>\n\n\n\n<p><strong>P(spam) = 0.87<\/strong><\/p>\n\n\n\n<p>This means the model estimates an <strong>87% probability<\/strong> that the message is spam.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1.4 Overview of Mathematical Tools Used in AI<\/h3>\n\n\n\n<p>Several branches of mathematics contribute to the functioning of AI systems. 
The primary mathematical foundations of AI include linear algebra, probability, statistics, calculus, and optimization.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Linear algebra<\/strong> provides structures such as vectors, matrices, and tensors that represent data and support efficient computation.<\/li>\n\n\n\n<li><strong>Probability <\/strong>allows AI systems to reason under uncertainty and estimate the likelihood of events.<\/li>\n\n\n\n<li><strong>Statistics <\/strong>helps analyze datasets, summarize patterns, and evaluate the reliability of models.<\/li>\n\n\n\n<li><strong>Calculus <\/strong>is used in learning algorithms, particularly when adjusting model parameters to reduce prediction error.<\/li>\n\n\n\n<li><strong>Optimization <\/strong>techniques guide the process of finding the best parameters that allow the model to perform accurately.<\/li>\n<\/ol>\n\n\n\n<h1 class=\"wp-block-heading\">Part I \u2014 Representing Data Mathematically<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">2. Data Representation in Artificial Intelligence<\/h2>\n\n\n\n<p>Artificial Intelligence systems depend on mathematical structures to process information. Real-world data such as images, text, sensor measurements, and financial records must first be translated into numerical form before machine learning algorithms can analyze them.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Data representation is the process of converting real-world observations into numerical structures that computational models can process and learn from.<\/p>\n<\/blockquote>\n\n\n\n<p>Through numerical representation, AI models can measure relationships between data points, detect patterns, and generate predictions. 
This transformation allows diverse types of information to be handled using the same mathematical operations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.1 Numbers as Machine-Readable Information<\/h3>\n\n\n\n<p>Computers operate entirely on numbers. Every form of information stored inside a computer system is ultimately represented using numerical values.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Numbers are the fundamental units through which machines interpret and process information.<\/p>\n<\/blockquote>\n\n\n\n<p>For example, a temperature sensor may record a measurement such as:<\/p>\n\n\n\n<p><strong>T = 32.5<\/strong><\/p>\n\n\n\n<p>Similarly, the speed of a vehicle may be recorded as:<\/p>\n\n\n\n<p><strong>v = 72<\/strong><\/p>\n\n\n\n<p>indicating that the vehicle is traveling at <strong>72 km\/h<\/strong>.<\/p>\n\n\n\n<p>Even complex forms of information are broken down into numerical components:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>audio signals \u2192 amplitude values<\/li>\n\n\n\n<li>images \u2192 pixel intensity values<\/li>\n\n\n\n<li>text \u2192 numerical word identifiers<\/li>\n<\/ul>\n\n\n\n<p>Thus, numbers form the basic language that computers use to interpret the world.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.2 Encoding Real-World Data into Numerical Form<\/h3>\n\n\n\n<p>Before AI models can analyze information, real-world observations must be encoded numerically.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Encoding is the process of translating real-world data into numerical values that represent meaningful features.<\/p>\n<\/blockquote>\n\n\n\n<p>Consider a movie recommendation system tracking three genres:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Action<\/li>\n\n\n\n<li>Comedy<\/li>\n\n\n\n<li>Drama<\/li>\n<\/ul>\n\n\n\n<p>A user&#8217;s preference can be represented as a vector:<\/p>\n\n\n\n<p><strong>x = [1, 
0, 1]<\/strong><\/p>\n\n\n\n<p>This indicates the user likes <strong>Action and Drama<\/strong> but not <strong>Comedy<\/strong>.<\/p>\n\n\n\n<p>In natural language processing, words can also be converted into numbers.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Word<\/th><th>Numeric ID<\/th><\/tr><\/thead><tbody><tr><td>free<\/td><td>1<\/td><\/tr><tr><td>offer<\/td><td>2<\/td><\/tr><tr><td>winner<\/td><td>3<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Using this table, the message <strong>\u201cfree offer today\u201d<\/strong> may be represented by the sequence of word IDs below. The word \u201ctoday\u201d has no ID in this small vocabulary, so it is dropped:<\/p>\n\n\n\n<p><strong>[1, 2]<\/strong><\/p>\n\n\n\n<p>Unlike the presence vector <strong>x = [1, 1, 0]<\/strong>, which has one position per tracked word, this ID sequence lists the words in the order they occur, so its length depends on the message. Both encodings allow algorithms to process language mathematically.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2.3 Feature Representation in AI Systems<\/h3>\n\n\n\n<p>Machine learning models learn from <strong>features<\/strong>, which describe measurable properties of the data.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A feature is a measurable attribute or characteristic used by an AI system to represent data.<\/p>\n<\/blockquote>\n\n\n\n<p>For example, a house price prediction model might use:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>house size<\/li>\n\n\n\n<li>number of bedrooms<\/li>\n\n\n\n<li>location score<\/li>\n<\/ul>\n\n\n\n<p>These features can be written as a feature vector:<\/p>\n\n\n\n<p><strong>x = [2000, 3, 8]<\/strong><\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>2000 = house size (square feet)<\/li>\n\n\n\n<li>3 = number of bedrooms<\/li>\n\n\n\n<li>8 = location 
score<\/li>\n<\/ul>\n\n\n\n<p>When many observations are collected, they form a dataset represented as a matrix.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"165\" height=\"107\" 
src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-3.png?resize=165%2C107&#038;ssl=1\" alt=\"\" class=\"wp-image-42768\" style=\"width:268px;height:auto\"\/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>each <strong>row represents a data record<\/strong><\/li>\n\n\n\n<li>each <strong>column represents a feature<\/strong><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">2.4 Example: Representing Images, Text, and Sensor Data<\/h3>\n\n\n\n<p>Different types of real-world data are represented using mathematical structures.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Image Representation<\/h4>\n\n\n\n<p>A grayscale image is represented as a <strong>matrix of pixel values<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"196\" height=\"83\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-4.png?resize=196%2C83&#038;ssl=1\" alt=\"\" class=\"wp-image-42769\" style=\"width:342px;height:auto\"\/><\/figure>\n\n\n\n<p>Each value represents the <strong>intensity of a pixel<\/strong>.<\/p>\n\n\n\n<p>Color images extend this concept using three channels (Red, Green, Blue).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Text Representation<\/h4>\n\n\n\n<p>Text data can be represented using vectors indicating word occurrences.<\/p>\n\n\n\n<p>If a spam filter tracks the words:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>free<\/li>\n\n\n\n<li>offer<\/li>\n\n\n\n<li>winner<\/li>\n<\/ul>\n\n\n\n<p>then the message <strong>\u201cfree offer today\u201d<\/strong> can be represented as:<\/p>\n\n\n\n<p><strong>x = [1, 1, 0]<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Sensor Data Representation<\/h4>\n\n\n\n<p>Autonomous systems rely on numerical sensor measurements.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<p><strong>s = [72, 3.5, 0.2, 15, 31.5204, 74.3587]<\/strong><\/p>\n\n\n\n<p>These values may represent:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>speed<\/li>\n\n\n\n<li>distance to obstacle<\/li>\n\n\n\n<li>steering angle<\/li>\n\n\n\n<li>acceleration<\/li>\n\n\n\n<li>GPS latitude<\/li>\n\n\n\n<li>GPS longitude<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>By representing diverse real-world information as vectors, matrices, and tensors, AI systems transform complex environments into mathematical structures that algorithms can analyze and learn from.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">3. Vectors<\/h2>\n\n\n\n<p>Vectors play a central role in Artificial Intelligence because they represent structured data records.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A vector is an ordered collection of numerical values used to represent features or data points in AI systems.<\/p>\n<\/blockquote>\n\n\n\n<p>Vectors allow algorithms to compare observations, compute similarity, and perform transformations using linear algebra.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3.1 Scalars and Feature Values<\/h3>\n\n\n\n<p>A <strong>scalar<\/strong> represents a single numerical value.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<p><strong>x = 72<\/strong><\/p>\n\n\n\n<p><strong>p = 0.85<\/strong><\/p>\n\n\n\n<p>Scalars often represent individual feature values such as speed, probability, or temperature.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3.2 Vectors as Data Records<\/h3>\n\n\n\n<p>A vector can represent an entire data record.<\/p>\n\n\n\n<p>Example: student performance data.<\/p>\n\n\n\n<p><strong>x = [5, 80, 75]<\/strong><\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>5 = study hours<\/li>\n\n\n\n<li>80 = attendance<\/li>\n\n\n\n<li>75 = assignment score<\/li>\n<\/ul>\n\n\n\n<p>Each element corresponds to a feature describing the record.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3.3 Vector Operations in AI<\/h3>\n\n\n\n<p>AI algorithms 
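frequently perform mathematical operations on vectors.<\/p>\n\n\n\n<p>Common operations include:<\/p>\n\n\n\n<p>Vector addition<br><strong>a + b<\/strong><\/p>\n\n\n\n<p>Scalar multiplication<br><strong>c \u00d7 x<\/strong><\/p>\n\n\n\n<p>Dot product<br><strong>a \u00b7 b<\/strong><\/p>\n\n\n\n<p>The dot product multiplies corresponding elements and adds the results: <strong>a \u00b7 b = a\u2081b\u2081 + a\u2082b\u2082 + \u2026 + a\u2099b\u2099<\/strong>. For vectors of similar length, a larger dot product indicates that the vectors point in more similar directions. This makes it a basic building block for measuring how closely two data points are related.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3.4 Example: Measuring Similarity Between Data Points<\/h3>\n\n\n\n<p>Suppose two students are represented by vectors:<\/p>\n\n\n\n<p><strong>x\u2081 = [5, 80, 75]<\/strong><\/p>\n\n\n\n<p><strong>x\u2082 = [6, 78, 82]<\/strong><\/p>\n\n\n\n<p>AI systems may measure similarity using <strong>cosine similarity<\/strong>:<\/p>\n\n\n\n<p><strong>cos(\u03b8) = (x\u2081 \u00b7 x\u2082) \/ (||x\u2081|| ||x\u2082||)<\/strong><\/p>\n\n\n\n<p>where <strong>||x||<\/strong> is the length (Euclidean norm) of a vector. The result lies between \u22121 and 1, and values close to 1 mean the two vectors point in nearly the same direction. For the two students above, x\u2081 \u00b7 x\u2082 = 5 \u00d7 6 + 80 \u00d7 78 + 75 \u00d7 82 = 12420, ||x\u2081|| = \u221a12050 \u2248 109.77, and ||x\u2082|| = \u221a12844 \u2248 113.33. This gives cos(\u03b8) \u2248 0.998, indicating very similar profiles.<\/p>\n\n\n\n<p>This technique is widely used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>recommendation systems<\/li>\n\n\n\n<li>search engines<\/li>\n\n\n\n<li>document similarity detection<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. 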
Matrices<\/h2>\n\n\n\n<p>Matrices extend vectors by organizing multiple records into rows and columns.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A matrix is a rectangular arrangement of numbers organized into rows and columns.<\/p>\n<\/blockquote>\n\n\n\n<p>Matrices are widely used for storing datasets and performing large-scale computations.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4.1 Dataset Representation Using Matrices<\/h3>\n\n\n\n<p>Example dataset:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"176\" height=\"105\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-5.png?resize=176%2C105&#038;ssl=1\" alt=\"\" class=\"wp-image-42771\" style=\"width:268px;height:auto\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">4.2 Matrix Operations in Machine Learning<\/h3>\n\n\n\n<p>Machine learning frequently uses operations such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>matrix addition<\/li>\n\n\n\n<li>matrix multiplication<\/li>\n\n\n\n<li>matrix transpose<\/li>\n<\/ul>\n\n\n\n<p>These operations enable efficient processing of large datasets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">4.3 Matrix Multiplication in Neural Networks<\/h3>\n\n\n\n<p>Neural networks perform transformations using matrix multiplication.<\/p>\n\n\n\n<p>A neural layer can be expressed as:<\/p>\n\n\n\n<p><strong>z = Wx + b<\/strong><\/p>\n\n\n\n<p>Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>W<\/strong> = weight matrix<\/li>\n\n\n\n<li><strong>x<\/strong> = input vector<\/li>\n\n\n\n<li><strong>b<\/strong> = bias vector<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">4.4 Example: Data Transformation in AI Models<\/h3>\n\n\n\n<p>Suppose a model uses the transformation<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"263\" 
height=\"62\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/image-6.png?resize=263%2C62&#038;ssl=1\" alt=\"\" class=\"wp-image-42772\" style=\"width:579px;height:auto\"\/><\/figure>\n\n\n\n<p>Then the output vector becomes<\/p>\n\n\n\n<p><strong>z = Wx<\/strong><\/p>\n\n\n\n<p>which transforms the input into a new feature representation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">5. Tensors<\/h2>\n\n\n\n<p>Modern AI systems often work with multidimensional data structures.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A tensor is a multidimensional array used to represent complex data structures in machine learning and deep learning.<\/p>\n<\/blockquote>\n\n\n\n<p>Tensors extend vectors and matrices to higher dimensions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.1 From Vectors and Matrices to Tensors<\/h3>\n\n\n\n<p>Data structures can be organized as:<\/p>\n\n\n\n<p>Scalar \u2192 single number<br>Vector \u2192 one-dimensional array<br>Matrix \u2192 two-dimensional array<br>Tensor \u2192 array with three or more dimensions<\/p>\n\n\n\n<p>Strictly speaking, scalars, vectors, and matrices are themselves tensors with zero, one, and two dimensions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.2 Tensor Dimensions and Data Shapes<\/h3>\n\n\n\n<p>Each tensor has a <strong>shape<\/strong>, which lists the number of elements along each of its dimensions.<\/p>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<p>Vector shape \u2192 (n)<br>Matrix shape \u2192 (m, n)<br>Image tensor \u2192 (height, width, channels)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5.3 Tensors in Image Processing<\/h3>\n\n\n\n<p>Color images are typically represented as <strong>three-dimensional tensors<\/strong>.<\/p>\n\n\n\n<p>Image shape:<\/p>\n\n\n\n<p><strong>(height, width, 3)<\/strong><\/p>\n\n\n\n<p>The three channels correspond to:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Red<\/li>\n\n\n\n<li>Green<\/li>\n\n\n\n<li>Blue<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"194\" 
src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?resize=640%2C194&#038;ssl=1\" alt=\"\" class=\"wp-image-42795\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?resize=1024%2C310&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?resize=300%2C91&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?resize=768%2C232&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?resize=604%2C183&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?w=1427&amp;ssl=1 1427w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/tesnsor-1.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Tensor representation<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">5.4 Tensors in Deep Learning Frameworks<\/h3>\n\n\n\n<p>Deep learning libraries such as <strong>TensorFlow<\/strong> and <strong>PyTorch<\/strong> perform their computations on tensors. Neural networks built with these libraries learn patterns by applying mathematical operations to tensors that represent input data, weights, and intermediate outputs. 
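<\/p>\n\n\n\n<p>As a rough illustration, the Python sketch below uses NumPy arrays, which behave much like the tensors of TensorFlow and PyTorch. It builds a hypothetical 4 \u00d7 6 color image and inspects its shape. The image size, the pixel values, and the batch of ten copies are assumptions made only for this example. The sketch follows the (height, width, channels) order used in this article; PyTorch, by contrast, conventionally places the channel dimension before height and width.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\n\n# Hypothetical 4 x 6 color image stored as (height, width, channels), values 0-255\nimage = np.zeros((4, 6, 3), dtype=np.uint8)\nimage[0, 0] = [255, 0, 0]  # make the top-left pixel pure red\n\nprint(image.ndim)            # 3\nprint(image.shape)           # (4, 6, 3)\nprint(image[:, :, 0].shape)  # red channel alone is a matrix: (4, 6)\n\n# Stacking 10 images adds a fourth (batch) dimension\nbatch = np.stack([image] * 10)\nprint(batch.shape)           # (10, 4, 6, 3)<\/code><\/pre>\n\n\n\n<p>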
Through tensor operations, neural networks process large datasets efficiently.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"427\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?resize=640%2C427&#038;ssl=1\" alt=\"\" class=\"wp-image-42796\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?resize=1024%2C683&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?resize=300%2C200&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?resize=768%2C512&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?resize=405%2C270&amp;ssl=1 405w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?w=1536&amp;ssl=1 1536w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/figure-to-tensor.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Figure to Tensor<\/figcaption><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">Part II \u2014 Handling Uncertainty in Artificial Intelligence<\/h1>\n\n\n\n<p>Artificial Intelligence systems frequently operate in environments where information is incomplete or uncertain. 
Real-world data is rarely perfect, and AI models must often make decisions based on probabilities rather than certainty.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A spam filter cannot guarantee that a message is spam.<\/li>\n\n\n\n<li>A medical diagnosis system cannot be absolutely certain about a disease.<\/li>\n\n\n\n<li>A recommendation system cannot perfectly predict user preferences.<\/li>\n<\/ul>\n\n\n\n<p>Instead, AI systems estimate <strong>likelihoods<\/strong> and make decisions based on probabilities.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Probability provides the mathematical framework that allows AI systems to reason and make decisions under uncertainty.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">6. Probability in AI Systems<\/h2>\n\n\n\n<p>Probability theory allows AI systems to quantify uncertainty and make predictions about possible outcomes. Instead of producing only deterministic answers, AI models often produce <strong>probabilistic estimates<\/strong>.<\/p>\n\n\n\n<p>For example, a classifier may produce the prediction:<\/p>\n\n\n\n<p><strong>P(spam) = 0.87<\/strong><\/p>\n\n\n\n<p>This means the system estimates that there is an <strong>87% probability that the message is spam<\/strong>.<\/p>\n\n\n\n<p>Such probabilistic outputs allow AI systems to handle ambiguous situations where multiple outcomes are possible.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6.1 Why AI Needs Probability<\/h3>\n\n\n\n<p>Real-world environments contain uncertainty, noise, and incomplete information. 
AI models must therefore reason about possibilities rather than absolute truths.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Probability enables AI systems to represent uncertainty and evaluate the likelihood of different outcomes.<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\">Example: Recommendation Systems<\/h4>\n\n\n\n<p>Recommendation systems estimate the likelihood that a user will enjoy a particular item.<\/p>\n\n\n\n<p>Example prediction:<\/p>\n\n\n\n<p><strong>P(user likes movie) = 0.73<\/strong><\/p>\n\n\n\n<p>This probability-based reasoning allows AI systems to operate effectively in uncertain environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6.2 Probability for Decision Making<\/h3>\n\n\n\n<p>Many AI systems make decisions based on the probability of different outcomes.<\/p>\n\n\n\n<p>Suppose an email classification system produces the following probabilities:<\/p>\n\n\n\n<p><strong>P(spam) = 0.87<\/strong><br><strong>P(not spam) = 0.13<\/strong><\/p>\n\n\n\n<p>The system chooses the label with the <strong>highest probability<\/strong>, which in this case is <strong>spam<\/strong>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>AI decision-making often involves selecting the outcome with the highest estimated probability.<\/p>\n<\/blockquote>\n\n\n\n<h4 class=\"wp-block-heading\">Example: Risk-Based Decisions<\/h4>\n\n\n\n<p>In some applications, decisions also consider <strong>risk and consequences<\/strong>.<\/p>\n\n\n\n<p>For example, a medical diagnosis system might evaluate:<\/p>\n\n\n\n<p><strong>P(disease) = 0.35<\/strong><\/p>\n\n\n\n<p>Even though this probability is well below 0.5, doctors may still recommend further testing, because missing a real disease is far more costly than an extra test.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6.3 Conditional Probability in AI Models<\/h3>\n\n\n\n<p>Many AI 
systems depend on <strong>conditional probability<\/strong>, which describes the 
probability of an event given that another event has already occurred.<\/p>\n\n\n\n<p>Conditional probability is written as:<\/p>\n\n\n\n<p><strong>P(A | B)<\/strong><\/p>\n\n\n\n<p>This expression means:<\/p>\n\n\n\n<p>the probability of event A occurring given that event B has occurred.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example: Spam Filtering<\/h4>\n\n\n\n<p>Let<\/p>\n\n\n\n<p>A = message is spam<br>B = message contains the word \u201cfree\u201d<\/p>\n\n\n\n<p>Then<\/p>\n\n\n\n<p><strong>P(spam | contains &#8220;free&#8221;)<\/strong><\/p>\n\n\n\n<p>represents the probability that an email is spam <strong>given that it contains the word \u201cfree\u201d.<\/strong><\/p>\n\n\n\n<p>Conditional probability is widely used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>spam filtering<\/li>\n\n\n\n<li>speech recognition<\/li>\n\n\n\n<li>recommendation systems<\/li>\n\n\n\n<li>medical diagnosis<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6.4 Example: Spam Detection and Medical Diagnosis<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">Spam Detection<\/h4>\n\n\n\n<p>Suppose a dataset shows that:<\/p>\n\n\n\n<p><strong>P(spam) = 0.40<\/strong><\/p>\n\n\n\n<p>meaning <strong>40% of emails are spam<\/strong>.<\/p>\n\n\n\n<p>Now assume the probability that a spam message contains the word <strong>\u201cfree\u201d<\/strong> is:<\/p>\n\n\n\n<p><strong>P(&#8220;free&#8221; | spam) = 0.70<\/strong><\/p>\n\n\n\n<p>Provided the word <strong>\u201cfree\u201d<\/strong> appears less often in legitimate messages, that is, <strong>P(&#8220;free&#8221; | not spam)<\/strong> is below 0.70, a message containing it is more likely to be spam than the 40% baseline suggests.<\/p>\n\n\n\n<p>Spam filters use such probability relationships to classify emails automatically.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Medical Diagnosis<\/h4>\n\n\n\n<p>Consider a disease detection system.<\/p>\n\n\n\n<p>Suppose:<\/p>\n\n\n\n<p><strong>P(disease) = 
0.02<\/strong><\/p>\n\n\n\n<p>meaning <strong>2% of the population has the disease<\/strong>.<\/p>\n\n\n\n<p>If a patient tests positive, the system 
calculates:<\/p>\n\n\n\n<p><strong>P(disease | positive test)<\/strong><\/p>\n\n\n\n<p>This probability helps doctors evaluate the likelihood that the patient actually has the disease.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>AI systems use probability models to estimate risks, detect patterns, and support decision making in uncertain environments.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">7. Bayesian Thinking in Artificial Intelligence<\/h2>\n\n\n\n<p>Bayesian reasoning is a powerful approach used by AI systems to update beliefs when new information becomes available.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Bayesian thinking allows AI systems to revise probabilities as new evidence is observed.<\/p>\n<\/blockquote>\n\n\n\n<p>Instead of treating probabilities as fixed values, Bayesian models continuously update their estimates as additional data arrives.<\/p>\n\n\n\n<p>This approach is widely used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>spam filtering<\/li>\n\n\n\n<li>recommendation systems<\/li>\n\n\n\n<li>robotics<\/li>\n\n\n\n<li>medical diagnosis<\/li>\n\n\n\n<li>natural language processing<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7.1 Updating Beliefs with Data<\/h3>\n\n\n\n<p>Suppose an AI system initially estimates:<\/p>\n\n\n\n<p><strong>P(spam) = 0.40<\/strong><\/p>\n\n\n\n<p>This estimate represents the system&#8217;s <strong>initial belief<\/strong> about the likelihood of spam messages.<\/p>\n\n\n\n<p>After observing new data, such as the presence of certain words, the system updates its belief.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Bayesian learning updates probabilities as new evidence becomes available.<\/p>\n<\/blockquote>\n\n\n\n<p>For example, if the message contains the word <strong>\u201cfree\u201d<\/strong>, the probability that the 
message is spam may increase.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">7.2 Prior and Posterior Probabilities<\/h3>\n\n\n\n<p>Bayesian reasoning distinguishes between two types of probabilities.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Prior Probability<\/h4>\n\n\n\n<p>The <strong>prior probability<\/strong> represents the system&#8217;s belief before observing new data.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p><strong>P(spam) = 0.40<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Posterior Probability<\/h4>\n\n\n\n<p>The <strong>posterior probability<\/strong> represents the updated belief after incorporating new evidence.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p><strong>P(spam | message contains &#8220;free&#8221;)<\/strong><\/p>\n\n\n\n<p>This updated probability reflects the influence of the observed evidence.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"279\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?resize=640%2C279&#038;ssl=1\" alt=\"\" class=\"wp-image-42798\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?resize=1024%2C446&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?resize=300%2C131&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?resize=768%2C334&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?resize=604%2C263&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?w=1399&amp;ssl=1 1399w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/baysian.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Working of Bayes Rule<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Bayes&#8217; 
Rule<\/h4>\n\n\n\n<p>Bayesian updating uses <strong>Bayes&#8217; theorem<\/strong>:<\/p>\n\n\n\n<p><strong>P(A | B) = [ P(B | A) \u00d7 P(A) ] \/ P(B)<\/strong><\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>P(A | B) = posterior probability, the belief in A after observing B<\/li>\n\n\n\n<li>P(B | A) = likelihood, how probable the evidence B is when A is true<\/li>\n\n\n\n<li>P(A) = prior probability, the belief in A before observing B<\/li>\n\n\n\n<li>P(B) = evidence, the overall probability of observing B<\/li>\n<\/ul>\n\n\n\n<p>When A has only two possible outcomes (for example, spam or not spam), the evidence term can be computed as <strong>P(B) = P(B | A) \u00d7 P(A) + P(B | not A) \u00d7 P(not A)<\/strong>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Bayesian reasoning enables AI systems to learn from evidence and refine their predictions over time.<\/p>\n<\/blockquote>\n\n\n\n<h1 class=\"wp-block-heading\">Part III \u2014 Understanding Data Through Statistics<\/h1>\n\n\n\n<p>Artificial Intelligence systems learn patterns from data. However, before models can learn effectively, it is necessary to understand the structure, distribution, and relationships within the data itself.<\/p>\n\n\n\n<p>Statistics provides the mathematical tools that allow AI systems to summarize datasets, detect patterns, and evaluate relationships between variables.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Statistics helps AI systems understand how data is distributed, how variables relate to each other, and how reliable patterns can be extracted from datasets.<\/p>\n<\/blockquote>\n\n\n\n<p>Through statistical analysis, AI models can identify trends, detect anomalies, and prepare data for machine learning algorithms.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Describing Data in AI Systems<\/h2>\n\n\n\n<p>Descriptive statistics provides simple numerical summaries that help AI systems understand the characteristics of a dataset.<\/p>\n\n\n\n<p>These measures allow us to answer questions such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is the average value of the data?<\/li>\n\n\n\n<li>How spread out are the observations?<\/li>\n\n\n\n<li>Do variables tend to change together?<\/li>\n<\/ul>\n\n\n\n<p>Such insights are essential before training machine learning models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8.1 Mean and Data Distribution<\/h3>\n\n\n\n<p>One of the most commonly used statistical measures is the <strong>mean<\/strong>, which represents the average value of a dataset.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"318\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?resize=640%2C318&#038;ssl=1\" alt=\"\" class=\"wp-image-42799\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?resize=1024%2C508&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?resize=300%2C149&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?resize=768%2C381&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?resize=545%2C270&amp;ssl=1 545w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/mean.png?w=1073&amp;ssl=1 1073w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Mean<\/figcaption><\/figure>\n\n\n\n<p>The mean is calculated as:<\/p>\n\n\n\n<p>mean = (x\u2081 + x\u2082 + &#8230; + x\u2099) \/ n<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>x\u2081, x\u2082, &#8230;, x\u2099 represent the data values<\/li>\n\n\n\n<li>n represents the 
number of 
observations<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The mean summarizes the central value of a dataset and helps AI systems understand typical behavior within the data.<\/p>\n<\/blockquote>\n\n\n\n<p>Example: student exam scores<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Student<\/th><th>Score<\/th><\/tr><\/thead><tbody><tr><td>A<\/td><td>75<\/td><\/tr><tr><td>B<\/td><td>80<\/td><\/tr><tr><td>C<\/td><td>90<\/td><\/tr><tr><td>D<\/td><td>85<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Mean score:<\/p>\n\n\n\n<p>mean = (75 + 80 + 90 + 85) \/ 4<br>mean = 82.5<\/p>\n\n\n\n<p>This value represents the <strong>average performance of the class<\/strong>.<\/p>\n\n\n\n<p>Understanding the average helps AI systems establish a baseline for identifying patterns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8.2 Variance and Data Spread<\/h3>\n\n\n\n<p>While the mean tells us the central value of the data, it does not indicate how widely the values are distributed.<\/p>\n\n\n\n<p>Variance measures how much the data points deviate from the mean.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"378\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?resize=640%2C378&#038;ssl=1\" alt=\"\" class=\"wp-image-42800\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?resize=1024%2C605&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?resize=300%2C177&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?resize=768%2C454&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?resize=457%2C270&amp;ssl=1 457w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/distribution.png?w=1235&amp;ssl=1 1235w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Distribution<\/figcaption><\/figure>\n\n\n\n<p>Variance is calculated as:<\/p>\n\n\n\n<p>variance = \u03a3(x\u1d62 \u2212 mean)\u00b2 \/ n<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Variance measures the spread of data around the average value.<\/p>\n<\/blockquote>\n\n\n\n<p>If the variance is small, most values are close to the mean.<\/p>\n\n\n\n<p>If the variance is large, the data values are more widely distributed.<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>Dataset A: 78, 80, 82, 84<br>Dataset B: 50, 70, 90, 110<\/p>\n\n\n\n<p>Both datasets have similar means (81 for A and 80 for B), but Dataset B is far more spread out: using the formula above, its variance is 500, compared with only 5 for Dataset A.<\/p>\n\n\n\n<p>Understanding variance helps AI models detect <strong>data variability and potential noise in the dataset<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8.3 Correlation Between Variables<\/h3>\n\n\n\n<p>In many AI applications, variables are related to one another. 
Correlation measures the strength and direction of the linear relationship between two variables.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"234\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?resize=640%2C234&#038;ssl=1\" alt=\"\" class=\"wp-image-42801\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?resize=1024%2C375&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?resize=300%2C110&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?resize=768%2C281&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?resize=604%2C221&amp;ssl=1 604w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?w=1382&amp;ssl=1 1382w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/corelation.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Correlation<\/figcaption><\/figure>\n\n\n\n<p>The Pearson correlation coefficient is computed as:<\/p>\n\n\n\n<p>r = covariance(x, y) \/ (\u03c3\u2093 \u03c3\u1d67)<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u03c3\u2093 and \u03c3\u1d67 represent the standard deviations of variables x and y<\/li>\n<\/ul>\n\n\n\n<p>The correlation coefficient always lies between \u22121 and +1:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>+1 \u2192 perfect positive linear relationship<\/strong><\/li>\n\n\n\n<li><strong>0 \u2192 no linear relationship<\/strong><\/li>\n\n\n\n<li><strong>\u22121 \u2192 perfect negative linear relationship<\/strong><\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Correlation helps AI systems identify whether two variables move 
together.<\/p>\n<\/blockquote>\n\n\n\n<p>Example:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Study Hours<\/th><th>Exam Score<\/th><\/tr><\/thead><tbody><tr><td>2<\/td><td>60<\/td><\/tr><tr><td>4<\/td><td>70<\/td><\/tr><tr><td>6<\/td><td>80<\/td><\/tr><tr><td>8<\/td><td>90<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Here we observe a <strong>perfect positive correlation<\/strong> (r = 1): every additional two hours of study adds exactly 10 points, so the points lie on a straight line.<\/p>\n\n\n\n<p>AI systems use such relationships when building predictive models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8.4 Example: Understanding Patterns in Datasets<\/h3>\n\n\n\n<p>Consider a dataset describing house prices.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>House Size (sq ft)<\/th><th>Bedrooms<\/th><th>Price ($1000s)<\/th><\/tr><\/thead><tbody><tr><td>1500<\/td><td>2<\/td><td>220<\/td><\/tr><tr><td>1800<\/td><td>3<\/td><td>260<\/td><\/tr><tr><td>2200<\/td><td>3<\/td><td>310<\/td><\/tr><tr><td>2600<\/td><td>4<\/td><td>370<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Statistical analysis may reveal several patterns:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the <strong>average house size<\/strong><\/li>\n\n\n\n<li>the <strong>spread of prices<\/strong><\/li>\n\n\n\n<li>the <strong>relationship between house size and price<\/strong><\/li>\n<\/ul>\n\n\n\n<p>If house size and price show a strong correlation, a machine learning model can use size to predict house prices.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Statistical analysis helps AI systems detect patterns and relationships before training predictive models.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">9. 
Probability Distributions in AI<\/h2>\n\n\n\n<p>Probability distributions describe how values in a dataset are spread across different ranges.<\/p>\n\n\n\n<p>Many AI algorithms assume that data follows certain probability distributions, which helps models estimate likelihoods and detect patterns.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A probability distribution describes how frequently different values occur in a dataset.<\/p>\n<\/blockquote>\n\n\n\n<p>Understanding distributions allows AI systems to model real-world uncertainty more accurately.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9.1 Discrete and Continuous Distributions<\/h3>\n\n\n\n<p>Probability distributions can be classified into two main types.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"334\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?resize=640%2C334&#038;ssl=1\" alt=\"\" class=\"wp-image-42803\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?resize=1024%2C534&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?resize=300%2C156&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?resize=768%2C400&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?resize=518%2C270&amp;ssl=1 518w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?w=1427&amp;ssl=1 1427w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/discreate-and-continuous-1.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Discrete and Continuous 
Distribution<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Discrete Distributions<\/h4>\n\n\n\n<p>Discrete distributions describe variables that take <strong>countable values<\/strong>.<\/p>\n\n\n\n<p>Examples include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>number of emails received<\/li>\n\n\n\n<li>number of website visits<\/li>\n\n\n\n<li>number of customers in a store<\/li>\n<\/ul>\n\n\n\n<p>Example variable:<\/p>\n\n\n\n<p>X = number of spam emails received today<\/p>\n\n\n\n<p>Possible values:<\/p>\n\n\n\n<p>0, 1, 2, 3, &#8230;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Continuous Distributions<\/h4>\n\n\n\n<p>Continuous distributions describe variables that can take <strong>any value within a range<\/strong>.<\/p>\n\n\n\n<p>Examples include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>temperature<\/li>\n\n\n\n<li>height<\/li>\n\n\n\n<li>time<\/li>\n\n\n\n<li>speed<\/li>\n<\/ul>\n\n\n\n<p>Example variable:<\/p>\n\n\n\n<p>T = temperature<\/p>\n\n\n\n<p>Possible values:<\/p>\n\n\n\n<p>21.3\u00b0C, 21.35\u00b0C, 21.351\u00b0C, &#8230;<\/p>\n\n\n\n<p>Between any two temperatures there are infinitely many possible values, so these values form a continuous range.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">9.2 Gaussian Distribution<\/h3>\n\n\n\n<p>One of the most important distributions used in AI is the <strong>Gaussian distribution<\/strong>, also called the <strong>normal distribution<\/strong>.<\/p>\n\n\n\n<p>The Gaussian distribution is characterized by a <strong>bell-shaped curve<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"334\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?resize=640%2C334&#038;ssl=1\" alt=\"\" class=\"wp-image-42804\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?resize=1024%2C534&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?resize=300%2C156&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?resize=768%2C401&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?resize=518%2C270&amp;ssl=1 518w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?w=1342&amp;ssl=1 1342w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gaussian.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Gaussian Distribution<\/figcaption><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>The Gaussian distribution describes data that clusters around a central average value with symmetrical variation on both sides.<\/p>\n<\/blockquote>\n\n\n\n<p>The probability density function of the Gaussian distribution is written as:<\/p>\n\n\n\n<p>f(x) = (1 \/ \u221a(2\u03c0\u03c3\u00b2)) \u00b7 e^(\u2212(x \u2212 \u03bc)\u00b2 \/ (2\u03c3\u00b2))<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u03bc represents the mean<\/li>\n\n\n\n<li>\u03c3 represents the standard deviation<\/li>\n<\/ul>\n\n\n\n<p>Many real-world phenomena approximately follow a Gaussian distribution, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>measurement errors<\/li>\n\n\n\n<li>human heights<\/li>\n\n\n\n<li>sensor noise<\/li>\n<\/ul>\n\n\n\n<p>Because of this property, Gaussian distributions are widely used in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>machine learning models<\/li>\n\n\n\n<li>anomaly detection<\/li>\n\n\n\n<li>probabilistic reasoning systems<\/li>\n<\/ul>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Understanding probability distributions allows AI systems to model uncertainty and interpret patterns within real-world datasets.<\/p>\n<\/blockquote>\n\n\n\n<h1 class=\"wp-block-heading\">Part IV \u2014 Learning Through 
Mathematical Optimization<\/h1>\n\n\n\n<p>Artificial Intelligence systems do not simply handle data; they learn patterns from it. This learning is achieved through mathematical optimization: a machine learning model starts with initial guesses for its internal parameters, which describe how inputs relate to outputs, and then repeatedly evaluates its predictions and adjusts those parameters to improve them.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Mathematical optimization enables AI systems to learn from data by adjusting model parameters to reduce prediction errors.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">10. Functions in Machine Learning<\/h2>\n\n\n\n<p>Machine learning models can be understood as mathematical functions that transform inputs into outputs.<\/p>\n\n\n\n<p>A function defines a relationship between variables. In machine learning, the function represents the rule that maps input data to predicted outcomes.<\/p>\n\n\n\n<p>For example:<\/p>\n\n\n\n<p>f(x) = y<\/p>\n\n\n\n<p>This expression means that the function <strong>f<\/strong> transforms the input <strong>x<\/strong> into the output <strong>y<\/strong>.<\/p>\n\n\n\n<p>In AI systems, this function may represent a prediction model, a classifier, or a decision-making algorithm.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10.1 Models as Mathematical Functions<\/h3>\n\n\n\n<p>In machine learning, a model can be viewed as a function that takes input data and produces predictions.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A machine learning model is essentially a mathematical function that maps input features to predicted outputs.<\/p>\n<\/blockquote>\n\n\n\n<p>For example, a simple house price prediction model may use the function:<\/p>\n\n\n\n<p>price = f(size)<\/p>\n\n\n\n<p>If the model learns that larger houses tend to have higher prices, the function might behave 
as:<\/p>\n\n\n\n<p>price = 120 \u00d7 size<\/p>\n\n\n\n<p>For instance, if price is measured in dollars and size in square feet, each additional square foot adds about $120 to the predicted price. This function 
describes the relationship between house size and price.<\/p>\n\n\n\n<p>More complex models may use many variables, such as:<\/p>\n\n\n\n<p>price = f(size, bedrooms, location)<\/p>\n\n\n\n<p>The learning process involves discovering the mathematical relationship that best fits the data.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">10.2 Input\u2013Output Relationships in AI Systems<\/h3>\n\n\n\n<p>Machine learning models learn patterns by analyzing relationships between inputs and outputs.<\/p>\n\n\n\n<p>For example, consider a dataset describing house prices:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Size (sq ft)<\/th><th>Bedrooms<\/th><th>Price ($1000s)<\/th><\/tr><\/thead><tbody><tr><td>1500<\/td><td>2<\/td><td>220<\/td><\/tr><tr><td>1800<\/td><td>3<\/td><td>260<\/td><\/tr><tr><td>2200<\/td><td>3<\/td><td>310<\/td><\/tr><tr><td>2600<\/td><td>4<\/td><td>370<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The model learns how input variables such as <strong>size and number of bedrooms<\/strong> influence the output variable <strong>price<\/strong>.<\/p>\n\n\n\n<p>This relationship can be expressed as:<\/p>\n\n\n\n<p>price = f(size, bedrooms)<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Learning in AI involves discovering mathematical relationships between input features and target outputs.<\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">11. Derivatives and Learning Algorithms<\/h2>\n\n\n\n<p>To improve predictions, AI models must determine how changes in parameters affect their output. 
This is where derivatives become important.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"376\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?resize=640%2C376&#038;ssl=1\" alt=\"\" class=\"wp-image-42806\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?resize=1024%2C601&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?resize=300%2C176&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?resize=768%2C451&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?resize=460%2C270&amp;ssl=1 460w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?w=1379&amp;ssl=1 1379w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Derivatives.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Derivatives in Machine Learning<\/figcaption><\/figure>\n\n\n\n<p>Derivatives measure how a function changes when its inputs change.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A derivative measures the rate at which a function changes with respect to its input.<\/p>\n<\/blockquote>\n\n\n\n<p>By taking the derivative of the error with respect to each model parameter, a learning algorithm can determine whether that parameter should be increased or decreased to reduce the error.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11.1 Derivatives and Rate of Change<\/h3>\n\n\n\n<p>Consider the function:<\/p>\n\n\n\n<p>y = x\u00b2<\/p>\n\n\n\n<p>The derivative of this function describes how quickly <strong>y<\/strong> changes when <strong>x<\/strong> changes.<\/p>\n\n\n\n<p>The derivative is written as:<\/p>\n\n\n\n<p>dy\/dx = 
2x<\/p>\n\n\n\n<p>If <strong>x increases<\/strong>, 
the value of <strong>y<\/strong> changes at a rate of 2x. For example, at x = 3 the derivative is 6, so a small increase in x raises y by about six times that amount.<\/p>\n\n\n\n<p>In machine learning, derivatives help determine how model parameters influence prediction errors.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11.2 Gradients in Machine Learning<\/h3>\n\n\n\n<p>When models have multiple parameters, derivatives are combined into a structure called a <strong>gradient<\/strong>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A gradient is a vector containing the partial derivatives of a function with respect to its parameters.<\/p>\n<\/blockquote>\n\n\n\n<p>The gradient points in the direction in which the error increases most steeply, so learning algorithms move the model parameters in the opposite direction, along the negative gradient, to reduce error.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"381\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?resize=640%2C381&#038;ssl=1\" alt=\"\" class=\"wp-image-42807\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?resize=1024%2C610&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?resize=300%2C179&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?resize=768%2C457&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?resize=453%2C270&amp;ssl=1 453w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?w=1476&amp;ssl=1 1476w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradiant.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Gradient in Machine 
Learning<\/figcaption><\/figure>\n\n\n\n<p>Example parameter vector:<\/p>\n\n\n\n<p>w = [w\u2081, w\u2082, w\u2083]<\/p>\n\n\n\n<p>If L denotes the loss, the corresponding gradient is \u2207L = [\u2202L\/\u2202w\u2081, \u2202L\/\u2202w\u2082, \u2202L\/\u2202w\u2083].<\/p>\n\n\n\n<p>The gradient shows how each parameter influences the loss 
function.<\/p>\n\n\n\n<p>Gradients are essential for training neural networks and other machine learning models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">11.3 Example: Improving Model Predictions<\/h3>\n\n\n\n<p>Suppose a model predicts exam scores using study hours.<\/p>\n\n\n\n<p>Prediction function:<\/p>\n\n\n\n<p>score = w \u00d7 hours<\/p>\n\n\n\n<p>If the predicted score is too low, the learning algorithm adjusts <strong>w<\/strong> to improve accuracy.<\/p>\n\n\n\n<p>Derivatives help determine:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>whether the parameter should increase or decrease<\/li>\n\n\n\n<li>how much the parameter should change<\/li>\n<\/ul>\n\n\n\n<p>Through repeated adjustments, the model gradually improves its predictions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">12. Optimization in Artificial Intelligence<\/h2>\n\n\n\n<p>Optimization is the process of finding the best parameters that allow a model to make accurate predictions.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Optimization algorithms search for parameter values that minimize prediction error.<\/p>\n<\/blockquote>\n\n\n\n<p>During training, models evaluate how far their predictions deviate from actual outcomes.<\/p>\n\n\n\n<p>This deviation is measured using a <strong>loss function<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12.1 Objective Functions and Loss Functions<\/h3>\n\n\n\n<p>An objective function defines the goal of the learning process.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"329\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?resize=640%2C329&#038;ssl=1\" alt=\"\" class=\"wp-image-42810\" srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?resize=1024%2C526&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?resize=300%2C154&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?resize=768%2C395&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?resize=526%2C270&amp;ssl=1 526w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?w=1536&amp;ssl=1 1536w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Objective-and-loss-function.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Objective and Loss Functions<\/figcaption><\/figure>\n\n\n\n<p>In machine learning, this objective is usually to <strong>minimize prediction error<\/strong>.<\/p>\n\n\n\n<p>A commonly used loss function is the <strong>Mean Squared Error (MSE)<\/strong>:<\/p>\n\n\n\n<p>MSE = (1\/n) \u03a3 (y\u1d62 \u2212 \u0177\u1d62)\u00b2<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>y\u1d62 represents the true value for the i-th example<\/li>\n\n\n\n<li>\u0177\u1d62 represents the predicted value for the i-th example<\/li>\n\n\n\n<li>n is the number of examples, and \u03a3 sums over i = 1 to n<\/li>\n<\/ul>\n\n\n\n<p>For example, if the true values are 3 and 5 and the predictions are 2 and 5, then MSE = ((3 \u2212 2)\u00b2 + (5 \u2212 5)\u00b2) \/ 2 = 0.5.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Loss functions measure how far model predictions deviate from actual data.<\/p>\n<\/blockquote>\n\n\n\n<p>Lower loss values indicate that the predictions are closer to the true values.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12.2 Gradient Descent<\/h3>\n\n\n\n<p>Gradient descent is one of the most widely used optimization algorithms in machine learning.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"640\" height=\"345\" src=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?resize=640%2C345&#038;ssl=1\" alt=\"\" class=\"wp-image-42812\" 
srcset=\"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?resize=1024%2C552&amp;ssl=1 1024w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?resize=300%2C162&amp;ssl=1 300w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?resize=768%2C414&amp;ssl=1 768w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?resize=501%2C270&amp;ssl=1 501w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?w=1427&amp;ssl=1 1427w, https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Gradient-Decent.png?w=1280&amp;ssl=1 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Gradient Descent<\/figcaption><\/figure>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Gradient descent improves model parameters by moving in the direction that reduces prediction error.<\/p>\n<\/blockquote>\n\n\n\n<p>The algorithm updates parameters using the rule:<\/p>\n\n\n\n<p>w_new = w_old \u2212 \u03b1 \u00d7 gradient<\/p>\n\n\n\n<p>where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\u03b1 represents the learning rate, a small positive number that controls the size of each step<\/li>\n\n\n\n<li>gradient is the gradient of the loss with respect to w, which points in the direction of steepest increase in error<\/li>\n<\/ul>\n\n\n\n<p>Because the gradient is subtracted, each update moves w in the opposite direction, which reduces the loss as long as \u03b1 is small enough. For example, with w_old = 0.5, \u03b1 = 0.1, and a gradient of 2, the update gives w_new = 0.5 \u2212 0.1 \u00d7 2 = 0.3.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">12.3 Iterative Learning in AI Systems<\/h3>\n\n\n\n<p>Machine learning models learn through repeated updates.<\/p>\n\n\n\n<p>The training process typically follows these steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The model makes predictions.<\/li>\n\n\n\n<li>The loss function measures prediction error.<\/li>\n\n\n\n<li>Gradients of the loss with respect to the parameters are computed.<\/li>\n\n\n\n<li>Model parameters are updated using the gradient descent rule.<\/li>\n<\/ol>\n\n\n\n<p>This cycle repeats many times until the model reaches a 
stable solution, for example when the loss stops decreasing noticeably or a fixed number of iterations has been completed.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>AI systems learn through iterative optimization, gradually adjusting parameters until prediction errors are minimized.<\/p>\n<\/blockquote>\n\n\n\n<p>Through mathematical optimization, raw data is turned into accurate predictive models used in applications such as image recognition, recommendation systems, and autonomous vehicles.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-canva wp-block-embed-canva\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"Maths and Data Representation for AI\" src=\"https:\/\/www.canva.com\/design\/DAHDEM40wEY\/A2a67fpjg9yNBKLrpmfXXQ\/view?embed&amp;meta\" height=\"360\" width=\"640\" style=\"border: none; border-radius: 8px; width: 640px; height: 360px;\" allowfullscreen=\"allowfullscreen\" allow=\"fullscreen\"><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Artificial Intelligence systems appear intelligent because they can recognize patterns, make predictions, and support decision-making. However, AI does not directly understand images, speech, language, or human behavior. Instead, these forms of information must first be converted into numerical representations that machines can process. Mathematics as the Language of Artificial Intelligence Mathematics therefore becomes the fundamental language through which AI systems operate. Every AI model stores data, transforms it, compares patterns, and improves predictions through mathematical operations. 
Artificial Intelligence relies on&#8230;<\/p>\n<p class=\"read-more\"><a class=\"btn btn-default\" href=\"https:\/\/afzalbadshah.com\/index.php\/2026\/03\/14\/mathematical-foundations-and-data-representation-in-artificial-intelligence\/\"> Read More<span class=\"screen-reader-text\">  Read More<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":42818,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"enabled":false},"version":2}},"categories":[742],"tags":[267,748,749],"class_list":["post-42761","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","tag-ai","tag-ai-mathematics","tag-ai-statistics"],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/afzalbadshah.com\/wp-content\/uploads\/2026\/03\/Maths-and-Data-Representation-for-AI.jpg?fit=1920%2C1080&ssl=1","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pf3emP-b7H","jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/42761","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/users
\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/comments?post=42761"}],"version-history":[{"count":22,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/42761\/revisions"}],"predecessor-version":[{"id":42820,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/posts\/42761\/revisions\/42820"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/media\/42818"}],"wp:attachment":[{"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/media?parent=42761"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/categories?post=42761"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/afzalbadshah.com\/index.php\/wp-json\/wp\/v2\/tags?post=42761"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}