What is a Perceptron?

A perceptron is the simplest kind of neural network unit.
It takes some input values, multiplies each one by a weight,
adds them together, and then passes the result through an
activation function to decide the output.

output = activation( w₁·x₁ + w₂·x₂ + b )

It’s basically a tiny decision-maker. If the combined input is strong enough, it “activates.” If not, it stays inactive.

[Diagram: inputs x₁ and x₂ with weights w₁ and w₂ feed a summation Σ, followed by a step activation function that produces the output]

A Simple Real-Life Example

Imagine you're deciding whether to go outside. You think about two things:

  • x₁ = Is it sunny? (1 = yes, 0 = no)
  • x₂ = Do you have free time? (1 = yes, 0 = no)

Your brain assigns importance (weights):

  • w₁ = 0.8 (sun matters a lot)
  • w₂ = 0.5 (free time matters moderately)

Then your brain combines the information:

weighted_sum = (w₁·x₁) + (w₂·x₂)
weighted_sum = (0.8 × x₁) + (0.5 × x₂)

Now apply a step function:

If weighted_sum ≥ 1 → Go Outside 😄
Else → Stay Home 😴

Example Scenario:
It’s sunny (x₁ = 1), but you don’t have free time (x₂ = 0).

weighted_sum = (0.8 × 1) + (0.5 × 0) = 0.8
0.8 < 1 → Stay Home
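
A minimal sketch of this decision rule in Python (the function names and the second scenario are just for illustration; the weights and the threshold of 1 come from the example above):

```python
def step(weighted_sum, threshold=1.0):
    """Step activation: output 1 only if the combined signal reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0

def go_outside(x1_sunny, x2_free_time, w1=0.8, w2=0.5):
    """Tiny perceptron for the go-outside decision described above."""
    weighted_sum = w1 * x1_sunny + w2 * x2_free_time
    return step(weighted_sum)

print(go_outside(1, 0))  # sunny, no free time: 0.8 < 1 -> 0 (Stay Home)
print(go_outside(1, 1))  # sunny and free time: 1.3 >= 1 -> 1 (Go Outside)
```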

What is Forward Propagation?

Forward propagation is the process of passing the input values forward through the network to get the final output. Each neuron multiplies the input by weights, adds a bias, applies an activation function, and sends the result onward.

Input → Weights → Bias → Activation → Output

In simple words: It's the network's "thinking". It takes the inputs and calculates what it believes the answer should be.
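
As a rough sketch of that pipeline (plain Python, no particular library; the name neuron_forward and the numbers are illustrative):

```python
def neuron_forward(inputs, weights, bias, activation):
    """Input -> Weights -> Bias -> Activation -> Output, for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

step = lambda z: 1 if z >= 0 else 0          # a simple step activation
print(neuron_forward([1.0, 0.0], [0.8, 0.5], -0.2, step))  # -> 1
```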

Step 1: Inputs Enter the Network

[Diagram: input layer (iq, cgpa) connected to the hidden layer]

In this step, our network receives two input values: IQ and CGPA. These inputs are passed forward along the connections into the neurons in the hidden layer. The connecting lines in the diagram represent the signal traveling forward through the network.

Step 2: Weighted Inputs and Summation

[Diagram: inputs iq and cgpa connected to two hidden summation units Σ through the weights W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂]

Each connection has a weight, shown as W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂. Each hidden neuron multiplies the input values with their corresponding weights and sums them:

z = Wᵀ·x + b

Each hidden neuron then passes this weighted sum z through an activation function. Activation functions allow the network to learn non-linear patterns, which is what makes neural networks powerful.

In short, the hidden neurons learn features from the input data. They do not make the final decision — they simply transform the information so that the output neuron can make a clearer decision later.

Step 3: Outputs from Hidden Layer Move to Output Neuron

[Diagram: the same network with hidden-layer biases b₁₁ and b₁₂, and an output layer added]

Now, each hidden neuron has produced an output. These outputs are calculated as the weighted combination of iq and cgpa, plus a bias term for each neuron.

Neuron 1 Output = activation( W¹₁₁ · iq + W¹₂₁ · cgpa + b₁₁ )
Neuron 2 Output = activation( W¹₁₂ · iq + W¹₂₂ · cgpa + b₁₂ )

These two outputs now flow forward to the final neuron in the output layer.
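
A small numerical sketch of these two hidden-neuron outputs in Python, using a sigmoid activation; the input values, weights, and biases below are made-up placeholders, not values from the text:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative inputs and layer-1 parameters (all values are assumptions).
iq, cgpa = 0.9, 0.7
W1_11, W1_12 = 0.4, -0.3   # weights leaving iq
W1_21, W1_22 = 0.6, 0.8    # weights leaving cgpa
b_11, b_12 = 0.1, -0.2     # hidden-layer biases

O_11 = sigmoid(W1_11 * iq + W1_21 * cgpa + b_11)  # Neuron 1 output
O_12 = sigmoid(W1_12 * iq + W1_22 * cgpa + b_12)  # Neuron 2 output
print(O_11, O_12)   # roughly 0.71 and 0.52
```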

Step 4: Final Output Calculation

[Diagram: the full network: inputs iq and cgpa, hidden weights W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂, hidden biases b₁₁ and b₁₂, output weights W²₁₁ and W²₂₁, output bias b₂₁, and prediction ŷ]

The outputs from the two hidden neurons are now combined in the final output neuron. Just like before, the weighted values and a bias term are added:

ŷ = activation( W²₁₁ · O₁₁ + W²₂₁ · O₁₂ + b₂₁ )

Here:
O₁₁ and O₁₂ are the outputs from the two hidden neurons.
W²₁₁ and W²₂₁ are the weights from the hidden layer to the output neuron.
b₂₁ is the bias for the output neuron.

If the problem is a Yes/No decision (like “Placement or Not”), we apply a Step Function or a Sigmoid at the output neuron.
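
Continuing the earlier sketch, the output neuron combines the two hidden outputs; using a sigmoid here turns the result into a probability-like score, and thresholding it at 0.5 mimics a step-style Yes/No decision (all numbers remain illustrative):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hidden-layer outputs and layer-2 parameters (illustrative values).
O_11, O_12 = 0.71, 0.52
W2_11, W2_21 = 1.2, -0.7
b_21 = 0.05

y_hat = sigmoid(W2_11 * O_11 + W2_21 * O_12 + b_21)
print(y_hat)                      # probability-like score between 0 and 1
print(1 if y_hat >= 0.5 else 0)   # step-style Yes/No decision
```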

Mathematical Derivation

The forward propagation can also be expressed using matrix multiplication.

Notation (Single Example)

Input vector:

$$ x= \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} $$

Layer-1 (Hidden Layer) Weights and Biases:

$$ W^{(1)}= \begin{bmatrix} W^{1}_{11} & W^{1}_{12}\\ W^{1}_{21} & W^{1}_{22} \end{bmatrix}, \qquad b^{(1)}= \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$

Layer-2 (Output Layer) Weights and Bias:

$$ W^{(2)}= \begin{bmatrix} W^{2}_{11}\\ W^{2}_{21} \end{bmatrix}, \qquad b^{(2)} = b_{21} $$

\(a(\cdot)\): Activation function in hidden layer (ReLU / Sigmoid / Tanh)
\(g(\cdot)\): Activation function in output layer (Step or Sigmoid)


Layer 1 — Hidden Layer Computation

Component-wise:

$$ z^{(1)}_1 = W^{1}_{11}\cdot \text{iq} + W^{1}_{21}\cdot \text{cgpa} + b_{11} $$ $$ z^{(1)}_2 = W^{1}_{12}\cdot \text{iq} + W^{1}_{22}\cdot \text{cgpa} + b_{12} $$

Matrix Form:

$$ z^{(1)} = (W^{(1)})^{T} x + b^{(1)} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11} & W^{1}_{12}\\ W^{1}_{21} & W^{1}_{22} \end{bmatrix}^{T} \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} + \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11} & W^{1}_{21}\\ W^{1}_{12} & W^{1}_{22} \end{bmatrix} \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} + \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11}\cdot \text{iq} + W^{1}_{21}\cdot \text{cgpa} + b_{11}\\ W^{1}_{12}\cdot \text{iq} + W^{1}_{22}\cdot \text{cgpa} + b_{12} \end{bmatrix} $$

Apply activation to get hidden outputs:

$$ h = a(z^{(1)}) = \begin{bmatrix} o_{11}\\ o_{12} \end{bmatrix} $$

Layer 2 — Output Layer Computation

Matrix Form:

$$ z^{(2)} = (W^{(2)})^{T} h + b^{(2)} $$ $$ z^{(2)} = \begin{bmatrix} W^{2}_{11} & W^{2}_{21} \end{bmatrix} \begin{bmatrix} o_{11}\\ o_{12} \end{bmatrix} + b_{21} $$

Scalar Form:

$$ z^{(2)} = W^{2}_{11}\cdot o_{11} + W^{2}_{21}\cdot o_{12} + b_{21} $$

Final Prediction:

$$ \hat{y} = g(z^{(2)}) $$

If the task is a Yes/No decision (e.g., Placement or Not):
Use Step Function.

If the task requires a probability (e.g., likelihood of placement):
Use Sigmoid.
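
The same two-layer forward pass can be sketched in a few lines of NumPy; the parameter values are arbitrary placeholders, but the shapes follow the notation above (W^(1) is 2×2, W^(2) is a 2-vector):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.9, 0.7])          # input vector [iq, cgpa]

W1 = np.array([[0.4, -0.3],       # row 1: weights leaving iq
               [0.6,  0.8]])      # row 2: weights leaving cgpa
b1 = np.array([0.1, -0.2])        # hidden-layer biases

W2 = np.array([1.2, -0.7])        # weights into the output neuron
b2 = 0.05                         # output-layer bias

z1 = W1.T @ x + b1                # z(1) = (W(1))^T x + b(1)
h  = sigmoid(z1)                  # hidden outputs [o11, o12]
z2 = W2 @ h + b2                  # z(2) = (W(2))^T h + b(2)
y_hat = sigmoid(z2)               # final prediction
print(y_hat)
```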

What is Backpropagation?

Backward propagation — or backpropagation — is how a neural network learns from its mistakes.

When the network makes a wrong prediction, it doesn’t just say “oops.” It traces the error backward through all the layers to see which weights and connections caused the mistake, and then adjusts them a little to do better next time.

You can think of it like this:

  • Forward propagation is “thinking.”
  • Backward propagation is “learning from being wrong.”

Scene 1 — Loss Computation

[Diagram: the full two-layer network, with its prediction ŷ compared against the true label y]

The network made a prediction ŷ, and we compare it with the true value y. The difference between them is the loss, which tells us how wrong the prediction was. To calculate the loss we choose a loss function, for example the squared error (y − ŷ)².

True Label (y): 1

Predicted (ŷ): 0.42

Loss increases as the prediction moves away from the true value.
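
For the numbers shown above, the squared-error loss works out like this (a quick check in Python):

```python
y, y_hat = 1.0, 0.42
loss = (y - y_hat) ** 2
print(loss)   # about 0.3364; a perfect prediction would give 0
```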

Scene 2 — Backward Pass (Error Flow)

[Diagram: the full two-layer network, with the error flowing backward from the output neuron toward the hidden neurons]

-> We want to reduce the Loss = (y − ŷ)².
-> To reduce this loss, the network must adjust ŷ.
-> But ŷ depends on the outputs of the two hidden neurons (O₁₁ and O₁₂).

-> And these hidden neuron outputs depend on their weights and biases:

  • O₁₁ depends on W¹₁₁, W¹₂₁ and b₁₁
  • O₁₂ depends on W¹₁₂, W¹₂₂ and b₁₂

So the error first flows backward from the output neuron to the hidden neurons. Each hidden neuron receives a portion of the blame proportional to how much its output influenced the final prediction (through its weights).
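
As a sketch of how this blame is computed, the chain rule (in the notation of the forward-propagation derivation above) traces the loss back to one hidden-layer weight through everything that weight influences:

$$ \frac{\partial L}{\partial W^{1}_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z^{(2)}} \cdot \frac{\partial z^{(2)}}{\partial o_{11}} \cdot \frac{\partial o_{11}}{\partial z^{(1)}_{1}} \cdot \frac{\partial z^{(1)}_{1}}{\partial W^{1}_{11}} $$

Here \(\partial z^{(2)}/\partial o_{11} = W^{2}_{11}\) (the blame flows back through the output weight) and \(\partial z^{(1)}_{1}/\partial W^{1}_{11} = \text{iq}\) (the blame scales with the input on that connection).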

Scene 3 — Weight & Bias Updates by Gradient Descent

[Diagram: the full two-layer network, with its weights and biases being adjusted]

To update a weight, the network checks two things: (1) How large the error was, and (2) How strong the input signal on that connection was. This gives the gradient, which tells the network how much that weight affected the mistake.

∂L/∂W = (error signal) × (activation from previous neuron)

Each connection's weight keeps being nudged in this way, so the network gradually learns over many updates.

Gradient descent update rule for the weights and biases:

W_new = W_old − η · ( ∂L / ∂W_old )

b_new = b_old − η · ( ∂L / ∂b_old )
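
A minimal sketch of this update rule for one weight and one bias in plain Python; the gradient values here are placeholders, since in a real network they come out of backpropagation:

```python
# Current parameters and their gradients (placeholder numbers).
W_old, b_old = 0.6, 0.1
dL_dW, dL_db = 0.35, 0.20   # would come from backpropagation
eta = 0.1                   # learning rate

W_new = W_old - eta * dL_dW   # W_new = W_old - eta * dL/dW_old
b_new = b_old - eta * dL_db   # b_new = b_old - eta * dL/db_old
print(W_new, b_new)           # roughly 0.565 and 0.08
```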


Scene 4 — Training Over Time

[Diagram: the full two-layer network repeating the training cycle, with a live loss readout (shown at 1.000)]

The network learns by repeating this cycle:

Forward Pass → Loss → Backward Pass → Weight Update

Over many iterations, the loss becomes smaller. This means the model’s predictions are getting closer to the real target.
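
To make the cycle concrete, here is a compact toy training loop for a single sigmoid neuron with a squared-error loss and made-up data; it is not the two-layer network from the figures, just the same Forward → Loss → Backward → Update rhythm:

```python
import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy data: (iq, cgpa) pairs with a made-up placement label.
data = [((0.9, 0.8), 1), ((0.2, 0.3), 0), ((0.7, 0.9), 1), ((0.3, 0.1), 0)]

w1, w2, b = random.uniform(-1, 1), random.uniform(-1, 1), 0.0
eta = 0.5   # learning rate

for epoch in range(200):
    total_loss = 0.0
    for (iq, cgpa), y in data:
        # Forward pass
        y_hat = sigmoid(w1 * iq + w2 * cgpa + b)
        total_loss += (y - y_hat) ** 2
        # Backward pass: dL/dz for L = (y - y_hat)^2 with a sigmoid output
        grad_z = 2 * (y_hat - y) * y_hat * (1 - y_hat)
        # Weight update by gradient descent
        w1 -= eta * grad_z * iq
        w2 -= eta * grad_z * cgpa
        b  -= eta * grad_z
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {total_loss:.4f}")
```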