What is a Perceptron?

A perceptron is the simplest kind of neural network unit.
It takes some input values, multiplies each one by a weight,
adds them together, and then passes the result through an
activation function to decide the output.

output = activation( w₁·x₁ + w₂·x₂ + b )

It’s basically a tiny decision-maker. If the combined input is strong enough, it “activates.” If not, it stays inactive.

[Diagram: inputs x₁ and x₂ with weights w₁ and w₂ feed a summation Σ, followed by a step activation function that produces the output]

A Simple Real-Life Example

Imagine you're deciding whether to go outside. You think about two things:

  • x₁ = Is it sunny? (1 = yes, 0 = no)
  • x₂ = Do you have free time? (1 = yes, 0 = no)

Your brain assigns importance (weights):

  • w₁ = 0.8 (sun matters a lot)
  • w₂ = 0.5 (free time matters moderately)

Then your brain combines the information:

weighted_sum = (w₁·x₁) + (w₂·x₂)
weighted_sum = (0.8 × x₁) + (0.5 × x₂)

Now apply a step function:

If weighted_sum ≥ 1 → Go Outside 😄
Else → Stay Home 😴

Example Scenario:
It’s sunny (x₁ = 1), but you don’t have free time (x₂ = 0).

weighted_sum = (0.8 × 1) + (0.5 × 0) = 0.8
0.8 < 1 → Stay Home
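
A minimal sketch of this decision rule in Python (the function names and the second scenario are just for illustration; the weights and the threshold of 1 come from the example above):

```python
def step(weighted_sum, threshold=1.0):
    """Step activation: output 1 only if the combined signal reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0

def go_outside(x1_sunny, x2_free_time, w1=0.8, w2=0.5):
    """Tiny perceptron for the go-outside decision described above."""
    weighted_sum = w1 * x1_sunny + w2 * x2_free_time
    return step(weighted_sum)

print(go_outside(1, 0))  # sunny, no free time: 0.8 < 1 -> 0 (Stay Home)
print(go_outside(1, 1))  # sunny and free time: 1.3 >= 1 -> 1 (Go Outside)
```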

What is Forward Propagation?

Forward propagation is the process of passing the input values forward through the network to get the final output. Each neuron multiplies the input by weights, adds a bias, applies an activation function, and sends the result onward.

Input → Weights → Bias → Activation → Output

In simple words: It's the network's "thinking". It takes the inputs and calculates what it believes the answer should be.
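
As a rough sketch of that pipeline (plain Python, no particular library; the name neuron_forward and the numbers are illustrative):

```python
def neuron_forward(inputs, weights, bias, activation):
    """Input -> Weights -> Bias -> Activation -> Output, for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

step = lambda z: 1 if z >= 0 else 0          # a simple step activation
print(neuron_forward([1.0, 0.0], [0.8, 0.5], -0.2, step))  # -> 1
```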

Step 1: Inputs Enter the Network

[Diagram: input layer (iq, cgpa) connected to the hidden layer]

In this step, our network receives two input values: IQ and CGPA. These inputs are passed forward along the connections into the neurons in the hidden layer. The connecting lines in the diagram represent the signal traveling forward through the network.

Step 2: Weighted Inputs and Summation

[Diagram: inputs iq and cgpa connected to two hidden summation units Σ through the weights W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂]

Each connection has a weight, shown as W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂. Each hidden neuron multiplies the input values with their corresponding weights and sums them:

z = Wᵀ·x + b

Each hidden neuron then passes this weighted sum z through an activation function. Activation functions allow the network to learn non-linear patterns, which is what makes neural networks powerful.

In short, the hidden neurons learn features from the input data. They do not make the final decision — they simply transform the information so that the output neuron can make a clearer decision later.

Step 3: Outputs from Hidden Layer Move to Output Neuron

[Diagram: the same network with hidden-layer biases b₁₁ and b₁₂, and an output layer added]

Now, each hidden neuron has produced an output. These outputs are calculated as the weighted combination of iq and cgpa, plus a bias term for each neuron.

Neuron 1 Output = activation( W¹₁₁ · iq + W¹₂₁ · cgpa + b₁₁ )
Neuron 2 Output = activation( W¹₁₂ · iq + W¹₂₂ · cgpa + b₁₂ )

These two outputs now flow forward to the final neuron in the output layer.
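
A small numerical sketch of these two hidden-neuron outputs in Python, using a sigmoid activation; the input values, weights, and biases below are made-up placeholders, not values from the text:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative inputs and layer-1 parameters (all values are assumptions).
iq, cgpa = 0.9, 0.7
W1_11, W1_12 = 0.4, -0.3   # weights leaving iq
W1_21, W1_22 = 0.6, 0.8    # weights leaving cgpa
b_11, b_12 = 0.1, -0.2     # hidden-layer biases

O_11 = sigmoid(W1_11 * iq + W1_21 * cgpa + b_11)  # Neuron 1 output
O_12 = sigmoid(W1_12 * iq + W1_22 * cgpa + b_12)  # Neuron 2 output
print(O_11, O_12)   # roughly 0.71 and 0.52
```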

Step 4: Final Output Calculation

[Diagram: the full network: inputs iq and cgpa, hidden weights W¹₁₁, W¹₁₂, W¹₂₁, W¹₂₂, hidden biases b₁₁ and b₁₂, output weights W²₁₁ and W²₂₁, output bias b₂₁, and prediction ŷ]

The outputs from the two hidden neurons are now combined in the final output neuron. Just like before, the weighted values and a bias term are added:

ŷ = activation( W²₁₁ · O₁₁ + W²₂₁ · O₁₂ + b₂₁ )

Here:
O₁₁ and O₁₂ are the outputs from the two hidden neurons.
W²₁₁ and W²₂₁ are the weights from the hidden layer to the output neuron.
b₂₁ is the bias for the output neuron.

If the problem is a Yes/No decision (like “Placement or Not”), we apply a Step Function or a Sigmoid at the output neuron.
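
Continuing the earlier sketch, the output neuron combines the two hidden outputs; using a sigmoid here turns the result into a probability-like score, and thresholding it at 0.5 mimics a step-style Yes/No decision (all numbers remain illustrative):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hidden-layer outputs and layer-2 parameters (illustrative values).
O_11, O_12 = 0.71, 0.52
W2_11, W2_21 = 1.2, -0.7
b_21 = 0.05

y_hat = sigmoid(W2_11 * O_11 + W2_21 * O_12 + b_21)
print(y_hat)                      # probability-like score between 0 and 1
print(1 if y_hat >= 0.5 else 0)   # step-style Yes/No decision
```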

Mathematical Derivation

The forward propagation can also be expressed using matrix multiplication.

Notation (Single Example)

Input vector:

$$ x= \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} $$

Layer-1 (Hidden Layer) Weights and Biases:

$$ W^{(1)}= \begin{bmatrix} W^{1}_{11} & W^{1}_{12}\\ W^{1}_{21} & W^{1}_{22} \end{bmatrix}, \qquad b^{(1)}= \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$

Layer-2 (Output Layer) Weights and Bias:

$$ W^{(2)}= \begin{bmatrix} W^{2}_{11}\\ W^{2}_{21} \end{bmatrix}, \qquad b^{(2)} = b_{21} $$

\(a(\cdot)\): Activation function in hidden layer (ReLU / Sigmoid / Tanh)
\(g(\cdot)\): Activation function in output layer (Step or Sigmoid)


Layer 1 — Hidden Layer Computation

Component-wise:

$$ z^{(1)}_1 = W^{1}_{11}\cdot \text{iq} + W^{1}_{21}\cdot \text{cgpa} + b_{11} $$ $$ z^{(1)}_2 = W^{1}_{12}\cdot \text{iq} + W^{1}_{22}\cdot \text{cgpa} + b_{12} $$

Matrix Form:

$$ z^{(1)} = (W^{(1)})^{T} x + b^{(1)} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11} & W^{1}_{12}\\ W^{1}_{21} & W^{1}_{22} \end{bmatrix}^{T} \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} + \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11} & W^{1}_{21}\\ W^{1}_{12} & W^{1}_{22} \end{bmatrix} \begin{bmatrix} \text{iq}\\ \text{cgpa} \end{bmatrix} + \begin{bmatrix} b_{11}\\ b_{12} \end{bmatrix} $$ $$ z^{(1)} = \begin{bmatrix} W^{1}_{11}\cdot \text{iq} + W^{1}_{21}\cdot \text{cgpa} + b_{11}\\ W^{1}_{12}\cdot \text{iq} + W^{1}_{22}\cdot \text{cgpa} + b_{12} \end{bmatrix} $$

Apply activation to get hidden outputs:

$$ h = a(z^{(1)}) = \begin{bmatrix} o_{11}\\ o_{12} \end{bmatrix} $$

Layer 2 — Output Layer Computation

Matrix Form:

$$ z^{(2)} = (W^{(2)})^{T} h + b^{(2)} $$ $$ z^{(2)} = \begin{bmatrix} W^{2}_{11} & W^{2}_{21} \end{bmatrix} \begin{bmatrix} o_{11}\\ o_{12} \end{bmatrix} + b_{21} $$

Scalar Form:

$$ z^{(2)} = W^{2}_{11}\cdot o_{11} + W^{2}_{21}\cdot o_{12} + b_{21} $$

Final Prediction:

$$ \hat{y} = g(z^{(2)}) $$

If the task is a Yes/No decision (e.g., Placement or Not):
Use Step Function.

If the task requires a probability (e.g., likelihood of placement):
Use Sigmoid.
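
The same two-layer forward pass can be sketched in a few lines of NumPy; the parameter values are arbitrary placeholders, but the shapes follow the notation above (W^(1) is 2×2, W^(2) is a 2-vector):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.9, 0.7])          # input vector [iq, cgpa]

W1 = np.array([[0.4, -0.3],       # row 1: weights leaving iq
               [0.6,  0.8]])      # row 2: weights leaving cgpa
b1 = np.array([0.1, -0.2])        # hidden-layer biases

W2 = np.array([1.2, -0.7])        # weights into the output neuron
b2 = 0.05                         # output-layer bias

z1 = W1.T @ x + b1                # z(1) = (W(1))^T x + b(1)
h  = sigmoid(z1)                  # hidden outputs [o11, o12]
z2 = W2 @ h + b2                  # z(2) = (W(2))^T h + b(2)
y_hat = sigmoid(z2)               # final prediction
print(y_hat)
```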

What is Backpropagation?

Backward propagation — or backpropagation — is how a neural network learns from its mistakes.

When the network makes a wrong prediction, it doesn’t just say “oops.” It traces the error backward through all the layers to see which weights and connections caused the mistake, and then adjusts them a little to do better next time.

You can think of it like this:

  • Forward propagation is “thinking.”
  • Backward propagation is “learning from being wrong.”

Scene 1 — Loss Computation

[Diagram: the full two-layer network, with its prediction ŷ compared against the true label y]

The network made a prediction ŷ, and we compare it with the true value y. The difference between them is the loss, which tells us how wrong the prediction was. To calculate the loss we choose a loss function, for example the squared error (y − ŷ)².

True Label (y): 1

Predicted (ŷ): 0.42

Loss increases as the prediction moves away from the true value.
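
For the numbers shown above, the squared-error loss works out like this (a quick check in Python):

```python
y, y_hat = 1.0, 0.42
loss = (y - y_hat) ** 2
print(loss)   # about 0.3364; a perfect prediction would give 0
```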

Scene 2 — Backward Pass (Error Flow)

[Diagram: the full two-layer network, with the error flowing backward from the output neuron toward the hidden neurons]

-> We want to reduce the Loss = (y − ŷ)².
-> To reduce this loss, the network must adjust ŷ.
-> But ŷ depends on the outputs of the two hidden neurons (O₁₁ and O₁₂).

-> And these hidden neuron outputs depend on their weights and biases:

  • O₁₁ depends on W¹₁₁, W¹₂₁ and b₁₁
  • O₁₂ depends on W¹₁₂, W¹₂₂ and b₁₂

So the error first flows backward from the output neuron to the hidden neurons. Each hidden neuron receives a portion of the blame proportional to how much its output influenced the final prediction (through its weights).
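
As a sketch of how this blame is computed, the chain rule (in the notation of the forward-propagation derivation above) traces the loss back to one hidden-layer weight through everything that weight influences:

$$ \frac{\partial L}{\partial W^{1}_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z^{(2)}} \cdot \frac{\partial z^{(2)}}{\partial o_{11}} \cdot \frac{\partial o_{11}}{\partial z^{(1)}_{1}} \cdot \frac{\partial z^{(1)}_{1}}{\partial W^{1}_{11}} $$

Here \(\partial z^{(2)}/\partial o_{11} = W^{2}_{11}\) (the blame flows back through the output weight) and \(\partial z^{(1)}_{1}/\partial W^{1}_{11} = \text{iq}\) (the blame scales with the input on that connection).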

Scene 3 — Weight & Bias Updates by Gradient Descent

[Diagram: the full two-layer network, with its weights and biases being adjusted]

To update a weight, the network checks two things: (1) How large the error was, and (2) How strong the input signal on that connection was. This gives the gradient, which tells the network how much that weight affected the mistake.

∂L/∂W = (error signal) × (activation from previous neuron)

Each connection's weight keeps being nudged in this way, so the network gradually learns over many updates.

Gradient descent update rule for the weights and biases:

W_new = W_old − η · ( ∂L / ∂W_old )

b_new = b_old − η · ( ∂L / ∂b_old )
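
A minimal sketch of this update rule for one weight and one bias in plain Python; the gradient values here are placeholders, since in a real network they come out of backpropagation:

```python
# Current parameters and their gradients (placeholder numbers).
W_old, b_old = 0.6, 0.1
dL_dW, dL_db = 0.35, 0.20   # would come from backpropagation
eta = 0.1                   # learning rate

W_new = W_old - eta * dL_dW   # W_new = W_old - eta * dL/dW_old
b_new = b_old - eta * dL_db   # b_new = b_old - eta * dL/db_old
print(W_new, b_new)           # roughly 0.565 and 0.08
```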


Scene 4 — Training Over Time

[Diagram: the full two-layer network repeating the training cycle, with a live loss readout (shown at 1.000)]

The network learns by repeating this cycle:

Forward Pass → Loss → Backward Pass → Weight Update

Over many iterations, the loss becomes smaller. This means the model’s predictions are getting closer to the real target.
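
To make the cycle concrete, here is a compact toy training loop for a single sigmoid neuron with a squared-error loss and made-up data; it is not the two-layer network from the figures, just the same Forward → Loss → Backward → Update rhythm:

```python
import math, random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy data: (iq, cgpa) pairs with a made-up placement label.
data = [((0.9, 0.8), 1), ((0.2, 0.3), 0), ((0.7, 0.9), 1), ((0.3, 0.1), 0)]

w1, w2, b = random.uniform(-1, 1), random.uniform(-1, 1), 0.0
eta = 0.5   # learning rate

for epoch in range(200):
    total_loss = 0.0
    for (iq, cgpa), y in data:
        # Forward pass
        y_hat = sigmoid(w1 * iq + w2 * cgpa + b)
        total_loss += (y - y_hat) ** 2
        # Backward pass: dL/dz for L = (y - y_hat)^2 with a sigmoid output
        grad_z = 2 * (y_hat - y) * y_hat * (1 - y_hat)
        # Weight update by gradient descent
        w1 -= eta * grad_z * iq
        w2 -= eta * grad_z * cgpa
        b  -= eta * grad_z
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {total_loss:.4f}")
```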