Actually, the simplest form of a perceptron neuron activation, the Heaviside step function, is essentially a binary if-else statement. The reason we moved away from that if-else structure is for trainability. Because the step function is discontinuous, it has no meaningful derivative, which makes it impossible to use gradient-based optimization like backpropagation.
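To make that concrete, here is a minimal sketch of the step activation written as a literal if-else, plus a finite-difference check of its slope. The names `heaviside` and `numeric_grad` are just illustrative, and treating the output at exactly 0 as 1 is an assumed convention (others use 0 or 0.5).

```python
def heaviside(x: float) -> int:
    """Step activation written as a plain if-else.

    Returning 1 at x == 0 is an assumed convention, not the only one.
    """
    if x >= 0:
        return 1
    else:
        return 0


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of the slope of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Away from the jump at 0 the function is flat, so the slope is zero
# and a gradient-based update would not move the weights at all.
grads = [numeric_grad(heaviside, x) for x in (-2.0, -0.5, 0.5, 2.0)]
print(grads)  # [0.0, 0.0, 0.0, 0.0]
```

So the practical problem is less the single bad point at 0 and more that the gradient carries no information everywhere else.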
This is not entirely true, e.g. ReLU also has no derivative at 0, and is widely used with backpropagation. In practice, we often just set ReLU'(0) = 0 and carry on.
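A rough sketch of what "set ReLU'(0) = 0 and carry on" looks like in code. The function names are made up for illustration, and the value picked at 0 is a convention rather than a true derivative.

```python
def relu(x: float) -> float:
    """ReLU is itself an if-else: pass x through if positive, else 0."""
    return x if x > 0 else 0.0


def relu_grad(x: float) -> float:
    """Gradient used in backprop under the ReLU'(0) = 0 convention.

    The true derivative is undefined at x == 0; here the kink is
    simply assigned the left-hand slope.
    """
    return 1.0 if x > 0 else 0.0


print([relu(x) for x in (-1.0, 0.0, 2.0)])       # [0.0, 0.0, 2.0]
print([relu_grad(x) for x in (-1.0, 0.0, 2.0)])  # [0.0, 0.0, 1.0]
```

Unlike the step function, the gradient here is nonzero on the whole positive side, which is why training still works despite the kink.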
ReLU-like functions that are actually differentiable everywhere (e.g. GELU, SwiGLU, etc.) became popular only fairly recently (and honestly for reasons other than differentiability; see https://arxiv.org/pdf/1606.08415).
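For comparison, a small sketch of the exact GELU form, x · Φ(x) with Φ the standard normal CDF, and its derivative, using only the standard library. The helper names are illustrative; real frameworks often use a tanh approximation instead.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of x * Phi(x), i.e. Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


# The derivative is well defined at 0, with no kink to patch over.
print(gelu_grad(0.0))  # 0.5
```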
Well, that depends on how you see it. This is a metaphor we are talking about. In its simplest form, both are condition-based outcomes, be it binary, a gradient, or even multidimensional.
If-else statements are all conditionals, but not all conditionals are if-else statements.
And I can't imagine a gradient activation function being described as "behaves like if-else's" since they have so little in common. Though I suppose that is a matter of opinion.
u/SKRyanrr 14d ago
Tell me you know nothing about LLMs without telling me