The L1 norm and L2 norm are both ways to measure the "size" or "magnitude" of a vector, but they do so in different ways. Here's the difference between them:
1. L1 Norm (Manhattan Norm or Taxicab Norm):
- Definition: The L1 norm is the sum of the absolute values of the components of the vector.
For a vector \( \mathbf{x} = [x_1, x_2, \dots, x_n] \), the L1 norm is:
\[
\| \mathbf{x} \|_1 = |x_1| + |x_2| + \dots + |x_n|
\]
- Geometric interpretation: Imagine moving through a street grid, like a taxicab in Manhattan whose streets run parallel to the axes. The L1 norm is the total distance traveled: the number of "blocks" moved in each direction, added up.
- Use case: The L1 norm is often used when you want to promote sparsity (i.e., many values being exactly zero). It is the penalty behind L1 regularization (known as the Lasso in machine learning); a quick code sketch follows below.
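To make this concrete, here is a minimal Python sketch of the L1 norm (the helper name `l1_norm` is just an illustrative choice; `numpy.linalg.norm` with `ord=1` computes the same quantity):

```python
# Minimal sketch: L1 norm as the sum of absolute values of the components.
def l1_norm(x):
    """Return |x_1| + |x_2| + ... + |x_n| for a sequence of numbers."""
    return sum(abs(xi) for xi in x)

print(l1_norm([3, -4, 2]))  # |3| + |-4| + |2| = 9
```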
2. L2 Norm (Euclidean Norm):
- Definition: The L2 norm is the square root of the sum of the squared values of the components of the vector.
For the same vector \( \mathbf{x} = [x_1, x_2, \dots, x_n] \), the L2 norm is:
\[
\| \mathbf{x} \|_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}
\]
- Geometric interpretation: The L2 norm is the straight-line (Euclidean) distance from the origin to the point \( (x_1, x_2, \dots, x_n) \) in space, as if measuring the "direct" distance with a ruler.
- Use case: The L2 norm is more common when you want to penalize large errors smoothly, as in minimizing the overall prediction error with L2 regularization (the penalty behind Ridge regression); a matching sketch follows below.
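Here is the corresponding Python sketch for the L2 norm (again just an illustration; `numpy.linalg.norm` with its default `ord=2`, or `math.hypot`, gives the same result):

```python
# Minimal sketch: L2 norm as the square root of the sum of squared components.
import math

def l2_norm(x):
    """Return sqrt(x_1^2 + x_2^2 + ... + x_n^2) for a sequence of numbers."""
    return math.sqrt(sum(xi * xi for xi in x))

print(l2_norm([3, -4]))  # sqrt(9 + 16) = 5.0
```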
Key Differences:
- L1 norm: Adds up the absolute values of the components (can be thought of as a "grid" distance).
- L2 norm: Squares the components, sums them, and then takes the square root (direct "straight-line" distance).
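The difference is easy to see on a single vector: for [3, 4] the L1 norm is 7 while the L2 norm is 5. Using NumPy's built-in norms (which match the hand-rolled versions above):

```python
# Same vector, two different notions of "size".
import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x, ord=1))  # 7.0 -> "grid" distance: 3 blocks + 4 blocks
print(np.linalg.norm(x, ord=2))  # 5.0 -> straight-line distance
```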
In summary:
- L1 norm is simpler and often used in contexts where sparsity (fewer non-zero elements) is desired.
- L2 norm is more common in situations where a smooth, continuous penalty is needed for large values.
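To see the sparsity effect in practice, here is a small sketch comparing Lasso (L1 penalty) and Ridge (L2 penalty) on synthetic data; the data, the alpha values, and the check for coefficients being exactly zero are all illustrative choices, not a recipe:

```python
# Sketch: L1 regularization (Lasso) tends to drive coefficients to exactly zero,
# while L2 regularization (Ridge) shrinks them smoothly but rarely to zero.
# Assumes NumPy and scikit-learn are installed; data and alphas are arbitrary.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))       # 100 samples, 20 features
true_coef = np.zeros(20)
true_coef[:3] = [5.0, -3.0, 2.0]     # only the first 3 features matter
y = X @ true_coef + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso coefficients exactly zero:", int(np.sum(lasso.coef_ == 0)))  # usually many
print("Ridge coefficients exactly zero:", int(np.sum(ridge.coef_ == 0)))  # usually none
```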