Losses
These losses, and their documentation, are included from the LossFunctions.jl package.
Pass the function as, e.g., elementwise_loss=L1DistLoss().
You can also declare your own loss as a function that takes two (unweighted) or three (weighted) scalar arguments. For example,
f(x, y, w) = abs(x-y)*w
options = Options(elementwise_loss=f)
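For illustration, here is a minimal sketch of both styles; the loss constructors are from LossFunctions.jl, the parameter value passed to HuberLoss is arbitrary, and the options_* names are just placeholders:

using SymbolicRegression, LossFunctions

# Built-in losses can be passed directly, including parameterized ones:
options_l2 = Options(elementwise_loss=L2DistLoss())
options_huber = Options(elementwise_loss=HuberLoss(1.0))

# An unweighted custom loss takes (prediction, target):
my_l1(x, y) = abs(x - y)
options_custom = Options(elementwise_loss=my_l1)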
Regression
Regression losses work on the distance between targets and predictions: r = x - y.
LossFunctions.LPDistLoss — Type
LPDistLoss{P} <: DistanceLoss
The P-th power absolute distance loss. It is Lipschitz continuous if and only if P == 1, convex if and only if P >= 1, and strictly convex if and only if P > 1.
\[L(r) = |r|^P\]
LossFunctions.L1DistLoss — Type
L1DistLoss <: DistanceLoss
The absolute distance loss. Special case of the LPDistLoss with P=1. It is Lipschitz continuous and convex, but not strictly convex.
\[L(r) = |r|\]
[Plot of the loss function and its derivative versus ŷ - y omitted.]
LossFunctions.L2DistLoss — Type
L2DistLoss <: DistanceLoss
The least squares loss. Special case of the LPDistLoss with P=2. It is strictly convex.
\[L(r) = |r|^2\]
[Plot of the loss function and its derivative versus ŷ - y omitted.]
LossFunctions.PeriodicLoss — Type
PeriodicLoss <: DistanceLoss
Measures distance on a circle of specified circumference c.
\[L(r) = 1 - \cos \left( \frac{2 r \pi}{c} \right)\]
LossFunctions.HuberLoss — Type
HuberLoss <: DistanceLoss
Loss function commonly used for robustness to outliers. For large values of d it becomes close to the L1DistLoss, while for small values of d it resembles the L2DistLoss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(r) = \begin{cases} \frac{r^2}{2} & \quad \text{if } | r | \le \alpha \\ \alpha | r | - \frac{\alpha^2}{2} & \quad \text{otherwise}\\ \end{cases}\]
[Plot of the loss function (d=1) and its derivative versus ŷ - y omitted.]
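As a quick numeric check of the piecewise definition above, here is a hand-rolled sketch (not the library implementation; α = 1 is assumed):

huber(r, α=1.0) = abs(r) <= α ? r^2 / 2 : α * abs(r) - α^2 / 2
huber(0.5)  # 0.125: quadratic regime, |r| ≤ α
huber(3.0)  # 2.5:   linear regime, |r| > α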
LossFunctions.L1EpsilonInsLoss — Type
L1EpsilonInsLoss <: DistanceLoss
The $ϵ$-insensitive loss. Typically used in linear support vector regression. It ignores deviances smaller than $ϵ$, but penalizes larger deviances linearly. It is Lipschitz continuous and convex, but not strictly convex.
\[L(r) = \max \{ 0, | r | - \epsilon \}\]
[Plot of the loss function (ϵ=1) and its derivative versus ŷ - y omitted.]
LossFunctions.L2EpsilonInsLoss — Type
L2EpsilonInsLoss <: DistanceLoss
The quadratic $ϵ$-insensitive loss. Typically used in linear support vector regression. It ignores deviances smaller than $ϵ$, but penalizes larger deviances quadratically. It is convex, but not strictly convex.
\[L(r) = \max \{ 0, | r | - \epsilon \}^2\]
[Plot of the loss function (ϵ=0.5) and its derivative versus ŷ - y omitted.]
LossFunctions.LogitDistLoss — Type
LogitDistLoss <: DistanceLoss
The distance-based logistic loss for regression. It is strictly convex and Lipschitz continuous.
\[L(r) = - \ln \frac{4 e^r}{(1 + e^r)^2}\]
[Plot of the loss function and its derivative versus ŷ - y omitted.]
LossFunctions.QuantileLoss — Type
QuantileLoss <: DistanceLoss
The distance-based quantile loss, also known as pinball loss, can be used to estimate conditional τ-quantiles. It is Lipschitz continuous and convex, but not strictly convex. Furthermore, it is symmetric if and only if τ = 1/2.
\[L(r) = \begin{cases} -\left( 1 - \tau \right) r & \quad \text{if } r < 0 \\ \tau r & \quad \text{if } r \ge 0 \\ \end{cases}\]
[Plot of the loss function (τ=0.7) and its derivative versus ŷ - y omitted.]
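For example, to bias a fit toward the conditional 0.7-quantile rather than the mean, the parameterized loss can be passed directly (a sketch; τ = 0.7 is an arbitrary illustration):

options = Options(elementwise_loss=QuantileLoss(0.7))

With τ = 0.5 the loss reduces to a scaled absolute error, recovering median regression.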
Classification
Classification losses (assuming binary classification) work on the margin between targets and predictions: r = x * y, assuming the target y is either -1 or +1.
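As a small sketch of this margin convention, the L1 hinge loss below is written out by hand as a custom two-argument loss (x is the prediction, y the ±1 target); in practice the built-in L1HingeLoss() can be passed instead:

my_hinge(x, y) = max(0, 1 - x * y)
options = Options(elementwise_loss=my_hinge)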
LossFunctions.ZeroOneLoss — Type
ZeroOneLoss <: MarginLoss
The classical classification loss. It penalizes every misclassified observation with a loss of 1, while every correctly classified observation has a loss of 0. It is neither convex nor continuous and thus seldom used directly. Instead one usually works with some classification-calibrated surrogate loss, such as L1HingeLoss.
\[L(a) = \begin{cases} 1 & \quad \text{if } a < 0 \\ 0 & \quad \text{if } a \ge 0\\ \end{cases}\]
[Plot of the loss function and its derivative versus y * h(x) omitted.]
LossFunctions.PerceptronLoss — Type
PerceptronLoss <: MarginLoss
The perceptron loss linearly penalizes every prediction where the resulting agreement <= 0. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, -a \}\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.LogitMarginLoss — Type
LogitMarginLoss <: MarginLoss
The margin version of the logistic loss. It is infinitely many times differentiable, strictly convex, and Lipschitz continuous.
\[L(a) = \ln (1 + e^{-a})\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.L1HingeLoss — Type
L1HingeLoss <: MarginLoss
The hinge loss linearly penalizes every prediction where the resulting agreement < 1. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.L2HingeLoss — Type
L2HingeLoss <: MarginLoss
The truncated least squares loss quadratically penalizes every prediction where the resulting agreement < 1. It is locally Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \max \{ 0, 1 - a \}^2\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.SmoothedL1HingeLoss — Type
SmoothedL1HingeLoss <: MarginLoss
As the name suggests, a smoothed version of the L1 hinge loss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \frac{0.5}{\gamma} \cdot \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge 1 - \gamma \\ 1 - \frac{\gamma}{2} - a & \quad \text{otherwise}\\ \end{cases}\]
[Plot of the loss function (γ=2) and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.ModifiedHuberLoss — Type
ModifiedHuberLoss <: MarginLoss
A special (4 times scaled) case of the SmoothedL1HingeLoss with γ=2. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} \max \{ 0, 1 - a \} ^2 & \quad \text{if } a \ge -1 \\ - 4 a & \quad \text{otherwise}\\ \end{cases}\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.L2MarginLoss — Type
L2MarginLoss <: MarginLoss
The margin-based least-squares loss for classification, which quadratically penalizes every prediction where the agreement != 1. It is locally Lipschitz continuous and strongly convex.
\[L(a) = {\left( 1 - a \right)}^2\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.ExpLoss — Type
ExpLoss <: MarginLoss
The margin-based exponential loss for classification, which penalizes every prediction exponentially. It is infinitely many times differentiable, locally Lipschitz continuous and strictly convex, but not clipable.
\[L(a) = e^{-a}\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.SigmoidLoss — Type
SigmoidLoss <: MarginLoss
Continuous loss which penalizes every prediction with a loss in the range (0, 2). It is infinitely many times differentiable, Lipschitz continuous, but nonconvex.
\[L(a) = 1 - \tanh(a)\]
[Plot of the loss function and its derivative versus y ⋅ ŷ omitted.]
LossFunctions.DWDMarginLoss — Type
DWDMarginLoss <: MarginLoss
The distance weighted discrimination margin loss. It is a differentiable generalization of the L1HingeLoss that is different from the SmoothedL1HingeLoss. It is Lipschitz continuous and convex, but not strictly convex.
\[L(a) = \begin{cases} 1 - a & \quad \text{if } a \le \frac{q}{q+1} \\ \frac{1}{a^q} \frac{q^q}{(q+1)^{q+1}} & \quad \text{otherwise}\\ \end{cases}\]
[Plot of the loss function (q=1) and its derivative versus y ⋅ ŷ omitted.]