View Source Axon.Losses (Axon v0.7.0)
Loss functions.
Loss functions evaluate predictions with respect to true data, often to measure the divergence between a model's representation of the data-generating distribution and the true representation of the data-generating distribution.
Each loss function is implemented as an element-wise function
measuring the loss with respect to the input target y_true
and input prediction y_pred. As an example, the mean_squared_error/2
loss function produces a tensor whose values are the mean squared
error between targets and predictions:
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_squared_error(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.5, 0.5]
>It's common to compute the loss across an entire minibatch.
You can easily do so by specifying a :reduction mode, or
by composing one of these with an Nx reduction method:
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_squared_error(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.5
>You can even compose loss functions:
defn my_strange_loss(y_true, y_pred) do
y_true
|> Axon.Losses.mean_squared_error(y_pred)
|> Axon.Losses.binary_cross_entropy(y_pred)
|> Nx.sum()
endOr, more commonly, you can combine loss functions with penalties for regularization:
defn regularized_loss(params, y_true, y_pred) do
loss = Axon.mean_squared_error(y_true, y_pred)
penalty = l2_penalty(params)
Nx.sum(loss) + penalty
endAll of the functions in this module are implemented as
numerical functions and can be JIT or AOT compiled with
any supported Nx compiler.
Summary
Functions
Applies label smoothing to the given labels.
Binary cross-entropy loss function.
Categorical cross-entropy loss function.
Categorical hinge loss function.
Connectionist Temporal Classification loss.
Cosine Similarity error loss function.
Hinge loss function.
Huber loss.
Kullback-Leibler divergence loss function.
Modifies the given loss function to smooth labels prior to calculating loss.
Logarithmic-Hyperbolic Cosine loss function.
Margin ranking loss function.
Mean-absolute error loss function.
Mean-squared error loss function.
Poisson loss function.
Soft margin loss function.
Functions
Applies label smoothing to the given labels.
Label smoothing is a regularization technique which shrink targets towards a uniform distribution. Label smoothing can improve model generalization.
Options
:smoothing- smoothing factor. Defaults to 0.1
References
Binary cross-entropy loss function.
$$ l_i = -\frac{1}{2}(\hat{y_i} \cdot \log(y_i) + (1 - \hat{y_i}) \cdot \log(1 - y_i)) $$
Binary cross-entropy loss is most often used in binary classification problems.
By default, it expects y_pred to encode probabilities from [0.0, 1.0], typically
as the output of the sigmoid function or another function which squeezes values
between 0 and 1. You may optionally set from_logits: true to specify that values
are being sent as non-normalized values (e.g. weights with possibly infinite range).
In this case, input values will be encoded as probabilities by applying the logistic
sigmoid function before computing loss.
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.:negative_weight- class weight for0class useful for scaling loss by importance of class. Defaults to1.0.:positive_weight- class weight for1class useful for scaling loss by importance of class. Defaults to1.0.:from_logits- whethery_predis a logits tensor. Defaults tofalse.
Examples
iex> y_true = Nx.tensor([[0, 1], [1, 0], [1, 0]])
iex> y_pred = Nx.tensor([[0.6811, 0.5565], [0.6551, 0.4551], [0.5422, 0.2648]])
iex> Axon.Losses.binary_cross_entropy(y_true, y_pred)
#Nx.Tensor<
f32[3]
[0.8644826412200928, 0.5150600075721741, 0.45986634492874146]
>
iex> y_true = Nx.tensor([[0, 1], [1, 0], [1, 0]])
iex> y_pred = Nx.tensor([[0.6811, 0.5565], [0.6551, 0.4551], [0.5422, 0.2648]])
iex> Axon.Losses.binary_cross_entropy(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.613136351108551
>
iex> y_true = Nx.tensor([[0, 1], [1, 0], [1, 0]])
iex> y_pred = Nx.tensor([[0.6811, 0.5565], [0.6551, 0.4551], [0.5422, 0.2648]])
iex> Axon.Losses.binary_cross_entropy(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
1.8394089937210083
>
Categorical cross-entropy loss function.
$$ l_i = -\sum_i^C \hat{y_i} \cdot \log(y_i) $$
Categorical cross-entropy is typically used for multi-class classification problems.
By default, it expects y_pred to encode a probability distribution along the last
axis. You can specify from_logits: true to indicate y_pred is a logits tensor.
# Batch size of 3 with 3 target classes
y_true = Nx.tensor([0, 2, 1])
y_pred = Nx.tensor([[0.2, 0.8, 0.0], [0.1, 0.2, 0.7], [0.1, 0.2, 0.7]])Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.:class_weights- 1-D list corresponding to weight of each class useful for scaling loss according to importance of class. Tensor size must match number of classes in dataset. Defaults to1.0for all classes.:from_logits- whethery_predis a logits tensor. Defaults tofalse.:sparse- whethery_trueencodes a "sparse" tensor. In this case the inputs are integer values corresponding to the target class. Defaults tofalse.
Examples
iex> y_true = Nx.tensor([[0, 1, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
iex> Axon.Losses.categorical_cross_entropy(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.051293306052684784, 2.3025851249694824]
>
iex> y_true = Nx.tensor([[0, 1, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
iex> Axon.Losses.categorical_cross_entropy(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
1.1769392490386963
>
iex> y_true = Nx.tensor([[0, 1, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
iex> Axon.Losses.categorical_cross_entropy(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
2.3538784980773926
>
iex> y_true = Nx.tensor([1, 2], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
iex> Axon.Losses.categorical_cross_entropy(y_true, y_pred, reduction: :sum, sparse: true)
#Nx.Tensor<
f32
2.3538784980773926
>
Categorical hinge loss function.
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[1, 0, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05300799, 0.21617081, 0.68642382], [0.3754382 , 0.08494169, 0.13442067]])
iex> Axon.Losses.categorical_hinge(y_true, y_pred)
#Nx.Tensor<
f32[2]
[1.6334158182144165, 1.2410175800323486]
>
iex> y_true = Nx.tensor([[1, 0, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05300799, 0.21617081, 0.68642382], [0.3754382 , 0.08494169, 0.13442067]])
iex> Axon.Losses.categorical_hinge(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
1.4372167587280273
>
iex> y_true = Nx.tensor([[1, 0, 0], [0, 0, 1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.05300799, 0.21617081, 0.68642382], [0.3754382 , 0.08494169, 0.13442067]])
iex> Axon.Losses.categorical_hinge(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
2.8744335174560547
>
Connectionist Temporal Classification loss.
Argument Shapes
l_true- $(B)$y_true- $(B, S)$y_pred- $(B, T, D)$
Options
:reduction- reduction mode. One of:sumor:none. Defaults to:none.
Description
l_true contains lengths of target sequences. Nonzero positive values.
y_true contains target sequences. Each value represents a class
of element in range of available classes 0 <= y < D. Blank element
class is included in this range, but shouldn't be presented among
y_true values. Maximum target sequence length should be lower or equal
to y_pred sequence length: S <= T.
y_pred - log probabilities of classes D along the
prediction sequence T.
Cosine Similarity error loss function.
$$ l_i = \sum_i (\hat{y_i} - y_i)^2 $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.:axes- Defaults to[1].:eps- Defaults to1.0e-6.
Examples
iex> y_pred = Nx.tensor([[1.0, 0.0], [1.0, 1.0]])
iex> y_true = Nx.tensor([[0.0, 1.0], [1.0, 1.0]])
iex> Axon.Losses.cosine_similarity(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.0, 1.0000001192092896]
>
Hinge loss function.
$$ \frac{1}{C}\max_i(1 - \hat{y_i} * y_i, 0) $$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Examples
iex> y_true = Nx.tensor([[ 1, 1, -1], [ 1, 1, -1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.45440044, 0.31470688, 0.67920924], [0.24311459, 0.93466766, 0.10914676]])
iex> Axon.Losses.hinge(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.9700339436531067, 0.6437881588935852]
>
iex> y_true = Nx.tensor([[ 1, 1, -1], [ 1, 1, -1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.45440044, 0.31470688, 0.67920924], [0.24311459, 0.93466766, 0.10914676]])
iex> Axon.Losses.hinge(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.806911051273346
>
iex> y_true = Nx.tensor([[ 1, 1, -1], [ 1, 1, -1]], type: {:s, 8})
iex> y_pred = Nx.tensor([[0.45440044, 0.31470688, 0.67920924], [0.24311459, 0.93466766, 0.10914676]])
iex> Axon.Losses.hinge(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
1.613822102546692
>
Huber loss.
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.:delta- the point where the Huber loss function changes from a quadratic to linear. Defaults to1.0.
Examples
iex> y_true = Nx.tensor([[1], [1.5], [2.0]])
iex> y_pred = Nx.tensor([[0.8], [1.8], [2.1]])
iex> Axon.Losses.huber(y_true, y_pred)
#Nx.Tensor<
f32[3][1]
[
[0.019999997690320015],
[0.04499998688697815],
[0.004999990575015545]
]
>
iex> y_true = Nx.tensor([[1], [1.5], [2.0]])
iex> y_pred = Nx.tensor([[0.8], [1.8], [2.1]])
iex> Axon.Losses.huber(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.02333332598209381
>
Kullback-Leibler divergence loss function.
$$ l_i = \sum_i^C \hat{y_i} \cdot \log(\frac{\hat{y_i}}{y_i}) $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[0, 1], [0, 0]], type: {:u, 8})
iex> y_pred = Nx.tensor([[0.6, 0.4], [0.4, 0.6]])
iex> Axon.Losses.kl_divergence(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.916289210319519, -3.080907390540233e-6]
>
iex> y_true = Nx.tensor([[0, 1], [0, 0]], type: {:u, 8})
iex> y_pred = Nx.tensor([[0.6, 0.4], [0.4, 0.6]])
iex> Axon.Losses.kl_divergence(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.45814305543899536
>
iex> y_true = Nx.tensor([[0, 1], [0, 0]], type: {:u, 8})
iex> y_pred = Nx.tensor([[0.6, 0.4], [0.4, 0.6]])
iex> Axon.Losses.kl_divergence(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
0.9162861108779907
>
Modifies the given loss function to smooth labels prior to calculating loss.
See apply_label_smoothing/2 for details.
Options
:smoothing- smoothing factor. Defaults to 0.1
Logarithmic-Hyperbolic Cosine loss function.
$$ l_i = \frac{1}{C} \sum_i^C (\hat{y_i} - y_i) + \log(1 + e^{-2(\hat{y_i} - y_i)}) - \log(2) $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]])
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]])
iex> Axon.Losses.log_cosh(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.2168903946876526, 0.0]
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]])
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]])
iex> Axon.Losses.log_cosh(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.1084451973438263
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]])
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]])
iex> Axon.Losses.log_cosh(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
0.2168903946876526
>
Margin ranking loss function.
$$ l_i = \max(0, -\hat{y_i} * (y^(1)_i - y^(2)_i) + \alpha) $$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([1.0, 1.0, 1.0], type: {:f, 32})
iex> y_pred1 = Nx.tensor([0.6934, -0.7239, 1.1954], type: {:f, 32})
iex> y_pred2 = Nx.tensor([-0.4691, 0.2670, -1.7452], type: {:f, 32})
iex> Axon.Losses.margin_ranking(y_true, {y_pred1, y_pred2})
#Nx.Tensor<
f32[3]
[0.0, 0.9909000396728516, 0.0]
>
iex> y_true = Nx.tensor([1.0, 1.0, 1.0], type: {:f, 32})
iex> y_pred1 = Nx.tensor([0.6934, -0.7239, 1.1954], type: {:f, 32})
iex> y_pred2 = Nx.tensor([-0.4691, 0.2670, -1.7452], type: {:f, 32})
iex> Axon.Losses.margin_ranking(y_true, {y_pred1, y_pred2}, reduction: :mean)
#Nx.Tensor<
f32
0.3303000032901764
>
iex> y_true = Nx.tensor([1.0, 1.0, 1.0], type: {:f, 32})
iex> y_pred1 = Nx.tensor([0.6934, -0.7239, 1.1954], type: {:f, 32})
iex> y_pred2 = Nx.tensor([-0.4691, 0.2670, -1.7452], type: {:f, 32})
iex> Axon.Losses.margin_ranking(y_true, {y_pred1, y_pred2}, reduction: :sum)
#Nx.Tensor<
f32
0.9909000396728516
>
Mean-absolute error loss function.
$$ l_i = \sum_i |\hat{y_i} - y_i| $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_absolute_error(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.5, 0.5]
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_absolute_error(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.5
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_absolute_error(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
1.0
>
Mean-squared error loss function.
$$ l_i = \sum_i (\hat{y_i} - y_i)^2 $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_squared_error(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.5, 0.5]
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_squared_error(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.5
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [1.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.mean_squared_error(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
1.0
>
Poisson loss function.
$$ l_i = \frac{1}{C} \sum_i^C y_i - (\hat{y_i} \cdot \log(y_i)) $$
Argument Shapes
y_true- $(d_0, d_1, ..., d_n)$y_pred- $(d_0, d_1, ..., d_n)$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.poisson(y_true, y_pred)
#Nx.Tensor<
f32[2]
[0.9999999403953552, 0.0]
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.poisson(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.4999999701976776
>
iex> y_true = Nx.tensor([[0.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[1.0, 1.0], [0.0, 0.0]], type: {:f, 32})
iex> Axon.Losses.poisson(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
0.9999999403953552
>
Soft margin loss function.
$$ l_i = \sum_i \frac{\log(1 + e^{-\hat{y_i} * y_i})}{N} $$
Options
:reduction- reduction mode. One of:mean,:sum, or:none. Defaults to:none.
Examples
iex> y_true = Nx.tensor([[-1.0, 1.0, 1.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[0.2953, -0.1709, 0.9486]], type: {:f, 32})
iex> Axon.Losses.soft_margin(y_true, y_pred)
#Nx.Tensor<
f32[3]
[0.851658046245575, 0.7822436094284058, 0.3273470401763916]
>
iex> y_true = Nx.tensor([[-1.0, 1.0, 1.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[0.2953, -0.1709, 0.9486]], type: {:f, 32})
iex> Axon.Losses.soft_margin(y_true, y_pred, reduction: :mean)
#Nx.Tensor<
f32
0.6537495255470276
>
iex> y_true = Nx.tensor([[-1.0, 1.0, 1.0]], type: {:f, 32})
iex> y_pred = Nx.tensor([[0.2953, -0.1709, 0.9486]], type: {:f, 32})
iex> Axon.Losses.soft_margin(y_true, y_pred, reduction: :sum)
#Nx.Tensor<
f32
1.9612486362457275
>