
CONTENTS
Functions
f ◦ g Composition of the functions f and g
f(x; θ) A function of x parameterized by θ
log x Natural logarithm of x
σ(x) Logistic sigmoid, 1/(1 + exp(−x))
ζ(x) Softplus, log(1 + exp(x))
||x||_p L^p norm of x
x^+ Positive part of x, i.e., max(0, x)
1_condition is 1 if the condition is true, 0 otherwise.
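The function definitions above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The names compose, sigmoid, softplus, lp_norm, positive_part, and indicator are hypothetical and chosen here for readability. The sigmoid and softplus bodies use algebraically equivalent rearrangements of the formulas above so that large inputs do not overflow.

```python
import math


def compose(f, g):
    """Return the composition f ◦ g, i.e. the function x -> f(g(x))."""
    return lambda x: f(g(x))


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)), rearranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def softplus(x):
    """Softplus, log(1 + exp(x)), written as max(x, 0) + log(1 + exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def lp_norm(x, p):
    """L^p norm of a vector x given as a sequence of numbers (assumes p >= 1)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def positive_part(x):
    """Positive part of x, i.e. max(0, x)."""
    return max(0.0, x)


def indicator(condition):
    """Return 1 if the condition is true, 0 otherwise."""
    return 1 if condition else 0
```

For example, compose(sigmoid, positive_part) builds the function that maps x to σ(x^+), and lp_norm(x, 2) gives the Euclidean norm.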
Sometimes we write f(x), f(X), or f(X), with a vector, matrix, or tensor as the
argument, when f is a function of a scalar rather than a vector, matrix, or
tensor. In this case, we mean to apply f to the array element-wise. For example,
if C = σ(X), then C_{i,j,k} = σ(X_{i,j,k}) for all valid values of i, j, and k.
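The element-wise convention can be illustrated with a short sketch. It assumes, purely for illustration, that arrays are stored as nested Python lists. The helper apply_elementwise and the example values in X are hypothetical. Note that Python indices start at 0.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)); adequate for moderate |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_elementwise(f, array):
    """Apply the scalar function f to every entry of a nested-list array."""
    if isinstance(array, (list, tuple)):
        return [apply_elementwise(f, item) for item in array]
    return f(array)  # base case: a scalar entry


# A small 1 x 2 x 2 tensor stored as nested lists (illustrative values).
X = [[[0.0, 1.0], [-1.0, 2.0]]]
C = apply_elementwise(sigmoid, X)

# C has the same shape as X, and each entry of C is sigmoid of the
# corresponding entry of X.
```

The same helper works unchanged for vectors (one level of nesting) and matrices (two levels), which matches the way the notation uses one symbol f for all three cases.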
Datasets and distributions
X A set of training examples
x^(i) The i-th example (input) from a dataset
y^(i) or y^(i) The target (scalar or vector) associated with x^(i) for supervised learning
X The m × n matrix with input example x^(i) in row X_{i,:}
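As a rough sketch of the last entry, the following builds an m × n matrix from m examples of n features each, storing it as a list of rows. The variable names and feature values are hypothetical and used only for illustration. Because Python counts from 0, the example written x^(i) in the notation above sits at index i - 1.

```python
# Three hypothetical training examples, each with n = 2 features.
examples = [
    [1.0, 2.0],  # x^(1)
    [3.0, 4.0],  # x^(2)
    [5.0, 6.0],  # x^(3)
]

# Stack the examples so that each one becomes a row of the matrix.
X = [list(x) for x in examples]

m = len(X)     # number of examples (rows)
n = len(X[0])  # number of features per example (columns)


def row(matrix, i):
    """Return row X_{i,:} using the 1-based index i of the notation."""
    return matrix[i - 1]
```

With these values, row(X, 2) returns the second example, [3.0, 4.0].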