Deep Learning - Mathematical Foundations

Wang Tiven July 10, 2018

Linear Algebra

Matrix

Linear Dependence and Span

• linear dependence
• linearly independent
• linear combination
• span
• singular square matrix
• matrix inversion

Special Kinds of Matrices and Vectors

• diagonal matrix
• symmetric matrix
• unit vector
• unit norm
• orthogonal matrix
• Jacobian matrix

Khan Academy Lessons: Jacobian

Norms

Eigendecomposition

• eigenvector
• positive definite
• positive semidefinite
• negative definite
• negative semidefinite

Tensors

Tensor: In mathematics, tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps. Geometric vectors, often used in physics and engineering applications, and scalars themselves are also tensors.

In short, in deep learning a tensor is an n-dimensional array. For example, a tensor with no axes is a scalar, a tensor with one axis of length n is a vector, an m×n tensor is a matrix, and an n×n×n tensor is a three-dimensional array.
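
As a rough illustration of the array view of tensors, the sketch below counts the axes of nested Python lists. The helper name `shape` is hypothetical, and the code assumes the nesting is non-empty and rectangular.

```python
def shape(tensor):
    """Return the shape of a nested-list tensor as a tuple.

    Assumes the nesting is non-empty and rectangular.
    """
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

scalar = 3.0                                               # no axes
vector = [1.0, 2.0, 3.0]                                   # one axis
matrix = [[1.0, 2.0], [3.0, 4.0]]                          # two axes
cube = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]   # three axes

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    s = shape(t)
    print(name, "axes:", len(s), "shape:", s)
```

The number of axes, not the size along each axis, is what separates a scalar from a vector from a matrix.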

Statistics

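Bayesian probability is an interpretation of probability provided by Bayesian theory: it defines probability as the degree of belief a person holds in a proposition. Bayesian theory also proposes that Bayes' theorem can be used as a rule for deriving or updating existing degrees of belief in light of new information.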
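
To make the idea of updating a degree of belief concrete, here is a minimal sketch of Bayes' theorem for a single binary hypothesis. The function name `bayes_update` and the numbers (a 1% prior and a test with 95% / 5% likelihoods) are assumptions chosen for illustration.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior P(H | E) from the prior P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence

belief = 0.01  # hypothetical prior degree of belief in the hypothesis
for _ in range(2):
    # Each new observation turns the current belief into the next prior.
    belief = bayes_update(belief, 0.95, 0.05)
    print(round(belief, 4))
```

Each posterior becomes the prior for the next observation, which is the sense in which the theorem acts as an update rule.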

Random Variables

On its own, a random variable is just a description of the states that are possible; it must be coupled with a probability distribution that specifies how likely each of these states is.

Probability distributions

At a high level, probability distributions provide a mathematical trick that allows you to relax a discrete set of choices into a continuum.

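• Probability mass function (PMF): describes a discrete random variable; it gives the probability that the variable takes a particular value.
• Probability density function (PDF): describes a continuous random variable; it gives the probability density at a particular value. A probability can only be stated for an interval, and is obtained by integrating the PDF over that interval.
• Joint probability distribution: the probability distribution over several random variables at once, for example two random variables $X$ and $Y$.
• Marginal probability distribution: given a joint distribution over several variables, the distribution over a subset of them. It is computed by summing or integrating the joint distribution over the variables outside that subset.
• Conditional probability distribution: given a joint distribution over several variables, the distribution over the remaining subset of variables once the values of the others are fixed.
• Chain rule (product rule) of conditional probability: any joint probability distribution over several random variables can be decomposed into a product of conditional distributions, each over a single variable:
$P(x^{(1)},\dots,x^{(n)})=P(x^{(1)})\prod_{i=2}^nP(x^{(i)}|x^{(1)},\dots,x^{(i-1)})$
• Independence: intuitively, if the occurrence of one event in an experiment does not affect the probability of another event, the random variables of the two events are called independent, written $x \bot y$.
• Conditional independence: two random variables $x$ and $y$ are independent given a random variable $z$, written $x \bot y|z$.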
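
The relationships in the list above can be checked on a small table. The sketch below uses a hypothetical joint distribution over two binary variables. It computes a marginal and a conditional, verifies the chain rule, and tests for independence. All names and numbers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical joint distribution P(x, y) over two binary variables.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def marginal_x(joint):
    """P(x), obtained by summing the joint over y."""
    p = defaultdict(float)
    for (x, y), prob in joint.items():
        p[x] += prob
    return dict(p)

def conditional_y_given_x(joint, x):
    """P(y | x) = P(x, y) / P(x)."""
    px = marginal_x(joint)[x]
    return {y: prob / px for (xi, y), prob in joint.items() if xi == x}

p_x = marginal_x(joint)

# Chain rule: P(x, y) = P(x) * P(y | x) for every outcome.
for (x, y), prob in joint.items():
    assert abs(p_x[x] * conditional_y_given_x(joint, x)[y] - prob) < 1e-12

# Independence would require P(x, y) = P(x) * P(y) for every outcome.
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                  for x, y in joint)
print(p_x, conditional_y_given_x(joint, 0), independent)
```

For this particular table the two variables are not independent, because $P(x=0, y=0) = 0.3$ while $P(x=0)P(y=0) = 0.5 \times 0.4 = 0.2$.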

Common Probability Distributions

Bernoulli distribution

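The Bernoulli distribution, also known as the two-point distribution or 0-1 distribution, is a discrete probability distribution over a single binary random variable. Writing $\phi \in [0,1]$ for the probability that the variable equals 1, it has the following properties:

• $P(x=1)=\phi$
• $P(x=0)=1-\phi$
• $P(x)=\phi^x(1-\phi)^{1-x}$ for $x \in \{0,1\}$
• $\mathbb{E}[x]=\phi$
• $\mathrm{Var}(x)=\phi(1-\phi)$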

Multinoulli Distribution

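The multinoulli distribution, also called the categorical distribution, is a distribution over a single discrete variable with $k$ states. It is the special case $n=1$ of the multinomial distribution, which is a distribution over vectors in $\{0,\dots,n\}^k$.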
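
A small sketch may help show the relationship between the two distributions. Drawing a multinomial sample with $n=1$ yields a one-hot vector, which is a multinoulli sample. The helper `sample_multinomial` and the category probabilities are hypothetical.

```python
import random

def sample_multinomial(n, probs, rng):
    """Draw a count vector in {0, ..., n}^k from n independent categorical trials."""
    counts = [0] * len(probs)
    for _ in range(n):
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        counts[index] += 1
    return counts

rng = random.Random(0)
probs = [0.2, 0.5, 0.3]                       # hypothetical probabilities, k = 3
one_hot = sample_multinomial(1, probs, rng)   # multinoulli: exactly one entry is 1
counts = sample_multinomial(10, probs, rng)   # general multinomial with n = 10
print(one_hot, counts)
```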

Gaussian Distribution

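The normal distribution, also known as the Gaussian distribution, is the most commonly used distribution over the real numbers. With mean $\mu$ and standard deviation $\sigma$, its density is

$\mathcal{N}(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

When $\mu=0,\sigma=1$ it is the standard normal distribution.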
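
As a quick numerical check, the sketch below evaluates the Gaussian density by hand and compares it with `statistics.NormalDist` from the Python standard library. The function name `normal_pdf` is an assumption for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(normal_pdf(0.0))                          # peak of the standard normal
print(NormalDist(mu=0.0, sigma=1.0).pdf(0.0))   # standard-library cross-check
```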

• Exponential and Laplace distributions
• Dirac distribution and empirical distribution

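Mixtures of Distributions

A mixture model is a simple strategy for combining simple probability distributions into a richer one. A very powerful and common mixture model is the Gaussian mixture model.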
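
The following sketch shows one way to evaluate and sample from a Gaussian mixture. It assumes two hypothetical components given as (weight, mean, standard deviation), and it does not fit the model to data.

```python
import math
import random

# Hypothetical two-component mixture: (weight, mean, standard deviation).
components = [(0.3, -2.0, 0.5), (0.7, 1.0, 1.0)]

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x):
    """p(x) = sum_i w_i * N(x; mu_i, sigma_i^2)."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in components)

def sample_mixture(rng):
    """Pick a component according to its weight, then sample that Gaussian."""
    weights = [w for w, _, _ in components]
    _, mu, sigma = rng.choices(components, weights=weights, k=1)[0]
    return rng.gauss(mu, sigma)

rng = random.Random(0)
samples = [sample_mixture(rng) for _ in range(10000)]
# The sample mean should be near 0.3 * (-2.0) + 0.7 * 1.0 = 0.1.
print(sum(samples) / len(samples))
```

Sampling is a two-stage process: the component identity acts as a latent multinoulli variable, and the observed value is drawn from the selected Gaussian.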

// TODO

Least Squares

zhihu - Least squares method (最小二乘法)
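
For a concrete picture of least squares, here is a minimal sketch that fits a line $y = ax + b$ by minimizing the sum of squared residuals, using the standard closed-form expressions for slope and intercept. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]   # hypothetical noisy samples of roughly y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)
```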

Entropy

Cross-entropy is a mathematical method for gauging the distance between two probability distributions:

$H(p, q) = -\sum_x p(x) \log q(x)$

Here $p$ and $q$ are two probability distributions. The notation $p(x)$ denotes the probability $p$ accords to event $x$. Like the $L2$ norm, $H$ provides a notion of distance. Note that in the case where $p = q$, this quantity is the entropy of $p$ and is usually written simply $H(p)$. It is a measure of how disordered the distribution is; the entropy is maximized when all events are equally likely. $H(p)$ is always less than or equal to $H(p, q)$. In fact, the “further away” distribution $q$ is from $p$, the larger the cross-entropy gets.

As an aside, note that unlike the $L2$ norm, $H$ is asymmetric.
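
These properties can be checked numerically. The sketch below assumes the usual definition $H(p, q) = -\sum_x p(x) \log q(x)$ with natural logarithms. The two distributions are hypothetical.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), skipping outcomes with p(x) = 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def entropy(p):
    """H(p) = H(p, p)."""
    return cross_entropy(p, p)

p = [0.7, 0.2, 0.1]        # hypothetical distributions over three events
q = [0.4, 0.4, 0.2]
uniform = [1 / 3] * 3

# The two cross-entropy values differ, which shows the asymmetry.
print(entropy(p), cross_entropy(p, q), cross_entropy(q, p))
print(entropy(uniform))    # the largest entropy over three events, log(3)
```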

Convolution

Convolution of probability distributions

planetmath - Convolution
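
As a small worked example, the distribution of the sum of two independent discrete random variables is the convolution of their probability mass functions. The sketch below applies this to two fair dice. The function name `convolve_pmfs` is hypothetical.

```python
def convolve_pmfs(pmf_a, pmf_b):
    """PMF of X + Y for independent X and Y, given as dicts value -> probability."""
    result = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            result[a + b] = result.get(a + b, 0.0) + pa * pb
    return result

die = {face: 1 / 6 for face in range(1, 7)}
two_dice = convolve_pmfs(die, die)
print(two_dice[7])   # probability that two fair dice sum to 7
```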

Calculus

Machine Learning

At a very high level, machine learning is simply the act of function minimization: learning algorithms are nothing more than minima finders for suitably defined functions.
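
To illustrate the "minima finder" view, here is a minimal gradient descent sketch on a one-dimensional function. The objective $f(w) = (w-3)^2$, the learning rate, and the step count are assumptions chosen so that the loop converges; real learning algorithms minimize far more complicated functions.

```python
def minimize(grad, x0, learning_rate=0.1, steps=200):
    """Plain gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = minimize(lambda w: 2.0 * (w - 3.0), x0=0.0)
print(w_star)   # approaches the minimizer w = 3
```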

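Learning Algorithms

Learning: "A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$."

An example is represented as a vector $\pmb{x} \in \mathbb{R}^n$, and a feature is one entry $x_i$ of that vector.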

• Classification
• Classification with missing inputs
• Regression
• Transcription
• Machine translation
• Structured output
• Anomaly detection
• Synthesis and sampling
• Imputation of missing values
• Denoising
• Density estimation or probability mass function estimation

Basic Principles for Designing Learning Algorithms

training error, test error

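Underfitting means the model cannot obtain a sufficiently low error on the training set; overfitting means the gap between the training error and the test error is too large. Adjusting the model's capacity controls whether it leans toward overfitting or underfitting.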
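
The effect of capacity can be sketched on a toy regression task. The example below compares a constant model (low capacity), a linear model, and a nearest-neighbor memorizer (high capacity) on data generated from a hypothetical rule $y = 2x$ plus noise. The task, the noise level, and the model choices are all assumptions for illustration.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Hypothetical task: y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return xs, [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30)
test_x, test_y = make_data(1000)

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))

models = {
    # Ignores the input entirely, so it tends to underfit.
    "constant": lambda x: mean_y,
    # Least-squares line, matching the form of the true rule.
    "linear": lambda x: mean_y + slope * (x - mean_x),
    # Returns the label of the nearest training point, so it memorizes noise.
    "memorizer": lambda x: min(zip(train_x, train_y),
                               key=lambda pair: abs(pair[0] - x))[1],
}
errors = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
          for name, m in models.items()}
for name, (train_err, test_err) in errors.items():
    print(f"{name}: train={train_err:.3f} test={test_err:.3f} "
          f"gap={test_err - train_err:.3f}")
```

The memorizer reaches zero training error by construction, so all of its test error shows up as a gap between training and test error. The constant model cannot do better on the training set than the least-squares line, because a constant is a special case of a line.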

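Regularization is any modification to a learning algorithm that is intended to reduce its generalization error rather than its training error. Regularization is one of the central concerns of machine learning, rivaled in importance only by optimization.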
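
One common form of regularization, used here only as an example, adds an $L2$ penalty on the weights to the training objective. For a one-parameter model $y = wx$, minimizing $\sum_i (wx_i - y_i)^2 + \lambda w^2$ gives $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The sketch below shows how a larger $\lambda$ shrinks the weight. The data and the function name `ridge_slope` are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((w * x - y)^2) + lam * w^2; lam = 0 is plain least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]   # hypothetical data, roughly y = 2x
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_slope(xs, ys, lam))
```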

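Hyperparameters are parameters outside those the learning algorithm itself can learn. Why are some parameters left out of learning? Because they are too difficult to optimize, although a nested learning procedure can be designed to learn them.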

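The Goal of Learning

Point estimation uses a sample statistic to estimate a population parameter. Because the sample statistic is a single point on the number line, the estimate is also expressed as a single value, hence the name. Point estimation and interval estimation are both forms of population parameter estimation: after obtaining a set of data from a sample, one uses that information to estimate characteristics of the population, that is, to infer the whole from partial results.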

Maximum likelihood estimation
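
As a minimal example of maximum likelihood estimation, the sketch below estimates the parameter $\phi$ of a Bernoulli distribution from hypothetical coin flips. For this model the maximizer of the likelihood is the sample mean, and a grid search over the log-likelihood is included as a cross-check. The function and variable names are assumptions.

```python
import math

def bernoulli_log_likelihood(phi, data):
    """log P(data | phi) for i.i.d. binary observations."""
    return sum(math.log(phi) if x == 1 else math.log(1.0 - phi) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 0]      # hypothetical coin flips
phi_mle = sum(data) / len(data)      # closed-form maximizer: the sample mean

# Cross-check with a grid search over candidate values of phi.
grid = [i / 1000 for i in range(1, 1000)]
phi_grid = max(grid, key=lambda phi: bernoulli_log_likelihood(phi, data))
print(phi_mle, phi_grid)
```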

CSDN - A detailed explanation of maximum likelihood estimation (极大似然估计详解)

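Building a Machine Learning Algorithm

A closed-form expression, also called an analytical solution, is a solution that can be written as an analytic expression. In mathematics, if some solutions of an equation or system of equations can be given as a combination of finitely many common operations, the equation is said to have a closed-form solution. When no closed-form solution exists, as for general algebraic equations of degree five and higher, the equation can only be solved approximately with numerical analysis (a numerical solution).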
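
The contrast between the two kinds of solution can be sketched as follows. A quadratic has a closed-form solution via the quadratic formula, while the quintic $x^5 - x - 1 = 0$, a standard example of an equation with no solution in radicals, is solved here numerically by bisection. The function names are hypothetical.

```python
import math

def solve_quadratic(a, b, c):
    """Closed form: real roots of a*x^2 + b*x + c = 0 (assumes they exist)."""
    root = math.sqrt(b * b - 4.0 * a * c)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

def bisect(f, lo, hi, tol=1e-12):
    """Numerical solution: halve [lo, hi] while f changes sign across it."""
    assert f(lo) * f(hi) < 0, "the root must be bracketed"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def quintic(x):
    return x ** 5 - x - 1.0

print(solve_quadratic(1.0, -3.0, 2.0))   # exact roots of x^2 - 3x + 2
print(bisect(quintic, 1.0, 2.0))         # approximate real root of the quintic
```

The closed-form answer is exact up to floating-point rounding, whereas bisection only narrows an interval until it is shorter than the chosen tolerance.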

Manifold learning

http://scikit-learn.org/stable/modules/manifold.html

https://jakevdp.github.io/PythonDataScienceHandbook/05.10-manifold-learning.html

Loss function

All of machine learning, and much of artificial intelligence, boils down to the creation of the right loss function to solve the problem at hand.

MSE

L1 L2 loss functions

MSE (Mean squared error)

Why Mean Squared Error and L2 regularization? A probabilistic justification
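
For reference, here is a minimal sketch of the two losses named above, written as plain Python functions over lists of predictions and targets. The function names and the example values are assumptions.

```python
def l1_loss(predictions, targets):
    """Mean absolute error (averaged L1 loss)."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def mse_loss(predictions, targets):
    """Mean squared error (averaged L2 loss)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.0, 3.0, 8.0]   # hypothetical predictions with one large error
print(l1_loss(predictions, targets), mse_loss(predictions, targets))
```

Because the error is squared, a single large mistake contributes much more to MSE than to the L1 loss.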

Classification and regression

Machine learning problems can be broadly categorized as supervised or unsupervised. Supervised problems are those for which both datapoints $x$ and labels $y$ are available, while unsupervised problems have only datapoints $x$ without labels $y$.

Supervised machine learning can be broken up into the two subproblems of classification and regression.

Terminology

• approximation
• continuous function approximation
• numerical
• numerical analysis