Tiven Wang
Wang Tiven July 17, 2018

In the previous post, Python - Building A Logistic Regression, we introduced the logistic regression model, which handles the problem of classifying a sample x with a single feature into 2 classes. The model function is

\[f(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1+e^{-z}}\] \[z = wx + b\]
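The logistic function above can be sketched in a few lines of NumPy; the values of w, x, and b below are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z}), the logistic function from the formula above
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single-feature sample: w = 2.0, x = 1.5, b = -1.0
z = 2.0 * 1.5 + (-1.0)
print(sigmoid(z))  # probability of the positive class
```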

So how do we solve the problem when the sample features extend to m dimensions and the number of classes grows to k?

First, for a single sample with m features we have the following formula, where \(\mathbf{w}\) is the vector of feature weights

\[z = w_1x_1 + ... + w_mx_m + b= \sum_{l=1}^{m} w_l x_l + b= \mathbf{w}^T\mathbf{x} + b\]

To compute its probability distribution over the k classes, assume there are k such weight vectors \(\mathbf{w}\); together they form a matrix

\[\begin{align} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{bmatrix} = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{k,1} & w_{k,2} & \cdots & w_{k,m} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} + b = \begin{bmatrix} w_{1,1} x_1 + w_{1,2} x_2 + \cdots + w_{1,m} x_m + b \\ w_{2,1} x_1 + w_{2,2} x_2 + \cdots + w_{2,m} x_m + b \\ \vdots \\ w_{k,1} x_1 + w_{k,2} x_2 + \cdots + w_{k,m} x_m + b \end{bmatrix} \end{align}\]
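The matrix form above maps directly onto a single matrix-vector product in NumPy. This is a minimal sketch with hypothetical sizes (k = 3 classes, m = 4 features) and random weights; note it uses a single scalar bias b, matching the formula above.

```python
import numpy as np

k, m = 3, 4                      # hypothetical: 3 classes, 4 features
rng = np.random.default_rng(0)
W = rng.normal(size=(k, m))      # one row of weights w_{j,1..m} per class
b = 0.5                          # single scalar bias, as in the formula
x = rng.normal(size=m)           # one sample with m features

z = W @ x + b                    # shape (k,): one score z_j per class
print(z)
```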

Finally, the probability of the sample belonging to each class is computed with the following formula

\[\begin{align} h_w(x) = \begin{bmatrix} P(y = 1 | x; w) \\ P(y = 2 | x; w) \\ \vdots \\ P(y = k | x; w) \end{bmatrix} = \frac{1}{ \sum_{j=1}^{k}{\exp(z_j) }} \begin{bmatrix} \exp(z_1) \\ \exp(z_2) \\ \vdots \\ \exp(z_k) \\ \end{bmatrix} \end{align}\]
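The normalization above is the softmax function. A minimal sketch follows; the common trick of subtracting max(z) before exponentiating does not change the result but avoids overflow for large scores.

```python
import numpy as np

def softmax(z):
    # exp(z_j) / sum_j exp(z_j); shifting by max(z) is for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # hypothetical scores for k = 3 classes
p = softmax(z)
print(p)        # probability for each class
print(p.sum())  # the probabilities sum to 1
```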

Cost Function

Combining the concept of cross entropy discussed in the previous post, the cost function for a single sample can be expressed as follows

\[\begin{align} J(w) = - \left[ \sum_{k=1}^{K} 1\left\{y = k\right\} \log \frac{\exp(z_k)}{\sum_{j=1}^K \exp(z_j)}\right] \end{align}\]

Assuming there are \(n\) samples, the aggregate cost function can be written as

\[\begin{align} J(w) = - \left[ \sum_{i=1}^{n}\sum_{k=1}^{K} 1\left\{y^i = k\right\} \log \frac{e^{(z_{ik})}}{\sum_{j=1}^K e^{(z_{ij})}}\right] \end{align}\]
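Because the indicator \(1\{y^i = k\}\) picks out exactly one term per sample, the double sum reduces to summing the negative log of each sample's predicted probability for its true class. A sketch under that reading, with made-up scores for n = 2 samples and K = 3 classes (labels 0-indexed here, unlike the 1-indexed classes in the formulas):

```python
import numpy as np

def softmax_rows(Z):
    # Row-wise softmax over the class scores z_{ik}
    e = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cost(Z, y):
    # Z: (n, K) scores; y: (n,) integer labels in {0, ..., K-1}
    # J(w) = -sum_i log( exp(z_{i,y_i}) / sum_j exp(z_{ij}) )
    P = softmax_rows(Z)
    n = Z.shape[0]
    return -np.log(P[np.arange(n), y]).sum()

# Hypothetical scores for 2 samples over 3 classes
Z = np.array([[2.0, 0.5, 0.1],
              [0.2, 1.5, 0.3]])
y = np.array([0, 1])
print(cost(Z, y))
```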

Softmax regression

Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes. In logistic regression we assumed that the labels were binary: \(y^{(i)} \in \{0, 1\}\). We used such a classifier to distinguish between two kinds of hand-written digits. Softmax regression allows us to handle \(y^{(i)} \in \{1, \ldots, K\}\) where \(K\) is the number of classes.
