Model: $\hat{\mathbf{y}}=\mathbf{X}\mathbf{w}$
Loss function: $l(\mathbf{X},\mathbf{y},\mathbf{w})=\frac{1}{2n}\|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2$
Closed-form solution: $\mathbf{w}^*=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ (a minimal sketch follows below)
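A minimal sketch of the closed-form solution, assuming synthetic data and NumPy (none of this code is from the notes themselves); `np.linalg.lstsq` solves the same least-squares problem without explicitly forming the inverse, which is numerically safer:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                 # design matrix
w_true = np.array([2.0, -3.4, 1.7])         # illustrative "true" weights
y = X @ w_true + 0.01 * rng.normal(size=n)  # targets with small noise

# w* = (X^T X)^{-1} X^T y, computed via least squares rather than a matrix inverse
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_star)  # should be close to w_true
```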
Gradient descent
Mini-batch gradient descent (see the sketch after this list)
Hyperparameters: batch size, learning rate
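A minimal mini-batch gradient descent sketch for the squared loss above; the learning rate, batch size, and epoch count are illustrative hyperparameter choices, not values from these notes:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.03, batch_size=10, epochs=5, seed=0):
    """Mini-batch gradient descent on the squared loss; lr, batch_size,
    and epochs are the hyperparameters named above (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)              # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # gradient of (1/2m)||y - Xw||^2 w.r.t. w is -(1/m) X^T (y - Xw)
            grad = -X[b].T @ (y[b] - X[b] @ w) / len(b)
            w -= lr * grad
    return w
```

With the synthetic `X`, `y` from the previous sketch, `minibatch_sgd(X, y)` should land close to `w_true`.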
Softmax: $\hat{\mathbf{y}}=\mathrm{softmax}(\mathbf{o})$, $\hat{y}_i=\frac{\exp(o_i)}{\sum_k \exp(o_k)}$
Cross-entropy (measures the difference between two probability distributions): $H(\mathbf{p},\mathbf{q})=\sum_i -p_i\log(q_i)$
Loss function: $l(\mathbf{y},\hat{\mathbf{y}})=-\sum_i y_i\log\hat{y}_i$ (a numerically stable sketch follows below)
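A small sketch of softmax and cross-entropy as defined above; subtracting the row maximum and adding a small `eps` before the log are standard numerical-stability guards, not part of the formulas themselves:

```python
import numpy as np

def softmax(o):
    # subtracting the max cancels in the ratio but prevents exp overflow
    e = np.exp(o - o.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y, y_hat, eps=1e-12):
    # l(y, y_hat) = -sum_i y_i * log(y_hat_i); eps guards against log(0)
    return -(y * np.log(y_hat + eps)).sum(axis=-1)

o = np.array([2.0, 1.0, 0.1])   # logits
y = np.array([1.0, 0.0, 0.0])   # one-hot label
print(cross_entropy(y, softmax(o)))
```

For a one-hot $\mathbf{y}$, the sum collapses to $-\log\hat{y}_c$ for the true class $c$, which is why only the predicted probability of the correct class matters.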
(Figure: blue = loss function; green = likelihood function; orange = gradient)
L2 Loss (squared loss, MSE): $l(y,y')=\frac{1}{2}(y-y')^2$
L1 Loss (absolute loss, MAE): $l(y,y')=|y-y'|$
Huber's Robust Loss: $l(y,y')=\begin{cases}|y-y'|-\frac{1}{2}, & \text{if } |y-y'|>1\\ \frac{1}{2}(y-y')^2, & \text{otherwise}\end{cases}$ (compared side by side in the sketch below)
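A sketch comparing the three losses; the `delta` parameter is an assumed generalization of the threshold of 1 in Huber's formula above, and its default matches that formula:

```python
import numpy as np

def l2_loss(y, y_prime):
    return 0.5 * (y - y_prime) ** 2

def l1_loss(y, y_prime):
    return np.abs(y - y_prime)

def huber_loss(y, y_prime, delta=1.0):
    # quadratic near zero, linear in the tails; delta=1 reproduces the cases above
    r = np.abs(y - y_prime)
    return np.where(r > delta, delta * (r - 0.5 * delta), 0.5 * r ** 2)

r = np.linspace(-3, 3, 7)
print(l2_loss(r, 0.0))     # grows quadratically, punishes outliers hard
print(l1_loss(r, 0.0))     # grows linearly, constant-magnitude gradient
print(huber_loss(r, 0.0))  # quadratic inside [-1, 1], linear outside
```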