
Hinge loss

In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).

For an intended output $t = \pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as

$$\ell(y) = \max(0,\, 1 - t \cdot y).$$

Note that $y$ should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, $y = \mathbf{w} \cdot \mathbf{x} + b$, where $(\mathbf{w}, b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input variable(s). When $t$ and $y$ have the same sign (meaning $y$ predicts the right class) and $|y| \geq 1$, the hinge loss $\ell(y) = 0$. When they have opposite signs, $\ell(y)$ increases linearly with $y$, and similarly if $|y| < 1$, even if it has the same sign (correct prediction, but not by enough margin).

While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion, it is also possible to extend the hinge loss itself for such an end. Several different variations of multiclass hinge loss have been proposed. For example, Crammer and Singer defined it for a linear classifier as

$$\ell(y) = \max\!\left(0,\, 1 + \max_{y \neq t} \mathbf{w}_y \cdot \mathbf{x} - \mathbf{w}_t \cdot \mathbf{x}\right),$$

where $t$ is the target label, and $\mathbf{w}_t$ and $\mathbf{w}_y$ are the model parameters.

Weston and Watkins provided a similar definition, but with a sum rather than a max:

$$\ell(y) = \sum_{y \neq t} \max\!\left(0,\, 1 + \mathbf{w}_y \cdot \mathbf{x} - \mathbf{w}_t \cdot \mathbf{x}\right).$$

In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where $\mathbf{w}$ denotes the SVM's parameters, $\mathbf{y}$ the SVM's predictions, $\varphi$ the joint feature function, and $\Delta$ the Hamming loss:

$$\ell(\mathbf{y}) = \max\!\left(0,\, \Delta(\mathbf{y}, \mathbf{t}) + \langle \mathbf{w}, \varphi(\mathbf{x}, \mathbf{y}) \rangle - \langle \mathbf{w}, \varphi(\mathbf{x}, \mathbf{t}) \rangle\right).$$

The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to the model parameters $\mathbf{w}$ of a linear SVM with score function $y = \mathbf{w} \cdot \mathbf{x}$ that is given by

$$\frac{\partial \ell}{\partial w_i} = \begin{cases} -t \cdot x_i & \text{if } t \cdot y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
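As a concrete illustration of the binary case, here is a minimal NumPy sketch of the hinge loss for a linear SVM score; the particular values of w, b and x are arbitrary examples chosen for this illustration.

```python
import numpy as np

def hinge_loss(t, y):
    """Binary hinge loss max(0, 1 - t*y) for a label t in {-1, +1} and raw score y."""
    return max(0.0, 1.0 - t * y)

# Linear SVM decision value: y = w . x + b (the raw score, not a class label)
w = np.array([0.5, -1.2])
b = 0.1
x = np.array([3.0, 0.5])
t = 1  # intended output

y = float(np.dot(w, x) + b)  # here y = 1.0
print(hinge_loss(t, y))      # 0.0: correct prediction with margin at least 1
print(hinge_loss(t, -0.4))   # 1.4: wrong side of the decision boundary
print(hinge_loss(t, 0.4))    # 0.6: right side, but inside the margin
```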
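The two multiclass variants differ only in whether the per-class margin violations are maximized (Crammer and Singer) or summed (Weston and Watkins). A minimal sketch, assuming a linear model whose per-class weight vectors are the rows of a matrix W (an illustrative layout, not fixed notation):

```python
import numpy as np

def crammer_singer_hinge(W, x, t):
    """Max-based multiclass hinge: max(0, 1 + max_{y != t} w_y.x - w_t.x)."""
    scores = W @ x                      # one raw score per class
    margins = 1.0 + scores - scores[t]  # margin violation of each class
    margins[t] = 0.0                    # the target class never competes
    return max(0.0, margins.max())

def weston_watkins_hinge(W, x, t):
    """Sum-based multiclass hinge: sum_{y != t} max(0, 1 + w_y.x - w_t.x)."""
    scores = W @ x
    margins = 1.0 + scores - scores[t]
    margins[t] = 0.0
    return np.maximum(0.0, margins).sum()

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # 3 classes, 2 features
x = np.array([2.0, 1.0])
print(crammer_singer_hinge(W, x, t=1))  # 2.0: class 0 outscores the target by 1
print(weston_watkins_hinge(W, x, t=1))  # 2.0: only class 0 violates the margin
```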
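For small output spaces, the margin-rescaled structured hinge can be evaluated by brute-force enumeration of candidate outputs (real structured SVMs use a loss-augmented inference procedure instead). A minimal sketch, assuming fixed-length label sequences, a Hamming loss, and a toy joint feature map phi; these concrete choices are illustrative, not part of any fixed API.

```python
import numpy as np
from itertools import product

def hamming(y, t):
    """Hamming loss: fraction of positions where two sequences disagree."""
    return sum(a != b for a, b in zip(y, t)) / len(t)

def structured_hinge(w, phi, x, t, labels, length):
    """Margin rescaling: max(0, max_y(Delta(y,t) + <w, phi(x,y)>) - <w, phi(x,t)>)."""
    score_t = np.dot(w, phi(x, t))
    best = max(hamming(y, t) + np.dot(w, phi(x, y))
               for y in product(labels, repeat=length))
    return max(0.0, best - score_t)

# Toy feature map: one-hot indicator of each position's label (x is unused here)
labels = (0, 1)
def phi(x, y):
    return np.array([float(lbl == c) for lbl in y for c in labels])

w = np.array([1.0, -1.0, 0.5, 0.0, -0.5, 1.0])
print(structured_hinge(w, phi, None, t=(0, 1, 0), labels=labels, length=3))
```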
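Because the subgradient is $-t \cdot \mathbf{x}$ when the margin is violated and $0$ otherwise, a linear SVM can be trained by plain subgradient descent on the average hinge loss. A minimal sketch with no regularization term and a fixed step size (a real SVM objective would add a penalty such as $\lambda \lVert \mathbf{w} \rVert^2$):

```python
import numpy as np

def train_linear_svm(X, T, lr=0.1, epochs=200):
    """Subgradient descent on the mean hinge loss of the score y = w . x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for x, t in zip(X, T):
            if t * np.dot(w, x) < 1:  # margin violated: subgradient is -t * x
                grad -= t * x
        w -= lr * grad / len(X)       # average the subgradients over the set
    return w

# Toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
T = np.array([1, 1, -1, -1])
w = train_linear_svm(X, T)
print(w, np.sign(X @ w))  # learned weights; predictions should match T
```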

[ "Support vector machine", "Function (mathematics)" ]
Parent Topic
Child Topic
    No Parent Topic