
Discriminative model

Discriminative models, also referred to as conditional models, are a class of models used in statistical classification, especially in supervised machine learning. A discriminative classifier tries to model the target by depending only on the observed data, learning how to perform the classification from the given statistics.

The approaches used in supervised learning can be categorized into discriminative models or generative models. Compared with generative models, discriminative models make fewer assumptions about the underlying distributions but depend heavily on the quality of the data. For example, given a set of labeled pictures of dogs and rabbits, a discriminative model matches a new, unlabeled picture to the most similar labeled pictures and then outputs the class label, dog or rabbit. A generative model, by contrast, builds a model from the assumptions it makes about each class (for instance, that all rabbits have red eyes) and uses that model to assign a class label to the unlabeled picture. Typical discriminative learning approaches include logistic regression (LR), support vector machines (SVM), and conditional random fields (CRFs, specified over an undirected graph); typical generative approaches include naive Bayes and Gaussian mixture models.

Unlike generative modelling, which studies the joint probability $P(x, y)$, discriminative modelling studies the conditional probability $P(y \mid x)$, or directly maps an input $x$ to a class label $y$, conditioned on the observed variables (the training samples). For example, in object recognition, $x$ is likely to be a vector of raw pixels (or features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modeling the conditional probability distribution $P(y \mid x)$, which can then be used to predict $y$ from $x$.
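The contrast above can be made concrete in code. The following is a minimal sketch, assuming scikit-learn is available: logistic regression models $P(y \mid x)$ directly, while Gaussian naive Bayes models the class-conditional distributions and applies Bayes' rule. The toy data and the dog/rabbit framing are invented here purely for illustration.

```python
# Minimal sketch: a discriminative vs. a generative classifier on the same
# toy data (assumes scikit-learn; the data below is invented for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression  # discriminative: models P(y|x) directly
from sklearn.naive_bayes import GaussianNB           # generative: models P(x|y)P(y), i.e. P(x, y)

rng = np.random.default_rng(0)
# Two Gaussian blobs standing in for "dog" (class 0) and "rabbit" (class 1) features.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

disc = LogisticRegression().fit(X, y)  # learns only the decision boundary
gen = GaussianNB().fit(X, y)           # learns per-class distributions, then inverts via Bayes' rule

x_new = np.array([[1.5, 1.5]])
print("discriminative P(y|x):", disc.predict_proba(x_new))
print("generative     P(y|x):", gen.predict_proba(x_new))
```

Both classifiers expose the same predictive interface, but only the generative one could also be used to sample new feature vectors for a given class.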
Note that there is still a distinction between the conditional model and the discriminative model, though they are more often simply categorised together as discriminative models. A conditional model models the conditional probability distribution, while the traditional discriminative model aims to optimize the mapping of inputs to the most similar trained samples.

The following approach assumes that we are given the training data set $D = \{(x_i, y_i) \mid i \leq N, i \in \mathbb{Z}\}$, where $y_i$ is the corresponding output for the input $x_i$. We intend to use a function $f(x)$ to reproduce the behavior observed in the training data set via the linear classifier method. Using the joint feature vector $\phi(x, y)$, the decision function is defined as:

$$f(x, w) = \arg\max_y w^T \phi(x, y)$$

According to Memisevic's interpretation, $w^T \phi(x, y)$, which is also written $c(x, y; w)$, computes a score that measures the compatibility of the input $x$ with the potential output $y$. The $\arg\max$ then selects the class with the highest score. Since the 0-1 loss function is commonly used in decision theory, the conditional probability distribution $P(y \mid x; w)$, where $w$ is a parameter vector optimized on the training data, can be written as the following logistic regression model:

$$P(y \mid x; w) = \frac{1}{Z(x; w)} \exp(w^T \phi(x, y)), \quad \text{with } Z(x; w) = \sum_y \exp(w^T \phi(x, y)).$$

Training maximizes the conditional log-likelihood of the observed labels,

$$L(w) = \sum_i \log p(y^i \mid x^i; w),$$

which is equivalent to minimizing the log loss

$$\ell^{\log}(x^i, y^i, c(x^i; w)) = -\log p(y^i \mid x^i; w) = \log Z(x^i; w) - w^T \phi(x^i, y^i).$$

The gradient of the log-likelihood is

$$\frac{\partial L(w)}{\partial w} = \sum_i \phi(x^i, y^i) - E_{p(y \mid x^i; w)}\, \phi(x^i, y).$$

Alternatively, max-margin methods such as SVMs optimize the hinge loss:

$$\ell^{\mathrm{hinge}}(x^i, y^i, c(x^i; w)) = \max_y \left( w^T \phi(x^i, y) + \ell^{0/1}(x^i, y^i, c(x^i; w)) \right) - w^T \phi(x^i, y^i).$$
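To make these formulas concrete, here is a minimal NumPy sketch of the linear model above: the joint feature map $\phi(x, y)$, the scores $c(x, y; w)$, the softmax conditional $P(y \mid x; w)$, the log loss with its gradient, and the hinge loss. The block one-hot form of $\phi$ and all names are illustrative choices, not part of the original formulation; the 0-1 term in the hinge loss is taken to compare the candidate $y$ against $y^i$, the usual structured-SVM convention.

```python
# Illustrative NumPy sketch of the linear conditional model described above.
import numpy as np

def phi(x, y, n_classes):
    """Joint feature vector phi(x, y): x copied into the block for class y."""
    out = np.zeros(n_classes * x.shape[0])
    out[y * x.shape[0]:(y + 1) * x.shape[0]] = x
    return out

def scores(x, w, n_classes):
    """c(x, y; w) = w^T phi(x, y) for every candidate class y."""
    return np.array([w @ phi(x, y, n_classes) for y in range(n_classes)])

def predict(x, w, n_classes):
    """Decision function f(x, w) = argmax_y w^T phi(x, y)."""
    return int(np.argmax(scores(x, w, n_classes)))

def cond_prob(x, w, n_classes):
    """P(y | x; w) = exp(w^T phi(x, y)) / Z(x; w)."""
    s = scores(x, w, n_classes)
    s -= s.max()                      # stabilize before exponentiating
    p = np.exp(s)
    return p / p.sum()                # dividing by the sum is dividing by Z(x; w)

def log_loss_and_grad(x, y_true, w, n_classes):
    """-log p(y | x; w) and its gradient in w."""
    p = cond_prob(x, w, n_classes)
    loss = -np.log(p[y_true])
    expected_phi = sum(p[y] * phi(x, y, n_classes) for y in range(n_classes))
    # Gradient of the *loss*: E_{p(y|x;w)}[phi(x, y)] - phi(x, y_true),
    # i.e. the negative of the per-sample log-likelihood gradient above.
    return loss, expected_phi - phi(x, y_true, n_classes)

def hinge_loss(x, y_true, w, n_classes):
    """max_y (w^T phi(x, y) + [y != y_true]) - w^T phi(x, y_true)."""
    s = scores(x, w, n_classes)
    augmented = s + (np.arange(n_classes) != y_true)  # add the 0-1 loss term
    return augmented.max() - s[y_true]

# Example: one stochastic gradient step on a toy sample with 4 classes.
x_i, y_i = np.array([1.0, -2.0, 0.5]), 2
w = np.zeros(4 * x_i.shape[0])
loss, grad = log_loss_and_grad(x_i, y_i, w, n_classes=4)
w -= 0.1 * grad  # gradient-descent step on the log loss
```

Summing the per-sample gradients over the whole data set recovers the $\partial L(w) / \partial w$ expression above, up to sign, since $L(w)$ is maximized while the loss is minimized.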

[ "Computer vision", "Machine learning", "Artificial intelligence", "Pattern recognition", "discriminative clustering", "discriminative random fields", "Margin Infused Relaxed Algorithm", "discriminative pattern mining", "heteroscedastic linear discriminant analysis" ]