Maximum likelihood

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases the first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator maximizes the likelihood of the linear regression model. Under most circumstances, however, numerical methods will be necessary to find the maximum of the likelihood function. From the point of view of Bayesian inference, MLE is a special case of maximum a posteriori (MAP) estimation that assumes a uniform prior distribution of the parameters. In frequentist inference, MLE is a special case of an extremum estimator, with the objective function being the likelihood.

From a statistical standpoint, the observations $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ are a random sample from an unknown population. The goal is to make inferences about the population that is most likely to have generated the sample, specifically the probability distribution corresponding to the population. Associated with each probability distribution is a unique vector $\theta = [\theta_1, \theta_2, \ldots, \theta_k]^{\mathsf{T}}$ of parameters that indexes the distribution within a parametric family $\{ f(\cdot\,; \theta) \mid \theta \in \Theta \}$. As $\theta$ changes in value, different probability distributions are generated.

In other words, different parameter values $\theta$ correspond to different distributions within the model; this is the identification condition. If it did not hold, there would be some value $\theta_1$ such that the true parameter $\theta_0$ and $\theta_1$ generate an identical distribution of the observable data. We would then be unable to distinguish between these two parameters even with an infinite amount of data; they would be observationally equivalent. The identification condition establishes that the log-likelihood has a unique global maximum, and compactness of the parameter space implies that the likelihood cannot come arbitrarily close to that maximum value at some other point.

The idea of maximum likelihood is to re-express the joint probability of the sample data $f(y_1, y_2, \ldots, y_n; \theta)$ as a likelihood function $\mathcal{L}(\theta\,; \mathbf{y})$ that treats $\theta$ as a variable. For independent and identically distributed random variables, the likelihood function is defined as

$$\mathcal{L}(\theta\,; \mathbf{y}) = \prod_{i=1}^{n} f(y_i; \theta),$$

evaluated at the observed data sample.
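As an illustration (a minimal sketch, not part of the original text; the exponential model, sample size, and parameter values are assumptions chosen only for the example), the snippet below builds the likelihood of an i.i.d. sample as the product of the univariate densities $f(y_i; \theta)$, works with its logarithm for numerical stability, and shows that a parameter value near the one that generated the data makes the observed sample more probable than a distant one:

    import numpy as np

    # Hypothetical i.i.d. sample, assumed to be drawn from an Exponential(rate) model
    # with density f(y; rate) = rate * exp(-rate * y).
    rng = np.random.default_rng(0)
    y = rng.exponential(scale=1 / 2.5, size=200)   # illustrative "true" rate of 2.5

    def likelihood(rate, y):
        """L(rate; y): joint density of the sample, i.e. the product of f(y_i; rate)."""
        return np.prod(rate * np.exp(-rate * y))

    def log_likelihood(rate, y):
        """log L(rate; y): the product becomes a sum, which is numerically more stable."""
        return np.sum(np.log(rate) - rate * y)

    # A parameter value close to the one that generated the data makes the observed
    # sample more probable than a distant one.
    for rate in (0.5, 2.5):
        print(f"rate = {rate}:  L = {likelihood(rate, y):.3e},  log L = {log_likelihood(rate, y):.2f}")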
The goal is then to find the values of the model parameters that maximize the likelihood function over the parameter space $\Theta$. Intuitively, this selects the parameter values that make the observed data most probable. The problem is thus to find the supremum of the likelihood function by choice of the parameter,

$$\widehat{\theta} = \underset{\theta \in \Theta}{\operatorname{arg\,max}}\; \mathcal{L}(\theta\,; \mathbf{y}),$$

where the estimator $\widehat{\theta} = \widehat{\theta}(\mathbf{y})$ is a function of the sample. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space $\Theta$ that is compact. For an open $\Theta$ the likelihood function may increase without ever reaching a supremum value.

In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood:

$$\ell(\theta\,; \mathbf{y}) = \ln \mathcal{L}(\theta\,; \mathbf{y}).$$

Since the logarithm is a monotonic function, the maximum of $\ell(\theta\,; \mathbf{y})$ occurs at the same value of $\theta$ as does the maximum of $\mathcal{L}$. If $\ell(\theta\,; \mathbf{y})$ is differentiable in $\theta$, the necessary conditions for the occurrence of a maximum (or a minimum) are

$$\frac{\partial \ell(\theta\,; \mathbf{y})}{\partial \theta_1} = 0, \quad \ldots, \quad \frac{\partial \ell(\theta\,; \mathbf{y})}{\partial \theta_k} = 0,$$

known as the likelihood equations. For some models these equations can be solved explicitly for $\widehat{\theta}$, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples there may exist multiple roots of the likelihood equations. Whether the identified root $\widehat{\theta}$ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives of the log-likelihood, the Hessian matrix, is negative semi-definite at $\widehat{\theta}$, which indicates local concavity.
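To make the numerical route concrete, here is a minimal sketch (not part of the original text; the Gamma model, the simulated sample, SciPy's Nelder-Mead optimizer, and the finite-difference Hessian are all assumptions chosen for illustration). It maximizes the log-likelihood numerically, since the Gamma shape parameter has no closed-form MLE, and then checks the second-order condition by verifying that the Hessian of the negative log-likelihood is positive definite at the solution:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import gamma

    # Hypothetical i.i.d. sample assumed to follow a Gamma(shape, scale) model.
    rng = np.random.default_rng(1)
    y = rng.gamma(shape=3.0, scale=0.5, size=500)

    def neg_log_likelihood(params, y):
        """-log L(shape, scale; y); minimizing this maximizes the likelihood."""
        shape, scale = params
        if shape <= 0 or scale <= 0:           # stay inside the parameter space
            return np.inf
        return -np.sum(gamma.logpdf(y, a=shape, scale=scale))

    # Numerical maximization of the likelihood (here via Nelder-Mead).
    result = minimize(neg_log_likelihood, x0=np.array([1.0, 1.0]),
                      args=(y,), method="Nelder-Mead")
    shape_hat, scale_hat = result.x

    # Second-order check: at a local maximum of log L the Hessian of log L is
    # negative semi-definite, i.e. the Hessian of -log L is positive semi-definite.
    def finite_diff_hessian(f, x, eps=1e-4):
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = np.zeros(n); ei[i] = eps
                ej = np.zeros(n); ej[j] = eps
                H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
        return H

    H = finite_diff_hessian(lambda p: neg_log_likelihood(p, y), result.x)
    print("MLE (shape, scale):", shape_hat, scale_hat)
    print("eigenvalues of Hessian of -log L (should be positive):",
          np.linalg.eigvalsh((H + H.T) / 2))

Working with the negative log-likelihood is a common convention because generic optimizers minimize; the sign flip turns the concavity condition on $\ell$ into a convexity check at the estimate.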

[ "Algorithm", "Statistics", "Econometrics", "Cramér–Rao bound", "Kumaraswamy distribution", "Restricted maximum likelihood", "maximum likelihood algorithm", "Dagum distribution" ]