Mean opinion score

Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual 'values on a predefined scale that a subject assigns to his opinion of the performance of a system quality'. Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated.it is not meaningful to directly compare MOS values produced from separate experiments, unless those experiments were explicitly designed to be compared, and even then the data should be statistically analysed to ensure that such a comparison is valid. Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual 'values on a predefined scale that a subject assigns to his opinion of the performance of a system quality'. Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated. MOS is a commonly used measure for video, audio, and audiovisual quality evaluation, but not restricted to those modalities. ITU-T has defined several ways of referring to a MOS in Recommendation P.800.1, depending on whether the score was obtained from audiovisual, conversational, listening, talking, or video quality tests. The MOS is expressed as a single rational number, typically in the range 1–5, where 1 is lowest perceived quality, and 5 is the highest perceived quality. Other MOS ranges are also possible, depending on the rating scale that has been used in the underlying test. The Absolute Category Rating scale is very commonly used, which maps ratings between Bad and Excellent to numbers between 1 and 5, as seen in below table. Other standardized quality rating scales exist in ITU-T recommendations (such as P.800 or P.910). For example, one could use a continuous scale ranging between 1–100. Which scale is used depends on the purpose of the test. In certain contexts there are no statistically significant differences between ratings for the same stimuli when they are obtained using different scales. The MOS is calculated as the arithmetic mean over single ratings performed by human subjects for a given stimulus in a subjective quality evaluation test. Thus: Where R {displaystyle R} are the individual ratings for a given stimulus by N {displaystyle N} subjects. The MOS is subject to certain mathematical properties and biases. In general, there is an ongoing debate on the usefulness of the MOS to quantify Quality of Experience in a single scalar value. When the MOS is acquired using a categorical rating scales, it is based on – similar to Likert scales – an ordinal scale. In this case, the ranking of the scale items is known, but their interval is not. Therefore, it is mathematically incorrect to calculate a mean over individual ratings in order to obtain the central tendency; the median should be used instead. However, in practice and in the definition of MOS, it is considered acceptable to calculate the arithmetic mean. It has been shown that for categorical rating scales (such as ACR), the individual items are not perceived equidistant by subjects. For example, there may be a larger 'gap' between Good and Fair than there is between Good and Excellent. The perceived distance may also depend on the language into which the scale is translated. However, there exist studies that could not prove a significant impact of scale translation on the obtained results.

Parent Topic

Child Topic

No Parent Topic