Frequency analysis

In cryptanalysis, frequency analysis (also known as counting letters) is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.

Frequency analysis is based on the fact that, in any given stretch of written language, certain letters and combinations of letters occur with varying frequencies. Moreover, there is a characteristic distribution of letters that is roughly the same for almost all samples of that language. For instance, in a section of English text, E, T, A and O are the most common letters, while Z, Q and X are rare. Likewise, TH, ER, ON and AN are the most common pairs of letters (termed bigrams or digraphs), and SS, EE, TT and FF are the most common repeats. The nonsense phrase 'ETAOIN SHRDLU' represents the 12 most frequent letters in typical English text.

In some ciphers, such properties of the natural-language plaintext are preserved in the ciphertext, and these patterns can be exploited in a ciphertext-only attack.

In a simple substitution cipher, each letter of the plaintext is replaced with another, and any particular letter in the plaintext is always transformed into the same letter in the ciphertext. For instance, if every occurrence of the letter e becomes the letter X, a ciphertext containing numerous instances of X would suggest to a cryptanalyst that X represents e.

The basic use of frequency analysis is to first count the frequency of the ciphertext letters and then associate guessed plaintext letters with them. If X occurs more often than any other letter in the ciphertext, that suggests X corresponds to e in the plaintext, but this is not certain: t and a are also very common in English, so X might represent either of them instead.
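A minimal Python sketch of this basic procedure follows. It assumes a hypothetical key (the reversed alphabet, so plaintext a becomes Z and e becomes V) and a made-up sample sentence; the function names are illustrative and do not come from any library. The sketch encrypts the sentence with a simple substitution cipher, counts the ciphertext letters, and pairs them by rank with the letters of 'ETAOIN SHRDLU'.

```python
from collections import Counter
import string

# Hypothetical key for illustration only: plaintext a->Z, b->Y, ..., z->A.
KEY = dict(zip(string.ascii_lowercase, reversed(string.ascii_uppercase)))

# The 12 most frequent English letters, in order ("ETAOIN SHRDLU").
ENGLISH_ORDER = "etaoinshrdlu"


def encrypt(plaintext: str, key: dict[str, str]) -> str:
    """Simple substitution: each plaintext letter always maps to the same ciphertext letter.

    Characters not in the key (spaces, punctuation) are passed through unchanged.
    """
    return "".join(key.get(ch, ch) for ch in plaintext.lower())


def letter_counts(text: str) -> Counter:
    """Count alphabetic characters only, ignoring spaces and punctuation."""
    return Counter(ch for ch in text if ch.isalpha())


def rank_guesses(ciphertext: str) -> dict[str, str]:
    """Pair the i-th most common ciphertext letter with the i-th letter of ENGLISH_ORDER.

    This is only a first guess; letters beyond the 12th rank are left unpaired.
    """
    ranked = [letter for letter, _ in letter_counts(ciphertext).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


if __name__ == "__main__":
    # Made-up sample sentence (hypothetical).
    plaintext = "the quick brown fox jumps over the lazy dog and then sleeps"
    ciphertext = encrypt(plaintext, KEY)
    print(ciphertext)
    print(letter_counts(ciphertext).most_common(3))
    print(rank_guesses(ciphertext))
```

Rank-based pairing is only a starting point. In this sample, e is the most frequent plaintext letter, so its image V correctly receives the guess e, but o rather than t is the second most frequent letter, so the second guess is already wrong.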
X is unlikely to represent a plaintext z or q, which are far less common. The cryptanalyst may therefore need to try several combinations of mappings between ciphertext and plaintext letters.

More complex uses of statistics can be conceived, such as counting pairs of letters (bigrams), triplets (trigrams), and so on. These counts give the cryptanalyst more information; for instance, Q and U nearly always occur together in that order in English, even though Q itself is rare.

Suppose Eve has intercepted a cryptogram that is known to have been encrypted using a simple substitution cipher. For this example, uppercase letters denote ciphertext, lowercase letters denote plaintext (or guesses at it), and X~t expresses a guess that ciphertext letter X represents plaintext letter t.

Eve could use frequency analysis to help solve the message along the following lines. Counts of the letters in the cryptogram show that I is the most common single letter, XL the most common bigram, and XLI the most common trigram. In English, e is the most common letter, th the most common bigram, and 'the' the most common trigram. This strongly suggests that X~t, L~h and I~e. The second most common letter in the cryptogram is E; since e and t, the first and second most frequent letters in English, are already accounted for, Eve guesses that E~a, a being the third most frequent letter. Tentatively making these assumptions yields a partially decrypted message.
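The n-gram counts and the tentative substitution in this example can be sketched in Python. Since the cryptogram itself is not reproduced here, the demonstration uses a short hypothetical ciphertext built by hand to be consistent with the guesses X~t, L~h, I~e and E~a; the function names are illustrative. As an assumption, n-grams are counted over the letters only, with spaces dropped. Undeciphered letters stay uppercase and guesses appear in lowercase, following the convention of the example.

```python
from collections import Counter


def ngram_counts(ciphertext: str, n: int) -> Counter:
    """Count overlapping n-grams over the letters only (spaces and punctuation are dropped)."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))


def partial_decrypt(ciphertext: str, guesses: dict[str, str]) -> str:
    """Replace guessed ciphertext letters with lowercase plaintext; leave the rest uppercase."""
    return "".join(guesses.get(ch, ch) for ch in ciphertext)


# Tentative mappings from the example: X~t, L~h, I~e, E~a.
GUESSES = {"X": "t", "L": "h", "I": "e", "E": "a"}

if __name__ == "__main__":
    # Hypothetical ciphertext, NOT the cryptogram from the example.
    ciphertext = "XLI KVIEX XLIEXVI"
    for n in (1, 2, 3):
        # Most common single letters, bigrams and trigrams.
        print(n, ngram_counts(ciphertext, n).most_common(3))
    print(partial_decrypt(ciphertext, GUESSES))
```

Applied to that ciphertext, the four guesses give "the KVeat theatVe"; the letters still in uppercase (K and V) are the next ones to be guessed.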
