Index of Coincidence

The index of coincidence is a number that can help cryptanalysts guess which cryptographic system was used to produce a ciphertext. This page explains what the index of coincidence is and how to use it for cryptanalysis.

Please use the Index of Coincidence calculator if you want to find out the index of coincidence of a given text.

What is Index of Coincidence?

If you choose a letter from the English alphabet at random, then choose again at random, you have a 1 in 26 probability (0.0385) that you chose the same letter both times.

If you choose a letter from English plaintext at random, then choose again at random, you have approximately a 1 in 15 probability (0.0667) that you chose the same letter both times. This probability has been determined through frequency studies.

If you calculate the probability of coincidences for a text, then divide it by the probability of coincidences between uniformly random letters of the English alphabet (0.0385), the resulting ratio is the index of coincidence.

The Index of Coincidence for English Plaintext

The typical index of coincidence for English plaintext, therefore, is calculated by dividing the plaintext probability (0.0667 as mentioned above) by the random probability (0.0385 as mentioned above). The resulting number is:
0.0667 / 0.0385 = 1.73
The index of coincidence of an English plaintext message is usually between 1.50 and 2.00. The larger the message, the closer it should be to 1.73.
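
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are my own, and 0.0667 is simply the plaintext figure quoted above, not something the code derives.

```python
# Probability that two letters chosen uniformly at random from a
# 26-letter alphabet are the same.
P_RANDOM = 1 / 26

# Probability that two letters chosen from English plaintext are the
# same (the figure quoted above, taken as given).
P_ENGLISH = 0.0667

expected_ic = P_ENGLISH / P_RANDOM

print(round(P_RANDOM, 4))     # 0.0385
print(round(expected_ic, 2))  # 1.73
```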

How to Calculate the Index of Coincidence of a Given Text: The Monographic Phi Test

A typical way to calculate the Index of Coincidence is the Monographic Phi Test. It is called Monographic because it deals with one letter at a time. The Greek letter φ is used to represent the number of coincidences in a text: φo for the number observed in the text, φr for the number expected in random text (based on the probability 0.0385), and φp for the number expected in English plaintext (based on the probability 0.0667).

The first step to calculating the index of coincidence of a given text is to count how many times each letter occurs. For example:

RIP VAN WINKLE
R = 1
I = 2
P = 1
V = 1
A = 1
N = 2
W = 1
K = 1
L = 1
E = 1
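
The counting step above could be sketched in Python as follows. The function name letter_counts is hypothetical, and the sketch assumes that only the letters A to Z count, with case, spaces, and punctuation ignored.

```python
from collections import Counter

def letter_counts(text):
    """Count each letter A-Z, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")

counts = letter_counts("RIP VAN WINKLE")
print(counts["I"], counts["N"], counts["R"])  # 2 2 1
```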

Next, for each frequency above, which I'll refer to as f, calculate f * (f - 1). In other words:

RIP VAN WINKLE
R = 1 → 1 * 0 = 0
I = 2 → 2 * 1 = 2
P = 1 → 1 * 0 = 0
V = 1 → 1 * 0 = 0
A = 1 → 1 * 0 = 0
N = 2 → 2 * 1 = 2
W = 1 → 1 * 0 = 0
K = 1 → 1 * 0 = 0
L = 1 → 1 * 0 = 0
E = 1 → 1 * 0 = 0

Next, add up the resulting numbers from the previous step. In this example: 2 + 2 = 4. Call this number φo (where o stands for observed coincidences).
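
As a rough sketch, the two steps above (computing f * (f - 1) for each letter and adding up the results) might look like this in Python. The function name observed_coincidences is my own, and the sketch again assumes that only the letters A to Z are counted.

```python
from collections import Counter

def observed_coincidences(text):
    """Sum f * (f - 1) over the frequency f of every letter in the text."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    return sum(f * (f - 1) for f in counts.values())

print(observed_coincidences("RIP VAN WINKLE"))  # 4
```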

Next, count how many total letters are in the text. I'll call this number N. For example:

RIP VAN WINKLE → 12 letters

Next, multiply the probability of coincidences in a random text (1/26, or about 0.0385 as mentioned above) by
N * (N - 1)
where N is the total letter count from above. In this example: 12 * 11 / 26 = 5.077. This is φr, where r stands for random coincidences.

Finally, divide the φo from the earlier step by the φr from the above step. In this example: 4 / 5.077 = 0.788. The index of coincidence for the text "RIP VAN WINKLE" is 0.788.
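
Putting the whole procedure together, a minimal Python sketch might look like the one below. The function name index_of_coincidence and the alphabet_size parameter are assumptions of mine, and the sketch uses the exact value 1/26 instead of the rounded 0.0385.

```python
from collections import Counter

def index_of_coincidence(text, alphabet_size=26):
    """Monographic phi test: observed coincidences divided by the number
    expected in random text of the same length."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    # Observed coincidences: sum of f * (f - 1) over all letter counts.
    phi_o = sum(f * (f - 1) for f in Counter(letters).values())
    # Coincidences expected in random text: N * (N - 1) / 26.
    phi_r = n * (n - 1) / alphabet_size
    return phi_o / phi_r

print(round(index_of_coincidence("RIP VAN WINKLE"), 3))  # 0.788
```

The length check is there because N * (N - 1) is zero for texts of fewer than two letters, which would otherwise cause a division by zero.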

Interpreting Index of Coincidence

As mentioned above, the index of coincidence of an English plaintext message is usually between 1.50 and 2.00 if the message consists of 50 or more letters. The larger the message, the closer it should be to 1.73.

The monographic phi test is not very reliable for texts shorter than 50 letters, such as the example on this page.

If a text is composed of random letters, its index of coincidence is likely to be around 1.00, typically falling between 0.75 and 1.25 if the text consists of 50 or more letters. The longer the text, the closer it will be to 1.00.

If ciphertext was encrypted with a polyalphabetic system such as the Vigenère cipher, its index of coincidence will tend to resemble that of random text, and will therefore usually be much closer to 1.00 than to 1.73.
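
The rules of thumb in this section could be turned into a small helper like the one below. This is only an illustrative sketch: the function name describe_ic and its wording are hypothetical, and the thresholds are the ranges quoted above, which are guidelines and not hard limits.

```python
def describe_ic(ic, letter_count):
    """Give a rough reading of an index of coincidence value."""
    if letter_count < 50:
        # The phi test is not very reliable on short texts.
        return "too short to judge reliably"
    if 1.50 <= ic <= 2.00:
        return "consistent with English plaintext"
    if 0.75 <= ic <= 1.25:
        return "consistent with random or polyalphabetic text"
    return "inconclusive"

print(describe_ic(0.788, 12))  # too short to judge reliably
print(describe_ic(1.70, 400))  # consistent with English plaintext
print(describe_ic(1.02, 400))  # consistent with random or polyalphabetic text
```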