When discussing data quality, we frequently look at cases where inconsistent data leads to analysis problems.
The 2012 Olympics present an interesting variant where the input data is the same, but leads to a different answer…
When ranking the countries competing in the Olympics, Britain and most of the world use a system whereby the number of Gold medals is the prime decider of rank, followed by Silver and then Bronze. As of 5th August, this gives the following top 10 countries:
Rank | Country | Gold | Silver | Bronze | Total |
1 | China | 29 | 16 | 14 | 59 |
2 | United States | 27 | 14 | 15 | 56 |
3 | Britain | 16 | 10 | 10 | 36 |
4 | South Korea | 10 | 4 | 6 | 20 |
5 | France | 8 | 7 | 9 | 24 |
6 | Germany | 5 | 10 | 7 | 22 |
7 | Italy | 5 | 5 | 3 | 13 |
8 | Kazakhstan | 5 | 0 | 0 | 5 |
9 | North Korea | 4 | 0 | 1 | 5 |
10 | Russia | 3 | 16 | 15 | 34 |
However, for some reason, the US media use a different system from the rest of the world: rank is based first on the total number of medals, with the numbers of Gold, Silver and Bronze used only to break ties. This leads to the following result:
Rank | Country | Gold | Silver | Bronze | Total |
1 | China | 29 | 16 | 14 | 59 |
2 | United States | 27 | 14 | 15 | 56 |
3 | Britain | 16 | 10 | 10 | 36 |
4 | Russia | 3 | 16 | 15 | 34 |
5 | Japan | 2 | 11 | 12 | 25 |
6 | France | 8 | 7 | 9 | 24 |
7 | Germany | 5 | 10 | 7 | 22 |
8 | South Korea | 10 | 4 | 6 | 20 |
9 | Australia | 1 | 12 | 7 | 20 |
10 | Italy | 5 | 5 | 3 | 13 |
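The two systems differ only in the sort key, which a short Python sketch can make concrete. The medal counts are copied from the two tables above; the names `MEDALS`, `gold_first_key`, `total_first_key` and `rank` are illustrative, and the data covers only the 12 countries that appear in either table, so positions below 10th are not meaningful.

```python
# Medal counts as (gold, silver, bronze), copied from the two tables above.
# Only the 12 countries appearing in either top 10 are included.
MEDALS = {
    "China": (29, 16, 14),
    "United States": (27, 14, 15),
    "Britain": (16, 10, 10),
    "South Korea": (10, 4, 6),
    "France": (8, 7, 9),
    "Germany": (5, 10, 7),
    "Italy": (5, 5, 3),
    "Kazakhstan": (5, 0, 0),
    "North Korea": (4, 0, 1),
    "Russia": (3, 16, 15),
    "Japan": (2, 11, 12),
    "Australia": (1, 12, 7),
}


def gold_first_key(country):
    """Sort key: Gold, then Silver, then Bronze."""
    gold, silver, bronze = MEDALS[country]
    return (gold, silver, bronze)


def total_first_key(country):
    """Sort key: total medals, then Gold, Silver and Bronze as tie-breakers."""
    gold, silver, bronze = MEDALS[country]
    return (gold + silver + bronze, gold, silver, bronze)


def rank(key):
    """Return the countries ordered from best to worst under the given key."""
    return sorted(MEDALS, key=key, reverse=True)


gold_first = rank(gold_first_key)
total_first = rank(total_first_key)

# Print the two top 10 lists side by side.
for position, (a, b) in enumerate(zip(gold_first[:10], total_first[:10]), start=1):
    print(f"{position:>2}  {a:<15} {b}")
```

Because Python compares tuples element by element, putting the total at the front of the key is all it takes to switch from one ranking to the other.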
So what are the effects of the different approach?
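The ranks of the top 3 countries are unchanged, but Russia, the old Olympic rival of the US, is 4th rather than 10th. The two Koreas are pushed down the rankings: South Korea drops from 4th to 8th, and North Korea falls out of the top 10 altogether, as does Kazakhstan. Japan and Australia, with only two and one Gold medals respectively, make an appearance in the US version of the top 10.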
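The movement between the two lists can also be checked mechanically. The following is a minimal sketch, assuming the two top 10 orderings exactly as they appear in the tables above; the helper `rank_changes` is a hypothetical name, and `None` simply means "not in that top 10", since the tables do not say where those countries actually placed.

```python
# Top 10 orderings copied from the two tables above.
GOLD_FIRST = ["China", "United States", "Britain", "South Korea", "France",
              "Germany", "Italy", "Kazakhstan", "North Korea", "Russia"]
TOTAL_FIRST = ["China", "United States", "Britain", "Russia", "Japan",
               "France", "Germany", "South Korea", "Australia", "Italy"]


def rank_changes(before, after):
    """Map each country to (rank in before, rank in after).

    A rank of None means the country is outside that top 10.
    """
    before_rank = {c: i for i, c in enumerate(before, start=1)}
    after_rank = {c: i for i, c in enumerate(after, start=1)}
    countries = list(before) + [c for c in after if c not in before_rank]
    return {c: (before_rank.get(c), after_rank.get(c)) for c in countries}


changes = rank_changes(GOLD_FIRST, TOTAL_FIRST)

# Report only the countries whose position differs between the two systems.
for country, (old, new) in changes.items():
    if old != new:
        print(f"{country}: {old or 'outside top 10'} -> {new or 'outside top 10'}")
```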
So, same input data, very different output.
What other similar examples are you aware of?