Benford’s law, also known as the “first-digit law” or the “law of anomalous numbers”, owes its name to Frank Benford, a physicist at the General Electric company who published it in 1938. However, this is once again one of those scientific injustices in which an earlier work is not properly recognized because we humans have very short memories. In fact, Benford’s law should properly be named “the Newcomb-Benford law”, as it was first discovered by Simon Newcomb in 1881.
Newcomb was, indeed, a much greater scientist than Benford. Despite receiving a very unorthodox education (he was mostly an autodidact), he managed to achieve a lot of success in astronomy and mathematics. Notably, when he died he was buried at Arlington Cemetery with military honours, and a president of the United States attended his funeral.
But how was this law discovered? It all came from an observation made by Simon Newcomb as he was working with logarithm tables. He noticed that the books containing those tables were more worn out at the beginning than at the end. More precisely, the pages of logarithms for numbers that start with the digit “1” were dirtier and more worn out than the other pages. This observation led him to formulate the hypothesis that numbers starting with the digit “1” occur more frequently than other numbers. Or, as he put it, “in any list of numbers taken from an arbitrary set of data, more numbers will tend to begin with ‘1’ than with any other digit.”
Going beyond this irregularity for the digit “1”, we would like to express, in general, the probability of finding any given digit at the beginning of the numbers in a data set. If we are to believe that the Newcomb-Benford law applies to an arbitrary data set, then the probability of finding the digit d (from 1 to 9) at the beginning of those numbers is given by a simple formula, P(d) = log10(1 + 1/d). In the associated bar plot we can see that “1” is the leading digit approximately 30% of the time, “2” is the leading digit 17.6% of the time, etc.
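To make these percentages concrete, here is a minimal Python sketch that tabulates the probability of each leading digit. It assumes the standard form of the law, P(d) = log10(1 + 1/d); the names `benford_probability` and `benford_table` are chosen here for illustration only.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d (1..9) under the Newcomb-Benford law."""
    if not 1 <= d <= 9:
        raise ValueError("the leading digit must be between 1 and 9")
    # Assumed standard form of the law: P(d) = log10(1 + 1/d)
    return math.log10(1 + 1 / d)


# One entry per possible leading digit; the nine probabilities sum to 1.
benford_table = {d: benford_probability(d) for d in range(1, 10)}

for digit, probability in benford_table.items():
    print(f"{digit}: {probability:.1%}")
```

The printed values decrease steadily from digit 1 to digit 9, which is the shape a bar plot of the law shows.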
But do this formula and this law apply to any type of data? Well, as it happens, the Newcomb-Benford law does not hold for every data set; it applies to data that follow what is called a power-law distribution, for example, the distribution of earthquakes.
If we measure the frequency and intensity of a certain phenomenon, let’s say earthquakes, and plot these two quantities in a Cartesian plane, we obtain a distribution similar to the one described by the Newcomb-Benford law. This distribution closely follows our intuition that small earthquakes or tremors are observed quite frequently, while high-intensity, destructive earthquakes are very uncommon.
Perhaps a quick takeaway from this short section is that, when analysing your data, you should always pay close attention to its distribution and study it carefully. If your data follow a power-law distribution, you are also likely to find other irregularities like the one described by the Newcomb-Benford law.
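As a rough way of putting this advice into practice, the sketch below tallies the leading digits of a data set and compares them with the probabilities expected under the law, again assuming the standard form P(d) = log10(1 + 1/d). The helper names are hypothetical, and the sample (the first 1000 powers of 2) is only a stand-in chosen for illustration, not data from this section; you would replace it with your own values.

```python
import math
from collections import Counter


def leading_digit(x) -> int:
    """Return the first nonzero digit of a number, e.g. 0.00034 -> 3."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"{x!r} has no nonzero leading digit")


def leading_digit_frequencies(values) -> dict:
    """Observed share of each leading digit (1..9) in an iterable of numbers."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def benford_expected() -> dict:
    """Expected share of each leading digit, assuming P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


# Stand-in data set (an assumption for illustration): 2**1 ... 2**1000.
sample = [2 ** n for n in range(1, 1001)]

observed = leading_digit_frequencies(sample)
expected = benford_expected()

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, expected {expected[d]:.3f}")
```

Large gaps between the observed and expected columns suggest that the law is not a good description of the data at hand; a formal goodness-of-fit test would be the natural next step, which this sketch does not attempt.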