Review 11: The Bandwagon

The Bandwagon by Claude E. Shannon

  • Citation: Shannon, Claude E. “The bandwagon.” IRE Transactions on Information Theory 2.1 (1956): 3.

  • At the intersection of probability, statistics, and engineering, information theory has come to be applied ubiquitously across a wide range of research. In the 1940s, Dr. Claude Shannon made some of the most fundamental contributions to this field, and he had the fascinating perspective of watching these computational methods be adopted by computer scientists, physicists, and engineers of all kinds. Here he makes some disclaimers about the impact of information theory.

  • This perspective runs perpendicular to the linear thinking that creators have about their algorithms nowadays, which goes something like this:
    1. Design algorithm.
    2. Show that it is better than methods $X$, $Y$, and $Z$.
      • Advertise
    3. Have lots of other researchers and industries use it.
      • Get clones and stars on GitHub. A community develops around the tool or algorithm.
    4. Make some $, but also gain influence and accolades, which in turn translate to $ and other generally desired things.
  • I’m not saying that this is always the case or that it’s even a bad thing; simply that there is a regression toward this trend. All we need to do is look at things like ImageNet, AlexNet, GPT-3, PyTorch, TensorFlow, and now JAX. All of these things are awesome! There has been so much growth, and so many useful things have been developed around these technologies. Taking the example further, cryptocurrency, Web3, and NFTs all enjoy this kind of advertising from their contributors.

  • Conversely, Dr. Claude Shannon makes a case against bandwagon-style adoption in this article. He pushes back on the idea that the volume of papers published represents progress in the field. Instead, he advocates for well-thought-out, carefully executed experimental research. This suggestion, along with his description of careful experimentation as tedious and slow, is surprisingly honest. Simply borrowing a tool from information theory and attaching the words “mutual information” or “entropy” to a result doesn’t give the experimental community very much. Shannon is arguing that researchers should not use his methods as a black box: put in A, get out some claim about B. The same sentiment is echoed by Judea Pearl on causality. Concepts in this realm of computational explainability are particularly easy to misuse, and the following metaphor suggests why. Python code can be bound to C and C++, and in the same way probability, math, and English can be smashed together to make a claim or statement. However, the simplicity of the answer often depends on the question: a handful of principles from math and probability isn’t enough to characterize every wild question a researcher can ask. I think that is the big warning Shannon has here, something along the lines of: “I made some hammers, but don’t go using them to dig a hole or write a good book.” Still, the hammer analogy obscures the fact that the methods of information theory are not a black box. They can be tinkered with in various ways, and with the right parameterization and use case, they uncover causal relationships that we don’t have any other tools to understand intuitively. And speaking of intuitive understanding of information theory… if you are interested, maybe check out this twitter thread that helped me.
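  • As a quick reference (my gloss, not part of Shannon’s article): for a discrete random variable $X$ with pmf $p(x)$, and a pair $(X, Y)$ with joint pmf $p(x, y)$, the two quantities named above are

    $$H(X) = -\sum_{x} p(x) \log p(x), \qquad I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X \mid Y)$$

    The definitions are one-liners; the tedious part Shannon is pointing at is justifying what $p$ is for a given experiment, and what a number like $I(X;Y)$ actually licenses you to claim.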