Review 19: Let us never speak of these values again.

This blog post covers the utility of statistical significance measures such as effect sizes and p-values.
Whether a claim counts as a success hinges on the ratio of the estimated effect size to its standard error.
This exposes the fact that significance is weighted by the spread of the probability density. If one outcome is only slightly favorable but has a narrow distribution, and another has a more favorable average outcome but a wider distribution, the p-value may favor the former.
Simple approximations can distort this p-value, especially when averaging across groups.
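A minimal sketch of the effect-size-to-standard-error ratio and how it drives the p-value. It assumes a normal approximation to the null distribution and a two-sided test; the helper names and the effect sizes and standard errors are hypothetical numbers chosen only for illustration, not taken from the blog post.

```python
import math


def z_statistic(effect: float, std_error: float) -> float:
    """Signal-to-noise ratio: estimated effect divided by its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return effect / std_error


def two_sided_p(effect: float, std_error: float) -> float:
    """Two-sided p-value under a normal approximation (an assumption here).

    2 * (1 - Phi(|z|)) is identical to erfc(|z| / sqrt(2)).
    """
    z = abs(z_statistic(effect, std_error))
    return math.erfc(z / math.sqrt(2))


# Hypothetical scenarios: a modest effect with a narrow spread
# versus a larger effect with a wide spread.
small_tight = {"effect": 0.5, "se": 0.1}
large_wide = {"effect": 2.0, "se": 1.5}

p_small = two_sided_p(small_tight["effect"], small_tight["se"])
p_large = two_sided_p(large_wide["effect"], large_wide["se"])

print(f"small/tight: z = {z_statistic(0.5, 0.1):.2f}, p = {p_small:.2g}")
print(f"large/wide:  z = {z_statistic(2.0, 1.5):.2f}, p = {p_large:.2g}")
```

The smaller effect yields z = 5 and a p-value below one in a million, while the effect four times larger yields z of about 1.33 and a p-value around 0.18. The p-value only sees the ratio, so it ranks the narrow, modest outcome as the stronger finding.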

Most practicing scientists would be better off not knowing what a p-value is.

This leads to the underlying philosophical problem: how can we trust that an effect size is valid? This is made harder because validity exists at several levels:
- Is the study design valid?
- Is the hypothesis testing valid?
- Are the claims valid?
I think the third is especially hard. How can you best guarantee performance on unseen inputs and outputs?