1) The discussion of populations and replications fails to address sampling error within a population, what makes a population distinct, and, most importantly, why any finding from a sample within a population matters to individuals outside that sample (i.e., why science uses inferential statistics in the first place).

2) The Karg study is a poor example of using meta-analytic techniques to test how the serotonin transporter gene moderates depression: many of the ‘newly included’ studies came from selected samples (e.g., an entire sample with heart disease) in which no moderation was even tested. Furthermore, the example was presented in this post as a meta-analysis of a main effect, whereas the meta-analyses actually examined an interaction.

I don’t really like to think in terms of “correcting” a p-value. Instead, let’s think about the probability that a variant is truly associated with a disease. This probability depends on the level of evidence from the study, as well as the prior probability of the association. All these sorts of “correction factors” go into the prior. Should the number of tests you do influence your prior? I would say no. Should previous studies influence your prior? Probably. On the other hand, you might want to be conservative and not use the previous studies.
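The idea that the credibility of an association depends on both the evidence and the prior can be made concrete. Here is a minimal sketch (my own illustration, not from the comment above) using the standard positive-predictive-value formula for a significant finding; the function name and example priors are assumptions for illustration:

```python
def prob_association_is_real(prior, power=0.8, alpha=0.05):
    """Posterior probability that a 'significant' association is real,
    given the prior probability it is real, the study's power, and its
    significance threshold (the standard positive-predictive-value formula)."""
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)

# A candidate-gene study with a plausible prior of 10%:
print(prob_association_is_real(prior=0.10))
# The same evidence with a GWAS-scale prior (one variant in 10,000):
print(prob_association_is_real(prior=0.0001))
```

With the same p-value threshold and power, the second scenario yields a far lower probability that the association is real, which is exactly why the prior, rather than a mechanical multiple-testing correction, does the heavy lifting in this view.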

“This is a common fallacy, but a fallacy nonetheless. This is perhaps best illustrated by noting that the logical outcome of this line of reasoning is that, if you had genome-wide data but were only interested in a single gene, you should throw away the rest of the data before looking at it!”

Can I extrapolate the logic in another direction: in a single genome-wide study (e.g., on height), to correct the p-value, do you need to take into account all previously published GWAS (e.g., on height)?

Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association, 33(203), 526-536.

http://www.tqmp.org/Content/vol03-2/p043/p043.pdf

“For instance, Cohen and Cohen (1975) demonstrate that with a single predictor that in the population correlates with the DV at .30, 124 participants are needed to maintain 80% power. With five predictors and a population correlation of .30, 187 participants would be needed to achieve 80% power.”

Also:

Wilkinson, L., & Task Force on Statistical Inference, APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
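The power figures quoted above can be approximated with a noncentral-F power calculation for the overall regression test. A sketch (my own illustration; this continuous approximation will not exactly reproduce Cohen & Cohen's tabled values, and the function names are assumptions):

```python
from scipy.stats import f, ncf

def regression_power(n, k, r=0.30, alpha=0.05):
    """Approximate power of the overall F test in multiple regression with
    k predictors, n participants, and population multiple correlation r."""
    r2 = r ** 2
    f2 = r2 / (1 - r2)                 # Cohen's effect size f^2
    dfn, dfd = k, n - k - 1
    nc = f2 * (dfn + dfd + 1)          # noncentrality parameter
    crit = f.ppf(1 - alpha, dfn, dfd)  # critical F at the alpha level
    return ncf.sf(crit, dfn, dfd, nc)  # P(F > crit) under the alternative

def n_for_power(k, target=0.80, r=0.30):
    """Smallest n reaching the target power, by simple search."""
    n = k + 2
    while regression_power(n, k, r) < target:
        n += 1
    return n

# The qualitative point the quote makes: at a fixed n, adding predictors
# costs power, so more predictors demand more participants.
print(regression_power(124, 1), regression_power(124, 5))
print(n_for_power(k=1), n_for_power(k=5))
```

The exact sample sizes differ from the tabled 124 and 187 (Cohen & Cohen's tables use a slightly different procedure), but the direction of the effect is the same: the five-predictor model needs substantially more participants for the same power.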

Although the raw data are not available, we can still draw reasonable inferences from this: “… people with two G-copies came across better than their peers, regardless of gender. Of the ten most trusted listeners, six were double G-carriers, while nine of the ten least trusted listeners had at least one A-copy.”

The theory we should build is the distribution of the probability that a person with two copies of G is trusted (T), given the data above. Since we don’t know the number of G people, we must average our estimate of this probability over all prior assumptions that the total number of G people is from 6 to 14 (9 people are A-type). It turns out the distribution of P(T|G) is quite wide: its mode is around 0.5, but the mass is concentrated at values greater than 0.5. In fact, the probability that P(T|G) is greater than 0.5 is 0.69. That’s all we know, literally. I might have missed or goofed a few numbers, but I doubt we can extract more theory or more confidence from the data above.

We can build other theories, more or less detailed, but what we should test is the theory, not the null theory. “The Cult of Statistical Significance” is a good place to start.
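One simple way to formalize "test the theory, not the null" on these counts is a two-proportion Bayesian comparison. This is my own toy model, not necessarily the commenter's: treat the GG rate among the ten most trusted and the ten least trusted listeners as two binomial proportions with uniform priors, and ask how probable it is that the first exceeds the second:

```python
import random

random.seed(1)

# Counts from the quoted summary: 6 of the 10 most trusted were GG;
# at most 1 of the 10 least trusted was (9 carried at least one A).
top_gg, top_n = 6, 10
bot_gg, bot_n = 1, 10

def posterior_sample(successes, n):
    """Draw from the Beta posterior under a uniform Beta(1, 1) prior."""
    return random.betavariate(successes + 1, n - successes + 1)

draws = 100_000
wins = sum(
    posterior_sample(top_gg, top_n) > posterior_sample(bot_gg, bot_n)
    for _ in range(draws)
)
# Posterior probability that the GG rate is higher among the trusted group:
print(wins / draws)
```

Under this model the data speak fairly clearly in favor of the theory, which illustrates the commenter's broader point: a direct probability statement about the hypothesis is more informative than a verdict on the null.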

“So unless the authors were incredibly unlucky (or lucky!) or are biasing their results through other less than ethical practices, we can say that there is likely a real difference between the two groups in the general population.”

This statement is incorrect. It is an example of the p-value fallacy:

http://www.graphpad.com/faq/viewfaq.cfm?faq=1317

http://www.annals.org/content/130/12/995.abstract

http://en.wikipedia.org/wiki/P-value#Misunderstandings

“Calculation of a P value is predicated on the assumption that the null hypothesis is correct. P values cannot tell you whether this assumption is correct.”
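The point that a p-value is computed assuming the null hypothesis can be seen directly in simulation. A sketch (my own illustration): when the null is exactly true, p-values are uniformly distributed, so "p < .05" occurs about 5% of the time — the p-value describes the test's long-run error rate, not the probability that the null is true given the data.

```python
import random
from statistics import NormalDist

random.seed(0)
norm = NormalDist()

def null_p_value(n=30):
    """Two-sided p-value for a one-sample z test when H0 is exactly true
    (data drawn from a standard normal with mean 0)."""
    sample_mean = sum(random.gauss(0, 1) for _ in range(n)) / n
    z = sample_mean * n ** 0.5
    return 2 * (1 - norm.cdf(abs(z)))

p_values = [null_p_value() for _ in range(5_000)]
false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
# Close to 0.05 by construction: that is all a p-value threshold guarantees.
print(false_positive_rate)
```

Note that every one of these "significant" results is a false positive, since the null is true in every simulated experiment — which is exactly why a small p-value alone cannot tell you that a real difference exists.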

Cheers,

-Shane @shanemuk

Sadly, collecting data from such a large number of participants remains financially difficult for many labs, and pragmatically unrealistic for any truly complex design.

Large sample sizes require more money. In the startup world, you can put out a flawed minimum viable product, raise some cash, and do it right the second time. Kogan’s study is sort of the academic equivalent of this.
