Recent quotes:

An Intuitive Explanation of Bayes's Theorem - LessWrong

The most effective presentation found so far is what’s known as natural frequencies—saying that 40 out of 100 eggs contain pearls, 12 out of 40 eggs containing pearls are painted blue, and 6 out of 60 eggs containing nothing are painted blue. A natural frequencies presentation is one in which the information about the prior probability is included in presenting the conditional probabilities. If you were just learning about the eggs’ conditional probabilities through natural experimentation, you would—in the course of cracking open a hundred eggs—crack open around 40 eggs containing pearls, of which 12 eggs would be painted blue, while cracking open 60 eggs containing nothing, of which about 6 would be painted blue. In the course of learning the conditional probabilities, you’d see examples of blue eggs containing pearls about twice as often as you saw examples of blue eggs containing nothing.
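The egg counts in this quote can be turned into a posterior directly. The following is a minimal Python sketch (the variable and function names are chosen here for illustration and are not from the essay). It reads the answer off the natural frequencies, then checks that Bayes's theorem on the prior and conditional probabilities gives the same value:

```python
from fractions import Fraction

# Counts taken from the quoted example: 100 eggs in total.
pearl_eggs = 40          # eggs containing pearls
empty_eggs = 60          # eggs containing nothing
blue_pearl_eggs = 12     # pearl eggs painted blue
blue_empty_eggs = 6      # empty eggs painted blue


def posterior_from_counts(blue_with, blue_without):
    """P(pearl | blue) read straight off the natural frequencies."""
    return Fraction(blue_with, blue_with + blue_without)


def posterior_from_probabilities(prior, p_blue_given_pearl, p_blue_given_empty):
    """The same quantity via Bayes's theorem on probabilities."""
    numerator = prior * p_blue_given_pearl
    return numerator / (numerator + (1 - prior) * p_blue_given_empty)


by_counts = posterior_from_counts(blue_pearl_eggs, blue_empty_eggs)
by_bayes = posterior_from_probabilities(
    Fraction(pearl_eggs, pearl_eggs + empty_eggs),   # prior: 40/100
    Fraction(blue_pearl_eggs, pearl_eggs),           # 12/40
    Fraction(blue_empty_eggs, empty_eggs),           # 6/60
)

print(by_counts, by_bayes)  # 2/3 2/3
```

With counts, the prior is already baked into the numbers 12 and 6, so the posterior is a single division. The probability form has to multiply the prior back in to reach the same 2/3.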

Why Doctors Are Bad At Stats — And How That Could Affect Your Health

Gerd and his team have explored whether medical professionals understand the statistical measures actually needed to prove that a cancer screening programme saves lives. This is a classic problem in health statistics. What clinicians need to compare is mortality rates, not 5-year survival rates. The mortality rate gives the number of deaths in a period of time. In contrast, the 5-year survival rate only tells how many people are still alive 5 years after the day they were diagnosed with cancer. Some screening programmes can diagnose people earlier, which can increase those ‘5-year survival rates’ without making them live any longer.
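The gap between the two measures can be illustrated with a toy cohort. The sketch below is in Python and uses invented ages and an assumed 3-year detection lead, none of which come from the article. Everyone dies at the same age with or without screening, yet the 5-year survival figure changes sharply:

```python
# Hypothetical cohort, invented for illustration:
# (age at which symptoms would lead to diagnosis, age at death from the cancer).
patients = [(67, 70), (66, 70), (68, 71), (65, 69), (67, 72)]

# Assumption: screening detects the cancer 3 years before symptoms would,
# and early detection does not change when anyone dies.
LEAD_TIME_YEARS = 3


def five_year_survival(cohort, lead_time=0):
    """Share of patients still alive 5 years after their diagnosis."""
    survivors = sum(
        1
        for symptom_age, death_age in cohort
        if death_age - (symptom_age - lead_time) >= 5
    )
    return survivors / len(cohort)


def mean_age_at_death(cohort):
    """Average age at death; it does not depend on when diagnosis happens."""
    return sum(death_age for _, death_age in cohort) / len(cohort)


print(five_year_survival(patients))                   # 0.2
print(five_year_survival(patients, LEAD_TIME_YEARS))  # 1.0
print(mean_age_at_death(patients))                    # 70.4
```

In this toy cohort the 5-year survival rate rises from 20% to 100% only because the clock starts earlier. The deaths occur at exactly the same ages, so a mortality comparison would show no benefit.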

Concerns with that Stanford study of coronavirus prevalence « Statistical Modeling, Causal Inference, and Social Science

The authors of this article put in a lot of work because they are concerned about public health and want to contribute to useful decision making. The study got attention and credibility in part because of the reputation of Stanford. Fair enough: Stanford’s a great institution. Amazing things are done at Stanford. But Stanford has also paid a small price for publicizing this work, because people will remember that “the Stanford study” was hyped but it had issues. So there is a cost here. The next study out of Stanford will have a little less of that credibility bank to borrow from. If I were a Stanford professor, I’d be kind of annoyed. So I think the authors of the study owe an apology not just to us, but to Stanford. Not to single out Stanford, though. There’s also Cornell, which is known as that place with the ESP professor and that goofy soup-bowl guy who faked his data. And I teach at Columbia; our most famous professor is . . . Dr. Oz.

A brief history of medical statistics and its impact on reproducibility.

By failing to address the structurally driven deficits in scientific and statistical thinking already apparent decades earlier, and then amplifying them with professional incentives to produce more and more research (grossly mismeasured by papers published), medical research had become a scandal. Altman concluded that “We need less research, better research, and research done for the right reasons.” This is, in my opinion, the most important sentence ever written about medical research, but it fell on deaf ears. How do I know this? A decade or so later, the Lancet published a special issue on research waste, which made the case that medical research costing billions of dollars each year is wasted due to things like poor research questions, flawed study designs, erroneous statistical analyses, and clumsy research reports (15).

Some t-tests for N-of-1 trials with serial correlation

This work develops a formula-based statistical method for N-of-1 studies that accounts for serial correlation while using only the data from a single individual to draw inferences. Most existing methods emerged with increases in computing power. These methods typically provide inference on two types of differences between two treatments: level- and rate-change. Level-change is when the difference in means is not dependent on the time series of the treatments, whereas rate-change is when the difference in means is dependent on the time series of the treatments. Rochon (1990) describes a large-sample, maximum likelihood method that evaluates both level- and rate-change, but no closed-form estimator exists [8]. Hence, an iterative procedure produces the estimates. McKnight et al. (2000) developed a double-bootstrap method for making inference on level- and rate-change [9]. Their first bootstrap estimates serial correlation; the second uses the estimated correlation to compare two treatments. They provide statistical properties for their method, and they focus on trials having as few as 20 or 30 observations. Borckardt and company describe statistical properties of the Simulation Modelling Analysis for N-of-1 trials, and consider trials having between 16 and 28 observations from an individual [10, 11]. Simulation Modelling Analysis is similar to a parametric bootstrap method, with the bootstrap method generating replicates under the null hypothesis. Empirical p-values for level- and rate-change result. Lin et al. (2016) propose semiparametric and parametric bootstrap methods (only one bootstrap needed) for evaluating level- and rate-change [12]. They explore the statistical properties of their method for trials having 28 observations. Other N-of-1 methods exist, but the methods described here are the only ones we could find that use only the observations from a single individual and account for serial correlation.
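To make the idea of a parametric bootstrap under the null hypothesis concrete, here is a generic Python sketch for level-change only. It is not the published Simulation Modelling Analysis procedure, nor any of the cited methods. The AR(1) Gaussian model, the function names, and the 20 observations are all assumptions made for illustration:

```python
import random
import statistics


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a series."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    denominator = sum(c * c for c in centered)
    numerator = sum(a * b for a, b in zip(centered, centered[1:]))
    return numerator / denominator


def level_change(series, labels):
    """Difference in means: treatment B minus treatment A."""
    a = [y for y, t in zip(series, labels) if t == "A"]
    b = [y for y, t in zip(series, labels) if t == "B"]
    return statistics.fmean(b) - statistics.fmean(a)


def simulate_ar1(n, rho, sd, rng):
    """One AR(1) series with marginal SD `sd` and no treatment effect."""
    innovation_sd = sd * (1 - rho ** 2) ** 0.5
    series = [rng.gauss(0, sd)]
    for _ in range(n - 1):
        series.append(rho * series[-1] + rng.gauss(0, innovation_sd))
    return series


def bootstrap_p_value(series, labels, replicates=2000, seed=0):
    """Empirical two-sided p-value for level-change from null replicates."""
    rng = random.Random(seed)
    observed = abs(level_change(series, labels))
    # Subtract each treatment's own mean first, so that a real level change
    # is not mistaken for serial correlation.
    means = {
        t: statistics.fmean(y for y, lab in zip(series, labels) if lab == t)
        for t in set(labels)
    }
    residuals = [y - means[t] for y, t in zip(series, labels)]
    rho = max(-0.99, min(0.99, lag1_autocorrelation(residuals)))
    sd = statistics.pstdev(residuals)
    hits = sum(
        abs(level_change(simulate_ar1(len(series), rho, sd, rng), labels)) >= observed
        for _ in range(replicates)
    )
    # Add-one correction keeps the empirical p-value strictly above zero.
    return (hits + 1) / (replicates + 1)


# Hypothetical single-person data: 20 observations in AABB-style blocks.
labels = ["A"] * 5 + ["B"] * 5 + ["A"] * 5 + ["B"] * 5
series = [10.2, 9.8, 10.5, 10.1, 9.6, 14.9, 15.3, 15.8, 15.1, 14.7,
          10.4, 10.0, 9.7, 10.3, 9.9, 15.2, 15.6, 14.8, 15.0, 15.4]

p_value = bootstrap_p_value(series, labels)
print(p_value)
```

The serial correlation is estimated once from the observed series and reused in every null replicate, so the reference distribution for the mean difference reflects the dependence between neighbouring observations. A rate-change test would need a different statistic and is not covered here.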

Bayes' theorem

Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do. This surprising result arises because the number of non-users is very large compared to the number of users; thus, the number of false positives outweighs the number of true positives. To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≈ 10 false positives are expected. From the 5 users, 0.99 × 5 ≈ 5 true positives are expected. Out of 15 positive results, only 5, about 33%, are genuine. This illustrates the importance of base rates, and how the formation of policy can be egregiously misguided if base rates are neglected.[15] The importance of specificity in this example can be seen by calculating that even if sensitivity is raised to 100% and specificity remains at 99%, then the probability of the person being a drug user only rises from 33.2% to 33.4%, but if the sensitivity is held at 99% and the specificity is increased to 99.5%, then the probability of the person being a drug user rises to about 49.9%.
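The three percentages in this quote can be reproduced in a few lines of Python. The prevalence of 0.5% and the baseline sensitivity and specificity of 99% are inferred from the numbers in the passage (5 users per 1000, 0.99 and 0.01), and the function name is chosen here for illustration:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(user | positive test) by Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.005  # 5 users per 1000 people tested, as in the passage

baseline = positive_predictive_value(PREVALENCE, 0.99, 0.99)
perfect_sensitivity = positive_predictive_value(PREVALENCE, 1.00, 0.99)
better_specificity = positive_predictive_value(PREVALENCE, 0.99, 0.995)

print(f"{baseline:.1%}")             # 33.2%
print(f"{perfect_sensitivity:.1%}")  # 33.4%
print(f"{better_specificity:.1%}")   # 49.9%
```

Halving the false positive rate from 1% to 0.5% roughly halves the false positives among the 995 non-users, which is why specificity moves the result far more than sensitivity when the base rate is this low.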