Many patients read "science-y"-sounding posts on the internet and watch YouTube videos. They follow poor advice, thinking it is scientific. I compiled this checklist for the patient who is wondering whether what he saw is science or fake news. Often, it looks like science because there are a lot of footnotes. I adapted a document I saw on Twitter, added to it, and provided some explanation below:
Some Characteristics of Pseudoscience
1. Is UNFALSIFIABLE (can’t be proven wrong); makes vague or unfalsifiable claims.
2. Relies heavily on ANECDOTES, personal experiences, testimonials, “professional” opinions, and preclinical (test tube or animal) studies. IGNORES “LEVELS OF EVIDENCE” and the GRADE assigned by professional consensus.
3. CHERRY PICKS confirming evidence while ignoring/minimizing disconfirming (especially higher level) evidence.
4. Uses TECHNOBABBLE: Words that sound scientific but don’t make sense.
5. Lacks PLAUSIBLE MECHANISM: No way to explain it based on existing knowledge, or deficient evidence for the proposed mechanism.
6. Is UNCHANGING: doesn’t self-correct or progress.
7. Makes EXTRAORDINARY/EXAGGERATED CLAIMS with insufficient clinical evidence.
8. Professes CERTAINTY; talks of “proof” with great confidence. Ignores statistical confidence intervals and power.
9. Commits LOGICAL FALLACIES: Arguments contain errors in reasoning.
10. Lacks PEER REVIEW: Goes directly to the public (e.g., YouTube videos, blogs, direct-to-patient presentations only), avoiding scientific scrutiny.
11. Claims there is a CONSPIRACY (e.g., Big Pharma/FDA conspiracy) to suppress their ideas.
12. OVERSIMPLIFIES biochemistry (e.g., claims that alkaline water, reduced sugar intake, antioxidants, or anti-inflammatories will slow cancer).
13. Ignores INTERACTIONS with other substances, bioavailability, biochemical feedback effects, the microbiome, substance purity, or adulteration.
14. Claims “causation” when only “ASSOCIATION” has been demonstrated. (See the Bradford-Hill checklist)
15. LACKS DISCUSSION of potential biases, missing confounding variables, or effects that may have changed over time and/or with improved technology.
16. Uses INAPPROPRIATE STATISTICS AND RESEARCH METHODS: invalid endpoints or subset conclusions, endpoints and subsets not announced in advance, lack of power to detect the endpoint within the sample size and timeframe, poor choice of surrogate endpoints or subsets, "p-hacking," and biases in retrospective studies.
17. Fails to disclose CONFLICTS OF INTEREST or sponsors.
#15 and #16 require some explanation:
Surrogate endpoints: Ideally, we would have long-term follow-up until death ("overall survival") for every trial. This is impractical, particularly for prostate cancer, which has a very long natural history. ICECaP has identified "metastasis-free survival" as an appropriate surrogate for overall survival in trials involving men with localized prostate cancer. The appearance of metastases has been suggested as an appropriate surrogate for men with recurrent prostate cancer but requires validation. Biochemical recurrence-free survival is only useful for predicting the success of a therapy for localized prostate cancer. PSA doubling time is definitely inappropriate without a control group. Radiographic progression-free survival seems to be a good surrogate endpoint in men who are metastatic and castration-resistant (see this link and this one). If the pattern holds, PSA-based endpoints are inadequate (see this link and this one) and only metastasis-based endpoints are adequate. Typically, trials are only powered (have a large enough sample size) to reliably detect differences in their primary endpoint.
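To illustrate what "powered for the primary endpoint" means, here is a minimal sketch in Python of the standard sample-size formula for comparing two proportions. The event rates and targets are hypothetical, not taken from any real trial:

```python
# Rough sample-size calculation for comparing two proportions.
# The event rates (20% vs. 15%) are hypothetical, not from any real trial.
from scipy.stats import norm

p1, p2 = 0.20, 0.15          # assumed event rates in the two arms
alpha, power = 0.05, 0.80    # two-sided 5% significance, 80% power

z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96
z_beta = norm.ppf(power)            # about 0.84

# Standard approximate formula for the number of patients needed per arm
n_per_arm = ((z_alpha + z_beta) ** 2 *
             (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
print(f"Patients needed per arm: {n_per_arm:.0f}")   # about 900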
Subset conclusions: Because there is only enough sample size to reliably detect differences in the primary endpoint, subset analysis is suspect. Using subset analysis, Spears et al. showed that men diagnosed on Mondays did not benefit from abiraterone - a ridiculous conclusion. They also showed that men diagnosed with metastases (M1) benefited while men diagnosed without metastases (M0) did not. Both conclusions are inappropriate. In the case of men without metastases, there were only 34 deaths among the 460 patients in the treatment group and 44 deaths among the 455 patients in the control group - not enough to prove a statistically significant effect with 95% confidence. However, with time, there may be enough deaths to achieve a statistically significant effect, so we have to be cautious about labeling the treatment as ineffective in the M0 subgroup.
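As a rough check on those M0 numbers, here is a sketch using a simple chi-squared test. It ignores follow-up time (a real analysis would use a survival method such as a log-rank test), but it makes the point that this difference could easily be due to chance:

```python
# 2x2 table from the M0 subgroup described above:
# 34 deaths out of 460 (treatment) vs. 44 deaths out of 455 (control).
from scipy.stats import chi2_contingency

table = [[34, 460 - 34],    # treatment arm: deaths, survivors
         [44, 455 - 44]]    # control arm:   deaths, survivors

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p-value = {p_value:.2f}")   # well above 0.05, so not statistically significant
```

With only 78 deaths in total so far, the subgroup simply has not accumulated enough events to tell whether the drug helps those men.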
"P hacking" or "data-dredging"/positive results bias occurs when researchers do not announce before the study begins exactly which subgroups or variables will be looked at and which measures will be used to judge success or failure. They are going on a fishing expedition to find at least some variable or subgroup with statistically significant results. Because of random probabilities, if there are enough variables there will almost always be some that have statistically significant outcomes, like the "Monday diagnosis" subgroup above. Starting in 2000, all peer-reviewed journals required researchers to state upfront what they would be looking for. This made a large change in the number of positive results reported (see this link). Journals often would not print negative findings. In 2017, NIH and the FDA required the sponsors of all clinical trials listed in clinicaltrials.gov to provide results whether positive or negative. Policing and compliance are spotty.
Biases in retrospective studies and database analyses: Common biases include selection bias, ascertainment bias, lead-time bias, length bias, survivorship bias, and confounding by unmeasured variables, among others.
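To make one of these concrete, here is a small sketch (with hypothetical numbers) of lead-time bias: diagnosing the disease earlier makes "survival after diagnosis" look longer even when the date of death does not change at all:

```python
# Lead-time bias: screening moves the date of diagnosis earlier, so
# "survival after diagnosis" looks longer even though patients die at
# exactly the same age. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
age_at_death = rng.normal(80, 5, 1000)            # the same deaths either way

age_dx_symptoms = age_at_death - 5                # diagnosed 5 years before death
age_dx_screening = age_dx_symptoms - 3            # screening finds it 3 years earlier

surv_symptoms = age_at_death - age_dx_symptoms    # about 5 years on average
surv_screening = age_at_death - age_dx_screening  # about 8 years on average

print(f"Mean survival, symptom-detected: {surv_symptoms.mean():.1f} years")
print(f"Mean survival, screen-detected:  {surv_screening.mean():.1f} years")
# Survival "improves" by 3 years although no one lived a day longer.
```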