The HHS Report on Pediatric Gender Dysphoria - a summary (Part 2 of 4)
Continuing with the chapter-by-chapter summary of the HHS report
The other parts are here: Part 1, Part 3, Part 4.
In this post, I go over Part 2 (of four) of the HHS report, which reviews the evidence for pediatric gender medicine (PGM). And all the while, I am itching to go over Chapter 13, the chapter on ethics.
For people who have been studying the quality of research in this area of Pediatric Medical Transition (to use the technically accurate term adopted by this HHS report), much of this evidence (or the lack thereof) is already known. Still, it is extremely useful to have all of it put together in one place. For those who are interested in the details, there’s Appendix 4, a separate 174-page document in itself. With so many systematic reviews already available (the report found 17 eligible ones), some as recent as early 2025, there was no point in YASR (Yet Another Systematic Review). So what the HHS report did instead was conduct an umbrella review: an overview of the existing systematic reviews. The process and findings are summarized in Chapter 5.
Chapter 6 goes over an important detail that we often gloss over: the limitations of systematic reviews. Not only are systematic reviews often unsatisfying (because the quality of the evidence is so low that they frequently find nothing concrete can be said), but they also underestimate harms. Unlike cartoon villains, harm from medical interventions is often insidious: it creeps up on us.
One reason for the insidiousness of harm is that when researchers set up their experiments to quantify the benefits of an intervention, they remain focused on those hypothesized benefits. For example, in the much-publicized NIH-funded study published in the New England Journal of Medicine, 2 adolescents committed suicide within the first 12 months after starting hormone therapy, a rate much higher than that seen among gender-distressed kids at the NHS. These weren’t random kids in any random gender clinic; they were highly vetted participants at four of the most prominent gender clinics in the United States: the Children’s Hospital Los Angeles, Boston Children’s Hospital, Lurie Children’s Hospital of Chicago, and the Benioff Children’s Hospital San Francisco. Yet the researchers did not stop their study. They continued undaunted, and when the purported benefits did not materialize, they disregarded their original study protocol and made up new hypotheses out of whole cloth from whatever correlations they could find in their dataset.
That happens too often (researchers are human beings, after all) because the focus is on the benefits. Furthermore, harms often surface later, sometimes years or even decades later, long after the study has ended and the researchers have moved on to greener pastures. Chapter 6 goes over this issue and identifies the specific factors that amplify it in the area of pediatric gender medicine (PGM).
Continuing on that theme, Chapter 7 goes over the evidence from basic science, human physiology, development, and known drug mechanisms that must be considered to understand the plausible effects and risks of pediatric medical transition (PMT) interventions.
Finally, Chapter 8 summarizes the information from the previous three chapters: the lack of any good evidence of benefits and the less uncertain evidence of harm, especially if we consider elements of basic science and human physiology. The conclusion in the case of PGM is that the risk-to-benefit ratio is unfavorable for these medical interventions: an issue that is covered in much more detail in Chapter 13.
PART II: EVIDENCE REVIEW
Chapter 5: Overview of Systematic Reviews
This chapter presents the findings of an “umbrella review” (an overview of systematic reviews or SRs) conducted for the HHS report, assessing the evidence for interventions used in pediatric gender medicine (PGM). It is based on Appendix 4.
Methodology (5.1): Explains the EBM principle of relying on SRs. Defines SRs as methodical syntheses minimizing bias. Describes the overview methodology (following Cochrane Handbook): searching Medline, Embase, PsycINFO for SRs on social transition, PBs, CSH, surgery, psychotherapy for youth (<26 years); assessing SR quality using ROBIS tool; summarizing outcomes and GRADE certainty levels (High, Moderate, Low, Very Low) from low-risk-of-bias SRs. Notes modifications: interpreting phrases like “very uncertain” as “very low quality” and resolving GRADE disagreements de novo. It found 17 eligible SRs: 10 rated at low risk of bias (though potentially still with limitations) and 7 at high risk (serious flaws). Explains exclusion of 2020 NICE SRs (they were superseded by 2024 York SRs for Cass Review) and the Baker et al. (2021) SR (focused on adults, but assessed separately as high risk due to its influence on WPATH SOC-8). It includes a flow diagram (Figure 5.1) for the searching, screening, and inclusion process.
Outcomes of social transition (5.2): Summarizes findings from 2 low-risk SRs. Concludes that the impact on long-term GD, psychological outcomes, well-being, and future treatment is poorly understood (very low certainty evidence). Notes limitations: mostly cross-sectional studies, no prospective controlled studies, effects often conflated with other interventions.
Outcomes of puberty blockers (PBs) (5.3): Summarizes findings from 4 low-risk-of-bias SRs. Finds very low certainty evidence for effects on GD, mental health improvement, and safety. Finds high certainty evidence for physiological effects (hormone suppression) and infertility risk when followed by CSH. Finds low certainty evidence for compromised bone health. Notes high progression rate to CSH, but very low certainty about the causal role of PBs. Highlights evidence gaps: lack of focus on GD/mental health outcomes; no systematic study of discontinuation; focus on short-term/surrogate outcomes (ideation vs. suicide, BMD vs. fractures); limited data on long-term fertility, growth, neurocognitive effects; lack of data distinguishing effects by sex.
Outcomes of cross-sex hormones (CSH) (5.4): Summarizes findings from 4 low-risk SRs. Finds very low certainty evidence for effects on GD, mental health improvement, and safety (fertility, bone health). Finds high certainty evidence for physiological effects (inducing secondary sex characteristics). Highlights evidence gaps similar to those for PBs: focus on short-term/surrogate outcomes; inconsistent measurement of GD/mental health/QoL; infrequent assessment of sexual dysfunction; limited long-term cardiovascular data; sparse fertility evidence (reversibility unknown); inadequate investigation of compounded effects with PBs; lack of distinction between effects of estrogen vs. testosterone.
Outcomes of surgery (5.5): Summarizes findings from 2 low-risk SRs (mostly mastectomy). Finds high certainty evidence for predictable surgical complications (necrosis, scarring). Finds very low certainty evidence for effects on GD, mental health (suicidality, depression), and long-term outcomes (sexual function, QoL, regret). Highlights evidence gaps: mostly case series/small observational studies; inability to isolate surgery effects; inconsistent/unvalidated outcome measures; poor characterization of long-term outcomes (durability, revisions, satisfaction, regret); lack of data on other surgeries (e.g., genital).
Outcomes of psychotherapy (5.6): Summarizes findings from 2 low-risk SRs. Notes that there is very limited evidence due to conflation with “conversion therapy” and heterogeneity of interventions. Finds very low certainty evidence for mental health outcomes. Finds no evidence of harm reported in studies. Notes the lack of evidence on its effect on GD itself and limited research on treating comorbidities in this specific context, despite a robust evidence base for psychotherapy for these conditions more generally.
Discussion (5.7)
Findings (5.7.1): Reiterates the consistent pattern: interventions reliably produce expected physiological changes, but psychological/long-term health impacts remain highly uncertain (very low certainty evidence for benefits across interventions). [There has been a noticeable shift recently in the arguments made by some proponents of these medical interventions. Some recent publications, which appear to lean more towards philosophical or advocacy-based arguments than empirical research, now suggest that an individual's autonomy and desire for GAMT (i.e., Gender-Affirming Medical Treatment, or what this review would call PMT) is a more appropriate foundation for providing it. After so many systematic reviews, it seems that the proponents, too, are now resigned to the fact that there is a distinct lack of evidence supporting these interventions.]
Sources of uncertainty (5.7.2): Attributes low/very low certainty primarily to lack of methodologically rigorous studies (no RCTs for PBs, lack of proper controls, small samples, imprecision, inconsistent findings). Emphasizes that well-conducted observational studies could improve evidence quality, but are lacking.
Robustness (5.7.3): Notes findings align with a German low-risk SR and earlier NICE reviews. Acknowledges even low-risk SRs have limitations (e.g., outdated searches). Contrasts with high-risk SRs (serious flaws). States that a targeted search of recent studies didn't change conclusions, as methodological problems persist in those studies. Questions whether the planned UK PB trial will adequately address key questions (long-term outcomes, combined PB/CSH pathway). Argues that new SRs are unlikely to be helpful without better primary research.
Limitations/Strengths (5.7.4): Notes limitations: SRs potentially affected by publication bias (significant in PGM); may miss harms not systematically reported; focus on population data limits mechanistic insight; overview scope limited to youth, doesn't reanalyze primary studies. Strengths: systematic quality assessment (ROBIS); comparison across SRs identifies consistency/gaps.
Conclusion (5.7.5): Synthesizes findings: benefits/harms of social transition unknown; PBs/CSH/surgery produce physical effects, but the psychological and/or long-term impacts remain highly uncertain.
Chapter 6: Limitations of Systematic Reviews
This chapter argues that while systematic reviews (SRs) show very low-quality evidence for the benefits of pediatric medical transition (PMT), they are likely underestimating the harms. It explains why SRs are generally better at detecting benefits than harms and identifies specific factors in pediatric gender medicine (PGM) that amplify this issue.
Factors Limiting Harm Detection:
Insufficient elapsed time (6.1): PMT only became widespread recently (late 2010s/early 2020s). Many potential harms (e.g., cardiovascular, cancer) may take decades to manifest, but current follow-up periods are short, with patients typically still young adults.
Short-term observational studies (6.2): The evidence base consists mainly of short-term, before-and-after observational studies prone to bias. These often focus on hypothesized benefits, inconsistently track/report harms, and suffer high loss to follow-up. SRs can only synthesize reported outcomes, potentially missing harms mentioned incidentally or not surveilled. An example given is the high rate (>90%) of progression from PBs to CSH, often mentioned only in text and missed by SRs, undermining the “pause button” rationale. Three specific study examples—all extremely well-cited in this area—illustrate poor harm reporting:
de Vries et al., 2011, 2014 (6.2.1): The seminal Dutch studies failed to report serious adverse events (diabetes/obesity disqualifying patients from surgery; a death likely related to early PBs) as formal outcomes, only mentioning them incidentally. These harms were then missed by SRs that excluded the studies due to poor design (because of the commingling of CSH/surgery phases).
Tordoff et al., 2023 (6.2.2): This influential Seattle study focused only on psychological outcomes, not physical harms. Claimed mental health improvements despite unchanged depression rates (60% pre/post). Reported large odds ratios based on a flawed comparison with an untreated group that dwindled from 35 to 7 patients (i.e., an 80% attrition rate!). Despite critiques highlighting misrepresentation, confounding, and selective reporting, the study is still cited as strong evidence for benefits.
Chen et al., 2023 (6.2.3): This NIH-funded study had a flawed design (short follow-up, no control group, reliance on biochemical markers, missing key safety outcomes like growth/fertility, small sample size). Despite reporting two suicides (a very high rate) and suicidal ideation as the most common adverse event during CSH treatment, the published paper framed findings positively, omitting pre-registered outcomes like GD, self-harm, and suicidality. Claimed depression improvements were questionable (no change in males). Critiques were published a year later, but the study is still cited as evidence of benefit. [I will point to an extensive critique of the last two studies here.]
Publication bias (6.3): Positive findings are more likely to be published quickly than inconclusive/negative ones. Examples in PGM include the multi-year delay in publishing unfavorable UK PB replication study results; a US investigator acknowledging delaying publication of unimpressive PB results; WPATH suppressing SRs on harms (cardiovascular, cancer, fertility risks); SRs reaching conclusions contradicting their own evidence (e.g., Baker et al. 2021 claiming likely benefit despite low/insufficient evidence); and the Bustos et al. (2021) review claiming very low regret (~1%) based on poor-quality studies with short follow-up and high loss-to-follow-up, ignoring critiques.
Summary (6.4): Concludes that while high-quality SRs find low certainty evidence for most harms, this likely reflects under-detection due to insufficient time, poor study design/reporting, and publication bias. SRs cannot easily detect harms not systematically sought or reported. Therefore, other evidence (basic science, physiology) must be considered to understand risks (see Chapter 7 below).
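[As a quick sanity check on the attrition figure quoted above for Tordoff et al. (6.2.2), here is a minimal Python sketch. The function name `attrition_rate` is my own and purely illustrative; the only inputs taken from the text are the untreated-group sizes, 35 at the start and 7 at the end.]

```python
def attrition_rate(baseline: int, remaining: int) -> float:
    """Fraction of a group lost to follow-up between baseline and the end."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if not 0 <= remaining <= baseline:
        raise ValueError("remaining must be between 0 and baseline")
    return (baseline - remaining) / baseline


# Untreated comparison group sizes as quoted in the summary of 6.2.2.
rate = attrition_rate(baseline=35, remaining=7)
print(f"Attrition: {rate:.0%}")
```

With 28 of the 35 untreated patients gone by the end, the comparison group behind the reported odds ratios is one-fifth of its original size.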
Chapter 7: Evidence from Basic Science and Physiology
This chapter argues that when clinical studies and SRs provide insufficient information about harms (as shown in Chapter 6), evidence from basic science, human physiology, development, and known drug mechanisms must be considered to understand the plausible effects and risks of pediatric medical transition (PMT) interventions.
Puberty (7.1): Defines puberty as the transition to sexual maturity/reproductive capability, driven by sex hormones (estrogen/progesterone in females, testosterone in males) via the HPG (Hypothalamic-Pituitary-Gonadal) axis. Outlines Tanner Staging for tracking physical progression (7.1.2) and the neuroendocrine regulation involving GnRH, LH, and FSH (7.1.3).
Puberty blockers and central precocious puberty (CPP) (7.2): Explains central precocious puberty or CPP (rare, early puberty activation) and the FDA-approved use of PBs (GnRHa) to treat it, aiming to delay puberty until a typical age and potentially increase final height. Notes that PBs are discontinued to allow normal puberty resumption.
Puberty blockers and gender dysphoria (7.3): Explains PBs are used off-label in PGM to arrest normally timed puberty by inducing hypogonadotropic hypogonadism (HH), suppressing sex hormone production. Notes that HH is linked to risks like infertility and low bone density. Discusses potential developmental risks:
Psychosocial (7.3.1): Contrasts with CPP use; in PMT, patients remain prepubertal while peers mature, potentially impacting psychosocial development.
Bone density (7.3.2): Puberty is critical for peak bone mass accrual (influenced by sex hormones). Multiple studies consistently show PBs decrease bone density Z-scores (compared to age/sex norms), potentially increasing long-term osteoporosis/fracture risk, especially hip fractures. Recovery after subsequent CSH is uncertain; a critical window may be missed.
Neurocognitive (7.3.3): Adolescent brain reorganization (pruning, myelination) is influenced by sex hormones, affecting executive function, emotion regulation, etc. Effects of PBs are understudied; some evidence suggests a potential detrimental impact on IQ, though one study found normal educational achievement. The Cass Review raised concerns about unknown effects on brain maturation.
Reproductive maturation (7.3.4): PBs halt gonad/gamete development. If followed by CSH (as occurs >90% of the time), permanent infertility is likely, especially if started before full gamete maturation (Tanner Stage 4). Fertility preservation (FP) options are limited/experimental for early pubertal youth (testicular tissue cryopreservation for males; ovarian tissue for females has unproven viability post-PMT). FP utilization rates are very low (0-5%). Raises concerns about minors' capacity to consent to potential sterility.
Sexual dysfunction (7.3.5): Cites concerns (e.g., from surgeon Marci Bowers) that males starting PBs at Tanner Stage 2 may experience anorgasmia/lack of genital sensation, even after vaginoplasty. Notes a striking lack of research on sexual function outcomes post-PMT.
[Whenever it comes to PBs, I am reminded of the letter to the editor of the New York Times by Dr. Marc Garnick, one of the three academic principal clinical investigators of studies that led to the initial F.D.A. approval of Lupron for the treatment of metastatic prostate cancer: “As one of three academic principal clinical investigators of studies that led to the initial F.D.A. approval of Lupron for the treatment of metastatic prostate cancer—and having studied this class of drugs, which includes puberty blockers, for more than four decades—I can say that physicians are still learning and continue to be concerned about the safety of these agents in adults. Woefully little safety data is available for the likely more vulnerable younger population. Bone loss in adult men who have been on these agents is significant and a leading cause of morbidity with long-term administration. Other safety issues include cognitive, metabolic, and cardiovascular effects, still under intense investigation. The prudent and ethical use of such agents in the younger population should demand that every pubertal or pre-pubertal child be part of rigorous clinical research studies that evaluate both the short-term and longer-term effects of these agents to better define the true risks and benefits, rather than relying on anecdotal information.”]
Cross-sex hormones (CSH) and gender dysphoria (7.4): Explains CSH (estrogen for males, testosterone for females, often with blockers for males) induce secondary sex characteristics of the other sex, not actual opposite-sex puberty (reproductive capacity is hindered, not enabled) (7.4.1). Notes ES/WPATH guidelines aim for hormone levels typical of the identified gender, resulting in supraphysiologic levels relative to natal sex (hyperandrogenism in females, hyperestrogenemia in males) (7.4.2-7.4.4). Discusses the risks of CSH:
Reproductive system effects: Testosterone causes pathological changes in female organs (PCOS-like ovaries, endometrial fibrosis, vaginal atrophy, potential cancer risk prompting hysterectomy recommendations) (7.4.5). Estrogen impairs male testosterone/sperm production, causing testicular atrophy and abnormal histology (7.4.6).
Cardiovascular/metabolic risks (7.4.7): CSH linked to elevated risk (heart attack, stroke, VTE). Testosterone in females can cause high blood pressure, polycythemia (increased clot risk), adverse lipid changes, and increased BMI. Estrogen in males is linked to VTE and stroke.
Other risks (7.4.8): Potential early mortality link. Testosterone in females: breast tissue remodeling, possible cancer risk (breast/ovarian). Estrogen in males: possible MS/thyroid cancer risk, potential brain volume decrease, increased breast cancer risk, diminished libido/erectile dysfunction.
Surgery and gender dysphoria (7.5): Notes that all surgery carries risks, amplified after PBs/CSH. Highlights the unique risks of removing healthy organs.
Surgical problems after PBs (7.5.1): Arrested genital development (e.g., in males starting PBs at Tanner 2) limits future options like penile inversion vaginoplasty, necessitating riskier procedures (intestinal vaginoplasty), citing the Dutch cohort death after such surgery.
Other risks (7.6):
Adverse psychiatric effects (7.6.1): Notes testosterone's abuse potential and link (esp. anabolic steroids) to mood instability, psychosis, aggression, dependence. Cites FAERS data showing high rates of serious adverse reactions (including psychological) for CSH in transition. Mentions Swedish study finding elevated crime rate in females post-transition and higher suicide/mortality/psychiatric care rates overall post-transition.
Detransition and regret (7.6.2): Notes that the irreversible effects of CSH (voice deepening, hair growth/loss, breast growth, potential infertility) can lead to regret even if identity remains stable (e.g., regretting infertility). Detransition rate is unknown, challenging claims it's vanishingly low. Cites UK study showing ~20% discontinued hormones within 5 years, over half reporting detransition/regret experiences.
Mortality risk (7.7): Cites four population-based cohort studies (US, Netherlands, UK, Sweden) finding higher all-cause mortality risk (2-3x) for transgender individuals vs. controls, linked to suicide, cardiovascular disease, and HIV. Notes uncertainty about causes and applicability to current youth cohorts but suggests caution, especially as the Dutch study found risk persisted over time despite care changes.
Chapter 8: Summary and Implications of Evidence Review
This chapter synthesizes the findings from Chapters 5-7 regarding the evidence for pediatric medical transition (PMT) interventions.
It reiterates the core finding: while PMT interventions (puberty blockers [PBs], cross-sex hormones [CSH], surgery) are promoted as essential and lifesaving, the evidence base supporting their effectiveness in improving mental health or reducing gender dysphoria (GD) is of very low certainty (Chapter 5).
It emphasizes the importance of considering harms, especially when benefits are uncertain. It draws parallels with FDA risk/benefit assessments, which consider known class effects and mechanisms of action. Applying this to PMT, it argues that predictable harms (like infertility from early PBs followed by CSH) don’t require RCTs to be established, akin to not needing a trial for parachutes (Chapter 7).
The chapter concludes that analysis of all available data, including clinical studies (Chapter 5), limitations in harm detection (Chapter 6), and basic science/physiology (Chapter 7), suggests that the risk/benefit profile of medical and surgical interventions for GD youth is unfavorable.
It discusses the implications for decision-makers at different levels:
Individual Care: Requires transparency about benefits, harms, alternatives, and evidence certainty for true shared decision-making. Inaccurate information compromises patient/family values and preferences.
Clinical Guidelines/Health Policy: Values/preferences should reflect a well-informed population perspective. Weak/uncertain evidence demands caution and flexibility, not stronger recommendations.
Regulation: Must prioritize public health/safety. Lacking/low-certainty evidence for benefit warrants focus on evaluation, risk mitigation, and data collection before broad implementation.
The chapter stresses that responsible decision-making at all levels requires understanding what is known, what is uncertain, and the resulting ethical obligations.