Introduction
The Clinical Dilemma in ART Medication Safety Assessment
Fertility specialists worldwide face a critical challenge when counseling the estimated 2.5 million couples who undergo assisted reproductive technology (ART) cycles annually: determining the safety of pharmacological interventions in the context of conflicting scientific evidence and mounting patient concerns about fetal malformation risk. This uncertainty is not merely academic; it directly impacts treatment decisions, regulatory policies, and the psychological well-being of couples already experiencing the profound stress of infertility. Despite the birth of over 13 million children through ART since 1978, with current utilization representing 1-6% of births in developed countries and constituting a $25 billion global industry,1 fundamental questions about medication safety remain inadequately resolved through systematic scientific evaluation.
The magnitude of this clinical problem is underscored by robust epidemiological evidence demonstrating that children conceived through ART face a consistently elevated risk of congenital malformations. Multiple large-scale meta-analyses, encompassing hundreds of thousands of pregnancies, reveal a statistically significant 15-50% increased risk compared to naturally conceived children, with pooled relative risks (RR) ranging from 1.15 to 1.50 (95% CI: 1.07-1.80) across diverse populations and study designs.2–4 This modest yet persistent elevation translates to thousands of affected families annually and generates legitimate concerns about the potential contribution of pharmacological interventions to adverse fetal outcomes. However, the specific attribution of this risk to individual medications versus underlying parental factors, procedural effects, or laboratory conditions remains frustratingly unclear, creating a complex analytical challenge that has defied traditional epidemiological approaches.5,6
Current Evidence Landscape and Conflicting Safety Signals
The contemporary ART pharmacological arsenal has expanded dramatically in both scope and complexity, encompassing diverse therapeutic classes with distinct mechanisms of action and safety profiles. Modern protocols employ an intricate sequence of interventions across multiple phases: gonadotropins such as recombinant follicle-stimulating hormone (rFSH), luteinizing hormone (LH), human chorionic gonadotropin (hCG), and human menopausal gonadotropin for controlled ovarian stimulation; GnRH analogues for pituitary suppression; progesterone preparations for luteal support; and adjuvant medications including metformin, letrozole, and clomiphene citrate.7,8 The introduction of biosimilar gonadotropins has increased therapeutic accessibility while introducing additional variables for safety assessment, as recent meta-analyses indicate potential efficacy differences compared to originator products.9,10 Similarly, the proliferation of synthetic progestins, particularly dydrogesterone for luteal support, has provided alternatives to traditional formulations while generating new safety questions that exemplify broader challenges in evidence evaluation.11,12
The dydrogesterone controversy exemplifies the discord in current evidence interpretation. While high-quality randomized controlled trials, including the landmark LOTUS I and II studies, demonstrated safety profiles comparable to standard progesterone preparations, subsequent pharmacovigilance signals and case-control studies have raised concerns that have influenced clinical practice, despite their methodological limitations. For example, the study by Koren et al. suggested potential teratogenic effects13 but was later retracted due to significant methodological flaws, including inadequate trial design and failure to account for confounding factors.14 Similarly, a 2024 study by Atarieh et al. reported differences in congenital anomaly rates15 but was retracted in 2025 for concerns over data integrity and study validity.16 These examples illustrate how low-quality or biased studies can generate spurious signals that influence perceptions and policies, only to be overridden later by robust evidence from RCTs and meta-analyses.
Methodological Challenges in Safety Evidence Evaluation
The fundamental challenge in ART medication safety assessment lies in reconciling conflicting evidence from sources of vastly different methodological quality and causal inference capacity. This methodological discord creates an evidence landscape where preliminary findings from observational studies can generate disproportionate clinical concern, potentially overshadowing robust evidence of safety from well-designed trials. Randomized controlled trials, while providing the most reliable evidence for causal relationships, are often underpowered for rare malformation outcomes (e.g., prevalence <0.1%) and primarily designed for efficacy rather than safety endpoints. Conversely, observational studies and pharmacovigilance databases, despite their larger sample sizes and real-world applicability, suffer from inherent limitations including confounding by indication, recall bias, selective reporting, and the inability to establish causality.
Most critically, existing systematic reviews and meta-analyses have failed to resolve safety controversies because they have not systematically distinguished between evidence sources based on their methodological rigor and capacity for causal inference. Traditional approaches often pool data from randomized trials, cohort studies, and case-control studies without sufficient consideration of evidence hierarchy principles, leading to conclusions that may misrepresent the true safety profiles of individual agents. Pharmacovigilance systems, designed for signal detection rather than risk quantification, can generate spurious associations through reporting bias, confounding, and the Weber effect, the well-documented phenomenon of increased adverse event reporting following new drug approvals.17,18 Without systematic application of evidence hierarchy principles, these signals may inappropriately influence policy decisions and clinical practice, potentially restricting access to safe and effective treatments based on inadequate evidence.
The clinical implications of this evidence discord extend far beyond academic debate, directly affecting daily practice and patient care. Fertility specialists encounter increasing numbers of well-informed patients who arrive with specific medication concerns derived from internet searches, support groups, and preliminary research reports. The absence of clear, evidence-based guidance for interpreting conflicting safety data can lead to suboptimal treatment decisions, including the avoidance of effective medications based on theoretical concerns or methodologically limited studies. Furthermore, the psychological burden on couples experiencing infertility can be significantly compounded by conflicting safety information, potentially affecting treatment compliance, decision-making autonomy, and therapeutic outcomes.
Methodological Innovation and Framework Development
This systematic review addresses these critical limitations through implementation of a novel, pre-specified evidence evaluation framework that explicitly prioritizes evidence sources based on their methodological rigor and capacity for causal inference. By systematically applying the GRADE methodology combined with the Cochrane RoB assessment tools, we provide the first comprehensive safety evaluation of ART medications that appropriately weights evidence quality when conflicts arise.19,20 Our approach employs a hierarchical evidence integration algorithm that assigns differential weights to study designs, with systematic reviews of randomized controlled trials and individual participant data meta-analyses receiving the highest priority, followed by individual randomized trials, observational studies, and pharmacovigilance data.21,22
The methodological innovation extends beyond evidence synthesis to include the development of a practical framework for clinicians to interpret conflicting safety data in real-world practice. By establishing explicit criteria for meaningful safety signals, including statistical significance, biological plausibility, consistency across study designs, and adequate control for confounding, we provide actionable guidance for treatment decisions and patient counseling. This approach represents a significant advancement over traditional meta-analytic techniques that may inappropriately combine evidence of disparate quality, potentially misleading clinical decision-making.23
Specific Objectives and Expected Impact
Given the exponential growth in global ART utilization,24,25 the persistent elevation in malformation risk observed in ART pregnancies, and the urgent need for evidence-based approaches to conflicting safety data, this comprehensive systematic review aims to: (1) evaluate the teratogenic risk of medications commonly used in ART protocols by systematically prioritizing high-quality evidence over observational data; (2) develop and validate a framework for interpreting conflicting safety signals that can be applied to future medication evaluations; (3) provide clinicians with clear, evidence-based guidance for treatment decisions and patient counseling; and (4) identify specific knowledge gaps requiring targeted research investment.
The expected impact extends across multiple domains of reproductive medicine. Clinically, this review will provide evidence-based safety profiles that enable informed treatment decisions and reduce patient anxiety through accurate risk communication. From a regulatory perspective, the framework will inform policy decisions about medication approvals and safety warnings, ensuring that restrictions are based on high-quality evidence rather than preliminary signals. Scientifically, the systematic identification of knowledge gaps will guide future research priorities, facilitating more efficient allocation of research resources toward questions with genuine clinical importance.26
By establishing the first systematic, evidence-hierarchy-based evaluation of ART medication safety and providing a replicable framework for evidence interpretation, this review addresses a critical gap in reproductive medicine literature while advancing methodological standards for safety assessment. The integration of robust evidence evaluation with practical clinical guidance represents a paradigm shift from traditional approaches, moving beyond simple risk enumeration toward evidence-based decision-making frameworks that appropriately weight study quality and causal inference capacity. This methodological rigor is particularly crucial given the profound personal stakes involved in fertility treatment decisions and the potential for inappropriate restrictions on safe, effective medications based on inadequate evidence.
Materials and Methods
Study Design and Protocol Registration
This comprehensive systematic review evaluated the teratogenic risk of medications commonly used in ART protocols, aiming to inform evidence-based clinical practice and regulatory decision-making. The review was conducted according to a prospectively registered protocol (PROSPERO registration: CRD420251118713) and reported in accordance with the PRISMA 2020 statement.27
The methodology prioritized rigorous evidence hierarchy principles and systematic evaluation of study quality to address the critical challenge of conflicting safety data in reproductive medicine. The protocol was developed using systematic search principles, including clearly defined population, intervention, comparison, and outcome (PICO) criteria, structured screening procedures, and hierarchical evidence appraisal frameworks. Emphasis was placed on study design quality, statistical adjustment for confounding factors, and consistency of findings across different evidence tiers.
Protocol Deviation: A post-protocol supplemental search was conducted in July 2025, extending coverage to August 12, 2025, to capture emerging 2025 data. This deviation was justified by the rapid evolution of the field and did not alter the original inclusion criteria. All protocol deviations were documented and reported transparently.
Literature Search Strategy
Database Selection and Search Methodology
A comprehensive literature search was conducted across multiple electronic databases, covering publications from January 1990 to December 2024, with a supplemental search in July 2025 extending coverage to August 12, 2025, to ensure inclusion of the most recent evidence. Primary databases included PubMed/MEDLINE, Embase, Cochrane Central Register of Controlled Trials (CENTRAL), Web of Science Core Collection, and Scopus. The search strategy employed Medical Subject Headings (MeSH) terms combined with free-text keywords, organized into three main concept groups using Boolean operators (AND/OR).
Search Strategy Validation
The search strategy was peer-reviewed by an independent information specialist and validated through pilot searches against a set of 20 known relevant studies, all of which were retrieved (100% sensitivity). The search was designed to favor sensitivity over specificity to minimize the risk of missing relevant studies.
Search Term Development
Population terms focused on assisted reproductive techniques: “Reproductive Techniques, Assisted” [MeSH], “Fertilization in Vitro” [MeSH], and “Sperm Injections, Intracytoplasmic” [MeSH], combined with free-text terms including “IVF,” “ICSI,” “ART,” “assisted reproductive technology,” “in vitro fertilization,” “intracytoplasmic sperm injection,” and “embryo transfer.”
Intervention terms encompassed major drug classes: “Fertility Agents” [MeSH], “Gonadotropins” [MeSH], “Follicle Stimulating Hormone” [MeSH], “Luteinizing Hormone” [MeSH], “Chorionic Gonadotropin” [MeSH], “Progesterone” [MeSH], and “Gonadotropin-Releasing Hormone” [MeSH], plus specific drug names including recombinant FSH, human menopausal gonadotropin, GnRH agonists (leuprolide, buserelin, triptorelin, nafarelin), GnRH antagonists (cetrorelix, ganirelix), progesterone formulations, dydrogesterone, metformin, letrozole, clomiphene citrate, and growth hormone.
Outcome terms targeted fetal safety: “Congenital Abnormalities” [MeSH], “Birth Defects” [MeSH], and “Teratogens” [MeSH], supplemented with free-text terms including “congenital malformation,” “birth defect,” “fetal abnormality,” “teratogenic,” “congenital anomaly,” and “developmental toxicity.”
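The three concept groups described above can be combined mechanically: terms within a group are joined with OR, and the groups are joined with AND. The following is a minimal sketch of that assembly, assuming a PubMed-style syntax and using only an abbreviated subset of the terms listed above. The helper names `or_group` and `build_query` are illustrative, and the resulting string is not the full search string used in this review.

```python
def or_group(terms):
    """Join the terms of one concept group with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*concept_groups):
    """Combine concept groups with AND, so every group must match."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Abbreviated, illustrative subsets of the three concept groups.
population = [
    '"Reproductive Techniques, Assisted"[MeSH]',
    '"Fertilization in Vitro"[MeSH]',
    '"IVF"',
    '"ICSI"',
]
intervention = [
    '"Gonadotropins"[MeSH]',
    '"Progesterone"[MeSH]',
    '"dydrogesterone"',
    '"letrozole"',
]
outcome = [
    '"Congenital Abnormalities"[MeSH]',
    '"congenital malformation"',
    '"birth defect"',
]

query = build_query(population, intervention, outcome)
print(query)
```

In practice each database requires its own field tags and controlled vocabulary, so a string built this way would still need adapting per database.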
Supplementary Search Methods
Additional search strategies included: (1) clinical trial registries (ClinicalTrials.gov, WHO International Clinical Trials Registry Platform); (2) reference list screening of included studies and relevant systematic reviews; (3) regulatory agency databases such as the Food and Drug Administration (FDA) and the European Medicines Agency (EMA) for safety updates and post-marketing surveillance reports; (4) conference proceedings from major reproductive medicine societies such as the European Society of Human Reproduction and Embryology (ESHRE), American Society for Reproductive Medicine (ASRM), and American College of Obstetricians and Gynecologists (ACOG); and (5) grey literature sources including professional society guidelines and health technology assessment reports.
Search limitations included restriction to English-language publications and human subjects only. Database-specific search strategies were adapted to optimize sensitivity while maintaining specificity, with search strings modified according to each database’s indexing structure and controlled vocabulary.
Population, Intervention, Comparison, and Outcomes (PICO)
- Population: Women undergoing ART procedures (IVF with or without intracytoplasmic sperm injection [ICSI]) with documented pregnancy outcomes, including both singleton and multiple pregnancies.
- Intervention: Pharmacological agents commonly used in ART protocols, including gonadotropins such as FSH, LH, hCG, and human menopausal gonadotropin (hMG); GnRH analogues (agonists and antagonists); luteal phase support agents (progesterone preparations, dydrogesterone); and adjuvant medications (metformin, letrozole, clomiphene citrate, growth hormone).
- Comparison: Natural conception, alternative ART protocols, or other medication regimens within the same therapeutic class.
- Outcomes: Primary outcome was major congenital malformations defined according to internationally accepted criteria. Secondary outcomes included system-specific anomalies (cardiac, neural tube, musculoskeletal, genitourinary defects) and overall safety profiles.
Inclusion and Exclusion Criteria
Inclusion Criteria
Eligible studies met the following criteria: (1) randomized controlled trials, prospective or retrospective cohort studies, case-control studies, or systematic reviews examining ART medication safety; (2) human studies with documented pregnancy outcomes following ART procedures; (3) studies reporting congenital malformation rates or birth defect incidence; (4) minimum sample size of 50 pregnancies for primary studies to ensure adequate statistical precision; (5) English-language publications; and (6) publication between January 1990 and December 2024 (extended to August 12, 2025 via supplemental search).
Exclusion Criteria
Studies were excluded if they: (1) represented case reports or case series with fewer than 10 subjects; (2) focused solely on fertility outcomes without malformation data; (3) involved animal models, in vitro studies, or purely theoretical analyses; (4) examined experimental or non-standard ART techniques not in widespread clinical use; (5) were conference abstracts without available full-text publications; or (6) provided insufficient data for meaningful safety assessment.
Specific examples: Inclusion example – a retrospective cohort study comparing congenital anomaly rates between dydrogesterone and vaginal progesterone users (included if n≥50). Exclusion example – a case series describing three infants with cardiac defects following gonadotropin exposure (excluded due to small sample size and the lack of a comparison group).
Study Selection and Screening Process
Screening Training and Calibration
Prior to screening, both reviewers completed a calibration exercise using a sample of 50 records to ensure consistent application of inclusion and exclusion criteria, achieving Cohen’s kappa >0.60 for acceptable inter-reviewer agreement. Disagreement rates during calibration were documented, and criteria clarification was undertaken where necessary.
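For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed proportion of agreement with the agreement expected by chance from each reviewer's marginal decision rates. The sketch below computes it for two reviewers on a 50-record calibration sample. The screening decisions are invented for illustration and are not the calibration data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical calibration sample of 50 records (45 agreements, 5 disagreements).
reviewer_a = ["include"] * 12 + ["exclude"] * 38
reviewer_b = ["include"] * 10 + ["exclude"] * 2 + ["include"] * 3 + ["exclude"] * 35

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(round(kappa, 2))  # about 0.73 for this made-up sample, above the 0.60 threshold
```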
Study Characteristics and Population Demographics
Thirty-two studies fulfilled the inclusion criteria and were incorporated into the final analysis. The characteristics of the 32 included studies are presented in Table 1. The studies were conducted across diverse geographical regions, with 16 studies (50%) from Europe, 9 studies (28.1%) from Asia, 4 studies (12.5%) from North America, 2 studies (6.3%) from Australia, and 1 study (3.1%) from South America. Publication years ranged from 1995 to 2025, with 13 studies (40.6%) published after 2020, ensuring inclusion of contemporary evidence on emerging agents such as biosimilar gonadotropins and dydrogesterone.
Sample sizes varied considerably, ranging from 52 to 302,811 participants, with a median sample size of 1,847 pregnancies. The largest studies were population-based registry analyses, while smaller studies were typically randomized controlled trials with focused research questions. All studies included women undergoing ART procedures (IVF/ICSI) with documented pregnancy outcomes. Maternal age was reported in 28 studies (87.5%), with mean ages ranging from 29.2 to 35.8 years across studies.
Primary outcomes were consistently defined using internationally accepted criteria for major congenital malformations, with 27 studies (84.4%) employing European Surveillance of Congenital Anomalies (EUROCAT) definitions and 5 studies (15.6%) using national registry criteria. EUROCAT is a network of population-based registries across Europe that monitors, researches, and provides surveillance data on congenital anomalies. Follow-up duration varied from birth assessment only in 8 studies (25%) to extended pediatric follow-up (up to 5 years) in 6 studies (18.8%), with the remainder providing neonatal follow-up to hospital discharge.
Two-Stage Screening Protocol
Study selection followed a systematic two-stage process conducted independently by two trained reviewers using the Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia). Title and abstract screening was performed first, followed by full-text screening for all potentially eligible studies. Inter-reviewer agreement was assessed using Cohen’s kappa coefficient, with disagreements resolved through discussion and consensus. A third senior reviewer adjudicated unresolved conflicts.
Handling of Multiple Publications
Studies reporting on the same patient population were carefully evaluated to avoid data duplication. When multiple publications from the same cohort were identified, the most comprehensive report was included as the primary study, with additional publications used to supplement data or provide long-term follow-up information where appropriate.
Data Extraction and Management
Systematic Data Collection
Data extraction was performed using standardized, piloted forms designed to capture comprehensive study characteristics and outcome data. Two reviewers independently extracted data, with discrepancies resolved through discussion and consensus. Extracted information included:
- Study characteristics: First author, publication year, study design, geographical location, study period, sample size calculations, funding sources, and conflicts of interest.
- Population demographics: Maternal age distribution, infertility diagnoses, previous ART attempts, comorbidities, and socioeconomic factors where reported.
- Intervention details: Specific medications used, dosing regimens, administration routes, treatment protocols, cycle characteristics, and concomitant therapies.
- Outcome assessment: Malformation definitions used, diagnostic criteria, follow-up duration, ascertainment methods (clinical examination, imaging, medical records), and outcome adjudication procedures.
- Statistical data: Sample sizes, event rates, effect estimates with confidence intervals, adjustment factors, and measures of statistical heterogeneity.
Quality Control Measures
Data extraction accuracy was verified through double data entry for a random sample of 20% of included studies. Discrepancies exceeding 5% prompted re-extraction of all studies by the same reviewer pair. Study authors were contacted for clarification of unclear data or to obtain unpublished information where necessary, with a structured approach for follow-up communication.
Quality Assessment and Risk of Bias Evaluation
Study-Specific Assessment Tools
The risk of bias assessment was tailored to study design using validated tools. Randomized controlled trials were evaluated using the revised Cochrane Risk of Bias tool (RoB 2.0),56 examining five domains: randomization process, deviations from intended interventions, missing outcome data, outcome measurement, and selective reporting. Each domain was rated as low risk, some concerns, or high risk, with overall study quality determined by the most concerning domain.
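The "most concerning domain" rule described above can be expressed in a few lines. This is a simplified sketch of that rule only: the full RoB 2 guidance also allows reviewers to judge a trial as high risk when several domains raise some concerns, which is not modelled here. The domain keys and the function name `overall_rob2` are illustrative.

```python
# Ordering of the three RoB 2 judgments from least to most concerning.
ROB2_SEVERITY = {"low": 0, "some concerns": 1, "high": 2}

ROB2_DOMAINS = (
    "randomization_process",
    "deviations_from_intended_interventions",
    "missing_outcome_data",
    "outcome_measurement",
    "selective_reporting",
)


def overall_rob2(domain_ratings):
    """Return the overall judgment as the most concerning of the five domain ratings."""
    missing = [d for d in ROB2_DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    ratings = [domain_ratings[d] for d in ROB2_DOMAINS]
    for rating in ratings:
        if rating not in ROB2_SEVERITY:
            raise ValueError(f"unknown rating: {rating!r}")
    return max(ratings, key=ROB2_SEVERITY.__getitem__)


# Hypothetical open-label trial comparing oral and vaginal administration.
example_trial = {
    "randomization_process": "low",
    "deviations_from_intended_interventions": "some concerns",
    "missing_outcome_data": "low",
    "outcome_measurement": "low",
    "selective_reporting": "low",
}
print(overall_rob2(example_trial))
```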
Observational studies were assessed using the Newcastle-Ottawa Scale (NOS),57 evaluating three categories: selection of study groups (4 points), comparability of groups (2 points), and ascertainment of exposure/outcome (3 points). Studies scoring 7-9 points were considered high quality, 4-6 points moderate quality, and ≤3 points low quality.
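The NOS thresholds above translate directly into a small scoring helper. The sketch below assumes the three category scores have already been assigned by the reviewers; the function name `nos_quality` and the example scores are illustrative.

```python
def nos_quality(selection, comparability, outcome):
    """Sum the three NOS category scores and map the total to a quality band."""
    limits = {"selection": (selection, 4), "comparability": (comparability, 2), "outcome": (outcome, 3)}
    for name, (score, maximum) in limits.items():
        if not 0 <= score <= maximum:
            raise ValueError(f"{name} score must be between 0 and {maximum}, got {score}")
    total = selection + comparability + outcome
    if total >= 7:
        band = "high"
    elif total >= 4:
        band = "moderate"
    else:
        band = "low"
    return total, band


# Hypothetical registry cohort: full selection score, full adjustment, partial follow-up.
print(nos_quality(selection=4, comparability=2, outcome=2))
```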
GRADE Evidence Certainty Assessment
Evidence certainty was evaluated using the GRADE methodology.19 This systematic approach assessed five factors that may decrease confidence in evidence: risk of bias (study design limitations, inadequate allocation concealment, or lack of blinding), inconsistency (unexplained heterogeneity between studies with I² >50% or conflicting effect directions), indirectness (differences in populations, interventions, or outcomes from the review question), imprecision (wide confidence intervals crossing clinical decision thresholds or insufficient sample sizes below the optimal information size needed to detect meaningful effects), and publication bias (asymmetric funnel plots or selective outcome reporting). Conversely, evidence could be upgraded for exceptionally large effect sizes, clear dose-response relationships, or when all plausible residual confounding would diminish rather than enhance the observed effect. The final certainty rating, expressed as high (⊕⊕⊕⊕), moderate (⊕⊕⊕○), low (⊕⊕○○), or very low (⊕○○○), reflects confidence that the true effect lies close to the estimate, guiding the strength of clinical recommendations and informing evidence-based decision-making in reproductive medicine.
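The downgrading and upgrading logic can be illustrated with a simple tally. The sketch below follows the standard GRADE convention that bodies of randomized evidence start at high certainty and observational evidence starts at low, with each of the five factors able to lower the rating by one or two levels. GRADE judgments are ultimately qualitative, so this arithmetic is an illustration rather than the rating procedure applied in this review; the function name and argument names are hypothetical.

```python
CERTAINTY_LEVELS = ["very low", "low", "moderate", "high"]

DOWNGRADE_FACTORS = (
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(randomized, downgrades, upgrades=0):
    """Tally a GRADE certainty rating from a starting level, downgrades, and upgrades.

    downgrades maps factor name -> levels removed (0, 1, or 2).
    upgrades is the total number of levels added (large effect, dose-response,
    or plausible confounding that would reduce the observed effect).
    """
    for factor, levels in downgrades.items():
        if factor not in DOWNGRADE_FACTORS:
            raise ValueError(f"unknown GRADE factor: {factor!r}")
        if levels not in (0, 1, 2):
            raise ValueError("each factor can downgrade by 0, 1, or 2 levels")
    start = 3 if randomized else 1  # index into CERTAINTY_LEVELS
    score = start - sum(downgrades.values()) + upgrades
    score = max(0, min(len(CERTAINTY_LEVELS) - 1, score))
    return CERTAINTY_LEVELS[score]


# Hypothetical body of RCT evidence, downgraded once for imprecision
# because malformations are rare and confidence intervals are wide.
print(grade_certainty(randomized=True, downgrades={"imprecision": 1}))
```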
Evidence Hierarchy Framework
Studies were classified according to a pre-specified evidence hierarchy that prioritized causal inference capacity:
- Level I: Systematic reviews and meta-analyses of randomized controlled trials; individual participant data meta-analyses.21
- Level II: Individual randomized controlled trials; systematic reviews of high-quality cohort studies.
- Level III: Prospective and retrospective cohort studies; case-control studies with appropriate controls.
- Level IV: Cross-sectional studies; pharmacovigilance reports with adequate denominators.
- Level V: Case series and case reports; pharmacovigilance signals without population data.
Priority was given to higher-level evidence when assessing safety profiles, with lower-level studies primarily used for hypothesis generation and signal detection.
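One way to operationalize this prioritization is to map each study design to its hierarchy level and then separate the highest available level from everything below it. The following sketch does this for a hypothetical set of studies on a single agent. The design keys, the function name `split_by_priority`, and the study labels are illustrative and do not correspond to the data structures or studies used in this review.

```python
# Study design -> evidence level (1 = Level I, highest priority).
EVIDENCE_LEVEL = {
    "sr_of_rcts": 1,
    "ipd_meta_analysis": 1,
    "rct": 2,
    "sr_of_cohorts": 2,
    "cohort": 3,
    "case_control": 3,
    "cross_sectional": 4,
    "pharmacovigilance_with_denominator": 4,
    "case_series": 5,
    "case_report": 5,
    "pharmacovigilance_signal": 5,
}


def split_by_priority(studies):
    """Split (label, design) pairs into the best available level and the rest.

    Returns (primary, supporting): primary holds studies at the highest level
    present; supporting holds lower-level studies, ordered by level, which are
    treated as hypothesis-generating.
    """
    if not studies:
        return [], []
    ranked = sorted(studies, key=lambda study: EVIDENCE_LEVEL[study[1]])
    best_level = EVIDENCE_LEVEL[ranked[0][1]]
    primary = [s for s in ranked if EVIDENCE_LEVEL[s[1]] == best_level]
    supporting = [s for s in ranked if EVIDENCE_LEVEL[s[1]] > best_level]
    return primary, supporting


# Hypothetical evidence base for one medication.
evidence = [
    ("Registry cohort A", "cohort"),
    ("Trial B", "rct"),
    ("Spontaneous report signal C", "pharmacovigilance_signal"),
]
primary, supporting = split_by_priority(evidence)
print("Primary:", primary)
print("Supporting:", supporting)
```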
Data Synthesis and Statistical Analysis
Because of heterogeneity across studies, narrative synthesis was the primary approach; random-effects meta-analysis was performed only for sufficiently homogeneous study groups (e.g., gonadotropins, progesterone).
Synthesis Strategy Decision Criteria
Given the anticipated heterogeneity in study designs, populations, and outcome definitions, narrative synthesis served as the primary method of evidence integration. Quantitative meta-analysis was conducted when studies met pre-specified homogeneity criteria: (1) similar study populations (ART patients); (2) comparable interventions (same medication class); (3) consistent outcome definitions (major congenital malformations); (4) adequate statistical data for pooling; and (5) clinical homogeneity as assessed by expert judgment.
Meta-Analysis Methods
When appropriate, random-effects meta-analysis was performed using the DerSimonian-Laird method to account for between-study heterogeneity.58 Primary effect measures included odds ratios and risk ratios with 95% confidence intervals. Statistical heterogeneity was assessed using the chi-square test (significance at p<0.10) and quantified using the I² statistic, with values >50% indicating substantial heterogeneity requiring investigation.
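For transparency about what the pooling step computes, the sketch below implements the DerSimonian-Laird estimator on log-transformed effect estimates, along with Cochran's Q and I². It is a plain-Python illustration, not the RevMan or R code used for the analyses, and the input values are invented. The p-value for Q is omitted because it requires a chi-square distribution function from a statistics library.

```python
import math


def dersimonian_laird(log_effects, std_errors):
    """Random-effects pooling of log effect estimates (DerSimonian-Laird)."""
    k = len(log_effects)
    if k == 0 or k != len(std_errors):
        raise ValueError("need matching, non-empty effect and standard error lists")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w
    # Cochran's Q and the method-of-moments estimate of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled_log": pooled,
        "pooled": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical log risk ratios and standard errors from four studies.
result = dersimonian_laird([0.05, -0.10, 0.20, 0.00], [0.15, 0.20, 0.25, 0.10])
print({name: round(value, 3) for name, value in result.items()})
```

An I² above 50% from such a calculation would, under the criteria above, prompt investigation of the sources of heterogeneity before the pooled estimate is interpreted.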
Sensitivity and Subgroup Analyses
Pre-planned sensitivity analyses included: (1) exclusion of studies with high risk of bias; (2) restriction to studies with adequate sample sizes (>200 pregnancies); (3) analysis limited to prospective study designs; and (4) evaluation of publication bias using funnel plots and Egger’s test when ≥10 studies were available.
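Egger's test, referred to in item (4), regresses each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); an intercept that departs from zero suggests funnel plot asymmetry. The sketch below computes the intercept and its t-statistic in plain Python and enforces the minimum of 10 studies stated above. The p-value is left out because it requires a t-distribution function from a statistics library, and the example data are synthetic.

```python
import math


def egger_regression(effects, std_errors, min_studies=10):
    """Egger's regression: standardized effect on precision, ordinary least squares."""
    k = len(effects)
    if k != len(std_errors):
        raise ValueError("effects and std_errors must have the same length")
    if k < min_studies:
        raise ValueError(f"need at least {min_studies} studies, got {k}")
    x = [1.0 / se for se in std_errors]                 # precision
    z = [y / se for y, se in zip(effects, std_errors)]  # standardized effect
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all standard errors are identical; regression is undefined")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance and the standard error of the intercept.
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / (k - 2) * (1.0 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return {"intercept": intercept, "se_intercept": se_intercept, "t": t_stat, "df": k - 2}


# Synthetic data with built-in asymmetry: smaller studies (larger SE) show larger effects.
ses = [0.10 + 0.05 * i for i in range(10)]
effects = [0.2 + 0.5 * se for se in ses]
egger = egger_regression(effects, ses)
print(round(egger["intercept"], 3), egger["df"])
```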
Subgroup analyses were planned based on: (1) medication class and specific agents; (2) route of administration; (3) timing of exposure (periconceptional vs. first trimester); (4) maternal age groups; (5) geographic region; and (6) study design characteristics.
Software and Statistical Packages
All statistical analyses were conducted using Review Manager (RevMan) 5.4 (Cochrane Collaboration, Copenhagen, Denmark) and R statistical software version 4.3.0 (R Foundation for Statistical Computing, Vienna, Austria) with the meta and metafor packages. Forest plots and funnel plots were generated using these platforms following standard formatting conventions.
Assessment of Publication Bias and Evidence Gaps
Publication bias assessment included systematic searching of clinical trial registries to identify unpublished studies, examination of funnel plot asymmetry where sufficient studies were available, and application of statistical tests (Egger’s test, Begg’s test) when appropriate.59 Small study effects were evaluated through the comparison of fixed-effects and random-effects meta-analysis results.
Evidence gaps were systematically identified by mapping available studies against a matrix of medication classes, outcome types, and study quality levels. Areas with limited high-quality evidence were highlighted as priorities for future research, with specific recommendations for study design and methodology.
Detailed Risk of Bias Assessment Results
Risk of bias assessment results are presented in Table 2. Among the 10 randomized controlled trials, 8 (80%) demonstrated low risk of bias across all domains, while 2 (20%) were rated as having some concerns, primarily related to blinding of participants and personnel, which is inherently difficult in route-of-administration comparisons (oral vs. vaginal progesterone). No trial was rated as high risk of bias overall.
For the randomization process domain, all 10 RCTs (100%) demonstrated adequate sequence generation and allocation concealment. Regarding deviations from intended interventions, 8 trials (80%) maintained protocol adherence with appropriate intention-to-treat analysis, while 2 raised some concerns due to differential discontinuation rates between treatment arms. Missing outcome data were adequately addressed in 9 trials (90%), while 1 raised some concerns due to >10% loss to follow-up without sensitivity analysis.
Outcome measurement was consistently robust across the RCTs, with all 10 (100%) employing standardized malformation definitions and blinded outcome assessment where possible. Selective reporting was minimal: 9 trials (90%) reported pre-specified outcomes completely, while 1 raised some concerns due to incomplete safety reporting.
Among the 16 observational studies assessed using the Newcastle-Ottawa Scale, 13 studies (81%) were rated high quality (≥7/9 points), while 3 studies (19%) were rated moderate quality (4-6/9 points). No study scored in the low-quality range (≤3/9 points), and none was excluded on quality grounds. Selection bias was minimal in registry-based studies but more concerning in single-center cohorts. Comparability was generally good, with most studies adjusting for key confounders, including maternal age, parity, and underlying infertility factors. Outcome ascertainment was consistently strong across studies using validated registry data or standardized clinical assessments.
Specific Risk of Bias Concerns by Domain:
- Randomization: Low risk in all RCTs (100%)
- Blinding: Some concerns in 2 RCTs (20%) due to intervention nature
- Missing data: Some concerns in 1 RCT (10%) due to loss to follow-up
- Selective reporting: Some concerns in 1 RCT (10%) for incomplete safety data
- Selection bias (observational): Some concerns in 3 studies (19%) from single centers
- Confounding control: Adequate in 13 studies (81%) with appropriate adjustments
Ethical Considerations and Compliance
This systematic review involved analysis of previously published data and did not require institutional review board approval. All included studies were assumed to have obtained appropriate ethical approval and informed consent as reported in their respective publications. The review protocol and conduct adhered to established ethical standards for secondary research involving human subjects and followed international guidelines for systematic review reporting.60
Conflicts of interest were systematically recorded for all included studies, and potential bias due to industry funding was evaluated as part of the quality assessment process. The systematic review team declared no conflicts of interest related to the pharmaceutical agents evaluated in this analysis.
Results
PRISMA 2020 FLOW TABLE – BASED ON 32 VERIFIED STUDIES
Study Design Distribution
Study Selection and Characteristics
Following the systematic search strategy, 32 studies (total participants: ~1.2 million pregnancies) met the inclusion criteria for qualitative synthesis and contributed extractable data for analysis. The studies spanned 1995 to 2025 (40% post-2020, ensuring recency for evolving agents like biosimilars9) and were globally diverse (Europe: 50%; Asia: 28.1%; North America: 12.5%; Australia: 6.3%; South America: 3.1%). They included 10 randomized controlled trials (31.3%, n=~15,000), 6 systematic reviews and meta-analyses (18.8%, n=~200,000), 10 cohort studies (31.3%, n=~900,000), 1 case-control study (3.1%), 1 pharmacovigilance study (3.1%), and 4 additional tabular analyses (12.5%). The randomized controlled trials provided the highest quality evidence, including landmark studies such as LOTUS I28 and LOTUS II,29 which established pivotal safety data for dydrogesterone in luteal phase support. Quality assessment revealed that 85% of studies defined major malformations according to EUROCAT criteria,55 ensuring standardized outcome measurement. All studies focused on ART-exposed pregnancies, with primary outcomes of major malformations and secondary system-specific anomalies. These characteristics support evidence-based counseling on ART medication safety, reassuring clinicians of low absolute malformation risks (2–6%) comparable to natural conception when adjusted for parental factors.2–4 (See Table 1 for details.)
Quality Assessment and Evidence Hierarchy
The quality assessment revealed that most studies provided high-quality evidence suitable for clinical decision-making. Among randomized controlled trials, 8 studies (80%) demonstrated low risk of bias using the Cochrane Risk of Bias tool, with adequate randomization, allocation concealment, and outcome assessment. For observational studies, the Newcastle-Ottawa Scale assessment showed 13 studies (81%) achieving good quality ratings (≥7/9; strong comparability/adjustment for confounders like maternal age), with 3 studies (19%) rated as fair quality. No studies were excluded based on quality concerns alone.
An evidence certainty assessment using GRADE methodology, conducted independently by two reviewers with disagreements resolved through consensus, identified Level I-II evidence for most drug classes. The assessment considered factors that decrease confidence (risk of bias, inconsistency, indirectness, imprecision, publication bias) or increase confidence (large effect magnitude, dose-response gradient, residual confounding favoring null), documented via standardized forms.
The hierarchy reflects the study designs’ ability to minimize bias and establish causality. At the apex are systematic reviews and meta-analyses of individual participant data (IPD-MA) from RCTs, allowing harmonized outcomes and subgroup exploration21,22,61–63; followed by aggregate data meta-analyses (AD-MA) and individual RCTs (superior for controlling confounders); then observational studies/pharmacovigilance data (low/very low certainty unless exceptional); and case reports/series (minimal weight, for signal detection only).
Regarding publication bias, funnel plots were symmetric (Egger’s test: p=0.42 overall); no small-study effects were shown in sensitivity analyses excluding n<200 studies. Sensitivity analyses (excluding high-bias studies, n=3) confirmed robustness (pooled OR for malformations unchanged at 1.02 [95% CI 0.95-1.10]); subgroups by age (>35 vs. <35; no interactions, p=0.31), route (oral vs. vaginal progesterone; OR 0.98 [0.85-1.13]), or geography (Europe vs. Asia; I2=22%; no differences). When conflicts arise, prioritize meta-analyses/high-quality RCTs over observational data, as applied by GRADE and similar organizations.19,20,23,64,65 This framework supports reliable safety profiles for ART medications, enabling clinicians to counsel patients on low teratogenic risks while emphasizing continued surveillance for rare events.
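To illustrate how a pooled odds ratio of the kind quoted above is obtained from study-level estimates, the sketch below applies inverse-variance fixed-effect pooling on the log scale. This is an assumption made for illustration: the text does not state whether a fixed-effect or random-effects model was used, the function names are hypothetical, and the three input studies are invented values rather than data from the included studies.

```python
import math


def log_or_and_se(or_, lo, hi, z=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI
    (normal approximation on the log scale)."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2 * z)


def pool_fixed(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of (OR, CI low, CI high) tuples."""
    num = den = 0.0
    for or_, lo, hi in studies:
        y, se = log_or_and_se(or_, lo, hi, z)
        w = 1.0 / se ** 2          # weight = inverse of the variance
        num += w * y
        den += w
    pooled = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))


# Hypothetical study-level estimates (illustrative only).
studies = [(0.90, 0.70, 1.16), (1.08, 0.90, 1.30), (1.02, 0.88, 1.18)]
or_pooled, ci_lo, ci_hi = pool_fixed(studies)
print(f"pooled OR {or_pooled:.2f} [95% CI {ci_lo:.2f}-{ci_hi:.2f}]")
```

Larger, more precise studies receive more weight, which is why excluding a few small or high-bias studies typically shifts the pooled estimate only slightly.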
Drug-Specific Safety Findings
Human Menopausal Gonadotropin (hMG): Composition, Mechanism, and Safety
Human menopausal gonadotropin preparations, such as Menopur, contain both follicle-stimulating hormone (FSH) and luteinizing hormone (LH) activity, with LH activity primarily derived from human chorionic gonadotropin (hCG) of placental or hypophyseal origin. Although the nominal FSH:LH ratio is approximately 1:1, molecular analyses reveal that most LH activity is due to hCG content.66,67 hMG promotes ovarian follicular development by activating both FSH and LH receptors, stimulating multiple follicle growth and oocyte maturation during IVF cycles. Administered during the follicular phase before conception, its active components (FSH, LH, hCG) have short half-lives and clear from circulation prior to embryogenesis and organogenesis, with no evidence of significant placental crossing post-implantation.68
Clinical evidence from high-quality studies, including randomized controlled trials and large cohort studies, shows no increased risk of major congenital malformations associated with hMG use compared to natural conception or recombinant FSH (pooled OR 1.01 [95% CI 0.92-1.11]; I2=18%; Level I-II evidence).33,44 Meta-analyses and registry data confirm this safety profile, with absolute malformation rates of 2-6%, consistent with adjusted natural conception rates.2–4 Sensitivity analyses excluding studies with potential bias (e.g., single-center cohorts) and subgroup analyses by maternal age or protocol type showed no significant differences (p=0.35 for age interaction). The GRADE assessment rates the evidence as high certainty, supported by low risk of bias, minimal heterogeneity, and large sample sizes (n=156,789 across gonadotropin studies). No specific system-specific anomalies (e.g., cardiac, neural tube) were consistently linked to hMG exposure. Clinically, hMG remains a safe option for ovarian stimulation, with no teratogenic concerns, though ongoing surveillance for rare outcomes is recommended.69
Recombinant FSH (rFSH) and Biosimilar FSH: Composition, Mechanism, and Safety
Recombinant follicle-stimulating hormone (rFSH) and its biosimilar counterparts are produced using recombinant DNA technology, yielding a highly purified product with consistent FSH activity and negligible luteinizing hormone (LH) activity. In contrast to human menopausal gonadotropin (hMG), derived from urinary sources with both FSH and LH activity (the latter largely due to human chorionic gonadotropin), rFSH offers greater batch-to-batch consistency and reduced risk of urinary contaminants.70,71 Although biosimilars may exhibit minor differences in glycosylation and post-translational modifications due to manufacturing processes, these variations are tightly regulated within biosimilarity standards, ensuring comparable safety, purity, and potency.72
Multiple large-scale studies, meta-analyses, and RCTs consistently show no significant differences in fetal malformation risk between rFSH, biosimilar FSH, and hMG (pooled OR 0.99 [95% CI 0.85-1.15]; I2=20% from meta-analyses). Rates of congenital anomalies, miscarriage, and live birth outcomes are similar across agents, with no evidence of increased teratogenicity attributable to rFSH or biosimilars.33,44,73 While some studies report slightly lower clinical pregnancy or live birth rates with biosimilars versus originator rFSH, these are not linked to higher fetal anomalies.9,10,34
Regulatory agencies (FDA, EMA) mandate rigorous analytical, pharmacologic, and clinical comparability for biosimilars, addressing structural variability (e.g., glycosylation/sialylation) through preclinical/clinical testing.71,72 Marketed biosimilars show no excess birth defect risk, supported by high-level evidence including real-world registries and systematic reviews.9,34 This synthesis is grounded in Level I-II evidence (GRADE: high certainty19), reflecting RCTs, systematic reviews, and cohorts with low risk of bias and minimal imprecision74,75; sensitivity analyses (excluding high-bias studies) and subgroups (e.g., by ovarian reserve or protocol) confirm robustness. Clinically, this endorses interchangeable use of rFSH and biosimilars in standard protocols, reducing costs without safety compromise; knowledge gaps include long-term epigenetic effects in offspring from biosimilar variations.
Safety of Recombinant Luteinizing Hormone (rLH) in IVF: Risk of Fetal Malformations
Current evidence indicates no increased fetal malformation risk with recombinant luteinizing hormone (rLH) in controlled ovarian stimulation for IVF compared to other regimens. rLH, used less frequently than FSH or hMG, plays a key role in patients with functional/absolute LH deficiency, poor ovarian response, or advanced reproductive age, improving follicular maturation and clinical pregnancy outcomes.35,36,76 It is typically administered alongside rFSH in a 2:1 ratio product (e.g., 75 IU/day); its safety has been assessed even though studies have focused primarily on reproductive endpoints such as pregnancy and live birth rates.35
Large cohorts, registries, and meta-analyses show no rise in congenital anomalies or adverse perinatal outcomes with rLH protocols versus rFSH alone or hMG (pooled OR 1.03 [95% CI 0.89-1.19]; I2=15% from meta-analyses). Miscarriage and live birth rates are comparable.77–79 No specific teratogenic signals for rLH have emerged.36 While IVF pregnancies have slightly higher overall malformation risk than spontaneous ones, this is attributable to ART factors (e.g., maternal age, gamete quality, procedures, multiples, infertility) rather than rLH.80 This is based on high-quality evidence (GRADE: high certainty19), from systematic reviews, meta-analyses, cohorts, and real-world data (Level I-II), with low risk of bias, no inconsistency, and minimal imprecision; sensitivity analyses (excluding high-bias studies) and subgroups (e.g., by age or response status) confirm no differences.36,79 Clinically, this supports rLH supplementation in targeted subgroups without teratogenic concerns, optimizing outcomes; knowledge gaps include rare anomaly subtypes and long-term offspring health in rLH-exposed pregnancies.
Safety of Human Chorionic Gonadotropin (hCG) and Recombinant hCG in IVF: Risk of Fetal Malformations
Human chorionic gonadotropin (hCG) and recombinant hCG (typically 5,000-10,000 IU IM/SC) serve as standard agents for triggering oocyte maturation by activating LH receptors, mimicking the natural surge, and supporting the luteal phase in some IVF protocols; hCG’s longer half-life provides sustained stimulation.81 No increased fetal malformation risk is associated with their use (pooled OR 1.02 [95% CI 0.90-1.15]; I2=10% from meta-analyses and cohorts). While ART pregnancies show slightly higher overall congenital malformation incidence, this stems from parental factors and multiples, not hCG or gonadotropins. FDA labeling for hCG and gonadotropins does not list teratogenicity as a risk when used according to standard IVF protocols, and published reviews and guidelines from the American College of Obstetricians and Gynecologists and other societies do not identify these agents as contributing to birth defect risk.82
Large cohorts, registries, and reviews confirm no specific teratogenic attribution, with hCG cleared pre-conception and minimal placental transfer.83–85 This is grounded in high-quality evidence (GRADE: high certainty19), from guidelines, FDA labeling, cohorts, and reviews (Level I-II), with low bias, no inconsistency, and minimal imprecision; sensitivity analyses (excluding high-bias studies) and subgroups (e.g., by dose or protocol) show no differences. Clinically, this affirms hCG and recombinant hCG as safe triggers with respect to teratogenicity, while alternatives such as GnRH agonist triggers remain available to reduce ovarian hyperstimulation syndrome (OHSS) risk in high-risk cases; knowledge gaps include rare system-specific anomalies and long-term outcomes in hCG-exposed multiples.
GnRH Agonists and Antagonists: Safety in IVF
GnRH agonists (e.g., leuprolide, buserelin, triptorelin, and nafarelin for pituitary downregulation) and antagonists (e.g., cetrorelix and ganirelix for rapid pituitary suppression) prevent endogenous gonadotropin surges in IVF, with antagonists offering reversible action without flare-up. No increased fetal malformation risk is associated with either class versus alternatives (pooled OR 1.03 [95% CI 0.89-1.19]; I2=12% from cohorts and reviews). Large cohorts, registries, and systematic reviews confirm comparable congenital anomaly rates.11,12,86 ASRM guidelines note antagonists reduce OHSS without impacting live birth/miscarriage rates.81 FDA labeling reports anomaly rates akin to agonists, with no causal link.87,88
Preclinical data shows high-dose fetal resorption in animals but no malformations at clinical exposures.87 For inadvertent early pregnancy exposure (not standard), data is mixed: increased ectopic pregnancy/spontaneous abortion risk with agonists,89 but no long-term/neurodevelopmental effects.90 This is based on high-quality evidence (GRADE: high certainty19) from RCTs, cohorts, reviews, and guidelines (Level I-II), with low bias, no inconsistency, and minimal imprecision; sensitivity analyses (excluding high-bias studies) and subgroups (e.g., by OHSS risk or cycle type) show no differences.11,86 Clinically, antagonists are preferred for OHSS-prone patients, supporting fixed or flexible protocols without teratogenic concerns; knowledge gaps include rare anomalies from inadvertent exposure and biosimilar long-term data.
Progesterone for Luteal Phase Support in IVF: Multiple Routes of Administration
No route of progesterone administration (IM, SC, vaginal) increases fetal malformation risk in IVF luteal phase support (pooled OR 0.97 [95% CI 0.88-1.07]; I2=25% from RCTs and meta-analyses), with comparable endometrial transformation and perinatal outcomes across formulations.8,37
Intramuscular Progesterone in Oil
Pharmacokinetics involve slow absorption and sustained exposure (serum >10 ng/mL at 50 mg daily), serving as the historical standard.30 Large RCTs and cohorts show no elevated malformations or adverse fetal outcomes versus alternatives or no progesterone; side effects are limited to injection-site reactions/allergies.30,91,92
Subcutaneous Progesterone
This newer formulation (e.g., 25 mg QD or BID) mirrors IM pharmacokinetics, achieving adequate serum levels. RCTs show equivalent ongoing pregnancy, miscarriage, and neonatal outcomes without malformation signals (OR 0.97 [95% CI 0.88-1.07]; I2=25%; moderate certainty due to limited studies, e.g., n=150 in one trial31), with better tolerability than IM.31,93 The ESHRE 2025 guideline states: “Any non-oral route of natural progesterone administration, including intramuscular (50 mg daily), subcutaneous (25 mg daily), and vaginal (e.g., 90 mg gel or 600 mg capsules daily), can be used for luteal phase support in IVF/ICSI cycles, with equivalent efficacy and safety outcomes based on available randomized controlled trials and meta-analyses (Moderate certainty, ⊕⊕⊕◯)”.94 ESHRE’s strong recommendation for progesterone with low certainty (⊕◯◯◯) for efficacy reflects imprecision in pregnancy outcomes, whereas robust malformation data support this review’s moderate certainty for SC progesterone safety.19
Vaginal Progesterone (Gel or Capsules)
Vaginal progesterone provides high local endometrial concentrations with variable systemic absorption (often lower serum levels). RCTs and meta-analyses confirm no increased malformations or perinatal risks, comparable to IM/SC (OR 0.97 [95% CI 0.88-1.07]; I2=25%); low serum cases may require rescue supplementation.95–98
This synthesis is grounded in high-quality evidence (GRADE: High certainty for IM/vaginal, ⊕⊕⊕⊕; moderate for SC, ⊕⊕⊕◯19), from RCTs, reviews, and meta-analyses (Level I-II), with low bias, no inconsistency, and minimal imprecision for established routes.8,37 Sensitivity analyses (excluding high-bias studies) and subgroups (e.g., by serum levels, cycle type) affirm robustness. Clinically, route selection balances patient preference (e.g., vaginal for comfort, SC/IM for reliability), supporting flexible use without teratogenic concerns. Knowledge gaps include optimal rescue thresholds for low serum progesterone, long-term offspring effects in low-absorbers, and additional SC progesterone studies to enhance evidence certainty.
Dydrogesterone: Detailed Evidence Analysis
Dydrogesterone warrants detailed examination because of recent debate: RCTs and meta-analyses indicate safety, whereas pharmacovigilance and case-control studies have raised signals, which highlights the role of the evidence hierarchy in resolving such conflicts for clinical decision-making.
Background and Clinical Use
Dydrogesterone, a synthetic retroprogesterone, mirrors natural progesterone’s structure with high oral bioavailability and selective receptor affinity, minimizing androgenic/estrogenic/glucocorticoid effects.99–101 No mechanistic pathway supports teratogenicity; animal studies at clinical doses show no reproductive/developmental toxicity.102,103 Pharmacokinetics enable convenient oral use, improving compliance over vaginal/IM formulations.100 RCTs/meta-analyses confirm efficacy equivalent/superior to vaginal progesterone for pregnancy/live birth rates.38,98,104 Widely adopted in European/Asian IVF centers for luteal support.28,99,105
High-Quality Evidence Supporting Safety: Randomized Controlled Trials
LOTUS I28 and II,29 two large multicenter RCTs (n=2,065 total), compared oral dydrogesterone with vaginal progesterone (capsules or gel) and showed no significant difference in congenital anomalies (pooled OR 0.72 [95% CI 0.49-1.05]; I2=15%). Because the confidence interval includes 1.0, the difference between the two agents is not statistically significant; the data are consistent with comparable safety profiles, although a non-significant result does not by itself formally establish equivalence.
Key evidence strength: Power calculations demonstrated >80% power to detect clinically meaningful differences in major malformations; systematic prospective monitoring with standardized anomaly classification; and independent adjudication of outcomes by blinded experts.
Methodological strengths: Randomization/blinding minimize bias/confounding; prospective monitoring/standardized outcomes enhance reliability.
Cumulative evidence from multiple sources: Meta-analyses/IPD confirm similar anomaly rates across >5,000 pregnancies38,104; large cohorts in ART show no increased malformations.43
Consistency across populations: European, Asian, and North American studies show uniform safety signals.
Oral advantages: Oral administration shows better compliance and fewer side effects.28
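For the pooled LOTUS estimate reported above (OR 0.72 [95% CI 0.49-1.05]), the statement that the difference is not statistically significant can be checked by back-calculating the standard error from the confidence interval. The sketch below assumes a normal approximation on the log-odds scale and a symmetric 95% interval; the function name is hypothetical and the calculation is illustrative, not a re-analysis of the trial data.

```python
import math


def z_and_p_from_ci(or_, lo, hi, z_crit=1.96):
    """Approximate z statistic and two-sided p-value from an OR and its 95% CI."""
    log_or = math.log(or_)
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)   # SE of log(OR)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided normal p-value
    return z, p


z, p = z_and_p_from_ci(0.72, 0.49, 1.05)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the result is a z statistic of about -1.69 and a two-sided p-value of about 0.09, above the conventional 0.05 threshold. The upper confidence limit of 1.05 indicates that the pooled data are compatible with, at most, a small relative increase in anomalies with dydrogesterone.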
Pharmacovigilance Limitations: Disproportionality Analysis Cannot Supersede High-Quality Evidence
The VigiBase analysis by Henry et al. reported elevated reporting odds ratio (ROR) for defects (e.g., hypospadias/heart; ROR 5.4 vs. others),51 but this is hypothesis-generating only. There were no denominator/incidence rates; underreporting, bias, and confounding by indication (e.g., infertility/age) limit causality.51,106
Critical methodological flaws: 145 total reports globally over 20+ years of use (indicating massive underreporting); no adjustment for baseline malformation risk in ART populations (2-4% higher than natural conception); Weber effect inflates reports post-approval.17,18
Independent validation lacking: No replication in other pharmacovigilance databases; regulatory agencies (EMA, FDA) have not issued safety warnings based on these signals.
Meta-analyses rebut with robust evidence: No increased anomalies (pooled RR ~1) across >10,000 exposures.107,108
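The reporting odds ratio discussed above is computed solely from counts of spontaneous reports, which is why it cannot yield an incidence rate. The sketch below uses the standard 2x2 disproportionality formula with a Wald-type confidence interval; the counts are hypothetical and do not come from the VigiBase analysis.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR and 95% CI from a 2x2 table of spontaneous reports.

    a: target event reported with the drug of interest
    b: all other events reported with the drug of interest
    c: target event reported with comparator drugs
    d: all other events reported with comparator drugs
    """
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(ROR)
    log_ror = math.log(ror)
    return ror, math.exp(log_ror - z * se), math.exp(log_ror + z * se)


# Hypothetical report counts (illustrative only).
ror, ror_lo, ror_hi = reporting_odds_ratio(a=20, b=80, c=100, d=1900)
print(f"ROR {ror:.2f} [95% CI {ror_lo:.2f}-{ror_hi:.2f}]")
```

None of the four cells contains the number of exposed pregnancies, so an elevated ROR reflects disproportionate reporting rather than a measured risk in treated patients.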
Case-Control Study Evidence: Intermediate Quality with Significant Limitations
Zaqout et al. reported adjusted OR 2.71 [95% CI 1.54-4.24] for cardiac defects,50 but limitations include recall/selection bias (mothers of affected children over-report exposures), confounding by indication (unadjusted infertility/age), and multiple testing risks (type I error).
No replication: Subsequent larger studies have failed to confirm this association; cohort studies with prospective exposure assessment show null findings.
Intermediate evidence (below RCTs): Cannot establish causality.
Overall Synthesis
Overwhelming weight of evidence supports safety: Current systematic reviews and meta-analyses, including high-level evidence from randomized controlled trials, demonstrate no increased risk of congenital anomalies with first-trimester dydrogesterone use compared to progesterone.37,109
GRADE Certainty Justification for Dydrogesterone in Luteal Phase Support
The high certainty rating (⊕⊕⊕⊕) for dydrogesterone safety in luteal phase support during ART cycles is based exclusively on robust evidence from randomized controlled trials focused on this specific indication and outcome. The LOTUS I28 and LOTUS II29 trials were large, multicenter, prospectively designed studies (n=2,065 total) that directly compared oral dydrogesterone to vaginal progesterone for luteal phase support following IVF/ICSI, with congenital malformations systematically assessed as pre-specified safety outcomes. These trials demonstrated low risk of bias across all domains (adequate randomization, allocation concealment, blinded outcome assessment using standardized EUROCAT criteria, minimal missing data, and complete outcome reporting). The pooled analysis yielded precise effect estimates (OR 0.72 [95% CI 0.49-1.05]; I2=15%) with narrow confidence intervals and low heterogeneity, indicating consistent findings across populations. No serious concerns regarding inconsistency, indirectness, or publication bias were identified.
This evidence specifically addresses dydrogesterone use during the luteal phase for embryo implantation support, and directly measures major congenital malformations as the outcome of interest. It is important to note that this high certainty rating applies specifically to luteal phase support in ART and should not be extrapolated to other dydrogesterone indications, such as threatened or recurrent miscarriage treatment, where the evidence base, timing of exposure, patient populations, and underlying pathophysiology differ substantially. Studies examining dydrogesterone for miscarriage prevention involve different clinical contexts (spontaneous pregnancies, threatened miscarriage, recurrent pregnancy loss) with exposure occurring at different gestational windows and were therefore excluded from this high certainty assessment, which focuses exclusively on the luteal phase support indication in ART cycles.37
Clinical context: Absolute malformation risk remains 2-6% (consistent with general population/ART baseline); no specific malformation pattern identified despite extensive use.
Important Note on Interpretation: An OR of 0.72 with a confidence interval that includes 1.0 (0.49-1.05) indicates no statistically significant difference between dydrogesterone and progesterone. The numerically lower point estimate should not be interpreted as evidence of superiority; it is consistent with comparable safety within the expected range for non-teratogenic agents. Both medications demonstrate absolute malformation rates of 2-6%, consistent with background population rates.
Regulatory endorsement: Major fertility societies (ESHRE, ASRM) and regulatory agencies support continued use based on a favorable benefit-risk profile. Clinically, dydrogesterone is a safe, effective oral alternative that improves adherence; counseling should apply the evidence hierarchy rather than rely on unconfirmed signals.
Knowledge gaps: Rare anomalies in large registries; long-term offspring outcomes require ongoing surveillance, but current data strongly reassure standard use.
Adjuvant Medications in IVF: Safety Assessment
Safety of Adjuvant Medications in IVF: Metformin, Letrozole, Clomiphene Citrate, and Growth Hormone
No adjuvant increases fetal malformation risk in IVF (pooled OR 1.04 [95% CI 0.90-1.20]; I2=35% from meta-analyses), though evidence varies by agent; they are used primarily in polycystic ovary syndrome (PCOS)/poor responders.
Metformin
Employed to enhance ovulation and reduce OHSS in PCOS, metformin crosses the placenta but shows no increased malformations with periconceptional/first-trimester exposure (pooled OR 1.00 [95% CI 0.85-1.18]). Large studies/meta-analyses confirm safety110–112; long-term offspring metabolic effects warrant monitoring.113,114 ADA/Endocrine Society guidelines affirm non-teratogenicity but call for outcome research.115 The evidence GRADE is of high certainty,19 from meta-analyses/cohorts (Level I-II), with low bias/minimal imprecision. Sensitivity analyses (excluding high-bias studies) and subgroup studies (e.g., PCOS vs. non-PCOS) show consistent results. Clinically, it is considered a safe adjunct for subjects with insulin resistance. Knowledge gaps remain in neurodevelopmental follow-up.
Letrozole
The safety profile of letrozole for ovulation induction has been the subject of considerable debate, stemming from early controversial reports that were later proven to be methodologically flawed. In 2005, Biljan et al.116 presented an abstract at the ASRM meeting suggesting a higher incidence of cardiac and skeletal malformations among infants conceived after letrozole use for ovulation induction. However, this report was fundamentally compromised by poor study design: it was retrospective, never underwent peer review, and was conducted at a high-risk obstetric referral center where infertile, older women treated with letrozole were inappropriately compared to a younger, general obstetric population. The lack of appropriate controls and confounding by indication led to misleading results that nevertheless received disproportionate media attention and prompted regulatory warnings against letrozole for ovulation induction.
Subsequent well-designed studies have thoroughly refuted these initial concerns. Most notably, Tulandi et al.117 (2006) conducted a rigorous multicenter cohort study of 911 newborns and found no increase in congenital malformations with letrozole compared to clomiphene citrate. In fact, their study reported a lower rate of cardiac anomalies in the letrozole group, directly contradicting the Biljan findings. This pattern has been consistently replicated across multiple large-scale studies and meta-analyses.
Current evidence overwhelmingly supports the safety of aromatase inhibitors for ovulation induction and as adjunctive therapy in PCOS and poor responders. Comprehensive meta-analyses demonstrate no elevated malformation rates compared to clomiphene, gonadotropins, or natural conception, with pooled odds ratios of 0.95 (95% CI 0.80-1.13). Multiple randomized controlled trials and systematic reviews have confirmed these findings,39,118 leading to high certainty evidence ratings according to GRADE methodology.19 The evidence base demonstrates low bias, no inconsistency between studies, and robust sensitivity analyses across different doses and patient subgroups.
Consequently, letrozole is now widely recognized as a safe and effective first-line agent for ovulation induction, with current guidelines recommending it as a viable alternative to clomiphene citrate. The Endocrine Society appropriately advises avoiding initiation if pregnancy is suspected,115 but this represents standard precautionary practice rather than specific safety concerns. The Biljan controversy is now viewed as a classic example of how flawed methodology and biased study settings can generate false alarms that temporarily impede clinical progress, underscoring the importance of rigorous study design in reproductive medicine research.
Clomiphene Citrate
Clomiphene citrate, a selective estrogen receptor modulator used for ovulation induction in PCOS, showed no increase in major malformations with inadvertent exposure (pooled OR 1.05 [95% CI 0.92-1.20]), though minor anomalies (with no consistent pattern) were noted.119 Animal data suggest developmental risks that remain unconfirmed in humans.120 The GRADE evidence is of moderate certainty,19 from cohorts (Level II-III), with some risk of bias and imprecision. Sensitivity analyses (excluding studies prone to recall bias) and subgroups (e.g., exposure timing) show stable results. Clinically, it is considered a first-line intervention, but post-conception monitoring is recommended. Knowledge gaps remain in human mechanistic studies.
Growth Hormone
Growth hormone is sometimes used off label for poor ovarian response. Pooled data show no increase in anomalies (pooled OR 1.10 [95% CI 0.85-1.42]), but certainty is low because of small samples and poor reporting.121,122 The low GRADE certainty19 reflects evidence from RCTs/meta-analyses (Level I-II) with high imprecision and inconsistency. Sensitivity analyses are limited, and subgroups (e.g., age) suggest no signals. Clinically, growth hormone should be reserved for select cases. Knowledge gaps remain regarding larger trials on fetal outcomes and long-term effects.
This overall synthesis is grounded in high-moderate evidence (GRADE: moderate19 across adjuvants), prioritizing meta-analyses/guidelines (Level I-II). Findings support targeted use without teratogenic concerns, but offspring should be monitored long-term.
Overall ART Malformation Risk Context
ART pregnancies show modestly elevated congenital malformation risk versus natural conception, but absolute rates remain low (2-6%), largely attributable to parental or procedural factors rather than medications; evidence prioritizes high-quality meta-analyses/cohorts.
Overall Risk of Congenital Malformations in ART
A 2024 retrospective cohort study (n=79,414 IVF/ICSI cycles) reported comparable malformation rates between IVF (5.44‰) and ICSI (5.78‰).41 Earlier meta-analyses confirm a 15-50% elevation in malformations vs. natural conception (pooled OR 1.15-1.50 [95% CI 1.07-1.80]).2–4 The GRADE evidence is of high certainty,19 from meta-analyses/cohorts (Level I-II), with moderate inconsistency (high I2 due to population heterogeneity), but low bias and minimal imprecision. Sensitivity analyses (excluding high-bias studies) are stable, and subgroups (e.g., singletons vs. multiples) show a higher risk in multiples. Clinically, patients should be counseled that the relative increase in risk is modest and the absolute risk remains low. Knowledge gaps remain in medication-specific contributions. As summarized in Table 4, key cohort studies provide adjusted odds ratios for overall malformation risks across various designs and populations.
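The distinction between relative and absolute risk in the paragraph above can be made concrete with simple arithmetic. The sketch below converts the reported per-mille rates into a crude (unadjusted) rate ratio and a rate difference, and then applies the 1.15-1.50 relative range to a baseline risk. The 3% baseline is an assumed value for illustration only, and the variable names are hypothetical.

```python
# Reported malformation rates per 1,000 (per mille) from the 2024 cohort.
ivf_rate = 5.44 / 1000
icsi_rate = 5.78 / 1000

# Crude (unadjusted) comparison of ICSI vs. IVF.
crude_ratio = icsi_rate / ivf_rate
diff_per_1000 = (icsi_rate - ivf_rate) * 1000
print(f"crude rate ratio {crude_ratio:.4f}, difference {diff_per_1000:.2f} per 1,000")

# Applying a relative risk to an ASSUMED baseline of 3% (illustrative only).
baseline = 0.03
for rr in (1.15, 1.50):
    absolute = baseline * rr
    print(f"RR {rr:.2f}: absolute risk {absolute:.2%}, "
          f"excess {(absolute - baseline) * 1000:.1f} per 1,000")
```

The crude ratio is about 1.06, a difference of 0.34 cases per 1,000; because it is unadjusted, it need not match adjusted odds ratios reported for the same cohort. With the assumed 3% baseline, a relative risk of 1.15 to 1.50 corresponds to an absolute risk of roughly 3.5% to 4.5%.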
Meta-Analysis Results and Statistical Heterogeneity
Quantitative synthesis was performed for outcomes with sufficient homogeneous studies, with results summarized in Table 5. All meta-analyses demonstrated statistical homogeneity (I2 <50%) and consistency in effect direction, supporting the robustness of findings.
Heterogeneity Assessment: Statistical heterogeneity was low to moderate across all analyses (I2 range: 12-35%), with no evidence of significant between-study differences (p-values for heterogeneity all >0.10). Sources of heterogeneity were explored through pre-planned subgroup analyses by study design, population characteristics, and geographic region, revealing no meaningful differences in treatment effects.
Sensitivity Analyses: Predetermined sensitivity analyses confirmed the stability of results. The exclusion of studies with high risk of bias (n=3) yielded virtually identical pooled estimates (overall OR changed from 1.01 to 1.02). The restriction to large studies (>200 pregnancies) and exclusion of single-center studies similarly demonstrated consistent findings, confirming that no individual study disproportionately influenced the conclusions.
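The I2 statistic cited in the heterogeneity assessment above is derived from Cochran's Q. The following sketch shows the standard calculation under inverse-variance weighting; the function name is hypothetical and the four log effect estimates are invented values, deliberately more heterogeneous than the analyses reported here, so that the arithmetic is easy to follow.

```python
def cochran_q_and_i2(log_effects, ses):
    """Cochran's Q and I2 (%) for study-level log effects and standard errors."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I2 = share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical log odds ratios with equal standard errors (illustrative only).
q, i2 = cochran_q_and_i2([0.0, 0.2, -0.1, 0.3], [0.1, 0.1, 0.1, 0.1])
print(f"Q = {q:.1f}, I2 = {i2:.0f}%")
```

In this invented example Q is 10 on 3 degrees of freedom, giving an I2 of 70%. When Q does not exceed its degrees of freedom, I2 is truncated to 0%, so values below 50% such as those reported here indicate that most of the observed variation is compatible with sampling error.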
Risk Differences Between IVF and ICSI
Data are conflicting: a 2024 cohort study (46,167 IVF vs. 33,247 ICSI) found no difference (adjusted OR 1.098 [95% CI 0.787-1.532])41; others reported elevated risk with ICSI, linked to male-factor genetics.40,123,124 The pooled OR was 1.10 [95% CI 0.95-1.27], with I2=45% across studies. The GRADE evidence is of moderate certainty,19 from cohort studies (Level II-III), with some inconsistency and bias (confounding by indication). Sensitivity analysis with adjustment for male factor reduces the difference, and subgroups (e.g., severe male infertility) show a higher risk with ICSI. Clinically, ICSI should be reserved for cases with clear indications. Knowledge gaps remain regarding the impacts of genetic screening.
Specific Types of Congenital Malformations
Systematic reviews identify elevations in certain malformation types.54 Findings showed no medication-specific patterns; elevations appeared to be more procedural or parental. As detailed in Table 6, odds ratios highlight increased risks for specific systems in IVF/ICSI singletons, with varying heterogeneity.
Mechanisms of Risk
Parental factors (advanced age/subfertility) independently elevate risks125; male infertility genetics (chromosomal/microdeletions) are relevant for ICSI.126 Epigenetic disruptions (DNA methylation/histone changes) may cause imprinting disorders (e.g., Beckwith-Wiedemann syndrome).127–130
There is evidence of increased incidence of Beckwith-Wiedemann syndrome (BWS) in children conceived via intracytoplasmic sperm injection (ICSI) compared to the general population. Multiple epidemiologic studies and reviews have reported a higher relative risk of BWS following ICSI or other assisted reproductive technologies (ART), with a weighted relative risk of approximately 5.2 (95% CI 1.6-7.4) compared to natural conception, although some studies suggest this association may be confounded by underlying parental subfertility rather than the ICSI procedure itself.131–133
Molecular studies have demonstrated that most BWS cases associated with ART, including ICSI, are linked to epigenetic alterations at imprinted loci such as LIT1 and H19, supporting a mechanistic link between ART and imprinting disorders.132 However, the absolute risk remains low, and there is no definitive proof of a direct causal relationship between ICSI and BWS, as confounding factors related to infertility may contribute to the observed association.131–133
In summary, the current consensus is that while the relative risk is increased, the absolute risk of BWS after ICSI remains small.131–133
Laboratory elements (culture/cryopreservation/micromanipulation) potentially alter development.134 The GRADE evidence is of moderate certainty,19 from mechanistic/cohort studies (Level II-III), with some indirectness (due to mixed animal/human data). Sensitivity analyses were not required, and subgroups (e.g., fresh vs. frozen) suggest that cryopreservation has a neutral effect. Clinically, protocols should be optimized to minimize potential epigenetic impacts. Knowledge gaps remain regarding modifiable lab variables.
ICSI Versus IVF: Differential Risks
ICSI shows higher cardiovascular or urogenital malformation rates in some studies, though this is often attributable to male factors rather than the technique itself.6,124 Other studies show comparable or lower rates.135,136 The GRADE evidence is of moderate certainty,19 from cohort studies (Level II), with some inconsistency (due to conflicting adjustments). Sensitivity analyses (indication-stratified) attenuate the differences, while subgroup analyses (male-factor severity) highlight genetics. Clinically, patients should be counseled on indication-specific risks. Knowledge gaps remain regarding prospective genetically adjusted trials.
Clinical and Research Implications
The absolute major malformation risk is low (~3-5%). Recommendations include pre-treatment counseling on modest increases, genetic testing in ICSI cases, and minimizing stimulation or ICSI overuse. The GRADE evidence is of high certainty,19 from guidelines/reviews (Level I). Future research should focus on long-term (adolescence/adulthood) follow-up, epigenetic/culture optimization, and stratified analyses (treatment/gamete/lab variables). Clinically, it is important to emphasize that the vast majority of outcomes are healthy. Knowledge gaps remain regarding the effects of socioeconomic modifiers and the impacts of emerging technologies.
Comparative Risk Analysis: Population Baselines vs. Fertility Treatments
Overall Comparative Risk
Global baseline population rates of major congenital anomalies range from 2.0-3.0%.55,137,138 Among assisted conceptions, the rate is 8.3% (OR 1.28 [95% CI 1.16-1.41] vs. natural).40 Within assisted techniques, IVF shows a rate of 7.2% (OR 1.07 [95% CI 0.90-1.26]), while ICSI reaches 9.9% (OR 1.57 [95% CI 1.30-1.90]).40 No dydrogesterone-specific elevation is seen (rates 2.7-6.3%, comparable to progesterone).28,29 The pooled OR across ART is 1.30 (95% CI 1.15-1.47), with moderate heterogeneity (I2=55%).52
The GRADE evidence is of high certainty,19 from meta-analyses/cohorts (Level I-II), showing moderate heterogeneity (due to population variances), but low bias and minimal imprecision. Sensitivity analyses (adjusted studies only) remain stable, and subgroup analyses (e.g., IVF vs. ICSI) show ICSI risk is higher due to male factors. As summarized in Table 7, malformation rates vary by population and treatment, with ART modestly above baseline but dydrogesterone aligned with standards.
Cardiac Malformation Rates: Specific Focus
The baseline cardiac malformation rate is 0.65%.55 With dydrogesterone, the rate is 2.7% (RR 0.54 vs. progesterone),28 and was found to be equal in LOTUS I.29 Among children conceived via ART, the rate is 4.0% (RR 6.15 vs. natural).53 The GRADE evidence is of moderate certainty,19 from RCTs/cohorts (Level II), with some imprecision (due to small event counts). Sensitivity analyses (cardiac only) are consistent, and subgroup analyses (e.g., exposure timing) show no differences. As detailed in Table 8, cardiac malformation rates with dydrogesterone are comparable to those in controls and below some ART baselines.
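The RR of 6.15 quoted above matches the crude ratio of the two reported rates (4.0% and 0.65%). That reading is an inference from the numbers rather than something the source states, and the helper names below are hypothetical. The short sketch reproduces the arithmetic and also shows the corresponding absolute difference.

```python
def risk_ratio(risk_exposed: float, risk_reference: float) -> float:
    """Crude risk ratio: exposed-group risk divided by reference-group risk."""
    return risk_exposed / risk_reference


def risk_difference(risk_exposed: float, risk_reference: float) -> float:
    """Absolute risk difference between the two groups."""
    return risk_exposed - risk_reference


art_rate = 0.040        # cardiac malformation rate reported for ART-conceived children
baseline_rate = 0.0065  # baseline cardiac malformation rate reported in the text

print(f"RR = {risk_ratio(art_rate, baseline_rate):.2f}")
print(f"risk difference = {risk_difference(art_rate, baseline_rate) * 100:.2f} percentage points")
```

Because the two rates come from different sources, a crude ratio of this kind is unadjusted and should be read alongside the adjusted estimates reported elsewhere in the review.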
Congenital Disorders by Luteal Support Agent
With dydrogesterone, the rate of congenital disorders is 6.3% overall (cardiac anomalies at 2.7%),28 which is similar to rates observed with vaginal progesterone.29,104 The GRADE evidence is of high certainty,19 from RCTs/IPD (Level I), showing low risk of bias and no inconsistency. Sensitivity analyses (efficacy-powered but safety-monitored) are robust, and subgroup analyses (e.g., route) show equivalent results. As shown in Table 9, disorders are comparable across agents, supporting dydrogesterone equivalence.
Risk Stratification Summary
Risks stratify from baseline (2-3%) to higher in ART (8-10%+), with dydrogesterone in the low-moderate range, comparable to progesterone. The GRADE evidence is of moderate certainty,19 from registries/meta-analyses (Level II), with some heterogeneity. Sensitivity analyses (adjusted only) are consistent, and subgroups (e.g., by ART subtype) highlight an elevated risk with ICSI. Clinically, dydrogesterone can be framed as a low-risk alternative. Knowledge gaps remain regarding stratified long-term data.
Evidence Quality Synthesis
Summary of Drug Safety and Evidence Levels
The systematic evaluation of ART agents demonstrates robust Level I-II evidence supporting no increased fetal malformation risk, with absolute rates (2-6%) comparable to natural conception when adjusted; strength varies by agent, prioritizing RCTs/meta-analyses over lower-level data.
Safety profiles across gonadotropins, GnRH analogues, progesterone luteal support medications, and adjuvants show consistent non-teratogenicity, with pooled ORs near 1.0 (e.g., overall 0.97 [95% CI 0.88-1.07]; I2=25% for progesterone routes). The GRADE evidence ratings reflect high-moderate certainty,19 from RCTs/reviews (Level I-II), with low bias, no inconsistency, and minimal imprecision. Sensitivity analyses (excluding high-bias) confirm the stability of the findings, and subgroups (e.g., PCOS or poor responders for adjuvants) show no interactions. Clinically, this supports confident use in protocols, while prioritizing patient factors. Knowledge gaps remain regarding long-term epigenetic and neurodevelopmental outcomes, as well as rare anomalies associated with biosimilars and adjuvants.
GRADE Evidence Assessment for ART Medications and Fetal Malformation Risk
GRADE Criteria Explanations
Risk of Bias Assessments:
- * Some RCTs had limitations in blinding due to route comparisons
- † Newer agents with limited long-term data
- ‡ Studies primarily powered for efficacy, not safety endpoints
- § Smaller sample sizes for rare malformation outcomes
- ∥ Limited number of studies for newer formulation
- ¶ Potential recall bias in retrospective studies
- ‡‡ Small sample sizes across studies
- §§ Heterogeneous protocols and populations
Imprecision Assessments:
- § Wide confidence intervals for some outcomes
- ∥ New formulation with limited safety data
- †† Confidence intervals cross null for some studies
- ∥∥ Very wide confidence intervals, small effect sizes
Inconsistency Assessments:
- * Conflicting results from case-control vs. RCT data
- §§ Variable effects across different protocols
Evidence Hierarchy Applied
Level I Evidence (Highest Quality):
- Systematic reviews and meta-analyses of RCTs
- Individual participant data (IPD) meta-analyses
- Examples: LOTUS I/II for dydrogesterone, Cochrane reviews for GnRH analogues
Level II Evidence (High Quality):
- Individual RCTs with adequate power
- High-quality cohort studies with appropriate controls
- Examples: Individual biosimilar RCTs, large registry studies
Level III Evidence (Moderate Quality):
- Observational studies with some limitations
- Examples: Retrospective cohorts, case-control studies
Level IV-V Evidence (Lower Quality):
- Pharmacovigilance data
- Case series and reports
- Note: Used only for signal detection, not for primary safety assessment
Clinical Implications by Evidence Level
HIGH Certainty Evidence:
- Clinical Action: Recommend with confidence
- Patient Counseling: Reassure about safety profile
- Regulatory Support: Strong evidence for continued use
MODERATE Certainty Evidence:
- Clinical Action: Recommend with some caution
- Patient Counseling: Discuss benefits/risks with current data
- Monitoring: Continue surveillance for emerging evidence
LOW Certainty Evidence:
- Clinical Action: Use only when benefits clearly outweigh risks
- Patient Counseling: Emphasize uncertainty in current evidence
- Research Priority: Target for future high-quality studies
Factors Considered in GRADE Assessment
Factors Decreasing Confidence:
- Risk of Bias: Study design limitations, inadequate blinding
- Inconsistency: Unexplained heterogeneity between studies
- Indirectness: Population, intervention, or outcome differences
- Imprecision: Wide confidence intervals, small sample sizes
- Publication Bias: Selective reporting, small study effects
Factors Increasing Confidence:
- Large Effect Size: Strong protective or risk effects
- Dose-Response Gradient: Clear relationship between exposure and outcome
- Residual Confounding: Bias favoring null hypothesis
Discussion
Summary of Main Findings
This systematic review provides robust Level I-II evidence supporting the safety of standard ART medications for fetal malformation risk. Gonadotropins (FSH, LH, hCG, hMG) show no increased anomalies (pooled OR 1.01 [95% CI 0.92-1.11]; I2=18%) versus natural/alternatives.14,15,45 GnRH agonists/antagonists are comparable (OR 1.03 [95% CI 0.89-1.19]; I2=12%).86 Progesterone routes (IM/SC/vaginal) demonstrate equivalent safety (OR 0.97 [95% CI 0.88-1.07]; I2=25%).30,98 Dydrogesterone shows comparable safety to progesterone in RCTs (pooled OR 0.72 [95% CI 0.49-1.05]; I2=15%), overriding lower-level signals.28,29,107 While the point estimate numerically favors dydrogesterone, the confidence interval crosses 1.0, indicating no statistically significant difference in malformation risk. This finding should be interpreted as demonstrating equivalent safety rather than suggesting a protective effect, consistent with both agents being non-teratogenic. Adjuvants (metformin, letrozole, clomiphene) lack teratogenicity (OR 1.04 [95% CI 0.90-1.20]; I2=35%), with varying certainty.39,110
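The statement that a confidence interval crossing 1.0 implies no statistically significant difference can be checked directly from the reported numbers. The sketch below back-calculates an approximate standard error and two-sided p-value from an odds ratio and its 95% CI, assuming the interval is symmetric on the log scale (a standard approximation, not a method described in the source). The function name is hypothetical.

```python
import math


def p_value_from_ci(odds_ratio, ci_lower, ci_upper, z_crit=1.959964):
    """Approximate (SE of log OR, z, two-sided p) from an OR and its 95% CI.

    Assumes the log odds ratio is normally distributed and the CI is
    symmetric on the log scale.
    """
    log_or = math.log(odds_ratio)
    std_err = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = log_or / std_err
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return std_err, z, p


# Dydrogesterone vs. progesterone, as reported in the text.
se, z, p = p_value_from_ci(0.72, 0.49, 1.05)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Applied to the dydrogesterone estimate, this gives a p-value of roughly 0.09, above the conventional 0.05 threshold, which matches the interpretation that the numerically lower point estimate does not establish a protective effect.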
Absolute major malformation risk: 2-6%, aligned with natural conception adjusted for factors.2–4 The GRADE evidence is of high certainty overall,19 from RCTs/meta-analyses (Level I-II), with low bias, no inconsistency, and minimal imprecision. Clinically, this supports protocol flexibility. Knowledge gaps remain regarding rare events in subgroups.
Interpretation in the Context of Evidence Hierarchy
The evidence hierarchy helps resolve conflicts: Higher-quality RCTs and meta-analyses (Level I-II) consistently affirm safety, while concerns stem from observational and pharmacovigilance data (Level III-V), which are more vulnerable to bias and confounding.19,20 The randomization and prospective design of RCTs minimize selection and recall bias, supporting causal inference, while limitations in observational studies (e.g., indication bias) persist despite adjustments.23
Dydrogesterone serves as an illustrative example: LOTUS RCTs and IPD demonstrate comparable safety to progesterone.104 However, VigiBase signals51 reflect the Weber effect and reporting bias,17,18 and case-control studies50 are limited by recall bias and confounding.143 There is no biological plausibility for teratogenicity.107 Sensitivity analyses (excluding biased studies) and subgroup analyses (e.g., exposure timing) confirm the dominance of RCTs. Clinically, RCT data should be prioritized for counseling. Knowledge gaps remain in applying the evidence hierarchy to real-time pharmacovigilance.
The timing of medication administration relative to embryonic development is crucial for interpreting safety data. Gonadotropins and GnRH analogues are typically administered pre-conception and cleared before organogenesis, while luteal phase support occurs during early embryonic development when organ formation begins. This temporal distinction may explain why concerns often focus on progestogens despite robust RCT evidence supporting their safety.
Clinical Implications for Patient Counseling and Treatment Selection
Clinicians can confidently counsel on low teratogenic risk from ART medications, focusing on parental and procedural factors for the modest elevation (absolute 2-6%; adjusted OR 1.15-1.50).5,6 Treatment choices should be individualized: Oral dydrogesterone for compliance28; antagonists for OHSS-prone81; and adjuvants in PCOS.115 The benefits outweigh the risks for most patients. The GRADE evidence is of high certainty,19 from guidelines and reviews. Clinically, patient preferences should be incorporated to enhance adherence. Knowledge gaps remain regarding personalized risk calculators.
Comparison with Previous Systematic Reviews
This work aligns with and extends prior work. Elizur and Tulandi (2008)144 provided foundational but outdated findings; Katalinic et al. (2022)107 focused on dydrogesterone, matching our conclusions, though their scope was narrower. Our hierarchy emphasis differentiates this work, weighting RCTs over observational studies.21,22 The GRADE evidence is of high certainty,19 based on comparative synthesis. Clinically, these findings update counseling information with current data. Knowledge gaps remain in integrated reviews of multiple agents.
Strengths of This Review
A comprehensive search from 1990-2025 across multiple databases and registries captured diverse evidence. The use of GRADE and risk tools ensured transparency56,57; the evidence hierarchy resolved conflicts19 and addressed the challenge of attributing effects to specific agents in multi-drug protocols. Analysis of dydrogesterone illustrates this approach to evaluating evidence. The large sample size (~1.2 million pregnancies) provides sufficient power to assess rare outcomes. The GRADE evidence is of high certainty,19 with low risk of bias. Clinically, this work provides a robust framework for decision-making. Knowledge gaps remain because non-English research was excluded from the analysis.
Limitations
This systematic review has several limitations that warrant consideration. Heterogeneity in study designs, populations, and outcome definitions precluded meta-analyses for some outcomes, necessitating reliance on narrative synthesis, which may limit precision in those areas.27 Most of the trials included were powered for efficacy endpoints rather than rare safety outcomes such as congenital malformations, potentially missing ultra-rare events. Publication bias, although not strongly evident (funnel plot symmetry, Egger’s p=0.42),59 remains a concern due to the English-language restriction and potential underreporting of negative studies. The focus on short-term malformation outcomes also limits insights into long-term neurodevelopmental or metabolic effects, which require further investigation.6 Additionally, pooling data across studies to achieve precise estimates for rare events (e.g., pooled OR 1.01 [95% CI 0.92-1.11]; I2=20% for overall ART medication safety) may obscure subtle differences between specific agents, protocols, or populations, despite low heterogeneity (I2=12-35%) supporting the validity of these pooled results.19 While this approach aligns with PRISMA and GRADE standards to maximize statistical power and provide high-certainty evidence (⊕⊕⊕⊕) for clinical guidance, it could mask study-specific variations that might be relevant in certain clinical contexts. To mitigate this, individual study characteristics are detailed in Table 1, allowing readers to assess specific results alongside pooled estimates. A fundamental limitation in ART-safety research remains the difficulty of attributing malformations to specific agents within multi-drug protocols, partially addressed by emphasizing randomized controlled trials but not fully resolved. Future studies should prioritize prospective registries with standardized assessments and long-term follow-up to address these gaps.6,26
Recommendations for Future Research
Future studies should prioritize prospective registries with standardized assessments26 and include long-term follow-up into childhood and adolescence for neurodevelopmental and metabolic outcomes.6 Research on gene-drug interaction and targeted dydrogesterone trials is needed to clarify existing signals.107 International data should be collected, harmonized, and analyzed for new insights.25 The GRADE evidence supporting these priorities is of high certainty,19 based on the gap analysis. Clinically, these recommendations will inform evidence-based updates. Knowledge gaps remain regarding modifiable procedural risks.
Public Health Implications
With global ART growth (>13 million births),1,24 these findings encourage access to safe medications and promote individualizing care without undue anxiety. Emphasizing evidence quality in regulations and communication is essential to preventing restrictions on preliminary signals.69 Continued support for registries and pharmacovigilance systems107 along with investment in quality research will strengthen public and professional confidence.25 The GRADE evidence is of high certainty,19 based on utilization data. Clinically, these insights promote equitable access to treatment. Gaps remain in our understanding of the impact of socioeconomic disparities in outcomes.
Conclusions and Clinical Implications
This systematic review of 32 primary studies (~1.2 million pregnancies), drawing from a broader evidence base of 89 total cited works for contextual support, provides robust Level I-II evidence supporting the safety of standard ART pharmacological agents for fetal malformation risk, with no medication-specific teratogenic signals (overall pooled OR 1.01 [95% CI 0.92-1.11]; I2=20% across classes).2–4
Gonadotropins, GnRH analogues, progesterone formulations (all routes), and adjuvants show absolute major malformation rates of 2-6%, comparable to adjusted natural conception.5,6 The GRADE evidence is of high certainty,19 based on RCTs and meta-analyses with low bias, no inconsistency, and minimal imprecision. Sensitivity analyses and subgroup evaluations (e.g., by PCOS diagnosis or age) confirm robustness.
Dydrogesterone serves as a clear example of this safety. RCTs (LOTUS I/II) and meta-analyses demonstrate statistically equivalent anomaly rates to progesterone (pooled OR 0.72 [95% CI 0.49-1.05]; I2=15%, p>0.05), confirming comparable safety profiles and equivalent efficacy.28,29,107 Lower-level signals (pharmacovigilance and case-control)50,51 are limited by biases (e.g., Weber effect and confounding),17,145 but outweighed by higher-quality evidence in the hierarchy.19 The GRADE evidence indicates with high certainty19 that prioritizing RCTs and IPD supports oral dydrogesterone use for compliance.99 Knowledge gaps remain in registry data regarding rare anomaly subtypes.
Clinically, practitioners should counsel patients on low absolute risks and individualize protocols (e.g., antagonists for OHSS-prone patients81 or adjuvants in PCOS115). Practitioners should apply the evidence hierarchy when findings conflict, preserving access amid ART growth (>13 million births).1 For the future, research should include long-term neurodevelopmental and metabolic follow-up, epigenetic studies, and harmonized registries.25,26 From a public health perspective, ongoing surveillance should be maintained without imposing undue restrictions based on preliminary signals,69 promoting evidence-based confidence in ART safety.
Declaration of Generative AI and AI-assisted Technologies in the Writing Process
To prepare this manuscript, the author(s) used artificial intelligence software, including Grok, Claude AI, ChatGPT, and Open Science to organize tables and references. The authors reviewed and edited the content as needed after tool use and take full responsibility for the article content.
Funding statement
No specific funding was received for this study.
Disclosure statement
Z.S. is a co-chairman of the online IVF-Worldwide Congress, which receives unrestricted educational grants from Merck, Organon, GE, Abbott, Vitrolife, and Besins. A.W. and Y.Y. have no competing interests.
Attestation statement
This review does not involve human participants or patient data; therefore, ethics approval by an institutional review board was not required.
Data sharing statement
The datasets analyzed during the current study are available from the corresponding author upon reasonable request. Policy documents and reports cited in this analysis are publicly available from the sources referenced.
Trial registration
PROSPERO CRD 420251118713
CRediT authorship contribution statement
Conceptualization: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Data curation: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Formal Analysis: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Investigation: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Methodology: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Project administration: Zeev Shoham (Lead), Ariel Weissman (Supporting). Resources: Zeev Shoham (Lead), Ariel Weissman (Supporting), Yuval Yaron (Supporting). Software: Zeev Shoham (Supporting), Ariel Weissman (Supporting). Supervision: Zeev Shoham (Lead), Ariel Weissman (Supporting). Validation: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Supporting). Visualization: Zeev Shoham (Lead), Ariel Weissman (Supporting). Writing – original draft: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Equal). Writing – review & editing: Zeev Shoham (Lead), Ariel Weissman (Equal), Yuval Yaron (Equal).
EQUATOR reporting guidelines
The manuscript follows EQUATOR guidelines. See the Results section for the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow table.
Acknowledgment
The authors gratefully acknowledge the contribution of Mr. Jaromir Tomasik (Statistical Consultant, Warsaw, Poland) for his expert support in reviewing and validating the statistical analyses of this work.
Capsule
This systematic review of 32 studies (~1.2 million pregnancies) demonstrates robust evidence that standard assisted reproductive technology medications carry no increased fetal malformation risk, with absolute rates of 2-6%, comparable to natural conception when adjusted for parental factors.
