Incorporating Artificial Intelligence into Fracture Risk Assessment: Using Clinical Imaging to Predict the Unpredictable

Endocrinol Metab : Endocrinology and Metabolism

Review Article
Mineral, bone & muscle
Keypoint
- Artificial intelligence offers new opportunities to improve fracture risk assessment by extracting structural signals from routine medical imaging.
- This review summarizes recent advances in imaging-based artificial intelligence models.
- AI models using X-ray, CT, and DXA scans can detect and predict osteoporotic fractures with improved accuracy.
- Time-to-event prediction and visual outputs such as survival plots and Grad-CAM enhance interpretability.
- Challenges include data bias, generalizability, automation bias, and regulatory integration.
Sung Hye Kong
Endocrinology and Metabolism 2025;40(4):499-507.
DOI: https://doi.org/10.3803/EnM.2025.2518
Published online: August 4, 2025

Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea

Corresponding author: Sung Hye Kong. Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, 82 Gumi-ro 173beon-gil, Bundang-gu, Seongnam 13620, Korea Tel: +82-31-787-7029, Fax: +82-31-787-7029, E-mail: endo.shkong@gmail.com
• Received: June 24, 2024   • Revised: July 3, 2025   • Accepted: July 4, 2025

Copyright © 2025 Korean Endocrine Society

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Artificial intelligence (AI) is increasingly being explored as a complementary tool to traditional fracture risk assessment methods. Conventional approaches, such as bone mineral density measurement and established clinical risk calculators, provide population-level stratification but often fail to capture the structural nuances of bone fragility. Recent advances in AI, particularly deep learning techniques applied to imaging, enable opportunistic screening and individualized risk estimation using routinely acquired radiographs and computed tomography (CT) data. These models demonstrate improved discrimination for osteoporotic fracture detection and risk prediction, supporting applications such as time-to-event modeling and short-term prognosis. CT- and radiograph-based models have shown superiority over conventional metrics in diverse cohorts, while innovations like multitask learning and survival plots contribute to enhanced interpretability and patient-centered communication. Nevertheless, challenges related to model generalizability, data bias, and automation bias persist. Successful clinical integration will require rigorous external validation, transparent reporting, and seamless embedding into electronic medical systems. This review summarizes recent advances in AI-driven fracture assessment, critically evaluates their clinical promise, and outlines a roadmap for translation into real-world practice.
Despite substantial progress in osteoporosis management, fragility fractures remain prevalent among patients who are neither adequately screened nor treated in time. Frequently, these fractures represent the initial clinical manifestation of skeletal fragility and are associated with considerable morbidity, mortality, and societal impact [1]. Therefore, improving our ability to identify individuals genuinely at risk—particularly those facing imminent fracture risk—has emerged as an urgent clinical priority [1].
Conventional tools, such as bone mineral density (BMD) measurement and well-established fracture assessment models—including the fracture risk assessment tool (FRAX) algorithm and the Garvan calculator—have formed the foundation of fracture risk evaluation for decades [2,3]. Their clinical utility is supported by extensive validation across varied populations and integration into international guidelines. These tools offer accessible and standardized estimates of long-term fracture risk, remaining central to routine practice. However, while BMD and clinical risk calculators perform well at the population level, they may not adequately capture subtle changes in bone quality, microarchitecture, or skeletal structure—features that can influence fracture risk in ways not fully reflected by conventional metrics [4].
In recent years, artificial intelligence (AI)—in particular, deep learning approaches applied to medical imaging—has emerged as a promising adjunct to traditional risk assessment strategies [5]. These models can utilize high-dimensional image data to detect subclinical skeletal changes that may be imperceptible to the human eye. Nonetheless, important challenges remain, including issues of interpretability, algorithmic bias, and effective integration into clinical workflows.
This review delineates the current landscape of AI-based fracture assessment, critically evaluates its clinical potential, and discusses the key considerations necessary for successful implementation in real-world clinical practice.
Terminology and concepts
AI refers to computational systems engineered to perform tasks traditionally requiring human intelligence, including recognition, prediction, and decision-making [6]. In clinical settings, AI may be conceptualized as an automated assistant capable of learning patterns from data, analogous to how experienced clinicians develop diagnostic intuition over time. Machine learning (ML), a subfield of AI, enables models to enhance their performance by learning from labeled datasets. Deep learning, a further subdomain of ML, utilizes multilayered neural networks designed to process complex, high-dimensional data such as medical images [7].
Convolutional neural networks (CNNs) are the most widely used deep learning architecture in medical imaging, particularly suited to analyzing pixel-based data [8]. Analogous to how radiologists interpret visual patterns in X-rays, CNNs learn to detect discriminative image features through iterative training [9]. Ensemble learning, another approach in AI, leverages the combined strengths of multiple models to improve predictive accuracy, much as a multidisciplinary team integrates expert opinions [9].
Radiomics involves extracting quantitative features from medical images, thereby transforming them into structured data suitable for AI analysis [10,11]. In contrast, deep learning models can directly interpret unstructured inputs, such as raw images or free-text clinical notes, eliminating the need for predefined feature engineering. Most current imaging AI models for fracture risk prediction employ supervised learning, where outcomes guide the model to recognize and prioritize predictive features [12].
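To make the idea of turning an image into structured data concrete, the following minimal sketch computes a few handcrafted features from a tiny grayscale patch. It is an illustration only: the function names, the toy patch, and the choice of features (first-order statistics plus a co-occurrence contrast for horizontally adjacent pixels) are assumptions made for this example and are not taken from any study cited here.

```python
import math
from collections import Counter


def first_order_features(image):
    """Return mean, variance, and Shannon entropy (bits) of pixel intensities."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


def cooccurrence_contrast(image):
    """Mean squared intensity difference between horizontally adjacent pixels.

    This equals the contrast of a gray-level co-occurrence matrix built
    with a one-pixel horizontal offset.
    """
    pairs = [(row[j], row[j + 1]) for row in image for j in range(len(row) - 1)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


# Hypothetical 4x4 patch with four gray levels (0 to 3).
toy_patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]

features = first_order_features(toy_patch)
features["contrast"] = cooccurrence_contrast(toy_patch)
print(features)
```

A feature vector of this kind is what a conventional classifier such as a random forest would receive, whereas a deep learning model would take the pixel grid itself as input.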
Prerequisites for AI development
Developing robust, clinically relevant AI models requires access to large, high-quality datasets [13]. In fracture prediction, this means not only acquiring imaging data, such as X-rays, computed tomography (CT), or dual-energy X-ray absorptiometry (DXA) scans, but also ensuring the availability of accurate outcome labels indicating whether and when a fracture occurred. Such labeled datasets are the foundation for supervised learning, allowing models to discern associations between input features and clinical events. Just as evidence-based guidelines are grounded in well-characterized cohorts, AI algorithms depend on rigorously curated data to minimize spurious associations and overfitting.
Equally important is the diversity and representativeness of training data [13]. Models developed solely from narrow populations, such as a single ethnicity, region, or healthcare system, may not generalize effectively to other clinical environments. This issue, known as dataset shift, can significantly diminish real-world model performance. Therefore, including demographically diverse patients, varied imaging equipment, and data from multiple institutions is essential to ensure fairness and broader applicability.
Proper data partitioning is another fundamental aspect of AI model development [14]. Typically, datasets are divided into training, validation, and test subsets. The training set is used for model learning, the validation set for parameter tuning and overfitting prevention, and the test set for unbiased performance estimation. To rigorously assess generalizability, external validation using independent cohorts, preferably from different institutions, is strongly recommended before clinical deployment.
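The partitioning described above can be sketched as follows. One practical detail, assumed here rather than stated in the text, is that the split is made by patient and not by image, so that several images from the same person cannot end up in both the training and test sets. The record layout, the 70/15/15 proportions, and the function name are hypothetical.

```python
import random


def split_by_patient(records, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split image records into train/validation/test sets by patient ID."""
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)  # reproducible shuffle

    n = len(patient_ids)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": set(patient_ids[:n_train]),
        "validation": set(patient_ids[n_train:n_train + n_val]),
        "test": set(patient_ids[n_train + n_val:]),
    }
    # Every image follows its patient into exactly one subset.
    return {
        name: [r for r in records if r["patient_id"] in ids]
        for name, ids in groups.items()
    }


# Hypothetical dataset: 20 patients with two radiographs each.
records = [
    {"patient_id": f"P{i // 2:03d}", "image": f"img_{i}.png"} for i in range(40)
]
splits = split_by_patient(records)
for name, rows in splits.items():
    print(name, len(rows))
```

External validation would then use a further dataset from a different institution that never passes through this function at all.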
Opportunistic osteoporosis screening from routine imaging
Medical imaging performed for non-skeletal indications, such as chest radiographs or chest and abdominal CT scans obtained during health check-ups, offers a valuable opportunity for opportunistic osteoporosis screening [15,16]. These studies are frequently conducted in older adults for various clinical purposes and often capture bony structures that are not the primary focus of interpretation [17]. With AI, especially deep learning algorithms, these routine images can be retrospectively analyzed to extract clinically relevant information on skeletal health, requiring no additional radiation exposure or patient burden.
AI models developed for osteoporosis screening typically estimate BMD or classify bone quality based on image-derived features from visible skeletal regions, including the spine, ribs, clavicle, or pelvis [18-20]. Studies have shown that chest radiograph–based deep learning algorithms can achieve high diagnostic performance in identifying individuals with osteoporosis, with accuracy comparable to DXA-based screening in certain cohorts [17,21]. These findings suggest that opportunistic AI-driven screening may help identify patients at risk who would otherwise not be referred for formal osteoporosis assessment.
Opportunistic fracture screening from routine imaging
Deep learning algorithms have demonstrated strong performance in the opportunistic detection of osteoporotic fractures at various skeletal sites, including the spine, hip, proximal humerus, pelvis, and distal radius [22]. These fractures are frequently missed in routine imaging, particularly when displacement is subtle or the fracture is incidental.
Multiple studies have demonstrated that AI models can match or surpass radiologists’ performance. For vertebral fractures, CNNs applied to CT scans effectively identify compression fractures, often outperforming human readers, especially for mild to moderate cases that tend to be underreported in clinical practice [23,24]. In hip fracture detection, a multi-center CNN trained on over 5,000 pelvic radiographs achieved expert-level accuracy in identifying and classifying femoral neck and intertrochanteric fractures [25]. Similarly, AI systems have demonstrated high accuracy (area under the receiver operating characteristic curve [AUROC] >0.90) in detecting proximal humerus fractures on shoulder radiographs. The detection of distal radius fractures has also been improved by AI [26]. For example, a dual-channel network analyzing anterolateral and lateral wrist views achieved approximately 96% accuracy and superior precision-recall performance compared to Faster R-CNN models [27].
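Because many of the results in this review are reported as AUROC, a brief sketch of what the metric measures may help. Under its rank-based interpretation, the AUROC is the probability that a randomly chosen fracture case receives a higher model score than a randomly chosen non-fracture case, with ties counted as one half. The function name and the toy labels and scores below are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # fracture case ranked above non-fracture case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for two fracture cases (label 1) and three controls.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
print(round(auroc(labels, scores), 3))  # 5 of 6 pairs are ranked correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why an AUROC above 0.90 is usually read as strong discrimination.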
As these AI models undergo validation across diverse imaging modalities and clinical environments, opportunistic AI-assisted fracture screening presents a scalable strategy to improve diagnostic sensitivity, reduce missed injuries, and support timely osteoporosis management.
DXA-based fracture risk assessment
Texture-based radiomic analysis of DXA images has emerged as a powerful approach for fracture risk stratification that goes beyond traditional BMD metrics. A landmark study by Hong et al. [28] analyzed texture features extracted from hip DXA scans of older women. Using a random forest-based radiomics score, they demonstrated robust predictive performance: each unit increase in the score was linked to a 4% to 6% rise in hip fracture risk, independent of femoral neck T-score, with predictive accuracy surpassing FRAX [28]. Supporting research using finite element analysis (FEA) derived from DXA or quantitative CT (QCT) has also shown that imaging-based texture and biomechanical features add significant value [29]. In particular, QCT-based FEA outperformed areal BMD in smaller cohorts, demonstrating superior discriminative ability for hip fractures overall [29].
Building on this foundation, multi-modal AI models now combine DXA-derived radiomics with clinical risk factors. In a recent arXiv preprint, a staged ensemble model that integrated DXA images with clinical features (age, body mass index [BMI], fracture history) achieved an AUROC of 0.95, significantly outperforming clinical variables alone (AUROC approximately 0.72). This model also utilized uncertainty metrics to guide the selective use of imaging, thereby improving efficiency [30].
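The staged design can be pictured with the toy sketch below, which is not the preprint's implementation. It assumes a first-stage logistic model on clinical variables, uses binary entropy as the uncertainty measure, and consults a stand-in image model only when that entropy exceeds a threshold. The coefficients, the threshold, the averaging rule, and all names are invented for illustration.

```python
import math


def clinical_probability(age, bmi, prior_fracture):
    """Toy logistic model; the coefficients are illustrative, not fitted."""
    logit = -7.0 + 0.08 * age - 0.05 * bmi + 1.0 * prior_fracture
    return 1.0 / (1.0 + math.exp(-logit))


def binary_entropy(p):
    """Uncertainty of a probability in bits (0 = certain, 1 = maximally unsure)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def staged_prediction(patient, image_model, entropy_threshold=0.8):
    """Use imaging only when the clinical stage is too uncertain."""
    p_clinical = clinical_probability(
        patient["age"], patient["bmi"], patient["prior_fracture"]
    )
    if binary_entropy(p_clinical) < entropy_threshold:
        return {"probability": p_clinical, "used_imaging": False}
    p_image = image_model(patient["dxa_image"])
    # Simple average of the two stages; a real ensemble may weight them.
    return {"probability": (p_clinical + p_image) / 2.0, "used_imaging": True}


def fake_image_model(image):
    """Stand-in for a trained DXA image model; returns a fixed score."""
    return 0.9


low_risk = {"age": 50, "bmi": 25.0, "prior_fracture": 0, "dxa_image": None}
uncertain = {"age": 80, "bmi": 20.0, "prior_fracture": 1, "dxa_image": None}
print(staged_prediction(low_risk, fake_image_model))
print(staged_prediction(uncertain, fake_image_model))
```

The efficiency gain comes from the first branch: patients whose clinical profile already yields a confident estimate never reach the imaging stage.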
Collectively, these studies support the idea that DXA-based AI models, especially those leveraging texture and biomechanical features, can provide meaningful predictive value that augments traditional fracture risk tools.
Plain radiography-based fracture risk assessment
While deep learning-based fracture detection on radiographs has rapidly progressed, relatively few studies have addressed the more challenging task of predicting future fracture risk from plain radiographs. Recently, research has begun to explore whether routinely acquired skeletal radiographs contain latent information that can be harnessed for long-term fracture risk estimation with real clinical utility.
A notable example is the Osteoporotic Fractures in Men cohort analysis, which investigated whether hip radiographs could predict incident hip fractures in a large, prospective male cohort [31]. In this study, researchers trained a CNN model on baseline hip X-rays and validated it in 3,108 men, 89 of whom suffered hip fractures during a 10-year follow-up. The AI model—trained solely on radiograph data—achieved an AUROC of 0.81, substantially outperforming areal BMD alone (AUROC 0.65) and FRAX without BMD (AUROC 0.70). Even after incorporating age and BMI, the radiograph-based model retained its predictive advantage. Importantly, this model relied only on proximal femur images, underscoring the substantial, often underutilized, prognostic value present in conventional radiographs [31].
Another example is the study by Kong et al. [32], which developed a risk model using lateral lumbar spine radiographs from asymptomatic older adults to predict vertebral fractures over a median 40-month follow-up. Their DeepSurv model achieved a C-index of 0.612, modestly but meaningfully outperforming FRAX and Cox models. Of note, this study used real-world clinical imaging, supporting the generalizability of radiography-based risk assessment [32].
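For readers less familiar with the C-index, the sketch below shows one common way to compute Harrell's concordance index for right-censored follow-up data. A pair of patients is comparable when the one with the shorter follow-up actually had a fracture, and the pair is concordant when that patient also received the higher predicted risk. The function name and the four-patient dataset are hypothetical, and published studies may use other variants of the statistic.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored time-to-event data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("No comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical follow-up in months; event 1 = fracture, 0 = censored.
times = [12, 30, 40, 40]
events = [1, 1, 0, 0]
risk_scores = [0.8, 0.3, 0.5, 0.1]
print(concordance_index(times, events, risk_scores))  # 4 of 5 pairs concordant
```

As with the AUROC, 0.5 indicates chance-level ordering, so a C-index of 0.612 reflects a modest ability to rank patients by time to fracture.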
Adding to this body of evidence, a multitask deep learning model utilizing lateral spine radiographs and DXA-based vertebral fracture assessment was designed to simultaneously detect vertebral fractures, identify osteoporosis, and predict incident fractures [33]. The model demonstrated strong performance across detection and risk stratification tasks and was validated using longitudinal fracture outcomes, not just cross-sectional diagnoses. This multitask design exemplifies a more holistic approach to fracture risk prediction, allowing models to learn both structural abnormalities and prognostic patterns from the same input.
Despite these encouraging results, radiograph-based fracture risk assessment remains relatively underexplored compared to detection-focused models. Most AI applications for plain X-rays have concentrated on fracture identification rather than prospective risk prediction. Nevertheless, these early studies offer proof of concept that widely available radiographs may provide a scalable, low-cost means to identify individuals at high risk for fragility fractures.
CT-based fracture risk assessment
Although CT is typically used for non-skeletal diagnostic purposes, its rich three-dimensional depiction of bone morphology and quality has enabled new opportunities for opportunistic fracture risk prediction, especially when integrated with AI.
A landmark study by Dagan et al. [34] showed that CT scans acquired for routine clinical indications can be repurposed for fully automated estimation of future osteoporotic fracture risk. In a cohort of over 48,000 adults aged 50 to 90 years, the model extracted three imaging biomarkers from abdominal and chest CTs: presence of vertebral compression fractures, a simulated T-score, and lumbar trabecular attenuation. Combined with demographic metadata, these features predicted 5-year major osteoporotic fracture risk. The CT-based model achieved an AUROC of 0.73, outperforming the widely used FRAX tool without BMD (AUROC 0.66), and did not require dedicated imaging or user input.
Building on this, researchers developed a deep learning model that integrated not only vertebral features from abdominal CT, but also paraspinal muscle characteristics, acknowledging the emerging importance of muscle-bone interactions in fracture risk [35]. In a real-world asymptomatic screening population, their model achieved an AUROC of 0.827 in predicting 5-year incident vertebral fractures, surpassing both bone-only image models and FRAX.
In a biomechanical extension of CT-based assessment, Michalski et al. [36] and others applied FEA to lumbar and hip CT images to simulate bone strength under mechanical loading. When combined with ML, these models improved fracture discrimination beyond conventional BMD or morphometric measures, offering a physics-informed approach that may be especially useful in borderline or high-risk patients.
Recently, Kim et al. [37] developed a deep learning model to predict subsequent fracture risk in patients with prior hip fractures, using digitally reconstructed radiographs derived from standard hip CTs. This 2.5-dimensional ensemble model integrated frontal, lateral, and axial projections to generate individualized survival curves for estimating fracture-free probability. The model achieved AUROCs between 0.78 and 0.84, outperforming conventional clinical models. Notably, this work addresses the need for secondary prevention, where conventional tools like FRAX may underestimate short-term fracture risk in patients at very high risk.
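One simple way to obtain an individualized fracture-free curve, shown here as a sketch and not as the method used in the cited study, is to have a model output a conditional fracture probability (hazard) for each follow-up interval and then multiply the complements cumulatively. The function name and the yearly hazards below are hypothetical.

```python
def survival_curve(interval_hazards):
    """Convert per-interval hazards into cumulative fracture-free probability.

    Each hazard is the probability of fracture during an interval, given
    that the patient was fracture-free at the start of that interval.
    """
    curve = []
    fracture_free = 1.0
    for hazard in interval_hazards:
        if not 0.0 <= hazard <= 1.0:
            raise ValueError("Each hazard must lie between 0 and 1")
        fracture_free *= 1.0 - hazard
        curve.append(fracture_free)
    return curve


# Hypothetical predicted hazards for years 1, 2, and 3 after a hip fracture.
yearly_hazards = [0.10, 0.05, 0.05]
for year, probability in enumerate(survival_curve(yearly_hazards), start=1):
    print(f"Year {year}: fracture-free probability {probability:.3f}")
```

Plotting these values against time gives the kind of patient-specific curve that can be shown during counseling.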
Taken together, these studies demonstrate that CT-based AI models are powerful tools for both primary and secondary fracture risk prediction. By leveraging routinely acquired imaging, these approaches offer a promising path toward automated, individualized risk stratification. Future research should focus on standardizing CT protocols, validating models across diverse populations, and integrating prediction tools into radiology workflows and electronic health records to enable timely clinical decision-making.
A major barrier to the clinical adoption of AI models in medical imaging, including fracture risk prediction, is the ‘black-box’ nature of deep learning algorithms. Unlike conventional models such as FRAX or logistic regression, deep neural networks often lack direct interpretability, making it difficult for clinicians to understand the basis for their predictions. As AI begins to influence diagnostic or therapeutic decisions, interpretability becomes critical for clinical trust, accountability, and regulatory approval.
A widely used method for enhancing the interpretability of CNNs is gradient-weighted class activation mapping, which generates visual heatmaps highlighting regions of the input image that most strongly contributed to the model’s prediction [38]. In musculoskeletal imaging, these heatmaps often correspond to clinically relevant areas such as the femoral neck or vertebral bodies, mirroring radiologists’ reasoning processes [39,40].
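The core arithmetic of gradient-weighted class activation mapping can be sketched without a deep learning framework. Given the feature maps of the last convolutional layer and the gradients of the predicted score with respect to those maps, each channel is weighted by its average gradient, the weighted maps are summed, negative values are clipped, and the result is scaled to the range 0 to 1. The arrays below are invented toy values; in practice both inputs come from a trained CNN, and the map is upsampled to the image size before display.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap from nested lists.

    activations, gradients: lists of K feature maps, each H x W.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]

    for feature_map, gradient_map in zip(activations, gradients):
        # Channel weight = global average of the gradients for that channel.
        weight = sum(sum(row) for row in gradient_map) / (height * width)
        for y in range(height):
            for x in range(width):
                heatmap[y][x] += weight * feature_map[y][x]

    # ReLU keeps only regions that push the prediction score upward.
    heatmap = [[max(value, 0.0) for value in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0.0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy example: two 2x2 feature maps with opposite gradient signs.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(activations, gradients))
```

In this toy case the second channel has a negative weight, so the locations where it is active are suppressed and only the first channel's regions remain highlighted.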
Beyond heatmaps, intrinsically explainable models combine deep learning with rule-based or symbolic reasoning to better reflect clinical logic. For example, a recent neurosymbolic model for vertebral compression fractures first segments vertebrae using CNNs and then applies rule-based height criteria to detect fractures, yielding both transparency and high performance [40]. Advanced AI techniques can also identify and flag out-of-distribution (OOD) inputs, marking cases where the model’s confidence may be unreliable due to factors such as low image quality or unusual anatomy, thereby reducing risk in deployment [41]. Additionally, time-to-event visualizations, such as survival plots, offer a patient-centered means to convey risk. These models provide dynamic estimates of fracture-free probability over relevant timeframes, as used in recent post-hip fracture prediction studies, enabling clinicians to communicate personalized risk trajectories and enhancing patient understanding [37].
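The rule-based half of such a neurosymbolic pipeline can be illustrated with a short sketch. It assumes that a segmentation step has already produced anterior, middle, and posterior heights for one vertebra, and it applies thresholds in the style of the Genant semiquantitative grading (roughly 20% to 25% height loss for mild, 25% to 40% for moderate, and more than 40% for severe). Comparing heights within a single vertebra is a simplification, and the exact criteria used in the cited model are not given in the text, so the thresholds and names here are assumptions.

```python
def height_loss(anterior_mm, middle_mm, posterior_mm):
    """Fractional height loss of the shortest versus the tallest measurement."""
    heights = (anterior_mm, middle_mm, posterior_mm)
    if min(heights) <= 0:
        raise ValueError("Vertebral heights must be positive")
    return 1.0 - min(heights) / max(heights)


def genant_style_grade(loss):
    """Map fractional height loss to a grade from 0 (normal) to 3 (severe)."""
    if loss < 0.20:
        return 0
    if loss < 0.25:
        return 1
    if loss <= 0.40:
        return 2
    return 3


# Hypothetical measurements (mm) for three vertebrae from a segmentation step.
vertebrae = {
    "L1": (30.0, 30.0, 30.0),
    "L2": (22.0, 24.0, 30.0),
    "L3": (30.0, 17.0, 30.0),
}
for name, heights in vertebrae.items():
    loss = height_loss(*heights)
    print(name, f"loss={loss:.2f}", "grade", genant_style_grade(loss))
```

Because the decision is an explicit threshold on a measured ratio, a clinician can check each flagged vertebra against the same numbers the model used.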
Transparent outputs allow clinicians to verify the model’s focus and reduce overreliance on AI-generated suggestions. Table 1 summarizes common interpretability methods in AI-based fracture risk assessment and their clinical significance.
While recent advances have improved the interpretability and performance of AI models in fracture risk prediction, several critical challenges persist. Foremost among these is the generalizability of such models across diverse clinical settings. As discussed earlier, models can fail when exposed to OOD inputs, such as images with different acquisition protocols or patients with atypical anatomy, which can lead to undetected prediction errors. Although confidence estimation and uncertainty detection can help mitigate these risks, they are not yet standard features in most deployed systems [41].
Equally important is the issue of bias introduced during model development, often stemming from imbalanced training datasets. If certain populations—such as older women or images from a single device vendor—are overrepresented, the model may produce systematically biased predictions for underrepresented groups, including men, younger adults, or racial minorities [42]. This bias can propagate inequities in clinical decision-making and fracture prevention.
Lastly, automation bias presents a subtle but significant risk in the clinical adoption of AI. This phenomenon describes the tendency of human users—especially under time constraints or cognitive load—to over-rely on automated systems, accepting AI-generated outputs without adequate critical evaluation [43,44]. For example, a high-risk prediction generated from a poor-quality image may be accepted without question if the model lacks an appropriate alert system or confidence score. This overreliance is particularly problematic in borderline or ambiguous cases, where diagnostic certainty is inherently low and nuanced clinical judgment is most needed. In such situations, an opaque AI model can shift decision-making authority away from the clinician, potentially resulting in missed diagnoses, unnecessary treatments, or missed opportunities for further evaluation. To mitigate automation bias, AI systems must be designed with transparency, uncertainty quantification, and clear mechanisms for accountability.
Bridging the gap between experimental AI models and clinical practice requires not only high predictive accuracy but also careful implementation strategies (Table 2). To ensure safe and effective deployment, AI models for fracture prediction must follow transparent development standards. Guidelines such as the Checklist for Artificial Intelligence in Medical Imaging (MI-CLAIM) [45] and the Consolidated Standards of Reporting Trials–Artificial Intelligence extension (CONSORT-AI) [46] for randomized trials provide structured frameworks for assessing model quality, reproducibility, and transparency.
Beyond internal validation, robust external validation strategies are essential to assess generalizability. This includes evaluating model performance across different institutions, scanner types, and patient populations. Calibration plots, subgroup-specific metrics, and temporal validation can help uncover hidden performance gaps and promote equitable application across diverse clinical settings.
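The numbers behind a calibration plot can be produced with a few lines of code, as in the sketch below. Predictions are grouped into equal-width probability bins, and the mean predicted risk in each bin is compared with the observed fracture rate. The function name, the bin count, and the six-patient example are hypothetical; running the same function separately for each subgroup (for example, by sex or age band) gives the subgroup-specific view mentioned above.

```python
def calibration_table(probabilities, outcomes, n_bins=5):
    """Compare mean predicted risk with observed event rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[index].append((p, y))

    table = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        table.append({
            "bin": index,
            "n": len(members),
            "mean_predicted": sum(p for p, _ in members) / len(members),
            "observed_rate": sum(y for _, y in members) / len(members),
        })
    return table


# Hypothetical predicted risks and observed fractures (1 = fracture).
probabilities = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 1]
for row in calibration_table(probabilities, outcomes, n_bins=2):
    print(row)
```

In this toy example the low-risk bin predicts about 10% risk but shows a 25% observed rate, which is the kind of underestimation a calibration plot is meant to reveal.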
For safety, models should incorporate fail-safes and alert systems that notify users when predictions are likely to be unreliable. For instance, confidence scores or warning flags for low-quality images can help reduce the risk of inappropriate clinical decisions based on flawed inputs.
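A fail-safe of this kind can be as simple as a gate placed in front of the report, as in the following sketch. It assumes the system has an image quality score between 0 and 1 and a predicted fracture probability, and it defers to a human reader when the image is poor or the prediction falls in an ambiguous band. The thresholds, messages, and function name are hypothetical and would need to be set and validated locally.

```python
def triage_prediction(probability, image_quality,
                      min_quality=0.6, ambiguous_band=(0.4, 0.6)):
    """Decide whether to report an AI prediction or defer to a clinician."""
    if image_quality < min_quality:
        return "defer: low image quality"
    low, high = ambiguous_band
    if low <= probability <= high:
        return "defer: ambiguous prediction"
    if probability > high:
        return "report: high risk"
    return "report: low risk"


# Hypothetical cases: (predicted probability, image quality score).
cases = [(0.85, 0.9), (0.85, 0.3), (0.5, 0.9), (0.1, 0.9)]
for probability, quality in cases:
    print(probability, quality, "->", triage_prediction(probability, quality))
```

Checking image quality first means that a confident-looking score from an unusable image is never shown as a recommendation.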
Finally, integration into existing clinical infrastructure, such as electronic medical record (EMR) systems and picture archiving and communication systems (PACS), is critical. Seamless AI deployment requires that models fit within radiology and endocrinology workflows, producing outputs that are interpretable, auditable, and easily accessible at the point of care.
AI-enhanced approaches to fracture risk assessment have shown promising potential to complement conventional tools by providing greater personalization and earlier identification of individuals at risk. Deep learning models trained on medical imaging can extract subtle structural features and generate time-specific risk estimates that surpass what is achievable with BMD or clinical risk calculators alone.
Current evidence suggests that, under controlled conditions, these AI models frequently outperform traditional methods in terms of discrimination and sensitivity, particularly for opportunistic screening and short-term fracture prediction. However, clinical implementation demands more than just algorithmic performance. For safe and effective integration into real-world workflows, models must be interpretable, generalizable across diverse populations and imaging protocols, and supported by robust validation and governance frameworks.
Continued interdisciplinary collaboration among clinicians, data scientists, and regulatory stakeholders will be essential to ensure that AI-driven tools for fracture risk assessment are not only technically robust but also clinically meaningful and ethically deployable.

CONFLICTS OF INTEREST

No potential conflict of interest relevant to this article was reported.

ACKNOWLEDGMENTS

This study was supported by the Korean Endocrine Society and the Korean Society of Bone and Mineral Research Young Investigator Award in 2022.

Table 1.
Interpretability Techniques in Imaging-Based Artificial Intelligence for Fracture Risk Assessment
| Interpretability method | Description | Example in fracture prediction | Clinical relevance |
| --- | --- | --- | --- |
| Grad-CAM | Generates visual heatmaps showing which parts of an image contributed most to the model’s prediction | Focuses on femoral neck or vertebral body deformities in fracture detection [39,40] | Confirms that the model’s attention aligns with clinical reasoning; builds trust |
| Radiomics integration | Extracts predefined features (e.g., texture and intensity) and correlates with deep features | Uses radiomic features in DXA-based fracture models [28] | Enhances transparency and interpretability; supports regulatory review |
| Neurosymbolic rule-based modeling | Combines deep learning-based segmentation with transparent rule-based classification logic | Vertebral height ratio rules applied after vertebral segmentation in CT [40] | High accuracy with transparent logic; easier for clinicians and regulators to interpret |
| Survival plot visualization | Predicts time-to-event outcomes and visualizes individual patient risk over time | 2.5D ensemble model predicting subsequent fracture risk after hip fracture [37] | Intuitive risk communication using survival curves; supports patient counseling |

Grad-CAM, gradient-weighted class activation mapping; DXA, dual-energy X-ray absorptiometry; CT, computed tomography; 2.5D, 2.5-dimensional.

Table 2.
Summary of Technical and Clinical Considerations for the Effective Implementation of AI-Based Fracture Assessment Models
| Domain | Requirement | Description | Examples/references |
| --- | --- | --- | --- |
| Development transparency | Adherence to reporting guidelines (e.g., MI-CLAIM and CONSORT-AI) | Ensure reproducibility, quality control, and regulatory readiness | Structured reporting of dataset, algorithm, and validation process [45] |
| External validation | Multisite, multi-device, and multi-population evaluation | Assess model generalizability across real-world settings | Validation across institutions [21] |
| Calibration and equity | Calibration plots and subgroup-specific metrics | Detect performance drift and avoid bias across demographic groups | C-index by sex/age; predicted vs. observed risk plots [34] |
| Fail-safes and alerts | Confidence scoring, OOD detection, and image quality warnings | Prevent unsafe use of AI under uncertain conditions | Alert on low-quality input; defer-to-human in ambiguous cases |
| Workflow integration | Embedding in EMR, PACS, or FLS pipelines | Enable real-time clinical use with minimal friction | Auto-generated risk scores in radiology report or FLS referral |
| Interpretability at point of care | Intuitive outputs (e.g., heatmaps and survival curves) | Improve clinician trust and facilitate patient communication | Grad-CAM, vertebral attention maps, and survival plots [37] |

AI, artificial intelligence; MI-CLAIM, Minimum Information Checklist for Artificial Intelligence in Medical Imaging; CONSORT-AI, Consolidated Standards of Reporting Trials–Artificial Intelligence extension; C-index, concordance index; OOD, out-of-distribution; EMR, electronic medical record; PACS, picture archiving and communication system; FLS, fracture liaison service; Grad-CAM, gradient-weighted class activation mapping.

  • 1. Harvey NC, McCloskey EV, Mitchell PJ, Dawson-Hughes B, Pierroz DD, Reginster JY, et al. Mind the (treatment) gap: a global perspective on current and future strategies for prevention of fragility fractures. Osteoporos Int 2017;28:1507–29.
  • 2. Kanis JA, Cooper C, Rizzoli R, Reginster JY; Scientific Advisory Board of the European Society for Clinical and Economic Aspects of Osteoporosis (ESCEO) and the Committees of Scientific Advisors and National Societies of the International Osteoporosis Foundation (IOF). European guidance for the diagnosis and management of osteoporosis in postmenopausal women. Osteoporos Int 2019;30:3–44.
  • 3. Nguyen ND, Frost SA, Center JR, Eisman JA, Nguyen TV. Development of a nomogram for individualizing hip fracture risk in men and women. Osteoporos Int 2007;18:1109–17.
  • 4. Baim S, Leslie WD. Assessment of fracture risk. Curr Osteoporos Rep 2012;10:28–41.
  • 5. Kong SH, Shin CS. Applications of machine learning in bone and mineral research. Endocrinol Metab (Seoul) 2021;36:928–37.
  • 6. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56.
  • 7. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, et al. A guide to deep learning in healthcare. Nat Med 2019;25:24–9.
  • 8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.
  • 9. Chen X, Wang X, Zhang K, Fung KM, Thai TC, Moore K, et al. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal 2022;79:102444.
  • 10. Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441–6.
  • 11. Lambin P, Leijenaar RT, Deist TM, Peerlings J, de Jong EE, van Timmeren J, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol 2017;14:749–62.
  • 12. Singh A, Sengupta S, Lakshminarayanan V. Explainable deep learning models in medical image analysis. J Imaging 2020;6:52.
  • 13. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
  • 14. Bradshaw TJ, Huemann Z, Hu J, Rahmim A. A guide to cross-validation for artificial intelligence in medical imaging. Radiol Artif Intell 2023;5:e220232.
  • 15. Bilbily A, Syme CA, Adachi JD, Berger C, Morin SN, Goltzman D, et al. Opportunistic screening of low bone mineral density from standard X-rays. J Am Coll Radiol 2024;21:633–9.
  • 16. Wang L, Diacinti D, Diacinti D, Minisola S, Yajun L. The diagnosis of osteoporosis, including opportunistic computed tomography: a narrative review. Gerontology 2025;71:453–60.
  • 17. Jang M, Kim M, Bae SJ, Lee SH, Koh JM, Kim N. Opportunistic osteoporosis screening using chest radiographs with deep learning: development and external validation with a cohort dataset. J Bone Miner Res 2022;37:369–77.
  • 18. Andresen JR, Schroder G, Haider T, Andresen R. Opportunistic osteoporosis assessment and fracture risk determination using cancellous density measurement in Hounsfield units of native lumbar computed tomography images: a comparative study with conventional bone density evaluation. J Clin Med 2025;14:1226.
  • 19. Yang J, Liao M, Wang Y, Chen L, He L, Ji Y, et al. Opportunistic osteoporosis screening using chest CT with artificial intelligence. Osteoporos Int 2022;33:2547–61.
  • 20. Park H, Kang WY, Woo OH, Lee J, Yang Z, Oh S. Automated deep learning-based bone mineral density assessment for opportunistic osteoporosis screening using various CT protocols with multi-vendor scanners. Sci Rep 2024;14:25014.
  • 21. Asamoto T, Takegami Y, Sato Y, Takahara S, Yamamoto N, Inagaki N, et al. External validation of a deep learning model for predicting bone mineral density on chest radiographs. Arch Osteoporos 2024;19:15.
  • 22. Ryu SM, Lee S, Jang M, Koh JM, Bae SJ, Jegal SG, et al. Diagnosis of osteoporotic vertebral compression fractures and fracture level detection using multitask learning with U-Net in lumbar spine lateral radiographs. Comput Struct Biotechnol J 2023;21:3452–8.
  • 23. Li YC, Chen HH, Horng-Shing Lu H, Hondar Wu HT, Chang MC, Chou PH. Can a deep-learning model for the automated detection of vertebral fractures approach the performance level of human subspecialists? Clin Orthop Relat Res 2021;479:1598–612.
  • 24. Choi E, Park D, Son G, Bak S, Eo T, Youn D, et al. Weakly supervised deep learning for diagnosis of multiple vertebral compression fractures in CT. Eur Radiol 2024;34:3750–60.
  • 25. Zheng Z, Ryu BY, Kim SE, Song DS, Kim SH, Park JW, et al. Deep learning for automated hip fracture detection and classification: achieving superior accuracy. Bone Joint J 2025;107-B:213–20.
  • 26. Spek RW, Smith WJ, Sverdlov M, Broos S, Zhao Y, Liao Z, et al. Detection, classification, and characterization of proximal humerus fractures on plain radiographs. Bone Joint J 2024;106-B:1348–60.
  • 27. Gan K, Xu D, Lin Y, Shen Y, Zhang T, Hu K, et al. Artificial intelligence detection of distal radius fractures: a comparison between the convolutional neural network and professional assessments. Acta Orthop 2019;90:394–400.
  • 28. Hong N, Park H, Kim CO, Kim HC, Choi JY, Kim H, et al. Bone radiomics score derived from DXA hip images enhances hip fracture prediction in older women. J Bone Miner Res 2021;36:1708–16.
  • 29. Farzi M, Pozo JM, McCloskey E, Eastell R, Harvey NC, Frangi AF, et al. Quantitating age-related BMD textural variation from DXA region-free-analysis: a study of hip fracture prediction in three cohorts. J Bone Miner Res 2022;37:1679–88.
  • 30. Zhao C, Shaik A, Keyak JH, Lane NE, Deng JD, Su KJ, et al. ICGM-FRAX: iterative cross graph matching for hip fracture risk assessment using dual-energy X-ray absorptiometry images. arXiv 2025 Apr 21 [Preprint]. https://doi.org/10.48550/arXiv.2504.15384
  • 31. Rekik I, Adeli E, Park SH, Cintas C. Predictive intelligence in medicine: PRIME 2022. Lecture notes in computer science. Vol 13564; Cham: Springer; 2022. Chapter 10, Opportunistic hip fracture risk prediction in men from X-ray: findings from the Osteoporosis in Men (MrOS) Study. p. 103–14.
  • 32. Kong SH, Lee JW, Bae BU, Sung JK, Jung KH, Kim JH, et al. Development of a spine X-ray-based fracture prediction model using a deep learning algorithm. Endocrinol Metab (Seoul) 2022;37:674–83.
  • 33. Hong N, Cho SW, Lee YH, Kim CO, Kim HC, Rhee Y, et al. Deep learning-based identification of vertebral fracture and osteoporosis in lateral spine radiographs and DXA vertebral fracture assessment to predict incident fracture. J Bone Miner Res 2025;40:628–38.
  • 34. Dagan N, Elnekave E, Barda N, Bregman-Amitai O, Bar A, Orlovsky M, et al. Automated opportunistic osteoporotic fracture risk assessment using computed tomography scans to aid in FRAX underutilization. Nat Med 2020;26:77–82.
  • 35. Kong SH, Cho W, Park SB, Choo J, Kim JH, Kim SW, et al. A computed tomography-based fracture prediction model with images of vertebral bones and muscles by employing deep learning: development and validation study. J Med Internet Res 2024;26:e48535.
  • 36. Michalski AS, Besler BA, Burt LA, Boyd SK. Opportunistic CT screening predicts individuals at risk of major osteoporotic fracture. Osteoporos Int 2021;32:1639–49.
  • 37. Kim Y, Kim YG, Park JW, Kim BW, Shin Y, Kong SH, et al. A CT-based deep learning model for predicting subsequent fracture risk in patients with hip fracture. Radiology 2024;310:e230614.
  • 38. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-29; Venice, Italy. p. 618–26.
  • 39. Nicolaes J, Raeymaeckers S, Robben D, Wilms G, Vandermeulen D, Libanati C, et al. Detection of vertebral fractures in CT using 3D convolutional neural networks. In: Cai Y, Wang L, Audette M, Zheng G, Li S, editors. Computational methods and clinical applications for spine imaging (6th International Workshop and Challenge). CSI 2019: lecture notes in computer science, vol 11963; 2019 Oct 17; Shenzhen, China. Cham: Springer; 2019. p. 3–14.
  • 40. Germann C, Meyer AN, Staib M, Sutter R, Fritz B. Performance of a deep convolutional neural network for MRI-based vertebral body measurements and insufficiency fracture detection. Eur Radiol 2023;33:3188–99.
  • 41. Skarstrom VW, Johansson L, Alven J, Lorentzon M, Haggstrom I. Explainable vertebral fracture analysis with uncertainty estimation using differentiable rule-based classification. arXiv 2024 Jul 3 [Preprint]. https://doi.org/10.48550/arXiv.2407.02926
  • 42. Oakden-Rayner L. Exploring large-scale public medical image datasets. Acad Radiol 2020;27:106–12.
  • 43. Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics 2020;46:205–11.
  • 44. Lyell D, Coiera E. Automation bias and verification complexity: a systematic review. J Am Med Inform Assoc 2017;24:423–31.
  • 45. Mongan J, Moy L, Kahn CE. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020;2:e200029.
  • 46. Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI Extension. BMJ 2020;370:m3164.


Table 1. Interpretability Techniques in Imaging-Based Artificial Intelligence for Fracture Risk Assessment

| Interpretability method | Description | Example in fracture prediction | Clinical relevance |
|---|---|---|---|
| Grad-CAM | Generates visual heatmaps showing which parts of an image contributed most to the model’s prediction | Focuses on femoral neck or vertebral body deformities in fracture detection [39,40] | Confirms that the model’s attention aligns with clinical reasoning; builds trust |
| Radiomics integration | Extracts predefined features (e.g., texture and intensity) and correlates with deep features | Uses radiomic features in DXA-based fracture models [28] | Enhances transparency and interpretability; supports regulatory review |
| Neurosymbolic rule-based modeling | Combines deep learning-based segmentation with transparent rule-based classification logic | Vertebral height ratio rules applied after vertebral segmentation in CT [40] | High accuracy with transparent logic; easier for clinicians and regulators to interpret |
| Survival plot visualization | Predicts time-to-event outcomes and visualizes individual patient risk over time | 2.5D ensemble model predicting subsequent fracture risk after hip fracture [37] | Intuitive risk communication using survival curves; supports patient counseling |

Grad-CAM, gradient-weighted class activation mapping; DXA, dual-energy X-ray absorptiometry; CT, computed tomography; 2.5D, 2.5-dimensional.

Table 2. Summary of Technical and Clinical Considerations for the Effective Implementation of AI-Based Fracture Assessment Models

| Domain | Requirement | Description | Examples/references |
|---|---|---|---|
| Development transparency | Adherence to reporting guidelines (e.g., MI-CLAIM and CONSORT-AI) | Ensure reproducibility, quality control, and regulatory readiness | Structured reporting of dataset, algorithm, and validation process [45] |
| External validation | Multisite, multi-device, and multi-population evaluation | Assess model generalizability across real-world settings | Validation across institutions [21] |
| Calibration and equity | Calibration plots and subgroup-specific metrics | Detect performance drift and avoid bias across demographic groups | C-index by sex/age; predicted vs. observed risk plots [34] |
| Fail-safes and alerts | Confidence scoring, OOD detection, and image quality warnings | Prevent unsafe use of AI under uncertain conditions | Alert on low-quality input; defer-to-human in ambiguous cases |
| Workflow integration | Embedding in EMR, PACS, or FLS pipelines | Enable real-time clinical use with minimal friction | Auto-generated risk scores in radiology report or FLS referral |
| Interpretability at point of care | Intuitive outputs (e.g., heatmaps and survival curves) | Improve clinician trust and facilitate patient communication | Grad-CAM, vertebral attention maps, and survival plots [37] |

AI, artificial intelligence; MI-CLAIM, Minimum Information Checklist for Artificial Intelligence in Medical Imaging; CONSORT-AI, Consolidated Standards of Reporting Trials–Artificial Intelligence extension; C-index, concordance index; OOD, out-of-distribution; EMR, electronic medical record; PACS, picture archiving and communication system; FLS, fracture liaison service; Grad-CAM, gradient-weighted class activation mapping.
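
The "Calibration and equity" row of Table 2 mentions reporting the C-index by sex or age group. As a rough illustration of what such a subgroup check might look like, the sketch below computes Harrell's concordance index separately for each subgroup using only the Python standard library. The record fields (`time`, `event`, `risk`, `sex`) and the function names are hypothetical, pairs with tied follow-up times are simply skipped for brevity, and nothing here reproduces the evaluation code of any cited study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    times  : follow-up time per patient
    events : 1 if a fracture was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier fracture expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event; tied times are skipped here as a simplifying assumption.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this group")
    return concordant / comparable


def c_index_by_group(records, key):
    """Compute the C-index separately for each value of records[i][key]."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record)
    return {
        group: concordance_index(
            [r["time"] for r in rows],
            [r["event"] for r in rows],
            [r["risk"] for r in rows],
        )
        for group, rows in groups.items()
    }
```

A large gap between subgroup values, for example a markedly lower C-index in men than in women, would be one signal that a model needs recalibration or further validation in that subgroup before deployment.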

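
The "Neurosymbolic rule-based modeling" row of Table 1 describes applying vertebral height ratio rules after a segmentation step. The sketch below shows, under stated assumptions, what the rule-based half of such a pipeline could look like. The `VertebraHeights` structure, the choice of the tallest of the three heights as the reference, and the thresholds (borrowed from the commonly used Genant semiquantitative grading: under 20% normal, 20% to 25% mild, 25% to 40% moderate, over 40% severe) are illustrative assumptions and are not taken from the cited model. The heights are assumed to come from an upstream segmentation model that is not shown.

```python
from dataclasses import dataclass


@dataclass
class VertebraHeights:
    """Heights of one vertebral body, assumed to come from segmentation."""
    label: str
    anterior_mm: float
    middle_mm: float
    posterior_mm: float


def height_loss(v: VertebraHeights) -> float:
    """Fractional height loss relative to the tallest of the three heights."""
    heights = (v.anterior_mm, v.middle_mm, v.posterior_mm)
    reference = max(heights)
    if min(heights) <= 0:
        raise ValueError("heights must be positive")
    return 1.0 - min(heights) / reference


def grade(v: VertebraHeights) -> tuple:
    """Map height loss to a Genant-style grade (thresholds are assumptions)."""
    loss = height_loss(v)
    if loss < 0.20:
        return 0, "normal"
    if loss < 0.25:
        return 1, "mild"
    if loss <= 0.40:
        return 2, "moderate"
    return 3, "severe"


# Hypothetical measurements for two vertebrae.
for vertebra in (
    VertebraHeights("L1", 25.0, 25.0, 25.0),
    VertebraHeights("L2", 18.0, 22.0, 25.0),
):
    print(vertebra.label, grade(vertebra))
```

Because every decision reduces to an explicit threshold on a measured ratio, a clinician or regulator can audit why a vertebra received a given grade, which is the transparency benefit the table attributes to this family of methods.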
