Acavis Publishers

Research Article

Bridging the Clinical Informatics Gap: Evaluating Open-Access NLP Solutions for EHR Data Integrity in Critical Access Hospitals and Federally Qualified Health Centers

Agyapong MK*

School of Business, Tiffin University, United States

Corresponding Author:

Michael Kwakye Agyapong, School of Business, Tiffin University, Tiffin, Ohio, United States. Email: mkwakyeagyapong@gmail.com

Citation: Agyapong MK. Bridging the Clinical Informatics Gap: Evaluating Open-Access NLP Solutions for EHR Data Integrity in Critical Access Hospitals and Federally Qualified Health Centers. Trends Publ Health Commun Med. 2026;1(1):1-7.

Received Date: 07 May 2026

Published Date: 29 June 2026

Volume 1 Issue 1

Abstract

Background: Critical Access Hospitals (CAHs) and Federally Qualified Health Centers (FQHCs) collectively serve over 60 million Americans in rural and medically underserved communities across the United States. Despite near-universal Electronic Health Record (EHR) adoption, these institutions systematically lag behind larger hospitals in the deployment of advanced clinical data analytics functionalities, including Natural Language Processing (NLP) tools for clinical documentation quality improvement. This digital divide prevents patients served by these safety-net institutions from benefiting from the same data-driven quality improvement approaches available in well-resourced health systems.
Objective: This paper evaluates the feasibility, design requirements, and implementation considerations for deploying open-access NLP solutions to improve EHR data integrity in CAHs and FQHCs. We examine the specific documentation quality challenges, technical infrastructure constraints, and workforce limitations characteristic of these settings and propose a deployment framework that addresses these barriers.
Methods: We conducted a comprehensive review of the peer-reviewed literature on health IT adoption in CAHs and FQHCs, clinical documentation quality challenges in safety-net settings, and NLP deployment in resource-constrained healthcare environments. We synthesized findings across these domains to identify the key requirements, barriers, and enablers for open-access NLP deployment in underserved healthcare facilities. Based on this synthesis, we propose a tiered deployment framework with implementation guidelines tailored to the technical and workforce capacity of these institutions.
Results: We identify five critical barriers to NLP adoption in CAHs and FQHCs: (1) limited informatics workforce, (2) constrained IT budgets, (3) heterogeneous EHR platforms, (4) documentation practices shaped by unique care models, and (5) insufficient access to annotated training data. We propose a three-tiered deployment model—Basic, Intermediate, and Advanced—that aligns NLP tool complexity with institutional technical capacity. We further identify policy and infrastructure interventions that could accelerate adoption.
Conclusion: Open-access NLP solutions represent a viable pathway for improving EHR data integrity in safety-net healthcare institutions, provided that deployment strategies are tailored to the resource constraints and care delivery models of these settings. The proposed tiered framework offers a practical roadmap for progressive NLP adoption that can be initiated by facilities with minimal technical infrastructure and scaled as capacity grows.

Keywords

Critical Access Hospitals; Federally Qualified Health Centers; Natural Language Processing; Electronic Health Records; Clinical Documentation; Digital Divide; Health Informatics; Rural Health; Open-Access

Introduction

The United States healthcare system relies on a network of safety-net institutions—including Critical Access Hospitals and Federally Qualified Health Centers—to deliver essential medical services to populations that would otherwise lack access to care. Critical Access Hospitals, numbering approximately 1,370 facilities across 45 states, are federally designated as the sole source of inpatient hospital care for millions of Americans living in rural and frontier communities.¹ FQHCs, encompassing approximately 1,400 organizations and more than 17,000 service delivery sites, serve over 30 million patients annually, the vast majority of whom are low-income, uninsured, or publicly insured.² Together, these institutions represent the frontline of healthcare delivery for the nation’s most underserved populations.

The accuracy and integrity of clinical documentation at these institutions have implications that extend far beyond individual patient encounters. Clinical documentation drives diagnostic coding and billing, determines facility performance on quality measures, informs care transitions and coordination, and feeds into public health surveillance and reporting systems. When documentation is inaccurate, incomplete, or inconsistent, the consequences cascade across all of these functions, leading to improper reimbursement, misrepresented quality performance, compromised care coordination, and unreliable public health data.³,⁴

Natural Language Processing has emerged as a powerful tool for improving clinical documentation quality by automatically detecting deficiencies in unstructured clinical text, extracting structured information from narrative notes, and providing real-time feedback to clinicians and coding professionals.⁵,⁶ However, the deployment of NLP tools in healthcare has been concentrated in large academic medical centers and well-resourced health systems, leaving safety-net institutions largely excluded from the benefits of these advances.⁷ This exclusion is not a result of indifference but of structural barriers: limited IT budgets, insufficient informatics workforce, heterogeneous EHR platforms, and the absence of NLP tools designed for the unique documentation environments of CAHs and FQHCs.

The urgency of addressing this gap has been amplified by recent federal policy developments. The Trump Administration has made healthcare AI innovation a stated national priority through Executive Order 14179 on Removing Barriers to American Leadership in Artificial Intelligence⁸, the HHS AI Strategy released in December 2025⁹, the CMS Health Tech Ecosystem Initiative¹⁰, and the HHS Request for Information on accelerating AI adoption in clinical care.¹¹ The federal government’s investment in the Trusted Exchange Framework and Common Agreement (TEFCA), which has facilitated nearly 500 million health record exchanges as of early 2026¹², further underscores the importance of ensuring that the clinical documentation feeding into this national interoperability infrastructure is accurate and reliable.

This paper addresses the clinical informatics gap in CAHs and FQHCs by evaluating the feasibility of deploying open-access NLP solutions for EHR data integrity improvement in these settings. We identify the specific documentation quality challenges and technical constraints characteristic of safety-net institutions, propose a tiered deployment framework tailored to their resource realities, and discuss policy interventions that could accelerate adoption. The remainder of this paper is organized as follows: Section 2 examines the documented digital divide in clinical informatics; Section 3 analyzes the documentation quality challenges specific to CAHs and FQHCs; Section 4 presents the proposed tiered deployment framework; Section 5 discusses implementation considerations and policy implications; and Section 6 concludes the paper.

The Digital Divide in Clinical Informatics

Critical Access Hospitals

The gap between CAHs and larger hospitals in the adoption of advanced EHR functionalities has been extensively documented in the peer-reviewed literature. A systematic review by Pai¹, examining 45 studies on the impact of health information technology on CAHs, found that while EHR adoption has reached near-universal levels among these institutions, persistent challenges continue to hinder effective implementation and utilization, with significant consequences for rural healthcare delivery. The review identified financing constraints and workforce-related challenges as the most commonly reported barriers, documenting that CAHs frequently lack the IT personnel, informatics expertise, and capital investment necessary to move beyond basic EHR use toward advanced clinical data analytics.

Apathy et al.¹³, analyzing trends in advanced EHR functionality adoption using 2014, 2018, and 2023 data from the American Hospital Association Annual Survey IT Supplement, found that only 32.0% of CAHs demonstrated advanced use of clinical data analytics functionalities, compared to substantially higher rates among non-CAH hospitals. The study documented that while some patient engagement functionalities have reached near-universal adoption, the gap in clinical data analytics—the category most directly relevant to NLP-based documentation quality tools—remains wide and concerning. The authors concluded that policymakers should consider programs to support CAHs in closing remaining adoption gaps.

In a separate 2025 study, Apathy et al.¹⁴ revisited the rural-urban divide in hospital health IT adoption using 2023 national data, finding that rural hospitals were still significantly less likely than urban hospitals to have electronic query capabilities, report electronic data availability, and use data from external providers. The study documented that while some adoption gaps have narrowed, key gaps in health information exchange and clinical data analytics persist. The authors called for policy efforts to prioritize tailored solutions that address the unique challenges of rural hospitals.

These findings paint a consistent picture: CAHs have adopted EHRs but have not been able to leverage them for advanced clinical analytics. The clinical documentation quality tools that larger hospitals increasingly deploy—including NLP-based documentation review, automated coding assistance, and real-time quality feedback systems—remain largely out of reach for these rural institutions. This creates a two-tiered healthcare system in which patients treated at CAHs do not benefit from the same documentation-driven quality improvements available to patients at larger, better-resourced facilities.

Federally Qualified Health Centers

The clinical informatics gap in FQHCs parallels that of CAHs but manifests differently due to the ambulatory care context and the unique operational characteristics of community health centers. Walter et al.², surveying FQHC partners through AllianceChicago’s Practice-Based Research Network, documented that while FQHCs have experienced rapid growth in EHR utilization over the past decade, these EHR systems were built primarily to improve individual patient care and were not designed to facilitate population-level data analysis, quality improvement research, or the deployment of advanced analytical tools. The study found that accessing, cleaning, and using the rich data stored in FQHC EHR systems requires skill sets that often do not reside within the FQHC organizational structure.

This finding is particularly significant in light of the data that FQHCs generate. With over 30 million patients served annually across primary care, behavioral health, dental care, substance abuse treatment, and enabling services, FQHCs produce vast volumes of clinical documentation containing information about the health status, treatment patterns, and care needs of some of the nation’s most medically complex and socially vulnerable populations. Yet this information remains largely inaccessible to advanced analytics because the facilities lack the data science expertise to extract and analyze it.

The CDC’s electronic case reporting initiative, which has connected over 1,000 FQHC sites across dozens of states to digital public health reporting systems¹⁵, demonstrates federal recognition of the need to improve clinical data exchange in underserved settings. However, this initiative addresses the reporting layer of the data pipeline without directly addressing the upstream documentation quality challenges that determine the accuracy and completeness of the reported data. NLP tools for documentation quality improvement would complement electronic case reporting by strengthening the integrity of the clinical documentation from which reported data is derived.

Documentation Quality Challenges Specific to CAHs and FQHCs

Unique Documentation Environments

The clinical documentation environments of CAHs and FQHCs differ from those of larger hospitals in several respects that have direct implications for NLP tool design and deployment. First, the clinical workforce composition is different. CAHs rely heavily on primary care physicians, nurse practitioners, and physician assistants who may serve multiple roles within the facility, while FQHCs employ diverse clinical teams including medical providers, behavioral health specialists, dental providers, and social workers whose documentation practices and vocabularies vary significantly across disciplines.²

Second, the documentation volume patterns differ. CAHs, limited to 25 or fewer acute care inpatient beds, generate relatively low volumes of inpatient documentation but may produce substantial volumes of emergency department and outpatient notes. FQHCs generate high volumes of ambulatory encounter documentation, often with shorter notes per encounter but higher total encounter counts. These volume patterns affect the computational demands of NLP processing and the statistical properties of the training data available for machine learning models.

Third, the EHR platforms in use at these facilities are heterogeneous. While large health systems often standardize on a single EHR platform (typically Epic or Cerner), CAHs and FQHCs use a diverse mix of EHR systems including CPSI, Meditech, athenahealth, eClinicalWorks, NextGen, and others. This heterogeneity means that NLP tools must be EHR-agnostic—capable of processing clinical text regardless of the source system—rather than built for integration with a specific platform.
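One way to read the EHR-agnostic requirement is as a thin adapter layer that maps each source system's note export onto a single common record before any NLP runs. The sketch below assumes CSV exports; the system labels and column names (`system_a`, `NoteID`, `body`, and so on) are hypothetical and do not reflect any vendor's actual export schema.

```python
import csv
import io

# Hypothetical column mappings; real export layouts differ by vendor and site.
COLUMN_MAPS = {
    "system_a": {"note_id": "NoteID", "author_role": "ProviderType", "text": "NoteText"},
    "system_b": {"note_id": "doc_id", "author_role": "role", "text": "body"},
}


def load_notes(csv_text, source):
    """Map one source system's CSV export onto a common note record (a dict)."""
    columns = COLUMN_MAPS[source]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "note_id": row[columns["note_id"]].strip(),
            "author_role": row[columns["author_role"]].strip().lower(),
            # Collapse runs of whitespace so downstream rules see uniform text.
            "text": " ".join(row[columns["text"]].split()),
        }
        for row in reader
    ]


export_a = "NoteID,ProviderType,NoteText\n1,NP,Pt  seen for   follow-up.\n"
export_b = "doc_id,role,body\n9,Social Worker,Housing referral placed.\n"
notes = load_notes(export_a, "system_a") + load_notes(export_b, "system_b")
```

With this layout, supporting an additional EHR means adding one mapping entry rather than changing the downstream detection logic.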

Fourth, the patient populations served by these institutions present documentation challenges that are less common in well-resourced urban hospitals. CAH patients in rural communities may have limited access to specialist referrals, requiring primary care documentation to carry a heavier clinical reasoning burden. FQHC patients are linguistically diverse, socially complex, and frequently affected by social determinants of health (SDOH) that complicate documentation and coding. Clinical notes at FQHCs may contain references to housing instability, food insecurity, transportation barriers, and other SDOH factors that are clinically relevant but not well-captured by standard clinical NLP tools designed for acute care settings.

Specific Deficiency Patterns

The documentation deficiency patterns in CAHs and FQHCs include those common across all healthcare settings—copy-paste propagation, incomplete narratives, ambiguous terminology, and structured-unstructured data discrepancies—as well as deficiency patterns that are more prevalent or consequential in safety-net settings. These include: (a) documentation insufficiency for medical necessity justification in cost-based reimbursement environments, where CAHs operate under Medicare’s cost-based payment model and inadequate documentation can result in cost report disallowances; (b) under-documentation of diagnostic complexity, where clinicians in resource-constrained settings may under-document comorbidities and case severity, leading to risk-adjustment inaccuracies that misrepresent patient population acuity; (c) social determinant documentation gaps, where SDOH factors that affect treatment planning and care coordination are mentioned in free-text notes but not captured in structured data fields; and (d) cross-disciplinary documentation fragmentation in FQHCs, where notes from medical, behavioral health, dental, and social work encounters are siloed within the EHR, producing an incomplete composite picture of patient needs.
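Pattern (c) lends itself to a simple consistency check: flag encounters where the free-text note mentions a social determinant but no structured SDOH code is present. The sketch below is illustrative only. The phrase patterns are assumptions that would need clinical review, and as a simplification it treats any ICD-10-CM code in categories Z55 to Z65 as evidence that SDOH was captured, regardless of domain.

```python
import re

# Illustrative phrase patterns only; a real lexicon would need clinical review.
SDOH_PATTERNS = {
    "housing": re.compile(r"\b(homeless|housing instability|evict(ed|ion))\b", re.I),
    "food": re.compile(r"\b(food insecurity|food pantry|skipping meals)\b", re.I),
    "transportation": re.compile(
        r"\b(no transportation|lacks transportation|transportation barrier)\b", re.I
    ),
}


def has_sdoh_code(codes):
    """True if any structured code falls in ICD-10-CM categories Z55-Z65."""
    for code in codes:
        code = code.strip().upper()
        if code.startswith("Z") and code[1:3].isdigit() and 55 <= int(code[1:3]) <= 65:
            return True
    return False


def sdoh_gaps(note_text, structured_codes):
    """Return SDOH domains mentioned in free text when no Z55-Z65 code is recorded."""
    if has_sdoh_code(structured_codes):
        return []
    return [domain for domain, pattern in SDOH_PATTERNS.items() if pattern.search(note_text)]


note = "Patient reports food insecurity and was recently evicted."
gaps = sdoh_gaps(note, ["E11.9"])
no_gaps = sdoh_gaps(note, ["E11.9", "Z59.41"])
```

A production check would also need to handle negation ("denies food insecurity"), which this keyword approach cannot distinguish.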

Proposed Tiered Deployment Framework

Based on our analysis of the clinical informatics gap, the documentation quality challenges, and the technical and workforce constraints characteristic of CAHs and FQHCs, we propose a three-tiered deployment framework for open-access NLP solutions in these settings. The framework is designed to align NLP tool complexity with institutional technical capacity, enabling facilities to begin with low-complexity tools that require minimal technical expertise and progressively adopt more sophisticated capabilities as their informatics capacity grows.

Tier 1: Basic — Rule-Based Documentation Screening

The first tier is designed for facilities with minimal informatics infrastructure—typically a single IT generalist with basic data management skills. Tier 1 tools employ rule-based NLP methods that require no machine learning training data and no specialized programming expertise. These tools process clinical note exports (in plain text or CSV format) and apply predefined rule libraries to detect common documentation deficiencies including: missing required note sections (chief complaint, assessment, plan); copy-paste indicators based on text similarity thresholds; excessively short notes that may indicate inadequate documentation; and non-standard abbreviations that may impede accurate coding.
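The four Tier 1 checks listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the thresholds, the section-heading convention (a heading at the start of a line followed by a colon), and the abbreviation list are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher

REQUIRED_SECTIONS = ("chief complaint", "assessment", "plan")
MIN_WORDS = 30            # assumed threshold for an "excessively short" note
COPY_PASTE_RATIO = 0.90   # assumed similarity threshold for copy-paste
NONSTANDARD_ABBREVIATIONS = {"qd", "qod", "u"}  # illustrative entries only


def screen_note(text, previous_text=""):
    """Return a list of rule-based deficiency flags for one note."""
    flags = []
    lowered = text.lower()
    for section in REQUIRED_SECTIONS:
        # A section counts as present if its heading starts a line and ends with a colon.
        if not re.search(rf"^\s*{re.escape(section)}\s*:", lowered, re.M):
            flags.append(f"missing_section:{section}")
    if len(text.split()) < MIN_WORDS:
        flags.append("short_note")
    # Compare against the patient's prior note, when one is supplied.
    if previous_text and SequenceMatcher(None, previous_text, text).ratio() >= COPY_PASTE_RATIO:
        flags.append("possible_copy_paste")
    tokens = set(re.findall(r"[a-z]+", lowered))
    for abbreviation in sorted(tokens & NONSTANDARD_ABBREVIATIONS):
        flags.append(f"nonstandard_abbreviation:{abbreviation}")
    return flags


note = "Chief Complaint: cough\nAssessment: viral URI\nTake med qd"
flags = screen_note(note, previous_text=note)
```

Character-level similarity is the simplest possible copy-paste signal; it will also flag legitimately stable findings, which is one reason rule-based precision is expected to be moderate.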

Tier 1 tools are implemented as standalone Python scripts with graphical user interface (GUI) wrappers that can be run on a standard desktop or laptop computer without server infrastructure. The rule libraries are configurable via plain-text configuration files, allowing facility staff to customize detection thresholds and add facility-specific rules without programming knowledge. Output is delivered as formatted reports in HTML and CSV formats suitable for review by clinical documentation improvement specialists, quality managers, or compliance officers.
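As an illustration of the configuration and reporting pattern described above, the following sketch reads thresholds from an INI-style plain-text file and writes findings as CSV. The section names, keys, and report columns are hypothetical; the framework does not prescribe a particular file layout.

```python
import configparser
import csv
import io

# Hypothetical configuration layout that facility staff could edit in a text editor.
CONFIG_TEXT = """
[thresholds]
min_words = 30
copy_paste_ratio = 0.90

[sections]
required = chief complaint, assessment, plan
"""


def load_rules(config_text):
    """Parse the plain-text rule configuration into a dict of settings."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {
        "min_words": parser.getint("thresholds", "min_words"),
        "copy_paste_ratio": parser.getfloat("thresholds", "copy_paste_ratio"),
        "required_sections": [
            name.strip() for name in parser.get("sections", "required").split(",")
        ],
    }


def write_report(findings):
    """Render (note_id, flag) pairs as CSV text for reviewers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["note_id", "flag"])
    writer.writerows(findings)
    return buffer.getvalue()


rules = load_rules(CONFIG_TEXT)
report = write_report([("N001", "short_note"), ("N002", "missing_section:plan")])
```

Keeping thresholds outside the script means a quality manager can tighten or relax a rule without touching code, which is the property the tier depends on.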

The anticipated detection accuracy for Tier 1 tools is moderate (precision: 0.70–0.80; recall: 0.60–0.75), reflecting the inherent limitations of rule-based approaches for detecting context-dependent documentation deficiencies. However, the low implementation barrier and zero licensing cost make this tier appropriate as an entry point for facilities that currently have no NLP capabilities.
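To make these ranges concrete, precision and recall can be computed from a manual audit of flagged notes. The counts below are hypothetical and were chosen only to land inside the anticipated Tier 1 range.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from audit counts."""
    flagged = true_positives + false_positives
    actually_deficient = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actually_deficient if actually_deficient else 0.0
    return precision, recall


# Hypothetical audit: 100 notes flagged, 75 confirmed deficient by a reviewer,
# and 25 deficient notes that the rules missed.
precision, recall = precision_recall(
    true_positives=75, false_positives=25, false_negatives=25
)
```

At a precision of 0.75, one in four flags is a false alarm that a reviewer must dismiss, which is a useful figure when estimating the staff time a Tier 1 deployment would consume.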

Tier 2: Intermediate — Machine Learning Classification

The second tier adds supervised machine learning classification capabilities to the rule-based foundation of Tier 1. Tier 2 is designed for facilities with at least one staff member who has intermediate data analytics skills—typically a health information management professional or quality analyst with training in data manipulation and basic statistical analysis. Tier 2 tools employ pre-trained classification models that have been trained on annotated clinical note datasets from similar facility types (CAH or FQHC), allowing facilities to deploy machine learning-based deficiency detection without the need to annotate their own training data.

The pre-trained models are distributed as serialized Python objects (using joblib or pickle formats) alongside the Tier 2 processing scripts. Facilities load the models, process their clinical note exports through the pipeline, and receive enhanced deficiency detection that captures context-dependent patterns not detectable by rule-based methods alone. Tier 2 tools also include a feedback mechanism that allows facilities to flag false positives and false negatives, generating labeled data that can be used to fine-tune the models for facility-specific documentation patterns in subsequent iterations.
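The load, score, and feedback loop described above might look like the following sketch. A distributed model would more plausibly be a fitted classifier pipeline; here a plain dictionary of token weights stands in for it so that the example runs with the standard library alone. The weights, the 0.5 decision threshold, and the feedback record fields are invented for illustration.

```python
import math
import pickle

# Stand-in for a distributed model file: a plain dict of token weights.
# The bias and weights are invented for illustration.
model_bytes = pickle.dumps(
    {"bias": -1.0, "weights": {"unchanged": 1.5, "stable": 0.5, "denies": 0.4}}
)


def load_model(serialized):
    # Only unpickle files from a trusted source: pickle can run arbitrary code on load.
    return pickle.loads(serialized)


def deficiency_probability(model, text):
    """Logistic score from summed token weights (a bag-of-words stand-in)."""
    score = model["bias"] + sum(
        model["weights"].get(token, 0.0) for token in text.lower().split()
    )
    return 1.0 / (1.0 + math.exp(-score))


def record_feedback(log, note_id, predicted, reviewer_label):
    """Append one reviewed prediction; disagreements become labeled data for fine-tuning."""
    if predicted == reviewer_label:
        kind = "agree"
    else:
        kind = "false_positive" if predicted else "false_negative"
    log.append(
        {"note_id": note_id, "predicted": int(predicted),
         "label": int(reviewer_label), "kind": kind}
    )


model = load_model(model_bytes)
probability = deficiency_probability(model, "Exam unchanged patient stable")
feedback_log = []
record_feedback(feedback_log, "N001", predicted=probability >= 0.5, reviewer_label=False)
```

Because unpickling executes code embedded in the file, any distribution channel for pre-trained models would need integrity checks, such as published checksums, before facilities load them.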

The anticipated detection accuracy for Tier 2 tools is substantially higher than that of Tier 1 (precision: 0.80–0.88; recall: 0.78–0.86), based on performance benchmarks from analogous supervised classification systems validated in post-acute care settings.¹⁶,¹⁷

Tier 3: Advanced — Transformer-Based Contextual Analysis

The third tier represents the most sophisticated deployment option, incorporating pre-trained clinical transformer models (ClinicalBERT, BioClinicalBERT) for deep contextual analysis of clinical narratives. Tier 3 is designed for facilities that have access to informatics support—either through internal staff, shared service arrangements with regional health networks, or partnerships with academic institutions. Tier 3 tools perform semantic analysis of clinical narratives, assessing whether documentation provides adequate clinical justification for diagnoses, whether assessment and plan sections are consistent with documented findings, and whether the narrative demonstrates appropriate clinical reasoning.

Tier 3 requires server-level computing resources (a multi-core CPU or GPU-enabled machine) for transformer model inference, and is most feasible for facilities that participate in Health Center Controlled Networks (HCCNs), Rural Health Networks, or other collaborative infrastructure arrangements that provide shared computing resources. The anticipated detection accuracy for Tier 3 tools is the highest across all tiers (precision: 0.85–0.92; recall: 0.83–0.90), with the added capability of providing attention-weighted explanations that highlight the specific text regions contributing to quality assessments.

**Figure 1.** Tiered Deployment Framework for Open-Access NLP Solutions in CAHs and FQHCs.

Figure 1 illustrates the three-tier deployment framework and the progression pathway between tiers. [Figure placeholder: a pyramid or stacked diagram with Tier 1 (Basic) at the base with the widest applicability, Tier 2 (Intermediate) in the middle, and Tier 3 (Advanced) at the top; arrows indicate the upward progression pathway, and side annotations give the technical requirements and anticipated accuracy for each tier. The layout can be adapted from deployment maturity models such as the HIMSS Analytics Electronic Medical Record Adoption Model (EMRAM), available at https://www.himss.org/what-we-do-solutions/digital-health-transformation/maturity-model.]

Implementation Considerations and Policy Implications

Workforce Development

The successful deployment of NLP tools in CAHs and FQHCs requires a parallel investment in workforce development. Even Tier 1 tools, designed for minimal technical expertise, require staff who can execute Python scripts, interpret documentation quality reports, and translate NLP-generated findings into clinical workflow improvements. The current informatics workforce at most CAHs and FQHCs is insufficient for this task.¹,²

We recommend three workforce development strategies: (a) integration of NLP tool training into existing Health Center Controlled Network (HCCN) technical assistance programs, which already provide EHR training and data analytics support to FQHC networks; (b) development of short-format training modules (8–16 hours) specifically designed to teach health information management professionals and quality analysts how to operate open-access NLP tools; and (c) creation of regional NLP support cooperatives in which multiple CAHs share the cost of a dedicated informatics specialist who can deploy, configure, and maintain NLP tools across participating facilities.

Data Sharing and Collaborative Model Development

The pre-trained models required for Tier 2 and Tier 3 deployment depend on access to annotated clinical note datasets from CAH and FQHC settings. Individual facilities typically lack the volume of clinical notes and the annotation expertise necessary to develop robust training datasets independently. We propose a collaborative model development approach in which multiple facilities contribute de-identified clinical notes to a shared training corpus under data use agreements that comply with HIPAA requirements and institutional review board oversight.
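Before notes leave a facility, direct identifiers have to be removed. The sketch below shows only the mechanical shape of a pattern-based scrubbing pass; its three patterns are illustrative and cover a small fraction of identifier types. It should not be read as meeting the HIPAA de-identification standard, which requires either the Safe Harbor method or Expert Determination.

```python
import re

# Illustrative patterns only; names, addresses, and most other identifiers are not covered.
SCRUB_PATTERNS = [
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:#\s]*\d+\b", re.I), "[MRN]"),
]


def scrub(text):
    """Replace each matched identifier with a placeholder token."""
    for pattern, token in SCRUB_PATTERNS:
        text = pattern.sub(token, text)
    return text


scrubbed = scrub("Seen 03/14/2025, MRN: 448812, call 419-555-0142 with results.")
```

Free-text identifiers such as patient and clinician names are the hard part of this task and generally require dedicated de-identification tooling plus manual review of a sample.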

This approach is modeled on the architecture of existing clinical data research networks, such as the NLP Working Group of the ENACT (Evolve to Next-Gen Accrual to Clinical Trials) network, which has demonstrated the feasibility of multi-site NLP algorithm development and deployment across heterogeneous EHR environments.¹⁸ The ENACT model provides a useful precedent for establishing shared NLP infrastructure across safety-net institutions.

Policy Interventions to Accelerate Adoption

Federal policy can play a significant role in accelerating NLP adoption in CAHs and FQHCs. We identify four high-impact policy interventions. First, the inclusion of NLP-based documentation quality tools in the technical assistance programs administered by HRSA’s Bureau of Primary Health Care and the Federal Office of Rural Health Policy would provide a direct pathway for distributing open-access NLP tools to the facilities that need them most. Second, the integration of documentation quality metrics—potentially generated by NLP tools—into CMS quality reporting programs (such as the Quality Payment Program and the Uniform Data System) would create a reimbursement-aligned incentive for facilities to adopt documentation improvement technologies.

Third, the CMS Rural Health Transformation Program, which provides cooperative agreements with states to support rural health innovation, could be leveraged to pilot NLP deployment in CAH settings. The Healthcare Information and Management Systems Society (HIMSS) has already provided guidance on this program’s technology innovation goals, suggesting institutional readiness for clinical informatics investment in rural settings.¹⁹ Fourth, the ASTP/ONC’s deregulatory approach under the HTI-5 Proposed Rule²⁰, which restructures health IT certification around FHIR-based interoperability and API-enabled data exchange, creates an environment in which open-access NLP tools can be more easily integrated with EHR systems through standardized data access interfaces.
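To illustrate what standardized data access could mean for an NLP tool, the sketch below extracts note text from a FHIR `DocumentReference` resource, where attachment content may be carried inline as base64 in `content[].attachment.data`. The sample resource is constructed for the example. In practice many servers supply `attachment.url` pointing to a separate resource instead of inline data, and that case is not handled here.

```python
import base64
import json


def note_text_from_document_reference(resource_json):
    """Return decoded plain-text attachments from a FHIR DocumentReference."""
    resource = json.loads(resource_json)
    if resource.get("resourceType") != "DocumentReference":
        raise ValueError("expected a DocumentReference resource")
    texts = []
    for content in resource.get("content", []):
        attachment = content.get("attachment", {})
        is_plain_text = attachment.get("contentType", "").startswith("text/plain")
        if is_plain_text and "data" in attachment:
            texts.append(base64.b64decode(attachment["data"]).decode("utf-8"))
    return texts


# Sample resource built for this example, not taken from a real system.
sample = json.dumps({
    "resourceType": "DocumentReference",
    "status": "current",
    "content": [{"attachment": {
        "contentType": "text/plain",
        "data": base64.b64encode(
            b"Assessment: stable. Plan: recheck in 2 weeks."
        ).decode("ascii"),
    }}],
})
extracted = note_text_from_document_reference(sample)
```

If EHRs expose notes this way, the same extraction code can serve every certified system, which reduces the per-vendor integration work that the heterogeneity of CAH and FQHC platforms would otherwise require.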

Alignment with Federal AI Priorities

The deployment of open-access NLP tools in safety-net healthcare facilities aligns directly with the Trump Administration’s stated priorities for artificial intelligence in healthcare. Executive Order 14179⁸ established the policy of sustaining and enhancing America’s global AI dominance for human flourishing and economic competitiveness. The HHS AI Strategy⁹ commits to integrating AI across healthcare operations to improve patient outcomes and enhance efficiency. The HHS Request for Information on clinical AI adoption¹¹ explicitly seeks input on how AI can reduce provider burden, improve quality of care, and lower healthcare costs—objectives that NLP-based documentation quality tools are directly designed to achieve.

The federal government’s investment in TEFCA, which has facilitated nearly 500 million health record exchanges as of early 2026¹², further reinforces the national importance of documentation quality. As more clinical data flows through the national interoperability network, the integrity of that data at its source becomes increasingly critical. NLP tools that improve documentation quality at CAHs and FQHCs strengthen the entire data pipeline, ensuring that the records exchanged through TEFCA are accurate, complete, and reliable. Without upstream documentation quality improvement, the national interoperability infrastructure risks transmitting errors faster and farther rather than delivering the promised improvements in care coordination and quality.

Limitations

This paper presents a deployment framework based on a synthesis of the existing literature; empirical validation of the proposed tiered model in actual CAH and FQHC settings is planned as the next phase of this research. Several limitations should be acknowledged. First, the performance benchmarks cited for each tier are based on analogous NLP systems validated in related but not identical clinical settings; actual performance in CAH and FQHC documentation environments may differ. Second, the collaborative model development approach assumes willingness among facilities to share de-identified clinical data, which may face institutional, legal, or cultural barriers. Third, the proposed framework does not address all potential barriers to NLP adoption, including clinician resistance, workflow integration challenges, and the ongoing maintenance requirements of deployed NLP systems.

Conclusion

Critical Access Hospitals and Federally Qualified Health Centers serve as the healthcare safety net for over 60 million Americans in rural and medically underserved communities. These institutions generate vast volumes of clinical documentation whose accuracy and integrity have direct implications for patient safety, financial sustainability, quality measurement, and public health surveillance. Yet they systematically lack the advanced clinical informatics tools—including NLP-based documentation quality solutions—that are increasingly standard in well-resourced health systems.

This paper examined the clinical informatics gap in CAHs and FQHCs, analyzed the specific documentation quality challenges and technical constraints characteristic of these settings, and proposed a tiered deployment framework that aligns NLP tool complexity with institutional technical capacity. The framework offers a practical pathway for progressive NLP adoption—beginning with low-barrier, rule-based tools and advancing to machine learning and transformer-based capabilities as informatics capacity grows—that can bring the benefits of NLP-driven documentation quality improvement to the facilities that need them most.

The open-access design philosophy underlying this framework is not merely a matter of cost reduction; it is a statement of principle that the benefits of clinical NLP should not be concentrated in well-resourced institutions but should be accessible to the entire healthcare system, including the safety-net facilities that serve the nation’s most vulnerable populations. As federal policy continues to prioritize AI innovation in healthcare, health data interoperability, and the modernization of digital health infrastructure, ensuring that CAHs and FQHCs are included in this transformation is both a matter of health equity and a prerequisite for achieving the national data integrity goals that these policy initiatives are designed to advance.

Author Contributions

Michael Kwakye Agyapong conceived the study, conducted the literature review, developed the proposed framework, and wrote and revised the manuscript.

Ethics Declaration

Ethics approval was not required for this study as it does not involve human participants, patient data, or identifiable personal information.

Trends In Public Health And Community Medicine - TPHCM

