A 2025 study in JAMA Network Open found that most adults don’t trust the healthcare system to use AI responsibly. About two-thirds said they’re uneasy with how AI is handled, and more than half worry it could cause harm.
On the other hand, rapidly advancing AI solutions are positioned to alleviate administrative burden, strengthen compliance, and transform revenue operations, helping to solve widespread industry challenges including burnout and declining reimbursement. According to 500+ industry leaders, early AI adopters in physical therapy and rehab stand to gain a major advantage.
But not all AI tools are built the same.
For practice leaders responsible for compliance, patient safety, and day-to-day performance, a weak AI purchase can cause more problems than it solves.
This guide gives you a practical way to evaluate trustworthy AI in healthcare, focusing on five areas: risk, explainability, governance, human review, and privacy.
Why Trust Is the First Test for Healthcare AI
Selecting an AI tool is not primarily a technology evaluation; it is a credibility, safety, and accountability decision. And in healthcare settings, including PT and rehabilitation therapy, low AI trustworthiness carries concrete costs. Specific failure scenarios include:
- An AI tool suggests outdated or wrong billing codes, which leads to denials.
- A system misses allergies or contraindications because a clinician leans too heavily on the output.
- Staff start treating AI-generated documentation as correct by default, even if it contains subtle mistakes.
These issues don’t happen in a vacuum. They tend to show up when systems make it too easy to accept outputs without understanding them. As Ben Catania (Raintree Director of AI) put it in a recent webinar on AI trustworthiness: “A common risk in healthcare can be certain users becoming overreliant or dependent on AI rather than using their own clinical expertise.”
Well-designed AI should do the opposite. It should make it easier for clinicians to review, validate, and correct what the system produces, not quietly replace that judgment.
What “Trustworthy AI” Actually Means in Practice
If the risk comes from systems that are hard to question, then “trustworthy AI” comes down to how the system is built and how it behaves in real workflows.
A tool that simply produces clean-looking outputs isn’t enough. You need to know what sits behind those outputs, and how the system holds up once people start relying on it day to day.
In practice, that means looking for a few specific things:
- A clear evaluation of user workflows and where things can go wrong
- Explainability, where reasoning behind AI outputs can be traced to the source
- Involvement of clinical subject matter experts during product development
- Performance metrics that are tracked and reviewed over time
- Real-world validation with actual end users, not just internal testing
Most of this won’t be obvious in a product demo. That’s where many teams get tripped up.
Beyond the Demo: How to Assess AI Risk and Vendor Readiness
As Jenna Geier (Raintree Sr. Product Manager, Compliance & Regulatory) said during the webinar, “When you’re investing in technology, you’re really not just buying a product—especially with AI. You’re investing in the future of your business.”
That’s exactly why surface-level evaluation isn’t enough. A polished demo can make almost any product look convincing, but it tells you very little about how the system was actually built or how it will behave once it’s in use.
To get a clearer picture, you have to go a layer deeper. Ask vendors:
- How was the AI built?
- What data trained it?
- How is it monitored after launch?
- Who reviews it before release and after updates?
- Is there an AI governance group involved in planning, development, and post-launch review?
Investigate bias, alignment, and real-world fit
Bias in healthcare AI is a functional risk. Unbalanced training data can produce inaccurate diagnoses or perpetuate health disparities. A well-aligned AI product requires close collaboration with healthcare providers during development and should ideally be trained on data specific to the practice’s own patient population rather than generic datasets.
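One way to make that check concrete is to compare the distribution of sensitive attributes in the vendor’s training data against your own patient population. The sketch below is a minimal illustration in Python, assuming both datasets are available as pandas DataFrames sharing an attribute column; the column names and the 5% threshold are assumptions for illustration, not a standard.

```python
import pandas as pd

def representativeness_gaps(training: pd.DataFrame,
                            population: pd.DataFrame,
                            attribute: str,
                            max_gap: float = 0.05) -> pd.DataFrame:
    """Flag groups whose share of the training data differs from their
    share of the patient population by more than max_gap (absolute)."""
    train_share = training[attribute].value_counts(normalize=True)
    pop_share = population[attribute].value_counts(normalize=True)
    # Align the two distributions; a group missing from one side counts as 0%.
    report = pd.DataFrame({"training": train_share,
                           "population": pop_share}).fillna(0.0)
    report["gap"] = (report["training"] - report["population"]).abs()
    return report[report["gap"] > max_gap]
```

If a group makes up 20% of your patient population but only a sliver of the training data, it lands in this report and becomes a direct question for the vendor.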
Assess risk before implementation
Responsible adoption starts with a structured risk assessment covering intended use, failure modes, and escalation plans. When evaluating a new AI solution, ask vendors for an intervention risk management plan.
Explainability: Why Healthcare AI Should Be a Glass Box
If a clinician or compliance officer can’t trace how an AI reached a conclusion, they have no basis for judging whether it’s safe or accurate. Because AI systems generate probabilistic outputs, understanding the “why” behind an output is critical for accountability.
What meaningful explainability looks like
At a minimum, a system should show (see the sketch after this list):
- What data it reviewed.
- What steps it executed.
- What reasoning supports the final output (the “glass box” approach).
- Underlying prompts and audit trails that link generated content back to actual visit data.
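To make that list concrete, here is a minimal sketch of what a single “glass box” audit record might look like. Every field name is illustrative, not any specific vendor’s schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditTrailEntry:
    """Hypothetical audit record linking one AI output back to its inputs."""
    output_id: str               # the AI-generated artifact being explained
    source_visit_ids: list[str]  # visit data the system actually reviewed
    pipeline_steps: list[str]    # ordered processing steps it executed
    prompt_used: str             # underlying prompt, retained for auditors
    rationale: str               # reasoning attached to the final output
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```

The point is not this exact schema; it is that an auditor can start from any output and walk back to the visit data, steps, and reasoning that produced it.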
Glass box vs. black box
As AI tools become more common, trust and explainability are becoming key differentiators between products that can look nearly identical on the surface. Explainable AI systems are easier to validate, easier to correct, and easier to rely on in high-stakes environments like healthcare.
Black box
You receive an output, but the logic behind it is hidden. There is no clear way for a user or auditor to trace the specific inputs, processing steps, or reasoning used to reach a conclusion. This lack of transparency limits both trust and accountability.
Glass box
The path from input to output is fully inspectable. When necessary, stakeholders can see exactly what data the system used, how it was processed, and why a specific result was produced.
Build Governance and Human Oversight Into the Workflow
In the software realm, effective governance allows teams to ship features with confidence, rather than acting as a bureaucratic hurdle. When you’re evaluating an AI product, look for a formally chartered AI governance committee involving clinical, compliance, privacy, and security stakeholders. Each brings a different lens on risk. Without that structure, decisions tend to happen informally or inconsistently, which makes it harder to trace how choices were made and harder to correct them later.
Human oversight is non-negotiable
AI should support decisions, not replace them. Human-in-the-loop AI solutions ensure that outputs are grounded in real-world context (a minimal sketch of such a workflow follows the list below). Workflows must allow end users to:
- Review every AI-generated output.
- Reject recommendations that don’t reflect reality.
- Feed corrections back into the process.
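As a sketch of that workflow, consider the minimal Python below: no AI draft reaches the chart without an explicit clinician decision, and every rejection carries a correction that can be fed back to the vendor. All names here (`ReviewDecision`, `finalize_note`, `log_feedback`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ReviewDecision:
    accepted: bool                     # clinician signs off on the draft as-is
    corrected_text: str | None = None  # clinician's fix, if the draft is rejected
    reason: str | None = None          # why the draft was rejected

def log_feedback(ai_draft: str, decision: ReviewDecision) -> None:
    """Stub: route the correction back into the vendor's improvement loop."""
    ...

def finalize_note(ai_draft: str, decision: ReviewDecision) -> str:
    """No AI draft reaches the chart without an explicit human decision."""
    if decision.accepted:
        return ai_draft
    if decision.corrected_text is None:
        raise ValueError("A rejected draft needs a correction before signing.")
    log_feedback(ai_draft, decision)
    return decision.corrected_text
```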
Continuous monitoring is key
AI performance can degrade over time. Teams need ongoing visibility into how systems are performing, along with clear processes for responding to errors. Post-launch monitoring, feedback loops, and documented incident response processes are essential to separate responsible deployment from a one-time release.
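A simple version of such a check, sketched below, compares a rolling metric (here, the share of AI outputs clinicians accept unchanged) against the baseline measured at launch. The metric choice and the 10-point tolerance are assumptions for illustration.

```python
def check_for_drift(recent_acceptance: float,
                    baseline_acceptance: float,
                    tolerance: float = 0.10) -> bool:
    """True if performance has slipped enough to trigger the documented
    incident-response process."""
    return (baseline_acceptance - recent_acceptance) > tolerance

# Example: 92% acceptance at launch, 78% this month -> open an incident review.
if check_for_drift(recent_acceptance=0.78, baseline_acceptance=0.92):
    print("Acceptance rate dropped more than 10 points; escalate for review.")
```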
Protect Privacy and Security While Scaling AI Adoption
Privacy concerns surface early in healthcare settings like physical therapy and rehabilitation — and they should. The data involved is both sensitive and highly contextual. Leaders should expect direct, specific answers to a few core questions:
- Is patient data used to train models beyond your own practice?
- How long is data retained?
- Are sensitive attributes (such as race, ethnicity, gender) reviewed to prevent unintentional bias?
Vendor Review Checklist
Use these criteria when reviewing any AI vendor (a sketch of how to track them follows the list):
- Risk Assessment: Is there an intervention risk management plan?
- Bias and Alignment: Is training data documented for representativeness?
- Explainability: Can the vendor demonstrate a “glass box” approach?
- Governance: Is there a formal AI governance committee?
- Human Oversight: Can users review and override outputs?
- Privacy/Security: Are data use, consent, and recovery procedures documented?
- Monitoring: Are performance metrics published and reviewed regularly?
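One way to keep such a review honest is to require evidence, not just a yes, for every criterion. The sketch below is illustrative; the criterion keys simply mirror the list above.

```python
VENDOR_CRITERIA = [
    "risk_assessment", "bias_and_alignment", "explainability",
    "governance", "human_oversight", "privacy_security", "monitoring",
]

def review_vendor(answers: dict[str, tuple[bool, str]]) -> list[str]:
    """Return criteria that are unmet, unanswered, or asserted without evidence."""
    gaps = []
    for criterion in VENDOR_CRITERIA:
        met, evidence = answers.get(criterion, (False, ""))
        if not met or not evidence.strip():
            gaps.append(criterion)
    return gaps

# Example: "governance" is asserted but has no evidence, so it is flagged,
# along with every criterion the vendor never addressed.
print(review_vendor({
    "explainability": (True, "Demoed an audit trail linking notes to visit data"),
    "governance": (True, ""),
}))
```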
The goal here is not to check boxes. It is to understand how the system behaves under real conditions and how the vendor prioritizes security, transparency, and performance.
Key Takeaways
Trust in AI is established through consistent behavior over time, which reflects how a tool was built, tested, and governed. Organizations that treat explainability and bias as safety priorities, not afterthoughts, will make innovation sustainable.
Use the framework above as a starting point: turn it into an internal evaluation checklist for your next AI software review, with compliance and safety front and center. For more on the topic, watch our webinar on Trustworthy AI, available on demand.