
An artificial intelligence ethics expert says Generative AI needs more rules on risk in health and medtech to prevent disasters

March 16, 2023
Photo: AdobeStock
A new study published in The Lancet by artificial intelligence ethicist Dr Stefan Harrer has argued for a strong and comprehensive ethical framework around the use, design, and governance of generative AI applications in healthcare and medicine, because the technology has the potential to go catastrophically wrong.

The peer-reviewed study details how Large Language Models (LLMs) have the potential to fundamentally transform information management, education, and communication workflows in healthcare and medicine, but equally remain one of the most dangerous and misunderstood types of AI.

Dr Stefan Harrer

Dr Harrer is chief innovation officer at the Digital Health Cooperative Research Centre (DHCRC), a key funding body for digital health research and development, and describes generative AI as a “very fancy autocorrect” with no real understanding of language.
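The “very fancy autocorrect” analogy is easy to make concrete. The toy sketch below (an illustrative bigram model, invented for this article, not anything from the study or a real LLM architecture) completes text purely by picking the statistically most likely next word; at no point does it represent what the words mean, which is the limitation Dr Harrer is pointing to.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data; real LLMs learn from billions of documents.
corpus = (
    "the patient has a fever . the patient needs rest . "
    "the doctor reviews the chart . the doctor orders a test ."
).split()

# Count word-pair frequencies: a crude stand-in for what an LLM learns at vast scale.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def autocomplete(word, steps=5):
    """Repeatedly pick the statistically most likely next word.

    There is no model of meaning anywhere in this loop: only frequency.
    """
    out = [word]
    for _ in range(steps):
        candidates = bigrams.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(autocomplete("the"))  # e.g. "the patient has a fever ."
```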

“LLMs used to be boring and safe. They have become exciting and dangerous,” he said.

“This study is a plea for regulation of generative AI technology in healthcare and medicine and provides technical and governance guidance to all stakeholders of the digital health ecosystem: developers, users, and regulators. Because generative AI should be both exciting and safe.”

LLMs are a key component of generative AI applications for creating new content including text, imagery, audio, code, and videos in response to textual instructions. Examples scrutinised in the study include OpenAI’s chatbot ChatGPT, Google’s chatbot Med-PALM, Stability AI’s imagery generator Stable Diffusion, and Microsoft’s BioGPT bot.

Dr Harrer’s study highlights a wide range of key applications for AI in healthcare, including:

  • assisting clinicians with the generation of medical reports or preauthorization letters;

  • helping medical students to study more efficiently;

  • simplifying medical jargon in clinician-patient communication;

  • increasing the efficiency of clinical trial design;

  • helping to overcome interoperability and standardisation hurdles in EHR mining;

  • making drug discovery and design processes more efficient.

However, his paper also highlights the inherent danger of LLM-driven generative AI: as already demonstrated with ChatGPT, it can authoritatively and convincingly create and spread false, inappropriate, and dangerous content at unprecedented scale.

Mitigating risks in AI

Alongside the risk factors he identified, Dr Harrer also outlined and analysed real-life use cases of ethical and unethical LLM technology development.

“Good actors chose to follow an ethical path to building safe generative AI applications,” he said.

“Bad actors, however, are getting away with doing the opposite: by hastily productising and releasing LLM-powered generative AI tools into a fast-growing commercial market, they gamble with the well-being of users and the integrity of AI and knowledge databases at scale. This dynamic needs to change.”

He argues that the limitations of LLMs are systemic and rooted in their lack of language comprehension.

“The essence of efficient knowledge retrieval is to ask the right questions, and the art of critical thinking rests on one’s ability to probe responses by assessing their validity against models of the world,” Dr Harrer said.

“LLMs can perform none of these tasks. They are in-betweeners which can narrow down the vastness of all possible responses to a prompt to the most likely ones but are unable to assess whether prompt or response made sense or were contextually appropriate.”
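That “in-betweener” behaviour can be sketched in a few lines. The candidate responses and likelihood scores below are invented for illustration (they come from no real model and not from the study); the point is that the selection step optimises likelihood alone and contains no check of whether the top-ranked answer is true or contextually appropriate.

```python
# Invented, illustrative candidate responses with made-up likelihood scores;
# a real LLM assigns such scores over sequences of tokens at vast scale.
candidates = {
    "Antibiotics are effective against most viral infections.": 0.52,  # fluent, confident, wrong
    "Antibiotics do not treat viral infections.": 0.31,
    "Treatment depends on the pathogen; consult a clinician.": 0.17,
}

# The step Dr Harrer describes: narrow all possible responses down to the
# most likely one. Note what is missing: no world model, no validity check,
# nothing that asks whether prompt or response makes sense.
best_response = max(candidates, key=candidates.get)
print(best_response)  # the fluent falsehood wins on likelihood alone
```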

Guiding principles

He argues that boosting training data sizes and building ever more complex LLMs will not mitigate risks, but rather amplify them. Instead, Dr Harrer proposes a regulatory framework with 10 principles for mitigating the risks of generative AI in health.

They are:

  1. design AI as an assistive tool for augmenting the capabilities of human decision makers, not for replacing them;

  2. design AI to produce performance, usage and impact metrics that explain when and how AI is used to assist decision making and scan for potential bias;

  3. study the value systems of target user groups and design AI to adhere to them;

  4. declare the purpose of designing and using AI at the outset of any conceptual or development work;

  5. disclose all training data sources and data features;

  6. design AI systems to clearly and transparently label any AI-generated content as such (a minimal sketch of such labelling follows this list);

  7. continually audit AI against data privacy, safety, and performance standards;

  8. maintain databases for documenting and sharing the results of AI audits, educate users about model capabilities, limitations and risks, and improve performance and trustworthiness of AI systems by retraining and redeploying updated algorithms;

  9. apply fair-work and safe-work standards when employing human developers;

  10. establish legal precedent to define under which circumstances data may be used for training AI, and establish copyright, liability and accountability frameworks for governing the legal dependencies of training data, AI-generated content, and the impact of decisions humans make using such data.
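Principle 6 is among the most directly implementable of the ten. Below is a minimal sketch of what machine-readable labelling of AI-generated content could look like; the wrapper class, field names, and model identifier are assumptions made for illustration, not part of Dr Harrer's framework.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class LabeledContent:
    """Hypothetical provenance wrapper: content never travels without its label."""
    text: str
    ai_generated: bool
    model: str  # placeholder identifier, not a real product
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

draft = LabeledContent(
    text="Patient presents with intermittent chest pain on exertion...",
    ai_generated=True,
    model="example-clinical-llm-v1",
)

# Serialising the label together with the content lets downstream systems
# (EHRs, review queues, the audit databases of principle 8) always
# distinguish machine-drafted text from human-authored text.
print(json.dumps(asdict(draft), indent=2))
```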

“Without human oversight, guidance and responsible design and operation, LLM-powered generative AI applications will remain a party trick with substantial potential for creating and spreading misinformation or harmful and inaccurate content at unprecedented scale,” Dr Harrer said.

He predicts a shift from the current competitive LLM arms race to a phase of more nuanced and risk-conscious experimentation with research-grade generative AI applications in health, medicine and biotech that will result in the first commercial product offerings in digital health data management within two years.

“I am inspired by thinking about the transformative role generative AI and LLMs could one day play in healthcare and medicine, but I am also acutely aware that we are by no means there yet and that, despite the prevailing hype, LLM-powered generative AI may only gain the trust and endorsement of clinicians and patients if the research and development community aims for equal levels of ethical and technical integrity as it progresses this transformative technology to market maturity,” Dr Harrer said.

The full study is available here.
