Evaluating biases in Language Models for clinical use: a study on GPT-4
https://www.digitalhealthglobal.com/evaluating-biases-in-language-models-for-clinical-use-a-study-on-gpt-4/
Mon, 08 Jan 2024

Healthcare professionals have increasingly been exploring the potential of Large Language Models (LLMs) like GPT-4 to revolutionize aspects of patient care, from streamlining administrative tasks to enhancing clinical decision-making.

Despite their potential, language models can encode biases, impacting historically marginalized groups.

A recent study published in The Lancet Digital Health has shed light on potential pitfalls, emphasizing the need for cautious integration of these models into healthcare settings.

The study

In this study, researchers delved into whether GPT-4 harbors racial and gender biases that could adversely impact its utility in healthcare applications. Using the Azure OpenAI interface, the team assessed four key aspects: medical education, diagnostic reasoning, clinical plan generation, and subjective patient assessment.

To simulate real-world scenarios, researchers employed clinical vignettes from NEJM Healer and drew from existing research on implicit bias in healthcare. The study aimed to gauge how GPT-4’s estimations aligned with the actual demographic distribution of medical conditions, comparing the model’s outputs with true prevalence estimates in the United States.

The assessment of GPT-4’s potential biases was organized into three sets of experiments:

  • Simulating Patients for Medical Education:
    • GPT-4 was evaluated for creating patient presentations based on specific diagnoses, revealing biases in demographic portrayals.
    • The analysis included 18 diagnoses, assessing GPT-4’s ability to model demographic diversity and comparing the generated cases with true prevalence estimates.
    • Various prompts and geographical factors were considered, and strategies for de-biasing prompts were explored.
  • Constructing Differential Diagnoses and Clinical Treatment Plans:
    • GPT-4’s response to medical education cases was analyzed, evaluating the impact of demographics on diagnostic and treatment recommendations.
    • Cases from NEJM Healer and additional scenarios were used, examining the effect of gender and race on GPT-4’s outputs.
    • Two specific cases, acute dyspnea and pharyngitis in a sexually active teenager, underwent a more in-depth analysis.
  • Assessing Subjective Features of Patient Presentation:
    • GPT-4’s perceptions were examined using case vignettes designed to assess implicit bias in registered nurses.
    • Changes in race, ethnicity, and gender were introduced to measure the impact on GPT-4’s clinical decision-making abilities across various statements and categories.
    • The study aimed to identify significant differences in GPT-4’s agreement with statements based on demographic factors.
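The core of the methodology, comparing a model's generated demographic mix for a diagnosis against a true prevalence estimate, can be sketched in outline. The numbers below are invented for illustration, not the study's data; the helper computes the total variation distance between the two distributions.

```python
# Illustrative sketch: compare a model's generated demographic
# distribution for one diagnosis against a true prevalence estimate.
# All figures below are made up for demonstration purposes.

def total_variation(generated, true):
    """Total variation distance between two discrete distributions
    over the same categories (0 = identical, 1 = disjoint)."""
    categories = set(generated) | set(true)
    return 0.5 * sum(abs(generated.get(c, 0.0) - true.get(c, 0.0))
                     for c in categories)

# Hypothetical sex distribution of model-generated vignettes
generated = {"female": 0.90, "male": 0.10}
# Hypothetical true prevalence by sex for the same condition
true_prevalence = {"female": 0.60, "male": 0.40}

gap = total_variation(generated, true_prevalence)
print(f"total variation distance: {gap:.2f}")  # 0.30 here
```

A large distance flags a diagnosis for which the model's vignettes over- or under-represent a demographic group relative to reality.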

The results were concerning.

GPT-4 consistently generated clinical vignettes that perpetuated stereotypes related to demographic presentations, failing to accurately model the diversity of medical conditions.
The differential diagnoses provided by the model were more likely to include stereotypical associations with certain races, ethnicities, and genders.
Additionally, assessments and plans created by GPT-4 revealed significant associations between demographic attributes and recommendations for more costly procedures, as well as variations in patient perception.

These findings underscore the critical importance of subjecting LLM tools like GPT-4 to thorough and transparent bias assessments before their integration into clinical care. The study discusses potential sources of biases and proposes mitigation strategies to ensure responsible and ethical use in healthcare settings.

The research, funded by Priscilla Chan and Mark Zuckerberg, calls on the healthcare community to approach the integration of advanced language models with caution and with a commitment to mitigating biases for improved patient care.

Large Language Models: Revolutionizing Unstructured Data Analysis in Healthcare
https://www.digitalhealthglobal.com/large-language-models-revolutionizing-unstructured-data-analysis-in-healthcare/
Thu, 05 Oct 2023

In the vast world of healthcare, the volume of unstructured data (medical records, clinical research papers, scientific publications, clinical trials, and more) can be overwhelming.

Extracting valuable information and knowledge from this unstructured data has long been a challenge, hindering the progress of medical research, diagnosis, and patient care.

However, with the advent of large language models (LLMs), a breakthrough has occurred. These powerful artificial intelligence models have broken barriers, paving the way for unprecedented advances in the analysis of unstructured data in healthcare.

These models have incredible potential and are already transforming the data analytics landscape.

What are Large Language Models?

In recent years, large language models have emerged as an innovative development in artificial intelligence (AI) technology and natural language processing (NLP), transforming several fields.

They are designed to process and understand human language by exploiting large amounts of textual data. By learning patterns, relationships, and contextual information from this data, these models gain the ability to generate coherent and contextually appropriate responses and perform various language-related tasks.

LLMs are built from layers of interconnected artificial neurons, loosely inspired by the human brain. They undergo extensive training on enormous datasets containing billions of sentences from diverse sources such as books, articles, and websites.

Another vital aspect of these models is their immense number of parameters, which can range from millions to billions. These parameters enable the models to grasp the intricacies of language, resulting in the generation of contextually relevant and high-quality text.
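To give a feel for where those parameter counts come from, the sketch below estimates the weights in a GPT-style decoder from a handful of hyperparameters. The formula is a standard back-of-the-envelope approximation (biases and layer norms ignored), and the example values loosely echo a small GPT-2-class model; they are assumptions for illustration, not published specifications.

```python
def transformer_param_estimate(n_layers, d_model, vocab_size, seq_len):
    """Rough parameter count for a GPT-style decoder.

    Per block: ~4*d^2 for attention (Q, K, V, output projections)
    plus ~8*d^2 for a feed-forward layer of width 4*d.
    Biases and layer norms are ignored (comparatively tiny).
    """
    per_block = 4 * d_model**2 + 8 * d_model**2
    embeddings = vocab_size * d_model + seq_len * d_model
    return n_layers * per_block + embeddings

# Hyperparameters in the ballpark of a small GPT-2-class model
approx = transformer_param_estimate(n_layers=12, d_model=768,
                                    vocab_size=50257, seq_len=1024)
print(f"~{approx / 1e6:.0f}M parameters")  # roughly 124M
```

Scaling the width and depth in this formula shows how counts climb quickly from millions into the billions.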

Real-world Examples of Large Language Models in Healthcare Analytics

Disease diagnosis and treatment recommendations
In a study, researchers trained a language model using a large amount of medical literature and medical records. The model was then used to analyze complex patient cases, accurately diagnosing rare diseases and recommending tailored treatment strategies based on the latest research findings.

Literature review and evidence synthesis
Researchers have used these models to analyze large volumes of scientific literature, enabling comprehensive reviews and evidence-based assessments. By automating the extraction and synthesis of information, language models accelerate the identification of relevant studies, summarize key findings, and support evidence-based decision making.
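The extraction-and-summarization step can be approximated crudely even without an LLM; the sketch below ranks sentences by summed word frequency, a classic extractive-summarization baseline. The abstract text is invented for illustration.

```python
import re
from collections import Counter

def top_sentences(text, k=1):
    """Rank sentences by the summed frequency of their words,
    a naive extractive-summarization baseline."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    return scored[:k]

abstract = ("The trial enrolled 200 patients. "
            "Patients receiving the treatment showed improved outcomes. "
            "Improved outcomes in treated patients suggest clinical benefit.")
print(top_sentences(abstract, k=1))
```

An LLM replaces this frequency heuristic with genuine semantic understanding, but the pipeline shape (split, score, select) is the same.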

Medical image analysis and radiology
In many scenarios, models can interpret radiology reports and extract key findings, aiding radiologists in diagnosis. They can also help with automatic report generation, reducing reporting time and improving workflow efficiency in radiology.

Mental health support and chatbots
These models have been integrated into mental health support systems and chatbots, providing personalized assistance and resources to people. They are also able to initiate natural language conversations, understand emotional nuances, and provide support, information, and referrals for mental health issues.

Integrating Large Language Models in Life Sciences

LLMs are neither easily replicable nor affordable for every organization. The cost of training GPT-4 has been estimated at close to $100 million, and it rises with the complexity of the model. Thus, large IT companies, including Google, Amazon, and OpenAI (backed by Microsoft, among others), have effectively been the only players to enter this space.

Users are therefore forced to work with these pre-trained models, limited to fine-tuning them for their needs. For very specific domains, however, it is crucial to recognize that results and performance may differ substantially from expectations.

Healthcare is a knowledge domain where many documents (scientific publications and the like) are publicly available, so large language models are already trained on them and seem to work well. When, however, we submit private and highly specialized documents, performance can degrade: the LLM may fail to recognize concepts such as active ingredients, molecule names, or development processes that are internal knowledge.
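One quick way to see this effect is to look at how a general-purpose subword vocabulary fragments unfamiliar domain terms. The sketch below uses a toy greedy longest-match tokenizer and an invented vocabulary; real tokenizers (BPE, WordPiece) are more sophisticated, but the fragmentation effect on out-of-domain words is the same.

```python
def greedy_subword_tokens(word, vocab):
    """Split a word into subwords by greedy longest-match against a
    vocabulary; unknown characters fall back to single letters."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:  # no match at all: emit one character
            tokens.append(word[i])
            i += 1
    return tokens

# Toy "general-purpose" vocabulary: common English fragments only
vocab = {"head", "ache", "a", "ta", "mol", "ce", "par", "et"}

print(greedy_subword_tokens("headache", vocab))     # ['head', 'ache']
print(greedy_subword_tokens("paracetamol", vocab))  # ['par', 'a', 'ce', 'ta', 'mol']
```

The everyday word survives as two meaningful pieces, while the drug name shatters into five fragments the model has little basis to interpret.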

Some LLMs, such as Google's BERT, have been specialized by universities and research centers with additional training in particular domains and released to the open-source community: BioBERT, MedBERT, and SciBERT; more recently, BioGPT, a version of the well-known GPT verticalized on biomedical concepts, has been released as well.

It is therefore important to understand the scope of the intended use cases and choose the most suitable model, without defaulting to the mainstream choice of ChatGPT.

The right process of development can thus be summarized as:

  • Identify the right use case: Assess business operations to identify areas where an LLM can add value.
  • Select the appropriate model: Choose an LLM that fits your needs, considering the complexity of the task, model capabilities and resource requirements.
  • Prepare and fine-tune data: Collect and, if necessary, pre-process relevant data to fine-tune the chosen model, ensuring it is aligned with the business context and produces accurate, domain-specific results.
  • Plan integration with existing systems: Perform the integration of an LLM into existing business processes and technology infrastructure.
  • Monitor and evaluate performance: Continuously monitor the performance of the implemented LLM, using metrics such as accuracy, response time, and user satisfaction to identify areas for improvement.
  • Ethical and privacy considerations: Take into account potential ethical and privacy issues related to AI implementation, while ensuring compliance with data protection regulations and responsible use of AI technologies.
  • Promote a culture of AI adoption: Encourage understanding and acceptance of AI technologies throughout the company by providing training and resources for employees to embrace and leverage LLMs.
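The monitoring step in particular lends itself to simple instrumentation. Below is a minimal, hypothetical sketch that aggregates the three metrics mentioned above (accuracy, response time, user satisfaction) from logged interactions; the record fields are invented for illustration.

```python
from statistics import mean

def summarize_llm_logs(interactions):
    """Aggregate basic quality metrics from logged LLM interactions.

    Each record is a dict with hypothetical fields:
      correct (bool), latency_ms (float), rating (1-5 user score).
    """
    return {
        "accuracy": mean(1.0 if r["correct"] else 0.0 for r in interactions),
        "avg_latency_ms": mean(r["latency_ms"] for r in interactions),
        "avg_rating": mean(r["rating"] for r in interactions),
    }

logs = [
    {"correct": True,  "latency_ms": 420.0, "rating": 5},
    {"correct": True,  "latency_ms": 380.0, "rating": 4},
    {"correct": False, "latency_ms": 900.0, "rating": 2},
]
print(summarize_llm_logs(logs))
```

Tracking these aggregates over time is what turns "monitor and evaluate" from a slogan into a feedback loop for improvement.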

Encouraging further exploration and experimentation

Ongoing research, development, and testing of language models are essential to fully unlock their potential in health data analytics, to ensure that data privacy and security standards are met, and to promote responsible use of AI technologies. Seamless integration with existing healthcare systems and workflows is equally critical for widespread adoption. By developing interoperable platforms and APIs that give easy access to the models and facilitate integration with electronic health records, clinical decision support systems, and other healthcare applications, the potential impact and usability of large language models can be maximized.

It’s clear that these technologies have disrupted the landscape of healthcare data analytics, providing healthcare providers with advanced capabilities to extract information, thus improving care, and driving medical research.

The way forward with Healthware

For years, we at Healthware have been following the evolution of artificial intelligence and have utilized our machine learning and data science expertise to help our customers.

The new LLM-based tools offer us and our customers new ways to accelerate, enhance, and develop processes, products, and projects. They won’t make professionals obsolete; instead, they will empower them to work faster and more efficiently.

Our senior developers are already utilizing ChatGPT to speed up development work. Instead of researching documentation, the developer can ask the chatbot to help create a new component, which they can then review and integrate into the codebase. Chatbots are especially useful for more senior developers, who can adequately review the code and ensure it is suitable, working, and secure.

A similar approach allowed our designers to focus entirely on the core design. Looking toward the future, we could ask chatbots to generate ideas or sketches of these characters. Ultimately, this design approach expedited the discovery process, allowing the designers to find the right style and refine it.

And the number of opportunities keeps growing. With the current tools we can generate audio, video, text, images, code, and more. These outputs usually cannot be used as-is yet, but they are great drafts for our experts to finalize, and as the technology evolves, more and more final production content will be generated this way. The tools have already opened up a skillset of growing importance: prompt hacking, i.e. the ability to ask the right questions, with the proper context, in the right way, of the right chatbot, to get the best possible results.
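In practice, much of that skill comes down to assembling prompts with the right context systematically rather than ad hoc. A minimal, hypothetical sketch of such a template builder (all field names and example values are invented):

```python
def build_prompt(role, task, context, constraints):
    """Assemble a structured prompt: persona, grounding context,
    the task itself, and explicit output constraints."""
    parts = [
        f"You are {role}.",
        "Context:\n" + "\n".join(f"- {c}" for c in context),
        f"Task: {task}",
        "Constraints:\n" + "\n".join(f"- {c}" for c in constraints),
    ]
    return "\n\n".join(parts)

prompt = build_prompt(
    role="a senior front-end developer",
    task="Review the attached component for accessibility issues.",
    context=["The project uses React 18.", "Target WCAG 2.1 AA."],
    constraints=["List issues as bullet points.",
                 "Cite the relevant WCAG criterion for each issue."],
)
print(prompt)
```

Keeping role, context, task, and constraints as separate, explicit slots makes prompts reproducible and easy to iterate on across different chatbots.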
