Large Language Models to Understand Biomedical Text
Large language models (LLMs) such as transformer-based models have been highly successful, setting state-of-the-art results on a broad range of natural language processing (NLP) tasks, including question answering (QA), document classification, machine translation, and text summarization. These successes have been replicated in the clinical and biomedical domains by pretraining language models on large-scale clinical or biomedical corpora and then fine-tuning them on a variety of downstream tasks, including computational phenotyping, automatic ICD (International Classification of Diseases) coding, knowledge graph completion, and clinical QA.
Novel transformer variants designed specifically for long sequences reduce memory usage from quadratic to linear in the sequence length. The core idea behind these models is to replace the full attention mechanism with a sparse attention mechanism, typically a blend of sliding-window (local) attention and reduced global attention. These models can process substantially longer inputs than standard transformers and have empirically improved performance on NLP tasks such as QA and text summarization. More recently, the release of OpenAI’s free tool ChatGPT demonstrated the ability of large language models to generate content, prompting both anticipation about its possible uses and concern about potential controversies. Early adopters have shared their experiences on social media, with largely positive sentiment. Some articles bemoan the death of the traditional school essay assignment, as ChatGPT has been shown to generate high-scoring papers and even to articulate critical thinking. The ethical and acceptable boundaries of ChatGPT’s use in scientific writing remain unclear.
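To make the quadratic-versus-linear point about sparse attention concrete, the following is a minimal sketch that counts how many (query, key) pairs a sliding-window plus global attention pattern keeps, compared with full attention. It is an illustration under stated assumptions, not the implementation of any particular model: the function names, the symmetric window, and the choice of which positions are global are all hypothetical.

```python
# Illustrative sketch only: counts attention pairs, it does not compute attention.
# All names and parameters here are hypothetical, chosen for this example.

def full_attention_pairs(seq_len):
    """Full attention: every token attends to every token (quadratic)."""
    return seq_len * seq_len


def sparse_attention_pairs(seq_len, window, global_positions):
    """Count pairs kept by a sliding-window + global attention pattern.

    Assumptions:
      - each token attends to `window // 2` neighbours on each side, plus itself;
      - tokens at `global_positions` attend to all tokens, and all tokens
        attend to them.
    """
    global_set = set(global_positions)
    half = window // 2
    total = 0
    for q in range(seq_len):
        if q in global_set:
            # A global token attends to the whole sequence.
            total += seq_len
            continue
        lo = max(0, q - half)
        hi = min(seq_len - 1, q + half)
        local = hi - lo + 1
        # Global keys that fall outside this token's local window.
        extra = sum(1 for g in global_set if g < lo or g > hi)
        total += local + extra
    return total


if __name__ == "__main__":
    for n in (512, 1024, 2048, 4096):
        sparse = sparse_attention_pairs(n, window=128, global_positions=[0])
        print(n, full_attention_pairs(n), sparse)
```

With the window size and the number of global tokens held fixed, doubling the sequence length roughly doubles the sparse count, while the full-attention count quadruples.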
We have been conducting extensive research on large language models, e.g., long-sequence transformers and GPT-style models, in the clinical and biomedical domains. Our work examines the adaptability of these models to a series of clinical NLP tasks, including clinical inference, biomedical named entity recognition, EHR-based question answering, and clinical note classification.
Select Publications
- Slides on our recent work on Large Language Models
- Comparative study of pretrained language models for long clinical text