Few-shot biomedical NER empowered by LLMs-assisted data augmentation and multi-scale feature extraction

Abstract: Named Entity Recognition (NER) is a fundamental task in processing biomedical text. Because labeled data are scarce, researchers have investigated few-shot learning methods to tackle this challenge. However, matching the performance of fully supervised methods remains difficult in few-shot scenarios.

This paper addresses two main issues. First, existing data augmentation methods primarily replace content in the original text, which can distort its semantics. Second, current approaches often neglect sentence features at multiple scales.

To overcome these challenges, we use ChatGPT to generate enriched data with distinct semantics for the same entities, thereby reducing noisy data. In addition, we employ dynamic convolution to capture multi-scale semantic information in sentences and to enhance the feature representation produced by PubMedBERT. We evaluated the method on four biomedical NER datasets (BC5CDR-Disease, NCBI, BioNLP11EPI, BioNLP13GE), and it outperformed the current state-of-the-art models in most few-shot scenarios, including mainstream large language models such as ChatGPT.
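
The augmentation idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the prompt wording, the function names (`build_augmentation_prompt`, `bio_tag`, `keep_variant`), and the exact-match labeling rule are all assumptions made for illustration, and the call to the language model itself is left out. The sketch builds a prompt that asks for new sentences reusing the same entities, filters out generated sentences that dropped an entity, and re-labels the survivors with BIO tags.

```python
import re

TOKEN_PATTERN = r"\w+|[^\w\s]"  # words or single punctuation marks (assumed tokenizer)


def build_augmentation_prompt(sentence, entities, n_variants=3):
    """Compose a prompt asking an LLM for new sentences that reuse the same entities.

    The wording is hypothetical; the source does not give the actual prompt.
    """
    entity_list = "; ".join(f"{text} ({label})" for text, label in entities)
    return (
        f"Write {n_variants} new biomedical sentences that differ in meaning "
        f"from the example but mention every listed entity verbatim.\n"
        f"Example: {sentence}\n"
        f"Entities: {entity_list}"
    )


def keep_variant(sentence, entities):
    """Simple noise filter: reject generated text that lost any entity."""
    lowered = sentence.lower()
    return all(text.lower() in lowered for text, _ in entities)


def bio_tag(sentence, entities):
    """Tokenize the sentence and assign BIO tags by case-insensitive exact match."""
    tokens = re.findall(TOKEN_PATTERN, sentence)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for text, label in entities:
        ent = [t.lower() for t in re.findall(TOKEN_PATTERN, text)]
        n = len(ent)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            span_is_free = all(t == "O" for t in tags[i:i + n])
            if lowered[i:i + n] == ent and span_is_free:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
    return tokens, tags


if __name__ == "__main__":
    entities = [("lung cancer", "Disease")]
    print(build_augmentation_prompt("Lung cancer is common in smokers.", entities))
    generated = "Smoking cessation lowers the risk of lung cancer."
    if keep_variant(generated, entities):
        for token, tag in zip(*bio_tag(generated, entities)):
            print(token, tag)
```

Because the entities are carried over verbatim, labels for the generated sentences can be recovered by string matching rather than manual annotation. A real pipeline would likely need stricter filtering than this substring check.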

The results confirm the effectiveness of the proposed method in data augmentation and model generalization.
