*************************************************************************
Benchmark Corpus for Medical Device Adverse Event Detection (MDR corpus)
*************************************************************************
Abstract: The U.S. Food and Drug Administration (FDA) collects real-world adverse events, including device-associated deaths, injuries, and malfunctions, through passive reporting to the agency's Manufacturer and User Facility Device Experience (MAUDE) database. However, the system's full potential remains untapped, given the extensive use of unstructured text in medical device adverse event reports and the lack of FDA resources and expertise to properly analyze all available data. In this work, we address this limitation by developing an annotated benchmark corpus to support the design and development of state-of-the-art NLP approaches for automatically extracting device-related adverse event information from FDA Medical Device Adverse Event Reports. We develop a dataset of labeled medical device reports from a diverse set of high-risk device types that can be used for supervised machine learning. We develop annotation guidelines and manually annotate nine entity types. The resulting dataset contains 935 annotated adverse event reports with 12252 annotated spans across the nine entity types. The dataset developed in this work will be made publicly available upon publication.
--------------------------------------------------------------------------------
A detailed description of the MDR corpus can be found in the following article:
--------------------------------------------------------------------------------
Wunnava, S., Harris, D. A., Bourgeois, F. T., & Miller, T. A. (2024, May). Development of a Benchmark Corpus for Medical Device Adverse Event Detection. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health)@ LREC-COLING 2024 (pp. 240-245).
If you use this corpus in a publication, please cite the source article:
@inproceedings{wunnava2024development,
title={Development of a Benchmark Corpus for Medical Device Adverse Event Detection},
author={Wunnava, Susmitha and Harris, David A and Bourgeois, Florence T and Miller, Timothy A},
booktitle={Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health)@ LREC-COLING 2024},
pages={240--245},
year={2024}
}
-------------
Contact info:
-------------
Timothy Miller (timothy.miller@childrens.harvard.edu)
------------------
Versions available
------------------
1.0 (May, 2024)
1.1 (August, 2024)
--------------
Version Notes:
--------------
Version 1.0 (May, 2024):
Description: Original version published in Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health)@ LREC-COLING 2024.
Entities: 9
Annotations: 12252
Version 1.1 (August, 2024):
Description: Revised entity annotations to enhance quality and accuracy, ensuring consistency with the annotation guidelines.
Entities: 9
Annotations: 10410
As the table below shows, V1.1 contains fewer annotations than V1.0 (10410 vs. 12252), while the number of entity types stays at nine. The entity distribution shifted between V1.0 and V1.1 for the following reasons:
1) According to the annotation guidelines, only terms that name specific devices, procedures, or treatments should be annotated as entities, not general English terms like "device," "procedure," and "treatment." We therefore reviewed these terms and removed the annotations on the general terms "device," "procedure," and "treatment."
2) We performed a correction pass so that certain entity types adhere more faithfully to the guidelines (for example, Device_Problem, Adverse_Event, Procedure, Treatment, Indication, and Other_Medical_Conditions).
Entity                      V1.0    V1.1
Adverse_Event               2993    2304
Device                      3410    2837
Device_Problem               964    1381
Indication                   385     530
Manufacturer                 280     290
Other_Medical_Conditions     461     297
Outcome                       70      68
Procedure                   3144    2050
Treatment                    545     653
All                        12252   10410
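The counts in the table can be summed and differenced with a few lines of Python as a quick consistency check. This is only a sketch. The numbers are copied by hand from the table, and the names COUNTS and summarize are illustrative rather than part of the corpus distribution. The per-entity rows sum to the stated totals (12252 and 10410), and Procedure shows the largest decrease (1094 fewer annotations).

```python
# Per-entity annotation counts copied by hand from the version table: (V1.0, V1.1).
COUNTS = {
    "Adverse_Event": (2993, 2304),
    "Device": (3410, 2837),
    "Device_Problem": (964, 1381),
    "Indication": (385, 530),
    "Manufacturer": (280, 290),
    "Other_Medical_Conditions": (461, 297),
    "Outcome": (70, 68),
    "Procedure": (3144, 2050),
    "Treatment": (545, 653),
}


def summarize(counts):
    """Return (V1.0 total, V1.1 total, per-entity change V1.1 minus V1.0)."""
    total_v10 = sum(v10 for v10, _ in counts.values())
    total_v11 = sum(v11 for _, v11 in counts.values())
    deltas = {name: v11 - v10 for name, (v10, v11) in counts.items()}
    return total_v10, total_v11, deltas


if __name__ == "__main__":
    t10, t11, deltas = summarize(COUNTS)
    print(f"All: {t10} -> {t11} ({t11 - t10:+d})")
    # List entities from largest decrease to largest increase.
    for name, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
        print(f"{name:<26}{delta:+d}")
```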
---------------------------
MDR Corpus Characteristics:
---------------------------
The MDR corpus is distributed as ten files (batches 1-10) in UTF-8 CSV format.
Each batch file is pipe-delimited (|) and contains the following columns:
Column-1: report_number
Column-2: device_report_product_code
Column-3: brand_name
Column-4: event_type
Column-5: date_of_event
Column-6: manufacturer_d_name
Column-7: TEXT
Column-8: id
Column-9: label
Column-10: annotation_id
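The following is a minimal Python sketch for reading a batch file and tallying its label column. This README does not confirm several of its assumptions. It assumes a batch file may begin with a header row that matches the column names listed above, that any field containing a pipe is quoted in the standard CSV way, and that each row carries one annotation whose entity type is stored in label. The batch file names are not given here, so the caller supplies the paths. read_batch and count_labels are hypothetical helpers and are not part of the corpus release. The "utf-8-sig" encoding reads plain UTF-8 and also strips the byte-order mark that some tools write when exporting "CSV UTF-8".

```python
import csv
from collections import Counter

# Column names in file order, as listed in this README.
COLUMNS = [
    "report_number", "device_report_product_code", "brand_name", "event_type",
    "date_of_event", "manufacturer_d_name", "TEXT", "id", "label", "annotation_id",
]


def read_batch(path):
    """Yield one dict per data row of a pipe-delimited MDR batch file.

    Assumption: a row whose fields match COLUMNS (case-insensitively) is a
    header and is skipped; blank rows are ignored.
    """
    expected_header = [c.lower() for c in COLUMNS]
    # utf-8-sig decodes plain UTF-8 and drops a leading byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.reader(f, delimiter="|"):
            if not row:
                continue
            if [c.strip().lower() for c in row] == expected_header:
                continue
            if len(row) != len(COLUMNS):
                # Fail loudly instead of silently misaligning columns.
                raise ValueError(
                    f"{path}: expected {len(COLUMNS)} fields, got {len(row)}"
                )
            yield dict(zip(COLUMNS, row))


def count_labels(paths):
    """Count rows per value of the label column across the given batch files."""
    counts = Counter()
    for path in paths:
        for record in read_batch(path):
            counts[record["label"]] += 1
    return counts
```

For example, `count_labels(sorted(Path("mdr_corpus").glob("*.csv")))` would tally labels across all batches in a hypothetical mdr_corpus directory, given `from pathlib import Path`. The reader raises an error when a row has the wrong number of fields instead of truncating it. This exposes rows whose TEXT contains unquoted pipes rather than letting them shift values into the wrong columns.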