Version 0.5.1
The documentation for Parse MEDLINE XML in the README differs a bit from the medline_parser script.
Readme: `delete` : boolean if `False` means paper got updated so you might have two XMLs for the same paper.
Script: An iterator of dictionary containing information about articles in NLM format (see `parse_article_info`). Articles that have been deleted will be added with no information other than the field `delete` being `True`.
I'm somewhat confused. The README seems to indicate that delete = False means the paper was updated, while the script indicates that delete = True means the paper was deleted.
But these don't seem like natural opposites. Doesn't an update mean that the previous version of the paper was deleted?
Readme for reference:
MEDLINE XML has a different XML format than PubMed Open Access. The structure of XML files can be found in MEDLINE/PubMed DTD [here](https://www.nlm.nih.gov/databases/dtd/). You can use the function `parse_medline_xml` to parse that format. This function will return list of dictionaries, where each element contains:
pmid : PubMed ID
pmc : PubMed Central ID
doi : DOI
other_id : Other IDs found, each separated by ;
title : title of the article
abstract : abstract of the article
authors : authors, each separated by ;
mesh_terms : list of MeSH terms with corresponding MeSH ID, each separated by ; e.g. 'D000161:Acoustic Stimulation; D000328:Adult; ...
publication_types : list of publication type list each separated by ; e.g. 'D016428:Journal Article'
keywords : list of keywords, each separated by ;
chemical_list : list of chemical terms, each separated by ;
pubdate : Publication date. Defaults to year information only.
journal : journal of the given paper
medline_ta : this is abbreviation of the journal name
nlm_unique_id : NLM unique identification
issn_linking : ISSN linkage, typically use to link with Web of Science dataset
country : Country extracted from journal information field
reference : string of PMID each separated by ; or list of references made to the article
delete : boolean if False means paper got updated so you might have two XMLs for the same paper. You can delete the record of deleted paper because it got updated.
languages : list of languages, separated by ;
vernacular_title: vernacular title. Defaults to empty string whenever non-available.
Grateful for clarification, as I've had some duplication issues.
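
To make the question concrete, here is a minimal sketch of one way the duplicates could be handled. It rests on two assumptions that are exactly what needs confirming: that `delete` being `True` marks a record whose PMID should be dropped, and that a later record with the same `pmid` supersedes an earlier one. The helper `dedupe_records` and the sample data are hypothetical and not part of the library; in practice the records would be the dictionaries returned by `parse_medline_xml`.

```python
def dedupe_records(records):
    """Keep the last-seen record per PMID and drop PMIDs flagged as deleted.

    Assumption (not confirmed by the docs): delete == True means the record
    for that PMID should be removed, and later records replace earlier ones.
    """
    latest = {}
    for rec in records:
        pmid = rec.get("pmid")
        if rec.get("delete"):
            # Deletion marker: forget anything stored for this PMID so far.
            latest.pop(pmid, None)
        else:
            # Regular record: overwrite any earlier version of the same PMID.
            latest[pmid] = rec
    return list(latest.values())


# Hypothetical records, shaped like the dictionaries described in the README.
sample = [
    {"pmid": "1", "title": "old title", "delete": False},
    {"pmid": "2", "title": "other paper", "delete": False},
    {"pmid": "1", "title": "new title", "delete": False},
    {"pmid": "2", "delete": True},
]
result = dedupe_records(sample)
```

Under these assumptions only the updated record for PMID 1 survives. If `delete` actually means something else, the branch on `rec.get("delete")` is the part that would need to change.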