Filtering / accessing date in Bio Entrez pubmed pulls with python

Question

I have a list of criteria (author names and the date ranges in which their articles were published) that I want to use to retrieve a list of published articles. I am using Biopython's Bio.Entrez to fetch the records from Entrez.

I can query and get results by author name, but I don't understand how to work with the returned data to get at the dates. This is what I did:

from Bio import Entrez

handle = Entrez.esearch(db="pubmed", term="")
result = Entrez.read(handle)
handle.close()
ids = result['IdList']
print(ids)
# for each id, pull the summary
for uid in ids:
    handle2 = Entrez.esummary(db="pubmed", id=uid, retmode="xml")
    result2 = Entrez.read(handle2)
    handle2.close()

The result now looks like this:

 [{'DOI': '10.1016/j.jmoldx.2013.10.002', 'Title': 'Validation of a next-generation sequencing assay for clinical molecular oncology.', 'Source': 'J Mol Diagn', 'PmcRefCount': 7, 'Issue': '1', 'SO': '2014 Jan;16(1):89-105', 'ISSN': '1525-1578', 'Volume': '16', 'FullJournalName': 'The Journal of molecular diagnostics : JMD', 'RecordStatus': 'PubMed - indexed for MEDLINE', 'ESSN': '1943-7811', 'ELocationID': 'doi: 10.1016/j.jmoldx.2013.10.002', 'Pages': '89-105', 'PubStatus': 'ppublish+epublish', 'AuthorList': ['Cottrell CE', 'Al-Kateb H', 'Bredemeyer AJ', 'Duncavage EJ', 'Spencer DH', 'Abel HJ', 'Lockwood CM', 'Hagemann IS', "O'Guin SM", 'Burcea LC', 'Sawyer CS', 'Oschwald DM', 'Stratman JL', 'Sher DA', 'Johnson MR', 'Brown JT', 'Cliften PF', 'George B', 'McIntosh LD', 'Shrivastava S', 'Nguyen TT', 'Payton JE', 'Watson MA', 'Crosby SD', 'Head RD', 'Mitra RD', 'Nagarajan R', 'Kulkarni S', 'Seibert K', 'Virgin HW 4th', 'Milbrandt J', 'Pfeifer JD'], 'EPubDate': '2013 Nov 6', 'PubDate': '2014 Jan', 'NlmUniqueID': '100893612', 'LastAuthor': 'Pfeifer JD', 'ArticleIds': {'pii': 'S1525-1578(13)00219-5', 'medline': [], 'pubmed': ['24211365'], 'eid': '24211365', 'rid': '24211365', 'doi': '10.1016/j.jmoldx.2013.10.002'}, u'Item': [], 'History': {'received': '2013/02/04 00:00', 'medline': ['2014/08/30 06:00'], 'revised': '2013/08/23 00:00', 'pubmed': ['2013/11/12 06:00'], 'aheadofprint': '2013/11/06 00:00', 'accepted': '2013/10/01 00:00', 'entrez': '2013/11/12 06:00'}, 'LangList': ['English'], 'HasAbstract': 1, 'References': ['J Mol Diagn. 2014 Jan;16(1):7-10. PMID: 24269227'], 'PubTypeList': ['Journal Article'], u'Id': '24211365'}]

I tried using EFetch, which, from what I understand, doesn't always return XML output. I thought I could filter by date by parsing the XML, like this:

proj_start = '2009 Jan 01'
proj_start = time.strptime(proj_start, '%Y %b %d')
for paper in results2:
    handle = open(paper)
    record = Entrez.read(handle)
    pub_dat=time.strptime(record["EPubDate"], '%Y %b %d')

I am getting this error:

Traceback (most recent call last):

   File "<ipython-input-39-13bcded12392>", line 2, in <module>
    handle = open(paper)

  TypeError: coercing to Unicode: need string or buffer, ListElement found

I feel like I am missing something and should be able to pass the date range right into the request. I also don't understand why this method doesn't work, even though it seems to be the more complicated way. Is there a better way to do this? I tried to do it with xml.etree but got a similar error.

python text pubmed

Jacob Ian, 1 June 2015 at 17:48

1 answer

maxymoo · Answer 1 · 2015-06-01T22:26:56+0000

You don't need open(paper): paper is already a Python dict (much like JSON). If you want the accepted date, you can access it like this:

paper['History']['accepted']
'2013/10/01 00:00'
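
To filter on that value, one option is to parse the string into a datetime and compare it with your date range. The following is only a minimal sketch: the sample record is trimmed from the output in the question, the helper names history_date and in_range are made up for illustration, and proj_end is a hypothetical end date. It assumes the History values keep the 'YYYY/MM/DD HH:MM' format shown above, and it takes the first entry when a value is a list (as 'medline' and 'pubmed' are in the question's output).

```python
from datetime import datetime

# Sample record, trimmed from the esummary output shown in the question.
paper = {
    "Id": "24211365",
    "PubDate": "2014 Jan",
    "EPubDate": "2013 Nov 6",
    "History": {
        "received": "2013/02/04 00:00",
        "accepted": "2013/10/01 00:00",
        "medline": ["2014/08/30 06:00"],
    },
}

# Assumption: every History value uses this format, as in the sample above.
HISTORY_FORMAT = "%Y/%m/%d %H:%M"


def history_date(record, key="accepted"):
    """Return record['History'][key] as a datetime, or None if it is absent."""
    value = record.get("History", {}).get(key)
    # Some History entries are lists; use the first element if there is one.
    if isinstance(value, list):
        value = value[0] if value else None
    if not value:
        return None
    return datetime.strptime(value, HISTORY_FORMAT)


def in_range(record, start, end, key="accepted"):
    """True if the chosen History date lies within [start, end]."""
    when = history_date(record, key)
    return when is not None and start <= when <= end


proj_start = datetime(2009, 1, 1)
proj_end = datetime(2014, 12, 31)  # hypothetical end of the project window

# Keep only the papers whose 'accepted' date falls inside the window.
kept = [p for p in [paper] if in_range(p, proj_start, proj_end)]
print([p["Id"] for p in kept])
```

In your loop you would call in_range on each summary record instead of on the sample dict. Fields such as 'PubDate' ('2014 Jan') do not always include a day, so they would need a more forgiving parser than the History values. As for passing the dates into the request itself, the ESearch utility documents mindate, maxdate and datetype parameters, which may let you restrict the search up front; I have not tested that here.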
