R2-06 Stage 2, Session 1

dean 2018-01-21 18:51:16 Views: 471

library(httr)

baseUrl="https://eutils.ncbi.nlm.nih.gov/"

pubmedAction=list(
  base="entrez/eutils/index.fcgi",
  search="entrez/eutils/esearch.fcgi", # search endpoint
  fetch="entrez/eutils/efetch.fcgi", # full-record retrieval endpoint
  summary="entrez/eutils/esummary.fcgi" # document-summary endpoint (efetch can return several data formats)
)

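# Parameters for the article search
searchArticleParam=list(
  retstart=0, # index of the first record to return
  retmax=20, # number of records per request
  usehistory='Y', # store the result set on the History server
  querykey='',
  webenv='',
  term='(cell[TA]) AND 2017[DP]', # query submitted to PubMed
  total_num=0, # total number of records
  total_page=1, # total number of pages
  page_size=20, # records per page
  current_page=1 # current page number
)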
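
The paging fields in this list (total_num, total_page, page_size, current_page) are declared but the script never recomputes them. A minimal sketch of how they could be derived from the hit count that esearch reports; the helper names `totalPages` and `pageStart` are hypothetical and are not part of httr or E-utilities.

```r
# Hypothetical helpers (not from the original post): derive paging values
# from the total hit count reported by esearch.

# Number of pages needed to show total_num records, page_size per page.
# Always at least 1 so an empty result still has a "first page".
totalPages <- function(total_num, page_size) {
  max(1L, as.integer(ceiling(total_num / page_size)))
}

# Zero-based retstart offset for a one-based page number.
pageStart <- function(current_page, page_size) {
  as.integer((current_page - 1) * page_size)
}

total_page <- totalPages(563, 20)      # 563 hits, 20 per page
third_page_start <- pageStart(3, 20)   # offset of the first record on page 3
print(total_page)        # 29
print(third_page_start)  # 40
```

The value returned by `pageStart` is what would go into `retstart` when requesting a later page.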

postSearchUrl=paste(baseUrl,pubmedAction$search,sep="") # build the search URL

r <- POST(postSearchUrl,
          body = list(
            db='pubmed',
            term=searchArticleParam$term,
            retmode='json',
            retstart=searchArticleParam$retstart,
            retmax=searchArticleParam$retmax,
            usehistory=searchArticleParam$usehistory,
            rettype='uilist'
          )
)


stop_for_status(r) # stop with an error if the HTTP request failed

data=content(r, "parsed", "application/json")

# data now holds the whole parsed response
esearchresult=data$esearchresult

# e.g. $count=562, $retmax=20, $retstart=0, $querykey=1, $webenv=NCID_1_30290513_130.14.18.34_9001_1515165012_617859421_0MetA0_S_MegaStore_F_1

count = as.integer(esearchresult$count) # the JSON reports count as a string
print(count)

searchArticleParam$total_num=count
searchArticleParam$querykey=esearchresult$querykey
searchArticleParam$webenv=esearchresult$webenv


pubmedidStr="28431241" # join multiple PubMed IDs with ","

postFetchUrl=paste(baseUrl,pubmedAction$fetch,sep="")

# Fetch by explicit ID. To fetch the stored search results instead, drop `id`
# and send query_key and WebEnv; efetch does not use the names querykey,
# webenv, or usehistory.
r2 <- POST(postFetchUrl,
           body = list(
             db='pubmed',
             id=pubmedidStr,
             retmode='xml' # XML output; efetch does not return JSON for PubMed
           )
)
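
efetch can be driven either by an explicit ID list or by a reference to a result set stored on the History server. In E-utilities the history parameters are spelled `query_key` and `WebEnv`, while the esearch JSON reports them as `querykey` and `webenv`. A rough sketch of building either request body follows; `fetchBody` is a hypothetical helper, the second ID is made up, and the WebEnv value is a placeholder.

```r
# Hypothetical helper (not from the original post): build the efetch form
# body either from explicit PubMed IDs or from an esearch History reference.
fetchBody <- function(ids = NULL, querykey = NULL, webenv = NULL,
                      retstart = 0, retmax = 20) {
  body <- list(db = "pubmed", retmode = "xml")
  if (!is.null(ids)) {
    # Several IDs are sent as one comma-separated string.
    body$id <- paste(ids, collapse = ",")
  } else {
    # Without IDs, both history values are required.
    stopifnot(!is.null(querykey), !is.null(webenv))
    body$query_key <- querykey
    body$WebEnv <- webenv
    body$retstart <- retstart
    body$retmax <- retmax
  }
  body
}

byId <- fetchBody(ids = c("28431241", "12345678"))   # second ID is made up
byHistory <- fetchBody(querykey = "1", webenv = "PLACEHOLDER_WEBENV")
str(byId)
str(byHistory)
```

Either list could be passed as the `body` argument of `POST()` in place of the inline list.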


stop_for_status(r2)


library(xml2)

data2=content(r2, "parsed", "application/xml")

article=xml_children(data2)

# length(article) is the number of articles returned
count=length(article)

for(cnt in seq_len(count)){ # append each title and abstract to the output file
  title=xml_find_first(article[[cnt]],".//ArticleTitle") # first ArticleTitle node in this article
  abstract=xml_find_first(article[[cnt]],".//AbstractText") # first AbstractText node in this article
  write.table(xml_text(title), file = "F:/R/a.txt", append = TRUE,quote = FALSE,row.names = FALSE, col.names = FALSE)
  write.table(xml_text(abstract), file = "F:/R/a.txt", append = TRUE,quote = FALSE,row.names = FALSE, col.names = FALSE)
}


Results:

part1

[1] 563



part2

AKT/PKB Signaling: Navigating the Network.

The Ser and Thr kinase AKT, also known as protein kinase B (PKB), was discovered 25 years ago and has been the focus of tens of thousands of studies in diverse fields of biology and medicine. There have been many advances in our knowledge of the upstream regulatory inputs into AKT, key multifunctional downstream signaling nodes (GSK3, FoxO, mTORC1), which greatly expand the functional repertoire of AKT, and the complex circuitry of this dynamically branching and looping signaling network that is ubiquitous to nearly every cell in our body. Mouse and human genetic studies have also revealed physiological roles for the AKT network in nearly every organ system. Our comprehension of AKT regulation and functions is particularly important given the consequences of AKT dysfunction in diverse pathological settings, including developmental and overgrowth syndromes, cancer, cardiovascular disease, insulin resistance and type 2 diabetes, inflammatory and autoimmune disorders, and neurological disorders. There has also been much progress in developing AKT-selective small molecule inhibitors. Improved understanding of the molecular wiring of the AKT signaling network continues to make an impact that cuts across most disciplines of the biomedical sciences.


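I understood the search part, but the XML parsing that follows is still unclear to me. I copied the pattern by imitation and got the result in the end. Hopefully it still works when I need it later.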
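
The XML step can be tried offline, without calling efetch, which may make it easier to follow. The sketch below runs the same xml2 calls on a small hand-written document; `sampleXml` is an assumption that only imitates the nesting of a PubMed response, and real responses carry many more fields.

```r
library(xml2)

# A hand-written stand-in for an efetch response: two records, the second
# one without an abstract.
sampleXml <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article>
    <ArticleTitle>First title</ArticleTitle>
    <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
  </Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article>
    <ArticleTitle>Second title</ArticleTitle>
  </Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- read_xml(sampleXml)

# One node per article: the element children of the root.
articles <- xml_children(doc)

# ".//Name" searches anywhere below each article node. xml_find_first() is
# vectorised over the node set, and xml_text() gives NA where nothing matched.
titles <- xml_text(xml_find_first(articles, ".//ArticleTitle"))
abstracts <- xml_text(xml_find_first(articles, ".//AbstractText"))

result <- data.frame(title = titles, abstract = abstracts,
                     stringsAsFactors = FALSE)
print(result)
```

Working on the whole node set at once replaces the explicit loop, and the missing abstract shows up as NA instead of raising an error.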

 