Using the R XML package
An R web-crawling test
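library(XML)      # readHTMLList()
library(stringr)  # str_locate(), str_sub(), fixed()

# Crawl share-listing pages 1 to 247; on each page take the third HTML list
# and keep every item's text up to (not including) ".pdf".
totalfile <- NULL
for (i in 1:247) {
  url <- paste0("http://www.0933.me/user/40733/share/p/", i, ".html")
  x <- readHTMLList(url)[[3]]                  # items of the third HTML list on the page
  la <- str_locate(x, fixed(".pdf"))[, 1]      # fixed(): "." is a literal dot, not a regex wildcard
  p <- str_sub(x, 1, la - 1)                   # file name without ".pdf" (NA if absent)
  myfile <- data.frame(x = i, y = p)           # x = page number, y = file name
  totalfile <- rbind(totalfile, myfile)
}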
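The site's page layout is not shown here, so the following is a minimal offline sketch on an invented page; the HTML and its entries are hypothetical. It shows what readHTMLList() returns for a parsed document, and why the pattern needs fixed(): read as a regular expression, ".pdf" also matches "apdf".

```r
library(XML)      # htmlParse(), readHTMLList()
library(stringr)  # str_locate(), str_sub(), fixed()

# Hypothetical stand-in for one share page: three lists, the third holding
# file entries (all entries are invented for illustration).
html <- '<html><body>
<ul><li>Home</li><li>Share</li></ul>
<ul><li>1</li><li>2</li><li>3</li></ul>
<ul><li>book_one_123.pdf 2.1 MB</li><li>notes.txt</li><li>apdf_report.pdf</li></ul>
</body></html>'

doc <- htmlParse(html, asText = TRUE)
x <- readHTMLList(doc)[[3]]   # text of each <li> in the third list

# Regex ".pdf": "." matches any character, so "apdf" matches at position 1
regex_start <- str_locate(x, ".pdf")[, 1]       # 13, NA, 1
# fixed(".pdf") matches only the literal extension
la <- str_locate(x, fixed(".pdf"))[, 1]         # 13, NA, 12
titles <- str_sub(x, 1, la - 1)                 # "book_one_123", NA, "apdf_report"
print(titles)
```

With the literal match, an entry that has no ".pdf" comes back as NA, so such rows stay visible in totalfile instead of silently losing their text.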
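As written, the loop stops at the first page that fails to download or has fewer than three lists, and rbind() copies totalfile on every iteration. Below is a sketch of a more tolerant variant. It assumes that skipping a failed page is acceptable. page_titles() and crawl_pages() are hypothetical helpers, and the one-second delay between requests is an arbitrary choice.

```r
library(XML)      # readHTMLList()
library(stringr)  # str_locate(), str_sub(), fixed()

# Hypothetical helper: file names on share page i, following the steps of
# the crawl loop (third HTML list, text before ".pdf").
page_titles <- function(i) {
  url <- paste0("http://www.0933.me/user/40733/share/p/", i, ".html")
  x <- readHTMLList(url)[[3]]
  str_sub(x, 1, str_locate(x, fixed(".pdf"))[, 1] - 1)
}

# Hypothetical crawler: builds one data frame per page and binds them once.
# A page that raises an error (download failure, fewer than three lists)
# is reported and skipped instead of stopping the whole crawl.
crawl_pages <- function(pages, fetch = page_titles, delay = 1) {
  rows <- lapply(pages, function(i) {
    Sys.sleep(delay)  # pause between requests (1 s is an arbitrary default)
    p <- tryCatch(fetch(i), error = function(e) {
      message("page ", i, " skipped: ", conditionMessage(e))
      NULL
    })
    if (length(p) == 0) return(NULL)
    data.frame(x = i, y = p, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # rbind() ignores the NULL entries of skipped pages
}

# totalfile <- crawl_pages(1:247)
```

The fetch argument lets the crawler run without network access, for example crawl_pages(1:3, fetch = function(i) paste0("title_", i), delay = 0). After a real crawl, setdiff(1:247, unique(totalfile$x)) lists the pages that produced no rows.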
library(DT)
# Render totalfile as an interactive, paged and searchable HTML table
datatable(totalfile)
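datatable(totalfile) output, first 10 of 4,939 rows (x = page number, y = text before ".pdf"):

row | x | y | y (English translation)
1 | 1 | 并购之路:20个世界500强企业的并购历程_12091959 | The Road to M&A: The Acquisition Histories of 20 Fortune Global 500 Companies
2 | 1 | 圆明园的“记忆遗产”样式房图档635_12802785 | The "Memory Heritage" of Yuanmingyuan: Yangshi Fang Architectural Drawings
3 | 1 | 明清吴语词典 | Dictionary of Ming and Qing Wu Chinese
4 | 1 | 结晶学导论第二版_12612940 | Introduction to Crystallography, 2nd Edition
5 | 1 | 结晶学导论第二版_12612940 | Introduction to Crystallography, 2nd Edition
6 | 1 | 命好不如习惯好_11072931_哈尔滨市:哈尔滨出版社_2002_郭腾尹著_Pg196 | Good Habits Beat Good Luck (Harbin: Harbin Publishing House, 2002; by Guo Tengyin; 196 pp.)
7 | 1 | 中国近代航运史资料第一辑下册1840-1895A5.1542_80407929 | Materials on the History of Modern Chinese Shipping, Series 1, Part 2 (1840-1895)
8 | 1 | 成人教育心理学_10824248 | Psychology of Adult Education
9 | 1 | 中国经济昆虫志第30册膜翅目胡蜂总科_10507895 | Economic Insect Fauna of China, Vol. 30: Hymenoptera, Vespoidea
10 | 1 | 洞见世界最富创意的广告公司BBDO_12784681 | Insight: BBDO, the World's Most Creative Advertising Agency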
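Rows 4 and 5 of the output have the same title, so the crawl can return repeated entries. The output alone does not show whether they are separate uploads of one file. The following is a minimal sketch that assumes repeats should be dropped before display. The three-row totalfile is a stand-in copied from rows 3 to 5, duplicated() is base R, and the header labels Page and Title are illustrative. The labels are passed through DT's colnames argument in its c(newName = "oldName") form.

```r
library(DT)  # datatable()

# Stand-in for part of totalfile: rows 3 to 5 of the output (all from page 1).
totalfile <- data.frame(
  x = c(1L, 1L, 1L),
  y = c("明清吴语词典", "结晶学导论第二版_12612940", "结晶学导论第二版_12612940"),
  stringsAsFactors = FALSE
)

dup <- duplicated(totalfile)    # TRUE where a row repeats an earlier row
sum(dup)                        # number of repeated rows
deduped <- totalfile[!dup, ]    # keep the first occurrence of each row

# Relabel only the displayed header; the data frame keeps columns x and y.
w <- datatable(deduped, colnames = c(Page = "x", Title = "y"), rownames = FALSE)
w
```

Relabeling through colnames leaves the data frame itself unchanged. Code that refers to totalfile$x and totalfile$y keeps working.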