XMLHTTP批量抓取遠(yuǎn)程資料
更新時(shí)間:2006年09月04日 00:00:00 作者:
可以在此基礎(chǔ)上結(jié)合正則表達(dá)式做成更好的效果,希望大家能分享一下Xmlhttp的session共享技術(shù)
<html>
<head>
<title>AUTOGET</title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
</head>
<body bgcolor="#FFFFFF" style="font-family:Arial;font-size:12px">
<%
'=================================================
'FileName: Getit.Asp
'Intro : Auto Get Data From Remote WebSite
'Author: Babyt(阿泰)
'URL: http://blog.csdn.net/babyt
'createAt: 2002-02 Lastupdate:2004-09
'DB Table : data
'Table Field:
' UID -> Long -> Keep ID Of the pages
' UContent -> Text -> Keep Content Of the Pages(HTML)
'=================================================
Server.ScriptTimeout=5000
'on error resume next
Set conn = Server.createObject("ADODB.Connection")
conn.open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("getit.mdb")
Set rs = Server.createObject("ADODB.Recordset")
sql="select * from data"
rs.open sql,conn,1,3
Dim comeFrom,myErr,myCount
'========================================================
comeFrom="http://www.xxx.com/U.asp?ID="
myErr1="該資料不存在"
myErr2="該資料已隱藏"
'========================================================
'***************************************************************
' 只需要更改這里 i 的始點(diǎn)intMin和終點(diǎn)intMax,設(shè)定步長(zhǎng)intStep
' 每次區(qū)間設(shè)置成5萬(wàn)左右。估計(jì)要兩個(gè)多小時(shí)。期間不需要人工干預(yù)
'****************************************************************
intMin=0
intMax=10000
'設(shè)定步長(zhǎng)
intStep=100
'==========================================================
'以下代碼不要更改
'==========================================================
Call GetPart (intMin)
Response.write "已經(jīng)轉(zhuǎn)換完成" & intMin & "~~" & intMax & "之間的數(shù)據(jù)"
rs.close
Set rs=Nothing
conn.Close
set conn=nothing
%>
</body>
</html>
<%
'使用XMLHTTP抓取地址并進(jìn)次內(nèi)容處理
Function GetBody(Url)
Dim objXML
On Error Resume Next
Set objXML = createObject("Microsoft.XMLHTTP")
With objXML
.Open "Get", Url, False, "", ""
.Send
GetBody = .ResponseBody
End With
GetBody=BytesToBstr(GetBody,"GB2312")
Set objXML = Nothing
End Function
'使用Adodb.Stream處理二進(jìn)制數(shù)據(jù)
Function BytesToBstr(strBody,CodeBase)
dim objStream
set objStream = Server.createObject("Adodb.Stream")
objStream.Type = 1
objStream.Mode =3
objStream.Open
objStream.Write strBody
objStream.Position = 0
objStream.Type = 2
objStream.Charset = CodeBase
BytesToBstr = objStream.ReadText
objStream.Close
set objStream = nothing
End Function
'主函數(shù)
Function GetPart(iStart)
Dim iGo
time1=timer()
myCount=0
For iGo=iStart To iStart+intStep
If iGo<=intMax Then
Response.Execute comeFrom & iGo
'進(jìn)行簡(jiǎn)單的數(shù)據(jù)處理
content = GetBody(comeFrom & iGo )
content = Replace(content,chr(34),""")
If instr(content,myErr1) OR instr(content,myErr2) Then
'跳過(guò)錯(cuò)誤信息
Else
'寫入數(shù)據(jù)庫(kù)
rs.AddNew
rs("UID")=iGo
'********************************
rs("UContent")=Replace(content,""",chr(34))
'*********************************
rs.update
myCount=myCount+1
Response.Write iGo & "<BR>"
Response.Flush
End If
Else
Response.write "<font color=red>成功抓取"&myCount&"條記錄,"
time2=timer()
Response.write "耗時(shí):" & Int(FormatNumber((time2-time1)*1000000,3)) & " 秒</font><BR>"
Response.Flush
Exit Function
End If
Next
Response.write "<font color=red>成功抓取"&myCount&"條記錄,"
time2=timer()
Response.write "耗時(shí):" & CInt(FormatNumber((time2-time1),3)) & " 秒</font><BR>"
Response.Flush
'遞歸
GetPart(iGo+1)
End Function%>
<html>
<head>
<title>AUTOGET</title>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
</head>
<body bgcolor="#FFFFFF" style="font-family:Arial;font-size:12px">
<%
'=================================================
'FileName: Getit.Asp
'Intro : Auto Get Data From Remote WebSite
'Author: Babyt(阿泰)
'URL: http://blog.csdn.net/babyt
'createAt: 2002-02 Lastupdate:2004-09
'DB Table : data
'Table Field:
' UID -> Long -> Keep ID Of the pages
' UContent -> Text -> Keep Content Of the Pages(HTML)
'=================================================
Server.ScriptTimeout=5000
'on error resume next
Set conn = Server.createObject("ADODB.Connection")
conn.open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("getit.mdb")
Set rs = Server.createObject("ADODB.Recordset")
sql="select * from data"
rs.open sql,conn,1,3
Dim comeFrom,myErr,myCount
'========================================================
comeFrom="http://www.xxx.com/U.asp?ID="
myErr1="該資料不存在"
myErr2="該資料已隱藏"
'========================================================
'***************************************************************
' 只需要更改這里 i 的始點(diǎn)intMin和終點(diǎn)intMax,設(shè)定步長(zhǎng)intStep
' 每次區(qū)間設(shè)置成5萬(wàn)左右。估計(jì)要兩個(gè)多小時(shí)。期間不需要人工干預(yù)
'****************************************************************
intMin=0
intMax=10000
'設(shè)定步長(zhǎng)
intStep=100
'==========================================================
'以下代碼不要更改
'==========================================================
Call GetPart (intMin)
Response.write "已經(jīng)轉(zhuǎn)換完成" & intMin & "~~" & intMax & "之間的數(shù)據(jù)"
rs.close
Set rs=Nothing
conn.Close
set conn=nothing
%>
</body>
</html>
<%
'使用XMLHTTP抓取地址并進(jìn)次內(nèi)容處理
Function GetBody(Url)
Dim objXML
On Error Resume Next
Set objXML = createObject("Microsoft.XMLHTTP")
With objXML
.Open "Get", Url, False, "", ""
.Send
GetBody = .ResponseBody
End With
GetBody=BytesToBstr(GetBody,"GB2312")
Set objXML = Nothing
End Function
'使用Adodb.Stream處理二進(jìn)制數(shù)據(jù)
Function BytesToBstr(strBody,CodeBase)
dim objStream
set objStream = Server.createObject("Adodb.Stream")
objStream.Type = 1
objStream.Mode =3
objStream.Open
objStream.Write strBody
objStream.Position = 0
objStream.Type = 2
objStream.Charset = CodeBase
BytesToBstr = objStream.ReadText
objStream.Close
set objStream = nothing
End Function
'主函數(shù)
Function GetPart(iStart)
Dim iGo
time1=timer()
myCount=0
For iGo=iStart To iStart+intStep
If iGo<=intMax Then
Response.Execute comeFrom & iGo
'進(jìn)行簡(jiǎn)單的數(shù)據(jù)處理
content = GetBody(comeFrom & iGo )
content = Replace(content,chr(34),""")
If instr(content,myErr1) OR instr(content,myErr2) Then
'跳過(guò)錯(cuò)誤信息
Else
'寫入數(shù)據(jù)庫(kù)
rs.AddNew
rs("UID")=iGo
'********************************
rs("UContent")=Replace(content,""",chr(34))
'*********************************
rs.update
myCount=myCount+1
Response.Write iGo & "<BR>"
Response.Flush
End If
Else
Response.write "<font color=red>成功抓取"&myCount&"條記錄,"
time2=timer()
Response.write "耗時(shí):" & Int(FormatNumber((time2-time1)*1000000,3)) & " 秒</font><BR>"
Response.Flush
Exit Function
End If
Next
Response.write "<font color=red>成功抓取"&myCount&"條記錄,"
time2=timer()
Response.write "耗時(shí):" & CInt(FormatNumber((time2-time1),3)) & " 秒</font><BR>"
Response.Flush
'遞歸
GetPart(iGo+1)
End Function%>
相關(guān)文章
ASP采集入庫(kù)生成本地文件的幾個(gè)函數(shù)
ASP采集入庫(kù)生成本地文件的幾個(gè)函數(shù)...2006-06-06XMLHttp ASP遠(yuǎn)程獲取網(wǎng)頁(yè)內(nèi)容代碼
asp下利用xmlhttp獲取網(wǎng)頁(yè)內(nèi)容的方法這個(gè)方法一般比較通用的,然后通過(guò)字符截取網(wǎng)頁(yè)的內(nèi)容。2008-11-11發(fā)一個(gè)采集(小偷)用的類,ASP+緩存實(shí)現(xiàn)
發(fā)一個(gè)采集(小偷)用的類,ASP+緩存實(shí)現(xiàn)...2007-02-02用XML+FSO+JS實(shí)現(xiàn)服務(wù)器端文件的選擇效果
用XML+FSO+JS實(shí)現(xiàn)服務(wù)器端文件的選擇效果實(shí)現(xiàn)代碼,需要的朋友可以參考下2006-06-06圖片自動(dòng)保存到本地并利用aspjpeg為圖片加水印
圖片自動(dòng)保存到本地并利用aspjpeg為圖片加水印...2006-07-07網(wǎng)站生成靜態(tài)頁(yè)面,及網(wǎng)站數(shù)據(jù)采集的攻、防原理和策略
網(wǎng)站生成靜態(tài)頁(yè)面,及網(wǎng)站數(shù)據(jù)采集的攻、防原理和策略...2006-09-09光碟工具 Alcohol 120% v1.9.6.4719 下載(附序列號(hào)注冊(cè)碼)
光碟工具 Alcohol 120% v1.9.6.4719 下載(附序列號(hào)注冊(cè)碼)...2007-02-02