An Introduction to Python's Network Request Modules urllib and requests

Updated: 2022-10-11 09:20:03  Author: Python熱愛者

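The first step of a web crawler is to fetch a page's HTML from its URL. In Python 3, both urllib and requests can be used to retrieve web page data. This article introduces how to use these two network request modules.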

Introduction to urllib

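urllib.request provides a urlopen function for fetching pages. It supports multiple protocols, basic authentication, cookies, proxies, and other features.
In Python 2 there were two separate modules, urllib and urllib2.
urllib2 could accept a Request object, while urllib could only accept a URL.
urllib provided a urlencode function for encoding the parameters of a GET request; urllib2 had no equivalent.
In Python 3 the two were merged into a single urllib package: urlopen and Request live in urllib.request, urlencode in urllib.parse, and the exceptions in urllib.error.
urllib raises URLError and HTTPError to report failures: URLError when the request cannot be completed (for example, the host is unreachable), and HTTPError when the server responds with an error status code.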
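
As a minimal sketch of how these two exceptions are usually handled, the helper below wraps urlopen in a try/except. The function name fetch and its timeout default are illustrative choices, not part of urllib.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Return the response body as bytes, or None on URLError/HTTPError.

    fetch and its timeout default are illustrative names, not part of urllib.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server replied, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print('HTTP error:', err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, unknown scheme...
        print('URL error:', err.reason)
    return None
```

Calling fetch with an unsupported scheme such as 'foo://bar' takes the URLError branch without touching the network, which makes it a convenient way to try the error path.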

Introduction to Requests

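Requests is a simple, easy-to-use HTTP library written in Python. It lets us complete an HTTP request with a few simple arguments, without having to assemble the request by hand as with urllib. It also automatically decodes responses to Unicode and offers rich error handling. Its features include: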

International Domains and URLs
Keep-Alive & Connection Pooling
Sessions with Cookie Persistence
Browser-style SSL Verification
Basic/Digest Authentication
Elegant Key/Value Cookies
Automatic Decompression
Unicode Response Bodies
Multipart File Uploads
Connection Timeouts
.netrc support
Python 2.6—3.4
Thread-safe

Below are some code examples. The environment used in this article is Python 3.6.

Requesting a single page without parameters

import urllib.request
import requests
# Fetch the page with urllib
response = urllib.request.urlopen('http://www.baidu.com')
# read() returns the raw bytes sent by the server; decode() converts them to str (UTF-8 by default)
print(response.read().decode())
# Fetch the page with requests
# resp.text is the response body already decoded to str
resp = requests.get('http://www.baidu.com')
print(resp)
print(resp.text)
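
The difference between the raw bytes from read() and the decoded text matters when a server does not use UTF-8. The sketch below is one possible way to pick the encoding from the Content-Type header; decode_body and its fallback default are hypothetical names, and a plain email.message.Message stands in for response.headers so that the example runs without a network call.

```python
from email.message import Message


def decode_body(raw, charset=None, fallback='utf-8'):
    """Decode raw response bytes to str.

    charset is whatever the server declared (may be None); fallback is an
    assumed default for servers that declare nothing.
    """
    return raw.decode(charset or fallback, errors='replace')


# Stand-in for response.headers, so the sketch runs without a network call.
headers = Message()
headers['Content-Type'] = 'text/html; charset=GBK'

raw = '中文'.encode('gbk')                  # bytes as a GBK server would send them
charset = headers.get_content_charset()    # 'gbk'
print(decode_body(raw, charset))           # 中文
```

With a real urllib response the same call would look like decode_body(response.read(), response.headers.get_content_charset()), since response.headers is also a Message subclass.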

HTTP works on a request/response model. urllib.request provides a Request object to represent the request, so the code above can also be written like this:

req = urllib.request.Request('http://www.baidu.com')
with urllib.request.urlopen(req) as response:
    print(response.read())

Header information can be added to a Request object

req = urllib.request.Request('http://www.baidu.com')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with urllib.request.urlopen(req) as response:
    print(response.read())

Alternatively, pass the headers directly to the Request constructor.
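
A minimal sketch of the constructor form, using the headers keyword argument of Request. The header values here are made up for illustration, and nothing is sent over the network.

```python
import urllib.request

# Hypothetical header values, chosen only for illustration.
headers = {
    'User-Agent': 'Mozilla/5.0 (compatible; example-client)',
    'Accept-Language': 'en',
}
req = urllib.request.Request('http://www.baidu.com', headers=headers)

# Request normalizes header names with str.capitalize(), so look them up
# as 'User-agent' rather than 'User-Agent'.
print(req.get_header('User-agent'))
print(req.header_items())
```

The resulting req can be passed to urllib.request.urlopen exactly as in the add_header example.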

GET requests with parameters

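A request with parameters is essentially the same as the examples above: build the URL string with its parameters first, then send the request.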

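This example uses Tencent's stock API. Different stock codes and dates can be passed in to query the price and trading information of the corresponding stock at the corresponding time.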

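import urllib.request
import requests
# Access an endpoint that takes a parameter (here, the stock code)
tencent_api = "http://qt.gtimg.cn/q=sh601939"
response = urllib.request.urlopen(tencent_api)
# read() returns the raw bytes sent by the server; call decode() on them to get text
print(response.read())
resp = requests.get(tencent_api)
print(resp)
print(resp.text)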
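
When the parameters go in a conventional query string, urllib.parse.urlencode can build it instead of concatenating by hand. The sketch below assumes a hypothetical endpoint and parameter names; it only constructs the URL and does not send a request.

```python
import urllib.parse

# Hypothetical endpoint and parameters, used only to show the encoding.
base_url = 'http://www.example.com/search'
params = {'q': 'python urllib', 'page': 2}

query = urllib.parse.urlencode(params)   # 'q=python+urllib&page=2'
full_url = base_url + '?' + query
print(full_url)
```

requests can take the same dictionary through the params argument of requests.get, in which case it performs the encoding itself.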

Sending a POST request

urllib has no separate functions for GET and POST requests. It decides which to send based on whether a data argument is passed to the Request object.

import urllib.parse
import urllib.request
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.parse.urlencode(values)
data = data.encode('ascii') # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
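
The GET/POST rule described in this section can be checked without sending anything, because a Request object reports the method it would use through get_method(). This is a small sketch that reuses the placeholder URL from the example above.

```python
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
data = urllib.parse.urlencode({'name': 'Michael Foord'}).encode('ascii')

get_req = urllib.request.Request(url)         # no data
post_req = urllib.request.Request(url, data)  # data supplied

# Nothing is sent here; get_method() only reports what urlopen would use.
print(get_req.get_method())    # GET
print(post_req.get_method())   # POST
```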
