
A Worked Example of Scraping Index Data with a Python Crawler

Updated: 2020-12-01 08:36:31  Author: 小妮淺淺

This article collects a worked example of scraping index data with a Python crawler; readers who are interested can follow along.

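Some data cannot be viewed directly and has to be obtained by scraping. The word "index" can sound complicated, like something you only hear about in the stock market, where analyzing rises and falls is a tricky problem. Even so, index data is very useful for data analysis, so today we walk through one way to scrape index data with a Python crawler.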

I happened to need this crawler over the past few days and found that the requests used by Baidu Index had changed a little, so I updated the code:

import sys
import time

import requests

# Thumbnail endpoint; defined in the original script but not used below.
word_url = 'http://index.baidu.com/api/SearchApi/thumbnail?area=0&word={}'
# Cookie string for index.baidu.com; left empty here, fill in your own.
COOKIES = ''
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'no-cache',
    'Cookie': COOKIES,
    'DNT': '1',
    'Host': 'index.baidu.com',
    'Pragma': 'no-cache',
    'Proxy-Connection': 'keep-alive',
    'Referer': 'http://index.baidu.com/v2/main/index.html',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}


def decrypt(t, e):
    """Decode the string e with the key t.

    The first half of t lists the cipher characters; the second half lists,
    position by position, the plain characters they stand for.
    """
    n = list(t)
    a = {}
    result = []
    ln = len(n) // 2
    start = n[ln:]   # plain characters
    end = n[:ln]     # cipher characters
    for j, k in zip(start, end):
        a[k] = j
    for j in e:
        result.append(a.get(j))
    return ''.join(result)


def get_ptbk(uniqid):
    """Fetch the decryption key (ptbk) that belongs to a uniqid."""
    url = 'http://index.baidu.com/Interface/ptbk?uniqid={}'
    resp = requests.get(url.format(uniqid), headers=headers)
    if resp.status_code != 200:
        print('Failed to fetch ptbk for the given uniqid')
        sys.exit(1)
    return resp.json().get('data')


def get_index_data(keyword, start='2011-01-03', end='2019-08-05'):
    # Turn the Python repr of the word list into JSON-style text (double quotes).
    keyword = str(keyword).replace("'", '"')
    url = f'http://index.baidu.com/api/SearchApi/index?area=0&word={keyword}&startDate={start}&endDate={end}'
    resp = requests.get(url, headers=headers)
    if resp.status_code != 200:
        print('Failed to fetch index data')
        sys.exit(1)
    content = resp.json()
    data = content.get('data')
    user_indexes = data.get('userIndexes')[0]
    uniqid = data.get('uniqid')
    ptbk = get_ptbk(uniqid)
    # Retry until a non-empty key comes back.
    while ptbk is None or ptbk == '':
        time.sleep(1)  # short pause between retries
        ptbk = get_ptbk(uniqid)
    all_data = user_indexes.get('all').get('data')
    result = decrypt(ptbk, all_data)
    result = result.split(',')
    print(result)


if __name__ == '__main__':
    words = [[{"name": "酷安", "wordType": 1}]]
    get_index_data(words)
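
As a side illustration of the decoding step, here is a minimal offline sketch of the substitution that `decrypt` performs. The key and cipher text are invented for this example and are not real Baidu Index data; the function is a compact rewrite of the same mapping (first half of the key holds the cipher characters, second half the plain characters), not a change to the script above.

```python
def decrypt(key: str, cipher: str) -> str:
    """Map each cipher character back to its plain character.

    The first half of `key` lists the cipher characters; the second half
    lists, position by position, the plain characters they stand for.
    """
    half = len(key) // 2
    table = dict(zip(key[:half], key[half:]))
    return ''.join(table[ch] for ch in cipher)


# Hypothetical key, invented for illustration:
# 'a' -> '1', 'b' -> '2', 'c' -> '3', 'x' -> ','
sample_key = 'abcx' + '123,'
sample_cipher = 'abxcaxb'

decoded = decrypt(sample_key, sample_cipher)
print(decoded)             # 12,31,2
print(decoded.split(','))  # ['12', '31', '2']
```

Splitting the decoded string on commas gives one value per data point, which is what the final `split(',')` in `get_index_data` does.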

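The script builds the `word` parameter with `str(keyword).replace("'", '"')`, which would produce invalid JSON if a keyword itself contained an apostrophe. A possible alternative, sketched here under assumptions, is to serialize with `json.dumps` and percent-encode the query with `urlencode`. The helper name `build_index_url` is hypothetical, and whether the endpoint treats the encoded form exactly like the hand-built URL is an assumption that has not been checked against the live service.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the script above.
BASE_URL = 'http://index.baidu.com/api/SearchApi/index'


def build_index_url(words, start, end, area=0):
    """Hypothetical helper: serialize `words` as JSON and encode the query string."""
    query = {
        'area': area,
        # ensure_ascii=False keeps characters such as 酷安 as-is before encoding.
        'word': json.dumps(words, ensure_ascii=False),
        'startDate': start,
        'endDate': end,
    }
    return BASE_URL + '?' + urlencode(query)


words = [[{"name": "酷安", "wordType": 1}]]
url = build_index_url(words, '2011-01-03', '2019-08-05')

# Round-trip check: decoding the query string gives back the same word list.
decoded_word = parse_qs(urlsplit(url).query)['word'][0]
print(json.loads(decoded_word) == words)  # True
```

For this particular word list, `json.dumps(words, ensure_ascii=False)` yields the same text as the `str(...).replace(...)` approach, so the request content is unchanged; only the way it is built differs.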
Output of the crawler script: