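Solving the Problem of Scraping WeChat Bill Information with Python 3
A friend recently wanted to export the bill records from WeChat. It turned out that WeChat's anti-scraping measures are quite strong, so I spent some time analyzing them.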
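I. Scraping by simulating HTTP requests the traditional way
The main URL to request is https://wx.tenpay.com/userroll/userrolllist, followed by a few query parameters (classify_type, count, exportkey and sort_type; see the code for details). The exportkey parameter expires. The two values userroll_encryption and userroll_pass_ticket have to be taken from the cookie and presumably identify the session that is allowed to fetch the data. Packet captures give no clue as to how they are produced, so they are probably generated inside the WeChat app itself. After logging in with the WeChat developer tools, visiting the URL directly sometimes returns data, but only for a short time. Once a "session timed out" response comes back, further requests from the web page are blocked and keep reporting a session timeout. My guess is that exportkey has different lifetime and request-count limits on the web and on mobile.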
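To make the shape of such a request concrete, here is a minimal sketch of how the URL and the Cookie header could be assembled with the standard library. The helper name build_request and the placeholder values are hypothetical; the parameter and cookie names are the ones used in this article, and nothing is sent over the network.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://wx.tenpay.com/userroll/userrolllist"


def build_request(exportkey, userroll_encryption, userroll_pass_ticket, count=20):
    """Build (but do not send) a first-page request.

    exportkey is expected in its raw form; urlencode percent-encodes it.
    """
    query = urlencode({
        "classify_type": 0,
        "count": count,
        "exportkey": exportkey,
        "sort_type": 1,
    })
    cookie = ("userroll_encryption=" + userroll_encryption
              + "; userroll_pass_ticket=" + userroll_pass_ticket)
    return urllib.request.Request(BASE_URL + "?" + query, headers={"Cookie": cookie})


# Placeholder values only; real ones come from a captured session.
req = build_request("PLACEHOLDER+KEY=", "ENC", "TICKET")
print(req.full_url)
print(req.get_header("Cookie"))
```

Letting urlencode do the escaping avoids hand-encoding characters such as "+" and "=" that appear in the key.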
I then looked into cracking the session, but concluded that it is not feasible: it would require getting past wx.login, which WeChat itself provides, and how hard that would be needs no explanation.
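II. Obtaining the exportkey and the Cookie
Tools needed:
1. An Android phone or an iPhone
2. Fiddler (a packet-capture tool)
Anyone who has written scrapers knows Fiddler, so I won't go through the steps in detail. Once the proxy is configured and Fiddler is running, capture the exportkey from the request URL together with the corresponding Cookie; both are used for the scraping that follows.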
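Once a request has been captured, the two pieces of information can be pulled out of the copied URL and Cookie header programmatically. The sketch below assumes the URL and the header are pasted in as plain strings; the function names extract_exportkey and parse_cookie_header are hypothetical helpers, not part of any tool.

```python
from urllib.parse import urlsplit, parse_qs


def extract_exportkey(captured_url):
    """Return the decoded exportkey from a captured request URL, or None."""
    query = parse_qs(urlsplit(captured_url).query)
    values = query.get("exportkey")
    return values[0] if values else None


def parse_cookie_header(cookie_header):
    """Split a raw Cookie header into a {name: value} dict.

    Only the first "=" separates name and value, so base64-style values
    that end in "=" or contain "+" and "/" are kept intact.
    """
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


# Placeholder capture, not real session data.
url = "https://wx.tenpay.com/userroll/userrolllist?count=20&exportkey=a%2Bb%3D"
print(extract_exportkey(url))
print(parse_cookie_header("userroll_encryption=AB+c/d==; userroll_pass_ticket=XYZ"))
```

Note that parse_qs returns the key percent-decoded, whereas the script in this article pastes the still-encoded key straight into the URL; use one form consistently.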
III. The code
The code is not particularly well written; corrections are welcome.
# coding:utf-8
import datetime
import io
import json
import ssl
import sys
import time
import urllib.error
import urllib.request

from DBController import DBController  # the author's own database helper (not shown in this article)

# Set the console output encoding
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
# Work around untrusted SSL certificates on HTTPS requests
# (note: this disables certificate verification for the whole process)
ssl._create_default_https_context = ssl._create_unverified_context

BASE_URL = "https://wx.tenpay.com/userroll/userrolllist"


class MainCode:
    def __init__(self, url=""):
        self.url = url
        self.dbController = DBController()  # database access
        self.userroll_encryption = "uoxQXsCenowxj0G0ppRKBg8iHRPZwZKaUZB0ka1Y5apUuQnKkZTsA/2RMhBPGyMdiHS8QXk8y2JeLgqTPqZPU9fkrCUp+TIQPkHH/uExAwKeBFLute0ztdHaC6GJUJ2+/R8NGWGe16hSKc6L1+LvAw=="
        self.userroll_pass_ticket = "V7oum4glDbdaAwibC8mcuTizGIKmC9A/Y/V12qASuDALdRMveHcRHv1QXamFk27Z"
        self.last_item = {}  # paging cursor: fields of the last record seen
        self.num = 0         # running count of processed records

    # Fetch the URL and return the parsed JSON object ({} on failure)
    def get_html(self, url, maxTryNum=5):
        header = {
            "Accept": 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            # Accept-Encoding is deliberately not sent: urllib does not decompress
            # gzip/br responses, so the body could not be decoded as UTF-8 below.
            "Accept-Language": 'zh-CN,zh;q=0.8',
            "Cache-Control": 'max-age=0',
            "Connection": "keep-alive",
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 10_2 like Mac OS X) AppleWebKit/602.3.12 (KHTML, like Gecko) Mobile/14C92 Safari/601.1 wechatdevtools/1.02.1810240 MicroMessenger/6.5.7 Language/zh_CN webview/15415760070117398 webdebugger port/32594",
            "Cookie": "userroll_encryption=" + self.userroll_encryption + "; userroll_pass_ticket=" + self.userroll_pass_ticket,
            "Host": "wx.tenpay.com",
            "Upgrade-Insecure-Requests": "1",
        }
        for tryNum in range(maxTryNum):
            try:
                req = urllib.request.Request(url=url, headers=header)
                # Request the URL
                result = urllib.request.urlopen(req, timeout=5).read()
                return json.loads(result.decode('utf-8'))
            except OSError as e:
                # OSError covers HTTPError, URLError and socket timeouts
                if tryNum < (maxTryNum - 1):
                    print("Retrying request " + str(tryNum + 1))
                    time.sleep(5)
                else:
                    print('Internet Connect Error!', "Error URL:" + url)
        print("--------------------------")
        return {}

    # Save one record to the database, skipping trans_ids that already exist.
    # The SQL is built by string formatting as in the original script, which is
    # why quotes are stripped from the title in get_data.
    def save_info_to_db(self, item):
        select_sql = "SELECT count(*) as num FROM wx_order2 where trans_id = '%s'" % (item["trans_id"])
        results = self.dbController.ExecuteSQL_Select(select_sql)
        if int(results[0][0]) == 0:
            sql = "INSERT INTO wx_order2 (bill_id, bill_type, classify_type, fee, fee_type, out_trade_no, pay_bank_name, payer_remark, remark, order_time, title, total_refund_fee, trans_id, fee_attr) VALUES ('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s')" % (
                str(item['bill_id']), str(item['bill_type']), str(item['classify_type']), str(item['fee']),
                str(item['fee_type']), str(item['out_trade_no']), str(item['pay_bank_name']),
                str(item['payer_remark']), str(item['remark']), str(item['order_time']), str(item['title']),
                str(item['total_refund_fee']), str(item['trans_id']), str(item['fee_attr'])
            )
            try:
                self.dbController.ExecuteSQL_Insert(sql)
            except Exception as e:
                print("save_info_to_db:", e)

    # Extract the needed fields from the response and store them;
    # returns the number of records on this page
    def get_data(self, url):
        res_obj = self.get_html(url)
        this_page_num = 0
        # ret_code == 0 means the data was fetched successfully
        # (.get avoids a KeyError when get_html returned {} after a failure)
        if res_obj.get('ret_code') == 0:
            record_list = res_obj['record']
            self.last_bill_id = res_obj.get('last_bill_id')
            self.last_bill_type = res_obj.get('last_bill_type')
            self.last_create_time = res_obj.get('last_create_time')
            self.last_trans_id = res_obj.get('last_trans_id')
            this_page_num = len(record_list)
            for order in record_list:
                bill_type = order['bill_type']
                # Bill amount: scaled by 0.01 and rounded to two decimals
                fee = round(order['fee'] * 0.01, 2)
                # Convert the timestamp to a datetime
                order_time = datetime.datetime.fromtimestamp(order['timestamp'])
                # Bill title, with commas, periods and single quotes removed
                title = order['title'].replace(',', '').replace('.', '').replace("'", '')

                # Human-readable bill type (not stored; handy for logging)
                if bill_type == 1:
                    pay_type = "payment"
                elif bill_type == 2:
                    pay_type = "top-up"
                elif bill_type == 4:
                    pay_type = "transfer"
                elif bill_type == 6:
                    pay_type = "red packet"
                else:
                    pay_type = str(bill_type)

                fee_attr = order['fee_attr']
                if fee_attr == "positive":
                    fee_attr = "income"
                elif fee_attr == "negtive":  # spelling as in the original script; it must match the server's value
                    fee_attr = "expense"
                elif fee_attr == "neutral":
                    fee_attr = "withdrawal"

                item = {
                    'bill_id': order['bill_id'],
                    'bill_type': bill_type,
                    'classify_type': order['classify_type'],
                    'fee': fee,
                    'fee_type': order['fee_type'],            # currency type
                    'out_trade_no': order['out_trade_no'],    # bill number
                    'pay_bank_name': order['pay_bank_name'],  # paying bank
                    'payer_remark': order['payer_remark'],    # payer's note
                    'remark': order['remark'],                # bill note
                    'order_time': order_time,
                    'title': title,
                    'total_refund_fee': "0",
                    'trans_id': order['trans_id'],
                    'fee_attr': fee_attr,
                }

                # Remember the last record; it is the cursor for the next page
                if item['bill_id'] != '':
                    self.last_item['last_bill_id'] = item['bill_id']
                    self.last_item['last_bill_type'] = bill_type
                    self.last_item['last_create_time'] = order['timestamp']
                    self.last_item['last_trans_id'] = item['trans_id']
                try:
                    print(str(self.num), self.last_item, end='\n')
                    self.num += 1
                    time.sleep(0.2)
                    self.save_info_to_db(item)
                except Exception as e:
                    print(e, end='\n')
        else:
            # On failure, print the response to show the reason
            print(res_obj)
        return this_page_num


if __name__ == "__main__":
    # Instantiate
    maincode = MainCode()
    # Set the Cookie values
    maincode.userroll_encryption = "6Ow68aKrAz70mEczqeevA2gOXbr9H2a7+2ite6uuyWFdB6j1+SLhlaCNpYA6RjmaOI7IfCi9PXjQsrZPFIs1SMn38Uxr04GJsxMuSO/9wG+eBFLute0ztdHaC6GJUJ2+vmo+JIw351su8RiFxSagwA=="
    maincode.userroll_pass_ticket = "i0Co+55KSEjmFjfFZqMG14hasW4qtKFtbj0FiErcSzHY0afkFqHGib3YfsAZWcaG"
    # To resume from a page other than the first, preset the cursor:
    # maincode.last_item['last_bill_id'] = "2ce3d65b20a10700b2048d68"
    # maincode.last_item['last_bill_type'] = "4"
    # maincode.last_item['last_create_time'] = "1540809516"
    # maincode.last_item['last_trans_id'] = "1000050201201810290100731805325"

    # Number of records returned per request
    count = "20"
    # exportkey has to be captured with Fiddler and is only valid for a limited time
    exportkey = "A%2BsIJaTGZksgZWPLtSKiyos%3D"
    # URL of the first page
    first_url = BASE_URL + "?classify_type=0&count=" + count + "&exportkey=" + exportkey + "&sort_type=1"

    for page in range(0, 10):
        if page == 0:
            # First page
            url = first_url
        else:
            # From the second page on, append fields of the previous page's
            # last item so the server returns the next page
            url = (first_url
                   + "&last_bill_id=" + str(maincode.last_item['last_bill_id'])
                   + "&last_bill_type=" + str(maincode.last_item['last_bill_type'])
                   + "&last_create_time=" + str(maincode.last_item['last_create_time'])
                   + "&last_trans_id=" + str(maincode.last_item['last_trans_id'])
                   + "&start_time=" + str(maincode.last_item['last_create_time']))
            print(url)
        # Number of records returned for the current page
        this_page_num = maincode.get_data(url)
        # Fewer records than requested means this was the last page
        if this_page_num < int(count):
            break
        time.sleep(0.5)
    print(maincode.last_item)
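The paging in the script works like a cursor: each request after the first carries the identifying fields of the last record from the previous page. A minimal sketch of that URL construction, using urlencode instead of string concatenation, might look as follows. The function name build_page_url is hypothetical; the parameter names are the ones the script sends, and the exportkey is assumed to be in raw (decoded) form.

```python
from urllib.parse import urlencode, unquote

BASE_URL = "https://wx.tenpay.com/userroll/userrolllist"


def build_page_url(exportkey, count, last_item=None):
    """Build the list URL; pass last_item (the cursor dict) for pages after the first."""
    params = {
        "classify_type": 0,
        "count": count,
        "exportkey": exportkey,  # raw key; urlencode escapes it
        "sort_type": 1,
    }
    if last_item:
        params["last_bill_id"] = last_item["last_bill_id"]
        params["last_bill_type"] = last_item["last_bill_type"]
        params["last_create_time"] = last_item["last_create_time"]
        params["last_trans_id"] = last_item["last_trans_id"]
        # The script also sends the last record's timestamp as start_time
        params["start_time"] = last_item["last_create_time"]
    return BASE_URL + "?" + urlencode(params)


# A key copied from a captured URL is percent-encoded; decode it once first.
raw_key = unquote("a%2Bb%3D")
cursor = {"last_bill_id": "abc", "last_bill_type": 4,
          "last_create_time": 1540809516, "last_trans_id": "100"}
print(build_page_url(raw_key, 20))
print(build_page_url(raw_key, 20, cursor))
```

Building the query from a dict keeps every value separately escaped, so a misplaced parenthesis or an unescaped character cannot silently merge two parameters.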
Since I was only scraping this for a friend, getting it to work was enough. I'll keep optimizing the code later if the need arises.
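One possible optimization concerns the database step. The script builds its SQL by string formatting, which is why it has to strip quotes and commas from titles. The sketch below shows the same "insert only if trans_id is new" idea with parameterized queries. It uses SQLite from the standard library purely as a stand-in, because the DBController helper and the real database are not shown in this article; the reduced column set and the function name save_item are likewise assumptions.

```python
import sqlite3

# Reduced, hypothetical schema; trans_id as primary key enforces uniqueness.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wx_order2 (
    trans_id   TEXT PRIMARY KEY,
    bill_id    TEXT,
    title      TEXT,
    fee        REAL,
    order_time TEXT
)
"""


def save_item(conn, item):
    """Insert one record; return True if a new row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO wx_order2 "
        "(trans_id, bill_id, title, fee, order_time) VALUES (?, ?, ?, ?, ?)",
        (item["trans_id"], item["bill_id"], item["title"],
         item["fee"], str(item["order_time"])),
    )
    conn.commit()
    # rowcount is 0 when the row was ignored as a duplicate
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
item = {"trans_id": "t1", "bill_id": "b1", "title": "O'Brien, Ltd.",
        "fee": 1.23, "order_time": "2018-10-29 10:00:00"}
print(save_item(conn, item))  # first insert
print(save_item(conn, item))  # same trans_id again
```

With placeholders the driver handles quoting, so titles can be stored unmodified, and the uniqueness constraint replaces the separate SELECT count(*) check.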