Example code for running multiple threads in Python and getting their return values
1. Multithreading with return values
1.1 Implementation
# -*- coding:utf-8 -*-
"""
Author: wyt
Date: 2022-04-21
"""
import threading
import requests
import time

urls = [
    f'https://www.cnblogs.com/#p{page}'  # URLs to crawl
    for page in range(1, 10)             # pages 1-9 (range excludes 10)
]


def craw(url):
    r = requests.get(url)
    num = len(r.text)  # number of characters in the cnblogs page
    return num         # return the character count


def single():  # single-threaded version
    res = []
    for i in urls:
        res.append(craw(i))
    return res


class MyThread(threading.Thread):  # subclass threading.Thread and add a way to read the return value
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url      # URL passed in at construction
        self.result = None  # stays None if craw() raises

    def run(self):
        # Overridden run(): call craw() with the stored URL and
        # keep its return value on the instance.
        self.result = craw(self.url)

    def get_result(self):  # new method: return the result stored by run()
        return self.result


def multi_thread():
    print("start")
    threads = []  # list of threads
    for url in urls:
        threads.append(MyThread(url))  # one MyThread per URL
    for thread in threads:  # start every thread
        thread.start()
    for thread in threads:  # wait for every thread to finish
        thread.join()
    results = []  # renamed from `list` to avoid shadowing the built-in
    for thread in threads:
        results.append(thread.get_result())  # collect each thread's result
    print("end")
    return results  # results in the same order as urls


if __name__ == '__main__':
    start_time = time.time()
    result_multi = multi_thread()
    print(result_multi)  # print the returned list
    # result_sig = single()
    # print(result_sig)
    end_time = time.time()
    print('Elapsed:', end_time - start_time)
1.2 Result
Single-threaded:
Multithreaded:
The speedup is clearly noticeable.
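To see where the speedup comes from without depending on a live website, the comparison can be reproduced with simulated requests. The sketch below is not the article's benchmark: fake_fetch, run_sequential, and run_threaded are hypothetical names, and time.sleep stands in for network latency.

```python
import threading
import time


def fake_fetch(delay):
    """Hypothetical stand-in for a network request: sleeps for `delay` seconds."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Call fake_fetch one after another and return the elapsed time in seconds."""
    start = time.perf_counter()
    for d in delays:
        fake_fetch(d)
    return time.perf_counter() - start


def run_threaded(delays):
    """Run one thread per delay and return the elapsed time in seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_fetch, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.1] * 5
    print("sequential:", run_sequential(delays))
    print("threaded:  ", run_threaded(delays))
```

With five 0.1-second delays, the sequential run needs roughly 0.5 s because the waits add up, while the threaded run finishes in a little over 0.1 s because the threads wait at the same time. The same reasoning applies to I/O-bound work such as HTTP requests.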
2. How it is built
2.1 An ordinary crawler function
import threading
import requests
import time

urls = [
    f'https://www.cnblogs.com/#p{page}'  # URLs to crawl
    for page in range(1, 10)             # pages 1-9 (range excludes 10)
]


def craw(url):
    r = requests.get(url)
    num = len(r.text)  # number of characters in the cnblogs page
    print(num)         # printed only; nothing is returned


def single():  # single-threaded version
    res = []
    for i in urls:
        res.append(craw(i))  # craw() returns nothing here, so res holds only None
    return res


def multi_thread():
    print("start")
    threads = []  # list of threads
    for url in urls:
        threads.append(
            threading.Thread(target=craw, args=(url,))  # note: args=(url,) must be a tuple
        )
    for thread in threads:  # start every thread
        thread.start()
    for thread in threads:  # wait for every thread to finish
        thread.join()
    print("end")


if __name__ == '__main__':
    start_time = time.time()
    result_multi = multi_thread()  # always None: multi_thread() has no return value
    # result_sig = single()
    # print(result_sig)
    end_time = time.time()
    print('Elapsed:', end_time - start_time)
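Output (all nine counts are identical, most likely because '#p{page}' is a URL fragment, which is not sent to the server, so every request fetches the same page):
start
69915
69915
69915
69915
69915
69915
69915
69915
69915
end
Elapsed: 0.316709041595459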
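The run above can print each count but cannot hand it back to the caller: threading.Thread calls its target and discards whatever the target returns, and join() always returns None. A minimal sketch of that behavior, plus one common workaround in which each thread writes into its own slot of a shared list. The names square, run_and_join, and run_with_shared_list are hypothetical and used only for illustration.

```python
import threading


def square(n):
    """Hypothetical worker whose return value we would like to keep."""
    return n * n


def run_and_join(n):
    """Run square() in a plain Thread and return what join() gives back."""
    t = threading.Thread(target=square, args=(n,))
    t.start()
    return t.join()  # join() always returns None; square's result is lost


def run_with_shared_list(numbers):
    """Workaround: each thread stores its result at its own index."""
    results = [None] * len(numbers)

    def worker(index, n):
        results[index] = square(n)  # each thread writes to a distinct slot

    threads = [
        threading.Thread(target=worker, args=(i, n))
        for i, n in enumerate(numbers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_and_join(4))
    print(run_with_shared_list([1, 2, 3]))
```

The shared-list approach works, but it ties the worker to an outer variable. Storing the result on the thread object itself keeps each result next to the thread that produced it.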
2.2 A simple example of returning a value from a thread
import time
from threading import Thread


def foo(number):
    time.sleep(1)  # simulate slow work
    return number


class MyThread(Thread):
    def __init__(self, number):
        Thread.__init__(self)
        self.number = number

    def run(self):
        self.result = foo(self.number)  # store the return value on the instance

    def get_result(self):
        return self.result


if __name__ == '__main__':
    thd1 = MyThread(3)
    thd2 = MyThread(5)
    thd1.start()
    thd2.start()
    thd1.join()  # wait for both threads before reading the results
    thd2.join()
    print(thd1.get_result())
    print(thd2.get_result())
Output:
3
5
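The MyThread class above is hard-wired to foo. The same idea can be generalized so that one subclass wraps any callable. The following is a sketch under assumptions, not part of the original article: ResultThread is a hypothetical name, and re-raising the worker's exception from get_result() is a design choice added here.

```python
import threading


class ResultThread(threading.Thread):
    """Hypothetical helper: a Thread that remembers its target's return value."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        # Use our own attribute names rather than Thread's private ones.
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self.result = None
        self.error = None

    def run(self):
        try:
            self.result = self._fn(*self._fn_args, **self._fn_kwargs)
        except Exception as exc:  # keep the exception for the caller
            self.error = exc

    def get_result(self):
        """Wait for the thread (it must already be started), then return or re-raise."""
        self.join()
        if self.error is not None:
            raise self.error
        return self.result


if __name__ == "__main__":
    threads = [ResultThread(target=pow, args=(2, n)) for n in range(5)]
    for t in threads:
        t.start()
    print([t.get_result() for t in threads])
```

Because get_result() joins first, the caller cannot read a result before run() has finished, and a failure inside the worker surfaces in the calling thread instead of disappearing.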
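2.3 Key points
Thread entry point
threading.Thread(target=craw, args=(url,))  # note: args=(url,) must be a tuple
Getting the return value back from a thread
Subclass threading.Thread, override run() so that it stores the function's return value on the instance, and add a method that returns the stored value.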
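For comparison, the standard library also offers concurrent.futures.ThreadPoolExecutor, which collects return values without a custom Thread subclass. This is a different technique from the one this article uses, shown only as a sketch: slow_double and double_all are hypothetical names, and time.sleep stands in for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_double(n):
    """Hypothetical worker: pretend to wait on I/O, then return a value."""
    time.sleep(0.05)
    return n * 2


def double_all(numbers, max_workers=4):
    """Run slow_double on every number in a thread pool and return the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(slow_double, numbers))


if __name__ == "__main__":
    print(double_all([1, 2, 3]))
```

The pool also caps the number of threads running at once through max_workers, whereas the subclass approach starts one thread per task.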
3. Putting it into practice
I used this return-value multithreading technique to rewrite a subdomain-crawling script I published earlier. The original code is here: http://chabaoo.cn/article/254460.htm
import threading
import requests
from bs4 import BeautifulSoup
from static.plugs.headers import get_ua  # author's project-local module (not shown); its return value is used as the request headers

# Example query URL:
# https://cn.bing.com/search?q=site%3Abaidu.com&go=Search&qs=ds&first=20&FORM=PERE


def search_1(url):
    Subdomain = []
    html = requests.get(url, stream=True, headers=get_ua())
    soup = BeautifulSoup(html.content, 'html.parser')
    job_bt = soup.find_all('h2')
    for i in job_bt:
        if i.a is None:  # skip headings that contain no link
            continue
        link = i.a.get('href')
        # print(link)
        if link and link not in Subdomain:
            Subdomain.append(link)
    return Subdomain


class MyThread(threading.Thread):
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url
        self.result = []  # stays empty if search_1() raises

    def run(self):
        self.result = search_1(self.url)

    def get_result(self):
        return self.result


def Bing_multi_thread(site):
    print("start")
    threads = []
    for i in range(1, 30):  # result pages 1-29, 10 results per page
        url = ("https://cn.bing.com/search?q=site%3A" + site
               + "&go=Search&qs=ds&first=" + str((i - 1) * 10) + "&FORM=PERE")
        threads.append(MyThread(url))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    res_list = []
    for thread in threads:
        res_list.extend(thread.get_result())
    res_list = list(set(res_list))  # remove duplicates (order is not preserved)
    dict_res = dict(enumerate(res_list, start=1))  # number the results from 1
    print("end")
    return dict_res


if __name__ == '__main__':
    print(Bing_multi_thread("qq.com"))
Output:
{
1:'https://transmart.qq.com/index',
2:'https://wpa.qq.com/msgrd?v=3&uin=448388692&site=qq&menu=yes',
3:'https://en.exmail.qq.com/',
4:'https://jiazhang.qq.com/wap/com/v1/dist/unbind_login_qq.shtml?source=h5_wx',
5:'http://imgcache.qq.com/',
6:'https://new.qq.com/rain/a/20220109A040B600',
7:'http://cp.music.qq.com/index.html',
8:'http://s.syzs.qq.com/',
9:'https://new.qq.com/rain/a/20220321A0CF1X00',
10:'https://join.qq.com/about.html',
11:'https://live.qq.com/10016675',
12:'http://uni.mp.qq.com/',
13:'https://new.qq.com/omn/TWF20220/TWF2022042400147500.html',
14:'https://wj.qq.com/?from=exur#!',
15:'https://wj.qq.com/answer_group.html',
16:'https://view.inews.qq.com/a/20220330A00HTS00',
17:'https://browser.qq.com/mac/en/index.html',
18:'https://windows.weixin.qq.com/?lang=en_US',
19:'https://cc.v.qq.com/upload',
20:'https://xiaowei.weixin.qq.com/skill',
21:'http://wpa.qq.com/msgrd?v=3&uin=286771835&site=qq&menu=yes',
22:'http://huifu.qq.com/',
23:'https://uni.weixiao.qq.com/',
24:'http://join.qq.com/',
25:'https://cqtx.qq.com/',
26:'http://id.qq.com/',
27:'http://m.qq.com/',
28:'https://jq.qq.com/?_wv=1027&k=pevCjRtJ',
29:'https://v.qq.com/x/page/z0678c3ys6i.html',
30:'https://live.qq.com/10018921',
31:'https://m.campus.qq.com/manage/manage.html',
32:'https://101.qq.com/',
33:'https://new.qq.com/rain/a/20211012A0A3L000',
34:'https://live.qq.com/10021593',
35:'https://pc.weixin.qq.com/?t=win_weixin&lang=en',
36:'https://sports.qq.com/lottery/09fucai/cqssc.htm'
}
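The speedup is very noticeable, and this approach works well for enumerating subdomains. Without multithreading, my project would probably be missing several features, because some programs I wrote earlier were cut for taking too long to run. This technique is genuinely practical.
4. Further learning
Bilibili Python multithreading tutorial: https://www.bilibili.com/video/BV1bK411A7tV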