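A Detailed Guide to find and find_all in BeautifulSoup

 Updated: 2020-12-07 10:13:25   Author: OCISLU
This article explains how to use find and find_all in BeautifulSoup, working through example code in detail. It is intended as a reference for anyone learning or using the library.

Using find and find_all in BeautifulSoup, a handy scraping tool

Without further ado, here is a sample HTML document: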

<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
           <li id="flask"class="item-0"><a href="link1.html" rel="external nofollow">first item</a></li>
          <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
          <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
          <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
          <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>

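Before using BeautifulSoup, you need to build a BeautifulSoup instance:

# Build the BeautifulSoup instance
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# The first argument is the markup to parse
# The second argument is the parser that BeautifulSoup should use

Note that the parser you name must already be installed; lxml was installed beforehand for this example. The parsers that can be used are listed in the BeautifulSoup documentation.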
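
The dependency on an installed parser can be handled defensively. The following is a minimal sketch, not part of the original article: the helper name make_soup and the shortened HTML string are hypothetical, and the fallback to Python's built-in html.parser is one possible choice.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened, hypothetical markup used only for this illustration.
html = "<ul><li class='item-0'>first item</li></ul>"

def make_soup(markup):
    """Parse with lxml when it is installed, otherwise use the built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is not available.
        return BeautifulSoup(markup, "html.parser")

soup = make_soup(html)
print(soup.li.text)
```

Different parsers can build slightly different trees from malformed HTML, so results on broken markup may vary depending on which branch is taken.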

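Next, an introduction to find and find_all.

1. find
Returns only the first matching object.
Syntax:

find(name, attrs, recursive, text, **kwargs)
# recursive: search all descendants recursively (the default) rather than only direct children

Parameters of BeautifulSoup's find method:

Parameter   Purpose
name        the tag name to look for
text        the text to look for (named string since Beautiful Soup 4.4.0; earlier versions call it text)
attrs       a dictionary of attribute names and values to match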

Example:

# find returns the first match only
li = soup.find('li')
print('find_li:', li)
print('li.text (text inside the tag):', li.text)
print('li.attrs (attributes of the tag):', li.attrs)
print('li.string (tag content as a string):', li.string)

Output:

find_li: <li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>
li.text (text inside the tag): first item
li.attrs (attributes of the tag): {'id': 'flask', 'class': ['item-0']}
li.string (tag content as a string): first item
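
One behavior worth knowing alongside this example: when nothing matches, find returns None, so accessing .text on the result raises an AttributeError. The sketch below illustrates that and the recursive parameter. It assumes a shortened, hypothetical HTML string and uses html.parser only to avoid an extra install.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup; html.parser is an assumption made for portability.
html = """
<div>
  <ul>
    <li id="flask" class="item-0"><a href="link1.html">first item</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# No <table> exists in the markup, so find returns None.
missing = soup.find("table")
print(missing)

# Guard against None before touching attributes of the result.
first_li = soup.find("li")
label = first_li.text if first_li is not None else ""
print(label)

# recursive=False restricts the search to direct children of the tag.
div = soup.find("div")
direct_li = div.find("li", recursive=False)   # <li> is a grandchild of <div>
direct_ul = div.find("ul", recursive=False)   # <ul> is a direct child of <div>
print(direct_li, direct_ul.name)
```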

find can also match using the form attribute=value:

li = soup.find(id = 'flask')
print(li,'\n')

Output:

<li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>
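
Keyword matching works for other attributes too, and it can be combined with a tag name. The sketch below is an illustration under assumptions: the data-rank attribute and the shortened HTML are hypothetical, and html.parser is used only to avoid an extra install. Attribute names that are not valid Python identifiers, such as data-rank, have to go through the attrs dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; data-rank is an invented attribute for illustration.
html = '<ul><li id="flask" data-rank="1"><a href="link1.html">first item</a></li></ul>'
soup = BeautifulSoup(html, "html.parser")

# Any attribute name that is a valid Python identifier can be a keyword argument.
link = soup.find(href="link1.html")
print(link.text)

# "data-rank" contains a hyphen, so it is passed through the attrs dictionary.
ranked = soup.find(attrs={"data-rank": "1"})
print(ranked["id"])

# A tag name and an attribute filter can be combined in one call.
same = soup.find("li", id="flask")
print(same is ranked)
```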

Note that class is a reserved keyword in Python, so matching on a tag's class attribute needs a special approach. There are two options:

  • Pass a dictionary through the attrs parameter
  • Use BeautifulSoup's special keyword argument class_

# Option 1: pass a dictionary through the attrs parameter
find_class = soup.find(attrs={'class':'item-1'})
print('findclass:',find_class,'\n')
# Option 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_ = 'item-1')
print('BeautifulSoup_class_:',beautifulsoup_class_,'\n')

Output:

findclass: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>

BeautifulSoup_class_: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
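
Because class is a multi-valued attribute, class_ matches a tag when any one of its classes equals the value given. The sketch below shows this with a hypothetical extra class named active on a shortened HTML string; html.parser is used only to avoid an extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> carries two classes.
html = (
    '<ul>'
    '<li class="item-1 active">second item</li>'
    '<li class="item-1">fourth item</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

# Matching on a single class finds tags that have that class among several.
active = soup.find(class_="active")
print(active.text)

# The class attribute is exposed as a list of individual class names.
print(active["class"])

# A tag name narrows the search; the first <li> with class item-1 is returned.
first_item1 = soup.find("li", class_="item-1")
print(first_item1.text)
```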

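2. find_all

Returns every match, unlike find, which returns only the first result found.

Syntax:

find_all(name, attrs, recursive, text, limit, **kwargs)

Parameters of BeautifulSoup's find_all method:

Parameter   Purpose
name        the tag name to look for
text        the text to look for (named string since Beautiful Soup 4.4.0; earlier versions call it text)
attrs       a dictionary of attribute names and values to match

The syntax is the same as for find, with the addition of limit, which caps the number of results returned.

Here is the code: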

# find_all returns every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('Matched li:', li)
    print('li text:', li.text)
    print('li attributes:', li.attrs)

Output:

---
Matched li: <li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>
li text: first item
li attributes: {'id': 'flask', 'class': ['item-0']}
---
Matched li: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
li text: second item
li attributes: {'class': ['item-1']}
---
Matched li: <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
li text: third item
li attributes: {'class': ['item-inactie']}
---
Matched li: <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
li text: fourth item
li attributes: {'class': ['item-1']}
---
Matched li: <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
</li>
li text: fifth item
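
A few properties of find_all are easy to miss in the output above: the result behaves like a list, it is empty (not None) when nothing matches, limit caps the number of results, and name may be a list of tag names. The sketch below illustrates these on a shortened, hypothetical HTML string, using html.parser only to avoid an extra install.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical markup for illustration.
html = """
<ul>
  <li class="item-0">first item</li>
  <li class="item-1">second item</li>
  <li class="item-1">third item</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# The result supports len(), indexing and iteration like a list.
items = soup.find_all("li")
texts = [li.text for li in items]
print(len(items), texts)

# limit stops the search once that many matches have been collected.
first_two = soup.find_all("li", limit=2)
print(len(first_two))

# With no match, find_all returns an empty result rather than None.
none_found = soup.find_all("table")
print(len(none_found))

# A list of names matches any of the listed tags, in document order.
several = soup.find_all(["ul", "li"])
print([tag.name for tag in several])
```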

Here is a more flexible way to query with find_all:

# The most flexible usage
li_quick = soup.find_all(attrs={'class':'item-1'})
for li in li_quick:
    print('Most flexible lookup:', li)

Output:

Most flexible lookup: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
Most flexible lookup: <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
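
Filters do not have to be plain strings. As a sketch under assumptions (the shortened HTML and the function name links_to_numbered_page are hypothetical, and html.parser is used only to avoid an extra install), find_all also accepts a compiled regular expression or a function that takes a tag and returns True or False.

```python
import re
from bs4 import BeautifulSoup

# Shortened, hypothetical markup for illustration.
html = """
<ul>
  <li class="item-0"><a href="link1.html">first item</a></li>
  <li class="item-1"><a href="link2.html">second item</a></li>
  <li><a href="other.html">no class</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled regex is tested against each class of every <li>.
item_lis = soup.find_all("li", class_=re.compile(r"^item-"))
item_texts = [li.text for li in item_lis]
print(item_texts)

def links_to_numbered_page(tag):
    """Return True for <a> tags whose href looks like linkN.html."""
    href = tag.get("href", "")
    return tag.name == "a" and re.fullmatch(r"link\d+\.html", href) is not None

# A function filter is called once per tag; matching tags are collected.
numbered = soup.find_all(links_to_numbered_page)
hrefs = [a["href"] for a in numbered]
print(hrefs)
```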

Complete code:

# coding=utf8
# @Author= CaiJunxuan
# @QQ=469590490
# @Wechat:15916454524

# beautifulsoup

# Import the BeautifulSoup class
from bs4 import BeautifulSoup

# Sample HTML
html = '''
<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
           <li id="flask"class="item-0"><a href="link1.html" rel="external nofollow">first item</a></li>
          <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
          <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
          <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
          <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>
'''

# Build the BeautifulSoup instance
soup = BeautifulSoup(html,'lxml')
# The first argument is the markup to parse
# The second argument is the parser that BeautifulSoup should use
# html.parser is Python's built-in parser; it is slower than lxml, so it is used less often
# lxml uses the lxml module
# html5lib parses the content as HTML5
# Each parser requires its module to be installed, e.g. lxml must be installed to use 'lxml'

# bs4 offers many ways to search, but two are used most often:

# find returns the first match only
li = soup.find('li')
print('find_li:', li)
print('li.text (text inside the tag):', li.text)
print('li.attrs (attributes of the tag):', li.attrs)
print('li.string (tag content as a string):', li.string)
print(50*'*','\n')

# find can select using the form attribute=value
li = soup.find(id = 'flask')
print(li,'\n')
# class is a reserved keyword in Python, so it cannot be used directly as a keyword argument
# There are two ways to query on the class attribute
# Option 1: pass a dictionary through the attrs parameter
find_class = soup.find(attrs={'class':'item-1'})
print('findclass:',find_class,'\n')
# Option 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_ = 'item-1')
print('BeautifulSoup_class_:',beautifulsoup_class_,'\n')

# find_all returns every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('Matched li:', li)
    print('li text:', li.text)
    print('li attributes:', li.attrs)

# The most flexible usage
li_quick = soup.find_all(attrs={'class':'item-1'})
for li in li_quick:
    print('Most flexible lookup:', li)