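A Detailed Guide to find and find_all in BeautifulSoup

Updated: 2020-12-07 10:13:25 Author: OCISLU

This article explains how to use find and find_all in BeautifulSoup, with example code for each method.

How to use find and find_all in BeautifulSoup, a handy tool for web scraping

Without further ado, here is a sample HTML document: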

<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
          <li id="flask" class="item-0"><a href="link1.html" rel="external nofollow">first item</a></li>
          <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
          <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
          <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
          <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>

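Before using BeautifulSoup, you need to build a BeautifulSoup instance:

# Import the class and build the BeautifulSoup instance
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
# The first argument is the markup to parse (here, the HTML string above)
# The second argument is the parser BeautifulSoup should use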

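Note that the parser you name must be installed beforehand; lxml, used here, is already installed. The parsers you can choose from are listed in the BeautifulSoup documentation.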
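
As a rough illustration of the parser choice, the sketch below tries lxml first and falls back to Python's built-in html.parser when lxml is not installed. The helper name make_soup and the sample markup are assumptions made for this example; they do not appear in the original code.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if it is installed, else with html.parser.

    make_soup is a hypothetical helper written for this sketch.
    """
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # bs4 raises FeatureNotFound when the requested parser is missing
        return BeautifulSoup(markup, 'html.parser')


# Hypothetical sample markup
soup = make_soup('<ul><li id="flask">first item</li></ul>')
print(soup.find('li').text)
```

The fallback keeps a script usable on machines without lxml, at the cost of possibly different handling of malformed HTML between parsers.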

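Next, a look at find and find_all.

1. find
Returns only the first matching object.
Syntax:

find(name, attrs, recursive, text, **kwargs)
# recursive: if True (the default), search all descendants; if False, only direct children
# text: called string since Beautiful Soup 4.4.0; the old name text is still accepted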

Parameters of BeautifulSoup's find method:

Parameter	Purpose
name	tag name to search for
text	text content to search for
attrs	attribute filters, passed as a dict

Example:

# find returns the first match only
li = soup.find('li')
print('find_li:', li)
print('li.text (text inside the tag):', li.text)
print('li.attrs (attributes of the tag):', li.attrs)
print('li.string (tag content as a string):', li.string)

Output:

find_li: <li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>
li.text (text inside the tag): first item
li.attrs (attributes of the tag): {'id': 'flask', 'class': ['item-0']}
li.string (tag content as a string): first item
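
Two details of find are easy to trip over: it returns None when nothing matches, and .string is None whenever the tag has more than one child, while .text still joins all the strings inside it. The following is a minimal sketch of both cases; the markup is made up for this example and html.parser is used only so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <li> holds an <a>, a space and a <b>
html = '<ul><li id="flask"><a href="link1.html">first</a> <b>item</b></li></ul>'
soup = BeautifulSoup(html, 'html.parser')

li = soup.find('li')
text_value = li.text          # joins every string inside the tag
string_value = li.string      # None, because <li> has more than one child
missing = soup.find('table')  # the markup has no <table>, so this is None

print(repr(text_value))
print(string_value)
print(missing)

# Check for None before reading attributes of a find result
if missing is not None:
    print(missing.attrs)
```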

find can also match on an attribute by passing it as a keyword argument (attribute=value):

li = soup.find(id='flask')
print(li, '\n')

<li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>

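Note that class is a reserved keyword in Python, so it cannot be used directly as a keyword argument. To match on a tag's class attribute, use one of these two approaches:

Pass a dict through the attrs parameter
Use BeautifulSoup's special keyword argument class_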

# Option 1: pass a dict through the attrs parameter
find_class = soup.find(attrs={'class': 'item-1'})
print('findclass:', find_class, '\n')
# Option 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_='item-1')
print('BeautifulSoup_class_:', beautifulsoup_class_, '\n')

Output:

findclass: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>

BeautifulSoup_class_: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
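
Because class is a multi-valued attribute, both approaches match a tag as soon as any one of its classes equals the value searched for. The sketch below shows this on made-up markup (the extra class name active is an assumption for this example), again using html.parser to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <li> carries two classes
html = '''
<ul>
  <li class="item-1 active"><a href="link2.html">second item</a></li>
  <li class="item-1"><a href="link4.html">fourth item</a></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

by_attrs = soup.find(attrs={'class': 'item-1'})
by_keyword = soup.find(class_='item-1')
active = soup.find(class_='active')

# Both lookups land on the same first <li>
print(by_attrs is by_keyword)
# The class attribute comes back as a list of class names
print(active['class'])
```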

2. find_all

Returns every match, unlike find, which returns only the first one.

Syntax:

find_all(name, attrs, recursive, text, limit, **kwargs)

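Parameters of BeautifulSoup's find_all method:

Parameter	Purpose
name	tag name to search for
text	text content to search for
attrs	attribute filters, passed as a dict

These parameters work the same way as in find; the extra limit parameter caps the number of results returned.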
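
The parameters that the table leaves out, limit and recursive, are easiest to understand from a small run. The following sketch uses made-up markup and html.parser, and it spells the text filter as string, the name used since Beautiful Soup 4.4.0; treat it as an illustration, not as part of the original listing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three <li> inside <ul>, one <li> directly under <div>
html = '''
<div>
  <ul>
    <li>first item</li>
    <li>second item</li>
    <li>third item</li>
  </ul>
  <li>hello world</li>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# limit stops the search after the given number of matches
first_two = soup.find_all('li', limit=2)
texts = [li.text for li in first_two]

# recursive=False only looks at direct children of the tag being searched
div = soup.find('div')
direct = div.find_all('li', recursive=False)

# string (called text before 4.4.0) matches text nodes instead of tags
by_text = soup.find_all(string='hello world')

print(texts)
print(direct)
print(by_text)
```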

Here is the code:

# find_all returns every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('matched li:', li)
    print('li text:', li.text)
    print('li attributes:', li.attrs)

Output:

---
matched li: <li class="item-0" id="flask"><a href="link1.html" rel="external nofollow">first item</a></li>
li text: first item
li attributes: {'id': 'flask', 'class': ['item-0']}
---
matched li: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
li text: second item
li attributes: {'class': ['item-1']}
---
matched li: <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
li text: third item
li attributes: {'class': ['item-inactie']}
---
matched li: <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
li text: fourth item
li attributes: {'class': ['item-1']}
---
matched li: <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
</li>
li text: fifth item

Here is a more flexible way to query with find_all:

# The most flexible usage: filter with an attrs dict
li_quick = soup.find_all(attrs={'class': 'item-1'})
for li in li_quick:
    print('flexible lookup:', li)

Output:

flexible lookup: <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
flexible lookup: <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
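
Beyond plain strings and an attrs dict, find_all also accepts a list of tag names, a compiled regular expression, or a function as a filter. The sketch below shows one case of each on made-up markup; the function name has_id and the file name page3.html are assumptions for this example, and html.parser is used only to avoid an extra dependency.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for this sketch
html = '''
<ul>
  <li id="flask" class="item-0"><a href="link1.html">first item</a></li>
  <li class="item-1"><a href="link2.html">second item</a></li>
  <li class="item-1"><a href="page3.html">third item</a></li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')

# A list of names matches any tag whose name is in the list
names = [tag.name for tag in soup.find_all(['ul', 'a'])]

# A compiled regular expression is tested against the attribute value
hrefs = [a['href'] for a in soup.find_all('a', href=re.compile(r'^link'))]


def has_id(tag):
    """Hypothetical filter: keep only tags that carry an id attribute."""
    return tag.has_attr('id')


# A function receives each tag and returns True to keep it
with_id = soup.find_all(has_id)

print(names)
print(hrefs)
print(with_id)
```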

Full code:

# coding=utf8
# @Author= CaiJunxuan
# @QQ=469590490
# @Wechat:15916454524

# beautifulsoup

# Import the BeautifulSoup class
from bs4 import BeautifulSoup

# Sample HTML
html = '''
<html>
  <head>
    <title>
      index
    </title>
  </head>
  <body>
     <div>
        <ul>
          <li id="flask" class="item-0"><a href="link1.html" rel="external nofollow">first item</a></li>
          <li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
          <li class="item-inactie"><a href="link3.html" rel="external nofollow">third item</a></li>
          <li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
          <li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
         </ul>
     </div>
    <li> hello world </li>
  </body>
</html>
'''

# Build the BeautifulSoup instance
soup = BeautifulSoup(html, 'lxml')
# The first argument is the markup to parse
# The second argument is the parser BeautifulSoup should use:
# html.parser is Python's built-in parser; it needs no extra install but is generally slower than lxml
# lxml uses the lxml library
# html5lib parses the markup the way a web browser does and produces valid HTML5
# Third-party parsers must be installed first, e.g. install lxml before using 'lxml'

# bs4 offers many search methods, but two are used most often:

# find returns the first match only
li = soup.find('li')
print('find_li:', li)
print('li.text (text inside the tag):', li.text)
print('li.attrs (attributes of the tag):', li.attrs)
print('li.string (tag content as a string):', li.string)
print(50 * '*', '\n')

# find can also match on an attribute passed as a keyword argument (attribute=value)
li = soup.find(id='flask')
print(li, '\n')
# class is a reserved keyword in Python, so it cannot be used directly as a keyword argument
# There are two ways to search by the class attribute
# Option 1: pass a dict through the attrs parameter
find_class = soup.find(attrs={'class': 'item-1'})
print('findclass:', find_class, '\n')
# Option 2: BeautifulSoup's special keyword argument class_
beautifulsoup_class_ = soup.find(class_='item-1')
print('BeautifulSoup_class_:', beautifulsoup_class_, '\n')

# find_all returns every match
li_all = soup.find_all('li')
for li in li_all:
    print('---')
    print('matched li:', li)
    print('li text:', li.text)
    print('li attributes:', li.attrs)

# The most flexible usage: filter with an attrs dict
li_quick = soup.find_all(attrs={'class': 'item-1'})
for li in li_quick:
    print('flexible lookup:', li)
