快捷導(dǎo)航

Python實(shí)現(xiàn)爬蟲(chóng)抓取與讀寫(xiě)、追加到excel文件操作示例

更新時(shí)間：2018年06月27日 08:48:16 作者：masterbu

這篇文章主要介紹了Python實(shí)現(xiàn)爬蟲(chóng)抓取與讀寫(xiě)、追加到excel文件操作,結(jié)合具體實(shí)例形式分析了Python針對(duì)糗事百科的抓取與Excel文件讀寫(xiě)相關(guān)操作技巧,需要的朋友可以參考下

本文實(shí)例講述了Python實(shí)現(xiàn)爬蟲(chóng)抓取與讀寫(xiě)、追加到excel文件操作。分享給大家供大家參考，具體如下：

爬取糗事百科熱門

安裝讀寫(xiě)excel 依賴 pip install xlwt安裝追加excel文件內(nèi)容依賴 pip install xlutils安裝 lxml

Python示例：

import csv
import requests
from lxml import etree
import time
import xlwt
import os
from xlutils.copy import copy
import xlrd
data_infos_list = []
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
         '(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'}
# f = open('C:\\Users\\Administrator\\Desktop\\qiubaibook.csv', 'a+', newline='', encoding='utf-8')
# writer = csv.writer(f)
# writer.writerow(('author', 'sex', 'rank', 'content', 'great', 'comment', 'time'))
filename = 'C:\\Users\\Administrator\\Desktop\\qiubaibook.xls'
def get_info(url):
  res = requests.get(url, headers=headers)
  selector = etree.HTML(res.text)
  # print(res.text)
  htmls = selector.xpath('//div[contains(@class,"article block untagged mb15")]')
  # // *[ @ id = "qiushi_tag_120024357"] / a[1] / div / span 內(nèi)容
  # //*[@id="qiushi_tag_120024357"]/div[2]/span[1]/i 好笑
  # //*[@id="c-120024357"]/i 評(píng)論
  # //*[@id="qiushi_tag_120024357"]/div[1]/a[2]/h2 作者
  # //*[@id="qiushi_tag_120024357"]/div[1]/div 等級(jí)
  # // womenIcon manIcon 性別
  for html in htmls:
    author = html.xpath('div[1]/a[2]/h2/text()')
    if len(author) == 0:
      author = html.xpath('div[1]/span[2]/h2/text()')
    rank = html.xpath('div[1]/div/text()')
    sex = html.xpath('div[1]/div/@class')
    if len(sex) == 0:
      sex = '未知'
    elif 'manIcon' in sex[0]:
      sex = '男'
    elif 'womenIcon' in sex[0]:
      sex = '女'
    if len(rank) == 0:
      rank = '-1'
    contents = html.xpath('a[1]/div/span/text()')
    great = html.xpath('div[2]/span[1]/i/text()') # //*[@id="qiushi_tag_112746244"]/div[3]/span[1]/i
    if len(great) == 0:
      great = html.xpath('div[3]/span[1]/i/text()')
    comment = html.xpath('div[2]/span[2]/a/i/text()') # //*[@id="c-112746244"]/i
    if len(comment) == 0:
      comment = html.xpath('div[3]/span[2]/a/i/text()')
    # classes = html.xpath('a[1]/@class')
    # writer.writerow((author[0].strip(), sex, rank[0].strip(), contents[0].strip(), great[0].strip(),
    #         comment[0].strip(), time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))))
    data_infos = [author[0].strip(), sex, rank[0].strip(), contents[0].strip(), great[0].strip(),
           comment[0].strip(), time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))]
    data_infos_list.append(data_infos)
def write_data(sheet, row):
  for data_infos in data_infos_list:
    j = 0
    for data in data_infos:
      sheet.write(row, j, data)
      j += 1
    row += 1
if __name__ == '__main__':
  urls = ['https://www.qiushibaike.com/8hr/page/{}/'.format(num) for num in range(1, 14)]
  for url in urls:
    print(url)
    get_info(url)
    time.sleep(2)
  # 如果文件存在，則追加。如果文件不存在，則新建
  if os.path.exists(filename):
    # 打開(kāi)excel
    rb = xlrd.open_workbook(filename, formatting_info=True) # formatting_info=True 保留原有字體顏色等樣式
    # 用 xlrd 提供的方法獲得現(xiàn)在已有的行數(shù)
    rn = rb.sheets()[0].nrows
    # 復(fù)制excel
    wb = copy(rb)
    # 從復(fù)制的excel文件中得到第一個(gè)sheet
    sheet = wb.get_sheet(0)
    # 向sheet中寫(xiě)入文件
    write_data(sheet, rn)
    # 刪除原先的文件
    os.remove(filename)
    # 保存
    wb.save(filename)
  else:
    header = ['author', 'sex', 'rank', 'content', 'great', 'comment', 'time']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('糗百')
    # 向 excel 中寫(xiě)入表頭
    for h in range(len(header)):
      sheet.write(0, h, header[h])
    # 向sheet中寫(xiě)入內(nèi)容
    write_data(sheet, 1)
    book.save(filename)

更多關(guān)于Python相關(guān)內(nèi)容可查看本站專題：《Python Socket編程技巧總結(jié)》、《Python正則表達(dá)式用法總結(jié)》、《Python數(shù)據(jù)結(jié)構(gòu)與算法教程》、《Python函數(shù)使用技巧總結(jié)》、《Python字符串操作技巧匯總》、《Python入門與進(jìn)階經(jīng)典教程》及《Python文件與目錄操作技巧匯總》

希望本文所述對(duì)大家Python程序設(shè)計(jì)有所幫助。

您可能感興趣的文章: