
scrapy redis settings file parameters explained

Updated: 2020-11-18 14:22:07  Author: qingDT

This article walks through the settings parameters of the scrapy redis configuration file, with sample code for each one. It is meant as a reference for study or day-to-day work.

Scrapy project settings.py

# Redis settings

# Enable the Redis scheduler

SCHEDULER = 'scrapy_redis.scheduler.Scheduler'

# All spiders share the same duplicates filter through Redis

DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'

# Don't clear the Redis queues, so a crawl can be paused and resumed

#SCHEDULER_PERSIST = True

#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.PriorityQueue'  # default: priority queue
# Alternative queues.
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.FifoQueue'  # first in, first out
#SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.LifoQueue'  # last in, first out

# Max idle time, to keep a distributed spider from being closed while it waits for work

#SCHEDULER_IDLE_BEFORE_CLOSE = 10


# Store scraped items in Redis for post-processing.

ITEM_PIPELINES = {
   'scrapy_redis.pipelines.RedisPipeline':300,
}

# The item pipeline serializes and stores the items in this redis key.

#REDIS_ITEMS_KEY = '%(spider)s:items'

# The default item serializer is ScrapyJSONEncoder; you can also point this at your own serializer

#REDIS_ITEMS_SERIALIZER = 'json.dumps'


# Set the Redis host and port

REDIS_HOST = 'localhost'
REDIS_PORT = 6379

# Alternatively, set the Redis host, port and password with a single URL. Once set, it overrides the REDIS_HOST and REDIS_PORT values above.

REDIS_URL = 'redis://root:redis_pass@xxx.xx.xx.xx:6379'
# root: the username; redis_pass: your Redis auth password; xxx.xx.xx.xx: your host IP

# Custom redis client parameters (e.g. socket timeout)
REDIS_PARAMS = {}


# Use a custom redis client class
#REDIS_PARAMS['redis_cls'] = 'myproject.RedisClient'

# If True, it uses redis' ``SPOP`` operation. You have to use the ``SADD``
# command to add URLs to the redis queue. This could be useful if you
# want to avoid duplicates in your start urls list and the order of
# processing does not matter.

#REDIS_START_URLS_AS_SET = False

# Default start urls key for RedisSpider and RedisCrawlSpider

#REDIS_START_URLS_KEY = '%(name)s:start_urls'

# Redis uses utf-8 encoding by default; to use a different encoding, set it like this:

#REDIS_ENCODING = 'latin1'
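A few of the settings above can be sanity-checked in plain Python before a crawl is started. The following is a minimal sketch, not scrapy redis code: it assumes the key templates are filled in with old-style `%` formatting, that `REDIS_URL` follows the usual `redis://user:password@host:port` layout, and that a custom serializer is called with one item and returns a string. The names `describe_redis_url` and `compact_json` and the address `127.0.0.1` are made up for illustration.

```python
import json
from urllib.parse import urlsplit

# Key templates copied from the settings above. Filling them in with
# % formatting is an assumption about how the library expands them.
REDIS_ITEMS_KEY = '%(spider)s:items'
REDIS_START_URLS_KEY = '%(name)s:start_urls'

items_key = REDIS_ITEMS_KEY % {'spider': 'myspider'}
start_urls_key = REDIS_START_URLS_KEY % {'name': 'myspider'}


def describe_redis_url(url):
    """Split a redis:// URL into the parts that REDIS_URL carries."""
    parts = urlsplit(url)
    return {
        'username': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
    }


def compact_json(item):
    """Hypothetical custom serializer: one item in, one JSON string out."""
    # dict(item) also accepts mapping-like item objects, not only plain dicts.
    return json.dumps(dict(item), ensure_ascii=False, sort_keys=True)


url_parts = describe_redis_url('redis://root:redis_pass@127.0.0.1:6379')
serialized = compact_json({'title': 'example', 'id': 1})
print(items_key, start_urls_key, url_parts, serialized)
```

If a function like `compact_json` lived in a module of your project, `REDIS_ITEMS_SERIALIZER` would be set to its dotted import path, in the same way the example above points at `json.dumps`.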

The class scrapy_redis.spiders.RedisSpider lets a spider read its URLs from a Redis database. The URLs in the Redis queue are crawled one after another; if the first request yields more requests, the spider processes those requests before it fetches another URL from Redis.
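The ordering described here can be pictured with a deliberately simplified toy model. The sketch below is not how scrapy redis is implemented (real crawling is concurrent, and the request queue itself lives in Redis); it only illustrates the rule that follow-up requests are handled before the next start URL is taken. The function `crawl_order` and the example URLs are hypothetical.

```python
from collections import deque


def crawl_order(start_urls, follow_ups):
    """Return the order in which a toy spider would visit URLs.

    start_urls stands in for the Redis list; follow_ups maps a URL to the
    URLs its response yields (both are made-up inputs for illustration).
    """
    redis_queue = deque(start_urls)  # stand-in for the shared Redis queue
    pending = deque()                # requests generated by responses
    visited = []
    while redis_queue or pending:
        if not pending:
            # Only when local work is done is another URL taken from "Redis".
            pending.append(redis_queue.popleft())
        url = pending.popleft()
        visited.append(url)
        pending.extend(follow_ups.get(url, []))
    return visited


order = crawl_order(
    ['http://a.example', 'http://b.example'],
    {'http://a.example': ['http://a.example/1', 'http://a.example/2']},
)
print(order)
```

In this model the second start URL is visited only after both pages generated by the first one have been handled.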

Creating a spider

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
  name = 'myspider'

  def parse(self, response):
    # do stuff
    pass

Setting the start URL with redis-cli

redis-cli lpush myspider:start_urls http://google.com
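The same push can be done from Python. The sketch below assumes the default key template `%(name)s:start_urls`; the helper `seed_start_urls` and the `FakeRedis` stand-in are hypothetical names introduced here so the example runs without a Redis server. Any client object with an `lpush(key, *values)` method should work, for example a `redis.Redis` instance from the redis-py package.

```python
def seed_start_urls(client, spider_name, urls,
                    key_template='%(name)s:start_urls'):
    """Push start URLs onto the spider's Redis list and return the key used.

    client is any object with an lpush(key, *values) method, for example a
    redis.Redis instance from the redis-py package (assumed to be installed).
    """
    key = key_template % {'name': spider_name}
    client.lpush(key, *urls)
    return key


class FakeRedis:
    """In-memory stand-in so the helper can be tried without a server."""

    def __init__(self):
        self.lists = {}

    def lpush(self, key, *values):
        items = self.lists.setdefault(key, [])
        for value in values:
            items.insert(0, value)  # LPUSH adds each value at the head
        return len(items)


fake = FakeRedis()
used_key = seed_start_urls(fake, 'myspider', ['http://google.com'])
print(used_key, fake.lists[used_key])
```

Against a real server, the call would look like `seed_start_urls(redis.Redis(host='localhost', port=6379), 'myspider', ['http://google.com'])` after `import redis`.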
