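Installing Scrapy on Windows 7 x64
Scrapy is a web-crawling framework written in Python. The installation guides I found online all seemed rather complicated, and many of them were quite dated, so I wrote down my own installation process.
First install Python from https://www.python.org/downloads/release/python-2710/, choosing the 64-bit build (Windows x86-64 MSI installer) or the 32-bit build (Windows x86 MSI installer) to match your system.
Note: Python 3.6 has since become the norm, so installing a Python 3 release is recommended today.
Once Python is installed you can install Scrapy. I recommend installing with pip: Scrapy depends on many additional libraries, and pip installs all of them for you, so you don't have to hunt them down one by one.
pip is already present once Python is installed, so nothing extra is needed. Just follow the method recommended on the Scrapy website and type pip install scrapy at the command prompt (Figure 1), then wait quietly until it finishes.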
Figure 1
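The same installation can be driven from Python itself. The following is only a sketch: the helper names `pip_install_command` and `install` are made up for illustration, and it assumes that invoking pip as `python -m pip` is acceptable, which ties pip to the interpreter that runs the script.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs `package` with this interpreter's pip."""
    # "python -m pip" uses the pip that belongs to the running interpreter,
    # which helps when several Python versions are installed side by side.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip and return True when it exits with status 0."""
    return subprocess.call(pip_install_command(package)) == 0


if __name__ == "__main__":
    # Only print the command; call install("scrapy") to actually run it.
    print(" ".join(pip_install_command("scrapy")))
```

Calling `install("scrapy")` should have the same effect as typing the command by hand, as long as the machine can reach the package index.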
After the installation, run pip list to see the installed libraries (Figure 2). Quite a lot shows up; pip really is a good thing.
Figure 2
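The list that pip list prints can also be read programmatically. This sketch assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library (it is not available on the Python 2.7 used in this article); `installed_packages` is a hypothetical helper name.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_packages():
    """Return sorted (name, version) pairs for every installed distribution."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is broken
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(name, version)
```

After a successful install, Scrapy and the libraries it pulled in should appear in this output.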
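Now let's try creating a crawler project. Following the documentation, run scrapy startproject tutorial; the project directory appears (Figure 3), so things look fine. To verify that the installation really works, though, we still have to run something. When a project is first created, Scrapy suggests running an example (Figure 4).
Figure 3
Figure 4
Following that hint, from inside the tutorial directory run scrapy genspider example example.com to create a spider, then run scrapy crawl example to start it. The result (Figure 5) was an error: apparently win32api was missing, so I downloaded it right away (http://sourceforge.net/projects/pywin32/files/pywin32/Build%20219/), taking care to pick the build that matches my Python version.
Figure 5
With win32api installed, I ran the spider again (Figure 6). This time it succeeded, so everything should be in order.
Figure 6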
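To double-check what scrapy startproject tutorial created, the directory can be compared against the files the project template normally contains. This is a sketch under assumptions: the file list below reflects the layout I expect from the standard template (newer releases add more files, such as middlewares.py), and `missing_project_files` is a hypothetical helper.

```python
import os


def missing_project_files(project_dir, name):
    """Return the expected project files that are absent under project_dir."""
    # Assumed core layout of a freshly generated Scrapy project.
    expected = [
        "scrapy.cfg",
        os.path.join(name, "__init__.py"),
        os.path.join(name, "items.py"),
        os.path.join(name, "pipelines.py"),
        os.path.join(name, "settings.py"),
        os.path.join(name, "spiders", "__init__.py"),
    ]
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(project_dir, rel))
    ]


if __name__ == "__main__":
    # "tutorial" is the project name used in this article.
    print(missing_project_files("tutorial", "tutorial"))
```

An empty list suggests the project skeleton is complete; it does not prove that a spider will run, which is why the crawl test above is still worth doing.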
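To sum up: when I first looked for information online and read that this library and that library had to be installed first, I was rather nervous. It turned out not to be very complicated. pip takes care of most problems, and the rest can be dealt with case by case; I did not run into anything too hard to solve. One more complaint: online tutorials copy each other far too much. A search returns piles of them, but most are nearly identical and few are truly valuable. Having no expert to lean on makes it hard work.
One last note: when I wrote this, Scrapy did not yet support Python 3.x, so I used Python 2.7 (later Scrapy releases do support Python 3). If you run into inexplicable problems, first check whether you have installed the wrong Python version.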
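Below is an article contributed by another reader.
Environment
Windows 7 64-bit
Python 2.7.6 64-bit
Installing Python:
- Open http://www.python.org/getit/releases/2.7.6/, download Python-2.7.6.amd64.msi and install it. After installation the environment variables must be configured; the steps are covered in the article referenced in the original post.
- Test whether Python was installed successfully. If it was, and the environment variables are configured, typing python in cmd prints detailed version information (including whether it is 32-bit or 64-bit):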
C:\Users\Administrator>python
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
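The information in that banner, plus whether the interpreter's directory is actually on PATH, can also be read from inside Python. A minimal sketch, with `interpreter_report` as a hypothetical helper name:

```python
import os
import platform
import struct
import sys


def interpreter_report():
    """Summarise the running interpreter: version, bitness, PATH visibility."""
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    exe_dir = os.path.normcase(os.path.dirname(os.path.abspath(sys.executable)))
    path_dirs = [
        os.path.normcase(os.path.abspath(entry))
        for entry in os.environ.get("PATH", "").split(os.pathsep)
        if entry
    ]
    return {
        "version": platform.python_version(),
        "bits": bits,
        "on_path": exe_dir in path_dirs,
    }


if __name__ == "__main__":
    print(interpreter_report())
```

If `on_path` is False, typing python in a fresh cmd window will probably fail, which points to the environment variable step having been skipped.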
Installing easy_install
Save ez_setup.py locally, for example on drive D: (if it is no longer available, see http://chabaoo.cn/article/151027.htm):
#!/usr/bin/env python

"""
Setuptools bootstrapping installer.

Maintained at https://github.com/pypa/setuptools/tree/bootstrap.

Run this script to install or upgrade setuptools.

This method is DEPRECATED. Check https://github.com/pypa/setuptools/issues/581 for more details.
"""

import os
import shutil
import sys
import tempfile
import zipfile
import optparse
import subprocess
import platform
import textwrap
import contextlib

from distutils import log

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen

try:
    from site import USER_SITE
except ImportError:
    USER_SITE = None

# 33.1.1 is the last version that supports setuptools self upgrade/installation.
DEFAULT_VERSION = "33.1.1"
DEFAULT_URL = "https://pypi.io/packages/source/s/setuptools/"
DEFAULT_SAVE_DIR = os.curdir
DEFAULT_DEPRECATION_MESSAGE = "ez_setup.py is deprecated and when using it setuptools will be pinned to {0} since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools"

MEANINGFUL_INVALID_ZIP_ERR_MSG = 'Maybe {0} is corrupted, delete it and try again.'

log.warn(DEFAULT_DEPRECATION_MESSAGE.format(DEFAULT_VERSION))


def _python_cmd(*args):
    """
    Execute a command.

    Return True if the command succeeded.
    """
    args = (sys.executable,) + args
    return subprocess.call(args) == 0


def _install(archive_filename, install_args=()):
    """Install Setuptools."""
    with archive_context(archive_filename):
        # installing
        log.warn('Installing Setuptools')
        if not _python_cmd('setup.py', 'install', *install_args):
            log.warn('Something went wrong during the installation.')
            log.warn('See the error message above.')
            # exitcode will be 2
            return 2


def _build_egg(egg, archive_filename, to_dir):
    """Build Setuptools egg."""
    with archive_context(archive_filename):
        # building an egg
        log.warn('Building a Setuptools egg in %s', to_dir)
        _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir)
    # returning the result
    log.warn(egg)
    if not os.path.exists(egg):
        raise IOError('Could not build the egg.')


class ContextualZipFile(zipfile.ZipFile):

    """Supplement ZipFile class to support context manager for Python 2.6."""

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def __new__(cls, *args, **kwargs):
        """Construct a ZipFile or ContextualZipFile as appropriate."""
        if hasattr(zipfile.ZipFile, '__exit__'):
            return zipfile.ZipFile(*args, **kwargs)
        return super(ContextualZipFile, cls).__new__(cls)


@contextlib.contextmanager
def archive_context(filename):
    """
    Unzip filename to a temporary directory, set to the cwd.

    The unzipped target is cleaned up after.
    """
    tmpdir = tempfile.mkdtemp()
    log.warn('Extracting in %s', tmpdir)
    old_wd = os.getcwd()
    try:
        os.chdir(tmpdir)
        try:
            with ContextualZipFile(filename) as archive:
                archive.extractall()
        except zipfile.BadZipfile as err:
            if not err.args:
                err.args = ('', )
            err.args = err.args + (
                MEANINGFUL_INVALID_ZIP_ERR_MSG.format(filename),
            )
            raise

        # going in the directory
        subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0])
        os.chdir(subdir)
        log.warn('Now working in %s', subdir)
        yield

    finally:
        os.chdir(old_wd)
        shutil.rmtree(tmpdir)


def _do_download(version, download_base, to_dir, download_delay):
    """Download Setuptools."""
    py_desig = 'py{sys.version_info[0]}.{sys.version_info[1]}'.format(sys=sys)
    tp = 'setuptools-{version}-{py_desig}.egg'
    egg = os.path.join(to_dir, tp.format(**locals()))
    if not os.path.exists(egg):
        archive = download_setuptools(version, download_base,
            to_dir, download_delay)
        _build_egg(egg, archive, to_dir)
    sys.path.insert(0, egg)

    # Remove previously-imported pkg_resources if present (see
    # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details).
    if 'pkg_resources' in sys.modules:
        _unload_pkg_resources()

    import setuptools
    setuptools.bootstrap_install_from = egg


def use_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, download_delay=15):
    """
    Ensure that a setuptools version is installed.

    Return None. Raise SystemExit if the requested version
    or later cannot be installed.
    """
    to_dir = os.path.abspath(to_dir)

    # prior to importing, capture the module state for
    # representative modules.
    rep_modules = 'pkg_resources', 'setuptools'
    imported = set(sys.modules).intersection(rep_modules)

    try:
        import pkg_resources
        pkg_resources.require("setuptools>=" + version)
        # a suitable version is already installed
        return
    except ImportError:
        # pkg_resources not available; setuptools is not installed; download
        pass
    except pkg_resources.DistributionNotFound:
        # no version of setuptools was found; allow download
        pass
    except pkg_resources.VersionConflict as VC_err:
        if imported:
            _conflict_bail(VC_err, version)

        # otherwise, unload pkg_resources to allow the downloaded version to
        # take precedence.
        del pkg_resources
        _unload_pkg_resources()

    return _do_download(version, download_base, to_dir, download_delay)


def _conflict_bail(VC_err, version):
    """
    Setuptools was imported prior to invocation, so it is
    unsafe to unload it. Bail out.
    """
    conflict_tmpl = textwrap.dedent("""
        The required version of setuptools (>={version}) is not available,
        and can't be installed while this script is running. Please
        install a more recent version first, using
        'easy_install -U setuptools'.

        (Currently using {VC_err.args[0]!r})
        """)
    msg = conflict_tmpl.format(**locals())
    sys.stderr.write(msg)
    sys.exit(2)


def _unload_pkg_resources():
    sys.meta_path = [
        importer
        for importer in sys.meta_path
        if importer.__class__.__module__ != 'pkg_resources.extern'
    ]
    del_modules = [
        name for name in sys.modules
        if name.startswith('pkg_resources')
    ]
    for mod_name in del_modules:
        del sys.modules[mod_name]


def _clean_check(cmd, target):
    """
    Run the command to download target.

    If the command fails, clean up before re-raising the error.
    """
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        if os.access(target, os.F_OK):
            os.unlink(target)
        raise


def download_file_powershell(url, target):
    """
    Download the file at url to target using Powershell.

    Powershell will validate trust.
    Raise an exception if the command cannot complete.
    """
    target = os.path.abspath(target)
    ps_cmd = (
        "[System.Net.WebRequest]::DefaultWebProxy.Credentials = "
        "[System.Net.CredentialCache]::DefaultCredentials; "
        '(new-object System.Net.WebClient).DownloadFile("%(url)s", "%(target)s")'
        % locals()
    )
    cmd = [
        'powershell',
        '-Command',
        ps_cmd,
    ]
    _clean_check(cmd, target)


def has_powershell():
    """Determine if Powershell is available."""
    if platform.system() != 'Windows':
        return False
    cmd = ['powershell', '-Command', 'echo test']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_powershell.viable = has_powershell


def download_file_curl(url, target):
    cmd = ['curl', url, '--location', '--silent', '--output', target]
    _clean_check(cmd, target)


def has_curl():
    cmd = ['curl', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_curl.viable = has_curl


def download_file_wget(url, target):
    cmd = ['wget', url, '--quiet', '--output-document', target]
    _clean_check(cmd, target)


def has_wget():
    cmd = ['wget', '--version']
    with open(os.path.devnull, 'wb') as devnull:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except Exception:
            return False
    return True
download_file_wget.viable = has_wget


def download_file_insecure(url, target):
    """Use Python to download the file, without connection authentication."""
    src = urlopen(url)
    try:
        # Read all the data in one block.
        data = src.read()
    finally:
        src.close()

    # Write all the data in one block to avoid creating a partial file.
    with open(target, "wb") as dst:
        dst.write(data)
download_file_insecure.viable = lambda: True


def get_best_downloader():
    downloaders = (
        download_file_powershell,
        download_file_curl,
        download_file_wget,
        download_file_insecure,
    )
    viable_downloaders = (dl for dl in downloaders if dl.viable())
    return next(viable_downloaders, None)


def download_setuptools(
        version=DEFAULT_VERSION, download_base=DEFAULT_URL,
        to_dir=DEFAULT_SAVE_DIR, delay=15,
        downloader_factory=get_best_downloader):
    """
    Download setuptools from a specified location and return its filename.

    `version` should be a valid setuptools version number that is available
    as an sdist for download under the `download_base` URL (which should end
    with a '/'). `to_dir` is the directory where the egg will be downloaded.
    `delay` is the number of seconds to pause before an actual download
    attempt.

    ``downloader_factory`` should be a function taking no arguments and
    returning a function for downloading a URL to a target.
    """
    # making sure we use the absolute path
    to_dir = os.path.abspath(to_dir)
    zip_name = "setuptools-%s.zip" % version
    url = download_base + zip_name
    saveto = os.path.join(to_dir, zip_name)
    if not os.path.exists(saveto):  # Avoid repeated downloads
        log.warn("Downloading %s", url)
        downloader = downloader_factory()
        downloader(url, saveto)
    return os.path.realpath(saveto)


def _build_install_args(options):
    """
    Build the arguments to 'python setup.py install' on the setuptools package.

    Returns list of command line arguments.
    """
    return ['--user'] if options.user_install else []


def _parse_args():
    """Parse the command line for options."""
    parser = optparse.OptionParser()
    parser.add_option(
        '--user', dest='user_install', action='store_true', default=False,
        help='install in user site package')
    parser.add_option(
        '--download-base', dest='download_base', metavar="URL",
        default=DEFAULT_URL,
        help='alternative URL from where to download the setuptools package')
    parser.add_option(
        '--insecure', dest='downloader_factory', action='store_const',
        const=lambda: download_file_insecure, default=get_best_downloader,
        help='Use internal, non-validating downloader'
    )
    parser.add_option(
        '--version', help="Specify which version to download",
        default=DEFAULT_VERSION,
    )
    parser.add_option(
        '--to-dir',
        help="Directory to save (and re-use) package",
        default=DEFAULT_SAVE_DIR,
    )
    options, args = parser.parse_args()
    # positional arguments are ignored
    return options


def _download_args(options):
    """Return args for download_setuptools function from cmdline args."""
    return dict(
        version=options.version,
        download_base=options.download_base,
        downloader_factory=options.downloader_factory,
        to_dir=options.to_dir,
    )


def main():
    """Install or upgrade setuptools and EasyInstall."""
    options = _parse_args()
    archive = download_setuptools(**_download_args(options))
    return _install(archive, _build_install_args(options))


if __name__ == '__main__':
    sys.exit(main())
Run the following in cmd:
d:\>python ez_setup.py
This installs SetupTools.
An error occurs while this runs: "ascii codec can't decode byte 0xe8 in position 0:ordinal not in range(128)", which means the ASCII codec cannot decode the byte 0xe8.
Solution: find and open the file Lib/mimetypes.py under the Python root directory and, after the line import urllib, add:
reload(sys)
sys.setdefaultencoding('gbk')
This changes the default encoding to gbk (some sources online suggest utf8, but that has no effect for this script; it has to be gbk). Run python ez_setup.py again; if installation messages flood the screen, the installation succeeded. A Scripts folder now appears under the Python directory, and easy_install is inside it.
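Why gbk helps and ASCII fails can be shown in a few lines. This is an illustrative sketch only: the sample text is an arbitrary two-character Chinese string I chose, not the actual bytes that tripped up the script, and `try_decode` is a hypothetical helper. Note also that `sys.setdefaultencoding` exists only on Python 2; it was removed in Python 3, where this workaround neither applies nor is needed.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None when the bytes do not fit the codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Two Chinese characters encoded the way a Chinese-locale Windows would
# typically store them (GBK); every byte is above 0x7F.
SAMPLE = u"\u7528\u6237".encode("gbk")

if __name__ == "__main__":
    # ASCII only covers bytes 0x00-0x7F, so decoding fails and None is printed.
    print(try_decode(SAMPLE, "ascii"))
    # The matching codec recovers the original text.
    print(try_decode(SAMPLE, "gbk") == u"\u7528\u6237")
```

Python 2 falls back to the ASCII codec whenever it has to turn a byte string into text implicitly, which is presumably what happens here when non-ASCII system strings are read.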
Installing Scrapy's dependencies
Scrapy's dependencies:
Install lxml-3.2.4.win32-py2.7.exe (on a 64-bit system install lxml-3.2.4.win-amd64-py2.7.exe)
Install pywin32-218.win32-py2.7.exe (on a 64-bit system install pywin32-218.win-amd64-py2.7.exe)
Install Twisted-13.2.0.win32-py2.7.exe (on a 64-bit system install Twisted-13.2.0.win-amd64-py2.7.exe)
Install pyOpenSSL-0.13.1.win32-py2.7.exe (on a 64-bit system install pyOpenSSL-0.13.1.win-amd64-py2.7.exe)
Copy zope.interface-4.0.5-py2.7-win32.egg into the C:\Python27\Scripts directory and run $ easy_install.exe zope.interface-4.0.5-py2.7-win32.egg
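Picking the wrong installer (32-bit versus 64-bit, or the wrong Python version) is an easy mistake with the .exe files listed above. The sketch below parses such a filename and compares it with the running interpreter. It assumes the `name-version.platform-pyX.Y.exe` naming pattern seen in this list; `parse_installer_name` and `matches_interpreter` are hypothetical helpers.

```python
import re
import struct
import sys

# Pattern for names such as "lxml-3.2.4.win-amd64-py2.7.exe".
_INSTALLER_RE = re.compile(
    r"^(?P<name>.+?)-(?P<version>[^-]+)\.(?P<platform>win32|win-amd64)"
    r"-py(?P<py>\d+\.\d+)\.exe$"
)


def parse_installer_name(filename):
    """Split an installer filename into (name, version, platform, python)."""
    match = _INSTALLER_RE.match(filename)
    if match is None:
        return None
    return match.group("name", "version", "platform", "py")


def matches_interpreter(filename):
    """Return True when the installer targets the running interpreter."""
    parsed = parse_installer_name(filename)
    if parsed is None:
        return False
    _, _, plat, py = parsed
    wanted_platform = "win-amd64" if struct.calcsize("P") * 8 == 64 else "win32"
    wanted_py = "%d.%d" % sys.version_info[:2]
    return plat == wanted_platform and py == wanted_py


if __name__ == "__main__":
    print(parse_installer_name("pywin32-218.win-amd64-py2.7.exe"))
    print(matches_interpreter("pywin32-218.win-amd64-py2.7.exe"))
```

The check only looks at the filename, so it says nothing about whether the installer itself is intact.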
To verify that Scrapy's dependencies were installed successfully:
In cmd, run $ python to enter the Python console
Run import lxml; if no error is reported, lxml was installed successfully
Run import twisted; if no error is reported, twisted was installed successfully
Run import OpenSSL; if no error is reported, OpenSSL was installed successfully
Run import zope.interface; if no error is reported, zope.interface was installed successfully
In short: run $ python in cmd, then import the module (for example import lxml); if no error is reported, that module was installed successfully.
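The import checks above can be run in one go. A small sketch, with `check_imports` as a hypothetical helper; it only catches ImportError, so a module that fails for another reason will still raise.

```python
import importlib


def check_imports(module_names):
    """Map each module name to None on success or to the import error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            results[name] = str(exc)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # The four modules checked by hand in the steps above.
    report = check_imports(["lxml", "twisted", "OpenSSL", "zope.interface"])
    for name, error in sorted(report.items()):
        print(name, "OK" if error is None else "FAILED: " + error)
```

Any entry marked FAILED names the dependency that still needs to be installed.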
Installing Scrapy
Method 1: type easy_install scrapy in the console
Method 2: unpack Scrapy-0.22.2.tar.gz and run $ python setup.py install in its directory to install Scrapy.
To check whether Scrapy was installed successfully, run $ scrapy in the cmd console; if no error is reported, the installation succeeded.
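The same check can be scripted. This sketch assumes Python 3.3 or newer, where `shutil.which` is available; `scrapy_status` is a hypothetical helper that reports whether the scrapy command is on PATH and whether the scrapy package can be found by the current interpreter.

```python
import importlib.util
import shutil


def scrapy_status():
    """Report whether the scrapy command and package are both reachable."""
    return {
        # True when a "scrapy" executable is found on PATH.
        "command_on_path": shutil.which("scrapy") is not None,
        # True when this interpreter can locate the scrapy package.
        "module_importable": importlib.util.find_spec("scrapy") is not None,
    }


if __name__ == "__main__":
    print(scrapy_status())
```

If the module is importable but the command is not on PATH, the Scripts directory is most likely missing from the PATH environment variable.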
That is all for this article; I hope it is a useful reference for anyone who needs it.