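Environment: Windows 10 + Python 3.8 + IDEA
Install from the command line: pip3 install pyspider
If no errors are reported, start it with: pyspider all
Open http://localhost:5000/ in a browser (the 5000 must match the number in the red box in the image below). If you see the following interface, the startup succeeded!
I ran into two errors along the way.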
1. Wheel error
The following error is common on Windows:
Command "python setup.py egg_info" failed with error code 10 in
Solution:
1) Install the wheel package: pip3 install wheel
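2) Download the other dependency, pycurl, from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install it.
Open www.lfd.uci.edu/~gohlke/pythonlibs/ and press Ctrl + F to search for pycurl.
The file name has the form pycurl-<version>-<your Python version (for Python 3.4, that is cp34)>-<win32 or win_amd64, for 32-bit or 64-bit>.
Pick the file that matches your setup and download it.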
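3) Install the prebuilt package: on the command line, run pip install followed by the location of the .whl file you downloaded (for example d:\pycurl-7.43.1-cp34-cp34m-win_amd64.whl). In my case: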
pip3 install F:\各种浏览器下载\谷歌浏览器下载\pycurl-7.43.0.3-cp37-cp37m-win_amd64.whl
4) Continue with the installation: pip install pyspider
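To pick the right pycurl file in step 2, it helps to know which tags to look for in the file name. The following is a minimal sketch, not part of pyspider or pip; the function name expected_wheel_tags is made up for illustration. It assumes the running interpreter is the one pyspider will be installed into, and it derives the bitness from the interpreter itself, since a 32-bit Python on 64-bit Windows still needs the win32 file.

```python
import struct
import sys


def expected_wheel_tags(version_info=None, pointer_bits=None):
    """Return (python_tag, platform_tag) to look for in a Windows wheel name.

    Hypothetical helper for illustration only.
    """
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in the running interpreter: 32 or 64 bits.
        pointer_bits = struct.calcsize("P") * 8
    # CPython 3.4 -> "cp34", CPython 3.8 -> "cp38", and so on.
    python_tag = "cp{}{}".format(version_info[0], version_info[1])
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    py_tag, plat_tag = expected_wheel_tags()
    print("Look for a file name containing {} and {}".format(py_tag, plat_tag))
```

If the tags in the downloaded file name do not match what this prints, pip will most likely refuse the wheel as unsupported on this platform.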
2. Startup errors
Since Python 3.7, async is a reserved keyword and can no longer be used as a parameter name.
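The effect can be reproduced without pyspider at all. The snippet below is a small sketch: the fetch function in it is invented for illustration and only mimics the shape of a call that takes an async parameter. It asks the running interpreter to compile that source and reports whether compilation succeeds.

```python
import sys

# Invented example source; it only imitates a function with an `async` parameter.
SNIPPET = "def fetch(url, async=False):\n    return url\n"


def compiles(source):
    """Return True if the source compiles on the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Python version:", sys.version_info[:2])
    print("snippet with async parameter compiles:", compiles(SNIPPET))
```

On Python 3.7 or later the snippet is rejected with a SyntaxError, which is the same kind of failure pyspider 0.3.10 hits when its modules are imported.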
[root@localhost python]# pyspider all
[W 180629 07:08:26 run:413] phantomjs not found, continue running without it.
[I 180629 07:08:29 result_worker:49] result_worker starting...
[I 180629 07:08:31 processor:211] processor starting...
[I 180629 07:08:31 tornado_fetcher:638] fetcher starting...
[I 180629 07:08:31 scheduler:675] scheduler starting...
[I 180629 07:08:31 scheduler:614] in 5m: new:0,success:0,retry:0,failed:0
[I 180629 07:08:31 scheduler:810] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180629 07:08:32 app:84] webui exiting...
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.5/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 754, in main
    cli()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 122, in __init__
    _check_config(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 104, in _check_config
    raise ValueError("Invalid configuration:\n - " + "\n - ".join(errors))
ValueError: Invalid configuration:
 - Deprecated option 'dir_browser.enable': use 'middleware_stack' instead.
 - Deprecated option 'domaincontroller': use 'domain_controller' instead.
ImportError: cannot import name 'CurlasyncHTTPClient' from 'tornado.curl_httpclient' (/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado/curl_httpclient.py)
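There are two solutions: 1. Replace the keyword. 2. Downgrade Python.
I chose the first one and replaced the keyword.
In run.py, tornado_fetcher.py and webui/app.py, press Ctrl+F, search for async, and replace it with some other word, for example shark.
When replacing in bulk, tick both "Words" (whole-word match) and "Match case", so that you do not replace occurrences you should leave alone, such as identifiers that merely contain the letters async. The files are: 1 ...site-packages/pyspider/run.py 2 .../site-packages/pyspider/fetcher/tornado_fetcher.py, plus webui/app.py. In each of them, replace async with shark.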
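The same whole-word, case-sensitive replacement can be sketched in Python with a regular expression. This is only an illustration of what the two checkboxes do; the function name rename_async is made up, and the sample line is invented rather than copied from pyspider.

```python
import re

# \b marks a word boundary, so only the standalone word "async" matches.
# No re.IGNORECASE flag is passed, so the match is case-sensitive.
ASYNC_WORD = re.compile(r"\basync\b")


def rename_async(source, new_name="shark"):
    """Replace the standalone word `async` in a string with `new_name`."""
    return ASYNC_WORD.sub(new_name, source)


if __name__ == "__main__":
    # Invented sample line containing both kinds of occurrence.
    line = "self.async = async  # uses CurlAsyncHTTPClient, asynchronous"
    print(rename_async(line))
```

Names such as CurlAsyncHTTPClient and words such as asynchronous are left alone, because they either differ in case or continue past the word boundary. Note that a text-level replacement like this also touches comments and strings, so back up the files before applying it to them.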
Run pyspider all again. This time it fails with:
ValueError: Invalid configuration: - Deprecated option 'domaincontroller': use 'http_authenticator
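In the directory where packages are installed, find the pyspider package, open webdav.py in its webui folder, and modify line 209.
Change
'domaincontroller': NeedAuthController(app),
to
'http_authenticator':{ 'HTTPAuthenticator':NeedAuthController(app), },
Note the comma after the closing brace. Leaving it out cost me a whole afternoon of troubleshooting.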
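Why that one comma matters can be shown with a toy dictionary. The sketch below is not the real webdav.py: the value 'placeholder' stands in for NeedAuthController(app), and the neighbouring 'verbose' entry is an assumption about what follows in the configuration. It parses the same text with and without the comma after the nested closing brace.

```python
import ast

# Toy configuration; 'placeholder' stands in for NeedAuthController(app),
# and the 'verbose' entry is assumed for illustration.
WITH_COMMA = """
config = {
    'http_authenticator': {
        'HTTPAuthenticator': 'placeholder',
    },
    'verbose': 1,
}
"""

# Same text with the comma after the nested closing brace removed.
WITHOUT_COMMA = WITH_COMMA.replace("},\n    'verbose'", "}\n    'verbose'")


def parses(source):
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("with comma parses:", parses(WITH_COMMA))
    print("without comma parses:", parses(WITHOUT_COMMA))
```

Whenever another entry follows the nested dictionary, dropping the comma makes the whole file a syntax error, so the module cannot even be imported.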
In the same pyspider package, open app.py in the webui folder and modify line 95.
Change
'fetch': lambda x: tornado_fetcher.Fetcher(None, None, async=False).fetch(x),
to
'fetch': lambda x: tornado_fetcher.Fetcher(None, None, shark=False).fetch(x),
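Start it again:
This starts all pyspider components, including PhantomJS, ResultWorker, Processor, Fetcher, Scheduler and WebUI, all of which pyspider needs in order to run. The last line of output shows WebUI running on port 5000. Open a browser and go to http://localhost:5000 to see the start page.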
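If the page does not load, a quick way to tell whether anything is listening on the port at all is a plain TCP connection attempt. This is a minimal sketch using only the standard library; the function name port_is_open is made up, and the defaults assume the host and port mentioned above.

```python
import socket


def port_is_open(host="localhost", port=5000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("WebUI port reachable:", port_is_open())
```

A True result only means that some process accepts connections on that port. It does not prove that the process is pyspider's WebUI, so the browser check is still the real test.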
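Summary:
On Python 3.8, import the keyword module and you can get a list of reserved words (keyword.kwlist). None of them can be used as a parameter name, and of course they cannot be used as variable names either.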
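A short sketch of that check follows. keyword.kwlist and keyword.iskeyword come from the standard library; the helper is_safe_parameter_name is a made-up name used only for illustration.

```python
import keyword


def is_safe_parameter_name(name):
    """True if `name` is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


if __name__ == "__main__":
    # The full list of reserved words for the running interpreter.
    print(keyword.kwlist)
    for candidate in ("async", "shark"):
        print(candidate, "usable as a parameter name:", is_safe_parameter_name(candidate))
```

Running a candidate name through a check like this before a bulk rename avoids swapping one reserved word for another.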