Installing and starting pyspider

Peter_Gao_, published 2019-11-28 09:06:24

Environment: Windows 10 + Python 3.8 + IDEA

Install from the command line: pip3 install pyspider

If the install finishes without errors, start pyspider: pyspider all
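On a machine with several Python versions, pip3 may belong to a different interpreter than the one that later runs pyspider. A minimal sketch of one way around that, assuming you want to drive pip from a script: the helper names pip_install_command and install are made up for this illustration, and only the standard "python -m pip install" invocation is used.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.run(pip_install_command(package)).returncode


# Example (not executed here): install("pyspider")
```

Calling install("pyspider") installs into the same interpreter that executes the script, which avoids a mismatch between pip3 and python.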

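Open http://localhost:5000/ in a browser (the port, 5000 here, must match the number marked with a red box in the screenshot below). If you see the following page, pyspider started successfully.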
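The same check can be scripted instead of done in a browser. The sketch below assumes the WebUI answers plain HTTP on the default address used in this post. The function name webui_is_up is hypothetical, and the check only confirms that some HTTP server responds, not that it is pyspider.

```python
import http.client
import urllib.request


def webui_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if an HTTP server answers at url with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # (urllib's HTTPError and URLError are OSError subclasses),
        # and malformed responses.
        return False


if __name__ == "__main__":
    print("WebUI reachable:", webui_is_up())
```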

 

I ran into two errors along the way.

1. Wheel error

On Windows, the following error often appears:

Command "python setup.py egg_info" failed with error code 10 in

Solution:

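1) Install the wheel dependency: pip3 install wheel

2) Download the other dependency, pycurl, from https://www.lfd.uci.edu/~gohlke/pythonlibs/ and install it.

Open www.lfd.uci.edu/~gohlke/pythonlibs/ and press Ctrl + F to search for pycurl.

The file name follows the pattern pycurl-<version>-<Python version tag>-<win32 or win_amd64> (for Python 3.4 the tag is cp34). Download the file that matches your setup.

3) Install the downloaded wheel: pip install <path to the downloaded .whl file>, for example d:\pycurl-7.43.1-cp34-cp34m-win_amd64.whl. In my case:

pip3 install F:\各种浏览器下载\谷歌浏览器下载\pycurl-7.43.0.3-cp37-cp37m-win_amd64.whl

4) Install pyspider again: pip install pyspider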
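Picking the right pycurl file from the download page means knowing two tags for your interpreter. The sketch below prints them. It assumes a CPython interpreter on Windows, and the function names python_tag and windows_platform_tag are made up for this example. Note that the platform tag follows the bitness of the Python interpreter, which can differ from the bitness of the operating system.

```python
import struct
import sys


def python_tag(version_info=sys.version_info):
    """Return the CPython tag used in wheel names, e.g. 'cp37' for Python 3.7."""
    return "cp{}{}".format(version_info[0], version_info[1])


def windows_platform_tag(pointer_bits=struct.calcsize("P") * 8):
    """Return 'win_amd64' for a 64-bit interpreter, 'win32' for a 32-bit one."""
    return "win_amd64" if pointer_bits == 64 else "win32"


# Prints the two tags to look for in the wheel file name.
print(python_tag(), windows_platform_tag())
```

A wheel whose name contains both printed tags is the one pip will accept for that interpreter.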

 

2. Startup error

Since Python 3.7, async is a reserved keyword and can no longer be used as a parameter name.
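The effect is easy to reproduce without pyspider. In this small sketch the built-in compile() is asked to parse a call that passes async as a keyword argument. The helper name parses is hypothetical, and Fetcher is only a name inside the parsed string, so nothing is imported or called.

```python
def parses(source):
    """Return True if source is a syntactically valid expression here."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


# On Python 3.7 and later the first call is rejected by the parser.
print(parses("Fetcher(None, None, async=False)"))
print(parses("Fetcher(None, None, shark=False)"))
```

Because the failure happens at parse time, the module that contains such a call cannot even be imported, which is why pyspider stops during startup.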

[root@localhost python]# pyspider all
[W 180629 07:08:26 run:413] phantomjs not found, continue running without it.
[I 180629 07:08:29 result_worker:49] result_worker starting...
[I 180629 07:08:31 processor:211] processor starting...
[I 180629 07:08:31 tornado_fetcher:638] fetcher starting...
[I 180629 07:08:31 scheduler:675] scheduler starting...
[I 180629 07:08:31 scheduler:614] in 5m: new:0,success:0,retry:0,failed:0
[I 180629 07:08:31 scheduler:810] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180629 07:08:32 app:84] webui exiting...
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.5/bin/pyspider", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 754, in main
    cli()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/webdav.py", line 216, in <module>
    dav_app = WsgiDAVApp(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 122, in __init__
    _check_config(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 104, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'dir_browser.enable': use 'middleware_stack' instead.
  - Deprecated option 'domaincontroller': use 'domain_controller' instead.

ImportError: cannot import name 'CurlasyncHTTPClient' from 'tornado.curl_httpclient' (/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/tornado/curl_httpclient.py)  

There are two ways to fix this: 1. rename the keyword; 2. downgrade Python.

I chose the first and renamed the keyword.

In run.py, tornado_fetcher.py, and webui/app.py, press Ctrl+F to find async and replace it with some other word, for example shark.

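When replacing in bulk, tick both "Words" (whole-word match) and "Match Case" so that you do not also replace occurrences of async that are part of other identifiers. The files containing async are:
1 ...site-packages/pyspider/run.py
2 .../site-packages/pyspider/fetcher/tornado_fetcher.py
In each, replace async with shark.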
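The whole-word, case-sensitive replacement described above can also be expressed as a regular expression. This is a sketch of the matching rule, not a tool the post used, and the function name rename_async is made up. It would equally rewrite the word async inside comments, strings, or an "async def", so review the changes before saving.

```python
import re

# \b marks a word boundary, so only the standalone word "async" matches.
# The pattern is case-sensitive by default.
ASYNC_WORD = re.compile(r"\basync\b")


def rename_async(source, replacement="shark"):
    """Replace the standalone word 'async' and leave longer identifiers alone."""
    return ASYNC_WORD.sub(replacement, source)


print(rename_async("Fetcher(None, None, async=False)"))
print(rename_async("from tornado.curl_httpclient import CurlAsyncHTTPClient"))
```

The second line is printed unchanged, because CurlAsyncHTTPClient contains "Async" only as part of a longer, differently cased name.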

Run pyspider all again. It now fails with: ValueError: Invalid configuration: - Deprecated option 'domaincontroller': use 'http_authenticator

Locate the installed pyspider package, open webdav.py in its webui folder, and edit line 209.

Change
    'domaincontroller': NeedAuthController(app),
to
    'http_authenticator':{
        'HTTPAuthenticator':NeedAuthController(app),
    },

Note the comma after the closing brace. Leaving it out cost me an afternoon of debugging.
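To make the shape of that edit explicit, here is a sketch that performs the same move on a plain dict. It only mirrors the key names used in this post and is not taken from WsgiDAV's documentation. The function name migrate_webdav_config is hypothetical, and the object() below stands in for NeedAuthController(app).

```python
def migrate_webdav_config(config):
    """Return a copy with 'domaincontroller' moved under 'http_authenticator'."""
    migrated = dict(config)
    if "domaincontroller" in migrated:
        controller = migrated.pop("domaincontroller")
        migrated["http_authenticator"] = {
            "HTTPAuthenticator": controller,
        }  # inside a dict literal, a comma must follow this brace before the next key
    return migrated


placeholder_controller = object()  # stand-in for NeedAuthController(app)
old_config = {"domaincontroller": placeholder_controller, "verbose": 1}
print(sorted(migrate_webdav_config(old_config)))
```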

In the same installed pyspider package, open app.py in the webui folder and edit line 95.

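Change
    'fetch': lambda x: tornado_fetcher.Fetcher(None, None, async=False).fetch(x),
to
    'fetch': lambda x: tornado_fetcher.Fetcher(None, None, shark=False).fetch(x),
The new keyword must be the same word you used when renaming async in tornado_fetcher.py (shark in this post).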

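Start it again:

This starts all pyspider components: PhantomJS, ResultWorker, Processor, Fetcher, Scheduler, and WebUI, all of which pyspider needs to run. The last line of the output shows the WebUI running on port 5000. Open http://localhost:5000 in a browser and the start page appears.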

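Summary:

On Python 3.8, import the keyword module to get the list of reserved words. None of the words in that list can be used as a parameter name, and none can be used as a variable name either.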
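The list mentioned in the summary comes from the standard-library keyword module. A short sketch follows. The wrapper name is_reserved is made up here, while keyword.kwlist and keyword.iskeyword are the real standard-library names.

```python
import keyword


def is_reserved(name):
    """Return True if name is a reserved word in the running interpreter."""
    return keyword.iskeyword(name)


# The full list of reserved words for this Python version.
print(keyword.kwlist)

# Check a candidate name before using it as a parameter or variable.
for candidate in ("async", "await", "shark"):
    print(candidate, is_reserved(candidate))
```

Running this on the interpreter that will host pyspider shows whether a name is safe to use there, since the list depends on the Python version.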

 

 

 
