Installing Splash
1. Install Docker (see: Installing Docker on Mac)
2. Install Splash
docker pull scrapinghub/splash  # install
docker run -p 8050:8050 scrapinghub/splash  # run
Test by visiting: http://localhost:8050/
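The same check can be scripted instead of opening a browser. This is a minimal sketch that assumes the container is published on localhost:8050 as above; `splash_is_up` is a hypothetical helper name, and it only confirms that something answers with HTTP 200 at that address, not that it is actually Splash.

```python
from urllib.request import urlopen


def splash_is_up(base_url="http://localhost:8050/", timeout=3):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts and HTTP error statuses all land here
        return False


if __name__ == "__main__":
    print("Splash reachable:", splash_is_up())
```

If this prints False right after `docker run`, the container may simply still be starting up.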
# Compare a plain requests fetch with a fetch rendered by Splash
import time

import requests
from scrapy import Selector


def timer(func):
    """Print how long each call to func takes."""
    def inner(*args):
        start = time.time()
        response = func(*args)
        print("time: %s" % (time.time() - start))
        return response
    return inner


@timer
def use_request(url):
    # Plain HTTP fetch: the JavaScript on the page is not executed
    return requests.get(url)


@timer
def use_splash(url):
    # Ask Splash's render.html endpoint to load the page and run its JavaScript
    splash_url = "http://localhost:8050/render.html"
    args = {
        "url": url,
        "timeout": 5,
        "images": 0,  # do not download images
    }
    return requests.get(splash_url, params=args)


if __name__ == '__main__':
    url = "http://quotes.toscrape.com/js/"

    r1 = use_request(url)
    sel1 = Selector(text=r1.text)
    text = sel1.css(".quote .text::text").extract_first()
    print(text)

    r2 = use_splash(url)
    sel2 = Selector(text=r2.text)
    text = sel2.css(".quote .text::text").extract_first()
    print(text)
"""
time: 0.632809877396
None
time: 0.685022830963
“The world as we have created it is a process of our thinking.
It cannot be changed without changing our thinking.”
"""
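The test shows that this page has to be rendered by Splash before the data can be extracted: the plain request returns None, while the request through Splash returns the quote. Splash was also fast here, adding only about 0.05 s in this run.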
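A single timing per method is noisy, so the speed comparison is easier to trust when each fetch is repeated a few times. The sketch below is one way to do that; `average_seconds` is a hypothetical helper, and the `time.sleep` call in the demo is only a stand-in workload.

```python
import time
from statistics import mean


def average_seconds(func, *args, repeat=3, **kwargs):
    """Call func `repeat` times and return the mean wall-clock duration."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


if __name__ == "__main__":
    # Stand-in workload; pass the real fetch functions to compare them
    print("mean seconds: %.4f" % average_seconds(time.sleep, 0.01, repeat=3))
```

To compare the two approaches, pass undecorated versions of the fetch functions (without `@timer`), otherwise every repetition also prints its own timing line.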
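Description of the args parameters:
url: address of the page to render
timeout: timeout for the render, in seconds
proxy: proxy to use
wait: time to wait for rendering, in seconds
images: whether to download images; defaults to 1 (download)
js_source: JavaScript code to execute in the page before it is rendered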
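Several of these parameters can be combined in one request. The following is a minimal sketch, assuming Splash is listening on localhost:8050 as in the setup above; `build_render_url` is a hypothetical helper, not part of Splash, and the default values (a wait of 0.5 s, a timeout of 10 s) are illustrative only.

```python
from urllib.parse import urlencode

# Assumes the Docker setup described earlier (port 8050 published locally)
SPLASH_RENDER_HTML = "http://localhost:8050/render.html"


def build_render_url(page_url, wait=0.5, timeout=10, images=0, proxy=None):
    """Build a render.html URL (hypothetical helper, not part of Splash)."""
    params = {
        "url": page_url,     # page to render
        "wait": wait,        # seconds to wait after the page has loaded
        "timeout": timeout,  # seconds before the render is abandoned
        "images": images,    # 0 = skip images, 1 = download them
    }
    if proxy is not None:
        params["proxy"] = proxy  # only sent when a proxy is requested
    return SPLASH_RENDER_HTML + "?" + urlencode(params)


if __name__ == "__main__":
    # Prints the URL only; fetch it with requests.get() to get the rendered HTML
    print(build_render_url("http://quotes.toscrape.com/js/"))
```

Keeping `wait` well below `timeout` leaves Splash time to finish the render before the request is cut off.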
Reference: "Scrapy-Splash: Introduction, Installation, and Examples"