XPath Summary Notes

Author: 默默爬行的虫虫 | Published: 2022-07-05 00:22:16 | Views: 4

Next, let's go through the details of extraction. First, load the library:

from lxml import etree

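To extract a tag's text content, use /text() and take the first item of the returned list with [0], or use string(.), which returns a single string directly. To extract an attribute value, use /@attribute_name. * matches any element node, @* matches any attribute, and node() matches a node of any type.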
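The expressions above can be tried on a small document. The following is a minimal sketch; the sample markup and all variable names are made up for illustration and do not come from the original post.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
doc = etree.HTML(
    "<div id='box' class='note'>"
    "<a href='/page/1' title='first'>Link <b>one</b></a>"
    "<!-- a comment -->"
    "</div>"
)

# /text() returns a list of the element's direct text nodes; [0] is Python list indexing.
link_text = doc.xpath("//a/text()")[0]

# string(...) returns one str with the text of all descendants concatenated.
full_text = doc.xpath("string(//a)")

# /@name returns a list of attribute values.
href = doc.xpath("//a/@href")[0]

# @* matches every attribute of the context element.
div_attrs = doc.xpath("//div/@*")

# * matches element children only; node() also matches text and comment nodes.
element_children = doc.xpath("//div/*")
all_children = doc.xpath("//div/node()")

print(repr(link_text), repr(full_text), href, div_attrs)
print(len(element_children), len(all_children))
```

Note that /text() only sees the text directly inside the element, while string() also collects text from nested tags, so the two can differ whenever the element has children.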


1. HTML parsing workflow

import random
from time import sleep

import requests
from lxml import etree

# shuzi_01 (the project ID) and header (the request headers) are assumed
# to be defined earlier in the script.
url_02 = 'https://www.qdfd.com.cn/qdweb/realweb/fh/FhProjectInfo.jsp'
data_02 = {'projectID': shuzi_01}
response_02 = requests.post(url_02, data=data_02, headers=header)
# Check whether the request succeeded
if response_02.status_code == 200:
    response_02.encoding = 'GBK'
    sleep(random.uniform(0.2, 0.3))  # wait a random 0.2 to 0.3 seconds

    html_02 = etree.HTML(response_02.text)

    # Full path, for reference:
    # /html/body/div[1]/div[2]/ul[2]/table[2]/tbody/tr[position()>1]/td[2]/a
    shuzi_2 = html_02.xpath('/html/body/div[1]/div[2]/ul[2]//tr[position()>1]/td[2]/a')
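The query above uses // before tr instead of spelling out table[2]/tbody. One likely reason is that browsers insert tbody into the rendered DOM, while lxml parses the raw HTML as sent. The sketch below illustrates this on a hypothetical inline table, so it runs without a network request; the markup, hrefs, and variable names are assumptions, not data from the real site.

```python
from lxml import etree

# Hypothetical table standing in for the real response body.
page = """
<html><body>
<table>
  <tr><th>No.</th><th>Building</th></tr>
  <tr><td>1</td><td><a href="/b?id=101">Building A</a></td></tr>
  <tr><td>2</td><td><a href="/b?id=102">Building B</a></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(page)

# The raw HTML has no tbody, so a path that includes it matches nothing.
with_tbody = tree.xpath("/html/body/table/tbody/tr[position()>1]/td[2]/a")

# // before tr skips any intermediate levels; position()>1 drops the header row.
links = tree.xpath("/html/body/table//tr[position()>1]/td[2]/a")

# Evaluate relative expressions against each matched element.
rows = [(a.xpath("string(.)"), a.xpath("./@href")[0]) for a in links]

print(with_tbody)
print(rows)
```

Each element returned by xpath() can itself be queried, which keeps the per-row extraction separate from the expression that locates the rows.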

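# The markup in this string was lost; title and p class='first' are restored
# from the queries below, while the ul/li list markup is an assumption.
a = '''<title>Title</title>
<body>
    <ul>
        <li>List 1, item 1</li>
        <li>List 1, item 2</li>
    </ul>
    <p class='first'>Text 1</p>
    <p>Text 2</p>
    <ul>
        <li>List 2, item 1</li>
        <li>List 2, item 2</li>
    </ul>
</body>'''

from lxml import etree
html = etree.HTML(a)
html.xpath('//title/text()')[0]  # 'Title'
html.xpath("//p[@class='first']//text()")[0]  # 'Text 1'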
html.xpath(


    
        
        
        