
38 Web Scraping - Traversing the Document Tree with BeautifulSoup4

杨林伟  Published: 2019-08-30 09:25:27

1. Direct child nodes: the .contents and .children attributes

1.1 .contents

A tag's .contents attribute returns the tag's direct children as a list:

print(soup.head.contents)
# [<title>The Dormouse's story</title>]

The result is a list, so you can use a list index to get a single element:

print(soup.head.contents[0])
# <title>The Dormouse's story</title>
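A minimal, self-contained sketch of .contents, assuming the bs4 package is installed. The HTML is a small hypothetical document modeled on the "three sisters" page, since the full HTML used in this article is not shown here. It checks that .contents is an ordinary Python list that supports len() and indexing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, a trimmed version of the "three sisters" page
html = ("<html><head><title>The Dormouse's story</title></head>"
        "<body><p class='title'><b>The Dormouse's story</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")

children = soup.head.contents   # a plain Python list of <head>'s direct children
print(type(children))           # <class 'list'>
print(len(children))            # 1
print(children[0])              # <title>The Dormouse's story</title>
print(children[0].name)         # title
```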
1.2 .children

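.children does not return a list, but you can loop over it to get all the direct child nodes.

If we print .children, we can see that it is an iterator object rather than a list: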

print(soup.head.children)
# <list_iterator object at 0x...>  (the exact repr depends on the Python and bs4 versions)

for child in soup.body.children:
    print(child)

Output:

The Dormouse's story

Once upon a time there were three little sisters; and their names were
,
Lacie and
Tillie;
and they lived at the bottom of a well.

...
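The sketch below, using a hypothetical body with line breaks between its tags (bs4 assumed installed), shows two practical consequences of .children being an iterator: it yields the same nodes as .contents, and it can only be consumed once. It also shows that the line breaks between tags come back as text nodes, which is why printed output like the one above contains blank lines. Filtering with isinstance(child, Tag), where Tag is imported from bs4.element, keeps only the element children.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample: the newlines between the <p> tags become "\n" text nodes
html = """<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">...</p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

kids = soup.body.children                  # an iterator, not a list
print(list(kids) == soup.body.contents)    # True: same nodes, same order
print(list(kids))                          # []: the iterator is already used up

# Keep only element children, skipping the "\n" strings between tags
tags = [child for child in soup.body.children if isinstance(child, Tag)]
print([t.name for t in tags])              # ['p', 'p']
```

If you need to walk the children more than once, either call .children again or use the .contents list directly.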

2. All descendant nodes: the .descendants attribute

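The .contents and .children attributes contain only a tag's direct children. The .descendants attribute recursively iterates over all of a tag's descendants (its children, their children, and so on) in document order. Like .children, it is not a list, so you have to loop over it to get its contents. Because it yields both tags and the strings inside them, the same text appears several times in the output below: once as part of each enclosing tag that is printed, and once as the string node itself.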

for child in soup.descendants:
    print(child)

Output:


The Dormouse's story

The Dormouse's story
Once upon a time there were three little sisters; and their names were
,
Lacie and
Tillie;
and they lived at the bottom of a well.
...

The Dormouse's story
The Dormouse's story
The Dormouse's story



The Dormouse's story
Once upon a time there were three little sisters; and their names were
,
Lacie and
Tillie;
and they lived at the bottom of a well.
...



The Dormouse's story
The Dormouse's story
The Dormouse's story


Once upon a time there were three little sisters; and their names were
,
Lacie and
Tillie;
and they lived at the bottom of a well.
Once upon a time there were three little sisters; and their names were


 Elsie 
,

Lacie
Lacie
 and

Tillie
Tillie
;
and they lived at the bottom of a well.


...
...
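To make the recursion concrete, here is a hand-written generator that should produce the same sequence as .descendants: each child is yielded first, then, if it is a tag, everything inside it. walk is a hypothetical helper for illustration, not part of bs4, and the HTML is a small made-up sample (bs4 assumed installed).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document
html = "<head><title>The Dormouse's story</title></head><p><b>Hi</b> there</p>"
soup = BeautifulSoup(html, "html.parser")

def walk(node):
    """Hypothetical helper: yield every descendant of node, depth-first, in document order."""
    for child in node.children:
        yield child
        if isinstance(child, Tag):   # only tags can have children of their own
            yield from walk(child)

print(list(walk(soup)) == list(soup.descendants))   # True

# .descendants mixes tags and strings; label each node by its kind
for node in soup.descendants:
    if isinstance(node, Tag):
        print("tag: ", node.name)
    else:
        print("text:", repr(str(node)))
```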

3. Node content: the .string attribute

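If a tag has exactly one child and that child is a NavigableString, the tag's .string attribute returns that child. If a tag has exactly one child and that child is another tag, the tag's .string returns the same result as that child's .string. If a tag has more than one child, .string is None, because it is not clear which child it should refer to.

Put simply: if a tag contains no other tags, .string returns the text inside it; if it contains exactly one tag, .string returns the innermost text. For example: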

print(soup.head.string)
# The Dormouse's story
print(soup.title.string)
# The Dormouse's story
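A small sketch of the three cases described above, using a hypothetical snippet (bs4 assumed installed): a tag whose only child is a string, a tag whose only child is another tag, and a tag with several children, where .string is None.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet covering the three .string cases
html = ("<head><title>The Dormouse's story</title></head>"
        "<p class='story'>Once upon a time <b>there</b> were...</p>")
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)   # The Dormouse's story  (single NavigableString child)
print(soup.head.string)    # The Dormouse's story  (single child tag, so <title>'s .string is used)
print(soup.p.string)       # None  (three children: text, <b>, text)
print(soup.b.string)       # there
```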