GitHub: https://github.com/codelucas/newspaper
Install: pip3 install newspaper3k
Code example:
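# -*- coding: utf-8 -*-
from newspaper import Article

# Article is meant for the URL of a single news article; this is Sina's
# homepage, so the parsed title and text describe the homepage itself.
url = "https://news.sina.com.cn/"
# language='zh' selects newspaper3k's Chinese stopwords/tokenizer for text extraction
article = Article(url, language='zh')
article.download()  # fetch the page HTML
article.parse()     # extract title, authors, date, main image and body text
print(article.title)
print(article.authors)       # list of author names; may be empty
print(article.publish_date)  # datetime, or None if no date was found
print(article.top_image)     # URL of the main image
print(article.text[:50])     # first 50 characters of the body text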
The parsed results largely match what the news page displays, so for simple news processing this should be good enough.
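When a page does not expose them, `authors` comes back as an empty list and `publish_date` as `None`, so downstream code should not assume these fields are filled. Below is a minimal sketch, assuming you want to copy the fields printed above into a plain record for further processing. `NewsRecord` and `to_record` are hypothetical helper names, not part of newspaper3k; the function only reads the `url`, `title`, `authors`, `publish_date`, `top_image` and `text` attributes, so it also accepts a stand-in object, which is how the demo runs without network access.

```python
from dataclasses import dataclass
from datetime import datetime
from types import SimpleNamespace
from typing import List, Optional


@dataclass
class NewsRecord:
    """Hypothetical flat record for the fields printed in the example above."""
    url: str
    title: str
    authors: List[str]
    publish_date: Optional[str]  # ISO 8601 string, or None when unknown
    top_image: Optional[str]     # None instead of an empty string
    snippet: str


def to_record(article, snippet_len: int = 50) -> NewsRecord:
    """Copy parsed fields out of an Article-like object, normalizing blanks.

    Any object with url, title, authors, publish_date, top_image and text
    attributes works, e.g. a newspaper3k Article after download() and parse().
    """
    date = article.publish_date
    return NewsRecord(
        url=article.url,
        title=(article.title or "").strip(),
        authors=list(article.authors or []),
        # Only a real datetime is kept; None or '' become None
        publish_date=date.isoformat() if isinstance(date, datetime) else None,
        top_image=article.top_image or None,
        snippet=(article.text or "")[:snippet_len],
    )


# Demo with a stand-in object (made-up values, not real Sina output)
fake = SimpleNamespace(
    url="https://example.com/news/1",
    title="  Example headline ",
    authors=[],
    publish_date=None,
    top_image="",
    text="Body text of the article goes here.",
)
record = to_record(fake)
print(record)
```

With a real page, call `to_record(article)` after `article.parse()` in the example above; keeping the conversion separate from the download makes it easy to test without hitting the site.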