您当前的位置：首页 > 彭世瑜 Python

Python爬虫：tesseract识别图片验证码

彭世瑜发布时间：2018-04-28 14:31:38 ，浏览量：4

安装tesseract

mac环境下：

$ brew install tesseract

测试

$ tesseract -v
tesseract 3.05.01

直接使用

$ tesseract test.png output #识别test.png的图片，把结果放到output.txt中

图片这里写图片描述

识别结果

Hello world!
1234

python接口

安装模块

 $ pip install pillow
 $ pip install pytesseract

代码实现

# -*- coding: utf-8 -*-

from PIL import Image
import pytesseract

img = Image.open("tesseract_demo.png")
img = img.convert("L")  # 转为黑白图
ret = pytesseract.image_to_string(img)
print(ret)

"""
Hello world!
1234
"""

如果没有识别出内容，使用命令行方式识别，看下报错如果报错 empty page, 加一些空白就可以

# -*- coding: utf-8 -*-

from PIL import Image
import pytesseract

img = Image.open("tesseract_demo.png")

# 新建一个比原图大一倍左右的白色画布，将原图贴到白色背景图中
image = Image.new("RGB", (300, 200), "white")
image.paste(img, (50, 50))

# 接下来要使用新合成的图
image = image.convert("L")  # 转为黑白图
ret = pytesseract.image_to_string(image)
print(ret)

参考: OCR Tesseract 识别报 empty page解决办法

识别中文

需要下载中文包： https://pan.baidu.com/s/10dP9ZJMnwX4yOVz6K4jykQ#list/path=%2F 密码 v13f 注意报错的路径，比如mac下：

/usr/local/Cellar/tesseract/3.05.01/share/tessdata/chi_sim.traineddata

把下载的文件chi_sim.traineddata 拷贝到这个目录下即可

import pytesseract
import cv2


def ocr_text(image_name):
    img = cv2.imread(image_name)
    gray = cv2.cvtColor(src=img, code=cv2.COLOR_BGR2GRAY)
    ret, th = cv2.threshold(src=gray.copy(), thresh=236, maxval=255, type=0)
    
    result = pytesseract.image_to_string(th, lang='chi_sim')
    return result


if __name__ == '__main__':
    image = "out-9-22.png"
    print(ocr_text(image))

识别效果还可以 cv2模块也可以用 PIL 替代

参考

Mac上tesseract-OCR的安装配置
python+pytesseract 中文识别

关注

打赏

1688896170

查看更多评论

暂无认证

4浏览

0关注

2727博文

0收益
0浏览

0点赞

0打赏

0留言

私信

关注

热门博文

[ 申请 ]友情链接：

传奇私服南島屋 My命理学快连vpn 快连vpn 搜外友链笔趣阁爱思助手 ClashX教程绘画宝宝配音宝宝

立即登录/注册

微信扫码登录

基本文件流程错误 SQL 调试

/www/wwwroot/www.chaojiit.com/index.php ( 1.30 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/ThinkPHP.php ( 4.71 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Think.class.php ( 12.32 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Storage.class.php ( 1.38 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Storage/Driver/File.class.php ( 3.56 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Mode/common.php ( 2.82 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Common/functions.php ( 51.07 KB )
/www/wwwroot/www.chaojiit.com/Application/Common/Common/function.php ( 6.83 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Hook.class.php ( 4.02 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/App.class.php ( 12.44 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Dispatcher.class.php ( 15.15 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Route.class.php ( 13.38 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Controller.class.php ( 10.95 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/View.class.php ( 7.96 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/BuildLiteBehavior.class.php ( 3.69 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/ParseTemplateBehavior.class.php ( 3.89 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/ContentReplaceBehavior.class.php ( 1.93 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Conf/convention.php ( 11.18 KB )
/www/wwwroot/www.chaojiit.com/Application/Common/Conf/config.php ( 1.81 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Lang/zh-cn.php ( 2.57 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Conf/debug.php ( 1.51 KB )
/www/wwwroot/www.chaojiit.com/Application/Home/Conf/config.php ( 0.05 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/ReadHtmlCacheBehavior.class.php ( 5.62 KB )
/www/wwwroot/www.chaojiit.com/Application/Home/Controller/ArticleController.class.php ( 6.84 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Model.class.php ( 67.27 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Db.class.php ( 5.70 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Db/Driver/Mysql.class.php ( 8.73 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Db/Driver.class.php ( 41.60 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Cache.class.php ( 3.84 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Cache/Driver/File.class.php ( 5.90 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Template.class.php ( 28.35 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Template/TagLib/Cx.class.php ( 22.62 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Think/Template/TagLib.class.php ( 9.19 KB )
/www/wwwroot/www.chaojiit.com/Application/Runtime/Cache/Home/3c8a1a47a3534a7b1252c226abfc3928.php ( 15.07 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/WriteHtmlCacheBehavior.class.php ( 1.43 KB )
/www/wwwroot/www.chaojiit.com/ThinkPHP/Library/Behavior/ShowPageTraceBehavior.class.php ( 5.27 KB )

0.0478s

ShowPageTrace