First, here is the official project page, which also serves as the download reference:
https://github.com/tesseract-ocr/tesseract
Installing Tesseract
You can either install Tesseract via a pre-built binary package or build it from source.
A C++ compiler with good C++17 support is required for building Tesseract from source.
To start, I simply downloaded a pre-built binary to try it out. The documentation lists several places to download from:
Downloads | tessdoc
Some older versions can be downloaded from SourceForge (Downloads Archive on SourceForge). I went straight to UB Mannheim:
Home · UB-Mannheim/tesseract Wiki · GitHub
Index of /tesseract
Looking through the index, the latest binary release at the time was this one:
https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.1.20220118.exe
I downloaded it and, after unpacking, placed it here:
cd D:\open\tesseract501
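Before running OCR, it can help to confirm which build actually ended up in that folder. The following is a minimal Python sketch, not part of the original walkthrough: it calls `tesseract --version` and pulls the version token out of the first line. The helper names `parse_tesseract_version` and `installed_version` are hypothetical, and the exact banner format is an assumption (something like `tesseract v5.0.1.20220118` or `tesseract 4.1.1`).

```python
import re
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version token from the first line of `tesseract --version` output.

    Assumes the banner starts with "tesseract", optionally followed by a "v".
    """
    match = re.search(r"^tesseract\s+v?(\S+)", output.strip(), flags=re.IGNORECASE)
    return match.group(1) if match else None


def installed_version(exe: str = "tesseract") -> Optional[str]:
    """Run `<exe> --version` and return the parsed version, or None if unrecognized."""
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # Some older releases print the banner to stderr, so both streams are combined.
    return parse_tesseract_version(proc.stdout + proc.stderr)


# Example call, assuming the install location used in this post:
# print(installed_version(r"D:\open\tesseract501\tesseract.exe"))
```

If the call returns `None` or raises `FileNotFoundError`, the executable path is probably wrong or the folder does not contain `tesseract.exe`.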
I tested it on one of the official sample images:
./tesseract d:/open/testimg/eurotext.png d:/open/testresult/eurotext-eng -l eng
The result was as follows:
The (quick) [brown] {fox} jumps! Over the $43,456.78 #90 dog & duck/goose, as 12.5% of E-mail from aspammer@website.com is spam. Der ,,schnelle” braune Fuchs springt tiber den faulen Hund. Le renard brun «rapide» saute par-dessus le chien paresseux. La volpe marrone rapida salta sopra il cane pigro. El zorro marron rapido salta sobre el perro perezoso. A raposa marrom rapida salta sobre 0 cao preguigoso.
I then tested a second image.
The output was:
Noisy, image to test Tesseract OCR
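The command-line runs above can also be scripted. The sketch below is a hedged Python wrapper around the same CLI call (`tesseract <image> <output base> -l <langs>`); the function names `build_ocr_command` and `run_ocr` are hypothetical. It also shows how several languages can be combined with `+`. The first result above was produced with `-l eng` only, which is a likely reason for slips such as "tiber" for "über" and the missing accents in the Spanish and Portuguese lines. Passing extra codes such as `deu` or `fra` assumes the corresponding traineddata files are installed.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_ocr_command(exe: str, image: str, out_base: str,
                      langs: Sequence[str] = ("eng",)) -> List[str]:
    """Build the argument list for one Tesseract CLI run.

    Multiple language codes are joined with "+", e.g. "eng+deu".
    """
    return [str(exe), str(image), str(out_base), "-l", "+".join(langs)]


def run_ocr(exe: str, image: str, out_base: str,
            langs: Sequence[str] = ("eng",)) -> str:
    """Run Tesseract and return the recognized text."""
    subprocess.run(build_ocr_command(exe, image, out_base, langs), check=True)
    # By default Tesseract writes plain text to "<out_base>.txt".
    return Path(f"{out_base}.txt").read_text(encoding="utf-8")


# Example call using the paths from this post (language list is an assumption):
# text = run_ocr(r"D:\open\tesseract501\tesseract.exe",
#                "d:/open/testimg/eurotext.png",
#                "d:/open/testresult/eurotext-multi",
#                langs=["eng", "deu", "fra", "ita", "spa", "por"])
# print(text)
```

Keeping the command construction separate from the subprocess call makes it easy to print and inspect the exact command line before running it on a folder of images.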
I have to say, for OCR on scanned images, Tesseract's results are excellent, and it is very fast too!
That's the end of this post.