A fast binarization method for dark and uneven illumination document images
Authors: Wang Kangwei, Zhao Lei, Huang Xinyan, Peng Yufa, Ma Siyuan, Fan Hongbo
Affiliation:

(School of Applied Sciences, Harbin University of Science and Technology, Harbin, Heilongjiang Province 150000, China)

About the author:

Wang Kangwei (born 1996), male, from Qiqihar, Heilongjiang Province, is a master's student whose research focuses on image processing.

Fund Project:

Supported by the College Students' Innovation and Entrepreneurship Training Program (201810214035)

    Abstract:

    Binarization is an important step in optical character recognition (OCR) and directly affects its accuracy. Current local binarization algorithms based on luminance segmentation perform well, but they are complex and slow. Fast binarization algorithms are simple but sensitive to noise. Low-luminance images generally contain non-negligible noise, and their text has low contrast. To capture low-contrast text, a fast binarization algorithm must be sensitive to luminance gradients, but that sensitivity also produces broken or missing text and heavy background noise in the binarization result. To achieve fast, high-quality binarization, this paper applies non-local means filtering to suppress noise while avoiding over-smoothing the image. An improved Bradley algorithm then extracts the low-contrast text and resolves problems such as text breakage. Finally, dilation and erosion are applied to suppress binarization noise. The method is suitable for unevenly illuminated images of both low and high luminance. Experimental results show that under uneven high luminance the method performs on par with other fast binarization algorithms, while under uneven low luminance it extracts more text with less text breakage and less noise. The OCR recall on the binarization results of this method reached 93.5%.
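The thresholding stage named in the abstract can be illustrated with a minimal sketch of the classic Bradley adaptive threshold, which compares each pixel with the mean of its neighbourhood using an integral image. This is not the authors' improved variant: the abstract does not describe their modifications, and the non-local means pre-filter and the dilation/erosion post-processing are not reproduced here. The function names, the window size `window`, the percentage `t`, and the synthetic test page are all assumptions chosen for illustration.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table for a 2-D list of grey levels."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def bradley_threshold(img, window=15, t=15):
    """Classic Bradley adaptive threshold (illustrative, not the paper's variant).

    A pixel is marked as text (0) when it is more than t percent darker than
    the mean of the window x window neighbourhood around it; otherwise it is
    marked as background (255). Windows are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            count = (y1 - y0) * (x1 - x0)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Integer form of: pixel < local_mean * (1 - t / 100)
            if img[y][x] * count * 100 < total * (100 - t):
                out[y][x] = 0
    return out


def make_page(h=20, w=40):
    """Hypothetical dark page: brightness rises left to right (30..69),
    with one horizontal stroke at half the local background level."""
    page = [[30 + x for x in range(w)] for _ in range(h)]
    for x in range(5, 35):
        page[10][x] = (30 + x) // 2
    return page


page = make_page()
binary = bradley_threshold(page, window=9, t=15)
```

On this synthetic page the brightest stroke pixel (32) is lighter than the darkest background pixel (30), so no single global threshold separates text from background, whereas the local comparison does. A full pipeline along the lines of the abstract would denoise before this step and clean up the result with morphological operations afterwards.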

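Cite this article

Wang Kangwei, Zhao Lei, Huang Xinyan, Peng Yufa, Ma Siyuan, Fan Hongbo. A fast binarization method for dark and uneven illumination document images[J]. Journal of Optoelectronics·Laser (光电子激光), 2020, 31(12): 1333-1340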

History
  • Received: 2020-08-19
  • Published online: 2021-01-26