PaddleOCR

Posted on 2025-12-06 in ocr

🔍 OCR result fields explained (table)

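| Field | Type | Meaning | Example value |
| --- | --- | --- | --- |
| `file_path` | `str` | Path of the input PDF file | `"D:\code\python\ocr\MMGraphRAG_Connecting_Vision_and_Language.pdf"` |
| `model_settings` | `dict` | Configuration flags used for this inference run | `{'use_doc_preprocessor': False, 'use_textline_orientation': False}` |
| `dt_polys` | `List[List[List[int]]]` | Text detection boxes. Each box is a list of 4 `[x, y]` points forming a quadrilateral, so slanted text is supported. In the example the points run clockwise from the top-left corner | `[[479, 209], [2264, 203], [2265, 316], [479, 322]]` (one box) |
| `text_det_params` | `dict` | Hyperparameters of the text detection module (thresholds, size limits, and so on) | `{'thresh': 0.3, 'box_thresh': 0.6, 'unclip_ratio': 1.5}` |
| `text_type` | `str` | Type of text the pipeline was run for (for example general text) | `"general"` |
| `textline_orientation_angles` | `List[int]` | Predicted orientation of each text line (an angle or class index). `-1` means no orientation was predicted, as happens here because `use_textline_orientation` is `False` | `[-1, -1, ..., -1]` |
| `text_rec_score_thresh` | `float` | Confidence threshold for text recognition. Lines scoring below it are filtered out | `0.0` (keeps everything) |
| `return_word_box` | `bool` | Whether word-level bounding boxes are returned instead of only line-level ones | `False` |
| `rec_texts` | `List[str]` | Recognized text, one string per line. Aligned index by index with `rec_scores` and `rec_polys`, and with `dt_polys` when nothing is filtered out | `["MMGraphRAG：通过可解释的多模态", ...]` |
| `rec_scores` | `List[float]` | Confidence score of each recognized line, in the range 0 to 1 | `[0.9902, 0.9990, ...]` |
| `rec_polys` | `List[List[List[int]]]` | Same format as `dt_polys`: the polygons of the lines that survive recognition, aligned with `rec_texts` | Identical to `dt_polys` here (no post-processing adjustment was applied) |
| `rec_boxes` | `List[List[int]]` | Simplified text boxes: axis-aligned bounding rectangles in the form `[x_min, y_min, x_max, y_max]` | `[479, 203, 2265, 322]` (one box) |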
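
To make the relationship between these fields concrete, here is a minimal sketch in plain Python. It assumes the result has already been obtained as a dict with the keys from the table. The `result` dict below is hand-built for illustration (only the first polygon and score come from the table, the second entry is made up), and `poly_to_box`, `confident_lines`, and `min_score` are hypothetical helper names, not part of PaddleOCR.

```python
from typing import Dict, List

# Hand-built sample shaped like the fields in the table above.
# The second line is invented for illustration; it is not real OCR output.
result: Dict[str, list] = {
    "rec_texts": ["MMGraphRAG", "low-confidence line"],
    "rec_scores": [0.9902, 0.42],
    "rec_polys": [
        [[479, 209], [2264, 203], [2265, 316], [479, 322]],
        [[100, 400], [300, 400], [300, 440], [100, 440]],
    ],
}


def poly_to_box(poly: List[List[int]]) -> List[int]:
    """Collapse a polygon of [x, y] points into [x_min, y_min, x_max, y_max]."""
    xs = [point[0] for point in poly]
    ys = [point[1] for point in poly]
    return [min(xs), min(ys), max(xs), max(ys)]


def confident_lines(res: Dict[str, list], min_score: float = 0.5) -> List[dict]:
    """Pair each text with its score and box, keeping lines at or above min_score."""
    lines = []
    # The three lists are index-aligned, so zip walks them line by line.
    for text, score, poly in zip(res["rec_texts"], res["rec_scores"], res["rec_polys"]):
        if score >= min_score:
            lines.append({"text": text, "score": score, "box": poly_to_box(poly)})
    return lines


lines = confident_lines(result)
for line in lines:
    print(line["text"], line["score"], line["box"])
```

Applied to the example polygon from the table, `poly_to_box` yields `[479, 203, 2265, 322]`, which matches the `rec_boxes` example value. Filtering on `rec_scores` after the fact is one option when `text_rec_score_thresh` was left at `0.0`.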