Explainable Text Description-Based Blind Image Quality Assessment

Bian Tingcheng; Xie Wuyuan; Wang Miaohui

doi:10.3724/SP.J.1089.2025-00278



Abstract: Blind image quality assessment (BIQA) aims to mimic human judgment by predicting a quality score that reflects how strongly an image is distorted. Existing BIQA methods that rely on the image modality alone have limited representational ability on authentic natural images with complex content and distortion types. The scores they predict also come without any explanation, which undermines the credibility of the results. To address these problems, we propose xBIQA, an explainable BIQA method that uses a large language model (LLM) to produce explanatory text descriptions. First, a global quality text is generated from the distortion level of the image and an overall description of its content, and a local quality text is generated to describe specific regions in detail. Both texts are fed into an LLM together with a prompt to obtain detailed semantic features of image quality. Then, the text semantic features are aligned and fused with the image texture features. The fused features are regressed to a quality score, which is output together with a corresponding explanatory description. Experimental results show that, compared with traditional BIQA methods based on the image modality alone, the LLM in xBIQA effectively generates descriptions that are highly correlated with image quality, which improves the performance of multimodal BIQA models. On the public KonIQ-10k and LIVE Challenge datasets, xBIQA achieves the best results, improving the Spearman rank-order correlation coefficient (SRCC) by 1.64% and 2.60%, respectively.
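The abstract does not specify how the text and image features are aligned, fused, or regressed. The following Python sketch is only a hypothetical illustration under stated assumptions, not the xBIQA architecture. It assumes a CLIP-style symmetric contrastive (InfoNCE) loss for alignment, fusion by concatenating unit-norm projected embeddings, and a linear regression head. All helper names (`fuse_and_regress`, `contrastive_alignment_loss`, `average_ranks`, `srcc`), feature sizes, and random weights are placeholders. The sketch also computes SRCC in the standard way, as the Pearson correlation of ranks, with tied values given their average rank.

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def project(v, W):
    """Row vector v (length d_in) times matrix W (d_in rows, d_out columns)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]

def fuse_and_regress(img_feat, txt_feat, W_img, W_txt, w_head, b_head):
    """Hypothetical late fusion: project each modality into a shared space,
    concatenate the unit-norm embeddings, and apply a linear regression head."""
    zi = l2_normalize(project(img_feat, W_img))
    zt = l2_normalize(project(txt_feat, W_txt))
    fused = zi + zt  # list concatenation, length 2 * d
    score = sum(f * w for f, w in zip(fused, w_head)) + b_head
    return score, zi, zt

def contrastive_alignment_loss(Zi, Zt, temperature=0.07):
    """CLIP-style symmetric InfoNCE: the k-th image embedding should be the
    closest match to the k-th text embedding, and vice versa."""
    n = len(Zi)
    logits = [[sum(a * b for a, b in zip(Zi[r], Zt[c])) / temperature
               for c in range(n)] for r in range(n)]

    def nll(row, k):
        # Negative log-softmax of entry k, shifted by the max for stability.
        m = max(row)
        return -(row[k] - m - math.log(sum(math.exp(x - m) for x in row)))

    loss_i2t = sum(nll(logits[k], k) for k in range(n)) / n
    loss_t2i = sum(nll([logits[r][k] for r in range(n)], k) for k in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

def average_ranks(x):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def srcc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rp, rm = average_ranks(pred), average_ranks(mos)
    mp, mm = sum(rp) / len(rp), sum(rm) / len(rm)
    cov = sum((a - mp) * (b - mm) for a, b in zip(rp, rm))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_m = sum((b - mm) ** 2 for b in rm)
    return cov / math.sqrt(var_p * var_m)

# Usage with random, untrained weights and hypothetical feature sizes.
random.seed(0)
n, d_img, d_txt, d = 4, 16, 24, 8

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]

W_img, W_txt = rand_matrix(d_img, d), rand_matrix(d_txt, d)
w_head, b_head = [random.gauss(0, 0.1) for _ in range(2 * d)], 50.0
imgs = [[random.gauss(0, 1) for _ in range(d_img)] for _ in range(n)]
txts = [[random.gauss(0, 1) for _ in range(d_txt)] for _ in range(n)]
outs = [fuse_and_regress(a, t, W_img, W_txt, w_head, b_head)
        for a, t in zip(imgs, txts)]
scores = [s for s, _, _ in outs]
loss = contrastive_alignment_loss([zi for _, zi, _ in outs], [zt for _, _, zt in outs])
```

With random weights the predicted scores are meaningless. In a trained model the projections and regression head would be learned from mean opinion scores (MOS). SRCC would then be computed between the predicted scores and the MOS of a held-out test set.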
