CMMF-Net: A Generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization

Jiang, Qian and Zhou, Tao and He, Youwei and Ma, Wenjun and Hou, Jingyu and Ahmad Shahrizan, Abdul Ghani and Miao, Shengfa and Jing, Xin (2025) CMMF-Net: A Generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization. Intelligence and Robotics, 5 (1). pp. 34-49. ISSN 2770-3541. (Published)

Full text: CMMF-Net_A Generative network based on CLIP-guided.pdf (available under a Creative Commons Attribution license)

Abstract

Thermal infrared (TIR) images are unaffected by variations in illumination and atmospheric conditions, which makes them widely used in nocturnal traffic scenarios. However, they suffer from low contrast and a lack of chromatic information. Image colorization is therefore a key technique for improving the quality of TIR images, facilitating both human interpretation and downstream analytical tasks. Because the features of TIR images are blurred and intricate, it is difficult for a network to extract and process their feature information accurately from images alone. Hence, we propose a multi-modal model that integrates text features of TIR images with image features to jointly perform TIR image colorization. A vision transformer (ViT) model is employed to extract features from the original TIR images. Concurrently, we manually observe and summarize textual descriptions of the images, and then input these descriptions into a pretrained contrastive language-image pretraining (CLIP) model to capture text-based features. These two sets of features are then fed into a cross-modal interaction (CI) module to establish the relationship between text and image. Subsequently, the text-enhanced image features are processed by a U-Net network to generate the final colorized images. Additionally, we employ a comprehensive loss function to ensure the network's ability to generate high-quality colorized images. The effectiveness of the proposed method is evaluated on the KAIST dataset. The experimental results show that our CMMF-Net outperforms other methods on the task of TIR image colorization.
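
The following is a minimal PyTorch sketch of the pipeline the abstract describes (ViT image encoder, text features, cross-modal interaction, decoder). The module names, dimensions, the cross-attention formulation of the CI module, the placeholder text encoder (standing in for the pretrained CLIP text encoder), and the simplified decoder (no skip connections or loss terms) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a CLIP-guided multi-modal colorization network.
# All names and dimensions below are assumptions for illustration only.
import torch
import torch.nn as nn


class TextEncoderStub(nn.Module):
    """Placeholder for the pretrained CLIP text encoder (assumption)."""
    def __init__(self, vocab_size=49408, dim=256, num_tokens=77):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):                    # (B, 77) -> (B, 77, dim)
        return self.encoder(self.embed(token_ids) + self.pos)


class CrossModalInteraction(nn.Module):
    """Text-conditioned enhancement of image tokens via cross-attention."""
    def __init__(self, dim=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, nhead, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, txt_tokens):
        # Image tokens query the text tokens; the residual keeps image content.
        enhanced, _ = self.attn(img_tokens, txt_tokens, txt_tokens)
        return self.norm(img_tokens + enhanced)


class CMMFNetSketch(nn.Module):
    def __init__(self, dim=256, patch=16, img_size=256):
        super().__init__()
        self.n = img_size // patch                   # patch tokens per side
        # ViT-style image encoder: patch embedding + transformer blocks.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.vit = nn.TransformerEncoder(layer, num_layers=4)
        self.text_encoder = TextEncoderStub(dim=dim)
        self.ci = CrossModalInteraction(dim=dim)
        # Lightweight upsampling decoder standing in for the paper's U-Net.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim, 128, 4, stride=4), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 2, stride=2), nn.Tanh(),
        )

    def forward(self, tir, token_ids):
        b = tir.size(0)
        img = self.patch_embed(tir)                  # (B, dim, n, n)
        img_tokens = img.flatten(2).transpose(1, 2)  # (B, n*n, dim)
        img_tokens = self.vit(img_tokens)
        txt_tokens = self.text_encoder(token_ids)    # (B, 77, dim)
        fused = self.ci(img_tokens, txt_tokens)      # text-enhanced image tokens
        feat = fused.transpose(1, 2).reshape(b, -1, self.n, self.n)
        return self.decoder(feat)                    # (B, 3, H, W) in [-1, 1]


if __name__ == "__main__":
    model = CMMFNetSketch()
    tir = torch.randn(2, 1, 256, 256)                # single-channel TIR input
    tokens = torch.randint(0, 49408, (2, 77))        # tokenized text description
    print(model(tir, tokens).shape)                  # torch.Size([2, 3, 256, 256])
```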

Item Type: Article
Additional Information: Indexed by Scopus
Uncontrolled Keywords: Thermal infrared image colorization; Transformer; Vision and language
Subjects: T Technology > T Technology (General)
T Technology > TA Engineering (General). Civil engineering (General)
T Technology > TJ Mechanical engineering and machinery
T Technology > TK Electrical engineering. Electronics. Nuclear engineering
T Technology > TS Manufactures
Faculty/Division: Faculty of Manufacturing and Mechatronic Engineering Technology
Depositing User: Mr Muhamad Firdaus Janih@Jaini
Date Deposited: 17 Feb 2025 08:03
Last Modified: 17 Feb 2025 08:03
URI: http://umpir.ump.edu.my/id/eprint/43832