Recently, remote sensing image captioning (RSIC) has gained significant attention in the remote sensing community. Due to the significant differences in spatial resolution of remote sensing images, ...
Abstract: The effective modal fusion and perception between the language and the image are necessary for inferring the reference instance in the referring image segmentation (RIS) task. In this ...