Ah-Pine, J., et al. “Crossing Textual and Visual Content in Different Application Scenarios.” Multimedia Tools and Applications 42.1 (2009): 31-56.
This article falls largely outside the scope of my research and borders on irrelevant to it. It discusses two approaches to processing text and image information in multimodal scenarios. The paper is dense with formulas and code used to build these methods, which automatically scan multimodal documents and extract and encode various types of information (text, image, video, audio, etc.).
However, I draw on this article for a few points the authors make about the current state of multimodality on the Web and about how we now think differently about the interaction between visual content (image and video) and text.