Text-image combinations constitute one of the most traditional areas of concern where multimodality is recognised. Studies of text-image combinations, and of the various properties of ‘text’ and ‘image’ that might support or hinder such combinations, go back to classical times. However, many of the difficulties encountered when attempting to put the study of text-image combinations on a firmer basis, be that basis theoretical, empirical or philosophical, arise from inappropriate or insufficient theoretical accounts of the nature of multimodality itself. Whenever multimodality is related too specifically to sensory modalities, the text-image division collapses, since both are typically visual. Whenever multimodality is seen as a purely ‘semiotic’ endeavour (i.e., traditionally separated from materiality), the particular benefits of, and reasons for, employing text-image combinations become unclear. In this talk, I set out the general theory of multimodality that we have been developing over the past decade, showing how it takes up and makes more usable several basic semiotic notions such as iconicity and indexicality. Within the theory, materiality and the conventionalised communicative uses made of any materiality are combined within a specific re-definition of the notion of ‘semiotic modes’. In many respects, the theory I describe can be considered a reorientation to Peirce that draws on more recent sociosemiotic developments in the theory and practice of multimodality, on the one hand, and on a strong commitment to empirical methods, on the other. Examples will be drawn from various media involving graphic design and visual communication more generally, suggesting methodologies by which further research may proceed on a surer footing.