Pictures are considered to play a significant role in facilitating students' English learning. Understanding the pictures in EFL textbooks, one of the sources through which students learn English materials, is therefore crucial. Nevertheless, to the best of the writers' knowledge, few studies have investigated the interrelations between visual and verbal text in learning materials featuring acrostic poems. This study therefore aimed to scrutinise the visual-verbal interrelations in acrostic poems. To that end, a qualitative research method using Systemic Functional Multimodal Discourse Analysis (SF-MDA) was employed to investigate the units of analysis, i.e., the acrostic poems and their accompanying images in a primary-level EFL textbook. These units were examined in terms of relative status and logico-semantic relations, as well as the grammar of visual design, frameworks deriving from systemic functional linguistics. The findings revealed that the visual images could be construed through ideational, interpersonal, and textual/compositional meanings. In addition, the pictures and the verbal text were interrelated to varying extents and in different ways, as indicated by their relative status and logico-semantic relations, which comprised independent and complementary equal status with exposition, exemplification, and extension. In summary, through their trinocular meanings (ideational, interpersonal, and compositional) and their interaction with the verbal texts, the pictures are considered to play a significant role in helping readers/viewers understand the poems.