A key characteristic of video data is its associated spatial and temporal semantics. It is therefore important that a video model capture not only the characteristics of objects but also their relationships in time and space. With this in mind, the purpose of our work is to design a model for specifying the spatio-temporal relationships among objects in video sequences. The model describes not only the spatial relationships among objects in each frame of a given video scene, but also the temporal relations among those per-frame spatial relationships, expressed as relations between two temporal intervals. It then models the temporal composition of an object, which reflects how the object's spatial relationships evolve over the subsequent frames of a video segment and over the whole video sequence. In addition to spatio-temporal relationships, our model provides an effective and expressive way to represent distances among objects in digital video completely and precisely. This model serves as a basis for annotating raw video.
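The abstract does not say which set of interval relations the model uses. A common choice for relating two temporal intervals is Allen's thirteen interval relations, so the following is a minimal sketch under that assumption. It groups the frames in which a spatial relationship holds into maximal intervals and then classifies how two such intervals relate in time. The helpers `frame_runs` and `allen_relation`, the half-open `[start, end)` frame convention, and the spatial relation labels `left_of` and `above` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open frame interval [start, end), assumed convention


def frame_runs(flags: List[bool]) -> List[Interval]:
    """Group the frames where a spatial relation holds into maximal intervals."""
    intervals: List[Interval] = []
    start: Optional[int] = None
    for i, holds in enumerate(flags):
        if holds and start is None:
            start = i  # a new run of frames begins
        elif not holds and start is not None:
            intervals.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        intervals.append((start, len(flags)))  # run continues to the last frame
    return intervals


def allen_relation(a: Interval, b: Interval) -> str:
    """Classify interval a relative to interval b using Allen's 13 relations."""
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if e2 < s1:
        return "after"
    if e2 == s1:
        return "met_by"
    # From here on the two intervals share at least part of their extent.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started_by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished_by"
    if s1 < s2:
        return "overlaps" if e1 < e2 else "contains"
    return "during" if e1 < e2 else "overlapped_by"


# Hypothetical per-frame spatial relations of object A with respect to object B.
left_of = [True, True, True, False, False, False]
above = [False, False, True, True, True, False]

a = frame_runs(left_of)[0]  # (0, 3)
b = frame_runs(above)[0]    # (2, 5)
print(allen_relation(a, b))  # prints "overlaps"
```

Chaining such pairwise labels over successive intervals is one way to describe how an object's spatial relationships evolve across a segment; the abstract does not state that the model encodes them this way.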