Florida International University and University of Miami TRECVID 2009-high level feature extraction Conference

Lin, L, Chen, C, Shyu, ML et al. (2009). Florida International University and University of Miami TRECVID 2009-high level feature extraction.

cited authors

  • Lin, L; Chen, C; Shyu, ML; Fleites, F; Chen, SC

abstract

  • This paper presents the details of the FIU-UM group's submission to the TRECVID2009 high-level feature extraction task. Six runs were conducted using different feature sets, data pruning approaches, classification algorithms, and ranking methods. The TRECVID2009 development data used for training were randomly sampled from the whole development data archive (all TRECVID2007 development data and test data). For each high-level feature, the sample includes all positive data instances (target high-level feature data) and part of the negative data instances (around one-third of the non-target high-level feature data). Two strategies for dealing with the skipping/not-sure shots were also introduced. The first four runs treated the skipping/not-sure data instances as positive instances in the training data (ALL), and the last two runs excluded these instances from the training data (PURE).
    • FIU-UM-1: KF+ALL+CB+MCA+RANK, trained on partial TRECVID2009 development data with the all-positive set (ALL), using key-frame based low-level features (KF), correlation-based pruning (CB), an MCA-based classifier (MCA), and a ranking method (RANK). The RANK method computes the Euclidean distances of two selected features between each testing data instance and the positive training set, and integrates these distances as additional scores with the scores from the MCA-based classifier to obtain the final ranking scores.
    • FIU-UM-2: KF+ALL+CB+MCA, trained on partial TRECVID2009 development data with the all-positive set (ALL), using key-frame based low-level features (KF), correlation-based pruning (CB), an MCA-based classifier (MCA), and a ranking process that uses the MCA-based scores from the classifier.
    • FIU-UM-3: SF+ALL+DB+SB, trained on partial TRECVID2009 development data with the all-positive set (ALL), using shot-based low-level features (SF), distance-based pruning (DB), a subspace-based classifier (SB), and a ranking process that uses the subspace-based scores from the classifier.
    • FIU-UM-4: SF+ALL+DB+SB+SVMC, trained on partial TRECVID2009 development data with the all-positive set (ALL), using shot-based low-level features (SF), distance-based pruning (DB), a subspace-based classifier (SB), and the SVMC ranking method. The SVMC method takes the retrieval results from an SVM with a chi-square kernel (SVMC) and treats them as additional scores, which are then combined with the subspace-based scores to form the final ranking scores.
    • FIU-UM-5: KF+PURE+CB+MCA+RANK, trained on partial TRECVID2009 development data with the pure positive set (PURE), using key-frame based low-level features (KF), correlation-based pruning (CB), an MCA-based classifier (MCA), and the ranking method (RANK).
    • FIU-UM-6: SF+PURE+DB+SB, trained on partial TRECVID2009 development data with the pure positive set (PURE), using shot-based low-level features (SF), distance-based pruning (DB), a subspace-based classifier (SB), and a ranking process that uses the subspace-based scores from the classifier.
    In the TRECVID2009 high-level feature extraction task submission, we were able to improve the framework in several ways. First, more key-frame based visual features (513) were extracted in addition to the 28 old shot-based features, and different normalization methods were applied. Second, all development data (219 videos) and testing data (619 videos) were processed. Third, a key-frame detection algorithm was implemented to extract the key-frames from the testing videos, since these key-frames are not provided by TRECVID. Fourth, different data pruning methods were proposed to address the data imbalance issue; other experimental results indicate that the proposed methods perform well at removing noisy data and selecting typical positive and negative data instances. Fifth, two new classifiers were proposed in our framework instead of relying on existing classifiers such as Support Vector Machines and Decision Trees.
    Finally, in addition to concept detection, we were able to extend our framework to the area of video retrieval by proposing several scoring methods to rank the retrieved results. However, several challenges remain. First, as can be seen from the description of each run, the three runs that use the CB+MCA model were trained only on key-frame based low/mid-level visual features. Adding low-level audio features would likely improve the extraction performance for some high-level features, such as person-playing-a-musical-instrument, people-dancing, and singing. Similarly, more visual features would help the runs trained only on the shot-based feature data. Therefore, integrating audio features with the key-frame based features and adding more visual features to the shot-based features remain to be done. Second, to address the data imbalance problem, the negative data instances were first randomly sampled. This is risky because random sampling can enlarge the difference between the distributions of the training set and the testing set. As a result, even when the training performance is good, as in our experiments, the testing results may not be as good as expected. Therefore, data sampling and data pruning need further investigation. Third, the results show that the ranking methods are not good enough, so more research on ranking the retrieved results is needed.
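The ALL and PURE training-set strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `build_training_set`, the label strings `"pos"`, `"neg"`, and `"skip"`, and the use of uniform sampling without replacement are all assumptions made for illustration. Only the roughly one-third negative sampling rate and the treatment of skipping/not-sure shots come from the abstract.

```python
import random


def build_training_set(instances, strategy="ALL", negative_fraction=1 / 3, seed=0):
    """Build a training set for one high-level feature (hypothetical helper).

    instances: list of (shot_id, label) pairs, where label is "pos", "neg",
               or "skip" (skipping/not-sure). These label names are assumed.
    strategy:  "ALL" treats skipping/not-sure shots as positives;
               "PURE" drops them from the training data.
    """
    if strategy not in ("ALL", "PURE"):
        raise ValueError("strategy must be 'ALL' or 'PURE'")

    positives = [sid for sid, label in instances if label == "pos"]
    negatives = [sid for sid, label in instances if label == "neg"]
    unsure = [sid for sid, label in instances if label == "skip"]

    if strategy == "ALL":
        # ALL: skipping/not-sure shots count as positive training instances.
        positives = positives + unsure

    # Keep every positive, but only a random subset of the negatives.
    # Uniform sampling without replacement is an assumption of this sketch.
    rng = random.Random(seed)
    n_negatives = round(len(negatives) * negative_fraction)
    sampled_negatives = rng.sample(negatives, n_negatives)

    return [(sid, 1) for sid in positives] + [(sid, 0) for sid in sampled_negatives]


# Toy data: 4 positive, 9 negative, and 2 skipping/not-sure shots.
shots = (
    [("p%d" % i, "pos") for i in range(4)]
    + [("n%d" % i, "neg") for i in range(9)]
    + [("s%d" % i, "skip") for i in range(2)]
)
all_set = build_training_set(shots, strategy="ALL")
pure_set = build_training_set(shots, strategy="PURE")
```

With this toy data, the ALL set holds six positives and the PURE set holds four, while both keep three of the nine negatives. Because the negatives are drawn at random, the retained subset may not reflect the distribution of the test data, which is the risk the abstract points out.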

publication date

  • January 1, 2009