Multimedia Grand Challenges

Zero-Shot Multimodal Video Recognition

With the rapid development of the mobile Internet, video has attracted increasing attention from Internet users and has become one of the most important sources of information. In many real-world scenarios, mainstream video websites still classify and monitor uploaded videos manually, an approach that is time-consuming and labor-intensive. Using machine learning and artificial intelligence techniques to automatically extract the key semantic information of video content and recognize key objects is therefore an urgent technical requirement for Internet video services. However, most existing video recognition algorithms rely heavily on large-scale labeled training samples, which limits their versatility and extensibility in practical applications.

Motivated by these needs, we are organizing a new multimodal video recognition challenge, to be held in conjunction with ACM MM ASIA 2019. The challenge provides a new multimodal video dataset, GTCOM-OUC-Video. Participants are encouraged to use the multimodal features of the labeled seen classes, together with external semantic knowledge, to recognize unseen classes for which no labeled instances are available. The aim is to develop zero-shot video understanding algorithms that improve the versatility and extensibility of conventional models.
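As a rough illustration of the seen-to-unseen transfer described above, one common zero-shot baseline learns a mapping from video features to a semantic embedding space (for example, word vectors of class names) using only the seen classes, then labels a video from an unseen class by its nearest unseen-class embedding. The sketch below uses closed-form ridge regression and cosine similarity. It is not the challenge's reference method; the function names are hypothetical, and the toy features are synthetic stand-ins for real multimodal features from GTCOM-OUC-Video.

```python
import numpy as np


def fit_visual_to_semantic(features, labels, class_embeddings, lam=1.0):
    """Ridge regression: find W minimizing ||X W - S||^2 + lam * ||W||^2.

    features:         (n, d_feat) video features of seen-class training clips
    labels:           (n,) integer seen-class index for each clip
    class_embeddings: (c_seen, d_sem) semantic vector per seen class
    """
    X = np.asarray(features, dtype=float)
    S = np.asarray(class_embeddings, dtype=float)[labels]  # target vector per clip
    d = X.shape[1]
    # Closed-form solution W = (X^T X + lam I)^-1 X^T S, shape (d_feat, d_sem).
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)


def predict_unseen(features, W, unseen_embeddings):
    """Project clips into the semantic space; return the most cosine-similar unseen class."""
    P = np.asarray(features, dtype=float) @ W
    U = np.asarray(unseen_embeddings, dtype=float)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return np.argmax(P @ U.T, axis=1)


# Toy example with synthetic, noise-free data (all values hypothetical).
rng = np.random.default_rng(0)
seen_emb = np.eye(3)                                        # 3 seen classes
unseen_emb = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])   # 2 unseen classes
A = rng.normal(size=(3, 8))             # hypothetical semantics -> 8-dim "video features"
y_seen = np.repeat(np.arange(3), 20)    # 20 labeled clips per seen class
X_seen = seen_emb[y_seen] @ A
W = fit_visual_to_semantic(X_seen, y_seen, seen_emb, lam=0.1)

y_unseen = np.array([0, 1, 1, 0])       # ground truth, never used for training
X_unseen = unseen_emb[y_unseen] @ A
pred = predict_unseen(X_unseen, W, unseen_emb)
print(pred)
```

Only seen-class labels enter `fit_visual_to_semantic`; the unseen classes are reachable solely through their semantic vectors, which is the property the challenge asks participants to exploit.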


  • GTCOM Digital Media & Entertainment Co., Ltd.
  • Institute of Automation of Chinese Academy of Sciences (CASIA)
  • Ocean University of China (OUC)

Fine-Grained Vehicle Footprint Recognition

Background: Car tire recognition is an important means of providing clues for solving criminal cases and managing traffic accidents. With the rapid increase in the number of vehicles in use, there is an urgent need to develop an efficient, automatic car tire recognition system.

Data: About 10,000 tire pattern images in 63 classes (63 × 160 = 10,080 images). Each class contains 80 tire surface images and 80 indentation mark images, taken at different scales and angles.

Task: Design a model that learns the mapping between tire surface images and tire indentation mark images. Given a query image of either a tire surface or a tire indentation mark, the algorithm must retrieve, with high precision, the tire surface and indentation mark images of the same tire model.
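The retrieval task above can be framed as ranking a gallery by similarity in a shared embedding space. The sketch below assumes that some model (not specified here) has already embedded surface and indentation-mark images into one vector space; it ranks the gallery by cosine similarity and scores the results with precision@k. The function names and toy vectors are hypothetical, and precision@k is used only for illustration, since this description does not state the challenge's official evaluation metric.

```python
import numpy as np


def retrieve(query_emb, gallery_emb, k):
    """Rank gallery items by cosine similarity to each query; return top-k indices."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                              # (n_query, n_gallery)
    return np.argsort(-sims, axis=1)[:, :k]     # most similar first


def precision_at_k(ranked, query_labels, gallery_labels):
    """Mean fraction of top-k results that share the query's tire-model label."""
    hits = gallery_labels[ranked] == query_labels[:, None]
    return float(hits.mean())


# Hypothetical embeddings: in practice these would come from a model that maps
# surface images and indentation-mark images into one shared vector space.
gallery_emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
gallery_labels = np.array([0, 0, 1, 1])         # tire model of each gallery image
query_emb = np.array([[1.0, 0.0], [0.0, 1.0]])  # e.g. two indentation-mark queries
query_labels = np.array([0, 1])

top2 = retrieve(query_emb, gallery_emb, k=2)
print(top2.tolist(), precision_at_k(top2, query_labels, gallery_labels))
```

Because surface and mark images share one space, the same `retrieve` call serves both query directions; the hard part of the task is learning an embedding in which the two image types of one tire model actually land close together.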


  • Xi’an University of Posts and Telecommunications
  • Microsoft Research Asia