ACM Multimedia 2022 Workshop on

Multimedia Understanding with Pre-trained Models

December 13–16, 2022, Tokyo, Japan

WORKSHOP OVERVIEW

Multi-modal understanding plays a crucial role in enabling machines to perceive the physical world through multiple sensory cues, as humans do. Recently, large-scale pre-trained models (PTMs) have become a research hotspot in the field of artificial intelligence. Existing techniques following the self-supervised learning paradigm have achieved great success in uni-modal settings such as computer vision (CV) and natural language processing (NLP). These advances inspire researchers to explore deeper pre-training techniques for the multi-modal understanding problem. In this workshop, we aim to bring together researchers from the multimedia community to discuss recent research and future directions on pre-trained models with self-supervised learning for multimedia understanding.

In recent years, we have witnessed the great success of pre-trained models (PTMs) in natural language processing (NLP), such as GPT-3, BERT, RoBERTa, and DeBERTa. This success motivates researchers in the multimedia community to leverage the idea of PTMs to address multi-modal tasks. The scope of this workshop is pre-trained models with self-supervised learning for multimedia understanding. Potential topics include architecture design for multi-modal PTMs, pretext-task design for self-supervised learning, multi-modal data modeling, efficiency enhancement for PTMs, and interpretability of PTMs.

Call for Papers

We invite submissions on pre-trained models with self-supervised learning for multimedia understanding. Topics of interest include, but are not limited to:

  • Unified PTM strategies for multi-modal understanding
  • PTM for cross-modal matching and retrieval
  • PTM for audio-visual understanding
  • PTM for video captioning
  • PTM for sign language translation
  • Leveraging off-the-shelf PTM for multi-modal understanding
  • Interpretability in self-supervised PTM

Submission Guidelines

Paper format: Submitted papers (.pdf format) must use the ACM Article Template. Please remember to add CCS Concepts and Keywords. Please use the template in the traditional double-column format to prepare your submission. For example, Word users may use the Word Interim Template, and LaTeX users may use the sample-sigconf template.
Length: Papers must be no longer than 6 pages, including all text and figures. There is no page limit for references.
Blinding: Paper submissions must conform with the “double-blind” review policy. This means that the authors should not know the names of the reviewers of their papers, and reviewers should not know the names of the authors. Please prepare your paper in a way that preserves the anonymity of the authors. Do not put the authors' names under the title. Avoid using phrases such as “our previous work” when referring to earlier publications by the authors. Remove information that may identify the authors from the acknowledgments (e.g., co-workers and grant IDs). Check supplemental material (e.g., titles in video clips, or supplementary documents) for information that may reveal the authors' identities. Avoid providing links to websites that identify the authors. Papers without appropriate blinding will be desk rejected without review.

Important Dates:
Submission deadline: October 9, 2022 (AoE)
Decision notification: October 23, 2022 (AoE)

Invited Speakers

To be added.

Organizers

Wengang Zhou

Ph.D., Professor

EEIS Department, University of Science and Technology of China
Email: zhwg@ustc.edu.cn

Jiaxin Shi

Ph.D., Senior Researcher

Huawei Cloud Computing Technologies Co., Ltd.
Email: shijiaxin3@huawei.com

Lingxi Xie

Ph.D., Senior Researcher

Huawei Cloud Computing Technologies Co., Ltd.
Email: 198808xc@gmail.com