FlyAI小助手

  • 3

    获得赞
  • 85873

    发布的文章
  • 0

    答辩的项目

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

作者: Hao Sun

作者邀请

论文作者还没有讲解视频

邀请直播讲解

您已邀请成功, 目前已有 $vue{users_count} 人邀请!

再次邀请

Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to obtain relevant moments within videos and highlight scores of each video clip. Recently, several methods have been devoted to building DETR-based networks to solve both MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, achieving good performance. Nevertheless, these approaches underutilize the reciprocal relationship between two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement is designed to eliminate query-irrelevant information from visual features for modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on QVHighlights, Charades-STA and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Codes are available at \url{https://github.com/mingyao1120/TR-DETR}.

文件下载
本作品采用 知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可,转载请附上原文出处链接和本声明。
本文链接地址:https://flyai.com/paper_detail/86895
讨论
500字
表情
发送
删除确认
是否删除该条评论?
取消 删除