FlyAI小助手

  • 3

    获得赞
  • 85873

    发布的文章
  • 0

    答辩的项目

Imperio: Language-Guided Backdoor Attacks for Arbitrary Model Control

作者: Ka-Ho Chow

作者邀请

论文作者还没有讲解视频

邀请直播讲解

您已邀请成功, 目前已有 $vue{users_count} 人邀请!

再次邀请

Revolutionized by the transformer architecture, natural language processing (NLP) has received unprecedented attention. While advancements in NLP models have led to extensive research into their backdoor vulnerabilities, the potential for these advancements to introduce new backdoor threats remains unexplored. This paper proposes Imperio, which harnesses the language understanding capabilities of NLP models to enrich backdoor attacks. Imperio provides a new model control experience. It empowers the adversary to control the victim model with arbitrary output through language-guided instructions. This is achieved using a language model to fuel a conditional trigger generator, with optimizations designed to extend its language understanding capabilities to backdoor instruction interpretation and execution. Our experiments across three datasets, five attacks, and nine defenses confirm Imperio's effectiveness. It can produce contextually adaptive triggers from text descriptions and control the victim model with desired outputs, even in scenarios not encountered during training. The attack maintains a high success rate across complex datasets without compromising the accuracy of clean inputs and also exhibits resilience against representative defenses. The source code is available at \url{https://khchow.com/Imperio}.

文件下载
本作品采用 知识共享署名-非商业性使用-相同方式共享 4.0 国际许可协议进行许可,转载请附上原文出处链接和本声明。
本文链接地址:https://flyai.com/paper_detail/86894
讨论
500字
表情
发送
删除确认
是否删除该条评论?
取消 删除