DSAT-Net: Dual Spatial Attention Transformer for Building Extraction from Aerial Images

Author: Zhechun Wan

Both local and global context dependencies are essential for building extraction from remote sensing (RS) images. Convolutional Neural Networks (CNNs) extract local spatial details well but lack the ability to model long-range dependencies. In recent years, Vision Transformers (ViTs) have shown great potential in modeling global context dependencies. However, they usually incur a huge computational cost, and spatial details cannot be fully retained during feature extraction. To combine the advantages of CNNs and ViTs, we propose DSAT-Net, which integrates both in one model. In DSAT-Net, we design an efficient Dual Spatial Attention Transformer (DSAFormer) to address the shortcomings of the standard ViT. It has a dual attention structure whose branches complement each other. Specifically, the global attention path (GAP) performs large-scale downsampling of the feature maps before computing global self-attention, reducing the computational cost. The local attention path (LAP) uses efficient stripe convolutions to generate local attention, which alleviates the information loss caused by the downsampling operation in the GAP and supplements spatial details. In addition, we design a feature refinement module, the Channel Mixing Feature Refine Module (CM-FRM), to fuse low-level and high-level features. Our model achieves competitive results on three public building extraction datasets. Code will be available at: https://github.com/stdcoutzrh/BuildingExtraction.
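The dual-path idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the downsampling ratio `s`, stripe width `k`, single-head attention, nearest-neighbour upsampling, and the sigmoid gating in the local path are all simplifying assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention_path(x, s=4):
    """GAP sketch: downsample by s, self-attend over the reduced tokens,
    then upsample back to the input resolution (nearest-neighbour)."""
    H, W, C = x.shape
    xs = x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))  # (H/s, W/s, C)
    tokens = xs.reshape(-1, C)                                 # (N, C), N = HW/s^2
    attn = softmax(tokens @ tokens.T / np.sqrt(C))             # (N, N) attention
    out = (attn @ tokens).reshape(H // s, W // s, C)
    return out.repeat(s, axis=0).repeat(s, axis=1)             # back to (H, W, C)

def local_attention_path(x, k=7):
    """LAP sketch: a 1xk then kx1 mean (stripe) filter produces a spatial
    attention map, applied as a sigmoid gate on the input features."""
    H, W, C = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)), mode='edge')
    horiz = np.stack([xp[:, i:i + W, :] for i in range(k)]).mean(axis=0)
    hp = np.pad(horiz, ((pad, pad), (0, 0), (0, 0)), mode='edge')
    stripe = np.stack([hp[i:i + H, :, :] for i in range(k)]).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-stripe)) * x                   # gated features

def dsaformer_block(x, s=4, k=7):
    """Dual attention: the cheap global path and the detail-preserving
    local path are summed, so each compensates the other's weakness."""
    return global_attention_path(x, s) + local_attention_path(x, k)
```

With `s=4`, the global path attends over `HW/16` tokens instead of `HW`, which is where the computational saving comes from; the stripe-filtered local path restores the spatial detail lost to that pooling.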

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; when reposting, please include a link to the original and this notice.
Original link: https://flyai.com/paper_detail/70072