Ambiguous Images With Human Judgments for Robust Visual Event Classification
From arXiv · 2022-10-08 01:41:32
Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data.
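The abstract mentions "direct evaluation of model calibration" against human uncertainty judgments. One standard metric for this kind of evaluation (a common choice, not necessarily the paper's exact protocol) is Expected Calibration Error: predictions are binned by confidence, and the gap between each bin's average confidence and its empirical accuracy is averaged, weighted by bin size. A minimal sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE): bin predictions by confidence,
    then average |accuracy - confidence| per bin, weighted by the
    fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # skip empty bins
        bin_conf = confidences[mask].mean()  # average stated confidence
        bin_acc = correct[mask].mean()       # empirical accuracy in the bin
        ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece
```

A well-calibrated model on ambiguous images would report low confidence where humans are uncertain, keeping this gap small; an overconfident model trained on clean benchmarks typically shows a large ECE on such data.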
Source page: https://flyai.com/paper_detail/8620
 
			


