Ambiguous Images With Human Judgments for Robust Visual Event Classification
From arXiv · 2022-10-08 01:41:32
Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos. All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments. We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models. Experimental results suggest that existing vision models are not sufficiently equipped to provide meaningful outputs for ambiguous images and that datasets of this nature can be used to assess and improve such models through model training and direct evaluation of model calibration. These findings motivate large-scale ambiguous dataset creation and further research focusing on noisy visual data.
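The abstract mentions "direct evaluation of model calibration" against human uncertainty judgments. One standard metric for this kind of evaluation (a common choice, not necessarily the paper's exact protocol) is Expected Calibration Error: predictions are binned by confidence, and the gap between each bin's average confidence and its empirical accuracy is averaged, weighted by bin size. A minimal sketch:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE): bin predictions by confidence,
    then average |accuracy - confidence| per bin, weighted by the
    fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue  # skip empty bins
        bin_conf = confidences[mask].mean()  # average stated confidence
        bin_acc = correct[mask].mean()       # empirical accuracy in the bin
        ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece
```

A well-calibrated model on ambiguous images would report low confidence where humans are uncertain, keeping this gap small; an overconfident model trained on clean benchmarks typically shows a large ECE on such data.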
Source page: https://flyai.com/paper_detail/8620
 
			


