Who Decides What Should Be Detected? Challenges in OSOD Research and Beyond


The Challenge of Unknown Objects in Detection

Recent advances in object detection have enabled models to accurately detect and classify known objects in images. However, in real-world applications, detectors frequently encounter objects that do not belong to any class seen during training. Addressing such “unknown” objects has become a key challenge, leading to the emergence of the research area known as Open-Set Object Detection (OSOD). OSOD aims not only to detect known objects but also to handle unknown ones appropriately—for example, by recognizing and flagging them as “unknown.”

At first glance, OSOD seems closely related to Open-Set Recognition (OSR), a classification task where the goal is to recognize known classes and reject unknown inputs. However, object detection fundamentally differs in that it requires a prior notion of what to detect. This prerequisite clashes with the idea of detecting arbitrary unknown objects. Thus, OSOD poses unique challenges, especially when it comes to defining what counts as an object and how to evaluate performance. This paper critically examines these issues and proposes a new and more practical formulation of the OSOD task.

Background: OSOD-I and OSOD-II

The authors categorize previous OSOD work into two formulations. The first, referred to as OSOD-I, focuses solely on detecting known objects accurately while ignoring the presence of unknown objects in the image. The goal here is to avoid misclassifying unknown objects as known classes. This setup is relatively safe and conservative, assuming that unknowns are just distractors to be ignored.

The second, OSOD-II, extends this formulation by requiring the detection of both known and unknown objects. In this setting, the detector must label known objects with their class names and classify any unseen object as “unknown.” This is conceptually attractive but practically difficult. The core issue lies in the ambiguity of “unknown objects”: if anything not known is considered unknown, what distinguishes it from background, noise, or irrelevant regions? Should a car tire be detected separately from the car? Should a blurry object in the background be counted?

Moreover, the evaluation of OSOD-II is problematic. Widely used metrics such as A-OSE (Absolute Open-Set Error) and WI (Wilderness Impact) only capture specific misclassification patterns and fail to measure overall detection performance—especially for the unknown class. These metrics were originally designed for OSOD-I and are not suited for OSOD-II’s broader goals.
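For concreteness, here is how the two metrics are usually defined in the OSOD literature (the notation here is ours). A-OSE counts the absolute number of detections of unknown objects that get misclassified as a known class. WI compares the detector's precision with and without unknown objects in the test data, at a fixed recall level:

\[ \mathrm{WI} = \frac{P_{\mathcal{K}}}{P_{\mathcal{K} \cup \mathcal{U}}} - 1 \]

where \(P_{\mathcal{K}}\) is the precision measured on known-class data alone and \(P_{\mathcal{K} \cup \mathcal{U}}\) is the precision when unknown objects are mixed in. Note that a detector that simply never fires on unknown objects achieves an A-OSE of zero and a WI of zero while failing to detect a single unknown, which is precisely the blind spot described above.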

A New Proposal: OSOD-III for Practical Detection

To address these limitations, the paper introduces a new formulation: OSOD-III. In this setting, the detection task is constrained to a predefined super-class, such as “traffic signs” or “animals.” The detector is asked to classify known objects within this super-class and identify unknown ones that also belong to the same super-class. Any object outside this super-class is treated as irrelevant and is not part of the detection target.
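To make the formulation concrete, the OSOD-III label space can be written down in a few lines. The super-class, class names, and helper function below are hypothetical illustrations of the idea, not taken from the paper:

import sys  # only so the sketch is a complete script

# Hypothetical OSOD-III label space for a traffic-sign detector.
SUPER_CLASS = "traffic_sign"
KNOWN = {"stop", "yield", "speed_limit_50"}  # sub-classes seen during training

def target_label(super_class: str, fine_class: str):
    """Map an annotated object to its OSOD-III detection target."""
    if super_class != SUPER_CLASS:
        return None          # outside the super-class: background, not a target
    if fine_class in KNOWN:
        return fine_class    # known sub-class: detect and classify
    return "unknown"         # unseen sub-class of the same super-class: flag it

print(target_label("traffic_sign", "stop"))           # -> "stop"
print(target_label("traffic_sign", "regional_sign"))  # -> "unknown"
print(target_label("vehicle", "car"))                 # -> None (ignored)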

This setup resolves the ambiguity of what to detect. Because the super-class is clearly defined, the boundary between “detectable” and “background” becomes much clearer. Additionally, since unknown objects are structurally and visually similar to the known classes (e.g., different bird species or new traffic signs), models have a better chance of learning useful features and generalizing. Most importantly, OSOD-III allows for proper evaluation using standard metrics such as Average Precision (AP) for both known and unknown classes.
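Because unknown ground truth is now well defined, "unknown" can be scored like any other class with the standard AP pipeline. Below is a minimal single-image sketch of that computation; it is our own illustration, with an assumed IoU matching threshold of 0.5, not the paper's evaluation code:

import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / max(union, 1e-9)

def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (score, box) for ONE class, e.g. 'unknown';
    gts: list of ground-truth boxes of that class (single image for brevity)."""
    preds = sorted(preds, key=lambda p: -p[0])      # highest confidence first
    matched = [False] * len(gts)
    tp, fp = np.zeros(len(preds)), np.zeros(len(preds))
    for i, (_, box) in enumerate(preds):
        best_j, best_iou = -1, iou_thr              # require IoU >= threshold
        for j, g in enumerate(gts):
            v = iou(box, g)
            if not matched[j] and v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched[best_j] = True
            tp[i] = 1                               # correct detection
        else:
            fp[i] = 1                               # false positive
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(len(gts), 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    # 101-point interpolated AP, COCO style
    return float(np.mean([precision[recall >= t].max() if (recall >= t).any() else 0.0
                          for t in np.linspace(0, 1, 101)]))

In an OSOD-III benchmark, such a routine would run once per known class and once more with all unknown sub-classes pooled into a single "unknown" class.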

In short, OSOD-III transforms the ill-posed goal of OSOD-II into a more practical and solvable problem, while preserving the core challenge of handling unknown instances.

Experimental Setup and Findings

To validate OSOD-III, the authors construct benchmarks using three existing datasets: Open Images (a large-scale general dataset), CUB200 (a fine-grained bird dataset), and MTSD (a traffic sign dataset). For each dataset, they define super-classes (e.g., “animal,” “vehicle,” or “traffic sign”), split them into known and unknown sub-classes, and apply five representative OSOD methods originally designed for OSOD-II. Additionally, they introduce a simple baseline: a standard detector enhanced with an uncertainty-based classifier to distinguish between known and unknown predictions.
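This baseline amounts to a post-hoc relabeling rule on top of a standard detector's outputs. The sketch below is our reading of that idea; the score definition (maximum softmax probability over the known classes) and the threshold value are assumptions for illustration, not the authors' exact recipe:

import numpy as np

def relabel_uncertain(boxes, class_probs, tau=0.5):
    """boxes: (N, 4) array; class_probs: (N, K) softmax scores over K known classes.
    Low-certainty detections are re-labeled 'unknown' instead of being
    discarded as background. tau is a hypothetical threshold."""
    out = []
    for box, p in zip(boxes, class_probs):
        k = int(np.argmax(p))
        conf = float(p[k])
        if conf < tau:                   # uncertain: call it unknown (assumed rule)
            out.append((box, "unknown", 1.0 - conf))
        else:
            out.append((box, f"known_{k}", conf))
    return out

# Example: two confident known detections and one uncertain one.
boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [30, 30, 50, 60]])
probs = np.array([[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.85, 0.05]])
print(relabel_uncertain(boxes, probs))  # -> known_0, unknown, known_1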

The experiments reveal three key findings:

  1. Existing methods underperform under proper metrics: Many OSOD-II methods perform well in terms of A-OSE or WI, but fail to detect unknown objects effectively when evaluated with AP. Surprisingly, their performance is comparable to—or worse than—a simple baseline.
  2. Naive baselines can be strong: Even a basic uncertainty-based detector with no extra training steps performs competitively, especially when known and unknown objects share visual traits within a super-class.
  3. The main difficulty lies in distinguishing known from unknown: Rather than discovering new objects per se, the challenge is correctly classifying them without confusion. Overlapping bounding boxes and poor confidence calibration often cause models to mislabel unknowns as known, and vice versa.

These insights highlight that current OSOD methods are not yet robust enough for practical deployment, especially in safety-critical domains such as autonomous driving.

Significance and Future Directions

The key contribution of this paper lies in its rethinking of the OSOD problem from the ground up. By pointing out the flawed assumptions in the OSOD-II setting and offering OSOD-III as a better alternative, the authors provide a framework that is both theoretically coherent and practically useful.

The OSOD-III setting also aligns well with real-world applications. For example, an ADAS (Advanced Driver-Assistance System) could use it to detect new traffic signs in unfamiliar regions. An insect identification app could flag new species while still recognizing known ones. These use cases illustrate how the OSOD-III formulation enables incremental learning, user feedback loops, and scalable deployment.

Future research could explore more sophisticated ways to define and encode super-class structures, improve feature separation between known and unknown classes, and develop evaluation protocols that go beyond AP to capture safety and reliability aspects.

Publication

Hosoya, Yusuke, Masanori Suganuma, and Takayuki Okatani. “Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy.” International Journal of Computer Vision (2025): 1-25.

@article{hosoya2025rethinking,
  title={Rethinking Open-Set Object Detection: Issues, A New Formulation, and Taxonomy},
  author={Hosoya, Yusuke and Suganuma, Masanori and Okatani, Takayuki},
  journal={International Journal of Computer Vision},
  pages={1--25},
  year={2025},
  publisher={Springer}
}