Superclass Learning with Representation Enhancement

Zeyu Gan, Suyun Zhao, Jinlong Kang, Liyuan Shang, Hong Chen, Cuiping Li  

Abstract

In many real scenarios, data are divided into a handful of artificial super categories according to expert knowledge rather than the visual features of the images themselves. Concretely, a superclass may contain massive and various raw categories, as in refuse sorting. Due to the lack of common semantic features, existing classification techniques struggle to recognize superclasses without raw class labels, so they either suffer severe performance degradation or require huge annotation costs. To narrow this gap, this paper proposes a superclass learning framework, called SuperClass Learning with Representation Enhancement (SCLRE), which recognizes super categories by leveraging enhanced representations. Specifically, by exploiting self-attention across the batch, SCLRE collapses the boundaries of the raw categories and enhances the representation of each superclass. On this enhanced representation space, a superclass-aware decision boundary is then reconstructed. Theoretically, we prove that by leveraging attention techniques the generalization error of SCLRE can be bounded under superclass scenarios. Experimentally, extensive results demonstrate that SCLRE outperforms the baseline and other contrastive-based methods on the CIFAR-100 dataset and four high-resolution datasets.

1. Introduction & Motivation


Figure 1: An illustration of the superclass problem.

In real-world applications, the criteria for image classification are often determined by human cognition rather than by the features of the images themselves. In some scenarios, an overly coarse-grained criterion means that a single category may contain various subclasses, so even images belonging to the same class lack common semantic features. For example, in waste classification, images of recyclable waste range from beverage cans to books, with no apparent commonality. This paper defines this phenomenon as the "superclass learning" problem.

Characteristics of Superclass Learning

  • Subclasses within a superclass are usually scattered and share few common features.
  • Instances from different superclasses may have common features, leading to potential confusion between classes.

Challenges of Superclass Learning

  • Breaking the original basic-class decision boundaries (Fig. 1b): the domain must be divided into smaller, more meaningful domains that better represent the actual problem (e.g., splitting the apple domain into fruit and toy domains).
  • Reconstructing decision boundaries at the superclass level (Fig. 1c): the goal is to merge relevant class domains into a new superclass domain that accurately represents the intended classification (e.g., combining fruit apples, eggs, and bones into a food-waste superclass domain).

Contribution of this paper

  • We propose the understudied but realistic problem of superclass learning.
  • We propose a novel representation enhancement method (SCLRE) to address superclass learning.
  • We perform extensive experiments to demonstrate that SCLRE outperforms other SOTA classification techniques.

2. Framework of SCLRE


Figure 2: Overview of the process of SCLRE.

SCLRE processes images in the following steps. Representations are first generated by a convolutional neural network; they are then mixed with each other in a trainable cross-instance attention module for enhancement. After enhancement, the representations are adjusted according to their superclass labels and the target anchors.

Enhancement

  • We compute attention weights by feeding a batch of images into a cross-instance attention module.
  • We use these attention weights to mix the representations with each other, as in the sketch below.
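
A minimal PyTorch sketch of what such a cross-instance attention step might look like. The module structure, dimensions, and single-head design are illustrative assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossInstanceAttention(nn.Module):
    """Illustrative cross-instance attention: every representation in the
    batch attends to every other one, and each output is an attention-
    weighted mixture over the whole batch."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, dim) representations from the CNN backbone.
        q, k, v = self.query(h), self.key(h), self.value(h)
        # (batch, batch) attention weights across instances in the batch.
        attn = F.softmax(q @ k.t() * self.scale, dim=-1)
        return attn @ v  # enhanced representations, (batch, dim)

feats = torch.randn(32, 256)                   # a batch of backbone features
enhanced = CrossInstanceAttention(256)(feats)  # (32, 256)
```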

Adjustment

  • We adjust the enhanced representations under the guidance of a supervised contrastive loss.
  • We also preset target anchors for each superclass to guide the adjustment process (a minimal sketch follows).
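
A minimal sketch of such an adjustment objective, assuming a supervised contrastive term over same-superclass pairs plus an anchor-alignment term. The function name, the weighting `lam`, and the anchor construction are illustrative assumptions, not the exact loss from the paper:

```python
import torch
import torch.nn.functional as F

def adjustment_loss(z, labels, anchors, tau=0.1, lam=1.0):
    """Supervised contrastive term over same-superclass pairs plus a term
    pulling each representation toward its superclass's preset anchor."""
    z = F.normalize(z, dim=1)                          # (B, d) unit vectors
    logits = z @ z.t() / tau                           # (B, B) similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = logits.masked_fill(eye, float('-inf'))    # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    supcon = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    a = F.normalize(anchors[labels], dim=1)            # per-sample anchor
    anchor_term = 1 - (z * a).sum(1)                   # cosine misalignment
    return (supcon + lam * anchor_term).mean()

z = torch.randn(32, 128)                # enhanced representations
labels = torch.randint(0, 4, (32,))     # superclass labels
anchors = torch.randn(4, 128)           # one preset anchor per superclass
loss = adjustment_loss(z, labels, anchors)
```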

More details about the enhancement and adjustment processes are given in our paper (Sec. 3).

3. Experiment Result

3.1 Accuracy

Tables 1 & 2: Accuracy on multiple datasets.

We reorganize the raw classes of these datasets to turn them into superclass problems. The classification results show that SCLRE outperforms SOTA representation learning frameworks.

3.2 Visualization


Figure 3: Visualization of Representation Space.

We use t-SNE to visualize the representation space of each compared method. The results show that SCLRE effectively extracts superclass-aware representations and makes the boundaries between superclasses clearer.
More experimental results, including ablation, sensitivity, and robustness studies, are shown in our paper.
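
A minimal sketch of this kind of visualization, assuming learned representations and superclass labels are available as NumPy arrays (the data here are random stand-ins):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

reps = np.random.randn(500, 128)       # stand-in for learned representations
labels = np.random.randint(0, 5, 500)  # stand-in superclass labels

emb = TSNE(n_components=2, perplexity=30, init='pca').fit_transform(reps)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap='tab10', s=8)
plt.title('t-SNE of the learned representation space')
plt.show()
```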

4. Analysis of Generalization

Following the theory of contrastive learning, we analyze the generalization upper bound of SCLRE. The details of the proof are given in the paper and appendix. In summary, the generalization error can be bounded by the similarity of attention vectors from the same superclass:
(Generalization-bound equation; see the paper and appendix for its exact form.)
During training, for samples of the same superclass under the same task, SCLRE learns to discover the samples that matter most for constructing the superclass structure, and these samples should also be similar to the other samples in their superclass. The sample pairs in O2 therefore naturally have highly similar attention vectors, which reduces the upper bound of the generalization error; this is consistent with the phenomenon we observe in the experiments.
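
The bound suggests a simple diagnostic one can compute during training: the average cosine similarity of attention vectors over same-superclass pairs. An illustrative snippet (the pair set O2 and the exact bound are defined in the paper; this only computes the similarity statistic):

```python
import torch
import torch.nn.functional as F

def same_superclass_attention_similarity(attn, labels):
    """Mean cosine similarity of attention vectors over same-superclass
    pairs (an illustrative proxy for the pairs in O2)."""
    a = F.normalize(attn, dim=1)   # rows: per-sample attention vectors
    sim = a @ a.t()                # (B, B) pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    same &= ~torch.eye(len(a), dtype=torch.bool, device=a.device)
    return sim[same].mean()

attn = torch.rand(32, 32)          # e.g. rows of the attention matrix
labels = torch.randint(0, 4, (32,))
print(same_superclass_attention_similarity(attn, labels))
```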

5. BibTex

@InProceedings{Gan_2023_CVPR,
  author    = {Gan, Zeyu and Zhao, Suyun and Kang, Jinlong and Shang, Liyuan and Chen, Hong and Li, Cuiping},
  title     = {Superclass Learning With Representation Enhancement},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2023},
  pages     = {24060-24069}
}