Learning Superclass Representation via Semantic Reconstruction

Zeyu Gan, Suyun Zhao, Jinlong Kang, Hong Chen, Cuiping Li

Abstract

This study explores image representation learning under super coarse-grained annotations, i.e., superclass learning. A novel learning framework, SCLRE, is proposed, which extracts superclass-aware representations by leveraging self-attention across instances and reconstructing the semantics. Through theoretical analysis, we reveal that a well-aligned representation space is essential for superclass learning and that the generalization error of SCLRE can be bounded by attention constraints. Through empirical validation, extensive experiments verify that SCLRE outperforms the baseline and other contrastive-based methods on the CIFAR-100 dataset and four high-resolution datasets.

1. Introduction & Motivation


Figure 1: An illustration of the superclass problem.

In real-world applications, the criteria for image classification are often determined by human cognition rather than the features of the images themselves. In some scenarios, due to overly coarse-grained criteria, a category of images may contain various subclasses, resulting in a lack of common semantic features even among images belonging to the same class. For example, in the context of waste classification, images of recyclable waste include a wide range of items, from beverage cans to books, with no apparent commonality. This article defines this phenomenon as the Superclass Learning problem.

 Characteristics of Superclass Learning

  • Subclasses within a superclass are usually scattered and share few common features.
  • Instances from different superclasses may have common features, leading to potential confusion between classes.

 Challenges of Superclass Learning

  • Breaking the original basic-class decision boundaries (Fig. 1b): It is necessary to divide the domain into smaller, more meaningful domains that better represent the actual problem (e.g., splitting the apple domain into fruit-apple and toy-apple domains).
  • Reconstructing decision boundaries at the superclass level (Fig. 1c): The goal is to merge relevant class domains into a new superclass domain that accurately represents the intended classification (e.g., combining fruit apples, eggs, and bones into a food-waste superclass domain).

 Contribution of this paper

  • We propose the understudied but realistic problem of superclass learning.
  • We propose a novel representation enhancement method (SCLRE) to address superclass learning.
  • We perform extensive experiments to demonstrate that SCLRE outperforms other classification techniques.
  • We conduct extensive analysis about the isotropy of superclass learning and extend SCLRE into semi-supervised situations.

2. Framework of SCLRE


Figure 2: Overview of the process of SCLRE.

SCLRE processes images in the following steps. Each image is first encoded into a representation by a convolutional neural network; the representations in a batch are then mixed with one another in a trainable cross-instance attention module for enhancement. After enhancement, the representations are adjusted according to their superclass labels and the target anchors.

Enhancement

  • We calculate the attention weights by feeding a batch of images into the cross-instance attention module.
  • We use the attention weights to mix the representations with each other.
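The enhancement step can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the projection matrices `Wq`, `Wk`, `Wv` stand in for the module's learned parameters, and the exact architecture of the cross-instance attention module may differ.

```python
import numpy as np

def cross_instance_attention(z, Wq, Wk, Wv):
    """Mix each representation in a batch with the others via attention.

    z: (batch, dim) image representations from the backbone.
    Wq, Wk, Wv: (dim, dim) learned projections (hypothetical stand-ins).
    Returns the enhanced representations and the attention weights.
    """
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    scores = q @ k.T / np.sqrt(z.shape[1])          # (batch, batch) similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over the batch
    return attn @ v, attn                           # mixed reps, mixing weights

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))                    # a toy batch of 8 reps
W = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
enhanced, attn = cross_instance_attention(z, *W)
print(enhanced.shape, attn.shape)                   # (8, 16) (8, 8)
```

Each row of `attn` sums to one, so every enhanced representation is a convex combination of the (projected) representations in the batch.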

Adjustment

  • We adjust the enhanced representations under the guidance of a supervised contrastive loss.
  • We also preset target anchors for each superclass to help the adjustment process.
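The adjustment step can be illustrated with a simplified loss: a standard supervised contrastive term over the enhanced representations plus a pull toward a preset per-superclass anchor. This is a hypothetical sketch; the paper's actual loss and anchor construction (Sec. 4) may differ.

```python
import numpy as np

def anchor_adjustment_loss(z, labels, anchors, tau=0.1):
    """Simplified adjustment objective (hypothetical sketch).

    z: (n, dim) enhanced representations; labels: (n,) superclass ids;
    anchors: (num_superclasses, dim) preset target anchors.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # project to unit sphere
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                          # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    # supervised contrastive term: pull together same-superclass pairs
    supcon = -np.where(same, logp, 0.0).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    # anchor term: cosine distance to the preset superclass anchor
    a = anchors[labels] / np.linalg.norm(anchors[labels], axis=1, keepdims=True)
    anchor_pull = 1.0 - (z * a).sum(axis=1)
    return supcon.mean() + anchor_pull.mean()

rng = np.random.default_rng(1)
z = rng.standard_normal((6, 16))
labels = np.array([0, 0, 1, 1, 2, 2])
anchors = rng.standard_normal((3, 16))
loss = anchor_adjustment_loss(z, labels, anchors)
print(float(loss))
```

Both terms are non-negative here: the contrastive term is an average negative log-probability, and the anchor term is a cosine distance in [0, 2].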

More details about the enhancement and adjustment process are shown in our paper (Sec.4).

3. Experiment Result

3.1 Accuracy

Table 1&2: Accuracy on Multiple Datasets.

We reorganize the raw classes of these datasets to recast them as superclass problems. The classification results show that SCLRE outperforms state-of-the-art representation learning frameworks.

3.2 Visualization


Figure 3: Visualization of Representation Space.

We use t-SNE to visualize the representation space of each compared method. The results show that SCLRE effectively extracts superclass-aware representations and makes the boundaries between superclasses clearer. More experimental results, including ablation, sensitivity, and robustness studies, are presented in our paper.

4. Analysis of Generalization

Following the theory of contrastive learning, we analyze the generalization upper bound of SCLRE. The details of the proof are shown in the paper and appendix. In summary, the generalization error can be bounded by the similarity of attention vectors from the same superclass (the exact bound is given in the paper).
During training, among samples of the same superclass within the same task, SCLRE helps discover the samples that matter most to constructing the superclass structure, and these samples should also be similar to the other samples in that superclass. Therefore, the sample pairs in O2 naturally have highly similar attention vectors, which lowers the upper bound of the generalization error; this is consistent with the phenomenon we observe in the experiments.

5. Extension to Semi-Supervised Learning

We further extend SCLRE to semi-supervised learning. By applying self-supervised pretraining and a strict confidence threshold, we obtain reliable newly generated pseudo-labels, which we then use to train the model in a semi-supervised way. For convenience, we call this method SS-SCLRE. The experimental results show that SS-SCLRE achieves better performance and alleviates performance decay.

Figure 4: Results about the Semi-Supervised Extension.

The details of the extension are shown in the paper.
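The pseudo-label filtering step described above can be sketched as follows. The helper and the threshold value are hypothetical; the paper's exact threshold and pipeline may differ.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Keep only confident predictions as pseudo-labels.

    probs: (n, num_superclasses) softmax outputs for unlabeled images.
    Returns the indices of retained samples and their pseudo-labels.
    """
    conf = probs.max(axis=1)                 # model confidence per sample
    keep = np.where(conf >= threshold)[0]    # strict threshold filters noise
    return keep, probs[keep].argmax(axis=1)

probs = np.array([[0.97, 0.02, 0.01],       # confident -> kept
                  [0.50, 0.30, 0.20],       # uncertain -> discarded
                  [0.01, 0.98, 0.01]])      # confident -> kept
keep, labels = select_pseudo_labels(probs)
print(keep, labels)                          # [0 2] [0 1]
```

Only the confidently labeled samples then join the labeled set for the semi-supervised training round.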

BibTex

Recommended citation (for the conference version):
Zeyu Gan, Suyun Zhao, Jinlong Kang, Liyuan Shang, Hong Chen, and Cuiping Li. Superclass learning with representation enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24060-24069, June 2023.