DG
Columns: Paper Name, Categories, Type, Recent issues, Motivations, Contributions, Target, Evaluation List, Conf./Jour., Year, Link
Exploiting Domain-specific Features to Enhance DG
Only consider domain-invariant
Ignore domain-specific
Extend beyond invariance view
Disentangle and joint learning
Confirms that domain-specific information is essential.

Preserve domain-invariant info.
t-SNE to visualize the features → domain-invariant features still make mistakes.

A simple feature augmentation for DG
Relying on image-space data aug.
Limited data diversity.
Require careful augmentation.
Where to add SFA?
Noise type for SFA.
Hyper-parameters.
Visualize using t-SNE.
Incorporating methods.
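The SFA idea above perturbs intermediate features directly instead of augmenting input images. A minimal numpy sketch, assuming element-wise multiplicative and additive Gaussian noise; the function name and the default `sigma` are illustrative, not the paper's exact configuration:

```python
import numpy as np

def stochastic_feature_augmentation(features, sigma=0.1, seed=None):
    """Perturb intermediate features with element-wise random noise:
    multiplicative scaling alpha ~ N(1, sigma) and additive noise
    beta ~ N(0, sigma). Sketch only; sigma is an assumed default."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(1.0, sigma, size=features.shape)
    beta = rng.normal(0.0, sigma, size=features.shape)
    return alpha * features + beta
```

In training this would be applied to a hidden layer's activations ("where to add SFA" is itself one of the paper's ablations).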
Domain-invariant Disentangled Network for Generalizable Object Detection
Object detection has seldom been explored.
Effectiveness of each component.
Hyper-parameters.
Visualization (illustrative only, no quantitative meaning).
Cross-domain Semantic Segmentation via Domain-invariant Iterative
Domain Generalization via Entropy Regularization
Can only guarantee features have invariant marginal distributions.
Invariance of conditional distributions is more important.
Ensure conditional invariance → entropy regularization.

Different weighting factors.
Deeper Network.
Class imbalance.
Feature visualization (t-SNE).
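The entropy-regularization idea can be illustrated on a domain classifier's outputs: maximizing the entropy of domain predictions makes features uninformative about the domain. A hedged numpy sketch; the function names and this exact form of the loss are assumptions, not the paper's formulation:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy_regularizer(domain_logits):
    """Negative mean entropy of domain predictions; minimizing this
    value maximizes entropy, pushing features toward being
    uninformative about the domain (a route to invariance)."""
    p = softmax(domain_logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return -entropy.mean()
```

The minimum is reached when the domain classifier's output is uniform over domains.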
Model-based Domain Generalization
Capture inter-domain variation.

First learn transformations that map the data → enforce invariance.
Re-formulate the domain generalization problem → semi-infinite constrained optimization problem.
Learning to learn single Domain Generalization
Only 1 source domain, many unseen domains.
Leverage adversarial training.
Create fictitious, challenging data.
Use meta-learning scheme.
Wasserstein Autoencoder (WAE).

Features (t-SNE) visualization.
Hyper-parameters tuning.
Loss function validation.
Meta vs. Without Meta.
Domain Generalization with Mixstyle
Where to apply?
Mixing vs. Replacing
Random vs. fixed shuffle at multiple layers.
Hyper-parameters.
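The core MixStyle operation mixes per-instance channel statistics (mean/std) between samples in a batch. A minimal numpy sketch assuming (N, C, H, W) feature maps; the paper samples the mixing weight from a Beta distribution and applies this at several early layers:

```python
import numpy as np

def mixstyle(x, lam, perm):
    """Mix per-instance channel statistics between samples.
    x: (N, C, H, W) features; lam: mixing weight in [0, 1];
    perm: batch permutation supplying the second style."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sig = x.std(axis=(2, 3), keepdims=True) + 1e-6
    x_norm = (x - mu) / sig                   # strip instance style
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix          # re-apply mixed style
```

With lam = 1 the input is returned unchanged; smaller lam blends in more of the permuted samples' style.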
Learning to diversify for Single Domain Generalization
Visualize (t-SNE) target features.
Hyper-parameters.
Gradient matching for Domain Generalization
Tracking GIP.
Random grouping → groups exhibit no domain shift → no pressure to learn gradient matching? → the bigger the domain shift, the better Fish performs.
Hyper-parameters.
Ablation on pretrained-models.
A Fourier-based Framework for Domain Generalization
Phase component → high-level semantics
Magnitude component → low-level semantics
Fourier-based data augmentation.
Co-teacher regularization.

Different components impact: AM, a2o_co-teacher, o2a_co-teacher, Teacher (turn on/off components).
Other choice of Fourier-based data augmentation (AM vs. AS).
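The AM (amplitude mix) augmentation keeps an image's phase (the high-level semantics) while interpolating its Fourier amplitude toward another image's. A 2-D grayscale numpy sketch; the paper mixes amplitudes only within a low-frequency window and samples the mixing weight randomly, both omitted here:

```python
import numpy as np

def amplitude_mix(img_a, img_b, lam):
    """Interpolate img_a's Fourier amplitude toward img_b's while
    keeping img_a's phase spectrum intact."""
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    amp = (1 - lam) * np.abs(fa) + lam * np.abs(fb)
    mixed = amp * np.exp(1j * np.angle(fa))
    return np.real(np.fft.ifft2(mixed))
```

AS (amplitude swap), the ablation alternative, corresponds to lam = 1 inside the low-frequency window.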

Progressive Domain Expansion Network for Single Domain Generalization.
Limited generalization performance gains
Lack appropriate safety and effectiveness constraints.

Domain expansion network.
Generated domain → progressively expanded.
Contrastive learning → learn cross-domain invariant representation.
Visualize (t-SNE) feature space.
Tuning hyper-parameters.

SWAD: Domain Generalization by seeking flat minima
Simply minimizing ERM over a complex, non-convex loss landscape → not sufficient.
Flat minima lead to robustness against loss-landscape shift.
Use dense stochastic weight averaging (D-SWA) → make the loss landscape flatter.

Local flatness analysis
Loss surface visualization
Validation accuracy/rounds
Different components.
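Dense weight averaging keeps a running average of weight snapshots collected at every step inside a chosen interval. A minimal sketch over dict-of-array weights; SWAD's selection of the averaging interval by monitoring validation loss is omitted:

```python
import numpy as np

class DenseWeightAverager:
    """Incremental mean of weight snapshots, updated every step
    (the 'dense' part of dense stochastic weight averaging)."""
    def __init__(self):
        self.avg, self.n = None, 0

    def update(self, weights):
        self.n += 1
        if self.avg is None:
            self.avg = {k: np.asarray(v, dtype=float).copy()
                        for k, v in weights.items()}
        else:
            for k, v in weights.items():
                self.avg[k] += (v - self.avg[k]) / self.n  # running mean
```

At the end of the interval, `avg` replaces the model weights, landing in a flatter region of the loss landscape than any single snapshot.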
Causality Inspired representation learning for DG
Remove components.
Visualize attention map
Independence of causal representation.
Representation importance.
Hyper-parameter sensitivity.
SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
Require sampling of the negative data pair.
CL performance depends on quality/quantity of negative data pairs.
Only use positive data pairs → resolve problems caused by negative data pair sampling.
Self-supervised Contrastive Learning.
Class-specific domain perturbation layer → apply mixup augmentation (only positive pairs are used).
Visualize (t-SNE) the latent spaces
Different dissimilarity losses (logit only / feature only).
Used to visualize where the network focuses.
Removing each component (losses, Mixup, CDPL, SWA, IDCL).
C-Mixup: Improving Generalization in Regression
Systematic analysis of mixup in regression remained unexplored.
Can result in incorrect labels.
Adjust the sampling probability based on the similarity of the labels.
Mixing on input data and label.
Generalization gap / Epochs.
Pair-wise divergence (averaging over class / domains).
Compatibility of C-Mixup (Integrate with other algorithms).
C-Mixup vs. other distance metrics.
Different hyper-parameters (e.g., bandwidth).
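C-Mixup's sampling rule can be sketched as a Gaussian kernel on label distance: for each anchor, partners with similar labels are more likely to be mixed. The function name and the `bandwidth` default are illustrative:

```python
import numpy as np

def cmixup_sampling_probs(labels, bandwidth=1.0):
    """Row i gives the probability of picking each j as the mixing
    partner of i, proportional to exp(-||y_i - y_j||^2 / 2h^2),
    so pairs with similar labels are mixed more often."""
    y = np.asarray(labels, dtype=float).reshape(len(labels), -1)
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * bandwidth ** 2))
    np.fill_diagonal(k, 0.0)   # never pair a sample with itself
    return k / k.sum(axis=1, keepdims=True)
```

Mixing then proceeds as in vanilla mixup on both inputs and labels, with the partner j drawn from these probabilities instead of uniformly.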
Instance-Aware Domain Generalization for Face Anti-Spoofing
Artificial domain labels are coarse-grained and subjective, which cannot reflect real domain accurately.
Focus on domain-level alignment, not fine-grained enough to ensure that learned representations are insensitive to domain-style.
Align features on instance-level.

Dynamic Kernel Generator
Categorical Style Assembly.
Asymmetric Instance Adaptive Whitening.
Remove components.
Different losses (replace).
Different style augmentation.
Different kernel designs.
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening.
Collecting multi-domain dataset is costly and labor-intensive.
Performance highly depends on the number of source datasets.
Exploit instance normalization layers → feature covariance contains domain-specific style such as texture and color.
Whitening transformation removes feature correlation and makes each feature have unit variance → eliminates domain-specific style information → may improve DG, but this has not been fully explored.

Instance selective whitening.
Whitening loss.
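The whitening-loss idea can be sketched as penalizing the off-diagonal channel covariance of a feature map, treating channel correlations as carriers of domain-specific style. RobustNet's instance *selective* step, which chooses which covariance entries to suppress based on their sensitivity to photometric transforms, is omitted from this sketch:

```python
import numpy as np

def whitening_loss(feat):
    """Mean absolute off-diagonal entry of the channel covariance
    of a (C, H*W) feature map; driving it to zero decorrelates
    channels (style) while leaving per-channel content intact."""
    feat = feat - feat.mean(axis=1, keepdims=True)
    cov = feat @ feat.T / feat.shape[1]
    off = cov - np.diag(np.diag(cov))
    return np.abs(off).mean()
```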
Disentangled Prompt Representation for Domain Generalization
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
Data of source and target domain are not accessible.
Only target task definition is given.

Large-scale vision-language models could shed light on this challenging source-free domain generalization setting.

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Two existing approaches for single domain generalization: data augmentation and feature disentanglement. These methods mainly rely on static networks.
Static networks lack the capability to dynamically adapt to the diverse variations in different visual scenes, which limits the representation power of the models.
Each image may have its unique characteristics (e.g., variations in lighting conditions, object appearances, scene structures).
Object-centric representations robust to variations in appearance, context, scene complexity.

Dynamic Learning approach for Single Domain Generalization.
A prompt-based object-centric gating module is designed to perceive object-centric features.
Leverage multi-modal features of CLIP (prompts describe different domain scenes).
Slot-Attention multi-modal fusion module → fuse the linguistic/visual features → extract effective object-centric representations.
→ Generate the gating masks → dynamically select relevant object-centric features to improve generalization ability.
Disentangled Prompt Representation for Domain Generalization
Large-scale pre-trained models greatly enhance domain generalization.
Pre-trained Visual Foundation Model (VFM): trained by utilizing large-scale (image, text) pairs → rich in semantic information of prior knowledge.
VFMs are able to encode semantic meanings of visual descriptions (regardless of styles).
Fine-tuning pre-trained foundation models with new datasets → achieve better results on downstream tasks with few training samples.

ISSUES:
Existing prompt tuning methods tune the foundation model to generate domain- and task-specific features, whereas domain generalization requires the model to generate domain-invariant features that work well across different unseen domains → crucial to develop prompts that can guide the foundation model in disentangling invariant features across all domains.
Fully leverage a distinctive aspect of VFM (controllable and flexible language prompt).
Text prompt plays a vital role → guide the disentanglement of image feature.
Text modality in VFM can be more easily disentangled (rich in semantic information and interpretable).

Prompt tuning framework for DG with LLM-assisted text prompt disentanglement + text-guided visual representation disentanglement model.
Domain-invariant + domain-specific descriptions are first generated with LLM (for prompt tuning to learn disentangled textual features).
Learned disentangled textual features → guide the learning of domain-invariant and domain-specific visual features.
To classify images from unseen domains → leveraging domain-specific knowledge from similar seen domains is essential → domain-specific prototypes will be selected for images from different unseen domain.
STYLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization
Unknown Prompt, the only Lacuna: Unveiling CLIP’s Potential for Open Domain Generalization
Key research gaps in using CLIP for Open DG (the unseen domain may contain new labels/categories):
Prompt design:
Multi-class classification over one-against-all recourse for ODG.
Domain-agnostic visual embeddings.
Unify the classification of known classes and outliers using CLIP → unknown-class prompt.
Gather training data → generate pseudo-open images that are semantically distinct from existing categories → use a pre-trained conditional diffusion model.

Learning Domain Invariant Prompt for Vision-Language Models
Towards Principled Disentanglement for Domain Generalization
Spurious correlation.
First, diversify the inter-class variation by modeling potential seen/unseen variations.
Then, disentangle constrained DG.
Principled constrained learning formulation based on disentanglement → theoretical guarantees on empirical duality gap.
Promotes semantic invariance via constrained optimization setup.
Controllable/interpretable data generation.
Towards Unsupervised Domain Generalization
Manually labeled data can be costly or unavailable.
Unlabeled data can be more accessible.
Contrastive learning only learns robust representations against pre-defined perturbation (under IID).
Unsupervised learning of discriminative representations.
Select valid source of negative samples according to the similarity among domains.
Big differences: 1) domain-related features → discriminative enough, 2) boost variance across domains.
How unsupervised learning enhances the generalization ability of models:

PCL: Proxy-based Contrastive Learning for Domain Generalization
Using contrastive learning to learn domain-invariant representations.
Positive sample-to-sample pairs hinder model generalization.
Replace sample-to-sample relations with proxy-to-sample relations.
Limitations. The proxy-based method makes a trade-off between sample-to-sample relations and class-to-sample relations. The model generalization is gained by sacrificing some potential useful semantic relations.
Alleviate the positive alignment issue
Proposed a novel proxy-based contrastive learning
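The proxy-to-sample relation can be sketched as an InfoNCE-style loss in which the only positive for a sample is its class proxy, and the other proxies act as negatives. The function name, temperature default, and cosine normalization are assumptions, not the paper's exact formulation:

```python
import numpy as np

def proxy_contrastive_loss(feature, proxies, label, tau=0.1):
    """Cross-entropy over cosine similarities to class proxies:
    the positive is proxies[label]; remaining proxies serve as
    negatives (proxy-to-sample instead of sample-to-sample)."""
    f = feature / np.linalg.norm(feature)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = p @ f / tau
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]
```

In the full method, other samples in the batch would contribute additional negative terms to the denominator.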
Style Neophile: Constantly Seeking Novel Styles for Domain Generalization
Domain-invariant representation learning via the styles of images.
Current style augmentation methods draw styles from a restricted set of external images or interpolate styles of the source domains.
Synthetic styles generated from both styles of source domain images and previously generated synthetic styles
Method to synthesize novel, diverse and plausible styles during training
Compound Domain Generalization via Meta-Knowledge Encoding
Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
Unsupervised Domain Generalization by Learning a Bridge Across Domains
Very little data (images and/or labels): not sufficient to train standard Unsupervised Domain Adaptation and Domain Generalization methods.
Propose a new concept of a learnable BrAD (Bridge Across Domains), an auxiliary visual "bridge" domain. It is used only during contrastive self-supervised training, aligning the representations of each domain to those of the shared BrAD.

CLIP the Gap: A Single Domain Generalization Approach for Object Detection
Performance of object detectors degrades when the test data distribution deviates from the training data distribution.
Not always possible to obtain target data

How to learn to generalize from a single source dataset

Approach to Single Domain Generalization for object detection.
Use CLIP to guide the training of an object detector → generalize to unseen target domains.
To improve the generalizability of an object detector, domain concepts expressed via text prompts are used during training to augment the diversity of the learned image features and make them more robust to an unseen target domain.
Mean Average Precision
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
Prior methods assume feature distributions are Gaussian, which cannot be matched accurately and incurs higher computational complexity.
Proposed to perform Exact Feature Distribution Matching by exactly matching the empirical distributions via histogram matching → Sort-Matching.
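Sort-Matching in 1-D: give the k-th smallest content value the k-th smallest style value, so the output exactly takes on the style's empirical distribution while preserving the content's ordering. A numpy sketch; the function name is mine:

```python
import numpy as np

def sort_matching(content, style):
    """Replace the sorted values of `content` with the sorted
    values of `style`: output has style's exact empirical
    distribution and content's ranks."""
    content = np.asarray(content, dtype=float)
    out = np.empty_like(content)
    out[np.argsort(content)] = np.sort(np.asarray(style, dtype=float))
    return out
```

In the method this is applied per channel to flattened feature maps, avoiding the Gaussian assumption entirely.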
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing
Current DG methods mostly operate on the complete representation from common modules → fail to fully exploit the subtle properties of global and local image cues.
Style transfers are inefficient in large-scale training
Use a two-stream structure to extract content and style features: content information records global semantic features and physical attributes; style information preserves some discriminative information.
Proposed a novel Shuffled Style Assembly Network to extract and reassemble different content and style features
Using contrastive learning to emphasize liveness-related style information
Representations of the correct assemblies are used to distinguish between living and spoofing during inference.
DNA: Domain Generalization with Diversified Neural Averaging