Microscopy Analysis and Material Characterization - Wavelet-Augmented Vision Transformers for High-Resolution, Low-Data-Regime Segmentation of Semiconductor Structures
Wednesday, November 19, 2025
Summary:
Accurate segmentation of cross-sectional semiconductor STEM images is crucial for device metrology but is challenged by high-resolution requirements and limited labeled data. We present a wavelet-augmented vision transformer pipeline that combines self-supervised pretraining with Swin-MAE and efficient finetuning with a SegFormer decoder. By stacking discrete wavelet transform (DWT) subbands with the original images, we incorporate physics-informed priors that enhance boundary detection. The Swin architecture's hierarchical attention allows scalable processing of 2k×2k micrographs, critical for delineating transistor features. Our approach achieves high pixel-level accuracy with as few as 1–5 labeled images, reducing annotation needs by an order of magnitude and training time by 2–3× compared to ViT or UNet baselines. This pipeline enables rapid adaptation to evolving semiconductor process nodes, supporting faster yield optimization and process control.
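The abstract does not specify the wavelet family or how the subbands are merged with the raw micrograph. The sketch below is a minimal illustration of the channel-stacking step only, assuming a level-1 Haar DWT computed with PyWavelets, bilinear upsampling of the half-resolution detail bands back to the image grid, and per-channel normalization; the function name stack_dwt_subbands and these choices are hypothetical, not the authors' implementation.

    import numpy as np
    import pywt
    from scipy.ndimage import zoom

    def stack_dwt_subbands(image: np.ndarray, wavelet: str = "haar") -> np.ndarray:
        """Stack a grayscale micrograph with its level-1 DWT detail subbands.

        Returns an (H, W, 4) array: [image, LH, HL, HH], with each
        half-resolution subband upsampled back to the input size.
        """
        # Level-1 2D DWT: approximation (LL) plus horizontal, vertical,
        # and diagonal detail bands; the detail bands emphasize edges.
        _, (lh, hl, hh) = pywt.dwt2(image.astype(np.float32), wavelet)

        def upsample(band: np.ndarray) -> np.ndarray:
            # Bilinearly resize the half-resolution band to the image grid.
            fy = image.shape[0] / band.shape[0]
            fx = image.shape[1] / band.shape[1]
            return zoom(band, (fy, fx), order=1)

        channels = [image.astype(np.float32)] + [upsample(b) for b in (lh, hl, hh)]
        # Normalize each channel so detail bands are not dwarfed by raw intensities.
        channels = [(c - c.mean()) / (c.std() + 1e-8) for c in channels]
        return np.stack(channels, axis=-1)

    # Example: a 2048x2048 micrograph becomes a 4-channel model input.
    micrograph = np.random.rand(2048, 2048)
    stacked = stack_dwt_subbands(micrograph)
    print(stacked.shape)  # (2048, 2048, 4)

Under these assumptions, the resulting 4-channel tensor would replace the single-channel input at the encoder's patch-embedding stage, handing the network explicit high-frequency edge cues alongside the raw image.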