On-Device Single-Image Super-Resolution for iOS

11-767 On Device Machine Learning Semester Project

Modern mobile devices offer impressive camera capabilities, but real-time image enhancement still relies heavily on cloud processing, which raises concerns about privacy, latency, and battery usage. This project addresses these issues by adapting the SwinIR model for single-image super-resolution (SISR) to run entirely on-device on the iPhone 16 Pro Max. The goal was a fast, energy-efficient, and privacy-preserving image enhancement solution built on a compressed, transformer-based model.

We began with the SwinIR model and explored several compression techniques to fit the constraints of mobile hardware. I led the effort to move from PyTorch-based quantization to CoreML-compatible workflows, using mixed-precision and FP16 strategies suited to Apple’s GPU and Neural Engine. I implemented structured pruning of attention heads, ranking heads by the L1 norm of their weights, and combined it with CoreML-based quantization to produce a hardware-optimized model for iOS. We benchmarked the pipeline on latency, FLOPs, energy usage, and image-quality metrics such as PSNR.
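The head-ranking step described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual implementation: it assumes a single projection matrix whose rows are grouped contiguously by head, and the function names (`head_l1_scores`, `select_heads_to_keep`, `prune_heads`) are hypothetical. SwinIR's real attention layers use a fused query/key/value projection, so a full implementation would also need to slice that projection and the output projection consistently.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def head_l1_scores(weight: Matrix, num_heads: int) -> List[float]:
    """Return the L1 norm (sum of absolute weights) of each head's row block."""
    rows = len(weight)
    if num_heads <= 0 or rows % num_heads != 0:
        raise ValueError("row count must be divisible by num_heads")
    head_dim = rows // num_heads
    scores = []
    for h in range(num_heads):
        block = weight[h * head_dim:(h + 1) * head_dim]
        scores.append(sum(abs(w) for row in block for w in row))
    return scores


def select_heads_to_keep(scores: List[float], keep: int) -> List[int]:
    """Pick the `keep` heads with the largest L1 norm, in original order."""
    if not 0 < keep <= len(scores):
        raise ValueError("keep must be between 1 and the number of heads")
    ranked = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
    return sorted(ranked[:keep])


def prune_heads(weight: Matrix, num_heads: int, keep: int) -> Tuple[Matrix, List[int]]:
    """Drop whole heads (structured pruning) and return the smaller matrix."""
    head_dim = len(weight) // num_heads
    kept = select_heads_to_keep(head_l1_scores(weight, num_heads), keep)
    pruned = [
        row[:]  # copy so the original weights are left untouched
        for h in kept
        for row in weight[h * head_dim:(h + 1) * head_dim]
    ]
    return pruned, kept
```

Because whole heads are removed, the pruned matrix is simply smaller and stays dense, which is what lets the saving show up as reduced compute on the device rather than as scattered zeros.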

Images from: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

The final model achieved substantial improvements in runtime and energy efficiency without compromising visual quality. Structured pruning preserved PSNR and qualitative performance better than unstructured approaches and, in the lightweight SwinIR variant, even outperformed the baseline in both visual fidelity and quantitative accuracy. CoreML quantization integrated more smoothly and performed better than the PyTorch alternatives.
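For reference, PSNR compares a restored image against its ground truth through the mean squared error. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), on flat sequences of pixel values. The function name and the default peak value of 255 (8-bit images) are assumptions for illustration; the evaluation details used in the project, such as colour space or border cropping, are not stated here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the restored image is closer to the reference, so a pruned model "preserving PSNR" means its score stays close to that of the uncompressed baseline on the same test images.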

This project demonstrated that with careful pruning and quantization, transformer-based models like SwinIR can be successfully deployed on mobile hardware. It highlighted the importance of structured pruning and hardware-aware quantization strategies for preserving image quality in resource-constrained environments. The work contributes to real-world on-device AI by enabling efficient, private, and high-quality image enhancement without cloud dependency.