One paper accepted by IEEE DCC!

Written by  |  27/12/2021 - 09/57

Congrats to Ming! Another milestone for Learnt Image Coding!



Title: Transformer-based Image Compression


Abstract: A Transformer-based Image Compression (TIC) approach is developed that reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both the main and hyper encoders comprise a sequence of neural transformation units (NTUs) that analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to generate a pixel-domain image reconstruction from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information. In addition, a causal attention module (CAM) is devised for adaptive context modeling of latent features, utilizing both hyper and autoregressive priors. TIC rivals state-of-the-art approaches, including deep convolutional neural network (CNN) based learnt image coding (LIC) methods and the handcrafted, rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard, while requiring far fewer model parameters, e.g., up to a 45% reduction relative to the leading-performance LIC.
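
To give a feel for what "causal" means in attention-based context modeling, here is a minimal sketch in plain Python. It is not the CAM from the paper: the function name `causal_attention`, the flat 1-D scan order, and the use of a single unparameterised attention head are all assumptions made for illustration. It only shows the masking idea, namely that the output at position i depends on positions up to i and never on later ones.

```python
import math


def causal_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask (illustrative only).

    Each argument is a list of equal-length float vectors, one per position
    in scan order. Position i attends only to positions j <= i.
    This is a hypothetical helper, not the module described in the paper.
    """
    dim = len(queries[0])
    value_dim = len(values[0])
    outputs = []
    for i in range(len(queries)):
        # Scores against the visible (current and earlier) positions only.
        scores = [
            sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(weights[j] * values[j][c] for j in range(i + 1))
            for c in range(value_dim)
        ])
    return outputs


latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = causal_attention(latents, latents, latents)
```

Changing the last entry of `latents` leaves `context[0]` and `context[1]` unchanged, which is the property an autoregressive prior relies on: a decoder can rebuild the context for each position from symbols it has already decoded.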


Source code will be made publicly accessible soon.