Image Augmentation
transforms.RandomAffine
transforms.RandomAffine is a transform that applies a random affine transformation to an image. Affine transformations can translate, rotate, scale, or shear an image. The figure below shows the result of translating an image vertically and horizontally. When used during autoencoder training, affine transforms encourage the model to encode object position in its image features, so that it can reconstruct images appropriately even when objects appear at positions that were not seen during training.
transforms.RandomVerticalFlip
transforms.RandomVerticalFlip is a transform that flips the input image vertically with a given probability (0.5 by default) to increase data diversity.
transforms.RandomHorizontalFlip
transforms.RandomHorizontalFlip is a function that randomly flips the input image horizontally. It can be combined with RandomVerticalFlip to improve the generalization performance of the model.
transforms.ColorJitter
transforms.ColorJitter is a transform that randomly changes the brightness, contrast, saturation, and hue of an input image. The figure below illustrates the effects of such transformations.
GridMask
GridMask is a method that increases the diversity of the training data by masking parts of the image with a grid-like pattern [1]. As shown in the figure below, this technique aims to improve the generalization performance of the model by training it on images in which certain regions are missing. When applied to a SARNN model, the masked regions do not attract attention, which helps the model learn the spatial attention that is crucial for motion prediction. The source code for GridMask is available here.
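The idea of grid-like masking can be sketched as follows. This is a simplified illustration, not the implementation referenced above or the authors' code: the function name `grid_mask` and its parameters `d` (grid period in pixels) and `ratio` (masked fraction of each period) are hypothetical, and the rotation and random period used in the original method are omitted.

```python
import torch

def grid_mask(img, d=16, ratio=0.5):
    """Zero out a regular grid of square blocks in a (C, H, W) image.

    Hypothetical simplified sketch: fixed period d, no rotation.
    """
    _, h, w = img.shape
    block = int(d * ratio)  # side length of the masked square in each cell
    # Random offsets so the grid position differs between calls.
    off_y = int(torch.randint(0, d, (1,)))
    off_x = int(torch.randint(0, d, (1,)))
    ys = (torch.arange(h) + off_y) % d < block  # rows crossing a masked band
    xs = (torch.arange(w) + off_x) % d < block  # columns crossing a masked band
    keep = ~(ys[:, None] & xs[None, :])         # False inside masked squares
    return img * keep.to(img.dtype)

# Hypothetical all-ones image so the masked squares are easy to count.
img = torch.ones(3, 64, 64)
masked = grid_mask(img, d=16, ratio=0.5)
print(masked.shape, float((masked == 0).float().mean()))
```

With a 64x64 image, `d=16`, and `ratio=0.5`, each 16x16 cell contains one 8x8 masked square, so a quarter of the pixels are set to zero regardless of the random offsets.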
[1] Pengguang Chen, Shu Liu, Hengshuang Zhao, and Jiaya Jia. GridMask data augmentation. arXiv preprint arXiv:2001.04086, 2020.