Dataloader

EIPL provides a MultimodalDataset class for learning robot motions, which inherits from PyTorch's Dataset class. Each time it is indexed, the class returns a pair consisting of the input data, x_data, and the corresponding ground-truth data, y_data. The input data x_data consists of paired images and joint angles, and data augmentation is re-applied at every epoch: the input images are randomly adjusted in brightness, contrast, etc. to improve robustness to changes in lighting conditions, while Gaussian noise is added to the input joint angles to improve robustness to position errors. No noise is added to the ground-truth data y_data. Because the model is trained to predict noise-free targets (an internal representation) from noisy inputs, it can generate motions robustly even in the presence of real-world noise during inference.

The following source code shows how to use the MultimodalDataset class with an object-grasping dataset collected with the AIREC robot. When 5-dimensional time-series image data [number of data, sequence length, channel, height, width] and 3-dimensional time-series joint-angle data [number of data, sequence length, number of joints] are passed to the MultimodalDataset class, data augmentation and the related preprocessing are performed automatically. Note that the SampleDownloader used to fetch the sample dataset is not mandatory; you can load your own datasets directly with functions such as numpy.load (see the sketch after the snippet below).

How to use dataloader
from eipl.data import SampleDownloader, MultimodalDataset

# Download and normalize sample data
grasp_data = SampleDownloader("airec", "grasp_bottle", img_format="CHW")
images, joints = grasp_data.load_norm_data("train", vmin=0.1, vmax=0.9)

# Give the image and joint angles to the Dataset class
multi_dataset = MultimodalDataset(images, joints)

# Indexing the dataset returns a pair of input and ground-truth data
x_data, y_data = multi_dataset[1]
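
As noted above, SampleDownloader is only a convenience helper. A minimal sketch of the same flow with your own recordings might look as follows; the file names own_images.npy and own_joints.npy are hypothetical placeholders, and it is assumed that the arrays are already normalized to a suitable range.

```python
import numpy as np
from eipl.data import MultimodalDataset

# Hypothetical files: replace with your own recordings.
# images: [number of data, sequence length, channel, height, width]
# joints: [number of data, sequence length, number of joints]
images = np.load("own_images.npy")
joints = np.load("own_joints.npy")

multi_dataset = MultimodalDataset(images, joints)
x_data, y_data = multi_dataset[0]
```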

The following figure shows the data returned by the MultimodalDataset class. From left to right: the original camera image, the augmented image with added noise, and the robot joint angles. Because new random noise is sampled at every epoch, the model learns from a variety of visual situations. In the joint-angle plot, the black dotted lines represent the original joint angles, while the colored lines represent the joint angles with added Gaussian noise. A sketch of how such a comparison can be reproduced is given after the figure.

[Figure: original image, noise-augmented image, and original vs. noisy joint angles]
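
A rough sketch of how such a comparison plot could be produced with matplotlib is shown below. It assumes the multi_dataset object from the snippet above and plots the joint angles of a single sequence; the styling is arbitrary.

```python
import matplotlib.pyplot as plt

# One sample: x_* are the augmented inputs, y_* the original (noise-free) data.
(x_img, x_joint), (y_img, y_joint) = multi_dataset[0]

# Black dotted lines: original joint angles; colored lines: joint angles with Gaussian noise.
plt.plot(y_joint, linestyle="dotted", color="k")
plt.plot(x_joint)
plt.xlabel("time step")
plt.ylabel("normalized joint angle")
plt.show()
```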

Note

If you cannot obtain the dataset automatically (e.g., because of a proxy), you can download it manually from here and place it under the ~/.eipl/ folder as shown below.

```bash            
$ cd ~/
$ mkdir -p .eipl/airec/
$ cd .eipl/airec/
$ # copy grasp_bottle.tar to ~/.eipl/airec/ directory
$ tar xvf grasp_bottle.tar
$ ls grasp_bottle/*
grasp_bottle/joint_bounds.npy
```
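
To check that the manually extracted data is readable, a sketch along the following lines should suffice. It loads the joint_bounds.npy file listed above; the exact set of files inside grasp_bottle/ depends on the archive.

```python
import os
import numpy as np

# joint_bounds.npy is the file listed by `ls` above.
bounds_path = os.path.expanduser("~/.eipl/airec/grasp_bottle/joint_bounds.npy")
joint_bounds = np.load(bounds_path)
print(joint_bounds.shape)
```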

dataloader.MultimodalDataset

Bases: Dataset

This class is used to train models that deal with multimodal data (e.g., images, joints), such as CNNRNN/SARNN.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `images` | numpy array | Set of images in the dataset, expected to be a 5D array [data_num, seq_num, channel, height, width]. | required |
| `joints` | numpy array | Set of joints in the dataset, expected to be a 3D array [data_num, seq_num, joint_dim]. | required |
| `stdev` | float | Set the standard deviation for normal distribution to generate noise. | 0.02 |
Source code in en/docs/model/src/dataloader.py
class MultimodalDataset(Dataset):
    #:: MultimodalDataset
    """
    This class is used to train models that deal with multimodal data (e.g., images, joints), such as CNNRNN/SARNN.

    Args:
        images (numpy array): Set of images in the dataset, expected to be a 5D array [data_num, seq_num, channel, height, width].
        joints (numpy array): Set of joints in the dataset, expected to be a 3D array [data_num, seq_num, joint_dim].
        stdev (float, optional): Set the standard deviation for normal distribution to generate noise.
    """

    def __init__(self, images, joints, stdev=0.02):
        """
        The constructor of Multimodal Dataset class. Initializes the images, joints, and transformation.

        Args:
            images (numpy array): The images data, expected to be a 5D array [data_num, seq_num, channel, height, width].
            joints (numpy array): The joints data, expected to be a 3D array [data_num, seq_num, joint_dim].
            stdev (float, optional): The standard deviation for the normal distribution to generate noise. Defaults to 0.02.
        """
        self.stdev = stdev
        # Convert to float tensors so that the torchvision transform and torch.normal can be applied.
        self.images = torch.Tensor(images)
        self.joints = torch.Tensor(joints)
        self.transform = transforms.ColorJitter(contrast=0.5, brightness=0.5, saturation=0.1)

    def __len__(self):
        """
        Returns the number of the data.
        """
        return len(self.images)

    def __getitem__(self, idx):
        """
        Extraction and preprocessing of images and joints at the specified indexes.

        Args:
            idx (int): The index of the element.

        Returns:
            dataset (list): A list containing lists of transformed and noise added image
                            and joint (x_img, x_joint) and the original image and joint (y_img, y_joint).
        """
        y_img = self.images[idx]
        y_joint = self.joints[idx]

        x_img = self.transform(self.images[idx])
        x_img = x_img + torch.normal(mean=0, std=self.stdev, size=x_img.shape)

        x_joint = self.joints[idx] + torch.normal(mean=0, std=self.stdev, size=y_joint.shape)

        return [[x_img, x_joint], [y_img, y_joint]]
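
In a training script, the dataset is typically wrapped in a standard PyTorch DataLoader to obtain shuffled minibatches. A minimal sketch (assuming the multi_dataset object created earlier; the batch size is arbitrary):

```python
from torch.utils.data import DataLoader

# Arbitrary batch size for illustration.
train_loader = DataLoader(multi_dataset, batch_size=8, shuffle=True)

for (x_img, x_joint), (y_img, y_joint) in train_loader:
    # x_*: augmented inputs, y_*: noise-free targets, each with a leading batch dimension.
    print(x_img.shape, x_joint.shape, y_img.shape, y_joint.shape)
    break
```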

__getitem__(idx)

Extraction and preprocessing of images and joints at the specified indexes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `idx` | int | The index of the element. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `dataset` | list | A list containing lists of transformed and noise added image and joint (x_img, x_joint) and the original image and joint (y_img, y_joint). |

Source code in en/docs/model/src/dataloader.py
def __getitem__(self, idx):
    """
    Extraction and preprocessing of images and joints at the specified indexes.

    Args:
        idx (int): The index of the element.

    Returns:
        dataset (list): A list containing lists of transformed and noise added image
                        and joint (x_img, x_joint) and the original image and joint (y_img, y_joint).
    """
    y_img = self.images[idx]
    y_joint = self.joints[idx]

    x_img = self.transform(self.images[idx])
    x_img = x_img + torch.normal(mean=0, std=self.stdev, size=x_img.shape)

    x_joint = self.joints[idx] + torch.normal(mean=0, std=self.stdev, size=y_joint.shape)

    return [[x_img, x_joint], [y_img, y_joint]]
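
Because the color jitter and the Gaussian noise are re-sampled on every access, calling __getitem__ twice with the same index yields different inputs but identical targets. A quick sketch to confirm this (assuming the multi_dataset object from the usage example above):

```python
import torch

(x1_img, x1_joint), (y1_img, y1_joint) = multi_dataset[0]
(x2_img, x2_joint), (y2_img, y2_joint) = multi_dataset[0]

print(torch.allclose(x1_joint, x2_joint))  # False: the input noise differs between accesses
print(torch.allclose(y1_joint, y2_joint))  # True: the targets are the original data
```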

__init__(images, joints, stdev=0.02)

The constructor of Multimodal Dataset class. Initializes the images, joints, and transformation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `images` | numpy array | The images data, expected to be a 5D array [data_num, seq_num, channel, height, width]. | required |
| `joints` | numpy array | The joints data, expected to be a 3D array [data_num, seq_num, joint_dim]. | required |
| `stdev` | float | The standard deviation for the normal distribution to generate noise. Defaults to 0.02. | 0.02 |
Source code in en/docs/model/src/dataloader.py
def __init__(self, images, joints, stdev=0.02):
    """
    The constructor of Multimodal Dataset class. Initializes the images, joints, and transformation.

    Args:
        images (numpy array): The images data, expected to be a 5D array [data_num, seq_num, channel, height, width].
        joints (numpy array): The joints data, expected to be a 3D array [data_num, seq_num, joint_dim].
        stdev (float, optional): The standard deviation for the normal distribution to generate noise. Defaults to 0.02.
    """
    self.stdev = stdev
    # Convert to float tensors so that the torchvision transform and torch.normal can be applied.
    self.images = torch.Tensor(images)
    self.joints = torch.Tensor(joints)
    self.transform = transforms.ColorJitter(contrast=0.5, brightness=0.5, saturation=0.1)
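
The noise magnitude can be tuned through the stdev argument; for example, a hypothetical dataset with larger joint-angle noise than the default 0.02:

```python
# The value 0.05 is arbitrary and only for illustration.
noisy_dataset = MultimodalDataset(images, joints, stdev=0.05)
```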

__len__()

Returns the number of the data.

Source code in en/docs/model/src/dataloader.py
def __len__(self):
    """
    Returns the number of the data.
    """
    return len(self.images)
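
Since __len__ reports the number of recorded sequences, the dataset can also be split into training and validation subsets with the standard PyTorch utilities. A small sketch (the 80/20 ratio is arbitrary):

```python
from torch.utils.data import random_split

# Arbitrary 80/20 split for illustration.
n_train = int(0.8 * len(multi_dataset))
n_valid = len(multi_dataset) - n_train
train_set, valid_set = random_split(multi_dataset, [n_train, n_valid])
print(len(train_set), len(valid_set))
```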