Dataloader
EIPL provides a `MultimodalDataset` class for learning robot motions, which inherits from PyTorch's `Dataset` class. At each epoch, the class returns a pair consisting of the input data `x_data` and the corresponding ground truth `y_data`. The input data `x_data` consists of pairs of images and joint angles, and data augmentation is applied at every epoch: the input images are randomly adjusted in brightness, contrast, etc. to improve robustness to changes in lighting conditions, while Gaussian noise is added to the input joint angles to improve robustness to position errors. No noise is added to the ground-truth data. Because the model learns to recover the noiseless signal (its internal representation) from noisy input, it can generate motions robustly even in the presence of real-world noise during inference.
The following source code shows how to use the `MultimodalDataset` class, using an object-grasping task collected with AIREC as an example. By providing 5-dimensional time-series image data [number of data, time series length, channel, height, width] and 3-dimensional time-series joint-angle data [number of data, time series length, number of joints] to the `MultimodalDataset` class, data augmentation and other operations are performed automatically. Note that the `SampleDownloader`, which is used to download the sample dataset, is not mandatory; you can load your own datasets directly with functions such as `numpy.load`.
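As a sketch of that call pattern, the snippet below builds dummy arrays with the expected shapes and checks them before constructing the dataset. The sizes are illustrative, and the `eipl.data` import path shown in the comment is an assumption; it is commented out so the snippet stands alone:

```python
import numpy as np

# Dummy stand-in data with the documented shapes; replace with your own
# recordings, e.g. loaded via numpy.load(...).
images = np.zeros((4, 100, 3, 64, 64), dtype=np.float32)  # [number of data, time series length, channel, height, width]
joints = np.zeros((4, 100, 8), dtype=np.float32)          # [number of data, time series length, number of joints]

# Sanity checks before handing the arrays to the dataset class.
assert images.ndim == 5 and joints.ndim == 3
assert images.shape[:2] == joints.shape[:2]  # same data count and sequence length

# With EIPL installed (import path assumed):
#   from eipl.data import MultimodalDataset
#   dataset = MultimodalDataset(images, joints, stdev=0.02)
print(images.shape, joints.shape)
```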
The following figure shows the robot camera images returned by the `MultimodalDataset` class. From left to right: the original image, the image with noise, and the robot joint angles. Random noise is added to the image at each epoch, allowing the model to learn from a variety of visual situations. The black dotted lines represent the original joint angles, while the colored lines represent the joint angles with Gaussian noise.
Note
If you are unable to obtain the dataset due to a proxy or any other reason, you can manually download the dataset from here and save it in the ~/.eipl/ folder.
```bash
$ cd ~/
$ mkdir -p .eipl/airec/
$ cd .eipl/airec/
$ # copy grasp_bottle.tar to ~/.eipl/airec/ directory
$ tar xvf grasp_bottle.tar
$ ls grasp_bottle/*
grasp_bottle/joint_bounds.npy
```
dataloader.MultimodalDataset
Bases: Dataset
This class is used to train models that deal with multimodal data (e.g., images, joints), such as CNNRNN/SARNN.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `images` | numpy array | Set of images in the dataset, expected to be a 5D array [data_num, seq_num, channel, height, width]. | required |
| `joints` | numpy array | Set of joints in the dataset, expected to be a 3D array [data_num, seq_num, joint_dim]. | required |
| `stdev` | float | Standard deviation of the normal distribution used to generate noise. | `0.02` |
Source code in en/docs/model/src/dataloader.py
__getitem__(idx)
Extracts and preprocesses the images and joints at the specified index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `idx` | int | The index of the element. | required |
Returns:

| Name | Type | Description |
|---|---|---|
| `dataset` | list | A list containing the transformed, noise-added image and joint pair (x_img, x_joint) and the original image and joint pair (y_img, y_joint). |
Source code in en/docs/model/src/dataloader.py
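A minimal stand-in (not the EIPL source) that mirrors the documented `__getitem__` contract, returning a noisy input pair together with the clean originals:

```python
import numpy as np

class MiniMultimodalDataset:
    """Minimal stand-in mirroring the documented __getitem__ contract:
    noise-added inputs (x_img, x_joint) paired with clean targets
    (y_img, y_joint). Not the EIPL implementation."""

    def __init__(self, images, joints, stdev=0.02):
        self.images = images
        self.joints = joints
        self.stdev = stdev
        self.rng = np.random.default_rng(0)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        y_img, y_joint = self.images[idx], self.joints[idx]
        # Noise is drawn fresh on every access, so each epoch sees
        # different perturbations of the same underlying data.
        x_img = y_img + self.rng.normal(scale=self.stdev, size=y_img.shape)
        x_joint = y_joint + self.rng.normal(scale=self.stdev, size=y_joint.shape)
        return [[x_img, x_joint], [y_img, y_joint]]

images = np.zeros((2, 10, 3, 32, 32), dtype=np.float32)
joints = np.zeros((2, 10, 8), dtype=np.float32)
dataset = MiniMultimodalDataset(images, joints)

[x_img, x_joint], [y_img, y_joint] = dataset[0]
print(x_img.shape, y_joint.shape)
```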
__init__(images, joints, stdev=0.02)
The constructor of the MultimodalDataset class. Initializes the images, joints, and transformations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `images` | numpy array | The image data, expected to be a 5D array [data_num, seq_num, channel, height, width]. | required |
| `joints` | numpy array | The joint data, expected to be a 3D array [data_num, seq_num, joint_dim]. | required |
| `stdev` | float | The standard deviation of the normal distribution used to generate noise. Defaults to 0.02. | `0.02` |