EIPL: Embodied Intelligence with
Deep Predictive Learning

Hiroshi Ito
Tetsuya Ogata
Waseda University
[Paper]
[Documentation]
[GitHub Code]



Abstract

Deep learning-based approaches can generalize performance while reducing feature-design costs by learning environment recognition and motion generation end-to-end. However, they incur large costs for training data collection, as well as the time and human resources needed for trial and error whenever the robot makes physical contact with its environment. We therefore propose "deep predictive learning," a motion learning concept that assumes the predictive model is imperfect and minimizes the prediction error against the real-world situation. Deep predictive learning is inspired by the free energy principle and predictive coding theory, which explain how living organisms act to minimize the prediction error between the real world and the brain. The robot predicts near-future situations from its sensorimotor information and generates motions that minimize the gap with reality. By adjusting its motion in real time while accounting for the gap between training and reality, the robot can flexibly perform tasks in unlearned situations.


Deep Predictive Learning


Deep predictive learning consists of three phases: training data collection, learning, and motion generation. In the data collection phase, sensorimotor information is recorded as time-series data while the robot experiences the task in the real world through teleoperation or direct teaching. In the learning phase, the model is trained to minimize the prediction error between the current and next sensorimotor information: the current robot state (\(i_t, s_t\)) is input to the model, and the weights are updated to minimize the error between the predicted next state (\(\hat{i}_{t+1}, \hat{s}_{t+1}\)) and the ground truth (\(i_{t+1}, s_{t+1}\)). In the motion generation phase, the robot predicts the near-future situation in real time from its sensorimotor information and controls each joint so as to minimize the error (gap) from reality. By continuing to adjust its motion in real time while tolerating the difference between training and reality, the robot can work flexibly in unlearned situations. A minimal training sketch is shown below.
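
As a rough illustration of the learning phase, the following is a minimal PyTorch sketch of the prediction-error objective. It is not the released EIPL implementation: the class and variable names are illustrative assumptions, and for brevity the image branch predicts a compact image feature rather than reconstructing the full image. A hypothetical recurrent model takes the current image \(i_t\) and joint angles \(s_t\), predicts the next step, and is trained to minimize the mean squared error against the true next-step data.

import torch
import torch.nn as nn

class PredictiveModel(nn.Module):
    # Hypothetical sensorimotor prediction model: encodes the current camera
    # image, concatenates it with the joint angles, updates an LSTM state, and
    # predicts the next-step image feature and joint angles.
    def __init__(self, img_feat_dim=64, joint_dim=8, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # image i_t -> compact feature
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, img_feat_dim),
        )
        self.rnn = nn.LSTMCell(img_feat_dim + joint_dim, hidden_dim)
        self.dec_img = nn.Linear(hidden_dim, img_feat_dim)   # predicted next image feature
        self.dec_joint = nn.Linear(hidden_dim, joint_dim)    # predicted next joint angles

    def forward(self, img_t, joint_t, state=None):
        feat = self.encoder(img_t)
        h, c = self.rnn(torch.cat([feat, joint_t], dim=-1), state)
        return self.dec_img(h), self.dec_joint(h), (h, c)

# One training step: minimize the prediction error between the model output at
# time t and the ground-truth sensorimotor data at time t+1.
model = PredictiveModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

images = torch.randn(4, 10, 3, 64, 64)   # dummy sequence: (batch, time, C, H, W)
joints = torch.randn(4, 10, 8)           # dummy joint angles: (batch, time, DoF)

state, loss = None, 0.0
for t in range(images.shape[1] - 1):
    pred_feat, pred_joint, state = model(images[:, t], joints[:, t], state)
    with torch.no_grad():                                # targets carry no gradient
        target_feat = model.encoder(images[:, t + 1])    # feature of the true next image
    loss = loss + criterion(pred_feat, target_feat) + criterion(pred_joint, joints[:, t + 1])

optimizer.zero_grad()
loss.backward()
optimizer.step()
print("prediction error:", loss.item())

In an actual setup the loss would be accumulated over mini-batches of recorded sequences for many epochs; the released implementation and documentation describe the full training procedure.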


Video



Source Code and Documentation

We have released the PyTorch-based implementation and a sample dataset on the GitHub page. By following the source code and documentation, you can systematically work through everything from data collection to training and analysis of motion generation models. The documentation uses an inexpensive robotic arm, OpenManipulator, and a multi-degree-of-freedom humanoid robot, AIREC, as examples, but you can easily apply it to your own robot by setting the parameters that describe the robot body (e.g., joint degrees of freedom and camera image resolution). The following figure shows the inference results of a motion generation model with an attention mechanism trained to perform an object grasping motion. From left to right: the input image with the attention points, the predicted image, the predicted joint angles, and the internal state of the RNN. The meaning of each plot and the visualization analysis methods are described in the documentation.
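
To complement the documentation, here is a minimal sketch of closed-loop motion generation, reusing the hypothetical PredictiveModel class from the sketch above. This is not the EIPL API: get_camera_image, get_joint_angles, and send_joint_command are placeholder stubs standing in for your robot's own I/O, and in practice the inputs must be normalized in the same way as during training.

import torch

def get_camera_image():      # placeholder stub for the real camera driver
    return torch.randn(3, 64, 64)

def get_joint_angles():      # placeholder stub for the real joint-state reader
    return torch.randn(8)

def send_joint_command(q):   # placeholder stub for the real joint controller
    print("joint command:", q.tolist())

# Closed-loop generation: at every control step the current image and joint
# angles are fed to the model, and the predicted next joint angles are sent
# back to the robot so that the prediction error with reality stays small.
model = PredictiveModel()    # hypothetical model class from the sketch above
# model.load_state_dict(torch.load("model.pth"))  # load trained weights in practice
model.eval()

state = None
with torch.no_grad():
    for step in range(100):                          # control loop (e.g., 10 Hz)
        img_t = get_camera_image().unsqueeze(0)      # (1, C, H, W)
        joint_t = get_joint_angles().unsqueeze(0)    # (1, DoF)
        _, pred_joint, state = model(img_t, joint_t, state)
        send_joint_command(pred_joint.squeeze(0))    # move toward the predicted next state

Because the prediction and command happen every control step, the robot keeps correcting its motion online even when the observed scene deviates from the training data.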


[Documentation]
[GitHub Code]


BibTeX

@misc{suzuki2023deep,
  author       = {Kanata Suzuki and Hiroshi Ito and Tatsuro Yamada and Kei Kase and Tetsuya Ogata},
  title        = {Deep Predictive Learning: Motion Learning Concept inspired by Cognitive Robotics},
  howpublished = {arXiv preprint arXiv:2306.14714},
  year         = {2023},
}


Acknowledgements

We would like to thank Hayato Idei for fruitful discussions and comments. The project was supported by JST Moonshot R&D Project (JPMJMS2031), JST ACT-X (JPMJAX190I), and Hitachi, Ltd.