Simulation for Sensorimotor Control

Using a reinforcement learning method to demonstrate the human Central Nervous System's adaptability

Introduction

This is my first reinforcement learning project; it is based on a theoretical, tabular reinforcement learning method. In this project, I applied a reinforcement learning algorithm, Value-Iteration-based Adaptive Dynamic Programming (ADP), to an augmented sensorimotor system to explain the Central Nervous System's (CNS's) learning ability and adaptability to perturbed environments and time delays in limb movement.

Method

The methodology is based on a previous study that models limb movement as a Linear Time-Invariant (LTI) system and applies Policy-Iteration-based Adaptive Dynamic Programming (ADP) to solve for the unknown system dynamics. However, that work did not account for time delays or the CNS's adaptability to them in sensorimotor control. To address this limitation, I introduced a time delay into the LTI system and applied state augmentation together with a Value-Iteration-based ADP algorithm to demonstrate the CNS's adaptability to time delay in limb movement.

Modeling

The following describes how we model this scenario and how we introduce time delay. The experiment scenario:

[Figure: experiment scenario]

The system's ordinary differential equations:

[Figure: system ordinary differential equations]
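The exact equations are in the figure; in general form, the LTI model is (a sketch, with $x$ the state vector and $u$ the control input; the specific $A$ and $B$ come from the limb model in the cited study):

$$\dot{x}(t) = A\,x(t) + B\,u(t)$$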

Introducing time delay into the system:

[Figure: time-delayed system equations]
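One common way to write the delayed system, assuming the sensorimotor delay $\tau$ enters through the control input (the figure contains the exact formulation used here):

$$\dot{x}(t) = A\,x(t) + B\,u(t - \tau)$$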

Augmentation

Inspired by the state-augmentation idea from a paper on autonomous vehicles, I augmented the state vector with past state and control vectors to eliminate the effect of the time delay. Before augmenting the system with past states and control vectors, we must first discretize it. The discretization method is:

[Figure: discretization method]
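Assuming a standard zero-order-hold discretization with sampling period $h$, under which the continuous delay $\tau$ becomes an integer delay of $d = \tau/h$ steps, this amounts to:

$$x_{k+1} = A_d\,x_k + B_d\,u_{k-d}, \qquad A_d = e^{Ah}, \qquad B_d = \Big(\int_0^h e^{As}\,ds\Big) B$$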

Then, augment the discrete LTI system with past state and control vectors:

[Figure: augmented system equations]
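A sketch of the resulting delay-free system, assuming an input delay of $d$ steps and the augmented state $z_k = [x_k^\top,\ u_{k-d}^\top,\ \dots,\ u_{k-1}^\top]^\top$:

$$z_{k+1} = \begin{bmatrix} A_d & B_d & 0 & \cdots & 0 \\ 0 & 0 & I & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} z_k + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ I \end{bmatrix} u_k \;=\; \bar{A}\,z_k + \bar{B}\,u_k$$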

Subsequently, I applied the Value-Iteration-based ADP algorithm to the augmented LTI system and ran simulations in Python. The Value-Iteration-based ADP algorithm, shown below, builds on the Value Iteration algorithm from Dynamic Programming, which applies when the system dynamics are known (model-based). The model-based method is shown below:

[Figures: model-based Value Iteration algorithm]
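For reference, with known $\bar{A}$, $\bar{B}$ and quadratic cost $\sum_k \big(z_k^\top Q_c\,z_k + u_k^\top R\,u_k\big)$, model-based value iteration reduces to the standard Riccati recursion (the figures above show the exact algorithm used in the project):

$$P_{j+1} = Q_c + \bar{A}^\top P_j \bar{A} - \bar{A}^\top P_j \bar{B}\,\big(R + \bar{B}^\top P_j \bar{B}\big)^{-1} \bar{B}^\top P_j \bar{A}, \qquad K_j = \big(R + \bar{B}^\top P_j \bar{B}\big)^{-1} \bar{B}^\top P_j \bar{A}$$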

Adaptive Dynamic Programming

To handle the unknown system dynamics (matrices A and B), we introduce a data-driven method, Value-Iteration-based ADP, which generates the control gain matrix K (the policy) from data collected from the environment rather than from a known system model. (In some sense, this approach is similar to Q-learning, where the H matrix defines the action-value function:)

[Figures: Value-Iteration-based ADP derivation]
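In the standard Q-learning formulation for the LQR problem, which I assume matches the equations in the figures, the action-value function is quadratic in the state-action pair:

$$Q_j(z, u) = \begin{bmatrix} z \\ u \end{bmatrix}^{\top} H_j \begin{bmatrix} z \\ u \end{bmatrix}, \qquad H_j = \begin{bmatrix} Q_c + \bar{A}^\top P_j \bar{A} & \bar{A}^\top P_j \bar{B} \\ \bar{B}^\top P_j \bar{A} & R + \bar{B}^\top P_j \bar{B} \end{bmatrix}$$

Crucially, $H_j$ can be estimated from observed transitions by least squares, without ever knowing $\bar{A}$ or $\bar{B}$.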

By applying ADP, we can obtain the control policy from data instead of from the system model:

[Figure: data-driven control policy]
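Given an estimate of $H_j$, partitioned into blocks $H_{zz}$, $H_{zu}$, $H_{uz}$, $H_{uu}$, minimizing $Q_j(z, u)$ over $u$ yields the gain directly from data:

$$u_k = -K_j\,z_k, \qquad K_j = H_{uu,j}^{-1} H_{uz,j}$$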

The overall algorithm is:

[Figure: overall algorithm]
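A minimal Python sketch of this data-driven loop, assuming the quadratic stage cost above and transitions collected under an exploratory (noisy) control signal; the names (`vi_adp`, `quad_features`) are illustrative, not the project's actual code:

```python
import numpy as np

def quad_features(v):
    """Features phi(v) such that Q(v) = w @ phi(v) for a symmetric H:
    v_i^2 for diagonal entries, 2*v_i*v_j for off-diagonal entries,
    matching the upper triangle of H."""
    i, j = np.triu_indices(len(v))
    scale = np.where(i == j, 1.0, 2.0)
    return np.outer(v, v)[i, j] * scale

def vi_adp(data, nz, nu, Qc, R, iters=200):
    """Value-Iteration-based ADP (Q-learning for the LQR problem).

    data: list of (z, u, z_next) transitions from the augmented system,
          collected under an exploratory (noisy) control signal.
    Returns the gain K (policy u = -K z) and the Q-function matrix H.
    """
    n = nz + nu
    H = np.eye(n)                                      # initial Q-function guess
    for _ in range(iters):
        K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])  # greedy gain from current H
        Phi, y = [], []
        for z, u, z_next in data:
            cost = z @ Qc @ z + u @ R @ u              # observed stage cost
            v_next = np.concatenate([z_next, -K @ z_next])
            y.append(cost + v_next @ H @ v_next)       # one-step Bellman target
            Phi.append(quad_features(np.concatenate([z, u])))
        # Least-squares fit of the next Q-function matrix from data alone
        w, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
        Hu = np.zeros((n, n))
        Hu[np.triu_indices(n)] = w                     # rebuild symmetric H
        H = Hu + np.triu(Hu, 1).T
    K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])
    return K, H
```

In the simulation, the transitions would come from rolling out the delayed limb model under an exploratory control signal; the learned gain is then applied as $u_k = -K z_k$.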

Results

The simulation results demonstrate the CNS's learning ability and adaptability to a perturbed environment and to time delay in limb movement. Human's trajectory during learning:

[Figure: human trajectory during learning]

Human’s trajectory after learning:

[Figure: human trajectory after learning]

Model’s trajectory during learning:

[Figure: model trajectory during learning]

Model’s trajectory after learning:

[Figure: model trajectory after learning]

The project is implemented in Python and has not yet been uploaded to GitHub. For access to the project code, please feel free to contact me directly.