海词手机词典
  • The method uses actor-critic architecture, in which a recursive least-squares TD method is used to estimate parameters of value function during critic training and a value gradient method is used to improve control policy during actor training.

    播放读音 播放读音