Haici Mobile Dictionary
  • The objective is to find an optimal policy that maximizes the expected average reward per time step over an infinite horizon.
