Reposted from: AGI
Reward Learning by Simulating the Past
https://bair.berkeley.edu/blog/2019/02/11/learning_preferences/