Inner alignment becomes a problem when you train the reward function and the policy function jointly.
