… and the trick to getting it working was, as usual, “working just a little bit harder than you want to“. Shortly after my last post, I got REINFORCE, a classic reinforcement learning algorithm, successfully training on my local machine, with apparent learning for all three environments in the assignment (though whether my solution is able to reach the expected final level of performance or not is still an open question).
-the Centaur