Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning | Xiaol.x | Podwise