kalomaze's kalomazing blog

07 May, 2026 Reinforcement Learning for Knowledge Awareness
20 Feb, 2026 Don't Exclude Rollouts From Your RL Training Runs
09 Nov, 2025 RL Learning with LoRA: A Diverse Deep Dive
09 Mar, 2025 Understanding Transformers... (beyond the Math)
03 Mar, 2025 GRPO Judge Experiments: Findings & Empirical Observations
27 Feb, 2025 Why does GRPO work?
27 Feb, 2025 Synthetic rejected preference data creation [via Qwen7b finetune]