kalomaze's kalomazing blog
09 Mar, 2025
Understanding Transformers... (beyond the Math)
03 Mar, 2025
GRPO Judge Experiments: Findings & Empirical Observations
27 Feb, 2025
Why does GRPO work?
27 Feb, 2025
Synthetic rejected preference data creation [via Qwen7b finetune]