Zafir Stojanovski (zafstojano)

👋 Hi, I'm Zafir

I'm interested in system-2 thinking, catastrophic forgetting, and fair evals.

🌐 Open Source

I have contributed to the following open-source repositories:

🏋🏻‍♀️ Reasoning Gym – RL environments for reasoning models.
📚 RLHF Book – An introduction to RLHF and post-training.
🔬 Language Model Evaluation Harness – A framework for few-shot evaluation of LLMs.
📈 Policy Gradients – Minimal hackable implementation of policy gradient methods.
🌍 OpenEnv – An interface library for RL post-training with environments.
🏒 Laser Hockey – Winning entry for an RL tournament in laser hockey.
👾 Word Game Bench – Evaluating LLMs on Wordle and Connections.
📖 ML Interview Q&A – Booklet with popular questions and answers for ML interviews.

📄 Publications

My work is used by AI labs such as DeepMind, Meta, and NVIDIA:

🏋🏻 Reasoning Gym: Reasoning Environments for RL with Verifiable Rewards – NeurIPS 2025 (Spotlight)
🌊 Momentum-based Weight Interpolation for Continual Learning – Interpolate @ NeurIPS (Best Paper Award)

Pinned

open-thought/reasoning-gym

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python · 1.3k stars · 111 forks

natolambert/rlhf-book

Textbook on reinforcement learning from human feedback

Python · 1.6k stars · 148 forks

EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python · 11.5k stars · 3.1k forks

policy-gradients

A minimal hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE)

Python · 13 stars