zafstojano/README.md

👋 Hi, I'm Zafir

I'm interested in system-2 thinking, catastrophic forgetting, and fair evals.

🌐 Open Source

I have contributed to the following open-source repositories:

  • 🏋🏻‍♀️ Reasoning Gym – RL environments for reasoning models.
  • 📚 RLHF Book – An introduction to RLHF and post-training.
  • 🔬 Language Model Evaluation Harness – A framework for few-shot evaluation of LLMs.
  • 📈 Policy Gradients – Minimal hackable implementation of policy gradient methods.
  • 🌍 OpenEnv – An interface library for RL post-training with environments.
  • 🏒 Laser Hockey – Winning entry for an RL tournament in laser hockey.
  • 👾 Word Game Bench – Evaluating LLMs on Wordle and Connections.
  • 📖 ML Interview Q&A – Booklet with popular questions and answers for ML interviews.

📄 Publications

My work is used by AI labs such as DeepMind, Meta, and NVIDIA:

Pinned

  1. open-thought/reasoning-gym Public

    [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

    Python · 1.3k stars · 111 forks

  2. natolambert/rlhf-book Public

    Textbook on reinforcement learning from human feedback

    Python · 1.6k stars · 148 forks

  3. EleutherAI/lm-evaluation-harness Public

    A framework for few-shot evaluation of language models.

    Python · 11.5k stars · 3.1k forks

  4. policy-gradients Public

    A minimal hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE)

    Python · 13 stars