51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,51 @@
# Changelog

All notable changes to Taskmaster are documented here.

## [2.3.0] - 2026-02-25

### Changed
- Hook `reason` field now contains only the TASKMASTER_DONE signal token instead
of the full completion checklist. This keeps user-visible terminal output
minimal — one collapsed line rather than a wall of text.
- Full completion checklist lives exclusively in SKILL.md, which is always
loaded as system context. The agent already has all instructions; the reason
field no longer needs to duplicate them.
- Added `last_assistant_message` as the primary done-signal detection path
(faster, no transcript file parsing required). Transcript-based check is
retained as fallback.
- Removed `HAS_RECENT_ERRORS` / `stop_hook_active` escape-hatch logic in favor
of the explicit TASKMASTER_DONE signal protocol.
- `hooks/check-completion.sh` brought in sync with root-level canonical source.

## [2.2.0] - 2026-02-19

### Changed
- Default `TASKMASTER_MAX` set to 100 (previously 0 / infinite).
- Moved full completion checklist from hook `reason` into SKILL.md system
context (first pass; reason still contained a short prompt).
- `install.sh` made POSIX-portable (`sh` shebang, conditional `pipefail`).

### Fixed
- Resolved infinite loop caused by `set -euo pipefail` in sh-sourced contexts.

## [2.1.0]

### Added
- Session-scoped counter with configurable `TASKMASTER_MAX` escape hatch.
- Subagent skip: transcripts shorter than 20 lines are ignored.
- `TASKMASTER_DONE_PREFIX` env var for customising the done token prefix.

## [2.0.0]

### Added
- TASKMASTER_DONE signal protocol: stop is allowed only after the agent emits
`TASKMASTER_DONE::<session_id>` in its response.
- Transcript-based done-signal detection.

## [1.0.0]

### Added
- Initial release: stop hook that blocks agent from stopping prematurely.
- Completion checklist injected via hook `reason` field.
- `TASKMASTER_MAX` loop guard.
64 changes: 64 additions & 0 deletions LESSONS.md
@@ -0,0 +1,64 @@
# Lessons Learned

Append-only log of debugging insights and non-obvious solutions.

---

## 2026-02-25T14:00 - Claude Code hook `reason` is dual-use: user-visible AND AI context

**Problem**: The taskmaster stop hook embedded a full 6-item completion checklist in the `reason` field of `{ "decision": "block", "reason": "..." }`. Every stop attempt printed the entire checklist to the user's terminal.

**Root Cause**: Claude Code's stop hook `reason` field serves two purposes simultaneously — it is displayed to the user in the terminal UI ("Stop hook error: ...") AND injected back into the AI conversation as context. Putting verbose instructions in `reason` to inform the AI caused them to also appear as user-visible output.

**Lesson**: The `reason` field is not a private AI channel. Anything in `reason` is shown to the human. Persistent AI instructions belong in SKILL.md (system context loaded at session start), not in transient hook `reason` values. The `reason` should carry only the minimum transient signal the agent needs right now.

**Code Issue**:
```bash
# Before (verbose — full checklist in reason, shown to user)
REASON="${LABEL}: ${PREAMBLE}

Before stopping, do each of these checks:
1. RE-READ THE ORIGINAL USER MESSAGE(S)...
2. CHECK THE TASK LIST...
[etc]"
jq -n --arg reason "$REASON" '{ decision: "block", reason: $reason }'

# After (minimal — only the done signal; checklist lives in SKILL.md)
DONE_SIGNAL="${DONE_PREFIX}::${SESSION_ID}"
jq -n --arg reason "$DONE_SIGNAL" '{ decision: "block", reason: $reason }'
```

**Solution**: Strip the checklist from `reason`. Put it only in SKILL.md, which is always loaded as system context. The `reason` now contains only the done signal token the agent must emit.

**Prevention**: When designing Claude Code hooks, ask: "Does this text need to be in the reason, or is it already in system context?" If it's in a skill file, it doesn't belong in `reason`.

---

## 2026-02-25T14:30 - `last_assistant_message` is faster than transcript scanning for done-signal detection

**Problem**: The hook was opening and scanning potentially large transcript JSON files on every stop attempt to detect whether the agent had emitted the done signal.

**Root Cause**: The hook relied exclusively on transcript-file parsing, which requires disk I/O and JSON scanning on every invocation.

**Lesson**: The Claude Code hook input JSON exposes `last_assistant_message` directly. Checking that field needs no file I/O, and its cost is bounded by the length of one message rather than the size of the transcript. It covers the common case where the agent emitted the signal in its latest message.

**Code Issue**:
```bash
# Before (always scans transcript file)
if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then
  HAS_DONE_SIGNAL=true
fi

# After (fast path via last_assistant_message, transcript as fallback)
HAS_DONE_SIGNAL=false  # initialized so the fallback test below is safe under `set -u`
LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // ""')
if [ -n "$LAST_MSG" ] && echo "$LAST_MSG" | grep -Fq "$DONE_SIGNAL" 2>/dev/null; then
  HAS_DONE_SIGNAL=true
fi
if [ "$HAS_DONE_SIGNAL" = false ] && [ -f "$TRANSCRIPT" ]; then
  if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then
    HAS_DONE_SIGNAL=true
  fi
fi
```

**Prevention**: Always check `last_assistant_message` before falling back to transcript file parsing in stop hooks.
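
The same ordering can be exercised in isolation. The following is a minimal sketch, not the hook's actual code: `has_done_signal` is a hypothetical helper name, and it takes the signal, the last message, and the transcript path as plain arguments instead of reading the hook's JSON input.

```bash
#!/bin/sh
# Hypothetical helper (not part of check-completion.sh).
# Returns 0 if the done signal is found, 1 otherwise.
#   $1 = done signal, $2 = last assistant message, $3 = transcript path
has_done_signal() {
    signal=$1
    last_msg=$2
    transcript=$3

    # Fast path: the message is already in memory, so no file read is needed.
    if [ -n "$last_msg" ] && printf '%s\n' "$last_msg" | grep -Fq -- "$signal"; then
        return 0
    fi

    # Fallback: scan only the last 400 lines of the transcript, if it exists.
    if [ -f "$transcript" ] && tail -n 400 "$transcript" 2>/dev/null | grep -Fq -- "$signal"; then
        return 0
    fi

    return 1
}
```

Because the fallback runs only when the fast path misses, a stop attempt whose latest message already carries the token never touches the transcript file.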
76 changes: 70 additions & 6 deletions SKILL.md
@@ -6,7 +6,7 @@ description: |
parseable done signal in its final response. Provides deterministic machine
detection for true completion.
author: blader
-version: 2.1.0
+version: 2.3.0
---

# Taskmaster
@@ -21,28 +21,92 @@ completion review cycle.
1. **Agent tries to stop** — the stop hook fires every time.
2. **Hook scans transcript** for a parseable token:
`TASKMASTER_DONE::<session_id>`
-3. **Token missing** — hook blocks stop and injects a checklist plus the exact
-token to emit when truly done.
+3. **Token missing** — hook blocks stop with a brief trigger message.
4. **Token present** — hook allows stop and clears session counter state.

## Completion Checklist

When the stop hook blocks you, execute this checklist before emitting the
done signal. Do NOT narrate or summarize — just DO the work.

1. **RE-READ THE ORIGINAL USER MESSAGE(S).** List every discrete request or
acceptance criterion. For each one, confirm it is fully addressed — not
just started, FULLY done. If the user explicitly changed their mind,
withdrew a request, or told you to stop or skip something, treat that
item as resolved and do NOT continue working on it.

2. **CHECK THE TASK LIST.** Review every task. Any task not marked completed?
Do it now — unless the user indicated it is no longer wanted.

3. **CHECK THE PLAN.** Walk through each step — INCLUDING verification steps.
Any step skipped or partially done? Finish it — unless the user redirected
or deprioritized it. If the plan includes ANY verification steps (builds,
tests, lints, type-checks, manual validation, smoke tests, functional
tests, or any other form of verification), you MUST actually execute them
and see them pass — do not skip them or just claim they pass.

4. **CHECK FOR ERRORS OR INCOMPLETE WORK.** Did anything fail or remain
unfinished? Fix it. This applies to ALL types of problems — logic errors,
missing functionality, incomplete refactors, broken scripts, configuration
issues, or anything else that prevents the work from being fully done.

5. **CHECK FOR LOOSE ENDS.** Any TODO comments, placeholder code, missing
tests, untested changes, or follow-ups noted but not acted on?

6. **CHECK FOR BLOCKERS.** If something is blocking you, do NOT give up. You
are a world-class engineer with access to a full development environment,
a terminal, every tool you need, and all the information on the internet.
If it does not violate the laws of physics, it is within your ability to
solve it. Try a different approach, read more code, search for examples,
re-examine your assumptions. Never declare something a blocker and stop.
Solve it.

**User priority:** The user's latest instructions always take priority. If the
user said to stop, move on, or skip something, respect that — do not force
completion of work the user no longer wants.

**DO NOT NARRATE — EXECUTE:** If any incomplete work remains, your ONLY job is
to DO that work right now. Do NOT respond by explaining what the remaining
tasks are, describing their complexity, listing their dependencies, or
analyzing how difficult they will be. Do NOT ask the user for permission or
direction to proceed. Do NOT write summaries of what is left. Just DO the
work. The user asked you to do it — that IS your direction. Every sentence you
spend describing remaining work instead of doing it is wasted. Open files,
write code, run commands, fix bugs. Act.

**HONESTY CHECK:** Before marking anything as "not possible" or "skipped", ask
yourself: did you actually TRY, or are you rationalizing skipping it because
it seems hard or inconvenient? "I can't do X" is almost never true — what you
mean is "I haven't tried X yet." If you haven't attempted something, you don't
get to claim it's impossible. Attempt it first.

## Parseable Done Signal

-When the work is genuinely complete, the agent must include this exact line
-in its final response (on its own line):
+When the work is genuinely complete, include this exact line in your final
+response (on its own line):

```text
TASKMASTER_DONE::<session_id>
```

Do NOT emit that done signal early. If any work remains, continue working.

This gives external automation a deterministic completion marker to parse.
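
As an illustration of what such automation might look like, here is a minimal sketch. The helper name `session_done` and the idea of a captured log file are assumptions for the example, not part of Taskmaster. It is also stricter than the hook itself: it requires the token to be a complete line, whereas the hook matches the token anywhere in the text.

```bash
#!/bin/sh
# Hypothetical helper for external automation (not shipped with Taskmaster).
# Succeeds only if the log contains the done token for this session
# as a complete line.
#   $1 = session id, $2 = path to a captured response log (assumed to exist)
session_done() {
    session_id=$1
    log_file=$2
    prefix=${TASKMASTER_DONE_PREFIX:-TASKMASTER_DONE}
    # -F: fixed string, -x: whole-line match, -q: no output, exit status only
    grep -Fxq -- "${prefix}::${session_id}" "$log_file"
}
```

A caller could then write `if session_done "$id" run.log; then ...; fi`. Whole-line matching keeps a sentence that merely mentions the token from counting as completion.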

## Configuration

-- `TASKMASTER_MAX` (default `0`): Max number of blocked stop attempts before
+- `TASKMASTER_MAX` (default `100`): Max number of blocked stop attempts before
allowing stop. `0` means infinite (keep firing).
- `TASKMASTER_DONE_PREFIX` (default `TASKMASTER_DONE`): Prefix used for the
done token.
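
To see how these two variables resolve, the defaulting can be sketched as below. `show_config` is a hypothetical helper for illustration only. It mirrors the `${VAR:-default}` expansions used in `check-completion.sh` and takes a made-up session ID as its argument.

```bash
#!/bin/sh
# Hypothetical helper (illustration only): prints the effective cap and the
# done signal for a given session id, using the documented defaults.
show_config() {
    max=${TASKMASTER_MAX:-100}
    prefix=${TASKMASTER_DONE_PREFIX:-TASKMASTER_DONE}
    printf 'max=%s signal=%s::%s\n' "$max" "$prefix" "$1"
}
```

With both variables unset, `show_config demo` prints `max=100 signal=TASKMASTER_DONE::demo`. Because `:-` substitutes the default only when the variable is unset or empty, an explicit `TASKMASTER_MAX=0` is preserved and still means "no cap".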

## Design Notes

The hook's `reason` field is intentionally minimal — it contains only the done
signal token. The full completion checklist lives here in SKILL.md, which is
always loaded as system context. This keeps the user-visible terminal output
clean while the agent still has all required instructions.

## Setup

The hook must be registered in `~/.claude/settings.json`:
67 changes: 19 additions & 48 deletions check-completion.sh
@@ -2,18 +2,20 @@
#
# Stop hook: keep firing until the agent emits an explicit done signal.
#
-# The stop is allowed only after the transcript contains:
+# The stop is allowed only after the agent emits:
# TASKMASTER_DONE::<session_id>
#
# Optional env vars:
-# TASKMASTER_MAX Max number of blocks before allowing stop (default: 0 = infinite)
+# TASKMASTER_MAX Max number of blocks before allowing stop (default: 100)
# TASKMASTER_DONE_PREFIX Prefix for done token (default: TASKMASTER_DONE)
#
-set -euo pipefail
+set -u

INPUT=$(cat)
SESSION_ID=$(echo "$INPUT" | jq -r '.session_id')
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path')
# Expand leading ~ to $HOME (bash does not expand tilde inside quoted strings)
TRANSCRIPT="${TRANSCRIPT/#\~/$HOME}"
if [ -z "$SESSION_ID" ] || [ "$SESSION_ID" = "null" ]; then
SESSION_ID="unknown-session"
fi
@@ -30,7 +32,7 @@ fi
COUNTER_DIR="${TMPDIR:-/tmp}/taskmaster"
mkdir -p "$COUNTER_DIR"
COUNTER_FILE="${COUNTER_DIR}/${SESSION_ID}"
-MAX=${TASKMASTER_MAX:-0}
+MAX=${TASKMASTER_MAX:-100}

COUNT=0
if [ -f "$COUNTER_FILE" ]; then
@@ -41,17 +43,17 @@ fi
DONE_PREFIX="${TASKMASTER_DONE_PREFIX:-TASKMASTER_DONE}"
DONE_SIGNAL="${DONE_PREFIX}::${SESSION_ID}"
HAS_DONE_SIGNAL=false
-HAS_RECENT_ERRORS=false

-if [ -f "$TRANSCRIPT" ]; then
-  TAIL_400=$(tail -400 "$TRANSCRIPT" 2>/dev/null || true)
-  if echo "$TAIL_400" | grep -Fq "$DONE_SIGNAL" 2>/dev/null; then
-    HAS_DONE_SIGNAL=true
-  fi
+# Primary: check last_assistant_message (most reliable — no transcript parsing needed)
+LAST_MSG=$(echo "$INPUT" | jq -r '.last_assistant_message // ""')
+if [ -n "$LAST_MSG" ] && echo "$LAST_MSG" | grep -Fq "$DONE_SIGNAL" 2>/dev/null; then
+  HAS_DONE_SIGNAL=true
+fi

-  TAIL_40=$(tail -40 "$TRANSCRIPT" 2>/dev/null || true)
-  if echo "$TAIL_40" | grep -qi '"is_error":\s*true' 2>/dev/null; then
-    HAS_RECENT_ERRORS=true
+# Fallback: check transcript file if last_assistant_message didn't match
+if [ "$HAS_DONE_SIGNAL" = false ] && [ -f "$TRANSCRIPT" ]; then
+  if tail -400 "$TRANSCRIPT" 2>/dev/null | grep -Fq "$DONE_SIGNAL"; then
+    HAS_DONE_SIGNAL=true
   fi
 fi

@@ -63,43 +65,12 @@ fi
NEXT=$((COUNT + 1))
echo "$NEXT" > "$COUNTER_FILE"

-# Optional escape hatch. Default is infinite (0) so hook keeps firing.
+# Optional escape hatch. Default is 100.
if [ "$MAX" -gt 0 ] && [ "$NEXT" -ge "$MAX" ]; then
rm -f "$COUNTER_FILE"
exit 0
fi

-if [ "$HAS_RECENT_ERRORS" = true ]; then
-  PREAMBLE="Recent tool errors were detected. Resolve them before declaring done."
-else
-  PREAMBLE="Stop is blocked until completion is explicitly confirmed."
-fi
-
-if [ "$MAX" -gt 0 ]; then
-  LABEL="TASKMASTER (${NEXT}/${MAX})"
-else
-  LABEL="TASKMASTER (${NEXT})"
-fi
-
-# --- reprompt ---
-REASON="${LABEL}: ${PREAMBLE}
-
-Before stopping, do each of these checks:
-
-1. RE-READ THE ORIGINAL USER MESSAGE(S). List every discrete request or acceptance criterion. For each one, confirm it is fully addressed — not just started, FULLY done. If the user explicitly changed their mind, withdrew a request, or told you to stop or skip something, treat that item as resolved and do NOT continue working on it.
-2. CHECK THE TASK LIST. Review every task. Any task not marked completed? Do it now — unless the user indicated it is no longer wanted.
-3. CHECK THE PLAN. Walk through each step — INCLUDING verification steps. Any step skipped or partially done? Finish it — unless the user redirected or deprioritized it. If the plan includes ANY verification steps (builds, tests, lints, type-checks, manual validation, smoke tests, functional tests, or any other form of verification), you MUST actually execute them and see them pass — do not skip them or just claim they pass.
-4. CHECK FOR ERRORS OR INCOMPLETE WORK. Did anything fail or remain unfinished? Fix it. This applies to ALL types of problems — logic errors, missing functionality, incomplete refactors, broken scripts, configuration issues, or anything else that prevents the work from being fully done.
-5. CHECK FOR LOOSE ENDS. Any TODO comments, placeholder code, missing tests, untested changes, or follow-ups noted but not acted on?
-6. CHECK FOR BLOCKERS. If something is blocking you, do NOT give up. You are a world-class engineer with access to a full development environment, a terminal, every tool you need, and all the information on the internet. If it does not violate the laws of physics, it is within your ability to solve it. Try a different approach, read more code, search for examples, re-examine your assumptions. Never declare something a blocker and stop. Solve it.
-
-IMPORTANT: The user's latest instructions always take priority. If the user said to stop, move on, or skip something, respect that — do not force completion of work the user no longer wants.
-
-HONESTY CHECK: Before marking anything as \"not possible\" or \"skipped\", ask yourself: did you actually TRY, or are you rationalizing skipping it because it seems hard or inconvenient? \"I can't do X\" is almost never true — what you mean is \"I haven't tried X yet.\" If you haven't attempted something, you don't get to claim it's impossible. Attempt it first.
-
-When and only when everything is genuinely 100% done (or explicitly deprioritized by the user), include this exact line in your final response on its own line:
-${DONE_SIGNAL}
-
-Do NOT emit that done signal early. If any work remains, continue working now."
-
-jq -n --arg reason "$REASON" '{ decision: "block", reason: $reason }'
+# Minimal reason — full completion checklist lives in SKILL.md (always in system context).
+# Only the done signal is included so the agent knows exactly what to emit when complete.
+jq -n --arg reason "$DONE_SIGNAL" '{ decision: "block", reason: $reason }'
22 changes: 16 additions & 6 deletions docs/SPEC.md
@@ -1,7 +1,7 @@
# Taskmaster
## Product & Technical Specification

-**Version**: 2.1.0
+**Version**: 2.2.0
**Scope**: `taskmaster/hooks/check-completion.sh`, `taskmaster/SKILL.md`

## 1. Goal
@@ -39,18 +39,28 @@ On each stop event:
7. Optional safety cap: if `TASKMASTER_MAX > 0` and counter reaches cap,
allow stop and clear counter file.
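
The safety cap in step 7 can be sketched as a standalone function. This is an illustrative reconstruction under assumptions: `bump_and_check_cap` is a hypothetical name, and reading the counter with `cat` is a guess, since the part of the hook that loads the count is not shown in this diff. The sketch also assumes the counter file, when present, holds a plain integer.

```bash
#!/bin/sh
# Hypothetical helper (illustration only): increments the per-session counter
# and reports whether the cap has been reached.
#   $1 = counter file path, $2 = cap (0 means no cap)
# Returns 0 when the stop should be allowed because the cap was hit, else 1.
bump_and_check_cap() {
    counter_file=$1
    max=$2

    count=0
    if [ -f "$counter_file" ]; then
        count=$(cat "$counter_file")   # assumed to contain an integer
    fi

    next=$((count + 1))
    echo "$next" > "$counter_file"

    # Cap reached: clear the counter state and allow the stop.
    if [ "$max" -gt 0 ] && [ "$next" -ge "$max" ]; then
        rm -f "$counter_file"
        return 0
    fi
    return 1
}
```

With a cap of 2, the first call returns 1 and leaves `1` in the file. The second call returns 0 and removes the file. With a cap of 0 the function never returns 0, which matches the "infinite" setting.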

-### 3.2 Prompt Injection
+### 3.2 Prompt Architecture

-When blocking, Taskmaster injects:
+The verbose completion checklist lives in `SKILL.md`, which is loaded as system
+context (invisible to the user in session history). The hook's `reason` field
+is kept minimal — just a label, status, and done signal — so it does not
+pollute the conversation transcript.
+
+When blocking, Taskmaster injects a brief reason:

- `TASKMASTER (N)` or `TASKMASTER (N/MAX)` label.
-- A completion checklist (requests, plan, errors, loose ends, blockers).
-- The exact done line to emit when truly complete.
+- Short status (stop blocked / errors detected).
+- Reference to follow the taskmaster completion checklist.
+- The exact done signal to emit when truly complete.

The full checklist (re-read user messages, check task list, check plan, check
for errors, check for loose ends, check for blockers, honesty check) is in the
"Completion Checklist" section of `SKILL.md`.

### 3.3 Error Signal Hinting

Taskmaster inspects recent transcript lines for `"is_error": true` and adjusts
-preamble text to explicitly call out unresolved errors.
+the brief preamble text to call out unresolved errors.

## 4. Runtime Interfaces
