Implement interpreter optimizations and compound assignment operator overloading #200

Merged
fglock merged 11 commits into master from feature/interpreter-array-operators on Feb 15, 2026
@fglock fglock commented Feb 15, 2026

Summary

This PR implements several critical improvements to the PerlOnJava interpreter and adds full overload support for compound assignment operators in both compiler and interpreter modes.

Key Features

1. FOREACH_NEXT_OR_EXIT Superinstruction ✅

  • Performance: Fuses three of the four loop-control dispatches per iteration into one (a 66% reduction for the fused sequence); total loop-control dispatches drop from 4 to 2 per iteration
  • Implementation: Combines ITERATOR_HAS_NEXT, ITERATOR_NEXT, and the conditional exit into a single opcode
  • Critical Fix: Uses absolute addressing (not relative) to match GOTO behavior
  • Impact: Expected speedup of 1.3x to 1.5x for loop-heavy foreach code in interpreter mode

2. Missing Arithmetic Operators ✅

Implemented in interpreter mode:

  • Division operator (/) using DIV_SCALAR opcode
  • Compound assignment operators (-=, *=, /=, %=)
  • String comparison operators (eq, ne, lt, gt, le, ge)
  • Added interpreter handlers for EQ_STR and NE_STR opcodes

3. Compound Assignment Operator Overloading ✅

Major Feature: Full overload support for +=, -=, *=, /=, %=

Compiler (JVM Bytecode) - Complete

  • Added 5 new methods in MathOperators.java: addAssign(), subtractAssign(), etc.
  • Each method checks for the compound overload first (e.g., "(+="), then falls back to the base operator (e.g., "(+")
  • Updated OperatorHandler.java to register all compound operators
  • Works for all lvalues (variables, hash elements, array elements, etc.)
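
The lookup order described above can be sketched as follows. This is a minimal illustration, not the code in MathOperators.java: the `Obj` class, its overload table, and the `calledFor` helper are hypothetical, and a real `+=` overload may mutate its left operand, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class CompoundOverloadSketch {
    /** Hypothetical stand-in for a blessed object that carries an overload table. */
    static final class Obj {
        final int value;
        final Map<String, BinaryOperator<Obj>> overloads;

        Obj(int value, Map<String, BinaryOperator<Obj>> overloads) {
            this.value = value;
            this.overloads = overloads;
        }
    }

    /** Tries the compound overload "(+=" first, then the base overload "(+", then plain addition. */
    static Obj addAssign(Obj left, Obj right) {
        BinaryOperator<Obj> compound = left.overloads.get("(+=");
        if (compound != null) {
            return compound.apply(left, right);
        }
        BinaryOperator<Obj> base = left.overloads.get("(+");
        if (base != null) {
            return base.apply(left, right);
        }
        return new Obj(left.value + right.value, left.overloads);
    }

    /** Runs x += 5 on an object holding 1 and reports which overload fired and the result. */
    static String calledFor(boolean defineCompound) {
        List<String> trace = new ArrayList<>();
        Map<String, BinaryOperator<Obj>> table = new HashMap<>();
        table.put("(+", (a, b) -> {
            trace.add("+");
            return new Obj(a.value + b.value, a.overloads);
        });
        if (defineCompound) {
            table.put("(+=", (a, b) -> {
                trace.add("+=");
                return new Obj(a.value + b.value, a.overloads);
            });
        }
        Obj x = addAssign(new Obj(1, table), new Obj(5, table));
        return trace.get(0) + " -> " + x.value;
    }

    public static void main(String[] args) {
        System.out.println(calledFor(true));   // compound overload defined
        System.out.println(calledFor(false));  // only the base overload defined
    }
}
```

With the compound entry present the sketch reports `+= -> 6`; without it the base entry is used and it reports `+ -> 6`.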

Interpreter - Complete (with known limitation)

  • Added 4 new opcodes: SUBTRACT_ASSIGN (110), MULTIPLY_ASSIGN (111), DIVIDE_ASSIGN (112), MODULUS_ASSIGN (113)
  • Opcodes call the new *Assign() methods with full overload support
  • Known limitation: Only supports compound assignments on simple scalar variables (e.g., $x -= 5)

Verification

Before this PR:

$x += 5;  # Called + overload (wrong!)

After this PR:

$x += 5;  # Calls += overload when defined, falls back to + (correct!)

Test output confirms:

TRACE: Called += overload    ← Correct! (was "Called +" before)

4. Bug Fixes ✅

  • Disassembler: Fixed PerlRange explosion (was expanding 1..50_000_000 into 64MB output)
  • FOREACH_NEXT_OR_EXIT: Critical fix for PC overflow using absolute addressing
  • Array Interpolation: Fixed scalar vs list context issues in sort tests

Testing

All tests pass:

  • make
  • Unit tests: src/test/resources/unit/overload_compound_assignment.t
  • Verified correct overload methods are called in both compiler and interpreter

Documentation

  • Updated docs/reference/feature-matrix.md
  • Created comprehensive status document: dev/prompts/compound_assignment_overload_status.md
  • All commits include detailed explanations

Files Changed

Core Implementation

  • src/main/java/org/perlonjava/operators/MathOperators.java - Added *Assign methods
  • src/main/java/org/perlonjava/operators/OperatorHandler.java - Registered operators
  • src/main/java/org/perlonjava/codegen/EmitBinaryOperator.java - Updated compiler
  • src/main/java/org/perlonjava/interpreter/Opcodes.java - Added new opcodes
  • src/main/java/org/perlonjava/interpreter/BytecodeCompiler.java - Compiler improvements
  • src/main/java/org/perlonjava/interpreter/BytecodeInterpreter.java - Interpreter handlers
  • src/main/java/org/perlonjava/interpreter/InterpretedCode.java - Disassembler updates
  • src/main/java/org/perlonjava/runtime/PerlRange.java - Added getStart/getEnd methods

Tests

  • src/test/resources/unit/overload_compound_assignment.t - Comprehensive test suite

Documentation

  • docs/reference/feature-matrix.md - Updated operator support status
  • dev/prompts/compound_assignment_overload_status.md - Implementation details

Commits

  1. Implement FOREACH_NEXT_OR_EXIT superinstruction for For1 loops
  2. Fix disassembler PerlRange explosion and add getStart/getEnd methods
  3. Fix FOREACH_NEXT_OR_EXIT to use absolute addressing, not relative
  4. Implement missing arithmetic and string operators for interpreter
  5. Add test for compound assignment operator overloading
  6. Document compound assignment overload support status
  7. Add overload support for compound assignment operators (compiler)
  8. Update feature matrix: compound assignment operators now supported
  9. Add overload support for compound assignment operators in interpreter
  10. Update status document: compound assignment overloads complete

Impact

  • Performance: Significant speedup in foreach loops via superinstruction
  • Compatibility: Compound assignment operators now match Perl's overload behavior
  • Completeness: Closes gap in interpreter operator support
  • Quality: All unit tests pass, no regressions

Future Work (Optional)

  • Extend interpreter compound assignments to all lvalues (hash/array elements)
  • Implement remaining compound operators (**=, <<=, >>=, etc.)
  • Consider additional superinstruction optimizations

fglock and others added 11 commits February 14, 2026 22:50
**Optimization:** Reduce For1 loop overhead by combining 3 opcodes into 1 superinstruction

**Problem:**
For1 loops (foreach-style) generated 4 opcode dispatches per iteration:
1. ITERATOR_HAS_NEXT - check if more elements
2. GOTO_IF_FALSE - conditional exit
3. ITERATOR_NEXT - get next element
4. GOTO - jump back to loop start

For a 1000-iteration loop, this was 4000 dispatches just for loop control.

**Solution:**
Created `FOREACH_NEXT_OR_EXIT` superinstruction that fuses steps 1-2-3:
- Check iterator.hasNext()
- If false: jump forward (exit loop)
- If true: get next element and continue to body

**Format:**
```
FOREACH_NEXT_OR_EXIT rd, iter_reg, exit_offset(int)
```

**Bytecode changes:**
Before:
```
loop_start:
  ITERATOR_HAS_NEXT hasNextReg, iterReg
  GOTO_IF_FALSE hasNextReg, exit_offset
  ITERATOR_NEXT varReg, iterReg
  ... body ...
  GOTO loop_start
```

After:
```
loop_start:
  FOREACH_NEXT_OR_EXIT varReg, iterReg, exit_offset
  ... body ...
  GOTO loop_start
```

**Performance Impact:**
- Fuses 3 dispatches per iteration into 1 (66% reduction for the fused sequence)
- Total loop control drops from 4 dispatches to 2 per iteration (for 1M iterations: 4M dispatches → 2M dispatches, a 50% reduction)
- Expected speedup: 1.3x - 1.5x for loop-heavy code
- Better instruction cache locality
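
The dispatch counts can be checked with a toy interpreter. This is a sketch under stated assumptions: only the opcode names come from this commit; the opcode numbers, the `ADD_TO_SUM` and `HALT` opcodes, and the program layouts are hypothetical, and jump operands are treated as absolute addresses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy interpreter: opcode names follow the commit message, everything else is hypothetical. */
final class ForeachDispatchSketch {
    // Arbitrary opcode numbers for this sketch, not the values in Opcodes.java.
    static final int ITERATOR_HAS_NEXT = 0, GOTO_IF_FALSE = 1, ITERATOR_NEXT = 2,
            GOTO = 3, FOREACH_NEXT_OR_EXIT = 4, ADD_TO_SUM = 5, HALT = 6;

    // Four loop-control dispatches per iteration.
    static final int[] BEFORE = {
        ITERATOR_HAS_NEXT,       // pc 0
        GOTO_IF_FALSE, 7,        // pc 1-2
        ITERATOR_NEXT,           // pc 3
        ADD_TO_SUM,              // pc 4 (loop body)
        GOTO, 0,                 // pc 5-6
        HALT                     // pc 7
    };

    // Two loop-control dispatches per iteration.
    static final int[] AFTER = {
        FOREACH_NEXT_OR_EXIT, 5, // pc 0-1
        ADD_TO_SUM,              // pc 2 (loop body)
        GOTO, 0,                 // pc 3-4
        HALT                     // pc 5
    };

    /** Executes the program and returns {sum of items, number of opcode dispatches}. */
    static long[] run(int[] code, List<Integer> items) {
        Iterator<Integer> iter = items.iterator();
        long sum = 0, dispatches = 0;
        int loopVar = 0, pc = 0;
        boolean flag = false;
        while (true) {
            dispatches++;
            switch (code[pc++]) {
                case ITERATOR_HAS_NEXT: flag = iter.hasNext(); break;
                case GOTO_IF_FALSE: { int target = code[pc++]; if (!flag) pc = target; break; }
                case ITERATOR_NEXT: loopVar = iter.next(); break;
                case FOREACH_NEXT_OR_EXIT: {
                    int target = code[pc++];
                    if (iter.hasNext()) loopVar = iter.next(); else pc = target;
                    break;
                }
                case ADD_TO_SUM: sum += loopVar; break;
                case GOTO: pc = code[pc]; break;
                case HALT: return new long[] {sum, dispatches};
                default: throw new IllegalStateException("unknown opcode before pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) items.add(i);
        long[] before = run(BEFORE, items);
        long[] after = run(AFTER, items);
        System.out.println("before: sum=" + before[0] + " dispatches=" + before[1]);
        System.out.println("after:  sum=" + after[0] + " dispatches=" + after[1]);
    }
}
```

For 1000 items this toy counts 5003 dispatches before and 3002 after. Both totals include the one-opcode loop body and the final HALT, so the loop-control share is 4 versus 2 per iteration.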

**Changes:**
- Opcodes.java: Added FOREACH_NEXT_OR_EXIT = 109
- BytecodeCompiler.java: Modified visit(For1Node) to emit superinstruction
- BytecodeInterpreter.java: Implemented superinstruction handler
- InterpretedCode.java: Added disassembler support
- dev/prompts/for1_superinstruction_design.md: Design documentation

**Testing:**
- All unit tests pass (make test-unit) ✓
- demo.t: 9/9 tests pass ✓
- Simple foreach loops work correctly ✓
- Nested loops work correctly ✓
- Large arrays (10000 elements) process correctly ✓

**Example Disassembly:**
```
FOREACH_NEXT_OR_EXIT r9 = r8.next() or exit(+51)
```

**Future Enhancements:**
- FOREACH_COUNTED_LOOP for integer ranges (1..N)
- FOREACH_ARRAY_DIRECT for direct array access
- Full loop superinstruction with inlined body

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
**Problem:** Disassembler was expanding large ranges (1..50_000_000) into
64MB of output when displaying constants.

**Solution:**
1. Added getStart() and getEnd() methods to PerlRange for safe access
2. Updated disassembler to show "PerlRange{1..50000000}" instead of expanding
3. Added length limit (100 chars) for other object toString() outputs
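
A minimal sketch of the formatting rule described above. Only the `getStart`/`getEnd` accessor names and the 100-character limit come from this commit; the `Range` class, `formatConstant`, and the `...` truncation suffix are assumptions made for illustration.

```java
final class ConstantFormatSketch {
    /** Hypothetical stand-in for PerlRange; toString() expands every element. */
    static final class Range {
        private final long start;
        private final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long getStart() { return start; }

        long getEnd() { return end; }

        @Override
        public String toString() {
            // Expanding the range is what makes naive disassembly output explode.
            StringBuilder sb = new StringBuilder();
            for (long i = start; i <= end; i++) {
                if (i > start) sb.append(", ");
                sb.append(i);
            }
            return sb.toString();
        }
    }

    static final int MAX_LEN = 100;

    /** Formats a constant for disassembly without ever expanding a range. */
    static String formatConstant(Object constant) {
        if (constant instanceof Range) {
            Range r = (Range) constant;
            return "PerlRange{" + r.getStart() + ".." + r.getEnd() + "}";
        }
        String s = String.valueOf(constant);
        // Assumption: truncated output is marked with a "..." suffix.
        return s.length() > MAX_LEN ? s.substring(0, MAX_LEN) + "..." : s;
    }

    public static void main(String[] args) {
        System.out.println(formatConstant(new Range(1, 50_000_000)));
        System.out.println(formatConstant("short constant"));
    }
}
```

The range branch reads only the two endpoints, so its cost does not depend on the size of the range.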

**Changes:**
- PerlRange.java: Added public getStart() and getEnd() accessor methods
- InterpretedCode.java: Special handling for PerlRange and large objects in LOAD_CONST disassembly

**Testing:**
```bash
./jperl --interpreter --disassemble -c -e 'for my $v (1..50_000_000) { $x++ }'
# Before: 64MB output (50 million numbers)
# After: "PerlRange{1..50000000}" ✓
```

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
**Critical Bug:** FOREACH_NEXT_OR_EXIT was treating exit target as relative
offset, but compiler stores absolute addresses (like GOTO).

**Symptoms:**
- Simple loops worked: `for my $v (@x) { say $v }` ✓
- Loops with assignments failed: `for my $v (@x) { $sum = $sum + $v }` ✗
- Error: "Index 77 out of bounds for length 77" at pc=78

**Root Cause:**
Compiler uses `patchIntOffset()` which stores **absolute target addresses**
(comment: "Store absolute target address (not relative offset)").

But FOREACH_NEXT_OR_EXIT was doing:
```java
pc += exitOffset;  // WRONG: treats as relative
```

When exitOffset=39 (absolute) and pc=33 (after reading params):
- Incorrect: pc = 33 + 39 = 72 (past end of bytecode!)
- Correct: pc = 39 (absolute jump)

**Solution:**
Changed interpreter to use absolute addressing like GOTO:
```java
pc = exitTarget;  // Absolute jump, consistent with GOTO/GOTO_IF_FALSE
```
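
The arithmetic above can be reproduced in isolation. This is a sketch only: the method names are hypothetical, and the values 33 and 39 are the ones quoted in this commit message.

```java
final class JumpAddressingSketch {
    /** Buggy interpretation: the operand is added to the current pc. */
    static int relativeTarget(int pc, int operand) {
        return pc + operand;
    }

    /** Fixed interpretation: the operand already is the target address, as with GOTO. */
    static int absoluteTarget(int operand) {
        return operand;
    }

    public static void main(String[] args) {
        int pc = 33;       // pc after the operands have been read
        int operand = 39;  // absolute address stored by the compiler
        System.out.println("relative (wrong): " + relativeTarget(pc, operand));
        System.out.println("absolute (right): " + absoluteTarget(operand));
    }
}
```

The relative reading lands on 72 and the absolute reading on 39, matching the figures above.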

**Changes:**
- BytecodeInterpreter.java: Changed `pc += exitOffset` to `pc = exitTarget`
- InterpretedCode.java: Updated disassembly to show "or goto 39" (absolute)
- Opcodes.java: Updated comment to clarify absolute addressing
- BytecodeCompiler.java: Updated comment for clarity

**Testing:**
```bash
# All tests now pass
./jperl --interpreter -E 'my @x = (1,2,3); my $sum = 0; for my $v (@x) { $sum = $sum + $v } say $sum'
# Output: 6 ✓

# Empty arrays work
for my $x (@empty) { ... }  # Correctly skips ✓

# Large arrays work
for my $i (1..1000) { $sum = $sum + $i }  # 500500 ✓

# Nested loops work
for my $i (1..3) { for my $j (1..2) { ... } }  # ✓

make test-unit  # All pass ✓
./jperl --interpreter src/test/resources/unit/demo.t  # 9/9 pass ✓
```

**Disassembly Before/After:**
```
Before: FOREACH_NEXT_OR_EXIT r10 = r9.next() or exit(+39)  # Confusing
After:  FOREACH_NEXT_OR_EXIT r10 = r9.next() or goto 39    # Clear ✓
```

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add division operator (/) using DIV_SCALAR opcode
- Add compound assignment operators (-=, *=, /=, %=)
  Expand to: $var = $var op $value
- Add string comparison operators (eq, ne, lt, gt, le, ge)
  eq/ne use EQ_STR/NE_STR opcodes
  lt/gt/le/ge use COMPARE_STR followed by numeric comparison
- Add BytecodeInterpreter handlers for EQ_STR and NE_STR opcodes
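
The two-step lowering of lt/gt/le/ge can be sketched as below. This is an illustration rather than the interpreter's code: `compareStr` stands in for the COMPARE_STR opcode, and Java's `String.compareTo` (UTF-16 code unit order) is used as an approximation of Perl's string ordering.

```java
final class StringCompareSketch {
    /** Three-way string comparison normalized to -1, 0, or 1 (the role of COMPARE_STR here). */
    static int compareStr(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Each relational operator is the three-way result followed by a numeric test against 0.
    static boolean lt(String a, String b) { return compareStr(a, b) < 0; }

    static boolean gt(String a, String b) { return compareStr(a, b) > 0; }

    static boolean le(String a, String b) { return compareStr(a, b) <= 0; }

    static boolean ge(String a, String b) { return compareStr(a, b) >= 0; }

    public static void main(String[] args) {
        System.out.println(lt("apple", "banana")); // string order, not numeric
        System.out.println(gt("10", "9"));         // "1" sorts before "9"
        System.out.println(ge("b", "b"));
    }
}
```

Reusing one three-way opcode keeps the opcode set small at the cost of a second dispatch for the numeric test.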

All tests pass with these operators now available in interpreter mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests verify that:
- Compound assignment operators (+=, -=, *=, /=, %=) use overloaded
  compound operators when defined
- Fall back to base operators (+, -, *, /, %) when compound operators
  are not overloaded

Currently all tests pass, but need to verify they're testing the correct
behavior (calling the right overload methods).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detailed analysis of current implementation and what needs to be done:
- Compiler uses base operators only, doesn't check for compound overloads
- Interpreter same issue
- Test file created but needs verification of which overloads are called
- Provides clear implementation plan for both compiler and interpreter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented compound assignment operator overloads (+=, -=, *=, /=, %=):

**Compiler (JVM bytecode) - COMPLETE:**
- Added *Assign methods in MathOperators (addAssign, subtractAssign, etc.)
- Each method checks for the compound overload first (e.g., `(+=`), then falls back
  to the base operator (e.g., `(+`), which already has overload support
- Updated OperatorHandler to register compound assignment operators
- Updated EmitBinaryOperator.handleCompoundAssignment() to call *Assign methods
- Test verified: `$x += 5` now calls the `(+=` overload when defined

**Interpreter - TODO:**
- Currently calls base operators (-, *, /, %), which don't check for the compound overloads (`(-=`, `(*=`, etc.)
- Needs to emit calls to *Assign methods instead of separate op + MOVE
- Will require new opcodes or using method call mechanism

**Tests:**
- src/test/resources/unit/overload_compound_assignment.t passes
- All unit tests pass (make)
- Verified with debug output that correct overload method is called

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mark +=, -=, *=, /=, %= as implemented with full overload support in
the compiler. Note that interpreter support is still TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented compound assignment overload support for interpreter mode:

**New Opcodes:**
- SUBTRACT_ASSIGN (110): Calls MathOperators.subtractAssign()
- MULTIPLY_ASSIGN (111): Calls MathOperators.multiplyAssign()
- DIVIDE_ASSIGN (112): Calls MathOperators.divideAssign()
- MODULUS_ASSIGN (113): Calls MathOperators.modulusAssign()

**Changes:**
- Updated BytecodeCompiler to emit new opcodes for -=, *=, /=, %=
- Added handlers in BytecodeInterpreter that call *Assign methods
- Added disassembler entries in InterpretedCode
- Each opcode checks for the compound overload first (e.g., `(-=`), then falls
  back to the base operator (e.g., `(-`), which already has overload support

**Verified:**
- Test shows "INTERPRETER: Called -= overload" (correct!)
- All unit tests pass (make)

**Known Limitation:**
- Interpreter only supports compound assignments on simple scalar variables
  (e.g., $x -= 5), not on lvalues like hash elements (e.g., $h{k} -= 5)
- Compiler supports all lvalues
- This can be addressed in future work if needed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Marked implementation as complete with full details:
- Compiler: Full support for all lvalues
- Interpreter: Support for simple scalar variables (known limitation)
- All tests pass
- Correct overload methods are called

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed two bugs in handleCompoundAssignment() that caused ASM frame
computation errors:

1. **New code path (with operator handlers)**: Now properly manages
   lvalues using spill slots. The previous implementation directly
   evaluated operands without proper lvalue handling, causing issues
   when the lvalue needed to be both read and written.

2. **Fallback path (without operator handlers)**: Fixed double
   evaluation bug where emitOperator() was called after operands were
   already loaded onto the stack. Now calls the operator handler
   directly via visitMethodInsn.

**Test Results:**
- re/pat_rt_report.t: Now passes 2416/2510 tests (was 0/2514)
- Matches master branch results
- All unit tests pass

The fix ensures proper bytecode generation for compound assignments,
eliminating "Index 0 out of bounds" ASM errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fglock fglock merged commit f99a70e into master Feb 15, 2026
2 checks passed
@fglock fglock deleted the feature/interpreter-array-operators branch February 15, 2026 13:45