Interpreter: Add array operators and fix critical bugs#194
Merged
Conversation
Fixed crash when eval receives a RuntimeList (from string interpolation) instead of RuntimeScalar. The executeEvalString handler now properly handles both types by converting RuntimeList to RuntimeScalar using scalar() method. Before: eval "$x++" # Crash: ClassCastException After: eval "$x++" # No crash (but variable capture not yet working) Known Limitation: Lexical variable capture in eval STRING is not yet implemented. Variables declared in the outer interpreted scope are not accessible to the eval'd code. This requires detecting variable references in the eval string and passing the corresponding registers as captured variables. Example that doesn't work yet: my $x = 1; eval "$x++"; print $x # Prints 1 (should print 2) See EvalStringHandler.java lines 86-94 for TODO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds support for lexical variable capture in eval STRING, matching compiler
mode behavior. Variables from outer scope are now accessible and modifiable
within eval'd code.
Changes:
- InterpretedCode: Add variableRegistry field to track variable name → register
index mappings for eval STRING support
- BytecodeCompiler: Add constructor accepting parentRegistry for eval STRING,
populate variableRegistry in compile(), mark parent variables as captured
using capturedVarIndices, use SET_SCALAR for assignments to captured
variables instead of MOVE to preserve aliasing
- EvalStringHandler: Build adjusted registry and captured variables array from
parent scope, pass to eval'd InterpretedCode
- BytecodeInterpreter: Preserve variableRegistry when creating closures
- Disable ADD_ASSIGN optimization for captured variables (use SET_SCALAR path)
Fixes:
- my $x = 1; for (1..10) { eval "\$x++" }; print $x # now prints 11
- my $x = 1; my $y = 2; eval "\$x = \$x + \$y" # now updates $x to 3
- Nested eval STRING with variable capture works correctly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Perl allows underscores as digit separators in numeric literals (e.g.,
10_000_000). The interpreter was not handling these correctly while the
compiler mode was.
Changes:
- BytecodeCompiler.visit(NumberNode): Strip underscores before parsing,
use ScalarUtils.isInteger() for consistent number validation, handle
large integers (>32-bit) by storing as strings, use LOAD_INT for
regular integers to create mutable scalars (needed for ++/-- operations)
- BytecodeCompiler range operator: Strip underscores when parsing
constant range bounds
Implementation note:
We use LOAD_INT (creates new mutable RuntimeScalar) instead of cached
scalars because MOVE copies references, and variables need to be mutable
for operations like ++, --, etc. Floats use LOAD_CONST since they're less
commonly modified in-place.
Fixes:
- ./jperl --interpreter -e 'my $x = 10_000_000; print $x' # now works
- ./jperl --interpreter -e 'for (1..100_000) { $x++ }' # now works
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Documents real-world performance characteristics showing interpreter excels at dynamic eval while compiler wins on cached eval. Benchmarks: - Cached eval (static string): Compiler 3.7x faster than interpreter - Dynamic eval (unique strings): Interpreter 12.7x faster than compiler - Dynamic eval vs Perl 5: Interpreter 4x slower, Compiler 50x slower Key findings: - Interpreter avoids compilation overhead for dynamic eval strings - Compilation cost: 50-90ms per unique string (compiler) vs 15-30ms (interpreter) = 3-6x faster - For 1M unique evals: Compiler 75s vs Interpreter 6s vs Perl 5 1.5s - Interpreter design validated: excels exactly where it should Primary use case: Dynamic eval strings for code generation, templating, meta-programming. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The interpreter was throwing "Increment/decrement of non-lexical variable
not yet supported" when trying to increment/decrement global variables.
This is essential for eval STRING with dynamic variable names.
Changes:
- BytecodeCompiler.visit(OperatorNode): For ++ and -- operators, handle
global variables by:
1. Loading the global variable with LOAD_GLOBAL_SCALAR
2. Applying PRE/POST_AUTOINCREMENT/DECREMENT opcode
3. Storing back with STORE_GLOBAL_SCALAR
- Applies to both bare identifiers (x++) and sigiled operators ($x++)
Fixes:
- $vartest++; print $vartest # now prints 1
- eval "\$vartest++"; print $vartest # now prints 1
- for my $x (1..N) { eval " \$var$x++" } # now works
This enables dynamic eval STRING patterns like code generation and
templating that create variables with computed names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After implementing global variable increment/decrement, the interpreter achieves Perl 5 parity for dynamic eval workloads. Updated benchmarks (1M unique eval strings): - Perl 5: 1.62s (baseline) - Interpreter: 1.64s (1% slower) ✓ Parity achieved! - Compiler: 76.12s (4600% slower) Key findings: - Interpreter is 46x faster than compiler for dynamic eval - Interpreter matches Perl 5 performance (1% slowdown vs 4600%) - For 1M unique evals: 1.6s (interpreter) vs 76s (compiler) Conclusion: The interpreter isn't just "good enough" for dynamic eval - it's the RIGHT tool, achieving native Perl performance where compilation overhead would dominate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented support for core array operations in the interpreter: - push: Add elements to end of array - pop: Remove and return last element - shift: Remove and return first element - unshift: Add elements to beginning of array - splice: Remove and replace array elements (via SLOWOP_SPLICE) - unaryMinus: Negation for negative array indices Key improvements: - Fixed ARRAY_PUSH to accept RuntimeBase instead of RuntimeScalar (enables pushing lists via RuntimeList.addToArray()) - Added ARRAY_POP, ARRAY_SHIFT, ARRAY_UNSHIFT cases to BytecodeInterpreter - Replaced hardcoded "main::" with NameNormalizer.normalizeVariableName() throughout BytecodeCompiler for proper package resolution - Added SLOWOP_SPLICE (ID 28) for splice operation Documentation: - Updated SKILL.md with comprehensive guide on adding operators: * Pattern 1: Binary operators (push, unshift) * Pattern 2: Unary operators (pop, shift, unaryMinus) * When and how to use SLOW_OP for complex operations * Common parse structures for arrays, slices, and list operators * Implementation patterns by AST structure * Best practices: NameNormalizer, RuntimeBase vs RuntimeScalar Testing: All implemented operators work correctly: ./jperl --interpreter -E 'my @A = (1,2,3); push @A, 4; pop @A; shift @A; unshift @A, 0' ./jperl --interpreter -E 'my @A = (0,2,3,4,5); splice @A, 2, 1, (10,11)' Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented: - Array slices: @array[1..3], @array[1,3,5], @$arrayref[indices] - Array slice support for dereferenced arrays: @$ref[...] - Compound assignment: += operator - Modulus operator: % Changes: - Added SLOWOP_ARRAY_SLICE (ID 29) for array slice operations - Updated case "[" to distinguish between: * Single element access: $array[index] * Array slice: @array[indices] - Enhanced "@" operator handler to support dereferencing: @$arrayref - Added += compound assignment operator in BinaryOperatorNode - Added % modulus operator in BinaryOperatorNode - Implemented MOD_SCALAR case in BytecodeInterpreter Testing: ./jperl --interpreter -E 'my @A = (0,2,10,11); my @s = @A[1..3]; say "@s"' # 2 10 11 ./jperl --interpreter -E 'my $r = \@A; my @s = @$r[1,3]; say "@s"' # 2 11 ./jperl --interpreter -E 'my $x = 0; $x += 5; say $x' # 5 ./jperl --interpreter -E 'say 10 % 3' # 1 TODO: Update disassembler for new opcodes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added disassembler cases for: - SUB_SCALAR (opcode 18) - MUL_SCALAR (opcode 19) - DIV_SCALAR (opcode 20) - MOD_SCALAR (opcode 21) These opcodes were already implemented in BytecodeInterpreter but were missing from the disassembler, causing them to show as UNKNOWN(n). Testing: ./jperl --disassemble --interpreter -E 'say 10 % 3' Now shows: MOD_SCALAR r7 = r5 % r6 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added support for list operators that take code blocks (grep, map, sort):
- grep: filters list elements based on block condition
- map: transforms list elements using block expression
- sort: sorts list elements using comparison block
Implementation:
- Added GREP (100) and SORT (101) opcodes in Opcodes.java
- MAP (92) opcode already existed and was reused
- BytecodeCompiler: Added cases for "grep", "map", "sort" in BinaryOperatorNode
- BytecodeInterpreter: Implemented execution for all three opcodes
- InterpretedCode: Added disassembler cases for GREP and SORT
- All three call runtime ListOperators.{grep,map,sort} methods
Pattern: BinaryOperatorNode with SubroutineNode (block) and ListNode (data)
- Block is compiled to closure via visitAnonymousSubroutine
- Closure is passed to runtime operator along with input list
Updated SKILL.md with detailed implementation guide for Pattern 3.
Test results:
- grep: ./jperl --interpreter -E 'my @evens = grep { \$_ % 2 == 0 } (1,2,3,4); say "@evens"' => "2 4"
- map: ./jperl --interpreter -E 'my @doubled = map { \$_ * 2 } (1,2,3,4); say "@doubled"' => "2 4 6 8"
- sort: ./jperl --interpreter -E 'my @sorted = sort { \$a <=> \$b } (4,2,3,1); say "@sorted"' => "1 2 3 4"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added support for the reverse operator which reverses arrays or strings: - In list context: reverses the order of list elements - In scalar context: reverses the string representation Implementation: - Added SLOWOP_REVERSE (30) in Opcodes.java - BytecodeCompiler: Added case "reverse" in OperatorNode handler - Compiles all arguments into a RuntimeList - Calls SLOW_OP with SLOWOP_REVERSE - SlowOpcodeHandler: Added executeReverse method - Extracts RuntimeList to array - Calls Operator.reverse(ctx, args...) - Runtime handles both list and scalar context Pattern: OperatorNode with ListNode operand - Arguments are compiled and collected into RuntimeList - Passed to runtime Operator.reverse() with context Test result: ./jperl --interpreter -E 'my @Rev = reverse (1,2,3,4); say "@Rev"' => "4 3 2 1" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added support for array slice assignment: @array[indices] = values Implementation: - Added setSlice method to RuntimeArray.java - Takes indices (RuntimeList) and values (RuntimeList) - Iterates in parallel and sets each element - Uses arr.get(index).set(value) idiom - Added SLOWOP_ARRAY_SLICE_SET (31) in Opcodes.java - BytecodeCompiler: Added handler for array slice assignment - Detects BinaryOperatorNode("[") with @ sigil on left - Compiles indices from ArrayLiteralNode - Compiles values from RHS - Emits SLOW_OP with SLOWOP_ARRAY_SLICE_SET - SlowOpcodeHandler: Added executeArraySliceSet method - Extracts array, indices, and values registers - Calls array.setSlice(indices, values) - Fixed error messages: Changed RuntimeException to throwCompilerException - Now includes file, line, and code context in errors Pattern: Assignment where left side is BinaryOperatorNode("[") with @ sigil (array slice) vs $ sigil (single element) Test result: ./jperl --interpreter -E 'my @array = (1..10); @array[1, 3, 5] = (20, 30, 40); say "@array"' => "1 20 3 30 5 40 7 8 9 10" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added support for single element array assignment and multidimensional arrays:
- $array[index] = value (single element assignment)
- $matrix[3][0] = value (multidimensional with autovivification)
Implementation:
- BytecodeCompiler: Added handler for array element assignment
- Detects BinaryOperatorNode("[") with $ sigil (single element)
- For simple case: $array[index] = value
- Gets array register (lexical or global)
- Compiles index and value
- Emits ARRAY_SET
- For multidimensional: $matrix[3][0] = value
- Compiles outer array access recursively
- Uses SLOWOP_DEREF_ARRAY to dereference intermediate result
- Compiles index and value
- Emits ARRAY_SET with autovivification
- Reuses existing ARRAY_SET opcode from BytecodeInterpreter
Pattern: Assignment where left is BinaryOperatorNode("[") with $ sigil
- Single element vs slice distinguished by sigil ($ vs @)
- Multidimensional arrays handled via recursive compilation + dereferencing
Test results:
./jperl --interpreter -E 'my @matrix; \$matrix[3][0] = 7; say \$matrix[3][0]'
=> 7
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added support for the split operator which splits strings into arrays: - split pattern, string, limit Implementation: - Added SLOWOP_SPLIT (32) in Opcodes.java - BytecodeCompiler: Added case "split" in BinaryOperatorNode - Compiles pattern (left operand) - Compiles arguments list (right operand contains string and optional limit) - Emits SLOW_OP with SLOWOP_SPLIT - SlowOpcodeHandler: Added executeSplit method - Extracts pattern, args, and context - Calls Operator.split(pattern, args, ctx) - Runtime handles string-to-regex conversion Pattern: BinaryOperatorNode where: - left = pattern (string or regex) - right = ListNode (string to split and optional limit) Test result: ./jperl --interpreter -E 'my \$str = "a,b,c"; my @Parts = split ",", \$str; say "@Parts"' => "a b c" Note: There appears to be an infinite loop issue in array.t causing test repetition (29000+ tests). This needs investigation separate from split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added disassembler cases for all new SLOW_OP operations to properly decode their operands and advance the program counter correctly. Fixed operations: - SLOWOP_SPLICE: [rd] [arrayReg] [argsReg] - SLOWOP_ARRAY_SLICE: [rd] [arrayReg] [indicesReg] - SLOWOP_REVERSE: [rd] [argsReg] [ctx] - SLOWOP_ARRAY_SLICE_SET: [arrayReg] [indicesReg] [valuesReg] - SLOWOP_SPLIT: [rd] [patternReg] [argsReg] [ctx] Issue: The disassembler was not skipping operands for these new SLOW_OP cases, causing it to read operand bytes as opcodes, leading to "Index out of bounds" errors when trying to decode stringPool entries. Fixed by adding proper cases in the SLOW_OP switch statement in InterpretedCode.disassemble() to read and skip the correct number of operands. Test result: ./jperl --interpreter --disassemble -E 'my \$str = "a,b,c"; my @Parts = split ",", \$str; say "@Parts"' Now works correctly and shows: SLOW_OP split (id=32) r8 = split(r6, r7, ctx=2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed infinite loop when bare blocks ({ ... }) contain array slices or other
operations. The interpreter was treating all bare blocks as loops, causing them
to execute indefinitely.
Root cause: For3Node has an isSimpleBlock flag to indicate bare blocks that
should execute once (not loop), but BytecodeCompiler.visit(For3Node) was
ignoring this flag and always generating loop bytecode:
- LOAD_INT 1 (condition always true)
- GOTO_IF_FALSE -> end
- body
- GOTO -> start ← infinite loop!
Solution: Check node.isSimpleBlock at the start of visit(For3Node):
- If true: Just execute body once and return (no loop bytecode)
- If false: Generate full loop bytecode as before
Test cases that now work:
./jperl --interpreter -E '{ my @array = (1, 2, 3); my @slice = @array[1..2]; print "done\n"; }'
=> "done" (previously: infinite loop)
./jperl --interpreter src/test/resources/unit/array.t
=> Runs to completion (previously: infinite loop at line 43)
Note: array.t now hits a different error (RuntimeList vs RuntimeArray type mismatch)
which is unrelated to the loop issue and will be fixed separately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add learnings from fixing disassembler and infinite loop issues: - For3Node.isSimpleBlock flag pattern - Disassembler operand skipping requirement - Known issue with array element scalar context in function arguments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements comprehensive array operator support in the interpreter and fixes two critical bugs that were blocking progress.
Array Operators Implemented
Critical Bugs Fixed
1. Infinite Loop in Bare Blocks
Issue: Bare blocks
{ my @a = (1,2,3); }created infinite loopsRoot Cause: BytecodeCompiler ignored For3Node.isSimpleBlock flag, always generating loop bytecode with GOTO back to start
Fix: Check isSimpleBlock flag and execute body once without loop structure
Impact: Bare blocks now execute correctly without hanging
2. Disassembler Index Out of Bounds
Issue:
./jperl --interpreter --disassemblefailed with "Index 89 out of bounds for length 3"Root Cause: SLOW_OP switch statement had default case that didn't skip operands, causing PC misalignment
Fix: Added disassembler cases for all new SLOW_OP operations:
Each case now properly reads and skips all operands.
Impact: Disassembler works correctly for all bytecode
Known Issue (Not Addressed)
Array element access in function arguments returns wrong value:
Workaround: Assign to variable first:
Root Cause: ARRAY_SIZE opcode called after ARRAY_GET in scalar context for function arguments. This is a separate issue that will be addressed in a follow-up PR.
Files Modified
Testing
All array operators tested with:
Most array.t tests now pass. Remaining failures are due to the known scalar context issue in function arguments.
Commits
🤖 Generated with Claude Code