Open
18 commits
a22d13d
Fix: Avoid materializing large ranges in foreach loops
fglock Feb 15, 2026
2df796f
Add memory analysis documentation
fglock Feb 15, 2026
7eec9c8
Fix interpreter performance regression: reduce execute() method size …
fglock Feb 16, 2026
ef001f1
Add comprehensive method size scanner for JIT compilation limit
fglock Feb 16, 2026
0075fd9
Add method size scan report documenting JIT compilation findings
fglock Feb 16, 2026
5df55b6
Refactor BytecodeCompiler: Split large visit methods under JIT limit
fglock Feb 16, 2026
b4a0803
Update method size scan report: all critical methods fixed
fglock Feb 16, 2026
f414b9b
Update method size scan report: All critical methods fixed
fglock Feb 16, 2026
991d726
Add TODO: Deprecate SLOW_OP mechanism in favor of direct opcodes
fglock Feb 16, 2026
c25b6cb
Phase 1: Migrate opcodes from byte to short
fglock Feb 16, 2026
84373eb
docs: Update reports after Phase 1 short opcodes completion
fglock Feb 16, 2026
db72713
refactor: Remove unnecessary 0xFFFF mask in emit methods
fglock Feb 16, 2026
be8d5b9
refactor: Remove unnecessary masks in emitInt method
fglock Feb 16, 2026
d7a4f3b
Phase 2 Step 1: Add direct opcodes for SLOW_OP operations
fglock Feb 16, 2026
933020f
Phase 2 Step 2: Migrate BytecodeCompiler to use direct opcodes
fglock Feb 16, 2026
9b840c5
Phase 2 Step 3: Add range delegation to BytecodeInterpreter
fglock Feb 16, 2026
7937b8e
docs: Complete Phase 2 documentation and create Phase 3 plan
fglock Feb 16, 2026
615a43a
Phase 3 Proof-of-Concept: Add 3 promoted math operators
fglock Feb 16, 2026
73 changes: 72 additions & 1 deletion dev/interpreter/SKILL.md
@@ -60,6 +60,65 @@ PRINT r2 # print r2
- Interpreter: ~47M ops/sec (consistent, no warmup needed)
- Trade-off: Slower execution for faster startup and lower memory

### JIT Compilation Limit & Method Size Management

**Critical Constraint:** HotSpot does not JIT-compile methods whose bytecode exceeds 8000 bytes while `-XX:+DontCompileHugeMethods` is in effect (the default). Methods over this limit keep running in the JVM's bytecode interpreter, causing 5-10x performance degradation.
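
To confirm which setting a given JVM is actually running with, one option is to query the HotSpot diagnostic MXBean. This is a minimal sketch, assuming a HotSpot-based JVM; the class name `HugeMethodFlagCheck` is hypothetical and not part of this repository, and on other JVMs the lookup is expected to fail and report `unknown`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the project.
class HugeMethodFlagCheck {
    /** Returns the current value of DontCompileHugeMethods, or "unknown" if it cannot be read. */
    static String dontCompileHugeMethods() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("DontCompileHugeMethods").getValue();
        } catch (RuntimeException | LinkageError e) {
            // Non-HotSpot JVM or flag not exposed: report instead of failing.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("DontCompileHugeMethods = " + dontCompileHugeMethods());
    }
}
```

If this reports `true`, the size limit described above applies to every method in the interpreter.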

**Architecture: Range-Based Delegation**

To keep the main `execute()` method under the JIT limit, cold-path opcodes are delegated to secondary methods:

1. **executeComparisons()** - Comparison and logical operators (opcodes 31-41)
   - COMPARE_NUM, COMPARE_STR, EQ_NUM, NE_NUM, LT_NUM, GT_NUM, EQ_STR, NE_STR, NOT
   - Size: ~1089 bytes

2. **executeArithmetic()** - Multiply, divide, and compound assignments (opcodes 19-30, 110-113)
   - MUL_SCALAR, DIV_SCALAR, MOD_SCALAR, POW_SCALAR, NEG_SCALAR, CONCAT, REPEAT, LENGTH
   - SUBTRACT_ASSIGN, MULTIPLY_ASSIGN, DIVIDE_ASSIGN, MODULUS_ASSIGN
   - Size: ~1057 bytes

3. **executeCollections()** - Array and hash operations (opcodes 43-49, 51-56, 93-96)
   - ARRAY_SET, ARRAY_PUSH, ARRAY_POP, HASH_SET, HASH_EXISTS, HASH_DELETE, etc.
   - Size: ~1025 bytes

4. **executeTypeOps()** - Type and reference operations (opcodes 62-70, 102-105)
   - DEFINED, REF, BLESS, ISA, CREATE_LAST, CREATE_NEXT, CREATE_REDO, CREATE_REF, DEREF
   - Size: ~929 bytes

**Hot-Path Opcodes (Kept Inline):**
- Control flow: NOP, RETURN, GOTO, GOTO_IF_FALSE, GOTO_IF_TRUE
- Register ops: MOVE, LOAD_CONST, LOAD_INT, LOAD_STRING, LOAD_UNDEF
- Core arithmetic: ADD_SCALAR, SUB_SCALAR (used by loops)
- Iteration: ITERATOR_CREATE, ITERATOR_HAS_NEXT, ITERATOR_NEXT, FOREACH_NEXT_OR_EXIT
- Essential access: ARRAY_GET, HASH_GET
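
The split between inline hot-path cases and delegated cold-path ranges can be sketched as follows. This is an illustrative toy, not the project's `BytecodeInterpreter`: the opcode numbers, the `long[]` register file, and the class name `DelegationSketch` are assumptions made for the example.

```java
// Toy interpreter showing range-based delegation; all names and opcode numbers are hypothetical.
class DelegationSketch {
    static final short RETURN = 1, LOAD_INT = 2, ADD_SCALAR = 3;
    static final short MUL_SCALAR = 19, DIV_SCALAR = 20;

    /** Hot loop: frequent opcodes are handled inline, one opcode range is delegated. */
    static long execute(short[] code, long[] regs) {
        int pc = 0;
        while (true) {
            short op = code[pc++];
            switch (op) {
                case RETURN:
                    return regs[code[pc]];
                case LOAD_INT:
                    regs[code[pc]] = code[pc + 1];
                    pc += 2;
                    break;
                case ADD_SCALAR:
                    regs[code[pc]] = regs[code[pc + 1]] + regs[code[pc + 2]];
                    pc += 3;
                    break;
                case MUL_SCALAR:
                case DIV_SCALAR:
                    // Cold path: one static call keeps these bodies out of execute().
                    pc = executeArithmetic(op, code, pc, regs);
                    break;
                default:
                    throw new IllegalStateException("Unknown opcode " + op);
            }
        }
    }

    /** Secondary method: receives pc just past the opcode and returns the updated pc. */
    static int executeArithmetic(short op, short[] code, int pc, long[] regs) {
        int rd = code[pc], rs1 = code[pc + 1], rs2 = code[pc + 2];
        switch (op) {
            case MUL_SCALAR: regs[rd] = regs[rs1] * regs[rs2]; break;
            case DIV_SCALAR: regs[rd] = regs[rs1] / regs[rs2]; break;
            default: throw new IllegalStateException("Not an arithmetic opcode " + op);
        }
        return pc + 3;
    }

    public static void main(String[] args) {
        short[] program = {
            LOAD_INT, 0, 6,
            LOAD_INT, 1, 7,
            MUL_SCALAR, 2, 0, 1,
            RETURN, 2
        };
        System.out.println(execute(program, new long[3])); // prints 42
    }
}
```

Only the case labels for the delegated range stay in `execute()`, so its bytecode grows by a few bytes per cold opcode rather than by the size of each handler body.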

**Current Sizes:**
- Main execute(): 7270 bytes (under 7500-byte safe limit ✓)
- All secondary methods: <1100 bytes each ✓

**Enforcement:**

Run `dev/tools/check-bytecode-size.sh` after changes to verify that all methods stay under the limit:

```bash
./dev/tools/check-bytecode-size.sh
```

This script checks all 5 methods (main execute + 4 secondary) and fails the build if any exceeds 7500 bytes.

**If Methods Grow Too Large:**

1. Move more opcodes from main execute() to secondary methods
2. Split large secondary methods into smaller groups
3. Keep hot-path opcodes (loops, basic arithmetic) inline for zero overhead
4. Delegate cold-path opcodes (rare operations) to minimize cost

**Performance Impact:**

- Hot-path opcodes: Zero overhead (inline in main switch)
- Cold-path opcodes: One static method call (~5-10ns overhead)
- Overall: Negligible impact since cold ops are infrequent

## File Organization

### Documentation (`dev/interpreter/`)
@@ -77,7 +136,12 @@ PRINT r2 # print r2

**Core Interpreter:**
- **Opcodes.java** - Opcode constants (0-99 + SLOW_OP) organized by category
- **BytecodeInterpreter.java** - Main execution loop with unified switch statement
- **BytecodeInterpreter.java** - Main execution loop with range-based delegation to secondary methods
  - Main execute() method: Hot-path opcodes (loops, basic arithmetic, control flow)
  - executeComparisons(): Comparison and logical operators
  - executeArithmetic(): Multiply, divide, compound assignments
  - executeCollections(): Array and hash operations
  - executeTypeOps(): Type and reference operations
- **BytecodeCompiler.java** - AST to bytecode compiler with register allocation
- **InterpretedCode.java** - Bytecode container with disassembler for debugging
- **SlowOpcodeHandler.java** - Handler for rare operations (system calls, socket operations)
@@ -86,6 +150,13 @@ PRINT r2 # print r2
- **VariableCaptureAnalyzer.java** - Analyzes which variables are captured by named subroutines
- **VariableCollectorVisitor.java** - Detects closure variables for capture analysis

### Build Tools (`dev/tools/`)

- **check-bytecode-size.sh** - Verifies all interpreter methods stay under the 7500-byte safety threshold, which sits below the JVM's 8000-byte JIT compilation limit
  - Run after modifications to BytecodeInterpreter.java
  - Automatically checks main execute() and all secondary methods
  - Prevents performance regressions from method size growth

### Opcode Categories (Opcodes.java)

Opcodes are organized into functional categories:
233 changes: 233 additions & 0 deletions dev/prompts/PHASE3_OPERATOR_PROMOTIONS.md
@@ -0,0 +1,233 @@
# Phase 3: OperatorHandler Promotions - Strategy

**Date**: 2026-02-16
**Status**: Planning
**Target**: Promote 200+ OperatorHandler operations to direct opcodes

## Overview

OperatorHandler contains **231 operators** that currently use ASM INVOKESTATIC calls (6 bytes each). Promoting these to direct opcodes (2 bytes each) is expected to provide:
- **10-100x performance improvement** (direct dispatch vs method call; an estimate, not yet benchmarked)
- **4 bytes saved per operation** (6 bytes INVOKESTATIC → 2 bytes opcode)
- **Better CPU i-cache usage** (fewer instructions)

## Current Architecture

```java
// OperatorHandler maps symbols to method calls
put("+", "add", "org/perlonjava/operators/MathOperators");
put("-", "subtract", "org/perlonjava/operators/MathOperators");
// ... 231 operators total
```

```java
// EmitOperatorNode emits INVOKESTATIC (6 bytes)
methodVisitor.visitMethodInsn(
        INVOKESTATIC,
        "org/perlonjava/operators/MathOperators",
        "add",
        descriptor
);
```

## Target Architecture

```java
// Direct opcode in BytecodeCompiler (2 bytes)
emit(Opcodes.OP_ADD);
emitReg(rd);
emitReg(rs1);
emitReg(rs2);
```

```java
// BytecodeInterpreter handles directly
case Opcodes.OP_ADD: {
    // Operands follow the opcode: destination register, then two source registers
    int rd = bytecode[pc++];
    int rs1 = bytecode[pc++];
    int rs2 = bytecode[pc++];
    registers[rd] = MathOperators.add(registers[rs1], registers[rs2]);
    return pc;
}
```
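
To see the emit side and the decode side agree on one instruction layout, the round trip can be sketched in isolation. This is a self-contained toy under stated assumptions: `EncodingSketch`, the opcode value 200, and the `long[]` register file are hypothetical, and plain `long` addition stands in for `MathOperators.add`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy emitter/decoder pair; names and the opcode number are hypothetical.
class EncodingSketch {
    static final short OP_ADD = 200;

    private final List<Short> out = new ArrayList<>();

    void emit(short opcode) { out.add(opcode); }
    void emitReg(int reg)   { out.add((short) reg); }

    short[] toArray() {
        short[] code = new short[out.size()];
        for (int i = 0; i < code.length; i++) code[i] = out.get(i);
        return code;
    }

    /** Decodes one OP_ADD at pc and returns the pc of the next instruction. */
    static int step(short[] bytecode, int pc, long[] registers) {
        short opcode = bytecode[pc++];
        if (opcode != OP_ADD) throw new IllegalStateException("Unexpected opcode " + opcode);
        int rd = bytecode[pc++];
        int rs1 = bytecode[pc++];
        int rs2 = bytecode[pc++];
        registers[rd] = registers[rs1] + registers[rs2]; // stand-in for MathOperators.add
        return pc;
    }

    public static void main(String[] args) {
        EncodingSketch compiler = new EncodingSketch();
        compiler.emit(OP_ADD);
        compiler.emitReg(2); // rd
        compiler.emitReg(0); // rs1
        compiler.emitReg(1); // rs2
        long[] regs = {40, 2, 0};
        int next = step(compiler.toArray(), 0, regs);
        System.out.println(regs[2] + " next pc=" + next); // prints "42 next pc=4"
    }
}
```

In this sketch the whole instruction occupies four `short` slots (the opcode plus three register operands), which is worth keeping in mind when comparing encoded sizes.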

## Opcode Space Allocation

**Opcodes 200-2999**: OperatorHandler promotions (CONTIGUOUS blocks by category)

```
200-299 Reserved (100 slots)
300-399 Comparison Operators (100 slots)
400-499 Math Operators (100 slots)
500-599 Bitwise Operators (100 slots)
600-699 String Operators (100 slots)
700-799 List Operators (100 slots)
800-899 Hash Operators (100 slots)
900-999 I/O Operators (100 slots)
1000-1099 Type/Cast Operators (100 slots)
1100-1199 Special Operators (100 slots)
1200-2999 Reserved (1800 slots)
```

**CRITICAL**: Keep each category CONTIGUOUS for JVM tableswitch optimization!
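
The JVM has two switch instructions: `tableswitch` indexes a jump table directly by value, while `lookupswitch` searches a sorted list of keys. Compilers generally pick `tableswitch` only when the case labels are dense, which is why gaps inside a category matter. The sketch below is a hypothetical helper (`OpcodeSpaceSketch` is not part of the project) that maps an opcode to its block from the table above and measures how dense a set of case labels is.

```java
// Hypothetical helper illustrating the allocation table; not part of the project.
class OpcodeSpaceSketch {
    // Category names in the order of the allocation table (blocks 200-1199).
    static final String[] CATEGORIES = {
        "Reserved", "Comparison", "Math", "Bitwise", "String",
        "List", "Hash", "I/O", "Type/Cast", "Special"
    };

    /** Maps an opcode to its category using the 100-slot blocks. */
    static String categoryOf(int opcode) {
        if (opcode < 200 || opcode > 2999) return "Outside promotion range";
        if (opcode >= 1200) return "Reserved";
        return CATEGORIES[(opcode - 200) / 100];
    }

    /** Fraction of [lowest, highest] covered by labels; 1.0 means fully contiguous. */
    static double density(int[] sortedLabels) {
        int lo = sortedLabels[0];
        int hi = sortedLabels[sortedLabels.length - 1];
        return (double) sortedLabels.length / (hi - lo + 1);
    }

    public static void main(String[] args) {
        System.out.println(categoryOf(400));                          // Math
        System.out.println(density(new int[] {400, 401, 402, 403}));  // contiguous block
        System.out.println(density(new int[] {400, 600, 1100}));      // sparse labels
    }
}
```

A delegation method whose switch covers one contiguous block has density 1.0; mixing opcodes from distant blocks in one switch drives the density toward zero.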

## Promotion Strategy

### Priority Tiers

**Tier 1: Hot Path Operators** (Promote First)
- Already have direct opcodes in BytecodeInterpreter
- Used in tight loops
- Examples: ADD, SUB, MUL, DIV, MOD (already done in Phase 1!)

**Tier 2: Common Operators** (Promote Next, ~20 operators)
- Frequently used in typical Perl code
- Measurable performance impact
- Easy to implement

**Tier 3: Specialized Operators** (~50 operators)
- Used in specific domains
- Moderate impact

**Tier 4: Rare Operators** (~160 operators)
- Seldom used
- Can stay as OperatorHandler calls

## Candidate Analysis: Tier 2 (Common Operators)

### Math Operators (400-419) - 20 slots, 9 candidates listed
| Opcode | Symbol | Method | Priority | Notes |
|--------|--------|--------|----------|-------|
| 400 | ** | pow | HIGH | Exponentiation |
| 401 | abs | abs | HIGH | Absolute value |
| 402 | int | integer | HIGH | Integer conversion |
| 403 | sqrt | sqrt | MEDIUM | Square root |
| 404 | log | log | MEDIUM | Logarithm |
| 405 | exp | exp | MEDIUM | Exponential |
| 406 | sin | sin | LOW | Trigonometry |
| 407 | cos | cos | LOW | Trigonometry |
| 408 | atan2 | atan2 | LOW | Trigonometry |

### Bitwise Operators (500-519) - 20 slots, 6 candidates listed
| Opcode | Symbol | Method | Priority | Notes |
|--------|--------|--------|----------|-------|
| 500 | & | bitwiseAnd | HIGH | Bitwise AND |
| 501 | \| | bitwiseOr | HIGH | Bitwise OR |
| 502 | ^ | bitwiseXor | HIGH | Bitwise XOR |
| 503 | ~ | bitwiseNot | HIGH | Bitwise NOT |
| 504 | << | shiftLeft | MEDIUM | Left shift |
| 505 | >> | shiftRight | MEDIUM | Right shift |

### String Operators (600-619) - 20 slots, 9 candidates listed
| Opcode | Symbol | Method | Priority | Notes |
|--------|--------|--------|----------|-------|
| 600 | . | concat | HIGH | Already in BytecodeInterpreter! |
| 601 | x | repeat | HIGH | Already in BytecodeInterpreter! |
| 602 | uc | uc | MEDIUM | Uppercase |
| 603 | lc | lc | MEDIUM | Lowercase |
| 604 | ucfirst | ucfirst | MEDIUM | Uppercase first |
| 605 | lcfirst | lcfirst | MEDIUM | Lowercase first |
| 606 | quotemeta | quotemeta | LOW | Quote metacharacters |
| 607 | chr | chr | MEDIUM | Character from code |
| 608 | ord | ord | MEDIUM | Code from character |

### Comparison Operators (300-319) - 20 slots, 14 candidates listed
| Opcode | Symbol | Method | Priority | Notes |
|--------|--------|--------|----------|-------|
| 300 | < | lessThan | HIGH | Already in BytecodeInterpreter! |
| 301 | <= | lessThanOrEqual | HIGH | Already in BytecodeInterpreter! |
| 302 | > | greaterThan | HIGH | Already in BytecodeInterpreter! |
| 303 | >= | greaterThanOrEqual | HIGH | Already in BytecodeInterpreter! |
| 304 | == | numericEqual | HIGH | Already in BytecodeInterpreter! |
| 305 | != | numericNotEqual | HIGH | Already in BytecodeInterpreter! |
| 306 | <=> | compareNum | HIGH | Already in BytecodeInterpreter! |
| 307 | lt | stringLessThan | MEDIUM | String comparison |
| 308 | le | stringLessThanOrEqual | MEDIUM | String comparison |
| 309 | gt | stringGreaterThan | MEDIUM | String comparison |
| 310 | ge | stringGreaterThanOrEqual | MEDIUM | String comparison |
| 311 | eq | stringEqual | HIGH | Already in BytecodeInterpreter! |
| 312 | ne | stringNotEqual | HIGH | Already in BytecodeInterpreter! |
| 313 | cmp | cmp | MEDIUM | String three-way comparison |

## Implementation Steps (Per Operator)

1. **Add opcode constant** in Opcodes.java (in CONTIGUOUS range)
2. **Update EmitOperatorNode** to emit opcode instead of INVOKESTATIC
3. **Add case in BytecodeInterpreter** (in appropriate range delegation method)
4. **Test thoroughly** with unit tests
5. **Measure performance gain** with benchmarks

## Automation Opportunity

Most operators follow the same pattern. A script could generate:
- Opcode constants (batch)
- BytecodeInterpreter case statements (batch)
- EmitOperatorNode mappings (batch)
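
A generator along these lines could be quite small. The following is a rough sketch only: `PromotionGenerator`, the constant names `OP_POW` and `OP_ABS`, and the assumed operand counts are hypothetical, and the emitted case bodies simply mirror the `OP_ADD` pattern from the Target Architecture section.

```java
import java.util.List;

// Hypothetical generator sketch; output shapes follow the OP_ADD example in this document.
class PromotionGenerator {
    /** One operator to promote; fields mirror columns of the candidate tables. */
    static final class Op {
        final int opcode;
        final String constant;
        final String ownerClass;
        final String method;
        final int sources; // number of source registers (assumed per operator)

        Op(int opcode, String constant, String ownerClass, String method, int sources) {
            this.opcode = opcode;
            this.constant = constant;
            this.ownerClass = ownerClass;
            this.method = method;
            this.sources = sources;
        }
    }

    static String constantDecl(Op op) {
        return "public static final short " + op.constant + " = " + op.opcode + ";";
    }

    static String caseStmt(Op op) {
        StringBuilder sb = new StringBuilder();
        StringBuilder args = new StringBuilder();
        sb.append("case Opcodes.").append(op.constant).append(": {\n");
        sb.append("    int rd = bytecode[pc++];\n");
        for (int i = 1; i <= op.sources; i++) {
            sb.append("    int rs").append(i).append(" = bytecode[pc++];\n");
            if (i > 1) args.append(", ");
            args.append("registers[rs").append(i).append("]");
        }
        sb.append("    registers[rd] = ").append(op.ownerClass).append('.')
          .append(op.method).append('(').append(args).append(");\n");
        sb.append("    return pc;\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(
            new Op(400, "OP_POW", "MathOperators", "pow", 2),
            new Op(401, "OP_ABS", "MathOperators", "abs", 1));
        for (Op op : ops) {
            System.out.println(constantDecl(op));
            System.out.print(caseStmt(op));
        }
    }
}
```

Driving the generator from a single operator list keeps the opcode numbers, constants, and case statements in sync, which also makes it easier to keep each category contiguous.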

## Milestones

**Milestone 1**: Promote 10 high-priority operators (Math + Bitwise)
- Expected: ~2-5x speedup for mathematical Perl code
- Effort: 1-2 days

**Milestone 2**: Promote 20 string/comparison operators
- Expected: ~3-10x speedup for string-heavy Perl code
- Effort: 2-3 days

**Milestone 3**: Promote 50 specialized operators
- Expected: Domain-specific speedups
- Effort: 1 week

**Milestone 4**: Complete remaining operators
- Expected: Complete coverage
- Effort: Ongoing (months)

## Benchmarking Strategy

Create microbenchmarks for each promoted operator:
```perl
# Benchmark: 10M iterations of operator
my $x = 0;
for (1..10_000_000) {
$x = $x + 1; # or other operator
}
```

Measure before/after promotion:
- Time (should improve 2-10x)
- Bytecode size (should decrease 4 bytes per op)
- Method sizes (must stay under 8000 bytes)

## Method Size Management

BytecodeInterpreter.execute() is at **7,517 bytes** (483 bytes below the 8,000-byte JIT limit).

**Strategy**:
- Add operators to existing range delegation methods (executeArithmetic, etc.)
- If a method approaches 7,000 bytes, split it into sub-groups
- Example: Split executeArithmetic into executeBasicMath + executeAdvancedMath
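
The headroom arithmetic and the split rule can be expressed as a small check. This is a sketch under assumptions: `MethodSizeBudget` is a hypothetical class, and the sizes passed in would come from whatever tool measures method bytecode in practice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical budget check; sizes are supplied by the caller.
class MethodSizeBudget {
    static final int JIT_LIMIT = 8000;       // bytes of bytecode, the limit used in this plan
    static final int SPLIT_THRESHOLD = 7000; // split threshold from the strategy above

    /** Bytes remaining before a method hits the JIT limit. */
    static int headroom(int methodSize) {
        return JIT_LIMIT - methodSize;
    }

    /** Names of methods at or above the split threshold, in the map's iteration order. */
    static List<String> needsSplit(Map<String, Integer> sizes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            if (e.getValue() >= SPLIT_THRESHOLD) result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("execute", 7517);
        sizes.put("executeArithmetic", 1057);
        System.out.println(headroom(7517));    // prints 483
        System.out.println(needsSplit(sizes)); // prints [execute]
    }
}
```

With the figures quoted in this document, only `execute()` is over the split threshold, so new operators belong in the delegation methods rather than in the main switch.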

## Phase 3 Recommended Start

**Start with**: 11 high-impact operators (5 math, 6 bitwise)

1. **Math** (400-404): pow, abs, int, sqrt, log
2. **Bitwise** (500-505): &, |, ^, ~, <<, >>

These are:
- Frequently used in real Perl code
- Easy to implement (2-register or 3-register ops)
- Measurable performance impact
- Won't significantly increase method sizes

**Expected Result**:
- 2-5x speedup for mathematical operations
- Proof of concept for remaining promotions
- Validation of opcode space allocation strategy

---

**Next Steps**:
1. Profile real Perl code to identify actual hot operators
2. Implement Tier 2 operators (20 ops)
3. Benchmark and document gains
4. Continue gradual promotion over multiple releases