Conversation

Contributor

@kkollsga kkollsga commented Feb 2, 2026

Summary

This PR integrates mypy's stubtest tool into xarray's CI pipeline to validate that type annotations match runtime behavior.
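
For reference, stubtest ships with mypy and is invoked as a module; checking one of the tested modules against the allowlist looks something like this (paths taken from this PR):

    python -m mypy.stubtest xarray.core.dataarray --allowlist _stubtest/allowlist.txt

Stubtest imports the module, introspects it at runtime, and reports every place where the annotations mypy sees disagree with what actually exists.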

What's included

CI Integration:

  • New stubtest job in .github/workflows/ci-additional.yaml
  • Runs on every PR with continue-on-error: true (Phase 1: informational)
  • Tests core modules: dataarray, dataset, variable

Infrastructure:

  • _stubtest/allowlist.txt - Comprehensive allowlist for TYPE_CHECKING imports and intentional stub/runtime differences
  • _stubtest/run_stubtest.py - Runner script with reporting (sketched after this list)
  • ci_tests/ - Pytest-based tests for stubtest compliance and type regressions
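
A minimal sketch of what the runner could look like, assuming the module list above (the actual _stubtest/run_stubtest.py may differ, e.g. in how it reports results):

    import subprocess
    import sys

    # Core modules covered in Phase 1.
    MODULES = [
        "xarray.core.dataarray",
        "xarray.core.dataset",
        "xarray.core.variable",
    ]

    def main() -> int:
        # stubtest compares mypy's view of each module's annotations
        # against runtime introspection; the allowlist suppresses
        # known, intentional differences.
        result = subprocess.run(
            [
                sys.executable, "-m", "mypy.stubtest",
                *MODULES,
                "--allowlist", "_stubtest/allowlist.txt",
            ]
        )
        return result.returncode

    if __name__ == "__main__":
        sys.exit(main())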

Code cleanup:

  • Removed 14 obsolete # type: ignore comments revealed by scipy-stubs and pandas-stubs (example after this list)
  • Added 6 targeted type ignores for legitimate typing limitations
  • Net improvement: -8 type ignores
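
For illustration, the removals have roughly this shape (hypothetical code, not an actual hunk from this PR):

    # Before scipy-stubs, scipy shipped no type information, so mypy
    # required a suppression on the import:
    from scipy import interpolate  # type: ignore[import-untyped]

    # With scipy-stubs installed, the ignore is unused, and mypy's
    # --warn-unused-ignores flags it for removal:
    from scipy import interpolate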

Phased rollout

The stubtest job is currently non-blocking (continue-on-error: true). Once maintainers are confident it's stable, it can be made required by changing to continue-on-error: false.
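
A rough sketch of the job shape, with step names and action versions as assumptions (the actual definition lives in .github/workflows/ci-additional.yaml):

    stubtest:
      runs-on: ubuntu-latest
      continue-on-error: true  # Phase 1: informational only
      steps:
        - uses: actions/checkout@v4
        - uses: actions/setup-python@v5
          with:
            python-version: "3.12"
        - run: python -m pip install -e . mypy
        - run: python _stubtest/run_stubtest.py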

Verification

Stubtest errors reduced from 1,084 → 0 with the allowlist.
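
A sketch of how a pytest-based check in ci_tests/ might assert this; the test name and module choice are assumptions:

    import subprocess
    import sys

    def test_stubtest_clean() -> None:
        # With the allowlist applied, stubtest should exit 0, i.e.
        # report no remaining stub/runtime mismatches.
        proc = subprocess.run(
            [
                sys.executable, "-m", "mypy.stubtest",
                "xarray.core.variable",
                "--allowlist", "_stubtest/allowlist.txt",
            ],
            capture_output=True,
            text=True,
        )
        assert proc.returncode == 0, proc.stdout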

@github-actions github-actions bot added the topic-backends, Automation (Github bots, testing workflows, release automation), io, and topic-NamedArray (Lightweight version of Variable) labels Feb 2, 2026
@VeckoTheGecko
Contributor

Hi @kkollsga, I'm curious about this issue, so I'm happy to dig into this PR a bit. Before I do, though, I'd like to know to what extent AI has been used on this PR.

It seems like a heavy PR, and at a glance (on mobile) I don't see the connection to the original work done in SciPy. An AI disclosure statement would be really helpful for me to understand how it's been used 😊

(Note: I am not a maintainer, and xarray, AFAICT, doesn't have LLM disclosure as a contributing policy. This is a personal request.)

Contributor Author

kkollsga commented Feb 3, 2026

This PR was developed with assistance from Claude Code. AI usage is disclosed via the Co-authored-by: Claude <noreply@anthropic.com> trailer in the main commit, following xarray's CLAUDE.md guidelines.

The original issue references stubtest tooling from NumPy's fellowship work, with a cross-reference to #10273 (scipy-stubs compliance). The allowlist was built specifically for xarray's patterns by iteratively running stubtest on core modules. This PR also cleans up some obsolete type ignores that scipy-stubs made redundant, partially addressing #10273.

The categorized allowlist (TYPE_CHECKING imports, dynamic methods, protocol differences) is designed to be maintainable as xarray evolves.
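
For readers unfamiliar with the largest category: names imported under TYPE_CHECKING are visible to mypy but never bound at runtime, so stubtest flags them as missing unless allowlisted. A hypothetical illustration:

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        # mypy sees this name in the module, but the interpreter never
        # executes the import, so stubtest reports something like
        # "xarray.core.dataarray.Hashable is not present at runtime".
        from collections.abc import Hashable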

Contributor Author

kkollsga commented Feb 3, 2026

Earlier iterations ran into issues with pixi lock validation when modifying pyproject.toml, and the initial allowlist (~600 lines) had overly broad patterns causing conflicts.

The fix was to keep this PR as pure infrastructure with no source changes, and rebuild the allowlist from scratch using efficient grouped regex patterns. This brought it down to 170 lines with 0 stubtest errors and 0 unused patterns.
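
For context, a stubtest allowlist is one regular expression per line, matched against the full dotted path of each flagged object, with # comments allowed. Grouping alternatives is what shrinks the file; a hypothetical before/after:

    # Verbose: one entry per TYPE_CHECKING name
    xarray\.core\.dataarray\.Callable
    xarray\.core\.dataarray\.Hashable
    xarray\.core\.dataarray\.Iterable

    # Grouped equivalent
    xarray\.core\.dataarray\.(Callable|Hashable|Iterable)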

Future work: Expand coverage to additional modules (backends, computation, indexes), then transition from continue-on-error: true to a required check once stable.

Add stubtest to CI for validating type stubs against runtime behavior.
This helps catch type annotation regressions early.

- Add stubtest job to ci-additional.yaml (non-blocking with continue-on-error)
- Create allowlist for known acceptable differences (TYPE_CHECKING imports, etc.)
- Tests core modules: dataarray, dataset, variable

Refs pydata#11086

Co-Authored-By: Claude <noreply@anthropic.com>
