Add GitHub issue tests #67

Draft
gaurav wants to merge 52 commits into main from add-github-issue-tests

gaurav commented Jan 8, 2026

WIP

Next steps:

  • Ask Claude to make the tests run faster.
  • Think about how to keep the README file up to date with the assertions.
  • Add HasPreferredName assertion.

gaurav and others added 10 commits February 15, 2026 02:39
Each assertion type (Resolves, DoesNotResolve, ResolvesWith, ResolvesWithType,
SearchByName, Needed) is now a self-contained class in
src/babel_validation/assertions/, grouped by which service it targets
(nodenorm.py, nameres.py, common.py). A central ASSERTION_HANDLERS registry
in __init__.py maps lowercase assertion names to handler instances, and
NodeNormAssertion / NameResAssertion marker base classes allow isinstance()
checks for applicability. GitHubIssueTest.test_with_nodenorm() and
test_with_nameres() are now 3-line dispatchers. A README.md documents all
supported assertion types with examples in both wiki and YAML syntax.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
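
For readers skimming the PR, here is a minimal sketch of the registry pattern this commit describes: a lowercase-keyed `ASSERTION_HANDLERS` mapping plus marker base classes checked with `isinstance()`. The handler bodies, the `NAME` attribute, and the `handler_for()` helper are assumptions made for illustration and are not taken from the PR's code.

```python
class NodeNormAssertion:
    """Marker base class: the handler applies to NodeNorm."""


class NameResAssertion:
    """Marker base class: the handler applies to NameRes."""


class ResolvesHandler(NodeNormAssertion):
    NAME = "Resolves"  # hypothetical attribute used to build the registry key


class SearchByNameHandler(NameResAssertion):
    NAME = "SearchByName"


# Registry keyed by lowercase assertion name, holding handler instances.
ASSERTION_HANDLERS = {
    handler.NAME.lower(): handler
    for handler in (ResolvesHandler(), SearchByNameHandler())
}


def handler_for(assertion_name, marker):
    """Return the registered handler if it targets the given service, else None."""
    handler = ASSERTION_HANDLERS.get(assertion_name.lower())
    return handler if isinstance(handler, marker) else None
```

With this shape, a dispatcher only needs a lookup followed by an applicability check, which is consistent with the "3-line dispatcher" description above.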
gaurav and others added 2 commits February 19, 2026 19:03
…pendently.

Bumps pytest to >=9.0 (which includes built-in subtests support). Each
TestResult is now evaluated in its own subtest block, so a failure no longer
short-circuits the rest. Adds post-loop state-consistency subtests: a closed
issue with failing tests fails with a "consider reopening" message, and an open
issue where all tests pass emits an xfail "consider closing" hint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously, get_github_issue_id() and GitHubIssueTest.__str__() both resolved
the org/repo name via github_issue.repository.organization.name, which triggers
lazy PyGithub API calls. Parse org/repo from html_url instead (always present
in the issue JSON, no extra round-trip needed). Also resolves the TODO comment
in get_test_issues_from_issue().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
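
A rough sketch of parsing the org and repo out of an issue's `html_url`, assuming the usual `https://github.com/<org>/<repo>/issues/<number>` layout. The function name `parse_issue_url` is hypothetical, and the PR's own implementation may differ.

```python
from urllib.parse import urlparse


def parse_issue_url(html_url):
    """Split a GitHub issue URL into (org, repo, issue_number)."""
    parts = urlparse(html_url).path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "issues":
        raise ValueError(f"Not a GitHub issue URL: {html_url}")
    org, repo, _, number = parts
    return org, repo, int(number)
```

Because the URL is already part of the issue payload, this needs no further API request.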
gaurav and others added 13 commits February 19, 2026 19:10
Promote NodeNormAssertion and NameResAssertion from empty marker classes
to full framework base classes (renamed NodeNormTest / NameResTest).
All boilerplate (empty-param guards, CURIE pre-warming, yielded-values
tracking) now lives in the base class test_with_nodenorm / test_with_nameres
methods, which dispatch to a new test_param_set() hook.

Handler authors now only override test_param_set() and call self.passed()
/ self.failed() — no TestResult/TestStatus construction, no loop
scaffolding, no return-inside-generator bugs.

Also fixes a latent bug where `return [TestResult(...)]` inside a generator
body silently discarded the result (the list went to StopIteration.value
instead of being yielded to the caller).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
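
The latent bug mentioned above is easy to reproduce in isolation. The sketch below uses made-up function names and plain strings in place of `TestResult` objects; it only demonstrates the language behaviour, not the PR's code.

```python
def buggy():
    # `return <value>` inside a generator ends iteration. The value is
    # attached to StopIteration and never reaches a for-loop or list().
    if True:
        return ["result"]
    yield  # unreachable, but its presence makes this function a generator


def fixed():
    yield "result"
    return  # a bare return simply stops the generator


def returned_value(generator):
    """Drive a generator by hand to expose StopIteration.value."""
    try:
        next(generator)
    except StopIteration as stop:
        return stop.value
    return None
```

Here `list(buggy())` is empty while `list(fixed())` contains the result, and `returned_value(buggy())` shows where the discarded list ended up.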
Add PARAMETERS, WIKI_EXAMPLES, and YAML_PARAMS doc attributes to every
assertion handler class, making the class body the single source of truth
for documentation. Add gen_docs.py to render README.md from those attributes,
and a freshness test that fails if README.md is out of date with the handlers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
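
One way such a freshness check could work is sketched below. The attribute types, the rendered layout, and the names `render_docs` and `is_fresh` are all assumptions; the PR's `gen_docs.py` is not shown here and likely differs.

```python
class HasLabelHandler:
    # Illustrative doc attributes; the real handlers may use other types.
    NAME = "HasLabel"
    PARAMETERS = "a CURIE followed by the expected label"
    WIKI_EXAMPLES = ["{{BabelTest|HasLabel|CHEBI:15365|aspirin}}"]


def render_docs(handlers):
    """Render a small Markdown document from handler class attributes."""
    lines = []
    for handler in handlers:
        lines.append(f"### {handler.NAME}")
        lines.append(f"Parameters: {handler.PARAMETERS}")
        lines.extend(f"- `{example}`" for example in handler.WIKI_EXAMPLES)
    return "\n".join(lines) + "\n"


def is_fresh(readme_text, handlers):
    """True when the checked-in README matches what would be generated now."""
    return readme_text == render_docs(handlers)
```

A test can then read README.md from disk and assert `is_fresh(...)`, so any handler change that is not followed by regenerating the docs fails the build.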
Move parametrization from module-level @pytest.mark.parametrize to a
pytest_generate_tests hook in tests/github_issues/conftest.py. This
enables two fast paths: --issue fetches only named issues directly
(near-instant), and the full run pre-filters issues with BabelTest
syntax at collection time rather than creating test items for all issues.

Also fix get_test_issues_from_issue() to avoid a separate GitHub API
call per issue by using repository.full_name instead of
repository.organization.name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
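
A minimal conftest-style sketch of the two paths described above, assuming a repeatable `--issue` option and a test argument named `issue_id`. The helper `find_issue_ids_with_tests()` and the ID format are placeholders, not the PR's code.

```python
def pytest_addoption(parser):
    # Assumed option shape: --issue may be given several times.
    parser.addoption("--issue", action="append", default=[],
                     help="Only run tests for this issue ID (repeatable)")


def find_issue_ids_with_tests():
    """Placeholder for the slower scan that pre-filters issues with BabelTest syntax."""
    return ["example-org/example-repo#1", "example-org/example-repo#2"]


def pytest_generate_tests(metafunc):
    if "issue_id" not in metafunc.fixturenames:
        return
    requested = metafunc.config.getoption("issue")
    # Fast path: use the named issues directly; otherwise fall back to the scan.
    issue_ids = requested or find_issue_ids_with_tests()
    metafunc.parametrize("issue_id", issue_ids, ids=issue_ids)
```

Doing the selection in the hook rather than in a module-level decorator means the command-line options are available at the moment the parameter list is built.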
Add pytest-xdist[psutil] and filelock dependencies. Refactor
github_issues conftest to parametrize with picklable string IDs
instead of PyGithub Issue objects, add a filelock-guarded file cache
for the full issue list (1-hour TTL), and hydrate Issue objects in
a per-test fixture so xdist workers avoid redundant GitHub API scans.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a filelock-guarded file cache (TTL 1 hour) in GoogleSheetTestCases.__init__
so that when pytest-xdist workers reimport the module they read from the local
temp file instead of each making a redundant HTTP request to Google Sheets.

This enables parallel execution of gsheet tests with -n auto.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IDENTIFIERS_WITH_DESCRIPTIONS and IDENTIFIERS_WITHOUT_DESCRIPTIONS were
Python sets, which have unpredictable iteration order across processes.
Workers collected tests in different orders, triggering xdist's
"Different tests were collected" fatal error.

Changed both to lists so parametrize order is stable across all workers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
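
If deduplication is still wanted after switching to lists, one option (not necessarily what this PR does) is to deduplicate while keeping first-seen order. String hashes are randomized per process unless `PYTHONHASHSEED` is fixed, which is why iterating a set of strings can differ between xdist workers. The helper name and sample identifiers below are illustrative.

```python
def stable_unique(items):
    """Drop duplicates while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(items))


# The resulting list has the same order in every process, so each worker
# collects the parametrized tests identically.
IDENTIFIERS = stable_unique(["MONDO:0005015", "CHEBI:15365", "MONDO:0005015"])
```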
Replace the 1-hour TTL with per-run invalidation:
- pytest_configure hook in conftest.py deletes the gsheet CSV cache at the
  start of each run (controller only, not xdist workers).
- GoogleSheetTestCases.__init__ now just checks whether the cache file exists;
  no more time-based TTL.

Result: the controller downloads once, workers share that file — but the next
pytest invocation always starts fresh, so edits to the Google Sheet are
immediately reflected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
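
The controller-only invalidation can be sketched as follows. pytest-xdist sets a `workerinput` attribute on the config object in worker processes only, so its absence identifies the controller. The cache path is a made-up placeholder, not the file name used in this PR.

```python
import os
import tempfile

# Hypothetical cache location, for illustration only.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "example_gsheet_cache.csv")


def is_xdist_worker(config):
    # Present only in pytest-xdist worker processes.
    return hasattr(config, "workerinput")


def pytest_configure(config):
    if is_xdist_worker(config):
        return  # workers must keep the file the controller is about to write
    try:
        os.remove(CACHE_PATH)
    except FileNotFoundError:
        pass  # nothing cached from a previous run
```

The loader then only has to check whether the file exists: the first reader downloads and writes it, and later readers in the same run reuse it.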
Apply pytest.mark.xfail(strict=False) dynamically to each test when the
GitHub issue is still open:
- Open issue + any subtest fails  → XFAIL  (x) — known outstanding issue
- Open issue + all subtests pass  → XPASS  (X) — issue ready to be closed
- Closed issue + any subtest fails → FAIL  (F) — regression, needs attention
- Closed issue + all subtests pass → PASS  (.) — as expected

Removes the ad-hoc issue-state consistency subtests at the bottom, which
are now fully covered by the xfail mechanism. Also drops the unused
`from github import Issue` import.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
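
The four cases above reduce to a small decision table. The function below is only a restatement of that table for reference, with a hypothetical name; it is not code from the PR.

```python
def expected_outcome(issue_open, any_subtest_failed):
    """Map issue state and subtest results to the pytest outcome listed above."""
    if issue_open:
        # The xfail(strict=False) marker is applied only to open issues.
        return "XFAIL" if any_subtest_failed else "XPASS"
    return "FAIL" if any_subtest_failed else "PASS"
```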
Same pattern as the Google Sheet cache: the pytest_configure hook in
tests/conftest.py now also deletes babel_validation_issues_cache.json on
startup (controller only, not xdist workers). The TTL check in
_get_all_test_issue_ids() is replaced with a plain existence check, so
workers share the file the controller wrote without re-fetching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents and verifies that unknown assertion names are silently parsed
but raise ValueError at execution time for both wiki and YAML syntaxes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gaurav and others added 11 commits February 28, 2026 00:20
When an issue is open, individual failing subtests now call pytest.xfail()
imperatively so they show as x (XFAIL) rather than F (FAIL) in the output.
A parent-level pytest.xfail() is also called after all subtests complete
(when any xfailed) to ensure the parent shows as XFAIL rather than XPASS.
This preserves the XPASS signal for the case where all subtests pass on an
open issue, indicating the issue may be ready to close.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace full-repo pagination with GitHub Search API in
get_issues_with_tests(), reducing collection from O(all issues) fetches
to O(matching issues). Add a module-level _fetched_issues_cache to avoid
re-fetching issues already retrieved during collection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a GitHub issue embeds a BabelTest with an unrecognised assertion
type, the test previously showed as XFAIL (misleading — implies the
service is at fault, not the test itself). Now a pre-flight check runs
before the xfail marker is applied: if any assertion type is unknown,
pytest.fail() is called immediately, producing a hard FAIL with a
clear message listing the bad type(s) and valid alternatives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
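
A possible shape for the pre-flight check, assuming assertion names are matched case-insensitively against the registry keys. The set of known names is an illustrative subset and the function names and message wording are hypothetical. In the test itself, the returned message would be passed to `pytest.fail()` before any xfail marker is applied.

```python
KNOWN_ASSERTIONS = {"resolves", "doesnotresolve", "resolveswith"}  # illustrative subset


def unknown_assertion_types(assertion_names, known=KNOWN_ASSERTIONS):
    """Return the sorted, de-duplicated names that are not registered."""
    return sorted({name for name in assertion_names if name.lower() not in known})


def preflight_message(assertion_names, known=KNOWN_ASSERTIONS):
    """Return a failure message, or None when every assertion type is known."""
    unknown = unknown_assertion_types(assertion_names, known)
    if not unknown:
        return None
    return (f"Unknown assertion type(s): {', '.join(unknown)}. "
            f"Valid types: {', '.join(sorted(known))}")
```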
Adds DoesNotResolveWithHandler, the inverse of ResolvesWith: asserts
that the given CURIEs do NOT all normalize to the same canonical
identifier in NodeNorm.

Also extracts _compare_resolutions() helper shared by both handlers,
and fixes both ResolvesWith and DoesNotResolveWith to compare by
result['id']['identifier'] rather than full JSON dumps — which is
more robust and semantically correct (equality of the canonical
identifier, not byte-for-byte JSON equality).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
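
To illustrate why comparing `result['id']['identifier']` is more robust than comparing JSON dumps, here is a sketch that takes already-fetched results as a dict of CURIE to result (or `None` when unresolved). The helper names, and the choice to treat unresolved CURIEs as not resolving together, are assumptions and do not reflect the PR's `_compare_resolutions()`.

```python
def canonical_identifiers(results):
    """Map each queried CURIE to its canonical identifier (None if unresolved)."""
    return {
        curie: (result["id"]["identifier"] if result else None)
        for curie, result in results.items()
    }


def all_resolve_together(results):
    """True when every CURIE resolved and all share one canonical identifier."""
    identifiers = set(canonical_identifiers(results).values())
    return len(identifiers) == 1 and None not in identifiers
```

Two results with the same canonical identifier but different incidental fields now compare as equal, whereas a byte-for-byte JSON comparison would report them as different.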
Adds a new `HasLabel` assertion that checks a CURIE resolves to a
specific primary label in NodeNorm (id.label), enabling label regression
tests. Syntax: {{BabelTest|HasLabel|CHEBI:15365|aspirin}}.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
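
A small sketch of how the wiki syntax above could be parsed and checked. The regular expression, the function names, and the exact (case-sensitive) label comparison are assumptions; only the `{{BabelTest|HasLabel|CHEBI:15365|aspirin}}` syntax and the `id.label` field come from the commit message.

```python
import re

# Matches {{BabelTest|<assertion>|<param>|...}} without nested braces.
WIKI_PATTERN = re.compile(r"\{\{BabelTest\|([^{}]+)\}\}")


def parse_wiki_tests(text):
    """Return a list of (assertion, params) tuples found in the text."""
    tests = []
    for match in WIKI_PATTERN.finditer(text):
        assertion, *params = [part.strip() for part in match.group(1).split("|")]
        tests.append((assertion, params))
    return tests


def has_label(result, expected_label):
    """True when the result's primary label (id.label) equals the expected label."""
    return bool(result) and result["id"].get("label") == expected_label
```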