fix: don't encode ':' or '/' as part of the canonical representation#161
fix: don't encode ':' or '/' as part of the canonical representation#161dwalluck wants to merge 5 commits intopackage-url:masterfrom
Conversation
|
Note that this matches the current test suite data for But https://github.com/package-url/purl-spec/blob/8040ff0be50f0c5b1986b1a0947bd539f5405fc4/test-suite-data.json#L88-L89 |
|
The line 112-113 where you say the |
|
Let me run the full test suite again without the |
|
OK, the first line has is for |
|
Checking the latest test suite file, if you don't encode the colon, if fails. The file seems to be inconsistent. It's encoded in one spot and not in the other. |
|
I was talking about the one you mentioned above - there doesn't seem to be any encoding. "purl": "pkg:Maven/org.apache.xmlgraphics/batik-anim@1.9.1?classifier=sources&repositorY_url=repo.spring.io/release",
"canonical_purl": "pkg:maven/org.apache.xmlgraphics/batik-anim@1.9.1?classifier=sources&repository_url=repo.spring.io/release", |
|
There is no encoding in the input. However, the output without the patch is |
|
And I found the inconsistency in the test file: the colon is encoded when it's in the version (sha256%3A). It's not encoded when it's in a key value (repository_url=https:). This is not right, is it? I am trying to verify if the problem is the code or the test file. |
Sorry, I flipped the examples! L112-113 is for |
There was a problem hiding this comment.
There should be rather a different set for each PURL component:
namespacedoes not need to encode neither:nor/.namedoes not need to encode:, but it needs to encode/, which is used to separate it fromnamespace.qualifiers, probably does not need to encode:,/or?.subpathcan probably add&and=to the mix.
There was a problem hiding this comment.
I will add a param to the percentEncode to take a String of characters to allow.
There was a problem hiding this comment.
Really, there are a lot more characters potentially allowed by the RFC than we list here.
I am of the opinion right now to add just enough in order to pass the test suite.
The '?' is not allowed unencoded per the purl spec itself.
|
So, with 8ab171f, this now fixes #122, but not #92. The test suite has Should we just change the test suite back to how to was and assume that there is a mistake in the tests and that the current |
|
The exact definition of Maybe this PR is precocious and we should wait for the spec to be finalized. |
|
The author of the Rust version has proposed the set of characters to use https://github.com/phylum-dev/purl/blob/151168733f75a9802556e4b07eb577b9d99f7cea/purl/src/format.rs#L9-L27 I could take them from here. |
That code is based on the WHATWG URL standard, while, unless I am mistaken, PURL will be based on RFC 3986. The living standard seems more liberal, so I would base the code on the more conservative RFC 3986. In particular:
|
|
I will revisit this later. It may need to be combined with #174. |
e42515b to
447ba2a
Compare
|
Now that Spotless is there, there will be some conflicts. This should help: |
OK, it does apply on build, but if you have to fix the conflicts first, then it doesn't really help. |
Now you just need to run |
7803217 to
b6482d2
Compare
You can:
|
This makes the Java canonical representation match the majority of other implementations. Fixes package-url#122 Fixes package-url#92
|
This is blocked by package-url/purl-spec#438 |
|
While PR package-url/purl-spec#461 haven't been merged yet, it is only receiving changes in the wording, not the rules. I think it is pretty safe to assume that the following
So I think the only change we need to apply is to add |
|
@ppkarwasz I agree - we need to add |
This makes the Java canonical representation match the majority of other implementations.
Fixes #122
Fixes #92