request JSON-LD from Link rel=alternate#129
request JSON-LD from Link rel=alternate#129alpha-beta-soup wants to merge 8 commits intodigitalbazaar:masterfrom
Conversation
|
This PR does not fixes the following test: because https://schema.org gives (Pdb) p headers["Accept"]
'application/ld+json;profile=http://www.w3.org/ns/json-ld#context, application/ld+json, application/json;q=0.5, text/html;q=0.8, application/xhtml+xml;q=0.8'it's here due to 6573 # FIXME: only if html5lib loaded?
6574 headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8'According to json-ld:
So even if we accept HTML, if it's HTML and there's a json-ld alternative, let's use it. To do this, I'll move In other words (and with a correct indentation) I mean: --- a/lib/pyld/documentloader/requests.py
+++ b/lib/pyld/documentloader/requests.py
@@ -69,7 +69,6 @@ def requests_document_loader(secure=False, **kwargs):
'contentType': content_type,
'contextUrl': None,
'documentUrl': response.url,
- 'document': response.json() if content_type in headers['Accept'] else None
}
link_header = response.headers.get('link')
if link_header:
@@ -77,15 +76,15 @@ def requests_document_loader(secure=False, **kwargs):
LINK_HEADER_REL)
# only 1 related link header permitted
if linked_context and content_type != 'application/ld+json':
- if isinstance(linked_context, list):
- raise JsonLdError(
- 'URL could not be dereferenced, '
- 'it has more than one '
- 'associated HTTP Link Header.',
- 'jsonld.LoadDocumentError',
- {'url': url},
- code='multiple context link headers')
- doc['contextUrl'] = linked_context['target']
+ if isinstance(linked_context, list):
+ raise JsonLdError(
+ 'URL could not be dereferenced, '
+ 'it has more than one '
+ 'associated HTTP Link Header.',
+ 'jsonld.LoadDocumentError',
+ {'url': url},
+ code='multiple context link headers')
+ doc['contextUrl'] = linked_context['target']
linked_alternate = parse_link_header(link_header).get('alternate')
# if not JSON-LD, alternate may point there
if (linked_alternate and
@@ -93,9 +92,8 @@ def requests_document_loader(secure=False, **kwargs):
not re.match(r'^application\/(\w*\+)?json$', content_type)):
doc['contentType'] = 'application/ld+json'
doc['documentUrl'] = prepend_base(url, linked_alternate['target'])
- if content_type not in headers['Accept']:
- # Original was not JSON/JSON-LD; fetch linked JSON-LD
- return loader(doc['documentUrl'], options=options)
+ return loader(doc['documentUrl'], options=options)
+ doc['document'] = response.json()
return doc
except JsonLdError as e:
raise e |
|
Hmm, that's a bit odd. Yes, Pyld adds if content_type not in headers['Accept']:
# Original was not JSON/JSON-LD; fetch linked JSON-LD
return loader(doc['documentUrl'], options=options)is important, it checks whether the server responded with |
Totally agree. But looks like https://schema.org/ does not have a Which looks legit to me, even though I still didn't read and understood rfc8288 entierly.
IIRC It looks already covered by the:
check a few line before. |
|
Ah I see. Sorry it's been a while since I've looked at this. I think your version of the PR makes sense, and as long as both our tests pass, it gets my vote. |
I'd like to add both tests to the test suite, but I don't understand how to do so. If someone can explain before merging I'd gladly do so. |
59c2185 to
afd80f9
Compare
|
@anatoly-scherbakov first a github noob question, but how ddid you link iss-128 to this PR? |
|
@mielvds thanks for review!
|
|
Thanks for resurrecting this, let me know if you need anything from me. |
|
@alpha-beta-soup thank you for your contribution! I'd welcome it if you would look through the additional changes I've done to see if they are what you had intended. |
alpha-beta-soup
left a comment
There was a problem hiding this comment.
This is exactly my intention, has my approval subject to tests passsing without regressions
Co-authored-by: Anatoly Scherbakov <altaisoft@gmail.com>
Test that both requests and aiohttp document loaders can expand a JSON-LD document whose context is https://schema.org, which serves text/html with a Link rel="alternate" header pointing to the actual JSON-LD context. Ref digitalbazaar#128 Co-authored-by: Cursor <cursoragent@cursor.com>
Defer response.json() until after Link header processing so that non-JSON responses (e.g. text/html from schema.org) don't crash the loader. When a Link rel="alternate" of type application/ld+json is found and the response isn't already JSON, follow it unconditionally. Also fixes 2-space indentation to 4-space. Fixes digitalbazaar#128 Co-authored-by: Cursor <cursoragent@cursor.com>
Same fix as the requests loader: defer response.json() until after Link header processing, and actually fetch the alternate URL when found (previously it only updated doc['documentUrl'] without making a second request). Also fixes 2-space indentation to 4-space. Fixes digitalbazaar#128 Co-authored-by: Cursor <cursoragent@cursor.com>
mielvds
left a comment
There was a problem hiding this comment.
Looks good! The tests that were failing before now pass, so the schema.org context URL seems to work now. I've reverted them (and figured out how to push to this branch ;)).
I don't like the test_schema_org.py file though (probably an AI artefact). Those test cases should be in something close to
- a
test_aiohttp.pyandtest_requests.py, - or simpler: test_loader.py with classes
TestRequestsandTestAiohttp
The tests themselves can use schema.org, but should be named after the functionality they are testing, so something like test_remote_context_without_jsonld_mimetype
|
@anatoly-scherbakov I just merged #170 introducing |
Ref #128
With the following test cases defined:
context.jsonldmanifest.jsonldsample.jsonldsample2.jsonldoutput.jsonldThe tests both fail before the changes. The tests both passs after the changes. Since all the test cases of this repository are remote, I am not sure whether or where to contribute these test cases.
However, there is one regression, which I do not currently know how to resolve. This test does not have an error before the changes, and does have an error after the changes. The report is:
All other tests are unaffected. However, as mentioned in #128, running the tests as described with no changes elicits 5 failures, 2 errors, and 34 skipped tests.