request JSON-LD from Link rel=alternate by alpha-beta-soup · Pull Request #129 · digitalbazaar/pyld

alpha-beta-soup · 2020-06-24T02:57:40Z

With the following test cases defined:

context.jsonld

{
  "@context": {
    "@vocab":   "https://w3c.github.io/json-ld-api/tests/vocab#",
    "dcterms":       "http://purl.org/dc/terms/",
    "jld":      "https://w3c.github.io/json-ld-api/tests/vocab#",
    "mf":       "http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#",
    "rdfs":     "http://www.w3.org/2000/01/rdf-schema#",
    "xsd":      "http://www.w3.org/2001/XMLSchema#",

    "context":         { "@type": "@id" },
    "expect":          { "@id": "mf:result", "@type": "@id" },
    "expectErrorCode": { "@id": "mf:result" },
    "frame":           { "@type": "@id" },
    "input":           { "@id": "mf:action", "@type": "@id" },
    "option":          { "@type": "@id"},
    "sequence":        { "@id": "mf:entries", "@type": "@id", "@container": "@list" },
    "redirectTo":      { "@type": "@id"},

    "name":                 "mf:name",
    "purpose":              "rdfs:comment",
    "description":          "rdfs:comment",
    "base":                 { "@type": "@id" },
    "compactArrays":        { "@type": "xsd:boolean" },
    "compactToRelative":    { "@type": "xsd:boolean" },
    "contentType":          { "@type": "xsd:string" },
    "expandContext":        { "@type": "@id" },
    "extractAllScripts":    { "@type": "xsd:boolean" },
    "httpLink":             { "@type": "xsd:string", "@container": "@set" },
    "httpStatus":           { "@type": "xsd:integer" },
    "normative":            { "@type": "xsd:boolean" },
    "processingMode":       { "@type": "xsd:string" },
    "processorFeature":     { "@type": "xsd:string" },
    "produceGeneralizedRdf":{ "@type": "xsd:boolean" },
    "specVersion":          { "@type": "xsd:string" },
    "useNativeTypes":       { "@type": "xsd:boolean" }
  }
}

manifest.jsonld

{
  "@context": ["context.jsonld", {"@base": "manifest"}],
  "@id": "",
  "@type": "mf:Manifest",
  "name": "JSON-LD Test Suite",
  "description": "This manifest loads some tests for resolving https://github.com/digitalbazaar/pyld/issues/128",
  "sequence": [{
    "@id": "#t1",
    "@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
    "name": "Test for JSON-LD via Link header",
    "purpose": "Tests for correct retrieval of remote JSON-LD when it is present as a Link HTTP header",
    "input": "/full/path/to/pyld/tests/sample.jsonld",
    "expect": "/full/path/to/pyld/tests/output.jsonld"
  }, {
    "@id": "#t2",
    "@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
    "name": "Test for JSON-LD via direct JSON-LD URL",
    "purpose": "Tests for correct retrieval of remote JSON-LD when it is given as a direct link within a context",
    "input": "/full/path/to/pyld/tests/sample2.jsonld",
    "expect": "/full/path/to/pyld/tests/output.jsonld"
  }]
}

sample.jsonld

{
	"@context": "https://schema.org",
	"@type":"Dataset",
	"@id":"http://localhost:5000/collections/obs",
	"url":"http://localhost:5000/collections/obs"
}

sample2.jsonld

{
	"@context": "https://schema.org/docs/jsonldcontext.jsonld",
	"@type":"Dataset",
	"@id":"http://localhost:5000/collections/obs",
	"url":"http://localhost:5000/collections/obs"
}

output.jsonld

[
  {
    "@id": "http://localhost:5000/collections/obs",
    "@type": [
      "http://schema.org/Dataset"
    ],
    "http://schema.org/url": [
      {
        "@id": "http://localhost:5000/collections/obs"
      }
    ]
  }
]

Both tests fail before the changes and pass after them. Since all the test cases in this repository are remote, I am not sure whether or where to contribute these test cases.

However, there is one regression that I do not currently know how to resolve: the following test runs without error before the changes and raises an error after them. The report is:

======================================================================
ERROR: Remote document: https://w3c.github.io/json-ld-api/tests/remote-doc-manifest#t0002: Document loader loads a JSON document.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/runtests.py", line 372, in runTest
    raise e
  File "tests/runtests.py", line 319, in runTest
    result = getattr(jsonld, fn)(*params)
  File "tests/../lib/pyld/jsonld.py", line 163, in expand
    return JsonLdProcessor().expand(input_, options)
  File "tests/../lib/pyld/jsonld.py", line 820, in expand
    remote_doc = load_document(input_, options)
  File "tests/../lib/pyld/jsonld.py", line 6591, in load_document
    code='loading document failed')
pyld.jsonld.JsonLdError: ('No remote document found at the given URL.',)
Type: jsonld.NullRemoteDocument
Code: loading document failed

All other tests are unaffected. However, as mentioned in #128, running the tests as described with no changes elicits 5 failures, 2 errors, and 34 skipped tests.

JulienPalard · 2020-10-09T09:36:39Z

This PR does not fix the following test:

from pyld import jsonld

jsonld.expand(
    {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Professor",
        "telephone": "(425) 123-4567",
        "url": "http://www.janedoe.com",
    }
)

because https://schema.org gives Content-Type: text/html, which is in headers["Accept"]:

(Pdb) p headers["Accept"]
'application/ld+json;profile=http://www.w3.org/ns/json-ld#context, application/ld+json, application/json;q=0.5, text/html;q=0.8, application/xhtml+xml;q=0.8'

It is there because pyld/jsonld.py adds it (lines 6573 and 6574):

# FIXME: only if html5lib loaded?
headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8'
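To illustrate why the `content_type not in headers['Accept']` guard never fires for schema.org: `in` on a Python string is a substring test, so `text/html` is found inside the Accept value. The following is a minimal sketch using the Accept value printed in the pdb session; the variable names are illustrative and not taken from pyld.

```python
# Accept header value as printed in the pdb session above.
accept = (
    "application/ld+json;profile=http://www.w3.org/ns/json-ld#context, "
    "application/ld+json, application/json;q=0.5, "
    "text/html;q=0.8, application/xhtml+xml;q=0.8"
)

# Content type that schema.org returns for its landing page.
content_type = "text/html"

# `in` on strings is a substring test, so text/html counts as "accepted".
html_is_accepted = content_type in accept

# The guard that should trigger the Link rel=alternate fetch is therefore False.
would_follow_alternate = content_type not in accept

print(html_is_accepted, would_follow_alternate)
```

With that guard in place, the loader keeps the HTML response and never requests the linked JSON-LD document.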

According to json-ld:

A processor seeing a non-JSON result will note the presence of the link header and load that document instead.

So even if we accept HTML, if it's HTML and there's a json-ld alternative, let's use it.

To do this, I would move doc['document'] = response.json() to just before the return, and drop the if content_type not in headers['Accept']: check you added. It works for me.

In other words (and with correct indentation), I mean:

--- a/lib/pyld/documentloader/requests.py
+++ b/lib/pyld/documentloader/requests.py
@@ -69,7 +69,6 @@ def requests_document_loader(secure=False, **kwargs):
                 'contentType': content_type,
                 'contextUrl': None,
                 'documentUrl': response.url,
-                'document': response.json() if content_type in headers['Accept'] else None
             }
             link_header = response.headers.get('link')
             if link_header:
@@ -77,15 +76,15 @@ def requests_document_loader(secure=False, **kwargs):
                     LINK_HEADER_REL)
                 # only 1 related link header permitted
                 if linked_context and content_type != 'application/ld+json':
-                  if isinstance(linked_context, list):
-                      raise JsonLdError(
-                          'URL could not be dereferenced, '
-                          'it has more than one '
-                          'associated HTTP Link Header.',
-                          'jsonld.LoadDocumentError',
-                          {'url': url},
-                          code='multiple context link headers')
-                  doc['contextUrl'] = linked_context['target']
+                    if isinstance(linked_context, list):
+                        raise JsonLdError(
+                            'URL could not be dereferenced, '
+                            'it has more than one '
+                            'associated HTTP Link Header.',
+                            'jsonld.LoadDocumentError',
+                            {'url': url},
+                            code='multiple context link headers')
+                    doc['contextUrl'] = linked_context['target']
                 linked_alternate = parse_link_header(link_header).get('alternate')
                 # if not JSON-LD, alternate may point there
                 if (linked_alternate and
@@ -93,9 +92,8 @@ def requests_document_loader(secure=False, **kwargs):
                         not re.match(r'^application\/(\w*\+)?json$', content_type)):
                     doc['contentType'] = 'application/ld+json'
                     doc['documentUrl'] = prepend_base(url, linked_alternate['target'])
-                    if content_type not in headers['Accept']:
-                        # Original was not JSON/JSON-LD; fetch linked JSON-LD
-                        return loader(doc['documentUrl'], options=options)
+                    return loader(doc['documentUrl'], options=options)
+            doc['document'] = response.json()
             return doc
         except JsonLdError as e:
             raise e

alpha-beta-soup · 2020-10-11T02:16:15Z

Hmm, that's a bit odd. Yes, PyLD adds text/html (in the line you pointed out, i.e. headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8'), but that's a default header value and it shouldn't affect the server response as long as application/ld+json is included in the Accept header value with a higher precedence. That check

if content_type not in headers['Accept']:
    # Original was not JSON/JSON-LD; fetch linked JSON-LD
    return loader(doc['documentUrl'], options=options)

is important: it checks whether the server responded with application/ld+json and, if not, attempts to fetch the linked resource. There's no value in doing this if the response is already JSON-LD.

JulienPalard · 2020-10-11T07:58:23Z

it shouldn't affect the server response as long as application/ld+json is included in the Accept header value with a higher precedence.

Totally agree. But it looks like https://schema.org/ does not have an application/ld+json variant at all, so it always replies with text/html, independently of the Accept header. This text/html response does link to the ld+json, though, see:

$ curl -I -H "Accept: application/ld+json" https://schema.org
HTTP/2 200 
[...]
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
[...]
content-type: text/html
[...]

Which looks legit to me, even though I still haven't read and understood RFC 8288 entirely.
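As a rough illustration of how the link header in the curl output above maps to a fetchable URL, here is a minimal sketch. `find_jsonld_alternate` is a hypothetical helper written for this comment, not pyld's `parse_link_header`, and it only handles simple comma-separated Link values like the one shown.

```python
import re
from urllib.parse import urljoin


def find_jsonld_alternate(link_header, base_url):
    """Return the absolute URL of a rel="alternate" JSON-LD link, or None.

    Hypothetical helper for illustration only; splitting on commas is
    naive and would break on targets that themselves contain commas.
    """
    for part in link_header.split(","):
        target = re.match(r'\s*<([^>]*)>', part)
        if not target:
            continue
        # Collect the key="value" parameters that follow the <target>.
        params = dict(re.findall(r';\s*([\w-]+)\s*=\s*"?([^";]*)"?', part))
        if (params.get("rel") == "alternate"
                and params.get("type") == "application/ld+json"):
            # Resolve a relative target against the URL that was requested.
            return urljoin(base_url, target.group(1))
    return None


link = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_alternate(link, "https://schema.org"))
# prints https://schema.org/docs/jsonldcontext.jsonld
```

The resolved URL is the same one used directly as the context in sample2.jsonld.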

That check
if content_type not in headers['Accept']:
    # Original was not JSON/JSON-LD; fetch linked JSON-LD
    return loader(doc['documentUrl'], options=options)
is important: it checks whether the server responded with application/ld+json and, if not, attempts to fetch the linked resource. There's no value in doing this if the response is already JSON-LD.

IIRC, this already looks covered by the

not re.match(r'^application\/(\w*\+)?json$', content_type)):

check a few lines before.
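For reference, a small sketch of what that pattern accepts and rejects. The `JSON_TYPE` name is illustrative; only the regular expression itself comes from the diff above.

```python
import re

# Pattern from the diff above that recognises JSON and JSON-LD content types.
JSON_TYPE = re.compile(r'^application\/(\w*\+)?json$')

samples = (
    "application/json",
    "application/ld+json",
    "text/html",
    "application/ld+json; charset=utf-8",
)
for content_type in samples:
    # bool(...) turns the match object (or None) into True/False.
    print(content_type, bool(JSON_TYPE.match(content_type)))
```

The pattern is anchored at both ends, so a content type that still carries parameters such as a charset does not match; any parameters would need to be stripped before the check.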

alpha-beta-soup · 2020-10-11T08:27:40Z

Ah I see. Sorry it's been a while since I've looked at this. I think your version of the PR makes sense, and as long as both our tests pass, it gets my vote.

JulienPalard · 2020-10-11T10:11:39Z

and as long as both our tests pass, it gets my vote.

I'd like to add both tests to the test suite, but I don't understand how to do so. If someone can explain before merging I'd gladly do so.

lib/pyld/documentloader/requests.py

mielvds · 2026-02-05T11:18:12Z

@anatoly-scherbakov first, a GitHub noob question: how did you link iss-128 to this PR?
Second: I re-enabled the https://schema.org/ context in a test in the iss-128 branch because of this fix. However, it seems I'm still getting pyld.jsonld.JsonLdError: ('Could not retrieve a JSON-LD document from the URL.',)

anatoly-scherbakov · 2026-02-06T16:25:59Z

@mielvds thanks for the review!

The PR was already present; I was just able to push to it. I think I have that ability because I am a contributor to digitalbazaar/pyld. But the exact mechanics were handled by an LLM agent helping me, so it's going to be difficult to resurrect the necessary commands 😅
I am adding tests to verify this. It seems that .json() was called too early.

alpha-beta-soup · 2026-02-08T20:56:24Z

Thanks for resurrecting this, let me know if you need anything from me.

anatoly-scherbakov · 2026-02-09T09:35:11Z

@alpha-beta-soup thank you for your contribution! I'd welcome it if you would look through the additional changes I've done to see if they are what you had intended.

alpha-beta-soup

This is exactly my intention. It has my approval, subject to tests passing without regressions.

Co-authored-by: Anatoly Scherbakov <altaisoft@gmail.com>

Test that both requests and aiohttp document loaders can expand a JSON-LD document whose context is https://schema.org, which serves text/html with a Link rel="alternate" header pointing to the actual JSON-LD context.
Ref digitalbazaar#128
Co-authored-by: Cursor <cursoragent@cursor.com>

Defer response.json() until after Link header processing so that non-JSON responses (e.g. text/html from schema.org) don't crash the loader. When a Link rel="alternate" of type application/ld+json is found and the response isn't already JSON, follow it unconditionally. Also fixes 2-space indentation to 4-space.
Fixes digitalbazaar#128
Co-authored-by: Cursor <cursoragent@cursor.com>

Same fix as the requests loader: defer response.json() until after Link header processing, and actually fetch the alternate URL when found (previously it only updated doc['documentUrl'] without making a second request). Also fixes 2-space indentation to 4-space.
Fixes digitalbazaar#128
Co-authored-by: Cursor <cursoragent@cursor.com>
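The control flow these commit messages describe can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the PR: `load` and `fetch` are hypothetical names, `fetch` stands in for both the HTTP client and the Link header parser, and the canned responses merely imitate schema.org without touching the network.

```python
import json
import re
from urllib.parse import urljoin

JSON_TYPE = re.compile(r'^application/(\w*\+)?json$')


def load(url, fetch):
    """Load a document, following a JSON-LD rel="alternate" link if needed.

    `fetch` is a hypothetical callable returning a tuple of
    (content_type, alternate_target_or_None, body_text) for a URL.
    """
    content_type, alternate_target, body = fetch(url)
    if alternate_target and not JSON_TYPE.match(content_type):
        # Non-JSON response advertising a JSON-LD alternate: follow it
        # without ever trying to parse the HTML body as JSON.
        return load(urljoin(url, alternate_target), fetch)
    # Only at this point is the body parsed as JSON.
    return {"documentUrl": url, "document": json.loads(body)}


# Canned responses imitating schema.org (assumed shapes, no network access).
RESPONSES = {
    "https://schema.org": (
        "text/html", "/docs/jsonldcontext.jsonld", "<html></html>"),
    "https://schema.org/docs/jsonldcontext.jsonld": (
        "application/ld+json", None, '{"@context": {}}'),
}

doc = load("https://schema.org", RESPONSES.__getitem__)
print(doc["documentUrl"])
```

Parsing the body before inspecting the Link header would raise on the HTML response, which matches the observation earlier in the thread that .json() was called too early.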

mielvds

Looks good! The tests that were failing before now pass, so the schema.org context URL seems to work now. I've reverted them (and figured out how to push to this branch ;)).

I don't like the test_schema_org.py file though (probably an AI artefact). Those test cases should be in something close to

a test_aiohttp.py and test_requests.py,
or simpler: test_loader.py with classes TestRequests and TestAiohttp

The tests themselves can use schema.org, but should be named after the functionality they are testing, so something like test_remote_context_without_jsonld_mimetype

mielvds · 2026-02-11T09:05:30Z

@anatoly-scherbakov I just merged #170 introducing tests/test_document_loader.py. I'd rather have you merge these tests in there.

datadavev mentioned this pull request Aug 3, 2020

Issue 128 #135

Closed

datadavev added a commit to datadavev/pyld that referenced this pull request Aug 3, 2020

Rolling back recursion count, correcting issues in PR digitalbazaar#129

30668b3

JulienPalard mentioned this pull request Oct 9, 2020

Always follow link header if content-type is not json. #141

Closed

alpha-beta-soup mentioned this pull request Nov 24, 2020

google expects plain schema-org url geopython/pygeoapi#576

Merged

pvgenuchten mentioned this pull request Dec 1, 2020

pyld requires different schema-org ld-context then search engine crawlers geopython/pygeoapi#583

Closed

ianco mentioned this pull request May 19, 2021

Error loading json-ld content from https://schema.org #154

Open

anatoly-scherbakov reviewed Oct 18, 2023

anatoly-scherbakov force-pushed the iss-128 branch from 59c2185 to afd80f9 Compare January 21, 2026 13:52

mielvds modified the milestones: v3.0.0-alpha (Testing strategy), v3.0.0 (JSON-LD 1.1) Feb 3, 2026

anatoly-scherbakov requested review from BigBlueHat and mielvds February 4, 2026 16:12

alpha-beta-soup commented Feb 9, 2026

alpha-beta-soup and others added 6 commits February 10, 2026 09:06

request JSON-LD from Link rel=alternate

df98896

Update lib/pyld/documentloader/requests.py

97d492a

Co-authored-by: Anatoly Scherbakov <altaisoft@gmail.com>

Strip Content Type parameters before matching Content Type

7c29af8

mielvds force-pushed the iss-128 branch from 4248d43 to 8820321 Compare February 10, 2026 08:07

mielvds added 2 commits February 10, 2026 09:13

Revert schema.org context in compaction test

1545861

Add network marks to date compact tests

3b59daa

mielvds requested changes Feb 10, 2026

alpha-beta-soup wants to merge 8 commits into digitalbazaar:master from alpha-beta-soup:iss-128


4 participants
