fix: Extract ADLS account_name from URI hostname in FsspecFileIO #3005

Open
antonlin1 wants to merge 1 commit into apache:main from antonlin1:fix/adls-account-name-from-uri

antonlin1 commented Feb 6, 2026

Summary

  • When adls.account-name is missing from the catalog/table properties (common for tables created by Spark), FsspecFileIO created AzureBlobFileSystem with account_name=None
  • adlfs _strip_protocol() reduces abfss://container@account.dfs.core.windows.net/path to container/path, which drops the storage account and leads to FileNotFoundError
  • The fix extracts account_name from the URI hostname as a last-resort fallback in _adls(), used only after the explicit property check and SAS token extraction
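To illustrate what is lost when the URI is reduced to container/path, here is a minimal sketch that splits an ABFSS URI with the standard library. The helper name split_abfss_uri is hypothetical and is not taken from PyIceberg or adlfs; it only shows that the account is recoverable from the hostname.

```python
from urllib.parse import urlparse


def split_abfss_uri(uri: str):
    """Split an ABFSS URI into (container, account, path).

    Hypothetical helper for illustration only; not the PR's code.
    """
    parsed = urlparse(uri)
    # In abfss://container@account.dfs.core.windows.net/path the container
    # sits in the "user" position, before the '@'.
    container = parsed.username
    # The storage account is the first label of the hostname.
    host = parsed.hostname
    account = host.split(".")[0] if host else None
    # Drop the leading slash so the path is relative to the container.
    path = parsed.path.lstrip("/")
    return container, account, path


container, account, path = split_abfss_uri(
    "abfss://container@account.dfs.core.windows.net/path"
)
# Keeping only container/path discards the account component.
```

Only the first hostname label is treated as the account here; other endpoint layouts would need their own handling.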

Priority order for account_name resolution:

  1. Explicit adls.account-name property
  2. SAS token key extraction (existing behavior)
  3. NEW: URI hostname extraction (e.g. usagestorageprod.dfs.core.windows.net → usagestorageprod)
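The priority order above can be sketched roughly as follows. This is not the PR's implementation: resolve_account_name is a hypothetical function, and the SAS token key format (adls.sas-token.<account>.<endpoint>.windows.net) is an assumption made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

ACCOUNT_NAME_KEY = "adls.account-name"
SAS_TOKEN_PREFIX = "adls.sas-token."  # assumed key prefix, for illustration


def resolve_account_name(properties: dict, uri: str) -> Optional[str]:
    """Resolve the storage account name using the three-step priority order."""
    # 1. An explicit property always wins.
    explicit = properties.get(ACCOUNT_NAME_KEY)
    if explicit:
        return explicit

    # 2. Otherwise take the account from a SAS token key, assumed to look like
    #    adls.sas-token.<account>.dfs.core.windows.net
    for key in properties:
        if key.startswith(SAS_TOKEN_PREFIX) and key.endswith(".windows.net"):
            return key[len(SAS_TOKEN_PREFIX):].split(".")[0]

    # 3. Last resort: the first label of the URI hostname.
    host = urlparse(uri).hostname
    if host and host.endswith(".windows.net"):
        return host.split(".")[0]

    return None
```

With no properties set, a URI on usagestorageprod.dfs.core.windows.net resolves to usagestorageprod; once adls.account-name is present, the hostname is never consulted.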

Test plan

  • test_adls_account_name_extracted_from_uri_hostname — verifies account extraction from full ABFSS URI
  • test_adls_account_name_not_overridden_when_in_properties — verifies explicit property takes priority
  • Existing test_adls_account_name_sas_token_extraction still passes (SAS token takes priority over hostname)

🤖 Generated with Claude Code

antonlin1 force-pushed the fix/adls-account-name-from-uri branch from 8820970 to 1b6e8b4 on February 6, 2026 10:15
When tables created by Spark/Hadoop store fully-qualified ABFSS URIs
(e.g. abfss://container@account.dfs.core.windows.net/path) but catalog
properties don't include adls.account-name, FsspecFileIO would create
AzureBlobFileSystem with account_name=None. adlfs then strips the URI
to container/path via _strip_protocol(), losing the storage account,
resulting in FileNotFoundError.

The fix extracts account_name from the URI hostname as a fallback in
_adls(), after SAS token extraction and explicit property checks, so
existing configuration always takes priority.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
antonlin1 force-pushed the fix/adls-account-name-from-uri branch from 1b6e8b4 to fe8232a on February 6, 2026 12:25

Fokko (Contributor) left a comment

Looks reasonable, thanks @antonlin1 for adding this 👍
