Skip to content

FEAT: Pass all connection string params to mssql-py-core for bulk copy#439

Open
bewithgaurav wants to merge 21 commits intomainfrom
bewithgaurav/bulkcopy-connstr-to-pycore
Open

FEAT: Pass all connection string params to mssql-py-core for bulk copy#439
bewithgaurav wants to merge 21 commits intomainfrom
bewithgaurav/bulkcopy-connstr-to-pycore

Conversation

@bewithgaurav
Copy link
Collaborator

@bewithgaurav bewithgaurav commented Feb 18, 2026

Work Item / Issue Reference

AB#42663

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request refactors and simplifies how ODBC connection string parameters are translated for use with the bulk copy feature in mssql_python. The key change is the introduction of a new helper function, connstr_to_pycore_params, which centralizes and standardizes the parameter mapping logic. This reduces code duplication, improves maintainability, and ensures sensitive data is handled more securely.

Bulk copy connection context refactor:

  • Added a new helper function connstr_to_pycore_params in mssql_python/helpers.py to translate ODBC connection string parameters into the format expected by the py-core bulk copy path, including type conversion and key normalization.
  • Updated mssql_python/cursor.py to use connstr_to_pycore_params for building the pycore_context dictionary, replacing the previous inline mapping and logic. [1] [2]

Security and code quality improvements:

  • Simplified sensitive data cleanup in the bulk copy context by iterating over a tuple of sensitive keys instead of repeating pop calls.
  • Removed redundant manual extraction and assignment of authentication parameters in favor of the new helper function, reducing the risk of accidentally logging or exposing credentials.

Documentation and maintainability:

  • Added detailed documentation to the new helper function explaining its purpose, supported keys, and conversion logic.
  • Added a clarifying comment in cursor.py indicating the mapping from ODBC connection-string keywords.

- Add extract_auth_type() fallback for Windows Interactive where
  process_connection_string returns auth_type=None (DDBC handles
  auth natively but bulkcopy needs it for Azure Identity)
- Preserve auth_type in fallthrough return so bulkcopy can attempt
  fresh token acquisition even when connect-time token fails
- Fix get_raw_token docstring to match actual credential behavior
- Use 'is None' instead of '== None' in test assertion
Raises ValueError with supported types instead of KeyError when an
unsupported auth_type is passed to _acquire_token.
…llthrough return, clean up extract_auth_type parsing
… bewithgaurav/bulkcopy-accesstoken-support
Drop app, workstationid, language, connect_timeout, mars_connection
from key_map — the parser rejects these keywords so they never reach
connstr_to_pycore_params. Also clean up bool_keys/int_keys to match.
@github-actions github-actions bot added the pr-size: medium Moderate update size label Feb 18, 2026
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/bulkcopy-connstr-to-pycore branch from 048ec5d to 4953f9c Compare February 18, 2026 11:58
@github-actions
Copy link

github-actions bot commented Feb 18, 2026

📊 Code Coverage Report

🔥 Diff Coverage

20%


🎯 Overall Coverage

76%


📈 Total Lines Covered: 5502 out of 7178
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

  • mssql_python/connection_string_parser.py (100%)
  • mssql_python/cursor.py (100%)
  • mssql_python/helpers.py (5.9%): Missing lines 275,307,315,317-320,324-325,331,333-336,339,341

Summary

  • Total: 20 lines
  • Missing: 16 lines
  • Coverage: 20%

mssql_python/helpers.py

Lines 271-279

  271     # mars_connection) are silently dropped here.  In the normal connect()
  272     # path the parser validates keywords first (validate_keywords=True),
  273     # but bulkcopy parses with validation off, so this mapping is the
  274     # authoritative filter in that path.
! 275     key_map = {
  276         # auth / credentials
  277         "uid": "user_name",
  278         "pwd": "password",
  279         "trusted_connection": "trusted_connection",

Lines 303-311

  303         "packet size": "packet_size",
  304         "connectretrycount": "connect_retry_count",
  305         "connectretryinterval": "connect_retry_interval",
  306     }
! 307     int_keys = {
  308         "packet_size",
  309         "connect_retry_count",
  310         "connect_retry_interval",
  311         "keep_alive",

Lines 311-329

  311         "keep_alive",
  312         "keep_alive_interval",
  313     }
  314 
! 315     pycore_params: dict = {}
  316 
! 317     for connstr_key, pycore_key in key_map.items():
! 318         raw_value = params.get(connstr_key)
! 319         if raw_value is None:
! 320             continue
  321 
  322         # First-wins: match ODBC behaviour — first synonym in the
  323         # connection string takes precedence (e.g. Addr before Server).
! 324         if pycore_key in pycore_params:
! 325             continue
  326 
  327         # ODBC values are always strings; py-core expects native types for int keys.
  328         # Boolean params (trust_server_certificate, multi_subnet_failover) are passed
  329         # as strings — all Yes/No validation is in connection.rs for single-location

Lines 327-345

  327         # ODBC values are always strings; py-core expects native types for int keys.
  328         # Boolean params (trust_server_certificate, multi_subnet_failover) are passed
  329         # as strings — all Yes/No validation is in connection.rs for single-location
  330         # consistency with Encrypt, ApplicationIntent, IPAddressPreference, etc.
! 331         if pycore_key in int_keys:
  332             # Numeric params (timeouts, packet size, etc.) — skip on bad input
! 333             try:
! 334                 pycore_params[pycore_key] = int(raw_value)
! 335             except (ValueError, TypeError):
! 336                 pass  # let py-core fall back to its compiled-in default
  337         else:
  338             # String params (server, database, encryption, etc.) — pass through
! 339             pycore_params[pycore_key] = raw_value
  340 
! 341     return pycore_params
  342 
  343 
  344 # Settings functionality moved here to avoid circular imports


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.pybind.ddbc_bindings.cpp: 69.3%
mssql_python.pybind.ddbc_bindings.h: 69.7%
mssql_python.pybind.connection.connection.cpp: 75.3%
mssql_python.helpers.py: 77.4%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.cursor.py: 84.7%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

# mars_connection) are intentionally excluded — the parser rejects them.
key_map = {
# auth / credentials
"uid": "user_name",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this map is being used to convert all the connection string params into pycore context
we can go ahead with using the conn string params as is inside pycore but that will require changing a lot of tests, hence added a map

# bulkcopy() is called long after the original connection.
pycore_context = dict(context)
# Translate parsed connection string into the dict py-core expects.
pycore_context = connstr_to_pycore_params(params)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have kept cursor.py to be just passing the params and not at all containing any "validation" logic since we'll take care of that in pycore

--arg patch_coverage_pct "${{ env.PATCH_COVERAGE_PCT }}" \
--arg low_coverage_files "${{ env.LOW_COVERAGE_FILES }}" \
--arg patch_coverage_summary "${{ env.PATCH_COVERAGE_SUMMARY }}" \
--arg low_coverage_files "$LOW_COVERAGE_FILES" \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor code coverage fix, this PR had some codeblocks which needed to be escaped, since they were getting into the bash terminal

@bewithgaurav
Copy link
Collaborator Author

code coverage isn't reflecting correct numbers since the tests are a part of mssql-py-core as of now
will get auto-fixed once we migrate them here

Use jq --rawfile instead of ${{ env.* }} expansion for
PATCH_COVERAGE_SUMMARY and LOW_COVERAGE_FILES to prevent
raw Python source in diff hunks from being interpreted as
shell commands.
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/bulkcopy-connstr-to-pycore branch from 4953f9c to 0b1941c Compare February 18, 2026 15:46
@bewithgaurav bewithgaurav marked this pull request as ready for review February 19, 2026 05:40
Copilot AI review requested due to automatic review settings February 19, 2026 05:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the bulk copy (“py-core”) connection context construction so that parsed ODBC connection-string parameters are centrally translated into the parameter format expected by mssql_py_core, reducing inline mapping logic in cursor.bulkcopy().

Changes:

  • Added connstr_to_pycore_params() helper to map/convert parsed connection-string params into py-core snake_case keys (with basic int coercion).
  • Updated Cursor._bulkcopy() to build pycore_context via the new helper and simplified sensitive-key cleanup.
  • Adjusted PR code coverage workflow to pass multiline env values into jq via shell variables.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mssql_python/helpers.py Introduces connstr_to_pycore_params() for bulkcopy py-core parameter translation.
mssql_python/cursor.py Uses the new helper to build py-core bulkcopy connection context and streamlines sensitive key cleanup.
.github/workflows/pr-code-coverage.yml Tweaks jq --arg inputs to use shell vars for better handling of multiline env values.
Comments suppressed due to low confidence (1)

mssql_python/cursor.py:2651

  • In the Azure AD token flow, pycore_context is built from connstr_to_pycore_params(params) before adding access_token, which means user_name/password from the original connection string may also be included (and retained until finally). Previously, SQL credentials were only added in the non-AAD path. Consider explicitly removing user_name/password when _auth_type is set (or when setting access_token) to minimize sensitive-data exposure and avoid ambiguous/dual auth inputs being passed to mssql-py-core.
        # Translate parsed connection string into the dict py-core expects.
        pycore_context = connstr_to_pycore_params(params)

        # Token acquisition — only thing cursor must handle (needs azure-identity SDK)
        if self.connection._auth_type:
            # Fresh token acquisition for mssql-py-core connection
            from mssql_python.auth import AADAuth

            try:
                raw_token = AADAuth.get_raw_token(self.connection._auth_type)
            except (RuntimeError, ValueError) as e:
                raise RuntimeError(
                    f"Bulk copy failed: unable to acquire Azure AD token "
                    f"for auth_type '{self.connection._auth_type}': {e}"
                ) from e
            pycore_context["access_token"] = raw_token

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

_normalize_params and connstr_to_pycore_params now skip a
canonical key that was already set by an earlier synonym.
Matches ODBC Driver 18 fFromAttrOrProp first-wins semantics
confirmed via live test against sqlcconn.cpp.

Add TestSynonymFirstWins (12 tests) covering server/addr/
address, trustservercertificate snake_case, and packetsize
with-space synonym groups.
@bewithgaurav bewithgaurav force-pushed the bewithgaurav/bulkcopy-connstr-to-pycore branch from db08e21 to dffadd6 Compare February 19, 2026 06:14
Copy link
Contributor

@sumitmsft sumitmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving this with one question. Although I know the answer but just want to confirm:

Previously bulkcopy effectively defaulted TrustServerCertificate to “yes” when not specified (it used params.get("trustservercertificate", "yes") ... and passed a boolean to py-core).
Now we only forward trust_server_certificate if the keyword is present, which means bulkcopy depends on mssql-py-core’s default when it’s absent.
Since ODBC 18.5 is “secure by default” (Encrypt on; TrustServerCertificate off), can you confirm:

  • What does mssql-py-core do when trust_server_certificate is not provided?
  • Does this match ODBC 18.5’s effective defaults and our non-bulkcopy connection behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments