-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat!: add exp.Trunc for numeric truncation #6923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add exp.Trunc expression class for numeric truncation, filling a gap alongside existing Round, Floor, and Ceil functions (in AST since 2021). Implementation: - New exp.Trunc class with _sql_names = ["TRUNC", "TRUNCATE"] - Oracle: Type-annotation-aware builder (DATE/TIMESTAMP/DATETIME -> DateTrunc, numeric -> Trunc) - Exasol: Updated to use exp.Trunc instead of Anonymous; fixed to handle DATETIME type and single-arg date TRUNC (defaults to 'DD' like Oracle) - MySQL: Generates TRUNCATE(...); TRUNC alias normalizes to TRUNCATE - T-SQL: Generates ROUND(x, n, 1) - Other dialects (PostgreSQL, Snowflake, DuckDB, BigQuery): Auto-supported via _sql_names - Date-only dialects (Hive, Spark) unchanged Tests: - Oracle: exp.Trunc identity and type assertion (vs DateTrunc), single-arg TRUNC - MySQL: TRUNCATE identity and TRUNC alias normalization - T-SQL: ROUND(x, n, 1) generation from other dialects' TRUNC - Exasol: exp.Trunc type assertion, DATETIME handling, single-arg date TRUNC Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
SQLite's TRUNC() only accepts one argument. This change: - Simplifies TRUNC(x, 0) to TRUNC(x) for SQLite - Warns when non-zero decimals are used (unsupported)
VaggelisD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @doripo, thanks for the contribution!
- Can we increase test coverage e.g in dialects like Spark to ensure that
TRUNC(foo)remainsexp.DateTrunc? - Snowflake's TRUNC is also overloaded, so we'd need to approach it similar to Oracle & Exasol. See this comment
- SQLite: Use @unsupported_args("decimals") for best-effort transpilation,
move trunc_sql inside Generator for auto-discovery, update tests
- Spark/Hive: Add tests to verify TRUNC remains DateTrunc/TimestampTrunc
Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- Add build_trunc to dialect.py for Oracle/Exasol/Snowflake - Uses type annotation to distinguish date vs numeric truncation - Returns Anonymous if type cannot be determined - Add Snowflake support for overloaded TRUNC/TRUNCATE - Add comprehensive Snowflake TRUNC tests Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
Add validate_all tests for: - Oracle: Date truncation (DAY, WEEK, MONTH, QUARTER, YEAR), Timestamp truncation (HOUR, MINUTE, SECOND, DAY, MONTH, YEAR) - Exasol: Date/time truncation with Exasol-specific units (YYYY, MM, DD, HH, MI, SS, WW, Q) - Cross-dialect: Oracle, Snowflake, Postgres, BigQuery, DuckDB, TSQL, Spark Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
VaggelisD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick & accurate iterations! Leaving a few more comments your way, should be close to merging soon (also, appreciate the a-b sorting of the imports as a cherry on top 😆)
add .assert_is(exp.Trunc) as chained function call Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
- Use TEMPORAL_TYPES instead of DATE/TIMESTAMP/DATETIME to correctly handle all temporal types (e.g. TIMESTAMPTZ, TIMESTAMPLTZ) - Group build_trunc conditions by return type (DateTrunc/Trunc/Anonymous) - Chain .assert_is() to validate_identity calls - Merge redundant validate_all tests - Add fallback tests for Anonymous case - Remove duplicate Spark TRUNC test Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- test_trunc_type_inference: tests build_trunc discrimination logic (shared across Oracle, Exasol, Snowflake) - 5 cases covering temporal+?, ?+string, numeric+?, ?+int, ?+? fallback - test_trunc: Oracle-specific identity and transpilation tests Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
|
@VaggelisD modified according to your suggestions, thanks, please also let me know if the trailing comment in the main PR description ("Out of scope / potential follow-up:") is appropriate or better remove it from the commit message. |
|
Thank you!
It is fine for future reference, thanks for documenting it; The PR has already grown so it's great if we can now limit its scope to what's already there |
VaggelisD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @doripo, I believe I have one final comment and a few nits left, really appreciate your cooperation. If anything else comes up after that we'll take it to the finish line by merging this PR and applying the fixes afterwards
I agree, we can definitely increase the test coverage by adding more TRUNC specific tests in oracle, snowflake & exasol files. We can add it as part of a separate PR since this is a big PR. Something like this for Oracle: Similarly we can do the same for snowflake (https://docs.snowflake.com/en/sql-reference/functions-date-time#label-supported-date-time-parts) Exasol: |
- Snowflake: date_trunc_requires_part=False (single-arg is numeric) - Oracle: default_date_trunc_unit="DD" (single-arg temporal uses DD) - Update tests for new behavior - Use validate_identity where SQL round-trips Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
- Use "truncation" (concept) instead of "TRUNC" (function name) - Consistent patterns: "Numeric truncation identity", "Date truncation with typed column and unit", "Cross-dialect numeric truncation transpilation" Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
Signed-off-by: Dori Polotsky <doripo@riverpool.ai>
|
The PR grew to more dialects so I attempted to aggregate their docs for our reference during reviewing:
|
georgesittas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work
VaggelisD
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final comments from me too, thank you!
| # Fallback to Anonymous (Exasol requires unit for date truncation) | ||
| self.validate_identity("TRUNC(CAST(x AS DATE))").assert_is(exp.Anonymous) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Exasol docs unit is optional so we could infer that this is date truncation here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into Exasol's TRUNC default behavior. I can't say that the docs are 100% explicit, but:
- TRUNC/TRUNCATE docs say format is optional and "if a format is not specified, the value is truncated to days"
- DATE_TRUNC is described as "PostgreSQL compatible"
- The ROUND function docs reference "days from noon" as the rounding boundary, implying day-level granularity
This suggests DD (day truncation to midnight) matches Oracle's default, though not stated verbatim. Are you comfortable adding default_date_trunc_unit="DD" for Exasol, or would you prefer keeping the Anonymous fallback until we have firmer confirmation?
I have the fix ready (4 changed lines) just wanted to double check.
(continuing 77830a3#r2759813244)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: opened new issue #6959 for DD / DAY unit transpilation
|
@VaggelisD that's a very good reference! I believe that these minor corrections to your table apply:
Based on my understanding. the current gaps are therefore quite small (about 20 lines of code, not including the tests):
I can address them either now or in a follow-up PR, as they are incremental improvements. |
|
Let's get any minor fixes in and then improve the parsing / transpilation in a follow-up to avoid increasing the scope more. |
|
Or, for Spark/Hive probably better to just cast to BIGINT with unsupported_args("decimals") - the 'smart' workaround is somewhat weird.. If I do that now, then all the code fixes are really minor - and just a few more tests. @georgesittas does that seem acceptable at this stage of the PR? |
|
Let's do it in a follow-up PR @doripo, we can apply any last one-liners here. |
fivetran-amrutabhimsenayachit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Style fixups Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
Add exp.Trunc expression class for numeric truncation, filling a gap alongside existing Round, Floor, and Ceil functions (in AST since 2021).
Implementation:
Tests:
Out of scope / potential follow-up: