advanced_dates #11

Merged
dp merged 4 commits from advanced_dates into main 2026-04-23 22:33:18 +00:00

4 Commits

Author SHA1 Message Date
David Peterson
c3d1f72556 Add null string sentinel handling in load_sas.py for improved missing value detection
Introduced a frozenset of string literals that represent SQL NULL values, used by type inference and nullability detection. Added helper functions that identify null strings and unify the missing-value check for CHAR/TEXT columns. Updated _null_sentinel_mask to replace these sentinel values with None, so missing data is handled the same way across data types during loading.
2026-04-22 19:20:07 -05:00
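
To illustrate the sentinel handling described in this commit, here is a minimal sketch. The literals in NULL_STRINGS and the names is_null_string, is_missing, and replace_null_sentinels are hypothetical; the actual set and helpers in load_sas.py are not shown on this page and may differ.

```python
# Hypothetical sentinel set: the real literals in load_sas.py are not shown here.
NULL_STRINGS = frozenset({"", "null", "none", "nan", "n/a", "na", "."})


def is_null_string(value):
    """Return True if value is a string that spells a NULL sentinel."""
    return isinstance(value, str) and value.strip().lower() in NULL_STRINGS


def is_missing(value):
    """Single missing-value check covering None, float NaN, and null strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return is_null_string(value)


def replace_null_sentinels(values):
    """Return a new list with every missing value replaced by None."""
    return [None if is_missing(v) else v for v in values]
```

Using a frozenset keeps the membership test constant-time and makes the sentinel list immutable. Note that a string such as "0" is not treated as missing in this sketch.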
David Peterson
998a3e282f Revert "Optimize datetime parsing in load_sas.py by implementing a sample-based format detection approach"
This reverts commit 857f696305.
2026-04-22 13:05:11 -05:00
David Peterson
857f696305 Optimize datetime parsing in load_sas.py by implementing a sample-based format detection approach
Introduced sampling of non-null values to choose the datetime parsing strategy, replacing the previous full row walk and reducing processing time on large datasets. Handling of the supported date formats is otherwise kept. Updated comments to describe the new approach.
2026-04-22 12:54:19 -05:00
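
A sample-based format detection of the kind this commit describes could look roughly like the sketch below. The candidate formats, the sample size of 100, and the name detect_format are assumptions for illustration, not the code from load_sas.py.

```python
from datetime import datetime
from itertools import islice

# Hypothetical candidate formats, tried in order.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
)


def detect_format(values, sample_size=100):
    """Return the first format that parses every sampled non-null value, else None."""
    # Take at most sample_size non-null values from the front of the column.
    sample = list(islice((v for v in values if v is not None), sample_size))
    if not sample:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in sample:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # this format failed on some sampled value; try the next one
        return fmt
    return None
```

A general caveat of any sampling approach is that rows outside the sample may use a different format, so a caller would still need a per-row fallback for values the detected format cannot parse.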
David Peterson
c3fa943e77 Enhance date and datetime parsing in load_sas.py with flexible regex and fallback formats
Introduced a locale-independent month lookup and date parsing functions that handle multiple formats, including SAS and Oracle styles. The new _parse_flexible_date and _parse_flexible_datetime functions accept both date-only and datetime inputs. Updated _try_date_coerce and _try_datetime_coerce to call these functions during data loading.
2026-04-22 12:28:19 -05:00
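
The combination of a locale-independent month lookup, a flexible regex, and a fallback format can be sketched as follows. This is not the _parse_flexible_date implementation from load_sas.py: the regex, the MONTHS table, the ISO fallback, and the name parse_flexible_date_sketch are assumptions, and two-digit years are left out for brevity.

```python
import re
from datetime import date

# Fixed English abbreviations, so parsing does not depend on the process locale
# (unlike strptime's %b directive).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# Matches day, month abbreviation, four-digit year with an optional separator,
# e.g. 05JAN2024 or 05-Jan-2024.
_DAY_MON_YEAR = re.compile(r"^\s*(\d{1,2})[-\s/]?([A-Za-z]{3})[-\s/]?(\d{4})\s*$")


def parse_flexible_date_sketch(text):
    """Return a date for day-month-year or ISO input, or None if unparseable."""
    match = _DAY_MON_YEAR.match(text)
    if match:
        day, mon, year = match.groups()
        month = MONTHS.get(mon.lower())
        if month is None:
            return None
        try:
            return date(int(year), month, int(day))
        except ValueError:  # e.g. 31FEB2024
            return None
    try:
        # Fallback: ISO 8601 calendar date such as 2024-01-05.
        return date.fromisoformat(text.strip())
    except ValueError:
        return None
```

Returning None instead of raising lets a coercion step count failures per column and decide whether the column is a date column at all.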