advanced_analyzer #8

Merged
dp merged 23 commits from advanced_analyzer into main 2026-04-21 22:32:18 +00:00
Showing only changes of commit a94ab68f4d

@@ -117,8 +117,12 @@ larger than the file, pyreadstat just hands back one chunk."""
 PARTITION_NAME_PATTERNS: Tuple[re.Pattern, ...] = (
-    re.compile(r"^state$", re.IGNORECASE),
-    re.compile(r"^state_?code$", re.IGNORECASE),
+    # ``state`` or ``state_code`` / ``statecode`` appearing as a full token
+    # anywhere in the column name. Uses underscore / start / end as token
+    # boundaries so we catch STATE, STATE_CODE, HOME_STATE,
+    # ADDR_LINE3_STATE, BIRTH_STATE_CODE, etc. without matching STATUS,
+    # ESTATE, INTERSTATE, or STATEWIDE.
+    re.compile(r"(?:^|_)state(?:_?code)?(?:_|$)", re.IGNORECASE),
 )
 """Only columns whose name matches one of these patterns are ever considered
 partition candidates. This deliberately ignores generic low-cardinality