Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py #9

Merged
dp merged 1 commit from fix_prescan into main 2026-04-22 15:35:20 +00:00

Partition-value discovery in _discover_cluster_partitions (load_folder.py) now runs in parallel using ProcessPoolExecutor, so multiple files are scanned concurrently rather than one after another. The scans also read only the columns needed to determine partition values, which reduces I/O overhead. In addition, iter_sas_chunks and iter_text_chunks now accept a usecols parameter, so callers can parse only selected columns during data loading. Together, these changes are intended to reduce resource usage and speed up the data processing pipeline.
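A minimal sketch of both ideas, assuming delimited text inputs read with the standard csv module. The PR does not show the actual code, so iter_text_chunks below is a hypothetical stand-in for the project's function, and _scan_partition_values and discover_partitions are hypothetical names illustrating how _discover_cluster_partitions might submit one scan per file to a ProcessPoolExecutor while reading only the partition columns. The executor_factory parameter is also an assumption, added so the pool type can be swapped (for example, for threads in tests).

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat


def iter_text_chunks(path, chunksize=10_000, usecols=None, delimiter=","):
    """Yield lists of row dicts from a delimited text file (hypothetical sketch).

    Only the columns in `usecols` are kept. Note that the csv module still
    tokenizes every field; the saving is that unused fields are never stored.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        header = next(reader, None)
        if header is None:  # empty file: nothing to yield
            return
        if usecols is None:
            keep = list(range(len(header)))
        else:
            missing = [c for c in usecols if c not in header]
            if missing:
                raise KeyError(f"columns not found in {path}: {missing}")
            keep = [header.index(c) for c in usecols]
        names = [header[i] for i in keep]

        chunk = []
        for row in reader:
            if not row:  # csv.reader returns [] for blank lines; skip them
                continue
            chunk.append({name: row[i] for name, i in zip(names, keep)})
            if len(chunk) >= chunksize:
                yield chunk
                chunk = []
        if chunk:  # flush the final partial chunk
            yield chunk


def _scan_partition_values(path, partition_cols):
    """Collect the distinct partition-value tuples found in one file.

    Defined at module level so ProcessPoolExecutor can pickle it for workers.
    """
    values = set()
    for chunk in iter_text_chunks(path, usecols=list(partition_cols)):
        values.update(tuple(row[c] for c in partition_cols) for row in chunk)
    return values


def discover_partitions(paths, partition_cols,
                        executor_factory=ProcessPoolExecutor, max_workers=None):
    """Scan each file in its own task and union the partition values found."""
    partition_cols = tuple(partition_cols)
    found = set()
    with executor_factory(max_workers=max_workers) as pool:
        # Executor.map zips its iterables, so repeat() stops when paths run out.
        for values in pool.map(_scan_partition_values, paths, repeat(partition_cols)):
            found |= values
    return found
```

For example, discover_partitions(["a.csv", "b.csv"], ["region", "year"]) would return the set of distinct (region, year) tuples across both files. Because ProcessPoolExecutor can start workers by re-importing the main module (the spawn start method, the default on Windows and macOS), a call like this belongs under an if __name__ == "__main__": guard when made from a script.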

dp added 1 commit 2026-04-22 15:19:20 +00:00
dp requested review from mc 2026-04-22 15:19:25 +00:00
mc approved these changes 2026-04-22 15:34:06 +00:00
dp merged commit 0632e110e5 into main 2026-04-22 15:35:20 +00:00
dp deleted branch fix_prescan 2026-04-22 15:35:24 +00:00
Reference: OFRA/foxtrot#9