Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py #9

Merged

dp merged 1 commit from fix_prescan into main

2026-04-22 15:35:20 +00:00

Author	SHA1	Message	Date
David Peterson	dd83f58412	Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py. Added support for parallel processing using ProcessPoolExecutor in the _discover_cluster_partitions function, allowing for efficient partition value discovery across multiple files. This change significantly reduces I/O overhead by reading only necessary columns during scans. Additionally, updated iter_sas_chunks and iter_text_chunks functions to accept a usecols parameter, enabling selective column parsing for improved performance during data loading. These enhancements aim to optimize resource usage and speed up the data processing pipeline.	2026-04-21 21:43:42 -05:00
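The diff itself is not shown on this page, so the following is only a rough sketch of the pattern the commit message describes: a chunk iterator that accepts `usecols`, and a discovery step that fans one scan per file out to a `ProcessPoolExecutor`. Everything here is an assumption made for illustration: the signature given to `iter_text_chunks`, the helper names `scan_partition_values` and `discover_partitions`, the `executor_cls` parameter, and the use of the standard-library `csv` module. The real functions in load_folder.py and load_sas.py may look quite different.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def iter_text_chunks(path, chunksize=1000, usecols=None):
    """Yield lists of row dicts; keep only `usecols` when it is given.

    Hypothetical stand-in for the real iter_text_chunks. Note that the
    stdlib csv reader still parses every field and only drops the unwanted
    ones afterwards; a reader with native column selection (for example
    pandas.read_csv with usecols) can avoid materializing them at all.
    """
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        while True:
            rows = list(islice(reader, chunksize))
            if not rows:
                return
            if usecols is not None:
                rows = [{col: row[col] for col in usecols} for row in rows]
            yield rows


def scan_partition_values(args):
    """Worker: collect the distinct values of one column in one file.

    Takes a single (path, column) tuple so it can be used with pool.map,
    and lives at module level so worker processes can import it.
    """
    path, column = args
    values = set()
    for chunk in iter_text_chunks(path, usecols=[column]):
        values.update(row[column] for row in chunk)
    return values


def discover_partitions(paths, column, max_workers=None,
                        executor_cls=ProcessPoolExecutor):
    """Scan each file in its own worker and union the per-file value sets."""
    found = set()
    jobs = [(path, column) for path in paths]
    with executor_cls(max_workers=max_workers) as pool:
        for values in pool.map(scan_partition_values, jobs):
            found |= values
    return sorted(found)
```

On platforms where worker processes are started with spawn, the call to `discover_partitions` should sit under an `if __name__ == "__main__":` guard. The `executor_cls` hook is there only so that a `ThreadPoolExecutor` can be swapped in for testing or for very small inputs, where process start-up cost can outweigh the gain.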

