Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py #9
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "fix_prescan"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Added support for parallel processing using ProcessPoolExecutor in the _discover_cluster_partitions function, allowing for efficient partition value discovery across multiple files. This change significantly reduces I/O overhead by reading only necessary columns during scans. Additionally, updated iter_sas_chunks and iter_text_chunks functions to accept a usecols parameter, enabling selective column parsing for improved performance during data loading. These enhancements aim to optimize resource usage and speed up the data processing pipeline.