David Peterson 0632e110e5 Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py
Added parallel processing with ProcessPoolExecutor to the _discover_cluster_partitions function, so partition values are discovered across multiple files concurrently. Discovery scans now read only the columns they need, which reduces I/O overhead. Also updated the iter_sas_chunks and iter_text_chunks functions to accept a usecols parameter, enabling selective column parsing during data loading. Together these changes reduce resource usage and speed up the data processing pipeline.
2026-04-22 15:35:19 +00:00
generic_loader Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py 2026-04-22 15:35:19 +00:00
utils adding exception counter 2026-04-22 10:09:41 -05:00
.gitignore adding explorer 2026-04-20 16:27:54 -05:00
requirements.txt Update requirements and enhance SAS file processing with progress tracking 2026-04-20 21:44:49 -05:00
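
The commit description above mentions parallel partition discovery with ProcessPoolExecutor. The sketch below illustrates that general pattern only; it is not the code in load_folder.py. The names `scan_partition_values` and `discover_partitions`, the `use_processes` switch, and the CSV input are all assumptions made for illustration. Note that the standard-library `csv` reader still tokenizes every field, so this sketch only discards unneeded columns after parsing; a reader with real column projection (such as a `usecols` option) would avoid that work.

```python
import csv
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def scan_partition_values(path, column):
    """Return the distinct values of one column in a single delimited file.

    Hypothetical helper: keeps only the partition column from each row.
    """
    values = set()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            values.add(row[column])
    return values


def discover_partitions(paths, column, max_workers=None, use_processes=True):
    """Union the distinct partition values found across all files.

    Hypothetical helper: each file is scanned by a separate worker.
    `use_processes=False` swaps in threads, which is handy for small
    inputs or environments where spawning processes is not practical.
    """
    paths = [str(p) for p in paths]
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    found = set()
    with pool_cls(max_workers=max_workers) as pool:
        # map() yields one result per file, in input order.
        for values in pool.map(scan_partition_values, paths, [column] * len(paths)):
            found |= values
    return sorted(found)
```

When worker processes are started with the spawn method (the default on Windows and macOS), the call to `discover_partitions` should sit under an `if __name__ == "__main__":` guard so that workers can import the module safely.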