Added ProcessPoolExecutor-based parallel processing to the _discover_cluster_partitions function so that partition values are discovered across multiple files concurrently. Scans now read only the necessary columns, which reduces I/O overhead. The iter_sas_chunks and iter_text_chunks functions also accept a usecols parameter, enabling selective column parsing during data loading. Together, these changes aim to reduce resource usage and speed up the data processing pipeline.
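
The pattern described in this commit message can be illustrated with a minimal sketch. It is not the repository's code: `iter_csv_rows`, `scan_partition_values`, `discover_partition_values`, and the `executor_factory` parameter are hypothetical names, and the standard-library `csv` module stands in for the project's SAS and text readers. Each worker scans one file, keeps only the partition column, and the parent process merges the distinct values.

```python
import csv
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from typing import Dict, Iterable, Iterator, List, Optional, Set


def iter_csv_rows(path: str, usecols: Optional[List[str]] = None) -> Iterator[Dict[str, str]]:
    """Yield rows from a CSV file, keeping only the columns named in `usecols`.

    Hypothetical stand-in for a chunk iterator with a `usecols` parameter.
    Note: csv.DictReader still parses every field; a reader with true column
    projection would avoid that work as well.
    """
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            if usecols is None:
                yield row
            else:
                yield {name: row[name] for name in usecols}


def scan_partition_values(path: str, column: str) -> Set[str]:
    """Worker: collect the distinct values of one column from one file."""
    return {row[column] for row in iter_csv_rows(path, usecols=[column])}


def discover_partition_values(
    paths: Iterable[str],
    column: str,
    executor_factory=ProcessPoolExecutor,  # assumption: injectable for testing
) -> Set[str]:
    """Scan every file in a worker pool and merge the distinct values."""
    values: Set[str] = set()
    with executor_factory() as pool:
        # The worker is a module-level function wrapped in functools.partial,
        # so it can be pickled and sent to worker processes.
        for found in pool.map(partial(scan_partition_values, column=column), paths):
            values |= found
    return values
```

When the default `ProcessPoolExecutor` is used from a script, the call would normally sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Passing `ThreadPoolExecutor` as `executor_factory` is one way to exercise the same logic without spawning processes.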

| Name |
|---|
| samples |
| .env.example |
| generate_sample_folder.py |
| generate_sample_sas.py |
| load_folder.py |
| load_sas.py |
| sample_config.yaml |
| sample_folder_config.yaml |