foxtrot

Author	SHA1	Message	Date
David Peterson	052fb0e087	Refactor pre-scan process in load_folder.py to utilize ThreadPoolExecutor for improved performance. Updated the main function to replace sequential file processing with a threaded approach using ThreadPoolExecutor. This change enhances the efficiency of reading row counts from SAS files, particularly for large datasets, by allowing concurrent I/O operations. Added progress tracking with tqdm for better user feedback during the pre-scan phase.	2026-04-20 22:43:02 -05:00
David Peterson	fe7dc4d5a1	Enhance load_cluster function for parallel processing and progress tracking. Refactored the load_cluster function in load_folder.py to support parallel file loading using ProcessPoolExecutor, improving performance during the append phase. Added workers parameter for controlling parallelism and integrated a progress_queue for real-time progress updates. Introduced read_sas_metadata function in load_sas.py to efficiently read metadata from SAS files, optimizing the pre-scan process for global progress tracking.	2026-04-20 22:02:55 -05:00
David Peterson	96f2d6fe79	Update requirements and enhance SAS file processing with progress tracking. Updated the pyarrow version in requirements.txt to improve compatibility. Enhanced the _infer_cluster_schema and _stream_file functions in load_folder.py and load_sas.py to return total row counts for better progress tracking during data streaming. Integrated tqdm for visual feedback on row processing, improving user experience during large data loads.	2026-04-20 21:44:49 -05:00
David Peterson	b78f6d648f	Enhance file clustering by implementing numeric sorting for last digit groups in stems and updating documentation for embedded-digit handling in auto-detection.	2026-04-20 11:48:22 -05:00
michael-corey	b3d7a9d440	adding index field	2026-04-20 10:18:09 -05:00
michael-corey	0d955eeab1	adding partition flag	2026-04-20 09:56:00 -05:00
michael-corey	e39eb47a90	altering such that commit is by batch	2026-04-20 08:38:38 -05:00
michael-corey	1bbe0d4cd6	removing latin encoding, adding usage notes	2026-04-18 13:06:01 -05:00
michael-corey	3b913b2ca6	adding user prompt for db creds	2026-04-18 12:37:22 -05:00
David Peterson	5b48872dd7	Add generate_sample_folder.py and load_folder.py for clustered SAS file generation and loading. Introduce generate_sample_folder.py to create a test folder with clustered SAS XPORT files, including configurations for schema compatibility checks. Implement load_folder.py to facilitate loading entire directories of SAS files into Postgres, supporting explicit and auto-detect clustering. Update sample_folder_config.yaml for usage examples and configuration structure. Enhance load_sas.py with a public schema compatibility check function for orchestrators.	2026-04-18 11:25:04 -05:00

10 Commits
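
Commit 052fb0e087 describes replacing a sequential row-count pre-scan with a ThreadPoolExecutor. The repository code is not shown in this log, so the following is only a minimal sketch of that general pattern, not the actual contents of load_folder.py. The names prescan_row_counts and count_rows are hypothetical; count_rows stands in for whatever function reads a file's row count (the log mentions read_sas_metadata, whose signature is not given here). The commit also mentions tqdm, which this sketch leaves out to stay within the standard library; a progress bar would most naturally wrap the as_completed loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def prescan_row_counts(
    paths: Iterable[str],
    count_rows: Callable[[str], int],  # hypothetical reader, supplied by the caller
    max_workers: int = 8,
) -> Dict[str, int]:
    """Return {path: row_count}, calling count_rows concurrently in threads."""
    counts: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per file and remember which future maps to which path.
        futures = {pool.submit(count_rows, path): path for path in paths}
        # Results arrive in completion order, which is where progress
        # reporting could be updated.
        for future in as_completed(futures):
            counts[futures[future]] = future.result()
    return counts


if __name__ == "__main__":
    # Stand-in data: a dict lookup plays the role of reading file metadata.
    fake_counts = {"claims_1.xpt": 10, "claims_2.xpt": 25}
    result = prescan_row_counts(fake_counts, fake_counts.__getitem__, max_workers=2)
    print(sum(result.values()))
```

Threads suit this step because reading metadata is I/O-bound; the CPU-heavy append phase described in fe7dc4d5a1 uses processes instead.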