Commit Graph

  • c3d1f72556 Add null string sentinel handling in load_sas.py for improved missing value detection (main) David Peterson 2026-04-22 19:20:07 -0500
  • 998a3e282f Revert "Optimize datetime parsing in load_sas.py by implementing a sample-based format detection approach" David Peterson 2026-04-22 13:05:11 -0500
  • 857f696305 Optimize datetime parsing in load_sas.py by implementing a sample-based format detection approach David Peterson 2026-04-22 12:54:19 -0500
  • c3fa943e77 Enhance date and datetime parsing in load_sas.py with flexible regex and fallback formats David Peterson 2026-04-22 12:28:19 -0500
  • f63d684d51 moving to env file michael-corey 2026-04-22 10:26:30 -0500
  • 0632e110e5 Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py David Peterson 2026-04-21 21:43:42 -0500
  • 75bbf5fcd2 moving to env file michael-corey 2026-04-22 10:26:30 -0500
  • f4b4d0e928 adding exception counter directory_explorer michael-corey 2026-04-20 17:02:35 -0500
  • dd83f58412 Implement parallel processing for partition discovery in load_folder.py and enhance column filtering in load_sas.py David Peterson 2026-04-21 21:43:42 -0500
  • 1197846d10 adding text file support michael-corey 2026-04-21 20:05:26 -0500
  • 64e7ff0b0a Enhance error reporting in load_folder.py and load_sas.py for better debugging David Peterson 2026-04-21 16:56:27 -0500
  • eff82c73ce Add all_nullable configuration option in load_folder.py and load_sas.py for flexible schema management David Peterson 2026-04-21 16:48:37 -0500
  • c283b42876 Add safe numeric to datetime conversion in load_sas.py to handle edge cases David Peterson 2026-04-21 15:55:25 -0500
  • a46f0518f6 Suppress PerformanceWarning in load_sas.py to reduce noise during processing of wide SAS files. This change filters out warnings related to DataFrame fragmentation, which are irrelevant for our pipeline as we directly convert DataFrames to pyarrow tables. David Peterson 2026-04-21 13:40:38 -0500
  • 969a442775 Refactor numeric column type inference in load_sas.py for improved data handling David Peterson 2026-04-21 13:17:01 -0500
  • 212218fb67 Enhance error handling and abort functionality in load_folder.py for parallel file loading David Peterson 2026-04-21 12:54:05 -0500
  • ae65140390 Add column type overrides in load_folder.py and load_sas.py for enhanced schema control David Peterson 2026-04-21 12:14:44 -0500
  • 0c5e6e31f0 Enhance memory management in load_folder.py and load_sas.py for improved performance David Peterson 2026-04-21 10:46:54 -0500
  • 9afb52aecb Add --chunk-rows option to load_folder.py for customizable memory management David Peterson 2026-04-21 10:05:21 -0500
  • eac75cbb26 Refactor load_cluster function in load_folder.py for improved parallel file loading David Peterson 2026-04-21 08:31:48 -0500
  • 1265489276 Enhance date and timestamp handling in _prepare_for_copy function in load_sas.py David Peterson 2026-04-21 08:16:17 -0500
  • 2dd247b067 Add --no-prescan option to load_folder.py for skipping metadata scan David Peterson 2026-04-21 08:12:39 -0500
  • 052fb0e087 Refactor pre-scan process in load_folder.py to utilize ThreadPoolExecutor for improved performance David Peterson 2026-04-20 22:43:02 -0500
  • fe7dc4d5a1 Enhance load_cluster function for parallel processing and progress tracking David Peterson 2026-04-20 22:02:55 -0500
  • 96f2d6fe79 Update requirements and enhance SAS file processing with progress tracking David Peterson 2026-04-20 21:44:49 -0500
  • 7beb44ac4d Add pyarrow dependency and optimize DataFrame serialization in load_sas.py David Peterson 2026-04-20 21:32:56 -0500
  • 5e347f50ef Add widening compatibility checks in load_sas.py for type inference David Peterson 2026-04-20 21:08:13 -0500
  • f84e127796 Update type inference behavior in load_sas.py to scan entire files by default David Peterson 2026-04-20 20:43:27 -0500
  • a94ab68f4d Refine partition name patterns in sas_profiler.py David Peterson 2026-04-20 19:27:01 -0500
  • 4fc85081c8 Enhance SAS profiling performance in sas_profiler.py David Peterson 2026-04-20 19:03:40 -0500
  • 5449a25b44 Refactor partition candidate logic in sas_profiler.py David Peterson 2026-04-20 18:49:23 -0500
  • b3b968edf2 Add openpyxl dependency to requirements.txt for Excel file handling David Peterson 2026-04-20 18:38:24 -0500
  • f1af1136dc Add standalone SAS profiling utility David Peterson 2026-04-20 18:38:01 -0500
  • e48038f3c6 updating for sas michael-corey 2026-04-20 16:30:35 -0500
  • 2390ce1e0c adding explorer michael-corey 2026-04-20 16:27:54 -0500
  • 384103f489 Update pyreadstat version constraint in requirements.txt to allow for version 2.0 David Peterson 2026-04-20 14:10:08 -0500
  • 03b97999dc Add S3 download utility and example configuration David Peterson 2026-04-20 13:14:42 -0500
  • b78f6d648f Enhance file clustering by implementing numeric sorting for last digit groups in stems and updating documentation for embedded-digit handling in auto-detection. David Peterson 2026-04-20 11:48:22 -0500
  • b3d7a9d440 adding index field michael-corey 2026-04-20 10:18:09 -0500
  • 0d955eeab1 adding partition flag michael-corey 2026-04-20 09:56:00 -0500
  • e39eb47a90 altering such that commit is by batch michael-corey 2026-04-20 08:38:38 -0500
  • 508cc974ea adding local check michael-corey 2026-04-20 08:25:27 -0500
  • 2d95711d9d Updating python reference file_viewer michael-corey 2026-04-18 13:43:29 -0500
  • f1e99d887d altering invalid arguments michael-corey 2026-04-18 13:41:54 -0500
  • f101eacffd Merging main michael-corey 2026-04-18 13:39:37 -0500
  • edb9146682 moving files michael-corey 2026-04-18 13:35:32 -0500
  • 1bbe0d4cd6 removing latin encoding, adding usage notes batch_folder_processing michael-corey 2026-04-18 12:54:29 -0500
  • c1e1fec10b Update requirements.txt to support new package versions and add boto3 dependency David Peterson 2026-04-18 12:39:44 -0500
  • 3b913b2ca6 adding user prompt for db creds michael-corey 2026-04-18 12:37:22 -0500
  • 5b48872dd7 Add generate_sample_folder.py and load_folder.py for clustered SAS file generation and loading David Peterson 2026-04-18 11:25:04 -0500
  • 6b12ab969b adding file_viewer michael-corey 2026-04-18 11:19:38 -0500
  • 5645ff5597 Update load_sas.py to support streaming data loads with iter_sas_chunks and copy_dataframes. Enhance documentation for schema inference and type detection, clarifying the use of read_sas_preview and the implications of sampling. Add __pycache__ to .gitignore. David Peterson 2026-04-18 10:44:32 -0500
  • 3a0537270c Implement type inference sampling in load_sas.py to improve performance on large SAS files. Introduce TYPE_INFERENCE_SAMPLE_ROWS to limit the number of rows scanned for type detection while ensuring nullability checks cover the entire column. Update documentation to reflect these changes. David Peterson 2026-04-18 10:28:37 -0500
  • 4f7ded09c6 Enhance load_sas.py with detailed usage instructions, YAML config structure, and command-line interface documentation for loading SAS files. David Peterson 2026-04-18 10:20:07 -0500
  • f681f1012a Adding generic loader michael-corey 2026-04-18 09:34:48 -0500