michael-corey
2390ce1e0c
adding explorer
2026-04-20 16:27:54 -05:00
David Peterson
384103f489
Update pyreadstat version constraint in requirements.txt to allow for version 2.0
2026-04-20 14:10:08 -05:00
David Peterson
03b97999dc
Add S3 download utility and example configuration
Introduced a new script `s3_download.py` for downloading files from S3 based on a YAML configuration. The script supports recursive listing, file clustering, and customizable download behavior. Also added a sample configuration file `sample_s3_download_config.yaml` to demonstrate usage.
2026-04-20 13:14:42 -05:00
David Peterson
b78f6d648f
Enhance file clustering: sort numerically by the last digit group in each file stem, and update documentation for embedded-digit handling in auto-detection.
2026-04-20 11:48:22 -05:00
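For readers unfamiliar with the idea, sorting by the last digit group of a stem might look roughly like the sketch below. This is an illustration only, not the code from this commit: the function name `last_digit_group_key` and the `.xpt` file names are hypothetical, and the real auto-detection rules may differ.

```python
import re
from pathlib import Path

# Matches the final run of digits in a string (no digit appears after it).
_LAST_DIGITS = re.compile(r"(\d+)(?!.*\d)")


def last_digit_group_key(filename):
    """Sort key: (stem without its last digit group, that group as an int).

    Hypothetical helper for illustration; not taken from the repository.
    """
    stem = Path(filename).stem
    match = _LAST_DIGITS.search(stem)
    if match is None:
        # No digits in the stem: sort before any numbered sibling.
        return (stem, -1)
    prefix = stem[:match.start()] + stem[match.end():]
    return (prefix, int(match.group(1)))


files = ["part10.xpt", "part2.xpt", "part1.xpt"]
ordered = sorted(files, key=last_digit_group_key)
```

A plain string sort would place `part10.xpt` before `part2.xpt`; comparing the digit group as an integer avoids that, and earlier digits in the stem (such as a version tag) stay part of the grouping prefix.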
michael-corey
b3d7a9d440
adding index field
2026-04-20 10:18:09 -05:00
michael-corey
0d955eeab1
adding partition flag
2026-04-20 09:56:00 -05:00
michael-corey
e39eb47a90
altering so that commits are made per batch
2026-04-20 08:38:38 -05:00
michael-corey
508cc974ea
adding local check
2026-04-20 08:25:27 -05:00
michael-corey
2d95711d9d
Updating Python reference
2026-04-18 13:43:29 -05:00
michael-corey
f1e99d887d
altering invalid arguments
2026-04-18 13:41:54 -05:00
michael-corey
f101eacffd
Merging main
2026-04-18 13:39:37 -05:00
michael-corey
edb9146682
moving files
2026-04-18 13:35:32 -05:00
michael-corey
1bbe0d4cd6
removing latin encoding, adding usage notes
2026-04-18 13:06:01 -05:00
David Peterson
c1e1fec10b
Update requirements.txt to support new package versions and add boto3 dependency
2026-04-18 12:41:02 -05:00
michael-corey
3b913b2ca6
adding user prompt for db creds
2026-04-18 12:37:22 -05:00
David Peterson
5b48872dd7
Add generate_sample_folder.py and load_folder.py for clustered SAS file generation and loading
Introduce generate_sample_folder.py to create a test folder with clustered SAS XPORT files, including configurations for schema compatibility checks. Implement load_folder.py to facilitate loading entire directories of SAS files into Postgres, supporting explicit and auto-detect clustering. Update sample_folder_config.yaml for usage examples and configuration structure. Enhance load_sas.py with a public schema compatibility check function for orchestrators.
2026-04-18 11:25:04 -05:00
michael-corey
6b12ab969b
adding file_viewer
2026-04-18 11:19:38 -05:00
David Peterson
5645ff5597
Update load_sas.py to support streaming data loads with iter_sas_chunks and copy_dataframes. Enhance documentation for schema inference and type detection, clarifying the use of read_sas_preview and the implications of sampling. Add __pycache__ to .gitignore.
2026-04-18 10:44:32 -05:00
David Peterson
3a0537270c
Implement type inference sampling in load_sas.py to improve performance on large SAS files. Introduce TYPE_INFERENCE_SAMPLE_ROWS to limit the number of rows scanned for type detection while ensuring nullability checks cover the entire column. Update documentation to reflect these changes.
2026-04-18 10:28:37 -05:00
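The trade-off described in this commit (type detection on a bounded sample, nullability on the whole column) can be sketched in plain Python as follows. This is an assumption-laden illustration, not the code in load_sas.py: the function `infer_column`, the type names it returns, and the tiny sample size are all hypothetical; only the constant name `TYPE_INFERENCE_SAMPLE_ROWS` comes from the commit message.

```python
# Illustrative value only; the real setting is not stated in this log.
TYPE_INFERENCE_SAMPLE_ROWS = 3


def infer_column(values, sample_rows=TYPE_INFERENCE_SAMPLE_ROWS):
    """Return (sql_type, nullable) for a column given as a list of values.

    Hypothetical helper: the type is guessed from the first `sample_rows`
    values only, while nullability is checked over every value.
    """
    sample = [v for v in values[:sample_rows] if v is not None]

    def is_int(v):
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if sample and all(is_int(v) for v in sample):
        sql_type = "BIGINT"
    elif sample and all(is_number(v) for v in sample):
        sql_type = "DOUBLE PRECISION"
    else:
        sql_type = "TEXT"

    # Nullability scans the full column, not just the sample.
    nullable = any(v is None for v in values)
    return sql_type, nullable
```

Note the implication of sampling: a column such as `[1, 2, 3, "x"]` would be typed as `BIGINT` here because the string falls outside the sample, which is why the sample size matters on heterogeneous data.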
David Peterson
4f7ded09c6
Enhance load_sas.py with detailed usage instructions, YAML config structure, and command-line interface documentation for loading SAS files.
2026-04-18 10:20:07 -05:00
michael-corey
f681f1012a
Adding generic loader
2026-04-18 09:34:48 -05:00