adding permit data

michael-corey 2026-06-02 10:27:40 -05:00
commit 341a129953
6 changed files with 144783 additions and 0 deletions

README.md Normal file

@@ -0,0 +1 @@
This repo contains collected public source documents.


@@ -0,0 +1,12 @@
not_livable_permits -- loading notes
====================================
Rows: 135,159. Files: not_livable_permits.csv / .parquet / data_dictionary.csv
PREFERRED -- parquet (dtypes embedded, no re-contamination):
import pandas as pd; df = pd.read_parquet('not_livable_permits.parquet')
CSV: read code columns as str (else '06037'->int 6037, leading zero lost):
STR_COLS = ['permit_id', 'city', 'state', 'source_dataset', 'tract_geoid', 'county_fips']
df = pd.read_csv('not_livable_permits.csv', dtype={c:str for c in STR_COLS})
FIPS canonical: state=2, county_fips=5, tract_geoid=11 digits.
Tight headline = uninhabitable_on_census_day & residential_flag & ~excluded_any (=32,199).
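The CSV loading rule and the tight-headline filter above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: `load_permits_csv`, `as_bool`, and `tight_headline` are hypothetical helper names, the three sample rows are invented for the demo, and the flag columns are coerced through `as_bool` because the data dictionary describes `residential_flag` with yes/unknown values rather than True/False.

```python
import io

import pandas as pd

# Code columns that must stay strings so leading zeros survive (from the notes).
STR_COLS = ['permit_id', 'city', 'state', 'source_dataset', 'tract_geoid', 'county_fips']


def load_permits_csv(path_or_buffer):
    """Read the CSV with the code columns forced to str."""
    return pd.read_csv(path_or_buffer, dtype={c: str for c in STR_COLS})


def as_bool(series):
    """Hypothetical helper: treat 'true'/'yes'/'1' (any case) as True, all else False."""
    return series.astype(str).str.strip().str.lower().isin({'true', 'yes', '1'})


def tight_headline(df):
    """uninhabitable_on_census_day & residential_flag & ~excluded_any."""
    mask = (
        as_bool(df['uninhabitable_on_census_day'])
        & as_bool(df['residential_flag'])
        & ~as_bool(df['excluded_any'])
    )
    return df[mask]


# Invented sample rows standing in for not_livable_permits.csv.
sample = io.StringIO(
    "permit_id,city,state,source_dataset,tract_geoid,county_fips,"
    "uninhabitable_on_census_day,residential_flag,excluded_any\n"
    "001,la,06,x.csv,06037123456,06037,True,yes,False\n"
    "002,la,06,x.csv,06037123456,06037,True,unknown,False\n"
    "003,la,06,x.csv,06037123456,06037,True,yes,True\n"
)
df = load_permits_csv(sample)
headline = tight_headline(df)
```

On the real file, `len(tight_headline(df))` would be the number to compare against the 32,199 quoted above; if `residential_flag` turns out to be stored differently, `as_bool` is the place to adjust.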


@@ -0,0 +1,31 @@
column,dtype,pct_populated,n_distinct,example,definition
permit_id,string,95.7,125150,340653475.0,Source permit/record id within the issuing jurisdiction.
city,string,100.0,46,nyc,Source city/jurisdiction slug.
state,string,100.0,17,36,2-digit state FIPS code.
source_dataset,string,100.0,50,nyc_construction_permits.csv,Originating open-data dataset id.
ingestion_quality,string,100.0,7,hand_mapped_v8,Ingestion grade (hand_mapped_v8/partial/narrow/...); weight screening rows by this.
address,string,94.6,84985,1621 AVENUE T,Street address as published by the source (street-only for most cities).
latitude,float,57.5,41689,40.601231,WGS84 latitude where the source provided coordinates.
longitude,float,57.4,41556,-73.955623,WGS84 longitude where the source provided coordinates.
tract_geoid,string,72.7,5976,36047055800,11-digit 2020 Census tract GEOID.
county_fips,string,75.3,32,36047,5-digit county FIPS.
geo_resolution,string,100.0,3,full_address,Best available geo precision: full_address > latlon > tract_only > none.
not_livable_type,string,100.0,3,demolition,demolition | condemned_unsafe | under_construction.
census_day_status,string,100.0,5,active,active|unconfirmed|completed_before|issued_after|unknown_dates vs 2020-04-01.
signal_strength,string,100.0,3,strong,strong (demo/condemn) | medium (active construction) | weak (unconfirmed).
match_keyword,string,100.0,33,demolition,Keyword that triggered the type classification.
match_basis,string,85.7,243,demolition|demo,"All matched keywords (pipe-delimited), for QA."
residential_flag,bool,100.0,2,unknown,"yes if residential/dwelling context detected, else unknown."
start_date,date,99.7,5544,2019-01-01,Permit issue date (the 'start').
end_date,date,56.2,3986,2019-01-02,Completion/CO date where a completion proxy exists; blank for issuance-only sources.
units,int,19.0,338,0.0,Dwelling units on the permit where reported.
use_type_raw,string,42.5,756,1-2-3 FAMILY,Verbatim source permit-type/use text.
description_raw,string,87.6,71365,INTERIOR DEMOLITION( PARTITION REMOVAL) ; NEW PARTITION; CEI,Verbatim source work-description text.
uninhabitable_18,bool,100.0,2,True,Same under an 18-month recency window (conservative alternative).
uninhabitable_24,string,100.0,2,True,
uninhabitable_on_census_day,bool,100.0,2,True,TRUE if uninhabitable on Census Day (type-aware; 24mo recency + no-rebuild for demo/condemn).
rebuilt_before_census_day,string,100.0,2,False,
excl_interior_accessory_demo,bool,100.0,2,False,TRUE if a bare-demo row is interior/accessory (not a dwelling).
excl_erect_nonresidential,bool,100.0,2,False,TRUE if an 'erect' row lacks residential context.
excl_dc_unitsfallback,bool,100.0,2,False,TRUE if a DC row entered only via the units>0 fallback.
excluded_any,bool,100.0,2,False,TRUE if any exclusion flag is set.
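The canonical FIPS widths (state=2, county_fips=5, tract_geoid=11) can be checked after loading with a sketch like the one below. `fips_problems` is a hypothetical helper name and the two-row frame is invented for illustration. The prefix check relies on the standard GEOID layout, in which the first five digits of a tract GEOID are the county FIPS.

```python
import pandas as pd

# Canonical digit widths stated in the loading notes.
FIPS_WIDTHS = {'state': 2, 'county_fips': 5, 'tract_geoid': 11}


def fips_problems(df):
    """Count populated values that are not all-digit strings of the canonical width."""
    problems = {}
    for col, width in FIPS_WIDTHS.items():
        vals = df[col].dropna().astype(str)
        bad = ~vals.str.fullmatch(rf'\d{{{width}}}')
        problems[col] = int(bad.sum())

    # Where both are populated, the tract GEOID should start with the county FIPS.
    both = df.dropna(subset=['tract_geoid', 'county_fips'])
    mismatch = both['tract_geoid'].astype(str).str[:5] != both['county_fips'].astype(str)
    problems['tract_county_mismatch'] = int(mismatch.sum())
    return problems


# Invented rows: the second one shows the lost-leading-zero failure mode.
demo = pd.DataFrame({
    'state': ['36', '6'],
    'county_fips': ['36047', '6037'],
    'tract_geoid': ['36047055800', None],
})
report = fips_problems(demo)
```

A nonzero count in any column suggests the CSV was read without the string dtypes, or that the source itself published unpadded codes.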

File diff suppressed because it is too large.

Binary file not shown.