Data Dictionary
Schema definitions for all tables in the D1 analytics database. Raw data is stored as parquet in R2; these tables contain pre-aggregated summaries.
datasets
Data source inventory
| Column | Type | Description |
|---|---|---|
| id | TEXT | Primary key: taxi, subway, citibike, air, weather |
| display_name | TEXT | Human-readable name |
| description | TEXT | Dataset description |
| earliest_month | TEXT | Earliest available YYYY-MM |
| latest_month | TEXT | Most recent YYYY-MM |
| total_records | INTEGER | Total row count across all files |
| updated_at | TEXT | Last update timestamp |
taxi_zone_monthly
Taxi zone-level monthly aggregations
| Column | Type | Description |
|---|---|---|
| taxi_type | TEXT | yellow, green, or fhvhv |
| year_month | TEXT | YYYY-MM |
| location_id | INTEGER | TLC taxi zone ID (1-263) |
| pickup_count | INTEGER | Number of pickups |
| dropoff_count | INTEGER | Number of dropoffs |
| avg_fare | REAL | Average fare amount (USD) |
| avg_distance | REAL | Average trip distance (miles) |
| avg_tip | REAL | Average tip amount (USD) |
| total_fare | REAL | Sum of all fares (USD) |
| total_trips | INTEGER | Total trip count |
taxi_hourly
Taxi hourly demand patterns
| Column | Type | Description |
|---|---|---|
| taxi_type | TEXT | yellow, green, or fhvhv |
| year_month | TEXT | YYYY-MM |
| hour | INTEGER | Hour of day (0-23) |
| trip_count | INTEGER | Number of trips |
| avg_fare | REAL | Average fare (USD) |
subway_station_monthly
Subway station monthly ridership
| Column | Type | Description |
|---|---|---|
| station_id | TEXT | MTA station identifier |
| year_month | TEXT | YYYY-MM |
| total_entries | INTEGER | Total turnstile entries |
| total_exits | INTEGER | Total turnstile exits |
| avg_daily_entries | REAL | Average entries per day |
citibike_station_monthly
Citi Bike station monthly activity
| Column | Type | Description |
|---|---|---|
| station_id | TEXT | Citi Bike station ID |
| year_month | TEXT | YYYY-MM |
| trip_starts | INTEGER | Trips starting here |
| trip_ends | INTEGER | Trips ending here |
| avg_duration_min | REAL | Average trip duration (minutes) |
| member_trips | INTEGER | Trips by annual members |
| casual_trips | INTEGER | Trips by casual riders |
air_monthly
Airport monthly operations
| Column | Type | Description |
|---|---|---|
| airport | TEXT | Airport code: JFK, LGA, EWR |
| year_month | TEXT | YYYY-MM |
| departures | INTEGER | Number of departures |
| arrivals | INTEGER | Number of arrivals |
| avg_delay_min | REAL | Average delay (minutes) |
| cancelled | INTEGER | Cancelled flights |
weather_daily
Daily weather observations
| Column | Type | Description |
|---|---|---|
| station | TEXT | NOAA weather station ID |
| date | TEXT | YYYY-MM-DD |
| temp_max_f | REAL | Daily high temperature (°F) |
| temp_min_f | REAL | Daily low temperature (°F) |
| temp_avg_f | REAL | Daily average temperature (°F) |
| precip_in | REAL | Precipitation (inches) |
| snow_in | REAL | Snowfall (inches) |
| wind_avg_mph | REAL | Average wind speed (mph) |
| conditions | TEXT | Weather condition description |