About NYC Trip Data
What is this?
NYC Trip Data is an open analytics platform that combines five major NYC transportation and weather datasets into a single, queryable interface. The goal is to make it easy to explore patterns in how New Yorkers move around the city and how weather affects transportation.
Data Sources
- NYC TLC — Yellow taxi, green taxi, and high-volume for-hire vehicle (FHVHV, e.g. Uber and Lyft) trip records
- MTA — Subway ridership data including turnstile counts by station
- Citi Bike — Bike share trip data with station-level activity
- BTS — Air traffic operations at JFK, LaGuardia, and Newark airports
- NOAA/NWS — Historical weather observations for New York City
Architecture
The platform is built on Cloudflare's edge infrastructure:
- R2 — Object storage for raw Parquet files and reference GeoJSON data
- D1 — Edge SQLite database storing pre-aggregated analytics summaries
- KV — Key-value cache for expensive query results (TTL-based)
- Workers — Serverless compute running the Vinext (React Server Components) application
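The TTL-based caching described for KV follows the usual cache-aside pattern, which can be sketched roughly as below. This is an illustration only: the `TTLCache` class is a hypothetical in-memory stand-in rather than the Cloudflare KV API, and the `cached_query` helper, the key format, and the 300-second default are assumptions, not values taken from the project.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a key-value store with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)


def cached_query(cache, key, compute, ttl_seconds=300):
    """Return the cached value for key, running compute() only on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.put(key, value, ttl_seconds)
    return value


if __name__ == "__main__":
    cache = TTLCache()
    # The second call is served from the cache, so the lambda runs once.
    print(cached_query(cache, "summary:2024-01", lambda: {"trips": 3}))
    print(cached_query(cache, "summary:2024-01", lambda: {"trips": 3}))
```

With this shape, an expensive query runs at most once per TTL window for a given key; the trade-off is that results can be stale for up to the TTL.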
Python data pipelines ingest raw data from public sources into R2 as Parquet files. An aggregation step then computes summary statistics and loads them into D1 for fast edge queries. The website serves only pre-computed data; no raw Parquet parsing happens at request time.
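The aggregate-then-load flow can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the trip records, the `daily_summary` table, and all column names are hypothetical; Python's built-in `sqlite3` stands in for D1 (which is SQLite-based); and the step that reads Parquet from R2 with PyArrow is replaced by in-memory rows so the example has no dependencies.

```python
import sqlite3
from collections import defaultdict

# Hypothetical trip records; a real pipeline would read these from Parquet files.
TRIPS = [
    {"pickup_date": "2024-01-01", "mode": "yellow_taxi", "distance_miles": 2.0},
    {"pickup_date": "2024-01-01", "mode": "yellow_taxi", "distance_miles": 4.0},
    {"pickup_date": "2024-01-01", "mode": "citi_bike", "distance_miles": 1.5},
    {"pickup_date": "2024-01-02", "mode": "yellow_taxi", "distance_miles": 3.0},
]


def aggregate_daily(trips):
    """Group trips by (date, mode) and compute trip count and mean distance."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, total distance]
    for trip in trips:
        key = (trip["pickup_date"], trip["mode"])
        totals[key][0] += 1
        totals[key][1] += trip["distance_miles"]
    return [
        (date, mode, count, total / count)
        for (date, mode), (count, total) in sorted(totals.items())
    ]


def load_summaries(conn, rows):
    """Write summary rows into the (hypothetical) daily_summary table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_summary (
               trip_date TEXT NOT NULL,
               mode TEXT NOT NULL,
               trip_count INTEGER NOT NULL,
               avg_distance_miles REAL NOT NULL,
               PRIMARY KEY (trip_date, mode)
           )"""
    )
    # INSERT OR REPLACE keeps the load idempotent when the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def trips_for_day(conn, trip_date):
    """The kind of cheap lookup a request handler would run against the summaries."""
    return conn.execute(
        "SELECT mode, trip_count, avg_distance_miles FROM daily_summary "
        "WHERE trip_date = ? ORDER BY mode",
        (trip_date,),
    ).fetchall()


conn = sqlite3.connect(":memory:")  # local stand-in for the D1 database
load_summaries(conn, aggregate_daily(TRIPS))
print(trips_for_day(conn, "2024-01-01"))
```

Because the heavy grouping happens once in the pipeline, the request-time query touches only a handful of summary rows regardless of how many raw trips were ingested.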
Open Source
This project is open source. The data pipeline scripts and website code are available on GitHub.
Tech Stack
Framework: Vinext (Vite + React Server Components)
Hosting: Cloudflare Workers
CSS: Tailwind CSS v4
Charts: Recharts
Maps: MapLibre GL JS
Tables: TanStack Table
Pipeline: Python + PyArrow
Storage: R2 + D1 + KV