Add Python web scraper for NJC travel rates with currency extraction

- Implemented a Python scraper using BeautifulSoup and pandas to automatically collect travel rates from the official NJC website
- Added currency extraction from table titles (supports EUR, USD, AUD, CAD, ARS, etc.)
- Added country extraction from table titles for international rates
- Flattened pandas MultiIndex columns for a cleaner data structure
- Defaulted to CAD for domestic Canadian sources (accommodations and domestic tables)
- Created SQLite database schema (raw_tables, rate_entries, exchange_rates, accommodations)
- Successfully scraped 92 tables with 17,205 rate entries covering 25 international cities
- Added migration script to convert scraped data to Node.js database format
- Updated .gitignore for Python files (.venv/, __pycache__, *.pyc, *.sqlite3)
- Fixed city validation and currency conversion in main app
- Added comprehensive debug and verification scripts
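
The currency extraction and CAD default described above could look roughly like the following. This is a minimal sketch, not the scraper's actual code: the function name `extract_currency`, the `KNOWN_CURRENCIES` set, and the example title format are all assumptions made for illustration.

```python
import re

# Assumed subset of ISO 4217 codes; the commit lists these "etc.",
# so the real scraper presumably recognizes more.
KNOWN_CURRENCIES = {"EUR", "USD", "AUD", "CAD", "ARS"}

# Any standalone run of three uppercase letters is a candidate code.
CODE_PATTERN = re.compile(r"\b[A-Z]{3}\b")


def extract_currency(title, domestic=False):
    """Return the first known currency code found in a table title.

    Falls back to CAD for domestic Canadian sources, and to None
    when no known code is present in an international title.
    """
    for candidate in CODE_PATTERN.findall(title or ""):
        # Filtering against the known set skips acronyms such as "NJC".
        if candidate in KNOWN_CURRENCIES:
            return candidate
    return "CAD" if domestic else None


if __name__ == "__main__":
    # Hypothetical title strings, not copied from the NJC website.
    print(extract_currency("Argentina - Currency: Peso (ARS)"))
    print(extract_currency("Accommodation rates", domestic=True))
```

Checking candidates against a known set, rather than accepting any three-letter match, keeps acronyms in the title from being mistaken for currencies.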

This replaces manual JSON maintenance with automated data collection from the official government source.
2026-01-13 09:21:43 -05:00
commit 15094ac94b
84 changed files with 19859 additions and 0 deletions

data/sampleFlights.json Normal file

@@ -0,0 +1,46 @@
[
  {
    "price": 1295.00,
    "currency": "CAD",
    "duration": "PT16H10M",
    "durationHours": 16.2,
    "businessClassEligible": true,
    "stops": 1,
    "carrier": "AC",
    "departureTime": "2025-11-15T08:00:00",
    "arrivalTime": "2025-11-15T16:10:00"
  },
  {
    "price": 1420.50,
    "currency": "CAD",
    "duration": "PT14H25M",
    "durationHours": 14.4,
    "businessClassEligible": true,
    "stops": 2,
    "carrier": "BA",
    "departureTime": "2025-11-15T09:30:00",
    "arrivalTime": "2025-11-15T16:55:00"
  },
  {
    "price": 980.25,
    "currency": "CAD",
    "duration": "PT20H05M",
    "durationHours": 20.1,
    "businessClassEligible": true,
    "stops": 2,
    "carrier": "QF",
    "departureTime": "2025-11-15T07:15:00",
    "arrivalTime": "2025-11-15T16:20:00"
  },
  {
    "price": 875.75,
    "currency": "CAD",
    "duration": "PT18H40M",
    "durationHours": 18.7,
    "businessClassEligible": true,
    "stops": 3,
    "carrier": "SQ",
    "departureTime": "2025-11-15T06:45:00",
    "arrivalTime": "2025-11-15T15:25:00"
  }
]
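
The `duration` values in the sample file look like ISO 8601 durations, and `durationHours` appears to be the same value in decimal hours rounded to one decimal place. A minimal sketch of that conversion, assuming this relationship holds (the helper name `duration_hours` is hypothetical and not part of the commit, and only the `PT<h>H<m>M` shape seen in the sample data is handled):

```python
import re

# Matches the hour/minute-only subset of ISO 8601 durations, e.g. "PT16H10M".
ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?$")


def duration_hours(duration):
    """Convert a duration such as "PT16H10M" to decimal hours, one decimal place."""
    match = ISO_DURATION.match(duration)
    # Reject strings that do not match, and a bare "PT" with no components.
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return round(hours + minutes / 60, 1)


if __name__ == "__main__":
    for value in ("PT16H10M", "PT14H25M", "PT20H05M", "PT18H40M"):
        print(value, duration_hours(value))
```

Applied to the four sample entries, this yields 16.2, 14.4, 20.1, and 18.7, which agrees with the `durationHours` fields in the file.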