mblanke 15094ac94b Add Python web scraper for NJC travel rates with currency extraction
- Implemented Python scraper using BeautifulSoup and pandas to automatically collect travel rates from official NJC website
- Added currency extraction from table titles (supports EUR, USD, AUD, CAD, ARS, etc.)
- Added country extraction from table titles for international rates
- Flatten pandas MultiIndex columns for cleaner data structure
- Default to CAD for domestic Canadian sources (accommodations and domestic tables)
- Created SQLite database schema (raw_tables, rate_entries, exchange_rates, accommodations)
- Successfully scraped 92 tables with 17,205 rate entries covering 25 international cities
- Added migration script to convert scraped data to Node.js database format
- Updated .gitignore for Python files (.venv/, __pycache__, *.pyc, *.sqlite3)
- Fixed city validation and currency conversion in main app
- Added comprehensive debug and verification scripts

This replaces manual JSON maintenance with automated data collection from official government source.
2026-01-13 09:21:43 -05:00

Gov_Travel_App

Overview

This repository contains a Python scraper that collects travel rate tables from the NJC and accommodation listings, then stores the raw tables and normalized entries in a SQLite database.

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e .

Run the scraper

python -m gov_travel.main --db data/travel_rates.sqlite3

The database includes:

  • raw_tables for every scraped HTML table.
  • rate_entries for parsed rate rows (country/city/province + rate fields).
  • exchange_rates for parsed currency rates.
  • accommodations for parsed lodging listings.

If a field is not detected by the heuristics, the full row is still preserved in raw_tables and the raw_json columns for deeper post-processing.

Description
No description provided
Readme 1.1 MiB
Languages
Python 100%