mblanke 969ba062f7 Add alphabet navigation to scraper - now collects ALL 233 countries
- Implemented alphabet navigation (A-Z) for NJC international rates page
- Added request delays (2s) and retry logic with exponential backoff to avoid server timeouts
- Added error handling for pages without tables
- Installed html5lib for better HTML parsing
- Now scrapes 233 countries (up from 15) with 104 unique currencies
- Total 11,628 international rate entries collected
- Added verification scripts to check all countries and their currencies
- Fixed currency extraction, which now works for EUR, USD, CAD, AUD, ARS, and 99+ other currencies
2026-01-13 09:27:21 -05:00
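
The commit notes above mention request delays and retry logic with exponential backoff. The repository's actual implementation is not shown here, so the following is only a minimal sketch of that pattern. The name `fetch_with_retry` and its parameters are hypothetical; the 2 s base delay is taken from the commit message, and the retry count of 3 is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    retries: int = 3,            # assumed value, not from the repository
    base_delay: float = 2.0,     # the 2 s delay mentioned in the commit notes
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on network-style errors with doubling waits."""
    attempt = 0
    while True:
        try:
            return fetch()
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            if attempt >= retries:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With the standard library, a call might look like `fetch_with_retry(lambda: urllib.request.urlopen(url, timeout=30).read())`, where `url` is whichever page is being scraped. Passing `sleep` as a parameter keeps the function easy to test without real waiting.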

Gov_Travel_App

Overview

This repository contains a Python scraper that collects travel rate tables and accommodation listings from the National Joint Council (NJC) website, then stores both the raw tables and the normalized entries in a SQLite database.

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e .

Run the scraper

python -m gov_travel.main --db data/travel_rates.sqlite3

The database includes:

  • raw_tables for every scraped HTML table.
  • rate_entries for parsed rate rows (country/city/province + rate fields).
  • exchange_rates for parsed currency rates.
  • accommodations for parsed lodging listings.
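
After a run, a quick way to check what was collected is to count the rows in each of the tables listed above. This is a minimal sketch using only the standard `sqlite3` module; the table names come from the list above, while the helper name `table_counts` is hypothetical and not part of the package.

```python
import sqlite3

# Table names as listed in this README.
TABLES = ("raw_tables", "rate_entries", "exchange_rates", "accommodations")


def table_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the row count of each expected table that actually exists."""
    existing = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    counts: dict[str, int] = {}
    for table in TABLES:
        if table in existing:
            # Safe to interpolate: names come from the fixed tuple above.
            row = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            counts[table] = row[0]
    return counts
```

For example, `table_counts(sqlite3.connect("data/travel_rates.sqlite3"))` returns a dictionary keyed by table name. Tables missing from the file are skipped rather than raising an error.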

If a field is not detected by the heuristics, the full row is still preserved in raw_tables and the raw_json columns for deeper post-processing.
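
The post-processing mentioned above could start by pulling the preserved rows for entries where a parsed field came back empty. The sketch below assumes that `raw_json` holds JSON text and that the table has a nullable column for the field in question. The helper name `raw_rows_missing` and the `currency` column used in the usage note are hypothetical; the real schema may differ.

```python
import json
import sqlite3


def raw_rows_missing(conn: sqlite3.Connection, table: str, field: str) -> list:
    """Decode raw_json for every row of `table` where `field` is NULL."""
    for name in (table, field):
        # Identifiers cannot be bound as SQL parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    query = f"SELECT raw_json FROM {table} WHERE {field} IS NULL"
    # Assumes raw_json contains valid JSON text for each row.
    return [json.loads(raw) for (raw,) in conn.execute(query)]
```

A call such as `raw_rows_missing(conn, "rate_entries", "currency")` would then return the original row data for entries whose currency was not detected, ready to be re-parsed with a different heuristic.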
