mirror of
https://github.com/mblanke/Gov_Travel_App.git
synced 2026-03-01 22:20:21 -05:00
969ba062f7df7ebe64dcd1d08f2e0d3aeff150d0
- Implemented alphabet navigation (A-Z) for the NJC international rates page
- Added request delays (2s) and retry logic with exponential backoff to avoid server timeouts
- Added error handling for pages without tables
- Installed html5lib for better HTML parsing
- Now scrapes 233 countries (up from 15) with 104 unique currencies
- Total of 11,628 international rate entries collected
- Added verification scripts to check all countries and their currencies
- Fixed currency extraction, which now works for EUR, USD, CAD, AUD, ARS, and 99+ other currencies
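The delay and retry behaviour mentioned in the commit message could look roughly like the sketch below. It is not the repository's actual code: `fetch_with_backoff`, its parameters, and the injected `sleep` callable are hypothetical names chosen for illustration, and only the 2-second base delay comes from the commit message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: `fetch` is any zero-argument callable that raises
    on failure, for example a function wrapping one page request.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            # Broad catch for brevity; a real scraper would likely narrow
            # this to timeout and connection errors.
            if attempt == retries - 1:
                raise  # out of attempts, surface the last error
            # Waits of base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; in normal use the default `time.sleep` applies.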
Gov_Travel_App
Overview
This repository contains a Python scraper that collects travel rate tables from the NJC and accommodation listings, then stores the raw tables and normalized entries in a SQLite database.
Setup
python -m venv .venv
source .venv/bin/activate
pip install -e .
Run the scraper
python -m gov_travel.main --db data/travel_rates.sqlite3
The database includes:
- raw_tables for every scraped HTML table.
- rate_entries for parsed rate rows (country/city/province + rate fields).
- exchange_rates for parsed currency rates.
- accommodations for parsed lodging listings.
If a field is not detected by the heuristics, the full row is still preserved in raw_tables and the raw_json columns for deeper post-processing.
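As a quick check after a run, the tables listed above can be inspected with Python's built-in sqlite3 module. The sketch below is illustrative and not part of the repository: `table_row_counts` is a hypothetical helper, and it reads table names from `sqlite_master` so that it makes no assumptions about the column layout.

```python
import sqlite3
from contextlib import closing


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    counts: dict[str, int] = {}
    with closing(sqlite3.connect(db_path)) as conn:
        # Discover tables instead of hard-coding them; skip SQLite internals.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for name in names:
            # Identifiers cannot be bound as parameters, so quote them.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
    return counts
```

For example, `table_row_counts("data/travel_rates.sqlite3")` would report how many rows landed in raw_tables, rate_entries, exchange_rates, and accommodations, assuming the scraper was run with the `--db` path shown earlier.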