Mirror of https://github.com/mblanke/Gov_Travel_App.git (synced 2026-03-01 14:10:22 -05:00)

mblanke 15094ac94b Add Python web scraper for NJC travel rates with currency extraction

- Implemented a Python scraper using BeautifulSoup and pandas to automatically collect travel rates from the official NJC website
- Added currency extraction from table titles (supports EUR, USD, AUD, CAD, ARS, etc.)
- Added country extraction from table titles for international rates
- Flattened pandas MultiIndex columns into single-level column names for a cleaner data structure
- Defaulted to CAD for domestic Canadian sources (accommodation and domestic tables)
- Created SQLite database schema (raw_tables, rate_entries, exchange_rates, accommodations)
- Successfully scraped 92 tables with 17,205 rate entries covering 25 international cities
- Added migration script to convert scraped data to Node.js database format
- Updated .gitignore for Python files (.venv/, __pycache__, *.pyc, *.sqlite3)
- Fixed city validation and currency conversion in main app
- Added comprehensive debug and verification scripts

This replaces manual JSON maintenance with automated data collection from the official government source.

2026-01-13 09:21:43 -05:00
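The commit above describes extracting the currency and country from each table's title, with domestic Canadian sources defaulting to CAD. The sketch below shows one way to do that with regular expressions. It is not the scraper's actual code: the title format `<Country> - <City> (<CODE>)`, the `KNOWN_CURRENCIES` whitelist, and the function names are all assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical whitelist; the real scraper may recognise more ISO 4217 codes.
KNOWN_CURRENCIES = {"EUR", "USD", "AUD", "CAD", "ARS"}

# Any standalone run of three uppercase letters, e.g. "(ARS)" or "- USD".
CURRENCY_PATTERN = re.compile(r"\b([A-Z]{3})\b")


def extract_currency(title: str, domestic: bool = False) -> Optional[str]:
    """Return the first known currency code in a table title.

    Domestic Canadian tables fall back to CAD when no code is present;
    international tables without a code return None.
    """
    for match in CURRENCY_PATTERN.finditer(title):
        code = match.group(1)
        if code in KNOWN_CURRENCIES:  # skip other acronyms such as "NJC"
            return code
    return "CAD" if domestic else None


def extract_country(title: str) -> Optional[str]:
    """Assumes titles shaped like '<Country> - <anything>'; returns the country part."""
    head, sep, _ = title.partition(" - ")
    return head.strip() if sep else None
```

Checking matches against a whitelist, rather than accepting any three uppercase letters, keeps acronyms like "NJC" from being mistaken for currency codes.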

Gov_Travel_App

Overview

This repository contains a Python scraper that collects travel rate tables from the National Joint Council (NJC), along with accommodation listings, and stores both the raw tables and the normalized entries in a SQLite database.

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e .

Run the scraper

python -m gov_travel.main --db data/travel_rates.sqlite3
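According to the commit message, one step the scraper performs is flattening pandas MultiIndex columns. pandas produces multi-level headers when an HTML table has stacked header rows, and iterating over a MultiIndex yields one tuple per column. The helper below is a hypothetical sketch of that flattening (it is not taken from `gov_travel`); it drops repeated header levels and the "Unnamed: ..." placeholders pandas inserts for empty header cells.

```python
from typing import Any, Iterable, List


def flatten_columns(columns: Iterable[Any]) -> List[str]:
    """Join multi-level column labels into single strings.

    Tuples (from a MultiIndex) are joined level by level; plain labels
    (from a regular Index) pass through as strings.
    """
    flat = []
    for col in columns:
        parts = col if isinstance(col, tuple) else (col,)
        cleaned: List[str] = []
        for part in parts:
            text = str(part).strip()
            # Skip blanks, pandas placeholders, and levels already seen.
            if not text or text.startswith("Unnamed:") or text in cleaned:
                continue
            cleaned.append(text)
        flat.append(" ".join(cleaned))
    return flat
```

With a DataFrame `df` from `pandas.read_html`, usage would be `df.columns = flatten_columns(df.columns)`.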

The database includes the following tables:

- raw_tables: every scraped HTML table.
- rate_entries: parsed rate rows (country/city/province plus rate fields).
- exchange_rates: parsed currency exchange rates.
- accommodations: parsed lodging listings.
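To make the four tables more concrete, here is a hypothetical schema. Only the table names and the `raw_json` column name come from this README; every other column, the "CAD per unit of foreign currency" meaning of `rate_to_cad`, and the `open_db`/`to_cad` helpers are assumptions, not the project's actual schema or code.

```python
import sqlite3

# Hypothetical schema: table names and raw_json come from the README,
# all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS raw_tables (
    id INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    title TEXT,
    raw_json TEXT NOT NULL           -- the whole scraped table
);
CREATE TABLE IF NOT EXISTS rate_entries (
    id INTEGER PRIMARY KEY,
    raw_table_id INTEGER REFERENCES raw_tables(id),
    country TEXT,
    city TEXT,
    province TEXT,
    rate_type TEXT,
    amount REAL,
    currency TEXT,
    raw_json TEXT NOT NULL           -- the original row, kept for post-processing
);
CREATE TABLE IF NOT EXISTS exchange_rates (
    id INTEGER PRIMARY KEY,
    currency TEXT NOT NULL UNIQUE,
    rate_to_cad REAL NOT NULL,       -- assumed: CAD per 1 unit of currency
    raw_json TEXT
);
CREATE TABLE IF NOT EXISTS accommodations (
    id INTEGER PRIMARY KEY,
    city TEXT,
    province TEXT,
    name TEXT,
    rate REAL,
    currency TEXT DEFAULT 'CAD',
    raw_json TEXT NOT NULL
);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database and make sure all four tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def to_cad(conn: sqlite3.Connection, amount: float, currency: str) -> float:
    """Convert an amount to CAD using the stored exchange rate."""
    if currency == "CAD":
        return amount
    row = conn.execute(
        "SELECT rate_to_cad FROM exchange_rates WHERE currency = ?", (currency,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no exchange rate stored for {currency}")
    return amount * row[0]
```

Raising on a missing rate, rather than silently returning the unconverted amount, avoids mixing currencies in totals.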

If the parsing heuristics do not detect a field, the full row is still preserved in raw_tables and in the raw_json columns, so it remains available for deeper post-processing.
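A sketch of that kind of post-processing follows. It assumes each `raw_json` value holds a JSON object keyed by the original column header (the real serialization may differ) and that `rate_entries` has a nullable `city` column. `backfill_city` and the `"City"` key are hypothetical. Parsing is done with Python's `json` module, so the code does not rely on SQLite's JSON functions.

```python
import json
import sqlite3


def backfill_city(conn: sqlite3.Connection, raw_key: str = "City") -> int:
    """Fill rate_entries.city from raw_json where the heuristics left it NULL.

    raw_key is a hypothetical header name; returns the number of rows updated.
    """
    rows = conn.execute(
        "SELECT id, raw_json FROM rate_entries WHERE city IS NULL"
    ).fetchall()
    updated = 0
    for row_id, raw in rows:
        record = json.loads(raw)
        value = record.get(raw_key) if isinstance(record, dict) else None
        if value:
            conn.execute(
                "UPDATE rate_entries SET city = ? WHERE id = ?", (value, row_id)
            )
            updated += 1
    conn.commit()
    return updated
```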