mirror of
https://github.com/mblanke/Gov_Travel_App.git
synced 2026-03-01 14:10:22 -05:00
# Gov_Travel_App

## Overview

This repository contains a Python scraper that collects travel rate tables from the National Joint Council (NJC) and accommodation listings, then stores both the raw tables and normalized entries in a SQLite database.

## Setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
## Run the scraper

```bash
python -m gov_travel.main --db data/travel_rates.sqlite3
```
### Optional flags

- `--sources international domestic accommodations` to limit which sources are scraped.
- `--pause 1.5` to pause between processing tables.
- `--log-level DEBUG` to increase logging verbosity.
- `--no-scrape` to skip scraping and work only with existing database data.
- The `GOV_TRAVEL_USER_AGENT="YourOrg/1.0"` environment variable to override the default user agent.
## Export an estimate to Excel

After data exists in SQLite (from a previous scrape), export a cost-estimate workbook:
```bash
python -m gov_travel.main \
  --db data/travel_rates.sqlite3 \
  --no-scrape \
  --export-estimate-xlsx output/travel_estimate.xlsx \
  --estimate-days 5 \
  --estimate-rate-type meal \
  --estimate-country Canada \
  --estimate-city Ottawa \
  --estimate-lodging-per-night 235 \
  --estimate-transport-total 175 \
  --estimate-misc-total 80
```
Workbook sheets:

- `estimate_summary`: days, the recommended meal allowance, line-item subtotals, and the grand total.
- `matched_rate_entries`: the source rows used to derive the allowance recommendation.
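For reference, the grand total in the summary sheet is presumably composed along these lines. This is a sketch, not the tool's actual code: the 120.00/day meal allowance is invented (in the real tool it comes from matched rate entries in the database), and nights are assumed equal to days:

```python
# Sketch of how the estimate totals appear to be composed. The
# 120.00/day meal allowance is invented for illustration; the real
# value would be derived from matched rate entries in the database.
def estimate_total(days, meal_per_day, lodging_per_night,
                   transport_total, misc_total):
    meals = days * meal_per_day
    lodging = days * lodging_per_night  # assumes nights == days
    return meals + lodging + transport_total + misc_total

# Using the flag values from the example command above:
total = estimate_total(days=5, meal_per_day=120.00,
                       lodging_per_night=235,
                       transport_total=175, misc_total=80)
print(total)  # 2030.0
```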
## Database contents

The database includes:

- `raw_tables` for every scraped HTML table.
- `rate_entries` for parsed rate rows (country/city/province plus rate fields).
- `exchange_rates` for parsed currency exchange rates.
- `accommodations` for parsed lodging listings.
If a field is not detected by the heuristics, the full row is still preserved in `raw_tables` and in the `raw_json` columns for deeper post-processing.
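As a sketch of that fallback path (the `raw_tables` and `raw_json` names follow this README, but the exact schema and the sample values are assumptions):

```python
import json
import sqlite3

# Illustrative schema only: the real raw_tables layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_tables (id INTEGER PRIMARY KEY, source TEXT, raw_json TEXT)"
)
conn.execute(
    "INSERT INTO raw_tables (source, raw_json) VALUES (?, ?)",
    ("international",
     json.dumps({"Country": "Canada", "City": "Ottawa", "Dinner": "61.80"})),
)

# Recover a field the row-parsing heuristics may have missed.
for source, raw in conn.execute("SELECT source, raw_json FROM raw_tables"):
    row = json.loads(raw)
    print(source, row.get("Dinner"))  # international 61.80
```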
## Suggested next improvements

- Add automated tests for the parser heuristics and the estimate-export path.
- Add currency conversion to estimate exports using `exchange_rates` so totals can be normalized to CAD.
- Add source-level freshness metadata to avoid duplicate inserts when scraping repeatedly.
- Expose the estimate export in a small web UI for non-technical users.
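The currency-conversion idea could look roughly like this. The rate values and the dict structure here are invented for illustration; in the real tool they would be read from the `exchange_rates` table:

```python
# Hypothetical sketch of the suggested currency-conversion improvement.
# These rates are invented; real values would come from exchange_rates.
RATES_TO_CAD = {"USD": 1.35, "EUR": 1.47, "CAD": 1.0}

def to_cad(amount: float, currency: str) -> float:
    """Normalize an estimate line item to CAD."""
    return round(amount * RATES_TO_CAD[currency], 2)

print(to_cad(100.0, "USD"))  # 135.0
```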