8 Commits

Author SHA1 Message Date
Eric Mill
0a27c8843a fix up tasks to pass pyflakes 2014-04-03 13:15:38 -04:00
Eric Mill
a7ff75442e Travis CI integration, with a workout script that imports every script to check for syntax errors. Also updates each script to make them import-able without executing. 2014-04-03 12:58:57 -04:00
Joshua Tauberer
b0781fc441 Python 3 porting: charset and related issues
* Let utils.download() decode the byte stream for all of the scripts. As a result, no need to decode elsewhere (committee_membership.py, house_contacts.py, house_websites.py, thomas_ids.py, wikipedia_ids.py), but sometimes we need to encode back to UTF-8 for parsing XML in case the XML has an encoding declaration (committee_membership.py).
* In utils.unescape, to get a unicode character for a code point in another character set, the Python 2 way with chr doesn't work in Python 3. Replaced with using bytes.
* In bioguide.py, no need to decode the stream (didn't seem to be necessary in Py 2 or Py 3), and with changes in utils.py it's now already decoded.
* The json module now operates over unicode strings (cspan.py).
* Let Python handle UTF-8 encoding when writing to disk, including in CSV outputs (alternate_bulk_formats.py, export_csv.py, wikipedia_ids.py).
* In rtyaml.py, the latest logic for maintaining a comment block at the top of the file was not working at all in Py 3 because io.open doesn't provide a stream with a peek method. Since we're now operating on a seekable stream during *output*, we don't need to peek anymore anyway. So I re-did this.
* When we compute hashes over files for cache freshness checking we must read the files in binary mode and when we save pickle files we must open those files in binary mode too (utils.py's yaml_load, yaml_dump).
2014-04-02 18:15:22 -04:00
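The following is a minimal sketch of two of the Python 3 changes described in the b0781fc441 message above. It is not code from the repository. The function names `codepoint_to_char` and `file_digest`, the `cp1252` default charset and the SHA-1 hash are hypothetical choices made for illustration. The first function shows decoding a code point in another character set via `bytes` instead of Python 2's `chr(...).decode(...)`. The second shows hashing a file opened in binary mode.

```python
import hashlib


def codepoint_to_char(code, charset="cp1252"):
    # Hypothetical helper: Python 2 could write chr(code).decode(charset),
    # but in Python 3 chr() already returns a str. Build a one-byte bytes
    # object and decode it in the target charset instead.
    # The cp1252 default is an assumption for illustration.
    return bytes([code]).decode(charset)


def file_digest(path):
    # Hypothetical helper: open in binary mode ("rb") so the hash covers the
    # raw bytes on disk, unaffected by text decoding or newline translation.
    # SHA-1 is an assumed choice; the commit message does not name the hash.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```

For example, `codepoint_to_char(0x93)` returns the left double quotation mark, which cp1252 places at byte 0x93.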
Joshua Tauberer
a1d5f0fa57 run 2to3 to start Python 2 => Python 3 conversion 2014-04-02 16:09:21 -04:00
Eric Mill
992865f20a executable flags 2013-10-31 16:01:47 -04:00
Joshua Tauberer
4c2a87e836 thomas_ids.py: update regex for change in href values (now full URLs not relative paths) 2013-02-08 07:40:05 -05:00
GovTrack.us
ef14d20d93 manually correct some THOMAS IDs: Congress.gov lists more than one person for a district in cases where they should have moved individuals to the Senate. The thomas_ids script should not be used until that is fixed. 2013-01-05 09:28:25 -05:00
GovTrack.us
9a0d9ee9c4 assign new THOMAS IDs 2013-01-04 08:58:39 -05:00