Member data from https://clerk.house.gov/xml/lists/unofficial-119-member-elect-data.xml, birthdays from Wikipedia, and review by the GovTrack team.
I think everyone was sworn in today except Sen. Justice, who plans to take office on Jan 13. His bioguide ID is therefore not yet available, so I've left him out.
Since Sen. Luján took office, this file's UTF-8 encoding has become significant, because his name contains a non-ASCII character. The text seems to have been getting double-decoded (I'm not sure), and I've been correcting it by hand.
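For reference, this is roughly what double-decoding does to the name. It is a sketch that assumes the UTF-8 bytes were being decoded a second time as Latin-1; the actual cause in the pipeline wasn't confirmed.

```python
# Hypothetical reproduction of the suspected double-decoding.
name = "Luján"

# Suspected bug: UTF-8 bytes decoded once more as Latin-1.
garbled = name.encode("utf-8").decode("latin-1")
print(garbled)  # LujÃ¡n

# Undo the extra decode: back to the original bytes, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Luján
```

If the wrong codec was Windows-1252 rather than Latin-1, the same round trip works for this name, since both map bytes 0xC3 and 0xA1 to the same characters.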
Also reverting the recent changes for preserving initial comment blocks (never included in the version published on PyPI) in ddb73d42bd, 31f5c2e685, and e81a592b61.
Instead of those changes, just copy the initial comment block over when writing the social media file.
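A minimal sketch of copying the comment block, assuming the existing social media file starts with a run of `#` lines and the new YAML body is produced separately. `initial_comment_block` and `with_copied_header` are hypothetical names, not functions from the repository.

```python
from itertools import takewhile


def initial_comment_block(text: str) -> str:
    """Return the leading lines of text that start with '#'."""
    lines = text.splitlines(keepends=True)
    return "".join(takewhile(lambda line: line.startswith("#"), lines))


def with_copied_header(existing_text: str, new_yaml: str) -> str:
    """Prepend the existing file's comment block to freshly dumped YAML."""
    return initial_comment_block(existing_text) + new_yaml


existing = "# comment line 1\n# comment line 2\n- id: old\n"
print(with_copied_header(existing, "- id: new\n"), end="")
```

Keeping the copy outside the YAML library means the dumper never has to know about comments at all.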
* Let utils.download() decode the byte stream for all of the scripts. As a result, there's no need to decode elsewhere (committee_membership.py, house_contacts.py, house_websites.py, thomas_ids.py, wikipedia_ids.py), but sometimes we need to encode back to UTF-8 before parsing XML, in case the XML has an encoding declaration (committee_membership.py).
* In utils.unescape, the Python 2 way of getting the Unicode character for a code point in another character set, using chr, doesn't work in Python 3. Replaced it with bytes.
* In bioguide.py, there's no need to decode the stream (it didn't seem to be necessary in Py 2 or Py 3), and with the changes in utils.py it's now already decoded.
* The json module now operates on Unicode strings (cspan.py).
* Let Python handle the UTF-8 encoding when writing to disk, including CSV output (alternate_bulk_formats.py, export_csv.py, wikipedia_ids.py).
* In rtyaml.py, the latest logic for maintaining a comment block at the top of the file did not work at all in Py 3, because io.open in text mode doesn't provide a stream with a peek method. Since we're now operating on a seekable stream during *output*, we don't need to peek anymore anyway, so I redid this.
* When we compute hashes over files for cache-freshness checks, we must read the files in binary mode, and when we save pickle files we must open them in binary mode too (yaml_load and yaml_dump in utils.py).
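A sketch of the decode-once, re-encode-for-XML pattern from the utils.download() bullet, using the standard library's ElementTree. The helper name `parse_xml_text` is hypothetical, and the sketch assumes the document's declaration says UTF-8. Parsers such as lxml reject a str that carries an encoding declaration, which is why the text goes back to bytes before parsing.

```python
import xml.etree.ElementTree as ET


def parse_xml_text(text: str) -> ET.Element:
    """Parse XML that has already been decoded to str.

    The text is re-encoded to UTF-8 bytes so the parser sees bytes that
    agree with an encoding="UTF-8" declaration, if one is present.
    """
    return ET.fromstring(text.encode("utf-8"))


# Stand-in for a downloaded document: bytes off the wire...
raw = '<?xml version="1.0" encoding="UTF-8"?>\n<member name="Luján"/>'.encode("utf-8")
# ...decoded exactly once, as a decoding download() helper would do.
text = raw.decode("utf-8")

root = parse_xml_text(text)
print(root.tag, root.get("name"))  # member Luján
```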
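The utils.unescape change, shown as a hypothetical standalone helper. Windows-1252 (`cp1252`) is an assumption for illustration (in that charset, byte 150 is an en dash); the bullet doesn't say which character set utils.unescape actually targets.

```python
def char_from_charset(code: int, charset: str = "cp1252") -> str:
    """Decode a single byte value as a character in another character set."""
    # Python 2 could do chr(code).decode(charset) because chr returned a byte.
    # In Python 3, chr(code) already returns a Unicode code point, so build a
    # one-byte bytes object and decode that instead.
    return bytes([code]).decode(charset)


print(hex(ord(char_from_charset(150))))  # 0x2013 (EN DASH)
print(hex(ord(chr(150))))                # 0x96, a C1 control character
```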
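One way to keep a top-of-file comment block without `peek()`, assuming the output file is opened read/write so the stream is seekable. This is a sketch of the idea in the rtyaml bullet, not rtyaml's actual code; `dump_preserving_comments` is a hypothetical name.

```python
import io


def dump_preserving_comments(stream, body: str) -> None:
    """Rewrite a seekable text stream, keeping its leading '#' comment block.

    Reads the existing header with readline(), then seeks back to the start,
    so no peek() is needed.
    """
    stream.seek(0)
    header = []
    while True:
        line = stream.readline()
        if not line.startswith("#"):  # also stops at EOF, where line == ""
            break
        header.append(line)
    stream.seek(0)
    stream.write("".join(header) + body)
    stream.truncate()  # drop leftovers if the new content is shorter


buf = io.StringIO("# header\n# more\nold: 1\n")
dump_preserving_comments(buf, "x: 1\n")
print(buf.getvalue(), end="")
```

For a real file, the same function can be given `open(path, "r+", encoding="utf-8")`; an existing file with no comments simply gets an empty header.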
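A sketch of both file-handling rules from the list: text mode with an explicit UTF-8 encoding for CSV output, and binary mode for hashing and pickling. The helper names are hypothetical stand-ins, not the repository's yaml_load/yaml_dump, and SHA-1 is an assumed choice of hash.

```python
import csv
import hashlib
import os
import pickle
import tempfile


def write_csv(path: str, rows) -> None:
    # Text mode with an explicit encoding: Python handles the UTF-8 encoding.
    # The csv module docs ask for newline="" so it controls line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def file_digest(path: str) -> str:
    # Hash the exact bytes on disk, so read in binary mode.
    # SHA-1 is an assumption; any hashlib algorithm works the same way.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def save_cache(path: str, obj) -> None:
    # pickle writes bytes, so the cache file must be opened with "wb".
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_cache(path: str):
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "example.csv")
    write_csv(csv_path, [["name"], ["Luján"]])
    with open(csv_path, "rb") as f:
        raw = f.read()
    digest = file_digest(csv_path)

    cache_path = os.path.join(tmp, "example.pickle")
    save_cache(cache_path, {"digest": digest})
    cached = load_cache(cache_path)

print(raw)  # b'name\r\nLuj\xc3\xa1n\r\n'
```

Opening the hashed file in text mode would hash decoded, newline-translated text rather than the bytes on disk, so the digest could disagree with what is actually stored.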