46 Commits

Author SHA1 Message Date
Joshua Tauberer
ce2c466470 2024 Election Results
From https://clerk.house.gov/xml/lists/unofficial-119-member-elect-data.xml, birthdays from Wikipedia, and review by the GovTrack team.

I think everyone was sworn in today except Sen. Justice who plans to take office on Jan 13, and consequently his bioguide ID is not available, so I've left him out.
2025-01-03 21:03:32 -05:00
Joshua Tauberer
9b8fa05f03 Fix character encoding issue reading senators_cfm.xml
Since Sen. Luján took office, this file's UTF-8 encoding became significant. It's been double-decoding(?) the text and I've been manually correcting it.
2023-11-14 10:34:07 -05:00
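The symptom described above, where accented letters come out garbled, typically happens when UTF-8 bytes are decoded with a single-byte codec. Below is a minimal illustration. It assumes Latin-1 is the wrong codec; the commit does not say which one was involved.

```python
# The UTF-8 bytes for a name containing a non-ASCII letter
raw = "Luján".encode("utf-8")  # b'Luj\xc3\xa1n'

# Wrong: treating each UTF-8 byte as its own character produces mojibake
garbled = raw.decode("latin-1")

# Right: decode exactly once, with the encoding the file actually uses
correct = raw.decode("utf-8")

# Repairing already-garbled text: re-encode with the wrong codec to get the
# original bytes back, then decode them correctly
repaired = garbled.encode("latin-1").decode("utf-8")
```

Here `garbled` is `"LujÃ¡n"` and both `correct` and `repaired` are `"Luján"`.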
Joshua Tauberer
0ef4483df9 Add tests that term start/end dates don't span more congresses than they should and fix all the data errors by consulting bioguide 2020-05-16 23:47:14 -04:00
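The check that commit describes can be sketched as follows. This is not the repository's actual test code. It assumes the post-1935 schedule, where the Nth Congress begins on January 3 of year 1787 + 2N. The function names, the per-chamber limits and the rule that a term's final date belongs to the previous Congress are all illustrative assumptions.

```python
from datetime import date, timedelta

def congress_for_date(d: date) -> int:
    """Number of the Congress in session on date d.

    Assumes the post-1935 schedule: the Nth Congress begins on January 3
    of year 1787 + 2N, and January 3 itself counts as the new Congress.
    """
    year = d.year
    if d < date(year, 1, 3):
        year -= 1  # Jan 1-2 still belong to the Congress in session last year
    if year % 2 == 0:
        year -= 1  # Congresses begin in odd-numbered years
    return (year - 1787) // 2

def congresses_spanned(start: date, end: date) -> int:
    # A term's end date usually equals the next Congress's start date
    # (e.g. 2025-01-03), so classify the day before it instead.
    last_day = end - timedelta(days=1)
    return congress_for_date(last_day) - congress_for_date(start) + 1

# Hypothetical limits keyed by term type: one Congress per House term,
# three per Senate term
MAX_CONGRESSES = {"rep": 1, "sen": 3}

def term_is_valid(term_type: str, start: date, end: date) -> bool:
    return congresses_spanned(start, end) <= MAX_CONGRESSES[term_type]
```

For example, `term_is_valid("rep", date(2023, 1, 3), date(2025, 1, 4))` is false. The extra day pushes the term into the 119th Congress.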
Forest Gregg
194e578fd7 Update historical committees and yaml file (#623) 2018-12-15 14:29:58 -05:00
Joel Collins
45a0ab5559 (FORCE) new commit with test data removed, generate json on saving data 2017-03-08 23:11:43 -05:00
Joel Collins
cf3dc8763a make redirecting less verbose 2017-02-27 14:07:44 -05:00
Joshua Tauberer
1afbea9df2 add a utility function to compute the start and end dates of congressional terms, by Congress number 2016-06-28 15:48:45 -04:00
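A utility like this could look roughly like the sketch below. The name `congress_dates` is hypothetical, and this is not the code added in that commit.

The sketch returns only nominal dates and ignores the irregular sittings of some early Congresses. It assumes the following rules:

* Terms began on March 4 until the 20th Amendment moved the start to January 3 with the 74th Congress (1935).
* The 73rd Congress ended early, on January 3, 1935.
* Each end date is the day the next Congress began, since the switchover happened at noon on that day.

```python
from datetime import date

def congress_dates(congress: int) -> tuple[date, date]:
    """Nominal (start, end) dates of a Congress, where 1 = the 1st Congress.

    Hypothetical helper; irregular early sessions are not modeled.
    """
    if congress < 1:
        raise ValueError("Congress numbers start at 1")
    start_year = 1787 + 2 * congress
    # March 4 starts through the 73rd Congress, January 3 from the 74th on
    start = date(start_year, 3, 4) if congress <= 73 else date(start_year, 1, 3)
    # March 4 ends through the 72nd; the 73rd and later end on January 3
    end = date(start_year + 2, 3, 4) if congress <= 72 else date(start_year + 2, 1, 3)
    return start, end
```

For instance, `congress_dates(119)` gives January 3, 2025 through January 3, 2027.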
Martin Burch
daa7161b8c Now 114th Congress 2015-04-15 16:22:45 -04:00
Sam Handler
1ca3e29ab1 Catch string formatting error in utils.download 2015-02-25 10:50:51 -05:00
Joshua Tauberer
177f2a2938 fix import issue in utils.py 2015-01-04 12:32:34 -05:00
Eric Mill
b296adedc1 update scrapelib to 0.10.0, which requires a reinstall. 2014-07-31 23:09:59 -04:00
Dan Drinkard
42ff8868f5 Sync contact forms from contact-congress 2014-06-11 16:39:43 -04:00
Eric Mill
9942dfb65b update user agent 2014-06-01 21:07:03 -04:00
Joshua Tauberer
3fcaa9f85f now that rtyaml is in pypi, remove its source here and add it to requirements.txt
Also reverting the recent change for preserving initial comment blocks (not committed to the version in pypi) in ddb73d42bd, 31f5c2e685, and e81a592b61.
And instead of those changes, just copy the initial comment block data over when writing the social media file.
2014-04-07 20:32:13 -04:00
Eric Mill
0a27c8843a fix up tasks to pass pyflakes 2014-04-03 13:15:38 -04:00
Joshua Tauberer
b0781fc441 Python 3 porting: charset and related issues
* Let utils.download() decode the byte stream for all of the scripts. As a result, no need to decode elsewhere (committee_membership.py, house_contacts.py, house_websites.py, thomas_ids.py, wikipedia_ids.py), but sometimes we need to encode back to UTF-8 for parsing XML in case the XML has an encoding declaration (committee_membership.py).
* In utils.unescape, to get a unicode character for a code point in another character set, the Python 2 way with chr doesn't work in Python 3. Replaced with using bytes.
* In bioguide.py, no need to decode the stream (didn't seem to be necessary in Py 2 or Py 3), and with changes in utils.py it's now already decoded.
* The json module now operates over unicode strings (cspan.py).
* Let Python handle UTF-8 encoding when writing to disk, including in CSV outputs (alternate_bulk_formats.py, export_csv.py, wikipedia_ids.py).
* In rtyaml.py, the latest logic for maintaining a comment block at the top of the file was not working at all in Py 3 because io.open doesn't provide a stream with a peek method. Since we're now operating on a seekable stream during *output*, we don't need to peek anymore anyway. So I re-did this.
* When we compute hashes over files for cache freshness checking we must read the files in binary mode and when we save pickle files we must open those files in binary mode too (utils.py's yaml_load, yaml_dump).
2014-04-02 18:15:22 -04:00
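The `utils.unescape` change above comes down to this. A numeric reference like `&#146;` in legacy HTML often means byte 0x92 in Windows-1252, a right single quote. In Python 3, `chr(146)` instead returns the C1 control character U+0092. Decoding a one-byte `bytes` object recovers the intended character.

The sketch below is a hypothetical helper, not the repository's `utils.unescape`. It assumes that only values 0x80 to 0x9F need the legacy-charset treatment. In that range Windows-1252 differs from Unicode; elsewhere in 0 to 255 the two agree.

```python
import re

def unescape_numeric(text: str, charset: str = "windows-1252") -> str:
    """Replace numeric character references such as '&#146;'.

    Values 0x80-0x9F are treated as bytes in a legacy charset;
    all other values are treated as Unicode code points.
    """
    def repl(m):
        code = int(m.group(1))
        if 0x80 <= code <= 0x9F:
            # chr(146) would give U+0092; decoding the single byte through
            # Windows-1252 gives U+2019 (right single quotation mark)
            return bytes([code]).decode(charset, errors="replace")
        return chr(code)
    return re.sub(r"&#(\d+);", repl, text)
```

With this, `unescape_numeric("Don&#146;t")` yields `Don’t`, and `&#233;` still becomes `é`.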
Joshua Tauberer
a1d5f0fa57 run 2to3 to start Python 2 => Python 3 conversion 2014-04-02 16:09:21 -04:00
Eric Mill
31f5c2e685 Patch up comment block preservation to not depend on the original object getting passed from load to save, by reading comment block at write-time. Also removes the need for another YAML serializer. 2014-03-30 19:26:25 -04:00
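The commit above reads the comment block at write time instead of carrying it on the loaded object. That approach could look roughly like the sketch below. The helper names are hypothetical. YAML serialization is left to the caller, who passes in text that is already serialized.

```python
import os

def read_leading_comments(path: str) -> str:
    """Return the run of '#' comment lines at the top of an existing file."""
    if not os.path.exists(path):
        return ""
    block = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.startswith("#"):
                break
            block.append(line)
    return "".join(block)

def save_with_comments(path: str, serialized: str) -> None:
    """Write `serialized` (e.g. YAML text), keeping the file's existing header."""
    header = read_leading_comments(path)  # must happen before "w" truncates it
    with open(path, "w", encoding="utf-8") as f:
        f.write(header)
        f.write(serialized)
```

Because the header is read from disk just before the write, callers no longer need to pass the original loaded object from load to save.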
Joshua Tauberer
36ae4bc9d4 properly decode the Bioguide's Windows-1252 encoding 2014-01-14 10:40:36 -05:00
Eric Mill
992865f20a executable flags 2013-10-31 16:01:47 -04:00
bchartoff
f0133a7bb9 Scripts to create csv and json legislator bulk files
json script is straightforward (and easily adaptable to other yamls),
csv script tightly linked to data.
2013-08-05 13:09:27 -04:00
Eric Mill
6abbc6380c merging 2013-07-30 11:58:05 -04:00
bchartoff
16aa96e350 updated icpsr.py 2013-07-30 11:34:07 -04:00
Eric Mill
3a0404f78b merge resolution 2013-07-11 14:47:52 -04:00
Eric Mill
54472d433e Adding the ability to email things to an administrator, and used this in the social media lead generator script 2013-07-11 14:43:58 -04:00
Joshua Tauberer
9c442117bc move some of the Yaml routines into a new module called rtyaml (round-trippable yaml) 2013-07-03 18:27:38 -04:00
Eric Mill
2ee19a91fe Catch meta redirects. 2013-07-02 20:20:41 -04:00
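A page can redirect through an HTML `<meta http-equiv="refresh">` tag instead of an HTTP status. In that case an HTTP client will not follow it automatically. Below is a minimal regex-based sketch for detecting such a redirect. It is not the repository's implementation and assumes that `http-equiv` appears before `content` in the tag. An HTML parser would be more robust.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches e.g. <meta http-equiv="refresh" content="0; url=/new/page">
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*"""
    r"""content\s*=\s*["']\s*\d*\s*;\s*url\s*=\s*([^"'>]+)""",
    re.IGNORECASE,
)

def meta_redirect_target(html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL a meta refresh points to, or None."""
    m = META_REFRESH.search(html)
    if m is None:
        return None
    # Relative targets are resolved against the page's own URL
    return urljoin(base_url, m.group(1).strip())
```

A download routine could call this on each fetched page and, when it returns a URL, fetch that URL instead.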
Eric Mill
2961215c8e Add a user agent 2013-06-13 12:07:46 -04:00
Eric Mill
985e940030 Catch pickle errors, and add option to use plain urllib 2013-06-13 11:58:38 -04:00
Joshua Tauberer
f619d59b5d add wikipedia page names to the ID field, by scanning for pages using the CongBio and CongLinks templates and keying off of bioguide IDs 2013-04-06 17:50:32 -04:00
Eric Mill
5e0c72df45 Increased rate limit, NYT's rate limiter auto-emailed me to complain 2013-03-20 16:38:27 -04:00
Joshua Tauberer
7c8f489f84 updating committees-historical (mostly just indicating some committees present in 113th Congress, but also THOMAS made some name corrections) 2013-03-19 13:38:10 -04:00
Joshua Tauberer
2292033318 update our YAML dumper to use tildes for nulls, so our one tilde doesn't get changed on output from a script 2013-03-19 13:28:21 -04:00
Eric Mill
b22e973791 Use the SafeLoader to avoid crazy serialization security issues 2013-02-01 15:10:28 -05:00
Joshua Tauberer
99518273d7 use the faster libyaml parser if it is available 2013-01-29 13:34:46 -05:00
GovTrack.us
2de1484d8a ensure all strings that look like integers are quoted (only needed for zero-lead octal-integer-like strings with '8' or '9' in the value which would previously omit quotes) 2013-01-15 11:28:32 -05:00
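Some background on the commit above. Under YAML 1.1, which PyYAML follows, `0123` resolves as an octal integer, so it is already quoted when dumped as a string. `089` matches no 1.1 integer pattern, so it is emitted bare, even though other parsers or readers may take it as a number.

Below is a minimal sketch of the simpler rule: quote every string made only of digits and an optional sign. The regex and function name are hypothetical. The sketch covers only this one check; a real YAML emitter handles many other quoting cases.

```python
import re

# Optionally signed runs of digits, including zero-led values like '089'
INTEGER_LIKE = re.compile(r"[-+]?[0-9]+")

def quote_if_integer_like(value: str) -> str:
    """Single-quote a string scalar if it could be read back as an integer."""
    if INTEGER_LIKE.fullmatch(value):
        return "'" + value + "'"
    return value
```

For example, `quote_if_integer_like("089")` returns `'089'`, while `H001` passes through unchanged.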
Eric Mill
0853617624 Beginnings of a bioguide parser 2013-01-04 16:14:41 -05:00
GovTrack.us
e1e528db07 add retire.py script to make it easier to deal with moving legislators into the historical file 2012-12-17 19:05:27 -05:00
Joshua Tauberer
48d5514d80 make loading YAML faster (it is so slow) by caching it in pickle'd format 2012-12-01 17:16:45 -05:00
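A pickle cache like the one described can be keyed on a hash of the source file, so stale caches are detected automatically. Below is a minimal sketch. The SHA-1 digest, the `.pickle` suffix and the pluggable `parse` function are assumptions, not the repository's exact scheme. Both the source file and the pickle file are opened in binary mode. This matters in Python 3: the hash must cover the exact bytes on disk, and pickle only reads and writes bytes.

```python
import hashlib
import pickle

def load_cached(path: str, parse):
    """Parse `path` with `parse` (e.g. a slow YAML loader), reusing a pickled
    result stored in '<path>.pickle' while the source file is unchanged."""
    with open(path, "rb") as f:  # binary: hash the exact bytes on disk
        raw = f.read()
    digest = hashlib.sha1(raw).hexdigest()
    cache_path = path + ".pickle"
    try:
        with open(cache_path, "rb") as f:
            cached_digest, data = pickle.load(f)
        if cached_digest == digest:
            return data
    except (OSError, EOFError, ValueError, TypeError, pickle.UnpicklingError):
        pass  # missing or unreadable cache: fall through and re-parse
    data = parse(raw.decode("utf-8"))
    with open(cache_path, "wb") as f:
        pickle.dump((digest, data), f)
    return data
```

A second call on an unchanged file skips `parse` entirely. Any edit to the file changes the digest and forces a re-parse.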
Eric Mill
7c94e7afc3 Fixed up some weird top level fields, probably my mistake 2012-11-27 13:57:59 -05:00
Eric Mill
4955831411 Using same utils stuff in committee membership script 2012-11-08 23:15:35 -05:00
Eric Mill
f203857c43 Merged changes 2012-11-08 22:24:01 -05:00
Eric Mill
88813f318e Default to not caching, allow override 2012-11-08 22:19:38 -05:00
Eric Mill
44206eb09c Borrowed some utils functions from the THOMAS scraper, updated README about running scripts with virtualenv 2012-11-08 22:00:07 -05:00
GovTrack.us
bd8a3c6444 update current House committee metadata from House Clerk pages using a new committee membership scraper (House only so far) 2012-11-06 17:53:21 -05:00
Eric Mill
1d2d1b5afb Moving scripts from congress-legislators-scripts in 2012-11-06 11:28:01 -05:00