Commit Graph

130 Commits

Author SHA1 Message Date
bchartoff
16aa96e350 updated icpsr.py 2013-07-30 11:34:07 -04:00
bchartoff
9f489cf1f6 removed scripts/build 2013-07-25 17:15:06 -04:00
bchartoff
0cf2f340f9 updated ICPSR IDs 2013-07-25 17:11:31 -04:00
pulled and matched ICPSR IDs from roll call source data
bchartoff
cb9ea0eb83 Handled bioguide IDs with no IE ID 2013-06-13 10:59:29 -04:00
bchartoff
f2df26e53c Updated CRP_ID.py 2013-06-11 11:16:09 -04:00
bchartoff
a017bf0537 Revert "Update CRP_ID to no longer require lxml scraping" 2013-06-11 10:54:15 -04:00
This reverts commit 678f2b8a14.
bchartoff
678f2b8a14 Update CRP_ID to no longer require lxml scraping 2013-06-11 10:52:43 -04:00
Eric Mill
16737a7f65 Removing some cruft 2013-06-11 10:21:19 -04:00
Eric Mill
897022a186 Remove .DS_Store 2013-06-11 10:19:37 -04:00
Eric Mill
0f02e9ecca Refactor to use json directly for IE download 2013-06-11 10:19:24 -04:00
bchartoff
39f1eced9f unchanged, but this was why it wasn't synching 2013-06-11 09:20:46 -04:00
bchartoff
6d49f83d39 added script to update CRP IDs from IE API 2013-06-10 15:19:37 -04:00
Jeremy Carbaugh
30720d6745 Added method to resolve Facebook graph IDs from usernames and updated social media YAML 2013-06-01 03:33:26 -04:00
Joshua Tauberer
6586c98fa4 Mark Sanford was sworn in. Moved him from the historical file using a new helper script 'untire.py' (a pun on un-retire) 2013-05-16 07:14:17 -04:00
Joshua Tauberer
54d1d95e82 in committee-membership-current, sorting members first by party (majority first) then by rank and updating the NYT scraper to produce the same ordering when run (but I didn't rerun the script for this commit, just sorted what we had) --- the purpose of this commit is to match the sort order of the main membership scraper 2013-05-06 10:43:08 -04:00
Joshua Tauberer
6526705363 update committee metadata, with updated committee_membership.py scraper that now uses House committee pages again which we thought were discontinued but actually still exist and are up to date 2013-05-06 10:32:46 -04:00
Derek Willis
d5a3064247 added script to scrape House history IDs 2013-04-12 15:56:28 -04:00
Eric Mill
13d6758a29 use https urls 2013-04-08 14:47:14 -04:00
Joshua Tauberer
f619d59b5d add wikipedia page names to the ID field, by scanning for pages using the CongBio and CongLinks templates and keying off of bioguide IDs 2013-04-06 17:50:32 -04:00
Eric Mill
5e0c72df45 Increased rate limit, NYT's rate limiter auto-emailed me to complain 2013-03-20 16:38:27 -04:00
Eric Mill
70dc79e681 Fixed issue where it was always caching NYT committee members 2013-03-20 16:01:58 -04:00
Eric Mill
5ba221a5c3 Fix to include House Intelligence Committee 2013-03-20 15:47:19 -04:00
Eric Mill
d67656ccab Trying it again, this time not destroying Joint committee memberships 2013-03-20 15:12:26 -04:00
Eric Mill
b9bb3e6d89 Revert "I believe I have the House committee members (for top-level committees) from the NYT API" 2013-03-20 15:10:55 -04:00
This reverts commit ce24710938.
Eric Mill
ce24710938 I believe I have the House committee members (for top-level committees) from the NYT API 2013-03-20 15:09:11 -04:00
Joshua Tauberer
7c8f489f84 updating committees-historical (mostly just indicating some committees present in 113th Congress, but also THOMAS made some name corrections) 2013-03-19 13:38:10 -04:00
Joshua Tauberer
2292033318 update our YAML dumper to use tildes for nulls, so our one tilde doesn't get changed on output from a script 2013-03-19 13:28:21 -04:00
Eric Mill
c8bee96146 blacklisting a campaign account 2013-02-27 19:52:51 -06:00
Eric Mill
6170889712 fix bug for youtube sweeping, and add some better patterns 2013-02-15 17:46:26 -05:00
Eric Mill
a2207af108 Updated black and whitelists for youtube 2013-02-15 17:46:07 -05:00
Eric Mill
5e99a9a155 Caught an old url pattern 2013-02-15 13:27:17 -05:00
Eric Mill
c170311684 A couple more blacklists 2013-02-15 13:27:01 -05:00
Eric Mill
73ac7c38a6 Wasn't meant to be committed 2013-02-15 13:26:49 -05:00
Eric Mill
7b5ea31f66 A couple of campaign account things 2013-02-15 12:06:30 -05:00
Eric Mill
e1a1f7297f Switch field name from facebook_graph to facebook 2013-02-15 11:55:51 -05:00
Eric Mill
ddeff12220 Starting from scratch on Facebook accounts 2013-02-15 02:12:30 -05:00
Eric Mill
a6e346af39 Fixed up regexes and process for outputting facebook information 2013-02-14 18:59:49 -05:00
Eric Mill
30fbabe4e0 Lots more facebook blacklist entries 2013-02-14 18:58:59 -05:00
Eric Mill
46204f2c32 print out candidate URL in social media spreadsheet, add a pages regex for facebook, fix unicode error for outputting names 2013-02-14 17:54:33 -05:00
Eric Mill
8eb664a84d Updated committee memberships for the Senate, included a couple of committee name changes, a couple new subcommittees on the Judiciary Committee, and some temporary code in the memberships script to leave the House data as-is for the time being 2013-02-14 17:41:35 -05:00
Joshua Tauberer
4c2a87e836 thomas_ids.py: update regex for change in href values (now full URLs not relative paths) 2013-02-08 07:40:05 -05:00
Joshua Tauberer
a4a9166420 update senate contact info 2013-02-06 09:42:18 -05:00
Eric Mill
b22e973791 Use the SafeLoader to avoid crazy serialization security issues 2013-02-01 15:10:28 -05:00
Joshua Tauberer
99518273d7 use the faster libyaml parser if it is available 2013-01-29 13:34:46 -05:00
Eric Mill
57f36be968 Removing some unneeded code, and adding in some output - but the script still can't output correct data until the House and Senate are both up 2013-01-28 17:57:14 -05:00
Eric Mill
5cf0c787cc Made the blacklist more specific, it was cutting out an account 2013-01-28 16:30:38 -05:00
Eric Mill
9954cfd134 campaign account to blacklist 2013-01-28 14:33:19 -05:00
Eric Mill
a51a269466 runnable photo resize script 2013-01-25 19:23:48 -05:00
Eric Mill
3cbe861b1d Script to also get additional fields from the other Senate member XML source 2013-01-25 11:51:08 -05:00
Eric Mill
d32b869f73 Sweeping old out-of-office people from committee memberships, as a temporary measure until we get updated people 2013-01-24 12:56:21 -05:00