Commit Graph

108 Commits

Author SHA1 Message Date
Eric Mill
63cb58f9e0 whitespace 2013-05-08 18:09:55 -04:00
Eric Mill
ef2de7a39f Fixed longstanding bug at only showing the most recent stack trace for other errors 2013-05-08 18:08:43 -04:00
Eric Mill
1e9391a239 Allow for fetching senate unprinted amendments 2013-05-08 17:56:32 -04:00
Joshua Tauberer
b9448d83ed committee meetings parser 2013-05-02 10:12:07 -04:00
Eric Mill
28ced678a9 Set a custom user agent for the project 2013-04-29 22:46:29 -04:00
Joshua Tauberer
9b01c418c2 when mirroring FDSys it is considerably faster (and less memory intensive) to shell out to wget to do the download rather than using scrapelib (but what about throttling?) 2013-04-04 14:53:05 +00:00
Eric Mill
311cd1720e Spit out the # of errors at the end no matter what. (Many errors can push the count up too high.) 2013-03-01 22:51:34 -06:00
Eric Mill
f1ed39011d whitespace 2013-03-01 22:48:33 -06:00
Chris Wilson
be78684018 Added error catching for split nomination ids 2013-02-03 14:09:10 -05:00
Joshua Tauberer
4dfb1475dd when saving GovTrack files, use the faster libyaml parser to load the congress-legislators data files 2013-01-29 13:45:21 -05:00
Chris Wilson
4ae205cf58 added option for POST data to download() 2013-01-26 13:03:37 -05:00
Chris Wilson
0795853be5 Added a few nomination functions directly analogous to bill functions 2013-01-26 11:36:53 -05:00
Eric Mill
d93634f532 Move the thomas ID correction out of just the govtrack export code and into the general data output 2013-01-25 11:26:19 -05:00
Eric Mill
67a12db5cc name correction 2013-01-24 10:18:19 -05:00
Joshua Tauberer
15c095c668 handle THOMAS providing an incorrect ID for Rep. C. A. Dutch Ruppersberger 2013-01-24 07:41:16 -05:00
Joshua Tauberer
92a58a9017 a little cleanup for GovTrack output when a THOMAS ID is not found 2013-01-24 07:37:53 -05:00
Eric Mill
75bd99e359 Added a bill_versions task that takes a full --congress, a --bill_id, or a --bill_version_id, and saves a text-versions/[version_code].json file with the data for each version: when it was issued, its code/id, and URLs to each published version of it. Uses fdsys.py for sitemap crawling and MODS doc parsing 2013-01-20 18:55:54 -05:00
Eric Mill
69df2aebdb Added a couple helper functions, made sure to transform congress into integer all the time 2013-01-20 16:55:53 -05:00
Eric Mill
782af88e85 Refactored download helper to have all extra options go through options hash, documented each option 2013-01-20 15:16:54 -05:00
Joshua Tauberer
04322a0029 fdsys: add a method to locally store mods, PDF, etc. and update when sitemap indicates changes 2013-01-20 11:08:46 -05:00
Joshua Tauberer
e181aed4ba for --fast, move the part that writes the cache to be after the bill is successfully parsed 2013-01-17 08:16:40 -05:00
Eric Mill
4865ce2ad5 Drastically simplified fast-caching process for bills, by handling all cache detection in the bill pagination process. This allows for the possibility that we could overlook a change if the script aborted between caching the new state and completing the bill fetching/output process. 2013-01-15 12:08:58 -05:00
GovTrack.us
e13c5564f6 add a --fast option for parsing bills using Derek's original idea of detecting many (but not all) changes to bills by looking at the search result listing content 2013-01-15 08:14:30 -05:00
GovTrack.us
0382fb66d7 support Python < 2.7 by removing the dict-creation syntax { a:b ... } 2013-01-15 07:23:44 -05:00
GovTrack.us
51a143d8d8 parsing amendments (hopefully) 2013-01-06 10:43:48 -05:00
GovTrack.us
63d678e722 change some print statements to logging.warn and use a special Exception class when a GovTrack ID lookup fails 2013-01-06 09:41:46 -05:00
GovTrack.us
d816441d37 partial revert of 4cbf3bc2fe which added a check if a vote file is changed (modulo updated_at) before saving it, but this logic is better handled by my GovTrack import scripts rather than here 2013-01-06 09:41:09 -05:00
GovTrack.us
4cbf3bc2fe 1) fix how congress-legislators repo is updated; 2) in the votes parser change the meaning of --force and add new option --fetch 2013-01-03 18:36:26 -05:00
Eric Mill
0d6bb0bae5 Update current_congress function to consider the first 2.5 days of the year as the last year 2013-01-02 16:47:32 -05:00
GovTrack.us
e851ae6073 revised vote IDs to use canonical session years (e.g. 2012) rather than session ordinals (e.g. 1, 2) 2013-01-02 09:59:49 -05:00
GovTrack.us
eda6f0cb5a correct daylight saving timezone handling so that the UTC offset in all serialized dates is correct with respect to whether DST was in effect at the time (and does not change anything else) 2012-12-30 17:52:12 -05:00
GovTrack.us
a95e2b38e0 refactoring mistake 2012-12-30 17:42:48 -05:00
GovTrack.us
459a37d838 replace the submodule with scripted clone/pull, which makes it easier to always be at the latest upstream commit and lets us control the clone depth 2012-12-30 15:47:49 -05:00
Joshua Tauberer
a21fc6e954 more on vote parsing 2012-12-30 13:24:35 -05:00
Joshua Tauberer
a1da46b78f starting a roll call votes parser, including some refactoring of existing code so it can be reused 2012-12-27 16:12:24 -05:00
Eric Mill
e611f15378 whitespace 2012-11-27 18:00:35 -05:00
GovTrack.us
e8497461c2 missed a few log calls in the merge 2012-11-24 11:07:26 -05:00
GovTrack.us
1ad6eb72e5 merge... uhm this was a complicated one, hopefully not breaking anything 2012-11-24 08:44:47 -05:00
Eric Mill
544afd79ff Moved committee mapping fetching to a utils method with a globally cached map, to remove it from the method signatures of the main process. Also added a (version controlled) cache dir in the test/fixtures folder so that tests don't hit the network 2012-11-15 19:09:49 -05:00
GovTrack.us
3c847d3f19 switch to using python logging module 2012-11-11 18:22:49 -05:00
GovTrack.us
5b6b6b1631 in the log util function, also treat unicode strings as plain strings to be printed 2012-10-31 18:04:12 -04:00
Eric Mill
b974ebcfc8 typo in comment 2012-10-01 17:30:15 -04:00
Eric Mill
22e59e2fad Incorporate control character removal into the unescape process of text, and bake that into the download/caching process 2012-09-08 16:54:59 -04:00
Eric Mill
030526bcea Remove unicode control chars from titles 2012-09-08 16:47:00 -04:00
Eric Mill
d6b4bc8e63 Fix date serialization for actions 2012-09-08 16:26:20 -04:00
Joshua Tauberer
b029e8c295 remove unused import (iso8601) 2012-09-08 14:12:42 -04:00
Eric Mill
387050ddf2 Some better error checking, and a fix for occasional 0-byte file downloads 2012-09-07 19:12:05 -04:00
Eric Mill
524c4d9fa5 Add a from_name attribute, and differentiate the subject line by time 2012-09-07 18:46:33 -04:00
Eric Mill
3137053e24 Made cache and data dirs configurable, made it so config.yml is read in only once 2012-09-07 14:44:59 -04:00
Eric Mill
724bf2c30f Swallowing errors around the admin logging function, don't want to end up in a loop 2012-09-07 14:30:00 -04:00