Commit Graph

682 Commits

Author SHA1 Message Date
Eric Mill
3fea0bd0c3 mkdir_p as necessary when walking disk 2017-02-25 19:50:03 -05:00
Eric Mill
7b0976cd59 ignore python-version 2017-02-25 19:49:40 -05:00
Eric Mill
0d9d4b238c ignore pyenv locking 2017-02-25 19:39:41 -05:00
Eric Mill
729d3219c0 https links 2017-02-25 19:39:25 -05:00
Joshua Tauberer
126ed1bbcb upcoming_house_floor: scan future week-of postings too because there's often info for the next week while the current week showing on docs.house.gov is up 2017-02-19 08:19:47 -05:00
Joshua Tauberer
1b089b0fc0 committee_meetings: don't die if there's an invalid event ID 2017-02-19 08:18:28 -05:00
Joshua Tauberer
f5da6e31fe there's a new(?) Motion to Proceed to Legislative Session vote type in the Senate that is getting an awkward question value tied to a nomination when the vote is likely about moving on from executive session to other matters 2017-02-03 10:08:07 -05:00
Joshua Tauberer
6ae006398e don't raise an xml parsing exception if docs.house.gov doesn't have a download link 2017-01-19 16:34:43 -05:00
Joshua Tauberer
517398f0ff fix votes task autodetection of the current session: we're still in the 2016 legislative year until Jan 3 at noon 2017-01-01 08:05:04 -05:00
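The Jan 3 boundary described in this commit can be sketched roughly as follows. This is a minimal illustration, not the votes task's actual code: the function name `legislative_year` is hypothetical, and it assumes the datetime passed in is already in Washington local time.

```python
from datetime import datetime


def legislative_year(now):
    """Return the legislative year for a naive datetime (assumed local time).

    Hypothetical helper: before noon on Jan 3, the previous calendar
    year's session is still treated as the current one.
    """
    rollover = datetime(now.year, 1, 3, 12, 0, 0)
    if now < rollover:
        return now.year - 1
    return now.year
```

Under these assumptions, a run on the morning of Jan 1, 2017 would still resolve to the 2016 legislative year.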
Joshua Tauberer
53e946f6e3 fix upstream data error in s2943-114 which is missing 'as amended' in a House vote 2016-12-27 06:27:14 -05:00
Joshua Tauberer
3a0aa64127 add regex to detect when a bill is enacted by the ten day rule and mark as status ENACTED:TENDAYRULE
"Sent to Archivist of the United States unsigned." indicates this. I had hard-coded bill numbers in the past, but it happened again with hr6297-114 and so now I'm doing a proper fix. I can't remove the old hard-coded bills because I can't test the change because we can no longer fetch the data from THOMAS.
2016-12-26 10:48:57 -05:00
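A minimal sketch of the kind of check this commit describes. Only the quoted action text and the status string come from the commit message; the pattern, the case-insensitive flag, and the function name `status_for_action` are assumptions for illustration, not the scraper's actual code.

```python
import re

# The phrase is quoted in the commit message; the exact pattern is an assumption.
TEN_DAY_RULE = re.compile(
    r"Sent to Archivist of the United States unsigned", re.IGNORECASE
)


def status_for_action(action_text):
    """Return "ENACTED:TENDAYRULE" if the action text matches, else None."""
    if TEN_DAY_RULE.search(action_text):
        return "ENACTED:TENDAYRULE"
    return None
```

Matching on the action text, rather than on a list of bill numbers, is what lets a new case like hr6297-114 be picked up without another code change.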
Joshua Tauberer
97dd2a42cd senate votes should use the 'vote_title' field as our 'question' when the vote is on cloture 2016-12-25 10:16:32 -05:00
Joshua Tauberer
dc041db5f6 fix parsing of historical vote legislator lookup to not mind if a legislator has two terms on Jan 3
After recent updates to congress-legislator historical start/end dates, we began getting:

Multiple matches of name Slaughter (VA-R; 1991-01-03) to legislators (excludes set([])).
[h1-102.1991] Missing bioguide ID and name lookup failed for Slaughter (VA-R on 1991-01-03 12:02:00)
Exception: No bioguide ID for Slaughter (VA-R)

But there weren't really multiple legislators matching, just multiple terms.

(There are other new cases of multiple legislators matching now though too.)
2016-12-25 10:16:32 -05:00
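The distinction between multiple terms and multiple legislators can be sketched like this. It is an illustration under assumptions, not the lookup code itself: `unique_legislator` and the tuple layout are hypothetical, and the IDs used with it should be treated as placeholders.

```python
def unique_legislator(matching_terms):
    """Collapse matching terms to a single legislator ID.

    `matching_terms` is a hypothetical list of
    (bioguide_id, term_start, term_end) tuples for terms that match the
    name, state, and party and that cover the vote date.
    """
    # Two terms of the same person (one ending and one starting on Jan 3)
    # share an ID, so they collapse to a single entry in the set.
    ids = {bioguide_id for bioguide_id, _start, _end in matching_terms}
    if len(ids) == 1:
        return ids.pop()
    if not ids:
        raise ValueError("no legislator matched")
    raise ValueError("multiple legislators matched: " + ", ".join(sorted(ids)))
```

With this approach, a genuine ambiguity (two different people matching) still raises an error, which is consistent with the note above that other new cases of multiple legislators matching remain.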
Joshua Tauberer
260b4c880c add --force flag to bills scraper to re-parse everything from the (existing) fdsys XML 2016-12-25 10:16:32 -05:00
Joshua Tauberer
a284a20e99 add committee_reports to bill output
e.g., for hr2028-114:

  "committee_reports": [
    "H. Rept. 114-91",
    "S. Rept. 114-54"
  ],
2016-12-25 10:16:32 -05:00
Bill Hunt
f2b9fbe0dc Fix aggregation of results on import. (#193) 2016-12-16 20:44:34 -05:00
Joshua Tauberer
8da077083a Merge pull request #190 from unitedstates/josh
drop THOMAS IDs from output, replace with bioguide in XML outputs
2016-12-11 08:52:57 -05:00
Joshua Tauberer
4c35e1b5a8 store CRPT (committee reports) in the congress directories rather than in fdsys/CRPT/year/... 2016-12-11 08:01:24 -05:00
Joshua Tauberer
8d56a630dd drop THOMAS IDs from output, replace with bioguide in XML outputs 2016-12-03 14:12:53 -05:00
Joshua Tauberer
24ddb45639 tweak vote category regexes 2016-12-03 13:46:07 -05:00
Joshua Tauberer
37b8e67c60 deab2f384d broke fdsys: successful downloads were treated as unknown errors and files were being refetched over and over 2016-12-03 13:46:07 -05:00
Bill Hunt
6389102d02 Return list of new files from fdsys functions (#187) 2016-10-23 17:45:11 -04:00
Bill Hunt
5ceb7b2d27 Handle .DS_Store files (#186)
* Add handling for .DS_Store files in OS X
2016-10-23 12:10:00 -04:00
Bill Hunt
510c10fa29 Add filter for Congressional session (#185)
* Adding congress filter for BILLSTATUS
2016-10-22 11:47:42 -04:00
Joshua Tauberer
bd86189e0a put ENACTED:VETO_OVERRIDE on the final override vote action rather than on the OFR public law number action to be more parallel to when ENACTED is applied to bills signed by the president, and because this is more convenient for GovTrack; see s2040-114 2016-10-07 16:04:17 -04:00
Joshua Tauberer
69f118cee6 Merge pull request #179 from divergentdave/urlretrieve
Use scrapelib.urlretrieve()
2016-09-30 11:38:16 -04:00
Joshua Tauberer
dc4087688f bill subjects moved to a new field in the August 2016 updates to the bulk data 2016-09-02 19:43:16 -04:00
David Cook
deab2f384d Use urlretrieve() instead of wget, speed up FDsys 2016-08-29 18:57:20 -05:00
Joshua Tauberer
1c7c7ba0b8 upcoming_house_floor: Handle "Senate amendment to the House amendment to" bill descriptions 2016-07-13 12:20:22 -04:00
Joshua Tauberer
ee776621e8 fdsys: I deleted too much during the refactor. This puts back writing the bill text version data.json files which extract important MODS metadata fields. 2016-07-13 12:20:22 -04:00
Joshua Tauberer
ec61a3a255 fdsys: If nothing new was fetched, then there is no reason to update the lastmod file for every bill.
It was updating the lastmod JSON file for every bill on every run, even though the files were mostly not changing.

Big speedup. Fewer disk writes.
2016-07-13 12:20:22 -04:00
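The idea of skipping the write when nothing was fetched might look roughly like the sketch below. The function name, its parameters, and the JSON layout are all hypothetical; this is not the fdsys task's actual code.

```python
import json


def save_lastmod_if_needed(path, lastmod, fetched_files):
    """Write the lastmod JSON file only when something new was fetched.

    Returns True if the file was written. All names here are hypothetical.
    """
    if not fetched_files:
        return False  # nothing new was fetched, so skip the disk write
    with open(path, "w") as f:
        json.dump(lastmod, f, indent=2, sort_keys=True)
    return True
```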
Joshua Tauberer
f54f61490d Merge pull request #175 from unitedstates/fdsys_billstatus_data
Replace THOMAS bill/amdt scrapers with bulk data importer
2016-07-07 13:00:06 -04:00
Joshua Tauberer
7719660f10 forgot to replace utils.get_govtrack_person_id with utils.translate_legislator_id in vote govtrack-compatible XML output 2016-07-06 12:14:35 -04:00
Joshua Tauberer
c458e71e5b amendment's introduced_at has been a date, so lopping off the time portion 2016-07-03 10:40:23 -04:00
Joshua Tauberer
68940b0e53 forgot to add xmltodict to requirements.txt 2016-07-02 19:13:12 -04:00
Joshua Tauberer
d58df048d3 replace THOMAS scraper with USGPO bill status XML importer
There is no longer a separate amendments scraper. Amendments are saved as a part of importing bills. Amendments to treaties are no longer available.

some of this work was done by @crdunwel
2016-07-01 08:47:57 -04:00
Joshua Tauberer
48c7b3c3ac merge branch 'fdsys_redo' 2016-06-30 17:51:03 -04:00
Joshua Tauberer
b042b4febc fdsys: no need to scrape for a list of bulk data collections, there's a master sitemap, see #170 2016-06-30 17:49:34 -04:00
Joshua Tauberer
47a5e9bc49 whitelist another bill enacted by the ten-day rule, from the 93rd Congress 2016-06-29 08:04:59 -04:00
Joshua Tauberer
8e0aed16c4 a House roll call vote line is missing the vote tally in H.R. 2577/114th
> On agreeing to the conference report Agreed to by the Yeas and Nays: (Roll No. 342).
2016-06-23 08:50:16 -04:00
Joshua Tauberer
dcd1b56bd5 parse House ping-pong vote pursuant to rule
in H.R. 2577 114th:

> House agreed to Senate amendment with amendment pursuant to H.Res. 751
2016-06-23 08:37:17 -04:00
Joshua Tauberer
a0761b4ae9 correct an upstream data error in an action line for s2012-114 (it's missing 'as amended') 2016-06-14 08:43:10 -04:00
Joshua Tauberer
5e723c2080 add 'House Amendment to' pattern to upcoming_house_floor 2016-06-06 07:00:15 -04:00
Joshua Tauberer
52651a211c fdsys: add a 'filter' option to only save certain packages/files 2016-05-27 09:20:42 -04:00
Joshua Tauberer
b8227b0d4a fdsys: the timestamp that appears in the bulk data listing page is not an indication of lastmod dates
Each bulk data sitemap root must be downloaded individually to see if there were any updates.
2016-05-27 09:19:46 -04:00
Joshua Tauberer
a7364ab062 my fix in c6b85a1153 was wrong, also adding validation to bill_id_for in upcoming house bills scraper 2016-05-25 10:06:51 -04:00
Joshua Tauberer
c6b85a1153 docs.house.gov has inconsistent case in descriptions of senate amendments to bills 2016-05-24 10:24:03 -04:00
Joshua Tauberer
f5423f4be7 sometimes Senate vote index pages come back as a 404 - catch that before getting an XML parsing error 2016-04-26 19:09:08 -04:00
Joshua Tauberer
79f1994089 re-write the FDSys scraper
It now can download bulk data files using the bulk data sitemaps.

It's also a bit cleaner / more maintainable.

Dropped some code no one was using.
2016-03-22 09:53:27 -04:00
Eric Mill
6e983befae Merge pull request #168 from unitedstates/fix-travis
Fix Travis CI by dropping OS X builds
2016-03-14 02:52:03 -04:00