Eric Mill
3fea0bd0c3
mkdir_p as necessary when walking disk
2017-02-25 19:50:03 -05:00
Eric Mill
7b0976cd59
ignore python-version
2017-02-25 19:49:40 -05:00
Eric Mill
0d9d4b238c
ignore pyenv locking
2017-02-25 19:39:41 -05:00
Eric Mill
729d3219c0
https links
2017-02-25 19:39:25 -05:00
Joshua Tauberer
126ed1bbcb
upcoming_house_floor: scan future week-of postings too because there's often info for the next week while the current week showing on docs.house.gov is up
2017-02-19 08:19:47 -05:00
Joshua Tauberer
1b089b0fc0
committee_meetings: don't die if there's an invalid event ID
2017-02-19 08:18:28 -05:00
Joshua Tauberer
f5da6e31fe
there's a new(?) Motion to Proceed to Legislative Session vote type in the Senate that is getting an awkward question value tied to a nomination when the vote is likely about moving on from executive session to other matters
2017-02-03 10:08:07 -05:00
Joshua Tauberer
6ae006398e
don't raise an XML parsing exception if docs.house.gov doesn't have a download link
2017-01-19 16:34:43 -05:00
Joshua Tauberer
517398f0ff
fix votes task autodetection of the current session: we're still in the 2016 legislative year until Jan 3 at noon
2017-01-01 08:05:04 -05:00
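The session rule in 517398f0ff can be illustrated with a minimal sketch. This is not the code from the commit: the function name `current_legislative_year` is hypothetical, and the sketch assumes the supplied time is already Washington, D.C. local time.

```python
from datetime import datetime


def current_legislative_year(now=None):
    """Return the legislative year that a given moment falls in.

    Hypothetical helper, not the project's actual function. Assumes `now`
    is a naive datetime expressed in Washington, D.C. local time.
    """
    now = now or datetime.now()
    # Until noon on January 3, the previous calendar year's session
    # is still treated as the current one.
    if (now.month, now.day, now.hour) < (1, 3, 12):
        return now.year - 1
    return now.year
```

Comparing `(month, day, hour)` tuples keeps the cutoff in a single expression; any time from noon on January 3 onward falls through to the calendar year.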
Joshua Tauberer
53e946f6e3
fix upstream data error in s2943-114 which is missing 'as amended' in a House vote
2016-12-27 06:27:14 -05:00
Joshua Tauberer
3a0aa64127
add regex to detect when a bill is enacted by the ten day rule and mark as status ENACTED:TENDAYRULE
...
"Sent to Archivist of the United States unsigned." indicates this. I had hard-coded bill numbers in the past, but it happened again with hr6297-114 and so now I'm doing a proper fix. I can't remove the old hard-coded bills because I can't test the change because we can no longer fetch the data from THOMAS.
2016-12-26 10:48:57 -05:00
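The detection described in 3a0aa64127 might look roughly like the sketch below. The actual regex from the commit is not shown in this log, so the pattern and the names `TEN_DAY_RULE_RE` and `status_for_action` are assumptions made for illustration; only the action phrase and the status string come from the commit message.

```python
import re

# Assumed pattern: matches the action phrase quoted in the commit message.
# The real regex used by the scraper may differ.
TEN_DAY_RULE_RE = re.compile(
    r"Sent to Archivist of the United States unsigned", re.IGNORECASE
)


def status_for_action(action_text):
    """Return "ENACTED:TENDAYRULE" if the action text indicates the bill
    became law without the president's signature, otherwise None.

    Hypothetical helper, not the project's actual function.
    """
    if TEN_DAY_RULE_RE.search(action_text):
        return "ENACTED:TENDAYRULE"
    return None
```

Matching on the action text rather than on a hard-coded list of bill numbers means future bills enacted this way, such as hr6297-114, would be picked up without a code change.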
Joshua Tauberer
97dd2a42cd
senate votes should use the 'vote_title' field as our 'question' when the vote is on cloture
2016-12-25 10:16:32 -05:00
Joshua Tauberer
dc041db5f6
fix parsing of historical vote legislator lookup to not mind if a legislator has two terms on Jan 3
...
After recent updates to congress-legislator historical start/end dates, we began getting:
Multiple matches of name Slaughter (VA-R; 1991-01-03) to legislators (excludes set([])).
[h1-102.1991] Missing bioguide ID and name lookup failed for Slaughter (VA-R on 1991-01-03 12:02:00)
Exception: No bioguide ID for Slaughter (VA-R)
But there weren't really multiple legislators matching, just multiple terms.
(There are other new cases of multiple legislators matching now though too.)
2016-12-25 10:16:32 -05:00
Joshua Tauberer
260b4c880c
add --force flag to bills scraper to re-parse everything from the (existing) fdsys XML
2016-12-25 10:16:32 -05:00
Joshua Tauberer
a284a20e99
add committee_reports to bill output
...
e.g., for hr2028-114:
"committee_reports": [
"H. Rept. 114-91",
"S. Rept. 114-54"
],
2016-12-25 10:16:32 -05:00
Bill Hunt
f2b9fbe0dc
Fix aggregation of results on import. (#193)
2016-12-16 20:44:34 -05:00
Joshua Tauberer
8da077083a
Merge pull request #190 from unitedstates/josh
...
drop THOMAS IDs from output, replace with bioguide in XML outputs
2016-12-11 08:52:57 -05:00
Joshua Tauberer
4c35e1b5a8
store CRPT (committee reports) in the congress directories rather than in fdsys/CRPT/year/...
2016-12-11 08:01:24 -05:00
Joshua Tauberer
8d56a630dd
drop THOMAS IDs from output, replace with bioguide in XML outputs
2016-12-03 14:12:53 -05:00
Joshua Tauberer
24ddb45639
tweak vote category regexes
2016-12-03 13:46:07 -05:00
Joshua Tauberer
37b8e67c60
deab2f384d broke fdsys: successful downloads were treated as unknown errors and files were being refetched over and over
2016-12-03 13:46:07 -05:00
Bill Hunt
6389102d02
Return list of new files from fdsys functions (#187)
2016-10-23 17:45:11 -04:00
Bill Hunt
5ceb7b2d27
Handle .DS_Store files (#186)
...
* Add handling for .DS_Store files in OS X
2016-10-23 12:10:00 -04:00
Bill Hunt
510c10fa29
Add filter for Congressional session (#185)
...
* Adding congress filter for BILLSTATUS
2016-10-22 11:47:42 -04:00
Joshua Tauberer
bd86189e0a
put ENACTED:VETO_OVERRIDE on the final override vote action rather than on the OFR public law number action to be more parallel to when ENACTED is applied to bills signed by the president, and because this is more convenient for GovTrack; see s2040-114
2016-10-07 16:04:17 -04:00
Joshua Tauberer
69f118cee6
Merge pull request #179 from divergentdave/urlretrieve
...
Use scrapelib.urlretrieve()
2016-09-30 11:38:16 -04:00
Joshua Tauberer
dc4087688f
bill subjects moved to a new field in the August 2016 updates to the bulk data
2016-09-02 19:43:16 -04:00
David Cook
deab2f384d
Use urlretrieve() instead of wget, speed up FDsys
2016-08-29 18:57:20 -05:00
Joshua Tauberer
1c7c7ba0b8
upcoming_house_floor: Handle "Senate amendment to the House amendment to" bill descriptions
2016-07-13 12:20:22 -04:00
Joshua Tauberer
ee776621e8
fdsys: I deleted too much during the refactor. This puts back writing the bill text version data.json files which extract important MODS metadata fields.
2016-07-13 12:20:22 -04:00
Joshua Tauberer
ec61a3a255
fdsys: If nothing new was fetched, then there is no reason to update the lastmod file for every bill.
...
It was updating the lastmod JSON file for every bill on every run, even though the files were mostly not changing.
Big speedup. Fewer disk writes.
2016-07-13 12:20:22 -04:00
Joshua Tauberer
f54f61490d
Merge pull request #175 from unitedstates/fdsys_billstatus_data
...
Replace THOMAS bill/amdt scrapers with bulk data importer
2016-07-07 13:00:06 -04:00
Joshua Tauberer
7719660f10
forgot to replace utils.get_govtrack_person_id with utils.translate_legislator_id in vote govtrack-compatible XML output
2016-07-06 12:14:35 -04:00
Joshua Tauberer
c458e71e5b
amendment's introduced_at has been a date, so lopping off the time portion
2016-07-03 10:40:23 -04:00
Joshua Tauberer
68940b0e53
forgot to add xmltodict to requirements.txt
2016-07-02 19:13:12 -04:00
Joshua Tauberer
d58df048d3
replace THOMAS scraper with USGPO bill status XML importer
...
There is no longer a separate amendments scraper. Amendments are saved as a part of importing bills. Amendments to treaties are no longer available.
Some of this work was done by @crdunwel.
2016-07-01 08:47:57 -04:00
Joshua Tauberer
48c7b3c3ac
merge branch 'fdsys_redo'
2016-06-30 17:51:03 -04:00
Joshua Tauberer
b042b4febc
fdsys: no need to scrape for a list of bulk data collections, there's a master sitemap, see #170
2016-06-30 17:49:34 -04:00
Joshua Tauberer
47a5e9bc49
whitelist another bill enacted by the ten-day rule, from the 93rd Congress
2016-06-29 08:04:59 -04:00
Joshua Tauberer
8e0aed16c4
a House roll call vote line is missing the vote tally in H.R. 2577/114th
...
> On agreeing to the conference report Agreed to by the Yeas and Nays: (Roll No. 342).
2016-06-23 08:50:16 -04:00
Joshua Tauberer
dcd1b56bd5
parse House ping-pong vote pursuant to rule
...
in H.R. 2577 114th:
> House agreed to Senate amendment with amendment pursuant to H.Res. 751
2016-06-23 08:37:17 -04:00
Joshua Tauberer
a0761b4ae9
correct an upstream data error in an action line for s2012-114 (it's missing 'as amended')
2016-06-14 08:43:10 -04:00
Joshua Tauberer
5e723c2080
add 'House Amendment to' pattern to upcoming_house_floor
2016-06-06 07:00:15 -04:00
Joshua Tauberer
52651a211c
fdsys: add a 'filter' option to only save certain packages/files
2016-05-27 09:20:42 -04:00
Joshua Tauberer
b8227b0d4a
fdsys: the timestamp that appears in the bulk data listing page is not an indication of lastmod dates
...
Each bulk data sitemap root must be downloaded individually to see if there were any updates.
2016-05-27 09:19:46 -04:00
Joshua Tauberer
a7364ab062
my fix in c6b85a1153 was wrong, also adding validation to bill_id_for in upcoming house bills scraper
2016-05-25 10:06:51 -04:00
Joshua Tauberer
c6b85a1153
docs.house.gov has inconsistent case in descriptions of senate amendments to bills
2016-05-24 10:24:03 -04:00
Joshua Tauberer
f5423f4be7
sometimes Senate vote index pages come back as a 404 - catch that before getting an XML parsing error
2016-04-26 19:09:08 -04:00
Joshua Tauberer
79f1994089
re-write the FDSys scraper
...
It now can download bulk data files using the bulk data sitemaps.
It's also a bit cleaner / more maintainable.
Dropped some code no one was using.
2016-03-22 09:53:27 -04:00
Eric Mill
6e983befae
Merge pull request #168 from unitedstates/fix-travis
...
Fix Travis CI by dropping OS X builds
2016-03-14 02:52:03 -04:00