Commit Graph

187 Commits

Author SHA1 Message Date
Joshua Tauberer
53e946f6e3 fix upstream data error in s2943-114 which is missing 'as amended' in a House vote 2016-12-27 06:27:14 -05:00
Joshua Tauberer
3a0aa64127 add regex to detect when a bill is enacted by the ten day rule and mark as status ENACTED:TENDAYRULE
"Sent to Archivist of the United States unsigned." indicates this. I had hard-coded bill numbers in the past, but it happened again with hr6297-114 and so now I'm doing a proper fix. I can't remove the old hard-coded bills because I can't test the change because we can no longer fetch the data from THOMAS.
2016-12-26 10:48:57 -05:00
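A minimal sketch of the detection described in this commit message, assuming the check is a plain regex search over the action text. The pattern, the constant name `TEN_DAY_RULE` and the function `status_for_action` are hypothetical; the commit's actual regex is not shown on this page.

```python
import re

# Hypothetical pattern built from the action text quoted in the commit message.
TEN_DAY_RULE = re.compile(r"Sent to Archivist of the United States unsigned", re.I)


def status_for_action(text):
    """Return "ENACTED:TENDAYRULE" when the action text signals the ten-day rule, else None."""
    if TEN_DAY_RULE.search(text):
        return "ENACTED:TENDAYRULE"
    return None
```

Matching on the action text rather than on a list of bill numbers means a future bill with the same action line would be picked up without a code change, which is the motivation the message gives.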
Joshua Tauberer
a284a20e99 add committee_reports to bill output
e.g., for hr2028-114:

  "committee_reports": [
    "H. Rept. 114-91",
    "S. Rept. 114-54"
  ],
2016-12-25 10:16:32 -05:00
Joshua Tauberer
8d56a630dd drop THOMAS IDs from output, replace with bioguide in XML outputs 2016-12-03 14:12:53 -05:00
Joshua Tauberer
bd86189e0a put ENACTED:VETO_OVERRIDE on the final override vote action rather than on the OFR public law number action to be more parallel to when ENACTED is applied to bills signed by the president, and because this is more convenient for GovTrack; see s2040-114 2016-10-07 16:04:17 -04:00
Joshua Tauberer
d58df048d3 replace THOMAS scraper with USGPO bill status XML importer
There is no longer a separate amendments scraper. Amendments are saved as a part of importing bills. Amendments to treaties are no longer available.

some of this work was done by @crdunwel
2016-07-01 08:47:57 -04:00
Joshua Tauberer
47a5e9bc49 whitelist another bill enacted by the ten-day rule, from the 93rd Congress 2016-06-29 08:04:59 -04:00
Joshua Tauberer
8e0aed16c4 a House roll call vote line is missing the vote tally in H.R. 2577/114th
> On agreeing to the conference report Agreed to by the Yeas and Nays: (Roll No. 342).
2016-06-23 08:50:16 -04:00
Joshua Tauberer
dcd1b56bd5 parse House ping-pong vote pursuant to rule
in H.R. 2577 114th:

> House agreed to Senate amendment with amendment pursuant to H.Res. 751
2016-06-23 08:37:17 -04:00
Joshua Tauberer
a0761b4ae9 correct an upstream data error in an action line for s2012-114 (it's missing 'as amended') 2016-06-14 08:43:10 -04:00
Clayton Dunwell
81bc022dfa Add source url in JSON data
It would be nice to have the url from which the data is scraped.
2015-03-11 12:00:09 -04:00
Joshua Tauberer
ad92e1b63b when parsing committees out of action line text, avoid warnings in cases where it's clearly not a committee reference 2014-12-10 12:49:41 -05:00
Joshua Tauberer
305dd8fd1d new bill relations parsed: causes X to be laid on the table, passed by virtue of, enrollment has been corrected by virtue of 2014-12-10 12:48:51 -05:00
Joshua Tauberer
bebe61b359 second-order ping-pong Senate votes were not parsed, and some first-order ping-pong votes because of a "the"
e.g.:

    Senate agreed to the House amendment to the Senate amendment
2014-12-10 12:47:55 -05:00
Joshua Tauberer
3592f7e08a refactor parsing vote action lines
Merge similar regex patterns. Also fixes parsing:

    Submitted in the Senate, read twice, considered, and agreed to

which didn't quite match:

    Submitted in the Senate, considered, and agreed to
2014-12-10 11:51:00 -05:00
Joshua Tauberer
65334bc0fc tidy vote action line parsing, split regex across multiple lines and sort the parts of the patterns 2014-12-10 11:30:57 -05:00
Joshua Tauberer
699fb63c74 'On motion that the House agree with an amendment' was not being treated like 'as amended', i.e. a ping pong vote 2014-12-09 20:58:00 -05:00
Joshua Tauberer
f0bd0f66a6 missing an action regex for a conference report passed under suspension 2014-07-31 16:50:43 -04:00
Gregory Petukhov
0191deb2c7 Implemented parsing the "by request" feature 2014-07-17 01:06:24 +07:00
Joshua Tauberer
960f0c7957 add a --diff option for bills and votes to test if output has changed
When --diff is specified (for bills & votes), instead of writing output files to
disk, we run a diff over the existing file and the new content and display the
diff. This is handy for testing.

At the same time I'm removing my previous preserve_update_time flag which
I had been using for a similar purpose, but this new method is much easier.

reverts 5122ad6f966ba5899a0758ed92d81ca779314c7f
2014-06-18 09:59:06 -04:00
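The --diff behaviour described above could be sketched roughly as follows, using the standard library's `difflib`. The function name `diff_output` and the default `path` label are assumptions for illustration, not the project's actual code.

```python
import difflib


def diff_output(existing_text, new_text, path="data.json"):
    """Return a unified diff between the file on disk and the new content.

    An empty string means the output would not change.
    """
    lines = difflib.unified_diff(
        existing_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=path + " (on disk)",
        tofile=path + " (new)",
    )
    return "".join(lines)
```

A caller would print the result instead of writing the new file, so a parser change can be checked against existing output without touching it.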
Joshua Tauberer
429a82401b when looking for committees in bill action lines, "House|Senate" was duplicated in the regex 2014-06-02 11:37:25 -04:00
Joshua Tauberer
7d45f43bed new caused-action/action-caused-by bill relations 2014-06-02 09:17:43 -04:00
Eric Mill
1d2b57a026 Test for a new variant on cloture motions 2014-05-16 17:55:39 -04:00
Will Van Wazer
04d494856f Automated PEP8 refactoring with autopep8. 2014-04-28 22:39:50 -04:00
Joshua Tauberer
7b47095d19 have ENACTED:SIGNED be triggered at 'Signed by President', add a new code ENACTED:TENDAYRULE
Per #106, ENACTED:SIGNED should be triggered when the President signs a bill, which is
when the bill seems to actually become law. The "Became Public Law" action is typically
dated the same but may not actually be posted until much later, when the Office of the
Federal Register assigns the public law number. This can be problematic when trying to
count laws.

The same problem might occur with vetoed bills. In principle they must also become law
when the second chamber finishes its override, or thereabouts, but our ENACTED:VETO_OVERRIDE
status is still triggered by Became Public Law. This is more rare so I'll punt this
for another time.

This left a gap for six bills (see the commit for the list) that had Became Public Law
actions but neither Signed by President nor veto actions. They appear to be instances
of the "ten Days (Sundays excepted)" provision in the Constitution. So here I'm also
adding a new status code called ENACTED:TENDAYRULE. Like the other conditions, surely
they become law on the actual 10th day regardless of whether OFR assigns them a number,
but that is harder to detect. There is a "Sent to Archivist of the United States unsigned"
action that we might want to use instead. But this is historical and incredibly rare
so it doesn't make much difference now.
2014-03-20 19:42:40 -04:00
Joshua Tauberer
3922aa766d improve assigning committee IDs in action lines when the committee's chamber is ambiguous, fixes #108
Don't assume that the committee is in the bill's originating chamber. If it's a committee action,
look at the chamber of the committee the action is taking place in. Failing that, use some specific
regular expressions to see if it is a House or Senate action. And failing that, if the bill is
in an early stage when we are pretty sure actions are in the originating chamber use the originating
chamber.

This makes a number of corrections, but also some action lines lose their committee IDs. In some
cases (lots of references to the Budget Act) the original ID was incorrect. In other cases it's
ambiguous or hard to figure out.

also see #110
2014-01-30 14:47:09 -05:00
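The fallback order in this commit message can be illustrated with a short sketch. Everything here is hypothetical: the function `guess_chamber`, its parameters, and the two simple hint patterns stand in for the "specific regular expressions" the message mentions but does not list.

```python
import re

# Hypothetical stand-ins for the chamber-specific patterns.
HOUSE_HINT = re.compile(r"\bHouse\b", re.I)
SENATE_HINT = re.compile(r"\bSenate\b", re.I)


def guess_chamber(action_text, committee_action_chamber=None,
                  origin_chamber=None, early_stage=False):
    """Guess which chamber a committee reference in an action line belongs to."""
    # 1. A committee action carries the chamber of the committee acting.
    if committee_action_chamber:
        return committee_action_chamber
    # 2. Otherwise look for an unambiguous chamber hint in the text.
    has_house = bool(HOUSE_HINT.search(action_text))
    has_senate = bool(SENATE_HINT.search(action_text))
    if has_house != has_senate:
        return "house" if has_house else "senate"
    # 3. Early in a bill's life, fall back to the originating chamber.
    if early_stage and origin_chamber:
        return origin_chamber
    # 4. Ambiguous: leave the committee ID unassigned.
    return None
```

Returning `None` in the last step mirrors the trade-off noted above, where some action lines lose their committee IDs rather than receive a wrong one.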
Joshua Tauberer
6f0b5032de fix committee IDs for action lines involving "House Committee on The Judiciary"
Because of the capital 'T', the regular expression was not parsing right
and the committee was associated with the chamber of the bill rather than
the chamber of the committee indicated in the action line. Solution is
to do regex case-insensitively.

see #108
2014-01-30 12:13:38 -05:00
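A small illustration of the capital 'T' problem, assuming a simplified pattern. The regex and the function `find_committee` are hypothetical and much shorter than whatever the parser really uses.

```python
import re

# Hypothetical simplified pattern; note the lowercase "the".
PATTERN = r"(House|Senate) Committee on the (\w+)"

case_sensitive = re.compile(PATTERN)
case_insensitive = re.compile(PATTERN, re.I)


def find_committee(text, regex):
    """Return (chamber, committee name) if the pattern matches, else None."""
    m = regex.search(text)
    return (m.group(1), m.group(2)) if m else None
```

With the text "Referred to the House Committee on The Judiciary." the case-sensitive pattern finds nothing, so a parser would fall back to the bill's chamber, while the case-insensitive one recovers the chamber named in the action line.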
Joshua Tauberer
dda61f406f fix parsing of Senate ping pong vote in hr3547-113 2014-01-21 12:17:50 -05:00
Peter Arzhintar
41662fc79d Preserve paragraph breaks in summary 2013-12-23 10:20:34 -08:00
Eric Mill
ff4652acc6 Be bolder about assuming committee chamber-ship using bill origin 2013-10-07 17:22:03 -04:00
Eric Mill
541827d350 Shored up committee detection, accepting some warnings for ambiguous committees (e.g. Appropriations, Budget, Judiciary, when mentioned without a parent committee chamber) 2013-10-07 17:10:40 -04:00
Daniel Cloud
4f01988cf7 Refactor bill_info to look for committee names separately from identifying referrals. Updated utils.fetch_committee_names to alias so we still get matches when parentheticals are omitted. 2013-10-07 10:03:37 -04:00
Daniel Cloud
0d26e32977 Pull committee names from action text and match to committee ids. Fails on occasions when the committee name is truncated in actions, such as 'House Intelligence' rather than 'House Intelligence (Permanent Select)'. 2013-10-02 17:26:26 -04:00
Joshua Tauberer
9aff82a398 bill_info: pass some additional fields from the American Memory files through to the XML output 2013-09-11 07:45:02 -04:00
Joshua Tauberer
7f65f33245 bill status errors with 'as amended', ping-pong votes, and conference reports
* On the first vote in the second chamber, we were not handling 'as amended' for joint/concurrent resolutions, so we were prematurely marking these as PASSED when they should get a PASS_BACK status.
* On ping-pong votes, we were not handling 'as amended' at all, so we were prematurely marking these as PASSED when they should be PASS_BACK.
* Ping-pong votes and votes on conference reports were also not handling joint/concurrent resolutions, and on a successful ping-pong/conference report vote these were incorrectly given the status PASSED:BILL, instead of PASSED:CONSTAMD/PASSED:CONCURRENTRES.
2013-08-19 13:33:42 -04:00
Joshua Tauberer
f30d132d16 added preserve_update_time command-line argument to make it easier to debug parser changes with diff; in bills, it maintains the updated_at value from the file last saved on disk 2013-08-19 13:33:42 -04:00
Joshua Tauberer
9495ac088d in GovTrack-style bill XML handle vetoes, pocket vetoes, and vote-aux actions 2013-08-19 13:33:42 -04:00
Joshua Tauberer
dd043926a7 parse more bill status lines from the 93rd Congress ('measure passed X', 'X agreed to Y amendments', 'reported to Senate', and enrollment without the word 'Became') 2013-08-19 13:33:42 -04:00
Joshua Tauberer
1876e932a8 forcing some attribute orders in GovTrack-style bill XML to make diffs easier 2013-08-19 13:33:42 -04:00
Joshua Tauberer
99b4373210 in utils.download(), renamed the 'xml' option to 'binary' 2013-07-17 15:30:31 -04:00
Joshua Tauberer
330cf03da4 prevent stray colons in action line references from breaking the parser (split only on the first colon), was a problem in samdt1024-107 2013-07-17 08:21:21 -04:00
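Splitting only on the first colon can be sketched as below. The function `split_reference` and the sample reference string in the usage note are hypothetical; the page does not show the real text from samdt1024-107.

```python
def split_reference(ref):
    """Split "label: value" on the first colon only, keeping later colons in the value."""
    label, _, value = ref.partition(":")
    return label.strip(), value.strip()
```

For example, `split_reference("text of amendment: CR S1234: note")` keeps "CR S1234: note" together as the value, where a plain `ref.split(":")` would return three pieces and break any code that unpacks exactly two.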
Joshua Tauberer
0841d50ec5 When choosing {short,official,popular}_title, only choose from titles for the whole bill.
On THOMAS and Congress.gov, the main title displayed for a bill is never a title for
a portion of the bill, and if there is no title for the whole bill in the last 'as'
group, they display a title (for the whole bill) from the previous 'as' group

This affects four bills in the 113th Congress (so far). Three bills now on longer
have a short_title (it's now null), and one's short title changed (HR 1911).
2013-07-10 09:02:33 -04:00
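One way to express the selection rule described above is to walk the 'as' groups from newest to oldest and take the first title that applies to the whole bill. This is a sketch under assumptions: the function `pick_title` and the dict keys `type`, `as` and `title` are illustrative, and only `is_for_portion` is named in this log.

```python
def pick_title(titles, kind):
    """Pick the whole-bill title of the given kind from the latest 'as' group that has one."""
    groups = []  # consecutive runs of titles sharing an 'as' label, in document order
    for t in titles:
        if t["type"] != kind:
            continue
        if groups and groups[-1][0] == t["as"]:
            groups[-1][1].append(t)
        else:
            groups.append((t["as"], [t]))
    # Newest group first; skip groups that only hold titles for a portion of the bill.
    for _, group in reversed(groups):
        whole = [t for t in group if not t["is_for_portion"]]
        if whole:
            return whole[0]["title"]
    return None
```

When every title of that kind is for a portion of the bill the function returns `None`, which corresponds to the bills whose short_title became null.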
Joshua Tauberer
add39b329a detect when bill titles are "for a portion of a bill"
Every title now has "is_for_portion": [true|false], and in the GovTrack-compatible
output partial="1" is added just when this is true.
2013-07-10 08:32:06 -04:00
Joshua Tauberer
4dbe3386a0 fdsys: when we write a bill mods file, also write the data.json file normally written by bill_versions so that it automatically keeps those files smartly up to date using sitemaps 2013-07-09 15:27:41 -04:00
Eric Mill
8ed0f60dfa Man, there really may be no actions for a bill, super rare 2013-06-10 11:40:02 -04:00
Eric Mill
3c15382f01 Another edge case reference parsing bug 2013-05-08 18:14:23 -04:00
Eric Mill
2bd11b12f5 Some slight rearrangements and comments, made a lack of house number a choke-able offense 2013-05-07 18:40:10 -04:00
wilson428
59a1a83436 fixed bad option to download 2013-04-23 17:09:25 -04:00
wilson428
bd84a81243 Avoids unplanned error if no GPO url found for text 2013-04-23 16:56:17 -04:00
wilson428
83950ef7ea Added --formats flag to bills for fetching complete text from GPO in pdf and/or html format 2013-04-23 12:24:39 -04:00