A major project digitizing court cases launched Monday, making available 6.4 million American cases dating back to 1658.
The Caselaw Access Project, a partnership between Harvard Law School’s Library Innovation Lab and the legal research company Ravel Law, spent the last three years digitizing 627 reporters, for a total of 40 million scanned pages. Apart from the Library of Congress’s holdings, it is the most comprehensive database of its kind, totaling 200 terabytes of data.
“A project like this should be unnecessary,” Andrew Ziegler, director of the Library Innovation Lab, told LawSites Blog. “But many states are still putting stuff in books first.”
The collection includes nearly all cases from American courts, including territorial courts, decided between the 1658 Maryland case William Stone against William Boreman and June 30, 2018. It is not yet clear whether the collection will be extended in the future. The Caselaw Access Project says that it does not include “cases not designated as officially published, such as most lower court decisions; non-published trial documents such as party filings, orders, and exhibits; parallel versions of cases from regional reporters, unless those cases were designated by a court as official; [and] cases officially published in digital form, such as recent cases from Illinois and Arkansas.”
The project offers two ways to access the data: through an API and as a bulk download. The API, or application programming interface, acts as a conduit between the database and a user’s own software, giving remote access to the entire database.
The data is free to the public, though under the agreement with Ravel, no user can access more than 500 full-text cases a day. Researchers, however, can agree to different terms that lift the limit.
The cap also does not apply to “whitelisted jurisdictions,” meaning jurisdictions that already make their new cases freely available online. Currently, only Arkansas and Illinois qualify. A user can also bulk download every case from these two jurisdictions in one fell swoop.
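To make the access rules concrete, here is a minimal Python sketch of how a per-user daily cap with jurisdiction exemptions could be modeled. The class name `DailyQuota`, the `researcher_terms` flag, and the simple counter are hypothetical illustrations, not the Caselaw Access Project’s actual implementation; only the 500-case figure, the researcher exception, and the two whitelisted jurisdictions come from the reporting above.

```python
from dataclasses import dataclass

# Figures taken from the article; everything else here is hypothetical.
DAILY_LIMIT = 500  # full-text cases per user per day
WHITELISTED = {"Arkansas", "Illinois"}  # jurisdictions exempt from the cap


@dataclass
class DailyQuota:
    """Hypothetical model of the access rule, not the project's own code."""

    researcher_terms: bool = False  # True if the user agreed to the lifted-limit terms
    used: int = 0  # full-text cases counted against today's cap

    def request_full_text(self, jurisdiction: str) -> bool:
        # Exempt requests are allowed without touching the counter.
        if jurisdiction in WHITELISTED or self.researcher_terms:
            return True
        # Refuse once the daily cap has been reached.
        if self.used >= DAILY_LIMIT:
            return False
        self.used += 1
        return True

    def reset(self) -> None:
        # Would be called once per day in this sketch.
        self.used = 0


quota = DailyQuota()
results = [quota.request_full_text("Massachusetts") for _ in range(501)]
print(results.count(True))  # 500
print(results[-1])  # False: the 501st request is refused
print(quota.request_full_text("Illinois"))  # True: whitelisted
```

Because the exemption check runs before the counter, requests for Arkansas or Illinois cases never use up any of a user’s 500 daily full-text requests.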
The data can be retrieved in HTML or XML formats.