This is the source code for case.law, a website written by the Harvard Law School Library Innovation Lab to manage and serve court opinions. Other than several cases used for our automated testing, this repository does not contain case data. Case data may be obtained through the website.

The Caselaw Access Project is a large-scale digitization project hosted by the Harvard Law School Library Innovation Lab. The output of the project consists of page images, marked up case XML files, ALTO XML files, and METS XML files. This repository has a more detailed explanation of the format, and two volumes' worth of sample data:

CAP Samples and Format Documentation

Obtaining Real Data

This data, with some temporary restrictions, is available to all. Please see our project site for more information about how to access the API or get bulk access to the data.

This is a living, breathing corpus of data. While we've taken great pains to ensure its accuracy and integrity, two large components of this project, namely OCR and human review, are utterly fallible. When we were designing Capstone, we knew that one of its primary functions would be to facilitate safe, accountable updates. If you find any errors in the data, we would be extraordinarily grateful for your taking a moment to create an issue in this GitHub repository's issue tracker to report it. If you notice a large pattern of problems that would be better fixed programmatically, or have a very large number of modifications, describe it in an issue. We'll close the issue when it has been corrected.

These are known issues; there's no need to file an issue if you come across one of them:

- Missing Judges Tag: In many volumes, elements which should have the judges tag name instead have a different tag name.
- Nominative Case Citations: In many cases that come from nominative volumes, the citation format is wrong.
- Jurisdictions: Though the jurisdiction values in our API metadata entries are normalized, we have not propagated those changes to the XML.
- Court Name: We've seen some inconsistencies in the court name. We're trying to get this normalized in the data, and we'll also publish a complete court name list when we're done.
- OCR errors: There will be OCR errors on nearly every page. We're still trying to figure out how best to address this. If you've got some killer OCR correction strategies, get at us.

Capstone is a Django application with a PostgreSQL database which stores and manages the non-image data output of the CAP project. This includes:

- Normalized metadata extracted from the XML.
- External metadata, such as the Reporter database.
- Changelog data, tracking changes and corrections.

django-extensions is enabled by default.

django-debug-toolbar is not automatically enabled, but if you run pip install django-debug-toolbar it will be detected and enabled by settings_dev.py.

Model versioning

For database versioning we use the Postgres temporal tables approach inspired by SQL:2011's temporal databases.
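To make the temporal-tables idea mentioned under Model versioning more concrete, here is a minimal sketch of the underlying pattern: a current table plus a history table, with a trigger that archives the old row and its validity period on every update. This is not Capstone's code. It uses SQLite from Python's standard library as a stand-in for Postgres, and the table names, column names, and helper functions (`case_metadata`, `case_metadata_history`, `name_as_of`, and so on) are all hypothetical. Timestamps are passed in explicitly to keep the example deterministic.

```python
import sqlite3

# Hypothetical schema: one "current" table and one history table.
# SQLite stands in for Postgres here; this only illustrates the pattern.
SCHEMA = """
CREATE TABLE case_metadata (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    valid_from TEXT NOT NULL
);
CREATE TABLE case_metadata_history (
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    valid_from TEXT NOT NULL,
    valid_to TEXT NOT NULL
);
-- On every update, archive the old row together with the period it was valid.
CREATE TRIGGER case_metadata_versioning
AFTER UPDATE ON case_metadata
BEGIN
    INSERT INTO case_metadata_history (id, name, valid_from, valid_to)
    VALUES (OLD.id, OLD.name, OLD.valid_from, NEW.valid_from);
END;
"""


def open_db():
    """Create an in-memory database with the versioned schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def insert_case(conn, case_id, name, now):
    """Insert a new current row that is valid starting at `now`."""
    conn.execute(
        "INSERT INTO case_metadata (id, name, valid_from) VALUES (?, ?, ?)",
        (case_id, name, now),
    )


def correct_name(conn, case_id, new_name, now):
    """Update the current row; the trigger archives the previous version."""
    conn.execute(
        "UPDATE case_metadata SET name = ?, valid_from = ? WHERE id = ?",
        (new_name, now, case_id),
    )


def name_as_of(conn, case_id, when):
    """Return the name that was valid at `when`, or None if no row existed yet."""
    row = conn.execute(
        """
        SELECT name FROM case_metadata
        WHERE id = ? AND valid_from <= ?
        UNION ALL
        SELECT name FROM case_metadata_history
        WHERE id = ? AND valid_from <= ? AND ? < valid_to
        """,
        (case_id, when, case_id, when, when),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_db()
    insert_case(conn, 1, "Smith v. Jonse", "2020-01-01")
    correct_name(conn, 1, "Smith v. Jones", "2020-06-01")
    print(name_as_of(conn, 1, "2020-03-01"))  # the value before the correction
    print(name_as_of(conn, 1, "2021-01-01"))  # the corrected value
```

The point of the design is that a correction never destroys the earlier value: the current table always holds the latest version, while the history table lets you reconstruct what a record looked like at any earlier time.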