make-data: CSV→CSL-JSON
Acknowledgement
Juris-M/citeproc-cite-service was developed by Frank Bennett and is a primary output of EURO-EXPERT (n.161814) funded by the European Research Council.
Introduction
In the CultExp project, the citation details of cases of interest from each target jurisdiction are written into a spreadsheet, and coded to one or more documents (judgments, expert reports) as appropriate. Spreadsheets have between ten and seventeen columns from a standard schema, with standard headings. [1]
Spreadsheets are then processed using the script documented here to build a file in a specific format (CSL-JSON) that can be imported into a Jurism client and synced to the cloud service at zotero.org. Data is then fetched from the Zotero API and published to a front-end service for public consumption.
Because the spreadsheet data model does not record version information for individual rows or cells, the entire spreadsheet (typically containing all records for a single jurisdiction) is treated as a single unit whenever content to be published to the front end is added or edited.
The following steps are needed to provide new or modified content to the front-end service:
- Prepare (or edit) an Excel spreadsheet with the appropriate columns, with an entry for each document or case to be added;
- Save the file, and also save it in CSV format;
- Place a set of files, named to correspond to CultExp document codes, in a files subdirectory of the directory containing the CSV file to be processed;
- Use the make-data script documented here to generate a CSL-JSON file (import-me.json), amending the twocourt-* configuration files as necessary to avoid runtime errors;
- In the Jurism client library, move all items tagged for the jurisdiction to be updated to the trash;
- Empty the Jurism client library trash;
- Import the CSL-JSON data into the Jurism client library;
- Sync the client library to zotero.org;
- Run the zsyncdown script (documented in the top-level README of this repository);
- Perform whatever incantations are used to move retrieved data to the front-end service.
This full set of operations is required for any change to item content on the front end of the service.
Installing node and npm
The make-data conversion tool is a node script, so node must be installed on the system where CultExp spreadsheet data is to be processed. Node is available for free installation; use a recent version.
Installing the make-data script
The make-data script is installed and activated by the same steps used to install and activate the zsyncdown "bridge" script:
$> git clone --recursive https://github.com/Juris-M/citeproc-cite-service.git
$> cd citeproc-cite-service
$> npm install
$> npm link
Pretty-printed code documentation
Comments in the code of the make-data script are in JSDoc format. To generate an HTML view of the code, run npm run documentation in the root directory of this repository:
$> npm run documentation
The documentation can be served locally by spinning up a web server in the docs subdirectory. For example, using Python:
$> cd docs
$> python3 -m http.server 8000
Then view the pages by calling local port 8000 with a web browser:
http://localhost:8000
The pages are also available to view on GitHub pages (but note that these are updated only when the refreshed documentation files are committed and pushed to GitHub):
https://juris-m.github.io/citeproc-cite-service
Installing the Legal Resource Registry
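The make-data conversion script relies on the Legal Resource Registry (LRR) to obtain court and jurisdiction codes. We will clone the LRR into a sibling directory of citeproc-cite-service. The cd ../.. below assumes you are still in the docs subdirectory from the previous step; from the repository root, use cd .. instead: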
$> cd ../..
$> git clone https://github.com/Juris-M/legal-resource-registry.git
Setting up a jurisdiction
To process a jurisdiction, create an empty directory, place the spreadsheet in it, and save the case listing in CSV format in the same directory. Also (important!) copy all of the PDF attachment files for the jurisdiction into a single subdirectory named files.
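Since the attachment files are matched to CultExp document codes by name, it can help to check the files subdirectory before the first run. The Python sketch below is a hypothetical helper, not part of citeproc-cite-service. It assumes that each attachment is named after its Doc ID (any extension), that the CSV column is headed Doc ID, and that one cell may list several codes separated by commas, semicolons or spaces; all function names are illustrative.

```python
import csv
import re
from pathlib import Path


def missing_attachments(doc_ids, filenames):
    """Return Doc IDs with no file whose name (minus extension) matches.

    Duplicates are reported once, in first-seen order.
    """
    # Compare on the stem: "MT-0001.pdf" -> "MT-0001" (naming scheme is an assumption).
    stems = {Path(name).stem for name in filenames}
    return [d for d in dict.fromkeys(doc_ids) if d not in stems]


def doc_ids_from_csv(csv_path, column="Doc ID"):
    """Collect Doc IDs from the CSV; one cell may list several IDs (assumption)."""
    ids = []
    # utf-8-sig also accepts the byte-order mark written by Excel's "CSV UTF-8" export.
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            cell = row.get(column) or ""
            ids.extend(token for token in re.split(r"[,;\s]+", cell) if token)
    return ids


def check_jurisdiction_dir(csv_path):
    """List Doc IDs in csv_path that lack a file in the sibling files/ directory."""
    files_dir = Path(csv_path).parent / "files"
    names = [p.name for p in files_dir.iterdir() if p.is_file()] if files_dir.is_dir() else []
    return missing_attachments(doc_ids_from_csv(csv_path), names)
```

Run from inside the malta directory, check_jurisdiction_dir("data-malta.csv") would return any Doc IDs without a matching file; an empty list means every code was found.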
A note on dates: when saving from Excel in CSV format, set date cells to the numeric form YYYY-MM-DD. With other date formats, the make-data script will almost certainly throw errors.
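A quick pre-flight check for this date problem is sketched below in Python. It is an illustrative helper, not part of make-data: it assumes the column is headed Date, as in the standard schema, skips empty cells, and flags any other cell that is not a real calendar date written exactly as YYYY-MM-DD. Reported row numbers assume a single header line and no line breaks inside cells.

```python
import csv
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, nothing else.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """True if value is exactly YYYY-MM-DD and names a real calendar date."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2019-02-30
    except ValueError:
        return False
    return True


def bad_dates(rows, column="Date"):
    """Return (row_number, value) pairs for non-empty cells that fail is_iso_date.

    rows is an iterable of dicts such as csv.DictReader produces; numbering
    starts at 2 because line 1 of the CSV holds the column headings.
    """
    problems = []
    for row_number, row in enumerate(rows, start=2):
        value = (row.get(column) or "").strip()
        if value and not is_iso_date(value):
            problems.append((row_number, value))
    return problems


def bad_dates_in_file(csv_path, column="Date"):
    """Open a CSV export and run bad_dates over its rows."""
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        return bad_dates(csv.DictReader(fh), column)
```

For example, bad_dates_in_file("data-malta.csv") returns an empty list when every filled Date cell is in the expected form.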
Enter the directory and run the make-data command:
$> cd malta
$> ls
data-malta.csv data-malta.xlsx files
$> make-data
This first run will throw an error and create a configuration file, make-data-config.json, with the following content:
{
"jurisdictionCode": "xx",
"jurisdictionName": "Laputa",
"jurisdictionDescPath": "/path/to/legal-resource-registry-repo"
}
Edit the configuration file to reflect the target jurisdiction and the absolute path to the src subdirectory of the LRR. In this case, we are working on Malta:
{
"jurisdictionCode": "mt",
"jurisdictionName": "Malta",
"jurisdictionDescPath": "/my/path/to/legal-resource-registry/src"
}
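Before rerunning make-data, the edited configuration can be checked for leftover placeholder values. This Python sketch is a hypothetical helper, not part of the tool: it checks only the three keys shown above, rejects the generated placeholder values, and, following the instruction above, expects jurisdictionDescPath to be an absolute path ending in the LRR src subdirectory. It does not check that the path actually exists.

```python
import json
import os

# Values written into the generated make-data-config.json, as shown above.
PLACEHOLDERS = {
    "jurisdictionCode": "xx",
    "jurisdictionName": "Laputa",
    "jurisdictionDescPath": "/path/to/legal-resource-registry-repo",
}


def config_problems(config):
    """Return human-readable problems found in a parsed make-data-config.json."""
    problems = []
    for key, placeholder in PLACEHOLDERS.items():
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key}: missing or empty")
        elif value == placeholder:
            problems.append(f"{key}: still the generated placeholder")

    path = config.get("jurisdictionDescPath")
    if isinstance(path, str) and path.strip() and path != PLACEHOLDERS["jurisdictionDescPath"]:
        if not os.path.isabs(path):
            problems.append("jurisdictionDescPath: not an absolute path")
        # normpath drops a trailing slash, so "/x/src/" is accepted too.
        if os.path.basename(os.path.normpath(path)) != "src":
            problems.append("jurisdictionDescPath: does not end in the LRR src subdirectory")
    return problems


def check_config_file(path="make-data-config.json"):
    """Load the configuration file and return config_problems() for it."""
    with open(path, encoding="utf-8") as fh:
        return config_problems(json.load(fh))
```

The Malta configuration above passes with no problems reported; the freshly generated file reports all three keys as placeholders.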
Preparing a court map
With the configuration file in place, run make-data again. The script will issue a string of warnings and generate a file court-code-map.json. This file will be read by make-data to map court names written into the spreadsheet to their respective court codes, optionally also setting a court division and case type where that information is expressed in the spreadsheet entries. The file is formatted as a series of lists, with elements ordered as follows:
- Court description (from the spreadsheet)
- Court code (initially set to the court description)
- Court division (optional)
- Case type (optional)
Open the relevant jurisdiction file in the LRR for reference (in this case, the file for Malta is juris-mt-desc.json). The courts section of the file contains the court codes recognized for the jurisdiction.
Edit each entry in court-code-map.json, filling in the second element of each list with the appropriate court code. For example...
[
"Qorti Civili Prim Awla",
""
]
...becomes...
[
"Qorti Civili Prim Awla",
"qcpa"
]
Where the court description includes a court division, add a third element to the list. For example...
[
"Qorti Civili (Sezzjoni tal-Familja)",
""
]
...becomes...
[
"Qorti Civili (Sezzjoni tal-Familja)",
"qc",
"Sezzjoni tal-Familja"
]
Where the court description contains a note of the case type, add that as a fourth element in the list. If no court division is indicated, set null in the third position. Thus, the following example...
[
"Qorti Civili Prim Áwla (Gurisdizzjoni Kostituzzjonali)",
""
]
...becomes...
[
"Qorti Civili Prim Áwla (Gurisdizzjoni Kostituzzjonali)",
"qcpa",
null,
"Gurisdizzjoni Kostituzzjonali"
]
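The list conventions above can be summarized as a small record type. The Python sketch below is illustrative only and assumes, hypothetically, that court-code-map.json is a top-level JSON array of such lists. It rejects lists with fewer than two or more than four elements, and reports entries whose court code is still empty or still equal to the court description, which is how unedited entries appear.

```python
import json
from typing import NamedTuple, Optional


class CourtMapping(NamedTuple):
    """One list from court-code-map.json, in the documented element order."""
    description: str                 # court description from the spreadsheet
    code: str                        # LRR court code
    division: Optional[str] = None   # optional court division (null if absent)
    case_type: Optional[str] = None  # optional case type


def parse_court_map(entries):
    """Convert the list-of-lists into CourtMapping records, rejecting bad shapes."""
    mappings = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, list) or not 2 <= len(entry) <= 4:
            raise ValueError(f"entry {index}: expected a list of 2 to 4 elements, got {entry!r}")
        mappings.append(CourtMapping(*entry))
    return mappings


def unfilled(mappings):
    """Descriptions whose court code is still empty or still equals the description."""
    return [m.description for m in mappings if not m.code or m.code == m.description]


def load_court_map(path="court-code-map.json"):
    """Read and parse the court map file (assumed to be a JSON array of lists)."""
    with open(path, encoding="utf-8") as fh:
        return parse_court_map(json.load(fh))
```

Calling unfilled(load_court_map()) in the jurisdiction directory lists the court descriptions that still need a code.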
If courts are described in the spreadsheet that cannot be found in the LRR record of the jurisdiction, contact the back-end data manager to request an extension to the jurisdiction data.
Preparing a court jurisdiction map
In addition to the court-code-map.json file, the make-data script generates a file court-jurisdiction-map.json. Both files are used by the script to generate the final data for import into Jurism, and the court codes set in the latter depend on the (edited) mapping lists in the former. It is therefore necessary to regenerate court-jurisdiction-map.json after completing edits to court-code-map.json. To regenerate the file, remove it from the directory and rerun make-data:
$> rm court-jurisdiction-map.json
$> make-data
The script will again issue warnings, due to mismatches between court codes and their associated jurisdictions. Open the regenerated file to make any necessary edits.
In the case of Malta, there is only one warning, and one entry in court-jurisdiction-map.json, due to an unrecognized jurisdiction "Gozo":
{
"qc::Gozo": {
"court": "qc",
"jurisdiction": "Gozo"
}
}
Leaving the qc::Gozo key untouched, and referring to the LRR, we enter the correct jurisdiction code for this island of Malta:
{
"qc::Gozo": {
"court": "qc",
"jurisdiction": "mt:gozo"
}
}
Note that the court code and jurisdiction code must be valid partners: in the LRR, the court code must appear in the courts object under the given jurisdiction code. For example:
"mt:gozo": {
"name": "Gozo",
"courts": {
"qc": {}
}
}
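The pairing rule can be expressed as a lookup. In this Python sketch, jurisdictions is assumed to be a dictionary keyed by jurisdiction code, each value having a courts object like the mt:gozo record above; how that dictionary is extracted from a juris-*-desc.json file is left open, since the file's full layout is not described here. The helper is hypothetical and reports each map key whose pair fails, together with a reason.

```python
def invalid_pairs(court_jurisdiction_map, jurisdictions):
    """Return (key, reason) for each entry whose court/jurisdiction pair is not valid.

    court_jurisdiction_map: the parsed court-jurisdiction map, e.g.
        {"qc::Gozo": {"court": "qc", "jurisdiction": "mt:gozo"}}
    jurisdictions: LRR jurisdiction records keyed by code (assumed layout), e.g.
        {"mt:gozo": {"name": "Gozo", "courts": {"qc": {}}}}
    """
    problems = []
    for key, pair in court_jurisdiction_map.items():
        court = pair.get("court")
        juris_code = pair.get("jurisdiction")
        record = jurisdictions.get(juris_code)
        if record is None:
            # The jurisdiction code itself is not in the LRR data.
            problems.append((key, f"unknown jurisdiction {juris_code!r}"))
        elif court not in record.get("courts", {}):
            # The jurisdiction exists, but this court is not one of its partners.
            problems.append((key, f"court {court!r} not listed under {juris_code!r}"))
    return problems
```

With the unedited Malta entry (jurisdiction "Gozo"), the helper reports an unknown jurisdiction; after the edit to mt:gozo, it reports nothing.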
If a valid jurisdiction for the given court cannot be found in the LRR, or if the jurisdiction itself cannot be found there, contact the back-end data manager to have the necessary changes made to the LRR jurisdiction records.
Uploading data for a jurisdiction
Once the above steps have been completed, the make-data script will run without warnings. It will generate a file import-me.json, which is a valid CSL-JSON import object reflecting all of the entries in the spreadsheet.
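Before importing, import-me.json can be given a basic structural check. This Python sketch is not a full CSL-JSON schema validation and is not part of make-data; it assumes the file holds a JSON array of items and checks only that each item is an object carrying the id and type fields the CSL-JSON schema requires, and that no id is repeated.

```python
import json


def csl_problems(items):
    """Return structural problems in parsed import-me.json content (not a full schema check)."""
    if not isinstance(items, list):
        return ["top level is not a JSON array"]
    problems = []
    seen_ids = set()
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        # The CSL-JSON schema requires every item to carry "id" and "type".
        for field in ("id", "type"):
            if field not in item:
                problems.append(f"item {index}: missing {field!r}")
        item_id = item.get("id")
        # Only hashable scalar ids are tracked for duplicate detection.
        if isinstance(item_id, (str, int)):
            if item_id in seen_ids:
                problems.append(f"item {index}: duplicate id {item_id!r}")
            seen_ids.add(item_id)
    return problems


def check_import_file(path="import-me.json"):
    """Load the generated file and return csl_problems() for it."""
    with open(path, encoding="utf-8") as fh:
        return csl_problems(json.load(fh))
```

An empty list from check_import_file() means only that these basic checks passed; the Jurism import itself remains the real test.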
To upload data for the jurisdiction, import this file into a Jurism client in the usual way, and sync the library to the Zotero servers.
Final preparation
Before data for a jurisdiction is pulled to the front end for the first time, check with the back-end data manager to be sure that the citation format for the target jurisdiction has been defined.
Footnotes
[1] Spreadsheets have the following column headings (those set in italics are optional):
Doc ID | Year as Volume | Link |
Date | Volume | Keywords |
Court Division | Reporter | Area of Law |
Case Type | Page | Summary |
Docket Number | Expert Presence | Language |
Case Name | Expert Instructed By |