Downloading the Data
- Option 1: Direct download from the tasks page, in JSON lines format.
- Option 2: Using the datasets library.
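For Option 1, a JSON lines file holds one JSON object per line. The following is a minimal sketch of reading such a file, assuming a standard JSON lines layout; `read_jsonl` is a hypothetical helper name, and the fields inside each record are not specified here, so the sketch returns the records unchanged.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON lines file: one JSON object per non-empty line.

    Hypothetical helper for illustration; not part of the benchmark code.
    """
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, such as a trailing newline
            try:
                examples.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err
    return examples
```

Reporting the line number on failure makes it easier to spot a truncated or partially downloaded file.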
Running experiments
- The code used to run our experiments is available on GitHub.
Making a Leaderboard Submission
- Create a comma-separated values (CSV) file with the headers (Task, ID, Prediction), where each row represents one output.
For example:
We recommend using our conversion script to produce the CSV file from JSON prediction files to avoid discrepancies.
As input, it expects, for every task, a predictions JSON file that maps each ID to a textual prediction, e.g.:
- Log in to the website (using your Google account is recommended).
- Upload your CSV file via the submission page.
- Within a few minutes, check your email for a confirmation message that your submission has been received.
- Results will be sent by email within 24 hours. Valid public submissions will immediately appear on the leaderboard.
- Each user is limited to 5 submissions per week and a total of 10 submissions per month.
If you need any help, please reach out to scrolls-benchmark-contact@googlegroups.com.
Example submission CSV file:
Task,ID,Prediction
qasper,8941956c4b67e2436bbaf372a120f358f50c377b,"English, German, French"
qasper,5b63fb32633223fa4ee214979860349242a11451,"sentiment classifiers"
...
quality,72790_5QFDYSRE_4,"C"
...
summ_screen_fd,fd_Gilmore_Girls_01x13,"Rory's charity rummage sale is a disaste..."
...
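Before uploading, it can help to sanity-check the CSV locally. The sketch below only checks the layout stated on this page (the Task, ID, Prediction header and three fields per row). `check_submission_csv` is a hypothetical helper, and the duplicate check on (Task, ID) pairs is our assumption rather than a documented leaderboard rule.

```python
import csv
import io

EXPECTED_HEADER = ["Task", "ID", "Prediction"]


def check_submission_csv(text):
    """Return a list of problems found in the CSV text (empty if none found).

    Hypothetical helper for illustration; the leaderboard's own validation
    may differ.
    """
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
        return problems
    seen = set()
    # Record numbers start at 2 because the header is record 1.
    for record_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip completely blank lines
        if len(row) != 3:
            problems.append(f"record {record_number}: expected 3 fields, got {len(row)}")
            continue
        task, example_id, _prediction = row
        # Assumption: each (Task, ID) pair should appear at most once.
        if (task, example_id) in seen:
            problems.append(f"record {record_number}: duplicate ({task}, {example_id})")
        seen.add((task, example_id))
    return problems
```

A common cause of a wrong field count is a prediction containing commas that was written without quoting, which is one reason to produce the file with a CSV library rather than string concatenation.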
# Option 2: load every task with the datasets library.
from datasets import load_dataset
z_scrolls_datasets = ["gov_report", "summ_screen_fd", "qmsum", "squality", "qasper", "narrative_qa", "quality", "musique", "space_digest", "book_sum_sort"]
# One loaded dataset per task, in the same order as z_scrolls_datasets.
data = [load_dataset("tau/zero_scrolls", dataset) for dataset in z_scrolls_datasets]
Example predictions JSON file:
{
"example_id1": "prediction1",
"example_id2": "prediction2",...
}
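We still recommend the conversion script for real submissions. Purely as an illustration of the mapping from per-task prediction files to the CSV layout (Task, ID, Prediction), here is a minimal sketch. It is not the conversion script itself: `load_predictions` and `write_submission_csv` are hypothetical names, and the sketch assumes one predictions JSON file per task.

```python
import csv
import json


def load_predictions(json_path):
    """Load one task's predictions: a JSON object mapping ID -> prediction text."""
    with open(json_path, encoding="utf-8") as f:
        predictions = json.load(f)
    if not isinstance(predictions, dict):
        raise ValueError(f"{json_path}: expected a JSON object mapping IDs to predictions")
    return predictions


def write_submission_csv(predictions_by_task, csv_path):
    """Write one (Task, ID, Prediction) row per prediction, for every task.

    predictions_by_task maps a task name to that task's ID -> prediction dict.
    """
    # newline="" lets the csv module control line endings, as its docs advise.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Task", "ID", "Prediction"])
        for task, predictions in predictions_by_task.items():
            for example_id, prediction in predictions.items():
                writer.writerow([task, example_id, prediction])
```

A call might look like `write_submission_csv({"qasper": load_predictions("qasper_preds.json")}, "submission.csv")`, where both file names are made up for the example. The `csv` writer quotes a field only when it contains a comma, quote, or newline, so the output may quote fewer fields than the example CSV while still being equivalent CSV.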