
Downloading the Data

  • Option 1: Direct download from the tasks page, in JSON lines format.

  • Option 2: Using the datasets library:

    from datasets import load_dataset

    z_scrolls_datasets = ["gov_report", "summ_screen_fd", "qmsum", "squality",
                          "qasper", "narrative_qa", "quality", "musique",
                          "space_digest", "book_sum_sort"]
    data = [load_dataset("tau/zero_scrolls", dataset) for dataset in z_scrolls_datasets]

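For files obtained via Option 1, reading the JSON lines format takes only the standard json module. Below is a minimal sketch; the filename qasper.jsonl is a hypothetical example, not a guaranteed file name from the download.

```python
import json

def read_jsonl(path):
    """Read a JSON-lines file: one JSON object per non-empty line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                examples.append(json.loads(line))
    return examples

# Hypothetical usage, assuming a downloaded task file:
# examples = read_jsonl("qasper.jsonl")
```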
Running Experiments

  • The code used to run our experiments is available on GitHub.


Making a Leaderboard Submission

  • Create a comma-separated values (CSV) file with the headers (Task, ID, Prediction), where each row represents one output.
    For example:

    Task,ID,Prediction
    qasper,8941956c4b67e2436bbaf372a120f358f50c377b,"English, German, French"
    qasper,5b63fb32633223fa4ee214979860349242a11451,"sentiment classifiers"
    ...
    quality,72790_5QFDYSRE_4,"C"
    ...
    summ_screen_fd,fd_Gilmore_Girls_01x13,"Rory's charity rummage sale is a disaste..."
    ...

We recommend using our conversion script to produce the CSV file from JSON prediction files to avoid discrepancies.

As input, it expects, for every task, a predictions JSON file mapping each ID to its textual prediction, e.g.:

    {
        "example_id1": "prediction1",
        "example_id2": "prediction2",
        ...
    }
  • Log in to the website (using your Google account is recommended).
     

  • Upload your CSV file via the submission page.
     

  • Within a few minutes, check your email for a confirmation message that your submission has been received.
     

  • Results will be sent by email within 24 hours. Valid public submissions will immediately appear on the leaderboard.


  • Each user is limited to 5 submissions per week and a total of 10 submissions per month.


If you need any help, please reach out to scrolls-benchmark-contact@googlegroups.com
