ACL-AGD comprises 35,063 abstracts sourced from the BibTeX database of the ACL Anthology, covering publications in computational linguistics and NLP. The collection spans ACL conference proceedings, journal articles, and contributions from non-ACL events published between 1965 and 2023.
* Download link: ACL-AGD.zip (101 MB)
* The train.json, test.json, and validation.json files contain 33,063, 1,000, and 1,000 records, respectively.
* The detailed structure of the dataset contents is as follows:

| Attributes | Content |
| --- | --- |
| doc_key | Anthology ID [string] |
| sents | Complete abstract paragraph [string] |
| sents_label | Sentences with corresponding labels [list] |
| graphs | Paragraphs with corresponding knowledge graphs [list] |
| sents_ent | Abstract paragraph containing entity types [string] |
| preds_ner | Entities and their corresponding entity types [list] |

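
The attributes listed above can be checked programmatically once the archive is unpacked. The following is a minimal sketch, not an official loader: the on-disk layout of the JSON files is not documented here, so the code assumes either a single JSON array or one JSON object per line and tries both. The helper names `load_records` and `missing_keys` are hypothetical and introduced only for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Attribute names as listed in the dataset description.
EXPECTED_KEYS = ("doc_key", "sents", "sents_label", "graphs", "sents_ent", "preds_ner")


def load_records(path: str) -> List[Dict[str, Any]]:
    """Load records from a JSON array file or a JSON Lines file.

    Assumption: the file layout is one of these two common forms.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a single top-level object so callers always receive a list.
    return data if isinstance(data, list) else [data]


def missing_keys(record: Dict[str, Any]) -> List[str]:
    """Return the expected attribute names that a record lacks."""
    return [key for key in EXPECTED_KEYS if key not in record]
```

For example, calling `load_records("train.json")` and then `missing_keys(records[0])` should return an empty list if the file follows the structure described above. A non-empty result would indicate that the actual layout differs from this sketch's assumptions.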
For a comprehensive understanding of the dataset creation process, please refer to the following paper:
-- Coming soon.