EBMT / NLP Laboratory

Graduate School of Information, Production and Systems, Waseda University

ACL Abstract Graph Dataset (ACL-AGD)


ACL-AGD comprises 35,063 abstracts drawn from the BibTeX database of the ACL Anthology, covering publications in computational linguistics and NLP. The collection spans ACL conference proceedings, journal articles, and papers from non-ACL events published between 1965 and 2023.

        * Download link: ACL-AGD.zip (101 MB)

        * The train.json, test.json, and validation.json splits contain 33,063, 1,000, and 1,000 records, respectively.

        * Each record has the following attributes:

Attribute     Content
doc_key       Anthology ID [string]
sents         Complete abstract paragraph [string]
sents_label   Sentences with their corresponding labels [list]
graphs        Paragraphs with their corresponding knowledge graphs [list]
sents_ent     Abstract paragraph containing entity types [string]
preds_ner     Entities and their corresponding entity types [list]
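The attribute table can be checked programmatically when working with the splits. The following is a minimal Python sketch, assuming each split file is either a single JSON array of records or JSON Lines (this page does not say which, so both are tried) and that attribute values have the bracketed types listed above. The directory name `ACL-AGD` and the helper names `load_split` and `schema_problems` are hypothetical, not part of the released dataset.

```python
import json
from pathlib import Path

# Expected attribute types, taken from the attribute table.
EXPECTED_TYPES = {
    "doc_key": str,       # Anthology ID
    "sents": str,         # complete abstract paragraph
    "sents_label": list,  # sentences with their labels
    "graphs": list,       # paragraphs with their knowledge graphs
    "sents_ent": str,     # abstract paragraph containing entity types
    "preds_ner": list,    # entities with their entity types
}


def load_split(path):
    """Load one split file and return a list of record dicts.

    Assumption: the file is either one JSON array of records or
    JSON Lines (one record per line); both layouts are handled.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Multiple top-level objects: parse as JSON Lines instead.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a lone object so callers always receive a list.
    return data if isinstance(data, list) else [data]


def schema_problems(record):
    """Return a list of attributes that are missing or have an unexpected type."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in record:
            problems.append(f"missing {key}")
        elif not isinstance(record[key], expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(record[key]).__name__}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical location of the extracted ACL-AGD.zip; adjust as needed.
    root = Path("ACL-AGD")
    # Record counts stated on this page for each split.
    expected_counts = {"train.json": 33063, "test.json": 1000, "validation.json": 1000}
    for name, expected in expected_counts.items():
        path = root / name
        if not path.exists():
            continue  # skip splits that are not present locally
        records = load_split(path)
        bad = sum(1 for r in records if schema_problems(r))
        print(f"{name}: {len(records)} records (expected {expected}), {bad} with schema issues")
```

Trying the JSON-array parse first and falling back to line-by-line parsing keeps the loader usable whichever layout the archive actually uses.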


For details of the dataset creation process, please refer to the following paper:

-- Forthcoming.


Contact

EBMT / NLP Laboratory

Graduate School of Information, Production and Systems

Waseda University

2-7 Hibikino, Wakamatsu-ku,
Kitakyushu-shi, Fukuoka-ken, 808-0135, Japan

Tel/Fax: +81-93-692-5287