Full publications list at Google Scholar
2023
-
PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs
Rahul Goel, Waleed Ammar, Aditya Gupta, and
13 more authors
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023
Research interest in task-oriented dialogs has increased as systems such as Google Assistant, Alexa, and Siri have become ubiquitous in everyday life. However, the impact of academic research in this area has been limited by the lack of datasets that realistically capture the wide array of user pain points. To enable research on some of the more challenging aspects of parsing realistic conversations, we introduce PRESTO, a public dataset of over 550K contextual multilingual conversations between humans and virtual assistants. PRESTO contains a diverse array of challenges that occur in real-world NLU tasks such as disfluencies, code-switching, and revisions. It is the only large-scale, human-generated conversational parsing dataset that provides structured context, such as a user’s contacts and lists, for each example. Our mT5-based baselines demonstrate that the conversational phenomena present in PRESTO are challenging to model, and that these challenges are further pronounced in a low-resource setup.
-
On Event Individuation for Document-Level Information Extraction
William Gantt, Reno Kriz, Yunmo Chen, and
2 more authors
In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023
As information extraction (IE) systems have grown more adept at processing whole documents, the classic task of *template filling* has seen renewed interest as a benchmark for document-level IE. In this position paper, we call into question the suitability of template filling for this purpose. We argue that the task demands definitive answers to thorny questions of *event individuation* — the problem of distinguishing distinct events — about which even human experts disagree. Through an annotation study and error analysis, we show that this raises concerns about the usefulness of template filling metrics, the quality of datasets for the task, and the ability of models to learn it. Finally, we consider possible solutions.
2021
-
LOME: Large Ontology Multilingual Extraction
Patrick Xia, Guanghui Qin, Siddharth Vashishtha, and
7 more authors
In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Apr 2021
We present LOME, a system for performing multilingual information extraction. Given a text document as input, our core system identifies spans of textual entity and event mentions with a FrameNet (Baker et al., 1998) parser. It subsequently performs coreference resolution, fine-grained entity typing, and temporal relation prediction between events. In doing so, the system constructs an event- and entity-focused knowledge graph. We can further apply third-party modules for other types of annotation, like relation extraction. Our (multilingual) first-party modules either outperform or are competitive with the (monolingual) state of the art. We achieve this through the use of multilingual encoders like XLM-R (Conneau et al., 2020) and by leveraging multilingual training data. LOME is available as a Docker container on Docker Hub. In addition, a lightweight version of the system is accessible as a web demo.
2020
-
Temporal Reasoning in Natural Language Inference
Siddharth Vashishtha, Adam Poliak, Yash Kumar Lal, and
2 more authors
In Findings of the Association for Computational Linguistics: EMNLP, Nov 2020
We introduce five new natural language inference (NLI) datasets focused on temporal reasoning. We recast four existing datasets annotated for event duration (how long an event lasts) and event ordering (how events are temporally arranged) into more than one million NLI examples. We use these datasets to investigate how well neural models trained on a popular NLI corpus capture these forms of temporal reasoning.
-
The Universal Decompositional Semantics Dataset and Decomp Toolkit
Aaron Steven White, Elias Stengel-Eskin, Siddharth Vashishtha, and
9 more authors
In Proceedings of the 12th Language Resources and Evaluation Conference, May 2020
We present the Universal Decompositional Semantics (UDS) dataset (v1.0), which is bundled with the Decomp toolkit (v0.1). UDS1.0 unifies five high-quality, decompositional semantics-aligned annotation sets within a single semantic graph specification, with graph structures defined by the predicative patterns produced by the PredPatt tool and real-valued node and edge attributes constructed using sophisticated normalization procedures. The Decomp toolkit provides a suite of Python 3 tools for querying UDS graphs using SPARQL. Both UDS1.0 and Decomp0.1 are publicly available at http://decomp.io.
-
Improving Semantic Parsing Using Statistical Word Sense Disambiguation (Student Abstract)
Ritwik Bose, Siddharth Vashishtha, and James Allen
In Proceedings of the AAAI Conference on Artificial Intelligence, May 2020
2019
-
Fine-Grained Temporal Relation Extraction
Siddharth Vashishtha, Benjamin Van Durme, and Aaron Steven White
In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019
We present a novel semantic framework for modeling temporal relations and event durations that maps pairs of events to real-valued scales. We use this framework to construct the largest temporal relations dataset to date, covering the entirety of the Universal Dependencies English Web Treebank. We use this dataset to train models for jointly predicting fine-grained temporal relations and event durations. We report strong results on our data and show the efficacy of a transfer-learning approach for predicting categorical relations.