Background Information

Note: this page is currently under construction. If you have additional information that is relevant as background for this shared task, please send us the reference and a very short description.

Dependency Parsing

This shared task uses a dependency-based representation of syntax. The dependency trees are automatically converted from the constituent structures in the Penn Treebank (Marcus et al., 1993) by means of head percolation strategies (Magerman, 1994), with special consideration of nonlocal dependencies (Johansson and Nugues, 2007a) and NP-internal relations (Meyers et al., 2001).
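The basic idea of head percolation can be illustrated with a minimal sketch. The head rules, the `Node` class, and the function names below are hypothetical and chosen only for illustration. They are not the rules used to produce the shared task data, and the sketch ignores nonlocal dependencies and NP-internal structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                                  # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    index: int = -1                             # token position (leaves only)


# Hypothetical toy head rules: (search direction, label priority list).
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def head_child(node: Node) -> Node:
    """Pick the child that passes its head word up to the parent."""
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for label in priorities:
        for kid in kids:
            if kid.label == label:
                return kid
    return kids[0]  # fallback: first child in the search direction


def to_dependencies(node: Node, heads: Dict[int, int]) -> int:
    """Return the head token of `node` and record heads[dependent] = head."""
    if not node.children:
        return node.index
    chosen = head_child(node)
    head_token = to_dependencies(chosen, heads)
    for kid in node.children:
        if kid is not chosen:
            # The head token of every non-head child depends on the head token.
            heads[to_dependencies(kid, heads)] = head_token
    return head_token


# (S (NP (DT The) (NN cat)) (VP (VBD chased) (NP (NNS mice))))
tree = Node("S", [
    Node("NP", [Node("DT", index=0), Node("NN", index=1)]),
    Node("VP", [Node("VBD", index=2), Node("NP", [Node("NNS", index=3)])]),
])
heads: Dict[int, int] = {}
root = to_dependencies(tree, heads)
```

For the toy sentence "The cat chased mice", the sketch makes "chased" the root, attaches "cat" and "mice" to it, and attaches "The" to "cat".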

For the topic of automatic parsing of dependency structures, the background information and frameworks pages of the CoNLL 2007 shared task are excellent starting points.

Semantic Role Labeling

Semantic Role Labeling (SRL) is generally performed using complete constituent-based syntactic representations as input, following the approach pioneered by Gildea and Jurafsky (2002). More recent work confirmed that “traditional” systems, i.e., systems that handle the identification of each argument as an independent classification problem, perform best when full constituent-based syntactic analysis is available (Pradhan et al., 2005a). Nevertheless, some systems obtained results very close to the state of the art using only partial syntactic analysis, e.g., syntactic chunks and clause boundaries (Màrquez et al., 2005). While much work has analyzed the SRL problem in the context of constituent-based syntactic analysis, very little work has attempted to identify semantic frames using dependency syntax and a dependency-based representation of the semantic relations. A notable exception is Hacioglu (2004), who trained a system to locate and classify semantic arguments in a dependency treebank. Similarly, a few systems have applied SRL techniques to automatically produced dependency trees (Pradhan et al., 2005c; Johansson and Nugues, 2007b). There are, however, significant differences between previous work and this task: we use a more complex conversion process from semantic argument constituents to dependency relations, which increases the coverage of semantic arguments represented in the corpus; the participants are required to predict syntactic dependencies rather than receiving them as input; we include NomBank frames as well as PropBank frames; and finally, this evaluation is meant to foster joint learning rather than pipeline approaches (even though the latter are possible and accepted).
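To make the notion of a dependency-based representation of an argument concrete, the sketch below shows one simple baseline for mapping an argument constituent to a token: pick the token inside the span whose syntactic head lies outside the span. This is an assumption made for illustration only. It is not the more complex conversion process used for this task, and the function name and the input encoding (a list of head positions, with -1 for the root) are hypothetical.

```python
from typing import List


def span_heads(heads: List[int], start: int, end: int) -> List[int]:
    """Tokens in the inclusive span [start, end] whose head is outside it.

    heads[i] is the position of token i's syntactic head, or -1 for the root.
    """
    return [i for i in range(start, end + 1)
            if not (start <= heads[i] <= end)]


# "The cat chased mice": The -> cat, cat -> chased, chased = root, mice -> chased
toy_heads = [1, 2, -1, 2]
subject_head = span_heads(toy_heads, 0, 1)   # span "The cat"
object_head = span_heads(toy_heads, 3, 3)    # span "mice"
```

When the function returns more than one token, the span does not correspond to a single dependency subtree, which is one reason a simple mapping of this kind loses some arguments.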

The problem of SRL received a significant boost from the CoNLL 2004 and 2005 shared tasks (Carreras and Màrquez, 2004, 2005). The four best systems at the CoNLL-2005 shared task each combined several base models to increase robustness and to gain independence from parse errors. Generally, the base models were combined using joint inference. Koomen et al. (2005) combined the argument candidates generated by several full-syntax SRL systems using constraint satisfaction formulated as an Integer Linear Programming problem. Haghighi et al. (2005) re-ranked entire frames generated by several base SRL models. The re-ranking setting has the advantage that it allows the definition of features that exploit frame-level or sentence-level structures. According to follow-up work by the same authors, these global features are the source of the major performance improvements in the re-ranking system. Pradhan et al. (2005b) implemented a stacking approach in which the output of a full-syntax SRL model is used to generate features for a chunk-by-chunk SRL system. Surdeanu et al. (2007) modeled the joint inference problem as meta-learning using discriminative classifiers, which combine the outputs of several base models (using both full and partial syntax) at the argument level. The motivation of this work was to address the situation where different base models perform best on different argument types. More recently, Surdeanu et al. (2008) showed that model combination and joint inference are also beneficial for languages other than English, where the available corpora are significantly smaller. Even though joint inference in this context was used mainly to gain robustness against parsing errors in the identification of semantic frames, these frameworks can be adapted to the problem proposed in this shared task.
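The flavor of constraint-based combination can be shown with a small sketch. It pools scored argument candidates and selects the highest-scoring subset that satisfies two constraints assumed here for illustration: arguments may not overlap, and each core label may appear at most once. Exhaustive search stands in for an Integer Linear Programming solver, so this is not the formulation of Koomen et al. (2005). All names, labels, and scores are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Candidate = Tuple[int, int, str, float]  # (start, end, label, score)

CORE_LABELS = {"A0", "A1", "A2", "A3", "A4", "A5"}


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two inclusive token spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_consistent(selection: Sequence[Candidate]) -> bool:
    """Check the two illustrative constraints on a set of arguments."""
    if any(overlaps(a, b) for a, b in combinations(selection, 2)):
        return False
    core = [c[2] for c in selection if c[2] in CORE_LABELS]
    return len(core) == len(set(core))  # no duplicated core label


def best_frame(candidates: Sequence[Candidate]) -> List[Candidate]:
    """Exhaustively search for the consistent subset with the highest score.

    Exponential in the number of candidates; usable only on toy inputs.
    """
    best: Tuple[Candidate, ...] = ()
    best_score = 0.0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = sum(c[3] for c in subset)
            if score > best_score and is_consistent(subset):
                best, best_score = subset, score
    return list(best)


# Candidates pooled from several hypothetical base systems for one predicate.
pooled = [
    (0, 1, "A0", 0.9),
    (0, 2, "A0", 0.4),
    (3, 3, "A1", 0.8),
    (3, 4, "A1", 0.5),
    (4, 4, "AM-TMP", 0.6),
]
frame = best_frame(pooled)
```

On this toy input the selected frame keeps the A0 over tokens 0-1, the A1 over token 3, and the AM-TMP over token 4, and discards the two lower-scoring candidates that overlap with them.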

Joint Inference

The idea of performing joint inference for the recognition of syntactic and semantic representations is not new. Miller et al. (1998) learned a model that jointly recognizes syntactic constituents and domain-specific semantic information (named entity mentions and entity relations). This model achieved state-of-the-art performance in the MUC-7 Information Extraction evaluation. However, Miller et al. focused only on domain-specific information, whereas the focus of this shared task is on open-domain semantic representations.

Even though the past four CoNLL shared task evaluations have addressed either SRL or syntactic parsing, very little work has addressed the joint identification of both. At the CoNLL-2005 shared task, Yi and Palmer (2005) proposed such a model, in which a constituent-based syntactic corpus is extended with PropBank argument labels and a single parser is trained on this corpus. Note that Yi and Palmer’s parser jointly solves the syntactic parsing and semantic role identification tasks (i.e., identifying argument boundaries); argument labeling is performed in a later step. Musillo and Merlo (2006a, 2006b) proposed a joint model for all three tasks (parsing, argument identification, and argument labeling) and obtained state-of-the-art results on all of them. Nevertheless, both Yi and Palmer and Musillo and Merlo used constituent-based representations, which differ from the representation proposed in this shared task.

References

  1. Carreras X. and Màrquez L. Introduction to the CoNLL-2004 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2004, 2004.
  2. Carreras X. and Màrquez L. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2005, 2005.
  3. Gildea D. and Jurafsky D. Automatic Labeling of Semantic Roles. Computational Linguistics, 28(3), 2002.
  4. Hacioglu K. Semantic Role Labeling Using Dependency Trees. In Proceedings of COLING-2004, 2004.
  5. Hacioglu K., Pradhan S., Ward W., Martin J.H., and Jurafsky D. Semantic Role Labeling by Tagging Syntactic Chunks. In Proceedings of CoNLL-2004, 2004.
  6. Haghighi A., Toutanova K., and Manning C. A Joint Model for Semantic Role Labeling. In Proceedings of CoNLL-2005, 2005.
  7. Johansson R. and Nugues P. Extended Constituent-to-dependency Conversion for English. In Proceedings of NODALIDA, 2007a.
  8. Johansson R. and Nugues P. Semantic Structure Extraction using Nonprojective Dependency Trees. In Proceedings of SemEval-2007, 2007b.
  9. Koomen P., Punyakanok V., Roth D., and Yih W. Generalized Inference with Multiple Semantic Role Labeling Systems. In Proceedings of CoNLL-2005, 2005.
  10. Marcus M., Santorini B., and Marcinkiewicz M. A. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 1993.
  11. Magerman D. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University, 1994.
  12. Màrquez L., Comas P., Giménez J., and Català N. Semantic Role Labeling as Sequential Tagging. In Proceedings of CoNLL-2005, 2005.
  13. Meyers A., Grishman R., Kosaka M., and Zhao S. Covering Treebanks with GLARF. In Proceedings of the ACL/EACL 2001 Workshop on Sharing Tools and Resources for Research and Education, 2001.
  14. Miller S., Crystal M., Fox H., Ramshaw L., Schwartz R., Stone R., Weischedel R., and the Annotation Group (BBN Technologies). BBN: Description of the SIFT System as Used for MUC-7. In Proceedings of MUC-7, 1998.
  15. Musillo G. A. and Merlo P. Accurate Parsing of the Proposition Bank. In Proceedings of HLT-NAACL’06, 2006a.
  16. Musillo G. A. and Merlo P. Robust Parsing for the Proposition Bank. In Proceedings of EACL’06 Workshop: Robust Methods in Analysis of Natural Language Data, 2006b.
  17. Pradhan S., Hacioglu K., Ward W., Martin J.H., and Jurafsky D. Support Vector Learning for Semantic Argument Classification. Machine Learning, 60, 11-39, 2005a.
  18. Pradhan S., Hacioglu K., Ward W., Martin J.H., and Jurafsky D. Semantic Role Chunking Combining Complementary Syntactic Views. In Proceedings of CoNLL-2005, 2005b.
  19. Pradhan S., Ward W., Hacioglu K., Martin J., and Jurafsky D. Semantic Role Labeling Using Different Syntactic Views. In Proceedings of ACL-2005, 2005c.
  20. Surdeanu M., Màrquez L., Carreras X., and Comas P. Combination Strategies for Semantic Role Labeling. Journal of Artificial Intelligence Research, 29, 2007.
  21. Surdeanu M., Morante R., and Màrquez L. Analysis of Joint Inference Strategies for the Semantic Role Labeling of Spanish and Catalan. In Proceedings of CICLing, 2008.
  22. Toutanova K., Haghighi A., and Manning C. Joint Learning Improves Semantic Role Labeling. In Proceedings of ACL’05, 2005.
  23. Yi S. and Palmer M. The Integration of Syntactic Parsing and Semantic Role Labeling. In Proceedings of CoNLL-2005, 2005.
 
conll2008/background.txt · Last modified: 2008/02/19 14:24 by richard