About me

Who am I?

I was a visiting professor (“Professurvertretung”) at the Institute for Computational Linguistics of the Ruprecht-Karls-Universität Heidelberg from Winter 2013 to Winter 2016, and I worked as a Staff Applied Scientist for Natural Language Processing at LinkedIn in Dublin from April to May 2016.

Courses I taught in Heidelberg:

  • Einführung in die Computerlinguistik (WS13, WS14, WS15)
  • Forschungsmodul für Master-Studenten (WS15)
  • Formale Grundlagen der Computerlinguistik: Mathematische Grundlagen (SS14, SS15)
  • Structured Inference for NLP Applications (SS15)
  • Softwareprojekt (SS14, WS14)
  • NLP methods for Digital Humanities (WS15)
  • Multimodale Semantik (SS14)
  • Computational Linguistics in Context (WS13)
  • Statistical Parsing (WS13)

You can reach me at my email address,
For those interested, here is my CV.

Research Interests

What, Why and How?

I am interested in how the knowledge we have of the entities we talk about influences the way we talk about them; more specifically, which parts of our overall "world knowledge" actually influence the (syntactic and discourse) structure of our language, and how. By extension, or sometimes as an interest in its own right, I'm interested in techniques that allow the efficient construction of performant, interpretable, and accurate components for computational text understanding, preferably without the kind of effort that precludes anyone but a few big companies from using them.

I defended my PhD thesis (University of Tübingen, advisor: Erhard Hinrichs) on the resolution of nominal anaphora using semantic information derived from large corpora in June 2010. The thesis is available electronically.

I have reviewed articles (for special issues or as regular submissions) for journals including Computational Linguistics, Dialogue and Discourse, Lingua (pre-2016), and Natural Language Engineering.

I co-organized the workshop on Distributional Similarity beyond Concrete Concepts at CogSci 2009, as well as several workshops in the series on Statistical Parsing of Morphologically Rich Languages (SPMRL 2013, SPMRL-SANCL 2014, SPMRL 2014). I served as an area chair for "discourse semantics" at the StarSem 2014 conference.

I served as a member of the program committee for SPMRL/SP-Sem-MRL (2012, 2013), SemEval (2010, 2013), STARSEM (2012, 2013), Coling (2014), as well as ACL (2008, 2009, 2014, 2015, 2016), EMNLP (2011, 2012, 2013, 2014), IJCAI (2011), and EACL (2012, 2017). I have acted as an external reviewer for DFG (Germany), NWO (Netherlands), MSMT (Czech Republic), and ERC (European Research Council).

Nicer than before

but still in neglect

This site was made using (a modernized version of) the terrafirma layout from the Open-source Web Design page. I changed the colors to match my graphics and replaced the plant image with a view of Tübingen (near the Stiftskirche) that I photographed in January 2006. Templating is done using jinja2 and kelvin.

Selected Publications

an entertaining read?

For a full list, see Google Scholar or Microsoft Academic.

Journal articles.

Versley, Y. (2013)
A graph-based approach for implicit discourse relations. CLIN Journal 3:148–173. [clinjournal.org]
Telljohann, H., Versley, Y., Beck, K., Hinrichs, E., and Zastrow, T. (2013)
STTS als Part-of-Speech-Tagset in Tübinger Baumbanken. Journal for Language Technology and Computational Linguistics 28(1), 1–16. [pdf]
Versley, Y. and Gastel, A. (2013)
Linguistic Tests for Discourse Relations in the TüBa-D/Z Treebank of German. S. Dipper, B. Webber and H. Zinsmeister (eds.): Beyond Semantics: the challenges of annotating pragmatic and discourse phenomena. Dialogue & Discourse 4(2), 142–173. [ELanguage.net] [preprint pdf]
Versley, Y. (2008)
Vagueness and Referential Ambiguity in a Large-scale Annotated Corpus. Massimo Poesio and Ron Artstein (eds.): Ambiguity in Anaphora. Journal on Research in Language and Computation 6(3–4), 333–353. [SpringerLink] [preprint pdf]

Conference and Workshop papers.

Versley, Y. (2016)
Discontinuity (Re)²visited: A Minimalist Approach to Pseudoprojective Constituent Parsing. Proceedings of the DiscoNLP workshop at NAACL-HLT 2016. [pdf]
Versley, Y. and Steen, J. (2016)
Detecting Annotation Scheme Variation in Out-of-Domain Treebanks. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). [pdf]
Haas, M. and Versley, Y. (2015)
Subsentential Sentiment on a Shoestring: A Crosslingual Analysis of Compositional Classification. Proceedings of NAACL-HLT 2015. [pdf]
Versley, Y. (2014)
Experiments with Easy-first nonprojective constituent parsing. Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages. [pdf]
Versley, Y. (2013)
SFS-TUE: Compound Paraphrasing with a Language Model and Discriminative Reranking. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, US. [pdf]
Versley, Y. (2013)
Graph-based Classification of Explicit and Implicit Discourse Relations. International Conference on Computational Semantics (IWCS 2013), Potsdam, Germany. [pdf]
Versley, Y. (2012)
Supervised Learning of German Qualia Relations. ACL 2012 workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jeju, Korea. [pdf]
Versley, Y. and Panchenko, Y. (2012)
Not just bigger: Towards Better-Quality Web Corpora. Seventh Web-as-Corpus Workshop at WWW2012 (WAC7), Lyon, France. [pdf]
Versley, Y. (2011)
Multilabel Tagging of Discourse Relations in Ambiguous Temporal Connectives. Recent Advances in Natural Language Processing, Hissar, Bulgaria. [pdf]
Versley, Y. (2011)
Towards finer-grained tagging of discourse connectives. DGfS Workshop Beyond Semantics, Göttingen, Germany. [pdf]
Versley, Y. (2010)
Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection. Workshop on the Annotation and Exploitation of Parallel Corpora (AEPC), Tartu, Estonia. [pdf]
Versley, Y., Beck, K., Hinrichs, E. and Telljohann, H. (2010)
A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBa-D/Z Treebank. 9th Conference on Treebanks and Linguistic Theories (TLT9), Tartu, Estonia. [pdf]
Versley, Y. and Rehbein, I. (2009)
Scalable Discriminative Parsing for German. International Conference on Parsing Technology (IWPT'09). [pdf]
Versley, Y. (2008)
Decorrelation and Shallow Semantic Patterns for Distributional Clustering of Nouns and Verbs. ESSLLI'08 Workshop on Distributional Lexical Semantics. [pdf]
Versley, Y., Moschitti, A., Poesio, M. and Yang, X. (2008)
Coreference Systems based on Kernel Methods. Coling 2008. [pdf]
Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A. (2008)
BART: A Modular Toolkit for Coreference Resolution. LREC 2008. [pdf]
Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A. (2008)
BART: A Modular Toolkit for Coreference Resolution. ACL 2008 System demo. [pdf]
Versley, Y. (2007)
Antecedent Selection Techniques for High-Recall Coreference Resolution. EMNLP-CoNLL 2007. [pdf]
Versley, Y. (2007)
Using the Web to Resolve Coreferent Bridging in German Newspaper Text. GLDV-Frühjahrstagung 2007. [pdf]
Versley, Y. and Zinsmeister, H. (2006)
From Surface Dependencies towards Deeper Semantic Representations. Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006). Due to technical issues, the title in the conference proceedings was changed to "Semantic Representations". [pdf]
Versley, Y. (2006)
A Constraint-based Approach to Noun Phrase Coreference Resolution in German Newspaper Text. Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS 2006). [pdf]
Versley, Y. (2006)
Disagreement Dissected: Vagueness as a Source of Ambiguity in Nominal (Co-)Reference. ESSLLI 2006 Workshop on Ambiguity in Anaphora. [pdf]
Versley, Y. (2005)
Parser Evaluation across Text Types. Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005). [pdf] [pdf (slides)]
Schilder, F., Versley, Y., and Habel, Ch. (2004)
Extracting spatial information: grounding, classifying and linking spatial expressions. Workshop on Geographic Information Retrieval, 27th Annual International ACM SIGIR Conference. [pdf]
Schilder, F., Habel, Ch., and Versley, Y. (2003)
Temporal information extraction and question answering: Deriving answers for when-questions. Questions and Answers: Theoretical and Applied Perspectives (2nd CologNet-ElsNet Symposium).


Yannick Versley (2010)
Resolving Coreferent Bridging in German Newspaper Text. PhD Thesis, Seminar für Sprachwissenschaft, Universität Tübingen.
Yannick Versley (2004)
Tagging kausaler Relationen. Diploma thesis, Fachbereich Informatik, Universität Hamburg.
Also available as: Tagging kausaler Relationen: Grundlagen kausaler Ereignisrelationen und aktuelle Probleme; VDM Verlag Dr. Müller. ISBN 978-3-8364-3259-7

Blog posts

The brave new world of search engines
In an earlier post, I talked about Google's current search results in terms of personalization, and whether to like it or not. This post looks at another aspect of Google search in 2011: what it does with complex queries. (more...)

Simple Pattern extraction from Google n-grams
Google has released n-gram datasets for multiple languages, including English and German. For my needs (lots of patterns, with lemmatization), writing a small bit of C++ allows me to extract pattern instances in bulk, more quickly and comfortably than with bzgrep. (more...)

Where to buy Music
After spending a disproportionate amount of time searching for nice music that I want to buy, I decided to compile this list of internet shops that sell music in MP3 format to German citizens. (And no, I can't/won't use iTunes unless they make a Linux client.)

Useful links

WCDG parser.
The Weighted Constraint Dependency Grammar parser is one of the best parsers for German that you can get. It's available under an open source license and there is an online demo.

BitPar and SFST.
Helmut Schmid has written several tools that may come in handy in your next NLP application, including the TreeTagger, a decision-tree based part-of-speech tagger; BitPar, a fast PCFG parsing engine; and SFST, a set of highly useful tools for finite-state morphological analysis.

Conditional Random Fields.
Hanna Wallach has a very useful link collection on Conditional Random Fields. I'd especially recommend her tutorial on CRFs (which is also the introductory part of her MSc thesis) as well as Simon Lacoste-Julien's tutorial on SVMs, graphical models, and Max-Margin Markov Networks (also linked there).

Nice blogs

Language Log
Technologies du Langage
Earning my Turns
Leiter Reports