About me

Who am I?

I work as a Senior Applied Scientist at Amazon Alexa AI. Before that, I developed improvement processes and machine learning models for chatbots as a Senior Data Scientist at OttoGroup data.works, and did consulting, architecture, and implementation work around natural language processing as a Senior Cognitive Expert at IBM Services.

I was a visiting professor (“Professurvertretung”) at the Institute for Computational Linguistics of Ruprecht-Karls-Universität Heidelberg from Winter 2013/2014 to Winter 2015/2016, and worked as a Staff Applied Scientist for Natural Language Processing at LinkedIn in Dublin from April to May 2016. I also worked as a Senior Computational Linguist / Team Lead Natural Language Generation at LangTec, a small language technology and NLP consultancy in Hamburg.

I co-edited a new handbook on Anaphora Resolution that recently appeared with Springer.

My courses in Heidelberg ranged from an Introduction to Computational Linguistics and Mathematical Foundations, which familiarizes first-semester students with basic dynamic programming techniques and algorithms for syntactic and semantic processing as well as the basics of linear algebra, to specialized Master-level courses on structured inference techniques for machine learning and an introduction to statistical parsing techniques, as well as a research module for Master-level students which resulted in two SemEval submissions on semantic parsing and supersense tagging tasks.

You can reach me at my email address.
For those interested, here is my CV.

Research Interests

What, Why and How?

I am interested in the question of how the knowledge we have of the entities we talk about influences the way we talk about them -- more specifically, in what part of our whole "world knowledge" actually influences the (syntactic and discourse) structure of our language (and how). By extension, or sometimes as an interest in its own right, I am interested in techniques that allow the efficient construction of performant, interpretable, and accurate components for computational text understanding -- preferably without the kind of effort that puts such components out of reach of everyone but a few big companies.

I defended my PhD thesis (University of Tübingen; advisor: Erhard Hinrichs) on the resolution of nominal anaphora using semantic information derived from large corpora in June 2010. The thesis is available electronically.

I have reviewed articles for (special issues or regular articles in) journals including Computational Linguistics, Dialogue and Discourse, Lingua (pre-2016) as well as Natural Language Engineering.

I co-organized the workshop on Distributional Similarity beyond Concrete Concepts at CogSci 2009, as well as several workshops in the series on Statistical Parsing of Morphologically Rich Languages (SPMRL 2013, SPMRL-SANCL 2014, SPMRL 2014). I served as an area chair for the area of "discourse semantics" at the *SEM 2014 conference.

I served as a member of the program committee for SPMRL/SP-Sem-MRL (2012, 2013), SemEval (2010, 2013), *SEM (2012, 2013, 2017, 2018), Coling (2014), as well as ACL (2008, 2009, 2014, 2015, 2016, 2020), EMNLP (2011, 2012, 2013, 2014, 2017, 2018, 2019), IJCAI (2011), and EACL (2012, 2017). I have acted as an external reviewer for the DFG (Germany), NWO (Netherlands), MSMT (Czech Republic), and the ERC (European Research Council).

Nicer than before

but still neglected

This site was made using (a modernized version of) the terrafirma layout from the Open-source Web Design page. I changed the colors to match my graphics and replaced the plant image with a view of Tübingen (near the Stiftskirche) that I took in January 2006. Templating is done using jinja2 and kelvin.

Selected Publications

an entertaining read?

For a full list, see SemanticScholar, Google Scholar or Microsoft Academic.

Journal articles.

Versley, Y. (2013)
A graph-based approach for implicit discourse relations. CLIN Journal 3:148–173. [clinjournal.org]
Telljohann, H., Versley, Y., Beck, K., Hinrichs, E., and Zastrow, T. (2013)
STTS als Part-of-Speech-Tagset in Tübinger Baumbanken. Journal for Language Technology and Computational Linguistics 28(1), 1–16. [pdf]
Versley, Y. and Gastel, A. (2013)
Linguistic Tests for Discourse Relations in the TüBa-D/Z Treebank of German. S. Dipper, B. Webber and H. Zinsmeister (eds.): Beyond Semantics: the challenges of annotating pragmatic and discourse phenomena. Dialogue & Discourse 4(2), 142–173. [ELanguage.net] [preprint pdf]
Versley, Y. (2008)
Vagueness and Referential Ambiguity in a Large-scale Annotated Corpus. Massimo Poesio and Ron Artstein (eds.): Ambiguity in Anaphora. Journal on Research in Language and Computation 6(3–4), 333–353. [SpringerLink] [preprint pdf]

Conference and Workshop papers.

Rosenbaum, A., Soltan, S., Hamza, W., Versley, Y. and Boese, M. (2022)
LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging. Proceedings of the 29th International Conference on Computational Linguistics. [pdf]
Versley, Y. (2016)
Discontinuity (Re)2visited: A Minimalist Approach to Pseudoprojective Constituent Parsing. Proceedings of the DiscoNLP workshop at NAACL-HLT 2016. [pdf]
Versley, Y. and Steen, J. (2016)
Detecting Annotation Scheme Variation in Out-of-Domain Treebanks. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). [pdf]
Haas, M. and Versley, Y. (2015)
Subsentential Sentiment on a Shoestring: A Crosslingual Analysis of Compositional Classification. Proceedings of NAACL-HLT 2015. [pdf]
Versley, Y. (2014)
Experiments with Easy-first nonprojective constituent parsing. Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages. [pdf]
Versley, Y. (2013)
SFS-TUE: Compound Paraphrasing with a Language Model and Discriminative Reranking. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, US. [pdf]
Versley, Y. (2013)
Graph-based Classification of Explicit and Implicit Discourse Relations. International Conference on Computational Semantics (IWCS 2013), Potsdam, Germany. [pdf]
Versley, Y. (2012)
Supervised Learning of German Qualia Relations. ACL 2012 workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, Jeju, Korea. [pdf]
Versley, Y. and Panchenko, Y. (2012)
Not just bigger: Towards Better-Quality Web Corpora. Seventh Web-as-Corpus Workshop at WWW2012 (WAC7), Lyon, France. [pdf]
Versley, Y. (2011)
Multilabel Tagging of Discourse Relations in Ambiguous Temporal Connectives. Recent Advances in Natural Language Processing, Hissar, Bulgaria. [pdf]
Versley, Y. (2011)
Towards finer-grained tagging of discourse connectives. DGfS Workshop Beyond Semantics, Göttingen, Germany. [pdf]
Versley, Y. (2010)
Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection. Workshop on the Annotation and Exploitation of Parallel Corpora (AEPC), Tartu, Estonia. [pdf]
Versley, Y., Beck, K., Hinrichs, E. and Telljohann, H. (2010)
A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBa-D/Z Treebank. 9th Conference on Treebanks and Linguistic Theories (TLT9), Tartu, Estonia. [pdf]
Versley, Y. and Rehbein, I. (2009)
Scalable Discriminative Parsing for German. International Conference on Parsing Technology (IWPT'09). [pdf]
Versley, Y. (2008)
Decorrelation and Shallow Semantic Patterns for Distributional Clustering of Nouns and Verbs. ESSLLI'08 Workshop on Distributional Lexical Semantics. [pdf]
Versley, Y., Moschitti, A., Poesio, M. and Yang, X. (2008)
Coreference Systems based on Kernel Methods. Coling 2008. [pdf]
Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A. (2008)
BART: A Modular Toolkit for Coreference Resolution. LREC 2008. [pdf]
Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A. (2008)
BART: A Modular Toolkit for Coreference Resolution. ACL 2008 System demo. [pdf]
Versley, Y. (2007)
Antecedent Selection Techniques for High-Recall Coreference Resolution. EMNLP-CoNLL 2007. [pdf]
Versley, Y. (2007)
Using the Web to Resolve Coreferent Bridging in German Newspaper Text. GLDV-Frühjahrstagung 2007. [pdf]
Versley, Y. and Zinsmeister, H. (2006)
From Surface Dependencies towards Deeper Semantic Representations. Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006). Due to technical issues, the title in the conference proceedings was changed to "Semantic Representations". [pdf]
Versley, Y. (2006)
A Constraint-based Approach to Noun Phrase Coreference Resolution in German Newspaper Text. Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS 2006). [pdf]
Versley, Y. (2006)
Disagreement Dissected: Vagueness as a Source of Ambiguity in Nominal (Co-)Reference. ESSLLI 2006 Workshop on Ambiguity in Anaphora. [pdf]
Versley, Y. (2005)
Parser Evaluation across Text Types. Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005). [pdf] [pdf (slides)]
Schilder, F., Versley, Y., and Habel, Ch. (2004)
Extracting spatial information: grounding, classifying and linking spatial expressions. Workshop on Geographic Information Retrieval, 27th Annual International ACM SIGIR Conference. [pdf]
Schilder, F., Habel, Ch., and Versley, Y. (2003)
Temporal information extraction and question answering: Deriving answers for when-questions. Questions and Answers: Theoretical and Applied Perspectives (2nd CologNet-ElsNet Symposium).

Theses.

Versley, Y. (2010)
Resolving Coreferent Bridging in German Newspaper Text. PhD Thesis, Seminar für Sprachwissenschaft, Universität Tübingen.
Versley, Y. (2004)
Tagging kausaler Relationen. Diplomarbeit (diploma thesis), Fachbereich Informatik, Universität Hamburg.
Also available as: Tagging kausaler Relationen: Grundlagen kausaler Ereignisrelationen und aktuelle Probleme. VDM Verlag Dr. Müller. ISBN 978-3-8364-3259-7.

Blog posts

Neural Networks are Quite Neat (a rant)
After decades of Neural Network overhype and a subsequent period of disrespect, Neural Networks have become popular again - for a reason, as they can fit large amounts of data better than the feature-based models that came before them. Nonetheless, people who lived through the first overhyped episode are asking critical questions - the answers to which are (hopefully!) enlightening. (more ...)

The brave new world of search engines
In an earlier post, I talked about Google's search results in terms of personalization, and whether to like that or not. This post takes up another aspect of 2011 Google search: what they do with complex queries. For a more current perspective, see this presentation (by Will Critchlow) from 2013. (more...)

Simple Pattern extraction from Google n-grams
Google has released n-gram datasets for multiple languages, including English and German. For my needs (lots of patterns, with lemmatization), writing a small bit of C++ allows me to extract pattern instances in bulk, more quickly and comfortably than with bzgrep. (more...)
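The core of that extraction is a simple streaming filter. Here is a minimal Python sketch of the idea, assuming a Web-1T-style layout (one n-gram per line, tab-separated from its total count); the `extract` helper and the sample data are invented for illustration:

```python
from collections import Counter
from io import StringIO

# Hypothetical excerpt in a Web-1T-style layout: n-gram, tab, total count.
SAMPLE = StringIO(
    "a matter of great importance\t120\n"
    "a matter of little importance\t80\n"
    "a matter of the utmost\t50\n"
)

def extract(lines, pattern):
    """Aggregate the counts of all fillers for the single '*' slot in `pattern`."""
    slot = pattern.index("*")
    fillers = Counter()
    for line in lines:
        ngram, count = line.rstrip("\n").split("\t")
        tokens = ngram.split(" ")
        if len(tokens) != len(pattern):
            continue
        # Every non-wildcard position must match the pattern exactly.
        if all(p == "*" or p == t for p, t in zip(pattern, tokens)):
            fillers[tokens[slot]] += int(count)
    return fillers

counts = extract(SAMPLE, ["a", "matter", "of", "*", "importance"])
# counts["great"] == 120, counts["little"] == 80
```

Since the released shards are sorted, a production version can additionally skip whole blocks by pattern prefix instead of scanning every line, which is where compiled code pays off over bzgrep.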

Useful links

Fast dependency parsing
For doing syntactic preprocessing without spending too much time (CPU or engineering) on it, SpaCy and NLP4J should be among the first things to try. SpaCy covers English and German, whereas NLP4J covers only English but is trained on biomedical treebanks (in addition to the WSJ news that everyone trains on), which makes it especially useful for that kind of text. If you're looking to parse French, the Bonsai model collection from the French Alpage group and the Mate Parser from Bernd Bohnet (now at Google) are good first guesses. If you have a suitable treebank at hand and want neural network parsing, you might as well try UDPipe and its Parsito parser (for speed) or the Stanford NLP group's Stanza toolkit, which implements a successor to the BiLSTM graph-based parser by Eliyahu Kiperwasser and Yoav Goldberg (for accuracy).

Neural Network Toolkits
My favorite toolkit for modeling natural language text using LSTMs and other gadgetry is PyTorch, and I have a history of liking now-defunct frameworks and components: I worked with AllenNLP, which offered a more cohesive experience back when PyTorch was not yet a complete framework, and before PyTorch became popular I was a fan of DyNet, which uses dynamically constructed computation graphs and makes it possible to model arbitrarily-structured recursive neural networks and other gadgetry without much fuss.
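The appeal of such define-by-run graphs is easiest to see in a toy example. The following is not DyNet or PyTorch code but a self-contained sketch of the underlying idea (the `Node` class is invented for illustration): the graph is recorded as ordinary Python executes, so control flow, recursion, and input-dependent structure come for free.

```python
class Node:
    """Toy scalar autograd value: the graph is recorded as operations execute."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # upstream Nodes
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other),
                    (other.value, self.value))

    def backward(self, upstream=1.0):
        # Accumulate, then push gradients along each recorded edge.
        # (Revisits shared subgraphs; fine for a toy, wasteful in general.)
        self.grad += upstream
        for parent, g in zip(self.parents, self.local_grads):
            parent.backward(upstream * g)

x, y = Node(2.0), Node(3.0)
z = x * y + x      # the graph for this exact expression is built right here
z.backward()
# z.value == 8.0, x.grad == 4.0 (= y + 1), y.grad == 2.0 (= x)
```

Because the graph is rebuilt on every forward pass, a differently-shaped input (say, a deeper parse tree) simply yields a different graph, which is what makes recursive networks over trees painless in DyNet-style frameworks.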

Conditional Random Fields.
Hanna Wallach has a very useful link collection on Conditional Random Fields. I'd especially recommend her tutorial on CRFs (which is also the introductory part of her MSc thesis) as well as Simon Lacoste-Julien's tutorial on SVMs, graphical models, and Max-Margin Markov Networks (also linked there).
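As a taste of what those tutorials cover: a linear-chain CRF's partition function Z can be computed with the forward algorithm in time linear in the sequence length, and on tiny inputs one can check it against a brute-force sum over all label sequences. A sketch in Python (the function names and toy scores below are invented for illustration):

```python
import math
from itertools import product

def log_partition_forward(emit, trans):
    """log Z of a linear-chain CRF via the forward algorithm (O(T * K^2)).
    emit[t][y]: score of label y at position t; trans[y1][y2]: transition score.
    All scores live in log space; a real implementation would use a
    numerically stable log-sum-exp instead of log(sum(exp(...)))."""
    K = len(emit[0])
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [emit[t][y2] +
                 math.log(sum(math.exp(alpha[y1] + trans[y1][y2])
                              for y1 in range(K)))
                 for y2 in range(K)]
    return math.log(sum(math.exp(a) for a in alpha))

def log_partition_bruteforce(emit, trans):
    """Same quantity by summing over all K^T label sequences (for checking only)."""
    K, T = len(emit[0]), len(emit)
    total = 0.0
    for path in product(range(K), repeat=T):
        score = sum(emit[t][path[t]] for t in range(T))
        score += sum(trans[path[t - 1]][path[t]] for t in range(1, T))
        total += math.exp(score)
    return math.log(total)

emit = [[0.5, 1.0], [0.2, 0.3], [1.1, 0.4]]   # 3 positions, 2 labels
trans = [[0.1, 0.7], [0.3, 0.0]]
```

The same dynamic-programming trick, with sums replaced by maxima, gives Viterbi decoding, which is the connection the tutorials above develop in detail.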

Social Media


Nice blogs

Language Log
Technologies du Langage
Earning my Turns
Leiter Reports