Roi Reichart's Homepage



Contact Information

E-MAIL: roiri at ie dot technion dot ac dot il

CV

About


I am an Assistant Professor at the Faculty of Industrial Engineering and Management of the Technion - Israel Institute of Technology.



Research


I work on Natural Language Processing (NLP). I am interested in language learning in its context, and I design models that integrate domain and world knowledge with data-driven methods. I have hence worked on problems such as domain adaptation, learning with minimal human annotation (and involvement), language transfer and multilingual learning, multi-modal (text and vision) processing and NLP of Web data. I am also interested in structured aspects of language and have developed effective algorithms for inference across linguistic structures. Finally, I am interested in the proper evaluation of NLP algorithms and have worked on problems such as measuring statistical significance in NLP, word embedding evaluation and the evaluation of unsupervised learning (particularly clustering).



Publications

    2018

    "Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance." Yftah Ziser and Roi Reichart. EMNLP 2018 (long paper). [Paper: pdf] [Code: github]
    "Neural Transition Based Parsing of Web Queries: An Entity Based Approach." Rivka Malca and Roi Reichart. EMNLP 2018 (long paper). [Paper: pdf] [Code: bitbucket]
    "On the Relation Between Linguistic Typology and (Limitations of) Multilingual Language Modeling." Daniela Gerz, Ivan Vulic, Edoardo Maria Ponti, Roi Reichart and Anna Korhonen. EMNLP 2018 (long paper). [Paper: pdf]
    "Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction." Daniela Gerz, Edoardo Maria Ponti, Jason Naradowsky, Ivan Vulic, Roi Reichart and Anna Korhonen. Transactions of the Association for Computational Linguistics (TACL(6):451-465). [Paper: pdf] [Data: LM training data in 50 morphologically diverse languages]
    "Bridging Languages Through Images with Deep Partial Canonical Correlation Analysis." Guy Rotman, Ivan Vulic and Roi Reichart. ACL 2018 (long paper). [Paper: pdf] [Code and Data: github]
    "The Hitchhiker's Guide to Testing Statistical Significance in Natural Language Processing." Rotem Dror, Gili Baumer, Segev Shlomov and Roi Reichart. ACL 2018 (long paper). [Paper: pdf] [Note on how to choose a statistical significance test: arxiv] [Code: github]
    "Isomorphic Transfer of Syntactic Structures in Cross-Lingual NLP." Edoardo Maria Ponti, Roi Reichart, Anna Korhonen and Ivan Vulic. ACL 2018 (long paper). [Paper: pdf]
    "Pivot Based Language Modeling for Improved Neural Domain Adaptation." Yftah Ziser and Roi Reichart. NAACL 2018 (long paper). [Paper: pdf] [Code: github]

    2017

    "Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets." Rotem Dror, Gili Baumer, Marina Bogomolov and Roi Reichart. Accepted to the Transactions of the Association for Computational Linguistics (TACL). [arXiv: pdf] [Code: github]
    "Automatic Selection of Context Configurations for Improved (and Fast) Class-specific Word Representations." Ivan Vulic, Roy Schwartz, Ari Rappoport, Roi Reichart and Anna Korhonen. CoNLL 2017 (long paper). [arXiv (older version): pdf]
    "Neural Structural Correspondence Learning for Domain Adaptation." Yftah Ziser and Roi Reichart. CoNLL 2017 (long paper). [Paper: pdf] [Code: github]
    "Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-lingual Constraints." Nikola Mrksic, Ivan Vulic, Diarmuid O Seaghdha, Ira Leviant, Roi Reichart, Milica Gasic, Anna Korhonen and Steve Young. Transactions of the Association for Computational Linguistics (TACL). [Paper: pdf] [Code: github]
    "Sarcasm SIGN: Interpreting Sarcasm with Monolingual Machine Translation." Lotem Peled and Roi Reichart. ACL 2017 (long paper). [Paper: pdf] [Data set: Project Page]
    "Morph-fitting: Fine-Tuning Word Vector Spaces with Simple Language Specific Rules." Ivan Vulic, Nikola Mrksic, Roi Reichart, Diarmuid O Seaghdha, Steve Young and Anna Korhonen. ACL 2017 (long paper). [Paper: pdf]

    2016

    "Survey on the Use of Typological Information in Natural Language Processing." Helen O'Horan, Yevgeni Berzak, Ivan Vulic, Roi Reichart and Anna Korhonen. COLING 2016 (long paper). [Paper: pdf]
    "The Structured Weighted Violations Perceptron Algorithm." Rotem Dror and Roi Reichart. EMNLP 2016 (long paper). [arXiv preprint (arxiv:1602.03040): pdf] [Paper: pdf]
    "Effective Greedy Inference for Graph-based Non-Projective Dependency Parsing." Ilan Tchernowitz, Liron Yedidsion and Roi Reichart. EMNLP 2016 (long paper). [Paper: pdf]
    "SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity." Daniela Gerz, Ivan Vulic, Felix Hill, Roi Reichart and Anna Korhonen. EMNLP 2016 (long paper). [arXiv preprint (arxiv:1602.03040): pdf] [Paper: pdf]
    "Edge-Linear First-Order Dependency Parsing with Undirected Minimum Spanning Tree Inference." Effi Levi, Roi Reichart and Ari Rappoport. ACL 2016 (long paper). [Paper: pdf]
    "Syntactic Parsing of Web Queries with Question Intent." Yuval Pinter, Roi Reichart and Idan Szpektor. NAACL 2016 (long paper). [Paper: pdf] [CQA Query Treebank (Go to "Language Data" and then "L28")] [Parsing Guidelines]
    "Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives." Roy Schwartz, Roi Reichart and Ari Rappoport. NAACL 2016 (short paper). [Paper: pdf]
    "Finger Flexion Imagery: EEG Classification Through Physiologically-Inspired Feature Extraction and Hierarchical Voting" Daniel Furman, Roi Reichart and Hillel Pratt. Brain-Computer Interface (BCI) 2016. (long paper). [Paper: pdf] (This is a non-NLP paper)

    2015

    "Separated by an Un-common Language: Towards Judgment Language Informed Vector Space Modeling" Ira Leviant and Roi Reichart. 2015. Preprint published on arXiv. arxiv:1508.00106 [Paper: pdf] [Dataset: Multilingual-WS353-SimLex999]
    "SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation" Felix Hill, Roi Reichart and Anna Korhonen. Accepted to Computational Linguistics (CL Journal). [Paper (arxiv:1408.3456): pdf] [Paper (CL): pdf] [Dataset: SimLex-999]
    "Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction" Roy Schwartz, Roi Reichart and Ari Rappoport. CoNLL 2015 (long paper). [Paper: pdf] [Dataset: sp-embeddings]
    "Contrastive Analysis with Predictive Power: Typology Driven Estimation of Grammatical Error Distribution in ESL" Yevgeni Berzak, Roi Reichart and Boris Katz. CoNLL 2015 (long paper). [Paper: pdf]
    "Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents" Yufan Guo, Roi Reichart and Anna Korhonen. Transactions of the Association for Computational Linguistics (TACL(3):131-143), (Presented in NAACL 2015). [Paper: pdf]

    2014

    "An Unsupervised Model for Instance Level Subcategorization Acquisition" Simon Baker, Roi Reichart and Anna Korhonen. EMNLP 2014 (long paper). [Paper: pdf] [Verb Similarity Dataset: download]
    "Multi-Modal Models for Concrete and Abstract Concept Meaning" Felix Hill, Roi Reichart and Anna Korhonen. Transactions of the Association for Computational Linguistics (TACL(2):285-296), (Presented in EMNLP 2014). [Paper: pdf]
    "Minimally Supervised Classification to Semantic Categories Using Symmetric Patterns" Roy Schwartz, Roi Reichart and Ari Rappoport. COLING 2014 (long paper). [Paper: pdf]
    "Reconstructing Native Language Typology from Foreign Language Usage" Yevgeni Berzak, Roi Reichart and Boris Katz. CoNLL 2014 (long paper). [Paper: pdf]

    2013

    "Improved Lexical Acquisition through DPP-based Verb Clustering" Roi Reichart and Anna Korhonen. ACL 2013 (long paper). [Paper: pdf] [Source Code: download]
    "Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints." Yufan Guo, Roi Reichart and Anna Korhonen. NAACL 2013 (long paper). [Paper: pdf]

    2012

    "Document and Corpus Level Inference For Unsupervised Learning of Information Structure of Scientific Documents." Roi Reichart and Anna Korhonen. COLING 2012 (short paper). [Paper: pdf]
    "A Diverse Dirichlet Process Ensemble for Unsupervised Induction of Syntactic Categories." Roi Reichart, Gal Elidan and Ari Rappoport. COLING 2012 (long paper). [Paper: pdf]
    "CRAB Reader: A Tool for Analysis and Visualization of Argumentative Zones of Scientific Literature." Yufan Guo, Ilona Solinis, Roi Reichart and Anna Korhonen. COLING 2012 (demo paper). [Paper: pdf]
    "Improved Parsing and POS tagging Using Inter-Sentence Consistency Constraints." Alexander M. Rush (*), Roi Reichart (*), Michael Collins and Amir Globerson, EMNLP 2012 (long paper). (*) - Both authors equally contributed to the paper. [Paper: pdf]
    "Learning to MAP into a Universal POS Tagset." Yuan Zhang, Roi Reichart, Regina Barzilay and Amir Globerson, EMNLP 2012 (long paper). [Paper: pdf]
    "Multi Event Extraction Guided by Global Constraints." Roi Reichart and Regina Barzilay, NAACL 2012 (long paper). [Paper: pdf]
    "You Too?! Mixed-initiative LDA story matching to help teens in distress." Karthik Dinakar, Birago Jones, Henri Lieberman, Rosalind W. Picard, Carolyn Rose, Matthew Thoman and Roi Reichart, International Conference on Weblog and Social Media (ICWSM) 2012.

    2011

    "Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation." Roy Schwartz, Omri Abend, Roi Reichart and Ari Rappoport, ACL 2011 (long paper). [Paper: pdf] [Source Code: download]
    "Confidence Driven Unsupervised Semantic Parsing." Dan Goldwasser, Roi Reichart, James Clarke and Dan Roth, ACL 2011 (long paper). [Paper: pdf]
    "Modeling the Detection of Textual Cyberbullying." Karthik Dinakar, Roi Reichart and Henri Lieberman, International Conference on Weblog and Social Media (ICWSM) - Social Mobile Web Workshop 2011. [Paper: pdf]

    2010

    "Improved Fully Unsupervised Parsing with Zoomed Learning." Roi Reichart and Ari Rappoport, EMNLP 2010 (long paper). [Paper: pdf]
    "Tense Sense Disambiguation: a New Task and a Supervised Algorithm." Roi Reichart and Ari Rappoport, EMNLP 2010 (long paper). [Paper: pdf]
    "A Multi-Domain Web-Based Algorithm for POS Tagging of Unknown Words." Shulamit Umansky-Pesin, Roi Reichart and Ari Rappoport, COLING 2010 (long paper). [Paper: pdf]
    "Improved Unsupervised POS Induction through Prototype Discovery." Omri Abend, Roi Reichart and Ari Rappoport, ACL 2010 (long paper). [Paper: pdf]
    "Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint." Roi Reichart, Raanan Fattal and Ari Rappoport, CoNLL 2010 (long paper). [Paper: pdf]
    "Type Level Clustering Evaluation: New Measures and a POS Induction Case Study." Roi Reichart (*), Omri Abend (*) and Ari Rappoport, CoNLL 2010 (long paper). (*) - Both authors equally contributed to the paper. [Paper: pdf]

    2009

    "Sample Selection for Statistical Parsers: Cognitively Driven Algorithms and Evaluation Measures." Roi Reichart and Ari Rappoport, CoNLL 2009 (long paper). Received the best paper award. [Paper: pdf]
    "The NVI Clustering Evaluation Measure." Roi Reichart and Ari Rappoport, CoNLL 2009 (long paper). [Paper: pdf]
    "Automatic Selection of High Quality Parses Created By a Fully Unsupervised Parser." Roi Reichart and Ari Rappoport, CoNLL 2009 (long paper). [Paper: pdf]
    "Superior and Efficient Fully Unsupervised Pattern-based Concept Acquisition Using an Unsupervised Parser." Dmitry Davidov, Roi Reichart and Ari Rappoport, CoNLL 2009 (long paper). [Paper: pdf]
    "Unsupervised Argument Identification for Semantic Role Labeling." Omri Abend, Roi Reichart and Ari Rappoport, ACL 2009 (long paper). [Paper: pdf]

    2008

    "Unsupervised Induction of Labeled Parse Trees by Clustering with Syntactic Features." Roi Reichart and Ari Rappoport, COLING 2008 (long paper, oral presentation). [Paper: pdf]
    "A Supervised Algorithm for Verb Disambiguation into VerbNet Classes." Omri Abend, Roi Reichart and Ari Rappoport, COLING 2008 (long paper, oral presentation). [Paper: pdf]
    "Multi-Task Active Learning for Linguistic Annotations." Roi Reichart (*), Katrin Tomanek (*), Udo Hahn and Ari Rappoport, ACL 2008 (long paper, oral presentation). (*) - Both authors equally contributed to the paper. [Paper: pdf]
    "Extraction of Entailed Semantic Relations Through Syntax-based Comma Resolution." Vivek Srikumar, Roi Reichart, Mark Sammons, Ari Rappoport and Dan Roth, ACL 2008 (long paper, oral presentation). [Paper: pdf]

    2007

    "Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets." Roi Reichart and Ari Rappoport, ACL 2007 (long paper, oral presentation). [Paper: pdf]
    "An Ensemble Method for Selection of High Quality Parses." Roi Reichart and Ari Rappoport, ACL 2007 (long paper, oral presentation). [Paper: pdf]

    Students



    PhD Students


    • Rotem Dror

    • Lilach Edelstein

    • Amir Feder

    • Daniela Gerz (at Cambridge, together with Anna Korhonen)

    • Ira Leviant

    • Yftah Ziser


    MSc Students


    • Reut Apel

    • Nadav Oved

    • Amichay Doitch

    • Ram Yasdi

    • Dor Zohar


    Alumni


    • Rotem Dror (MSc, moved to direct PhD)

    • Ofer Givoli (MSc)

    • Hagar Leoub (MSc)

    • Ira Leviant (MSc, moved to direct PhD)

    • Rivka Malka (MSc)

    • Lotem Peled (MSc)

    • Iftah Peretz (MSc)

    • Guy Rotman (MSc)

    • Meiran Rubinstein (MSc)

    • Ilan Tchernovitz (MSc)

    • Yftah Ziser (MSc, moved to direct PhD)


    Data


    Multilingual data for language modeling: LM training data in 50 morphologically diverse languages (see Gerz et al., TACL 2018 paper).

    WIW (word, translation and image): English words, their translation to several languages and an image that describes the word (see Rotman et al., ACL 2018 paper).

    Sarcasm SIGN: 3000 sarcastic tweets, each with five honest (non-sarcastic) interpretations. See (Peled and Reichart, ACL 2017) for details.

    sp-embeddings: Word embeddings based on word co-occurrence in symmetric patterns that were acquired from text in an unsupervised manner. These vectors achieve state-of-the-art results on the SimLex999 word similarity dataset, and are particularly useful for verb similarity prediction.

    SimLex-999: SimLex-999 provides similarity scores for 999 word pairs, comprising nouns, verbs and adjectives. It was annotated through Amazon Mechanical Turk with annotation guidelines that emphasize the similarity, rather than the association, between the words in each pair. This is in contrast to existing datasets, which were annotated with guidelines that were strongly biased towards association. The data set was created in collaboration with Felix Hill and Anna Korhonen from the University of Cambridge.
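Similarity datasets of this kind are commonly used by comparing a model's similarity scores for the word pairs with the human ratings through Spearman's rank correlation. The following is a minimal sketch of that comparison in plain Python. The vectors, the word pairs and the ratings are hypothetical toy values, not taken from SimLex-999, and the helper names are assumptions made for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings and ratings, for illustration only.
vectors = {
    "old": [0.9, 0.1, 0.0],
    "ancient": [0.8, 0.2, 0.1],
    "coffee": [0.1, 0.9, 0.2],
    "cup": [0.2, 0.7, 0.6],
    "sky": [0.0, 0.1, 0.9],
}
rated_pairs = [
    ("old", "ancient", 8.5),  # similar words: high rating
    ("coffee", "cup", 3.0),   # associated but not similar: lower rating
    ("cup", "sky", 1.0),
    ("old", "sky", 0.5),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation with human ratings: {rho:.3f}")
```

Rank correlation is used here because only the ordering of the pairs matters, so the model scores and the human ratings do not need to share a scale.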

    MultilingualWS353 & MultilingualSimLex999: Translated versions of the WordSim353 and SimLex999 datasets to German, Italian and Russian. The translated datasets were scored according to translated versions of the scoring guidelines of the original datasets. The new scores were produced by crowd workers, all fluent speakers of the target language.

    SimVerb-3500: A gold standard evaluation resource for verb pair similarity, providing 3500 verb pairs with human similarity ratings on the scale of 1-10. The resource covers all normed verb types from the USF free-association database, and provides at least three examples for each VerbNet class.

    Verb Similarity Dataset (Baker, Reichart and Korhonen, EMNLP 2014): 143 pairs of verbs annotated by 10 annotators following the WS-353 guidelines. The data was used for evaluation in the paper and collected together with my co-authors. Please cite the paper when using it.

    Code


    Clustering Evaluation Code: MATLAB code for information-theoretic (IT) and mapping-based measures for clustering evaluation (Reichart and Rappoport, CoNLL 2009). The IT-based measures are V, VI and NVI. The mapping-based measures are two variants of greedy mapping: many-to-one and one-to-one (based on the Kuhn-Munkres algorithm).
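To illustrate two of the measure families named above, here is a minimal Python sketch of the Variation of Information (VI) and of many-to-one mapping accuracy. It is not the MATLAB code distributed on this page. The function names and the toy labels are assumptions made for illustration, VI follows its standard definition as H(C|K) + H(K|C), and V, NVI and the one-to-one mapping are left out.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def variation_of_information(gold, induced):
    """VI(C, K) = H(C|K) + H(K|C) = 2*H(C,K) - H(C) - H(K); lower is better."""
    n = len(gold)
    h_c = entropy(Counter(gold).values(), n)
    h_k = entropy(Counter(induced).values(), n)
    h_ck = entropy(Counter(zip(gold, induced)).values(), n)
    return 2 * h_ck - h_c - h_k


def many_to_one_accuracy(gold, induced):
    """Map each induced cluster to its most frequent gold class, then score."""
    joint = Counter(zip(induced, gold))
    best = {}
    for (cluster, _gold_class), count in joint.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(gold)


# Hypothetical toy example: gold POS classes and induced cluster ids.
gold = ["N", "N", "V", "V", "A", "A"]
induced = [0, 0, 1, 1, 1, 2]

print("VI:", variation_of_information(gold, induced))
print("Many-to-one accuracy:", many_to_one_accuracy(gold, induced))
```

VI is zero only when the two clusterings are identical up to relabeling, whereas many-to-one accuracy can be inflated by inducing many small clusters, which is one reason to report both kinds of measure.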

    Clustering with DPPs: MATLAB code for DPP-based hierarchical clustering (Reichart and Korhonen, ACL 2013). The algorithm achieves state-of-the-art verb clustering performance and is particularly suited for joint modeling.