Arne Rubehn

Curriculum vitae
since 2023: PhD Student at the Chair for Multilingual Computational Linguistics, University of Passau.
2019-2023: Master of Arts, Computational Linguistics, University of Tübingen.
03-07/2018: Study abroad (ERASMUS+), Applied Linguistics, Universitat Pompeu Fabra, Barcelona.
2015-2019: Bachelor of Arts, General Linguistics and Latin, University of Tübingen.
Publications
- Rubehn, A., Rzymski, C., Ciucci, L., van Dam, K. P., Kučerová, A., Bocklage, K., Snee, D., Stephen, A., and List, J.-M. (2025). Annotating and Inferring Compositional Structures Across Languages. arXiv:2503.01625 [preprint, under review, not peer-reviewed]. https://doi.org/10.48550/arXiv.2503.01625
- Snee, D., Ciucci, L., Rubehn, A., van Dam, K. P., and List, J.-M. (2025). Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists. arXiv:2503.00464 [preprint, under review, not peer-reviewed]. https://doi.org/10.48550/arXiv.2503.00464
- Rubehn, A. and List, J.-M. (2025). Partial Colexifications Improve Concept Embeddings. arXiv:2502.09743 [preprint, under review, not peer-reviewed]. https://doi.org/10.48550/arXiv.2502.09743
- Rubehn, A., Nieder, J., Forkel, R., and List, J.-M. (2024). Generating Feature Vectors from Phonetic Transcriptions in Cross-Linguistic Data Formats. In Proceedings of the 2024 Meeting of the Society for Computation in Linguistics (SCiL). https://doi.org/10.7275/scil.2144
- Rubehn, A., Montemagni, S., and Nerbonne, J. (2024). Extracting Tuscan phonetic correspondences from dialect pronunciations automatically. Language Dynamics and Change, 14(1), 1-33. https://doi.org/10.1163/22105832-bja10034
- Rubehn, A. (2022). A feature-based neural model of sound change informed by global lexicostatistical data. Master's thesis, Eberhard Karls Universität Tübingen. https://doi.org/10.15496/publikation-94055
Focus areas
I am a PhD student in the "ProduSemy" project and focus on computer-assisted, data-driven methods for historical linguistics. I aim to advance comparative historical linguistics by means of intelligent algorithmic methods that alleviate researchers’ workload by processing large-scale data efficiently. My current research focuses on embedding "intuitive" linguistic knowledge to make it accessible to computational methods as well.
I studied Computational Linguistics, General Linguistics, and Latin at the University of Tübingen. For my MA thesis, I trained a neural network that estimates global probabilities for arbitrary sound changes. Additionally, I have years of work experience as a software developer for EtInEn (Etymological Inference Engine), a software tool for historical linguists that is being developed at the Linguistics Department in Tübingen.