<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>https://calclab.org/</id><title>Computer-Assisted Language Comparison</title><updated>2026-04-04T19:35:34.622598+00:00</updated><author><name>Johann-Mattis List</name></author><link href="https://calclab.org/"/><generator uri="https://lkiesow.github.io/python-feedgen" version="1.0.0">python-feedgen</generator><entry><id>https://calclab.org/#2017-03-31-website</id><title>Project Website Online</title><updated>2017-03-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In time with the official start of the project, we are glad to announce that the official project website is now online. 
The website will certainly be refined over the course of the project, but the basic infrastructure is now there, and those interested in our project will be able to follow our news.</p></div></content><link href="https://calclab.org/?news=2017-03-31-website#2017-03-31-website"/><published>2017-03-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-04-02-eacl</id><title>Attending the EACL Conference in Valencia</title><updated>2017-04-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Next week, I will attend the conference of the European Chapter of the Association for Computational Linguistics.
After Lyon in 2012, this is my second EACL, and I will be involved in two presentations, one together with Gerhard Jäger and Pavel Sofroniev on automatic cognate detection, and one where I present the current state of my <a href="http://edictor.digling.org">EDICTOR</a> tool for computer-assisted language comparison.</p></div></content><link href="https://calclab.org/?news=2017-04-02-eacl#2017-04-02-eacl"/><published>2017-04-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-04-12-zurich</id><title>Mini-Workshop on Sino-Tibetan Phylogenies in Zürich</title><updated>2017-04-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We had an interesting small workshop in Zürich where my former colleagues from
Paris, Laurent Sagart, Guillaume Jacques, and Yunfan Lai, with whom I pursue
the goal to establish a larger lexicostatistic database of Sino-Tibetan
languages, as well as people from Balthasar Bickel's team were present.  We
presented the work we have each done so far, and I myself gave a talk on
my ideas regarding a <a href="http://lingulist.de/documents/talks/list-2017-sino-tibetan-database.html">Sino-Tibetan Lexicostatistic
Database</a>.
We will all keep collaborating in the future and potentially organize a second
meeting, either in Paris or in Jena, later this year.</p></div></content><link href="https://calclab.org/?news=2017-04-12-zurich#2017-04-12-zurich"/><published>2017-04-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-04-22-poetry</id><title>Mini-Workshop on Poetry</title><updated>2017-04-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Thursday last week, we had a mini-workshop on poetry for which we invited colleagues from the Max Planck Institute for Empirical Aesthetics and from the University of Zurich. It may seem strange at first sight that poetry would matter for computer-assisted language comparison, but the poetic tradition of rhyming in the history of Chinese in fact plays a crucial role in the reconstruction of the oldest stages of the language. I myself devoted two recent studies to the application of network approaches to the study of Old Chinese phonology, which are currently in the final phase of editing and will hopefully appear soon (the draft for one study can be found <a href="http://lingulist.de/papers.html">here</a>). In my talk, I briefly presented this research (the slides are <a href="http://lingulist.de/documents/talks/list-2017-poetic-function.html">here</a>), and pointed to future questions on the dynamics underlying the development of poetic traditions from a cross-linguistic and historical perspective.</p>
<p>The other speakers discussed many interesting topics, ranging from empirical studies on poetry and how one can annotate the important factors that constitute poetic speech (Winfried Menninghaus and Christine Knoop, MPI-AE), via the automatic detection of rhyme patterns in German poetry (Thomas Haider, MPI-AE), up to questions of language contact and cultural exchange (Paul Widmer, UZH), and the co-evolution of linguistic and poetic forms (Cormac Anderson, MPI-SHH). Our discussions during the talks were long, and since we had to stop at some point, there was no time for the talk by Olivier Morin (MPI-SHH) on "poetry as super-week communication". This was a definite loss, as I saw when Olivier shared his slides afterwards, but luckily we are working in the same department, and nothing will prevent us from continuing our discussions and exchange of ideas. </p>
<p>We all decided to stay in close contact and keep each other informed on future ideas as well as concrete research, and it is quite likely that at some point in the not-so-far future, I will present more of this here.</p></div></content><link href="https://calclab.org/?news=2017-04-22-poetry#2017-04-22-poetry"/><published>2017-04-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-05-15-tutorial</id><title>LingPy-Tutorial at the Quantitative Methods Spring School</title><updated>2017-05-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we had a spring school on Quantitative Methods here in Jena. This is an annual event, and it was the second time that it took place, with Fiona Jordan organizing the main event, and many interesting scientists coming here as tutors or students for one full week (seven days), which was quite exhausting but also very interesting. 
This time, I gave a tutorial on <a href="http://lingpy.org">LingPy</a>, introducing the basic ideas of automatic sequence comparison and how it can be used to get started on computer-assisted workflows. You will find the tutorial online <a href="https://github.com/shh-dlce/qmss-2017/blob/master/LingPy/Sequence%20Comparison%20with%20LingPy%20(Johann-Mattis%20List).ipynb">here</a> in the form of an IPython Notebook, but you can likewise download the <a href="http://lingulist.de/documents/tutorials/list-2017-sequence-comparison-lingpy-tutorial.pdf">pdf</a> or follow my introductory <a href="http://lingulist.de/documents/talks/list-2017-lingpy-tutorial.html">slides</a>. All in all, this tutorial will provide you with the most recent information needed to start making your own analyses with LingPy. </p></div></content><link href="https://calclab.org/?news=2017-05-15-tutorial#2017-05-15-tutorial"/><published>2017-05-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-05-19-abschlussbericht</id><title>Final Report of SinDial Project </title><updated>2017-05-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My DFG-funded research project on <em>Vertical and lateral aspects of Chinese dialect history</em> officially ended on December 31, 2016. From January 2015 until December 2016 I had two very interesting but also challenging years during which I became acquainted with many different scholars from different disciplines and countries, but also with many new approaches and methods in historical linguistics and related disciplines. </p>
<p>Having submitted my final report in April (the first time in a long while that I wrote in German again), and hoping that the reviewers do not have anything grave to complain about, I have now published the report with <a href="http://zenodo.org">Zenodo</a>, and you can find it online <a href="https://doi.org/10.5281/zenodo.581413">here</a>.</p>
<p>In case you wonder why I recommend this final report in the context of the CALC project, the answer is simple: Many of the ideas that I put into the project application for the CALC project were developed while I was in Paris, funded by the DFG, so in some sense, the SinDial project on Chinese dialects was the root of CALC. </p></div></content><link href="https://calclab.org/?news=2017-05-19-abschlussbericht#2017-05-19-abschlussbericht"/><published>2017-05-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-06-15-papers</id><title>New Papers Accepted </title><updated>2017-06-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new papers have been accepted during the last two weeks, and I am very glad about both publications, since they cover topics that touch the core of my project on computer-assisted language comparison. </p>
<p>The first is joint work with Nathan W. Hill (SOAS, London), and is titled "Challenges of annotation and analysis in computer-assisted language comparison: A case study on Burmish languages". In this paper, we point to general annotation challenges when analysing South-East Asian languages in which compounding is frequent and sound correspondences are often hard to discover. We present a new database of cognate sets across 8 Burmish languages, all coded for partial cognacy, and consistently aligned. The final version of the paper, as submitted to the <em>Yearbook of the Poznań Linguistic Meeting</em>, is available <a href="http://lingulist.de/documents/papers/hill-list-2017-challenges-of-annotation.pdf">here</a>.</p>
<p>The second paper is joint work with Gerhard Jäger (University of Tübingen), and concentrates on a problem which is often overlooked in the literature, namely the problem of how well current algorithms infer which word forms were used to express a given concept in ancestral, unattested languages. This is not a trivial problem, and we only address it from the perspective of the classical lexicostatistical word lists, where we test on three datasets (Indo-European, Austronesian, and Chinese) how well different algorithms infer the ancestral states as they are predicted by the gold standard (the proto-forms provided along with the datasets). It turns out that the algorithms do not perform very well (unfortunately, MLN, an algorithm on which I worked a lot myself, performs worst of all), but when looking at the gold standard in detail, we realized that many of the errors are due to problems with the gold standards, which are themselves quite inconsistent and not very trustworthy. As a result, we think that using ancestral state reconstruction methods for this purpose of "onomasiological reconstruction" might actually help to get a better estimate. The draft of the paper can be found <a href="http://www.sfs.uni-tuebingen.de/~gjaeger/publications/asrLDC.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2017-06-15-papers#2017-06-15-papers"/><published>2017-06-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-06-29-blogs</id><title>New Blog Posts and Papers</title><updated>2017-06-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I have published a couple of new papers recently, but since they go back to my former research project and were not directly developed as part of the CALC project, I do not list them in the list of papers. They are, however, quite important for our research, since they both deal with Old Chinese reconstruction. </p>
<p>The first paper is titled "Using network models to analyze Old Chinese rhyme data" and will soon officially appear in the Bulletin of Chinese Linguistics. In the meantime, you can find my author's copy <a href="http://lingulist.de/documents/papers/list-2017-rhyme-networks-chinese.pdf">here</a>.</p>
<p>The second paper, on "Vowel purity and rhyme evidence in Old Chinese reconstruction" (joint work with my colleagues from Paris and London, Jananan S. Pathmanathan, Eric Bapteste, Philippe Lopez, and Nathan W. Hill), finally came out today, and you can find the PDF for download <a href="https://link.springer.com/content/pdf/10.1186%2Fs40655-017-0021-8.pdf">here</a>.</p>
<p>I also wrote two more blog posts, both devoted to language comparison in general and my view on computer-assisted language comparison in particular. The first blog post (in English) is titled "Trees do not necessarily help in linguistic reconstruction" and can be found <a href="http://phylonetworks.blogspot.de/2017/06/trees-do-not-necessarily-help-in.html">here</a>, and the second one deals with sound change, explains it with the help of tooth loss in comic books, and can be found <a href="https://wub.hypotheses.org/109">here</a>.</p></div></content><link href="https://calclab.org/?news=2017-06-29-blogs#2017-06-29-blogs"/><published>2017-06-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-07-09-talks</id><title>Three new talks during a busy week </title><updated>2017-07-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>From Friday, 30th of June, until last Friday, 7th of July, I gave three talks on three different topics. It started with a summary of the potential of <a href="https://speakerdeck.com/lingulist/network-approaches-to-old-chinese-reconstruction">network approaches in Old Chinese reconstruction</a> in Paris, after which I was very surprised that many scholars seem to support the idea of handling Chinese character formation with directed networks (and I hope that I will find time to address this soon, even if only in a small example). After that, I gave a talk in Liège on colexifications and cross-linguistic polysemies, and how we plan to update the <a href="http://clics.lingpy.org">CLICS database</a> when we launch <a href="https://speakerdeck.com/lingulist/clics-2-dot-0-towards-and-improved-handling-of-cross-linguistic-colexification-patterns">CLICS 2.0</a>. 
Finally, I introduced some basic ideas on how to handle lexical and etymological data within the <a href="http://cldf.clld.org">Cross-Linguistic Data Formats</a> initiative, focusing specifically on <a href="https://speakerdeck.com/lingulist/annotation-and-analysis-of-cross-linguistic-lexical-data-in-historical-linguistics">annotation and analysis</a>. Although it was quite exhausting to prepare all these talks, I am glad that I scheduled them for this time, since it forced me to push forward a couple of important projects, such as cross-linguistic colexifications and the cross-linguistic data formats, which are all central to computer-assisted language comparison in general, and also important for Sino-Tibetan in particular.</p>
<p>After one week in Jena, where I'll try to catch up with the work I could not finish yet, I'll finally have two weeks of holidays until the beginning of August, interrupted only by another talk in Cologne on Friday next week, in which Nathan Hill and I will present some interesting new work on Burmish languages (I'll report later in more detail).</p></div></content><link href="https://calclab.org/?news=2017-07-09-talks#2017-07-09-talks"/><published>2017-07-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-08-01-blogs</id><title>Back from Holidays</title><updated>2017-08-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Having been traveling for about two weeks, interrupted by a talk I gave in Cologne, I am now back at work and finally find time to announce some news on what happened recently. First, there are two new blogposts I wrote, one in English on <a href="http://phylonetworks.blogspot.de/2017/07/more-on-similarities-in-linguistics.html">similarities in linguistics</a>, a follow-up to a blogpost I devoted to the same topic <a href="http://phylonetworks.blogspot.de/2017/01/similarities-and-language-relationship.html">earlier this year</a>. The other blogpost, in German, is devoted to <a href="http://wub.hypotheses.org/117">impoliteness (Unhöflichkeit)</a> in Chinese and other languages. Second, there is the talk I gave together with Nathan W. Hill in Cologne, at a workshop on the regularity of sound change, organized by Eugen Hill and Robert Mailhammer. In our talk, titled <a href="https://speakerdeck.com/lingulist/computer-assisted-approaches-to-linguistic-reconstruction">"Computer-assisted approaches to linguistic reconstruction"</a>, we outlined a new framework for automated linguistic reconstruction which we illustrated with examples from the Burmish languages. 
</p></div></content><link href="https://calclab.org/?news=2017-08-01-blogs#2017-08-01-blogs"/><published>2017-08-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-08-02-yunfan</id><title>New Post-Doc in CALC</title><updated>2017-08-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>It is my pleasure to welcome <a href="http://khroskyabs.info/">Yunfan Lai</a> as a post-doc in the CALC project. He has a lot of experience in working with Sino-Tibetan languages and devoted his PhD to Khroskyabs, a very interesting branch of Sino-Tibetan whose history is still not clearly understood. As a member of CALC, Yunfan will pursue his studies on Khroskyabs and related varieties, and also help to uncover the mysterious history of Sino-Tibetan. </p></div></content><link href="https://calclab.org/?news=2017-08-02-yunfan#2017-08-02-yunfan"/><published>2017-08-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-08-09-hudoc</id><title>Talk at the Human Document Project </title><updated>2017-08-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, I visited the <a href="http://hudoc2017.manucodiata.org/">Human Document Project 2017</a> in Freiburg, a project that seeks to preserve information about humans beyond the existence of the human race. As sci-fi as this may sound at first sight, it is fascinating how many different questions and disciplines need to be involved in the plan of creating a time capsule that could bear witness to our existence even if we, that is, humanity, no longer exist. They invited philosophers, artists, technicians, data experts, computer scientists, physicists, and also me, as a linguist, whose job it was to give a rough overview of linguistic diversity and how we try to represent our knowledge about it. 
Although my talk, titled <a href="http://hudoc2017.manucodiata.org/index.php/slides?view=download&amp;id=13">Storing our knowledge of linguistic diversity: Towards the standardization of cross-linguistic data formats</a> did not involve the longer perspective of the next million years, I had the impression that it triggered the interest of the colleagues. While I remain sceptical about the general usefulness of science fiction questions in science, I have to admit that the day I spent in Freiburg was very inspiring, as I learned so many new things. Maybe, in the end, this is even the more important aspect of the HUDOC project: bringing together people from different disciplines and having them talk with each other...</p></div></content><link href="https://calclab.org/?news=2017-08-09-hudoc#2017-08-09-hudoc"/><published>2017-08-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-08-10-yunfanthesis</id><title>Yunfan Lai's PhD thesis is now online</title><updated>2017-08-10T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Hi there. I defended my thesis back in June, but after a large gap, I failed to motivate myself to upload it online. Now I finally did it. 
You may now have a look at my thesis <a href="https://tel.archives-ouvertes.fr/tel-01571916/">here</a>.
Have fun!</p></div></content><link href="https://calclab.org/?news=2017-08-10-yunfanthesis#2017-08-10-yunfanthesis"/><published>2017-08-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-08-17-blogposts</id><title>New Blog Posts for August </title><updated>2017-08-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote two new blogposts in August, one in German on the benefits of using alignments and similar visualization techniques more broadly in the media, which you can find <a href="http://wub.hypotheses.org/128">here</a>, and one in English, where I discuss the problem of unattested character states in phylogenetic reconstruction, specifically in linguistics, which you can find <a href="http://phylonetworks.blogspot.de/2017/08/unattested-character-states.html">here</a>.</p></div></content><link href="https://calclab.org/?news=2017-08-17-blogposts#2017-08-17-blogposts"/><published>2017-08-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-06-dot</id><title>CALC and DLCE Organize Panel at the Deutscher Orientalistentag (Jena)</title><updated>2017-09-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The <a href="http://www.shh.mpg.de/518807/dlce-research-projects">DLCE</a> and <a href="http://calc.digling.org">CALC</a> are organizing a panel on the <a href="http://www.dot2017.de/en/">Deutscher Orientalistentag</a>, which will take place in Jena this year (September 18-22). On September 21, from 9am to 1pm scientists from the institute and external guests will share and discuss their thoughts on the topic "Languages as keys to our past". </p>
<p>We will soon provide more information on the list of speakers and their abstracts.</p></div></content><link href="https://calclab.org/?news=2017-09-06-dot#2017-09-06-dot"/><published>2017-09-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-07-update</id><title>Schedule and Abstracts for DOT Panel on Historical Linguistics Online</title><updated>2017-09-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just finalized the first version of our website for the Panel on "Languages as Keys to Our Past" which we organize for the <a href="http://www.dot2017.de">33. Deutscher Orientalistentag „Asien, Afrika und Europa“</a>. The website can be found <a href="http://calc.digling.org/dot.html">here</a>. Later, we will also link the slides of all speakers in PDF format.</p></div></content><link href="https://calclab.org/?news=2017-09-07-update#2017-09-07-update"/><published>2017-09-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-11-interview</id><title>Radio Interview on Language Diversity</title><updated>2017-09-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, I gave a radio interview with <a href="http://www.deutschlandfunknova.de">Deutschlandfunk Nova</a> in which I tried my best to answer questions regarding language diversity and its driving forces. 
The interview, which was broadcast yesterday, can also be found online <a href="https://www.deutschlandfunknova.de/beitrag/linguistik-warum-wir-viele-sprachen-sprechen">under this link</a>.</p></div></content><link href="https://calclab.org/?news=2017-09-11-interview#2017-09-11-interview"/><published>2017-09-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-14-paper</id><title>New Paper on Annotation in Historical Linguistics</title><updated>2017-09-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am proud to announce that a paper in which Nathan Hill and I discuss <a href="https://www.degruyter.com/view/j/yplm.2017.3.issue-1/yplm-2017-0003/yplm-2017-0003.xml">Challenges of Annotation and Analysis in Computer-Assisted Language Comparison</a> has now been published online and can be freely downloaded from <a href="https://www.degruyter.com/downloadpdf/j/yplm.2017.3.issue-1/yplm-2017-0003/yplm-2017-0003.xml">this</a> link. The paper discusses general challenges of annotation for the purpose of historical language comparison and also introduces first ideas on how to solve these challenges. Here is the abstract:</p>
<blockquote>
<p>The use of computational methods in comparative linguistics is growing in popularity. The increasing deployment of such methods draws into focus those areas in which they remain inadequate as well as those areas where classical approaches to language comparison are untransparent and inconsistent. In this paper we illustrate specific challenges which both computational and classical approaches encounter when studying South-East Asian languages. With the help of data from the Burmish language family we point to the challenges resulting from missing annotation standards and insufficient methods for analysis and we illustrate how to tackle these problems within a computer-assisted framework in which computational approaches are used to pre-analyse the data while linguists attend to the detailed analyses.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2017-09-14-paper#2017-09-14-paper"/><published>2017-09-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-15-poznan</id><title>DLCE, CALC, and University Jena Co-Organize Workshop at the Poznań Linguistic Meeting</title><updated>2017-09-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The <a href="http://www.shh.mpg.de/518807/dlce-research-projects">DLCE</a> (Cormac Anderson, Paul Heggarty), <a href="http://calc.digling.org">CALC</a> (Johann-Mattis List), and Friedrich Schiller University Jena (Adrian Simpson) are co-organizing a workshop as part of the <a href="http://wa.amu.edu.pl/plm/2017/Home">Poznań Linguistic Meeting</a> on Monday, September 18. For more information, see the <a href="http://calc.digling.org/events/poznan.html">workshop website</a>, which has just been launched.</p></div></content><link href="https://calclab.org/?news=2017-09-15-poznan#2017-09-15-poznan"/><published>2017-09-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-09-21-blog</id><title>New Blog Post on Authority Arguments </title><updated>2017-09-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two days ago, I wrote another blogpost, this time on <a href="http://phylonetworks.blogspot.de/2017/09/arguments-from-authority-and-cladistic.html">Arguments from authority, and the Cladistic Ghost, in historical linguistics</a>. 
The argument I make there may look offensive, but my major intention was to draw attention to the fact that our "classical" comparative method was never classical in any sense, as it is just a label we use to denote what we do to compare languages, and that, in the light of new approaches, we should not be too dismissive, but rather try to work harder on integrated, computer-assisted frameworks, which will hopefully enable us to better understand how our languages evolved into their current shape.</p></div></content><link href="https://calclab.org/?news=2017-09-21-blog#2017-09-21-blog"/><published>2017-09-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-10-02-blogpost</id><title>New Blog Posts for September </title><updated>2017-10-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>At the last minute, I managed to write my monthly German blogpost for September, which was published last Saturday and deals with freedom and obligation in languages: <a href="http://wub.hypotheses.org/139">Wahlpflicht und Wahlfreiheit in der Sprache</a>. </p>
<p>You may further notice that we have added an <a href="http://calc.digling.org?events">events</a> section to this website, in which we list upcoming and past events organized as part of the CALC research project.</p></div></content><link href="https://calclab.org/?news=2017-10-02-blogpost#2017-10-02-blogpost"/><published>2017-10-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-10-30-back</id><title>New Blogposts and Lectures Online </title><updated>2017-10-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My German blog post for this month is devoted to the case system in Russian, titled <a href="https://wub.hypotheses.org/147">Ein Fall für Tee</a>. In this post, I discuss how difficult it is in linguistics to identify true regularities without exceptions. </p>
<p>In addition, the lecture series I gave at Tianjin University is now available online. You can download the full lecture <a href="http://lingulist.de/documents/lectures/list-2017-lecture-ws-calc.pdf">here</a>.</p>
<p>My monthly post for <a href="http://phylonetworks.blogspot.de/">The Genealogical World of Phylogenetic Networks</a> also appeared. This time, I collaborated with Guido Grimm to investigate cross-linguistic naming patterns for domesticated animals, like <em>cat</em>, <em>dog</em>, <em>goat</em>, and <em>sheep</em>.</p></div></content><link href="https://calclab.org/?news=2017-10-30-back#2017-10-30-back"/><published>2017-10-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-11-23-lingpy</id><title>LingPy 2.6 released! </title><updated>2017-11-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We just released LingPy 2.6. Apart from some smaller changes that may prove useful, this release concentrated on stabilizing the behavior of the algorithms and making the package easier to use. </p>
<p>Documentation of the package can be found, as usual, at <a href="http://lingpy.org">http://lingpy.org</a>, and the package itself can be downloaded from the traditional channels, be it <a href="https://doi.org/10.5281/zenodo.1065403">Zenodo</a>, <a href="https://github.com/lingpy/lingpy">GitHub</a>, or <a href="https://pypi.python.org/pypi/lingpy">PyPI</a>.</p></div></content><link href="https://calclab.org/?news=2017-11-23-lingpy#2017-11-23-lingpy"/><published>2017-11-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-11-29-blogs</id><title>Two new blog posts and website for upcoming workshop</title><updated>2017-11-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>During the last few days, we managed to finish two new blogposts. The first is a follow-up to our earlier blogpost on animal names, this time devoted to <a href="http://phylonetworks.blogspot.de/2017/11/man-gave-names-to-all-those-animals.html">goats and sheep</a>. The second blogpost (in German) is devoted to "hybrid pronunciations", exemplified with the help of the debate about the <a href="http://wub.hypotheses.org/160">Jamaica coalition</a> in Germany.</p>
<p>I would also like to announce that our upcoming workshop on "Old Chinese and Friends", which will take place on April 26-27, 2018, in Jena, is taking shape, and we have just managed to launch the <a href="http://calc.digling.org/events/ocaf.html">project website</a> along with the <a href="http://calc.digling.org/events/abstracts/ocaf-call.pdf">call for abstracts</a>.</p></div></content><link href="https://calclab.org/?news=2017-11-29-blogs#2017-11-29-blogs"/><published>2017-11-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-12-01-posts</id><title>Three Positions Available in the CALC Project</title><updated>2017-12-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our research project offers three positions for three years each, two for doctoral students, and one for a post-doc (the post-doc position is initially for two years with an option for a one-year extension after positive evaluation after the first year). The starting date is April 2018, and the deadline for the submission of applications is the end of January. The call for post-docs, with all details, can be found <a href="http://calc.digling.org/resources/job-postdoc2-calc_english.pdf">here</a> and the call for doctoral students can be found <a href="http://calc.digling.org/resources/job-doc-calc_english.pdf">here</a>. </p></div></content><link href="https://calclab.org/?news=2017-12-01-posts#2017-12-01-posts"/><published>2017-12-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2017-12-19-blogs</id><title>Final blog posts for 2017</title><updated>2017-12-19T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A week earlier than normal, my final blog posts for this year have now appeared, the German one is titled <a href="http://wub.hypotheses.org/169">Die Angst des Jongleurs vorm Fallenlassen</a> and discusses the problem of letting things go (especially if one tries to juggle). The English one is titled <a href="http://phylonetworks.blogspot.de/2017/12/the-art-of-doing-science-alignments-in.html">The art of doing science: alignments in historical linguistics</a> and discusses what we should keep in mind when using alignments in historical linguistics.</p></div></content><link href="https://calclab.org/?news=2017-12-19-blogs#2017-12-19-blogs"/><published>2017-12-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-01-15-sicl</id><title>Call for papers on a special issue on "Computational approaches in historical linguistics after the quantitative turn"</title><updated>2018-01-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I would like to point to a call for papers for the journal
"Computational Linguistics" on "Computational approaches in historical
linguistics after the quantitative turn", guest-edited by Taraka Rama,
Simon J. Greenhill, Harald Hammarström, Gerhard Jäger, and myself.</p>
<p>The deadline is July 15, 2018, and detailed information can be found in <a href="http://calc.digling.org/resources/call-si-cl.pdf">this PDF</a>.</p></div></content><link href="https://calclab.org/?news=2018-01-15-sicl#2018-01-15-sicl"/><published>2018-01-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-01-23-blog</id><title>New Blog on Mayflies</title><updated>2018-01-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just finished my regular German blog post for January, this time on <a href="https://wub.hypotheses.org/181">Terry Pratchett's Eintagsfliege</a> and the question whether language change and language decay are the same phenomenon (which they aren't, of course).</p></div></content><link href="https://calclab.org/?news=2018-01-23-blog#2018-01-23-blog"/><published>2018-01-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-01-29-pronunciation</id><title>English blogpost on pronunciation networks in Chinese phonology </title><updated>2018-01-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>It was in some sense last minute (but planned as such before) to write my monthly blogpost for David Morrison's blog on <a href="https://phylonetworks.blogspot.com">phylogenetic networks</a> in the last week of January. This has now been done (also thanks to David's help in making my English and the story in general more readable), and you can find my blogpost on <a href="http://phylonetworks.blogspot.de/2018/01/networks-of-pronunciation-glosses-in.html">pronunciation networks in Chinese phonology</a> online. 
</p></div></content><link href="https://calclab.org/?news=2018-01-29-pronunciation#2018-01-29-pronunciation"/><published>2018-01-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-02-13-newdraft</id><title>New Draft Paper on Network Approaches to Historical Chinese Phonology </title><updated>2018-02-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I finished preparing a paper that will be presented at this year's LFK Young Scholars Symposium in Taipei. It deals with new network approaches to the discipline of Historical Chinese Phonology, including networks of Chinese character formation and networks of Chinese sound glosses. A draft form of the paper can be found <a href="http://lingulist.de/documents/papers/list-2018-network-approaches-yinyunxue.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-02-13-newdraft#2018-02-13-newdraft"/><published>2018-02-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-02-20-blogpost</id><title>New German Blog Post on Kitchen Etymologies </title><updated>2018-02-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Saturday, I published another German blog post. This time about kitchen etymologies in times of elections, titled <a href="http://wub.hypotheses.org/199">Konservativ kommt wirklich nicht von Konserve</a>. </p></div></content><link href="https://calclab.org/?news=2018-02-20-blogpost#2018-02-20-blogpost"/><published>2018-02-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-02-26-synonymy</id><title>New English Blog Post Synonymy and Phylogenetic Reconstruction </title><updated>2018-02-26T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my regular monthly English blog post appeared. This time, it discusses the problem of excessive synonymy in linguistic datasets and its implications for phylogenetic reconstruction: <a href="http://phylonetworks.blogspot.de/2018/02/tossing-coins-linguistic-phylogenies.html">Tossing coins: linguistic phylogenies and extensive synonymy</a>. </p></div></content><link href="https://calclab.org/?news=2018-02-26-synonymy#2018-02-26-synonymy"/><published>2018-02-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-03-08-concepticon</id><title>Finally Released Concepticon 1.1 and New Drafts</title><updated>2018-03-08T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After two years of hard work on API and data, we have finally managed to release version 1.1 of the <a href="http://concepticon.clld.org">Concepticon resource</a>.  In addition to the general application, we also offer a standalone app with enhanced search functionalities in currently seven languages, which can be found <a href="http://calc.digling.org/concepticon/">here</a>. </p>
<p>Furthermore, we just submitted our final version (before the final proofs) of a paper that will appear some time later in 2018 with the provocative title <a href="http://lingulist.de/documents/papers/jacques-list-2018-save-the-trees-draft.pdf">Save the trees: Why we need tree models in historical linguistics (and when we should apply them)</a>. </p>
<p>Last but not least, a short paper on the question <em>Are automatic methods for cognate detection good enough for phylogenetic reconstruction</em> (Taraka Rama, myself, Johannes Wahle, and Gerhard Jäger) has now been accepted as a short paper to be presented in the form of a poster at the NAACL conference. We're currently revising the draft, but we will try to put a draft close to the final version online soon. The results indicate that especially the simpler methods may perform surprisingly well, although we could, unfortunately, only check the topology.</p></div></content><link href="https://calclab.org/?news=2018-03-08-concepticon#2018-03-08-concepticon"/><published>2018-03-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-03-09-latesthinking</id><title>Latest Thinking Interview on Automatic Cognate Detection</title><updated>2018-03-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, an interview on our work on automatic cognate detection from early last year (<a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0170046">List, Greenhill, and Gray 2017</a>) appeared online at the <a href="https://lt.org/publication/how-well-do-automatic-methods-language-comparison-work">Latest Thinking Platform</a>.</p></div></content><link href="https://calclab.org/?news=2018-03-09-latesthinking#2018-03-09-latesthinking"/><published>2018-03-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-03-12-langfing</id><title>New German Blog Post on the Imitation of Unknown Languages </title><updated>2018-03-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>During the weekend, I found time to write my monthly German blog post. This time, I discuss 
how speakers of a given language imitate or joke about speakers of other languages. This topic is linguistically interesting, since it may reveal quite a few things about the speakers who joke about foreign languages, 
as I try to show with German jokes about the Chinese language. The post can be found <a href="https://wub.hypotheses.org/227">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-03-12-langfing#2018-03-12-langfing"/><published>2018-03-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-03-16-interview</id><title>Article in FAZ Discussing Computational Historical Linguistics </title><updated>2018-03-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On March 14, an article appeared in the <a href="https://faz.net">Frankfurter Allgemeine Zeitung</a>, discussing the usefulness and appropriateness of computational approaches in historical linguistics, titled "Bäume der Erkenntnis", by Wolfgang Krischke. The article also presents our department and mentions our attempts to work on a reconciliation of computational and classical historical linguistics. Unfortunately, I cannot share it at the moment, since it did not appear online, but if it does, we will announce it here. </p></div></content><link href="https://calclab.org/?news=2018-03-16-interview#2018-03-16-interview"/><published>2018-03-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-03-26-post</id><title>FAZ Article Online and Blog Post on the Systemic Aspects of Sound Change </title><updated>2018-03-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am pleased to announce that the <a href="https://faz.net">FAZ</a> article I
mentioned in a <a href="http://calc.digling.org#2018-03-16-interview">previous post</a> has now
appeared online, where you can freely read it: <a href="http://www.faz.net/aktuell/feuilleton/forschung-und-lehre/wie-erforscht-man-urspruenge-streit-der-indogermanistik-15490879.html">Wie erforscht man
Ursprünge?</a>.</p>
<p>Furthermore, my English blogpost for March just appeared. This time, I try to explain why
sound change is so peculiar, and why it cannot simply be equated with changes in
DNA or protein sequences due to mutations: <a href="http://phylonetworks.blogspot.de/2018/03/its-system-stupid-more-thoughts-on.html">It's the system,
stupid!</a>. </p></div></content><link href="https://calclab.org/?news=2018-03-26-post#2018-03-26-post"/><published>2018-03-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-04-05-lecture</id><title>Lecture on Computer-Assisted Language Comparison</title><updated>2018-04-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Now that our group has finally been assembled (more info on that regard soon),
we are ready to spread the word by presenting a lecture on computer-assisted
language comparison at the Friedrich-Schiller-University
Jena
during the summer term, regularly on Tuesdays from 14 to 16 o'clock. </p>
<p>The target audience of the lecture is linguists with a background in
historical linguistics and an interest in learning more about computational
approaches. Those interested in joining can check out the <a href="http://calc.digling.org/seminar/plan.html">seminar plan</a> or the <a href="http://calc.digling.org/seminar/plan.html">official announcement</a> of the lecture with FSU Jena.</p></div></content><link href="https://calclab.org/?news=2018-04-05-lecture#2018-04-05-lecture"/><published>2018-04-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-03-ocaf</id><title>Old Chinese and Friends workshop ended successfully</title><updated>2018-05-03T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our workshop, <a href="http://calc.digling.org/events/ocaf.html">Old Chinese and Friends</a>, held from 26/04/2018 to 27/04/2018, was highly appreciated by the participants.</p>
<p>17 renowned scholars from all over the world participated in the workshop and had their say. The presentations covered a wide range of domains concerning Old Chinese, including historical phonology, morphology, methodology and paleography. New ideas, opinions and hypotheses have successfully found their way to Jena, a city which can now be named the home of historical linguistics.</p>
<p>The participants were also amazed at how much time they were given for discussion at the end of each day. The discussion sessions were full of insightful questions and comments. </p>
<p>The workshop embraced language variety, as both English and Chinese presentations and discussions were accepted. </p>
<p>Old Chinese and Friends is a follow-up workshop of the 2016 <a href="https://www.soas.ac.uk/south-asia-institute/events/asia-beyond-boundaries/05nov2015-recent-advances-in-old-chinese-historical-phonology.html">Recent Advances in Old Chinese Historical Phonology</a> at SOAS, University of London, UK. The aim of this workshop is to share new findings and results in the field of the Old Chinese language. </p>
<p>Apart from this exciting event, the month of April has also seen the acceptance of my paper on "Relativisation in Wobzi Khroskyabs and the integration of genitivisation" with <a href="https://benjamins.com/catalog/ltba">Linguistics of the Tibeto-Burman Area</a>. This article is the first contribution on sentential construction of the Khroskyabs language (more info coming soon). </p></div></content><link href="https://calclab.org/?news=2018-05-03-ocaf#2018-05-03-ocaf"/><published>2018-05-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-16-lingpy</id><title>LingPy Tutorial Accepted </title><updated>2018-05-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our tutorial for the LingPy library, which describes in detail how cognate detection and alignment analyses can be carried out with the help of LingPy 2.6, has now finally been accepted for publication with the <a href="https://academic.oup.com/jole">Journal of Language Evolution</a>. This tutorial is supposed to represent the state of the art of what can be done with LingPy and how it should be done. It was prepared in collaboration with Mary Walworth, Simon Greenhill, Tiago Tresoldi, and Robert Forkel, and reflects the strong collaboration between different members of our Department of Linguistic and Cultural Evolution and our CALC research group. The draft of the paper can be found <a href="http://lingulist.de/documents/papers/list-et-al-2018-lingpy-tutorial.pdf">here</a> and the tutorial itself is available from <a href="https://github.com/lingpy/lingpy-tutorial">GitHub</a>.</p></div></content><link href="https://calclab.org/?news=2018-05-16-lingpy#2018-05-16-lingpy"/><published>2018-05-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-18-blog</id><title>German Blog Post </title><updated>2018-05-18T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published my traditional German blog post for May. This time, I discuss different aspects of linguistic variations: <a href="https://wub.hypotheses.org/248">Ur-in-stinkt: Grenzen und Chancen der Schriftsprache</a>.</p></div></content><link href="https://calclab.org/?news=2018-05-18-blog#2018-05-18-blog"/><published>2018-05-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-24-clics</id><title>New Version of CLICS </title><updated>2018-05-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We have completely relaunched the database of cross-linguistic colexifications
with help of the CLLD framework, which is now available as a beta-release at
http://clics.clld.org. Our paper (together with Simon Greenhill, Cormac
Anderson, Thomas Mayer, Tiago Tresoldi, and Robert Forkel) introducing the
database, titled "An improved database of cross-linguistic colexifications" is
available in draft form
<a href="http://lingulist.de/documents/papers/list-et-al-2018-clics-draft.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-05-24-clics#2018-05-24-clics"/><published>2018-05-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-26-fieldwork</id><title>Yunfan is going to the field </title><updated>2018-05-26T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am now packing my stuff, getting ready for a summer fieldtrip in Sichuan. I will be working on Khroskyabs dialects (hopefully several new dialects). I will focus on basically everything, from phonology to morphosyntax. I will also keep an eye on the expressions of geography in this language, as well as animal calling sounds and other interesting stuff. </p></div></content><link href="https://calclab.org/?news=2018-05-26-fieldwork#2018-05-26-fieldwork"/><published>2018-05-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-05-30-reconstruction</id><title>New English Blog Post  </title><updated>2018-05-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Monday, I published my traditional monthly blog post for the Phylogenetic Networks Blog, this time <a href="http://phylonetworks.blogspot.com/2018/05/comparing-reconstruction-systems-in_28.html">Comparing reconstruction systems in historical linguistics</a>.</p></div></content><link href="https://calclab.org/?news=2018-05-30-reconstruction#2018-05-30-reconstruction"/><published>2018-05-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-06-06-calcblog</id><title>New Weblog on Computer-Assisted Language Comparison  </title><updated>2018-06-06T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After almost two months of preparation, our <a href="http://calc.digling.org">CALC team</a> is now busy preparing the first blogposts for our new weblog on <a href="http://calc.hypotheses.org">Computer-Assisted Language Comparison in Practice</a>. 
In the future, we hope to publish at least one post per month, targeting different topics, including methodological discussion notes and fresh tutorials on software, data curation, and data analysis.</p></div></content><link href="https://calclab.org/?news=2018-06-06-calcblog#2018-06-06-calcblog"/><published>2018-06-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-06-10-gender</id><title>New Post on Gender Differences in Language  </title><updated>2018-06-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, I wrote my monthly German post, titled <a href="https://wub.hypotheses.org/269">Huhu, Digga! Geschlechtsunterschiede in der Sprache</a>. I discuss recent phenomena of gender differences in the German language and their potential implications for the debate in German about "fair language" ("gerechte Sprache").</p></div></content><link href="https://calclab.org/?news=2018-06-10-gender#2018-06-10-gender"/><published>2018-06-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-06-18-blog</id><title>CALC Blog Post </title><updated>2018-06-18T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week I published a short info on a dataset we developed for our project, containing all
the parallel translations in the <a href="https://en.wiktionary.org">English Wiktionary</a>. The post
can be found <a href="https://calc.hypotheses.org/32">here</a> and the data is available
on Zenodo as <a href="https://zenodo.org/record/1286991">Parallel Translations from the English Wiktionary</a>.</p></div></content><link href="https://calclab.org/?news=2018-06-18-blog#2018-06-18-blog"/><published>2018-06-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-06-26-horizontal</id><title>New Post on Internal and External Language Comparison  </title><updated>2018-06-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my monthly post for the phylogenetic networks blog appeared, this time discussing <a href="http://phylonetworks.blogspot.com/2018/06/horizontal-and-vertical-language.html">Horizontal and vertical language comparison</a>, that is, the differences in comparing languages internally or externally.</p></div></content><link href="https://calclab.org/?news=2018-06-26-horizontal#2018-06-26-horizontal"/><published>2018-06-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-07-07-sequences</id><title>LingPy Tutorial Published Online  </title><updated>2018-07-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our tutorial on LingPy (common work with Mary Walworth, Simon Greenhill, Tiago Tresoldi, and Robert Forkel) has now appeared online, published with the Journal of Language Evolution. The article is open access and can be downloaded or viewed online under <a href="https://t.co/Mz1AUx9NpX">this link</a>. It reflects the current state of the art of our <a href="http://lingpy.org">LingPy</a> in its 2.6 version. </p></div></content><link href="https://calclab.org/?news=2018-07-07-sequences#2018-07-07-sequences"/><published>2018-07-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-07-19-clics</id><title>Official Release of CLICS 2.0  </title><updated>2018-07-19T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The CLICS database of cross-linguistic colexifications has now officially been
released in a new version, called CLICS², available at
<a href="http://clics.clld.org">http://clics.clld.org</a>. The database features a
multitude of new data points and a completely new framework for data curation
and data analysis. What is new in this new database is also documented in a forthcoming paper, which will soon be published with Linguistic Typology. You can find the draft of that paper <a href="http://lingulist.de/documents/papers/list-et-al-2018-clics-draft.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-07-19-clics#2018-07-19-clics"/><published>2018-07-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-07-24-blog</id><title>German Blog on Teekesselchen  </title><updated>2018-07-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just wrote another German blog for July, this time focusing on homophony, polysemy, and the game that we used to play when I was young, called <em>Teekesselchen</em> "teapot": <a href="https://wub.hypotheses.org/327">Netzwerke aus Teekesselchen</a>. </p></div></content><link href="https://calclab.org/?news=2018-07-24-blog#2018-07-24-blog"/><published>2018-07-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-07-30-blogpost</id><title>English post on colexification networks  </title><updated>2018-07-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My monthly blogpost in English just appeared online. This time, I am introducing the idea of cross-linguistic colexification networks (as they appear in our <a href="http://clics.lingpy.org">CLICS database</a>): <a href="http://phylonetworks.blogspot.com/2018/07/networks-of-polysemous-and-homophonous.html">Networks of polysemous and homophonous words</a>. </p>
<p><img alt="image" src="https://3.bp.blogspot.com/-mU7C7M_iBes/W1isgYVmr-I/AAAAAAAAAcY/P34kzyeZTKU8yBh35fuQOEQEz8QDh18eACLcBGAs/s400/polysemies.png"/></p></div></content><link href="https://calclab.org/?news=2018-07-30-blogpost#2018-07-30-blogpost"/><published>2018-07-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-05-wordhistory</id><title>German blog post on word histories  </title><updated>2018-08-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just published my German blog post for August, this time reflecting on word histories, and how social factors may influence the history of words. You can find the post <a href="https://wub.hypotheses.org/367">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-05-wordhistory#2018-08-05-wordhistory"/><published>2018-08-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-09-clics</id><title>New cookbook entry for CLICS  </title><updated>2018-08-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I published a new cookbook entry for the usage of the <code>pyclics</code> API with our blog for tutorials and methodological discussions on CALC, which you can find <a href="https://calc.hypotheses.org/384">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-09-clics#2018-08-09-clics"/><published>2018-08-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-22-clics</id><title>CLICS paper appeared online  </title><updated>2018-08-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper presenting the CLICS² database of cross-linguistic colexifications has now appeared online officially. 
Unfortunately, the production team of DeGruyter messed up the online version of the article, so Chinese characters are not readable, and Russian characters are turned upside down. Luckily, the PDF version is correct. The paper is open access and can be accessed <a href="https://www.degruyter.com/downloadpdf/j/lity.2018.22.issue-2/lingty-2018-0010/lingty-2018-0010.xml">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-22-clics#2018-08-22-clics"/><published>2018-08-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-24-sinopy</id><title>First official beta release of SinoPy library  </title><updated>2018-08-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just submitted a first (beta) version of the SinoPy library for quantitative tasks in Chinese historical linguistics. 
SinoPy is an attempt to provide useful functionality for users working with Chinese dialects and Sino-Tibetan language data and struggling with tasks like converting characters to Pinyin, analysing characters, or analysing readings in Chinese dialects and other SEA languages.</p>
<p>You can find the library on <a href="https://github.com/lingpy/sinopy">GitHub</a>, on <a href="https://pypi.org/project/sinopy">PyPi</a>, or on <a href="https://zenodo.org/badge/latestdoi/30593438">Zenodo</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-24-sinopy#2018-08-24-sinopy"/><published>2018-08-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-27-cldf</id><title>Paper on CLDF accepted by Scientific Data  </title><updated>2018-08-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper introducing the CLDF (Cross-Linguistic Data Formats, https://cldf.clld.org) initiative has been accepted by Nature's Scientific Data journal:</p>
<blockquote>
<p>Forkel, R., J.-M. List, S. Greenhill, C. Rzymski, S. Bank, M. Cysouw, H. Hammarström, M. Haspelmath, G. Kaiping, and R. Gray (forthcoming): Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics. Scientific Data.</p>
</blockquote>
<p>The paper introduces the basic ideas behind the CLDF standard and provides some examples and background information. I have uploaded the final draft we submitted to Nature <a href="http://lingulist.de/documents/papers/forkel-et-al-2018-cross-linguistic-data-formats.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-27-cldf#2018-08-27-cldf"/><published>2018-08-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-08-28-cognates</id><title>Blogpost on terminology for cognate relations  </title><updated>2018-08-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published a new English blogpost, this time introducing a new term for cognate relations in historical linguistics: "regular cognates". The concept is crucial for our terminology, although we are only beginning to develop methods to assess the regularity properly within computer-assisted frameworks. You can find the blogpost <a href="http://phylonetworks.blogspot.com/2018/08/regular-cognates-new-term-for-homology.html">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-08-28-cognates#2018-08-28-cognates"/><published>2018-08-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-09-05-structure</id><title>Blogpost on representing structural data in CLDF  </title><updated>2018-09-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last Monday, I published a blogpost presenting how structural data can be represented in <a href="https://cldf.clld.org">CLDF</a> format. The blog can be found <a href="https://calc.hypotheses.org/445">here</a>, and it includes two datasets that were published before, which are now provided in CLDF format. 
</p></div></content><link href="https://calclab.org/?news=2018-09-05-structure#2018-09-05-structure"/><published>2018-09-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-09-06-bangime</id><title>Paper with Abbie Hantgan on Bangime and Dogon accepted  </title><updated>2018-09-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A paper by Abbie Hantgan and me, where we present the preliminaries for a computer-assisted analysis of Bangime and its relation to the Dogon languages in Mali, has now been accepted with the Journal of Language Contact. The paper presents a new approach for automatic borrowing detection based on a comparison of different algorithms for automatic cognate detection. Although the approach is rather simple, it seems to be efficient enough to provide initial hints regarding major borrowing partners in language contact scenarios, and it also shows that the mysterious Bangime language remains an isolate, at least with the methods we have at our disposal by now. The draft of the paper is available <a href="http://lingulist.de/documents/papers/hantgan-list-2018-bangime-secret-language.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-09-06-bangime#2018-09-06-bangime"/><published>2018-09-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-09-09-etymology</id><title>New blog post on the dangers of etymologies in our daily life  </title><updated>2018-09-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published my monthly German blogpost, this time discussing the problem of using etymologies in speeches for rhetorical reasons. You can find the post, titled "Von hohen Zeiten und Schlägen in Raten: Vorsicht vor Alltagsetymologien!" <a href="https://wub.hypotheses.org/410">here</a>. 
</p></div></content><link href="https://calclab.org/?news=2018-09-09-etymology#2018-09-09-etymology"/><published>2018-09-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-09-24-cldf</id><title>New blog post on CLDF for structural data  </title><updated>2018-09-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published my monthly English blogpost, this time showing how structural data can be represented in the CLDF format. We plan to publish follow-up posts where we show how this data can be analyzed with network approaches. You can find the blogpost <a href="http://phylonetworks.blogspot.com/2018/09/structural-data-in-historical.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-09-24-cldf#2018-09-24-cldf"/><published>2018-09-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-03-algorithm</id><title>A fast algorithm for cognate detection based on matching consonant classes  </title><updated>2018-10-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Monday, I published a new blogpost presenting a fast algorithm for cognate detection using Dolgopolsky's approach of matching consonant classes. The blogpost with an example using LingPy's test sets and basic data structures can be found <a href="https://calc.hypotheses.org/477">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-10-03-algorithm#2018-10-03-algorithm"/><published>2018-10-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-08-predictions</id><title>Prediction experiment on Kho-Bwa language data  </title><updated>2018-10-08T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>During the last weekend, Tim Bodt, Nathan Hill, and I submitted a preregistration with the Open Science Framework. In this experiment, we used computer-assisted techniques to predict the most likely pronunciations for words so far missing in Tim's corpus on Kho-Bwa languages. During fieldwork in November, Tim will try to verify how good our predictions are. The preregistered version of the experiment can be found <a href="https://osf.io/evcbp/">here</a>.</p>
<p>For the prediction, a new method for sound correspondence pattern inference was used, which I developed in close collaboration with Nathan Hill during the last three years. Algorithm and code are now also available, both in a preprint (which you can find <a href="https://www.biorxiv.org/content/early/2018/10/03/434621.full.pdf">here</a>) and with GitHub (<a href="https://github.com/lingpy/lingrex">lingpy/lingrex</a>). </p></div></content><link href="https://calclab.org/?news=2018-10-08-predictions#2018-10-08-predictions"/><published>2018-10-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-10-morphannot</id><title>New blog post on morphological annotation </title><updated>2018-10-10T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published my first blog post on our project weblog, in which I
propose a workflow for enhancing wordlists with morphological
information. You can find the blogpost <a href="https://calc.hypotheses.org/570">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-10-10-morphannot#2018-10-10-morphannot"/><published>2018-10-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-15-promiscuity</id><title>Promiscuity of words  </title><updated>2018-10-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>It is surprising how many of the words in our languages are composed of other words. It is also surprising that linguistics has not yet come up with a term for the fact that it is specifically certain concepts that denote word forms which are then frequently reused throughout the lexicon of a language. I discuss this briefly in my monthly German blog post, titled <a href="https://wub.hypotheses.org/464">Von Wortfamilien und promiskuitiven Wörtern</a>. </p></div></content><link href="https://calclab.org/?news=2018-10-15-promiscuity#2018-10-15-promiscuity"/><published>2018-10-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-17-cldf</id><title>Cross-Linguistic Data Formats  </title><updated>2018-10-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper describing the basic ideas underlying the <a href="https://cldf.clld.org">Cross-Linguistic Data Formats</a> initiative was finally published with the Scientific Data journal: </p>
<blockquote>
<p>The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.</p>
</blockquote>
<p>You can find the paper <a href="https://www.nature.com/articles/sdata2018205">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-10-17-cldf#2018-10-17-cldf"/><published>2018-10-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-19-cfp</id><title>Call for abstracts: Proposal for a Workshop on CALC at SLE 2019   </title><updated>2018-10-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We plan on submitting a workshop proposal for computer-assisted language comparison at the SLE 2019 meeting in Leipzig. You can find the detailed call for papers <a href="https://linguistlist.org/issues/29/29-4042.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-10-19-cfp#2018-10-19-cfp"/><published>2018-10-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-22-blog</id><title>Blogpost on Structural Data in Historical Linguistics  </title><updated>2018-10-22T12:00:00+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Not all people will agree with my view, but I see the use of structural data as problematic when trying to either do phylogenetic reconstruction or to infer so far undemonstrated genetic relationships among languages. I have summarized my arguments in a blogpost, entitled <a href="http://phylonetworks.blogspot.com/2018/10/controversies-about-structural-data-in.html">Controversies about structural data in historical linguistics</a>.</p></div></content><link href="https://calclab.org/?news=2018-10-22-blog#2018-10-22-blog"/><published>2018-10-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-10-31-hiphilangsci</id><title>Blogpost on the History of Concept Lists  </title><updated>2018-10-31T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blogpost appeared, this time presenting a kind of "making of" of <a href="https://concepticon.clld.org">Concepticon</a>, where I present ideas <a href="https://hiphilangsci.net/2018/10/31/concept-list-compilation/">"Towards a history of concept list compilation in historical linguistics"</a> in the blog <a href="https://hiphilangsci.net/">History and Philosophy of the Language Sciences</a>.
I also formatted the text and submitted a PDF version to Zenodo, which you can find <a href="https://zenodo.org/record/1474751">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-10-31-hiphilangsci#2018-10-31-hiphilangsci"/><published>2018-10-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-07-clusters</id><title>Tutorial on inferring consonant clusters with LingPy  </title><updated>2018-11-07T12:00:00+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today I published a new blog post in our tutorial blog, this time explaining how consonant clusters that recur in the languages of the world can be inferred with the help of LingPy, based on data derived from the CLICS database. The tutorial can be found <a href="https://calc.hypotheses.org/998">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-11-07-clusters#2018-11-07-clusters"/><published>2018-11-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-09-yunfanarticle</id><title>Yunfan's article on relativisation in Khroskyabs</title><updated>2018-11-09T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My article, entitled <a href="https://halshs.archives-ouvertes.fr/halshs-01907013"><em>Relativisation in Wobzi Khroskyabs and the integration of genitivisation</em></a>, is going to appear in the upcoming issue of <a href="https://benjamins.com/catalog/ltba.17015.lai"><em>Linguistics of the Tibeto-Burman Area</em></a>. In this article, the relativisation strategies in Wobzi Khroskyabs are described, and the historical pathway of the genitive relativisation in this language is hypothesised. You can view this article by clicking on the link above. 
</p></div></content><link href="https://calclab.org/?news=2018-11-09-yunfanarticle#2018-11-09-yunfanarticle"/><published>2018-11-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-12-blog</id><title>Blogpost on language decay  </title><updated>2018-11-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published my monthly German blogpost, this time discussing the question of language decay: <a href="https://wub.hypotheses.org/510">Wer hat Angst vorm Sprachverfall?</a>. </p></div></content><link href="https://calclab.org/?news=2018-11-12-blog#2018-11-12-blog"/><published>2018-11-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-19-terminology</id><title>New blog post on semantic aspects of word formation</title><updated>2018-11-19T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published my second blog post on our project weblog. This time I talk about a semasiological approach to the study of word formation and argue for the use of a new term to denote the idea of concept-based type-frequency. You can find the blogpost <a href="https://calc.hypotheses.org/1169">here</a>.</p></div></content><link href="https://calclab.org/?news=2018-11-19-terminology#2018-11-19-terminology"/><published>2018-11-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-26-blog</id><title>Blogpost on structural data and accepted paper  </title><updated>2018-11-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Together with Guido Grimm, I have published a new blog post devoted to the question of structural data in historical linguistics and how it should be used. 
You can find the blog <a href="http://phylonetworks.blogspot.com/2018/11/how-languages-lose-body-parts-once-more.html">here</a>. </p>
<p>Furthermore, I was really pleased when I heard that my paper for the automatic inference of sound correspondence patterns was now officially accepted by the Computational Linguistics journal. I just submitted my final author edits, and uploaded the draft <a href="http://lingulist.de/documents/papers/list-2018-automatic-inference-of-sound-correspondence-patterns.pdf">here</a>. The code has now also been officially released, and you can test the <a href="https://github.com/lingpy/lingrex">lingrex</a> package yourself, if you want.</p></div></content><link href="https://calclab.org/?news=2018-11-26-blog#2018-11-26-blog"/><published>2018-11-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-11-29-papers</id><title>Submitted and accepted papers</title><updated>2018-11-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Together with Nathan W. Hill and Christopher Forster, I submitted a paper on rhyme annotation in Chinese historical phonology and beyond. You can find the draft <a href="http://lingulist.de/documents/papers/list-et-al-2018-rhyme-annotation.pdf">here</a>.</p>
<p>In addition, our paper presenting a database of cross-linguistic transcription systems has now finally been accepted for publication in the Yearbook of the Poznań Linguistic Meeting. We have now submitted our final version, which is also available online <a href="http://lingulist.de/documents/papers/anderson-et-al-2018-cross-linguistic-transcription-systems.pdf">here</a>. The source code accompanying this paper has now also been released and can be found on GitHub at <a href="https://github.com/cldf/clts">cldf/clts</a>. In addition, you can inspect the data <a href="http://calc.digling.org/clts/">here</a>. </p></div></content><link href="https://calclab.org/?news=2018-11-29-papers#2018-11-29-papers"/><published>2018-11-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-12-10-tutorial</id><title>New blog posts </title><updated>2018-12-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, I published my monthly blog post in German, this time devoted to questions of <a href="https://wub.hypotheses.org/569">open research</a>, titled "Von hupenden Radlern und schludrigen Wissenschaftlern". </p>
<p>Today, another blog post appeared in our blog on tutorials for computer-assisted language comparison, this time devoted to <a href="https://calc.hypotheses.org/1668">Merging datasets with LingPy and the CLDF curation framework</a>.</p></div></content><link href="https://calclab.org/?news=2018-12-10-tutorial#2018-12-10-tutorial"/><published>2018-12-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2018-12-17-sle</id><title>SLE-2019 Workshop on Computer-Assisted Language Comparison </title><updated>2018-12-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am very pleased to announce that the workshop on computer-assisted language
comparison, which I submitted in November, has been accepted for the annual
<a href="http://sle2019.eu/">SLE conference</a> next year in Leipzig. I just submitted the
final version of our workshop description, which you can also find <a href="http://calc.digling.org/events/abstracts/list-2018-sle-workshop.pdf">here</a>. </p>
<p>Here is the abstract of the workshop:</p>
<blockquote>
<p>The workshop invites papers that deal with computer-assisted (as opposed to pure computational or pure qualitative) approaches to historical and typological language comparison. Computer-assisted approaches are hereby understood as procedures involving different stages of qualitative and quantitative data analysis, ranging from the initial preparation of lexical or structural data, via automatic or manual annotation, up to qualitative or quantitative analysis, that yield a specific result, be it a linguistic reconstruction system linking proto-forms to aligned reflexes, a phylogeny that lists inferred word histories, or tools for exploratory data analysis. By focusing on computer-assisted approaches, we hope to foster a more intensive collaboration between classical and computational linguists. In addition to detailed  descriptions of concrete tasks in historical and typological language comparison, we also encourage submissions dealing with data standards enhancing data sharing and reuse, as well as the presentation of purely qualitative approaches for which no computational solutions exist so far.</p>
</blockquote>
<p>If you are working on topics that seem apt for this workshop, consider applying by submitting an abstract for the SLE conference in which you specify our workshop (see <a href="http://sle2019.eu/call-for-papers">here</a>).  </p></div></content><link href="https://calclab.org/?news=2018-12-17-sle#2018-12-17-sle"/><published>2018-12-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-01-30-update</id><title>Updates</title><updated>2019-01-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I haven't shared many updates in January, as I was on holiday until the first half of January. In the meantime, however, a couple of blog posts appeared, and I'll quickly summarize them now.</p>
<p>First, already in December, an English blog post on «Patterns, processes, abduction, and consilience» appeared on <a href="">The Genealogical World of Phylogenetic Networks</a>. In this post, I discuss questions of what we can know and what we can infer from the data and the patterns we observe in historical linguistics. The post can be found <a href="http://phylonetworks.blogspot.com/2018/12/patterns-processes-abduction-and.html">here</a>.</p>
<p>Second, earlier in January, I wrote a German blog post, discussing problems of fake news, fiction, and the potential crisis in journalism and science, which you can find <a href="https://wub.hypotheses.org/782">here</a>.</p>
<p>Third, another English post appeared on the phylogenetic networks blog, where I discuss <a href="http://phylonetworks.blogspot.com/2019/01/future-challenges-for-computational.html">Future challenges for computational diversity linguistics</a>. I present 10 different problems, and I will try to comment on each of these 10 problems in more detail during the next 10 months.</p>
<p>Fourth, I decided to start a new series of tutorial posts in our <a href="https://calc.hypotheses.org">CALC</a> blog. The idea, presented in the <a href="https://calc.hypotheses.org/1802">introductory post</a>, is to create something like a «Primer on automatic inference of sound correspondence patterns», showing how the algorithm I present in a <a href="http://doi.org/10.1162/coli_a_00344">forthcoming paper</a> can be used in practice.</p></div></content><link href="https://calclab.org/?news=2019-01-30-update#2019-01-30-update"/><published>2019-01-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-02-25-blogposts</id><title>Two more blogposts</title><updated>2019-02-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>One month has passed since I last shared news. I have not been idle in the meantime, but did not find time in all my work to share any updates. Besides, I was writing papers, which are now under review, and will be shared online in due time, once I find time to prepare them in the form of preprints. </p>
<p>I would just like to announce two more blogposts that I wrote in February. The first one, in German, is devoted to potential errors in science which can nevertheless improve our knowledge, and is titled <a href="https://wub.hypotheses.org/842">Darwin's Finkenschnäbel und der Nutzen des Irrtums</a>. The second post follows up on my 10 open problems for diversity linguistics, and discusses why the problem of <a href="http://phylonetworks.blogspot.com/2019/02/automatic-morpheme-segmentation-open.html">Automatic Morpheme Segmentation</a> is such a huge problem for historical linguistics, and how we might tackle it in the future.</p></div></content><link href="https://calclab.org/?news=2019-02-25-blogposts#2019-02-25-blogposts"/><published>2019-02-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-02-28-primer</id><title>Tutorial Blogpost</title><updated>2019-02-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, I wrote another small blogpost, a primer on automatic sound correspondence pattern inference, or, more properly, a second post discussing the topic, this time showing how data from the <a href="http://alignments.lingpy.org">Benchmark Database of Phonetic Alignments</a> can be harvested and directly analyzed with <a href="http://edictor.digling.org">EDICTOR</a>. The post can be found <a href="https://calc.hypotheses.org/1807">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-02-28-primer#2019-02-28-primer"/><published>2019-02-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-03-14-blogpost</id><title>German Blogpost on Plagiarism</title><updated>2019-03-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Sunday, I published my monthly German blogpost, this time discussing 
the problem of plagiarism in science: <a href="https://wub.hypotheses.org/854">Von falschen Originalen und echten Kopien</a>. </p></div></content><link href="https://calclab.org/?news=2019-03-14-blogpost#2019-03-14-blogpost"/><published>2019-03-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-03-25-blogpost</id><title>English blogpost on automatic borrowing detection </title><updated>2019-03-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The second post in my series on open problems in computational diversity linguistics deals with automatic borrowing detection. While this may not seem to be a hard problem per se, I think it is a huge one, specifically because there are not even standardized procedures in classical, qualitative historical linguistics for this task. You can find the blogpost <a href="http://phylonetworks.blogspot.com/2019/03/automatic-detection-of-borrowing-open.html">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-03-25-blogpost#2019-03-25-blogpost"/><published>2019-03-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-03-28-blogpost</id><title>Tutorial blogpost </title><updated>2019-03-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In the third part of a series of tutorial blogposts on the automatic inference of sound correspondence patterns across multilingual wordlists, I present how the Python code of the LingRex software package can be applied to the data of the TPPSR. 
You can find the post <a href="https://calc.hypotheses.org/1823">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-03-28-blogpost#2019-03-28-blogpost"/><published>2019-03-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-04-01-blogpost</id><title>Blogpost on pyconcepticon </title><updated>2019-04-01T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We published the first of a series of blog posts on how to use pyconcepticon,
both as a library and as a command-line tool, for the semi-automatic
mapping of concept lists to <a href="https://concepticon.clld.org/">Concepticon</a>.
This first blog post guides the readers through the command-line tool,
hinting at the internals of the library that will be explored in more detail
in the following post (to be published next week).</p>
<p>You can find the post <a href="https://calc.hypotheses.org/1820">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-04-01-blogpost#2019-04-01-blogpost"/><published>2019-04-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-04-15-blogpost</id><title>German blogpost on translation </title><updated>2019-04-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just published my monthly German blogpost for April, this time discussing questions of translation, specifically literal and adequate translation. You can find the post, titled «Wörtlichkeit, Freiheit, Adäquatheit und die Aufgabe der Übersetzer» <a href="https://calc.hypotheses.org/866">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-04-15-blogpost#2019-04-15-blogpost"/><published>2019-04-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-04-18-concepticon2</id><title>Concepticon 2.0 released</title><updated>2019-04-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>While working during the last weeks, I completely forgot to announce that Concepticon has now been released in version 2.0, this time comprising as many as 240 different concept lists, with many mappings refined in comparison with earlier versions. Have fun exploring the resource at <a href="https://concepticon.clld.org">https://concepticon.clld.org</a>. </p>
<p>And once I am already announcing this, I also forgot to mention that version 1.2 of the Cross-Linguistic Transcription Systems was now also released, and you will find it at <a href="https://clts.clld.org">https://clts.clld.org</a>. </p>
<p>I am very thankful to all colleagues involved in the preparation of these sources.</p></div></content><link href="https://calclab.org/?news=2019-04-18-concepticon2#2019-04-18-concepticon2"/><published>2019-04-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-04-29-newblog</id><title>New Blogpost and Reference Browser</title><updated>2019-04-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Following up on open problems in computational diversity linguistics, my English blogpost for April now discusses the third problem, the induction of sound laws, which has been largely neglected both in the classical and the computational literature. The post can be found <a href="https://phylonetworks.blogspot.com/2019/04/automatic-sound-law-induction-open.html">here</a>.</p>
<p>In addition, I would like to announce a new tool that I have created recently. I call it EvoRef, and the tool currently offers 4669 distinct quotes (including abstracts and comments) from 2383 different references on topics in historical linguistics, language typology, and evolution. The tool is organized in such a way that many of the references can already be found in <a href="http://bibliography.lingpy.org">EvoBib</a>, although they may be occasionally missing. The tool can be used to search for my specific interpretation of linguistic literature, since it offers the keywords that I assign to the works I cite. As my original database also contains specific comments and evaluations, which I do not necessarily want to share in public, this official version only offers the raw quotes with comments being hidden. Given the huge number of inter-linked resources, also with occasional translations of non-English resources into English, I hope it will be useful for those interested in topics on language evolution and historical linguistics. You can find the tool at <a href="http://calc.digling.org/evoref">http://calc.digling.org/evoref/</a>. </p></div></content><link href="https://calclab.org/?news=2019-04-29-newblog#2019-04-29-newblog"/><published>2019-04-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-05-07-sinotibetan</id><title>New Paper on Sino-Tibetan Phylogenies in PNAS </title><updated>2019-05-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After four years of hard work, our study on the phylogeny and age of the Sino-Tibetan language family has finally appeared in PNAS. The article can be found <a href="https://doi.org/10.1073/pnas.1817972116">here</a>. A short video, in which I introduce our major findings, can be found <a href="https://vimeo.com/333287191/5c68dfb647">here</a>. 
Our press release presenting some details of the study is available from <a href="https://www.shh.mpg.de/1285923/sino-tibetan-origin">this link</a>, offered also in different translations.</p>
<p>To summarize our findings, here is what the abstract of the paper says:</p>
<blockquote>
<p>The Sino-Tibetan language family is one of the world’s largest
and most prominent families, spoken by nearly 1.4 billion people.
Despite the importance of the Sino-Tibetan languages, their pre-history remains controversial, with ongoing debate about when
and where they originated. To shed light on this debate we
develop a database of comparative linguistic data, apply the linguistic comparative method to identify sound correspondences
and establish cognates. We then use phylogenetic methods to
infer the relationships among these languages and estimate the
age of their origin and homeland. Our findings point to Sino-Tibetan originating with north Chinese millet farmers around
7200 B.P. and suggest a link to the late Cishan and the early
Yangshao cultures.</p>
</blockquote>
<p>The paper was based on a large collaborative effort, involving teams from Paris (Guillaume Jacques and Laurent Sagart from the CRLAO and Robin Ryder and Valentin Thouzeau from the Université Paris-Dauphine) and Jena (Simon J. Greenhill and Yunfan Lai). In addition, many people helped us in collecting the data, which can be freely accessed on <a href="https://zenodo.org/record/2581321">Zenodo</a>, or even directly inspected through the <a href="https://dighl.github.io/sinotibetan/">EDICTOR software</a>.</p>
<p>Apart from the co-authors, I am also very thankful to the numerous contributors who shared data, and to the reviewing process, which was professional, challenging, and extremely fair. </p>
<p>With this study, we hope to contribute to the ongoing debate regarding the origin and spread of the Sino-Tibetan languages. Three teams have been working in parallel on this question, with one study published in <a href="https://www.nature.com/articles/s41586-019-1153-z">Nature two weeks ago</a>, and one in preparation (preliminary results will be presented at a <a href="www.iacl27kobe.net/download/Workshop-Abs-collected.pdf">conference this week</a>). </p></div></content><link href="https://calclab.org/?news=2019-05-07-sinotibetan#2019-05-07-sinotibetan"/><published>2019-05-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-05-10-interview</id><title>Radio interview and another paper accepted</title><updated>2019-05-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper in <a href="https://www.pnas.org/content/early/2019/04/30/1817972116/">PNAS</a> on the history of the Sino-Tibetan languages seems to have caught some media attention. As a result, I was asked to give a short interview on the matter, which already appeared on Tuesday in <a href="https://www.deutschlandfunk.de/forschung-aktuell.675.de.html?drbm:date=2019-05-07">Deutschlandfunk</a>, but is still available from their archive (see below on the website).</p>
<p>Furthermore, our paper on rhyme annotation, common work with Nathan W. Hill and Christopher Foster, which has been under review for some time, has now been accepted. Our final draft before it goes to production can be found <a href="http://lingulist.de/documents/papers/list-et-al-2019-rhyme-annotation-chinese-historical-phonology.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-05-10-interview#2019-05-10-interview"/><published>2019-05-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-05-14-blogposts</id><title>Two blogposts and a paper accepted</title><updated>2019-05-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two more blogposts appeared this week. First, I decided to start writing a series on the background of our <a href="https://dighl.github.io/sinotibetan/">Sino-Tibetan Database of Lexical Cognates</a>, which you can find <a href="https://calc.hypotheses.org/1882">here</a>. Second, I published a German blogpost on the importance of baselines and benchmarks (gold standards) for testing and training of algorithms, which you can find <a href="https://wub.hypotheses.org/902">here</a>. In addition, a paper I wrote with Taraka Rama on fast cognate detection and fast phylogenetic reconstruction was now accepted as a long paper for the <a href="http://www.acl2019.org/EN/index.xhtml">ACL</a> conference this year. We'll still have to finalize the paper itself according to reviewer suggestions, but will upload a preprint as early as possible. The code for my fast cognate detection method can be found <a href="https://github.com/lingpy/bipskip">here</a>. 
</p></div></content><link href="https://calclab.org/?news=2019-05-14-blogposts#2019-05-14-blogposts"/><published>2019-05-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-05-15-blogpost</id><title>New blogpost series on biological methods in linguistics</title><updated>2019-05-15T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week we start a new blogpost series, focusing on metaphors and methods shared by historical linguistics and evolutionary biology. You can find the first post <a href="https://calc.hypotheses.org/1866">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-05-15-blogpost#2019-05-15-blogpost"/><published>2019-05-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-05-27-papers</id><title>Blog post and paper accepted </title><updated>2019-05-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My English blogpost for May discusses phonological reconstruction as my fourth open problem in computational diversity linguistics and can be found <a href="http://phylonetworks.blogspot.com/2019/05/automatic-phonological-reconstruction.html">here</a>. Furthermore, a paper reviewing automated methods for contact inference in historical linguistics has now been officially accepted by Language and Linguistics Compass. You can find my most recent draft <a href="http://lingulist.de/documents/papers/list-2019-automated-contact-inference.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-05-27-papers#2019-05-27-papers"/><published>2019-05-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-06-06-papers</id><title>Two new papers accepted </title><updated>2019-06-06T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I was very glad when I heard that the paper I wrote with Taraka Rama on "An automated framework for fast cognate detection and Bayesian phylogenetic inference in computational historical linguistics" has been accepted as a long paper for the <a href="https://acl2019.org">ACL 2019</a> conference. Our preprint can be found <a href="https://hcommons.org/deposits/item/hc:24605/">here</a>, and the code for the cognate detection algorithm can be found at <a href="https://github.com/lingpy/bipskip">GitHub/lingpy/bipskip</a>. </p>
<p>At the same time, I was also notified that my paper with Tim Bodt on "Testing the predictive strength of the comparative method: An ongoing experiment on unattested words in Western Kho-Bwa languages" has been accepted by the journal <a href="http://journals.ed.ac.uk/pihph">Papers in Historical Phonology</a>. The preprint can be found <a href="http://lingulist.de/documents/papers/bodt-list-2019-predictive-strength-comparative-method.pdf">here</a>, and the code has been registered with the <a href="https://osf.io/evcbp/">Open Science Framework</a>. </p></div></content><link href="https://calclab.org/?news=2019-06-06-papers#2019-06-06-papers"/><published>2019-06-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-06-20-blogpost</id><title>New blog post and papers </title><updated>2019-06-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already on Sunday I published a new German blog post, this time discussing the phenomenon of <em>epenthesis</em> and other sound change types in German and other languages, titled «Wissentschaft und Abenbrot: Einschübe und Aussetzer im Sprachwandel», available <a href="https://wub.hypotheses.org/917">here</a>. </p>
<p>Furthermore, two more drafts of accepted papers have been added to my list of forthcoming papers. The first draft is a comment on a forthcoming paper by Gerhard Jäger in the journal <em>Theoretical Linguistics</em>, discussing questions of comparing reconstruction systems. The draft, titled «Beyond Edit Distances: Comparing linguistic reconstruction systems» can be found <a href="http://lingulist.de/documents/papers/list-2019-beyond-edit-distances.pdf">here</a>. </p>
<p>The second draft is a paper to appear in the <em>Bulletin of Chinese Linguistics</em>, written together with Nathan W. Hill, presenting a new idea of handling Chinese character formation processes in the reconstruction of Old Chinese phonology. This draft, titled «Using Chinese character formation graphs to test proposals in Chinese historical phonology» can be found <a href="http://lingulist.de/documents/papers/hill-list-2019-chinese-character-formation-graphs.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-06-20-blogpost#2019-06-20-blogpost"/><published>2019-06-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-06-25-preprint</id><title>New blog post, new Python library, and new preprint </title><updated>2019-06-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my monthly English blog post appeared, discussing problem number 5 of my list of open problems in computational diversity linguistics, this time devoted to the "Simulation of lexical change", which you can find <a href="http://phylonetworks.blogspot.com/2019/06/simulation-of-lexical-change-open.html">here</a>. </p>
<p>Furthermore, I released a beta version of the PoePy library, a Python library devoted to quantitative tasks in the investigation of poetry, available on <a href="https://github.com/lingpy/poepy">GitHub</a>. </p>
<p>Last but not least, Justin Power, Guido Grimm, and I finally managed to submit our pilot study on sign language evolution, titled "Evolutionary dynamics in the dispersal of sign languages". The preprint can be found <a href="https://hcommons.org/deposits/item/hc:24953/">here</a>. </p></div></content><link href="https://calclab.org/?news=2019-06-25-preprint#2019-06-25-preprint"/><published>2019-06-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-06-27-talkntu</id><title>Talk at National Taiwan University</title><updated>2019-06-27T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Tomorrow, June 28, 2019, Tiago Tresoldi, Mei-Shin Wu, and Nathanael E.
Schweikhard will give a talk titled "Fundamentals of Computer-Assisted Language
Comparison" at the National Taiwan University (NTU), in Taipei (Taiwan).
The talk will introduce computational methods of language comparison, discuss the
software, methods, and interfaces developed by the CALC group, and
illustrate data annotation and modeling, followed by a
session for questions &amp; answers and practical demonstrations.</p></div></content><link href="https://calclab.org/?news=2019-06-27-talkntu#2019-06-27-talkntu"/><published>2019-06-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-07-02-newpaper</id><title>New paper on Cross-Linguistic Transcription Systems appeared </title><updated>2019-07-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, our paper on Cross-Linguistic Transcription Systems finally appeared online. In this paper, we explain how we established the <a href="https://clts.clld.org">Cross-Linguistic Transcription Systems</a> (CLTS) database, which links different transcription systems and transcription datasets to a unified set of sounds, which are defined by a feature system. The paper, titled <a href="https://content.sciendo.com/view/journals/yplm/4/1/article-p21.xml">A cross-linguistic database of phonetic transcription systems</a>, coauthored by Cormac Anderson, Tiago Tresoldi, Thiago Chacon, Anne-Maria Fehn, Mary Walworth, Robert Forkel, and myself, introduces the database and also discusses general ideas with respect to standardization efforts and traditions for linguistic transcription systems.</p></div></content><link href="https://calclab.org/?news=2019-07-02-newpaper#2019-07-02-newpaper"/><published>2019-07-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-07-10-releases</id><title>New paper and new releases </title><updated>2019-07-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I released <a href="http://bibliography.lingpy.org">EvoBib</a>, version 0.26, with now 3404 bibliographic entries, and <a href="http://calc.digling.org/evoref">EvoRef</a>, version 0.3, with now 4835 quotes from 2515 references. 
In addition to accessing the data via the web interfaces, they can also be downloaded from Zenodo, via <a href="https://zenodo.org/record/3301938">this link for EvoBib</a> and <a href="https://zenodo.org/record/3302056">this link for EvoRef</a>.</p>
<p>In addition, a paper that I wrote almost three years ago with Guillaume Jacques has now finally appeared. This paper, titled «Save the trees» discusses the advantages of tree models in historical linguistics. Due to open access restrictions, I can only offer the preprint of the paper, which has been available for download for quite some time from <a href="http://lingulist.de/documents/papers/jacques-list-2018-save-the-trees-draft.pdf">this link</a>. A refined version can be found <a href="http://alex.francois.online.fr/data/List_Jacques_2019_Save-the-trees_JHL_9-1_print.pdf">here</a>. </p></div></content><link href="https://calclab.org/?news=2019-07-10-releases#2019-07-10-releases"/><published>2019-07-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-07-29-paper</id><title>New paper, new releases, and new blogpost</title><updated>2019-07-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a paper by Taraka Rama and me, titled "An Automated Framework for Fast Cognate Detection and Bayesian Phylogenetic Inference in Computational Historical Linguistics" appeared online in its final version, and you can find it <a href="https://www.aclweb.org/anthology/papers/P/P19/P19-1627/">here</a>. </p>
<p>Furthermore, we released version 2.1 of <a href="https://concepticon.clld.org">Concepticon</a>, now offering concept links to 250 different concept lists.</p>
<p>Finally, a blogpost devoted to the problem of the simulation of sound change, as part of my series on open problems in computational diversity linguistics, appeared today, and you can find it <a href="http://phylonetworks.blogspot.com/2019/07/simulation-of-sound-change-open.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2019-07-29-paper#2019-07-29-paper"/><published>2019-07-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-07-30-ocaf</id><title>Special issue of the OCAF conference published </title><updated>2019-07-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>It is my great pleasure to announce that the Journal of Language Relationship has now published our special issue that reflects some of the work presented in our <a href="http://calc.digling.org/events/ocaf.html">Old Chinese and Friends conference</a>. All articles are freely available and can be found <a href="http://www.jolr.ru/">here</a>. I am very thankful to George Starostin for the fantastic work done as an editor of this issue, with all articles being thoroughly checked and adjusted to journal style guides by him, and to Yunfan Lai for help in organizing both the conference and submission and reviews.</p></div></content><link href="https://calclab.org/?news=2019-07-30-ocaf#2019-07-30-ocaf"/><published>2019-07-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-08-13-interview</id><title>Interview on Juggling and Science </title><updated>2019-08-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>An interview, in which I talk about juggling and science, appeared three days
ago in the Chinese online journal
<a href="http://zhishifenzi.com/depth/character/6622.html">Zhìshifènzi</a>
(<em>intellectuals</em>). Reading this interview with the help of automated
translation nicely illustrates the limitations that computational
approaches still have. My Chinese name Yóuhán 游函, which I chose back in 2005
because the pronunciation comes so close to my first name, is consequently
translated as “travel letter” or similar, because the name is not
recognizable as a standard name by the translation software. If you check the
same interview (but with a few errors in the text already corrected) on the
<a href="https://mp.weixin.qq.com/s/b1rZqObj1AAg6_ZRtHc0gQ">We-Chat platform</a>, you can
also see a recent video in which I perform the pirouette with five clubs in a
gym in Berlin. While this has nothing to do with science, I see it as one of the factors that allow me to pursue my research: Juggling is excellent for preventing the back pain that results from sitting too long in front of a computer. Therefore, the more I juggle in my free time, the more I can sit and program in my working time.</p></div></content><link href="https://calclab.org/?news=2019-08-13-interview#2019-08-13-interview"/><published>2019-08-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-08-19-blogposts</id><title>New Blog Posts and Workshop</title><updated>2019-08-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday and today, I published two new blogposts. My German blogpost, titled <a href="https://wub.hypotheses.org/940">«Und nun zur Wörtervorhersage...»: Vorhersagen in der Sprachwissenschaft</a>, deals with predictions in the humanities, and specifically in linguistics. The other blogpost is a tutorial on alignment analyses with <a href="http://lingpy.org">LingPy</a> and custom scoring functions based on the <a href="https://clts.clld.org">CLTS</a> feature system, titled <a href="https://calc.hypotheses.org/1962">Feature-Based Alignment Analyses with LingPy and CLTS (1)</a>, and will be followed up by one or two more posts which present a full-fledged algorithm devoted to the topic.</p>
<p>Furthermore, our workshop on «Computer-assisted approaches in historical and typological language comparison» will be organized as part of the annual conference of the <a href="https://sle2019.eu">SLE (2019)</a>. For those interested in the specific speakers of this workshop, I made a small <a href="http://calc.digling.org/events/sle.html">workshop website</a> which shares the abstract, gives some information on the full description of the workshop, and also summarizes the speakers, their titles, and provides direct links to their abstracts at the official SLE website. </p></div></content><link href="https://calclab.org/?news=2019-08-19-blogposts#2019-08-19-blogposts"/><published>2019-08-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-08-26-article</id><title>New article and blog post and past workshop </title><updated>2019-08-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My article, discussing currently available methods for the automated studying
of language contact, has now appeared in the journal <em>Language and Linguistics
Compass</em>, titled <a href="https://doi.org/10.1111/lnc3.12355">Automated methods for the investigation of language contact
situations, with a focus on lexical
borrowing</a>. Unfortunately, I could not
afford the high costs for direct open access with the publisher. As a result, I
placed my final version before copy-editing on <a href="https://hcommons.org/deposits/download/hc:26184/CONTENT/list-2019-automated-methods-contact-inference-preprint.pdf">Humanities
Commons</a>.</p>
<p>Furthermore, I managed to stick to my self-made promise and discuss the seventh
problem of computational diversity linguistics in the eighth month of the year.
This month, I discuss questions regarding the <a href="http://phylonetworks.blogspot.com/2019/08/statistical-proof-of-language.html">Statistical proof of language
relationship</a>.</p>
<p>Last but not least, our workshop on <a href="http://calc.digling.org/events/sle.html">Computer-assisted approaches to historical and
typological language comparison</a>,
which was held last week, organized as part of the <a href="https://sle2019.org">Annual Meeting of the
SLE</a> in Leipzig, turned out to be very nice, with a lot of
presentations devoted to very different aspects of computer-assisted research.</p></div></content><link href="https://calclab.org/?news=2019-08-26-article#2019-08-26-article"/><published>2019-08-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-09-01-interview</id><title>Interview on Computer-Assisted Language Comparison with SysBlok.ru </title><updated>2019-09-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, the Russian portal <a href="https://sysblok.ru">Sistemny Blok</a> published an
interview, in which we discussed computer-based, computer-assisted, and general
language comparison, as well as the benefits of juggling for doing science. The
interview, which was conducted in English and then translated to Russian, can
be found
<a href="https://sysblok.ru/interviews/obedinit-klassicheskih-filologov-i-specialistov-po-cifre/">here</a>.
I am very thankful to all involved in this, specifically Mariana Zorkina, who
interviewed me. Furthermore, this enterprise helped me to find a new scientific
blog with very interesting content, and I recommend that all who read Russian
have a look at SysBlok.ru.</p></div></content><link href="https://calclab.org/?news=2019-09-01-interview#2019-09-01-interview"/><published>2019-09-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-09-20-blogposts</id><title>New Blog Posts in the CALC Blog </title><updated>2019-09-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, we published two new blog posts in our
<a href="https://calc.hypotheses.org">CALC</a> blog, both follow-ups from series that were
started earlier this year, one by myself, devoted to <a href="https://calc.hypotheses.org/1971">Feature-Based
Alignment Analyses with LingPy and CLTS (2)</a>,
and one by Nathanael E. Schweikhard, discussing <a href="https://calc.hypotheses.org/1951">Biological metaphors and
methods in historical linguistics (2): Words and
genes</a>. </p>
<p>Additionally, Tim Bodt, who has already visited our group several times, has now published all material from his trip to Nepal, during which he collected material on the Kusunda language. 
This trip was supported in small part by our project, and Tim thanked us for the support by collecting a 250-item wordlist of Kusunda that we can compare with our database on Sino-Tibetan languages.
You can find a summary of the work (published by Aaley and Bodt), along with all links to the original data, which is free for download, <a href="http://dx.doi.org/10.17613/1zy2-k376">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-09-20-blogposts#2019-09-20-blogposts"/><published>2019-09-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-10-02-blogpost</id><title>New Blog Posts on Open Problems in Computational Diversity Linguistics </title><updated>2019-10-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, a new blogpost discussing my 8th problem of computational diversity linguistics appeared, this time focusing on the <a href="http://phylonetworks.blogspot.com/2019/09/typology-of-semantic-change-open.html">typology of semantic change</a>. </p></div></content><link href="https://calclab.org/?news=2019-10-02-blogpost#2019-10-02-blogpost"/><published>2019-10-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-10-14-news</id><title>New Releases</title><updated>2019-10-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, there is news to share about new releases. First, there is another blogpost in German, introducing the <a href="http://bibliography.lingpy.org">EvoBib</a> reference browser, which offers references and citations for more than 3000 articles and books related to historical linguistics, language contact, and linguistic typology, and has now been officially released. The <a href="https://wub.hypotheses.org/963">blogpost</a>, titled «Wissensmanagment», briefly introduces the tool, and the tool itself can be browsed or downloaded from Zenodo or <a href="https://github.com/lingpy/evobib">GitHub</a>. </p>
<p>Furthermore, we released LingPy, version 2.6.5, which is now officially <a href="http://lingpy.org">available</a> through all typical channels. </p></div></content><link href="https://calclab.org/?news=2019-10-14-news#2019-10-14-news"/><published>2019-10-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-10-29-blogpost</id><title>New Blogpost </title><updated>2019-10-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, my 9th blogpost on Open Problems in Computational Diversity Linguistics appeared, this time discussing the problems of establishing a typology of sound change processes. The blogpost, which is rather long this time, can be found <a href="http://phylonetworks.blogspot.com/2019/10/typology-of-sound-change-open-problems.html">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-10-29-blogpost#2019-10-29-blogpost"/><published>2019-10-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-10-30-calcblogpost</id><title>New Blogpost </title><updated>2019-10-30T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday I published a new post in our blog, discussing the importance and the
advantages of the new approach to linguistic data that we are constantly proposing.
I illustrate such advantages by describing how I could reuse the data from
CLICS, itself reusing data from Lexibank, to build a simple matrix and graph of
semantic distance. The post can be found
<a href="https://calc.hypotheses.org/1980">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-10-30-calcblogpost#2019-10-30-calcblogpost"/><published>2019-10-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-11-12-blogpost</id><title>Concepticon 2.2 and a new blogpost</title><updated>2019-11-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we announced version 2.2 of the <a href="https://concepticon.clld.org">Concepticon</a>. The new version now includes as many as 275 different concept lists linked to our unified concept sets.</p>
<p>I also wrote a new German blogpost, this time about the review process, in which I ask "Wer begutachtet eigentlich die Gutachter?" (<em>Who reviews the reviewers, after all?</em>), and which you can find <a href="https://wub.hypotheses.org/980">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-11-12-blogpost#2019-11-12-blogpost"/><published>2019-11-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-11-18-clics3</id><title>CLICS-3</title><updated>2019-11-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we published the third major version of our Database of Cross-Linguistic Colexifications, CLICS³, available at <a href="https://clics.clld.org">clics.clld.org</a>. In this version, we managed to double the number of languages and we also drastically increased the number of concepts. Many people helped in different ways to acquire the data. In order to make sure we acknowledge all of them, we prepared a <a href="https://github.com/clics/clics/blob/master/CONTRIBUTORS.md">CONTRIBUTORS.MD</a> file on GitHub, in which you can see past and present editors, as well as all who have helped to contribute to the collection of the data of CLICS. Many thanks to all who helped to establish CLICS, in the past, and specifically also for version 3.</p></div></content><link href="https://calclab.org/?news=2019-11-18-clics3#2019-11-18-clics3"/><published>2019-11-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-11-20-blogpost</id><title>New Blog Post</title><updated>2019-11-20T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today I published the third blog post in our series on biological metaphors in linguistics. 
This time I am contrasting the processes involved in language change and in genetic evolution which cause differences or similarities between related or unrelated words or genes. You can read the blog post <a href="https://calc.hypotheses.org/2000">here</a>.</p></div></content><link href="https://calclab.org/?news=2019-11-20-blogpost#2019-11-20-blogpost"/><published>2019-11-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-11-25-news</id><title>Blogposts and other things</title><updated>2019-11-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week was a very busy week, with papers that had to be prepared and talks that had to be presented. Nevertheless, we managed to put the blogposts from 2018 online and share them with <a href="https://hcommons.org">Humanities Commons</a>. First the contributions to our blog on <a href="https://calc.hypotheses.org">Computer-Assisted Language Comparison in Practice</a> are now available <a href="http://dx.doi.org/10.17613/vxm7-aa52">here</a>, second, my contributions to the <a href="https://phylonetworks.blogspot.com">Genealogical World of Phylogenetic Networks</a> for 2018 can be retrieved from <a href="http://dx.doi.org/10.17613/3vnd-1b81">this link</a>.</p>
<p>In addition, I managed to submit a study on inter-linear-glossed text and our attempts to retro-standardize linguistic data. This study, carried out together with Nathaniel A. Sims, titled "Towards a sustainable handling of inter-linear-glossed text in language documentation", has now also been posted on Humanities Commons in the form of a preprint and can be found <a href="http://dx.doi.org/10.17613/gscz-mb13">here</a>.</p>
<p>Last but not least, my 11th blogpost in the Genealogical World of Phylogenetic Networks appeared today, this time dealing with the 10th (and last) problem in computational diversity linguistics. This post, discussing the "Typology of semantic promiscuity", can be found <a href="http://phylonetworks.blogspot.com/2019/11/typology-of-semantic-promiscuity-open.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2019-11-25-news#2019-11-25-news"/><published>2019-11-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-12-17-news</id><title>News before the end of the year</title><updated>2019-12-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Although quite a few things happened recently, I did not find much time to update the news feed, and I was surprised that my last update was made in November. </p>
<p>Anyway, yesterday, two blogposts appeared: my final post in the series on 
<em>Open Problems in Computational Diversity Linguistics</em> (available <a href="http://phylonetworks.blogspot.com/2019/12/open-problems-in-computational.html">here</a>), and a German blog post that elaborates on the topic of <em>patience</em> and how much impatience is needed in scientific research (<a href="https://t.co/yEIHpLLTmV?amp=1">Trotz der Ungeduld: Jonglieren im Wind</a>).</p>
<p>In the week before, a paper discussing how to compare reconstruction systems appeared, titled "Beyond Edit Distances: Comparing linguistic reconstruction systems". The preprint is available online <a href="https://hcommons.org/deposits/item/hc:27897/">here</a>.</p>
<p>Two papers have now been accepted: our paper on CLICS, Version 3, written with many coauthors, by Scientific Data (preprint <a href="https://doi.org/10.17613/5awv-6w15">here</a>), and our paper on sign language evolution, written with Justin Power and Guido Grimm, by Royal Society Open Science (preprint <a href="https://hcommons.org/deposits/item/hc:24953">here</a>).</p>
<p>This is all the news for December, but it is possible that there will still be updates during this month.</p></div></content><link href="https://calclab.org/?news=2019-12-17-news#2019-12-17-news"/><published>2019-12-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2019-12-20-science</id><title>New Paper on Emotion and Colexification in Science </title><updated>2019-12-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The <a href="https://clics.clld.org">Database of Cross-Linguistic Colexifications</a> is one of the most prominent outputs of the CALC project and the Department of Linguistic and Cultural Evolution, since it combines our interest 
in standardization, aggregation of cross-linguistic lexical data, graph-based approaches for exploratory data analysis, and interactive visualization tools. </p>
<p>Thanks to a very fruitful collaboration with psychologists from the University of North Carolina, 
it could now also be shown that the data in CLICS has the potential to provide essential evidence for questions related to human cognition. The study shows that emotion semantics vary across language families, but that there is a certain common core of similarities that can be used as an explanandum of certain structures across all cultures:</p>
<blockquote>
<p>Many human languages have words for emotions such as “anger” and “fear,” yet it is not clear whether
these emotions have similar meanings across languages, or why their meanings might vary. We
estimate emotion semantics across a sample of 2474 spoken languages using “colexification”—a
phenomenon in which languages name semantically related concepts with the same word. Analyses
show significant variation in networks of emotion concept colexification, which is predicted by
the geographic proximity of language families. We also find evidence of universal structure in emotion
colexification networks, with all families differentiating emotions primarily on the basis of hedonic
valence and physiological activation. Our findings contribute to debates about universality and
diversity in how humans understand and experience emotion.</p>
</blockquote>
<p>The paper titled <a href="https://science.sciencemag.org/content/366/6472/1517">Emotion semantics show both cultural variation and universal structure</a> by Joshua Jackson, Joseph Watts, Teague Henry, myself, Peter Mucha, Robert Forkel, Simon Greenhill, Russell Gray, and Kristen Lindquist has now appeared in Science.</p></div></content><link href="https://calclab.org/?news=2019-12-20-science#2019-12-20-science"/><published>2019-12-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-01-13-clics</id><title>New paper introducing the CLICS database and a new blogpost</title><updated>2020-01-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new paper by our group and colleagues from the DLCE appeared in the journal Scientific Data, in which we present the third installment of our <a href="https://clics.clld.org">CLICS</a> database.</p>
<blockquote>
<p>Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world’s languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.</p>
</blockquote>
<p>The paper, titled "The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies", which has many co-authors, including many members of our CALC team, can be found <a href="https://www.nature.com/articles/s41597-019-0341-x">here</a>.</p>
<p>In addition, I published a new German blogpost in which I discuss the Sapir-Whorf hypothesis in the light of cross-linguistic data, which you can find <a href="https://wub.hypotheses.org/1049">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-01-13-clics#2020-01-13-clics"/><published>2020-01-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-01-22-paper</id><title>New paper on sign languages and new blogpost on concept mapping </title><updated>2020-01-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new paper by Justin Power, Guido Grimm, and myself appeared, discussing the dispersal of sign language manual alphabets:</p>
<blockquote>
<p>The evolution of spoken languages has been studied since the mid-nineteenth century using traditional historical comparative methods and, more recently, computational phylogenetic methods. By contrast, evolutionary processes resulting in the diversity of contemporary sign languages (SLs) have received much less attention, and scholars have been largely unsuccessful in grouping SLs into monophyletic language families using traditional methods. To date, no published studies have attempted to use language data to infer relationships among SLs on a large scale. Here, we report the results of a phylogenetic analysis of 40 contemporary and 36 historical SL manual alphabets coded for morphological similarity. Our results support grouping SLs in the sample into six main European lineages, with three larger groups of Austrian, British and French origin, as well as three smaller groups centring around Russian, Spanish and Swedish. The British and Swedish lineages support current knowledge of relationships among SLs based on extra-linguistic historical sources. With respect to other lineages, our results diverge from current hypotheses by indicating (i) independent evolution of Austrian, French and Spanish from Spanish sources; (ii) an internal Danish subgroup within the Austrian lineage; and (iii) evolution of Russian from Austrian sources.</p>
</blockquote>
<p>The paper, titled "Evolutionary dynamics in the dispersal of sign languages", can be found <a href="https://royalsocietypublishing.org/doi/full/10.1098/rsos.191100">here</a>.</p>
<p>In addition, I published a new tutorial blogpost in which I show how large datasets can be easily linked to our Concepticon data, which you can find <a href="https://calc.hypotheses.org/2250">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-01-22-paper#2020-01-22-paper"/><published>2020-01-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-01-29-beginnersguide</id><title>New beginner's guide for Concepticon contribution </title><updated>2020-01-29T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today I published a blog post that contains step-by-step instructions for adding concept lists to Concepticon. The goal of this post is to give helpful tips for the contribution process in our project. The post can be found <a href="https://calc.hypotheses.org/2225">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-01-29-beginnersguide#2020-01-29-beginnersguide"/><published>2020-01-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-02-05-blogpost</id><title>New blogpost on emotion concepts </title><updated>2020-02-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already on Monday last week, a new blog post appeared in which I discussed the Sapir-Whorf hypothesis in the light of the article on emotion concepts which <a href="https://doi.org/10.1126/science.aaw8160">appeared in December</a> last year.</p>
<p>The blog post, titled "From words to deeds" can be found <a href="http://phylonetworks.blogspot.com/2020/01/from-words-to-deeds.html">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-02-05-blogpost#2020-02-05-blogpost"/><published>2020-02-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-02-05-dataworkshop</id><title>Workshop on Reproducible Research and Data Management</title><updated>2020-02-05T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Johann-Mattis List and I were involved in a successful workshop
on <a href="https://www.shh.mpg.de/1613304/reproducible-research-and-data-management-workshop.html">Reproducible Research and Data
Management</a>
that took place last week
at the Max-Planck-Institute for the Science of Human History in Jena.</p>
<p>The workshop was open to the entire academic community. We collaborated with colleagues from our
department and from the Department of Archaeogenetics
to introduce participants to command-line usage and Bash, Git and
GitHub, and reproducible research in general. The linguistic sessions
focused on the reference catalogs used for most of our research
(<a href="https://glottolog.org">Glottolog</a>,
<a href="https://concepticon.clld.org/">Concepticon</a>,
and <a href="https://clts.clld.org/">CLTS</a>), on Lexibank, and on orthographic 
profiles, ending with a hands-on session on Lexibank.
Christoph Rzymski provided invaluable help, 
teaching Git and explaining the rationale for CSV(W) and CLDF,
and we were joined by Simon J. Greenhill when
presenting Lexibank to the general public.</p>
<p>Our slides will be put online in the coming days, linked from the <a href="https://rrdm-shh.github.io/#/">workshop's
page</a>. The first presentation is <a href="https://speakerdeck.com/tresoldi/linguistic-catalogs-and-dlce-tools">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-02-05-dataworkshop#2020-02-05-dataworkshop"/><published>2020-02-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-02-07-title</id><title>Sign language research featured on title page of Süddeutsche Zeitung </title><updated>2020-02-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We knew that an article would feature our research on sign language evolution in the Süddeutsche Zeitung. But when we saw today that it appeared even on the first page, we were really surprised. 
Unfortunately, the article is not yet available online, so we cannot link to it here, but it seemed worth mentioning.</p>
<p>Further good news is that the CNRS 2020 Summer School on "Semantic shifts from lexicon to grammar – diachronic and typological perspectives" was accepted and will be held on the island of Porquerolles in the south of France from 14th to 25th September 2020.
I myself will teach a two-day workshop on computational methods. More information can be found on the official <a href="https://semanticshifts.sciencesconf.org/">website</a>.</p></div></content><link href="https://calclab.org/?news=2020-02-07-title#2020-02-07-title"/><published>2020-02-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-02-10-blog</id><title>German blog post for February </title><updated>2020-02-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just wrote my monthly German blog post for February, this time devoted to the question of language universals, the language faculty, and our work on sign language evolution. You can find the post <a href="https://wub.hypotheses.org/1080">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-02-10-blog#2020-02-10-blog"/><published>2020-02-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-02-26-news</id><title>More blog posts and a new paper </title><updated>2020-02-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two more blogposts appeared this week, and one more paper was accepted and is available as an author's version. 
First, Justin Power, Guido Grimm, and I wrote a summary responding to criticism of our paper on <a href="">sign language evolution</a>. The summary is titled "How should one study language evolution" and can be found <a href="http://phylonetworks.blogspot.com/2020/02/how-should-one-study-language-evolution.html">here</a>. 
Second, I wrote a short blog post showing how the data of our <a href="https://clics.clld.org">CLICS</a> studies can be converted to a concept list for our <a href="https://concepticon.clld.org">Concepticon</a> project, which you can find <a href="https://calc.hypotheses.org/2362">here</a>.
Third, a paper in which I discuss how one can improve data handling and analysis when studying rhyme patterns was accepted for publication in the Cahiers de Linguistique Asie Orientale. 
The paper, titled "Improving data handling and analysis in the study of rhyme patterns", is now also available as an author's copy, deposited with <a href="https://doi.org/10.17613/v88m-2608">Humanities Commons</a>. </p></div></content><link href="https://calclab.org/?news=2020-02-26-news#2020-02-26-news"/><published>2020-02-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-03-09-news</id><title>New paper accepted and new version of EvoBib</title><updated>2020-03-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper by Robert Forkel and myself has been accepted for publication. The study presents the <a href="https://github.com/cldf/cldfbench">CLDFbench package</a> and illustrates how it can be used to convert datasets conveniently into the <a href="https://cldf.clld.org">CLDF</a> format. 
While the paper will only appear in May, we have uploaded our authors' copy as a preprint to the Humanities Commons repository, where you can find it under <a href="http://doi.org/10.17613/8t0e-w639">this link</a>.</p>
<p>Additionally, I managed to release a new version of <a href="https://digling.org/evobib">EvoBib</a>, Version 1.1, which now contains about 100 more bibliographic entries than the previous version and also about 300 more quotes (mostly abstracts).</p></div></content><link href="https://calclab.org/?news=2020-03-09-news#2020-03-09-news"/><published>2020-03-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-03-10-news</id><title>New paper accepted </title><updated>2020-03-10T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper of mine was published last month. It introduces the
<a href="https://pypi.org/project/dafsa">DAFSA</a> project, a Python library for
generating graphs over collections of sequences that highlight recurring and
redundant information. I have been using it to experiment with morphological
detection in low-resource languages. The paper is
available <a href="https://joss.theoj.org/papers/10.21105/joss.01986#">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-03-10-news#2020-03-10-news"/><published>2020-03-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-03-20-pyigt</id><title>New paper accepted and a new blog post</title><updated>2020-03-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper by Nathaniel Sims, Robert Forkel and myself has been accepted for publication. The study, titled "Towards a sustainable handling of interlinear-glossed text in language
documentation", presents a computer-assisted approach to studying interlinear-glossed text within the computational frameworks set up along with the <a href="https://cldf.clld.org">Cross-Linguistic Data Formats</a> initiative. 
Our authors' version can be found <a href="https://doi.org/10.17613/gscz-mb13">here</a>.</p>
<p>Additionally, I wrote a new German blogpost, this time dealing with the evolution of personal names. This post, titled "Evolution unchained: Die Entwicklung von Personennamen und die Grenzen der Sequenzen", can be found <a href="https://wub.hypotheses.org/1124">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-03-20-pyigt#2020-03-20-pyigt"/><published>2020-03-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-03-26-news</id><title>New blog posts and new annotation tool</title><updated>2020-03-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new blogposts have appeared this week, completing the typical series of blog posts for March. The first post is an English version of my German post published earlier this month, dealing with the evolution of personal names. Titled "Evolution unchained: The development of person names and the limits of sequences", it can be found <a href="http://phylonetworks.blogspot.com/2020/03/evolution-unchained-development-of.html">here</a>.</p>
<p>The second post presents a new rhyme annotation tool, called <a href="https://digling.org/calc/rhyant/">RhyAnT</a>, which I managed to prepare in a first draft version. This post, published with our blog on <a href="https://calc.hypotheses.org">Computer-Assisted Language Comparison in Practice</a> can be found <a href="https://calc.hypotheses.org/2380">here</a>. </p>
<p>The rhyme annotation tool itself is still in flux, although a first draft version is already available at <a href="https://digling.org/calc/rhyant">https://digling.org/calc/rhyant/</a>. I hope to finish a stable version soon, so we can start working towards a cross-linguistic database of rhymed poetry.</p></div></content><link href="https://calclab.org/?news=2020-03-26-news#2020-03-26-news"/><published>2020-03-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-04-01-paper</id><title>New paper submitted on annotating etymological data as word trees</title><updated>2020-04-01T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, Mattis List and I submitted a paper in which we present a framework for the annotation of etymological relationships in a human- and machine-readable fashion. The preprint can be accessed <a href="https://hcommons.org/deposits/item/hc:29285/">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-04-01-paper#2020-04-01-paper"/><published>2020-04-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-04-07-news</id><title>New paper accepted on a workflow for computer-assisted language comparison</title><updated>2020-04-07T12:00:00+00:00</updated><author><name>@macyl</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, an article by Johann-Mattis List, Timotheus A. Bodt, Nathan W. Hill, Nathanael E. Schweikhard, and myself was accepted by the Journal of Open Humanities Data. In the article, we present a workflow which lifts raw data to a stage where algorithms for computer-assisted language comparison can detect sound correspondence patterns across several languages. At every stage, the data can be interactively inspected and even modified, which makes this workflow truly computer-assisted. 
We also provide a tutorial, in which we show how to run the code and how to inspect or edit the data at all stages. The authors' copy, which we submitted to Humanities Commons, can be accessed <a href="https://hcommons.org/deposits/item/hc:29377/">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-04-07-news#2020-04-07-news"/><published>2020-04-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-04-14-blogposts</id><title>New blog posts</title><updated>2020-04-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I published a new German blog post last weekend, titled "Was sich reimt, das frisst sich". In this post, I discuss the opportunities and challenges involved in the systematic annotation of rhymed poetry across languages, genres, and times. You can find the post <a href="https://wub.hypotheses.org/1149">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-04-14-blogposts#2020-04-14-blogposts"/><published>2020-04-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-04-21-news</id><title>New blog post on a CLDF dataset of the Kusunda language</title><updated>2020-04-21T12:00:00+00:00</updated><author><name>M.-S. Wu</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I published a new blog post with the title "New Lexical Data for the
Kusunda Language" on Monday this week. In this post, I mention the challenges
of collecting second-hand Kusunda lexical data and present a new Kusunda
dataset which is available in CLDF format
<a href="https://github.com/lexibank/aaleykusunda">online</a>. You can find the post
<a href="https://calc.hypotheses.org/2446">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-04-21-news#2020-04-21-news"/><published>2020-04-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-05-15-newposts</id><title>Research in the news and new blog posts</title><updated>2020-05-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In the June issue of Psychologie Heute, which has been available since Wednesday, there is a report on the work on emotion concepts (<a href="">Jackson et al. 2019</a>) carried out with the help of our database of cross-linguistic colexifications (<a href="">Rzymski et al. 2020</a>). The article can also be found <a href="">online</a>, but it is not freely available without a subscription.</p>
<p>Last week, I found time to prepare two more blog posts for May. The first is a German post on scientific practice and some general ideas on open research within the humanities, titled "Was Wissen schafft, wird festgestellt: Gedanken zur offenen Forschung", which is available online <a href="">here</a>. The second is devoted to an exploration of semantic similarity as it is represented and handled in the STARLING software package. This post, which can be found <a href="">here</a>, is accompanied by a Python software package called <a href="https://github.com/lingpy/pysen/">pysen</a> and an interactive online application which you can find <a href="https://digling.org/sense/">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-05-15-newposts#2020-05-15-newposts"/><published>2020-05-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-05-18-newpaper</id><title>New Paper on CLDFBench Appeared </title><updated>2020-05-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A paper by Robert Forkel and myself, introducing "CLDFBench. Give your Cross-Linguistic data a lift", has just appeared officially as part of the (now only digital) LREC conference. 
Here is the abstract:</p>
<blockquote>
<p>While the amount of cross-linguistic data is constantly increasing, most datasets produced today and in the past cannot be considered
FAIR (findable, accessible, interoperable, and reproducible). To remedy this and to increase the comparability of cross-linguistic resources, it is not enough to set up standards and best practices for data to be collected in the future. We also need consistent workflows
for the “retro-standardization” of data that has been published during the past decades and centuries. With the Cross-Linguistic Data Formats initiative, first standards for cross-linguistic data have been presented and successfully tested. So far, however, CLDF creation was
hampered by the fact that it required a considerable degree of computational proficiency. With cldfbench, we introduce a framework
for the retro-standardization of legacy data and the curation of new datasets that drastically simplifies the creation of CLDF by providing
a consistent, reproducible workflow that rigorously supports version control and long term archiving of research data and code. The
framework is distributed in form of a Python package along with usage information and examples for best practice. This study introduces
the new framework and illustrates how it can be applied by showing how a resource containing structural and lexical data for Sinitic
languages can be efficiently retro-standardized and analyzed.</p>
</blockquote>
<p>The paper can be found <a href="http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.864.pdf">here</a>, and the code itself is hosted on GitHub at <a href="https://github.com/cldf/cldfbench">cldf/cldfbench</a>. </p></div></content><link href="https://calclab.org/?news=2020-05-18-newpaper#2020-05-18-newpaper"/><published>2020-05-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-05-25-newpaper</id><title>New paper presents a workflow for Computer-Assisted Language Comparison</title><updated>2020-05-25T12:00:00+00:00</updated><author><name>M.-S. Wu</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we published a new paper, "Computer-Assisted Language Comparison: State of the Art", in the Journal of Open Humanities Data.
In this paper, we demonstrate our current five-stage workflow for
computer-assisted language comparison which lifts raw data to a level where
sound correspondence patterns across multiple languages have been identified
and can be readily presented, inspected, and discussed.  The paper can be found
<a href="https://openhumanitiesdata.metajnl.com/article/10.5334/johd.12/">here</a>, the
code can be found in the GitHub repository
<a href="https://github.com/lingpy/workflow-paper">lingpy/workflow-paper</a>. To see the
real-time execution of the workflow, please visit our <a href="https://codeocean.com/capsule/8178287/tree/v2">Code Ocean
capsule</a>.</p></div></content><link href="https://calclab.org/?news=2020-05-25-newpaper#2020-05-25-newpaper"/><published>2020-05-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-05-26-blogpost</id><title>New blogpost on rhyme networks </title><updated>2020-05-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, the second in a series of six blogposts on the construction of rhyme networks, planned for the coming months, appeared. It discusses rhyming in general and can be found <a href="http://phylonetworks.blogspot.com/2020/05/general-remarks-on-rhyming-from-rhymes.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-05-26-blogpost#2020-05-26-blogpost"/><published>2020-05-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-02-newstudies</id><title>New Papers Appeared </title><updated>2020-06-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two more studies have recently appeared. The first is a study on Chinese character formation graphs, written together with Nathan W. Hill:</p>
<blockquote>
<p>This paper proposes the use of network techniques in the exploration of Old Chinese phonology as reflected in the phonophoric determinatives of xiéshēng 諧聲 characters. We use the approach to examine five specific proposals in Chinese historical phonology, and whether the distinctions suggested by these proposals can be said to be recoverable on the basis of phonophoric choice. The major finding is that the type A versus type B distinction is in some cases encoded in the choice of phonophoric determinative, while other distinctions are only spuriously if at all reflected in the phonophoric subseries.</p>
</blockquote>
<p>The paper is available in Open Access and can be found <a href="https://brill.com/view/journals/bcl/12/2/article-p186_186.xml">here</a>.</p>
<p>The second study is a popular science article about the experiments on prediction in historical linguistics, which Tim Bodt and I started some two years ago. The article is <a href="https://cloud.3dissue.com/18743/41457/106040/issue31may/">available</a> (but only for subscribers) in the journal <em>Babel: The Language Magazine</em>, but you can find our authors' copy <a href="https://doi.org/10.17613/m688-4b90">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-06-02-newstudies#2020-06-02-newstudies"/><published>2020-06-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-03-blogpost</id><title>New blog post on markup for lexical data</title><updated>2020-06-03T12:00:00+00:00</updated><author><name>I. Chechuro</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I have published a new blog post titled "Why Tag Markup may be Useful for
Lexical Data". In this post, I discuss the benefits of using tag-based semantic
markup instead of a category-based one and propose a relatively simple way to
create a tag markup by aggregating the existing categorisations provided in the
data. You can find the post <a href="https://calc.hypotheses.org/2476">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-06-03-blogpost#2020-06-03-blogpost"/><published>2020-06-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-07-blogpost</id><title>New Blogpost </title><updated>2020-06-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday evening I found time to write a new German blogpost, devoted to the multiple meanings of the word <em>machen</em> in German. This post, titled <em>Neues zum Wortfeld »machen«</em> and not meant to be entirely serious, is available online <a href="https://wub.hypotheses.org/1174">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-06-07-blogpost#2020-06-07-blogpost"/><published>2020-06-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-10-newstudy</id><title>New study appeared on rhyme data handling and analysis</title><updated>2020-06-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new study in which I discuss the handling of rhyme data and rhyme analysis appeared in Cahiers de Linguistique Asie Orientale. </p>
<blockquote>
<p>By reviewing a recent quantitative study of rhyme patterns in Mandarin Chinese, this study shows how data handling and data analysis in the study of rhyme patterns can be improved. Suggestions for improvement include (a) a consistent annotation of rhyme data, which is exhaustive and facilitates data reuse, and (b) emphasizes the importance of automated approaches for exploratory data analysis, which can help to analyze rhyme data in an improved way, prior to applying statistical frameworks for hypothesis testing.</p>
</blockquote>
<p>The study itself can be found <a href="https://doi.org/10.1163/19606028-bja10004">here</a>, but it can only be viewed with a subscription. I have deposited a preprint with Humanities Commons, which you can find <a href="https://hcommons.org/deposits/item/hc:28875/">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-06-10-newstudy#2020-06-10-newstudy"/><published>2020-06-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-11-newpaper</id><title>New paper accepted on annotating word formation processes</title><updated>2020-06-11T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper by Johann-Mattis List and myself has been accepted for publication by the SKASE Journal of Theoretical Linguistics. In the article, titled "Developing an annotation framework for word formation processes in comparative linguistics", we propose a new approach to the annotation of cross-linguistic etymological relations that also takes morphological processes into account. Included are a small Python library and annotated data samples from a variety of language families. You can access the preprint <a href="https://hcommons.org/deposits/item/hc:30401/">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-06-11-newpaper#2020-06-11-newpaper"/><published>2020-06-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-06-22-blogspot</id><title>New Blogpost </title><updated>2020-06-22T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote a blogpost describing an ongoing project in which I have been developing a
model of phonological distinctive features for computer-assisted
language comparison. It explains the rationale behind the proposal and
shows how to use a small Python library that provides access to the feature
matrix without too much boilerplate code. The blogpost can be found on 
<a href="https://calc.hypotheses.org/2485">CALC's blog</a>.</p></div></content><link href="https://calclab.org/?news=2020-06-22-blogspot#2020-06-22-blogspot"/><published>2020-06-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-07-02-blogpost</id><title>New blog post on rhyming </title><updated>2020-07-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already on Monday this week, a new blogpost on rhyming, part three in my series "From rhymes to networks" appeared. Devoted to rhyme annotation, the post presents some general ideas and a rather simple, but efficient text-based format for the annotation of rhymes in texts. You can find the blogpost <a href="http://phylonetworks.blogspot.com/2020/06/annotating-rhymes-in-texts-from-rhymes.html">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-07-02-blogpost#2020-07-02-blogpost"/><published>2020-07-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-07-10-news</id><title>New Blogpost, Preprint, and CfP </title><updated>2020-07-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Time passes quickly, and there are three new things to announce now. First, I have published a new blog post in German, which deals with the differential treatment in speaking, <a href="https://wub.hypotheses.org/1194">Andersbehandlung von Menschen im Sprechen</a>. </p>
<p>Second, a preprint titled <a href="https://psyarxiv.com/qat4r/">From Text to Thought: How Analyzing Language Can Advance Psychological Science</a> was just submitted online. The paper by Joshua C. Jackson, Joseph Watts, myself, Ryan Drabble, and Kristen Lindquist discusses how new approaches to language analysis could be fruitfully applied in psychology in the future:</p>
<blockquote>
<p>Humans have been using language for thousands of years, but psychologists seldom consider what natural language can tell us about the mind. Here we propose that language offers a unique window into human cognition. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis—comparative linguistics and natural language processing—are already contributing to how we understand emotion, creativity, and religion, and overcoming methodological obstacles related to statistical power and culturally diverse samples. We summarize resources for learning both of these methods, and highlight the best way to combine language analysis techniques with behavioral paradigms. Applying language analysis to large-scale and cross-cultural datasets promises to provide major breakthroughs in psychological science.</p>
</blockquote>
<p>Last but not least, we have just launched a call for papers for a workshop on <a href="https://www.sfs.uni-tuebingen.de/~gjaeger/maeiqcl21/">Model and Evidence in Quantitative Comparative Linguistics</a>, organized by Gerhard Jäger (University of Tübingen) and myself as part of the annual meeting of the DGfS in February 2021. The deadline for this call is 31 August 2020. We invite submissions for 20-minute talks, and we even have limited travel funds available.</p></div></content><link href="https://calclab.org/?news=2020-07-10-news#2020-07-10-news"/><published>2020-07-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-07-28-norare</id><title>New Preprint for NoRaRe collection</title><updated>2020-07-28T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We have been working on a feature of the <a href="https://concepticon.clld.org/">Concepticon</a> that contains data on norms, ratings, and relations for words and concepts. The new database is called NoRaRe and currently offers 71 data sets from studies in psychology and linguistics.</p>
<p>You can take a look at the data in the <a href="https://digling.org/norare/">web app</a> or in the <a href="https://github.com/concepticon/norare-data">NoRaRe GitHub repository</a>.</p>
<p>The preprint can be found <a href="https://psyarxiv.com/tgw3z/">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-07-28-norare#2020-07-28-norare"/><published>2020-07-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-08-03-news</id><title>New Blog Posts in July </title><updated>2020-08-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Time keeps passing, and I have not found the time to keep up with my website, so news come now in condensed form. First, I initiated a new series of blog posts in our <a href="https://calc.hypotheses.org">Computer-Assisted Language Comparison in Practice</a> blog, which is called <a href="https://calc.hypotheses.org/2501">How to do X in linguistics?</a> and will features specific topics that are barely taught but considered as one of the basic tasks of a professional linguist and scientist (such as writing a review, responding to a review, or organizing one's bibliography). </p>
<p>Yesterday, another blog post in the series <a href="http://phylonetworks.blogspot.com/2020/04/from-rhymes-to-networks-new-blog-series.html">From rhymes to networks</a> appeared, discussing this time the <a href="http://phylonetworks.blogspot.com/2020/07/automated-detection-of-rhymes-in-texts.html">Automated Detection of Rhymes</a>. </p></div></content><link href="https://calclab.org/?news=2020-08-03-news#2020-08-03-news"/><published>2020-08-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-08-10-blogpost</id><title>New Blog Post </title><updated>2020-08-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published a new German blog post, which is this time less serious than usual, discussing the tendency of scientists, including myself, to trace topics back to their scientific subjects. The blog post, titled "Wovon man sprechen kann, darüber darf man auch mal schweigen", can be found <a href="https://wub.hypotheses.org/1199">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-08-10-blogpost#2020-08-10-blogpost"/><published>2020-08-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-08-19-preprint</id><title>New Preprint on the Detection of Contact Layers </title><updated>2020-08-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new preprint by Abbie Hantgan, Hiba Babiker, and myself appeared online (the study itself is currently under review), discussing preliminary approaches to the detection of contact layers in the language isolate Bangime. </p>
<blockquote>
<p>We present a computer-assisted, multidisciplinary, first approach to addressing this problem of
detecting the layers of contact in Bangime. First, we assemble lexical evidence of contact
between Bangime speakers with their neighboring languages, using a computer-assisted
technique, followed by an evaluation of the materials by contrasting them with genetic findings.
Specifically, we propose trajectories for Bangande settlement patterns. With this study, we lay
the foundation of future collaborative work that will improve, correct, and enhance the results of
this study. The original data used for the study are made available so that additional
researchers may follow up on and test our hypotheses concerning contact layers in Bangime.</p>
</blockquote>
<p>The preprint has been submitted to <a href="https://hcommons.org">Humanities Commons</a> and can be accessed <a href="https://hcommons.org/deposits/item/hc:32331/">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-08-19-preprint#2020-08-19-preprint"/><published>2020-08-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-08-24-blogpost</id><title>New Blog Post in the Rhyme Network Series </title><updated>2020-08-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my fifth (out of six) blogpost in my series <a href="http://phylonetworks.blogspot.com/2020/04/from-rhymes-to-networks-new-blog-series.html">From Rhymes to Networks</a> appeared, this time focusing on <a href="http://phylonetworks.blogspot.com/2020/08/constructing-rhyme-networks-from-rhymes.html">Constructing Rhyme Networks</a>. While I was first a bit disappointed that I did not find enough time to annotate all of Goethe's Faust until now, I was quite happy to see that I have annotated enough German poems already to allow at least for a small demonstration on how to create rhyme networks from rhyme data on a language different from Chinese. </p></div></content><link href="https://calclab.org/?news=2020-08-24-blogpost#2020-08-24-blogpost"/><published>2020-08-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-09-01-tutorial</id><title>New Blog Post in the How-To Series </title><updated>2020-09-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already last week, my first blog post in our "How to do X in linguistics" series appered, concentrating on <a href="https://calc.hypotheses.org/2504">How to write an initial review for a journal</a>. Here is the abstract, more can be found in the actual blog post.</p>
<blockquote>
<p>Writing reviews for a journal is one of those things which most scientists never actively learn. For laypeople, this may be surprising, given how often the scientific method with its rigorous peer review procedure is being mentioned in the news nowadays. How can it be, one may ask oneself, that this procedure that is usually presented as the core principle of scientific reasoning, is never really actively taught? If the review by experts is the core of the scientific method and what decides about the acceptance of an article, how can it be that scientists do never take a course on article reviewing, and how can it be that reviewers are (as I have previously discussed in a German blogpost) themselves never reviewed or graded?</p>
</blockquote></div></content><link href="https://calclab.org/?news=2020-09-01-tutorial#2020-09-01-tutorial"/><published>2020-09-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-09-02-preprint</id><title>New Preprint on the Detection of Borrowings from Monolingual Wordlists</title><updated>2020-09-02T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint by John E. Miller, myself, Roberto Zariquiey, César A. Beltrán Castañón, Natalia Morozova, and Johann-Mattis List appeared online today (the paper has been submitted). We discuss the identification of borrowings from monolingual wordlists using three different statistical methods.</p>
<blockquote>
<p>Native speakers are often assumed to be efficient in identifying whether a word in their language has been borrowed,
even when they do not have direct knowledge of the donor language from which it was taken. To detect borrowings,
speakers make use of various strategies, often in combination, relying on clues such as semantics of the words
in question, phonology and phonotactics. Computationally, phonology and phonotactics can be modeled with support of
Markov n-gram models or – as a more recent technique – recurrent neural network models. Based on a substantially revised
dataset in which lexical borrowings have been thoroughly annotated for 41 different languages of a large typological
diversity, we use these models to conduct a series of experiments to investigate their performance in borrowing
detection using only information from monolingual wordlists. Their performance is in many cases unsatisfying, but
becomes more promising for strata where there is a significant ratio of borrowings and when most borrowings originate
from a dominant donor language. The recurrent neural network performs marginally better overall in both realistic
studies and artificial experiments, and holds out the most promise for continued improvement and innovation in
lexical borrowing detection. Phonology and phonotactics, as operationalized in our lexical language models, are only a
part of the multiple clues speakers use to detect borrowings. While improving our current methods will result in better
borrowing detection, what is needed are more integrated approaches that also take into account multilingual
and cross-linguistic information for a proper automated borrowing detection.</p>
</blockquote>
<p>The preprint has been submitted to <a href="https://hcommons.org">Humanities Commons</a> and can be accessed <a href="https://hcommons.org/deposits/item/hc:32409/">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-09-02-preprint#2020-09-02-preprint"/><published>2020-09-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-09-16-preprint</id><title>New Blog Post and Preprint </title><updated>2020-09-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Monday, I published a new German blog post, this time dealing with questions of "manual labor in digital times", the post, titled "Handarbeit im digitalen Zeitalter" can be found <a href="https://wub.hypotheses.org/1213">here</a>.</p>
<p>Yesterday, a new preprint appeared (common work with Hans Geisler and Robert Forkel), featuring "A digital, retro-standardized edition of the Tableaux phonétiques des patois Suisses romands (TPPSR)". </p>
<blockquote>
<p>This study presents a digital, retro-standardized edition of the Tableaux Phonétiques des Patois Suisses Romands (TPPSR), an early collection of lexical dialect data of the Suisse romande, which was compiled by Louis Gauchat, Jules Jeanjaquet, and Ernest Tappolet in the beginning of the 20th century and later published in 1925. While the plan of Gauchat and his collaborators to turn their data into a dialect atlas could never be realized for the lack of funding, we show how consistent techniques for digitization, accompanied by transparent approaches to retro-standardization can be used to turn the original data of the TPPSR into a modern interactive dialect atlas. The dialect atlas is not only publicly available in the form of a web-based application, but also in the form of a dataset that offers the data in standardized, human- and machine-readable form.</p>
</blockquote>
<p>The preprint was archived with <a href="https://hcommons.org/deposits/item/hc:32545/">Humanities Commons</a> and the web application has been published as a <a href="https://tppsr.clld.org">CLLD</a> project.</p></div></content><link href="https://calclab.org/?news=2020-09-16-preprint#2020-09-16-preprint"/><published>2020-09-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-09-28-post</id><title>Final Post in Rhyme Series </title><updated>2020-09-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, the final post in my small series of six blogposts devoted to rhyme networks has appeared. While I have to admit that I was a bit more optimistic when I started the series, I am still content with what has been achieved in the last six months, even if most of these achievements reflect the awareness of new problems that need to be solved in the future. The post, titled "Analyzing Rhyme Networks", can be found <a href="http://phylonetworks.blogspot.com/2020/09/analyzing-rhyme-networks-frome-rhymes.html">here</a>.  </p></div></content><link href="https://calclab.org/?news=2020-09-28-post#2020-09-28-post"/><published>2020-09-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-09-30-blogpost</id><title>Blog post introducing a list of 171 body part concepts </title><updated>2020-09-30T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A list of 171 body part concepts was introduced in a <a href="https://calc.hypotheses.org/2512">blog post</a> today. 
The list consists of body part concepts from ADAM’S APPLE to WRIST and can be found <a href="https://doi.org/10.5281/zenodo.4058506">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-09-30-blogpost#2020-09-30-blogpost"/><published>2020-09-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-10-02-news</id><title>New Temporary Affiliation </title><updated>2020-10-02T12:00:00+00:00</updated><author><name>@LinguList</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>From October 2020 until March 2021, I will act as a part-time deputy professor
(Vertretungsprofessor) at Bielefeld University. Essentially, this means that,
apart from a change in affiliation, I will give an extended lecture on
computer-assisted approaches to comparative linguistics at Bielefeld
University (in remote form). I hope that this time the lecture can cover both
some basic introductions to Python for linguists and an in-depth
tutorial on the most recent advancements in computational historical
linguistics. As usual, I will share my scripts openly, but I may do that only
after the lecture is finished. I am looking forward to this possibility of testing how well our integrative approach to data handling and analysis can be taught to students at the bachelor and master level.</p></div></content><link href="https://calclab.org/?news=2020-10-02-news#2020-10-02-news"/><published>2020-10-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-10-19-ids</id><title>New Blogpost </title><updated>2020-10-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new English blogpost appeared, in which I present an updated German wordlist that provides transcribed translations of the concept list proposed by the <a href="https://ids.clld.org">Intercontinental Dictionary Series</a>. The blogpost can be found <a href="https://calc.hypotheses.org/2545">here</a>, and the data have been published in the form of a <a href="https://gist.github.com/LinguList/cfa4ab9b2b168fbc07d8247352fb6039">GitHub GIST</a>. </p></div></content><link href="https://calclab.org/?news=2020-10-19-ids#2020-10-19-ids"/><published>2020-10-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-10-23-blogpost</id><title>New German Blogpost </title><updated>2020-10-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published my monthly German blogpost, this time dealing with theories that seem to be useful and powerful but soon turn out to be less helpful, since they tend to attract just-so-stories as explanations. The blog post, titled "Scheinriesentheorien" can be found <a href="https://wub.hypotheses.org/1228">here</a>.  
</p></div></content><link href="https://calclab.org/?news=2020-10-23-blogpost#2020-10-23-blogpost"/><published>2020-10-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-11-02-concepticon</id><title>Concepticon 2.4.0 Published</title><updated>2020-11-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we published Version 2.4.0 of the <a href="https://concepticon.clld.org">Concepticon</a> project, now offering 353 concept lists that are linked to as many as 3825 different concept sets. In this version, we also welcomed two new editors, Carolin Hundt and Tiago Tresoldi, both of whom helped us a lot in improving the Concepticon since its last version.</p></div></content><link href="https://calclab.org/?news=2020-11-02-concepticon#2020-11-02-concepticon"/><published>2020-11-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-11-05-clics</id><title>New Blogpost on language colexification statistics</title><updated>2020-11-05T12:00:00+00:00</updated><author><name>T. Tresoldi</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A <a href="https://calc.hypotheses.org/2552">blog post</a> authored by me was published today on the <a href="https://calc.hypotheses.org/">CALC blog</a>. Following two different requests related to our <a href="https://clics.clld.org/">CLICS project</a>, I have explored different statistics related to which languages colexify which pair of concepts, and which languages have a higher tendency for colexification.</p></div></content><link href="https://calclab.org/?news=2020-11-05-clics#2020-11-05-clics"/><published>2020-11-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-11-09-blogpost</id><title>New German Blogpost and CALC Posts for 2019 </title><updated>2020-11-09T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Saturday, I published a new German blog post, devoted to <a href="https://wub.hypotheses.org/1236">Digital Bullshit Jobs</a>, in which I discuss how ignorance regarding the power of computational solutions leads to a situation in which we lose a lot of time doing manually what could easily be done automatically. </p>
<p>Already on Thursday Volume II of <a href="http://dx.doi.org/10.17613/cya5-sb31">Computer-Assisted Language Comparison in Practice</a> was published with <a href="https://hcommons.org">Humanities Commons</a>. This volume presents citable PDF versions of all blog posts that were written in 2019 as part of our <a href="https://calc.hypotheses.org">CALC blog</a>. </p></div></content><link href="https://calclab.org/?news=2020-11-09-blogpost#2020-11-09-blogpost"/><published>2020-11-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-11-23-news</id><title>News, news, news </title><updated>2020-11-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A lot has happened in November so far. </p>
<p>First, an article has appeared in the Frankfurter Allgemeine Zeitung, featuring our research on signed languages published earlier this year. The article, titled "Die Stille Revolution" can now be accessed online <a href="https://www.faz.net/-ibq-a5ey0">here</a>.</p>
<p>Second, a paper by John Miller, Tiago Tresoldi, Roberto Zariquiey, César Beltrán, Natalia Morozova, and myself has now been accepted by PLOS One. In this article we test the suitability of several machine learning techniques to infer lexical borrowings in a supervised approach by considering only monolingual information. This article has been available in the form of a preprint already, but it will soon also be available officially.</p>
<p>Third, a new study by Timotheus A. Bodt and myself was accepted for publication in Diachronica. In this study, we test how well one can predict words across languages that have not been elicited during fieldwork, relying on information about potential cognates of the missing words. Our experiment, which turned out to be quite successful, showed that our automated methods provide some good help, although they cannot do all of the work alone. More importantly, however, we realized how useful it can be to carry out active prediction attempts. This study, which has been going on for more than three years now, is also the first known to me in which predictions about words were preregistered in the form of an experiment. Our accepted authors' version before typesetting can now be accessed from <a href="https://doi.org/10.17613/t3nm-q348">Humanities Commons</a>. </p></div></content><link href="https://calclab.org/?news=2020-11-23-news#2020-11-23-news"/><published>2020-11-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-12-07-blogpost</id><title>New blog post in 'How to do X in linguistics?' series and paper on 'General patterns and language variation'</title><updated>2020-12-07T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote a blog post on "Possibilities of digital communication in linguistics" for our 'How to' series. In the blog post, I’ll illustrate some of the possibilities that linguists and other researchers have to discuss and share their work. You can read it <a href="https://calc.hypotheses.org/2556">here</a>.</p>
<p>In addition, my proceedings paper for the COLING2020 workshop on 'Cognitive Aspects of the Lexicon' was published. The article with the title "General patterns and language variation: Word frequencies across English, German, and Chinese" is available <a href="https://www.aclweb.org/anthology/2020.cogalex-1.3">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-12-07-blogpost#2020-12-07-blogpost"/><published>2020-12-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-12-14-newpaper</id><title>New Paper on Borrowing Detection Appeared</title><updated>2020-12-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A paper by John Miller, Tiago Tresoldi, Roberto Zariquiey,
César A. Beltrán Castañón,
Natalia Morozova, and myself appeared last week, discussing the use of lexical language models for mono-lingual borrowing detection:</p>
<blockquote>
<p>Lexical borrowing, the transfer of words from one language to another, is one of the most frequent processes in language evolution. In order to detect borrowings, linguists make use of various strategies, combining evidence from various sources. Despite the increasing popularity of computational approaches in comparative linguistics, automated approaches to lexical borrowing detection are still in their infancy, disregarding many aspects of the evidence that is routinely considered by human experts. One example for this kind of evidence are phonological and phonotactic clues that are especially useful for the detection of recent borrowings that have not yet been adapted to the structure of their recipient languages. In this study, we test how these clues can be exploited in automated frameworks for borrowing detection. By modeling phonology and phonotactics with the support of Support Vector Machines, Markov models, and recurrent neural networks, we propose a framework for the supervised detection of borrowings in mono-lingual wordlists. Based on a substantially revised dataset in which lexical borrowings have been thoroughly annotated for 41 different languages from different families, featuring a large typological diversity, we use these models to conduct a series of experiments to investigate their performance in mono-lingual borrowing detection. While the general results appear largely unsatisfying at a first glance, further tests show that the performance of our models improves with increasing amounts of attested borrowings and in those cases where most borrowings were introduced by one donor language alone. Our results show that phonological and phonotactic clues derived from monolingual language data alone are often not sufficient to detect borrowings when using them in isolation. Based on our detailed findings, however, we express hope that they could prove to be useful in integrated approaches that take multi-lingual information into account.</p>
</blockquote>
<p>The paper can be found <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242709">here</a>. </p></div></content><link href="https://calclab.org/?news=2020-12-14-newpaper#2020-12-14-newpaper"/><published>2020-12-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-12-17-tangut</id><title>Tangut as a West-Rgyalrongic Language</title><updated>2020-12-17T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Tangut (Chinese: 西夏語 Xīxià Yǔ, "Western Xia language|) is an extinct Sino-Tibetan language attested between 1036-1502 AD, as one of the official languages of the Tangut empire. It is a fascinating language with copious literature and a complex writing system.</p>
<p>From the beginning of the 20th century until recently, the exact
affiliation of Tangut within the Sino-Tibetan family was unclear: some classified
it with Lolo-Burmese, while others claimed that it was related to Qiang. In our earlier study on the phylogeny of Sino-Tibetan (Sagart et
al. 2019), we used Bayesian phylogenetic inference, which placed Tangut with the Gyalrongic
languages. This proposal coincides with the intuition of experts on Gyalrongic.
Specifically, Tangut shows a good number of similarities with West Gyalrongic,
which includes the Khroskyabs and Horpa-Stau languages. </p>
<p>In a paper that was published this week (Lai et al. 2020), we explore linguistic
evidence that proves Tangut to be a West Gyalrongic language. We find shared
lexical, morphological, and syntactic innovations between Tangut and modern
West Gyalrongic languages, demonstrating, from a historical linguistic point of
view, that Tangut’s closest relative is indeed West Gyalrongic. We also
discuss the migration of the Tanguts based on ancient texts that attest a
Tangut population in the 10th century in today's Western Sichuan, where West
Gyalrongic languages are spoken. </p>
<p>The paper can be found <a href="https://doi.org/10.1515/flih-2020-0006">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-12-17-tangut#2020-12-17-tangut"/><published>2020-12-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2020-12-18-blogpost</id><title>New Blog Post </title><updated>2020-12-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote my final German blog post for this year, this time discussing to which degree we can use our "native speaker intuition" to identify foreign words in our native languages. The blog post, titled "Von Wörtern in fremdem Gewand" can be found <a href="https://wub.hypotheses.org/1247">here</a>.</p></div></content><link href="https://calclab.org/?news=2020-12-18-blogpost#2020-12-18-blogpost"/><published>2020-12-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-01-12-entropy</id><title>Khroskyabs is not hard, but harder than you think</title><updated>2021-01-12T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Different people might have different opinions about the difficulty in learning a particular language. This heavily depends on the learner's linguistic background, learning experience and exposure to the target language. However, if we forget about all those noises, and solely focus on morphological paradigms alone, it is assumed that all languages are not as messy as they appear to be. This is called "the conditional low entropy conjecture", first proposed by Farrell and Malouf (2013), stating that morphology is in general organized, thus showing a low overall conditional entropy. Bonami and Beniamine (2016) developped a new method, called the "implicative entropy", in order to measure the degree of morphological organization in language. 
They showed that French and European Portuguese have very low implicative entropies in terms of morphology. </p>
<p>It is interesting to see what implicative entropies look like in Gyalrongic languages, which are famous for their complex morphology. In a new paper that was just published (Lai 2021), I used Bonami and Beniamine's (2016) method to measure the implicative entropy of Siyuewu Khroskyabs, a West Gyalrongic language. 
The result shows that the degree of morphological organization in Khroskyabs is relatively high, that is, the forms are more or less predictable if we take 1 bit as the threshold between high and low entropy: the entropies are generally lower than 1 bit. However, if we compare the Siyuewu results to those for French and Portuguese, we observe that the entropies in Siyuewu are significantly higher (sometimes three or four times as high). Although it might be inappropriate to compare the results directly, we can still claim that Siyuewu is indeed morphologically more complex. </p>
<p>The complexity seems to be due to the evolution of the language.
The same phoneme at an earlier stage evolved into different modern reflexes under different conditions, creating irregularities, and analogical change has yet to clear up the mess. 
Therefore, the result implies that Siyuewu Khroskyabs preserves more information about the proto-language than other dialects that have lower entropies, and is useful for the reconstruction of verbal morphology in the proto-language. As a result, I put an internal reconstruction forward, which helps to understand the evolution of verbal morphology in Khroskyabs and contributes to the field of Sino-Tibetan historical linguistics. </p></div></content><link href="https://calclab.org/?news=2021-01-12-entropy#2021-01-12-entropy"/><published>2021-01-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-01-13-tables</id><title>New Blog Post </title><updated>2021-01-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote a new blog post that just appeared in our blog on Computer-Assisted Language comparison, titled <a href="https://calc.hypotheses.org/2617">How to Handle Semantic Data with Tables</a>. The blog post introduces basic practices used in the <a href="https://concepticon.clld.org">Concepticon</a> project to handle semantic data.</p></div></content><link href="https://calclab.org/?news=2021-01-13-tables#2021-01-13-tables"/><published>2021-01-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-01-15-blogpost</id><title>New German Blog Post </title><updated>2021-01-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote a new German blog post for January that discusses how important it is to be careful when seemingly detecting patterns, given that our mind often finds patterns where there are none in reality.
This is illustrated with examples from etymology. You can find the post <a href="https://wub.hypotheses.org/1258">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-01-15-blogpost#2021-01-15-blogpost"/><published>2021-01-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-01-20-blogpost</id><title>New Blog Post in 'How to do X in linguistics?' Series</title><updated>2021-01-20T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I wrote a blog post about how to organize a journal club. The post is intended for someone who wants to start their own journal club or for someone who is unsure what to expect when they join our journal club. I share my experience and some tips <a href="https://calc.hypotheses.org/2613">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-01-20-blogpost#2021-01-20-blogpost"/><published>2021-01-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-01-25-preprint</id><title>New Preprint </title><updated>2021-01-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I submitted a new preprint, titled "Chances and Challenges of Quantitative Approaches in Chinese Historical Phonology". The study was submitted for inclusion in a Festschrift for the 120th birthday of the famous linguist Li Fang-Kuei. </p>
<blockquote>
<p>The field of Chinese Historical Phonology is traditionally dealing with a large
number of complex and diverse types of data. While the data diversity can be
conveniently dealt with in qualitative approaches, computational possibilities that
have arisen during the past two decades offer new possibilities and new challenges
for the field. In the study, I will summarize the chances and challenges which we face
in the discipline and point to some suggestions for future work. While not being able
to provide a direct solution for most issues of data handling and standardization, I
hope that this study can contribute to a broader discussion about data and standards
in the field of Chinese Historical Phonology.</p>
</blockquote>
<p>The study has been submitted as a preprint to <a href="https://hcommons.org/deposits/item/hc:34217/">Humanities Commons</a>.</p></div></content><link href="https://calclab.org/?news=2021-01-25-preprint#2021-01-25-preprint"/><published>2021-01-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-02-01-farewell</id><title>Tiago Tresoldi Leaves the CALC Project to Start in Uppsala </title><updated>2021-02-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We are pleased and sad at the same time that Tiago Tresoldi, who worked in the
CALC project as a post-doc for almost three years, is leaving the project to
pursue a post-doc project in Uppsala. Given that we are currently finalizing
some studies that were not published or submitted during Tiago's
time in our project, this is not the last time Tiago will feature in our
list of authors. For those interested in seeing what Tiago will be working
on in the future, I recommend following his <a href="http://www.tiagotresoldi.com/">personal
website</a>, where he will share his new ideas on
the analysis of the cultural transmission of text traditions. </p></div></content><link href="https://calclab.org/?news=2021-02-01-farewell#2021-02-01-farewell"/><published>2021-02-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-02-17-durst</id><title>New German Blogpost </title><updated>2021-02-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I wrote my monthly German blog post for February, this time concentrating on expressions for "thirst" and what triggers thirst in our brains. You can find the blog post <a href="https://wub.hypotheses.org/1269">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-02-17-durst#2021-02-17-durst"/><published>2021-02-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-02-24-wals</id><title>New Tutorial Blogpost </title><updated>2021-02-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new tutorial blogpost appeared, this time focusing on how to work with the data for the World Atlas of Language Structures in CLDF formats. You can find the post <a href="https://calc.hypotheses.org/2670">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-02-24-wals#2021-02-24-wals"/><published>2021-02-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-03-10-blogs</id><title>New Tutorial Blogpost and New Paper Accepted </title><updated>2021-03-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new tutorial blogpost appeared, focusing on how to link a particular complex dataset to the Concepticon <a href="https://calc.hypotheses.org/2684">here</a>. 
In addition, a new paper by Joshua Conrad Jackson, Joseph Watts, Curtis Puryear, Ryan Drabble, Kristen Lindquist, and myself has been accepted. The study, titled "From text to thought: How analyzing language can advance psychological science", will appear in "Perspectives on Psychological Science". The draft is available <a href="https://www.researchgate.net/profile/Joshua-Jackson-16/publication/349734198_From_Text_to_Thought_How_Analyzing_Language_Can_Advance_Psychological_Science/links/603f992692851c077f15bc5b/From-Text-to-Thought-How-Analyzing-Language-Can-Advance-Psychological-Science.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-03-10-blogs#2021-03-10-blogs"/><published>2021-03-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-03-15-blogpost</id><title>New Blog Post on 'How to review concept lists in collaboration'</title><updated>2021-03-15T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in our 'How to do X in linguistics?' series was published. In the post, I describe our review process for adding concept lists to the <a href="https://github.com/concepticon/concepticon-data">Concepticon GitHub repository</a>. The post shows how a collaborative review workflow ensures data validity, and it is available <a href="https://calc.hypotheses.org/2680">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-03-15-blogpost#2021-03-15-blogpost"/><published>2021-03-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-03-16-paper</id><title>New Paper and Blogpost </title><updated>2021-03-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Sunday, my German blog post for March appeared, this time discussing the indeterminacy in language and communication. 
You can find the post <a href="https://wub.hypotheses.org/1279">here</a>. At the same time, a review paper by Cara L. Evans, Simon J. Greenhill, Joseph Watts, Carlos A. Botero, Russell D. Gray, Kathryn R. Kirby, and myself appeared, discussing "Uses and abuses of tree thinking in cultural evolution". I contributed the part on incomplete lineage sorting to this paper, which you can find as a preprint <a href="https://osf.io/preprints/socarxiv/a8v3e/">here</a>. </p></div></content><link href="https://calclab.org/?news=2021-03-16-paper#2021-03-16-paper"/><published>2021-03-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-03-17-paper</id><title>New Paper </title><updated>2021-03-17T12:00:00+00:00</updated><author><name>Y.-F. Lai</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper by myself just appeared, which focuses on the verbal inflection
chain of Siyuewu Khroskyabs, a Gyalrongic language (Trans-Himalayan). Siyuewu
Khroskyabs goes against two general typological tendencies: first, as an SOV
language, it shows an overwhelming preference for prefixes, which is rarely
reported typologically; second, the inflectional prefixes in the outer slots
are older than those in the inner slots, which is the reverse of what most
languages show. In this paper, I first identify distinct historical layers
within the inflectional prefixes, and then focus on two of the prefixes, də-
‘even’ and ɕə- ‘Q’, whose evolutionary pathways are relatively clear. The
essential part of the hypothesis is that the prefixes originate from enclitics
which could be attached to the end of a preverbal chain, originally loosely
attached to the verb stem. The preverbal chain later became tightly attached to
the verbal stem and eventually became a part of it as a chain of prefixes. As a
result, the original enclitics are reanalysed as prefixes. The integration of
preverbal morphemes is responsible for the prefixing preference in Modern
Siyuewu Khroskyabs. However, despite this superficial prefixing preference,
Siyuewu Khroskyabs underlyingly favours postposed morphemes. By following the
general suffixing tendency, this language finally managed to create a
typologically rare, overwhelmingly prefixing verbal template. </p>
<p>The author's version of the paper can be found <a href="https://pure.mpg.de/pubman/faces/ViewItemOverviewPage.jsp?itemId=item_3292444">here</a></p></div></content><link href="https://calclab.org/?news=2021-03-17-paper#2021-03-17-paper"/><published>2021-03-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-03-23-lingpy</id><title>LingPy 2.6.7 Released </title><updated>2021-03-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday we released LingPy 2.6.7, which you can find at https://pypi.org/project/lingpy (documentation at https://lingpy.github.io). The release does not introduce new features but guarantees compatibility with Python 3.9.</p></div></content><link href="https://calclab.org/?news=2021-03-23-lingpy#2021-03-23-lingpy"/><published>2021-03-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-04-16-blog</id><title>New Tutorial Blog Post and EDICTOR Release</title><updated>2021-04-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Tuesday, I released <a href="https://digling.org/edictor">EDICTOR 2.0</a>, and on
Wednesday, I published a blog post discussing some of the new features, titled
<a href="https://calc.hypotheses.org/2735">Using EDICTOR 2.0 to Annotate Language-Internal Cognates in a German
Wordlist</a>. </p></div></content><link href="https://calclab.org/?news=2021-04-16-blog#2021-04-16-blog"/><published>2021-04-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-04-21-clts</id><title>Cross-Linguistic Transcription Systems 2.1.0 </title><updated>2021-04-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, we released the Cross-Linguistic Transcription Systems
(<a href="https://clts.clld.org">https://clts.clld.org</a>) in a new version, which also
came along with some changes in the presentation of the data in the CLLD
application, which Robert Forkel added during the past days. </p></div></content><link href="https://calclab.org/?news=2021-04-21-clts#2021-04-21-clts"/><published>2021-04-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-04-27-news</id><title>New Blog Post and New Papers </title><updated>2021-04-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today I published a new German blog post, titled <a href="https://wub.hypotheses.org/1287">Parallele Evolution in der Benennung von Unverpacktläden</a>. At the same time, two papers appeared online. The first one is a prediction study with T. A. Bodt, published in <a href="https://doi.org/10.1075/dia.20009.bod">Diachronica</a>:</p>
<blockquote>
<p>While analysing lexical data of Western Kho-Bwa languages of the Sino-Tibetan or Trans-Himalayan family with the help of a computer-assisted approach for historical language comparison, we observed gaps in the data where one or more varieties lacked forms for certain concepts. We employed a new workflow, combining manual and automated steps, to predict the most likely phonetic realisations of the missing forms in our data, by making systematic use of the information on sound correspondences in words that were potentially cognate with the missing forms. This procedure yielded a list of hypothetical reflexes of previously identified cognate sets, which we first preregistered as an experiment on the prediction of unattested word forms and then compared with actual word forms elicited during secondary fieldwork. In this study we first describe the workflow which we used to predict hypothetical reflexes and the process of elicitation of actual word forms during fieldwork. We then present the results of our reflex prediction experiment. Based on this experiment, we identify four general benefits of reflex prediction in historical language comparison. These comprise (1) an increased transparency of linguistic research, (2) an increased efficiency of field and source work, (3) an educational aspect which offers teachers and learners a wide plethora of linguistic phenomena, including the regularity of sound change, and (4) the possibility of kindling speakers’ interest in their own linguistic heritage.</p>
</blockquote>
<p>The second study is based on our initial experiments with interlinear-glossed text, published in <a href="https://doi.org/10.1145/3389010">TALLIP</a>, together with N. Sims and R. Forkel. It is not available in open access, but our authors' copy can be found <a href="http://doi.org/10.17613/nppg-x393">here</a>:</p>
<blockquote>
<p>While the amount of digitally available data on the worlds’ languages is steadily increasing, with more and more languages being documented, only a small proportion of the language resources produced are sustainable. Data reuse is often difficult due to idiosyncratic formats and a negligence of standards that could help to increase the comparability of linguistic data. The sustainability problem is nicely reflected in the current practice of handling interlinear-glossed text, one of the crucial resources produced in language documentation. Although large collections of glossed texts have been produced so far, the current practice of data handling makes data reuse difficult. In order to address this problem, we propose a first framework for the computer-assisted, sustainable handling of interlinear-glossed text resources. Building on recent standardization proposals for word lists and structural datasets, combined with state-of-the-art methods for automated sequence comparison in historical linguistics, we show how our workflow can be used to lift a collection of interlinear-glossed Qiang texts (an endangered language spoken in Sichuan, China), and how the lifted data can assist linguists in their research.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-04-27-news#2021-04-27-news"/><published>2021-04-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-05-04-norare</id><title>Updated Preprint for NoRaRe Article available</title><updated>2021-05-04T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We revised our manuscript for the article on "Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties." An updated version of our preprint (Version 2) is now available on PsyArXiv (<a href="https://doi.org/10.31234/osf.io/tgw3z">click here</a>)</p>
<p>Tjuka, A., Forkel, R., &amp; List, J. (2021, May 4). Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties. https://doi.org/10.31234/osf.io/tgw3z</p></div></content><link href="https://calclab.org/?news=2021-05-04-norare#2021-05-04-norare"/><published>2021-05-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-05-07-pd</id><title>Privatdozent </title><updated>2021-05-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Having successfully defended my habilitation in February this year, I have now
gained a new affiliation as a Privatdozent at the Institut für <a href="https://www.oriindufa.uni-jena.de/institut/mitarbeiter">Orientalistik,
Indogermanistik, Ur- und Frühgeschichtliche
Archäologie</a> at the
Friedrich-Schiller-Universität Jena. This means that I will give at least one seminar per semester from now on at the FSU Jena, and my current seminar on lexical change has already started.</p>
<p>Furthermore, <a href="https://digling.org/evobib">EvoBib Version 1.4.0</a> was published yesterday, containing some 200 more entries in the bibliography and numerous new quotes, all assembled during the past months. </p></div></content><link href="https://calclab.org/?news=2021-05-07-pd#2021-05-07-pd"/><published>2021-05-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-05-17-news</id><title>News </title><updated>2021-05-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>There are quite a few different kinds of news to share today. First, I published my May blogpost in German, titled 
"Offene Forschung als Praxis guter Wissenschaft" (URL: <a href="https://wub.hypotheses.org/1292">wub.hypotheses.org/1292</a>), discussing open science and why it should become part of <em>good scientific practice</em>. 
Then, I am very happy to announce that Frederic Blum published a blog post on
"Data Gathering in Times of a Pandemic: Upcycling Constenla Umaña’s Data on the Chibchan, Lencan and Misumalpam Language Families" (URL: <a href="https://calc.hypotheses.org/2751">calc.hypotheses.org/2751</a>) in our CALC blog, which illustrates how data can be converted to our CLDF formats and nicely shows how accessible these formats already are. Here's the abstract:</p>
<blockquote>
<p>While searching for the topic of a small research project about the linguistic history of South America, I realized that a lot of data that is crucial for assessing central arguments is not openly available, but new data is difficult to come by these days. And when it is, it is not usually presented in a data format that allows for easy reuse. Guided by these thoughts, I decided to turn towards the upcycling of previously published data.</p>
</blockquote>
<p>Finally, a new paper appeared, titled 
"The uses and abuses of tree thinking in cultural evolution" (DOI: <a href="https://royalsocietypublishing.org/doi/10.1098/rstb.2020.0056">10.1098/rstb.2020.0056</a>), by Cara L. Evans, Simon J. Greenhill, Joseph Watts, myself, Carlos A. Botero, Russell D. Gray, and Kathryn R. Kirby. Here is the abstract:</p>
<blockquote>
<p>Modern phylogenetic methods are increasingly being used to address questions about macro-level patterns in cultural evolution. These methods can illuminate the unobservable histories of cultural traits and identify the evolutionary drivers of trait change over time, but their application is not without pitfalls. Here, we outline the current scope of research in cultural tree thinking, highlighting a toolkit of best practices to navigate and avoid the pitfalls and ‘abuses’ associated with their application. We emphasize two principles that support the appropriate application of phylogenetic methodologies in cross-cultural research: researchers should (1) draw on multiple lines of evidence when deciding if and which types of phylogenetic methods and models are suitable for their cross-cultural data, and (2) carefully consider how different cultural traits might have different evolutionary histories across space and time. When used appropriately phylogenetic methods can provide powerful insights into the processes of evolutionary change that have shaped the broad patterns of human history.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-05-17-news#2021-05-17-news"/><published>2021-05-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-05-19-paper</id><title>Manuscript Paper on Annotating Cognates in Phylogenetic Studies of South-East Asian Languages </title><updated>2021-05-19T12:00:00+00:00</updated><author><name>M.-S. Wu</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I submitted a manuscript to a journal. 
The manuscript introduces a new annotation framework which deals with compounding and derivation in Southeast Asian (SEA) languages. 
We compare words in 19 Chinese dialect varieties in order to determine the relationships between these languages. 
Since compounding is the primary strategy to enlarge SEA languages' lexicons, the morphemes of the compounds need to be taken into account, as well as the cross-linguistic relationships between these morphemes. 
In our study, we annotate the meanings and functions of the morphemes using a new annotation format. We also present four conversion methods to transform the annotation into formats from which family trees 
can then be derived. We show that using different conversion methods will drastically change the trees’ topologies. 
In conclusion, we encourage linguists to also consider the relationships between morphemes rather than only those between full words. </p>
<p>The author's version of the paper can be found <a href="https://doi.org/10.17613/0v48-aa64">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-05-19-paper#2021-05-19-paper"/><published>2021-05-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-06-09-news</id><title>News, News, News</title><updated>2021-06-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, a new post in our tutorial blog on Computer-Assisted Language Comparison in Practice appeared. In this blog, I presented initial ideas on <a href="https://calc.hypotheses.org/2782">How to Share Data and Code when Submitting Papers to a Journal</a>. </p>
<p>At the same time, we finally managed to publish version 2.5 of the <a href="https://concepticon.clld.org">CLLD Concepticon</a>, which now offers 392 different concept lists that make up a total of about 100 000 concept labels.</p></div></content><link href="https://calclab.org/?news=2021-06-09-news#2021-06-09-news"/><published>2021-06-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-06-14-blogpost</id><title>New Blog Post and Update on NoRaRe</title><updated>2021-06-14T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I present a multilingual concept list consisting of 28 body part concepts across 15 languages (e.g., English, German, Wolof, Vietnamese, Czech). The list was elicited in an urban fieldwork study for my master's thesis (<a href="https://doi.org/10.17613/j95n-c998">Tjuka 2019</a>). The blog post is available <a href="https://calc.hypotheses.org/2788">here</a>.</p>
<p>We also received good news for our NoRaRe article, which has been accepted for publication in Behavior Research Methods! The preprint is available <a href="https://psyarxiv.com/tgw3z/">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-06-14-blogpost#2021-06-14-blogpost"/><published>2021-06-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-06-16-blogpost</id><title>New Blog Post </title><updated>2021-06-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new German blogpost appeared, focusing on <a href="https://wub.hypotheses.org/1312">predatory journals and
paper mills</a> in science.</p></div></content><link href="https://calclab.org/?news=2021-06-16-blogpost#2021-06-16-blogpost"/><published>2021-06-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-06-21-lingrex</id><title>LingRex Released in Version 1.0.0 </title><updated>2021-06-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I released <a href="https://pypi.org/project/lingrex">LingRex</a> in version 1.0.0. While the earlier version, which was used in three recent papers (List 2019, Wu et al. 2020, and Bodt and List 2019), was never properly tested, the new version has a test coverage of 99% with respect to unit tests. This means we can expect more stability in the application of the major functions provided by the library. These consist in the code for correspondence pattern detection (List 2019), the search for cross-semantic cognates (Wu et al. 2020), a new template-based alignment method (Wu et al. 2020), and the prediction of words (Bodt and List 2019 and 2021). </p>
<p>In addition, our collection of all blog posts published in 2020 in our Computer-Assisted Language Comparison in Practice (CALCiP) blog has now been published with Humanities Commons in a new design (see <a href="https://doi.org/10.17613/2qq4-y417">10.17613/2qq4-y417</a>). In the future, we hope to publish stable PDF versions of blog posts individually after their appearance, instead of publishing them only once per year. </p></div></content><link href="https://calclab.org/?news=2021-06-21-lingrex#2021-06-21-lingrex"/><published>2021-06-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-06-28-habilitation</id><title>Habilitation </title><updated>2021-06-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, my habilitation thesis officially appeared online. It is titled "Computer-assisted approaches to historical language comparison" and features 12 articles which I wrote in the past years on the topic of "computer-assisted language comparison". It can be found online under the DOI <a href="https://doi.org/10.22032/dbt.49007">10.22032/dbt.49007</a>. </p></div></content><link href="https://calclab.org/?news=2021-06-28-habilitation#2021-06-28-habilitation"/><published>2021-06-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-07-06-blogpost</id><title>New Blog Post </title><updated>2021-07-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I wrote and published a new German blog post, which points to the problem that linguists often think that language is a perfect tool for communication without any flaws. 
I argue that there are still many situations in which we have problems expressing ourselves with our languages, and that language is not perfectly suited to describing them. The post, titled "Worüber man nicht reden kann...", can be found <a href="https://wub.hypotheses.org/1327">here</a>. </p></div></content><link href="https://calclab.org/?news=2021-07-06-blogpost#2021-07-06-blogpost"/><published>2021-07-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-07-16-paper</id><title>New Preprint </title><updated>2021-07-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new preprint appeared, which was submitted to a new open access journal with post-publication peer review.
The study, together with Robert Forkel, proposes a new method for the automated detection of borrowings in multi-lingual wordlists (DOI: <a href="https://doi.org/10.12688/openreseurope.13843.1">10.12688/openreseurope.13843.1</a>).</p>
<blockquote>
<p>Although lexical borrowing is an important aspect of language evolution, there have been few attempts to automate the identification of borrowings in lexical datasets. Moreover, none of the solutions which have been proposed so far identify borrowings across multiple languages. This study proposes a new method for the task and tests it on a newly compiled large comparative dataset of 48 South-East Asian languages. The method yields very promising results, while it is conceptually straightforward and easy to apply. This makes the approach a perfect candidate for computer-assisted exploratory studies on lexical borrowing in contact areas.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-07-16-paper#2021-07-16-paper"/><published>2021-07-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-03-blogpost</id><title>New Blogpost on Data Sharing</title><updated>2021-08-03T12:00:00+00:00</updated><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
<p>Yesterday, a new tutorial blogpost appeared which discusses how data should be shared when submitting a study to a journal for peer review. The blogpost, titled <a href="https://calc.hypotheses.org/2877">Transparent Data</a> emphasizes the importance of data transparency in order to make replication and reuse of data easy.</p></div></content><link href="https://calclab.org/?news=2021-08-03-blogpost#2021-08-03-blogpost"/><published>2021-08-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-05-preprint</id><title>New Preprint on Language Evolution </title><updated>2021-08-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just submitted a new preprint to Humanities Commons
(<a href="https://doi.org/10.17613/ebas-hj26">DOI</a>), titled "Evolutionary Aspects of
Language Change". The study is a contribution to a forthcoming anthology on
"Evolutionary Thinking Across Disciplines", ed. by Agathe du Crest, Martina
Valkovic, Philippe Huneman, and Thomas A. C. Reydon, which will appear in the
Synthese Library (Springer) some time next year. This is the abstract:</p>
<blockquote>
<p>While it has been known for a long time that human languages can change in various ways, it was only in the early 19th century that scholars realized that certain aspects of language change proceed in a surprisingly regular manner, allowing us to reconstruct historical stages of languages which have never been documented in written sources. The findings led to the establishment of historical linguistics as a scientific discipline, devoted to the investigation of how languages change and why. Although evolutionary thinking plays a major role in historical linguistics, practitioners often have the tendency to emphasize the peculiarities of language evolution rather than the commonalities with other kinds of evolution. In part, this seems to be justified by some phenomena for which it is difficult to find counterparts in different disciplines. In part, however, this may also be due to a communication problem that is characteristic for interdisciplinary research, since scholars lack a common terminology. As a result, it is difficult for linguists to explain their particular evolutionary views on language change to practitioners from other disciplines, while evolutionary terminology from disciplines such as biology is difficult to grasp for linguists. In the study, I will try to present some important evolutionary aspects of language change for which it is hard to find counterparts in other disciplines and then point to current challenges of evolutionary studies in historical linguistics which have to deal with these aspects.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-08-05-preprint#2021-08-05-preprint"/><published>2021-08-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-06-paper</id><title>NoRaRe article published</title><updated>2021-08-06T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I'm happy to announce that our article "Linking norms, ratings, and relations of words and concepts across multiple language varieties" was published today in Behavior Research Methods. The article is open access and available <a href="https://doi.org/10.3758/s13428-021-01650-1">here</a>. In the article, we present the Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts (NoRaRe). The data can be accessed via a <a href="http://digling.org/norare/">web interface</a> or <a href="https://github.com/concepticon/norare-data">GitHub</a>.</p></div></content><link href="https://calclab.org/?news=2021-08-06-paper#2021-08-06-paper"/><published>2021-08-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-11-blogpost</id><title>New beginner's guide for NoRaRe</title><updated>2021-08-11T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I describe how to add word lists to NoRaRe. The instructions are aimed at people new to Python who want to add their own dataset to the NoRaRe database (Tjuka et al <a href="https://doi.org/10.3758/s13428-021-01650-1">2021</a>). 
The post can be found <a href="https://calc.hypotheses.org/2890">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-08-11-blogpost#2021-08-11-blogpost"/><published>2021-08-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-16-newpaper</id><title>New Paper Accepted and New Blogpost </title><updated>2021-08-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After a relatively short period of open review, our study with Robert Forkel presenting a new method for automatic borrowing detection has now been formally accepted by Open Research Europe (see <a href="https://open-research-europe.ec.europa.eu/articles/1-79/v1">here</a> for the current draft). We still have to revise the study a bit more before the final version can appear, so the study itself is now listed as "forthcoming", but the final version will appear soon.</p>
<p>Furthermore, I published another German blog post. In it, I argue that scientists should not push their research results out too fast, but should rather consider twice whether the results are sound and whether it is important to share them. I provide examples where studies in linguistics and in machine learning have not done so, but rather insisted on publishing results that were not only half-baked but also particularly problematic. This German blog post can be found <a href="https://wub.hypotheses.org/1341">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-08-16-newpaper#2021-08-16-newpaper"/><published>2021-08-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-08-25-paper</id><title>Paper on Borrowing Detection Published </title><updated>2021-08-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our study presenting a new method for automatic borrowing detection with Robert Forkel has now been published by Open Research Europe (see <a href="https://open-research-europe.ec.europa.eu/articles/1-79/v2">here</a>). The revision contains a very detailed investigation of thresholds used in automatic cognate detection approaches, which reveals that thresholds needed to infer borrowings differ quite substantially from thresholds needed to infer cognates language-family-internally. This justifies why we use two thresholds in our new approach. </p></div></content><link href="https://calclab.org/?news=2021-08-25-paper#2021-08-25-paper"/><published>2021-08-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-09-03-lexibank</id><title>Lexibank </title><updated>2021-09-03T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>After more than one year in which we were busily finalizing datasets and writing new applications for the analysis, we have finally managed to submit the paper presenting the Lexibank wordlist collection, and a preprint presenting the database is already available online from <a href="https://doi.org/10.21203/rs.3.rs-870835/v1">ResearchSquare</a>.  </p>
<blockquote>
<p>The past decades have seen substantial growth in digital data on the world's languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, the majority of published datasets lack standardization which makes their comparison difficult. Here, we present the first step to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that increase the FAIRness of linguistic data. We test the Lexibank workflow on a collection of 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.</p>
</blockquote>
<p>The data is also available now from GitHub (<a href="https://github.com/lexibank/lexibank-analysed">lexibank/lexibank-analysed</a>). </p></div></content><link href="https://calclab.org/?news=2021-09-03-lexibank#2021-09-03-lexibank"/><published>2021-09-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-09-05-evobib</id><title>EvoBib </title><updated>2021-09-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Saturday, a new version of EvoBib was released, now containing 4000 bibliographic references and more than 6000 different quotes from the literature. The web-interface (with additional information on the original data) can be found at <a href="https://digling.org/evobib/">https://digling.org/evobib/</a>.  </p></div></content><link href="https://calclab.org/?news=2021-09-05-evobib#2021-09-05-evobib"/><published>2021-09-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-09-08-keynote</id><title>Keynote Talk and New Blog Post </title><updated>2021-09-08T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post showing how data can be added to our Lexibank repository appeared. In this post, I show how data can be converted to Lexibank CLDF formats, using a recently published dataset on Vietic languages by Sidwell and Alwes as an example. The post can be found <a href="https://calc.hypotheses.org/2954">here</a>.</p>
<p>I also gave a keynote at this year's KONVENS conference in Düsseldorf in virtual form. Since I recorded the keynote before, an earlier version is also available online, which you can find <a href="https://share.eva.mpg.de/index.php/s/24rcJ4ZbJ7CJMrc">here</a>.   </p></div></content><link href="https://calclab.org/?news=2021-09-08-keynote#2021-09-08-keynote"/><published>2021-09-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-09-10-blogpost</id><title>New Blog Post </title><updated>2021-09-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new German blog post discussing how much we know about the parts of which words are composed has appeared online. The post can be found <a href="https://wub.hypotheses.org/1348">here</a>.</p></div></content><link href="https://calclab.org/?news=2021-09-10-blogpost#2021-09-10-blogpost"/><published>2021-09-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-09-11-preprint</id><title>New Preprint </title><updated>2021-09-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new preprint with Cormac Anderson, Tiago Tresoldi, Simon J. Greenhill, Robert Forkel, and Russell D. Gray, titled "Measuring variation in phoneme inventories" appeared online at Research Square (<a href="https://doi.org/10.21203/rs.3.rs-891645/v1">10.21203/rs.3.rs-891645/v1</a>). The study systematically compares phoneme inventories and how they are coded in different datasets. </p>
<blockquote>
<p>For over a century, the phoneme has played a central role in linguistic research. In recent years, collections of phoneme inventories, originally designed for cross-linguistic purposes, have increasingly been used in comparative studies involving neighbouring disciplines. Despite the extended application of this type of data, there has been no research into its comparability or tests of its reliability. In this study, we carry out a systematic comparison of four popular phoneme inventory collections. We render them comparable by linking them to standardised formats for the handling of cross-linguistic datasets and develop new measures to test both size and similarity. We find considerable differences in inventories supposedly representing the same language variety, both in terms of size and transcriptional choices. While some of these differences appear to be predictable, reflecting design decisions in the different collections, much of the observed variation is unsystematic. These results should sound a note of caution for comparative studies based on phoneme inventories, which we suggest need to take the question of comparability more seriously. We make a number of proposals for improving the comparability of phoneme inventories.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-09-11-preprint#2021-09-11-preprint"/><published>2021-09-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-10-06-blogposts</id><title>New Blog Posts </title><updated>2021-10-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, two new blog posts were published, one English blog post, discussing <a href="https://calc.hypotheses.org/2970">How to write a term paper in linguistics</a>, and one German blog post, discussing the <a href="https://wub.hypotheses.org/1356">calculability of data</a>.    </p></div></content><link href="https://calclab.org/?news=2021-10-06-blogposts#2021-10-06-blogposts"/><published>2021-10-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-10-08-paper</id><title>New Paper </title><updated>2021-10-08T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already on October 4, a new paper appeared (published online before print), to which I had the chance to contribute. This review by Joshua C. Jackson, Joseph Watts, myself, Curtis Puryear, Ryan Drabble, and Kristen A. Lindquist is titled <a href="https://doi.org/10.1177/17456916211004899">From Text to Thought: How Analyzing Language Can Advance Psychological Science</a>. </p>
<blockquote>
<p>Humans have been using language for millennia but have only just begun to scratch the surface of what natural language can reveal about the mind. Here we propose that language offers a unique window into psychology. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis—natural-language processing and comparative linguistics—are contributing to how we understand topics as diverse as emotion, creativity, and religion and overcoming obstacles related to statistical power and culturally diverse samples. We summarize resources for learning both of these methods and highlight the best way to combine language analysis with more traditional psychological paradigms. Applying language analysis to large-scale and cross-cultural datasets promises to provide major breakthroughs in psychological science.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2021-10-08-paper#2021-10-08-paper"/><published>2021-10-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-11-01-blogpost</id><title>New Blog Post</title><updated>2021-11-01T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I introduce a list of 192 concepts across the semantic domains of color, emotion, and human body. The post can be found <a href="https://calc.hypotheses.org/3023">here</a>. The concept list is available on Zenodo: <a href="https://doi.org/10.5281/zenodo.5549847">https://doi.org/10.5281/zenodo.5549847</a>.</p></div></content><link href="https://calclab.org/?news=2021-11-01-blogpost#2021-11-01-blogpost"/><published>2021-11-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-11-02-paperpost</id><title>New Paper and New Blog post </title><updated>2021-11-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, we were informed that our paper presenting "A digital, retro-standardized edition of the Tableaux Phonétiques des Patois Suisses Romands (TPPSR)", by Hans Geisler, Robert Forkel, and myself, has finally appeared in print. The study presents an online edition of the TPPSR, a dialect atlas of the Suisse romande, collected in the early 20th century, which has already been published online at <a href="https://tppsr.clld.org">https://tppsr.clld.org</a>. The study presenting this database itself will mainly appear in print, but for now, offprints are also available <a href="https://www.dropbox.com/s/f37i6bciun5ns6d/H.Geisler_R.Forkel_J.M.List.pdf?dl=1">here</a>. </p>
<p>Additionally, a new German blog post discussing ghost writers, predatory journals, and agencies searching for ghost writers has appeared. The post, titled "Von schreibenden Geistern und vertretenen Stellen", can be found at <a href="https://wub.hypotheses.org/1370">https://wub.hypotheses.org/1370</a>.</p></div></content><link href="https://calclab.org/?news=2021-11-02-paperpost#2021-11-02-paperpost"/><published>2021-11-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-11-24-blogpost</id><title>New blog post related to WoW conference presentation</title><updated>2021-11-24T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I describe how to compare NoRaRe data sets in R. The post is based on a study investigating arousal and valence ratings in English, Dutch, and Spanish, which will be presented at the <a href="https://wordsintheworld.ca/wow-conference/">WoW Conference</a> on Saturday.  </p>
<p>The post can be found <a href="https://calc.hypotheses.org/3109">here</a>. The presentation slides are available here: <a href="https://pad.gwdg.de/p/KLgI9TLrP#/">https://pad.gwdg.de/p/KLgI9TLrP#/</a>.</p></div></content><link href="https://calclab.org/?news=2021-11-24-blogpost#2021-11-24-blogpost"/><published>2021-11-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-12-06-blogpost</id><title>New Blog post </title><updated>2021-12-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published a new blog post in German, discussing open science principles and how to evaluate studies when reviews have been submitted along with them. You can find the post at <a href="https://wub.hypotheses.org/1379">https://wub.hypotheses.org/1379</a>. </p></div></content><link href="https://calclab.org/?news=2021-12-06-blogpost#2021-12-06-blogpost"/><published>2021-12-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-12-16-newpaper</id><title>New paper on semantic relations in word formation, borrowing, and semantic change</title><updated>2021-12-16T12:00:00+00:00</updated><author><name>N. E. Schweikhard</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, I submitted a paper for review on using computer-assisted approaches for studying semantic aspects of language change. In it, I investigate the etymologies of 480 German nouns of basic vocabulary. Findings include the various factors that contribute to the choice of the semantic relation utilized in coining new meanings (like part of speech, semantic field, and morphological aspects), and potential ways of improving semantic reconstruction. 
You can find the preprint at <a href="https://doi.org/10.17613/03dk-tk62">DOI:10.17613/03dk-tk62</a>.</p></div></content><link href="https://calclab.org/?news=2021-12-16-newpaper#2021-12-16-newpaper"/><published>2021-12-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2021-12-17-tiger</id><title>New Accepted Paper </title><updated>2021-12-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I shared the author's version of a newly accepted paper called "Correcting a bias in TIGER rates resulting from high amounts of invariant and singleton cognate sets".</p>
<blockquote>
<p>In a recent issue of the Journal of Language Evolution, Syrjänen et al. (2021) investigate the suitability of computing Cummins and McInerney’s (2011) TIGER rates for estimating the tree-likeness of linguistic datasets compiled for phylogenetic reconstruction. The authors test the TIGER rates on a diverse sample of simulated data, which by and large confirms the usefulness of TIGER rates as an analytic tool for investigating linguistic data, but they test them only on one real-world dataset of Uralic languages which turns out to behave quite differently from the simulated data. When testing the TIGER rates on additional datasets, I detected a bias in the computation which leads to an unnatural increase in those cases where a dataset contains many characters with invariant or singleton states. To overcome this problem, I suggest a modified variant of TIGER rates, which is provided in the form of a freely available Python package. Testing the modified TIGER scores on the simulated data of Syrjänen et al. shows that the corrected TIGER rates still readily distinguish between different degrees of tree-likeness. Testing them on a dataset in which the number of singletons and invariants was artificially increased further shows that the corrected TIGER rates are not influenced by the bias. A final test on seven linguistic datasets shows the usefulness of the corrected TIGER rates on a larger variety of linguistic datasets and illustrates the importance to take specific aspects of linguistic data into account when using biological methods in the domain of language evolution.</p>
</blockquote>
<p>The paper is accompanied by a small Python package that computes the new TIGER rates (https://pypi.org/project/pylotiger) and can be accessed from Humanities Commons (<a href="https://doi.org/10.17613/0n1n-3352">DOI: 10.17613/0n1n-3352</a>).</p></div></content><link href="https://calclab.org/?news=2021-12-17-tiger#2021-12-17-tiger"/><published>2021-12-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-04-calcip</id><title>CALC Blog Posts for 2021</title><updated>2022-01-04T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We published Volume 4 of <em>Computer-Assisted Language Comparison in Practice</em>. The volume contains PDF versions of all contributions published on the <a href="https://calc.hypotheses.org">CALC blog</a> in 2021 and is available on <a href="https://hcommons.org">Humanities Commons</a> at <a href="https://doi.org/10.17613/a0ew-0n98">https://doi.org/10.17613/a0ew-0n98</a>.</p></div></content><link href="https://calclab.org/?news=2022-01-04-calcip#2022-01-04-calcip"/><published>2022-01-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-05-news</id><title>New Paper Appeared </title><updated>2022-01-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a study on the managing of data in historical linguistics with the goal of reconstructing language phylogenies from lexical data appeared in print (DOI: <a href="https://doi.org/10.7551/mitpress/12200.003.0033">10.7551/mitpress/12200.003.0033</a>). This study was collaborative work with Tiago Tresoldi, former post-doc in our CALC project, Christoph Rzymski, Robert Forkel, Simon J. Greenhill, and Russell D. Gray. 
</p></div></content><link href="https://calclab.org/?news=2022-01-05-news#2022-01-05-news"/><published>2022-01-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-07-calc3</id><title>Beyond CALC </title><updated>2022-01-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Thanks to the generous funding by the MPG, the CALC project will be continued from April 2022 until March 2024. Under the title "Beyond CALC: Computer-Assisted Approaches to Human Prehistory, Linguistic Typology, and Human Cognition. (CALC³)", we will continue and expand our work on computer-assisted language comparison. Since our doctoral students have not yet finished their PhD, there won't be an abrupt change in our group but rather a transition with some people leaving us and other people joining us. More detailed news on the new project will be shared later this year.</p></div></content><link href="https://calclab.org/?news=2022-01-07-calc3#2022-01-07-calc3"/><published>2022-01-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-10-pysem</id><title>New Blog Post on PySem </title><updated>2022-01-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post appeared in our series of blog posts on Computer-Assisted Language Comparison, this time introducing <a href="https://calc.hypotheses.org/3193">How to Map Concepts with the PySem Library</a>.</p></div></content><link href="https://calclab.org/?news=2022-01-10-pysem#2022-01-10-pysem"/><published>2022-01-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-17-bp</id><title>New Blog Post on the Concept "Shadow" </title><updated>2022-01-17T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post appeared in my German blog, this time devoted to the concept "shadow" and the extended meanings it can take in German and other languages. This post, titled "Von der Ambivalenz des Schattens", can be found online at <a href="https://wub.hypotheses.org/1406">https://wub.hypotheses.org/1406</a>.</p></div></content><link href="https://calclab.org/?news=2022-01-17-bp#2022-01-17-bp"/><published>2022-01-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-21-tiger</id><title>New Paper on TIGER Rates </title><updated>2022-01-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>TIGER rates are an interesting way to assess the tree-likeness of a dataset, originally proposed by <a href="https://doi.org/10.1093/sysbio/syr064">Cummins and McInerney in 2011</a> and now also discussed for their suitability to be applied to linguistic data by <a href="https://doi.org/10.1093/jole/lzab004">Syrjänen et al. 2021</a>. When reading both articles, I felt that there was something odd with the TIGER rates, and I found the reason in the handling of singletons and invariants. As a result, I wrote a small library in Python that computes both corrected and original TIGER rates and I also wrote a comment to the original article by Syrjänen et al. to illustrate the usefulness of the extended (or corrected) rates. 
The article has now been published under the title "Correcting a bias in TIGER rates resulting from high amounts of invariant and singleton cognate sets" (<a href="https://doi.org/10.1093/jole/lzab007">DOI: 10.1093/jole/lzab007</a>).</p></div></content><link href="https://calclab.org/?news=2022-01-21-tiger#2022-01-21-tiger"/><published>2022-01-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-01-23-preprint</id><title>New Preprint on Contact Layer Detection </title><updated>2022-01-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Friday last week, a new study submitted to Open Research Europe appeared as a preprint. The study, titled "First steps towards the detection of contact layers in Bangime: a multi-disciplinary, computer-assisted approach", by Abbie Hantgan, Hiba Babiker, and myself, is now waiting for open peer review on the Open Research Europe platform (<a href="https://doi.org/10.12688/openreseurope.14339.1">DOI: 10.12688/openreseurope.14339.1</a>).</p></div></content><link href="https://calclab.org/?news=2022-01-23-preprint#2022-01-23-preprint"/><published>2022-01-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-02-16-blogpost</id><title>New German Blog Post on Learning </title><updated>2022-02-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published my monthly German blogpost, this time discussing how we learn things and forget the difficulties when doing so. The post, titled "Über das Vergessen der Einstiegshürden", can be found <a href="https://wub.hypotheses.org/1418">here</a>. 
</p></div></content><link href="https://calclab.org/?news=2022-02-16-blogpost#2022-02-16-blogpost"/><published>2022-02-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-02-28-blogpost</id><title>New Blog Post on Extended Concept List</title><updated>2022-02-28T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a recent <a href="https://calc.hypotheses.org/3023">blog post</a>, I introduced a list of color, emotion, and human body part concepts. An updated version of the list is now available that includes 28 additional emotion concepts. The blog post presenting the extended list can be found <a href="https://calc.hypotheses.org/3913">here</a>. The concept list is available on Zenodo: <a href="https://doi.org/10.5281/zenodo.6226423">https://doi.org/10.5281/zenodo.6226423</a>.</p></div></content><link href="https://calclab.org/?news=2022-02-28-blogpost#2022-02-28-blogpost"/><published>2022-02-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-03-07-wordle</id><title>New German Blog Post on Playing Wordle </title><updated>2022-03-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published my monthly German blogpost, this time discussing how the popular Wordle game requires different strategies when playing it in different languages. You can find the post <a href="https://wub.hypotheses.org/1443">here</a>. </p></div></content><link href="https://calclab.org/?news=2022-03-07-wordle#2022-03-07-wordle"/><published>2022-03-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-03-08-animation</id><title>New English Blog Post on the Wagner-Fischer Algorithm</title><updated>2022-03-08T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, another blogpost of mine appeared, this time in our tutorial blog for computer-assisted language comparison, presenting an animated version of the Wagner-Fischer algorithm. The blogpost can be found <a href="https://calc.hypotheses.org/3265">here</a>. </p></div></content><link href="https://calclab.org/?news=2022-03-08-animation#2022-03-08-animation"/><published>2022-03-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-03-18-produsemy</id><title>ERC Consolidator Grant</title><updated>2022-03-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, the ERC <a href="https://t.co/6elLFwyaFl">officially announced</a> the winners of the Consolidator Grant applications from 2021. I am very proud that my project ProduSemy -- Productive Signs. A Computer-Assisted Analysis of Evolutionary,
Typological, and Cognitive Dimensions of Word Families -- was among the projects that were selected for funding (see the announcement of the project in our institute <a href="https://www.eva.mpg.de/press/news/article/studying-the-evolution-the-distribution-and-the-psychology-of-word-families/">here</a>). The abstract of the project is given below: </p>
<blockquote>
<p>All human languages have simple and complex words. Simple words refer to meanings regardless of their form,
while complex words are formed from other words, and their formation can be semantically motivated. Since
words can share lexical material, we can group them into families. Word families can vary greatly in size, ranging from small ones – comprising only a few members –, to large ones – spanning several hundred words –,
but it is still unclear why some words are more productive than others in forming new words. Lexical compositionality
has received some attention in historical linguistics, linguistic typology, and cognitive linguistics, but
so far studies have mostly concentrated on the morphological complexity of individual words and languages,
while the fact that words form families which interact during language change and language use has been typically ignored. As a result, many questions regarding word family formation remain unresolved, and we do not
know (1) how word families evolve along language phylogenies, (2) which semantic processes underlying word
family formation are universal, and (3) to what extent human cognition influences the productivity of lexical
roots to form families. The project will tackle these three target questions by unifying evolutionary, typological,
and cognitive insights into lexical compositionality. Building on a computer-assisted framework that reconciles
classical and computational approaches in historical linguistics and linguistic typology, the project will design
new models to standardize cross-linguistic data on word families, apply them to integrate data from historical
linguistics, linguistic typology, and cognitive linguistics, and develop new methods for the computer-assisted inference of word families, their underlying motivation patterns, and their evolutionary histories in large datasets.
In this way, the project will deepen the integration of cross-linguistic studies in cognitive and psychological
sciences.</p>
</blockquote>
<p>With a project start planned for October, and the CALC³ project starting already in April, the CALC lab, which lost some of its members recently, due to the end of the ERC Starting Grant that funded the first five years of the project, will welcome new members in the future and pursue the research on computer-assisted language comparison, this time with a specific focus on the compositionality of the human lexicon.</p></div></content><link href="https://calclab.org/?news=2022-03-18-produsemy#2022-03-18-produsemy"/><published>2022-03-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-03-30-position</id><title>Student Assistant Position in the CALC³ Project</title><updated>2022-03-30T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We invite applications for a position as a student assistant in our CALC³ project. The student assistant will be preparing data for computer-assisted studies on historical and lexical language comparison.</p>
<p>Details and the application form can be found on the MPI website: https://www.eva.mpg.de/career/positions-available/job/535/Abteilung%20Sprach-%20und%20Kulturevolution/en/ </p></div></content><link href="https://calclab.org/?news=2022-03-30-position#2022-03-30-position"/><published>2022-03-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-04-04-blogpost</id><title>New Blog Post on Body and Object Concept List</title><updated>2022-04-04T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I introduce a list of body and object concepts. The concept list is the basis for an ongoing study and consists of 784 concepts divided into two groups: 134 body and 650 object concepts. The blog post can be found <a href="https://calc.hypotheses.org/3840">here</a>. The concept list is available on Zenodo: <a href="https://doi.org/10.5281/zenodo.6365495">https://doi.org/10.5281/zenodo.6365495</a>.</p></div></content><link href="https://calclab.org/?news=2022-04-04-blogpost#2022-04-04-blogpost"/><published>2022-04-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-04-07-lingrex</id><title>LingRex </title><updated>2022-04-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new version of LingRex (https://pypi.org/project/lingrex, together
with Robert Forkel) was published, version 1.2, which contains not only
bugfixes to our code on borrowing detection, but also new code that can be used
for automated word prediction or phonological reconstruction in a supervised
fashion. </p></div></content><link href="https://calclab.org/?news=2022-04-07-lingrex#2022-04-07-lingrex"/><published>2022-04-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-04-12-reconstruction</id><title>New Paper on Phonological Reconstruction </title><updated>2022-04-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper was accepted, together with Nathan W. Hill and Robert Forkel, titled "A new framework for fast automated phonological reconstruction using trimmed alignments and sound correspondence patterns". It will appear in the proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, co-located with the ACL 2022 meeting in Dublin. In this study, we present a new framework for supervised phonological reconstruction which is quite simple and also fast and thus perfect as a baseline to be compared with more complex methods. The preprint of the study can be found <a href="https://arxiv.org/abs/2204.04619">online</a> now.</p>
<blockquote>
<p>Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts, there are not many easy-to-use and fast approaches for the task of phonological reconstruction. Here we present a new framework that combines state-of-the-art techniques for automated sequence comparison with novel techniques for phonetic alignment analysis and sound correspondence pattern detection to allow for the supervised reconstruction of word forms in ancestral languages. We test the method on a new dataset covering six groups from three different language families. The results show that our method yields promising results while at the same time being not only fast but also easy to apply and expand. </p>
</blockquote></div></content><link href="https://calclab.org/?news=2022-04-12-reconstruction#2022-04-12-reconstruction"/><published>2022-04-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-04-22-news</id><title>New Paper and New Preprint </title><updated>2022-04-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper has just been officially published, after having been officially endorsed by two open reviews. Common work with Abbie Hantgan and Hiba Babiker, this study sheds light on potential contact relations of the language isolate Bangime, spoken in Mali (full paper can be found <a href="https://open-research-europe.ec.europa.eu/articles/2-10/v2">here</a>). </p>
<blockquote>
<p>Bangime is a language isolate spoken among the Dogon, Mande, Atlantic, and Songhai language families in Central-Eastern Mali. Despite Dogon disapproval, the speakers of Bangime, the Bangande, claim an ethnic identity with the Dogon. The Bangande are geographically isolated and current genetic research denoted their genetic disparity. However, here we show evidence of shared vocabulary among the Bangime and neighboring language groups. We investigate the layers of contact using a computer-assisted, multidisciplinary approach in a series of steps. We use lexical automated comparisons taking into account the qualitative and quantitative measures and the correction of the findings. Within archeological and historical contexts from Central-Eastern Mali, our results show that the Bangime language was spoken before the Dogon Expansion in the Escarpment 1400c. AD. This work represents a great mark in computational linguistics for the study of language isolates and the paradox of their history.</p>
</blockquote>
<p>Additionally, a new preprint is now available, a review of computational approaches to historical language comparison, which was submitted for the inclusion in the 2nd edition of the Routledge Handbook of Historical Linguistics. The preprint can be found <a href="https://doi.org/10.17613/8nya-dn09">here</a>. </p></div></content><link href="https://calclab.org/?news=2022-04-22-news#2022-04-22-news"/><published>2022-04-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-04-28-NoRaRe</id><title>NoRaRe Article published in Print</title><updated>2022-04-28T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The article presenting the Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts (NoRaRe) has finally appeared in print. In the article, we introduce an openly curated cross-linguistic database for studies in psychology and linguistics. NoRaRe (v0.2) currently contains 65 unique word and concept properties drawn from 98 different datasets in 40 languages. The article was first published online in the journal <em>Behavior Research Methods</em> in 2021. The citation for the printed version is:</p>
<p>Tjuka, Annika, Robert Forkel, and Johann-Mattis List. 2022. Linking norms, ratings, and relations of words and concepts across multiple language varieties. <em>Behavior Research Methods</em> 54, 864–884. <a href="https://doi.org/10.3758/s13428-021-01650-1">https://doi.org/10.3758/s13428-021-01650-1</a></p></div></content><link href="https://calclab.org/?news=2022-04-28-NoRaRe#2022-04-28-NoRaRe"/><published>2022-04-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-05-09-blogpost</id><title>Blog Post Style Guide for Future Contributions</title><updated>2022-05-09T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In a new blog post, I introduce a style guide for contributions on our <a href="https://calc.hypotheses.org/">Computer-Assisted Language Comparison in Practice</a> blog. We hope that the post will help our colleagues, not only those who work in our research group and department, but also external collaborators and scholars who would like to share their ideas. The post is available here: <a href="https://calc.hypotheses.org/4084">https://calc.hypotheses.org/4084</a>.</p></div></content><link href="https://calclab.org/?news=2022-05-09-blogpost#2022-05-09-blogpost"/><published>2022-05-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-05-10-news</id><title>New Blogpost and Paper Accepted</title><updated>2022-05-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new German blog post has just been published online, this time discussing publishing and discussing in scientific research (you can find the post <a href="https://wub.hypotheses.org/1521">here</a>). </p>
<p>Furthermore, our paper introducing the <a href="https://github.com/lexibank/lexibank-analysed">Lexibank</a> repository has now been accepted with Scientific Data, and we are currently revising it for the final publication.</p></div></content><link href="https://calclab.org/?news=2022-05-10-news#2022-05-10-news"/><published>2022-05-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-05-30-newpaper</id><title>New Paper and Update to PySEM</title><updated>2022-05-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>With the publication of Concepticon 2.6.0 (https://concepticon.clld.org) during the last week (together with Annika Tjuka and Robert Forkel as main collaborators on this project) the PySEM package (https://pypi.org/project/pysem) has now also been updated to version 0.5, which contains the data from the most recent Concepticon version.</p>
<p>In addition, our paper on supervised phonological reconstruction has now appeared. This study, common work with Nathan Hill and Robert Forkel, offers a new straightforward framework for phonological reconstruction and word prediction, which can serve as a fast baseline for future studies devoted to the task. This study can be found <a href="https://aclanthology.org/2022.lchange-1.9/">here</a>.</p>
<blockquote>
<p>Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts, there are not many easy-to-use and fast approaches for the task of phonological reconstruction. Here we present a new framework that combines state-of-the-art techniques for automated sequence comparison with novel techniques for phonetic alignment analysis and sound correspondence pattern detection to allow for the supervised reconstruction of word forms in ancestral languages. We test the method on a new dataset covering six groups from three different language families. The results show that our method yields promising results while at the same time being not only fast but also easy to apply and expand.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2022-05-30-newpaper#2022-05-30-newpaper"/><published>2022-05-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-06-03-blum</id><title>New Member of our CALC³ Group</title><updated>2022-06-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two days ago, Frederic Blum joined us as a doctoral student in our CALC³ project, funded by the Max Planck Society. Frederic will apply computer-assisted methods to study the history of the Pano-Tacanan language family in South America. We are very happy that Frederic joined our group and look forward to a fruitful collaboration in the future. </p></div></content><link href="https://calclab.org/?news=2022-06-03-blum#2022-06-03-blum"/><published>2022-06-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-06-13-newblog</id><title>New German Blog Post and New Preprint</title><updated>2022-06-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my monthly German blog post appeared, which this time discusses "gray zones" in scientific practice. You can find the post <a href="https://wub.hypotheses.org/1539">here</a>.</p>
<p>Additionally, a new preprint by Hans Geisler and myself was just deposited online. It discusses metaphors about language history in the past, the present, and the future. This preprint, currently under review, can be found <a href="https://doi.org/10.17613/e5zx-1852">here</a>.</p>
<blockquote>
<p>For a long time, metaphors have played an important role in depicting language history. In this study, we contrast early metaphors on language history, such as the family tree or the wave model, with recent metaphors that were popularized after the quantitative turn, such as forests of trees or phylogenetic networks. Speculating about metaphors which could play a more important role in the future, we conclude that a vivid discussion about the usefulness and the concrete implications of metaphors plays an important role for the development of models for language history in historical linguistics.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2022-06-13-newblog#2022-06-13-newblog"/><published>2022-06-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-06-17-lexibank</id><title>New Paper on Lexibank Appeared </title><updated>2022-06-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, our paper presenting the Lexibank repository (with Robert Forkel, Simon J. Greenhill, Christoph Rzymski, Johannes Englisch, and Russell D. Gray) finally appeared online, after almost 8 years of work on the topic. The paper can be found <a href="https://www.nature.com/articles/s41597-022-01432-0">here</a>.</p>
<blockquote>
<p>The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization which makes their comparison difficult. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.</p>
</blockquote>
<p>There is also a press release by our institute in <a href="https://www.eva.mpg.de/press/news/article/shedding-light-on-linguistic-diversity-and-its-evolution/">English</a> and <a href="https://www.eva.mpg.de/de/presse/aktuelles/artikel/shedding-light-on-linguistic-diversity-and-its-evolution/">German</a>.</p></div></content><link href="https://calclab.org/?news=2022-06-17-lexibank#2022-06-17-lexibank"/><published>2022-06-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-06-28-sigtyp</id><title>New Paper on Shared Task and New Accepted Paper </title><updated>2022-06-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, our paper describing the <a href="https://github.com/sigtyp/ST2022">SIGTYP 2022 Shared Task on Reflex Prediction</a> appeared online and can be found <a href="https://sigtyp.github.io/workshops/2022/sigtyp/papers/SIGTYP2022_proceedings.pdf#page=64">here</a>. Furthermore, our preprint titled "Annotating cognates in phylogenetic studies of South-East Asian languages" with Mei-Shin Wu has now been accepted by Language Dynamics and Change. We already shared our final authors' copy with Humanities Commons, and you can find it online <a href="https://doi.org/10.17613/3n9j-y345">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-06-28-sigtyp#2022-06-28-sigtyp"/><published>2022-06-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-07-15-sigtyp</id><title>New Members in the CALC Team </title><updated>2022-07-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our CALC team has new members. John Miller, a doctoral student in Lima with whom we have already collaborated in the past, has now joined us as an external associate. 
In addition, Mathilda van Zantwijk and Carlos Barrientos Ugarte have now joined us as student assistants. We welcome all new members to the CALC group and hope that we will fruitfully collaborate.</p>
<p>In addition, two new papers appeared this week. One study (in Spanish), titled "The languages of the Gran Chaco from the perspective of lexical semantics" by Nicolás Brid, in collaboration with Cristina Messineo and myself, appeared in LIAMES and can be accessed <a href="https://doi.org/10.20396/liames.v22i00.8669038">here</a>. Another study, the paper presenting our shared task on cognate reflex prediction, common work with Ekaterina Vylomova, Nathan W. Hill, Robert Forkel, and Ryan Cotterell, has now also officially appeared and can be accessed from the <a href="https://aclanthology.org/2022.sigtyp-1.7.pdf">ACL Website</a>.  </p></div></content><link href="https://calclab.org/?news=2022-07-15-sigtyp#2022-07-15-sigtyp"/><published>2022-07-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-07-18-blogpost</id><title>New Blog Post on Colexification Networks </title><updated>2022-07-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in our tutorial blog on Computer-Assisted Language Comparison appeared.
This post illustrates how colexification networks can be reconstructed with the help 
of the <a href="https://pypi.org/project/cltoolkit">CL Toolkit</a> package. The post can be found <a href="https://calc.hypotheses.org/4311">here</a>. In a follow-up post in August, I will show how the networks can be visualized interactively.</p></div></content><link href="https://calclab.org/?news=2022-07-18-blogpost#2022-07-18-blogpost"/><published>2022-07-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-07-20-blogpost</id><title>New German Blog Post </title><updated>2022-07-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in my German blog appeared, this time discussing the disappointment one can experience when getting insights into the real processes that happen "behind the scenes". The post, titled "Von Einblicken in Sterneküchen", can be found <a href="https://wub.hypotheses.org/1569">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-07-20-blogpost#2022-07-20-blogpost"/><published>2022-07-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-07-22-preprint</id><title>New Paper Presenting Database of Gran Chaco Languages </title><updated>2022-07-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new preprint appeared with Open Research Europe (common work with Nicolás Brid and Cristina Messineo), presenting our database of the languages from the Gran Chaco area. The preprint can be found <a href="https://open-research-europe.ec.europa.eu/articles/2-90">here</a> and will be peer reviewed openly.</p>
<blockquote>
<p>Home to more than twenty indigenous languages belonging to six linguistic families, the Gran Chaco has raised the interest of many linguists from different backgrounds. While some have focused on finding deeper genetic relations between different language groups, others have looked into similarities from the perspective of areal linguistics. In order to contribute to further research of areal and genetic features among these languages, we have compiled a comparative wordlist consisting of translational equivalents for 326 concepts — representing basic and ethnobiological vocabulary — for 26 language varieties. Since the data were standardized in various ways, they can be analyzed both quantitatively and qualitatively. In order to illustrate this in detail, we have carried out an initial computer-assisted analysis of parts of the data by searching for shared lexicosemantic patterns resulting from structural rather than direct borrowings.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2022-07-22-preprint#2022-07-22-preprint"/><published>2022-07-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-08-06-newpaper</id><title>New Paper on the History of Uto-Aztecan Languages </title><updated>2022-08-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am very glad to announce that a study on Uto-Aztecan languages, led by Simon J. Greenhill and Hannah Haynie (with Robert Ross, Angela Chira, Lyle Campbell, Carlos Botero, Russell D. Gray, and myself), has now been accepted for publication in Language. The study will appear officially in 2023, but a preprint has now already been shared online, which can be accessed <a href="https://osf.io/preprints/socarxiv/k598j/">here</a>.</p>
<blockquote>
<p>The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from 34 Uto-Aztecan varieties and 2 Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,100 years ago (3,258 - 5,025 years), and identify the most likely homeland to be near what is now southern California. We reconstruct the most probable subsistence strategy in the ancestral Uto-Aztecan society and infer no casual or intensive cultivation, an absence of cereal crops, and a primary subsistence mode of gathering (rather than agriculture). Our results therefore support the timing, geography, and cultural practices of a northern origin, and are inconsistent with alternative scenarios.</p>
</blockquote>
<p>My own work in this study consisted of designing specific methods that help to evaluate to what degree the manually annotated cognate sets differ from automatically computed ones. Given that such an evaluation has not been done so far in such detail, I hope that we can apply this method in other cases in the future.</p></div></content><link href="https://calclab.org/?news=2022-08-06-newpaper#2022-08-06-newpaper"/><published>2022-08-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-08-24-blogposts</id><title>Two Blog Posts in August </title><updated>2022-08-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two blog posts appeared this week: a German blog post discussing etymologies, which you can find <a href="https://wub.hypotheses.org/1591">here</a>, and a post in our tutorial blog on CALC, which concludes a mini-series of three blog posts devoted to the creation and analysis of colexification networks. This post shows how colexification networks can be visualized and can be found <a href="https://calc.hypotheses.org/4351">here</a>. </p></div></content><link href="https://calclab.org/?news=2022-08-24-blogposts#2022-08-24-blogposts"/><published>2022-08-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-09-28-blogposts</id><title>Two Blog Posts in September </title><updated>2022-09-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two blog posts appeared this week: a German blog post discussing different goals of scientific research, which you can find <a href="https://wub.hypotheses.org/1622">here</a>, and a post in our tutorial blog on CALC, in which I present the <a href="https://pypi.org/project/pyedictor">PyEDICTOR</a> tool, which you can find <a href="https://calc.hypotheses.org/4432">here</a>. 
</p></div></content><link href="https://calclab.org/?news=2022-09-28-blogposts#2022-09-28-blogposts"/><published>2022-09-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-10-07-welcome</id><title>Welcoming Viktor Martinovic in our group </title><updated>2022-10-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>At the beginning of this week, Viktor Martinovic joined our team. He is a PhD student from Vienna in his final year and will collaborate with our group on methods for the handling of lexical borrowing, concentrating specifically on ancient borrowings in a rule-based paradigm. Viktor is generously funded by the Department of Linguistic and Cultural Evolution and associated with the CALC³ group.</p></div></content><link href="https://calclab.org/?news=2022-10-07-welcome#2022-10-07-welcome"/><published>2022-10-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-10-16-kindersprache</id><title>New Blog Post and New Accepted Paper </title><updated>2022-10-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my monthly German blog post appeared, this time discussing certain similarities between the age by which children acquire the ability to pronounce certain sounds and the patterns by which sounds change in a language over time. This blog post can be found <a href="https://wub.hypotheses.org/1640">here</a>. </p>
<p>Additionally, a review paper by Hans Geisler and myself, which we submitted earlier this year to the journal <em>Moderna</em>, has now been accepted for publication and will appear some time next year. A preprint of the study, titled "Of word families and language trees: New and old metaphors in studies on language history", can be found <a href="https://doi.org/10.17613/e5zx-1852">here</a>. </p></div></content><link href="https://calclab.org/?news=2022-10-16-kindersprache#2022-10-16-kindersprache"/><published>2022-10-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-10-31-lexibank</id><title>New Blog Post on Querying Data from Lexibank </title><updated>2022-10-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, the October blog post for our CALC blog appeared, this time concentrating on querying datasets with cognates from the Lexibank repository. The blog post can be found <a href="https://calc.hypotheses.org/4872">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-10-31-lexibank#2022-10-31-lexibank"/><published>2022-10-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-11-14-releases</id><title>Major Release of Concepticon 3.0 and NoRaRe 1.0</title><updated>2022-11-14T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We released Concepticon Version 3.0 (<a href="https://github.com/concepticon/concepticon-data/releases/tag/v3.0.0">List et al. 2022</a>) and NoRaRe Version 1.0 (<a href="https://github.com/concepticon/norare-data/releases/tag/v1.0.1">Tjuka et al. 2022b</a>). At this point, Concepticon includes 413 concept lists with 41 mapping languages and 3914 concept sets. NoRaRe contains 113 datasets with 75 word properties across 39 languages. 
With the major releases, new data were added to both resources, and they were published as <a href="https://cldf.clld.org">CLDF datasets</a> here: <a href="https://github.com/concepticon/concepticon-cldf/releases">concepticon-cldf</a> and <a href="https://github.com/concepticon/norare-cldf/releases">norare-cldf</a>. Furthermore, we updated the <a href="https://github.com/clld/clld">clld app</a> for Concepticon at <a href="https://concepticon.clld.org">https://concepticon.clld.org</a> and created a new one for NoRaRe at <a href="https://norare.clld.org">https://norare.clld.org</a>. </p></div></content><link href="https://calclab.org/?news=2022-11-14-releases#2022-11-14-releases"/><published>2022-11-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-11-15-sprache</id><title>New Blog Post on Language and Writing </title><updated>2022-11-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my November blog post in German appeared, this time discussing the relation between language and writing. You can find the post <a href="https://wub.hypotheses.org/1686">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-11-15-sprache#2022-11-15-sprache"/><published>2022-11-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-11-19-cristian</id><title>New Member of the CALC³ Group</title><updated>2022-11-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, Cristian Juarez finally joined our CALC³ group. He'll investigate the relationship between the Guaycuruan and Mataguayan languages in the South American Gran Chaco area, trying to find out if the attested similarities can be explained by contact or inheritance. 
We are very happy that Cristian joined our group and hope for a fruitful collaboration in the future.</p></div></content><link href="https://calclab.org/?news=2022-11-19-cristian#2022-11-19-cristian"/><published>2022-11-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-11-28-commands</id><title>New Blog Post on Custom Commands in CLDF</title><updated>2022-11-28T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post has been published that describes the creation of custom commands for CLDF datasets which can be used from the command line. As an example, the tutorial uses the creation of Nexus files from an existing Lexibank dataset. The blog post can be found <a href="https://calc.hypotheses.org/4403">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-11-28-commands#2022-11-28-commands"/><published>2022-11-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-11-30-updates</id><title>New Versions of PySEM and EvoBib </title><updated>2022-11-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I released new versions of <a href="https://pypi.org/project/pysem">PySEM</a> and <a href="https://digling.org/evobib/">EvoBib</a>. PySEM now contains the latest data from <a href="https://concepticon.clld.org">Concepticon 3.0</a>, and EvoBib has been extended with more than 1000 additional quotes and dozens of new references.</p></div></content><link href="https://calclab.org/?news=2022-11-30-updates#2022-11-30-updates"/><published>2022-11-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-12-09-paper</id><title>New Paper on Body Part Suffixes in Panoan Languages </title><updated>2022-12-09T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new paper, led by Roberto Zariquiey, appeared in the journal Interface Focus, titled "Untangling the evolution of body-part terminology in Pano: conservative versus innovative traits in body-part lexicalization".</p>
<blockquote>
<p>Although language-family specific traits which do not find direct counterparts outside a given language family are usually ignored in quantitative phylogenetic studies, scholars have made ample use of them in qualitative investigations, revealing their potential for identifying language relationships. An example of such a family specific trait are body-part expressions in Pano languages, which are often lexicalized forms, composed of bound roots (also called body-part prefixes in the literature) and non-productive derivative morphemes (called here body-part formatives). We use various statistical methods to demonstrate that whereas body-part roots are generally conservative, body-part formatives exhibit diverse chronologies and are often the result of recent and parallel innovations. In line with this, the phylogenetic structure of body-part roots projects the major branches of the family, while formatives are highly non-tree-like. Beyond its contribution to the phylogenetic analysis of Pano languages, this study provides significative insights into the role of grammatical innovations for language classification, the origin of morphological complexity in the Amazon and the phylogenetic signal of specific grammatical traits in language families.</p>
</blockquote>
<p>The paper is open access and can be found <a href="https://doi.org/10.1098/rsfs.2022.0053">here</a>.</p></div></content><link href="https://calclab.org/?news=2022-12-09-paper#2022-12-09-paper"/><published>2022-12-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-12-14-call</id><title>New Blogposts and Call for Workshop Abstracts </title><updated>2022-12-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, the final blog post for the year in our CALC blog appeared. Abbie Hantgan introduces her ERC project "The Small Bang". You can find the post <a href="https://calc.hypotheses.org/5053">here</a>.</p>
<p>Already on Monday, I published my final German blog post for the year, this time discussing todo-lists. You can find the post <a href="https://wub.hypotheses.org/1713">here</a>.</p>
<p>Last but not least, our workshop proposal for the 26th International Conference on Historical Linguistics in Heidelberg in 2023 was accepted, and we are now inviting abstracts of one page related to the workshop's topic "". You can find a detailed call for abstracts with more information <a href="https://share.eva.mpg.de/index.php/s/cLfqyL5bMsKCe93">here</a>. The deadline is January 1, 2023. For questions, you can also contact the organizers (including myself) directly.</p></div></content><link href="https://calclab.org/?news=2022-12-14-call#2022-12-14-call"/><published>2022-12-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2022-12-19-datanote</id><title>New article submitted to Open Research Europe</title><updated>2022-12-19T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We introduced the major release of Concepticon 3.0 (<a href="https://github.com/concepticon/concepticon-data/releases/tag/v3.0.0">List et al. 2022</a>) and NoRaRe 1.0 (<a href="https://github.com/concepticon/norare-data/releases/tag/v1.0.1">Tjuka et al. 2022b</a>) in an article which is now awaiting peer review at the journal <em>Open Research Europe</em>. The article is available here: <a href="https://doi.org/10.12688/openreseurope.15380.1">https://doi.org/10.12688/openreseurope.15380.1</a>.</p></div></content><link href="https://calclab.org/?news=2022-12-19-datanote#2022-12-19-datanote"/><published>2022-12-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-01-passau</id><title>CALC Lab Becomes CALC/MCL Lab at the University of Passau</title><updated>2023-01-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>With the beginning of 2023, I am now a full professor at the University of Passau, leading the newly funded Chair of Multilingual Computational Linguistics. 
With this new position, the CALC Lab will also move from Leipzig to Passau, evolving into a new laboratory in which we will extend our work on computer-assisted language comparison (CALC) to the broader field of multilingual applications in computational linguistics (MCL). The transition won't be abrupt, however, as I will keep my affiliation with the Max Planck Institute for Evolutionary Anthropology as well as my position as leader of the CALC³ group until 2024. The new position means that our group will keep growing in the future, since new positions funded by the University of Passau will become available and will hopefully be filled soon. It also means that our research group, so far devoted rather narrowly to computer-assisted language comparison, will extend its scope further, concentrating more broadly on multilingual approaches in computational linguistics in the future. I look forward to fruitful collaborations with the new colleagues from the University of Passau, and I am very happy to pursue our existing collaborations with several colleagues from all around the world.</p></div></content><link href="https://calclab.org/?news=2023-01-01-passau#2023-01-01-passau"/><published>2023-01-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-09-cognates</id><title>New Paper on Partial Cognate Annotation </title><updated>2023-01-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, a new paper on partial cognate annotation by Mei-Shin Wu and myself was published. In the study, which you can find <a href="https://doi.org/10.1163/22105832-bja10023">here</a>, we discuss the consequences of varying the ways in which partial cognates are annotated and later converted to statements of overall (word-level) cognacy for the purpose of phylogenetic reconstruction.</p>
<blockquote>
<p>Compounding and derivation are frequent in many language families. As a consequence, words in different languages are often only partially cognate, sharing some but not all morphemes. While partial cognates do not constitute a problem for the phonological reconstruction of individual morphemes, they are problematic for phylogenetic reconstruction based on comparative word lists. We review current practices of preparing cognate-coded word lists and develop new approaches that make the process of cognate annotation more transparent. Comparing four methods by which partial cognate judgments can be converted to cognate judgments for whole words on a newly annotated data set of 19 Chinese dialect varieties, we find that the choice of conversion method has an impact on the inferred tree topologies that cannot be ignored. We conclude that scholars should take great care with cognate judgments in languages in which compounding and derivation are frequent and recommend always assigning cognates transparently.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-01-09-cognates#2023-01-09-cognates"/><published>2023-01-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-11-quechua</id><title>New Paper on the internal classification of Quechua</title><updated>2023-01-11T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, we published a preprint presenting an (undated) phylogeny for the internal classification of Quechua. The article, available <a href="https://doi.org/10.31235/osf.io/twu6a">here</a>, was accepted for publication in <a href="https://journals.iai.spk-berlin.de/index.php/indiana/"><em>Indiana</em></a>. We relate the computational evidence for the different branches to the different hypotheses surrounding the expansions of the Quechua language family. Further, we show that tree models are not incompatible with this data, and how low posterior values in a phylogeny can actually help us identify complex historical scenarios.</p>
<blockquote>
<p>We present a computational phylogeny for the internal classification of the Quechua language family. Based on a concept list of 150 lexical items, we manually analyzed data from 39 contemporaneous Quechua varieties for cognacy and computed a family tree using Bayesian phylogenetic methods. The results provide further evidence for the classification of individual varieties and compares the results to the existing hypotheses for the evolution of the Quechua language family.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-01-11-quechua#2023-01-11-quechua"/><published>2023-01-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-12-calcip</id><title>CALC Blog Posts in 2022</title><updated>2023-01-12T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We published Volume 5 of <em>Computer-Assisted Language Comparison in Practice</em>. The volume contains PDF versions of all contributions published on the <a href="https://calc.hypotheses.org">CALC blog</a> in 2022 and is available on <a href="https://hcommons.org">Humanities Commons</a> at <a href="https://doi.org/10.17613/0df3-gm47">https://doi.org/10.17613/0df3-gm47</a>.</p></div></content><link href="https://calclab.org/?news=2023-01-12-calcip#2023-01-12-calcip"/><published>2023-01-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-16-blogpost</id><title>New Blog Post on Cross-Linguistic Colexifications</title><updated>2023-01-16T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In the first blog post of 2023, I discuss the origins of cross-linguistic colexifications. The blog post is the first step towards a deeper exploration of this topic and explains the four processes that underlie cross-linguistic colexifications. The post is available here: <a href="https://calc.hypotheses.org/5001">https://calc.hypotheses.org/5001</a>.</p></div></content><link href="https://calclab.org/?news=2023-01-16-blogpost#2023-01-16-blogpost"/><published>2023-01-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-18-einkehr</id><title>New Blog Posts </title><updated>2023-01-18T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new German blog post appeared, in which I look back at the scientific journey that ultimately brought me to Passau. You can find this post <a href="https://wub.hypotheses.org/1794">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-01-18-einkehr#2023-01-18-einkehr"/><published>2023-01-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-19-article</id><title>New Paper Appeared </title><updated>2023-01-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper (with Nicolás Brid and Cristina Messineo) on "A comparative wordlist for the languages of The Gran Chaco, South America" has now been formally accepted by Open Research Europe and can thus be considered fully "published". You can find the study <a href="https://doi.org/10.12688/openreseurope.14922.2">here</a>.</p>
<blockquote>
<p>Home to more than twenty indigenous languages belonging to six linguistic families, the Gran Chaco has raised the interest of many linguists from different backgrounds. While some have focused on finding deeper genetic relations between different language groups, others have looked into similarities from the perspective of areal linguistics. In order to contribute to further research of areal and genetic features among these languages, we have compiled a comparative wordlist consisting of translational equivalents for 326 concepts — representing basic and ethnobiological vocabulary — for 26 language varieties. Since the data were standardized in various ways, they can be analyzed both quantitatively and qualitatively. In order to illustrate this in detail, we have carried out an initial computer-assisted analysis of parts of the data by searching for shared lexicosemantic patterns resulting from structural rather than direct borrowings.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-01-19-article#2023-01-19-article"/><published>2023-01-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-23-erc</id><title>ERC Portrait and Lecture Materials Online </title><updated>2023-01-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, an interview in the German National Contact Point's series of ERC Portraits appeared, in which I answer some questions on my ERC projects and how it was to apply for ERC grants in the past. The interview can be found <a href="https://www.eubuero.de/de/nks-erc-portraets-list-3138.html">here</a>. </p>
<p>More than a week ago, I already published the handouts accompanying my lecture in Amsterdam devoted to "Computational Historical Linguistics", which you can find <a href="https://doi.org/10.17613/a29d-xh32">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-01-23-erc#2023-01-23-erc"/><published>2023-01-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-01-30-news</id><title>News and Preprints </title><updated>2023-01-30T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, an article presenting the major goals of the ProduSemy project appeared in the <a href="https://www.pnp.de/print/lokales/stadt-und-landkreis-passau/passau-stadt/wie-entstehen-wortfamilien-10420747">Passauer Neue Presse</a>. </p>
<p>We also published a new preprint (joint work with Nathan W. Hill, Xun Gong, and Seth Knights) on "Computer-Assisted Approaches to Rule-Based Phonological Reconstruction", which you can find <a href="https://doi.org/10.17613/2cbe-2j11">here</a>.</p>
<blockquote>
<p>The formalization of sound changes as finite state transducers is implicit already in the Neogrammarians. For at least six decades scholars have recognized the potential of transducers for improving the speed and rigor of research in historical linguists, but almost no historical linguists actually use them. This article identifies the obstacles facing the concrete use of transducers and introduces a software package built to reconstruct Proto-Burmish using transducers.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-01-30-news#2023-01-30-news"/><published>2023-01-30T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-02-03-papers</id><title>A Preprint and a Forthcoming Study </title><updated>2023-02-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, the preprint of a forthcoming study with John Miller, which was accepted for the EACL conference, appeared. The study has been archived with <a href="https://doi.org/10.48550/arXiv.2302.00189">arXiv</a>. Titled "Detecting Lexical Borrowings from Dominant Languages in Multilingual Wordlists", it tests some straightforward methods for borrowing detection.</p>
<blockquote>
<p>Language contact is a pervasive phenomenon reflected in the borrowing of words from donor to recipient languages. Most computational approaches to borrowing detection treat all languages under study as equally important, even though dominant languages have a stronger impact on heritage languages than vice versa. We test new methods for lexical borrowing detection in contact situations where dominant languages play an important role, applying two classical sequence comparison methods and one machine learning method to a sample of seven Latin American languages which have all borrowed extensively from Spanish. All methods perform well, with the supervised machine learning system outperforming the classical systems. A review of detection errors shows that borrowing detection could be substantially improved by taking into account donor words with divergent meanings from recipient words.</p>
</blockquote>
<p>A preprint of a study yet to be reviewed, titled "Inference of Partial Colexifications from Multilingual Wordlists", also appeared on <a href="https://arxiv.org/abs/2302.00739">arXiv</a>. It proposes automated methods for the inference of different partial colexification networks.</p>
<blockquote>
<p>The past years have seen a drastic rise in studies devoted to the investigation of colexification patterns in individual languages families in particular and the languages of the world in specific. Specifically computational studies have profited from the fact that colexification as a scientific construct is easy to operationalize, enabling scholars to infer colexification patterns for large collections of cross-linguistic data. Studies devoted to partial colexifications -- colexification patterns that do not involve entire words, but rather various parts of words--, however, have been rarely conducted so far. This is not surprising, since partial colexifications are less easy to deal with in computational approaches and may easily suffer from all kinds of noise resulting from false positive matches. In order to address this problem, this study proposes new approaches to the handling of partial colexifications by (1) proposing new models with which partial colexification patterns can be represented, (2) developing new efficient methods and workflows which help to infer various types of partial colexification patterns from multilingual wordlists, and (3) illustrating how inferred patterns of partial colexifications can be computationally analyzed and interactively visualized.  </p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-02-03-papers#2023-02-03-papers"/><published>2023-02-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-02-06-blogpost</id><title>New Blog Post on Metaphor, Metonymy, Analogy</title><updated>2023-02-06T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In this blog post, I discuss ideas about metaphor and metonymy from linguistics that highlight the cognitive underpinnings of both notions, as well as a proposal from psychology about how analogical thinking can explain the processing of metaphors. The post is available here: <a href="https://calc.hypotheses.org/5234">https://calc.hypotheses.org/5234</a>.</p></div></content><link href="https://calclab.org/?news=2023-02-06-blogpost#2023-02-06-blogpost"/><published>2023-02-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-02-14-blogpost</id><title>New Blog Post and Defended Dissertation </title><updated>2023-02-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new German blog post appeared, discussing etymologies in the context of the concept of "paper". The post, titled "Vom Verteilen von Papier" can be found <a href="https://wub.hypotheses.org/1844">here</a>.</p>
<p>Also yesterday, Mei-Shin Wu, former PhD student in our CALC project,
successfully defended her PhD thesis, titled "Computer-Assisted Approach to the
Comparison of Mainland Southeast Asian Languages". We are all very glad that
Mei-Shin finished her thesis successfully and wish her all the best for the
future.</p></div></content><link href="https://calclab.org/?news=2023-02-14-blogpost#2023-02-14-blogpost"/><published>2023-02-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-02-22-language</id><title>New Paper on Uto-Aztecan Origins in Language </title><updated>2023-02-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a paper on the origins of Uto-Aztecan appeared in Language ahead of print. </p>
<blockquote>
<p>The Uto-Aztecan language family is one of the largest language families in the Americas. However, there has been considerable debate about its origin and how it spread. Here we use Bayesian phylogenetic methods to analyze lexical data from thirty-four Uto-Aztecan varieties and two Kiowa-Tanoan languages. We infer the age of Proto-Uto-Aztecan to be around 4,100 years (3,258–5,025 years) and identify the most likely homeland to be near what is now Southern California. We reconstruct the most probable subsistence strategy in the ancestral Uto-Aztecan society and infer no casual or intensive cultivation, an absence of cereal crops, and a primary subsistence mode of gathering (rather than agriculture). Our results therefore support the timing, geography, and cultural practices of a northern origin and are inconsistent with alternative scenarios.</p>
</blockquote>
<p>CALC contributed a thorough formal test of the quality of the cognate judgments. The test showed that cognate judgments provided by experts increase the overall regularity of the data, as measured by the proportion of words that exhibit systematic correspondences.</p>
<p>The study can be found <a href="https://doi.org/10.1353/lan.0.0276">here</a>; a preprint is also available in <a href="https://osf.io/preprints/socarxiv/k598j/">open access</a>.</p></div></content><link href="https://calclab.org/?news=2023-02-22-language#2023-02-22-language"/><published>2023-02-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-03-06-blogpost</id><title>New Contribution to the How To Do X In Linguistics Series</title><updated>2023-03-06T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new contribution to our How To series was published today. I offer an overview of how to organize literature and notes in <a href="https://www.zotero.org/">Zotero</a>. Specifically, I illustrate some of my own workflows for organizing the literature for my dissertation and discuss general features of Zotero. The blog post is available here: <a href="https://calc.hypotheses.org/5692">https://calc.hypotheses.org/5692</a>.</p></div></content><link href="https://calclab.org/?news=2023-03-06-blogpost#2023-03-06-blogpost"/><published>2023-03-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-03-16-wurst</id><title>New Blogpost on the Meanings of Sausage in German </title><updated>2023-03-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new German blogpost appeared, discussing the word family of German <em>Wurst</em> "sausage" and its counterparts in some other languages. 
The blogpost can be found <a href="https://wub.hypotheses.org/1865">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-03-16-wurst#2023-03-16-wurst"/><published>2023-03-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-03-updates</id><title>New Concepticon Release and New Study Appeared</title><updated>2023-04-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>There are several pieces of news in different categories to share. First, a new version of the <a href="https://concepticon.clld.org">CLLD Concepticon</a> was published last week. The new version adds several new concept lists and is crucial for the upcoming new version of Lexibank.</p>
<p>Second, a paper written together with Hans Geisler appeared in the journal Moderna, titled "Of word families and language trees. New and old metaphors in studies on language history" (DOI: <a href="https://doi.org/10.19272/202201902005">10.19272/202201902005</a>). </p>
<blockquote>
<p>For a long time, metaphors have played an important role in depicting language history. In this study, we contrast early metaphors on language history, such as the family tree or the wave model, with recent metaphors that were popularized after the quantitative turn, such as forests of trees or phylogenetic networks. Speculating about metaphors which could become important in the future, we conclude that a vivid discussion about the usefulness and the concrete implications of metaphors plays a key role for the development of models for language history in historical linguistics.</p>
</blockquote>
<p>This study is unfortunately under closed access, as the open access fees seemed too high to us for a study providing merely a review, but a <a href="https://doi.org/10.17613/e5zx-1852">preprint</a> is freely available, and an author copy can be shared upon request.</p></div></content><link href="https://calclab.org/?news=2023-04-03-updates#2023-04-03-updates"/><published>2023-04-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-04-trimming</id><title>New Paper on Trimming Phonetic Alignments Accepted</title><updated>2023-04-04T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, we heard that a new study on the trimming of phonetic alignments to improve the inference of sound correspondence patterns was accepted to appear as part of the SIGTYP workshop organized as part of the EACL. 
The preprint of this study is now also available on <a href="https://doi.org/10.48550/arXiv.2303.17932">arXiv</a>.</p>
<blockquote>
<p>Sound correspondence patterns form the basis of cognate detection and phonological reconstruction in historical language comparison. Methods for the automatic inference of correspondence patterns from phonetically aligned cognate sets have been proposed, but their application to multilingual wordlists requires extremely well annotated datasets. Since annotation is tedious and time consuming, it would be desirable to find ways to improve aligned cognate data automatically. Taking inspiration from trimming techniques in evolutionary biology, which improve alignments by excluding problematic sites, we propose a workflow that trims phonetic alignments in comparative linguistics prior to the inference of correspondence patterns. Testing these techniques on a large standardized collection of ten datasets with expert annotations from different language families, we find that the best trimming technique substantially improves the overall consistency of the alignments. The results show a clear increase in the proportion of frequent correspondence patterns and words exhibiting regular cognate relations.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-04-04-trimming#2023-04-04-trimming"/><published>2023-04-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-05-talks</id><title>Two Upcoming Talks at 56th SLE Conference</title><updated>2023-04-05T12:00:00+00:00</updated><author><name>@JuarezRC</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Talks by myself and Frederic Blum were accepted at the 56th Annual Meeting of the <em>Societas Linguistica Europaea</em> to be held in Athens. Frederic's talk will be on <em>Re-examining the proposed genetic relationship(s) of Panoan and Tacanan</em> and I myself will present <em>Locative relations and valence extension: multifunctional locative markers in Mocoví (Guaycuruan, Argentina)</em>. The full abstracts will be posted soon <a href="https://societaslinguistica.eu/sle2023/programme/">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-04-05-talks#2023-04-05-talks"/><published>2023-04-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-11-harmony</id><title>One more Paper at the SIGTYP Workshop Accepted</title><updated>2023-04-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Another paper, submitted to the SIGTYP workshop organized as part of the EACL, was accepted last week. This study, joint work by Julius Steuer, Badr M. Abdullah, myself, and Dietrich Klakow,
investigates information-theoretic aspects of vowel harmony reflected in multilingual wordlists. A first version of this study is already available <a href="https://sigtyp.github.io/workshops/2023/sigtyp/papers/18_information_theoretic_characte.pdf">online</a>.</p>
<blockquote>
<p>We present a cross-linguistic study that aims to quantify vowel harmony using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have relied heavily on inflected word-forms in the analysis of vowel harmony. We instead train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists with a maximum of 1000 entries per language. Despite the fact that the data we employ are substantially smaller than previously used corpora, our experiments demonstrate the neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and offers new possibilities for future studies on low-resource, under-studied languages.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-04-11-harmony#2023-04-11-harmony"/><published>2023-04-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-11-new</id><title>One more Paper at the SIGTYP Workshop Accepted </title><updated>2023-04-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Another paper, submitted to the SIGTYP workshop organized as part of the EACL, was accepted last week. This study, joint work by Julius Steuer, Badr M. Abdullah, myself, and Dietrich Klakow, investigates information-theoretic aspects of vowel harmony reflected in multilingual wordlists. A first version of this study is already available <a href="https://sigtyp.github.io/workshops/2023/sigtyp/papers/18_information_theoretic_characte.pdf">online</a>.</p>
<blockquote>
<p>We present a cross-linguistic study that aims to quantify vowel harmony using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have relied heavily on inflected word-forms in the analysis of vowel harmony. We instead train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists with a maximum of 1000 entries per language. Despite the fact that the data we employ are substantially smaller than previously used corpora, our experiments demonstrate the neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and offers new possibilities for future studies on low-resource, under-studied languages.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-04-11-new#2023-04-11-new"/><published>2023-04-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-12-positions</id><title>New Positions in our ProduSemy Project from October 2023</title><updated>2023-04-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In our ERC project "Productive Signs: A computer-assisted investigation of evolutionary, typological, and cognitive aspects of word families", we offer three doctoral positions (3 years with a possible extension by one more year). The deadline for applications is May 20. 
One position on cognitive aspects of word families, more information can be found <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_I.pdf">here</a>.
One position on typological aspects of word families:
<a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_II.pdf">here</a>.
One position on evolutionary aspects of word families, more information can be found
<a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_III.pdf">here</a>. </p>
<p>Additionally, our Chair of Multilingual Computational Linguistics is offering a position for an Akademischer Rat (research assistant) for 3 years with the possibility of extension by 3 more years. We look for a candidate who can teach topics in Multilingual Computational Linguistics with a specific focus on machine learning and data management.
The deadline for applications is May 16; more information can be found <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_04_AR_Prof_List.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-04-12-positions#2023-04-12-positions"/><published>2023-04-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-14-blogpost</id><title>Blogpost on Language in Specific</title><updated>2023-04-14T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new German blog post appeared, titled "Von der Sprache im Speziellen", which you can find <a href="https://wub.hypotheses.org/1919">here</a>. In the post, I discuss how people often view language, and how this conflicts with the linguistic perspective.</p></div></content><link href="https://calclab.org/?news=2023-04-14-blogpost#2023-04-14-blogpost"/><published>2023-04-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-19-posts</id><title>Two Three-Year Post-Doc Positions in the ERC-Project ProduSemy</title><updated>2023-04-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two three-year post-doc positions are available in our ERC project <a href="https://doi.org/10.3030/715618">Productive Signs</a>, starting in October 2023. The deadline for applications is May 20, 2023. </p>
<p>One post-doc with a focus on the historical development of word families in the languages of the world:</p>
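<p><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_I.pdf">https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_I.pdf</a></p>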
<p>The other post-doc focuses on typological aspects of word families:</p>
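<p><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_II.pdf">https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_II.pdf</a></p>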
<p>English versions of the calls will also be published soon.</p></div></content><link href="https://calclab.org/?news=2023-04-19-posts#2023-04-19-posts"/><published>2023-04-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-21-positions</id><title>We are Hiring</title><updated>2023-04-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In our <a href="https://doi.org/10.3030/101044282">ProduSemy</a> project and with the Chair of Multilingual Computational
Linguistics at the University of Passau, there are several open positions for
which doctoral students and post-docs can apply. We are looking both for people
who are experienced in machine learning and computational linguistics as well
as for people experienced in comparative linguistics (linguistic typology and
historical linguistics) and cognitive linguistics and psycholinguistics. </p>
<p>Below is a summary for the positions we offer and where you can find more
information on how to apply. Since English calls are not yet available at the
moment, I kindly ask all those who do not speak German to contact me directly
via mcl-admin@uni-passau.de in order to get more information on the positions
and how to apply. </p>
<table>
<thead>
<tr>
<th>Position</th>
<th>Duration</th>
<th>Start Date</th>
<th>Speciality</th>
<th>Deadline</th>
<th>Link</th>
</tr>
</thead>
<tbody>
<tr>
<td>Doctoral Student</td>
<td>3+1 years</td>
<td>October 2023</td>
<td>psycholinguistics</td>
<td>May 20</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_I.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_I_ENG_Ri.pdf">EN</a></td>
</tr>
<tr>
<td>Doctoral Student</td>
<td>3+1 years</td>
<td>October 2023</td>
<td>typology</td>
<td>May 20</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_II.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_II_ENG_Ri.pdf">EN</a></td>
</tr>
<tr>
<td>Doctoral Student</td>
<td>3+1 years</td>
<td>October 2023</td>
<td>historical linguistics</td>
<td>May 20</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_III.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_WM_Prof_List_Projet_Productive_Signs_III_ENG_Ri.pdf">EN</a></td>
</tr>
<tr>
<td>Post-Doc</td>
<td>3+3 years</td>
<td>October 2023 or earlier</td>
<td>computational linguistics / machine learning</td>
<td>May 16</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_04_AR_Prof_List.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_04_AR_Prof_List_ENG.pdf">EN</a></td>
</tr>
<tr>
<td>Post-Doc</td>
<td>3 years</td>
<td>October 2023</td>
<td>historical linguistics</td>
<td>May 20</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_I.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_I_ENG_Ri.pdf">EN</a></td>
</tr>
<tr>
<td>Post-Doc</td>
<td>3 years</td>
<td>October 2023</td>
<td>typology</td>
<td>May 20</td>
<td><a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_II.pdf">DE</a> <a href="https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote/2023_10_Post_Doc_Prof_List_Projekt_Productive_Signs_I_ENG_Ri.pdf">EN</a></td>
</tr>
</tbody>
</table></div></content><link href="https://calclab.org/?news=2023-04-21-positions#2023-04-21-positions"/><published>2023-04-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-26-blogpost</id><title>New Blog Post on the Release of Concepticon 3.1</title><updated>2023-04-26T12:00:00+00:00</updated><author><name>M. van Zantwijk</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In this blog post, I provide an overview of the improvements we integrated into the newest version of our Concepticon resource: Concepticon 3.1. After describing the new lists we added to Concepticon 3.1, I illustrate how we refined the concept relations and mappings and show how we deal with potential inconsistencies by use of an example of one list that proved to be inconsistent. The blog post is available here: <a href="https://calc.hypotheses.org/5915">https://calc.hypotheses.org/5915</a>.</p></div></content><link href="https://calclab.org/?news=2023-04-26-blogpost#2023-04-26-blogpost"/><published>2023-04-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-04-27-colloquium</id><title>Invited Talk at Research Colloquium</title><updated>2023-04-27T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Tuesday, I gave an invited talk at the Current Topics in General Linguistics Colloquium organized by Kilu von Prince at Heinrich Heine University Düsseldorf. I presented a study on body-object colexifications that illustrates workflows based on Lexibank (<a href="https://doi.org/10.1038/s41597-022-01432-0">List et al. 2022</a>). 
The slides are available <a href="https://annikatjuka-talks.github.io/slides/tjuka2023-body-object-colexification-HHU-Colloquium.pdf">here</a>.</p></div></content><link href="https://calclab.org/?news=2023-04-27-colloquium#2023-04-27-colloquium"/><published>2023-04-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-05-08-newpapers</id><title>Three New Papers in the Context of EACL Appeared </title><updated>2023-05-08T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>In the context of the EACL conference, three new papers have now been published. One study with John Miller, which made it into the main conference, tests new methods for the detection of borrowings from dominant donor languages and can be found under the link <a href="https://aclanthology.org/2023.eacl-main.190/">https://aclanthology.org/2023.eacl-main.190/</a>. Two more papers appeared as part of the SIGTYP workshop organized by the special interest group for linguistic typology in NLP: One paper led by Julius Steuer tests new ways to investigate vowel harmony on wordlist data and can be found under the link <a href="https://aclanthology.org/2023.sigtyp-1.10">https://aclanthology.org/2023.sigtyp-1.10</a>. Another study by Frederic Blum and myself proposes a novel technique to handle phonetic alignments, which we call <em>trimming</em>. This study can be found under the link <a href="https://aclanthology.org/2023.sigtyp-1.6">https://aclanthology.org/2023.sigtyp-1.6</a>.</p></div></content><link href="https://calclab.org/?news=2023-05-08-newpapers#2023-05-08-newpapers"/><published>2023-05-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-05-15-blogpost</id><title>New German Blogpost </title><updated>2023-05-15T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blogpost in German appeared, in which I discuss certain aspects of language that we often think of as "natural" but that may turn out to be much harder to recognize as such if one approaches language from the perspective of a mind that does not yet know how to speak. The blogpost, titled "Von der Sprache im Allgemeinen", is available under the link <a href="https://wub.hypotheses.org/1935">https://wub.hypotheses.org/1935</a>.</p></div></content><link href="https://calclab.org/?news=2023-05-15-blogpost#2023-05-15-blogpost"/><published>2023-05-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-05-24-pw</id><title>TV Interview and New Paper Accepted</title><updated>2023-05-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, I was a guest on the German TV show Planet Wissen, discussing the origin and the future of human language. The video can be found <a href="https://www.planet-wissen.de/video-sprachwunder-mensch--macht-und-weiterentwicklung-der-kommunikation-100.html">here</a>.</p>
<p>Additionally, I learned that my paper on partial colexifications was accepted with Frontiers in Psychology and will soon appear online.</p></div></content><link href="https://calclab.org/?news=2023-05-24-pw#2023-05-24-pw"/><published>2023-05-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-05-25-datanote</id><title>Final Version of Data Note Published</title><updated>2023-05-25T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our data note on curating and extending lexical data has been published at Open Research Europe. The article is available here: <a href="https://doi.org/10.12688/openreseurope.15380.3">https://doi.org/10.12688/openreseurope.15380.3</a></p>
<p>We present the major release of <a href="https://concepticon.clld.org">Concepticon</a> 3.0 and <a href="https://norare.clld.org">NoRaRe</a> 1.0. The article describes the underlying data and methods for maintaining the two resources.</p></div></content><link href="https://calclab.org/?news=2023-05-25-datanote#2023-05-25-datanote"/><published>2023-05-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-06-16-colex</id><title>New Paper on Partial Colexifications </title><updated>2023-06-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new paper was published, presenting a new method on the inference of partial colexifications from multilingual wordlists (DOI: <a href="https://doi.org/10.3389/fpsyg.2023.1156540">10.3389/fpsyg.2023.1156540</a>). This is the first study to propose explicit methods to infer partial (as opposed to "full") colexifications from multilingual wordlists.</p>
<blockquote>
<p>The past years have seen a drastic rise in studies devoted to the investigation of colexification patterns in individual languages families in particular and the languages of the world in specific. Specifically computational studies have profited from the fact that colexification as a scientific construct is easy to operationalize, enabling scholars to infer colexification patterns for large collections of cross-linguistic data. Studies devoted to partial colexifications—colexification patterns that do not involve entire words, but rather various parts of words—, however, have been rarely conducted so far. This is not surprising, since partial colexifications are less easy to deal with in computational approaches and may easily suffer from all kinds of noise resulting from false positive matches. In order to address this problem, this study proposes new approaches to the handling of partial colexifications by (1) proposing new models with which partial colexification patterns can be represented, (2) developing new efficient methods and workflows which help to infer various types of partial colexification patterns from multilingual wordlists, and (3) illustrating how inferred patterns of partial colexifications can be computationally analyzed and interactively visualized.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-06-16-colex#2023-06-16-colex"/><published>2023-06-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-06-19-blogpost</id><title>New Blog Post </title><updated>2023-06-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new German blog post, titled "Wer hat Angst vorm Chatprogramm", appeared (see <a href="https://wub.hypotheses.org/1978">wub.hypotheses.org/1978</a>), in which I briefly discuss the potential implications of artificial intelligence and chat programs, as well as some potentially unfounded fears about them.</p></div></content><link href="https://calclab.org/?news=2023-06-19-blogpost#2023-06-19-blogpost"/><published>2023-06-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-06-21-oliveira</id><title>New Blog Post on a Dataset With Phonological Reconstructions in CLDF</title><updated>2023-06-21T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We present the digitization of a CLDF dataset that involves the reconstruction of Proto-Panoan in a new blogpost (Link: <a href="https://calc.hypotheses.org/6142">https://calc.hypotheses.org/6142</a>). We discuss the challenges that arise with the parsing of text-based data, and also highlight some potential future use cases for machine-readable data that involves phonological reconstructions. 
You can access the release of the dataset that we discuss either on <a href="https://github.com/pano-tacanan-history/oliveiraprotopanoan/releases/tag/v1.0.0">GitHub</a> or <a href="https://doi.org/10.5281/zenodo.8058801">Zenodo</a>.</p></div></content><link href="https://calclab.org/?news=2023-06-21-oliveira#2023-06-21-oliveira"/><published>2023-06-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-06-22-preprint</id><title>New Preprint </title><updated>2023-06-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new preprint by Yunfan Lai and myself appeared in Open Research Europe. Titled "Lexical data for the historical comparison of Rgyalrongic languages", it presents a database on Rgyalrongic languages that is in part coded for partial cognates (the article is available in open access, DOI: <a href="https://doi.org/10.12688/openreseurope.16017.1">10.12688/openreseurope.16017.1</a>).</p>
<blockquote>
<p>As one of the most morphologically conservative branches of the Sino-Tibetan language family, most of the Rgyalrongic languages are still understudied and poorly understood, not to mention their vulnerable or endangered status. It is therefore important for available data of these languages to be made accessible. The present lexical data sets provide comparative word lists of 20 modern and medieval Rgyalrongic languages, consisting of word lists from fieldwork carried out by the first author and other colleagues as well as published word lists by other authors. In particular, data of the two Khroskyabs varieties are collected by the first author from 2011 to 2016. Cognate identification is based on the authors' expertise in Rgyalrong historical linguistics through the neogrammarian comparative method. We curated the data by conducting phonemic segmentation and partial cognate annotation. The data sets can be used by historical linguists interested in the etymology and the phylogeny of the languages in question, and they can use them to answer questions regarding individual word histories or the subgrouping of languages in this important branch of Sino-Tibetan.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-06-22-preprint#2023-06-22-preprint"/><published>2023-06-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-07-13-paper</id><title>New Paper Appeared </title><updated>2023-07-13T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I just found out that a review study that I wrote for a book project published with Springer has now appeared online. The study, titled "Evolutionary Aspects of Language Change" discusses some parallels and differences between language change and biological evolution. Unfortunately, it is not available in open access (DOI: <a href="https://doi.org/10.1007/978-3-031-33358-3_6">10.1007/978-3-031-33358-3_6</a>), but a preprint is available via Humanities Commons (DOI: <a href="https://doi.org/10.17613/ebas-hj26">10.17613/ebas-hj26</a>).</p></div></content><link href="https://calclab.org/?news=2023-07-13-paper#2023-07-13-paper"/><published>2023-07-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-07-19-zitieren</id><title>New Blogpost </title><updated>2023-07-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, I published a new German blog post, this time discussing how to quote in the humanities, arguing that we need a new debate on citation practice, given the influence of social media and preprint archives on our work (<a href="https://wub.hypotheses.org/2015">Vom grauen Zitieren</a>).</p></div></content><link href="https://calclab.org/?news=2023-07-19-zitieren#2023-07-19-zitieren"/><published>2023-07-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-08-02-blog</id><title>New Blogpost </title><updated>2023-08-02T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already last week, we published a new blog post with Zhenyang Liu, Guillaume Jacques, and myself, presenting a new comparative wordlist of Newari, one of the few Sino-Tibetan languages with a long written tradition (<a href="https://calc.hypotheses.org/6269">Creating a Standardized Comparative Wordlist of Newari Varieties</a>).</p></div></content><link href="https://calclab.org/?news=2023-08-02-blog#2023-08-02-blog"/><published>2023-08-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-08-17-blog</id><title>New Blogpost </title><updated>2023-08-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new blog post by Abbie Hantgan and myself appeared, introducing a standardized CLDF wordlist, created from the Dogon Comparative Wordlist by <a href="https://dogonlanguages.org">Heath et al. 2016</a> (<a href="https://calc.hypotheses.org/6329">Creating a CLDF Wordlist from Heath et al.'s Dogon Comparative Wordlist</a>).</p></div></content><link href="https://calclab.org/?news=2023-08-17-blog#2023-08-17-blog"/><published>2023-08-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-08-22-blog</id><title>New Blogpost </title><updated>2023-08-22T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new blog post in German appeared, discussing scientific practice in the context of referencing and structuring documents ("Strukturprobleme", <a href="https://wub.hypotheses.org/2078">URL</a>).</p></div></content><link href="https://calclab.org/?news=2023-08-22-blog#2023-08-22-blog"/><published>2023-08-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-09-05-award</id><title>Best Presentation Award</title><updated>2023-09-05T12:00:00+00:00</updated><author><name>C. Juarez</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I am happy to share that my presentation "Locative relations and valence extension: multifunctional markers in Mocoví" received the second prize for best conference paper by starting postdoctoral researchers at the recent <em>56th Annual Meeting of the Societas Linguistica Europaea</em>.
This achievement would not have been possible without the kind support of our CALC group and the Linguistic and Cultural Evolution Department at MPI-EVA. </p></div></content><link href="https://calclab.org/?news=2023-09-05-award#2023-09-05-award"/><published>2023-09-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-09-11-iclc</id><title>Focus Stream on Productive Signs at ICL 2024</title><updated>2023-09-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A first call for papers has been announced for the Focus Stream on "Productive Signs: Evolutionary, Typological, and Cognitive Dimensions of Word Families", organized as part of the 24th International Congress of Linguists, taking place in Poznań from September 8 to 14, 2024.</p>
<p>The call can be found here: <a href="https://linguistlist.org/issues/34-2666/">https://linguistlist.org/issues/34-2666/</a></p>
<p>An abstract of the call can be found here: <a href="https://ciplnet.com/wp-content/uploads/2023/07/FS-10-Productive-signs.pdf">https://ciplnet.com/wp-content/uploads/2023/07/FS-10-Productive-signs.pdf</a></p></div></content><link href="https://calclab.org/?news=2023-09-11-iclc#2023-09-11-iclc"/><published>2023-09-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-09-15-blog</id><title>New Blogpost </title><updated>2023-09-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in German appeared, discussing scientific practice in the context of what may be classified as "questionable research practices" (<em>Etiquetten</em>, <a href="https://wub.hypotheses.org/2091">URL</a>).</p></div></content><link href="https://calclab.org/?news=2023-09-15-blog#2023-09-15-blog"/><published>2023-09-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-09-27-blog</id><title>New Blogpost on Orthography Profiles</title><updated>2023-09-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in our tutorial blog appeared.
The post presents an implementation of the orthography profile algorithm in JavaScript (<em>Sequence Manipulation with Orthography Profiles in JavaScript</em>, <a href="https://calc.hypotheses.org/6361">URL</a>).</p></div></content><link href="https://calclab.org/?news=2023-09-27-blog#2023-09-27-blog"/><published>2023-09-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-10-09-welcome</id><title>Welcoming New Team Members</title><updated>2023-10-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week on Monday, five new members joined our team: Dr. Kellen Parker van Dam as a chair assistant, Dr. Anna Di Natale and Dr. Matthias Pache as post-docs in our ProduSemy project, and Katja Bocklage and Arne Rubehn as doctoral students in the same project. We hope that all members will like the research atmosphere at our chair in particular and at the University of Passau in general, and we look forward to fruitful collaboration.</p></div></content><link href="https://calclab.org/?news=2023-10-09-welcome#2023-10-09-welcome"/><published>2023-10-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-10-16-papers</id><title>New Accepted Papers </title><updated>2023-10-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two more papers have been accepted. First, a paper by Cormac Anderson et al., in which we measure variation in phoneme inventories, was accepted by the Journal of Language Evolution. Second, a paper by Yunfan Lai and myself presenting a database for Rgyalrongic languages was accepted by Open Research Europe. 
We hope that preprints and final versions of both papers will appear soon.</p></div></content><link href="https://calclab.org/?news=2023-10-16-papers#2023-10-16-papers"/><published>2023-10-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-10-19-papers</id><title>New Paper Published </title><updated>2023-10-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper on Rgyalrongic languages with Yunfan Lai has now been published in a second version with Open Research Europe (DOI: <a href="https://doi.org/10.12688/openreseurope.16017.2">10.12688/openreseurope.16017.2</a>).</p>
<blockquote>
<p>As one of the most morphologically conservative branches of the Sino-Tibetan language family, most of the Rgyalrongic languages are still understudied and poorly understood, not to mention their vulnerable or endangered status. It is therefore important for available data of these languages to be made accessible. The lexical data sets the authors have assembled provide comparative word lists of 20 modern and medieval Rgyalrongic languages, consisting of word lists from fieldwork carried out by the first author and other colleagues as well as published word lists by other authors. In particular, data of the two Khroskyabs varieties were collected by the first author from 2011 to 2016. Cognate identification is based on the authors' expertise in Rgyalrong historical linguistics through application of the comparative method. We curated the data by conducting phonemic segmentation and partial cognate annotation. The data sets can be used by historical linguists interested in the etymology and the phylogeny of the languages in question, and they can use them to answer questions regarding individual word histories or the subgrouping of languages in this important branch of Sino-Tibetan.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-10-19-papers#2023-10-19-papers"/><published>2023-10-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-10-27-paper</id><title>New Paper Accepted </title><updated>2023-10-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new study was accepted (joint work with Nathan W. Hill, Robert Forkel, and Frederic Blum), titled "Representing and computing uncertainty in phonological reconstruction". </p>
<blockquote>
<p>Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms. With the increasing success of recently proposed approaches to automating certain aspects of the traditional comparative method, the formal representation of proto-forms has also improved. This formalization makes it possible to address both the representation and the computation of uncertainty. Building on recent advances in supervised phonological reconstruction, during which an algorithm learns how to reconstruct words in a given proto-language relying on previously annotated data, and inspired by improved methods for automated word prediction from cognate sets, we present a new framework that allows for the representation of uncertainty in linguistic reconstruction and also includes a workflow for the computation of fuzzy reconstructions from linguistic data.</p>
</blockquote>
<p>A preprint of this paper, which will appear in December 2023, is now available from arXiv (DOI: <a href="https://doi.org/10.48550/arXiv.2310.12727">10.48550/arXiv.2310.12727</a>).</p></div></content><link href="https://calclab.org/?news=2023-10-27-paper#2023-10-27-paper"/><published>2023-10-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-10-31-blog</id><title>New Blog Posts </title><updated>2023-10-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new blog posts were published in the past days. First, one blog post in German, discussing the role that preprints play nowadays ("Vom Vordrucken" <a href="https://wub.hypotheses.org/2111">https://wub.hypotheses.org/2111</a>). Second, a blog post with Olena Shcherbakova devoted to the investigation of taste colexifications ("Retrieving and Analyzing Taste Colexifications from Lexibank" <a href="https://calc.hypotheses.org/6398">https://calc.hypotheses.org/6398</a>).</p></div></content><link href="https://calclab.org/?news=2023-10-31-blog#2023-10-31-blog"/><published>2023-10-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-11-10-blog</id><title>Blog</title><updated>2023-11-10T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post was published on Wednesday, discussing the role that typesetting plays in scientific work ("Auch das Auge liest mit", <a href="https://wub.hypotheses.org/2134">https://wub.hypotheses.org/2134</a>). </p></div></content><link href="https://calclab.org/?news=2023-11-10-blog#2023-11-10-blog"/><published>2023-11-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-11-16-blog</id><title>New Blog Post </title><updated>2023-11-16T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post was published on Wednesday, presenting how transcription systems are modeled in the Cross-Linguistic Transcription Systems reference catalog ("Parsing IPA Transcriptions with CLTS", <a href="https://calc.hypotheses.org/6546">https://calc.hypotheses.org/6546</a>). </p></div></content><link href="https://calclab.org/?news=2023-11-16-blog#2023-11-16-blog"/><published>2023-11-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-11-20-problems</id><title>New Article Preprint </title><updated>2023-11-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new article just appeared in preprint with Open Research Europe, discussing "Open Problems in Computational Historical Linguistics" (DOI: <a href="https://doi.org/10.12688/openreseurope.16804.1">0.12688/openreseurope.16804.1</a>). </p>
<blockquote>
<p>Problems constitute the starting point of all scientific research. The essay reflects on the different kinds of problems that scientists address in their research and discusses a list of 10 problems for the field of computational historical linguistics, that was proposed throughout 2019 in a series of blog posts. In contrast to problems identified in different contexts, these problems were considered to be solvable, but no solution could be proposed back then. By discussing the problems in the light of developments that have been made in the field during the past five years, a modified list is proposed that takes new insights into account but also finds that the majority of the problems has not yet been solved.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-11-20-problems#2023-11-20-problems"/><published>2023-11-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-11-25-article</id><title>New Article on Phoneme Inventories Published </title><updated>2023-11-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new article has just appeared in the Journal of Language Evolution (joint work with Cormac Anderson, Tiago Tresoldi, Robert Forkel, Simon Greenhill, and Russell Gray, DOI: <a href="https://doi.org/10.1093/jole/lzad011">10.1093/jole/lzad011</a>). </p>
<blockquote>
<p>For over a century, the phoneme has played a central role in linguistic research. In recent years, collections of phoneme inventories, originally designed for cross-linguistic purposes, have increasingly been used in comparative studies involving neighbouring disciplines. Despite the extended application of this type of data, there has been no research into its comparability or tests of its reliability. In this study, we carry out a systematic comparison of nine popular phoneme inventory collections. We render them comparable by linking them to standardised formats for the handling of cross-linguistic datasets, develop new measures to test both size and similarity, and release the organised data in supplementary material. We find considerable differences in inventories supposedly representing the same language variety, both in terms of size and transcriptional choices. While some of these differences appear to be predictable, reflecting design decisions in the different collections, much of the observed variation is unsystematic. These results should sound a note of caution for comparative studies based on phoneme inventories, which we suggest need to take the question of comparability more seriously. We make a number of proposals for improving the comparability of phoneme inventories.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-11-25-article#2023-11-25-article"/><published>2023-11-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-12-05-article</id><title>New Article on Uncertainty in Linguistic Reconstruction </title><updated>2023-12-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new article has just appeared in the proceedings of the workshop on language change, organized as part of the EMNLP conference in Singapore this year (with Nathan W. Hill, Robert Forkel, and Frederic Blum, URL: <a href="https://aclanthology.org/2023.lchange-1.3/">https://aclanthology.org/2023.lchange-1.3/</a>). </p>
<blockquote>
<p>Despite the inherently fuzzy nature of reconstructions in historical linguistics, most scholars do not represent their uncertainty when proposing proto-forms. With the increasing success of recently proposed approaches to automating certain aspects of the traditional comparative method, the formal representation of proto-forms has also improved. This formalization makes it possible to address both the representation and the computation of uncertainty. Building on recent advances in supervised phonological reconstruction, during which an algorithm learns how to reconstruct words in a given proto-language relying on previously annotated data, and inspired by improved methods for automated word prediction from cognate sets, we present a new framework that allows for the representation of uncertainty in linguistic reconstruction and also includes a workflow for the computation of fuzzy reconstructions from linguistic data.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2023-12-05-article#2023-12-05-article"/><published>2023-12-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2023-12-21-blog</id><title>Final Blog Post for the Year </title><updated>2023-12-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two days ago, my final blog post for the year was published. This time it was very short, discussing tones in Mandarin Chinese and how difficult it is to learn them ("Von pferdeschimpfenden Müttern", URL:https://wub.hypotheses.org/2159).</p></div></content><link href="https://calclab.org/?news=2023-12-21-blog#2023-12-21-blog"/><published>2023-12-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-01-02-preprint</id><title>New Preprint on Body Part Colexification Study</title><updated>2024-01-02T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Just before the turn of the new year, we submitted our study on body part colexifications. The study presents the first large-scale analysis of body part vocabularies across 1,028 languages. A preprint is available on PsyArXiv: <a href="https://osf.io/preprints/psyarxiv/tu74k">https://osf.io/preprints/psyarxiv/tu74k</a></p></div></content><link href="https://calclab.org/?news=2024-01-02-preprint#2024-01-02-preprint"/><published>2024-01-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-01-17-calcip</id><title>Computer-Assisted Language Comparison in Practice </title><updated>2024-01-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The blog "Computer-Assisted Language Comparison in Practice" has been posting various tutorials and small articles on linguistic data since 2018. 
From 2024 on, the blog will also be available as a journal. The journal "Computer-Assisted Language Comparison in Practice" (available at <a href="https://ojs3.uni-passau.de/index.php/calcip/index">https://ojs3.uni-passau.de/index.php/calcip/index</a>) will offer digital object identifiers and PDF versions of all blog contributions. More information can be found in an editorial post (together with Annika Tjuka) in the blog (URL: <a href="https://calc.hypotheses.org/6651">https://calc.hypotheses.org/6651</a>) and the new journal (DOI: <a href="https://doi.org/10.15475/calcip.2024.1.1">10.15475/calcip.2024.1.1</a>).</p></div></content><link href="https://calclab.org/?news=2024-01-17-calcip#2024-01-17-calcip"/><published>2024-01-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-01-18-paper</id><title>New dataset paper published</title><updated>2024-01-18T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today the paper "A comparative wordlist for investigating distant relations among languages in Lowland South America" appeared in <em>Scientific Data</em>. In this paper, we describe a new <a href="A comparative wordlist for investigating distant relations among languages in Lowland South America">dataset</a> on Panoan, Tacanan, and four other languages that have been claimed to be related to the former. We summarize the state-of-the-art of wordlist annotation and show how such CLDF datasets can easily be linked to others, such as Grambank. The article is available under the following DOI: <a href="https://doi.org/10.1038/s41597-024-02928-7">10.1038/s41597-024-02928-7</a></p></div></content><link href="https://calclab.org/?news=2024-01-18-paper#2024-01-18-paper"/><published>2024-01-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-01-22-blog</id><title>Anchor Points of Trust </title><updated>2024-01-22T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my German blog for January appeared online, discussing "Ankerpunkte des Vertrauens in den Fluten digitaler Information" (<a href="https://wub.hypotheses.org/2217">URL</a>).</p></div></content><link href="https://calclab.org/?news=2024-01-22-blog#2024-01-22-blog"/><published>2024-01-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-01-26-preprint</id><title>New Preprint on Productive Signs </title><updated>2024-01-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint is now available with Humanities Commons, presenting some new ideas regarding the handling of word families in computer-assisted language comparison. The study, titled "Productive Signs: Towards a Computer-Assisted Analysis of Evolutionary, Typological, and Cognitive Dimensions of Word Families" can be accessed at <a href="https://doi.org/10.17613/zfwr-sn25">10.17613/zfwr-sn25</a>.</p></div></content><link href="https://calclab.org/?news=2024-01-26-preprint#2024-01-26-preprint"/><published>2024-01-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-06-papers</id><title>New accepted papers </title><updated>2024-02-06T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new papers that have recently been accepted for publication (as part of the SIGTYP 2024 workshop), are now available as preprints. </p>
<p>The first paper is a study by Jessica Nieder and myself in which we present a new proposal to model mutual intelligibility computationally.</p>
<blockquote>
<p>Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach does not only offer new methodological findings for automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings. </p>
</blockquote>
<p>The preprint is available on <a href="https://arxiv.org/abs/2402.02915">arXiv</a>.</p>
<p>The second paper, written with Luise Häuser, Gerhard Jäger, Taraka Rama, and Alexandros Stamatakis, discusses how well sound correspondences work in phylogenetic reconstruction:</p>
<blockquote>
<p>In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences. </p>
</blockquote>
<p>The preprint is available on <a href="https://arxiv.org/abs/2402.02807">arXiv</a>. </p></div></content><link href="https://calclab.org/?news=2024-02-06-papers#2024-02-06-papers"/><published>2024-02-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-15-article</id><title>New accepted paper </title><updated>2024-02-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My essay on <a href="https://open-research-europe.ec.europa.eu/articles/3-201/v1">Open problems in computational historical linguistics</a> has now been accepted by the journal Open Research Europe. I'll now have to respond to the four reviews and incorporate their comments into the final version.</p></div></content><link href="https://calclab.org/?news=2024-02-15-article#2024-02-15-article"/><published>2024-02-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-19-blog</id><title>New Blog Post on Visualizing Networks</title><updated>2024-02-19T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my blog post on how to visualize colexification networks in <a href="https://cytoscape.org/">Cytoscape</a> appeared. It is a tutorial for beginners who want to become familiar with the tool and learn how to get started. The post is available here: <a href="https://calc.hypotheses.org/6697">https://calc.hypotheses.org/6697</a></p></div></content><link href="https://calclab.org/?news=2024-02-19-blog#2024-02-19-blog"/><published>2024-02-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-20-grouping</id><title>New Preprint</title><updated>2024-02-20T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint with Frederic Blum, Nathan Hill, and Cristian Juárez appeared online with Open Research Europe, awaiting open peer review. The study is titled "Grouping sounds into evolving units for the purpose of historical language comparison" (DOI: <a href="https://doi.org/10.12688/openreseurope.16839.1">10.12688/openreseurope.16839.1</a>.</p>
<blockquote>
<p>Computer-assisted approaches to historical language comparison have made great progress during the past two decades. Scholars can now routinely use computational tools to annotate cognate sets, align words, and search for regularly recurring sound correspondences. However, computational approaches still suffer from a very rigid sequence model of the form part of the linguistic sign, in which words and morphemes are segmented into fixed sound units which cannot be modified. In order to bring the representation of sound sequences in computational historical linguistics closer to the research practice of scholars who apply the traditional comparative method, we introduce improved sound sequence representations in which individual sound segments can be grouped into evolving sound units in order to capture language-specific sound laws more efficiently. We illustrate the usefulness of this enhanced representation of sound sequences in concrete examples and complement it by providing a small software library that allows scholars to convert their data from forms segmented into sound units to forms segmented into evolving sound units and vice versa.</p>
</blockquote>
<p>Additionally, we were informed that our paper (with Robert Forkel and Guillaume Ségerer), titled "Linguistic Survey of India and Polyglotta Africana: Two Retrostandardized Digital Editions of Large Historical Collections of Multilingual Wordlists", was accepted for the LREC-COLING conference in Torino in May. </p></div></content><link href="https://calclab.org/?news=2024-02-20-grouping#2024-02-20-grouping"/><published>2024-02-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-21-draft</id><title>New Preprint</title><updated>2024-02-21T12:00:00+00:00</updated><author><name>M. Pulini</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>During the weekend, we submitted a new preprint (with Johann-Mattis List), which is now available with Humanities Commons, titled "Finding language-internal cognates in Old Chinese" (DOI: <a href="https://doi.org/10.17613/ftm2-3b58">10.17613/ftm2-3b58</a>). </p>
<blockquote>
<p>The investigation of language-internal cognates and word families in Chinese plays a central role in enhancing our understanding of Old Chinese phonology and morphology, as well as constituting a key element for fostering our knowledge of the history of Sino-Tibetan languages. Here we provide an overview of common challenges encountered when searching for language-internal cognates in Old Chinese. We identify three major problems in this endeavor. An epistemological problem arises from varying definitions of word families among scholars, a heuristic problem results from the scarcity of shared workflows for word family identification, and a representation problem follows from the absence of standards for data handling and analysis. While ultimate solutions remain elusive, three suggestions are proposed to enhance future research on Chinese word families. These include advocating for a stricter separation between words and their written representations in Chinese characters, investing time and collaborative efforts in establishing consistent annotation schemes for Chinese word families, and promoting the integration and standardization of data from neighboring languages.</p>
</blockquote>
<p>Additionally, we were notified today that our paper titled "First Steps Towards the Integration of Resources on Historical Glossing Traditions in the History of Chinese: A Collection of Standardized Fǎnqiè Spellings from the Guǎngyùn" (also with Johann-Mattis List) was accepted for the LREC-COLING conference. </p></div></content><link href="https://calclab.org/?news=2024-02-21-draft#2024-02-21-draft"/><published>2024-02-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-02-27-dhd</id><title>New Conference Paper </title><updated>2024-02-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our conference paper (by Robert Forkel and myself), submitted to the DHd 2024 conference, co-organized by the Chair of Multilingual Computational Linguistics, has now appeared online. In this paper, titled "Cross-Linguistic Data Formats (CLDF): D'où Venons Nous? Que Sommes Nous? Où Allons Nous?" (DOI: <a href="https://doi.org/10.5281/zenodo.10698325">10.5281/zenodo.10698325</a>), we present how the Cross-Linguistic Data Formats were established and how we think they could develop further in the coming years. </p>
<blockquote>
<p>For more than ten years now, in collaboration with a large number of researchers in comparative linguistics, we have been developing the so-called Cross-Linguistic Data Formats (CLDF), a collection of standards based on tabular data formats that serves to prepare the large body of knowledge that linguistic research has unlocked over the past 200 years in such a way that it can be systematically aggregated, integrated with other datasets, and transparently analyzed. Despite initial difficulties, our efforts have proven very successful, even though some things that we first thought would be easy to realize have turned out to be extremely complicated. Already today, the largest lexical and typological-grammatical collections of language data are available in CLDF, and no end is in sight so far. In our study, we present how CLDF became what it is today and where we see the standard formats in the future.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-02-27-dhd#2024-02-27-dhd"/><published>2024-02-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-03-misol</id><title>New Preprint and Software Tool </title><updated>2024-03-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I deposited a new preprint presenting a method that models sound change in ordered layers of simultaneous sound laws (DOI: <a href="https://doi.org/10.17613/4n5z-9y52">10.17613/4n5z-9y52</a>).</p>
<blockquote>
<p>In historical linguistics, sound change is typically modeled with the help of linearly arranged replacement rules that scan over an input sequence in a fixed order, converting the initial sequence in an iterative manner until all sound laws are exhausted. Arguing that this model of cascades of sound laws falls short in many regards, this study proposes a new model of sound change in which sound laws are grouped into linearly arranged layers in which sound change proceeds simultaneously. Illustrating how this model can be implemented with the help of an open, web-based tool, several examples are shown to prove the usefulness of the new model.</p>
</blockquote>
<p>The tool presented in this study has also been published in a first stable version (Version 0.2) and can be accessed at <a href="https://misol.edictor.org">https://misol.edictor.org</a>.</p></div></content><link href="https://calclab.org/?news=2024-03-03-misol#2024-03-03-misol"/><published>2024-03-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-04-preprint</id><title>New Preprint</title><updated>2024-03-04T12:00:00+00:00</updated><author><name>K. Bocklage</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint (currently under review), titled "Directional Tendencies in Semantic Change", is now available from Humanities Commons (DOI: <a href="https://doi.org/10.17613/0y0r-f341">10.17613/0y0r-f341</a>). We investigate to what degree semantic motivation patterns in word formation reflect patterns in semantic change (joint work with Anna Di Natale, Annika Tjuka, and Johann-Mattis List).</p>
<blockquote>
<p>Due to its complexity, scholars have often hesitated to establish relative chronologies in semantic change. While occasionally researchers postulated universal directions of semantic change, concrete proposals for inferring or estimating these from lexical data are rare. According to one hypothesis, however, cross-linguistic directional tendencies in semantic motivation underlying word formation could provide direct hints regarding directional tendencies of semantic change. We revisit this idea using new data from independent sources and new methods for data analysis and exploration. Our results show that there is only a small overlap with respect to concrete processes of semantic change and concrete processes of semantic motivation. For this small overlap, however, we find positive correlations when comparing weight ratios of semantic change and semantic motivation data, and we also receive precision values exceeding 0.5 when trying to predict directional tendencies of semantic change from semantic motivation patterns. This indicates that, while semantic change and semantic motivation are generally distinct processes, there are certain cross-linguistic tendencies in semantic motivation that can provide hints regarding directional tendencies of semantic change.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-03-04-preprint#2024-03-04-preprint"/><published>2024-03-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-05-paper</id><title>Article Published in Cognitive Science</title><updated>2024-03-05T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Together with my colleague Yoolim Kim, I published an article entitled "Cognitive Science From the Perspective of Linguistic Diversity" in Cognitive Science. The article is part of the letter series "Progress &amp; Puzzles of Cognitive Science". We address the question of the comparability of word meanings in different languages and the neglect of an integrated approach to writing systems. The article is available here: <a href="https://doi.org/10.1111/cogs.13418">https://doi.org/10.1111/cogs.13418</a></p></div></content><link href="https://calclab.org/?news=2024-03-05-paper#2024-03-05-paper"/><published>2024-03-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-08-evobib</id><title>New Release of EvoBib </title><updated>2024-03-08T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I released a new version of <a href="https://digling.org/evobib">EvoBib</a> (Version 1.7.0). Since its last release in November 2022, the database has been extended further by adding several hundred quotes and also expanding the literature.</p></div></content><link href="https://calclab.org/?news=2024-03-08-evobib#2024-03-08-evobib"/><published>2024-03-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-19-papers</id><title>New Papers </title><updated>2024-03-19T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new papers have now been officially published as part of the SIGTYP workshop in Malta.</p>
<p>First, there is a paper by Jessica Nieder and myself, discussing mutual intelligibility and proposing a way to model it computationally (URL: <a href="https://aclanthology.org/2024.sigtyp-1.4/">https://aclanthology.org/2024.sigtyp-1.4/</a>).</p>
<blockquote>
<p>Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model’s comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach does not only offer new methodological findings for automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.</p>
</blockquote>
<p>Then there is a paper by Luise Häuser, Gerhard Jäger, Taraka Rama, myself, and Alexandros Stamatakis, discussing the usefulness of phylogenetic reconstruction with sound correspondences (URL: <a href="https://aclanthology.org/2024.sigtyp-1.11/">https://aclanthology.org/2024.sigtyp-1.11/</a>).</p>
<blockquote>
<p>In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-03-19-papers#2024-03-19-papers"/><published>2024-03-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-20-release</id><title>Concepticon Release Version 3.2</title><updated>2024-03-20T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We released a new version of <a href="https://concepticon.clld.org/">Concepticon</a>. Version 3.2 contains 17 new concept lists and improvements to concept mappings. We also included an updated format for the representation of concept lists containing networks so that they are handled more uniformly. The CLDF dataset of Concepticon v3.2 is available here: <a href="https://github.com/concepticon/concepticon-cldf/tree/v3.2.0">https://github.com/concepticon/concepticon-cldf/tree/v3.2.0</a>. </p></div></content><link href="https://calclab.org/?news=2024-03-20-release#2024-03-20-release"/><published>2024-03-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-22-fanqie</id><title>New Paper on Fǎnqiè Spellings</title><updated>2024-03-22T12:00:00+00:00</updated><author><name>M. Pulini</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new paper, titled "First Steps Towards the Integration of Resources on Historical Glossing Traditions in the History of Chinese: A Collection of Standardized Fǎnqiè Spellings from the Guǎngyùn" was published as a preprint with Humanities Commons (DOI: <a href="https://doi.org/10.17613/q3yt-pd95">10.17613/q3yt-pd95</a>). It will appear in the proceedings of the LREC-COLING conference in Torino in May. </p>
<blockquote>
<p>Due to the peculiar nature of the Chinese writing system, it is difficult to assess the pronunciation of historical varieties of Chinese. In order to reconstruct ancient pronunciations, historical glossing practices play a crucial role. However, although studied thoroughly by numerous scholars, most research has been carried out in a qualitative manner, and no attempt at providing integrated resources of historical glossing practices has been made so far. Here, we present a first step towards the integration of resources on historical glossing traditions in the history of Chinese. Our starting point are so-called fǎnqiè spellings in the Guǎngyùn, one of the early rhyme books in the history of Chinese, providing pronunciations for more than 20000 Chinese characters. By standardizing digital versions of the resource using tools from computational historical linguistics, we show that we can predict historical spellings with high precision and at the same time shed light on the precision of ancient glossing practices. Although a considerably small first step, our resource could be the starting point for an integrated, standardized collection that could ultimately shed new light on the history of Chinese.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-03-22-fanqie#2024-03-22-fanqie"/><published>2024-03-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-27-blogpost</id><title>New Blog Post </title><updated>2024-03-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My blog post for March has now been published, this time dealing with literature for children and rhyme patterns: <a href="https://wub.hypotheses.org/2307">Von Falschen Rhymen</a>.</p></div></content><link href="https://calclab.org/?news=2024-03-27-blogpost#2024-03-27-blogpost"/><published>2024-03-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-03-28-preprint</id><title>New Preprint for Study on Partial Body-Object Colexifications</title><updated>2024-03-28T12:00:00+00:00</updated><author><name>A. Tjuka</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We submitted a paper entitled "Partial Colexifications Reveal Directional Tendencies in Object Naming". The study represents the first cross-linguistic investigation of partial colexifications between body and object concepts. We address the question of how meaning is extended between two concrete domains. The preprint is available here: <a href="https://doi.org/10.31234/osf.io/hc3j5">https://doi.org/10.31234/osf.io/hc3j5</a></p></div></content><link href="https://calclab.org/?news=2024-03-28-preprint#2024-03-28-preprint"/><published>2024-03-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-04-09-papers</id><title>New Accepted Papers and New Team Members</title><updated>2024-04-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This week, we welcome three new members to our team. 
First, Christian Bentz joined the chair as an independent research group leader with his ERC Starting Grant project <a href="https://www.erc-evine.de">EVINE</a>. Then, Alžběta Kučerová joined our ProduSemy project as a PhD student, and finally, David Snee is now enrolled with us as an independent PhD student (he will join us officially as a member of the ProduSemy project in October).</p>
<p>We also received notification that two more papers have been accepted. Our study on cognates in Chinese with Michele Pulini was accepted with the Bulletin of Chinese Linguistics, and my review study on Productive Signs was accepted for the edited volume accompanying the focus streams of the International Conference of Linguists in September 2024. </p></div></content><link href="https://calclab.org/?news=2024-04-09-papers#2024-04-09-papers"/><published>2024-04-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-04-24-blogpost</id><title>New Blog Post online</title><updated>2024-04-24T12:00:00+00:00</updated><author><name>K. Bocklage</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, our new blog post about the representation of Zalizniak et al.'s (2024) Catalogue of Semantic Shifts in CLDF appeared. We explain step by step how we converted the data into the standardized format. You can find it <a href="https://calc.hypotheses.org/7060">here</a> (DOI: <a href="https://doi.org/10.15475/calcip.2024.1.4">10.15475/calcip.2024.1.4</a>). </p></div></content><link href="https://calclab.org/?news=2024-04-24-blogpost#2024-04-24-blogpost"/><published>2024-04-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-04-29-blogpost</id><title>New Blog Post </title><updated>2024-04-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My blog post for April has now been published, this time dealing with deadlines: <a href="https://wub.hypotheses.org/2345">Von toten Linien</a>.</p></div></content><link href="https://calclab.org/?news=2024-04-29-blogpost#2024-04-29-blogpost"/><published>2024-04-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-08-bodyparts</id><title>New Study on Body Part Semantics Appeared </title><updated>2024-05-08T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our study on body part semantics by Annika Tjuka, Robert Forkel, and myself has now appeared online with Scientific Reports (URL <a href="https://www.nature.com/articles/s41598-024-61140-0">here</a>).</p>
<blockquote>
<p>Every human has a body. Yet, languages differ in how they divide the body into parts to name them. While universal naming strategies exist, there is also variation in the vocabularies of body parts across languages. In this study, we investigate the similarities and differences in naming two separate body parts with one word, i.e., colexifications. We use a computational approach to create networks of body part vocabularies across languages. The analyses focus on body part networks in large language families, on perceptual features that lead to colexifications of body parts, and on a comparison of network structures in different semantic domains. Our results show that adjacent body parts are colexified frequently. However, preferences for perceptual features such as shape and function lead to variations in body part vocabularies. In addition, body part colexification networks are less varied across language families than networks in the semantic domains of emotion and colour. The study presents the first large-scale comparison of body part vocabularies in 1,028 language varieties and provides important insights into the variability of a universal human domain.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-05-08-bodyparts#2024-05-08-bodyparts"/><published>2024-05-08T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-10-preprint</id><title>New Preprint</title><updated>2024-05-10T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper "Generating Feature Vectors from Phonetic Transcriptions in Cross-Linguistic Data Formats" (with Jessica Nieder, Robert Forkel, and Johann-Mattis List) has recently been accepted for the <a href="https://sites.uci.edu/scil2024/">SCiL 2024 conference</a>. The preprint is now available on arXiv: <a href="https://doi.org/10.48550/arXiv.2405.04271">https://doi.org/10.48550/arXiv.2405.04271</a></p>
<blockquote>
<p>When comparing speech sounds across languages, scholars often make use of feature representations of individual sounds in order to determine fine-grained sound similarities. Although binary feature systems for large numbers of speech sounds have been proposed, large-scale computational applications often face the challenges that the proposed feature systems -- even if they list features for several thousand sounds -- only cover a smaller part of the numerous speech sounds reflected in actual cross-linguistic data. In order to address the problem of missing data for attested speech sounds, we propose a new approach that can create binary feature vectors dynamically for all sounds that can be represented in the standardized version of the International Phonetic Alphabet proposed by the Cross-Linguistic Transcription Systems (CLTS) reference catalog. Since CLTS is actively used in large data collections, covering more than 2,000 distinct language varieties, our procedure for the generation of binary feature vectors provides immediate access to a very large collection of multilingual wordlists. Testing our feature system in different ways on different datasets proves that the system is not only useful to provide a straightforward means to compare the similarity of speech sounds, but also illustrates its potential to be used in future cross-linguistic machine learning applications.</p>
</blockquote>
<p>The presented Python package "soundvectors" can be installed via <a href="https://pypi.org/project/soundvectors/">pip</a>; the source code is available on <a href="https://github.com/cldf-clts/soundvectors">GitHub</a>.</p></div></content><link href="https://calclab.org/?news=2024-05-10-preprint#2024-05-10-preprint"/><published>2024-05-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-21-papers</id><title>New Papers Published </title><updated>2024-05-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two new studies have now appeared officially as part of the joint LREC / COLING conference in Torino.</p>
<p>A paper by Robert Forkel, myself, Christoph Rzymski and Guillaume Ségerer presents <a href="https://aclanthology.org/2024.lrec-main.925/">Linguistic Survey of India and Polyglotta Africana: Two Retrostandardized Digital Editions of Large Historical Collections of Multilingual Wordlists</a>.</p>
<blockquote>
<p>The Linguistic Survey of India (LSI) and the Polyglotta Africana (PA) are two of the largest historical collections of multilingual wordlists. While the originally printed editions have long since been digitized and shared in various forms, no editions in which the original data is presented in standardized form, comparable with contemporary wordlist collections, have been produced so far. Here we present digital retro-standardized editions of both sources. For maximal interoperability with datasets such as Lexibank the two datasets have been converted to CLDF, the standard proposed by the Cross-Linguistic Data Formats initiative. In this way, an unambiguous identification of the three main constituents of wordlist data – language, concept and segments used for transcription – is ensured through links to the respective reference catalogs, Glottolog, Concepticon and CLTS. At this level of interoperability, legacy material such as LSI and PA may provide a reasonable complementary source for language documentation, filling in gaps where original documentation is not possible anymore.</p>
</blockquote>
<p>A paper by Michele Pulini and myself presents <a href="https://aclanthology.org/2024.lrec-main.646/">First Steps Towards the Integration of Resources on Historical Glossing Traditions in the History of Chinese: A Collection of Standardized Fǎnqiè Spellings from the Guǎngyùn</a>.</p>
<blockquote>
<p>Due to the peculiar nature of the Chinese writing system, it is difficult to assess the pronunciation of historical varieties of Chinese. In order to reconstruct ancient pronunciations, historical glossing practices play a crucial role. However, although studied thoroughly by numerous scholars, most research has been carried out in a qualitative manner, and no attempt at providing integrated resources of historical glossing practices has been made so far. Here, we present a first step towards the integration of resources on historical glossing traditions in the history of Chinese. Our starting point are so-called fǎnqiè spellings in the Guǎngyùn, one of the early rhyme books in the history of Chinese, providing pronunciations for more than 20000 Chinese characters. By standardizing digital versions of the resource using tools from computational historical linguistics, we show that we can predict historical spellings with high precision and at the same time shed light on the precision of ancient glossing practices. Although a considerably small first step, our resource could be the starting point for an integrated, standardized collection that could ultimately shed new light on the history of Chinese.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-05-21-papers#2024-05-21-papers"/><published>2024-05-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-22-paper</id><title>New Study Published </title><updated>2024-05-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>As part of the SIGUL 2024 workshop organized with the LREC / COLING conference in Torino this year, a new study by Frederic Blum, Johannes Englisch, Alba Hermida-Rodríguez, Rik van Gijn, and myself has now been published, presenting a new approach for the <a href="https://sigul-2024.ilc.cnr.it/wp-content/uploads/2024/05/Blum-et-al.pdf">Resource Acquisition for Understudied Languages</a>. </p></div></content><link href="https://calclab.org/?news=2024-05-22-paper#2024-05-22-paper"/><published>2024-05-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-27-blogpost</id><title>New blog post online</title><updated>2024-05-27T12:00:00+00:00</updated><author><name>K. van Dam</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post titled <em>Implementing Fuzzy Spelling Search in Dictionaries of Under-Described Languages Lacking Standard Orthographies</em> appeared. In this post, different approaches to implementing fuzzy string matching for an online dictionary are discussed, focusing on the issue of non-standard writing systems for under-resourced languages. A simple finite state transducer is presented as a good approach, with sample code and a minimal working example. It can be found <a href="https://calc.hypotheses.org/7160">here</a> (DOI: <a href="https://doi.org/10.15475/calcip.2024.1.5">10.15475/calcip.2024.1.5</a>). 
</p></div></content><link href="https://calclab.org/?news=2024-05-27-blogpost#2024-05-27-blogpost"/><published>2024-05-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-29-post</id><title>New Blog Post </title><updated>2024-05-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On Sunday, a new blog post in German appeared, "Von feinen Zügen unterm Radar", discussing the phenomenon of aphantasia. You can find it <a href="https://wub.hypotheses.org/2363">here</a>.</p></div></content><link href="https://calclab.org/?news=2024-05-29-post#2024-05-29-post"/><published>2024-05-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-05-31-paper</id><title>New Paper </title><updated>2024-05-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Having passed review successfully, my paper on <a href="https://doi.org/10.12688/openreseurope.16804.2">Open Problems in Computational Historical Linguistics</a> has now appeared in its final version (<a href="https://doi.org/10.12688/openreseurope.16804.2">DOI</a>). </p>
<blockquote>
<p>Problems constitute the starting point of all scientific research. The essay reflects on the different kinds of problems that scientists address in their research and discusses a list of 10 problems for the field of computational historical linguistics, that was proposed throughout 2019 in a series of blog posts (see http://phylonetworks.blogspot.com/). In contrast to problems identified in different contexts, these problems were considered to be solvable, but no solution could be proposed back then. By discussing the problems in the light of developments that have been made in the field during the past five years, a modified list is proposed that takes new insights into account but also finds that the majority of the problems has not yet been solved.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-05-31-paper#2024-05-31-paper"/><published>2024-05-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-06-24-blogpost</id><title>New Blog Post </title><updated>2024-06-24T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new German blog post appeared, this time discussing the redundancy in language, allowing us to encode the same message in multiple different ways. The post is titled "Fünf vor zwölf mit halbleerem Glas" (URL: <a href="https://wub.hypotheses.org/2384">https://wub.hypotheses.org/2384</a>).</p></div></content><link href="https://calclab.org/?news=2024-06-24-blogpost#2024-06-24-blogpost"/><published>2024-06-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-06-25-paper</id><title>Grouping Sounds Paper Accepted </title><updated>2024-06-25T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our Grouping Sounds paper with Frederic Blum, Nathan Hill and Cristian Juárez has now been officially accepted with Open Research Europe (DOI: <a href="https://doi.org/10.12688/openreseurope.16839.1">10.12688/openreseurope.16839.1</a>). Having passed review means we will write one revision of the study in which we account for minor remarks by the reviewers in the next week. </p></div></content><link href="https://calclab.org/?news=2024-06-25-paper#2024-06-25-paper"/><published>2024-06-25T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-07-01-soundvectors</id><title>Study on Sound Vectors Published</title><updated>2024-07-01T12:00:00+00:00</updated><author><name>A. 
Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper "Generating Feature Vectors from Phonetic Transcriptions in Cross-Linguistic Data Formats" (together with Jessica Nieder, Robert Forkel, and Johann-Mattis List) was published last week as part of the <em>Proceedings of the 2024 Meeting of the Society for Computation in Linguistics (SCiL)</em> (DOI: <a href="https://doi.org/10.7275/scil.2144">10.7275/scil.2144</a>).</p></div></content><link href="https://calclab.org/?news=2024-07-01-soundvectors#2024-07-01-soundvectors"/><published>2024-07-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-07-18-blogpost</id><title>Grouping Sounds Paper Accepted </title><updated>2024-07-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new blog post was published in our CALC blog and journal. The post is titled "Converting an Artificial Proto-Language into Data for Testing Computational Approaches in Historical Linguistics" and shows how data for an artificially created language can be automatically retrieved and converted to formats that allow the data to be compared with other datasets. </p>
<blockquote>
<p>This small study shows how data for an artificially created language that was supposed to reflect features of “proto-languages”, predating modern languages by several thousand years, can be used in testing computational approaches in historical linguistics. In order to do so, a computational workflow is described that retrieves the data automatically, creating a comparative wordlist compatible in format with software tools for historical linguistics, and then uses a baseline method for automatic cognate detection to compare an artificial language against a sample of Indo-European languages. The results show that artificial languages might help to fill a gap in testing that has so far been ignored in the literature.</p>
</blockquote>
<p>The post can be found online <a href="https://calc.hypotheses.org/7363">here</a> or in article form via its <a href="https://doi.org/10.15475/calcip.2024.2.1">DOI</a>. </p></div></content><link href="https://calclab.org/?news=2024-07-18-blogpost#2024-07-18-blogpost"/><published>2024-07-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-08-06-blogpost</id><title>New Blog Post Published</title><updated>2024-08-06T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new blogpost for <em>Computer-Assisted Language Comparison in Practice</em> was published. It is titled "Generating Phonological Feature Vectors with SoundVectors and CLTS" and introduces the recently released Python library SoundVectors, briefly comparing it to PHOIBLE and PanPhon.</p>
<blockquote>
<p>The recently published Python library soundvectors offers a simple and robust method to derive phonological feature vectors for any valid IPA sound via its canonical description. It is designed to interact neatly with the Cross-Linguistic Transcription Systems reference catalog (CLTS), which dynamically parses valid strings in phonetic transcription to describe speech sounds. This study illustrates how both systems can be used together to generate phonological feature vectors for all kinds of sounds without relying on a previously defined lookup table. Additionally, it compares the generated feature vectors with those obtained from two other prominent databases, PanPhon and PHOIBLE, showing how those systems can be accessed from the CLTS data via its Python API pyclts.</p>
</blockquote>
<p>The post can be found online <a href="https://calc.hypotheses.org/7224">here</a> or in article form via its <a href="https://doi.org/10.15475/calcip.2024.2.2">DOI</a>. </p></div></content><link href="https://calclab.org/?news=2024-08-06-blogpost#2024-08-06-blogpost"/><published>2024-08-06T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-08-12-edictor</id><title>Paper in ACL Workshop Proceedings </title><updated>2024-08-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, with the beginning of the ACL conference in Bangkok, our paper presenting <a href="https://edictor.org">EDICTOR 3</a> appeared as part of the proceedings of the LChange workshop this year. The paper presents the major new features of EDICTOR 3 and points out the main improvements over earlier versions of the application.</p>
<blockquote>
<p>Computer-assisted approaches to historical and typological language comparison have made great progress over the past two decades. Specifically for the classical tasks of historical language comparison, many computational methods have been published that mimic certain steps of the traditional workflow of the comparative method. In contrast to the diversity of new computational methods, there is only a limited number of interactive tools and interfaces that help scholars to curate and refine their data both before and after the application of computational methods. One of the few publicly available interfaces is EDICTOR (https://edictor.org), an interactive tool for computer-assisted language comparison. EDICTOR has been around for some time, and allows scholars to annotate and align cognate sets in various ways. With EDICTOR 3, the original tool has been enhanced, offering not only new features for data annotation, but also providing the possibility to use purely automatic methods for initial cognate detection, phonetic alignment, and correspondence pattern inference in an integrated workflow.</p>
</blockquote>
<p>The paper can be accessed <a href="https://aclanthology.org/2024.lchange-1.1/">here</a>. EDICTOR 3 is now also available as a Python package on <a href="https://pypi.org/project/edictor">PyPI</a>. </p></div></content><link href="https://calclab.org/?news=2024-08-12-edictor#2024-08-12-edictor"/><published>2024-08-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-08-19-blogpost</id><title>Paper in ACL Workshop Proceedings </title><updated>2024-08-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, with the beginning of the ACL conference in Bangkok, our paper presenting <a href="https://edictor.org">EDICTOR 3</a> appeared as part of the proceedings of the LChange workshop this year. The paper presents the major new features of EDICTOR 3 and points out the main improvements over earlier versions of the application.</p>
<blockquote>
<p>Computer-assisted approaches to historical and typological language comparison have made great progress over the past two decades. Specifically for the classical tasks of historical language comparison, many computational methods have been published that mimic certain steps of the traditional workflow of the comparative method. In contrast to the diversity of new computational methods, there is only a limited number of interactive tools and interfaces that help scholars to curate and refine their data both before and after the application of computational methods. One of the few publicly available interfaces is EDICTOR (https://edictor.org), an interactive tool for computer-assisted language comparison. EDICTOR has been around for some time, and allows scholars to annotate and align cognate sets in various ways. With EDICTOR 3, the original tool has been enhanced, offering not only new features for data annotation, but also providing the possibility to use purely automatic methods for initial cognate detection, phonetic alignment, and correspondence pattern inference in an integrated workflow.</p>
</blockquote>
<p>The paper can be accessed <a href="https://aclanthology.org/2024.lchange-1.1/">here</a>. EDICTOR 3 is now also available as a Python package on <a href="https://pypi.org/project/edictor">PyPI</a>. </p></div></content><link href="https://calclab.org/?news=2024-08-19-blogpost#2024-08-19-blogpost"/><published>2024-08-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-08-19-posts</id><title>New Blog Post</title><updated>2024-08-19T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Last week, my German blog post for August appeared, this time discussing citation practice in newspapers (see <a href="https://wub.hypotheses.org/2474">Kopflose Fußnoten</a>).</p></div></content><link href="https://calclab.org/?news=2024-08-19-posts#2024-08-19-posts"/><published>2024-08-19T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-08-21-paper</id><title>New Paper Published </title><updated>2024-08-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper that introduces how sounds can be grouped into evolving units (with Frederic Blum, Nathan W. Hill, and Cristian Juárez) has now appeared online in its final version with Open Research Europe (DOI: <a href="https://doi.org/10.12688/openreseurope.16839.2">10.12688/openreseurope.16839.2</a>).</p>
<blockquote>
<p>Computer-assisted approaches to historical language comparison have made great progress during the past two decades. Scholars can now routinely use computational tools to annotate cognate sets, align words, and search for regularly recurring sound correspondences. However, computational approaches still suffer from a very rigid sequence model of the form part of the linguistic sign, in which words and morphemes are segmented into fixed sound units which cannot be modified. In order to bring the representation of sound sequences in computational historical linguistics closer to the research practice of scholars who apply the traditional comparative method, we introduce improved sound sequence representations in which individual sound segments can be grouped into evolving sound units in order to capture language-specific sound laws more efficiently. We illustrate the usefulness of this enhanced representation of sound sequences in concrete examples and complement it by providing a small software library that allows scholars to convert their data from forms segmented into sound units to forms segmented into evolving sound units and vice versa.</p>
</blockquote>
<p>In addition to the reviews, sound grouping has now also been fully integrated as a feature in <a href="https://edictor.org">EDICTOR 3</a>.</p></div></content><link href="https://calclab.org/?news=2024-08-21-paper#2024-08-21-paper"/><published>2024-08-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-03-blogpost</id><title>New Blogpost Appeared </title><updated>2024-09-03T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, a new blog post in our <a href="https://calc.hypotheses.org">CALCiP</a> appeared, titled "Adding Standardized Transcriptions to Panoan and Tacanan Languages in the Intercontinental Dictionary Series" (see either its <a href="https://calc.hypotheses.org/7503">URL</a> or its <a href="https://doi.org/10.15475/calcip.2024.2.3">DOI</a>).</p>
<blockquote>
<p>In this study, we illustrate how standardized phonetic transcriptions can be added to the data for Panoan and Tacanan languages provided by the Intercontinental Dictionary Series. The result is presented as a new dataset that keeps reference to the original data and adds phonetic transcriptions for each word form in Panoan languages, Tacanan languages, as well as Spanish and Portuguese.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-09-03-blogpost#2024-09-03-blogpost"/><published>2024-09-03T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-05-interview</id><title>Interview About the ORE Languages and Literature Gateway  </title><updated>2024-09-05T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already in July, an interview appeared in which I answered questions about the <a href="https://open-research-europe.ec.europa.eu/gateways/languages-and-literature/about">Languages and Literature Gateway</a> with <a href="https://open-research-europe.ec.europa.eu/">Open Research Europe</a> (ORE). The interview was published on the ORE blog and can be found <a href="https://open-research-europe.ec.europa.eu/blog/introducing-languages-and-literature-community-gateway">here</a>. </p></div></content><link href="https://calclab.org/?news=2024-09-05-interview#2024-09-05-interview"/><published>2024-09-05T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-09-blogpost</id><title>New Blog Post</title><updated>2024-09-09T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, my monthly German blog post appeared, this time dealing with the question of bringing things in order and searching for them: <a href="https://wub.hypotheses.org/2496">Vom Ordnen und Suchen</a>. In this post, I also introduce a new way to search for articles by typing key words from their titles into a search prompt, which I integrated in my professional website (see <a href="https://lingulist.de/articles.html">here</a>). 
</p></div></content><link href="https://calclab.org/?news=2024-09-09-blogpost#2024-09-09-blogpost"/><published>2024-09-09T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-11-podcast</id><title>Podcast about language evolution</title><updated>2024-09-11T12:00:00+00:00</updated><author><name>C. Bentz</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I was interviewed for a podcast (MDR Wissen: Große Fragen in zehn Minuten) about language evolution (in German).
(https://www.ardaudiothek.de/episode/grosse-fragen-in-zehn-minuten-von-mdr-wissen/warum-ist-sprache-entstanden/mdr/13691975/)</p></div></content><link href="https://calclab.org/?news=2024-09-11-podcast#2024-09-11-podcast"/><published>2024-09-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-20-paper</id><title>New Paper Published</title><updated>2024-09-20T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper (with Simonetta Montemagni and John Nerbonne) presenting an information-theoretic method for detecting characteristic phonetic correspondences in dialectal data, including a case study on Tuscan, has now appeared online in its final version with Language Dynamics and Change (DOI: <a href="https://doi.org/10.1163/22105832-bja10034">10.1163/22105832-bja10034</a>).</p>
<blockquote>
<p>We present a novel approach to identifying individual pairs of phonetic correspondences in a dataset of dialect pronunciations. This continues work identifying shibboleths (i.e., characteristic features of a given dialect), a category that has interested dialectology and that dialectometrical research has examined mostly in the form of categorical data or entire phonetic transcriptions. This article reaches into segmental sequences (phonetic transcriptions) to identify individual phonetic correspondences. We follow earlier work in examining how distinctive and how representative a given phonetic correspondence is for a selected group of varieties. We proceed from string alignments, and innovate in characterizing the important notions via information theory. Despite minor problems, the method improves on the generality of competing approaches and can be shown to be useful in detecting characteristic phonetic correspondences in Tuscan varieties. We argue that this facilitates deeper investigation into the relation between aggregating approaches to dialectology and approaches proceeding from features.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-09-20-paper#2024-09-20-paper"/><published>2024-09-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-09-24-paper</id><title>New Paper Published</title><updated>2024-09-24T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper showing that 'Consonant lengthening marks the beginning of words across a diverse sample of languages' has now appeared online in Nature Human Behaviour (DOI: <a href="https://doi.org/10.1038/s41562-024-01988-4">10.1038/s41562-024-01988-4</a>). In a sample of 51 diverse languages, we found that word-initial consonants are systematically longer than their counterparts in other positions. While the study only analyzes observational data, we think this might be one of several cues for segmenting the acoustic stream into words, possibly present in most of the world's languages.</p>
<blockquote>
<p>Speech consists of a continuous stream of acoustic signals, yet humans can segment words and other constituents from each other with astonishing precision. The acoustic properties that support this process are not well understood and remain understudied for the vast majority of the world’s languages, in particular regarding their potential variation. Here we report cross-linguistic evidence for the lengthening of word-initial consonants across a typologically diverse sample of 51 languages. Using Bayesian multilevel regression, we find that on average, word-initial consonants are about 13 ms longer than word-medial consonants. The cross-linguistic distribution of the effect indicates that despite individual differences in the phonology of the sampled languages, the lengthening of word-initial consonants is a widespread strategy to mark the onset of words in the continuous acoustic signal of human speech. These findings may be crucial for a better understanding of the incremental processing of speech and speech segmentation.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-09-24-paper#2024-09-24-paper"/><published>2024-09-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-10-04-team</id><title>New Team Members </title><updated>2024-10-04T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our chair welcomes two new team members. David Snee, who has already started his dissertation with us, officially joined the ProduSemy project in September. From October on, Dr. Luca Ciucci will join the ProduSemy project, working on word families in South American languages. We welcome both to our team and hope for a fruitful collaboration.</p></div></content><link href="https://calclab.org/?news=2024-10-04-team#2024-10-04-team"/><published>2024-10-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-10-07-blogpost</id><title>New blog post online</title><updated>2024-10-07T12:00:00+00:00</updated><author><name>K. van Dam</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post titled <em>Preparing Acoustic Pitch Data for Computational Analysis and Presentation</em> appeared. The post presents the issues with using raw pitch data as Hertz values, some historical efforts to resolve these issues, and two solutions that are more appropriate than some of the more widely used systems, along with a short Python script that makes these alternative systems easy to calculate. It can be found <a href="https://calc.hypotheses.org/7160">here</a> (DOI: <a href="https://doi.org/10.15475/calcip.2024.2.4">10.15475/calcip.2024.2.4</a>). 
</p></div></content><link href="https://calclab.org/?news=2024-10-07-blogpost#2024-10-07-blogpost"/><published>2024-10-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-10-11-cogsci</id><title>New Paper accepted for publication </title><updated>2024-10-11T12:00:00+00:00</updated><author><name>J. Nieder</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper by Ruben van de Vijver, Adam Ussishkin, and myself, submitted to Cognitive Science, has been accepted for publication. In this paper, titled "Emerging roots: Investigating early access to meaning in Maltese auditory word recognition" (DOI: <a href="https://osf.io/preprints/psyarxiv/hwumf">https://doi.org/10.31234/osf.io/hwumf</a>), we use a computational model to investigate access to meaning in early word recognition in the Semitic language Maltese. </p>
<blockquote>
<p>In Semitic languages, the consonantal root is central to morphology, linking form and meaning. While psycholinguistic studies highlight its importance in language processing, the role of meaning in early lexical access and its representation remain unclear. This study investigates when meaning becomes accessible during the processing of Maltese verb forms, using a computational model based on the Discriminative Lexicon framework. Our model effectively comprehends and produces Maltese verbs, while also predicting response times in a masked auditory priming experiment. Results show that meaning is accessible early in lexical access and becomes more prominent after the target word is fully processed. This suggests that semantic information plays a critical role from the initial stages of lexical access, refining our understanding of real-time language comprehension. Our findings contribute to theories of lexical access and offer valuable insights for designing priming studies in psycholinguistics. Additionally, this study demonstrates the potential of computational models in investigating the relationship between form and meaning in language processing.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-10-11-cogsci#2024-10-11-cogsci"/><published>2024-10-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-10-21-blog</id><title>New German Blog Post</title><updated>2024-10-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, my German blog post for October appeared, titled <em>Von türstehenden Gutachtern</em>, discussing the scientific practice of keeping critical studies out of certain journals via negative reviews. You can find it <a href="https://wub.hypotheses.org/2525">here</a>.  </p></div></content><link href="https://calclab.org/?news=2024-10-21-blog#2024-10-21-blog"/><published>2024-10-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-10-31-papers</id><title>New Papers </title><updated>2024-10-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Three new papers have appeared over the past week.</p>
<p>First, a study by Dubi Nanda Dhakal, myself, and Seán G. Roberts appeared in the Journal of Language Evolution:</p>
<blockquote>
<p>This study performs primary data collection, transcription, and cognate coding for eight South West Tibetic languages (Lowa, Gyalsumdo, Nubri, Tsum, Yohlmo, Kagate, Jirel, and Sherpa). This includes partial cognate coding, which analyses linguistic relations at the morpheme level. Prior resources and inferences are leveraged to conduct a Bayesian phylogenetic analysis. This helps estimate the extent to which the historical relationships between the languages represent a tree-like structure. We argue that small-scale projects like this are critical to wider attempts to reconstruct the cultural evolutionary history of Sino-Tibetan and other families.</p>
</blockquote>
<p>The study can be found <a href="https://doi.org/10.1093/jole/lzae008">here</a>.</p>
<p>Then, a study by Jessica Nieder, Ruben van de Vijver and Adam Ussishkin appeared in Cognitive Science:</p>
<blockquote>
<p>In Semitic languages, the consonantal root is central to morphology, linking form and meaning. While psycholinguistic studies highlight its importance in language processing, the role of meaning in early lexical access and its representation remain unclear. This study investigates when meaning becomes accessible during the processing of Maltese verb forms, using a computational model based on the Discriminative Lexicon framework. Our model effectively comprehends and produces Maltese verbs, while also predicting response times in a masked auditory priming experiment. Results show that meaning is accessible early in lexical access and becomes more prominent after the target word is fully processed. This suggests that semantic information plays a critical role from the initial stages of lexical access, refining our understanding of real-time language comprehension. Our findings contribute to theories of lexical access and offer valuable insights for designing priming studies in psycholinguistics. Additionally, this study demonstrates the potential of computational models in investigating the relationship between form and meaning in language processing.</p>
</blockquote>
<p>The study can be found <a href="https://doi.org/10.1111/cogs.70004">here</a>.</p>
<p>Finally, a study by Kellen Parker van Dam and Thüküvelü Sakhamo appeared in the proceedings of the First International Conference on Social Sciences:</p>
<blockquote>
<p>This paper presents a brief outline of some important cultural and historical aspects of Porba Village, a Chokrimi community in Phek District, Nagaland. Information is given on the traditional history of the community, including the settlement, the importance of major cultural practices, and the structure of the society. Though societal norms have changed considerably these practices and heritage still hold importance to the residents of Porba village as a significant part of their culture and identity. In addition, the clan system, practices around the naming of babies, the system of inheritance, the economy system, education, and language are also disc</p>
</blockquote>
<p>The study can be found <a href="https://zenodo.org/records/14017613">here</a>.</p></div></content><link href="https://calclab.org/?news=2024-10-31-papers#2024-10-31-papers"/><published>2024-10-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-11-04-blog</id><title>New Blog Post on Typing Special Characters </title><updated>2024-11-04T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post on <em>Typing Special Characters as a Key Skill for Linguists</em> appeared:</p>
<blockquote>
<p>Most linguists have to type special characters that are not available on an ordinary keyboard on a regular basis. Reflecting about the general problems involved in typing special characters, I review different solutions and argue that linguists should not only be able to type special characters on their computers, but that they should also have some basic knowledge about their technical aspects and know how to expand and customize them. In order to improve the training of young scholars, it is important to discuss special character typing more openly in linguistics, especially in the classroom and with doctoral students, sharing individual solutions openly.</p>
</blockquote>
<p>The blog post can be read online <a href="https://calc.hypotheses.org/7806">here</a>; a PDF version can be downloaded from the corresponding journal website via its DOI (<a href="https://doi.org/10.15475/calcip.2024.2.5">10.15475/calcip.2024.2.5</a>).</p></div></content><link href="https://calclab.org/?news=2024-11-04-blog#2024-11-04-blog"/><published>2024-11-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-11-07-paper</id><title>New Paper </title><updated>2024-11-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The paper on partial colexifications of body and object terms by Annika Tjuka and myself appeared yesterday:</p>
<blockquote>
<p>Expressions in which the word for a body part is also used for objects can be found in many languages. Some languages use body part terms to refer to object parts, while others have only a few idiosyncratic examples in their vocabulary. Studying the word forms referring to body and object concepts, i.e., colexifications, across languages, offers insights into cognitive principles facilitating such usage. Previous studies focused on full colexifications in which the same word form expresses two distinct concepts. Here, we utilize a new approach that allows us to analyze partial colexifications in which a concept is built out of the word forms for two separate concepts, like river mouth. Based on a large lexical database, we identified body and object concepts and analyzed 39 colexifications across 329 languages. The results show that word forms for body concepts are used slightly more frequently as a source for object names. However, the detailed examination of directional tendencies and colexifications of word forms between body and object concepts reveals linguistic variation. The study sheds light on meaning extensions between two concrete domains and showcases the synergies that arise through the combination of existing data and methods.</p>
</blockquote>
<p>Unfortunately, the published version (DOI: <a href="https://doi.org/10.1515/gcla-2024-0005">10.1515/gcla-2024-0005</a>) is not yet available as an open access publication, but the preprint can be accessed <a href="https://doi.org/10.31234/osf.io/hc3j5">here</a>.</p></div></content><link href="https://calclab.org/?news=2024-11-07-paper#2024-11-07-paper"/><published>2024-11-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-11-18-blogpost</id><title>New Blog Post</title><updated>2024-11-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, my monthly German blog post appeared, this time dealing with the phenomenon of <em>ghosting</em> in the context of science: <a href="https://wub.hypotheses.org/2553">Von Geistern verlassen</a>. </p></div></content><link href="https://calclab.org/?news=2024-11-18-blogpost#2024-11-18-blogpost"/><published>2024-11-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-12-11-blogpost</id><title>New Blog Post</title><updated>2024-12-11T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already on Sunday, my monthly German blog post appeared, this time dealing with the boring tasks in scientific work: <a href="https://wub.hypotheses.org/2588">Vom Schaffen in der Wissenschaft</a>. </p></div></content><link href="https://calclab.org/?news=2024-12-11-blogpost#2024-12-11-blogpost"/><published>2024-12-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-12-18-blogpost</id><title>New Blog Post on Using CLDFBench and PyLexibank on Windows </title><updated>2024-12-18T12:00:00+00:00</updated><author><name>D. Snee</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post on <em>Using CLDFBench and PyLexibank on Windows</em> appeared:</p>
<blockquote>
<p>Due to idiosyncrasies in the Windows operating system, certain workarounds may be necessary to successfully execute the CLDF conversion workflow using CLDFBench and Pylexibank. This blog post illustrates how this workflow can be efficiently implemented on a Windows 10 operating system using the hattorijaponic dataset as an example.</p>
</blockquote>
<p>The blog post can be read online <a href="https://calc.hypotheses.org/7825">here</a>; a PDF version can be downloaded from the corresponding journal website (DOI: <a href="https://doi.org/10.15475/calcip.2024.2.6">10.15475/calcip.2024.2.6</a>).</p></div></content><link href="https://calclab.org/?news=2024-12-18-blogpost#2024-12-18-blogpost"/><published>2024-12-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2024-12-29-paper</id><title>New Paper on 'Cognate Reflex Prediction as hypothesis test for a genealogical relation between the Panoan and Takanan language families'</title><updated>2024-12-29T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper <em>Cognate Reflex Prediction as hypothesis test for a genealogical relation between the Panoan and Takanan language families</em> has now appeared in Scientific Reports (DOI: <a href="https://doi.org/10.1038/s41598-024-82515-3">10.1038/s41598-024-82515-3</a>). In the paper, we present a study in which we converted reconstructions from one proto-language into stipulated reflexes in another, potentially related language to test the hypothesized relationship between the two. The number of correct matches among our predictions leads us to consider them further evidence for a genealogical relation between the two language families involved.</p>
<blockquote>
<p>We present a novel approach for testing genealogical relations between language families. Our method, which has previously only been applied to closely related languages, makes predictions for cognate reflexes based on the regularity of proposed sound correspondences between language families that are hypothesized to be related. We test the hypothesis about a genealogical relation between Panoan and Takanan, two linguistic families of the Amazon. The workflow contributes to new ideas of hypothesis testing in historical linguistics and can likely be transferred to other language families. We predict 206 cognate reflexes from Shipibo-Konibo, a Panoan language, from independently proposed Proto-Takanan reconstructions and test our predictions in elicitation sessions with speakers of the language. We found 21 correct predictions from the core-set, as well as another 20 correct predictions from the extended set of predictions. In addition to confirming the previously established sound correspondence patterns, we find further evidence for additional patterns that suggest the reconstruction of three new phonemes for Proto-Pano-Takanan.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2024-12-29-paper#2024-12-29-paper"/><published>2024-12-29T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-01-07-article</id><title>Interview on Language Universals</title><updated>2025-01-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I was interviewed by the <a href="https://de.wikipedia.org/wiki/Deutsche_Presse-Agentur">German Press Agency</a> (DPA) regarding recent studies on language universals. The article by Doreen Garud, titled "Was die Welt sprachlich verbindet", was printed and published online in many venues, including the <a href="https://www.pnp.de/nachrichten/wissenschaft/was-die-welt-sprachlich-verbindet-17713832">Passauer Neue Presse</a>. </p></div></content><link href="https://calclab.org/?news=2025-01-07-article#2025-01-07-article"/><published>2025-01-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-01-14-paper</id><title>New Paper on Object Naming</title><updated>2025-01-14T12:00:00+00:00</updated><author><name>A. Kučerová</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We are happy to announce that our paper, Kučerová, Alžběta and List, Johann-Mattis (2025): Everybody Likes to Sleep: A Computer-Assisted Comparison of Object Naming Data from 30 Languages, has been accepted and will appear in the Proceedings of the Global WordNet Conference 2025.</p>
<p>It is available via the DOI <a href="https://doi.org/10.48550/arXiv.2501.08312">10.48550/arXiv.2501.08312</a> or this <a href="https://arxiv.org/abs/2501.08312v1">link</a>, where it can also be downloaded.</p>
<blockquote>
<p>Object naming – the act of identifying an object with a word or a phrase – is a fundamental skill in interpersonal communication, relevant to many disciplines, such as psycholinguistics, cognitive linguistics, or language and vision research. Object naming datasets, which consist of concept lists with picture pairings, are used to gain insights into how humans access and select names for objects in their surroundings and to study the cognitive processes involved in converting visual stimuli into semantic concepts. Unfortunately, object naming datasets often lack transparency and have a highly idiosyncratic structure. Our study tries to make current object naming data transparent and comparable by using a multilingual, computer-assisted approach that links individual items of object naming lists to unified concepts. Our current sample links 17 object naming datasets that cover 30 languages from 10 different language families. We illustrate how the comparative dataset can be explored by searching for concepts that recur across the majority of datasets and comparing the conceptual spaces of covered object naming datasets with classical basic vocabulary lists from historical linguistics and linguistic typology. Our findings can serve as a basis for enhancing cross-linguistic object naming research and as a guideline for future studies dealing with object naming tasks. </p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-01-14-paper#2025-01-14-paper"/><published>2025-01-14T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-01-23-blogpost</id><title>New Blog Post in German</title><updated>2025-01-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post in German appeared, titled "Geduld und Ungeduld in der Wissenschaft". The post is available <a href="https://wub.hypotheses.org/2681">here</a> and discusses the role of patience and impatience in research.</p></div></content><link href="https://calclab.org/?news=2025-01-23-blogpost#2025-01-23-blogpost"/><published>2025-01-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-01-24-talk</id><title>A talk at the University of Regensburg</title><updated>2025-01-24T12:00:00+00:00</updated><author><name>A. Kučerová</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>On 23.01.2025, Alžběta Kučerová gave a talk on an experimental study into the perception of Czech speech at the Colloquium on Slavic and Albanian Linguistics at the University of Regensburg.</p></div></content><link href="https://calclab.org/?news=2025-01-24-talk#2025-01-24-talk"/><published>2025-01-24T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-01-28-blogpost</id><title>New Blog Post</title><updated>2025-01-28T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, we published a new blog post on <em>How to Run EDICTOR 3 Locally</em>. It introduces the workflow for setting up your local PC to run the new EDICTOR 3 release on your own computer instead of relying on online servers. It also covers the necessary preprocessing, package installations, and configuration settings.</p>
<blockquote>
<p>EDICTOR3 offers many ways of comparing language data with computer-assisted methods. This study offers a short overview of how to run EDICTOR3 locally, without the need for uploading the data to a server or being connected to the internet, while maintaining all the functionalities. In a first step, we will show how one can download a Lexibank dataset and create different types of files that one can use with EDICTOR. We will then proceed to present the possibility of running an EDICTOR server locally and to edit the dataset that one has downloaded.</p>
</blockquote>
<p>The post can be found online <a href="https://calc.hypotheses.org/8143">here</a> or in article form via its <a href="https://doi.org/10.15475/calcip.2025.1.1">DOI</a>.</p></div></content><link href="https://calclab.org/?news=2025-01-28-blogpost#2025-01-28-blogpost"/><published>2025-01-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-02-17-preprint</id><title>New Preprint</title><updated>2025-02-17T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new study, titled "Partial Colexifications Improve Concept Embeddings" (together with J.-M. List, currently under review), is now available as a preprint on <a href="https://doi.org/10.48550/arXiv.2502.09743">arXiv</a>.</p>
<blockquote>
<p>While the embedding of words has revolutionized the field of Natural Language Processing, the embedding of concepts has received much less attention so far. A dense and meaningful representation of concepts, however, could prove useful for several tasks in computational linguistics, especially those involving cross-linguistic data or sparse data from low resource languages. First methods that have been proposed so far embed concepts from automatically constructed colexification networks. While these approaches depart from automatically inferred polysemies, attested across a larger number of languages, they are restricted to the word level, ignoring lexical relations that would only hold for parts of the words in a given language. Building on recently introduced methods for the inference of partial colexifications, we show how they can be used to improve concept embeddings in meaningful ways. The learned embeddings are evaluated against lexical similarity ratings, recorded instances of semantic shift, and word association data. We show that in all evaluation tasks, the inclusion of partial colexifications lead to improved concept representations and better results. Our results further show that the learned embeddings are able to capture and represent different semantic relationships between concepts.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-02-17-preprint#2025-02-17-preprint"/><published>2025-02-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-02-18-preprint</id><title>New Preprint</title><updated>2025-02-18T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new study titled "From Isolates to Families: Using Neural Networks for Automated Language Affiliation" (together with Steffen Herbold and Johann-Mattis List, currently under review) has now appeared as a preprint on <a href="https://doi.org/10.48550/arXiv.2502.11688">arXiv</a>.</p>
<blockquote>
<p>In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation workflows. Here, we present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,000 languages with known affiliations to classify individual languages into families. In line with the traditional assumption of most linguists, our results show that models trained on lexical data alone outperform models solely based on grammatical data, whereas combining both types of data yields even better performance. In additional experiments, we show how our models can identify long-ranging relations between entire subgroups, how they can be employed to investigate potential relatives of linguistic isolates, and how they can help us to obtain first hints on the affiliation of so far unaffiliated languages. We conclude that models for automated language affiliation trained on lexical and grammatical data provide comparative linguists with a valuable tool for evaluating hypotheses about deep and unknown language relations.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-02-18-preprint#2025-02-18-preprint"/><published>2025-02-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-02-26-blog</id><title>New Blog Post</title><updated>2025-02-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Together with Luise Häuser, I published a new blog post today on our CALC tutorial blog, presenting a new benchmark database for computational historical linguistics (URL: <a href="https://calc.hypotheses.org/8227">https://calc.hypotheses.org/8227</a>, PDF: <a href="https://ojs3.uni-passau.de/index.php/calcip/article/view/356">10.15475/calcip.2025.1.2</a>).</p>
<blockquote>
<p>Computational approaches in historical linguistics have made great progress during the past two decades. As of now, it is much more common to propose subgroupings based on phylogenetic analyses than on traditional considerations using shared innovations. We have also seen a drastic increase in openly available datasets that share cognate judgments for various language families. Thanks to new standardization efforts providing facilitated access to several dozen comparative wordlists, it seems about time to work on improved benchmarks of manually annotated cognates in computational historical linguistics. In this study, a first effort of this kind is undertaken, by presenting Lexibench, a preliminary gold standard for computational historical linguistics. Lexibench builds on the Lexibank repository to extract 63 multilingual wordlists, all manually annotated for cognacy, that can be used to assess the quality of cognate detection and phylogenetic reconstruction methods in computational historical linguistics.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-02-26-blog#2025-02-26-blog"/><published>2025-02-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-03-04-preprint</id><title>New Preprint</title><updated>2025-03-04T12:00:00+00:00</updated><author><name>D. Snee</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>We are happy to announce that two new studies are now available as preprints on arXiv.</p>
<p>David Snee, Luca Ciucci, Arne Rubehn, Kellen Parker van Dam, and Johann-Mattis List (2025): <a href="https://doi.org/10.48550/arXiv.2503.00464">"Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists"</a>.</p>
<blockquote>
<p>Multilingual wordlists play a crucial role in comparative linguistics. While many studies have been carried out to test the power of computational methods for language subgrouping or divergence time estimation, few studies have put the data upon which these studies are based to a rigorous test. Here, we conduct a first experiment that tests the robustness of concept translation as an integral part of the compilation of multilingual wordlists. Investigating the variation in concept translations in independently compiled wordlists from 10 dataset pairs covering 9 different language families, we find that on average, only 83% of all translations yield the same word form, while identical forms in terms of phonetic transcriptions can only be found in 23% of all cases. Our findings can prove important when trying to assess the uncertainty of phylogenetic studies and the conclusions derived from them.</p>
</blockquote>
<p>Arne Rubehn, Christoph Rzymski, Luca Ciucci, Kellen Parker van Dam, Alžběta Kučerová, Katja Bocklage, David Snee, Abishek Stephen, and Johann-Mattis List (2025): <a href="https://doi.org/10.48550/arXiv.2503.01625">"Annotating and Inferring Compositional Structures in Numeral Systems Across Languages"</a>.</p>
<blockquote>
<p>Numeral systems across the world's languages vary in fascinating ways, both regarding their synchronic structure and the diachronic processes that determined how they evolved in their current shape. For a proper comparison of numeral systems across different languages, however, it is important to code them in a standardized form that allows for the comparison of basic properties. Here, we present a simple but effective coding scheme for numeral annotation, along with a workflow that helps to code numeral systems in a computer-assisted manner, providing sample data for numerals from 1 to 40 in 25 typologically diverse languages. We perform a thorough analysis of the sample, focusing on the systematic comparison between the underlying and the surface morphological structure. We further experiment with automated models for morpheme segmentation, where we find allomorphy as the major reason for segmentation errors. Finally, we show that subword tokenization algorithms are not viable for discovering morphemes in low-resource scenarios.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-03-04-preprint#2025-03-04-preprint"/><published>2025-03-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-03-17-preprint</id><title>New Preprint</title><updated>2025-03-17T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint by Annika Tjuka, Robert Forkel, Christoph Rzymski and myself is available now, presenting an improved version of the Database of Cross-Linguistic Colexifications (DOI: <a href="https://doi.org/10.48550/arXiv.2503.11377">10.48550/arXiv.2503.11377</a>).</p>
<blockquote>
<p>Lexical resources are crucial for cross-linguistic analysis and can provide new insights into computational models for natural language learning. Here, we present an advanced database for comparative studies of words with multiple meanings, a phenomenon known as colexification. The new version includes improvements in the handling, selection and presentation of the data. We compare the new database with previous versions and find that our improvements provide a more balanced sample covering more language families worldwide, with an enhanced data quality, given that all word forms are provided in phonetic transcription. We conclude that the new Database of Cross-Linguistic Colexifications has the potential to inspire exciting new studies that link cross-linguistic data to open questions in linguistic typology, historical linguistics, psycholinguistics, and computational linguistics. </p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-03-17-preprint#2025-03-17-preprint"/><published>2025-03-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-03-18-preprint</id><title>New Accepted Paper </title><updated>2025-03-18T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A study by Michele Pulini and myself was accepted at the <a href="https://www.ancientnlp.com/alp2025/">Second Workshop on Ancient Language Processing</a> and is now available as a preprint on Humanities Commons (DOI: <a href="https://doi.org/10.17613/4wvf7-qva13">10.17613/4wvf7-qva13</a>). </p>
<blockquote>
<p>Ancient Chinese documents written on bamboo slips more than 2000 years ago offer a rich resource for research in linguistics, paleography, and historiography. However, since most documents are only available in the form of scans, additional steps of analysis are needed to turn them into interactive digital editions, amenable both for manual and computational exploration. Here, we present a first attempt to establish a workflow for the annotation of ancient bamboo slips. Based on a recently rediscovered dialogue on warfare, we illustrate how a digital edition amenable for manual and computational exploration can be created by integrating standards originally designed for cross-linguistic data collections. </p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-03-18-preprint#2025-03-18-preprint"/><published>2025-03-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-03-21-blogpost</id><title>New Blog Post</title><updated>2025-03-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two days ago, a new blog post appeared, this time discussing compounds in English and German that result from a process called "contamination". The post can be found <a href="https://wub.hypotheses.org/2737">here</a>.  </p></div></content><link href="https://calclab.org/?news=2025-03-21-blogpost#2025-03-21-blogpost"/><published>2025-03-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-04-28-blog</id><title>New Blog Post</title><updated>2025-04-28T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post appeared in our CALCiP tutorial blog / journal, this time presenting, together with Luise Häuser and Robert Forkel, the <a href="https://pypi.org/project/pylexibench">PyLexibench package</a> (URL: <a href="https://calc.hypotheses.org/8267">https://calc.hypotheses.org/8267</a>, DOI: <a href="https://doi.org/10.15475/calcip.2025.1.4">10.15475/calcip.2025.1.4</a>).</p>
<blockquote>
<p>With PyLexibench we introduce a small Python package that can be used to populate the Lexibench benchmark for computational historical linguistics with benchmark data. Here, we introduce the package and show how it helps to access and expand Lexibench. We also introduce new data for character matrices in various forms and formats and lay out how we intend to use the package to manage Lexibench releases in the future.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-04-28-blog#2025-04-28-blog"/><published>2025-04-28T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-05-02-blog</id><title>News, News, News</title><updated>2025-05-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our team gladly and cordially welcomes two new team members. Jekaterina Mažara joins us in the position of an assistant to the chair (Akademische Rätin), pursuing independent research on psycholinguistics and teaching courses in the same area. Abishek Stephen visits us in the summer term as an independent doctoral student to collaborate on morpheme segmentation.</p>
<p>Yesterday, I released EvoBib 1.10 (<a href="https://evobib.digling.org">https://evobib.digling.org</a>, data available via Zenodo at <a href="https://zenodo.org/">https://zenodo.org/</a>). This release adds about 100 new bibliographic entries and several hundred quotes.</p>
<p>At the same time, our study on a digital edition of the Cao Mo Zhi Zhen with Michele Pulini appeared in the proceedings of the 2nd Workshop on Ancient Language Processing (<a href="https://aclanthology.org/2025.alp-1.4/">https://aclanthology.org/2025.alp-1.4/</a>).</p>
<blockquote>
<p>Ancient Chinese documents written on bamboo slips more than 2000 years ago offer a rich resource for research in linguistics, paleography, and historiography. However, since most documents are only available in the form of scans, additional steps of analysis are needed to turn them into interactive digital editions, amenable both for manual and computational exploration. Here, we present a first attempt to establish a workflow for the annotation of ancient bamboo slips. Based on a recently rediscovered dialogue on warfare, we illustrate how a digital edition amenable for manual and computational exploration can be created by integrating standards originally designed for cross-linguistic data collections.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-05-02-blog#2025-05-02-blog"/><published>2025-05-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-05-04-interview</id><title>Interview on Language Universals </title><updated>2025-05-04T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, an interview with SWR Kultur appeared, in which I talk with Julia Nestlen about those aspects that languages have in common and where they differ (interview is available <a href="https://www.swr.de/swrkultur/wissen/was-alle-sprachen-verbindet-die-weltweite-mama-das-wissen-2025-05-04-100.html">here</a>).</p>
<iframe style="border-radius:12px" src="https://open.spotify.com/embed/episode/7hO67HQRfvum8WO1eVvb6g?utm_source=generator" width="100%" height="352" frameBorder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture" loading="lazy"/></div></content><link href="https://calclab.org/?news=2025-05-04-interview#2025-05-04-interview"/><published>2025-05-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-05-12-interview</id><title>Interview on Computational Linguistics </title><updated>2025-05-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a written interview with the <a href="https://www.kleinefaecher.de/">Arbeitsstelle Kleine Fächer</a> appeared, in which I answer general questions on Computational Linguistics as a scientific discipline. You can find the interview online (<a href="https://www.kleinefaecher.de/beitraege/blogbeitrag/computerlinguistik">https://www.kleinefaecher.de/beitraege/blogbeitrag/computerlinguistik</a>). </p></div></content><link href="https://calclab.org/?news=2025-05-12-interview#2025-05-12-interview"/><published>2025-05-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-05-16-acl</id><title>Two Papers on ACL </title><updated>2025-05-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two papers were accepted for the ACL main conference, the paper by Frederic Blum, Steffen Herbold, and myself on language affiliation (From Isolates to Families: Using Neural Networks for Automated Language Affiliation), and the paper by Arne Rubehn and myself on concept embeddings (Partial Colexifications Improve Concept Embeddings). 
We are all of course very happy that we made it in this form to the main conference of the ACL and look forward to presenting our work in Vienna.</p></div></content><link href="https://calclab.org/?news=2025-05-16-acl#2025-05-16-acl"/><published>2025-05-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-05-21-blog</id><title>New German Blog Post </title><updated>2025-05-21T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A German blog post appeared on Monday, discussing how a specific kind of numerical annotation revolutionized the development of juggling patterns. The blog post in German can be found <a href="https://wub.hypotheses.org/2822">here</a>.</p></div></content><link href="https://calclab.org/?news=2025-05-21-blog#2025-05-21-blog"/><published>2025-05-21T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-06-12-paper</id><title>New Paper on the Indigenous Languages of the Americas</title><updated>2025-06-12T12:00:00+00:00</updated><author><name>L. Ciucci</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today a paper by Marcin Kilarski and me, titled "Investigating the Indigenous languages of the Americas: History and prospects", appeared online in its final version, and it can be found <a href="https://brill.com/display/book/9789004715608/BP000035.xml">here</a>. The preprint version is available <a href="https://hal.science/hal-04568391v1">here</a>.</p>
<blockquote>
<p>In this paper, we address the state of the art in the study of the Indigenous languages of the Americas and reflect on the perspectives for future research. Since the first 16th-century grammatical descriptions, new data have contributed to the development of language study and the birth of modern linguistics and continue to inform linguistic theory. Ongoing documentation helps language preservation, while historical data improve our understanding of present-day languages and contribute to their revitalisation. Linguistic descriptions have also affected the perception of Indigenous languages and cultures over time, reflecting beliefs and prejudices about less familiar languages. We illustrate the genetic and typological diversity of American languages and reflect on their contribution to linguistic theory, which is illustrated with a few selected features. Finally, we offer some remarks about ongoing documentation.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-06-12-paper#2025-06-12-paper"/><published>2025-06-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-06-20-blog</id><title>New Blog Post </title><updated>2025-06-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Yesterday, I published my monthly German blog post, this time discussing the limits of scientific insights: <a href="https://wub.hypotheses.org/2853">Von der Vorläufigkeit der Erkenntnisse</a>.</p></div></content><link href="https://calclab.org/?news=2025-06-20-blog#2025-06-20-blog"/><published>2025-06-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-06-23-lexibank</id><title>Lexibank Paper Published </title><updated>2025-06-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Three years after we first published Lexibank, we have now published its second version, enriched with more datasets and held to higher standards of data quality. The study by Blum et al. (2025), presenting the database, can be found on Open Research Europe (DOI: <a href="https://doi.org/10.12688/openreseurope.20216.2">10.12688/openreseurope.20216.2</a>).</p>
<blockquote>
<p>Large-scale lexical and grammatical datasets nowadays play an important role in comparative linguistics. However, the lack of standardization remains a challenge exacerbating extension and reuse of published data. We present an updated version of Lexibank, a large-scale lexical dataset, expanding on previous efforts to standardize and unify cross-linguistic data. This new version includes over 3,100 languages and more than one-and-a-half million word forms, substantially broadening the scope and utility of the previous resource. Our dataset has been systematically curated using a dedicated computer-assisted workflow designed specifically for the lifting of published wordlist data to the standards recommended by the Cross-Linguistic Data Formats initiative. The expanded dataset features standardized references to language varieties, standardized semantic glosses that reference the concepts expressed by individual word forms, and standardized phonetic transcriptions for all word forms that our repository contains. Based on those standardizations we pre-compute semantic and phonological features, which can be used to carry out extensive automated analyses. We illustrate this potential by providing dedicated database queries to (1) infer words that are similar in pronunciation and meaning, (2) identify concepts that are colexified across languages in our sample, and (3) assess the semantic diversity of etymologically related words. These queries are not only fast to execute but also global in their scope, due to the large-scale coverage provided by Lexibank 2. The queries are also easy to extend, thus having the potential to contribute to various studies in historical linguistics, linguistic typology, and related disciplines. The updated dataset is a substantial step forward in the effort to create comprehensive, standardized, and accessible linguistic resources.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-06-23-lexibank#2025-06-23-lexibank"/><published>2025-06-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-07-07-acl</id><title>ACL Oral Presentation</title><updated>2025-07-07T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper on language affiliation (From Isolates to Families: Using Neural Networks for Automated Language Affiliation) was invited for an oral presentation at the ACL Conference in Vienna. Since fewer than 10% of the accepted papers are considered for this, we are very proud. The paper presents a supervised neural network approach to the affiliation of individual languages to families using Lexibank and Grambank datasets.</p>
<blockquote>
<p>Frederic Blum, Steffen Herbold, and Johann-Mattis List. 2025. From Isolates to Families: Using Neural Networks for Automated Language Affiliation, <a href="https://arxiv.org/abs/2502.11688">https://arxiv.org/abs/2502.11688</a>.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-07-07-acl#2025-07-07-acl"/><published>2025-07-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-07-15-blog</id><title>New Blog Post on Attitudes in Science</title><updated>2025-07-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Already last week, my monthly German blog post appeared, this time discussing attitudes towards scientific work and the merits of scientific research in humanities and natural sciences (<a href="https://wub.hypotheses.org/2895">https://wub.hypotheses.org/2895</a>).</p></div></content><link href="https://calclab.org/?news=2025-07-15-blog#2025-07-15-blog"/><published>2025-07-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-07-23-acl</id><title>Two Long Papers at ACL in Vienna</title><updated>2025-07-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two papers that we submitted as long papers to the ACL conference in Vienna have been accepted and have now appeared officially in print.</p>
<p>The first paper by Frederic Blum, Steffen Herbold, and myself, presents our study on automated language affiliation ("From Isolates to Families: Using Neural Networks for Automated Language Affiliation", <a href="https://aclanthology.org/2025.acl-long.876/">URL</a>).</p>
<blockquote>
<p>In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow that relies on manually comparing individual languages. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation workflows. Here, we present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,200 languages with known affiliations to classify individual languages into families. In line with the traditional assumption of most linguists, our results show that models trained on lexical data alone outperform models solely based on grammatical data, whereas combining both types of data yields even better performance. In additional experiments, we show how our models can identify long-ranging relations between entire subgroups, how they can be employed to investigate potential relatives of linguistic isolates, and how they can help us to obtain first hints on the affiliation of so far unaffiliated languages. We conclude that models for automated language affiliation trained on lexical and grammatical data provide comparative linguists with a valuable tool for evaluating hypotheses about deep and unknown language relations.</p>
</blockquote>
<p>The second study by Arne Rubehn and myself presents our work on concept embeddings ("Partial Colexifications Improve Concept Embeddings", <a href="https://aclanthology.org/2025.acl-long.1004/">URL</a>).</p>
<blockquote>
<p>While the embedding of words has revolutionized the field of Natural Language Processing, the embedding of concepts has received much less attention so far. A dense and meaningful representation of concepts, however, could prove useful for several tasks in computational linguistics, especially those involving cross-linguistic data or sparse data from low resource languages. First methods that have been proposed so far embed concepts from automatically constructed colexification networks. While these approaches depart from automatically inferred polysemies, attested across a larger number of languages, they are restricted to the word level, ignoring lexical relations that would only hold for parts of the words in a given language. Building on recently introduced methods for the inference of partial colexifications, we show how they can be used to improve concept embeddings in meaningful ways. The learned embeddings are evaluated against lexical similarity ratings, recorded instances of semantic shift, and word association data. We show that in all evaluation tasks, the inclusion of partial colexifications leads to improved concept representations and better results. Our results further show that the learned embeddings are able to capture and represent different semantic relationships between concepts.</p>
</blockquote>
<p>In addition, Kellen Parker van Dam today published a study in our Blog / Journal on Computer-Assisted Language Comparison in Practice ("Digitizing Legacy Lexical Data of Muishaung for Computer-Assisted Language Comparison", <a href="https://doi.org/10.15475/calcip.2025.2.1">DOI</a>).</p>
<blockquote>
<p>This study describes the process of digitizing legacy materials into a computer-readable format for the purposes of computational typology and computer-assisted historical reconstruction. It presents a comparative wordlist that is made available in the formats recommended by the Cross-Linguistic Data Formats initiative.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-07-23-acl#2025-07-23-acl"/><published>2025-07-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-07-31-sigtyp</id><title>Two Papers at the SIGTYP Workshop at ACL in Vienna </title><updated>2025-07-31T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Two papers that we submitted to the SIGTYP workshop at the ACL conference in Vienna have been accepted and have now appeared officially in print.</p>
<p>The first paper by David Snee et al. ("Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists", <a href="https://aclanthology.org/2025.sigtyp-1.3/">URL</a>) presents robustness tests on concept translation in multilingual wordlists.</p>
<blockquote>
<p>Multilingual wordlists play a crucial role in comparative linguistics. While many studies have been carried out to test the power of computational methods for language subgrouping or divergence time estimation, few studies have put the data upon which these studies are based to a rigorous test. Here, we conduct a first experiment that tests the robustness of concept translation as an integral part of the compilation of multilingual wordlists. Investigating the variation in concept translations in independently compiled wordlists from 10 dataset pairs covering 9 different language families, we find that on average, only 83% of all translations yield the same word form, while identical forms in terms of phonetic transcriptions can only be found in 23% of all cases. Our findings can prove important when trying to assess the uncertainty of phylogenetic studies and the conclusions derived from them.</p>
</blockquote>
<p>The second study by Arne Rubehn et al. presents our work on numeral annotation ("Annotating and Inferring Compositional Structures in Numeral Systems Across Languages", <a href="https://aclanthology.org/2025.sigtyp-1.4/">URL</a>).</p>
<blockquote>
<p>Numeral systems across the world’s languages vary in fascinating ways, both regarding their synchronic structure and the diachronic processes that determined how they evolved in their current shape. For a proper comparison of numeral systems across different languages, however, it is important to code them in a standardized form that allows for the comparison of basic properties. Here, we present a simple but effective coding scheme for numeral annotation, along with a workflow that helps to code numeral systems in a computer-assisted manner, providing sample data for numerals from 1 to 40 in 25 typologically diverse languages. We perform a thorough analysis of the sample, focusing on the systematic comparison between the underlying and the surface morphological structure. We further experiment with automated models for morpheme segmentation, where we find allomorphy as the major reason for segmentation errors. Finally, we show that subword tokenization algorithms are not viable for discovering morphemes in low-resource scenarios.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-07-31-sigtyp#2025-07-31-sigtyp"/><published>2025-07-31T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-08-07-blog</id><title>New Blog Post on Ambiguities in German </title><updated>2025-08-07T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post discussing ambiguities in German compounds appeared today, titled "Von subjektiven und objektiven Fällen" (URL: <a href="https://wub.hypotheses.org/2928">https://wub.hypotheses.org/2928</a>).</p></div></content><link href="https://calclab.org/?news=2025-08-07-blog#2025-08-07-blog"/><published>2025-08-07T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-08-26-blog</id><title>New Blog Post on Templates for NoRaRe </title><updated>2025-08-26T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post presenting templates for the <a href="https://norare.clld.org">Database of Norms, Ratings, and Relations</a> appeared already yesterday (URL: <a href="https://calc.hypotheses.org/8723">https://calc.hypotheses.org/8723</a>).</p>
<blockquote>
<p>This study introduces a collection of templates that can be used to contribute data to the Database of Norms, Ratings, and Relations (NoRaRe) of words and concepts. The templates are intended to facilitate the process of dataset conversion and serve as a starting point for those who are interested to contribute data to the catalog. A first template structure with two sample datasets is introduced and discussed in more detail, pointing to those aspects of data curation that may lead to confusion among users who contribute the first time to the NoRaRe database.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-08-26-blog#2025-08-26-blog"/><published>2025-08-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-09-15-blog</id><title>New Blog Post on Historical Linguistics </title><updated>2025-09-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new German blog post that appeared today discusses some interesting parallels regarding the introduction of digital and computational approaches in mathematics and historical linguistics ("Schon gesehen", URL: <a href="https://wub.hypotheses.org/3018">https://wub.hypotheses.org/3018</a>).</p></div></content><link href="https://calclab.org/?news=2025-09-15-blog#2025-09-15-blog"/><published>2025-09-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-09-17-blogpost</id><title>New Blog Post on Semantic Embeddings in NoRaRe </title><updated>2025-09-17T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post presenting workflows for integrating and retrieving semantic embeddings with the <a href="https://norare.clld.org">Database of Norms, Ratings, and Relations</a> has appeared today (URL: <a href="https://calc.hypotheses.org/8723">https://calc.hypotheses.org/8723</a>).</p>
<blockquote>
<p>This study illustrates how semantic embeddings can be added to and retrieved from NoRaRe. By that, it provides a template for handling vector data and makes popular methodology in semantic modeling available for cross-linguistic comparison.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-09-17-blogpost#2025-09-17-blogpost"/><published>2025-09-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-09-22-book</id><title>A new book on non-verbal predication in the world's languages</title><updated>2025-09-22T12:00:00+00:00</updated><author><name>L. Ciucci</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new book on non-verbal predication, co-edited by Luca Ciucci, has just been published in the series <a href="https://www.degruyterbrill.com/serial/chl-b/html">Comparative Handbooks of Linguistics</a>. Its 33 chapters, written by international experts, present a new typological framework for the study of non-verbal predication and provide detailed descriptions from selected languages and families across Eurasia, the Americas, Africa, and Oceania. Particular attention is given to languages from traditionally little-described families.</p>
<p>This work is the result of the collaboration of 40 scholars over more than five years and consists of two volumes with a total of about 1,300 pages. It is intended to serve as a reference work on non-verbal predication for years to come.</p>
<blockquote>
<p>Bertinetto, Pier Marco, Luca Ciucci &amp; Denis Creissels (eds.). 2025. Non-verbal predication in the world’s languages: A typological survey. <a href="https://www.degruyterbrill.com/document/doi/10.1515/9783110730982/html">Volume 1: Eurasia, North America, South America.</a> Berlin &amp; Boston: De Gruyter Mouton.</p>
<p>Bertinetto, Pier Marco, Luca Ciucci &amp; Denis Creissels (eds.). 2025. Non-verbal predication in the world’s languages: A typological survey. <a href="https://www.degruyterbrill.com/document/doi/10.1515/9783112209677/html">Volume 2: Africa, Austronesia, Papunesia, Australia.</a> Berlin &amp; Boston: De Gruyter Mouton.</p>
</blockquote>
<p>The volume is so far available in electronic form; the printed copy will appear on November 3rd.</p></div></content><link href="https://calclab.org/?news=2025-09-22-book#2025-09-22-book"/><published>2025-09-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-01-news</id><title>News at the Chair</title><updated>2025-10-01T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Christian Bentz, who had joined the chair last year with his ERC research group, has left the chair for a professorship in Saarbrücken. While this is sad news for the chair, it is fantastic news for Christian. We wish Christian all the best for the new challenges that await him now, and we are very thankful that we had the chance to have him here with us for at least some time, as his work was always very inspiring for many of us. Given that distances can be easily bridged with modern communication, we will surely stay in contact and look forward to collaborating with Christian in the future.</p></div></content><link href="https://calclab.org/?news=2025-10-01-news#2025-10-01-news"/><published>2025-10-01T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-02-preprint</id><title>New Preprint Submitted</title><updated>2025-10-02T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, we published a new preprint by Katja Bocklage (with many other people from our chair helping with annotations), with Thanasis Georgakopoulos joining us in the analysis. In this study, we put partial affix colexifications to the test and find that they may be a useful way to address certain problems in lexical typology. The preprint can be found online now (DOI); the abstract is given below.</p>
<blockquote>
<p>Cross-linguistic colexification patterns have proven useful for quantitative studies in lexical typology. While most studies focus on full colexification, where senses are co-expressed by the same word form, recent studies have proposed to compute partial colexifications, where senses are not colexified by entire words, but only by parts of them. Among these, affix colexifications, where one word recurs in the end or the beginning of another word, show interesting properties, potentially reflecting word formation processes giving hints on cross-linguistic motivation patterns. In order to test their potential, we conduct a detailed case study. Based on a large sample of cross-linguistic partial colexification patterns, computed from the Database of Cross-Linguistic Colexifications, we first check to which degree partial colexifications reflect true cases of lexical motivation and then carry out a detailed comparison of concept relations underlying frequent partial and full colexification patterns. Our results show that partial affix colexifications that recur across five and more language families tend to reflect true lexical motivation patterns in almost 90% of all cases. Furthermore, we find that the majority of affix colexifications and full colexifications reflect contiguity relations. However, the proportion of contiguity relations in partial colexifications exceeds the proportion of contiguity relations in full colexifications (50% vs. 40%), showing that there are differences in the semantics reflected by both colexification types.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-10-02-preprint#2025-10-02-preprint"/><published>2025-10-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-04-paper</id><title>New Paper from IWCS in Düsseldorf Published </title><updated>2025-10-04T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our paper presenting the CLICS⁴ database, which we released this year, has now been published as part of the IWCS conference in Düsseldorf. The study, by Annika Tjuka, Robert Forkel, Christoph Rzymski, and myself, presents the innovations that we introduced along with the fourth installment of the Database of Cross-Linguistic Colexifications (paper available in open access, see this <a href="https://preview.aclanthology.org/iwcs-25-ingestion/2025.iwcs-1.1/">URL</a>).</p>
<blockquote>
<p>Lexical resources are crucial for cross-linguistic analysis and can provide new insights into computational models for natural language learning. Here, we present an advanced database for comparative studies of words with multiple meanings, a phenomenon known as colexification. The new version includes improvements in the handling, selection and presentation of the data. We compare the new database with previous versions and find that our improvements provide a more balanced sample covering more language families worldwide, with enhanced data quality, given that all word forms are provided in phonetic transcription. We conclude that the new Database of Cross-Linguistic Colexifications has the potential to inspire exciting new studies that link cross-linguistic data to open questions in linguistic typology, historical linguistics, psycholinguistics, and computational linguistics.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-10-04-paper#2025-10-04-paper"/><published>2025-10-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-20-facts</id><title>Gefühlte Tatsachen </title><updated>2025-10-20T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>This month's German blog post deals with the scientific construct and the topic of "perceived inflation" ("Gefühlte Tatsachen", <a href="https://wub.hypotheses.org/3049">URL</a>).</p></div></content><link href="https://calclab.org/?news=2025-10-20-facts#2025-10-20-facts"/><published>2025-10-20T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-22-qualt</id><title>New Preprint on Qù Tone Alternations</title><updated>2025-10-22T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint with Barbara Meisterernst appeared today at Open Research Europe, presenting 
"A database of qù-tone alternations in Ancient Chinese". The preprint is accessible from the journal, awaiting peer review now (DOI: <a href="https://doi.org/10.12688/openreseurope.21142.1">10.12688/openreseurope.21142.1</a>). The database can also be inspected already (URL: <a href="https://qualternations.digling.org">https://qualternations.digling.org</a>).</p>
<blockquote>
<p>Alternations in the entering tone (qù-tone) in Ancient Chinese have for a long time fascinated scholars, since they seem to hint at relics of morphology in the history of Chinese, contrasting strongly with the isolating structure of all modern varieties of Chinese. Here we present a transparently assembled collection of entering tone alternations in the history of Chinese, derived from Lù Démíng's historical annotation of the classics, the Jīngdiǎn Shìwén, which gives early hints on such alternations by means of historical fǎnqiè spellings. The database is available in two flavors. On the one hand, it can be accessed through a web interface that allows interested users to browse through the data in linked form. On the other hand, the data is available in the formats recommended by the Cross-Linguistic Data Formats initiative. This format does not only offer quick programmatic access to experienced users, but is also specifically apt for the purpose of archiving the data. In the study, we illustrate how the data was assembled and curated, and illustrate how the data can be put to concrete use.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-10-22-qualt#2025-10-22-qualt"/><published>2025-10-22T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-10-27-formspec</id><title>New Blog Post on Lexibank FormSpec </title><updated>2025-10-27T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post appeared today, presenting the Lexibank FormSpec, a function in the PyLexibank library that can be used to handle forms in multilingual wordlists. The blog, titled "Manipulating Lexical Forms with the PyLexibank FormSpec", can be accessed online (URL: <a href="https://calc.hypotheses.org">https://calc.hypotheses.org</a>) or via our Journal landing page (DOI: <a href="https://doi.org/10.15475/calcip.2025.2.3">https://doi.org/10.15475/calcip.2025.2.3</a>)</p>
<blockquote>
<p>Multilingual lexical data is typically stored in a wide variety of forms, based on many idiosyncratic decisions that vary from dataset to dataset. Here, a simple but efficient solution for the manipulation of lexical data in multilingual wordlists will be introduced. This solution, the PyLexibank FormSpec, was originally developed for the conversion of various kinds of lexical data to Cross-Linguistic Data Formats, but it can also be used as a standalone. This study offers a basic tutorial that illustrates how the FormSpec can be put to concrete use.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-10-27-formspec#2025-10-27-formspec"/><published>2025-10-27T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-11-10-paper</id><title>New Paper from GWC 2025 in Pavia Published </title><updated>2025-11-10T12:00:00+00:00</updated><author><name>A. Kučerová</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A paper that we submitted to and presented at the 13th Global WordNet Conference in Pavia, has now appeared officially in print as part of the proceedings.</p>
<p>The study by Kučerová &amp; List, "Everybody Likes to Sleep: A Computer-Assisted Comparison of Object Naming Data from 30 Languages" (<a href="https://aclanthology.org/2025.gwc-1.27/">URL</a>), presents a comparison of 17 object naming datasets from 30 distinct languages, and offers a novel, computer-assisted approach using Concepticon to assure transparency and comparability of naming datasets across languages and authors. It provides a foundation for more standardized and transparent future research on how people name objects.</p>
<blockquote>
<p>Object naming -- the act of identifying an object with a word or a phrase -- is a fundamental skill in interpersonal communication, relevant to many disciplines, such as psycholinguistics, cognitive linguistics, or language and vision research. Object naming datasets, which consist of concept lists with picture pairings, are used to gain insights into how humans access and select names for objects in their surroundings and to study the cognitive processes involved in converting visual stimuli into semantic concepts. Unfortunately, object naming datasets often lack transparency and have a highly idiosyncratic structure. Our study takes the first steps towards making current object naming data transparent and comparable by using a multilingual, computer-assisted approach that links individual items of object naming lists to unified concepts in order to make object naming datasets cross-linguistically comparable.
Our current sample links 17 object naming datasets that cover 30 languages from 10 different language families. We illustrate how the comparative dataset can be explored by searching for concepts that recur across the majority of datasets and comparing the conceptual spaces of covered object naming datasets with classical basic vocabulary lists from historical linguistics. Our findings can serve as a basis for enhancing cross-linguistic object naming research and as a guideline for future studies dealing with object naming tasks.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-11-10-paper#2025-11-10-paper"/><published>2025-11-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-12-02-preprint</id><title>New Preprint</title><updated>2025-12-02T12:00:00+00:00</updated><author><name>A. Rubehn</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, we published a new preprint entitled "Integration of Linguistic Legacy Data Collections through Digital Scholarly Editions: A Case Study on Vanuatu Languages" (common work with Tihomir Rangelov, Luca Ciucci, John Burgess, Riccardo Rost and Johann-Mattis List). It is available on Humanities Commons <a href="https://doi.org/10.17613/bf8x3-2d778">[URL]</a>, the presented digital scholarly edition can be accessed under <a href="https://tvl.digling.org">https://tvl.digling.org</a>.</p>
<blockquote>
<p>The past two decades have witnessed a substantial increase in computational methods for investigating language diversity and history. The amount of digital data in comparative linguistics, however, is still lagging behind, with existing digitization efforts still mostly relying on the cumbersome labor of typing off data manually. Available Optical Character Recognition tools for automating this task have received relatively little attention in digitizing legacy data in linguistics, even though they are routinely used in other disciplines. At the same time, the editorial work must go beyond plain digitalization to add various layers of analysis and standardization, while recording the full provenance of each data point. We present an efficient and transparent workflow for digitizing legacy data in comparative linguistics and integrating it with larger data collections. As a result, we present a digital scholarly edition of lexical data for Vanuatu languages published by Darrell T. Tryon in 1976.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-12-02-preprint#2025-12-02-preprint"/><published>2025-12-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-12-11-news</id><title>Article About Book on Non-verbal Predication</title><updated>2025-12-11T12:00:00+00:00</updated><author><name>L. Ciucci</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The recent multivolume work <em>Non-verbal Predication in the World’s Languages</em>, co-edited by Luca Ciucci (<a href="https://www.degruyterbrill.com/serial/chl-9-b/html">https://www.degruyterbrill.com/serial/chl-9-b/html</a>), has just been featured in the <strong>Digital Research Magazine</strong> of the University of Passau.</p>
<p>In the article, Luca Ciucci explains to non-specialists what non-verbal predication is, tells the story behind the project, and presents some findings from the book. Read the full article here: <a href="https://www.digital.uni-passau.de/en/beitraege/2025/sprachwissenschaft">https://www.digital.uni-passau.de/en/beitraege/2025/sprachwissenschaft</a></p></div></content><link href="https://calclab.org/?news=2025-12-11-news#2025-12-11-news"/><published>2025-12-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-12-15-blog</id><title>New Team Member and Science with AI</title><updated>2025-12-15T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Our project has a new team member. Dr. Carlo Meloni will join us as a post-doc in the ProduSemy project and help to investigate word family evolution in the Semitic language family. Welcome Carlo, we look forward to collaborating with you in the next two years!</p>
<p>My final blog post in German this year is devoted to the question of whether responsible research is possible when relying on current AI tools. My answer would probably be <em>no</em> for the time being, but I discuss this in the broader context of science as a personal and a general endeavor (URL: <a href="https://wub.hypotheses.org/3240">https://wub.hypotheses.org/3240</a>).</p></div></content><link href="https://calclab.org/?news=2025-12-15-blog#2025-12-15-blog"/><published>2025-12-15T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-12-17-blogpost</id><title>New Blog Post on Conversion Tables for Semitic Languages  </title><updated>2025-12-17T12:00:00+00:00</updated><author><name>C. Meloni</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog post introduces a preliminary conversion table that links traditional Semitic transcription and transliteration systems to standardized IPA representations. It situates the table in the history of Semitic transcription practices and demonstrates its practical use with the LinSe software package, offering an open and extensible starting point for computational and comparative work on Semitic languages within the CLTS framework.</p>
<blockquote>
<p>In this study we present a preliminary conversion table that can be used for transcriptions and transliterations across different Semitic languages. We introduce the basic idea behind the table, show how it can be used, and explain how we hope to expand it in the future.</p>
</blockquote>
<p>The post can be found online <a href="https://calc.hypotheses.org/9109">here</a> or in article form via its <a href="https://doi.org/10.15475/calcip.2025.2.6">DOI</a>.</p></div></content><link href="https://calclab.org/?news=2025-12-17-blogpost#2025-12-17-blogpost"/><published>2025-12-17T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2025-12-18-preprint</id><title>New Preprint</title><updated>2025-12-18T12:00:00+00:00</updated><author><name>D. Snee</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, we published a new preprint entitled "Variation in Language Phylogenies May Result From Variation in Concept Translation" (with David Snee, Luca Ciucci, and Johann-Mattis List). It is available on Humanities Commons <a href="https://doi.org/10.17613/dpaf1-egm52">[URL]</a>. The paper highlights that concept translation variation during the compilation of comparative wordlists may yield variation in the resulting
language phylogenies, both for distance-based and character-based approaches.</p>
<blockquote>
<p>Phylogenetic reconstruction in historical linguistics now typically relies on cognate sets assembled from multilingual wordlists. While more and more scholars now trust in the robustness of the algorithms underlying the reconstruction of cognate-based language phylogenies, few studies have actually tested to which degree initial choices during data preparation can influence their outcome. Here, we provide first tests that focus on the role that concept translation – the initial stage of wordlist compilation, in which a list of concepts is translated into the target languages prior to identifying cognate sets – plays in phylogenetic reconstruction. Based on a newly compiled comparative dataset consisting of seven wordlists from five language families in which the same language varieties were coded by different authors, we investigate to which degree differences in concept translation lead to differences in phylogenetic reconstruction. Our results show that despite considerable differences in concept translation, lexical distances still show a considerably high correlation across all datasets. However, when comparing individual phylogenies reconstructed with the help of Bayesian inference, we find considerable differences ranging between 0.10 and 0.44 in Normalized Quartet Distances computed from posterior tree samples. An additional cluster analysis that we introduce shows that larger differences in phylogenies do not necessarily correspond to high disagreement in the larger subgroups. A detailed inspection of individual concept translation differences in the Indo-European and Tupian wordlists in our sample further confirms that concept translation differences may specifically impact subgrouping decisions in lower clades of the tree, while major groupings are often unaffected.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2025-12-18-preprint#2025-12-18-preprint"/><published>2025-12-18T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-01-10-news</id><title>The Passauer Neue Presse on Non-verbal Predication in the World's Languages </title><updated>2026-01-10T12:00:00+00:00</updated><author><name>L. Ciucci</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The <strong>Passauer Neue Presse</strong> has published an article on the recent multivolume work <em>Non-verbal Predication in the World’s Languages</em>, co-edited by Luca Ciucci (<a href="https://www.degruyterbrill.com/serial/chl-9-b/html">https://www.degruyterbrill.com/serial/chl-9-b/html</a>). The article is available here: <a href="https://www.pnp.de/lokales/stadt-passau/das-sein-in-den-sprachen-der-welt-20243161">https://www.pnp.de/lokales/stadt-passau/das-sein-in-den-sprachen-der-welt-20243161</a>.</p></div></content><link href="https://calclab.org/?news=2026-01-10-news#2026-01-10-news"/><published>2026-01-10T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-01-11-articles</id><title>Two articles accepted and published as pre-print</title><updated>2026-01-11T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>To start off the new year, two of the replication studies I have worked on have now been accepted for publication and published as pre-prints. The first paper, titled 'Over-representation of phonological features in basic vocabulary doesn't replicate when controlling for spatial and phylogenetic effects', has been accepted in <em>Linguistic Typology</em>. 
The second article, co-authored with Miri Mertner (UZH Zürich) and titled 'Temperate does (probably) not shape sonority', has been accepted for publication in a special issue on replication in quantitative typology that will appear in <em>Linguistic Typology at the Crossroads</em>. Both articles use Lexibank data to replicate sensationalistic claims from quantitative typology, and show that under careful control for phylogeny and contact effects, the original results do not replicate.</p>
<blockquote>
<p>Frederic Blum. 2026. Over-representation of phonological features in basic vocabulary doesn't replicate when controlling for spatial and phylogenetic effects. <a href="https://doi.org/10.48550/ARXIV.2512.07543">https://doi.org/10.48550/ARXIV.2512.07543</a>
Frederic Blum and Miri Mertner. 2026. Temperate does (probably) not shape sonority. <a href="https://doi.org/10.17613/a500z-d1957">https://doi.org/10.17613/a500z-d1957</a></p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-01-11-articles#2026-01-11-articles"/><published>2026-01-11T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-01-12-blog</id><title>First German Blog Post in 2026</title><updated>2026-01-12T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>My first German blog contribution in this year just appeared, titled "Vom Fabulieren und Halluzinieren", discussing the concept of "hallucination" in the context of language models critically (see <a href="https://wub.hypotheses.org/3313">here</a>).</p></div></content><link href="https://calclab.org/?news=2026-01-12-blog#2026-01-12-blog"/><published>2026-01-12T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-01-13-preprint</id><title>New Preprint</title><updated>2026-01-13T12:00:00+00:00</updated><author><name>L. Ciucci</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new preprint titled "Possessive constructions in Chamacoco (Ɨshɨr ahwoso)" is now available on Humanities Commons <a href="https://works.hcommons.org/records/2rdnx-wbt77">[URL]</a>. The paper is based on first-hand data and describes predicative possession in Chamacoco based on the framework proposed in Bertinetto et al. 2026 <a href="https://www.degruyterbrill.com/serial/chl-9-b/html">[URL]</a>.</p></div></content><link href="https://calclab.org/?news=2026-01-13-preprint#2026-01-13-preprint"/><published>2026-01-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-01-26-blog</id><title>Transparent Application of Text Generation Tools in Scientific Research</title><updated>2026-01-26T12:00:00+00:00</updated><author><name>J.-M. 
List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The first study in our blog and journal on Computer-Assisted Language Comparison in Practice in this year discusses the "Transparent Application of Text Generation Tools in Scientific Research" (URL: <a href="https://calc.hypotheses.org/9138">https://calc.hypotheses.org/9138</a>, DOI: <a href="https://doi.org/10.15475/calcip.2026.1.1">10.15475/calcip.2026.1.1</a>).</p>
<blockquote>
<p>In this opinion piece, I share my view on the application of language models and text generation services in scientific research. In my opinion, scientific research that lives up to the promises of open science must provide full documentation of all prompts and exchanges that were used to create a given study. A mere mention that AI tools have been used in study design, writing, or coding is not enough.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-01-26-blog#2026-01-26-blog"/><published>2026-01-26T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-02-04-preprint</id><title>New Preprint</title><updated>2026-02-04T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, Mattis List and I published a new preprint with the title "Using correspondence patterns to identify irregular words in cognate sets through leave-one-out validation". Here, we present a quantitative measure for regularity of correspondence patterns in comparative wordlists. The paper will be presented at the L'Change Workshop @ EACL 2026 and will appear in the proceedings of the event. For now, the paper is available on <a href="https://doi.org/10.48550/arXiv.2602.02221">arXiv</a>.</p>
<blockquote>
<p>Regular sound correspondences constitute the principal evidence in historical language comparison. Despite the heuristic focus on regularity, it is often more an intuitive judgement than a quantified evaluation, and irregularity is more common than expected from the Neogrammarian model. Given the recent progress of computational methods in historical linguistics and the increased availability of standardized lexical data, we are now able to improve our workflows and provide such a quantitative evaluation. Here, we present the balanced average recurrence of correspondence patterns as a new measure of regularity. We also present a new computational method that uses this measure to identify cognate sets that lack regularity with respect to their correspondence patterns. We validate the method through two experiments, using simulated and real data. In the experiments, we employ leave-one-out validation to measure the regularity of cognate sets in which one word form has been replaced by an irregular one, checking how well our method identifies the forms causing the irregularity. Our method achieves an overall accuracy of 85% with the datasets based on real data. We also show the benefits of working with subsamples of large datasets and how increasing irregularity in the data influences our results. Reflecting on the broader potential of our new regularity measure and the irregular cognate identification method based on it, we conclude that they could play an important role in improving the quality of existing and future datasets in computer-assisted language comparison.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-02-04-preprint#2026-02-04-preprint"/><published>2026-02-04T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-02-13-paper</id><title>A new publication in Wiley's Language and Linguistics Compass!</title><updated>2026-02-13T12:00:00+00:00</updated><author><name>A. Kučerová</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper written by Alžběta Kučerová and Johann-Mattis List has just appeared in print in Language and Linguistics Compass. It is entitled "From Psycholinguistics to Computer Vision. A Comprehensive Review of Object Naming Data and Studies" and can be accessed at the following link: <a href="http://dx.doi.org/10.1111/lnc3.70034">http://dx.doi.org/10.1111/lnc3.70034</a>.</p>
<p>The study delivers an in-depth review of object naming datasets and pivotal object naming studies conducted over the past four decades. Our cross-linguistic analysis makes use of datasets normed for English and 38 other languages across diverse language families. We illustrate that the topic has not only remained relevant, but continues to be critical today. Its applications span a wide range of fields, from evaluating children's vocabulary to understanding more about cognitive aging and aphasia, or advancing computer vision and language and vision research.</p>
<blockquote>
<p>In recent years, much research has focused on what happens in the human brain when a perceptual stimulus, such as a picture, is converted into linguistic content, a word. This process is commonly referred to as object naming and is considered a crucial aspect of language processing, production, and cognition. It refers to the identification of an object with a word or phrase, as well as the psychometric method of investigating this human behavior to obtain insights into different aspects of human cognition and language, such as the organization of the mental lexicon, language acquisition, disorders, or cognitive aging. The ability to name objects is considered a fundamental skill in interpersonal communication and has long been studied in numerous disciplines, such as cognitive science, psycholinguistics, psychology, and, more recently, in computer vision and research on language and vision. In the latter two, object naming has become an extremely powerful tool, especially in the development and fine-tuning of multi-modal models, facilitating tasks such as visual question answering, image captioning tasks, object detection, or visual scene understanding. Our comprehensive, cross-linguistic review explores the key findings, commonly cited, and prominent datasets and their applications that establish object naming both in the past and now, as well as discusses its chances and challenges in future work.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-02-13-paper#2026-02-13-paper"/><published>2026-02-13T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-02-16-blog</id><title>Vom Winden vor Winden </title><updated>2026-02-16T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>The contribution to my German blog for February deals with etymologies for taboo words and the influence of sound symbolism ("Vom Winden vor Winden", <a href="https://wub.hypotheses.org/3348">https://wub.hypotheses.org/3348</a>).</p></div></content><link href="https://calclab.org/?news=2026-02-16-blog#2026-02-16-blog"/><published>2026-02-16T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-02-23-blog</id><title>New Blog Post on Computing Colexifications with Missing Data Information from CLICS⁴ </title><updated>2026-02-23T12:00:00+00:00</updated><author><name>D. Snee</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Today, a new blog post by David Snee and Johann-Mattis List appeared, titled <em>Computing Detailed Colexifications with Missing Data Information from the CLICS⁴ Collection.</em></p>
<blockquote>
<p>CLICS⁴ offers a refined structural representation of cross-linguistic colexification patterns but retains an implicit representation of missing data. This obscures whether the lack of a colexification in a language for paired concepts is due to its true absence in the language, or due to missing data on the concept or word form level. We introduce a straightforward workflow that can be applied to individual datasets from CLICS⁴ to identify cases of colexification via a three-way attestation scheme. Our approach captures the presence or absence of a colexification in CLICS⁴, but it also explicitly encodes the presence or absence of data at the level of the original questionnaire, or the individual language, elicited with the help of the questionnaire.</p>
</blockquote>
<p>The blog can be read online <a href="https://calc.hypotheses.org/9164">here</a>, a PDF version can also be downloaded from the corresponding journal website (DOI: <a href="https://doi.org/10.15475/calcip.2026.1.2">10.15475/calcip.2026.1.2</a>).</p></div></content><link href="https://calclab.org/?news=2026-02-23-blog#2026-02-23-blog"/><published>2026-02-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-03-02-article</id><title>New article published</title><updated>2026-03-02T12:00:00+00:00</updated><author><name>F. Blum</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new paper with the title 'The over-representation of phonological features in basic vocabulary doesn't replicate when controlling for spatial and phylogenetic effects' has just appeared in <em>Linguistic Typology</em> <a href="https://doi.org/10.48550/ARXIV.2512.07543">URL</a></p>
<blockquote>
<p>The statistical over-representation of certain phonological features in the basic vocabulary of languages is often interpreted as reflecting potentially universal sound symbolic patterns. However, most of these cases have not been tested explicitly for reproducibility and might be prone to biases in the study samples or models. Many studies on the topic do not adequately control for genealogical and areal dependencies between sampled languages, casting doubts on the robustness of the results. In this study, I test the robustness of a recent study on sound symbolism in basic vocabulary concepts which analyzed 245 languages. This paper adds a new sample of 2,864 languages from Lexibank. I modify the original model by adding statistical controls for spatial and phylogenetic dependencies between languages. The new results show that most of the previously observed patterns are not robust, and in fact many patterns disappear completely when adding the genealogical and areal controls. A small number of patterns, however, emerges as highly stable even with the new sample. Through the new analysis, it is possible to assess the distribution of sound symbolism on a larger scale than previously. The study further highlights the need for testing all universal claims on language for robustness on various levels.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-03-02-article#2026-03-02-article"/><published>2026-03-02T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-03-23-blog</id><title>New Blog Post on the Consumption of Information</title><updated>2026-03-23T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new blog contribution on the consumption of information (in German), titled "Vom Konsumieren von Information" (URL: <a href="https://wub.hypotheses.org/3386">https://wub.hypotheses.org/3386</a>), appeared on Friday. It discusses the role of aesthetics in information and science.</p></div></content><link href="https://calclab.org/?news=2026-03-23-blog#2026-03-23-blog"/><published>2026-03-23T12:00:00+00:00</published></entry><entry><id>https://calclab.org/#2026-03-29-blog</id><title>New Blog Post on Formal Aspects of Etymological Analysis</title><updated>2026-03-29T12:00:00+00:00</updated><author><name>J.-M. List</name></author><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A new post in our blog / journal "Computer-Assisted Language Comparison in Practice" appeared last week, presenting "Foundations of Formal Etymological Analysis" (URL: <a href="https://calc.hypotheses.org/9208">https://calc.hypotheses.org/9208</a>, DOI: <a href="https://doi.org/10.15475/calcip.2026.1.3">https://doi.org/10.15475/calcip.2026.1.3</a>).</p>
<blockquote>
<p>This study gives a brief overview on formal aspects of etymological analysis, by providing a modified workflow for the classical comparative method in historical language comparison. This workflow is contrasted with the current state-of-the-art in computational historical linguistics, pointing out where computational methods and interactive tools for annotation are lacking, and where they are available already.</p>
</blockquote></div></content><link href="https://calclab.org/?news=2026-03-29-blog#2026-03-29-blog"/><published>2026-03-29T12:00:00+00:00</published></entry></feed>