feed.xml
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://changelinglab.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://changelinglab.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2024-11-08T21:53:03+00:00</updated><id>https://changelinglab.github.io/feed.xml</id><title type="html">blank</title><subtitle>The CMU Language Change and Empirical Linguistics Lab </subtitle><entry><title type="html">Information and Comparative Reconstruction</title><link href="https://changelinglab.github.io/blog/2024/hl_and_information/" rel="alternate" type="text/html" title="Information and Comparative Reconstruction"/><published>2024-09-05T16:40:16+00:00</published><updated>2024-09-05T16:40:16+00:00</updated><id>https://changelinglab.github.io/blog/2024/hl_and_information</id><content type="html" xml:base="https://changelinglab.github.io/blog/2024/hl_and_information/"><![CDATA[<p>When the Neogrammarians said that “sound changes admit no exceptions,” they were saying that sound changes apply deterministically. Thus, the output can always be derived given the input (but the converse is not necessarily true). To put this in terms of information, a sound change never adds information. It may destroy information (mergers, deletions); it may move information around (conditioned splits, transphonologization); but it may never add information. In other words, there can be no unconditioned splits.</p> <p>Of course, there are language changes that do add information and, using the regularity principle as a definition, these are not sound changes. The Neogrammarians assumed a qualitative difference between these changes and those encoded in “sound laws.” For example, the numeral <em>four</em> in English should have started with a wh- sound but came to be pronounced with an f- because of “contamination” from <em>five</em> (speakers frequently mispronounced it in a way similar to the next number in the sequence, and this caught on). Analogical processes, like contamination, add information, so the whole family of analogical changes (paradigmatic extension, paradigm leveling, contamination, and so forth) is treated as something entirely different, as is borrowing—including “dialect borrowing,” which may yield forms that look superficially like true cognates of other, semantically similar, words in sister languages.</p> <p>At some level, it is not clear to me that there is a qualitative difference between dialect borrowing and the propagation of a sound change through a speech community. However, because the Neogrammarian formulation of the Regularity Principle provides a way of—by definition—separating changes that proceed mechanistically from those that proceed sporadically, it serves an important methodological function: it allows for a precise formulation of a framework for doing comparative reconstruction.</p> <p>When viewed from the standpoint of information, comparative reconstruction is a kind of compression and the comparative method is a compression algorithm. It takes collections of cognate sets and reduces each of them to a single form from which all of the reflexes in the cognate set can be derived through the application of finite sequences of rewrite rules (one for each language). By definition, then, each reconstructed protoform must contain all of and only the information in the corresponding cognate set. This means that the complete set of reconstructions must contain all and only the information present in the set of cognate sets.</p>
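<p>To make this concrete, here is a minimal sketch, with made-up languages, protoforms, and sound laws (not a real reconstruction), of sound changes as ordered, deterministic rewrite rules:</p> <pre><code>import re

# Illustrative cascades of sound laws (ordered rewrite rules), one per daughter
# language. The languages, protoforms, and rules here are all invented.
SOUND_LAWS = {
    "Language A": [("k$", ""), ("u", "a")],   # lose final *k, then *u > a
    "Language B": [("t", "d")],               # *t > d everywhere
    "Language C": [],                         # no change yet
}

def derive(protoform, cascade):
    """Deterministically derive a reflex by applying each sound law in order."""
    form = protoform
    for pattern, replacement in cascade:
        form = re.sub(pattern, replacement, form)
    return form

for proto in ["ruk", "ru", "tu"]:
    print("*" + proto, {lang: derive(proto, laws) for lang, laws in SOUND_LAWS.items()})

# *ruk and *ru merge in Language A (both come out as "ra"): information is
# destroyed. But no single protoform can ever yield two different reflexes in
# the same language, so the mapping never adds information.
</code></pre> <p>Comparative reconstruction is the inverse problem: find protoforms and cascades like these from which every attested reflex can be derived.</p>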
<p>It is probably possible to formalize this algorithm in a backwards direction only (from cognate sets to protoforms). However, in my experience, linguists solve this compression problem by iterating over the protoforms and the rules until</p> <ol> <li>The protoforms can be predicted from the cognate sets</li> <li>The cognate sets can be predicted from the protoforms, given the sound laws</li> </ol> <p>Humans are generally not bad at noticing cases of analogy and borrowing, given their ability to integrate rich evidence from diverse sources. This is a challenge for computational implementations of this “algorithm”. We have tried various approaches <a class="citation" href="#chang2022wikihan">(Chang et al., 2022; Kim et al., 2023; Lu et al., 2024; Lu et al., 2024)</a> and have found the most successful to be (1) neural approaches that (2) take into account both the lossless mapping from reflex to protoform and the lossy mapping from protoform to reflex <a class="citation" href="#lu-etal-2024-improved">(Lu et al., 2024; Lu et al., 2024)</a>. These approaches have the virtue of being robust to “noise” (analogy and borrowing) and the disadvantage that they cannot provide cascades of sound laws like humans can.</p> <p>I have left another thing out: it is possible for a reconstruction to be valid with regard to these information-theoretic criteria and still be a bad reconstruction. That is to say, a reconstruction may be structurally sound but substantively incoherent. Consider, for example, that substitution ciphers do not change the information present in a string. However, there are a very large number of possible substitution ciphers, even for a finite alphabet. Some constraint is needed to enforce the similarity between protoforms and reflexes. This could be seen as a constraint upon sound laws and cascades of sound laws. For example, sound laws must be equivalent to finite-state transducers, and cascades are legal only when there is no equally compliant set of cascades with a smaller number of sound laws.</p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="links"/><summary type="html"><![CDATA[Informal information-theoretic framing of the comparative method in historical linguistics]]></summary></entry><entry><title type="html">On our ACL Best Paper Award Paper</title><link href="https://changelinglab.github.io/blog/2024/best_paper/" rel="alternate" type="text/html" title="On our ACL Best Paper Award Paper"/><published>2024-08-15T16:40:16+00:00</published><updated>2024-08-15T16:40:16+00:00</updated><id>https://changelinglab.github.io/blog/2024/best_paper</id><content type="html" xml:base="https://changelinglab.github.io/blog/2024/best_paper/"><![CDATA[<p>Yesterday, my student Liang (Leon) Lu won the Best Paper Award (Non-Publicized) at the 2024 Association for Computational Linguistics conference in Bangkok for his paper “Semisupervised Neural Proto-Language Reconstruction.” We are generally impressed by students who win these awards, but in Leon's case there is double reason to be impressed. Not only is he an undergraduate student (just starting his third year of university), but he has also only submitted two papers to conferences. Both were accepted, and the second (his first submission to a *ACL conference) was the subject of this best paper award.
But what is the substance of this paper?</p> <p>If you are technical, you can go and read the paper for yourself. It is short and is, I think, fairly clearly written. If you are not technical but you have some training in historical linguistics, you will recognize some of the principal themes, even if you don't follow the complete methodology or analysis. However, if you are neither technical nor trained in historical linguistics, but you still want to know what this paper was about, I will do my best to let you know.</p> <p>First, let us consider a table of words from the related languages Kachai, Huishu, and Ukhrul (members of the family of languages called Tangkhulic):</p> <table> <thead> <tr> <th>Gloss</th> <th>‘grandchild’</th> <th>‘bone’</th> <th>‘breast’</th> <th>‘laugh’</th> </tr> </thead> <tbody> <tr> <td>Kachai</td> <td>ðɐ</td> <td>rɐ</td> <td>nɐ</td> <td>ni</td> </tr> <tr> <td>Huishu</td> <td>ruk</td> <td>ruk</td> <td>nuk</td> <td>nuk</td> </tr> <tr> <td>Ukhrul</td> <td>ru</td> <td>ru</td> <td>nu</td> <td>nu</td> </tr> </tbody> </table> <p>Note that the ð sound is like the th in <em>this</em> and the ɐ sound is sort of like the u in cup.</p> <p>If you look at these words, you can see that there is, for example, a systematic relationship between the words for ‘bone’ and those for ‘breast’. This extends to ‘grandchild’ as well, even though there is a somewhat unexpected consonant in Kachai (ð). What did the ancestors of these words look like in the shared ancestor of these languages? Is this lost in the mists of history?</p> <p>A group of 19th-century linguists (or philologists, to use the terminology then current) called the Neogrammarians said you can know how these ancestral forms of words (protoforms) were pronounced by applying a methodology we now call the Comparative Method. Central to this method is the <strong>regularity principle</strong>, which states that changes in pronunciation apply as general rules to which <strong>all</strong> words in a language are subject. This means that if you know the protoform for one of the columns in the table (these columns are what are known as “cognate sets”) you can derive all of the rows in that column deterministically—mechanically and unambiguously.</p> <p>Under these Neogrammarian assumptions, we could posit *ruk or *ru as reconstructions for ‘bone’. If we posit *ru, we must assert that there was a sound change in Kachai that changed u to ɐ and a sound change in Huishu that inserted k at the ends of words (perhaps after u). If we reconstruct *ruk, we have to assume that there was a sound change in both Kachai and Ukhrul that deletes k at the end of words (perhaps after u) and another sound change in Kachai that changes u to ɐ. The first option (*ru) ends up working better if we look at all of the cognate sets instead of just these four.</p> <p>By this logic, we get:</p> <table> <thead> <tr> <th>Gloss</th> <th>‘grandchild’</th> <th>‘bone’</th> <th>‘breast’</th> <th>‘laugh’</th> </tr> </thead> <tbody> <tr> <td>Kachai</td> <td>ðɐ</td> <td>rɐ</td> <td>nɐ</td> <td>ni</td> </tr> <tr> <td>Huishu</td> <td>ruk</td> <td>ruk</td> <td>nuk</td> <td>nuk</td> </tr> <tr> <td>Ukhrul</td> <td>ru</td> <td>ru</td> <td>nu</td> <td>nu</td> </tr> </tbody> <tbody> <tr> <td>Proto-Tangkhulic</td> <td>*?u</td> <td>*ru</td> <td>*nu</td> <td>*n?</td> </tr> </tbody> </table> <p>The reconstruction of ‘grandchild’ must end in u, but it must start with something other than r. Why? Because if it did start with r, there would be no way of explaining, through the application of regular sound change, why in Kachai ‘grandchild’ starts with ð but ‘bone’ starts with r. This would be an <strong>unconditioned split</strong>, something that the regularity principle rules out.</p>
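<p>For readers who like to see things in code, here is a minimal sketch of that check, using the toy table above and the deliberately bad candidate *ru for ‘grandchild’ (a real implementation would compare aligned segments in context, not whole forms):</p> <pre><code>from collections import defaultdict

# Toy cognate sets from the table above, with candidate protoforms. *ru for
# 'grandchild' is the deliberately bad choice discussed in the text.
COGNATE_SETS = {
    "grandchild": {"proto": "ru", "Kachai": "ðɐ", "Huishu": "ruk", "Ukhrul": "ru"},
    "bone":       {"proto": "ru", "Kachai": "rɐ", "Huishu": "ruk", "Ukhrul": "ru"},
    "breast":     {"proto": "nu", "Kachai": "nɐ", "Huishu": "nuk", "Ukhrul": "nu"},
}

def find_unconditioned_splits(cognate_sets):
    """Flag cases where one protoform would need two different reflexes
    in the same language, which the regularity principle forbids."""
    reflexes_of = defaultdict(set)  # (language, protoform) -> set of reflexes
    for forms in cognate_sets.values():
        for lang, reflex in forms.items():
            if lang != "proto":
                reflexes_of[(lang, forms["proto"])].add(reflex)
    return {key: refs for key, refs in reflexes_of.items() if len(refs) > 1}

print(find_unconditioned_splits(COGNATE_SETS))
# {('Kachai', 'ru'): {'ðɐ', 'rɐ'}} -- *ru cannot be the protoform for both words.
</code></pre>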
<p>Looking at the full collection of cognate sets, we would probably make the following reconstructions (or their formal equivalents):</p> <table> <thead> <tr> <th>Gloss</th> <th>‘grandchild’</th> <th>‘bone’</th> <th>‘breast’</th> <th>‘laugh’</th> </tr> </thead> <tbody> <tr> <td>Kachai</td> <td>ðɐ</td> <td>rɐ</td> <td>nɐ</td> <td>ni</td> </tr> <tr> <td>Huishu</td> <td>ruk</td> <td>ruk</td> <td>nuk</td> <td>nuk</td> </tr> <tr> <td>Ukhrul</td> <td>ru</td> <td>ru</td> <td>nu</td> <td>nu</td> </tr> </tbody> <tbody> <tr> <td>Proto-Tangkhulic</td> <td>*du</td> <td>*ru</td> <td>*nu</td> <td>*nɨ</td> </tr> </tbody> </table> <p>where ɨ is a vowel that is halfway between i and u. These protoforms make sense based on the reflexes: they are similar in pronunciation to the reflexes, and the reflexes can all be derived from them mechanically.</p> <p>There have been past efforts to build neural models (neural networks) that can perform this kind of reconstruction (including some from our lab), but they have two unfortunate properties:</p> <ol> <li>They only take the reflex-to-protoform mapping into account (not the protoform-to-reflex mapping).</li> <li>Training them (teaching them to generate reconstructions given sets of cognate reflexes) requires many cognate sets, each of which must be paired with a reconstruction.</li> </ol> <p>In actual reconstruction projects, as I know from first-hand experience, the constraint in 2 is only satisfied when the most challenging and interesting work in reconstructing a protolanguage has already been done. What Leon discovered as part of this paper is that 2 is a consequence of 1—we are forced to do fully supervised training (a protoform for every cognate set) because we are not enforcing the regularity principle (not rejecting protoforms if—for example—identical protoforms are mapped to different reflex forms in a single language).</p> <p>To make things clearer, let's return to our example. The older kind of fully supervised reconstruction model is likely to produce the reconstructions below:</p> <table> <thead> <tr> <th>Gloss</th> <th>‘grandchild’</th> <th>‘bone’</th> <th>‘breast’</th> <th>‘laugh’</th> </tr> </thead> <tbody> <tr> <td>Kachai</td> <td>ðɐ</td> <td>rɐ</td> <td>nɐ</td> <td>ni</td> </tr> <tr> <td>Huishu</td> <td>ruk</td> <td>ruk</td> <td>nuk</td> <td>nuk</td> </tr> <tr> <td>Ukhrul</td> <td>ru</td> <td>ru</td> <td>nu</td> <td>nu</td> </tr> </tbody> <tbody> <tr> <td>Proto-Tangkhulic</td> <td>*du</td> <td>*ru</td> <td>*nu</td> <td><strong>*nu</strong></td> </tr> </tbody> </table> <p>This is because having a u in Ukhrul and a u in Huishu is generally a reliable cue for reconstructing *u in the protoform. However, this reconstruction cannot be right, following the regularity principle, because a single pronunciation in the protolanguage would be reflected as two different pronunciations in Kachai.</p> <p>Leon's technical innovation was to design a neural network that can learn both to reconstruct protoforms based on reflexes (daughter-to-protoform, or D2P) and to disfavor protoforms from which the reflexes cannot be derived (protoform-to-daughter, or P2D). The D2P and P2D networks are joined together with a “bridge network” that allows both networks to be trained as a single network. The resulting type of neural network is similar in some ways to what is called a “variational autoencoder,” but it differs in that the outputs of the D2P network are not real-valued vectors but strings of symbols (in the International Phonetic Alphabet).</p>
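<p>For readers who want something more concrete, here is a highly simplified sketch of this kind of training signal in PyTorch. It is emphatically not the model from the paper: the network sizes, the fixed-length forms, and the soft-embedding “bridge” are stand-ins chosen to keep the example short. It only illustrates the idea that a supervised loss on protoforms, where we have them, can be combined with a reconstruction loss that asks the P2D network to regenerate the reflexes from whatever the D2P network proposes.</p> <pre><code>import torch
import torch.nn as nn
import torch.nn.functional as F

# Everything below is an illustrative simplification, not the paper's architecture.
VOCAB = ["#", "ð", "r", "n", "ɐ", "u", "i", "k", "ɨ", "d"]  # "#" is padding
C2I = {c: i for i, c in enumerate(VOCAB)}
MAX_LEN, N_DAUGHTERS, EMB, HID = 4, 3, 16, 32

def encode(form):
    ids = [C2I[c] for c in form]
    return ids + [C2I["#"]] * (MAX_LEN - len(ids))

class D2P(nn.Module):
    """Daughter-to-protoform: reads the concatenated reflexes and emits a
    distribution over segments for every protoform position."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), EMB)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, MAX_LEN * len(VOCAB))

    def forward(self, daughters):                      # (B, N_DAUGHTERS * MAX_LEN)
        _, h = self.enc(self.emb(daughters))
        return self.out(h[-1]).view(-1, MAX_LEN, len(VOCAB))

class P2D(nn.Module):
    """Protoform-to-daughter: reads a (soft) protoform and tries to
    regenerate every daughter form from it."""
    def __init__(self, shared_emb):
        super().__init__()
        self.emb = shared_emb                          # share D2P's segment embeddings
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, N_DAUGHTERS * MAX_LEN * len(VOCAB))

    def forward(self, proto_probs):                    # (B, MAX_LEN, |V|)
        soft = proto_probs @ self.emb.weight           # differentiable "bridge"
        _, h = self.enc(soft)
        return self.out(h[-1]).view(-1, N_DAUGHTERS * MAX_LEN, len(VOCAB))

d2p = D2P()
p2d = P2D(d2p.emb)
opt = torch.optim.Adam(list(d2p.parameters()) + list(p2d.parameters()), lr=1e-3)

# Two toy cognate sets: 'bone' comes with a gold protoform, 'laugh' does not.
daughters = torch.tensor([encode("rɐ") + encode("ruk") + encode("ru"),
                          encode("ni") + encode("nuk") + encode("nu")])
proto_gold = torch.tensor([encode("ru")])              # labels for the first set only

for step in range(100):
    proto_logits = d2p(daughters)                      # propose protoforms
    sup_loss = F.cross_entropy(proto_logits[:1].reshape(-1, len(VOCAB)),
                               proto_gold.reshape(-1)) # only where labels exist
    daughter_logits = p2d(F.softmax(proto_logits, dim=-1))
    rec_loss = F.cross_entropy(daughter_logits.reshape(-1, len(VOCAB)),
                               daughters.reshape(-1))  # every set, labeled or not
    loss = sup_loss + rec_loss
    opt.zero_grad(); loss.backward(); opt.step()
</code></pre>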
<p>In training, the model can thus take advantage of cognate sets without a protoform, something that the earlier, fully supervised, models could not do.</p> <p>The implications of this development are striking. Given the same amount of labeled data (cognate sets with protoforms), our DPD models outperform the best fully supervised models as well as the best semisupervised models we could train. And careful statistical testing shows that the differences are as significant as they look at first glance.</p> <p>We think that this approach is very promising and look forward to a day, not too far in the future, when neural models like this will be a tool in the belt of every historical linguist and will help shed light on the dark recesses of the human past.</p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="links"/><summary type="html"><![CDATA[a short philosophical discursion]]></summary></entry><entry><title type="html">Is ACL an AI (or NLP or CL) Conference?</title><link href="https://changelinglab.github.io/blog/2024/cl_in_acl/" rel="alternate" type="text/html" title="Is ACL an AI (or NLP or CL) Conference?"/><published>2024-08-14T16:40:16+00:00</published><updated>2024-08-14T16:40:16+00:00</updated><id>https://changelinglab.github.io/blog/2024/cl_in_acl</id><content type="html" xml:base="https://changelinglab.github.io/blog/2024/cl_in_acl/"><![CDATA[<p><img src="/assets/img/acl_is_not_ai_conference.jpg" alt="Screen with the text &quot;ACL Is not an AI Conference&quot;" title="Title Slide from Emily Bender's Presentation"/></p> <p>In the title of a controversial Presidential Address at ACL2024, Emily Bender proclaimed that “ACL is not an AI conference.” I was not fortunate enough to be in Bangkok to hear the address, but I have read the slides and, since it concerns me personally, I want to write a short response.</p> <p>tl;dr: Reinforcing disciplinary boundaries between CL, NLP, and ML makes all three disciplines poorer and less coherent. They <strong>all</strong> belong at ACL. As for “AI,” it is not a discipline but a marketing term, and contrasting it with the big three is a category error.</p> <p>I am a Computational Linguist who is deeply embedded in the ACL community. I am not alone. People who try to paint ACL as an NLP conference only (e.g., Yoav Goldberg) are ignoring the substantial number of CL scientists who participate in ACL in all capacities and the role that CL has always played in the *ACL organizations and conferences. Bender commits the same type of error when she tries to exclude or marginalize “AI,” by which I assume she means machine learning. This is problematic in a couple of respects:</p> <ol> <li><strong>AI is not a discipline, technology, community, or anything substantial at all—it is a marketing term.</strong> One might as coherently say that ACL is not a phlogiston conference as to say that it is not an “AI” conference. AI has referred to symbolic systems like the HPSG grammars for which Bender is well known, expert systems, behavior trees guiding video game characters, classical statistical machine learning, perceptrons, deep neural nets, language models, and so on.
AI is just “computers looking smart so you'll give us money” and not anything a conference could be about.</li> <li>If she means “machine learning” then, in a sense, she's right—almost no one believes that ACL is a machine learning conference in the sense that ICML, ICLR, and NeurIPS are. However, <strong>to exclude work from the conference that is focused on ML techniques (which are thoroughly embedded in modern NLP and CL) would make the community vastly poorer.</strong> And there is no reason that ACL cannot be a CL conference, an NLP conference, and an ML conference. These fields intersect and reinforce each other, and we all benefit from interacting with (and even confronting) one another.</li> </ol> <p>I understand one of Bender's points: many younger reviewers, even in linguistics-heavy tracks like Linguistic Theories and Cognitive Modeling and Phonology, Morphology, and Word Segmentation, will criticize papers for lacking novelty (really, technical novelty) when they concentrate on answering a scientific question about language (or a language) rather than presenting a technical innovation. I've encountered this when writing metareviews as an AC and AE. I've also encountered it first hand. For example, I once received a very low review on a linguistics paper submitted to a *ACL conference because the ML techniques used were well known (decision trees, SVMs, and generic LSTMs and CNNs). The reviewer, apparently not familiar with linguistic research, missed our point, which was that these tools resolved a theoretically significant question about the languages under investigation. (Ironically, another reviewer attacked us for ignoring linguistic theory because our findings did not accord with their theoretical preconceptions.) In this case, as in many cases I've seen, an AC with a broader view of the field(s) stepped in and wrote a balanced and informed metareview. This is not to say that the system is perfect; only that overly strong statements about linguistics being pushed out of ACL by AI-intoxicated reviewers are not helpful.</p> <p>In fact, there is no reason for linguistics, as a field, to be at odds with ML. The animosity, I think, comes from the incompatibility of a certain <strong>kind</strong> of linguistics with a particular <strong>kind</strong> of machine learning researcher. Starting in the 1950s, North American linguistics was dominated by the generative school, which was built upon discrete symbolic representations governed by categorical rules and constraints and arrived at via a specific introspective methodology. This was compatible with (indeed, very much a part of) the first wave of “artificial intelligence” research. Head-Driven Phrase Structure Grammar (on which Bender cut her teeth) grows out of this school (though it has a more computationally appealing formalism and a more empirically oriented culture). But generative linguistics is not (and has never been) the only approach to the language sciences. Many linguists before, during, and after the heyday of generative grammar have viewed language as grounded in stochastic patterns of usage, and for these empirically minded linguists, machine learning (whether classical or neural) is a natural, intuitive approach to language.</p> <p>The point of disagreement, really, is over which classes of questions are interesting. Few linguists (even computational linguists) are really going to be interested in knowledge distillation, and few ML practitioners are really going to be interested in noncompositionality in Newari noun phrases.
I would like to advance the radical proposition that it is okay for a professional organization and a conference to embrace people who are interested in different things. I would go further and say that we benefit from being in an organization with people who have different interests and backgrounds than we have. In the end, we all have language to tie us together, and we all have computing to tie us together.</p> <p>I sincerely believe that the name of the ACL should change. We should keep “ACL” but change the full name to “<strong>the Association for Computing and Language</strong>.” This better matches what the ACL is to me and sets the tenor for a community that embraces a spectrum of research interests and practices.</p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="links"/><summary type="html"><![CDATA[Ruminations on Emily Bender's Presidential Address at ACL2024]]></summary></entry><entry><title type="html">Why does diachronic linguistics matter?</title><link href="https://changelinglab.github.io/blog/2024/why-diachronic-linguistics/" rel="alternate" type="text/html" title="Why does diachronic linguistics matter?"/><published>2024-08-07T16:40:16+00:00</published><updated>2024-08-07T16:40:16+00:00</updated><id>https://changelinglab.github.io/blog/2024/why-diachronic-linguistics</id><content type="html" xml:base="https://changelinglab.github.io/blog/2024/why-diachronic-linguistics/"><![CDATA[<p>Diachronic linguistics is important for the same reason that evolutionary biology is important: you cannot really understand the state of a language at any given time, or why the range of attested languages is what it is, without understanding the mechanisms by which languages come to be.</p> <p>In computational linguistics, diachrony has often played second fiddle. The focus, during the Age of Rules, was on implementing linguistic formalisms (that captured aspects of language like syntax, semantics, and discourse structure). This reflected the prestige approaches in theoretical linguistics. It also reflected the underlying expectation that, with just a few more rules (or a few more fixes to existing rules), these symbolic systems would be good for something.</p> <p>Historical linguistics has long lacked the prestige of theoretical syntax and semantics and, at first blush, it is not good for much in the practical realm. Certainly the knowledge of human history and culture that comes from understanding the linguistic past is valuable. However, it is not readily apparent how diachronic linguistics feeds into machine translation, question answering, automatic speech recognition, or any of the other human language technologies that are economically important today.</p> <p>Certainly, on the margins, historical linguistics can help with a few NLP tasks. For example, historical linguistic relationships (and the resulting similarities in phonology) can be leveraged to facilitate named entity recognition, as we have shown in a few papers like <a class="citation" href="#bharadwaj2016phonologically">(Bharadwaj et al., 2016)</a> and <a class="citation" href="#chaudhary2018adapting">(Chaudhary et al., 2018)</a>. Likewise, phylogenetic information has been used for selecting transfer languages in cross-lingual training.
However, it seems unlikely that the next big breakthrough in NLP will be the product of diachronic linguistics (or even linguistics).</p> <p>What it lacks in terms of engineering potential, though, diachronic linguistics more than makes up for in its scientific implications. Indeed, the persistent question of typology (why languages are so alike and why they differ where they differ) has often been treated as a question of human linguistic competence. However, decades of research make it appear more likely that it is the mechanisms of language change that bias languages towards certain structural properties, whether in phonology, morphology, syntax, semantics, etc. The question of why languages vary in the way that they do, then, is ultimately a question of what changes languages are likely to undergo in their histories.</p> <p>It is not obvious that diachronic linguistics will have a transformative effect upon the world economy or the technological sphere. However, it is almost certain that our understanding of human language as a phenomenon hinges on our understanding of language change and linguistic history. This may, in turn, have practical consequences, but <strong>even if it does not</strong>, investigating linguistic diachrony is worthwhile.</p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="links"/><summary type="html"><![CDATA[a short philosophical discursion]]></summary></entry></feed>