Blog

Apr 15, 2022

Can computers change the way we speak?

Some months ago, I bought my first smart speaker: an Amazon Echo. Because I live in the U.S., Alexa's default language was English. Even though I'm a native speaker of Spanish, I can also speak English, so it wasn't really a problem for me. However, when pronouncing Spanish proper names (e.g., people or places), I would still use my Spanish pronunciation. It turns out Alexa couldn't understand me unless I anglicized those names (ugh! Read more

Feb 28, 2022

Four key steps for developing guidelines for language data annotation

Whether you want to build a corpus to document an interesting linguistic phenomenon or simply want language data to train your AI models, the process of data annotation is crucial. Language data annotation (or text data annotation) is the process of adding labels that carry relevant information or metadata. Even though it might seem like a simple process, choosing the right labels for your data is not always easy, as ambiguous data is pretty common. Read more

Oct 27, 2021

How to get Twitter data

Nowadays, a lot of linguistics research uses language data from Twitter. At every major linguistics conference, there are at least one or two people (if not more!) discussing the use of language on this popular social network. Even though doing research on language use on Twitter is quite trendy now, getting the data for research is not that cool. The reason behind this? There are several ways to get Twitter data, and each one depends on a variety of factors, which makes the process a bit confusing. Read more

Aug 25, 2021

An analysis of Peter Pan using the R package koRpus

If you are into language research, you probably use R to analyze your data. One of the good things about R is that you can install packages to extend R's basic functions. In this blog post, I'd like to introduce you to a very useful package for language research: koRpus. Despite its name, the R package koRpus was designed to work with individual texts, and not with a collection of texts (I know, this is confusing! Read more

Jul 21, 2021

Transcribing and analyzing speech samples with CLAN

In a previous blog post, I wrote about automatic transcription software that you can use to speed up your transcription process. However, if you're working on a research project, chances are that you also want to analyze what you transcribe. Measures such as words per minute or lexical diversity scores are quite common in linguistics research. In this blog post, I introduce you to CLAN, a program that will help you with both transcription and analysis. Read more