…in Spanish. But tweets are likely to come from the U.S.
With the #CopaAmerica well underway and the #Euro2016 kicking off as I type, I had no choice but to dedicate this week’s episode of Professional Shenanigans to major international soccer tournaments and their representation on social media (i.e. Twitter).
After focusing on tweets about MLS teams and predicting their frequency by looking at characteristics of the state they originated in, I wanted this project to explore one of the most interesting aspects of major international sporting events: language. From a media perspective, managing the preferences of a multi-lingual audience is a challenge.
Knowing that soccer is a truly global sport, and realizing that social media is a low-cost but effective in-house solution for reaching a global audience, many teams have started internationalizing their Twitter content. For example, 14 out of 16 Bundesliga teams had at least two Twitter accounts, one of which provided information in a language other than German (mostly English). Some clubs even added further channels, most notably the fantastic @FCBayernUS, which popularizes the club in the US, and Bayer Leverkusen's @bayer04_es (with more than 75k fans following updates on Chicharito).
Teams often have a clear rationale for creating specific language destinations (English as the accepted "lowest common denominator," plus a language spoken by the fans of a major star playing for the team), but how does a major tournament navigate this space? Just look at #Euro2016: almost every participating country has its own official language (in fact, I counted 19 languages for 24 participating teams).
The situation at the #Copa100 is far less dramatic. Of the 16 participating nations, 13 speak Spanish, so the priority should be clear. Then you add some English (for the USMNT as the host) and Portuguese (for Brazil), and you're done. That is exactly what the organizers did by creating three separate Twitter accounts. Interestingly, though, the main account (@CA2016, with more than 208k followers) is in English. Both "language destinations" trail considerably in followers (ES = 23k, PT = 1.5k), even though all three accounts were established in April 2015 and tweet similar amounts of content. This is somewhat surprising, especially given my language analysis of the tweets (see Figure 1), in which Spanish was the most frequent language.
So the question becomes: are Spanish-speaking Twitter users for some reason choosing the English account, or are the English-speaking Copa followers a silent majority? To find out, I looked at a subset of the @CA2016 followers (n = 25,000; see Data Collection) and their language settings. Interestingly, the results mirror the language distribution of the tweets. The most frequently used language was Spanish (45%), followed by English (38%), French (2%), and Portuguese (2%), while Arabic (1%) rounded out the top 5. Overall, then, even though language destinations for Spanish and Portuguese exist, many Spanish- and Portuguese-speaking fans chose the English account.
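To make the tally concrete, here is a minimal Python sketch of turning follower language settings into percentage shares and a top 5. It assumes each follower's language code has already been retrieved (the post used twitteR in R for that step) and flattened into a plain list; the function name and toy data are hypothetical and are not the real @CA2016 sample.

```python
from collections import Counter

def language_shares(langs, top_n=5):
    """Return the top_n language codes with their share of all accounts, in percent.

    Hypothetical input: `langs` is a list of language codes (e.g. "es", "en"),
    one entry per follower account.
    """
    counts = Counter(langs)
    total = sum(counts.values())
    return [(lang, round(100 * n / total, 1)) for lang, n in counts.most_common(top_n)]

# Toy data for illustration only; NOT the actual follower language settings.
sample = ["es"] * 9 + ["en"] * 8 + ["fr", "pt", "ar"]
print(language_shares(sample))
# prints [('es', 45.0), ('en', 40.0), ('fr', 5.0), ('pt', 5.0), ('ar', 5.0)]
```

`Counter.most_common` breaks ties by first occurrence, so equally common languages keep the order in which they first appear in the input.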
Why: Identity? Quality? Originality?
There are several potential explanations for fans' preference for following a major sporting event in English rather than in the language they identify as their own (at least according to their Twitter settings). First, fans who follow international sporting events on social media (especially soccer) might be more likely to be fluent in English, or at least comfortable enough to consume English-language content. Many fans probably follow their national teams' stars in foreign leagues (MLS, Premier League, Bundesliga) and therefore already consume English-language content to stay up to date. Second, these highly engaged fans might perceive the content on the "bigger" @CA2016 account as more original and of higher quality (as the tournament is played in the U.S.).
Finally, many fans with Spanish as their main Twitter language might actually live in the U.S. To test this hypothesis, I analyzed the subset of #Copa100 tweets that included geo-location (unfortunately, only 2.5% of tweets did). Still, the largest share of these tweets (~27%) originated in the U.S. (see Figure 2), lending some support to the argument that Spanish-speaking fans in the U.S. might follow the Copa in English.
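The geo-location step boils down to two numbers: what fraction of tweets carry a location at all, and how the located tweets split by country. A minimal Python sketch, assuming tweets are dicts shaped like Twitter's v1.1 JSON, where a geo-tagged tweet has a non-null `place` object with a `country_code`. Treating `place` (rather than exact `coordinates`) as "geo-located" is an assumption; the post does not say which field it used.

```python
from collections import Counter

def geo_summary(tweets):
    """Return (percent of tweets with a place, {country_code: percent of geo-tagged tweets}).

    Assumed input shape: dicts mirroring Twitter v1.1 JSON, where geo-tagged tweets
    have a non-null "place" with a "country_code"; all other tweets are skipped.
    """
    codes = [t["place"]["country_code"] for t in tweets if t.get("place")]
    if not codes:
        return 0.0, {}
    geo_pct = round(100 * len(codes) / len(tweets), 1)
    shares = {code: round(100 * n / len(codes), 1) for code, n in Counter(codes).most_common()}
    return geo_pct, shares

# Toy example only; not the actual #Copa100 data.
toy = [{"place": {"country_code": "US"}}, {"place": None}, {"place": {"country_code": "MX"}}]
print(geo_summary(toy))
```

Note that the country shares are computed over geo-tagged tweets only, which is the denominator behind the ~27% figure; with so few tweets carrying a location, that share can shift noticeably with small samples.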
As always: There is much more to analyze and I will use the #Copa100 and #Euro2016 as my playground for the weeks to come. Ideas, suggestions, and feedback are welcome!
How: This is how I got the data
Data collection: I accessed Twitter's Streaming API using the streamR package in R*. The search was limited to tweets mentioning either of the two most popular hashtags (#Copa100, used by the official Copa America account; #CopaAmerica, pushed by Twitter) or the term "Copa America". I ran three rounds of initial data collection, limiting each round to tweets published within a 30-minute window. Data was collected during the early afternoon to avoid live updates during games (that'll be a different study). The three rounds returned a combined 10,388 tweets.
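The collection itself happened in R via streamR, but the selection rule described above can be illustrated in a few lines of Python: keep a tweet if it mentions #Copa100, #CopaAmerica, or "Copa America" and falls inside one of three 30-minute windows. This is a local re-check under stated assumptions, not a reproduction of the Streaming API, which applies its own matching rules server-side. The tweet dict shape (`text`, `created_at`), the case-insensitive regex, and the window start times are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Local re-check of the tracking terms named in the post (case-insensitive).
# This mimics the selection rule only; it is NOT the Streaming API's own matcher.
TRACK = re.compile(r"#copa100\b|#copaamerica\b|copa america", re.IGNORECASE)

def in_any_window(ts, starts, minutes=30):
    """True if ts falls inside any half-open [start, start + minutes) window."""
    span = timedelta(minutes=minutes)
    return any(start <= ts < start + span for start in starts)

def filter_rounds(tweets, starts, minutes=30):
    """Keep tweets that mention a tracked term and fall within one collection round.

    Hypothetical input shape: each tweet is a dict with "text" (str)
    and "created_at" (datetime).
    """
    return [
        t for t in tweets
        if TRACK.search(t["text"]) and in_any_window(t["created_at"], starts, minutes)
    ]

# Illustrative round start times only; the post does not give the actual times.
starts = [datetime(2016, 6, 1, 13, 0), datetime(2016, 6, 1, 14, 0), datetime(2016, 6, 1, 15, 0)]
sample = [
    {"text": "Vamos! #CopaAmerica", "created_at": datetime(2016, 6, 1, 13, 10)},
    {"text": "copa america tonight", "created_at": datetime(2016, 6, 1, 13, 40)},  # between rounds
    {"text": "Euro 2016 kicks off", "created_at": datetime(2016, 6, 1, 13, 5)},    # no tracked term
    {"text": "#Copa100 USA!", "created_at": datetime(2016, 6, 1, 14, 29)},
]
print([t["text"] for t in filter_rounds(sample, starts)])
# prints ['Vamos! #CopaAmerica', '#Copa100 USA!']
```

Half-open windows make sure a tweet stamped exactly at the end of a round is not counted in that round.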
Of note: even though data collection took place in the early afternoon while no games were being played, the results likely reflect the tournament schedule to some extent. Fans of the teams that had played the night before (Brazil vs. Haiti and Ecuador vs. Peru) or were playing later that day (Uruguay vs. Venezuela and Mexico vs. Jamaica) were probably more likely to be tweeting about the tournament during data collection.
For the second part of the analysis, I used the twitteR package to retrieve information about the 25k most recent followers of the @CA2016 account and tallied the language settings of the returned accounts. Most followers (92.8%) were not verified, which suggests they were regular fans rather than media or athlete accounts; they had an average of 838 followers.
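The two follower statistics (share of unverified accounts, average follower count) are simple aggregates. A minimal Python sketch, assuming each account is a dict with `verified` and `followers_count` fields as in Twitter's v1.1 user object (the post retrieved them with twitteR in R). Computing the average over the unverified accounts only is an assumption about how the 838 figure was derived.

```python
from statistics import mean

def summarize_followers(accounts):
    """Return (percent of accounts not verified, mean followers of unverified accounts).

    Assumed input shape: dicts with "verified" (bool) and "followers_count" (int),
    mirroring Twitter v1.1 user objects. Averaging only unverified accounts is an
    assumption, not necessarily the post's exact method.
    """
    unverified = [a for a in accounts if not a["verified"]]
    pct_unverified = round(100 * len(unverified) / len(accounts), 1)
    avg_followers = mean(a["followers_count"] for a in unverified) if unverified else 0.0
    return pct_unverified, avg_followers

# Toy accounts for illustration only.
toy = [
    {"verified": False, "followers_count": 100},
    {"verified": True, "followers_count": 50000},
    {"verified": False, "followers_count": 300},
]
print(summarize_followers(toy))
```

Excluding verified accounts from the average matters: a single large media account can otherwise dominate the mean.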
* If you're interested in learning more about the streamR package and how to use it to collect tweets, I strongly recommend the work(shops) of Pablo Barberá.