Bots won’t buy you any fan engagement – or will they?

Fan Engagement in the MLS

We’re about three months into the MLS season – time for another edition of the highly unofficial Hype-O-Meter, with two important changes:

  1. How “real” are your followers: When Atlanta United entered the scene, there was much debate about how much they had bought their way into their “from-zero-to-hero” status on Twitter. An expansion team suddenly having the most followers in the league smells like fake, and it seems to be exactly that. Inspired by the press coverage of Trump’s Twitter exploits, I used “Twitter Audit” to analyze a sample of Twitter followers for each team. Although this is not a perfect representation of the exact makeup of each team’s followers (e.g., I did not pay for a pro account and therefore had to rely on data from earlier analyses, and only sub-samples were analyzed), it can serve as a decent proxy.

“Each audit takes a sample of up to 5000 (or more, if you subscribe to Pro) Twitter followers for a user and calculates a score for each follower. This score is based on number of tweets, date of the last tweet, and ratio of followers to friends. We use these scores to determine whether any given user is real or fake. Of course, this scoring method is not perfect but it is a good way to tell if someone with lots of followers is likely to have increased their follower count by inorganic, fraudulent, or dishonest means.”

What I found was quite interesting. As some commentators had suspected earlier, Atlanta seems to have added quite a bit to their follower count. In fact, only 48% of their 517,134 followers are deemed real. Interestingly, though, that does not seem to hurt them. Usually, simple bots or fake accounts won’t buy you any fan engagement (as measured in likes and retweets), but Atlanta still comes in a solid 3rd. However, they are the exception. Several other MLS teams seem to have added to their follower count and, as a result, find themselves mostly towards the bottom of the engagement chart.

Here is the complete list of “fakeness” in the MLS:

Team            Followers   % of “Real” Followers
Vancouver       308005      26
Toronto         290872      42
Atlanta         517134      48
Houston         343049      51
Montreal        284534      61
Orlando         387925      63
San Jose        203446      63
Washington      117203      64
Seattle         407927      66
Portland        285669      73
Los Angeles     391280      74
NY Red Bulls    184798      76
Dallas          134959      77
Colorado        84481       77
Kansas City     297709      78
Chicago         139334      80
Philadelphia    109161      81
NYCFC           332828      83
New England     89855       84
Columbus        143143      85
Salt Lake       127778      87
Minnesota       60619       88

No surprise: all professional / celebrity accounts will draw some noise and attract the occasional bot follower. However, the result for Vancouver was quite shocking. Of their more than 300k followers, only 26% (or about 80k) are active enough to be counted as “real”. I almost hope that this is some form of a slip-up, but the engagement numbers match the trend. For the 2nd time in a row, the Whitecaps are among the bottom three of the league, generating only 0.86 favorites and 0.6 retweets per 10,000 followers. They were also voted the least appealing Twitter account by Howler Magazine. Houston and Montreal, the only two teams with less fan engagement, also rank in the top quarter of the “fake followers” analysis. On the other side of the spectrum, Minnesota United deserves a big shout-out. The expansion franchise seemingly chose the slow(er) route of organic growth on social media and now tops the Engagement Ranking for the 2nd time in a row, by a whopping margin.
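To make the “adjusted” follower counts concrete, here is a minimal Python sketch (the analysis itself was not done this way; this is only an illustration). It multiplies each team’s follower count by its Twitter Audit percentage from the table above and ranks the teams by the result. The function names are made up for this example.

```python
# (team, followers, percent of followers rated "real"), copied from the table above
AUDIT = [
    ("Vancouver", 308005, 26), ("Toronto", 290872, 42), ("Atlanta", 517134, 48),
    ("Houston", 343049, 51), ("Montreal", 284534, 61), ("Orlando", 387925, 63),
    ("San Jose", 203446, 63), ("Washington", 117203, 64), ("Seattle", 407927, 66),
    ("Portland", 285669, 73), ("Los Angeles", 391280, 74), ("NY Red Bulls", 184798, 76),
    ("Dallas", 134959, 77), ("Colorado", 84481, 77), ("Kansas City", 297709, 78),
    ("Chicago", 139334, 80), ("Philadelphia", 109161, 81), ("NYCFC", 332828, 83),
    ("New England", 89855, 84), ("Columbus", 143143, 85), ("Salt Lake", 127778, 87),
    ("Minnesota", 60619, 88),
]


def real_followers(followers, pct_real):
    """Estimated number of real followers, rounded to the nearest account."""
    return round(followers * pct_real / 100)


def rank_by_real(audit):
    """Teams sorted by estimated real followers, largest first."""
    estimates = [(team, real_followers(f, p)) for team, f, p in audit]
    return sorted(estimates, key=lambda row: row[1], reverse=True)


ranking = rank_by_real(AUDIT)
for team, estimate in ranking[:5]:
    print(f"{team:<12} {estimate:>8,}")
```

Because the percentages come from sub-samples, these estimates are rough; they are only meant to show how the ranking shifts once suspected fake accounts are discounted.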

2. Bye, bye – Facebook. I decided to leave Facebook out of the analysis. Both platforms are very different and lumping them together in one analysis is likely to confound the results. Instead, I decided to focus on Twitter — which became even richer from a data perspective given the addition of “Twitter Audit” and some planned further analyses.

Some interesting findings / thoughts.

  • Overall engagement and tweet frequency are up from the pre-season analysis. Which makes perfect sense given that game days are expected to a) see more tweets, and b) get fans much more involved.
  • Chicago Fire and L.A. Galaxy upped their game. Not sure if it is the “Schweinsteiger-Effekt” for Chicago or getting quite a bit of TV time for Los Angeles, but both teams jump significantly in the ranking.  They might have also simply upped their social media efforts for the season: L.A. was just voted as having the best memes in the MLS.
  • Can’t buy me love – or can I? Atlanta is somewhat of a conundrum. I just complained about their (presumably) artificially bolstered follower count and how that should diminish their fan engagement scores, and yet they rank among the top three for the 2nd time in a row. How can that be? We can’t be sure, but there are some possible explanations:
    • 1) Even without the suspected bots, United would still sport almost 250k followers – the 4th most in the MLS (when all teams are adjusted to their true follower count based on Twitter Audit data). So: There is quite some buzz surrounding the team — and maybe making the follower count look nice early on kick-started overall engagement. When we only look at this “core” group of followers and calculate engagement based on them, Atlanta comes in first. By far.
    • 2) I don’t want to suggest anything here – I really like how Atlanta has kick-started their campaign on the pitch as well as online – but one could also suspect that they invested in smart bots that could automatically like and share content instead of “dead” fake accounts. In the end, though, any brand engaging in such behavior would shoot itself in the foot. No matter how well a bot is programmed, unless you also train it to buy your merchandise and sit in the stands, the ROI simply won’t be there. Instead, you’ll have to explain why you seem to have a gazillion fans who never buy anything.
  • Love thy fans! There is quite a variance in the amount of interactivity among teams and their fans – at least when taking replies (and retweets) on Twitter as a proxy. Seattle leads the reply charts: more than 28% of all original tweets were in reply to another Twitter account — compared to 4% for Orlando and Montreal. Looking at retweets, Salt Lake is king. Almost 39% of all tweets are re-tweets. On the other end of the spectrum: Philadelphia posts the most original content with “only” 9% of RTs.


[UPDATE] Following some further analyses and great feedback, I have adjusted the engagement formulas to include adjustments for playing games on national television, as well as market size.

nba most engaging teams on social media twitter and facebook

1. Getting the Data

1.1. Twitter

I use “R” to access Twitter’s REST API, which provides programmatic access to tweets, user profiles, follower data, etc.* Instead of pulling all the information manually, I use a script that downloads up to roughly 3,200 of the most recent tweets made by the selected accounts (in this case: all NBA teams) including the info I am interested in (retweets, favorites, replies, etc.). Once the data is downloaded, I save it in individual .json files for further analysis.

* The REST API does not provide access to real-time data (we would need the STREAMING API ), but since I’m interested in accounts instead of ongoing conversations, the REST API works better.

1.2. Facebook

Again, I use R to access the API. The fantastic “Rfacebook” library allows downloading information about public posts from public pages. Instead of downloading a set number of posts, I restrict my data collection to a certain timeframe. In this case: the NBA season so far. More specifically, I include all posts made by the official Facebook pages of all NBA teams between the beginning of September 2016 (Naismith Memorial Basketball Hall of Fame Enshrinement) and the end of January 2017 (time of data collection).

2. Cleaning & Normalizing the Data

2.1. Followers / Fans

When looking at engagement on social media**, there are several confounding variables that need to be taken into account before starting any analysis. One of the most obvious (and also the easiest to fix) is the number of followers each team has. A team with more than 5.5 million followers (such as the Lakers) will naturally elicit more favorites and retweets than a team with ~ 560k followers (can people please start following the Utah Jazz), simply by having each tweet shown to a bigger audience. The same logic, of course, applies to Facebook, where the official Lakers page has ~ 22 million fans and Utah’s has about 1.2 million.

** The term “engagement” is used rather vaguely in both academia and the industry. For this analysis, I refer only to the behavioral component of engagement. Or rather: a crude proxy of it – favorites and retweets for Twitter, and likes, comments, and shares for Facebook. To be perfectly clear here: this is a measure of the breadth rather than the depth of engagement, because it does not tell us anything about “why” a Twitter user favorited or retweeted a tweet.

To create a level playing field, I need to control for the number of followers/fans each team has. As a first step, I divide the number of favorites and retweets (for Twitter), as well as likes, comments, and shares (for Facebook), by the number of followers/fans each official team account/page has (I downloaded that information as part of the data collection process). However, since the resulting number (favorites/retweets/likes/comments/shares per single follower) is minuscule and meaningless in any practical sense, I multiply it by 10,000. Given the range of followers most NBA teams have, the resulting “per 10,000 followers” variable provides a good starting point to compare fan reactions to tweets and Facebook posts across teams.
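As an illustration of this normalization, here is a small Python sketch. The follower counts are the approximate ones mentioned above; the favorite counts are made up purely to show that two very different raw numbers can describe the same level of engagement.

```python
def per_10k(count, followers):
    """Engagement count expressed per 10,000 followers (or fans)."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return count * 10_000 / followers


# Hypothetical favorite counts for a single tweet (not real data)
lakers = per_10k(550, 5_500_000)  # ~5.5 million followers
jazz = per_10k(56, 560_000)       # ~560k followers

print(lakers, jazz)
```

In this toy case both teams land on the same value, even though the Lakers tweet collected roughly ten times as many favorites in absolute terms.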

2.2. Replies

Another potentially confounding variable: replies. When a tweet starts with a @username (aka is a reply), the only users who will see it in their timeline (other than the sender and the recipient) are those who follow both the sender and the recipient. This reduces the potential audience (and therefore the potential for engagement) quite a bit. In other words: Teams with more replies in their data would be disadvantaged in any subsequent calculation of engagement (as measured in favorites and retweets – see below). Therefore, I separate the replies from the “original” content and only analyze the latter.

However, I don’t want to throw away that information. How teams reply to tweets – and therefore directly interact with followers – is a great separate indicator of fan engagement. Even though it is hard to quantify (and therefore not included in the engagement calculation), interacting directly with fans can be seen as a proxy for the effort/manpower each team puts into their social media strategies. Here are the most interactive NBA teams on Twitter:

  1. Portland Trail Blazers — 29.4%
  2. Memphis Grizzlies — 23.1%
  3. Sacramento Kings — 22.6%
  4. Denver Nuggets — 19.4%
  5. Miami Heat — 16.5%
  6. Atlanta Hawks — 13.9%
  7. New Orleans Pelicans — 10.7%
  8. Orlando Magic — 10.2%
  9. Philadelphia 76ers — 10.2%
  10. Utah Jazz — 9.2%
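The reply split can be sketched in a few lines of Python. This is only an approximation of the idea, assuming that a reply is any tweet whose text starts with “@” and that the percentages above are replies divided by all collected tweets; the sample tweets are invented.

```python
def is_reply(text):
    """Treat a tweet as a reply if its text starts with an @username."""
    return text.startswith("@")


def split_replies(texts):
    """Return (original tweets, replies) for a list of tweet texts."""
    originals = [t for t in texts if not is_reply(t)]
    replies = [t for t in texts if is_reply(t)]
    return originals, replies


def reply_share(texts):
    """Replies as a percentage of all tweets (0.0 for an empty list)."""
    if not texts:
        return 0.0
    _, replies = split_replies(texts)
    return 100 * len(replies) / len(texts)


# Invented sample timeline
sample = [
    "Game day! Doors open at 6.",
    "@some_fan Thanks for coming out last night!",
    "Final score: 101-98.",
    "Highlights from tonight are up.",
]
originals, replies = split_replies(sample)
print(len(originals), len(replies), reply_share(sample))
```

Only the `originals` list would then feed into the favorites and retweets calculation, while the reply share is reported separately.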

2.3. All-Star Voting

To vote for their favorite player, users were encouraged to tweet, retweet or reply with a player’s first and last name or Twitter handle, along with the hashtag #NBAVOTE. Teams — as a way to promote their players — would then post tweets containing “#NBAVOTE” and encourage fans to retweet. As a result, teams with more popular players would likely receive more retweets. To reduce the potential effect of the All-Star Voting, I eliminated all tweets containing the “#NBAVOTE” hashtag from the dataset.***

*** I retained that information for the Facebook posts, though.
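Removing the voting tweets amounts to a simple text filter. The Python sketch below assumes a case-insensitive substring match on the hashtag (hashtags on Twitter are not case-sensitive); the exact matching rule used in the analysis is not stated, and the sample tweets are invented.

```python
def drop_hashtag(texts, hashtag="#NBAVOTE"):
    """Keep only tweets whose text does not contain the given hashtag."""
    needle = hashtag.lower()
    return [t for t in texts if needle not in t.lower()]


# Invented sample tweets
sample = [
    "Retweet to send him to the All-Star Game! #NBAVote",
    "Tip-off in 30 minutes.",
    "First Last #NBAVOTE",
]
kept = drop_hashtag(sample)
print(kept)
```

A substring match like this would also drop longer hashtags that merely begin with the same letters, which is acceptable for a rough filter but worth knowing.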

2.4. On-Field Success [update]

A team’s success is often the most powerful predictor of fan engagement. From a logical perspective, it’s much easier (and more pleasant) to create content for a winning team and get people to like it than it is to pick up the pieces after losses (you don’t really “like” a loss, right?). In fact, I ran a regression model predicting fan engagement from a range of variables, and the current season record of a team emerged as the most powerful predictor. Why does this matter? Well, I’m mostly interested in “who does the best job on social media?” (as in: which team has the best social media folks) – and not in “whose fans are most excited for some other reason?”. As a result, I need to control for the effect of on-field success. To do so, I calculated how much each win contributes to the different engagement indicators. For example, each win is (on average) worth 148 likes on Facebook.

2.5. Television [UPDATE]

Social media and TV go well together. Twitter is often considered the premier 2nd screen medium in the realm of sport: teams promote their Twitter handles on their courts, Twitter promotes specific hashtags for events. As a logical consequence, teams with a greater presence on national television (ESPN, ABC, TNT) should by default generate more engagement. If all teams were to get equal TV time, we could just neglect this factor — but they don’t. At the time of data collection, the average team had been on national TV (excl. NBA TV & League Pass) about 7 times. However, while teams like the Warriors (22 games) and Clippers (17) have had plenty of time to “promote” themselves on air, the Nuggets and Nets (each 1), and Magic (0) don’t get that chance very often (thankfully, the NBA provides that type of data). Long story short: I ran a series of models to compute how much each TV appearance on ESPN, ABC, and TNT contributes to the overall engagement — and being on the telly matters quite a lot. For example: Every time your team plays on ESPN, you get an additional ~900 likes for your Facebook content.

2.6. Market Size [UPDATE]

This is a tough one. One of the biggest (theoretical) advantages of social media for all sorts of businesses is that it creates a somewhat level playing field. A small, family-owned business in Buford, Wyoming, can (theoretically) reach the same worldwide audience as a major corporation in New York City. The reality, however, looks a bit different. While social media has certainly opened up new avenues for smaller-market teams to flourish and reach fans beyond their traditional market, the sometimes dramatic differences in the home markets of NBA teams still matter. For example, teams in New York or Los Angeles (the two biggest media markets in the NBA) will have a very different “baseline media exposure” than the Memphis Grizzlies or New Orleans Pelicans in the two smallest TV markets in the league. Overall, the effect is not dramatic, but can certainly make a difference for some teams. For example: The Knicks will automatically get ~100 more comments on Facebook than the Portland Trail Blazers just because of the market.

3. Calculating Engagement

What is a like worth? Or a retweet? Assigning values to user behavior is complicated. How do we know why an individual likes a piece of content? Well, we don’t. One might like a tweet because it is interesting. One might like a tweet to archive it. Or in hopes of being recognized by the creator of the content. Or because someone we care about cares about the content and we want to show that we care, too. In other words: We often can’t know if a user really cares about our content – or if (s)he is using our content as a relationship-building token or a virtual currency for social attention.

In any case, though, the general consensus is that fan engagement on social media matters. Some of my own research, for example, has shown that increased interactivity in the form of comments on Facebook relates to traffic being referred to an organization’s website. And even a “like” represents an individual’s engagement with the creator of the content. Even though a user might have liked it for some other reason, (s)he must have a) been exposed to it, and b) not been too appalled by it to have it associated with their online identity.

Building on that argument, we can then start thinking about different degrees of engagement. A comment, for example, represents greater psychological (one has to think about what to comment) and physical (one has to actually type it out) effort than simply clicking the like button. As a consequence: One who comments must care more about the content when willing to exert this additional effort. Therefore, a comment should be “worth” more than a like when calculating fan engagement on social media. Finally, a share not only often represents an endorsement of the underlying content, but also expands the reach of the original post beyond the initial audience (connections of the one who shares might not follow the content creator) and should, therefore, be of even greater value to the content creator.

Based on this logic, I can assign weights to the individual proxies of fan engagement and calculate a single score across platforms. Is that score going to be a perfect representation of “how well” a team is doing on social media? No. Certainly not. The actual numbers are arbitrary and the resulting final score has no deeper practical meaning (you can’t buy anything for let’s say 90 Engagement), but they allow a normalized comparison across teams. People like — and often need — a simplified (key) performance indicator to evaluate their performance and allow a (crude) comparison with their competitors. This is what this score does. At least I hope it does. I call it:

Win-adjusted Normalized Engagement Score (or: WANE Score).

And this is what it looks like [UPDATE]:

In the first step, I adjust each individual engagement indicator by the major control variables identified above to adjust the scores for teams’ appearances on three major TV channels, their market size, and winning. For the TV and success adjustments, I take the league average as a standard and adjust every team towards that mean. For example, a team like the Warriors will lose likes, comments, etc. for each game they are over the league average for TV games and wins, and a team like the Nets will have points added.
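The mean-centering idea in this step can be sketched as follows. This Python snippet is not the formula used for the ranking; it only shows a simple linear adjustment toward the league average, under the assumption that each extra win (or TV game) shifts an indicator by a fixed amount. The 148 likes per win comes from the text above; the raw like count, win totals, and league average are hypothetical.

```python
def adjust_toward_league_mean(raw_value, team_count, league_mean_count, value_per_unit):
    """Remove (or add back) the engagement attributed to being above (or below)
    the league average on one control variable, e.g. wins or national TV games."""
    return raw_value - value_per_unit * (team_count - league_mean_count)


LIKES_PER_WIN = 148  # figure quoted in the text for Facebook likes

# Hypothetical teams: same raw likes, different records, league average of 25 wins
strong_team = adjust_toward_league_mean(10_000, 30, 25, LIKES_PER_WIN)
weak_team = adjust_toward_league_mean(10_000, 22, 25, LIKES_PER_WIN)

print(strong_team, weak_team)
```

The team above the league average loses likes and the team below it gains some, which mirrors the Warriors and Nets example; the same function could be applied once per control variable.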

However, not all teams can be assumed to benefit from these factors equally. A team putting relatively few resources into the creation of engaging social media content on a daily basis won’t get as big of a boost from an additional win than a team that is constantly developing new formats. To adjust for that (unknown) factor, I created an adjustment based on the baseline social media engagement ranking for each team and each channel:

[Figure: Social media engagement adjustment formula (NBA)]

Once I have adjusted all the individual indicators (Twitter = favorites, retweets; Facebook = likes, comments, shares ) based on this formula, I can use it to further calculate the overall engagement:

[Figure: NBA social media engagement formula calculations]

What this formula does is normalize the adjusted average number of favorites and retweets (for Twitter) and likes, comments, and shares (for Facebook) by the number of followers each team has, then assign weights to them following the logic explained above, and finally add them up. In the final step, I combine the values for Twitter and Facebook and then normalize the score to engagement per 10,000 followers.
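To show the general shape of such a weighted score, here is a rough Python sketch. The weights (1 for favorites and likes, 2 for comments, 3 for retweets and shares) are placeholders chosen only to respect the ordering argued for above; they are not the weights used for the WANE Score, and all counts are invented.

```python
# Placeholder weights, assumed for illustration only
TWITTER_WEIGHTS = {"favorites": 1, "retweets": 3}
FACEBOOK_WEIGHTS = {"likes": 1, "comments": 2, "shares": 3}


def platform_score(avg_counts, followers, weights):
    """Weighted sum of average engagement counts, per 10,000 followers."""
    weighted = sum(weights[name] * avg_counts[name] for name in weights)
    return weighted * 10_000 / followers


def combined_score(twitter, twitter_followers, facebook, facebook_fans):
    """Add the per-10,000 scores of both platforms into one number."""
    return (platform_score(twitter, twitter_followers, TWITTER_WEIGHTS)
            + platform_score(facebook, facebook_fans, FACEBOOK_WEIGHTS))


# Invented (already adjusted) per-post averages for one team
twitter_avg = {"favorites": 100, "retweets": 50}
facebook_avg = {"likes": 1000, "comments": 20, "shares": 60}

score = combined_score(twitter_avg, 1_000_000, facebook_avg, 2_000_000)
print(round(score, 2))
```

Changing the weights changes the ranking, which is why the score is best read as a relative comparison across teams rather than an absolute measure.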


4. So what does all that mean?

Good question. Although we can’t take the WANE Score as an absolute value and measure of success, the calculations provide at least a starting point for comparing fan engagement on social media across teams. The results show dramatic differences in fan engagement within the NBA – and might give us an idea where to look for successful social media strategies.

Here are some high-level observations:

  • Posting frequency varies quite a bit across teams, on both Twitter and Facebook. While the Orlando Magic only sent out 1530 eligible tweets since September 2016, several teams tweeted more than 3200 times (which was the maximum I could collect). On Facebook (where I could get all data independent of the number of posts), the average team published 563 posts (~ 3-4 posts per day). Still, there was quite some variance in the data. Memphis published the most content with 809 posts, the Lakers the least with 358 posts over the course of the season so far.
  • Some teams are very likeable – others not so much. The Warriors get about 26 likes per 10,000 fans on Facebook and 3 favorites per 10,000 followers on Twitter, which makes them the “most likeable” team in the league. By far. They lead the Cavaliers by about 9 points on the combined scale. Milwaukee comes in third, just ahead of Philadelphia, Houston, and San Antonio. On the other end of the scale: The Mavericks and Pistons on Facebook (with less than 3 likes per 10,000 fans), and the Pelicans and Magic on Twitter (with less than half a favorite per 10,000 followers).
  • “Most Viral” content. It’s the Warriors, again. Golden State generates about 2 shares per 10,000 fans on Facebook, followed by Philadelphia (1.53) and Atlanta (1.35). On Twitter, Toronto stands out (3.66 retweets per 10,000 followers) – with the Cavs (2.80) and Sixers (2.68) to follow. Combined, the Warriors produce the “most viral” content, followed by the Sixers, Cavs, and Raptors. On the other end of the scale: The Nets, Heat, and Nuggets for Facebook – and the Magic (again), Heat, and Pistons (again) on Twitter. Combined, the Heat rank last, just behind the Nuggets and Magic. All of the numbers above are “pure” (not adjusted for wins/TV time).
  • Content matters! Despite including a variety of variables in my calculations, a good portion of the variance has not been explained. My estimate right now is that somewhere between 20% and 30% of engagement depends on the actual content teams produce.
  • Average? On average, an NBA team generates 1.41 favorites and 1.31 retweets per 10,000 followers on Twitter. On Facebook, the league average is about 7.82 likes per 10,000 followers per post. Comments are much harder to get: on average, only one in about 200,000 followers will comment. Finally, per 10,000 followers, about 0.67 shares are generated.

Summer Break

Shenanigans are currently on summer break, because:

  • a) I am on “vacation” while moving to Pennsylvania
  • b) I am currently collecting Twitter data on the #Euro2016 for some larger projects (500k+ tweets collected) and ideas for more Shenanigans
  • c) I’m probably enjoying watching soccer a bit too much

But don’t worry: Shenanigans will be back in late August.


The Language of Engagement

Figure 1. Average number of favorites and retweets across FCB Twitter accounts

Following my analysis of the languages spoken by #Copa100 fans on Twitter, somebody asked me: Does it even make sense to have language destinations if most people flock to the major account anyways? In other words: My resources are limited – so why put effort into crafting language-specific content when the majority of fans does not seem to care?

Good question.

The answer is: yes, language destinations make sense. A lot of sense.

And here is why: Although we don’t reach as many people with the additional accounts (the average “foreign language” account has about 63% fewer followers), the ones that we reach are usually more committed. And greater commitment means more engagement with our content — and ultimately a stronger bond with our brand. At least that’s the theory.

Are “international” fans really more engaged?

Take Bayern Munich, for example. Their main Twitter account (@FCBayern) has about 2.85 million followers. However, given the popularity and social significance of Bayern Munich in German society (games and player signings often serve as a token for conversation), many followers are likely to be less committed (read: average sports fans that just want to stay up-to-date) and therefore consume information rather passively. For many followers, Bayern Munich might only be their 2nd or 3rd favorite club that they revert to when the club plays internationally. Following (the entertaining) @FCBayernUS, on the other hand, requires more commitment to soccer in general and Bayern in particular, as the sport and club are not “mainstream topics” in the US. As a result, a more active audience should be expected. Similarly, fans of Chicharito Hernandez following the Spanish-language account of Bayer Leverkusen (@bayer04_es) should be more inclined to interact with content that is specifically tailored towards their interests.

Figure 2. FC Barcelona provides 9 language-specific Twitter accounts

But is that really the case? To test my hypothesis, I compared a total of 14 language destinations, including those of two leagues (Bundesliga, MLS) and three clubs (Bayern Munich, Bayer Leverkusen, FC Barcelona). This is by no means a representative sample, but rather a purposive one. I chose Bayern mainly because of the “unusual” way they run @FCBayernUS. To engage fans in the US, the account features more entertaining content (informal language, GIFs, emojis, retweeting of user-generated content) than most “traditional” team accounts. In theory, this should result in greater engagement. Similarly, the Spanish Leverkusen account (started in 2015 after signing Chicharito) provides content tailored to his fans. Furthermore, I chose the official Bundesliga accounts (German and English) to assess how the expanded international TV deals (especially in the US) affect engagement. Similarly, I was interested in potential differences between the English and Spanish accounts of the @MLS. Finally, I added three @FCBarcelona accounts, simply because the club is probably the most extreme example of creating language destinations (see Figure 2). Also: The club’s main account is in English rather than Spanish (all other clubs and leagues in the sample use their “native” language for the main account). And: In contrast to most other entities, all Barcelona accounts tweet the exact same content (with very few exceptions). In other words: They do not tailor content towards specific audience segments, which might reduce the benefit of language destinations. Here is what I found:

Language destinations show more engagement


  • Teams get more engagement than leagues. Fans identify with their favorite club – not necessarily the league the club plays in.
  • Language destinations out-perform the “original” account. For all entities in the sample, the language-specific accounts received more favorites and retweets per 10,000 followers. The most impressive numbers come from @FCBayernUS (7 x more favorites; 10 x more retweets than @FCBayern) and Leverkusen’s international destinations.
  • It is easier to like than to share: All accounts received more favorites than retweets. This yields support for the argument that a retweet/share should be valued higher than a favorite/like when evaluating social media metrics. Favoriting a tweet involves lesser commitment and effort than retweeting and thereby endorsing a tweet and might be done for a different reason (e.g., archiving function, social token).
  • Content matters: Language-specific channels yield the biggest benefits when their content is specifically tailored towards the targeted audience segment. In other words, simply translating the “original” content is not enough. Language destinations designed around a specific purpose (e.g., a player, cultural engagement) tend to generate the most engagement.

Method: Some detail on the analysis

Data Collection: I accessed the Twitter API using the userTimeline function of the twitteR package in “R” to call up the timelines of the selected accounts. Using this method, Twitter limits the search to a relatively short period of time (usually between one and three weeks, although I was able to go back to November 2015 for @Bayer04_es). Other methods (such as Pablo Barbera’s getTimeline function) allow downloading up to 3200 tweets, but showed inconsistencies for key variables during data collection. Therefore, I chose data quality over sample size and deferred the larger-scale analysis until later. Overall, I collected 6556 tweets across 14 accounts. The number of tweets per account ranged from a low of 88 (@MLS) to a high of 1639 (@FCBarcelona).

Analysis: Twitter provides two metrics that are commonly used as a proxy for user engagement by both industry and academia: favorites and retweets. Despite questions about the validity of these measures (e.g., does a favorite on Twitter really mean somebody engaged with your tweet – or is it a social currency acknowledging your relationship?) and uncertainties about their value (how much is a favorite worth – and how much more value should be attached to a retweet that actually increases your audience?), they a) still seem to be accepted as the industry standard, and b) are the ones I can easily measure automatically. To allow for direct comparison of all analyzed accounts, I normalized both engagement measures as averages per 10,000 followers. By doing so, @Bayer_EN (18k followers) and @FCBarcelona (17.8m followers) have a level playing field to compete on.

You can find some descriptive statistics here.

The #Copa100 tweets…

Figure 1. Top 5 languages spoken by individuals tweeting about the Copa America

…in Spanish. But Tweets are likely to come from U.S.

With the #CopaAmerica well underway and the #Euro2016 kicking off as I type,  I had no choice but to dedicate this week’s episode of Professional Shenanigans to major international soccer tournaments and their representation on social media (i.e. Twitter).

After focusing on tweets about MLS teams and predicting their frequency by looking at characteristics of the state they originated in, I wanted this project to explore one of the most interesting aspects of major international sporting events: language. From a media perspective, managing the preferences of a multi-lingual audience is a challenge.

Knowing that soccer is a truly global sport, and realizing that social media is a low-cost but effective in-house solution for reaching this global audience, many teams have started internationalizing their Twitter content. For example, 14 out of 16 Bundesliga teams had at least two Twitter accounts — with one providing information in a language other than German (mostly: English). Some clubs even added further channels. Most notably, the fantastic @FCBayernUS popularizing the club in the US, and Bayer Leverkusen’s @bayer04_es (with more than 75k fans following updates on Chicharito).

Teams often have a clear rationale for creating specific language destinations (English as the accepted “lowest common denominator”, plus a language spoken by the fans of a major star playing for the team) — but how does a major tournament navigate this space? Just look at the #Euro2016: Almost every participating country has its own official language (in fact: I counted 19 languages for 24 participating teams).

The situation at the #Copa100 is far less dramatic. 13 out of 16 participating nations speak Spanish, so the priority should be clear. Then you add some English (for the USMNT as the host), as well as Portuguese (Brazil) — and you’re done. And that is exactly what the organizers did by creating three separate Twitter accounts. Interestingly, though, the main account (@CA2016 with more than 208k followers) is in English. Both “language destinations” trail considerably in followers (ES = 23k, PT = 1.5k), even though they were all established in April 2015 and tweet similar amounts of content. This is somewhat surprising, especially when looking at my language analysis (see Figure 1).

So the question becomes: Are Spanish-speaking Twitter users for some reason choosing the English account, or are the English-speaking Copa followers a silent majority? To find out, I looked at a subset of the @CA2016 followers (n=25,000; see Data Collection) and their language settings. Interestingly, the results mirror the distribution within tweets. The most frequently used language was Spanish (45%), followed by English (38%), French (2%), and Portuguese (2%). Arabic (1%) remained in the top 5. Overall, then, even though language destinations for Spanish and Portuguese exist, fans chose the English account.
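Tallying the language settings comes down to counting values and turning the counts into percentages. Here is a minimal Python sketch of that step, assuming the follower data has already been reduced to a list of language codes; the sample list is invented and far smaller than the 25,000 followers analyzed.

```python
from collections import Counter


def language_shares(langs, top=5):
    """Most common language codes with their share of all accounts, in percent."""
    counts = Counter(langs)
    total = sum(counts.values())
    return [(lang, round(100 * n / total, 1)) for lang, n in counts.most_common(top)]


# Invented sample of account language settings
sample = ["es", "es", "es", "en", "en", "pt"]
shares = language_shares(sample, top=2)
print(shares)
```

The same function would work for the languages detected in the tweets themselves, which makes it easy to compare the two distributions side by side.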

Why: Identity? Quality? Originality?

Figure 2. Country of origin for subset of tweets including geo-location (n=280)

There are several potential explanations for fans’ preference for following a major sporting event in English rather than the language that they self-identify as native (at least based on their Twitter account). First, fans following international sporting events on social media (and especially in soccer) might be more likely to be fluent in English – at least enough to feel comfortable consuming English language content. Many fans are probably following their national teams’ stars in foreign leagues (MLS, Premier League, Bundesliga), and therefore consume English-language content to stay up-to-date. At the same time, these highly engaged fans might also perceive the content on the “bigger”  @CA2016 account to be more original and of higher quality (as the tournament is played in the U.S.).

At the same time, many fans with Spanish as their main Twitter language might actually live in the U.S. To test that hypothesis, I analyzed a subset of #Copa100 tweets that included geo-location (unfortunately, only 2.5% of tweets did). Still, the greatest portion of these tweets (~27%) originated in the U.S. (see Figure 2), showing some support for the argument that the Spanish-speaking population in the U.S. might follow the Copa in English.

As always: There is much more to analyze and I will use the #Copa100 and #Euro2016 as my playground for the weeks to come. Ideas, suggestions, and feedback are welcome!

How: This is how I got the data

Data collection: I accessed Twitter’s Streaming API using the streamR package in R*. Search was limited to tweets mentioning either one of the two most popular hashtags (#Copa100, used by the official Copa America account; #CopaAmerica, pushed by Twitter) or the term “Copa America”. I ran three rounds of initial data collection, limiting each search to tweets published within a 30-minute time frame. Data was collected during the early afternoon hours, trying to avoid live updates during games (that’ll be a different study). The three searches returned a combined 10388 tweets.

Of note: Even though data collection took place in the early afternoon while no games were played, the results are likely to reflect the scheduling of the tournament to a certain extent. Fans of the teams that had played the night before (Brazil vs. Haiti & Ecuador vs. Peru) or later that day (Uruguay vs. Venezuela & Mexico vs. Jamaica) should be more likely to talk about the tournament on Twitter during data collection.

For the second part of the analysis, I used the twitteR package to access information about the 25k most recent followers of the @CA2016 account and evaluated the most frequently set language within the returned accounts. Most followers (92.8%) were not verified and are therefore likely to be non-media / non-athlete accounts, with an average of 838 followers.

* If you’re interested in learning more about the package and how to use it to scrape tweets, I strongly recommend the work(shops) of Pablo Barbera