Bots won’t buy you any fan engagement – or will they?

Fan Engagement in the MLS

We’re about three months into the MLS season – time for another edition of the highly unofficial Hype-O-Meter, with two important changes:

  1. How “real” are your followers: When Atlanta United entered the scene, there was much debate about how much they had bought their way into their “from-zero-to-hero” status on Twitter. An expansion team suddenly having the most followers in the league smells fake. And it seems to be exactly that. Inspired by the press coverage of Trump’s Twitter exploits, I used “Twitter Audit” to analyze a sample of Twitter followers for each team. Although this is not a perfect representation of the exact makeup of each team’s followers (e.g., I did not pay for a pro account and therefore had to rely on data from earlier analyses, and only sub-samples were analyzed), it can serve as a decent proxy.

“Each audit takes a sample of up to 5000 (or more, if you subscribe to Pro) Twitter followers for a user and calculates a score for each follower. This score is based on number of tweets, date of the last tweet, and ratio of followers to friends. We use these scores to determine whether any given user is real or fake. Of course, this scoring method is not perfect but it is a good way to tell if someone with lots of followers is likely to have increased their follower count by inorganic, fraudulent, or dishonest means.”

What I found was quite interesting. As some commentators had suspected earlier, Atlanta seems to have added quite a bit to their follower count. In fact, only 48% of their 517,134 followers are deemed real. Interestingly, though, that does not seem to hurt them. Usually, simple bots or fake accounts won’t buy you any fan engagement (as measured in likes and retweets), but Atlanta still comes in a solid 3rd. However, they are the exception. There are several other teams in the MLS that seem to have added to their follower count – and as a result, find themselves mostly towards the bottom of the engagement chart.

Here is the complete list of “fakeness” in the MLS:

Team           Followers   % of “Real” Followers
Vancouver      308005      26
Toronto        290872      42
Atlanta        517134      48
Houston        343049      51
Montreal       284534      61
Orlando        387925      63
San Jose       203446      63
Washington     117203      64
Seattle        407927      66
Portland       285669      73
Los Angeles    391280      74
NY Red Bulls   184798      76
Dallas         134959      77
Colorado       84481       77
Kansas City    297709      78
Chicago        139334      80
Philadelphia   109161      81
NYCFC          332828      83
New England    89855       84
Columbus       143143      85
Salt Lake      127778      87
Minnesota      60619       88
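
For readers who want to redo the arithmetic: multiplying each follower count by its share of “real” followers gives a rough estimate of the audited follower base. The sketch below does exactly that with the numbers from the list above. The function and variable names are my own, and rounding to whole followers is an assumption, not part of the Twitter Audit method.

```python
# (team, followers, percent of followers rated "real"), copied from the list above
teams = [
    ("Vancouver", 308005, 26), ("Toronto", 290872, 42), ("Atlanta", 517134, 48),
    ("Houston", 343049, 51), ("Montreal", 284534, 61), ("Orlando", 387925, 63),
    ("San Jose", 203446, 63), ("Washington", 117203, 64), ("Seattle", 407927, 66),
    ("Portland", 285669, 73), ("Los Angeles", 391280, 74), ("NY Red Bulls", 184798, 76),
    ("Dallas", 134959, 77), ("Colorado", 84481, 77), ("Kansas City", 297709, 78),
    ("Chicago", 139334, 80), ("Philadelphia", 109161, 81), ("NYCFC", 332828, 83),
    ("New England", 89855, 84), ("Columbus", 143143, 85), ("Salt Lake", 127778, 87),
    ("Minnesota", 60619, 88),
]


def estimated_real(followers, pct_real):
    """Estimate the number of real followers from the audited share (rounded)."""
    return round(followers * pct_real / 100)


# Rank teams by estimated real followers, largest first
adjusted = sorted(
    ((team, estimated_real(followers, pct)) for team, followers, pct in teams),
    key=lambda row: row[1],
    reverse=True,
)

for rank, (team, real) in enumerate(adjusted, start=1):
    print(f"{rank:2d}. {team:<13} {real:>8,}")
```

With these figures, Atlanta’s estimate comes to roughly 248k, which puts them fourth behind Los Angeles, NYCFC, and Seattle.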

No surprise: all professional / celebrity accounts will draw some noise and attract the occasional bot follower. However, the result for Vancouver was quite shocking. Of their more than 300k followers, only 26% (or about 80k) are active enough to be counted as “real”. I almost hope that this is some form of a slip-up, but the engagement numbers match the trend. For the 2nd time in a row, the Whitecaps are among the bottom three of the league, generating only .86 favorites and .6 retweets per 10,000 followers. They were also voted the least appealing Twitter account by Howler Magazine. Houston and Montreal, the only two teams with less fan engagement, also rank in the first quarter of the “fake followers” list. On the other side of the spectrum, Minnesota United deserves a big shout-out. The expansion franchise seemingly chose the slow(er) route of organic growth on social media and now tops the Engagement Ranking for the 2nd time in a row – by a whopping margin.

2. Bye, bye – Facebook. I decided to leave Facebook out of the analysis. Both platforms are very different and lumping them together in one analysis is likely to confound the results. Instead, I decided to focus on Twitter — which became even richer from a data perspective given the addition of “Twitter Audit” and some planned further analyses.

Some interesting findings / thoughts.

  • Overall engagement and tweet frequency are up from the pre-season analysis. Which makes perfect sense given that game days are expected to a) see more tweets, and b) get fans much more involved.
  • Chicago Fire and L.A. Galaxy upped their game. Not sure if it is the “Schweinsteiger effect” for Chicago or getting quite a bit of TV time for Los Angeles, but both teams jump significantly in the ranking. They might have also simply upped their social media efforts for the season: L.A. was just voted as having the best memes in the MLS.
  • Can’t buy me love – or can I? Atlanta is somewhat of a conundrum. I just complained about their (presumably) artificially bolstered follower count and how that should diminish their fan engagement scores, and yet they rank among the top three for the 2nd time in a row. How can that be? We can’t be sure, but there are some possible explanations:
    • 1) Even without the suspected bots, United would still sport almost 250k followers – the 4th most in the MLS (when all teams are adjusted to their true follower count based on Twitter Audit data). So: There is quite some buzz surrounding the team — and maybe making the follower count look nice early on kick-started overall engagement. When we only look at this “core” group of followers and calculate engagement based on them, Atlanta comes in first. By far.
    • 2) I don’t want to suggest anything here – I really like how Atlanta has kick-started their campaign on the pitch as well as online – but one could also suspect that they invested in smart bots that could automatically like and share content instead of “dead” fake accounts. In the end, though, any brand engaging in such behavior would shoot itself in the foot. No matter how well a bot is programmed, unless you also train it to buy your merchandise and sit in the stands, the ROI simply won’t be there. Instead, you’ll have to explain why you seem to have a gazillion fans that never buy anything.
  • Love thy fans! There is quite a variance in the amount of interactivity among teams and their fans – at least when taking replies (and retweets) on Twitter as a proxy. Seattle leads the reply charts: more than 28% of all original tweets were in reply to another Twitter account — compared to 4% for Orlando and Montreal. Looking at retweets, Salt Lake is king. Almost 39% of all tweets are re-tweets. On the other end of the spectrum: Philadelphia posts the most original content with “only” 9% of RTs.

The MLS Pre-Season Hype-O-Meter

The MLS season just kicked off – and with two new teams, there was lots of excitement to go around. Atlanta and Minnesota did a great job on social media getting their fans into the game early on. Here are some numbers.

MLS Preseason Hype Social Media

The Language of Engagement

Figure 1. Average number of favorites and retweets across FCB Twitter accounts

Following my analysis of the languages spoken by #Copa100 fans on Twitter, somebody asked me: Does it even make sense to have language destinations if most people flock to the major account anyways? In other words: My resources are limited – so why put effort into crafting language-specific content when the majority of fans does not seem to care?

Good question.

The answer is: yes, language destinations make sense. A lot of sense.

And here is why: Although we don’t reach as many people with the additional accounts (the average “foreign language” account has about 63% fewer followers), the ones that we reach are usually more committed. And greater commitment means more engagement with our content — and ultimately a stronger bond with our brand. At least that’s the theory.

Are “international” fans really more engaged?

Take Bayern Munich, for example. Their main Twitter account (@FCBayern) has about 2.85 million followers. However, given the popularity and social significance of Bayern Munich in German society (games and player signings often serve as conversation tokens), many followers are likely to be less committed (read: average sports fans who just want to stay up-to-date) and therefore consume information rather passively. For many followers, Bayern Munich might only be their 2nd or 3rd favorite club, the one they turn to when the club plays internationally. Following (the entertaining) @FCBayernUS, on the other hand, requires more commitment to soccer in general and Bayern in particular, as the sport and club are not “mainstream topics” in the US. As a result, a more active audience should be expected. Similarly, fans of Chicharito Hernandez following the Spanish-language account of Bayer Leverkusen (@bayer04_es) should be more inclined to interact with content that is specifically tailored towards their interests.

Figure 2. FC Barcelona provides 9 language-specific Twitter accounts

But is this really the case? Testing my hypothesis, I compared a total of 14 language destinations – including those of two leagues (Bundesliga, MLS) and three clubs (Bayern Munich, Bayer Leverkusen, FC Barcelona). This is by no means a representative sample, but rather a purposive one. I chose Bayern mainly because of the “unusual” way they run @FCBayernUS. To engage fans in the US, the account features more entertaining content (informal language, GIFs, emojis, retweeting of user-generated content) than most “traditional” team accounts. In theory, this should result in greater engagement. Similarly, the Spanish Leverkusen account (started in 2015 after signing Chicharito) provides content tailored to his fans. Furthermore, I chose the official Bundesliga accounts (German and English) to assess how the expanded international TV deals (especially in the US) affect engagement. Similarly, I was interested in potential differences between the English and Spanish accounts of the @MLS. Finally, I added three @FCBarcelona accounts – just because the club is probably the most extreme example of creating language destinations (see Figure 2). Also: The club’s main account is in English rather than Spanish (all other clubs and leagues in the sample use their “native” language for the main account). And: In contrast to most other entities, all Barcelona accounts tweet the exact same content (with very few exceptions). In other words: They do not tailor content towards specific audience segments, which might reduce the benefit of language destinations. Here is what I found:

Language destinations show more engagement


  • Teams get more engagement than leagues. Fans identify with their favorite club – not necessarily the league the club plays in.
  • Language destinations out-perform the “original” account. For all entities in the sample, the language-specific accounts received more favorites and retweets per 10,000 followers. The most impressive numbers come from @FCBayernUS (7 x more favorites; 10 x more retweets than @FCBayern) and Leverkusen’s international destinations.
  • It is easier to like than to share: All accounts received more favorites than retweets. This lends support to the argument that a retweet/share should be valued higher than a favorite/like when evaluating social media metrics. Favoriting a tweet involves less commitment and effort than retweeting (and thereby endorsing) a tweet, and might be done for a different reason (e.g., archiving function, social token).
  • Content matters: Language-specific channels yield the biggest benefits when their content is specifically tailored towards the targeted audience segment. In other words, simply translating the “original” content is not enough. Language destinations designed around a specific purpose (e.g., a player, cultural engagement) tend to generate the most engagement.

Method: Some detail on the analysis

Data Collection: I accessed the Twitter API using the userTimeline function of the twitteR package in “R” to call up the timelines of the selected accounts. Using this method, Twitter limits the search to a relatively short period of time (usually 1 – 3 weeks, although I was able to go back to November 2015 for @Bayer04_es). Other methods (such as Pablo Barbera’s getTimeline function) allow downloading up to 3200 tweets, but showed inconsistencies for key variables during data collection. Therefore, I chose data quality over sample size and deferred the larger-scale analysis until later. Overall, I collected 6556 tweets across 14 accounts. The number of tweets per account ranged from a low of 88 (@MLS) to a high of 1639 (@FCBarcelona).

Analysis: Twitter provides two metrics that are commonly used as a proxy for user engagement by both industry and academia: favorites and retweets. Despite questions about the validity of these measures (e.g., does a favorite on Twitter really mean somebody engaged with your tweet – or is it a social currency acknowledging your relationship?) and uncertainties about their value (how much is a favorite worth – and how much more value should be attached to a retweet that actually increases your audience?), they a) still seem to be accepted as the industry standard, and b) are the ones I can easily measure automatically. To allow for direct comparison of all analyzed accounts, I normalized both engagement measures as averages per 10,000 followers. By doing so, @Bayer_EN (18k followers) and @FCBarcelona (17.8m followers) have a level playing field to compete on.
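
To make the normalization concrete, here is a minimal Python sketch of “average engagement per 10,000 followers”. The follower counts are the rounded ones mentioned above; the favorite counts are invented purely for illustration, and the function name `per_10k` is my own. The original analysis was done in R, so this is a restatement of the idea, not the script I ran.

```python
from statistics import mean


def per_10k(values, followers):
    """Average engagement per tweet, scaled to 10,000 followers."""
    return mean(values) / followers * 10_000


# Follower counts as quoted in the text; favorite counts are made up
small_account = {"followers": 18_000, "favorites": [9, 12, 6, 15]}
large_account = {"followers": 17_800_000, "favorites": [2500, 3100, 1900, 2800]}

for name, account in (("small", small_account), ("large", large_account)):
    score = per_10k(account["favorites"], account["followers"])
    print(f"{name}: {score:.2f} favorites per 10,000 followers")
```

The same function works for retweets. The point of the scaling is that a small account with a handful of favorites per tweet can score higher than a huge account with thousands.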

You can find some descriptive statistics here.

This is where the MLS gets the most Twitter love – and why!

MLS Tweets per State
Figure 1. Number of tweets about MLS teams per state.

The first question we have to answer on our way to MLS wisdom is: How do you measure the popularity of the MLS in a given state? There is a plethora of approaches, but given my research focus on Twitter, my recent experiments with data collection via R, and my desire to relate social media and real-world data, I settled on Twitter mentions of MLS teams. No, this is not the perfect measure. Not even the best one, to be honest. Twitter users are not a representative sample of the overall population, and not even of the average MLS fan (YouTube, Facebook, and Instagram have higher adoption rates among MLS fans). Still, it is a measure that I am interested in. Twitter is highly relevant in sports. In fact, Twitter has the highest growth rate among social media platforms for MLS fans (not counting Snapchat, where I didn’t have data). As a result, Twitter is extremely relevant for marketing purposes and users are actively pursued and engaged by the industry.

Figure 2. Where MLS teams get their Twitter engagement (adjusted for population)?

Data: So tweets it is. But how do we get them? To collect a sample suitable for analysis, I accessed the Twitter API via R and pulled all tweets using the @username of any of the 20 MLS teams for 300 seconds per team. I went with a team-centered approach here, because I assume greater engagement with teams compared to leagues. You’d rather say “I’m a fan of PhilaUnion” instead of saying “I’m a fan of the MLS”, right? We attach to people and teams more than we do to leagues – especially when we’re trying to interact. Similarly, using @usernames as a selection criterion instead of simple team mentions not only helped to reduce unwanted data (somebody mentioning the city or team name in an unrelated context), but also ensured that I reached a highly involved audience (you need to care about the team to know and use the @username). This resulted in ~250,000 tweets overall, varying slightly across teams. However, given my interest in comparing MLS teams’ popularity on Twitter across states, I only retained those tweets that contained geolocation. This turned out to be around 10% of all tweets and reduced the final sample to 25,307 tweets. As expected, the number of tweets varied heavily across states, from a low of 24 in Wyoming to 3255 in California (see Figure 1), and between teams (see Figure 2).
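
The filtering logic described above (keep only tweets that use a team’s @username and carry geolocation, then count per state) can be sketched in a few lines of Python. This is an illustration, not the R code I used: the tweet records, the `state` field, and the helper names are hypothetical, and real API responses carry coordinates or place objects that would first have to be mapped to a state.

```python
from collections import Counter

# Only one handle shown here; the full list would contain all 20 team handles
TEAM_HANDLES = {"@philaunion"}

# Hypothetical tweet records; "state" is None when no geolocation is attached
tweets = [
    {"text": "@PhilaUnion what a goal!", "state": "PA"},
    {"text": "Heading to the game, @PhilaUnion!", "state": None},
    {"text": "Philadelphia traffic is awful", "state": "PA"},
    {"text": "Road trip to see @PhilaUnion.", "state": "NJ"},
    {"text": "@PhilaUnion tickets booked", "state": "PA"},
]


def mentions_team(text):
    """True if the tweet uses one of the team @usernames (case-insensitive)."""
    words = {word.strip(".,!?:;").lower() for word in text.split()}
    return bool(words & TEAM_HANDLES)


def tweets_per_state(records):
    """Count tweets per state, keeping only geotagged team mentions."""
    kept = [r for r in records if r.get("state") and mentions_team(r["text"])]
    return Counter(r["state"] for r in kept)


counts = tweets_per_state(tweets)
print(counts.most_common())
```

In this toy sample, the tweet without geolocation and the one that only names the city are both dropped, which is the same trade-off as in the real data: a smaller but cleaner sample.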

Analysis: I ran an Ordinary Least Squares (OLS) regression model in SPSS, predicting tweets per state from a set of variables (see discussion below). I won’t go into all of the details here, but the model worked well – explaining about 99% of the variance in tweet volume per state (F = 469.61, df = 11, p < .001).
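
For readers unfamiliar with OLS, here is a small, dependency-free Python sketch of what such a model does: it finds the coefficients that minimize the squared prediction error and reports the share of variance explained (R²). The actual analysis was run in SPSS; the data below are made-up numbers with two predictors, not the study’s data, and the helper names are my own.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, r_squared)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Made-up example: (population in millions, % professionals) -> tweets
toy_rows = [(1.0, 30.0), (2.0, 45.0), (3.0, 35.0), (4.0, 50.0), (5.0, 40.0), (6.0, 55.0)]
toy_tweets = [110.0, 260.0, 290.0, 450.0, 470.0, 640.0]

coefficients, r_squared = ols(toy_rows, toy_tweets)
print("coefficients:", [round(c, 2) for c in coefficients])
print("R squared:", round(r_squared, 3))
```

Solving the normal equations directly is fine for a toy example like this; statistical packages use numerically more stable methods and also report the F-test and p-values quoted above.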

Results. Or: What predicts the popularity of the MLS?

To create a (somewhat) level playing field for our analysis, we need to account for some differences among states that would otherwise skew the results. First and foremost: population size. The more people live in a given state, the more people can (at least theoretically) tweet about the MLS. Take Wyoming, for example. Each of its 586,107 residents would have to be much more active on Twitter to reach the same number of tweets produced by the 39,144,818 people living in California. To address this issue, I entered population size (obtained from the U.S. Census) as a variable in the analysis.
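
A simpler way to see the population effect is to express tweets per 100,000 residents. The Python sketch below uses the Wyoming and California tweet counts and population figures quoted in this post. Note that this per-capita view is only an illustration; in the actual model, population entered as a predictor variable rather than as a divisor.

```python
# Tweet counts and population figures as quoted in this post
states = {
    "Wyoming": {"tweets": 24, "population": 586_107},
    "California": {"tweets": 3255, "population": 39_144_818},
}


def tweets_per_100k(tweets, population):
    """Tweets per 100,000 residents."""
    return tweets / population * 100_000


for name, data in states.items():
    rate = tweets_per_100k(data["tweets"], data["population"])
    print(f"{name}: {rate:.2f} tweets per 100,000 residents")
```

On a per-capita basis the gap shrinks dramatically: roughly 4 tweets per 100,000 residents in Wyoming versus roughly 8 in California, instead of 24 versus 3255 in raw counts.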

Population size alone, however, is not enough. Just imagine a scenario in which all residents of California – for some hypothetical reason – had no access to the Internet. Then Wyoming would suddenly look pretty active on social media, right? So I looked up the percentage of residents in each state that has high-speed Internet access (again, the U.S. Census Bureau thankfully provides this information). As expected, Internet access matters. States with lower Internet penetration (e.g., Mississippi and Alabama with ~65%) have fewer tweets mentioning MLS teams than states with higher Internet penetration (e.g., New Hampshire and Massachusetts with more than 85%). It was not the strongest predictor in the model, but it surely matters. Especially when we consider that tweeting is an inherently mobile activity (83% of users are active via mobile devices) and that using mobile Internet adoption statistics in a potential follow-up might be even more relevant.

Liverpool fans celebrate at Anfield
Figure 3. European soccer culture: Liverpool F.C. fans celebrate at Anfield

Having these two potentially confounding variables accounted for, it is time to move on. When thinking about the people behind the tweets, you also think about the history and current state of soccer itself. Soccer is often regarded as a somewhat “foreign” sport to many Americans that attracts more of an international audience. And when I say “international”, I’m thinking of two regions in particular: Europe and Latin America. Both are generally regarded as soccer-crazy (dominant soccer culture, all World Cup winners are either from Europe or South America), and relevant to the U.S. market based on immigration patterns. So might a state with more immigrants from one or both of these regions tweet more about the MLS? Yes and no. I went back to the Census data and looked for the number of people residing in each state who were not U.S. citizens at birth but came from either Europe or Latin America (South America, Central America, Mexico, and the Caribbean). Then, I calculated the proportion of the overall state population that is made up of Europeans (ranging from ~0% to ~4%) and Latinos (ranging from ~0% to ~15%). Despite both regions’ undeniable soccer craze, the number of Europeans living in a given state does not matter for the number of MLS tweets. Latin America’s love for the game, though, definitely translates to the U.S. The more individuals from South America, Central America, Mexico, and the Caribbean reside in a given state, the more tweets about the MLS originate in that state. But why is this the case? Maybe Europeans are hanging on to their “home clubs” (and have more opportunities to follow them on TV), while the Latin American community has started embracing the MLS (due to a greater influx of players from the region?). But that’s just a first guess.

Another indicator that might explain the MLS’ popularity on Twitter in a given state is the percentage of students playing soccer in high school. My rationale here goes something like this: If soccer becomes an integral part of young people’s lives (and I take playing soccer in high school as a very crude proxy for this process), then it might lead to greater interest in the game later in life (and supposedly some interest in the MLS that would be measurable in terms of teams’ Twitter mentions). To put a number on this, I went to the National Federation of State High School Associations and downloaded their complete “participation statistics”. Then I calculated the proportion of students who play soccer from the overall number of active high school athletes in each state. For example, in Virginia and Massachusetts about 16% of high school athletes play soccer, while in South Dakota that number plummets to about 4%. Sounds interesting, right? Right. But it has no relation to the number of tweets mentioning MLS teams whatsoever.

Figure 4. Overall, only 2 percent of high school athletes are awarded some form of athletics scholarship to compete in college. Source: NCAA.

Let’s move on. Anecdotal evidence suggests that soccer is becoming a “white collar” sport in the U.S. (as strange as this may sound). In fact, 39% of MLS fans have a reported household income of more than $100,000, compared to 25.2% of NFL fans. Similarly, kids who keep playing soccer (throughout high school and college) tend to come from higher socioeconomic backgrounds. At least that’s what youth coaches keep telling me. According to their accounts, children from lower socioeconomic backgrounds drop out of soccer at a higher rate when other, more competitive options (especially football and basketball in high school) arise. The rationale behind this is two-fold. First, soccer is still considered the less lucrative career choice by many Americans. However, this is (by most accounts) false. Yes, there are more football scholarships than soccer scholarships at the university level – and they usually “pay” better. So this is true. But: The chances of getting one of these scholarships and moving from high school to NCAA competition are equal for both sports (about 6%). The same is true for the chance of moving from NCAA to the professional leagues (2%). So there is no difference (see Figure 4). And this only includes the U.S. When you look at the global picture (and whenever you talk about soccer, you should), there are far more professional soccer players than football players. The second argument for the potentially skewed demographics of soccer players is closely related, but taps more into the cultural dimension of football: The higher the socioeconomic status, the greater the concerns about health/safety of children and the willingness to give up the culture of football. No matter if you personally agree with this, adding a variable tapping into this dynamic seems to make sense.
Actually, I used two: The average household income per state and the proportion of “professionals and managers” among the workforce per state (both based on Census data and neatly compiled by the Kaiser Family Foundation). Both measures are obviously related (managers tend to earn more than service workers), but they are still different enough to be in the same model. Average household income ranged from ~ $35,000 in Mississippi to ~ $75,000 in Maryland; with 33% professionals and managers in (again) Mississippi to 57% in D.C. Long story short: Results of the regression model indicate further support for the “soccer = more wealth”-hypothesis. Even though household income itself failed to reach statistical significance, the proportion of professionals and managers in a given state was one of the most powerful predictors of MLS team mentions on Twitter. This is an interesting dynamic that certainly warrants further investigation — especially considering the cultural and market-related implications. We have to consider, though, that Twitter users, in general, tend to be more educated and wealthy. So it remains to be seen if this dynamic is unique to soccer or a general pattern.

When you talk about sports, you might also talk about the physical constitution of fans. Why? Well, you could argue that people who love soccer would also want to play soccer and therefore need to be in (somewhat) decent physical condition. As a result, a state with a higher number of “fit” residents could tweet more about the MLS. Unfortunately, you can also argue the exact opposite. If you believe all soccer fans look like THIS, you might also believe that they only have the time to tweet about the MLS in the first place because they’re not capable of playing themselves. To find out which argument (if any) finds support, I went to the Centers for Disease Control and Prevention (CDC) website and pulled state-by-state data on obesity (percentage of residents with a BMI of 30 or higher) and physical exercise (percentage of residents who participated in any physical activities or exercise during the past month). The results are somewhat mixed. Unfortunately for me as a soccer fan, we seem to look more like THIS. The less fit a given state, the more MLS tweets are sent. Even though obesity rate by itself did not reach statistical significance (p = .07, with p < .05 being the accepted standard for significance), physical activity did (big time): The more “active” people there are in a state, the less they tweet about the MLS. To be honest, I am not sure what this means. However, it tells me to continue my research into the relationship between sport, social media, and health (two studies underway).

Local tweets about MLS team Philadelphia Union
Figure 5. Geo-information of “local” tweets about MLS team Philadelphia Union

Finally, I turned to geographical proximity. More specifically, I asked: Does having an MLS team in “your” state make you more likely to tweet about the team? It makes sense to believe that a greater percentage of core fans (assumed to be the most active) would live in somewhat close proximity to their team. This might be especially true if the soccer team is the only major sports franchise around. For example, a sports fan in the state of California has to decide between 18 Big Six (NFL, MLB, NBA, NHL, MLS, CFL) teams, whereas a sports fan in Oklahoma only has one major franchise to focus on. Looking into this dynamic, I compiled a list of Big Four teams per state as well as the number of MLS teams per state and entered both as variables into the model. Unfortunately, neither mattered for the number of MLS tweets. This could mean that a) MLS fans don’t care about proximity or competition, or that b) my measure is off. I’d probably go with b). I have to admit that the “franchise per state” variable is not the best measure of either proximity or competition. Take Kansas City, for example. Just because the Chiefs, Royals, and Sporting KC are on the Missouri-side of the city does not mean that no one in Kansas cares about them. In fact, many people in Kansas (such as residents of Topeka) live in closer proximity to these teams than lots of people living in Missouri. The same is true for competition. A more detailed analysis is in order. Some initial investigations (see Figure 5) indicate at least a certain concentration around the teams’ locations.

So: What do we learn from this?

We know that California tweets a lot about the MLS. And we know that this is largely due to the state’s large population and high Internet adoption. These two factors are more or less given and don’t provide many actionable insights when looking at this from a marketing standpoint (teams can hardly shove more residents into a given state or provide high-speed Internet access in rural areas). However, we also know that the influence of Latin American fans is huge. Reaching out to this demographic and fostering opportunities to connect (maybe even intensifying the recruitment of players from Latin America or – even better – Latin American immigrants) could be highly rewarding. In addition, there seems to be a strong connection between the “white collar” population of a state and soccer tweets. Exploring this relationship further might also yield potential angles for marketing campaigns directly aimed at this demographic. On the flip side, this also means that “educational” efforts highlighting the “benefits” of soccer to currently less affectionate demographics might be a way of increasing fandom. However, as a final disclaimer, all of these results and interpretations are speculative/exploratory in nature and should only serve as a source of inspiration and foundation for further inquiry. Due to the nature of the underlying data, causation cannot be inferred.

What else? Your feedback here! I am always looking for feedback and ideas to investigate further. So if you have suggestions, please let me know.