The first question we have to answer on our way to MLS wisdom is: How do you measure the popularity of the MLS in a given state? There is a plethora of approaches, but given my research focus on Twitter, my recent experiments with data collection via R, and my desire to relate social media and real-world data, I settled on Twitter mentions of MLS teams. No, this is not the perfect measure. Not even the best one, to be honest. Twitter users are not a representative sample of the overall population, or even of the average MLS fan (YouTube, Facebook, and Instagram have higher adoption rates among MLS fans). Still, it is a measure that I am interested in. Twitter is highly relevant in sports. In fact, Twitter has the highest growth rate among social media platforms for MLS fans (not counting Snapchat, for which I didn’t have data). As a result, Twitter is extremely relevant for marketing purposes and users are actively pursued and engaged by the industry.
Data: So tweets it is. But how do we get them? To collect a sample suitable for analysis, I accessed the Twitter API via R and pulled all tweets using the @username of any of the 20 MLS teams for 300 seconds per team. I went with a team-centered approach here, because I assume greater engagement with teams compared to leagues. You’d rather say “I’m a fan of PhilaUnion” instead of saying “I’m a fan of the MLS”, right? We attach to people and teams more than we do to leagues – especially when we’re trying to interact. Similarly, using @usernames as a selection criterion instead of simple team mentions not only helped to reduce unwanted data (somebody mentioning the city or team name in an unrelated context), but also ensured that I reached a highly involved audience (you need to care about the team to know and use the @username). This resulted in ~250,000 tweets overall, varying slightly across teams. However, given my interest in comparing MLS teams’ popularity on Twitter across states, I only retained those tweets that contained geolocation. This turned out to be around 10% of all tweets and reduced the final sample to 25,307 tweets. As expected, the number of tweets per state varied heavily, from a low of 24 in Wyoming to 3,255 in California (see Figure 1), and between teams (see Figure 2).
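To make the filtering step concrete, here is a minimal sketch in Python (not the R code I actually used). The records, the field names "team" and "state", and the handles are made up for illustration and do not reflect the Twitter API's real schema. It keeps only tweets with a known state and counts them per state.

```python
from collections import Counter

# Hypothetical tweet records; field names and handles are assumptions
# for illustration, not the Twitter API's actual schema.
tweets = [
    {"team": "@PhilaUnion", "state": "PA"},
    {"team": "@LAGalaxy", "state": "CA"},
    {"team": "@LAGalaxy", "state": None},  # no geolocation attached
    {"team": "@SJEarthquakes", "state": "CA"},
    {"team": "@PhilaUnion", "state": None},  # no geolocation attached
]


def count_geotagged_by_state(records):
    """Keep only records with a known state and count them per state."""
    geotagged = [r for r in records if r.get("state")]
    return Counter(r["state"] for r in geotagged), len(geotagged)


per_state, kept = count_geotagged_by_state(tweets)
share_kept = kept / len(tweets)  # fraction of tweets that survive the filter

for state, n in per_state.most_common():
    print(state, n)
print(f"kept {kept} of {len(tweets)} tweets ({share_kept:.0%})")
```

In the real sample the surviving share was far smaller than in this toy list: 25,307 of roughly 250,000 tweets, or about 10%.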
Analysis: I ran an Ordinary Least Squares (OLS) regression model predicting tweets per state from a set of variables (see discussion below) in SPSS. I won’t go into all of the details here, but the model worked well, explaining about 99% of the variance in tweet volume per state (F = 469.61, df = 11, p < .001).
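As a rough plausibility check, the F statistic of an OLS model can be converted back into the share of variance explained (R²). The sketch below is in Python rather than SPSS. It takes F and the 11 model degrees of freedom from above, and it assumes n = 51 observations (50 states plus D.C.), which is a guess that I have not stated anywhere else in this post.

```python
def r_squared_from_f(f_stat, k, n):
    """Invert F = (R^2 / k) / ((1 - R^2) / (n - k - 1)) to recover R^2.

    k is the number of predictors (model df), n the number of observations.
    """
    residual_df = n - k - 1
    return (f_stat * k) / (f_stat * k + residual_df)


# F and k come from the reported model; n = 51 is an ASSUMPTION
# (50 states plus D.C.), not a figure taken from the analysis output.
r2 = r_squared_from_f(f_stat=469.61, k=11, n=51)
print(f"implied R^2: {r2:.3f}")
```

Under that assumption the implied R² comes out at roughly 0.99, which is consistent with the figure quoted above.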
Results. Or: What predicts the popularity of the MLS?
To create a (somewhat) level playing field for our analysis, we need to account for some differences among states that would otherwise skew the results. First and foremost: population size. The more people live in a given state, the more people can (at least theoretically) tweet about the MLS. Take Wyoming, for example. Each of its 586,107 residents would have to be much more active on Twitter to reach the same number of tweets produced by the 39,144,818 people living in California. To address this issue, I entered population size (obtained from the U.S. Census) as a variable in the analysis.
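To see why raw counts are misleading, here is a small Python sketch that turns the tweet counts and population figures quoted in this post into tweets per 100,000 residents. It only illustrates the scaling problem. In the actual model, population was entered as a predictor and the counts were not normalized this way.

```python
# Tweet counts and population figures as quoted in this post.
states = {
    "Wyoming": {"tweets": 24, "population": 586_107},
    "California": {"tweets": 3_255, "population": 39_144_818},
}


def tweets_per_100k(tweets, population):
    """Scale a raw tweet count to a rate per 100,000 residents."""
    return tweets / population * 100_000


rates = {name: tweets_per_100k(**d) for name, d in states.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} geotagged tweets per 100,000 residents")
```

On raw counts California produces more than a hundred times as many tweets as Wyoming. Per resident, the gap shrinks to roughly a factor of two.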
Population size alone, however, is not enough. Just imagine a scenario in which all residents of California, for some hypothetical reason, had no access to the Internet. Then Wyoming would suddenly look pretty active on social media, right? So I looked up the percentage of residents in each state that has high-speed Internet access (again, the U.S. Census Bureau thankfully provides this information). As expected, Internet access matters. States with lower Internet penetration (e.g., Mississippi and Alabama with ~65%) have fewer tweets mentioning MLS teams than states with higher Internet penetration (e.g., New Hampshire and Massachusetts with more than 85%). It was not the strongest predictor in the model, but it surely matters, especially when we consider that tweeting is an inherently mobile activity (83% of users are active via mobile devices) and that mobile Internet adoption statistics might be even more relevant in a potential follow-up.
Having these two potentially confounding variables accounted for, it is time to move on. When thinking about the people behind the tweets, you also think about the history and current state of soccer itself. Soccer is often regarded as a somewhat “foreign” sport to many Americans that attracts more of an international audience. And when I say “international”, I’m thinking of two regions in particular: Europe and Latin America. Both are generally regarded as soccer-crazy (dominant soccer culture, all World Cup winners are either from Europe or South America), and relevant to the U.S. market based on immigration patterns. So might a state with more immigrants from one or both of these regions tweet more about the MLS? Yes and no. I went back to the Census data and looked for the number of people residing in each state who were not U.S. citizens at birth but came from either Europe or Latin America (South America, Central America, Mexico, and the Caribbean). Then, I calculated the proportion of the overall state population that is made up of Europeans (ranging from ~0% to ~4%) and Latinos (ranging from ~0% to ~15%). Despite both geographic regions’ undeniable soccer craze, the number of Europeans living in a given state does not matter for the number of MLS tweets. Latin America’s love for the game, though, definitely translates to the U.S. The more individuals from South America, Central America, Mexico, and the Caribbean reside in a given state, the more tweets about the MLS originate in that state. But why is this the case? Maybe Europeans are hanging on to their “home clubs” (and have more opportunities to follow them on TV), while the Latin American community has started embracing the MLS (due to a greater influx of players from the region?). But that’s just a first guess.
Another indicator that might explain the MLS’ popularity on Twitter in a given state is the percentage of students playing soccer in high school. My rationale here goes something like this: If soccer becomes an integral part of young people’s lives (and I take playing soccer in high school as a very crude proxy for this process), then it might lead to greater interest in the game later in life (and supposedly some interest in the MLS that would be measurable in terms of teams’ Twitter mentions). To put a number on this, I went to the National Federation of State High School Associations and downloaded their complete “participation statistics”. Then I calculated the proportion of soccer players among all active high school athletes in each state. For example, in Virginia and Massachusetts about 16% of high school athletes play soccer, while in South Dakota that number plummets to about 4%. Sounds interesting, right? Right. But it has no relation to the number of tweets mentioning MLS teams whatsoever.
Let’s move on. Anecdotal evidence suggests that soccer is becoming a “white collar” sport in the U.S. (as strange as this may sound). In fact, 39% of MLS fans have a reported household income of more than $100,000, compared to 25.2% of NFL fans. Similarly, kids that keep playing soccer (throughout high school and college) tend to come from higher socioeconomic backgrounds. At least that’s what youth coaches keep telling me. According to their accounts, children from lower socioeconomic backgrounds drop out of soccer at a higher rate when other, more competitive options (especially football and basketball in high school) arise. The rationale behind this is two-fold. First, soccer is still considered the less lucrative career choice by many Americans. However, this is (by most accounts) false. Yes, there are more football scholarships than soccer scholarships at the university level – and they usually “pay” better. So that part is true. But: The chances of getting one of these scholarships and moving from high school to NCAA competition are equal for both sports (about 6%). The same is true for the chance of moving from NCAA to the professional leagues (2%). So there is no difference (see Figure 4). And this only includes the U.S. When you look at the global picture (and whenever you talk about soccer, you should), there are far more professional soccer players than football players. The second argument for the potentially skewed demographics of soccer players is closely related, but taps more into the cultural dimension of football: The higher the socioeconomic status, the greater the concerns about health/safety of children and the willingness to give up the culture of football. No matter if you personally agree with this, adding a variable tapping into this dynamic seems to make sense.
Actually, I used two: The average household income per state and the proportion of “professionals and managers” among the workforce per state (both based on Census data and neatly compiled by the Kaiser Family Foundation). Both measures are obviously related (managers tend to earn more than service workers), but they are still different enough to be in the same model. Average household income ranged from ~$35,000 in Mississippi to ~$75,000 in Maryland; the share of professionals and managers ranged from 33% in (again) Mississippi to 57% in D.C. Long story short: Results of the regression model indicate further support for the “soccer = more wealth”-hypothesis. Even though household income itself failed to reach statistical significance, the proportion of professionals and managers in a given state was one of the most powerful predictors of MLS team mentions on Twitter. This is an interesting dynamic that certainly warrants further investigation, especially considering the cultural and market-related implications. We have to consider, though, that Twitter users, in general, tend to be more educated and wealthier. So it remains to be seen if this dynamic is unique to soccer or a general pattern.
When you talk about sports, you might also talk about the physical constitution of fans. Why? Well, you could argue that people who love soccer would also want to play soccer and therefore need to be in (somewhat) decent physical condition. As a result, a state with a higher number of “fit” residents could tweet more about the MLS. Unfortunately, you can also argue the exact opposite. If you believe all soccer fans look like THIS, you might also believe that they only have the time to tweet about the MLS in the first place because they’re not capable of playing themselves. To find out which argument (if any) finds support, I went to the Centers for Disease Control and Prevention (CDC) website and pulled state-by-state data on obesity (percentage of residents with a BMI of 30 or higher) and physical exercise (percentage of residents who participated in any physical activities or exercise during the past month). The results are somewhat mixed. Unfortunately for me as a soccer fan, we seem to look more like THIS. The less fit a given state, the more MLS tweets are sent. Even though obesity rate by itself did not reach statistical significance (p = .07, with p < .05 being the accepted standard for significance), physical activity did (big time): The more “active” people there are in a state, the less they tweet about the MLS. To be honest, I am not sure what this means. However, it tells me to continue my research into the relationship between sport, social media, and health (two studies underway).
Finally, I turned to geographical proximity. More specifically, I asked: Does having an MLS team in “your” state make you more likely to tweet about the team? It makes sense to believe that a greater percentage of core fans (assumed to be the most active) would live in somewhat close proximity to their team. This might be especially true if the soccer team is the only major sports franchise around. For example, a sports fan in the state of California has to decide between 18 Big Six (NFL, MLB, NBA, NHL, MLS, CFL) teams, whereas a sports fan in Oklahoma only has one major franchise to focus on. Looking into this dynamic, I compiled a list of Big Four teams per state as well as the number of MLS teams per state and entered both as variables into the model. Unfortunately, neither mattered for the number of MLS tweets. This could mean that a) MLS fans don’t care about proximity or competition, or that b) my measure is off. I’d probably go with b). I have to admit that the “franchise per state” variable is not the best measure of either proximity or competition. Take Kansas City, for example. Just because the Chiefs, Royals, and Sporting KC are on the Missouri-side of the city does not mean that no one in Kansas cares about them. In fact, many people in Kansas (such as residents of Topeka) live in closer proximity to these teams than lots of people living in Missouri. The same is true for competition. A more detailed analysis is in order. Some initial investigations (see Figure 5) indicate at least a certain concentration around the teams’ locations.
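One way a follow-up could measure proximity more directly is by distance to the nearest team instead of state borders. The Python sketch below is only an illustration of that idea and was not part of the analysis. It uses the standard haversine formula and rough city-center coordinates (assumed values, not stadium locations) to compare Topeka, Kansas, and St. Louis, Missouri, in terms of their distance to Kansas City, Missouri.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (latitude, longitude); these are
# rough illustrative values, not exact stadium locations.
kansas_city_mo = (39.10, -94.58)
topeka_ks = (39.05, -95.68)
st_louis_mo = (38.63, -90.20)

topeka_km = haversine_km(*topeka_ks, *kansas_city_mo)
st_louis_km = haversine_km(*st_louis_mo, *kansas_city_mo)

print(f"Topeka, KS to Kansas City: {topeka_km:.0f} km")
print(f"St. Louis, MO to Kansas City: {st_louis_km:.0f} km")
```

With these coordinates, Topeka comes out several times closer to Kansas City than St. Louis does, even though only St. Louis shares a state with it. That is exactly the kind of case a per-state franchise count misses.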
So: What do we learn from this?
We know that California tweets a lot about the MLS. And we know that this is largely due to the state’s large population and high Internet adoption. These two factors are more or less given and don’t provide many actionable insights when looking at this from a marketing standpoint (teams can hardly shove more residents into a given state or provide high-speed Internet access in rural areas). However, we also know that the influence of Latin American fans is huge. Reaching out to this demographic and fostering opportunities to connect (maybe even intensifying the recruitment of players from Latin America or – even better – Latin American immigrants) could be highly rewarding. In addition, there seems to be a strong connection between the “white collar” population of a state and soccer tweets. Exploring this relationship further might also yield potential angles for marketing campaigns directly aimed at this demographic. On the flip side, this also means that “educational” efforts highlighting the “benefits” of soccer to currently less enthusiastic demographics might be a way of increasing fandom. However, as a final disclaimer, all of these results and interpretations are speculative/exploratory in nature and should only serve as a source of inspiration and foundation for further inquiry. Due to the nature of the underlying data, causation cannot be inferred.
What else? Your feedback here! I am always looking for feedback and ideas to investigate further. So if you have suggestions, please let me know.