[UPDATE] Following some further analyses and great feedback, I have adjusted the engagement formulas to include adjustments for playing games on national television, as well as market size.

[Image: The NBA’s most engaging teams on social media (Twitter and Facebook)]

1. Getting the Data

1.1. Twitter

I use “R” to access Twitter’s REST API, which provides programmatic access to tweets, user profiles, follower data, etc.* Instead of pulling all the information manually, I use a script that downloads up to roughly 3,200 of the most recent tweets made by the selected accounts (in this case: all NBA teams), including the info I am interested in (retweets, favorites, replies, etc.). Once the data is downloaded, I save it in individual .json files for further analysis.

* The REST API does not provide access to real-time data (for that, we would need the Streaming API), but since I’m interested in accounts rather than ongoing conversations, the REST API works better.

1.2. Facebook

Again, I use R to access the API. The fantastic “Rfacebook” library allows downloading information about public posts from public pages. Instead of downloading a set number of posts, I restrict my data collection to a certain timeframe. In this case: the NBA season so far. More specifically, I include all posts made by the official Facebook pages of all NBA teams between the beginning of September 2016 (Naismith Memorial Basketball Hall of Fame Enshrinement) and the end of January 2017 (time of data collection).

2. Cleaning & Normalizing the Data

2.1. Followers / Fans

When looking at engagement on social media**, there are several confounding variables that need to be taken into account before starting any analysis. One of the most obvious (and also the easiest to fix) is the number of followers each team has. A team with more than 5.5 million followers (such as the Lakers) will naturally elicit more favorites and retweets than a team with ~560k followers (can people please start following the Utah Jazz), simply by having each tweet shown to a bigger audience. The same logic, of course, applies to Facebook, where the official Lakers page has ~22 million fans and Utah’s has about 1.2 million.

** The term “engagement” is used rather vaguely in both academia and the industry. For this analysis, I refer only to the behavioral component of engagement, or rather a crude proxy of it: favorites and retweets for Twitter, and likes, comments, and shares for Facebook. To be perfectly clear: this is a measure of the breadth rather than the depth of engagement, because it does not tell us anything about “why” a Twitter user favorited or retweeted a tweet.

To create a level playing field, I need to control for the number of followers/fans each team has. As a first step, I divide the number of favorites and retweets (Twitter), as well as likes, comments, and shares (Facebook), by the number of followers/fans each official team account/page has (I downloaded that information as part of the data collection process). However, since the resulting number (favorites/retweets/likes/comments/shares per single follower) is minuscule and meaningless in any practical sense, I multiply it by 10,000. Given the range of followers most NBA teams have, the resulting “per 10,000 followers” variable provides a good starting point to compare fan reactions to tweets and Facebook posts across teams.
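The normalization above is simple arithmetic. As a minimal sketch (in Python rather than the R used for the analysis; the function name and the example numbers are hypothetical):

```python
def per_10k(count: float, followers: int) -> float:
    """Scale a raw engagement count to 'per 10,000 followers'."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return count / followers * 10_000


# Hypothetical example: a tweet with 150 favorites from an account
# with 560,000 followers.
print(round(per_10k(150, 560_000), 2))  # 2.68
```

The same function applies to any of the five indicators, as long as the count and the follower number come from the same platform.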

2.2. Replies

Another potentially confounding variable: replies. When a tweet starts with a @username (aka is a reply), the only users who will see it in their timeline (other than the sender and the recipient) are those who follow both the sender and the recipient. This reduces the potential audience (and therefore the potential for engagement) quite a bit. In other words: Teams with more replies in their data would be disadvantaged in any subsequent calculation of engagement (as measured in favorites and retweets – see below). Therefore, I separate the replies from the “original” content and only analyze the latter.

However, I don’t want to throw away that information. How teams reply to tweets – and therefore directly interact with followers – is a great separate indicator of fan engagement. Even though it is hard to quantify (and therefore not included in the engagement calculation), interacting directly with fans can be seen as a proxy for the effort/manpower each team puts into their social media strategies. Here are the most interactive NBA teams on Twitter:

  1. Portland Trail Blazers — 29.4%
  2. Memphis Grizzlies — 23.1%
  3. Sacramento Kings — 22.6%
  4. Denver Nuggets — 19.4%
  5. Miami Heat — 16.5%
  6. Atlanta Hawks — 13.9%
  7. New Orleans Pelicans — 10.7%
  8. Orlando Magic — 10.2%
  9. Philadelphia 76ers — 10.2%
  10. Utah Jazz — 9.2%
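Separating replies from original content can be sketched as follows. This is an illustration in Python, not the original R script; it relies only on the "starts with an @username" rule described above, and it assumes the percentages in the list are the share of collected tweets that are replies, which the text does not spell out.

```python
def is_reply(text: str) -> bool:
    """Treat a tweet as a reply if its text starts with an @username."""
    return text.startswith("@")


def split_replies(tweets):
    """Return (originals, replies) for a list of tweet texts."""
    originals = [t for t in tweets if not is_reply(t)]
    replies = [t for t in tweets if is_reply(t)]
    return originals, replies


def reply_share(tweets) -> float:
    """Percentage of tweets that are replies (0.0 for an empty list)."""
    if not tweets:
        return 0.0
    _, replies = split_replies(tweets)
    return 100 * len(replies) / len(tweets)


# Made-up sample timeline
sample = ["@fan thanks for coming!", "Game day!", "Final: 101-99", "@other see you there"]
originals, replies = split_replies(sample)
print(len(originals), len(replies), reply_share(sample))
```

Only the `originals` list would feed into the engagement calculation; `reply_share` is kept as a separate indicator.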

2.3. All-Star Voting

To vote for their favorite player, users were encouraged to tweet, retweet or reply with a player’s first and last name or Twitter handle, along with the hashtag #NBAVOTE. Teams — as a way to promote their players — would then post tweets containing “#NBAVOTE” and encourage fans to retweet. As a result, teams with more popular players would likely receive more retweets. To reduce the potential effect of the All-Star Voting, I eliminated all tweets containing the “#NBAVOTE” hashtag from the dataset.***

*** I retained that information for the Facebook posts, though.
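Dropping the All-Star voting tweets amounts to a hashtag filter. A minimal sketch in Python (not the original R code); matching the hashtag case-insensitively is an assumption, since the text only mentions “#NBAVOTE”:

```python
import re

# Match the hashtag regardless of case, but not longer tags such as #NBAVOTES.
NBAVOTE = re.compile(r"#nbavote\b", re.IGNORECASE)


def drop_vote_tweets(tweets):
    """Return only the tweets that do not contain the #NBAVOTE hashtag."""
    return [t for t in tweets if not NBAVOTE.search(t)]


# Made-up sample tweets
sample = [
    "Vote now! #NBAVOTE Rudy Gobert",
    "Final: Jazz 101, Lakers 99",
    "RT to vote #nbavote",
]
print(drop_vote_tweets(sample))
```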

2.4. On-Field Success [UPDATE]

A team’s success is often the most powerful predictor of fan engagement. From a logical perspective, it’s much easier (and more pleasant) to create content for a winning team and get people to like it than it is to pick up the pieces after losses (you don’t really “like” a loss, right?). In fact, I ran a regression model predicting fan engagement from a range of variables, and the current season record of a team emerged as the most powerful predictor. Why does this matter? Well, I’m mostly interested in “who does the best job on social media?” (as in: which team has the best social media folks) – and not in “whose fans are most excited for some other reason?”. As a result, I need to control for the effect of on-field success. To do so, I calculated how much each win contributes to the different engagement indicators. For example, each win is (on average) worth 148 likes on Facebook.
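To illustrate what “likes per win” means, here is a simplified sketch in Python. It fits a one-predictor least-squares line, whereas the analysis above used a regression with a range of variables, so this is not the actual model. The data points are made up and constructed to lie exactly on a line with a slope of 148.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical teams: number of wins and Facebook likes (not real data)
wins = [10, 20, 30, 40]
likes = [1000 + 148 * w for w in wins]

print(ols_slope(wins, likes))  # likes gained per additional win
```

With several predictors (wins, TV games, market size), each coefficient is read the same way: the change in the engagement indicator per unit of that predictor, holding the others constant.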

2.5. Television [UPDATE]

Social media and TV go well together. Twitter is often considered the premier 2nd screen medium in the realm of sport: teams promote their Twitter handles on their courts, Twitter promotes specific hashtags for events. As a logical consequence, teams with a greater presence on national television (ESPN, ABC, TNT) should by default generate more engagement. If all teams were to get equal TV time, we could just neglect this factor — but they don’t. At the time of data collection, the average team had been on national TV (excl. NBA TV & League Pass) about 7 times. However, while teams like the Warriors (22 games) and Clippers (17) have had plenty of time to “promote” themselves on air, the Nuggets and Nets (each 1), and Magic (0) don’t get that chance very often (thankfully, the NBA provides that type of data). Long story short: I ran a series of models to compute how much each TV appearance on ESPN, ABC, and TNT contributes to the overall engagement — and being on the telly matters quite a lot. For example: Every time your team plays on ESPN, you get an additional ~900 likes for your Facebook content.

2.6. Market Size [UPDATE]

This is a tough one. One of the biggest (theoretical) advantages of social media for all sorts of businesses is that it creates a somewhat level playing field. A small, family-owned business in Buford, Wyoming, can (theoretically) reach the same worldwide audience as a major corporation in New York City. The reality, however, looks a bit different. While social media has certainly opened up new avenues for smaller-market teams to flourish and reach fans beyond their traditional market, the sometimes dramatic differences in the home markets of NBA teams still matter. For example, teams in New York or Los Angeles (the two biggest media markets in the NBA) will have a very different “baseline media exposure” than the Memphis Grizzlies or New Orleans Pelicans in the two smallest TV markets in the league. Overall, the effect is not dramatic, but can certainly make a difference for some teams. For example: The Knicks will automatically get ~100 more comments on Facebook than the Portland Trail Blazers just because of the market.

3. Calculating Engagement

What is a like worth? Or a retweet? Assigning values to user behavior is complicated. How do we know why an individual likes a piece of content? Well, we don’t. One might like a tweet because it is interesting. One might like a tweet to archive it. Or in hopes of being recognized by the creator of the content. Or because someone we care about cares about the content and we want to show that we care, too. In other words: We often can’t know if a user really cares about our content – or if (s)he is using our content as a relationship-building token or a virtual currency for social attention.

In any case, though, the general consensus is that fan engagement on social media matters. Some of my own research, for example, has shown that increased interactivity in the form of comments on Facebook relates to traffic being referred to an organization’s website. And even a “like” represents an individual’s engagement with the creator of the content. Even though a user might have liked it for some other reason, (s)he must have a) been exposed to it, and b) not been too appalled by it to have it associated with their online identity.

Building on that argument, we can then start thinking about different degrees of engagement. A comment, for example, represents greater psychological (one has to think about what to comment) and physical (one has to actually type it out) effort than simply clicking the like button. As a consequence, someone who comments presumably cares more about the content, since they are willing to exert this additional effort. Therefore, a comment should be “worth” more than a like when calculating fan engagement on social media. Finally, a share not only often represents an endorsement of the underlying content, but also expands the reach of the original post beyond the initial audience (connections of the one who shares might not follow the content creator) and should, therefore, be of even greater value to the content creator.

Based on this logic, I can assign weights to the individual proxies of fan engagement and calculate a single score across platforms. Is that score going to be a perfect representation of “how well” a team is doing on social media? No. Certainly not. The actual numbers are arbitrary and the resulting final score has no deeper practical meaning (you can’t buy anything for let’s say 90 Engagement), but they allow a normalized comparison across teams. People like — and often need — a simplified (key) performance indicator to evaluate their performance and allow a (crude) comparison with their competitors. This is what this score does. At least I hope it does. I call it:

Win-adjusted Normalized Engagement Score (or: WANE Score).

And this is what it looks like [UPDATE]:

In the first step, I adjust each individual engagement indicator for the major control variables identified above: teams’ appearances on the three major TV channels, their market size, and winning. For the TV and success adjustments, I take the league average as a standard and adjust every team towards that mean. For example, a team like the Warriors will lose likes, comments, etc. for each game they are over the league average for TV games and wins, and a team like the Nets will have points added.
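The “adjust towards the league mean” idea can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the actual adjustment: the function name is hypothetical, the numbers are illustrative (loosely based on the ~900 likes per ESPN game and the average of 7 national TV games mentioned earlier), and no team-specific scaling is applied.

```python
def adjust_to_league_mean(raw, team_value, league_mean, effect_per_unit):
    """Remove the estimated effect of being above (or below) the league mean.

    raw             -- unadjusted engagement indicator (e.g. likes)
    team_value      -- the team's value on the control variable (e.g. TV games)
    league_mean     -- league average of that control variable
    effect_per_unit -- estimated engagement gained per unit of the control
    """
    return raw - effect_per_unit * (team_value - league_mean)


# A team with many TV games loses likes, a team with few TV games gains them.
above = adjust_to_league_mean(raw=50_000, team_value=22, league_mean=7, effect_per_unit=900)
below = adjust_to_league_mean(raw=20_000, team_value=1, league_mean=7, effect_per_unit=900)
print(above, below)  # 36500 25400
```

A team exactly at the league mean is left unchanged, which is what makes the mean a natural reference point.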

However, not all teams can be assumed to benefit from these factors equally. A team putting relatively few resources into the creation of engaging social media content on a daily basis won’t get as big of a boost from an additional win as a team that is constantly developing new formats. To adjust for that (unknown) factor, I created an adjustment based on the baseline social media engagement ranking for each team and each channel:

[Formula image: social media engagement adjustment formula (NBA)]

Once I have adjusted all the individual indicators (Twitter = favorites, retweets; Facebook = likes, comments, shares ) based on this formula, I can use it to further calculate the overall engagement:

[Formula image: NBA social media engagement formula calculations]

What this formula does is normalize the adjusted average number of favorites and retweets (for Twitter) and likes, comments, and shares (for Facebook) by the number of followers each team has, then assign weights to them following the logic explained above, and finally add them up. In the final step, I combine the values for Twitter and Facebook and then normalize the score to engagement per 10,000 followers.
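The structure of that calculation (normalize, weight, add up, combine, scale) can be sketched in Python. The weights below are hypothetical placeholders that only respect the ordering argued for above (like < comment < share), and summing the two platforms is an assumption; the actual weights and combination rule are not reproduced here.

```python
# Hypothetical weights, NOT the ones used for the WANE Score.
WEIGHTS = {
    "favorites": 1.0, "retweets": 3.0,             # Twitter
    "likes": 1.0, "comments": 2.0, "shares": 3.0,  # Facebook
}


def platform_score(avg_counts: dict, followers: int) -> float:
    """Weighted engagement per single follower for one platform."""
    return sum(WEIGHTS[name] * count / followers for name, count in avg_counts.items())


def combined_score(twitter: dict, tw_followers: int, facebook: dict, fb_fans: int) -> float:
    """Add both platforms and scale to engagement per 10,000 followers."""
    per_follower = platform_score(twitter, tw_followers) + platform_score(facebook, fb_fans)
    return per_follower * 10_000


# Made-up (already adjusted) averages per post for one team
score = combined_score(
    twitter={"favorites": 100, "retweets": 50}, tw_followers=1_000_000,
    facebook={"likes": 2000, "comments": 20, "shares": 100}, fb_fans=2_000_000,
)
print(round(score, 2))  # 14.2
```

Because the weights are arbitrary, only comparisons between teams computed with the same weights are meaningful.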


4. So what does all that mean?

Good question. Although we can’t take the WANE Score as an absolute value and measure of success, the calculations provide at least a starting point for comparing fan engagement on social media across teams. The results show dramatic differences in fan engagement within the NBA, and might give us an idea of where to look for successful social media strategies.

Here are some high-level observations:

  • Posting frequency varies quite a bit across teams, on both Twitter and Facebook. While the Orlando Magic only sent out 1,530 eligible tweets since September 2016, several teams tweeted more than 3,200 times (which was the maximum I could collect). On Facebook (where I could get all data independent of the number of posts), the average team published 563 posts (~3-4 posts per day). Still, there was quite some variance in the data. Memphis published the most content with 809 posts, the Lakers the least with 358 posts over the course of the season so far.
  • Some teams are very likeable – others not so much. The Warriors get about 26 likes per 10,000 fans on Facebook and 3 favorites per 10,000 followers on Twitter, which makes them the “most likeable” team in the league. By far. They lead the Cavaliers by about 9 points on the combined scale. Milwaukee comes in third, just ahead of Philadelphia, Houston, and San Antonio. On the other end of the scale: The Mavericks and Pistons on Facebook (with less than 3 likes per 10,000 fans), and the Pelicans and Magic on Twitter (with less than half a favorite per 10,000 followers).
  • “Most Viral” content. It’s the Warriors, again. Golden State generates about 2 shares per 10,000 fans on Facebook, followed by Philadelphia (1.53) and Atlanta (1.35). On Twitter, Toronto stands out (3.66 retweets per 10,000 followers) – with the Cavs (2.80) and Sixers (2.68) to follow. Combined, the Warriors produce the “most viral” content, followed by the Sixers, Cavs, and Raptors. On the other end of the scale: The Nets, Heat, and Nuggets for Facebook, and the Magic (again), Heat, and Pistons (again) on Twitter. Combined, the Heat rank last. Just behind the Nuggets and Magic. All of the numbers above are “pure” (not adjusted for wins/TV time).
  • Content matters! Despite including a variety of variables in my calculations, a good portion of the variance has not been explained. My estimation right now is that at least 20-30% of engagement depends on the actual content teams produce.
  • Average? On average, an NBA team generates 1.41 favorites and 1.31 retweets on Twitter. On Facebook, the league average is about 7.82 likes per 10,000 followers per post. Comments are much harder to get: on average, only one in about 200,000 followers will comment. Finally, per 10,000 followers, about 0.67 shares are generated.

The #Copa100 tweets…

Figure 1. Top 5 languages spoken by individuals tweeting about the Copa America

…in Spanish. But Tweets are likely to come from U.S.

With the #CopaAmerica well underway and the #Euro2016 kicking off as I type, I had no choice but to dedicate this week’s episode of Professional Shenanigans to major international soccer tournaments and their representation on social media (i.e. Twitter).

After focusing on tweets about MLS teams and predicting their frequency by looking at characteristics of the state they originated in, I wanted this project to explore one of the most interesting aspects of major international sporting events: language. From a media perspective, managing the preferences of a multi-lingual audience is a challenge.

Knowing that soccer is a truly global sport, and realizing that social media is a low-cost but effective in-house solution for reaching this global audience, many teams have started internationalizing their Twitter content. For example, 14 out of 16 Bundesliga teams had at least two Twitter accounts — with one providing information in a language other than German (mostly: English). Some clubs even added further channels. Most notably, the fantastic @FCBayernUS popularizing the club in the US, and Bayer Leverkusen’s @bayer04_es (with more than 75k fans following updates on Chicharito).

Teams often have a clear rationale for creating specific language destinations (English as the accepted “lowest common denominator”, plus a language spoken by the fans of a major star playing for the team) — but how does a major tournament navigate this space? Just look at the #Euro2016: Almost every participating country has its own official language (in fact: I counted 19 languages for 24 participating teams).

The situation at the #Copa100 is far less dramatic. 13 out of 16 participating nations speak Spanish, so the priority should be clear. Then you add some English (for the USMNT as the host), as well as Portuguese (Brazil) — and you’re done. And that is exactly what the organizers did by creating three separate Twitter accounts. Interestingly, though, the main account (@CA2016 with more than 208k followers) is in English. Both “language destinations” trail considerably in followers (ES = 23k, PT = 1.5k), even though they were all established in April 2015 and tweet similar amounts of content. This is somewhat surprising, especially when looking at my language analysis (see Figure 1).

So the question becomes: Are Spanish-speaking Twitter users for some reason choosing the English account, or are the English-speaking Copa followers a silent majority? To find out, I looked at a subset of the @CA2016 followers (n=25,000; see Data Collection) and their language settings. Interestingly, the results mirror the distribution within tweets. The most frequently used language was Spanish (45%), followed by English (38%), French (2%), and Portuguese (2%). Arabic (1%) remained in the top 5. Overall, then, even though language destinations for Spanish and Portuguese exist, fans chose the English account.
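Tallying the language settings is a simple frequency count. A minimal sketch in Python (the analysis itself was done in R); the function name and the sample language codes are made up for illustration:

```python
from collections import Counter


def language_shares(languages, top=5):
    """Return the `top` most frequent languages with their share in percent."""
    counts = Counter(languages)
    total = sum(counts.values())
    return [(lang, round(100 * n / total, 1)) for lang, n in counts.most_common(top)]


# Made-up language settings for six follower accounts
sample = ["es", "es", "es", "en", "en", "pt"]
print(language_shares(sample, top=2))  # [('es', 50.0), ('en', 33.3)]
```

The same function works for the language field of tweets, which makes it easy to compare the two distributions side by side.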

Why: Identity? Quality? Originality?

Figure 2. Country of origin for subset of tweets including geo-location (n=280)

There are several potential explanations for fans’ preference for following a major sporting event in English rather than the language that they self-identify as native (at least based on their Twitter account). First, fans following international sporting events on social media (and especially in soccer) might be more likely to be fluent in English – at least enough to feel comfortable consuming English language content. Many fans are probably following their national teams’ stars in foreign leagues (MLS, Premier League, Bundesliga), and therefore consume English-language content to stay up-to-date. At the same time, these highly engaged fans might also perceive the content on the “bigger” @CA2016 account to be more original and of higher quality (as the tournament is played in the U.S.).

At the same time, many fans with Spanish as their main Twitter language might actually live in the U.S. To test that hypothesis, I analyzed a subset of #Copa100 tweets that included geo-location (unfortunately, only 2.5% of tweets did). Still, the greatest portion of these tweets (~27%) originated in the U.S. (see Figure 2), showing some support for the argument that the Spanish-speaking population in the U.S. might follow the Copa in English.

As always: There is much more to analyze and I will use the #Copa100 and #Euro2016 as my playground for the weeks to come. Ideas, suggestions, and feedback are welcome!

How: This is how I got the data

Data collection: I accessed Twitter’s Streaming API using the streamR package in R*. Search was limited to tweets mentioning either one of the two most popular hashtags (#Copa100, used by the official Copa America account; #CopaAmerica, pushed by Twitter) or the term “Copa America”. I ran three rounds of initial data collection, limiting each search to tweets published within a 30-minute time frame. Data was collected during the early afternoon hours, trying to avoid live updates during games (that’ll be a different study). The three searches returned a combined 10,388 tweets.
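The search criteria can be approximated locally with a simple text filter, for example to re-check collected tweets. This Python sketch is only an illustration: the Streaming API applies its own matching rules on the server side, and the case-insensitive substring matching here is an assumption.

```python
# The three search terms named above, lowercased for comparison.
TERMS = ("#copa100", "#copaamerica", "copa america")


def mentions_copa(text: str) -> bool:
    """True if the tweet text contains any of the search terms (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in TERMS)


# Made-up sample tweets
sample = ["Vamos! #Copa100", "Watching the Copa America tonight", "Euro 2016 kicks off"]
print([t for t in sample if mentions_copa(t)])
```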

Of note: Even though data collection took place in the early afternoon while no games were played, the results are likely to reflect the scheduling of the tournament to a certain extent. Fans of the teams that had played the night before (Brazil vs. Haiti & Ecuador vs. Peru) or later that day (Uruguay vs. Venezuela & Mexico vs. Jamaica) should be more likely to talk about the tournament on Twitter during data collection.

For the second part of the analysis, I used the twitteR package to access information about the 25k most recent followers of the @CA2016 account and evaluated the most frequently set language within the returned accounts. Most followers (92.8%) were not verified, and are therefore likely neither media nor athlete accounts, with an average of 838 followers.

* If you’re interested in learning more about the package and how to use it to scrape tweets, I strongly recommend the work(shops) of Pablo Barbera.

Shenanigans? Shenanigans!

Shenanigans [SHəˈnanəɡənz], professional. The Oxford Dictionary describes shenanigans as “silly or high-spirited behavior; mischief”. Although the activities described in this section are certainly not meant to be “secret or dishonest”, they are in most cases not what I would usually publish as an academic. Instead, they often are the result of a high-spirited or slightly mischievous idea, a general curiosity, or questions about sport, (social) media, and society that have come up at one point or another. If you have ideas/suggestions or would like to see any shenanigans applied to your commercial or academic endeavor, please don’t hesitate to contact me.