I've always liked working with sports data, and recently I stumbled across nflgame, a neat little Python module written by BurntSushi that eases some of the aches and pains of working with NFL data. You can find the source at https://github.com/BurntSushi/nflgame

In [5]:
import nflgame
import pandas as pd
import matplotlib.pyplot as plt
import datetime

%matplotlib inline

First we have to tell the API which season to pull data for. We do this by passing the year to nflgame.games(), which gives us the games from that season. nflgame also comes bundled with some helper functions for analysis, and we can use nflgame.combine_game_stats() to roll each player's game stats up to the season level.

In [6]:
games = nflgame.games(2015)
players = nflgame.combine_game_stats(games)
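
To make the "roll up" idea concrete, here is a minimal plain-Python sketch of summing per-game lines into one season total per player. It only illustrates the concept and is not how nflgame implements combine_game_stats(); the roll_up helper and the placeholder rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-game rushing lines: (player, week, rushing yards).
# These rows are made-up placeholders, not real 2015 statistics.
game_lines = [
    ("Player A", 1, 80),
    ("Player B", 1, 12),
    ("Player A", 2, 45),
    ("Player B", 2, -3),
]


def roll_up(lines):
    """Sum per-game rushing yards into one season total per player."""
    totals = defaultdict(int)
    for player, _week, yards in lines:
        totals[player] += yards
    return dict(totals)


season_totals = roll_up(game_lines)
print(season_totals)
```

Negative yardage is kept as-is, which is why a season total can end up below zero for a player with only a handful of carries.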

Once the stats are rolled up, we can access each statistic as an attribute on the player object. Below I've used a list comprehension to create a pandas Series containing the 2015 rushing yards of every player who recorded a rushing play.

In [7]:
runs = pd.Series([p.rushing_yds for p in players.rushing()])

A Series is cool and gives us an easy way to manipulate those rushing yards, but what if we want to plot something more interesting? Using a loop we can iterate through each player and access multiple attributes at once. Below is a scatter plot of player age in days (measured as of the day the notebook runs) against rushing yards for this past season.

In [8]:
fig = plt.figure(figsize=(12, 8))

ax = fig.add_subplot(111)
ax.set_title('age vs. rushing yds', fontsize=14, color='black')

ax.set_xlabel('rushing yards')
ax.set_ylabel('age (days)')


for p in players.rushing():
    plt.scatter(p.rushing_yds, (datetime.datetime.today() - datetime.datetime.strptime(p.player.birthdate,'%m/%d/%Y')).days)
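
The age expression in the loop above is dense, and it measures age as of the day the notebook runs rather than as of the season. A small helper makes it easier to read and lets you pin the reference date. This is a sketch that assumes birthdates are 'MM/DD/YYYY' strings (the format the strptime call expects) and that some may be blank; age_in_days is a hypothetical name, not part of nflgame.

```python
import datetime


def age_in_days(birthdate, as_of=None):
    """Return age in whole days, or None if the birthdate string is blank."""
    if not birthdate:
        return None
    if as_of is None:
        as_of = datetime.datetime.today()
    born = datetime.datetime.strptime(birthdate, '%m/%d/%Y')
    return (as_of - born).days


# Pin the reference date so the result does not drift with the run date.
reference = datetime.datetime(2000, 1, 11)
print(age_in_days('01/01/2000', as_of=reference))  # 10
print(age_in_days('', as_of=reference))            # None
```

Inside the plotting loop, players for whom the helper returns None could simply be skipped.
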
In [9]:
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111)
ax.set_title('rushing yards 2015 season', fontsize=14, color='black')


plt.hist(runs, color='purple')
(array([ 208.,   42.,   16.,   13.,   12.,   10.,    5.,    5.,    0.,    2.]),
 array([  -13. ,   136.8,   286.6,   436.4,   586.2,   736. ,   885.8,
         1035.6,  1185.4,  1335.2,  1485. ]),
 <a list of 10 Patch objects>)
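
The tuple printed above is what plt.hist returns: the count in each bin, the bin edges, and the drawn patches. As a quick sanity check, this sketch rebuilds the ten equal-width edges from the smallest and largest values shown (-13 and 1485) and totals the counts. equal_width_edges is a hypothetical helper written for illustration; it is not a matplotlib function.

```python
def equal_width_edges(low, high, bins):
    """Return bins + 1 evenly spaced edges from low to high."""
    width = (high - low) / float(bins)
    return [low + i * width for i in range(bins + 1)]


# Counts copied from the histogram output above.
counts = [208, 42, 16, 13, 12, 10, 5, 5, 0, 2]

edges = equal_width_edges(-13, 1485, 10)
print([round(e, 1) for e in edges])
print(sum(counts))  # total number of players in the histogram
```

Each bin is 149.8 yards wide and the counts add up to 313 players, so about two thirds of them (208) finished in the lowest bin, between -13 and 136.8 yards.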

What if we wanted to plot a similar relationship, but with rushing touchdowns instead of rushing yards as our metric of interest?

In [10]:
fig = plt.figure(figsize=(12, 8))

ax = fig.add_subplot(111)
ax.set_title('age vs. rushing touchdowns', fontsize=14, color='black')


ax.set_xlabel('rushing touchdowns')
ax.set_ylabel('age (days)')

for p in players.rushing():
    plt.scatter(p.rushing_tds, (datetime.datetime.today() - datetime.datetime.strptime(p.player.birthdate,'%m/%d/%Y')).days, c='red')

I also noticed some other interesting information available through the API. The player objects carry other variables of interest, including height, weight, jersey number, college, and much more. I wanted to look at all NFL RBs and FBs and see how one of my favorite players, Mike 'Meat Train' Tolbert, would stack up against other rushers. matplotlib's plt.annotate() function makes it easy to draw an arrow to a particular point of interest, which helps a great deal when illustrating Tolbert's spot in the following graph.

In [11]:
fig = plt.figure(figsize=(12, 8))

ax = fig.add_subplot(111)
ax.set_title('height and weight of 2015 NFL RBs and FBs', fontsize=14, color='black')

ax.set_xlabel('weight')
ax.set_ylabel('height')

for p in players.rushing():
    if p.player.position in ['RB', 'FB']:
        plt.scatter(p.player.weight, p.player.height, c='green')

        if p.player.name == 'Mike Tolbert':
            # Label Tolbert's point and draw an arrow to it from just below.
            plt.annotate(
                p.player.name + ', ' + p.player.team, color='black',
                xy=(p.player.weight, p.player.height), xytext=(p.player.weight, p.player.height-1.5), alpha=0.9,
                arrowprops=dict(arrowstyle="-|>", color='black'))

These are just a few examples of how easy it is to access game-level NFL data using nflgame, and they only scratch the surface of what is possible. In the next post I plan to use matplotlib's animation support to create some moving NBA shot charts.
