Professional Basketball Player Performance Trends

David Ng
davidng@fas.harvard.edu

Computer Science 171: Visualization
School of Engineering and Applied Sciences
Harvard University


This website presents a short overview of the project. A more detailed project report is available in PDF format here: David_Ng_CS171_Final_Report.pdf

1. Project Overview

The National Basketball Association (NBA), the professional basketball league, posts every player's game statistics for the 2007-2008 season on its official website, www.nba.com, in real time. However, visitors cannot visualize how a player's performance in each statistical category changes over the course of an NBA season, nor can they compare the performance of two or more players. This motivates the question: How can a user visualize an NBA player's performance in a particular statistical category and compare it with that of other players? To answer it, an interactive visualization tool was created in the Processing programming language. The tool mines all relevant data from the Yahoo! NBA [3] and official NBA [2] websites and lets the user quickly search for multiple players of interest and compare their performance in every statistical category over an entire NBA season. The intended audience is anyone, of any age, who is interested in basketball statistics. A filter that averages a player's performance per 48 minutes was also implemented to accentuate performance trends.

2. Visualization Approach

The official NBA website [2], www.nba.com, has statistics for every player in the league for every game played. Nineteen statistical categories are recorded: minutes played, field goals attempted, field goals made, field goal percentage, three-pointers attempted, three-pointers made, three-point percentage, free throws attempted, free throws made, free throw percentage, offensive rebounds, defensive rebounds, total rebounds, assists, turnovers, steals, blocks, personal fouls, and points. There are currently 433 players on 30 teams, and each team plays 82 games in a season. The official NBA website computes each player's season averages in each category, but it does not provide any tools or visualizations that show how a player's performance changes over the season, and there is no way to compare one player's statistics with those of other players. As a user, I found this frustrating, so I set out to answer the question posed above. A visualization was created in the Processing programming language to mine the relevant data, quickly search for a number of players, and visually compare their statistical trends for the 2007-2008 NBA season.

The visualization is summarized below. Additional details are available in the project report, in PDF format here: David_Ng_CS171_Final_Report.pdf.

2.1 Data Acquisition

The user can run the Processing program mineStatsLogosPics (see the Downloads section for more details) to scrape and download the data for the visualization. The data fall into four categories: team profiles, player profiles, per-game player statistics (game logs), and images (team logos and player action/profile pictures). The team profiles provide each team's city name and abbreviation. The player profiles provide each player's full name, team, position, height, weight, and years pro. The team logos and player pictures visually assist users in searching for players and analyzing data.
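The four data categories map naturally onto a few simple record types. Because the project's source code is not published, the following is only a hypothetical sketch in plain Java (Processing sketches compile to Java): the type and field names, the inch and pound units, and the comma-separated line format read by PlayerProfile.parse are all assumptions made for illustration, not the program's actual file format.

```java
import java.util.Map;

/** Team profile: city name and team abbreviation (hypothetical field names). */
record TeamProfile(String abbreviation, String city) {}

/** Player profile; height in inches and weight in pounds are assumed units. */
record PlayerProfile(String fullName, String team, String position,
                     int heightInches, int weightLbs, int yearsPro) {

    /**
     * Parses one line of an ASSUMED comma-separated format:
     * fullName,team,position,heightInches,weightLbs,yearsPro
     */
    static PlayerProfile parse(String line) {
        String[] f = line.split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length + ": " + line);
        }
        return new PlayerProfile(
                f[0].trim(), f[1].trim(), f[2].trim(),
                Integer.parseInt(f[3].trim()),
                Integer.parseInt(f[4].trim()),
                Integer.parseInt(f[5].trim()));
    }
}

/** One game-log row: stat values for a single game, keyed by category name. */
record GameLog(String playerName, int gameNumber, Map<String, Double> stats) {}

/** Local paths of the downloaded images (hypothetical). */
record ImageAssets(String teamLogoPath, String actionPhotoPath, String profilePhotoPath) {}
```

Storing heights in inches keeps numeric searches such as "at least 7 feet" a simple integer comparison.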

All of the data were acquired by a Processing program that mines the official NBA [2] and Yahoo! NBA [3] websites. Note that the data for all four categories should not be mined within a short period of time: both websites block the IP address of a user who sends requests at a high rate. Once the user's IP address is blocked, the user cannot access or mine data from these websites for as long as 24 hours, or even longer. The maximum request rate that avoids a block is unknown, so the user should proceed with caution! When the program was run, it mined a total of 1751 files, which required 35.5 MB of disk space. These numbers may change over time as players enter and leave the NBA, as player pictures and team logos change, or as the statistics or statistical categories change.
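Because the blocking threshold is unknown, one cautious approach is to enforce a fixed minimum delay between consecutive requests. The sketch below is a hypothetical Java helper, not part of mineStatsLogosPics; the class name and the interval value are assumptions, and a safe interval would have to be found by trial.

```java
import java.util.concurrent.TimeUnit;

/**
 * Enforces a minimum interval between consecutive requests so that a scraper
 * never contacts a website faster than a chosen rate. The interval is a guess:
 * the sites' real blocking threshold is unknown.
 */
final class RequestThrottle {
    private final long minIntervalNanos;
    private long lastRequestNanos;
    private boolean hasRequested = false;

    RequestThrottle(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /** Nanoseconds still to wait if a request were issued at time {@code now}. */
    long remainingWaitNanos(long now) {
        if (!hasRequested) {
            return 0; // the first request never waits
        }
        long elapsed = now - lastRequestNanos;
        return Math.max(0, minIntervalNanos - elapsed);
    }

    /** Blocks until the minimum interval has passed, then records this request's time. */
    void awaitTurn() throws InterruptedException {
        long wait = remainingWaitNanos(System.nanoTime());
        if (wait > 0) {
            TimeUnit.NANOSECONDS.sleep(wait);
        }
        lastRequestNanos = System.nanoTime();
        hasRequested = true;
    }
}
```

A mining loop would call awaitTurn() immediately before each page or image download, for example on a throttle built with new RequestThrottle(2, TimeUnit.SECONDS); the two-second value is an arbitrary illustration, not a tested safe rate.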

2.2 Player Search Engine

The user can run the Processing program visualizeData (see the Downloads section for more details) to search for players and visualize the data. A screenshot of the player search engine is shown in Figure 1. The items numbered 1 through 6 in yellow circles highlight the key features of the search engine. Item 1 is a pair of tabs that lets the user switch between the player search engine and the player statistics plots. When the program starts, a prompt asks the user to search for players. The search categories in item 3 are Player Name, Team, Position, Height, Weight, and Years Pro. The selected search category is highlighted in green and the others are shown in red; the user selects a different search category by mousing over it and clicking. The example screenshot shows the default search category, Player Name, selected.

When the user types the letter G, as in the example screenshot, all players whose last names begin with G are displayed in alphabetical order in item 4. If the user then types A, for a search string of GA, all players whose last names begin with GA are displayed in alphabetical order in item 4. The current search string is displayed in item 2. As another example, if the user selects the Height search category and types the digit 7, all players who are at least 7 feet tall are displayed in alphabetical order. The user appends to the search string with the letter and digit keys and deletes from it with the Backspace key. Mousing over a search result highlights it in blue and displays, in item 5, a picture of the player in action, the player's team logo, and the player's per-game season averages. Clicking a highlighted player selects that player for statistics plotting, and the player's name appears in the Selected Players list in item 6. The selected player names are color coded, and these colors serve as the legend of the plot.
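The Player Name and Height searches described above amount to a prefix filter and a threshold filter, each followed by an alphabetical sort. Below is a minimal Java sketch under stated assumptions: the SearchablePlayer record and its fields are hypothetical, matching is case-insensitive, results are sorted by last name and then first name, and digits typed in the Height category are read as whole feet.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Minimal player entry for the search sketch (hypothetical fields). */
record SearchablePlayer(String firstName, String lastName, int heightInches) {}

final class PlayerSearch {
    /** Alphabetical order: last name, then first name, ignoring case (assumption). */
    private static final Comparator<SearchablePlayer> ALPHABETICAL =
            Comparator.comparing(SearchablePlayer::lastName, String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(SearchablePlayer::firstName, String.CASE_INSENSITIVE_ORDER);

    /** Player Name category: players whose last name starts with the query. */
    static List<SearchablePlayer> byLastNamePrefix(List<SearchablePlayer> players, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<SearchablePlayer> hits = new ArrayList<>();
        for (SearchablePlayer p : players) {
            if (p.lastName().toLowerCase(Locale.ROOT).startsWith(q)) {
                hits.add(p);
            }
        }
        hits.sort(ALPHABETICAL);
        return hits;
    }

    /** Height category: typed digits are read as whole feet; returns players at least that tall. */
    static List<SearchablePlayer> byMinHeightFeet(List<SearchablePlayer> players, String query) {
        if (!query.matches("\\d{1,2}")) {
            return List.of(); // empty or non-numeric search string: no results
        }
        int minInches = Integer.parseInt(query) * 12;
        List<SearchablePlayer> hits = new ArrayList<>();
        for (SearchablePlayer p : players) {
            if (p.heightInches() >= minInches) {
                hits.add(p);
            }
        }
        hits.sort(ALPHABETICAL);
        return hits;
    }
}
```

Re-running the query on every keystroke (append or Backspace) is cheap at this scale, since a roster of a few hundred players can be scanned linearly.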

2.3 Data Visualization

A screenshot of the player statistics display is shown in Figure 2. The display becomes available once at least one player is in the Selected Players list; the user simply clicks the Plot Stats tab. The player statistics for the selected plot category are displayed in item 2 as a scatter plot whose points are connected by lines. This plot type was chosen because it lets the user easily identify and compare trends among multiple players over the course of the season. Each player's points and lines are color coded according to the legend colors in the Selected Players list in item 8. When the user mouses over a data point in the display area, the corresponding player's profile picture, team logo, and profile information (position, height, weight, and years pro) are displayed in the upper right, in item 4. This lets the user identify players even when the display area becomes cluttered with many players. In the example screenshot, the user is mousing over a green data point belonging to Paul Pierce, whose name appears in the same green in the Selected Players list, and his picture, team logo, and profile are displayed in item 4.
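Drawing the scatter plot and supporting mouse-over requires mapping game numbers and stat values to pixel coordinates and then finding the point nearest the mouse. Processing provides built-in map() and dist() functions for this; the hypothetical Java class below reimplements the same linear mapping so the sketch is self-contained. The plot bounds, the y-axis maximum, and the hit radius are assumptions.

```java
/** Maps game numbers and stat values to pixels and finds the data point under the mouse. */
final class PlotGeometry {
    final double left, right, top, bottom; // plot area in screen pixels
    final int games;                       // number of games on the x-axis, e.g. 82
    final double yMax;                     // top of the y-axis for the current category

    PlotGeometry(double left, double right, double top, double bottom, int games, double yMax) {
        this.left = left;
        this.right = right;
        this.top = top;
        this.bottom = bottom;
        this.games = games;
        this.yMax = yMax;
    }

    /** Linear re-mapping, same formula as Processing's built-in map(). */
    static double map(double v, double start1, double stop1, double start2, double stop2) {
        return start2 + (stop2 - start2) * ((v - start1) / (stop1 - start1));
    }

    /** Game 1 sits at the left edge and the last game at the right edge. */
    double x(int gameNumber) {
        return map(gameNumber, 1, games, left, right);
    }

    /** Screen y grows downward, so 0 maps to the bottom and yMax to the top. */
    double y(double value) {
        return map(value, 0, yMax, bottom, top);
    }

    /**
     * Returns the 1-based game number of the point closest to the mouse within
     * {@code radius} pixels, or -1 if there is none. values[i] holds game i + 1;
     * NaN marks a game with no data, which is skipped.
     */
    int hitTest(double[] values, double mouseX, double mouseY, double radius) {
        int best = -1;
        double bestDist = radius;
        for (int i = 0; i < values.length; i++) {
            if (Double.isNaN(values[i])) continue;
            double d = Math.hypot(x(i + 1) - mouseX, y(values[i]) - mouseY);
            if (d <= bestDist) {
                best = i + 1;
                bestDist = d;
            }
        }
        return best;
    }
}
```

Running hitTest over each visible player's values and keeping the closest hit would identify which player's profile to display in item 4.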

The x-axis is always labeled Game Number because the visualization shows player trends over the course of an NBA season. The y-axis, shown in item 5, changes depending on the plot category selected in item 9 and on whether the data filter in item 3 is applied. The user toggles the data filter on or off by mousing over the text FILTER DATA and clicking it. When the filter is off, the raw data points for each game are displayed; when it is on, the player's statistics in each category are averaged per 48 minutes. The algorithm is summarized by the equation below.

Filter Equation

Here, dataVal(catNum,gameNum) is the data value for category number catNum and game number gameNum. The 12 category numbers correspond to the 12 plot categories shown in item 9, and minNum is the category number for minutes played per game. The filter normalizes to 48 minutes because a regulation NBA game lasts 48 minutes. This allows players who play relatively few minutes and players who play relatively many minutes to be compared side by side. However, this measure does not take total playing time into account when comparing players, and the amount of time a player plays over an NBA season may itself affect that player's performance.
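The filter equation itself appears only as an image, so the sketch below assumes the simplest per-game form consistent with the definitions above: each value is multiplied by 48 and divided by the minutes played in the same game, that is, 48 * dataVal(catNum, gameNum) / dataVal(minNum, gameNum). This is an assumption, not a transcription of the report's equation. The array layout and class name are hypothetical, and games with zero minutes are marked NaN (not plotted) to avoid dividing by zero. Percentage categories, if present among the plotted ones, would need separate handling that this sketch omits.

```java
/**
 * Per-48-minute filter, ASSUMING the per-game form
 *   filtered(cat, game) = 48 * dataVal(cat, game) / dataVal(minNum, game).
 */
final class Per48Filter {
    static final double MINUTES_PER_GAME = 48.0; // length of a regulation NBA game

    /**
     * @param dataVal dataVal[catNum][gameNum]: raw stats, one row per category,
     *                every row the same length as the minutes row
     * @param minNum  row index holding minutes played
     * @return filtered values; NaN where the player logged no minutes
     */
    static double[][] apply(double[][] dataVal, int minNum) {
        double[] minutes = dataVal[minNum];
        double[][] out = new double[dataVal.length][];
        for (int cat = 0; cat < dataVal.length; cat++) {
            out[cat] = new double[dataVal[cat].length];
            for (int game = 0; game < dataVal[cat].length; game++) {
                double m = minutes[game];
                // The minutes row itself becomes 48 for every game played.
                out[cat][game] = (m > 0) ? MINUTES_PER_GAME * dataVal[cat][game] / m : Double.NaN;
            }
        }
        return out;
    }
}
```

For example, 20 points scored in 24 minutes becomes 40 points per 48 minutes, which can then be compared directly with a starter's full-game numbers.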

To change the plotted category, the user mouses over one of the 12 categories in item 9, which is then highlighted in blue. Clicking a highlighted category selects it: the category turns green and its data are displayed in item 2. In item 7, a circle (green by default) appears next to each player's name. Mousing over a circle turns it yellow to show that it is highlighted, and clicking a highlighted circle toggles that player's data plot on or off. While a player's data plot is off, the circle is displayed in red. For example, if the Selected Players list contains so many players that the plot becomes cluttered, the user can hide some players' plots by clicking the circles next to their names.
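The legend-circle behavior amounts to a small piece of per-player state. The hypothetical Java class below models it; the names are illustrative, and showing yellow on hover even for a hidden (red) circle is an assumption, since the text only describes hovering over green circles.

```java
/** State of one legend circle in the Selected Players list (hypothetical model). */
final class LegendEntry {
    enum CircleColor { GREEN, YELLOW, RED }

    final String playerName;
    private boolean visible = true; // plots start out shown (green circle)

    LegendEntry(String playerName) {
        this.playerName = playerName;
    }

    boolean isVisible() {
        return visible;
    }

    /** Yellow while hovered (assumed for hidden circles too); otherwise green if shown, red if hidden. */
    CircleColor circleColor(boolean hovered) {
        if (hovered) {
            return CircleColor.YELLOW;
        }
        return visible ? CircleColor.GREEN : CircleColor.RED;
    }

    /** A click only toggles the plot while the circle is highlighted (hovered). */
    void onClick(boolean hovered) {
        if (hovered) {
            visible = !visible;
        }
    }
}
```

The plotting code would then skip any player whose isVisible() returns false.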

The strength of this design is the control it gives the user over what to see, what to filter, and what to hide.

3. Screenshots

3.1 Player Search Engine


Figure 1. Screenshot of the player search engine [2]. Items 1 through 6 highlight key features of the player search engine described in Section 2.2.



3.2 Data Visualization


Figure 2. Screenshot of the player statistics plot [2]. Items 1 through 9 highlight key features of the player statistics plot described in Section 2.3.

4. Downloads

A description of the code and application files, along with directions for running the Processing programs, is available in TXT format here: README.txt

The Processing data-mining (scraping) programs capture and save data from the following websites:

Reference [2]: http://www.nba.com
Reference [3]: http://sports.yahoo.com/nba

The source code for the data-mining/scraping and visualization programs is not posted because it includes copyrighted material belonging to the National Basketball Association (NBA), which may not be distributed publicly under the NBA's Terms of Use. Please contact me directly if you would like to see a demo or try the application.

This page was last updated on May 15, 2008. If you have any questions or comments about this project, feel free to contact me at davidng@fas.harvard.edu.

References

[1] B. Fry, Visualizing Data, O'Reilly Media, Inc., January 11, 2008.

[2] National Basketball Association (NBA), Player images and statistics, http://www.nba.com, NBA Media Ventures, LLC. [Accessed May 7, 2008].

[3] Yahoo! Sports, Player images and statistics, http://sports.yahoo.com/nba, Yahoo! Inc. [Accessed May 7, 2008].