Importing Retrosheet Game Logs

So, I recently ran into a case where I needed to import the Retrosheet Game Logs, and I'd figure I'd share the steps I used to do that.

First, download the .zip archives for the Game Logs (I used 1900-2013) from here:

Retrosheet Game Logs

Then, extract all of them into the same folder.

After that, create the game_logs table (in an already created retrosheet scheme) using the following script:

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

CREATE TABLE IF NOT EXISTS `games` (
`date` date NOT NULL DEFAULT '0000-00-00',
`game_number` char(1) NOT NULL,
`day` char(3) NOT NULL,
`visitor` char(3) NOT NULL,
`visitor_lg` char(2) NOT NULL,
`visitor_game_number` smallint(3) NOT NULL DEFAULT 0,
`home` char(3) NOT NULL,
`home_lg` char(2) NOT NULL,
`home_game_number` smallint(3) NOT NULL DEFAULT 0,
`visitor_score` tinyint(3) NOT NULL DEFAULT 0,
`home_score` tinyint(3) NOT NULL DEFAULT 0,
`outs` smallint(5) NOT NULL DEFAULT 0,
`daynight` char(1) NOT NULL,
`completion` varchar(255) NOT NULL,
`forfeit` varchar(1) NOT NULL,
`protest` varchar(2) NOT NULL,
`park` varchar(5) NOT NULL,
`attendance` int(10) NOT NULL DEFAULT 0,
`game_minutes` smallint(5) NOT NULL DEFAULT 0,
`visitor_linescore` varchar(255) NOT NULL,
`home_linescore` varchar(255) NOT NULL,
`visitor_ab` tinyint(3) NOT NULL DEFAULT 0,
`visitor_h` tinyint(3) NOT NULL DEFAULT 0,
`visitor_2b` tinyint(3) NOT NULL DEFAULT 0,
`visitor_3b` tinyint(3) NOT NULL DEFAULT 0,
`visitor_hr` tinyint(3) NOT NULL DEFAULT 0,
`visitor_rbi` tinyint(3) NOT NULL DEFAULT 0,
`visitor_sh` tinyint(3) NOT NULL DEFAULT 0,
`visitor_sf` tinyint(3) NOT NULL DEFAULT 0,
`visitor_hbp` tinyint(3) NOT NULL DEFAULT 0,
`visitor_bb` tinyint(3) NOT NULL DEFAULT 0,
`visitor_ibb` tinyint(3) NOT NULL DEFAULT 0,
`visitor_so` tinyint(3) NOT NULL DEFAULT 0,
`visitor_sb` tinyint(3) NOT NULL DEFAULT 0,
`visitor_cs` tinyint(3) NOT NULL DEFAULT 0,
`visitor_gidp` tinyint(3) NOT NULL DEFAULT 0,
`visitor_ci` tinyint(3) NOT NULL DEFAULT 0,
`visitor_lob` tinyint(3) NOT NULL DEFAULT 0,
`visitor_pitchers_used` tinyint(3) NOT NULL DEFAULT 0,
`visitor_individual_er` tinyint(3) NOT NULL DEFAULT 0,
`visitor_team_er` tinyint(3) NOT NULL DEFAULT 0,
`visitor_wp` tinyint(3) NOT NULL DEFAULT 0,
`visitor_balks` tinyint(3) NOT NULL DEFAULT 0,
`visitor_putouts` tinyint(3) NOT NULL DEFAULT 0,
`visitor_assists` tinyint(3) NOT NULL DEFAULT 0,
`visitor_errors` tinyint(3) NOT NULL DEFAULT 0,
`visitor_passed_balls` tinyint(3) NOT NULL DEFAULT 0,
`visitor_double_plays` tinyint(3) NOT NULL DEFAULT 0,
`visitor_triple_plays` tinyint(3) NOT NULL DEFAULT 0,
`home_ab` tinyint(3) NOT NULL DEFAULT 0,
`home_h` tinyint(3) NOT NULL DEFAULT 0,
`home_2b` tinyint(3) NOT NULL DEFAULT 0,
`home_3b` tinyint(3) NOT NULL DEFAULT 0,
`home_hr` tinyint(3) NOT NULL DEFAULT 0,
`home_rbi` tinyint(3) NOT NULL DEFAULT 0,
`home_sh` tinyint(3) NOT NULL DEFAULT 0,
`home_sf` tinyint(3) NOT NULL DEFAULT 0,
`home_hbp` tinyint(3) NOT NULL DEFAULT 0,
`home_bb` tinyint(3) NOT NULL DEFAULT 0,
`home_ibb` tinyint(3) NOT NULL DEFAULT 0,
`home_so` tinyint(3) NOT NULL DEFAULT 0,
`home_sb` tinyint(3) NOT NULL DEFAULT 0,
`home_cs` tinyint(3) NOT NULL DEFAULT 0,
`home_gidp` tinyint(3) NOT NULL DEFAULT 0,
`home_ci` tinyint(3) NOT NULL DEFAULT 0,
`home_lob` tinyint(3) NOT NULL DEFAULT 0,
`home_pitchers_used` tinyint(3) NOT NULL DEFAULT 0,
`home_individual_er` tinyint(3) NOT NULL DEFAULT 0,
`home_team_er` tinyint(3) NOT NULL DEFAULT 0,
`home_wp` tinyint(3) NOT NULL DEFAULT 0,
`home_balks` tinyint(3) NOT NULL DEFAULT 0,
`home_putouts` tinyint(3) NOT NULL DEFAULT 0,
`home_assists` tinyint(3) NOT NULL DEFAULT 0,
`home_errors` tinyint(3) NOT NULL DEFAULT 0,
`home_passed_balls` tinyint(3) NOT NULL DEFAULT 0,
`home_double_plays` tinyint(3) NOT NULL DEFAULT 0,
`home_triple_plays` tinyint(3) NOT NULL DEFAULT 0,
`hp_ump_id` varchar(8) NOT NULL,
`hp_ump_name` varchar(255) NOT NULL,
`1b_ump_id` varchar(8) NOT NULL,
`1b_ump_name` varchar(255) NOT NULL,
`2b_ump_id` varchar(8) NOT NULL,
`2b_ump_name` varchar(255) NOT NULL,
`3b_ump_id` varchar(8) NOT NULL,
`3b_ump_name` varchar(255) NOT NULL,
`lf_ump_id` varchar(8) NOT NULL,
`lf_ump_name` varchar(255) NOT NULL,
`rf_ump_id` varchar(8) NOT NULL,
`rf_ump_name` varchar(255) NOT NULL,
`visitor_manager_id` varchar(8) NOT NULL,
`visitor_manager_name` varchar(255) NOT NULL,
`home_manager_id` varchar(8) NOT NULL,
`home_manager_name` varchar(255) NOT NULL,
`winning_pitcher_id` varchar(8) NOT NULL,
`winning_pitcher_name` varchar(255) NOT NULL,
`losing_pitcher_id` varchar(8) NOT NULL,
`losing_pitcher_name` varchar(255) NOT NULL,
`saving_pitcher_id` varchar(8) NOT NULL,
`saving_pitcher_name` varchar(255) NOT NULL,
`gwrbi_batter_id` varchar(8) NOT NULL,
`gwrbi_batter_name` varchar(255) NOT NULL,
`visitor_starting_pitcher_id` varchar(8) NOT NULL,
`visitor_starting_pitcher_name` varchar(255) NOT NULL,
`home_starting_pitcher_id` varchar(8) NOT NULL,
`home_starting_pitcher_name` varchar(255) NOT NULL,
`visitor_batter_1_id` varchar(8) NOT NULL,
`visitor_batter_1_name` varchar(255) NOT NULL,
`visitor_batter_1_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_2_id` varchar(8) NOT NULL,
`visitor_batter_2_name` varchar(255) NOT NULL,
`visitor_batter_2_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_3_id` varchar(8) NOT NULL,
`visitor_batter_3_name` varchar(255) NOT NULL,
`visitor_batter_3_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_4_id` varchar(8) NOT NULL,
`visitor_batter_4_name` varchar(255) NOT NULL,
`visitor_batter_4_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_5_id` varchar(8) NOT NULL,
`visitor_batter_5_name` varchar(255) NOT NULL,
`visitor_batter_5_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_6_id` varchar(8) NOT NULL,
`visitor_batter_6_name` varchar(255) NOT NULL,
`visitor_batter_6_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_7_id` varchar(8) NOT NULL,
`visitor_batter_7_name` varchar(255) NOT NULL,
`visitor_batter_7_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_8_id` varchar(8) NOT NULL,
`visitor_batter_8_name` varchar(255) NOT NULL,
`visitor_batter_8_pos` tinyint(3) NOT NULL DEFAULT 0,
`visitor_batter_9_id` varchar(8) NOT NULL,
`visitor_batter_9_name` varchar(255) NOT NULL,
`visitor_batter_9_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_1_id` varchar(8) NOT NULL,
`home_batter_1_name` varchar(255) NOT NULL,
`home_batter_1_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_2_id` varchar(8) NOT NULL,
`home_batter_2_name` varchar(255) NOT NULL,
`home_batter_2_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_3_id` varchar(8) NOT NULL,
`home_batter_3_name` varchar(255) NOT NULL,
`home_batter_3_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_4_id` varchar(8) NOT NULL,
`home_batter_4_name` varchar(255) NOT NULL,
`home_batter_4_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_5_id` varchar(8) NOT NULL,
`home_batter_5_name` varchar(255) NOT NULL,
`home_batter_5_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_6_id` varchar(8) NOT NULL,
`home_batter_6_name` varchar(255) NOT NULL,
`home_batter_6_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_7_id` varchar(8) NOT NULL,
`home_batter_7_name` varchar(255) NOT NULL,
`home_batter_7_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_8_id` varchar(8) NOT NULL,
`home_batter_8_name` varchar(255) NOT NULL,
`home_batter_8_pos` tinyint(3) NOT NULL DEFAULT 0,
`home_batter_9_id` varchar(8) NOT NULL,
`home_batter_9_name` varchar(255) NOT NULL,
`home_batter_9_pos` tinyint(3) NOT NULL DEFAULT 0,
`additional_info` varchar(255) NOT NULL,
`acquisition` char(1) NOT NULL,
PRIMARY KEY (date, game_number, visitor, home)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Once the table is created, you can import all of the text files using a batch file like this:

FOR /R ".\Retrosheet Game Logs" %%G IN (*.TXT) DO (
    MOVE "%%G" ".\Retrosheet Game Logs\game_logs.txt"
    mysqlimport --port=3306 --host=localhost -u root -pPASSWORD --local --verbose --force --fields-terminated-by=, --fields-enclosed-by=\" --lines-terminated-by=\n retrosheet ".\Retrosheet Game Logs\game_logs.txt"
    MOVE ".\game_logs.txt" "%%G"
)

When that is completed, you will have the game log data in a game_logs table inside of your schema.

doyler on Githubdoyler on Twitter
doyler
Ray Doyle is an avid pentester/security enthusiast/beer connoisseur who has worked in IT for almost 16 years now. From building machines and the software on them, to breaking into them and tearing it all down; he's done it all. To show for it, he has obtained an OSCE, OSCP, eCPPT, GXPN, eWPT, eWPTX, SLAE, eMAPT, Security+, ICAgile CP, ITIL v3 Foundation, and even a sabermetrics certification!

He currently serves as a Principal Penetration Testing Consultant for Secureworks. His previous position was a Senior Penetration Tester for a major financial institution.

When he's not figuring out what cert to get next or side project to work on, he enjoys playing video games, traveling, and watching sports.

4 Comments

Filed under Stealing First

4 Responses to Importing Retrosheet Game Logs

  1. John Signore

    I was able to create the table but I’m running into issues at the very end importing the data.

    I copied and pasted that into a batch file called gamelogs.bat. I ran the batch file but I still have nothing in my database.

    • Hmm, are your game log text files in “Retrosheet Game Logs/game_logs.txt” relative to the batch file?

      If so, are your MySQL username and password correct in the query in the batch file?

  2. John Signore

    I figured it out using a different method but ran into something else that has me confused.

    I downloaded all the game log files into a folder. I then used the command type *.TXT >> AllGames.Txt to get the data into one large file.

    Then I used to script at the top to create the table. Lastly, I imported the data file.

    So now I have data but it appears as if I don’t have complete data. I should have 2,430 games per season. If I run a query on each season, I’m getting 1,215. It appears I’m only getting half of each season. If I look at my text file, the data is completely there for the missing games so something is happening when I try to import it.

    • Are you running into any errors when you attempt to do the import?

      Additionally, is it possibly an issue with the DBMS/your settings? Some databases only allow you to select a limited number of rows by default without modifying some settings.

      Is it always the same half of games that don’t show up in your tables, or is it seemingly random?

      I haven’t run into that issue yet myself, but I’d be glad to try to help you work through it.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.