Tuesday, March 11, 2014

Researching the Millers and Saints

Any researcher will tell you to be wary of online sources. I recently discovered this myself when I went to a website created with the intention of listing every game played by the Minneapolis Millers. On the surface it's quite an accomplishment, apparently exhaustive in many ways, with the potential of being a most useful tool for the baseball researcher. An index to its contents can be viewed at http://stewthornley.net/millersgames/

I applaud the effort behind such a comprehensive attempt at documenting American Association history. The author of the site, Stew Thornley, is known for his historical expertise on Minneapolis matters, in particular his baseball knowledge. Typically I would have no reason to doubt his work. But in this case he has dropped the ball in a big way.

I first became familiar with his site about a year and a half ago when I needed a source for cross-checking the individual game data I was developing for the American Association rivalry between the Minneapolis Millers and St. Paul Saints. I had most of the games, some 1,300 of them, already entered into my database and was hoping to substantiate a few of my questionable items. In the process of investigating his database, I found a handful of errors in my own spreadsheet, and I was glad I had Thornley's comprehensive database to assist in this matter.

But along the way I noticed he had missed a few things. In fact, more than a few things. The first example I noticed was a conflict between what I had for a game location in 1905 and what he had listed. I went back into the newspaper scans I have on my computer and realized my location was correct. Another conflict came up soon after, so I checked on it, and again I was correct. At this point I realized I would have to go through this process for the entire database from 1902-1960. After a week of dedicated effort, I documented 42 errors from Thornley's website.
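For readers who want to run the same kind of cross-check on their own game lists, here is a minimal sketch in Python. It assumes each game is recorded with a date, a game number (for doubleheaders), a location, and a score. The `Game` record, the `cross_check` function, and the sample rows are all hypothetical and made up for illustration; they are not taken from my database or from Thornley's.

```python
from typing import NamedTuple


class Game(NamedTuple):
    """Hypothetical game record; field names are assumptions."""
    date: str      # ISO date, e.g. "1905-05-01"
    game_no: int   # 1 or 2, to tell doubleheader games apart
    location: str
    score: str


def cross_check(mine, theirs):
    """List the discrepancies between two collections of Game records."""
    a = {(g.date, g.game_no): g for g in mine}
    b = {(g.date, g.game_no): g for g in theirs}
    issues = []
    # Walk every game that appears in either list, in date order.
    for key in sorted(a.keys() | b.keys()):
        if key not in b:
            issues.append((key, "missing from theirs"))
        elif key not in a:
            issues.append((key, "missing from mine"))
        else:
            for field in ("location", "score"):
                x, y = getattr(a[key], field), getattr(b[key], field)
                if x != y:
                    issues.append((key, f"{field} conflict: {x!r} vs {y!r}"))
    return issues


# Invented sample rows, for illustration only.
mine = [
    Game("1905-05-01", 1, "Minneapolis", "3-2"),
    Game("1905-05-02", 1, "St. Paul", "5-4"),
]
theirs = [
    Game("1905-05-01", 1, "St. Paul", "3-2"),
]

issues = cross_check(mine, theirs)
for key, problem in issues:
    print(key, problem)
```

A script like this only flags where two lists disagree. It cannot say which one is right; each flagged game still has to be checked against the newspaper account.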

Researchers depend on the authority of online authors for the validity of their own work, so I am posting my listing of each of the 42 errors. These are significant miscues, ranging from incorrectly reported game locations to missing games. In one case, several games from one season appear in the listing for another season. There are a few minor errors here as well, but the extent of the inaccuracies found on this site makes Thornley's work unreliable. If there are 42 problems with games between the Millers and the Saints alone, how many more are there for the remaining teams of the American Association from 1902-1960? The answer to this question may never be known, because most writers will simply take such a listing as Thornley's at face value. That would be a mistake.

As most dedicated researchers do, I work hard to ensure the accuracy of my reporting before I publish it. One hopes that finding this many inaccuracies in a database of this nature, even one that may not purport to be accurate, is an aberration. But the lesson stands: don't trust everything you find listed on a website. Baseball-reference.com certainly has its share of problems in its minor league statistical reporting, and if you look hard enough, you'll find them on other "reputable" sites as well. It's part of the brave new online world we live in. Informed researchers will learn from the mistakes of others and will refrain from publishing their own material until it is ready.


1 comment:

Stew said...

Thanks. You are right that people should not put stuff out until it’s ready, and I should not have put the spreadsheets on the web without going over them a lot more than I did. I’m going to do a more systematic review than I did before. I hope to get this done soon; if not, I’ll consider taking down the pages in the meantime. I’m checking newspapers and also with Joel Rippel, who has compiled Millers-Saints results, on those games, and I will check my notes against the spreadsheets for all games and all years. I’m confused by the last six entries on the list (37-42) because the information is correct on the spreadsheets as well as in my notes. Am I missing something there? Stew