baseball_cards.txt

Outline for the article "Baseball Card Collecting"
See http://datagenetics.com/blog/april32016/index.html

Abstract
  MISSING

Introduction
  Many fans of baseball collect baseball cards, hoping to amass a complete set.
  Cards are sold in small, wrapped, randomly filled packets.
  Mathematics and computation give some insight into how many cards a collector must buy to complete a set.

The Baseball Card Collector's Problem
  Definition of the sample problem (50 distinct cards, each equally likely; buy cards one at a time).
  Consider the task of collecting a complete set.
  Give the example of turning over one playing card at a time to "collect" one each of heart, club, diamond, spade.
  Give related examples: Happy Meal (6 different toys), cereal box (5 different).

Initial Analysis
  No matter how many cards we buy, there is always some chance that the set is still incomplete.
  The first card is always a "keeper"; the second card is almost surely a keeper (49 chances in 50 that it is new); as the collection grows, each further card has a smaller chance of being new.
  Once we hold more than half of the set, it is more likely than not that the next card is a repeat.
  Thus, getting the last few missing cards will be very difficult.

Mathematical Analysis
  Illustration: show a sample sequence of cards drawn.
  Demonstrate that, if you already have k-1 distinct cards, the probability that the next card you draw is new is P(k) = (N-(k-1))/N.
  The expected wait until the next new card is C(k) = 1/P(k) = N/(N-(k-1)).
  The expected total wait E(N) is the sum C(1)+C(2)+...+C(N):
    E(N) = N * (1 + 1/2 + 1/3 + ... + 1/N)
  For our 50-card example, this gives E(50) = 224.96 and answers our main question.

Harmonic Numbers
  The sum for E(N) is N * H(N), where H(N) = 1 + 1/2 + 1/3 + ... + 1/N is called the Nth harmonic number.
  Although it is natural to guess that H(N) tends to some limit, it grows without bound (very, very slowly) as N goes to infinity.
  For example, it is possible to build a "bridge" out of playing cards where the topmost card hangs out halfway, the card beneath hangs out 1/4 of the way, the next 1/6 of the way, and so on.
  By using more and more cards, the bridge can (theoretically) reach any desired distance.
  As N increases, H(N) can be approximated by the natural logarithm plus a correction term:
    H(N) is approximately ln(N) + gamma, where gamma = 0.5772... is the Euler-Mascheroni constant.
  Use this to approximate E(N) as N * (ln(N) + gamma).

Distribution of Drawings
  The expected value E(N) only gives us the average waiting time.
  If we actually do an experiment, we should expect a waiting time that differs from E(N).
  If we do many experiments, we can see the distribution of waiting times.
  Show a graph of waiting times for the 50-card example.
  Note the distinction between mean, mode, and median in the results.

Extensions of the Problem
  Suppose we have multiple collectors who can swap cards to improve their sets.
  Suppose some cards are rarer than others. This is true for baseball cards.

Conclusion
  MISSING

References:
  MISSING
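
The computations sketched under Mathematical Analysis, Harmonic Numbers, and Distribution of Drawings could be checked with a short Python script along the following lines. This is only a minimal sketch: it assumes all N cards are equally likely, and the function names, the number of trials (10,000), and the random seed are illustrative choices that do not come from the outline.

```python
import math
import random
import statistics
from fractions import Fraction


def harmonic(n):
    """Exact harmonic number H(n) = 1 + 1/2 + ... + 1/n, as a Fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def expected_draws(n):
    """Expected draws to complete a set of n equally likely cards: n * H(n)."""
    return float(n * harmonic(n))


def approx_expected_draws(n):
    """Approximation n * (ln(n) + gamma), gamma = Euler-Mascheroni constant."""
    gamma = 0.5772156649015329
    return n * (math.log(n) + gamma)


def draws_to_complete(n, rng):
    """Simulate one collector: draw uniformly until all n cards are seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each card 0..n-1 is equally likely
        draws += 1
    return draws


def simulate(n, trials, seed=0):
    """Return the waiting times of `trials` independent collectors."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [draws_to_complete(n, rng) for _ in range(trials)]


if __name__ == "__main__":
    n = 50
    waits = simulate(n, trials=10_000)
    print(f"exact  E({n})      = {expected_draws(n):.2f}")
    print(f"approx E({n})      = {approx_expected_draws(n):.2f}")
    print(f"simulated mean    = {statistics.mean(waits):.2f}")
    print(f"simulated median  = {statistics.median(waits)}")
    print(f"simulated mode    = {statistics.mode(waits)}")
```

The exact value uses rational arithmetic so that rounding does not affect the comparison with the logarithmic approximation, which falls short of N * H(N) by roughly half a card for N = 50. The list returned by simulate can be passed to any plotting tool to draw the histogram of waiting times.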