Future work

To expand upon the results presented in this paper, our first step would be to further clean the data, particularly with regard to the subjects. Page lengths should be averaged over multiple editions of a given book, and books from the original “Best Books Ever” list that did not make it into this iteration of the database should be added, with particular attention to those nearest the top of the list.
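As an illustration of the edition-averaging step, the following sketch assumes a pandas table of editions with hypothetical `work_id` and `num_pages` columns (our actual schema may differ) and collapses per-edition page counts into a single figure per book:

```python
import pandas as pd

# Hypothetical editions table: one row per edition, keyed by the work it belongs to.
editions = pd.DataFrame({
    "work_id":   [1, 1, 1, 2, 2],
    "num_pages": [310, 295, 322, 180, 176],
})

# Average page length per work, ignoring editions with missing page counts.
avg_pages = (
    editions.dropna(subset=["num_pages"])
            .groupby("work_id")["num_pages"]
            .mean()
            .rename("avg_num_pages")
)

print(avg_pages)
```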

Several additional bookshelf test sets should also be constructed and evaluated with the same methods we employed here.
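One simple way to build such a test set, sketched below under the assumption that a shelf is just a list of book IDs and that evaluation means recovering held-out shelf members from the remaining seeds, is a random hold-out split (the helper name and parameters here are hypothetical, not part of our existing code):

```python
import random

def split_shelf(shelf_books, holdout_frac=0.2, seed=0):
    """Split a bookshelf into seed books and held-out positives.

    The held-out books serve as the positives the recommender
    should recover when given only the seed books.
    """
    rng = random.Random(seed)
    books = list(shelf_books)
    rng.shuffle(books)
    n_holdout = max(1, int(len(books) * holdout_frac))
    return books[n_holdout:], books[:n_holdout]  # (seeds, held-out positives)

# Example usage with a toy shelf of book IDs.
seeds, positives = split_shelf(["b1", "b2", "b3", "b4", "b5"])
```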

Perhaps the most interesting avenue of exploration is to find a way to scale the neutral and negative books such that the rare positives are not overwhelmed by them. We discussed several methods for doing this, such as scaling according to shelf size, $k$, and/or expected value. However, it was not clear which solution was most justifiable and most likely to be effective, so we elected to leave the algorithm as-is for the time being. Nonetheless, scaling these values effectively could significantly improve the algorithm in terms of both efficacy and speed.
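To make the candidates concrete, the sketch below shows one of the schemes we discussed, down-weighting neutral and negative books in inverse proportion to shelf size; the weighting function and the way its output would enter the score are assumptions for illustration, not part of the current mBG implementation:

```python
def scaled_weight(label, shelf_size, k=10):
    """One candidate scaling scheme (an assumption, not the mBG's current behavior).

    Positives keep full weight, while neutral and negative books are
    down-weighted so that a large shelf of non-positives cannot swamp
    a handful of positives.

    `label` is +1 (positive), 0 (neutral), or -1 (negative);
    `shelf_size` is the number of books on the shelf being scored;
    `k` plays the same role as the $k$ discussed above.
    """
    if label > 0:
        return 1.0
    # Non-positives are scaled by the ratio of k to shelf size, so their
    # total contribution stays comparable to that of k positives.
    return min(1.0, k / max(shelf_size, 1))
```

Whether this particular ratio, or an expected-value-based alternative, is the right choice is exactly the open question; the point is only that the scaling can be expressed as a per-book weight and tuned independently of the rest of the algorithm.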

Finally, it would be worthwhile to implement some version of the user interface we originally envisioned, so that we can see what it is like to interact with a system like the mBG.