Biodiversity data exist in large quantities, which is a boon to biodiversity science. Despite the large number of data records available, however, the proportion of those records that is readily usable in scientific applications is quite small. The difference between the total number of existing data records and the number that are ready for use results from what we call "leakage" of data: steps that have not been taken, or errors that eliminate the utility of the data. In this contribution, we explore several large-scale biodiversity data sets to determine why their data records are or are not ready for use.