To assess the feasibility of determining all of the thylacine's DNA sequence (that is, not only the mitochondrial sequence, but also the much larger nuclear genome), we estimated how much of our data was thylacine nuclear DNA. What made this tricky was that no close relative of the thylacine had a fully sequenced genome. In contrast, when sequencing the woolly mammoth we could compare our data with the available sequences from the African savanna elephant. But for the thylacine, the closest sequenced genome is from a South American marsupial called the short-tailed opossum (Latin name: Monodelphis domestica). The two lineages have been separated at least since the land bridge between South America and Australia was broken, which happened perhaps 60 million years ago. (Mammoths and elephants have been separated for only about 6 million years.)
The 60 million years of separation, together with the short length of many of the sequence fragments that we generated, made it difficult to be certain that a particular DNA fragment was actually from the thylacine, as opposed to being, say, human contamination. To improve the odds, we used the fact that for nuclear-genome sequences that encode a protein, the similarity between the thylacine and Monodelphis sequences will almost always be much higher than for an arbitrary region of the genome. In summary, our strategy was to see how much of the thylacine sequence aligned to Monodelphis protein-coding intervals, and then extrapolate from that to the full genome. (See the paper and supplementary material for details.)
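To give a feel for the extrapolation step, here is a minimal Python sketch. It is not the calculation from the paper: the function name, its parameters, and every number in the example are assumptions made up for illustration. The idea is simply that if protein-coding intervals make up a known fraction of the reference genome, then the share of reads that hit those intervals can be scaled up to guess the share of reads that come from the nuclear genome as a whole.

```python
def estimate_nuclear_fraction(total_reads, coding_hits,
                              coding_fraction_of_genome,
                              coding_detection_rate=1.0):
    """Extrapolate from coding-region hits to a whole-genome fraction.

    All parameters are hypothetical inputs for this sketch:
      total_reads               -- number of sequenced fragments
      coding_hits               -- fragments aligning to protein-coding intervals
      coding_fraction_of_genome -- share of the reference genome that is coding
      coding_detection_rate     -- assumed chance that a true coding fragment
                                   is actually detected by the alignment
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < coding_fraction_of_genome <= 1:
        raise ValueError("coding_fraction_of_genome must be in (0, 1]")
    if not 0 < coding_detection_rate <= 1:
        raise ValueError("coding_detection_rate must be in (0, 1]")

    # Share of all reads that were seen to hit coding intervals.
    observed_coding_share = coding_hits / total_reads

    # Scale up: coding intervals are only a small, partly detected
    # slice of the nuclear genome.
    estimate = observed_coding_share / (
        coding_fraction_of_genome * coding_detection_rate
    )

    # A fraction of the data cannot exceed 100%.
    return min(estimate, 1.0)


# Made-up numbers, chosen only to show the arithmetic.
example_estimate = estimate_nuclear_fraction(
    total_reads=200_000,
    coding_hits=500,
    coding_fraction_of_genome=0.0125,
)
print(f"Estimated nuclear fraction: {example_estimate:.0%}")
```

A real analysis would also have to account for fragment length, alignment thresholds, and contamination that happens to align to coding regions, all of which this sketch ignores.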
This resulted in an estimate that roughly 30% of our sequence data was from the thylacine nuclear genome. Given the plummeting costs of genome sequencing, this indicates that it should be possible to determine the thylacine's nuclear genome sequence even without other improvements to our approach.