Our clustering algorithm applies stringent criteria. We first identify commonly
repeated sequence elements in the data set. We then require that any two
sequences clustered together match at 96% identity or better over at least 100
base pairs, and that the percentage of alignable sequence be greater than
90%. Figure 2 shows how the percentage of alignable sequence is determined.
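To illustrate, the three thresholds can be expressed as a single pass/fail test on a pairwise alignment, with passing pairs then merged into clusters. This is a minimal Python sketch, not the authors' implementation. The `PairAlignment` record and the `passes_criteria` and `cluster` functions are hypothetical. The alignable fraction is taken as a precomputed input because its definition is the subject of Figure 2. The 96% and 100 bp thresholds are assumed to be inclusive. Merging passing pairs transitively (single linkage via union-find) is also an assumption, since the text does not say how pairwise matches become clusters. Repeat identification is not modeled.

```python
from dataclasses import dataclass

# Thresholds from the text; inclusive vs. strict boundaries are assumptions.
MIN_IDENTITY = 0.96        # "96% identity"
MIN_ALIGNED_BP = 100       # "100 or more base pairs"
MIN_ALIGNABLE_FRAC = 0.90  # "greater than 90%" (strict comparison)


@dataclass
class PairAlignment:
    """Hypothetical summary of one pairwise local alignment."""
    a: str                 # ID of the first sequence
    b: str                 # ID of the second sequence
    identity: float        # fraction of identical bases in the aligned region
    aligned_bp: int        # length of the aligned region, in base pairs
    alignable_frac: float  # alignable sequence as a fraction (computed per Figure 2)


def passes_criteria(p: PairAlignment) -> bool:
    """True if the pair meets all three clustering thresholds."""
    return (p.identity >= MIN_IDENTITY
            and p.aligned_bp >= MIN_ALIGNED_BP
            and p.alignable_frac > MIN_ALIGNABLE_FRAC)


def cluster(ids, alignments):
    """Group sequences by single linkage over passing pairs (assumed, via union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in alignments:
        if passes_criteria(p):
            ra, rb = find(p.a), find(p.b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


alns = [
    PairAlignment("s1", "s2", 0.97, 150, 0.95),  # passes all thresholds
    PairAlignment("s2", "s3", 0.99, 80, 0.98),   # fails: aligned region under 100 bp
    PairAlignment("s3", "s4", 0.96, 100, 0.90),  # fails: alignable fraction not > 90%
]
print(cluster(["s1", "s2", "s3", "s4"], alns))  # [['s1', 's2'], ['s3'], ['s4']]
```

Under the single-linkage assumption, two sequences can land in the same cluster without passing the test against each other directly, as long as a chain of passing pairs connects them. A stricter scheme would require every member to pass against every other.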
Two sequences are compared and aligned to maximize the number of matching
base pairs. At each end of the local alignment, the shorte