The GeneZoo or the GeneJungle?

Over 40% of Stratagene's GeneConnection cDNA gene clusters are novel and more than 50% are estimated to be full-length clones

Jason Goncalves, Jean-Michel Lelias, Joe Sorge

The GeneConnection discovery cDNA clone collection goes beyond conventional cDNA collections. Stratagene has searched through many different human tissues and has identified thousands of cDNAs that are not found in the UniGene database (our GeneJungle).* We have systematically sequenced the ends of our clones and used a rigorous cDNA clustering algorithm to determine how many of our clones are unique and how many can be found in UniGene (those found in UniGene make up our GeneZoo). About 40% of our clusters are in the GeneJungle and 60% are in the GeneZoo. The average insert size of our cDNAs is 1.7 kb, from which we predict that about half of the clones contain full-length open reading frames. We have taken extreme care to guarantee clone identity and purity. Stratagene's clones can be found on our GeneConnection Virtual Lab using several free search strategies, including searching by gene name, keyword, UniGene ID, or nucleotide or protein sequence. A gel image of a restriction digest of each clone can be viewed on our website. Stratagene's GeneConnection discovery clone collection has been tracked and curated with precision and consistency from its inception, and the GeneJungle collection provides an untapped resource for new discoveries.

The raw sequence of the human genome is now mostly complete, but it will be some time before all of the genes and their exon structures are known. Gene-finding programs often misidentify exon-intron junctions. They can split a set of contiguous exons that belongs to one gene into two putative genes, or merge exons that belong to two or more genes into one. As a result, high-quality cDNA collections are becoming important once again. cDNAs are an excellent way to confirm gene structure, and Stratagene has a unique, well-characterized collection of human cDNA clones with an average insert size of 1.7 kb. Approximately 40% of Stratagene's cDNA clusters cannot be found in UniGene, and our initial in-house analysis of known sequences indicates that about 50% of Stratagene's clones contain full-length open reading frames, which agrees with our predicted estimate for the collection.

Inherent Problems with Clustering ESTs

ESTs (expressed sequence tags) are DNA sequences obtained by sequencing the 5′ and/or 3′ termini of cDNA inserts with vector-specific sequencing primers. Each EST is typically 300 to 600 nucleotides long. The dbEST database currently contains about 2.1 million human ESTs. UniGene is an attempt to reduce these sequences to a non-redundant set of gene-oriented sequence clusters; in theory, there should be one cluster for each underlying gene. Build #116 of UniGene (July 3, 2000) contains 81,967 clusters. However, UniGene is affected by many artifacts. The ESTs in dbEST come from many different sources, with widely varying sequence quality, and the quality of the cDNA libraries from which they were derived is also variable. Sometimes two cDNAs are fused together in one vector, creating UniGene artifacts in which two unrelated cDNAs are placed in the same cluster. Genomic DNA contamination is common, as are products of aberrant transcription and termination. Splice variants are, of course, common, and it is difficult for the UniGene clustering algorithms to distinguish splice variants from different gene products.

Table 1. UniGene artifacts and Stratagene solutions

UniGene artifact: Fusion cDNAs from two different genes. These tend to cause separate genes to cluster together, leading to underestimates of gene number.
Stratagene solution: cDNAs were annealed to the vector rather than ligated to create the library. This eliminates fusions.

UniGene artifact: Genomic DNA inserts contaminating the cDNA library. These tend to create artificial clusters, leading to overestimates of gene number.
Stratagene solution: Adapters were not ligated onto the ends of the cDNAs, so false genomic DNA inserts are very rare.

UniGene artifact: 5′ and 3′ ends of the same cDNA falsely separated into different clusters. This tends to overestimate the number of genes.
Stratagene solution: Only sequences with a poly-A tail are clustered, so 5′ and 3′ ends are not counted twice.

UniGene artifact: Splice variants falsely separated into different clusters. This tends to overestimate the number of genes.
Stratagene solution: Only sequence contiguous to the poly-A tail is used, minimizing the appearance of splice variants in the data.

Stratagene has constructed several unique cDNA libraries. Much has been done to minimize artifacts such as fusion inserts and genomic DNA inserts (Table 1). Moreover, the cDNAs have been highly normalized, yielding a very low level of clone redundancy. We have systematically sequenced the 3′ ends of our clones and analyzed the resulting sequences, removing any that lack poly-A tails. This eliminates an artifact typically found in UniGene: if 5′ and 3′ sequences from the same gene do not overlap, UniGene places them in two different clusters, overestimating the number of human genes. Because we cluster only 3′-end sequences bearing poly-A tails, we do not see this 5′/3′ splitting artifact (Table 1). Moreover, because the sequences we have clustered to date are contiguous with the poly-A tails, we are less likely to be confused by splice variants, which occur less frequently in the 3′ untranslated regions of transcripts (Table 1). Using 3′ sequences as gene-identification tags has been shown to be an effective gene-specific marking strategy. Because 3′ UTRs are less conserved than coding sequences, they make it easier to distinguish between individual genes and between paralogous gene family members whose coding sequences may be homologous (1).
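The filtering step described above can be sketched in Python. This is a minimal illustration, not Stratagene's pipeline: the tail-calling rule (a run of at least 12 A's at the end of the read), the 400-base window kept for clustering, and the assumption that reads are stored in sense orientation are all hypothetical choices, since the article does not specify them.

```python
import re
from typing import Dict, Optional

# Hypothetical thresholds: the article does not say how a poly-A tail is called
# or how much tail-adjacent sequence is kept for clustering.
MIN_TAIL = 12      # minimum run of A's at the 3' end that counts as a poly-A tail
MAX_REGION = 400   # bases immediately upstream of the tail kept for clustering

# A run of at least MIN_TAIL A's anchored at the end of the read.
_POLYA = re.compile("A{%d,}$" % MIN_TAIL)


def polya_tail_start(seq: str) -> Optional[int]:
    """Index where the 3' poly-A run begins, or None if the read has no tail.

    Assumes reads are stored in sense orientation, so the tail is at the 3' end.
    """
    match = _POLYA.search(seq.upper())
    return match.start() if match else None


def tail_adjacent(seq: str, max_region: int = MAX_REGION) -> Optional[str]:
    """Sequence contiguous to the poly-A tail, with the tail itself removed."""
    start = polya_tail_start(seq)
    if start is None:
        return None
    return seq[max(0, start - max_region):start].upper()


def filter_for_clustering(reads: Dict[str, str]) -> Dict[str, str]:
    """Drop reads without a poly-A tail; keep only tail-adjacent sequence."""
    kept = {}
    for name, seq in reads.items():
        region = tail_adjacent(seq)
        if region:  # also skips reads that consist of nothing but poly-A
            kept[name] = region
    return kept


reads = {
    "clone_1": "GATTACAGGC" + "A" * 20,  # tailed: kept
    "clone_2": "GATTACAGGCAAAA",         # only 4 trailing A's: dropped
}
print(filter_for_clustering(reads))  # {'clone_1': 'GATTACAGGC'}
```

Only the sequence upstream of the tail is returned, mirroring the article's point that clustering uses sequence contiguous to the poly-A tail.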


Our clustering algorithm is rigorous. We first identify commonly repeated sequence elements in the data set. We then require that, for any two sequences to cluster, they must match at 96% identity or better over 100 or more base pairs, and the percentage of alignable sequence must be greater than 90%. Figure 2 shows how the percentage of alignable sequence is determined. Two sequences are compared and aligned to maximize the number of matching base pairs. At each end of the local alignment, the shorter of the two unaligned flanking sequences is used to calculate the number of alignable bases: the number of alignable bases is the local alignment length plus the lengths of these shorter flanks. Alignments of commonly repeated or low-complexity sequences are discarded. The algorithm does not cluster sequences from different gene family members, because their untranslated regions tend not to align. Sequencing artifacts, in contrast, are tolerated because they generally do not reduce the percent identity below 96%. Our algorithm would place splice variants into different clusters; however, because we use only the sequence contiguous to the poly-A tail for clustering, splicing is not a significant factor. We could choose to ignore splice variants when clustering (Figure 2) by excluding internally unpaired sequence from the computation of alignable length. However, the algorithm we have chosen is more rigorous, and we rely instead on there being little splicing near the 3′ ends.
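The pairwise criterion can be expressed as a short Python check. This sketch assumes the local alignment (with repeats and low-complexity sequence already masked) has been computed elsewhere, that both reads are on the same strand, and that the "percentage of alignable sequence" means aligned columns divided by alignable bases; the article defines alignable bases but not the exact ratio, and the `LocalAlignment` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalAlignment:
    """A local alignment between reads a and b, both in sense orientation.

    Coordinates are 0-based, half-open. Field names are illustrative only.
    """
    len_a: int
    len_b: int
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    aln_len: int   # alignment length in columns
    matches: int   # identical columns


def alignable_length(x: LocalAlignment) -> int:
    # At each end, only the shorter of the two unaligned flanks could have aligned.
    left = min(x.a_start, x.b_start)
    right = min(x.len_a - x.a_end, x.len_b - x.b_end)
    return x.aln_len + left + right


def should_cluster(x: LocalAlignment, min_identity: float = 0.96,
                   min_len: int = 100, min_alignable: float = 0.90) -> bool:
    identity = x.matches / x.aln_len
    # Assumption: "% alignable" = aligned columns / alignable bases.
    fraction_aligned = x.aln_len / alignable_length(x)
    return (identity >= min_identity
            and x.aln_len >= min_len
            and fraction_aligned > min_alignable)


# Overlap runs to the end of read a: only 20 flanking bases are unexplained.
good = LocalAlignment(500, 600, 20, 500, 100, 580, aln_len=480, matches=470)
# Reads share a 150-bp block but both extend well beyond it on each side.
bad = LocalAlignment(500, 600, 150, 300, 200, 350, aln_len=150, matches=150)
print(should_cluster(good), should_cluster(bad))  # True False
```

The second pair fails despite 100% identity, which is the behavior the text describes for gene family members: a shared block surrounded by sequence that could have aligned but did not.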


We have also clustered our sequences together with the 1.7 million human EST sequences included in the human UniGene database (Build #116). Clusters that contained a UniGene representative (Stratagene's GeneZoo) were also compared with 41,472 sequence-verified IMAGE clones from Research Genetics and 9,182 Unigem 2.0 clones from Incyte Genomics. Figure 1 shows that most clusters fall within the 81,967 UniGene clusters; the exceptions are those in Stratagene's GeneJungle and 354 Incyte clones. Interestingly, when referenced against UniGene build #116, Incyte's 9,182 Unigem clones collapse into 8,298 UniGene clusters plus 354 non-UniGene clusters, and Research Genetics' 41,472 IMAGE clones collapse into 31,521 unique clusters (Table 2).
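As a worked check on the figures above, the clone-to-cluster collapse can be expressed as simple arithmetic. The "redundant" count (clones minus clusters) and the distinct fraction are derived here for illustration and are not figures reported by the authors; following the note on the Unigem set, each of the 354 non-UniGene clones is counted as its own cluster.

```python
from typing import Tuple


def collapse_stats(clones: int, clusters: int) -> Tuple[int, float]:
    """Redundant clones, and the fraction of clones that are distinct clusters."""
    return clones - clusters, clusters / clones


# Figures quoted in the text (UniGene build #116 as reference).
clone_sets = {
    # 8,298 UniGene clusters plus 354 non-UniGene clones, each counted as one cluster
    "Incyte Unigem 2.0": (9182, 8298 + 354),
    "Research Genetics IMAGE": (41472, 31521),
}

for name, (clones, clusters) in clone_sets.items():
    redundant, distinct = collapse_stats(clones, clusters)
    print(f"{name}: {clones:,} clones -> {clusters:,} clusters, "
          f"{redundant:,} redundant ({distinct:.1%} distinct)")
```

Run as written, it reports 530 redundant Unigem clones (94.2% distinct) and 9,951 redundant IMAGE clones (76.0% distinct).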

Clustering was based on UniGene build #116 for sequences that match UniGene. For the 354 Incyte Unigem 2.0 clones that lie outside UniGene, each clone is assumed to represent an individual cluster.

Table 2. Comparison of clone sets
Clone sets: UniGene Build #116; Research Genetics IMAGE; Unigem 2.0; Stratagene's GeneConnection 1.0 Discovery Set.
Columns: # of clones; # of unique clusters; % of clusters found in UniGene #116; % of clusters found in the Research Genetics sequence-verified (SV) IMAGE set; % of clusters found in Unigem 2.0; % of clusters found in Stratagene's GeneConnection 1.0 set; % of clusters found in UniGene, Research Genetics SV, or Unigem 2.0.

Table 2 shows that Stratagene's GeneConnection 1.0 Discovery set has a substantial proportion of clusters not found in UniGene (about 41% of the Stratagene clusters are in the GeneJungle). This suggests that Stratagene's libraries contain rare sequences not commonly found in other cDNA libraries. With an average insert size of 1.7 kb, the Discovery clone set is estimated to contain over 50% full-length human cDNAs. This suggests that, of 25,321 clones, we currently have over 12,500 full-length sequence-tagged cDNAs, and over 5,000 of these full-length cDNAs have never been reported publicly. Stratagene is expanding its collections and regularly updating its website with additional sequences.
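The headline estimates follow from back-of-the-envelope arithmetic, sketched below. The 50% full-length fraction is the article's estimate (stated as a lower bound), and applying the 41% GeneJungle share of clusters to individual clones is an assumption made here for illustration; the article does not say how the 5,000 figure was derived.

```python
from typing import Tuple


def full_length_estimate(clones: int, full_length_fraction: float,
                         novel_fraction: float) -> Tuple[float, float]:
    """Estimated full-length clones, and how many of those lie outside UniGene."""
    full_length = clones * full_length_fraction
    # Assumption: the GeneJungle share of clusters also applies to clones.
    return full_length, full_length * novel_fraction


full, novel = full_length_estimate(25321, 0.50, 0.41)
print(f"about {full:,.0f} full-length clones, about {novel:,.0f} outside UniGene")
```

With these inputs the sketch gives roughly 12,660 full-length clones, about 5,190 of them outside UniGene, in line with the "over 12,500" and "over 5,000" figures.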

Searching for Stratagene Clones

To find clones within Stratagene's collection, the GeneConnection website allows searches by keyword, gene name, accession number, UniGene number, or DNA or protein sequence. Stratagene has annotated all clones to optimize keyword searches, so that a clone similar to a characterized gene can also be found; for example, "ESTs, Highly similar to protein-tyrosine-phosphatase" or "Zinc finger protein homologous to Zfp-36 in mouse (ZFP36)".


Searches can be carried out with nucleic acid or protein queries. Search results show the degree of match both as the % identity of the aligned bases and as the quality score of the match (Figure 4), so similar genes can be found this way. For example, to find homologs of a gene of interest, obtain its sequence and paste it into the search window on the GeneConnection search page (see Figure 3). Several search engines are available. Nucleotide target data can be searched with a nucleotide query using simple BLASTN. With TBLASTX, both the nucleotide target data and the DNA query are translated into all six reading frames before searching. With TBLASTN, the nucleotide target data are translated into all six reading frames and searched with a protein query. If a match to a Stratagene clone is found, the clone information is displayed. All Stratagene clones have been restriction mapped, and size estimates and sample restriction gel data are available on the website for every clone. If the DNA sequence of the Stratagene clone is within the UniGene set (a GeneZoo clone), its DNA sequence is revealed in the search report. If the sequence is outside UniGene (a GeneJungle clone), its DNA sequence is provided upon standard purchase of the clone.
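TBLASTX and TBLASTN both rely on six-frame translation of nucleotide sequence. The sketch below shows what that step means using the standard genetic code, plus a helper that picks among the three programs described above for a nucleotide target set. It is illustrative only: the BLAST programs do this internally, and the helper's name and arguments are hypothetical, not part of the GeneConnection site or of any BLAST interface.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; stops become '*', unknown codons 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> List[str]:
    """Three forward frames followed by three reverse-complement frames."""
    rc = reverse_complement(seq)
    return [translate(seq[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


def blast_program(query_type: str, translate_query: bool = False) -> str:
    """Hypothetical helper: choose a program for a nucleotide target set."""
    if query_type == "protein":
        return "tblastn"  # protein query vs. six-frame-translated targets
    if query_type == "nucleotide":
        # tblastx translates both query and targets; blastn compares bases directly
        return "tblastx" if translate_query else "blastn"
    raise ValueError(f"unknown query type: {query_type!r}")


print(six_frame_translations("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
print(blast_program("protein"))             # tblastn
```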


Because Stratagene has entered only 3′ sequence information into its clustering database, we have designed an automatic indirect-match strategy to help locate clones with homologous sequences in the UniGene database. Even if you enter a coding sequence or a 5′ sequence, matches can be found through a bridging database. We BLASTed our 3′ sequences against all sequences in the UniGene database; when identities above a certain threshold are found, the UniGene sequence is placed into an indirect database with a link to the homologous Stratagene clone. When you perform sequence searches on our GeneConnection website, the program automatically searches both our direct sequence data and the indirect database. The two types of matches are shown in separate sections of the search report. Indirect match reports show the alignment between your query sequence and the indirect (UniGene) sequence, with a link to the Stratagene clone name and number.
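The bridging database can be sketched as a lookup table built from an offline BLAST of 3′ reads against UniGene. Everything here is hypothetical: the identity threshold (the article says only "a certain threshold"), the clone and UniGene identifiers, and the function names. Sequence searching itself is assumed to have happened already, so the functions take lists of hits rather than running BLAST.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_indirect_index(bridge_hits: Iterable[Tuple[str, str, float]],
                         threshold: float) -> Dict[str, Set[str]]:
    """Map each UniGene sequence ID to the Stratagene clones it bridges to.

    bridge_hits: (stratagene_clone, unigene_seq_id, identity) tuples from an
    offline BLAST of Stratagene 3' reads against UniGene sequences.
    """
    index: Dict[str, Set[str]] = {}
    for clone, unigene_id, identity in bridge_hits:
        if identity >= threshold:
            index.setdefault(unigene_id, set()).add(clone)
    return index


def search_report(direct_hits: Iterable[str], unigene_hits: Iterable[str],
                  index: Dict[str, Set[str]]) -> Dict[str, object]:
    """Combine direct and indirect matches into two separate report sections."""
    indirect: Dict[str, List[str]] = {}
    for unigene_id in unigene_hits:
        for clone in index.get(unigene_id, set()):
            indirect.setdefault(clone, []).append(unigene_id)
    return {
        "direct": sorted(direct_hits),
        "indirect": {clone: sorted(ids) for clone, ids in sorted(indirect.items())},
    }


index = build_indirect_index(
    [("clone_A", "UG_1", 0.98), ("clone_B", "UG_2", 0.80), ("clone_C", "UG_1", 0.97)],
    threshold=0.95,
)
print(search_report(["clone_X"], ["UG_1", "UG_2"], index))
# {'direct': ['clone_X'], 'indirect': {'clone_A': ['UG_1'], 'clone_C': ['UG_1']}}
```

Here a query that matched only UniGene sequence UG_1 (for example, a coding or 5′ sequence) still reaches clone_A and clone_C, while clone_B's weak bridge falls below the threshold and is not reported.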

All of our GeneConnection discovery clones can be purchased as bacterial stab cultures. We use the XL10-Gold strain, which is resistant to T1 phage, minimizing the risk of T1-phage contamination. Clones in our collection are categorized as either GeneZoo or GeneJungle clones and are priced accordingly. Refer to the website for pricing information; discounts are available for large-volume orders.


While other clone collections may provide a defined subset of human cDNAs, Stratagene's GeneJungle goes beyond the familiar territory of UniGene. If you want to discover new genes or gene families, the GeneJungle is an exciting place to explore. With an average insert size of 1.7 kb, the probability of finding a previously undiscovered, full-length human cDNA is high, since we estimate that half of our clones contain full-length open reading frames. All clones have been restriction mapped, so you can obtain an estimate of the insert size before ordering. The clones have been carefully sequenced and tracked, ensuring that you receive the clone you ordered.

  1. Wilcox, AS, et al. (1991) Nucleic Acids Research 19(8): 1837-1843.

* Patents pending


