US20040081996A1 - Methods and products related to genotyping and DNA analysis - Google Patents

Publication number
US20040081996A1
Authority
US
United States
Prior art keywords
snp
rcg
pcr
genomic
nucleotide sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/676,154
Inventor
John Landers
Barbara Jordan
David Housman
Alain Charest
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Massachusetts Institute of Technology
Original Assignee
Massachusetts Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
Application filed by Massachusetts Institute of Technology
Priority to US10/676,154
Publication of US20040081996A1
Priority to US12/186,673 (US20090098551A1)
Assigned to MASSACHUSETTS INSTITUTE OF TECHNOLOGY: assignment of assignors interest (see document for details). Assignors: LANDERS, JOHN; HOUSMAN, DAVID E.; KLANDERMAN, BARBARA JORDAN; CHAREST, ALAIN
Assigned to NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF HEALTH AND HUMAN SERVICES (DHHS), U.S. GOVERNMENT: confirmatory license (see document for details). Assignors: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Priority to US14/164,770 (US20140243229A1)
Abandoned (current legal status)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/686 Polymerase chain reaction [PCR]
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/172 Haplotypes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the present invention relates to methods and products associated with genotyping.
  • the invention relates to methods of detecting single nucleotide polymorphisms and reduced complexity genomes for use in genotyping methods as well as to various methods of genotyping, fingerprinting, and genomic analysis.
  • the invention also relates to products and kits, such as panels of single nucleotide polymorphism allele specific oligonucleotides, reduced complexity genomes, and databases for use in the methods of the invention.
  • Genomic DNA varies significantly from individual to individual, except in identical siblings. Many human diseases arise from genomic variations. The genetic diversity amongst humans and other life forms explains the heritable variations observed in disease susceptibility. Diseases arising from such genetic variations include Huntington's disease, cystic fibrosis, Duchenne muscular dystrophy, and certain forms of breast cancer. Each of these diseases is associated with a single gene mutation. Diseases such as multiple sclerosis, diabetes, Parkinson's, Alzheimer's disease, and hypertension are much more complex. These diseases may be due to polygenic (multiple gene influences) or multifactorial (multiple gene and environmental influences) causes. Many of the variations in the genome do not result in a disease trait. However, as described above, a single mutation can result in a disease trait.
  • Single base pair differences, referred to as single nucleotide polymorphisms (SNPs), are the most frequent type of variation in the human genome (occurring at approximately 1 in 10^3 bases).
  • a SNP is a genomic position at which two or more alternative nucleotide alleles each occur at a relatively high frequency (greater than 1%) in a population. SNPs are well-suited for studying sequence variation because they are relatively stable (i.e., exhibit low mutation rates) and because single nucleotide variations can be responsible for inherited traits.
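
The 1% threshold in the definition above can be illustrated with a short check. This is a minimal sketch and not part of the patent: the function name `is_snp`, the `min_frequency` parameter, and the sample data are hypothetical.

```python
from collections import Counter

def is_snp(observed_alleles, min_frequency=0.01):
    """Return True if at least two alleles each exceed min_frequency."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    if total == 0:
        return False
    common = [a for a, c in counts.items() if c / total > min_frequency]
    return len(common) >= 2

# Hypothetical sample: alleles observed at one position on 200 chromosomes.
sample = ["A"] * 190 + ["G"] * 10
print(is_snp(sample))  # True: G occurs at 5%, above the 1% threshold
```

A position where the second allele appears on only 1 of 200 chromosomes (0.5%) would not qualify under this definition.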
  • Microsatellite markers are simple sequence length polymorphisms (SSLPs) consisting of di-, tri-, and tetra-nucleotide repeats.
  • variable regions which are useful for fingerprinting genomic DNA include tandem repeats of a short sequence, referred to as minisatellites. Polymorphism is due to allelic differences in the number of repeats, which can arise as a result of mitotic or meiotic unequal exchanges or by DNA slippage during replication.
  • Weber markers are abundant interspersed repetitive DNA sequences, generally of the form (dC-dA)n·(dG-dT)n. Weber markers exhibit length polymorphisms and are therefore useful for identifying individuals in paternity and forensic testing, as well as for mapping genes involved in genetic diseases.
  • 400 Weber or microsatellite markers are used to scan each genome using PCR. Using these methods, if 5,000 individual genomes are scanned, 2 million PCR reactions are performed (5,000 genomes × 400 markers).
  • the number of PCR reactions may be reduced by multiplexing, in which, for instance, four different sets of primers are reacted simultaneously in a single PCR, thus reducing the total number of PCRs for the example provided to 500,000.
  • the 500,000 PCR mixtures are separated by polyacrylamide gel electrophoresis (PAGE). If the samples are run on 96-lane gels, approximately 5,200 gels must be run to analyze all 500,000 PCR reaction mixtures.
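
The arithmetic in the example above can be checked directly. The sketch assumes one reaction mixture per gel lane, which the text implies but does not state; the variable names are illustrative only.

```python
import math

genomes = 5_000
markers = 400
primer_sets_per_pcr = 4   # four-fold multiplexing
lanes_per_gel = 96

singleplex_pcrs = genomes * markers                         # 2,000,000
multiplexed_pcrs = singleplex_pcrs // primer_sets_per_pcr   # 500,000
gels = math.ceil(multiplexed_pcrs / lanes_per_gel)          # 5,209

print(singleplex_pcrs, multiplexed_pcrs, gels)
```

Rounding up gives 5,209 gels, consistent with the figure of about 5,200 quoted in the text.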
  • PCR products can be identified by their position on the gels, and the differences in length of the products can be determined by analyzing the gels.
  • One problem with this type of analysis is that “stuttering” tends to occur, causing a smeared result and making the data difficult to interpret and score.
  • the HuSNP Chip™ is a disposable array of DNA molecules on a chip (400,000 per half inch square slide).
  • the single stranded DNA molecules bound to the slide are present in an ordered array of molecules having known sequences, some of which are complementary to one allele of a SNP-containing portion of a genome. If the same 5,000 individual genome study described above is performed using the Affymetrix HuSNP Chip™ analysis system, approximately 5,000 gene chips having 1,000 or more SNPs per chip would be required.
  • Prior to the chip scan, the genomic DNA samples would be amplified by PCR in a similar manner to conventional microsatellite genotyping.
  • the gene chip method is also expensive and time-intensive.
  • the present invention relates to methods and products for identifying points of genetic diversity in genomes of a broad spectrum of species.
  • the invention relates to a high throughput method of genotyping of SNPs in a genome (e.g. a human genome) using reduced complexity genomes (RCGs) and, in some exemplary embodiments, using SNP allele specific oligonucleotides (SNP-ASO) and specific hybridization reactions performed, for example, on a surface.
  • the method of genotyping, in some aspects of the invention, is accomplished by scanning a RCG for the presence or absence of a SNP allele.
  • tens of thousands of genomes from one species may be simultaneously assayed for the presence or absence of each allele of a SNP.
  • the methods can be automated, and the results can be recorded using a microarray scanner or other detection/recordation devices.
  • the invention encompasses several improvements over prior art methods. For instance, a genome-wide scan of thousands of individuals can be carried out at a fraction of the cost and time required by many prior art genotyping methods.
  • the invention in one aspect, is a method for detecting the presence of a SNP allele in a genomic sample.
  • the method includes preparing a RCG from a genomic sample and analyzing the RCG for the presence of the SNP allele.
  • the analysis is performed using a hybridization reaction involving a SNP allele specific oligonucleotide (SNP-ASO) which is complementary to a given allele of the SNP and the RCG. If the allele of the SNP is present in the genomic sample, then the SNP-ASO hybridizes with the RCG.
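
The allele-specific hybridization described above can be modeled in an idealized way as a search for the perfect complement of the SNP-ASO among the fragments of the RCG. This is a sketch under strong assumptions: real hybridization depends on stringency and can tolerate mismatches, and the function names and sequences below are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hybridizes(aso, rcg_fragments):
    """Idealized model: True if any fragment contains the perfect complement of the ASO."""
    target = reverse_complement(aso)
    return any(target in fragment for fragment in rcg_fragments)

# Hypothetical fragments that differ only at the polymorphic position (G vs. A).
fragment_g = "ACGTTGACGGATCCA"
fragment_a = "ACGTTGACAGATCCA"
aso_g = reverse_complement("GTTGACGGATCC")  # ASO complementary to the G allele

print(hybridizes(aso_g, [fragment_g]))  # True
print(hybridizes(aso_g, [fragment_a]))  # False
```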
  • the method is a method for determining a genotype of a genome, whereby the genotype is identified by the presence or absence of alleles of the SNP in the RCG.
  • the method is a method for characterizing a tumor, wherein the RCG is isolated from a genome obtained from a tumor of a subject and wherein the tumor is characterized by the presence or absence of an allele of the SNP in the RCG.
  • the method is a method for determining allelic frequency for a SNP, and further comprises determining the number of arbitrarily selected genomes from a population which include each allele of the SNP in order to determine the allelic frequency of the SNP in the population.
  • the hybridization reaction is performed on a surface and the RCG or the SNP-ASO is immobilized on the surface.
  • the SNP-ASO is hybridized with a plurality of RCGs in individual reactions.
  • the method includes performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with a plurality of RCGs from the plurality of genomes, and determining the genotype based on whether the SNP-ASO hybridizes with at least some of the RCGs.
  • the RCG may be a PCR-derived RCG or a native RCG.
  • the RCG is prepared by performing degenerate oligonucleotide priming-PCR (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotides and wherein x is an integer from 0 to 9, and wherein N is any nucleotide.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).
  • the method of genotyping is performed to determine genotypes at more than one locus.
  • the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
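
The tag-(N)x-TARGET primer layout described above can be written out as a string, using N as the standard IUPAC symbol for any nucleotide. This is a minimal sketch: the helper name `make_dop_primer`, the example tag and TARGET sequences, and the assumption that the layout is read left to right as written are illustrative and not taken from the patent.

```python
def make_dop_primer(tag, x, target):
    """Build a degenerate primer string laid out as tag-(N)x-TARGET."""
    if not 0 <= x <= 9:
        raise ValueError("x must be an integer from 0 to 9")
    for name, part in (("tag", tag), ("target", target)):
        if set(part.upper()) - set("ACGT"):
            raise ValueError(f"{name} must contain only A, C, G, T")
    return tag.upper() + "N" * x + target.upper()

# Hypothetical 6-residue tag, x = 6, and an 8-residue TARGET.
primer = make_dop_primer("CTCGAG", 6, "ATGTGGCA")
print(primer)  # CTCGAGNNNNNNATGTGGCA
```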
  • the methods can be performed on a support.
  • the support is a solid support such as a glass slide, a membrane such as a nitrocellulose membrane, etc.
  • the RCG is prepared by interspersed repeat sequence-PCR (IRS-PCR), arbitrarily primed-PCR (AP-PCR), adapter-PCR, or multiple primed DOP-PCR.
  • the methods are useful for determining a genotype associated with or linked to a specific phenotype, and the distinct isolated genomes or RCGs are associated with a common phenotype.
  • the SNP-ASOs used according to the methods of the invention are polynucleotides including one allele of two possible nucleotides at the polymorphic site.
  • the SNP-ASO is composed of from about 10 to 50 nucleotides. In a preferred embodiment, the SNP-ASO is composed of from about 10 to 25 nucleotides.
  • the SNP-ASO is labeled.
  • the methods can, optionally, also include addition of an excess of non-labeled SNP-ASO in which the polymorphic nucleotide residue corresponds to a different allele of the SNP and which is added during the hybridization step. Additionally, a parallel reaction may be performed wherein the labeling of the two SNP-ASOs is reversed.
  • the label on the SNP-ASO in one embodiment is a radioactive isotope.
  • the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products.
  • the SNP-ASO is labeled with a fluorescent molecule.
  • the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.
  • the RCG is labeled.
  • the label on the RCG in one embodiment is a radioactive isotope.
  • the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products.
  • the RCG is labeled with a fluorescent molecule.
  • the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products.
  • a plurality of different SNP-ASOs are attached to the surface.
  • the plurality includes at least 500 different SNP-ASOs.
  • the plurality includes at least 1000 different SNP-ASOs.
  • a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule.
  • the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.
  • the plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule. All of the RCGs having a spectrally distinct fluorescent molecule can be hybridized with a single support. In various embodiments the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight.
  • the invention encompasses methods for characterizing a tumor by assessing the loss of heterozygosity, determining allelic frequency for a SNP, generating a genomic pattern for an individual genome, and generating a genomic classification code for a genome.
  • the method for characterizing a tumor includes isolating genomic DNA from tumor samples obtained from a plurality of subjects, preparing a plurality of RCGs from the genomic DNA, performing a hybridization reaction involving a SNP-ASO and the plurality of RCGs (e.g. immobilized on a surface), and identifying the presence of a SNP allele in the genomic DNA based on whether the SNP-ASO hybridizes with at least some of the RCGs in order to characterize the tumor.
  • One or more of the RCGs or one or more of the SNP-ASOs can be immobilized on a surface.
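
Loss of heterozygosity, mentioned above as one way of characterizing a tumor, can be sketched as a comparison of allele calls. The sketch assumes a matched normal sample from the same subject is available for comparison, which the passage does not state; the function name, SNP identifiers, and data are hypothetical.

```python
def loh_sites(normal, tumor):
    """Return SNP ids that are heterozygous in normal tissue but show one allele in the tumor."""
    sites = []
    for snp_id, normal_alleles in normal.items():
        tumor_alleles = tumor.get(snp_id, set())
        if (len(normal_alleles) == 2 and len(tumor_alleles) == 1
                and tumor_alleles < normal_alleles):
            sites.append(snp_id)
    return sites

# Alleles detected by hybridization at three hypothetical SNPs.
normal = {"snp1": {"A", "G"}, "snp2": {"C"}, "snp3": {"C", "T"}}
tumor = {"snp1": {"G"}, "snp2": {"C"}, "snp3": {"C", "T"}}
print(loh_sites(normal, tumor))  # ['snp1']
```

Sites that are homozygous in the normal sample, such as `snp2` here, are uninformative for this comparison.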
  • the invention is a method for generating a genomic pattern for an individual genome.
  • the method includes preparing a plurality of RCGs, analyzing the RCGs for the presence of one or more SNP alleles, and identifying a genomic pattern of SNPs for each RCG by determining the presence or absence therein of SNP alleles.
  • the analysis involves performing a hybridization reaction involving a panel of SNP-ASOs (e.g. ones which are each complementary to one allele of a SNP), and the plurality of RCGs.
  • the genomic pattern can be identified by determining the presence or absence of a SNP allele for each RCG by detecting whether the SNP-ASOs hybridize with the RCGs.
  • a plurality of SNP-ASOs are hybridized with the support, and each SNP-ASO of the panel is hybridized with a different support than the other SNP-ASOs.
  • the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG. In other embodiments, the genomic classification code is also generated from the allelic frequency of the SNPs. In yet other embodiments, the genomic pattern is a visual pattern. The genomic pattern may be in physical or electronic form.
  • the invention includes a method for generating a genomic pattern for an individual genome.
  • the method includes identifying a genomic pattern of SNP alleles for each RCG by determining the presence or absence therein of selected SNP alleles.
  • a method for generating a genomic classification code for a genome includes preparing a RCG, analyzing the RCG for the presence of one or more SNP alleles (e.g. ones of known allelic frequency), identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and generating a genomic classification code for the RCG based on the presence or absence (and, optionally, the allelic frequency) of the SNP alleles.
  • the analysis involves performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. corresponding to SNP alleles of known allelic frequency), each of which is complementary to one allele of a SNP.
  • the genomic pattern is identified based on whether each SNP-ASO hybridizes with the RCG.
  • the method for determining allelic frequency for a SNP includes preparing a plurality of RCGs from distinct isolated genomes, performing a hybridization reaction involving one RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with each of the plurality of RCGs, and determining the number of RCGs which include each allele of the SNP in order to determine the allelic frequency of the SNP.
  • the RCGs are immobilized on the surface.
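
The counting step in the allelic frequency method above can be sketched as follows. The sketch assumes diploid genomes and a biallelic SNP, so that a genome positive for both SNP-ASOs contributes one copy of each allele and a genome positive for only one contributes two copies of that allele; these assumptions and the function name are illustrative and not from the patent.

```python
def allele_frequencies(calls):
    """calls: list of (has_allele_1, has_allele_2) booleans, one pair per genome."""
    copies_1 = copies_2 = 0
    for has_1, has_2 in calls:
        if has_1 and has_2:      # heterozygote: one copy of each allele
            copies_1 += 1
            copies_2 += 1
        elif has_1:              # homozygous for allele 1
            copies_1 += 2
        elif has_2:              # homozygous for allele 2
            copies_2 += 2
        # genomes with no signal for either allele are skipped
    total = copies_1 + copies_2
    if total == 0:
        raise ValueError("no genotype calls")
    return copies_1 / total, copies_2 / total

# Hypothetical hybridization results for ten genomes.
calls = [(True, False)] * 6 + [(True, True)] * 3 + [(False, True)] * 1
print(allele_frequencies(calls))  # (0.75, 0.25)
```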
  • the method for generating a genomic pattern for an individual genome includes preparing a plurality of RCGs, performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization step with each of the plurality of RCGs, and identifying a genomic pattern of SNPs for each RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with each RCG.
  • the method for generating a genomic classification code for a genome includes preparing a RCG, performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. immobilized on a surface), identifying a genomic pattern of SNPs for the RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with the RCG, and generating a genomic classification code for the RCG based on the identities of the SNPs which hybridize with the RCG, the identities of the SNPs which do not hybridize with the RCG, and, optionally, also based on the allelic frequency of the SNPs.
  • each SNP-ASO of the panel is immobilized on a separate surface. In another embodiment, more than one SNP-ASO of the panel is immobilized on the same surface, each SNP-ASO being immobilized on a distinct area of the surface.
  • the genomic classification code is encoded as one or more computer-readable signals on a computer-readable medium.
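
One simple way to turn a presence/absence pattern into a computer-readable code is a string with one character per SNP-ASO in the panel. The patent does not specify an encoding, so the scheme, the function name, and the panel labels below are assumptions; the optional use of allelic frequency is omitted.

```python
def classification_code(panel, hybridized):
    """Return one character per panel entry: '1' if that SNP-ASO hybridized, else '0'."""
    return "".join("1" if aso in hybridized else "0" for aso in panel)

# Hypothetical panel: two alleles for each of two SNPs, in a fixed order.
panel = ["snp1_A", "snp1_G", "snp2_C", "snp2_T"]
hybridized = {"snp1_A", "snp1_G", "snp2_T"}
print(classification_code(panel, hybridized))  # 1101
```

Because the code depends on panel order, the same ordered panel would have to be used for every genome being compared.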
  • compositions are provided.
  • the composition is a plurality of RCGs immobilized on a surface, wherein the RCGs are prepared by a method including the step of performing DOP-PCR using a DOP primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g. 6, 7, 8 or 9).
  • the composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by a method including preparing a set of primers from a RCG, performing PCR using the set of primers on a plurality of isolated genomes to yield DNA products, isolating and, optionally, sequencing the DNA products, and identifying a SNP based on the sequences of the PCR products.
  • the plurality of isolated genomes includes at least four isolated genomes.
  • a kit includes a container housing a set of PCR primers for reducing the complexity of a genome, and a container housing a set of SNP-ASOs.
  • the SNPs which correspond to the SNP-ASOs of the kit are preferably present within a RCG made using the PCR primers of the kit with a frequency of at least 50%.
  • the set of PCR primers are primers for DOP-PCR.
  • the degenerate oligonucleotide primer has a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g., 6, 7, 8 or 9).
  • the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.
  • the SNP-ASOs of the invention are polynucleotides including one of the alternative nucleotides at a polymorphic nucleotide residue of a SNP.
  • the SNP-ASO is composed of from about 10 to 50 nucleotide residues.
  • the SNP-ASO is composed of from about 10 to 25 nucleotide residues.
  • the SNP-ASOs are labeled with a fluorescent molecule.
  • a composition includes a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence is identical in all of the DNA fragments of each RCG, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).
  • the invention is a method for identifying a SNP.
  • the method includes preparing a set of primers from a RCG, wherein the RCG is composed of a first set of PCR products, PCR-amplifying a plurality of isolated genomes using the set of primers to yield a second set of PCR products, isolating and, optionally, sequencing the PCR products, and identifying a SNP based on the sequences of one or both sets of PCR products.
  • the plurality of isolated genomes is a pool of genomes.
  • the isolated genomes are RCGs.
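
The final step above, identifying a SNP from the sequences of PCR products, can be sketched as a scan for positions at which the sequences from different genomes disagree. The sketch assumes the sequences are already aligned to equal length and ignores sequencing error; the function name and example sequences are hypothetical.

```python
def candidate_snps(aligned_sequences):
    """Return (position, alleles) for each column containing more than one base."""
    if len({len(s) for s in aligned_sequences}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for pos, column in enumerate(zip(*aligned_sequences)):
        alleles = sorted(set(column) - {"N", "-"})  # ignore unknowns and gaps
        if len(alleles) > 1:
            hits.append((pos, alleles))
    return hits

# Hypothetical sequences of the same PCR product from four genomes.
products = [
    "ACGTTGACGGATCC",
    "ACGTTGACAGATCC",
    "ACGTTGACGGATCC",
    "ACGTTGACAGATCC",
]
print(candidate_snps(products))  # [(8, ['A', 'G'])]
```

Positions are 0-based. A candidate found this way would still need to meet a population frequency criterion before being treated as a SNP.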
  • RCGs can be prepared in a variety of ways, but it is preferred, in some aspects, that the RCG is prepared by DOP-PCR.
  • the method of preparing the set of primers is performed by at least: preparing a RCG, separating the first set of PCR products into individual PCR products, determining the nucleotide sequence of each end of at least one of the PCR products, and generating primers for use in the subsequent PCR step based on the sequence of the ends of the PCR product(s).
  • the set of PCR products may be separated by any means known in the art for separating polynucleotides.
  • the set of PCR products is separated by gel electrophoresis.
  • one or more libraries are prepared from segments of the gel containing several PCR products and clones are isolated from the library, each clone including a PCR product from the library.
  • the set of PCR products is separated by high pressure liquid chromatography or column chromatography.
  • the RCG used to generate primers or PCR products for identifying SNPs can be prepared by PCR methods.
  • the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).
  • the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR.
  • the set of primers is composed of a plurality of polynucleotides, each polynucleotide including a tag-(N)x-TARGET nucleotide sequence, wherein TARGET is the same sequence in each polynucleotide in the set of primers.
  • the sequence of (N)x is different in each primer within a set of primers.
  • the set of primers includes at least 4^3, 4^4, 4^5, 4^6, 4^7, 4^8, or 4^9 different primers.
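
The set sizes listed above follow from the degeneracy: each of the x positions in (N)x can be any of four nucleotides, so a fully degenerate primer corresponds to 4^x distinct sequences. The sketch below enumerates them for a hypothetical primer; the function name and the sequence are illustrative only.

```python
from itertools import product

def expand_degenerate(primer):
    """Yield every concrete primer obtained by replacing each N with A, C, G, or T."""
    n_count = primer.count("N")
    for bases in product("ACGT", repeat=n_count):
        substitutes = iter(bases)
        yield "".join(next(substitutes) if ch == "N" else ch for ch in primer)

# Hypothetical primer with x = 3.
primers = list(expand_degenerate("CTCGAGNNNATGTGGCA"))
print(len(primers))   # 64, i.e. 4**3
print(primers[0])     # CTCGAGAAAATGTGGCA

# Set sizes for x = 3 to 9.
print([4 ** x for x in range(3, 10)])
# [64, 256, 1024, 4096, 16384, 65536, 262144]
```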
  • the invention is a method for generating a RCG using DOP-PCR.
  • the method includes the step of performing DOP-PCR using a degenerate oligonucleotide primer having an (N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
  • the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues.
  • x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9).
  • the tag includes 6 nucleotide residues.
  • the RCG is used in a genotyping procedure.
  • the RCG is analyzed to detect a polymorphism. The analysis step may be performed using mass spectroscopy.
  • the invention is a method for assessing whether a subject is at risk for developing a disease.
  • the method includes the steps of using the methods of the invention to identify a plurality of SNPs that occur in at least, for example, 10% of genomes obtained from individuals afflicted with the disease and determining whether one or more of those SNPs occurs in the subject.
  • the affected individuals are compared with the unaffected individuals. Important information can be generated from the observation that there is a difference between affected and unaffected individuals alone.
  • the invention is a method for identifying a set of one or more SNPs associated with a disease or disease risk.
  • the method includes the steps of preparing individual RCGs obtained from subjects afflicted with a disease, using the same set of primers to prepare each RCG, and comparing the SNP allele frequency identified in those RCGs with the allele frequency of the same SNP in normal (i.e., non-afflicted) subjects to identify SNPs associated with the disease.
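
The comparison step above can be sketched as a difference in allele frequency between afflicted and non-afflicted groups. This is an illustration only: the function name and counts are hypothetical, and a real association study would apply a statistical test rather than a raw difference.

```python
def allele_frequency_difference(case_counts, control_counts):
    """Return, per allele, the frequency in cases minus the frequency in controls.

    Each argument maps an allele to the number of chromosomes carrying it.
    """
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    if case_total == 0 or control_total == 0:
        raise ValueError("both groups need at least one observation")
    alleles = set(case_counts) | set(control_counts)
    return {
        allele: case_counts.get(allele, 0) / case_total
        - control_counts.get(allele, 0) / control_total
        for allele in alleles
    }

# Hypothetical counts for one SNP on 100 chromosomes per group.
cases = {"A": 60, "G": 40}
controls = {"A": 80, "G": 20}
diff = allele_frequency_difference(cases, controls)
print(round(diff["G"], 3))  # 0.2: the G allele is more frequent among cases
```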
  • the invention is a method for identifying a set of SNPs randomly distributed throughout the genome. The set of SNPs is used as a panel of genetic markers to perform a genome-wide scan for linkage analysis.
  • a computer-readable medium having computer-readable signals stored thereon is provided.
  • the signals define a data structure that includes one or more data components.
  • Each data component includes a first data element defining a genomic classification code that identifies a corresponding genome.
  • Each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome.
  • the genomic classification code is a unique identifier of the corresponding genome.
  • the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, where the pattern indicates the presence or absence of each single nucleotide polymorphism.
  • each data component also includes one or more data elements, each data element defining an attribute of the corresponding genome.
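  • One way such a data component might look in practice is sketched below: a record holding a classification code built from the presence or absence of each SNP in a fixed panel, plus free-form attributes. The class name `GenomeRecord`, the string encoding of the pattern, and the SNP identifiers are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    """One data component: a classification code plus attributes of the genome."""
    classification_code: str
    attributes: dict = field(default_factory=dict)


def classification_code(snp_panel, snps_present):
    """Encode presence (1) or absence (0) of each panel SNP, in panel order."""
    return "".join("1" if snp in snps_present else "0" for snp in snp_panel)


# Hypothetical SNP identifiers used only for illustration.
panel = ["snp_a", "snp_b", "snp_c", "snp_d"]
record = GenomeRecord(
    classification_code(panel, {"snp_a", "snp_d"}),
    {"species": "human"},
)
print(record.classification_code)  # 1001
```

  • Two genomes typed against the same panel can then be compared by comparing their codes position by position.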
  • FIG. 1 is a schematic flow chart depicting a method according to the invention for identifying SNPs.
  • FIG. 2 shows data depicting the process of identifying a SNP: (a) depicts a gel in which inter-Alu PCR genomic DNA products prepared from the 8C primer (which has the nucleotide sequence SEQ ID NO:3) were separated; (b) depicts a gel in which inserts from the library clones were separated; and (c) depicts a filter having two positive or matched clones.
  • FIG. 3 depicts the results of a genotyping and mapping experiment: (a) depicts hybridization results obtained using G allele ASO; (b) depicts hybridization results obtained using A allele ASO; (c) is a pedigree of CEPH family #884 with genotypes indicated from (a) and (b); and (d) is a map of chromosome 31q21-23.
  • FIG. 4 is a schematic flow chart depicting a method according to the invention for detecting SNPs.
  • FIG. 5 is a block diagram of a computer system for storing and manipulating genomic information.
  • FIG. 6A is an example of a record for storing information about a genome and/or genes or SNPs within the genome.
  • FIG. 6B is an example of a record for storing genomic information.
  • FIG. 6C is an example of a record for storing information about genes or SNPs within a genome.
  • FIG. 7 is a flow chart of a method for determining whether genomic information of a sample genome such as SNPs match that of another genome.
  • FIG. 8 depicts results obtained from a hybridization reaction involving RCGs prepared by DOP-PCR and SNP-ASOs immobilized on a surface in a microarray format.
  • SEQ. ID. NO. 1 is CAGNNNCTG; SEQ. ID. NO. 2 is TTTTTTTTCAG; SEQ. ID. NO. 3 is CTT GCA GTG AGC CGA GATC; SEQ. ID. NO. 4 is CTCGAGNNNNNNAAGCGATG; SEQ. ID. NOs. 5-697 are nucleotide sequences containing SNPs.
  • the invention relates in some aspects to genotyping methods involving detection of one or more single nucleotide polymorphisms (SNPs) in a reduced complexity genome (RCG) prepared from the genome of a subject.
  • the invention includes methods of identifying SNPs associated with a disease or with pre-disposition to a disease.
  • the invention further includes methods of screening RCGs prepared from one or more subjects in a population. Such screening can be used, for example, to determine whether the subject is afflicted with, or is likely to become afflicted with, a disorder, to determine allelic frequencies in the population, or to determine degrees of interrelation among subjects in the population. Additional aspects and details of the compositions, kits, and methods of the invention are described in the following sections.
  • the invention involves several discoveries which have led to new advances in the field of genotyping.
  • the invention is based on the development of high throughput methods for analyzing genomic diversity.
  • the methods combine use of SNPs, methods for reducing the complexity of genomes, and high throughput screening methods.
  • many prior art methods for genotyping are based on use of hypervariable markers such as Weber markers, which predominantly detect differences in numbers of repeats.
  • Use of a high throughput SNP analysis method is advantageous over the Weber marker system for several reasons. For instance, the results of a Weber analysis system are displayed in the form of a gel, which is difficult to read and must be scored by a professional.
  • the high throughput SNP analysis method of the invention provides a binary result which indicates the presence or absence of the SNP in the sample genome. Additionally, the method of the invention requires significantly less work and is considerably less expensive to perform. As described in the background of the invention, the Weber system requires the performance of 500,000 PCR reactions and use of 5,200 gels to analyze 5,000 genomes. The same study performed using the methods of the invention could be performed without using gels. Additionally, SNPs are not species-specific and therefore the methods of the invention can be performed on diverse species and are not limited to humans. It is more tedious to perform inter-species analysis using Weber markers than using the methods of the invention.
  • Affymetrix utilizes a HuSNP Chip™ system having an ordered array of SNPs immobilized on a surface for analyzing nucleic acids. This system is, however, prohibitively expensive for performing large studies such as the 5,000 genome study described above.
  • the invention is useful for identifying polymorphisms within a genome.
  • Another use for the invention involves identification of polymorphisms associated with a plurality of distinct genomes.
  • the distinct genomes may be isolated from populations which are related by some phenotypic characteristic, familial origin, physical proximity, race, class, etc. In other cases, the genomes are selected at random from populations such that they have no relation to one another other than being selected from the same population.
  • the method is performed to determine the genotype (e.g. SNP content) of subjects having a specific phenotypic characteristic, such as a genetic disease or other trait.
  • the methods of the invention may also be used to characterize the genetic makeup of a tumor by testing for loss of heterozygosity or to determine the allelic frequency of a particular SNP. Additionally, the methods may be used to generate a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome and to determine the allelic frequency of the SNPs. Each of these uses is discussed in more detail herein.
  • the genotyping methods of the invention are based on use of RCGs that can be reproducibly produced. These RCGs are used to identify SNPs, and can be screened individually for the presence or absence of the SNP alleles.
  • the invention, in some aspects, is based on the finding that the complexity of the genome can be reduced using various PCR and other genome complexity reduction methods and that RCGs made using such methods can be scanned for the presence of SNPs.
  • One problem with using SNP-ASOs to screen a whole genome i.e. a genome, the complexity of which has not been reduced
  • S/N signal to noise
  • the target sequence e.g. about 17 nucleotide residues
  • the complexity of the genome can be reduced in a reproducible manner and that the resulting RCG is useful for identifying the presence of SNPs in the whole genome and for genotyping methods. Reduction in complexity allows genotyping of multiple SNPs following performance of a single PCR reaction, reducing the number of experimental manipulations that must be performed.
  • the RCG is a reliable representation of a specific subfraction of the whole genome, and can be analyzed as though it were a genome of considerably lower complexity.
  • RCGs are prepared from isolated genomes.
  • An “isolated genome” as used herein is genomic DNA that is isolated from a subject and may include the entire genomic DNA.
  • an isolated genome may be a RCG, or it may be an entire genomic DNA sample.
  • Genomic DNA is a population of DNA that comprises the entire genetic component of a species excluding, where applicable, mitochondrial and chloroplast DNA.
  • the methods of the invention can be used to analyze mitochondrial, chloroplast, etc., DNA as well.
  • the genomic DNA can vary in complexity. For instance, species which are relatively low on the evolutionary scale, such as bacteria, can have genomic DNA which is significantly less complex than species higher on the evolutionary scale. Bacteria such as E. coli have approximately 2.4 × 10^9 grams per mole of haploid genome, and bacterial genomes having a size of less than about 5 million base pairs (5 megabases) are known.
  • Genomes of intermediate complexity such as those of plants, for instance, rice, have a genome size of approximately 700-1,000 megabases.
  • Genomes of highest complexity such as maize or humans, have a genome size of approximately 10-10.
  • Humans have approximately 7.4 × 10^12 grams per mole of haploid genome.
  • a “subject” as used herein refers to any type of DNA-containing organism, and includes, for example, bacteria, viruses, fungi, animals, including vertebrates and invertebrates, and plants.
  • a “RCG” as used herein is a reproducible fraction of an isolated genome which is composed of a plurality of DNA fragments.
  • the RCG can be composed of random or nonrandom segments or arbitrary or non-arbitrary segments.
  • the term “reproducible fraction” refers to a portion of the genome which encompasses less than the entire native genome. If a reproducible fraction is produced twice or more using the same experimental conditions the fractions produced in each repetition include at least 50% of the same sequences. In some embodiments the fractions include at least 70%, 80%, 90%, 95%, 97%, or 99% of the same sequences, depending on how the fractions are produced.
  • When a RCG is produced by PCR, another RCG can be generated under identical experimental conditions having, at a minimum, greater than 90% of the sequences in the first RCG.
  • Other methods for preparing a RCG such as size selection are still considered to be reproducible but often produce less than 99% of the same sequences.
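  • The "reproducible fraction" criterion can be checked with a simple set comparison, sketched below. It assumes each preparation is summarized as a collection of fragment identifiers (or sequences) and treats the 50% figure as the default threshold; the function names and the example data are hypothetical.

```python
def shared_fraction(first_rcg, second_rcg):
    """Fraction of distinct sequences in the first RCG that recur in the second."""
    first, second = set(first_rcg), set(second_rcg)
    if not first:
        raise ValueError("the first RCG contains no sequences")
    return len(first & second) / len(first)


def is_reproducible(first_rcg, second_rcg, threshold=0.5):
    """True when the two preparations share at least `threshold` of their sequences."""
    return shared_fraction(first_rcg, second_rcg) >= threshold


# Invented fragment identifiers for two repetitions of the same protocol.
run_1 = [f"fragment_{i}" for i in range(10)]
run_2 = run_1[:9] + ["fragment_x"]
print(shared_fraction(run_1, run_2))        # 0.9
print(is_reproducible(run_1, run_2, 0.9))   # True
```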
  • a “plurality” of elements, as used throughout the application refers to 2 or more of the element.
  • a “DNA fragment” is a polynucleotide sequence obtained from a genome at any point along the genome and encompassing any sequence of nucleotides.
  • the DNA fragments of the invention can be generated according to either of two types of mechanisms, and thus there are two types of RCGs: PCR-generated RCGs and native RCGs.
  • PCR-generated RCGs are randomly primed. That is, the polynucleotide fragments in a PCR-generated RCG all have common sequences at or near the 5′ and 3′ ends of the fragment, but the remaining nucleotides within the fragments do not have any sequence relation to one another. When a tag is used in the primer, all of the 5′ and 3′ ends are identical. When a tag is not used, the 5′ and 3′ ends have a series of N's followed by the TARGET sequence (reading in a 5′ to 3′ direction). The TARGET sequence is identical in each primer, with the exception of multiple-primed DOP-PCR.
  • each polynucleotide fragment in a RCG includes a common 5′ and 3′ sequence which is determined by the constant region of the primer used to generate the RCG. For instance, if the RCG is generated using DOP-PCR (described in more detail below) each polynucleotide fragment would have near the 5′ or 3′ end nucleotides that are determined by the “TARGET nucleotide sequence”.
  • the TARGET nucleotide sequence is a sequence which is selected arbitrarily but which is constant within a set or subset (e.g. multiple primed DOP-PCR) of primers.
  • each polynucleotide fragment can have the same nucleotide sequence near the 5′ and 3′ end arising from the same TARGET nucleotide sequence.
  • more than one primer can be used to generate the RCG.
  • each member of the RCG would have a 5′ and 3′ end in common with at least one other member of the RCG and, more preferably, each member of the RCG would have a 5′ and 3′ end in common with at least 5% of the other members of the RCG.
  • If a RCG is prepared using DOP-PCR with 2 different primers having different TARGET nucleotide sequences, a population containing four sets of PCR products having common ends could be generated.
  • One set of PCR products could be generated having the TARGET nucleotide sequence of the first primer at or near both the 5′ and 3′ ends and another set could be generated having the TARGET nucleotide sequence of the second primer at or near both the 5′ and 3′ ends.
  • Another set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 5′ end and the TARGET nucleotide sequence of the first primer at or near the 3′ end.
  • a fourth set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 3′ end and the TARGET nucleotide sequence of the first primer at or near the 5′ end.
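  • The four sets described above are simply every ordered pairing of a 5′-end TARGET with a 3′-end TARGET. A minimal sketch, using the placeholder labels `TARGET_1` and `TARGET_2` (hypothetical names, not sequences from the specification):

```python
from itertools import product


def end_combinations(targets):
    """All ordered (5' end, 3' end) pairings of the given TARGET sequences."""
    return list(product(targets, repeat=2))


for five_prime, three_prime in end_combinations(["TARGET_1", "TARGET_2"]):
    print(f"5' {five_prime} ... {three_prime} 3'")
```

  • With n primers this gives n^2 pairings, so two primers yield the four sets enumerated in the text.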
  • the PCR generated genomes are composed of synthetic DNA fragments.
  • the DNA fragments of the native RCGs have arbitrary sequences. That is, each of the polynucleotide fragments in the native RCG does not necessarily have any sequence relation to another fragment of the same RCG. These sequences are selected based on other properties, such as size or secondary characteristics. These sequences are referred to as native RCGs because they are prepared from native nucleic acid preparations rather than being synthesized. Thus they are native, non-synthetic DNA fragments. The fragments of the native RCG may share some sequence relation to one another (e.g. if produced by restriction enzymes). In some embodiments they do not share any sequence relation to one another.
  • the RCG includes a plurality of DNA fragments ranging in size from approximately 200 to 2,000 nucleotide residues.
  • a RCG includes from 95 to 0.05% of the intact native genome.
  • the fraction of the isolated genome which is present in the RCG of the invention represents at most 90% of the isolated genome, and in preferred embodiments, contains less than 50%, 40%, 30%, 20%, 10%, 5%, or 1% of the genome.
  • a RCG preferably includes between 0.05 and 1% of the intact native genome.
  • the RCG encompasses 10% or less of an intact native genome of a complex organism.
  • Genomic DNA can be isolated from a tissue sample, a whole organism, or a sample of cells. Additionally, the isolated genomes of the invention are preferably substantially free of proteins that interfere with PCR or hybridization processes, and are also substantially free of proteins that damage DNA, such as nucleases. Preferably, the isolated genomes are also free of non-protein inhibitors of polymerase function (e.g. heavy metals) and non-protein inhibitors of hybridization when the PCR-generated RCGs are formed. Proteins may be removed from the isolated genomes by many methods known in the art.
  • proteins may be removed using a protease, such as proteinase K or pronase, by using a strong detergent such as sodium dodecyl sulfate (SDS) or sodium lauryl sarcosinate (SLS) to lyse the cells from which the isolated genomes are obtained, or both. Lysed cells may be extracted with phenol and chloroform to produce an aqueous phase containing nucleic acid, including the isolated genomes, which can be precipitated with ethanol.
  • Several methods can be used to generate PCR-generated RCGs, including IRS-PCR, AP-PCR, DOP-PCR, multiple primed PCR, and adaptor-PCR. Hybridization conditions for particular PCR methods are selected in the context of the primer type and primer length to yield a set of DNA fragments which is a percentage of the genome, as defined above. PCR methods have been described in many references, see e.g., U.S. Pat. Nos. 5,104,792; 5,106,727; 5,043,272; 5,487,985; 5,597,694; 5,731,171; 5,599,674; and 5,789,168.
  • PCR methods described herein are performed according to PCR methods well-known in the art.
  • U.S. Pat. No. 5,333,675 issued to Mullis et al. describes an apparatus and method for performing automated PCR.
  • performance of a PCR method results in amplification of a selected region of DNA by providing two DNA primers, each of which is complementary to a portion of one strand within the selected region of DNA.
  • the primer is hybridized to a template strand of nucleic acid in the presence of deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and a chain extender enzyme, such as DNA polymerase.
  • the primers are hybridized with the separated strands, forming DNA molecules that are single stranded except for the region hybridized with the primer, where they are double stranded.
  • the double stranded regions are extended by the action of the chain extender enzyme (e.g. DNA polymerase) to form an extended double stranded molecule between the original two primers.
  • the double stranded DNA molecules are separated to produce single strands which can then be re-hybridized with the primers. The process is repeated for a number of cycles to generate a series of DNA strands having the same nucleotide sequence between and including the primers.
  • Chain extender enzymes are well known in the art and include, for example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, recombinant modified T7 DNA polymerase, reverse transcriptase, and other enzymes. Heat stable enzymes are particularly preferred as they are useful in automated thermal cycle equipment.
  • Heat stable polymerases include, for example, DNA polymerases isolated from Bacillus stearothermophilus (Bio-Rad), Thermus thermophilus (finzyme, ATCC number 27634), Thermus species (ATCC number 31674), Thermus aquaticus strain TV11518 (ATCC number 25105), Sulfolobus acidocaldarius, described by Bukhrashuili et al., Biochem. Biophys.
  • Thermus filiformis ATCC number 43280
  • Taq DNA polymerase commercially available from Perkin-Elmer-Cetus (Norwalk, Conn.), Promega (Madison, Wis.) and Stratagene (La Jolla, Calif.)
  • AmpliTaq™ DNA polymerase, a recombinant Thermus aquaticus Taq DNA polymerase, available from Perkin-Elmer-Cetus and described in U.S. Pat. No. 4,889,818.
  • the PCR-based RCG generation methods performed according to the invention are automated and performed using thermal cyclers.
  • thermal cyclers are well-known in the art. For instance, M.J. Research (Watertown, Mass.) provides a thermal cycler having a peltier heat pump to provide precise uniform temperature control in the thermal cyclers; DeltaCycler thermal cyclers from Ericomp (San Diego, Calif.) also are peltier-based and include automatic ramping control, time/temperature extension programming and a choice of tube or microplate configurations.
  • the RoboCycler™ by Stratagene incorporates robotics to produce rapid temperature transitions during cycling and well-to-well uniformity between samples; and a particularly preferred cycler is the Perkin-Elmer Applied Biosystems (Foster City, Calif.) ABI Prism™ 877 Integrated Thermal cycler, which is operated through a programmable interface that automates liquid handling and thermocycling processes for fluorescent DNA sequencing and PCR reactions.
  • the Perkin-Elmer Applied Biosystems machine is designed specifically for high-throughput genotyping projects and fully automates genotyping steps, including PCR product pooling.
  • DOP-PCR Degenerate oligonucleotide primed-PCR
  • a DOP-PCR primer as used herein can have the following structure: tag-(N) x -TARGET
  • the “TARGET” nucleotide sequence includes at least 5 arbitrarily selected nucleotide residues that are the same for each primer of the set.
  • x is an integer from 0 to 9, and N is any nucleotide residue.
  • the value of x is preferably the same for each primer of a DOP-PCR primer set.
  • the TARGET nucleotide sequence includes at least 6 or 7 and preferably at least 8, 9, or 10 arbitrarily-selected nucleotides.
  • the tag is optional.
  • the “tag”, as used herein, is a sequence which is useful for processing the RCG but not necessary.
  • the tag, unlike the other sequences in the primer, does not necessarily hybridize with genomic DNA during the initial round of genomic PCR amplification. In later amplification rounds, the tag hybridizes with PCR-amplified DNA. Thus, the tag does not contribute to the sequence initially recognized by the primer. Since the tag does not participate in the initial hybridization reaction with genomic DNA, but is involved in the primer extension process, the PCR products that are formed (i.e., the reproducible DNA fragments) include the tag sequence. Thus, the end products are DNA fragments that have a sequence identical to a sequence found in the genome except for the tag sequence.
  • the tag is useful because in later rounds of PCR it allows use of a higher annealing temperature than could otherwise be used with shorter oligonucleotides.
  • the arbitrarily selected sequence is positioned at the 3′ end of the primer. This sequence, although arbitrarily selected, is the same for each primer in a set of DOP-PCR primers. From 0 to 9 nucleotide residues (“N” in the formula above) are located at the 5′-end of the TARGET sequence in the DOP-PCR primers of the invention. Each of these residues can be independently selected from naturally-occurring or artificial nucleotide residues. By way of example, each “N” residue can be an inosine or methylcytosine residue.
  • x is an integer that can be from 0 to 9, and is preferably from 3 to 9 (e.g. 3, 4, 5, 6, 7, 8, or 9).
  • a base pair tag can be positioned at the 5′ end of the primer. This tag can optionally include a restriction enzyme site. In general, inclusion of a tag sequence in the DOP-PCR primers of the invention is preferred, but not necessary.
  • the initial rounds of DOP-PCR are preferably performed at a low temperature given that the specificity of the reaction will be determined by only the 3′ TARGET nucleotide sequence.
  • a slow ramp time during these cycles ensures that the primers do not detach from the template before being extended.
  • Subsequent rounds are carried out at a higher annealing temperature because in the subsequent rounds the 5′ end of the DOP-PCR primer (the tag) is able to contribute to the primer annealing.
  • a PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C.
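  • To illustrate why a tag permits a higher annealing temperature in later rounds, the sketch below applies the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is a rough estimate for short oligonucleotides and is not a method stated in this specification. Splitting SEQ ID NO: 4 into a 6-residue tag (CTCGAG) and an 8-residue TARGET (AAGCGATG) is an assumption made here for illustration, and the degenerate N positions are left out of the estimate.

```python
def wallace_tm(sequence):
    """Rule-of-thumb melting temperature in degrees C for a short oligonucleotide.

    Uses 2 degrees per A/T and 4 degrees per G/C; accepts only unambiguous bases.
    """
    sequence = sequence.upper()
    at_count = sum(sequence.count(base) for base in "AT")
    gc_count = sum(sequence.count(base) for base in "GC")
    if at_count + gc_count != len(sequence):
        raise ValueError("sequence must contain only A, C, G, or T")
    return 2 * at_count + 4 * gc_count


# Assumed split of SEQ ID NO: 4 into tag and TARGET (illustrative only).
tag, target = "CTCGAG", "AAGCGATG"
print(wallace_tm(target))        # 24: only the TARGET anneals in the initial rounds
print(wallace_tm(tag + target))  # 44: tag and TARGET both anneal in later rounds
```

  • The absolute numbers are crude, but the ordering shows the effect described in the text: once the tag has been incorporated into the products, a longer stretch of the primer can base-pair, so a more stringent annealing step becomes feasible.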
  • DOP-PCR involves a randomly chosen sequence
  • the resultant PCR products are generated from genome sequences arbitrarily distributed throughout the genome and will generally not be clustered within specific sites of the genome.
  • creation of new sets of DOP-PCR-amplified DNA fragments can be easily accomplished by changing the sequence, length, or both, of the primer.
  • RCGs having greater or lesser complexity can be generated by selecting DOP-PCR primers having shorter or longer, respectively, TARGET and (N) x nucleotide sequences.
  • This approach can also be used with multiple DOP-PCR primers such as in the “multiple-primed DOP-PCR” method (described below).
  • use of arbitrarily chosen sequences of DOP-PCR is useful in many species because the arbitrarily-selected sequences are not species-specific, as with some forms of PCR which require use of a specific known sequence.
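  • The relationship between primer length and RCG complexity can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a genome of uniformly random sequence, in which a specific k-residue sequence is expected once every 4^k positions on each strand. This is a simplification introduced here: real genomes are not random, low-stringency annealing tolerates mismatches, and a product forms only where two sites face each other within an amplifiable distance. The genome size of 3 × 10^9 base pairs is likewise an assumed round figure.

```python
def expected_sites(genome_size_bp, target_length, strands=2):
    """Expected number of exact matches to one k-residue sequence.

    Assumes uniformly random sequence, so each position matches with
    probability 1 / 4**k on each strand.
    """
    return strands * genome_size_bp / 4 ** target_length


genome_size = 3_000_000_000  # assumed round figure for a large genome
for k in (7, 8, 10, 12):
    print(k, round(expected_sites(genome_size, k)))
```

  • Each additional fixed residue cuts the expected number of priming sites fourfold, which is consistent with shorter primers giving RCGs of greater complexity and longer primers giving RCGs of lesser complexity.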
  • IRS-PCR interspersed repeat sequence PCR
  • Mammalian chromosomes include both repeated and unique sequences. Some of the repeated sequences are short interspersed repeated sequences (IRS's) and others are long IRS's.
  • One major family of short IRS's found in humans includes Alu repeat sequences. Amplification using a single Alu primer will occur whenever two Alu elements lie in inverted orientation to each other on opposite strands. There are believed to be approximately 900,000 Alu repeats in a human haploid genome.
  • Another type of IRS sequence is the L1 element (the most common is L1Hs), which is present in 10^4-10^5 copies in a human genome.
  • Because the L1 sequence is less abundant in the genome than the Alu sequence, fewer amplification products are produced upon amplification using an L1 primer.
  • a primer which has homology to a repetitive sequence present on opposite strands within the genome of the species to be analyzed is used.
  • the inter-repeat sequence can be amplified.
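  • The condition for single-primer inter-repeat amplification (two primer sites on opposite strands, facing each other, close enough to be amplified) can be sketched as below. The representation of sites as (position, strand) pairs, the function name, and the 2,000-residue cutoff are assumptions made for illustration; the example coordinates are invented.

```python
def amplifiable_segments(sites, max_length=2000):
    """Find inter-repeat segments that a single primer could amplify.

    `sites` is a list of (position, strand) pairs. A '+' site primes
    synthesis toward higher coordinates and a '-' site toward lower ones,
    so a product needs a '+' site upstream of a '-' site within `max_length`.
    """
    forward = sorted(pos for pos, strand in sites if strand == "+")
    reverse = sorted(pos for pos, strand in sites if strand == "-")
    segments = []
    for f in forward:
        for r in reverse:
            if 0 < r - f <= max_length:
                segments.append((f, r))
    return segments


# Invented primer-site coordinates for illustration.
sites = [(100, "+"), (900, "-"), (5000, "+"), (5200, "+"), (9000, "-")]
print(amplifiable_segments(sites))  # [(100, 900)]
```

  • Sites in the same orientation, or inverted sites that lie too far apart, contribute no product, which is why only a fraction of the repeats in a genome gives rise to fragments in the RCG.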
  • the method has the advantage that the complexity of the resulting PCR products can be controlled by how homologous the primer chosen is with the repeat consensus (that is, the more homologous the primer is with the repeat consensus sequence, the more complex the PCR product will be).
  • an IRS-PCR primer has a sequence wherein at least a portion of the primer is homologous with (e.g. 50%, 75%, 90%, 95% or more identical to) the consensus nucleotide sequence of an IRS of the subject.
  • SINES small interspersed repeat sequences
  • genomic DNA sequences having this configuration are substrates for Alu PCR in human DNA and B1 and B2 PCR in the mouse.
  • the precise number of products which are represented in a specific Alu, B1, or B2 PCR reaction depends on the choice of primer used for the reaction. This variation in product complexity is due to the variation in sequence among the large number of representative sequences of the IRS family in each species.
  • AP-PCR arbitrarily primed PCR
  • AP-PCR utilizes short oligonucleotides as PCR primers to amplify a discrete subset of portions of a high complexity genome.
  • the primer sequence is arbitrary and is selected without knowledge of the sequence of the target nucleic acids to be amplified.
  • the arbitrary primer is generally 50-60% G+C.
  • the AP-PCR method is similar to the DOP-PCR method described above, except that the AP-PCR primer consists of only the arbitrarily-selected nucleotides and not the 5′ flanking degenerate residues (i.e., the (N) x residues described for the DOP-PCR primers) or the tag.
  • the genome may be primed using a single arbitrary primer or a combination of two or more arbitrary primers, each having a different, but optionally related, sequence.
  • AP-PCR is performed under low stringency hybridization conditions, allowing hybridization of the primer with targets with which the primer can exhibit a substantial degree of mismatching.
  • a PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C.
  • Mismatches refer to non-complementary nucleotide bases in the primer, relative to the template with which it is hybridized.
  • AP-PCR methods have been used previously in combination with gel electrophoresis to determine genotypes.
  • AP-PCR products are generally fractionated on a high resolution polyacrylamide gel, and the presence or absence of specific bands is used to genotype a specific locus.
  • the difference between the presence and absence of a band is a consequence of a single nucleotide DNA sequence difference in one of the primer binding sites for a given single copy sequence.
  • the product complexity obtained using a given primer or primer set can be determined by several methods. For instance, the product complexity can be determined using PCR amplification of a panel of human yeast artificial chromosome (YAC) DNA samples from a CEPH 1 library. These YACs each carry a human DNA segment approximately 300-400 kilobase pairs in length. Product complexity for each primer set can be inferred by comparing the number of bands produced per YAC when analyzed on agarose gel with an IRS-PCR product of known complexity. Additionally, for products of relatively low complexity, electrophoresis on polyacrylamide gels can establish the product complexity, compared to a standard.
  • an effective way to estimate the complexity of the product is to carry out a reannealing reaction using resistance to S1 nuclease-catalyzed degradation to determine the rate of reannealing of internally labeled, denatured, double-stranded DNA product. Comparison with reannealing rates of standards of known complexity permits accurate estimation of product complexity.
  • Each of these three methods may be used for IRS-PCR.
  • the second and third methods are best for AP-PCR and DOP-PCR which, unlike IRS-PCR, will not selectively amplify human DNA from a crude YAC DNA preparation.
  • the complexity of PCR products generated by AP-PCR can be regulated by selecting the primer sequence length, the number of primers in a primer set, or some combination of these. By choosing the appropriate combination, AP-PCR may also be used to reduce the complexity of a genome for SNP identification and genotyping, as described herein. AP-PCR markers are different from Alu PCR primers, have a different genomic distribution, and can therefore complement an IRS-PCR genome complexity-reducing method. The methods can be used in combination to produce complementary information from genome scans.
  • One PCR method for preparing RCGs is an adapter-linker amplification PCR method (previously described in, e.g., Saunders et al., Nuc. Acids Res., 17: 9027 (1990); Johnson, Genomics, 6: 243 (1990); and PCT Application WO90/00434, published Aug. 9, 1990).
  • genomic DNA is digested using a restriction enzyme, and a set of linkers is ligated onto the ends of the resulting DNA fragments.
  • PCR amplification of genomic DNA is accomplished using a primer which can bind with the adapter linker sequence.
  • Two possible variations of this procedure which can be used to limit genome complexity are (a) to use a restriction enzyme which produces a set of fragments which vary in length such that only a subset (e.g. those smaller than a PCR-amplifiable length) are amplified; and (b) to digest the genomic DNA using a restriction enzyme that produces an overhang of random nucleotide sequence (e.g., AlwN1, which recognizes CAGNNNCTG (SEQ ID NO: 1) and cleaves between NNN and CTG).
  • Adapters are constructed to anneal with only a subset of the products.
  • adapters having a specific 3 nucleotide residue overhang (corresponding to the random 3 base pair sequence produced by the restriction enzyme digestion) would be used to yield a 64-fold (4^3) reduction in complexity. Fragments which have an overhang sequence complementary to the adapter overhang are the only ones which are amplified.
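  • The 64-fold figure follows from the three random positions in the recognition site: there are 4^3 = 64 possible NNN cores, and an adapter matches only one of them. The sketch below locates CAGNNNCTG sites in a sequence and tallies their NNN cores. It is a simplified illustration: it examines one strand only, it does not model the two ends of each fragment, and the test sequence and function names are invented.

```python
import re

# Lookahead so that overlapping sites would also be reported.
ALWNI_SITE = re.compile(r"(?=CAG([ACGT]{3})CTG)")


def site_cores(sequence):
    """Return the NNN core of every CAGNNNCTG site, in order of occurrence."""
    return [match.group(1) for match in ALWNI_SITE.finditer(sequence.upper())]


def fold_reduction(overhang_length=3):
    """Number of distinct overhangs, i.e. the expected fold reduction in complexity."""
    return 4 ** overhang_length


# Invented sequence containing three sites.
sequence = "TTCAGAAACTGGGCAGTCGCTGAACAGAAACTGTT"
cores = site_cores(sequence)
print(cores)                 # ['AAA', 'TCG', 'AAA']
print(cores.count("AAA"))    # 2 sites would pair with an adapter specific for AAA
print(fold_reduction())      # 64
```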
  • Another method for generating RCGs is based on the development of native RCGs.
  • Several methods can be used to generate native RCGs, including DNA fragment size selection, isolating a fraction of DNA from a sample which has been denatured and reannealed, pH-separation, separation based on secondary structure, etc.
  • Size selection can be used to generate a RCG by separating polynucleotides in a genome into different fractions wherein each fraction contains polynucleotides of an approximately equal size.
  • One or more fractions can be selected and used as the RCG. The number of fractions selected will depend on the method used to fragment the genome and to fractionate the pieces of the genome, as well as the total number of fractions. In order to increase the complexity of the RCG, more fractions are selected.
  • One method of generating a RCG involves fragmenting a genome into arbitrarily sized pieces and separating the pieces on a gel (or by HPLC or another size fractionation method). A portion of the gel is excised, and DNA fragments contained in the portion are isolated. Typically, restriction enzymes can be used to produce DNA fragments in a reproducible manner.
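  • Size selection after a restriction digest can be sketched in a few lines. The example below is a toy model under stated assumptions: the cut is placed immediately before each recognition site (real enzymes cut at enzyme-specific positions within or near the site), only one strand is considered, and the sequence, site, and size window are invented for illustration.

```python
def digest(sequence, site):
    """Split `sequence` before every occurrence of `site` (simplified cut position)."""
    pieces, start = [], 0
    position = sequence.find(site, 1)
    while position != -1:
        pieces.append(sequence[start:position])
        start = position
        position = sequence.find(site, position + 1)
    pieces.append(sequence[start:])
    return pieces


def size_select(fragments, min_length, max_length):
    """Keep only fragments whose length falls inside the excised size window."""
    return [f for f in fragments if min_length <= len(f) <= max_length]


# Invented toy sequence; GAATTC is used only as an example recognition site.
fragments = digest("AAGAATTCCCGAATTCT", "GAATTC")
print(fragments)                      # ['AA', 'GAATTCCC', 'GAATTCT']
print(size_select(fragments, 5, 8))   # ['GAATTCCC', 'GAATTCT']
```

  • Because the same enzyme cuts at the same sites every time, repeating the digest and excising the same size window yields the same subset of fragments, and widening the window increases the complexity of the resulting RCG.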
  • Separation based on secondary structure can be accomplished in a manner similar to size selection. Different fractions of a genome having secondary structure can be separated on a gel. One or more fractions are excised from the gel, and DNA fragments are isolated therefrom.
  • Another method for creating a native RCG involves isolating a fraction of DNA from a sample which has been denatured and reannealed.
  • a genomic DNA sample is denatured, and denatured nucleic acid molecules are allowed to reanneal under selected conditions. Some conditions allow more of the DNA to be reannealed than other conditions. These conditions are well known to those of ordinary skill in the art. Either the reannealed or the remaining denatured fractions can be isolated. It is desirable to select the smaller of these two fractions in order to generate RCG.
  • the reannealing conditions used in the particular reaction determine which fraction is the smaller fraction. Variations of this method can also be used to generate RCGs.
  • the double stranded DNA may be removed (e.g., using column chromatography), the remaining DNA can then be allowed to partially reanneal, and the reannealed fraction can be isolated and used.
  • This variation is particularly useful for removing repetitive elements of the DNA, which rapidly reanneal.
  • the amount of isolated genome used in the method of preparing RCGs will vary, depending on the complexity of the initial isolated genome.
  • Genomes of low complexity such as bacterial genomes having a size of less than about 5 million base pairs (5 megabases) usually are used in an amount from approximately 10 picograms to about 250 nanograms. A more preferred range is from 30 picograms to about 7.5 nanograms, and even more preferably, about 1 nanogram.
  • Genomes of intermediate complexity such as plants (for instance, rice, having a genome size of approximately 700-1,000 megabases) can be used in a range of from approximately 0.5 nanograms to 250 nanograms. More preferably, the amount is between 1 nanogram and 50 nanograms.
  • Genomes of highest complexity such as maize or humans, having a genome size of approximately 3,000 megabases
  • PCR-generated RCGs can be prepared using DOP-PCR involving multiple primers, which is referred to herein as “multiple-primed-DOP-PCR”.
  • Multiple-primed-DOP-PCR involves the use of at least two primers which are arranged similarly to the single primers discussed above and are typically composed of 3 parts.
  • a multiple-primed-DOP-PCR primer as used herein has the following structure:
  • the TARGET 2 nucleotide sequence includes at least 5, and preferably at least 6, TARGET nucleotide residues, x is an integer from 0-9, and N is any nucleotide residue.
  • the sequence chosen arbitrarily and positioned at the 3′ end of the primer can be manipulated in multiple-primed-DOP-PCR to produce a different end product than for DOP-PCR because use of two or more sets of primers adds another level of diversity, thus producing a RCG or amplified genome, depending on the primers chosen.
  • Each of the at least two sets of primers of multiple-primed-DOP-PCR has a different TARGET sequence. Similar to the single primer of DOP-PCR a set of primers is generated for each of the at least two primers and, every primer within a single set has the same TARGET sequence as the other primers of the set.
  • This TARGET sequence is flanked at its 5′ end by 0 to 9 nucleotide residues (“N”s).
  • the set of N's will differ from primer to primer within a set of primers.
  • a set of primers may include up to 4^x different primers, each primer having a unique (N)x sequence.
  • a tag can be positioned at the 5′ end.
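  • The size of a primer set can be made concrete by enumerating it. The sketch below assumes the layout described above (optional tag at the 5′ end, then x degenerate N positions, then the TARGET sequence at the 3′ end); the TARGET and tag sequences used in any call are hypothetical placeholders.

```python
from itertools import product


def primer_set(target: str, x: int, tag: str = "") -> list[str]:
    """Enumerate every primer of the form tag-(N)x-TARGET, written 5'->3'.

    The set contains 4**x primers, one for each possible (N)x sequence.
    """
    return [tag + "".join(n) + target for n in product("ACGT", repeat=x)]


def multiple_primed_sets(targets: list[str], x: int, tag: str = "") -> dict[str, list[str]]:
    """Build one primer set per TARGET sequence, as in multiple-primed-DOP-PCR."""
    return {t: primer_set(t, x, tag) for t in targets}
```

  • With x = 0 each set contains a single primer; each additional N position multiplies the set size by four.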
  • RNA genomes differ from RCGs in that they are generated from RNA rather than from DNA.
  • An RNA genome can be, for instance, a cDNA preparation made by reverse transcription of RNA obtained from cells of a subject (e.g. human ovarian carcinoma cells).
  • a RNA genome can be composed of DNA sequences, as long as the DNA is derived from RNA.
  • RNA can also be used directly.
  • RNA genotyping method involves use of RNA, rather than DNA, as the source of nucleic acid for genotyping.
  • RNA is reverse transcribed (e.g. using a reverse transcriptase) to produce cDNA for use as an RNA genome.
  • the RNA method has at least one advantage over DNA-based methods. SNPs in coding regions (cSNPs) are more likely to be directly involved in detectable phenotypes and are thus more likely to be informative with regard to how such phenotypes can be affected. Furthermore, since this method can require only a reverse transcription step, it is amenable to high-throughput analysis.
  • a reverse transcriptase primer which only binds a subset of RNA species (e.g., a dT primer having a 3-base anchor, such as TTTTTTTTTT CAG; SEQ ID NO: 2)
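  • An anchored dT primer reduces complexity because only transcripts whose sequence immediately upstream of the poly(A) tail is complementary to the anchor are primed. The following sketch models this with exact matching only, which is an assumption (real priming tolerates some mismatch); transcripts are written 5′ to 3′ in DNA letters, and the function names are illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def primer_binds(mrna: str, primer: str = "TTTTTTTTTTCAG", anchor_len: int = 3) -> bool:
    """Return True if the anchored dT primer can pair at the body/poly(A) junction.

    The primer's 3' anchor (CAG in SEQ ID NO: 2) pairs with its reverse
    complement (CTG) located immediately 5' of the poly(A) tail.
    """
    anchor_site = reverse_complement(primer[-anchor_len:])
    dt_len = len(primer) - anchor_len
    body = mrna.rstrip("A")              # transcript without its poly(A) tail
    tail_len = len(mrna) - len(body)
    return tail_len >= dt_len and body.endswith(anchor_site)
```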
  • the RNA/cDNA sample can be attached to a surface and hybridized with a SNP-ASO.
  • the invention includes a method for identifying a SNP.
  • Genomic fragments which include SNPs can be prepared according to the invention by preparing a set of primers from a RCG (e.g., a RCG is composed of a set of PCR products), performing PCR using the set of primers to amplify a plurality of isolated genomes to produce DNA products, and identifying SNPs included in the DNA products.
  • the presence of a SNP in the DNA product can be identified using methods such as direct sequencing, i.e.
  • the SNPs are identified based on the sequences of the polymerase chain-reaction products identified using sequencing methods.
  • a “single nucleotide polymorphism” or “SNP” as used herein is a single base pair (i.e., a pair of complementary nucleotide residues on opposite genomic strands) within a DNA region wherein the identities of the paired nucleotide residues vary from individual to individual.
  • two or more alternative base pairings occur at a relatively high frequency (greater than 1%) in a subject, (e.g. human) population.
  • a “polymorphic region” is a region or segment of DNA the nucleotide sequence of which varies from individual to individual. The two DNA strands which are complementary to one another except at the variable position are referred to as alleles.
  • a polymorphism is allelic because some members of a species have one allele and other members have a variant allele and some have both. When only one variant sequence exists, a polymorphism is referred to as a diallelic polymorphism. There are three possible genotypes in a diallelic polymorphic DNA in a diploid organism.
  • genotypes arise because it is possible that a diploid individual's DNA may be homozygous for one allele, homozygous for the other allele, or heterozygous (i.e. having one copy of each allele). When other mutations are present, it is possible to have triallelic or higher order polymorphisms. These multiple mutation polymorphisms produce more complicated genotypes.
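  • The genotype counts above can be checked by enumeration: for n alleles in a diploid organism there are n(n+1)/2 unordered genotypes, which gives three for a diallelic polymorphism. The sketch below is illustrative only, and the allele labels are placeholders.

```python
from itertools import combinations_with_replacement


def genotypes(alleles: str) -> list[tuple[str, str]]:
    """List every unordered diploid genotype for the given alleles."""
    return list(combinations_with_replacement(alleles, 2))


def genotype_count(n_alleles: int) -> int:
    """Number of unordered diploid genotypes for n alleles: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2
```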
  • SNPs are well-suited for studying sequence variation because they are relatively stable (i.e. they exhibit low mutation rates) and because it appears that SNPs can be responsible for inherited traits. These properties make SNPs particularly useful as genetic markers for identifying disease-associated genes. SNPs are also useful for such purposes as linkage studies in families, determining linkage disequilibrium in isolated populations, performing association analysis of patients and controls, and loss of heterozygosity studies in tumors.
  • DOP-PCR is performed using genomic DNA obtained from an individual.
  • the products are separated on an agarose gel.
  • the products are separated by approximate length into approximately 8 segments having sizes of about 400-1000 base pairs, and libraries are made from each of the segments. This approach prevents domination of the library by one or two abundant products.
  • Plasmid DNA is isolated from individual colonies containing portions of the library. Inserts are isolated and the ends of the inserts are sequenced using vector primers. A new set of primers is then synthesized based on these insert sequences to allow PCR to be performed using RCG obtained from one or more individuals or from a pool of individuals.
  • the DNA products generated by the PCR are sequenced and inspected for the presence of two nucleotide residues at one location, an indication that a polymorphism exists at that position within one of the alleles.
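  • The inspection step, looking for two nucleotide residues at one location, can be sketched as a column-wise scan over sequences of the same locus. The sketch assumes the sequences are already aligned and of equal length, and it ignores base-calling error; the `min_count` parameter is a hypothetical filter, not something specified above.

```python
from collections import Counter


def candidate_snps(reads: list[str], min_count: int = 1) -> dict[int, set[str]]:
    """Return positions at which two or more different residues are observed.

    `reads` are aligned, equal-length sequences of the same locus obtained
    from several individuals (or from a pool of individuals).
    """
    hits = {}
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(column)
        bases = {b for b, c in counts.items() if b in "ACGT" and c >= min_count}
        if len(bases) >= 2:
            hits[pos] = bases
    return hits
```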
  • a “primer” as used herein is a polynucleotide which hybridizes with a target nucleic acid with which it is complementary and which is capable of acting as an initiator of nucleic acid synthesis under conditions for primer extension.
  • Primer extension conditions include hybridization between the primer and template, the presence of free nucleotides, a chain extender enzyme, e.g., DNA polymerase, and appropriate temperature and pH.
  • a set of primers is prepared by at least the following steps: preparing a RCG, composed of a set of PCR products, separating the set of PCR products into individual PCR products, determining the sequence of each end of at least one of the PCR products, and generating the set of primers for use in the subsequent PCR step based on the sequence of the ends of the insert(s).
  • a “set of PCR products”, as used herein, is a plurality of synthetic polynucleotide sequences, each polynucleotide sequence being different from one another except for a stretch of nucleotides in the 5′ and 3′ regions of the polynucleotides which are identical in each polynucleotide. These regions correspond to the primers used to generate the RCG and the sequence in these regions varies depending on what primer is used. When a DOP PCR primer is used, the sequence that varies in each primer preferably has a sequence Nx, wherein x is 5-12 and N is any nucleotide.
  • a set of DNA products is different from a “set of PCR products” as used herein and refers to DNA generated by PCR using specific primers which amplify a specific locus.
  • the primer may be purified from a nucleic acid preparation which includes it, or it may be prepared synthetically.
  • nucleic acid fragments may be isolated from nucleic acid sequences in genomes, plasmids, or other vectors by site-specific cleavage, etc.
  • the primers may be prepared by de novo chemical synthesis, such as by using phosphotriester or phosphodiester synthetic methods, such as those described in U.S. Pat. No. 4,356,270; Itakura et al. (1989), Ann. Rev. Biochem., 53:323-56; and Brown et al. (1979), Meth. Enzymol., 68:109.
  • Primers may also be prepared using recombinant technology, such as that described in Sambrook, “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, p.390-401 (1982).
  • nucleotide residue refers to a single monomeric unit of a nucleic acid such as DNA or RNA.
  • base pair refers to two nucleotide residues which are complementary to one another and are capable of hydrogen bonding with one another. Traditional base pairs are between G:C and T:A.
  • G, C, T, U and A refer to (deoxy)guanosine, (deoxy)cytidine, (deoxy)thymidine, uridine, and (deoxy)adenosine, respectively.
  • nucleic acids refers to a class of molecules including single stranded and double stranded deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and polynucleotides. Nucleic acids within the scope of the invention include naturally occurring and synthetic nucleic acids, nucleic acid analogs, modified nucleic acids, nucleic acids containing modified nucleotides, modified nucleic acid analogs, and mixtures of any of these.
  • SNPs identified or detected in the genotyping methods described herein can also be identified by other methods known in the art. Many methods have been described for identifying SNPs (see, e.g., WO95/12607; Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980), etc.). In some embodiments, it is preferred that SNPs be identified using the same method that will subsequently be used for genotype analysis.
  • the SNPs and RCGs of the invention are useful for a variety of purposes.
  • SNPs and RCGs are useful for performing genotyping analysis; for identification of a subject, such as in paternity or maternity testing, in immigration and inheritance disputes, in breeding tests in animals, in zygosity testing in twins, in tests for inbreeding in humans and animals; in evaluation of transplant suitability such as with bone marrow transplants; in identification of human and animal remains; in quality control of cultured cells; in forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials; in characterization of the genetic makeup of a tumor by testing for loss of heterozygosity; in determining the allelic frequency of a particular SNP; and in generating a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome of a subject and optionally determining the allelic frequency of the SNPs.
  • Genotyping is the process of identifying the presence or absence of specific genomic sequences within genomic DNA. Distinct genomes may be isolated from individuals of populations which are related by some phenotypic characteristic, by familial origin, by physical proximity, by race, by class, etc. in order to identify polymorphisms (e.g. ones associated with a plurality of distinct genomes) which are correlated with the phenotype, family, location, race, class, etc. Alternatively, distinct genomes may be isolated at random from populations such that they have no relation to one another other than their origin in the population. Identification of polymorphisms in such genomes indicates the presence or absence of the polymorphisms in the population as a whole, but not necessarily correlated with a particular phenotype.
  • Although genotyping is often used to identify a polymorphism associated with a particular phenotypic trait, this correlation is not necessary. Genotyping only requires that a polymorphism, which may or may not reside in a coding region, is present. When genotyping is used to identify a phenotypic characteristic, it is presumed that the polymorphism affects the phenotypic trait being characterized. A phenotype may be desirable, detrimental, or, in some cases, neutral.
  • Polymorphisms identified according to the methods of the invention can contribute to a phenotype. Some polymorphisms occur within a protein coding sequence and thus can affect the protein structure, thereby causing or contributing to an observed phenotype. Other polymorphisms occur outside of the protein coding sequence but affect the expression of the gene. Still other polymorphisms merely occur near genes of interest and are useful as markers of that gene. A single polymorphism can cause or contribute to more than one phenotypic characteristic and, likewise, a single phenotypic characteristic may be due to more than one polymorphism. In general multiple polymorphisms occurring within a gene correlate with the same phenotype. Additionally, whether an individual is heterozygous or homozygous for a particular polymorphism can affect the presence or absence of a particular phenotypic trait.
  • Phenotypic correlation is performed by identifying an experimental population of subjects exhibiting a phenotypic characteristic and a control population which do not exhibit that phenotypic characteristic. Polymorphisms which occur within the experimental population of subjects sharing a phenotypic characteristic and which do not occur in the control population are said to be polymorphisms which are correlated with a phenotypic trait. Once a polymorphism has been identified as being correlated with a phenotypic trait, genomes of subjects which have potential to develop a phenotypic trait or characteristic can be screened to determine occurrence or non-occurrence of the polymorphism in the subjects' genomes in order to establish whether those subjects are likely to eventually develop the phenotypic characteristic. These types of analyses are generally carried out on subjects at risk of developing a particular disorder such as Huntington's disease or breast cancer.
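  • The comparison described above reduces to a set difference: polymorphisms observed in the experimental population but not in the control population. The sketch below applies that strict presence/absence criterion exactly as stated; it performs no statistical testing, and the polymorphism identifiers used in any call are hypothetical.

```python
def correlated_polymorphisms(experimental: list[set[str]], control: list[set[str]]) -> set[str]:
    """Return polymorphisms seen in the experimental population but not in controls.

    Each element of `experimental` and `control` is the set of polymorphism
    identifiers detected in one subject's genome.
    """
    seen_in_experimental = set().union(*experimental)
    seen_in_control = set().union(*control)
    return seen_in_experimental - seen_in_control


def carries_any(subject: set[str], correlated: set[str]) -> bool:
    """Screen one subject for occurrence of any correlated polymorphism."""
    return bool(subject & correlated)
```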
  • a phenotypic trait encompasses any type of genetic disease, condition, or characteristic, the presence or absence of which can be positively determined in a subject.
  • Phenotypic traits that are genetic diseases or conditions include multifactorial diseases of which a component may be genetic (e.g. owing to occurrence in the subject of a SNP), and predisposition to such diseases. These diseases include, but are not limited to, asthma, cancer, autoimmune diseases, inflammation, blindness, ulcers, heart or cardiovascular diseases, nervous system disorders, and susceptibility to infection by pathogenic microorganisms or viruses.
  • Autoimmune diseases include, but are not limited to, rheumatoid arthritis, multiple sclerosis, diabetes, systemic lupus erythematosus, and Graves' disease.
  • Cancers include, but are not limited to, cancers of the bladder, brain, breast, colon, esophagus, kidney, hematopoietic system (e.g., leukemia), liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach, and uterus.
  • a phenotypic characteristic includes any attribute of a subject other than a disease or disorder, the presence or absence of which can be detected. Such characteristics can, in some instances, be associated with occurrence of a SNP in a subject which exhibits the characteristic.
  • characteristics include, but are not limited to, susceptibility to drug or other therapeutic treatments, appearance, height, color (e.g. of flowering plants), strength, speed (e.g. of race horses), hair color, etc.
  • phenotypic traits associated with genetic variation have been described, see e.g., U.S. Pat. No. 5,908,978 (which identifies association of disease resistance in certain species of plants associated with genetic variations) and U.S. Pat. No. 5,942,392 (which describes genetic markers associated with development of Alzheimer's disease).
  • Identification of associations between genetic variations (e.g., occurrence of SNPs) and phenotypic traits is useful for many purposes. For example, identification of a correlation between the presence of a SNP allele in a subject and the ultimate development by the subject of a disease is particularly useful for administering early treatments, or instituting lifestyle changes (e.g., reducing cholesterol or fatty foods in order to avoid cardiovascular disease in subjects having a greater-than-normal predisposition to such disease), or closely monitoring a patient for development of cancer or other disease. It may also be useful in prenatal screening to identify whether a fetus is afflicted with or is predisposed to develop a serious disease. Additionally, this type of information is useful for screening animals or plants bred for the purpose of enhancing or exhibiting desired characteristics.
  • One method for determining a genotype associated with a plurality of genomes is screening for the presence or absence of a SNP in a plurality of RCGs. For example, such screening may be performed using a hybridization reaction including a SNP-ASO and the RCGs. Either the SNP-ASO or the RCGs can, optionally, be immobilized on a surface. The genotype is determined based on whether the SNP-ASO hybridizes with at least some of the RCGs. Other methods for determining a genotype involve methods which are not based on hybridization, including, but not limited to, mass spectrometric methods. Methods for performing mass spectrometry using nucleic acid samples have been described. See e.g., U.S. Pat. No. 5,885,775. The components of the RCG can be analyzed by mass spectrometry to identify the presence or absence of a SNP allele in the RCG.
  • a “SNP-ASO”, as used herein, is an oligonucleotide which includes one of two alternative nucleotides at a polymorphic site within its nucleotide sequence. In some embodiments, it is preferred that the oligonucleotide include only a single mismatched nucleotide residue, namely the polymorphic residue, relative to an allele of a SNP. In other cases, however, the oligonucleotide may contain additional nucleotide mismatches such as neutral bases or may include nucleotide analogs. This is described in more detail below.
  • the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In more preferred embodiments, it is composed of from about 10 to 25 nucleotide residues.
  • Oligonucleotides may be purchased from commercial sources such as Genosys, Inc., Houston, Tex. or, alternatively, may be synthesized de novo on an Applied Biosystems 381 A DNA synthesizer or equivalent type of machine.
  • the oligonucleotides may be labeled by any method known in the art.
  • One preferred method is end-labeling, which can be performed as described in Maniatis et al., “Molecular Cloning: A Laboratory Manual”, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y. (1982).
  • genomic DNA may be directly analyzed or minimally reduced. This is particularly useful for screening tissue isolates to detect the presence of a bacterium or to identify the bacteria. Additionally, it is possible that, upon development of certain technical advances (e.g., more stringent hybridization, more sensitive detection equipment), even complex genomes may not need an extensive complexity reduction step.
  • genomic DNA of a well-characterized set of subjects is processed using PCR with appropriate primers to produce RCGs.
  • the DNA is spotted onto one or more surfaces (e.g., multiple glass slides) for genotyping.
  • This process can be performed using a microarray spotting apparatus which can spot more than 1,000 samples within a square centimeter area, or more than 10,000 samples on a typical microscope slide.
  • Each slide is hybridized with a fluorescently tagged allele-specific SNP oligonucleotide under TMAC conditions analogous to those described below.
  • the genotype of each individual can be determined by detecting the presence or absence of a signal for a selected set of SNP-ASOs. A schematic of the method is shown in FIG. 4.
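  • For a diallelic SNP probed with one SNP-ASO per allele, the presence or absence of the two signals maps directly onto the three possible genotypes. The sketch below is one way to express that mapping; the label strings and the "no call" outcome for two negative signals are assumptions added for illustration.

```python
def call_genotype(signal_allele1: bool, signal_allele2: bool) -> str:
    """Map presence/absence of the two allele-specific signals to a genotype."""
    if signal_allele1 and signal_allele2:
        return "heterozygous"
    if signal_allele1:
        return "homozygous allele 1"
    if signal_allele2:
        return "homozygous allele 2"
    return "no call"  # neither SNP-ASO hybridized; the sample cannot be scored


def call_panel(signals: dict[str, tuple[bool, bool]]) -> dict[str, str]:
    """Call a genotype for each SNP locus in a panel of (allele 1, allele 2) signals."""
    return {locus: call_genotype(a1, a2) for locus, (a1, a2) in signals.items()}
```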
  • the resulting genomic DNA fragments can be attached to a solid support in order to be analyzed by hybridization.
  • the RCG fragments may be attached to the slide by any method for attaching DNA to a surface. Methods for immobilizing nucleic acids have been described extensively, e.g., in U.S. Pat. Nos. 5,679,524; 5,610,287; 5,919,626; and 5,445,934. For instance, DNA fragments may be spotted onto poly-L-lysine-coated glass slides, and then crosslinked by UV irradiation. A second, more preferred method, which has been developed, involves including a 5′ amino group on each of the DNA fragments of the RCG.
  • the DNA fragments are spotted onto silane-coated slides in the presence of NaOH in order to covalently attach the fragments to the slide.
  • This method is advantageous because a covalent bond is formed between the fragments and the surface.
  • Another method for accomplishing DNA fragment immobilization is to spot the RCG fragments onto a nylon membrane.
  • Other methods of binding DNA to surfaces are possible and are well known to those of ordinary skill in the art. For instance, attachment to amino-alkyl-coated slides can be used. More detailed methods are described in the Examples below.
  • the surface to which the oligonucleotide arrays are conjugated is preferably a rigid or semi-rigid support which may, optionally, have appropriate light absorbing or transmitting characteristics for use with commercially available detection equipment.
  • Substrates which are commonly used and which have appropriate light absorbing or transmitting characteristics include, but are not limited to, glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, and polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof.
  • the surface of the support may be non-coated or coated with a variety of materials. Coatings include, but are not limited to, polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, and membranes.
  • the SNP-ASOs are hybridized under standard hybridization conditions with RCGs covalently conjugated to a surface. Briefly, SNP-ASOs are labeled at their 5′ ends. A hybridization mixture containing the SNP-ASOs and, optionally, an isostabilizing agent, denaturing agent, or renaturation accelerant is brought into contact with an array of RCGs immobilized on the surface and the mixture and the surface are incubated under appropriate hybridization conditions. The SNP-ASOs which do not hybridize are removed by washing the array with a wash mixture (such as a hybridization buffer) to leave only hybridized SNP-ASOs attached to the surface.
  • detection of the label e.g., a fluorescent molecule
  • an image of the surface can be captured (e.g., using a fluorescence microscope equipped with a CCD camera and automated stage capabilities, phosphoimager, etc.).
  • the label may also, or instead, be detected using a microarray scanner (e.g. one made by Genetic Microsystems).
  • a microarray scanner provides image analysis which can be converted to a binary (i.e. +/−) signal for each sample using, for example, any of several available software applications (e.g., NIH image, ScanAnalyze, etc.) in a data format. The high signal/noise ratio for this analysis allows determination of data in this mode to be straightforward and easily automated.
  • the methods may utilize two or more fluorescent dyes which can be spectrally differentiated to reduce the number of samples to be analyzed. For instance, if four fluorescent dyes having spectral distinctions (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be carried out under a single hybridization condition.
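  • Conversion of scanned intensities to a binary signal can be sketched as a threshold on the signal-to-background ratio. This is a simplified illustration, not the behavior of any named software application; the threshold value of 3.0 and the sample identifiers are assumptions chosen for the example.

```python
def to_binary(intensities: dict[str, float], background: float, min_ratio: float = 3.0) -> dict[str, str]:
    """Convert spot intensities to '+' or '-' using a signal/background threshold.

    `min_ratio` is an assumed cutoff; in practice it would be chosen
    empirically from control spots.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    return {
        sample: "+" if value / background >= min_ratio else "-"
        for sample, value in intensities.items()
    }
```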
  • the SNP-ASOs are conjugated to a surface and hybridized with RCGs.
  • the SNP-ASO is present in a hybridization mixture at a concentration of from about 0.005 nanomoles per liter to about 50 nanomoles per liter of hybridization mixture. More preferably, the concentration is from 0.5 nanomoles per liter to 1 nanomole per liter. A preferred concentration for radioactively labeled SNP-ASO is 0.66 nanomoles per liter.
  • the mixture preferably also includes a hybridization optimizing agent in order to improve signal discrimination between genomic sequences which are identically complementary to the SNP-ASO and those which contain a single mismatched nucleotide (as well as any neutral base etc. substitutions).
  • Isostabilizing agents are compounds such as betaines and lower tetraalkyl ammonium salts which reduce the sequence dependence of DNA thermal melting transitions. These types of compounds also increase discrimination between matched and mismatched SNPs/genomes.
  • a denaturing agent may also be included in the hybridization mixture.
  • a denaturing agent is a composition that lowers the melting temperature of double stranded nucleic acid molecules, generally by reducing hydrogen bonding between bases or preventing hydration of nucleic acid molecules.
  • Denaturing agents are well-known in the art and include, for example, DMSO, formaldehyde, glycerol, urea, formamide, and chaotropic salts.
  • hybridization conditions in general are those used commonly in the art, such as those described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, (1989), 2nd Ed., Cold Spring Harbor, N.Y.; Berger and Kimmel, “Guide to Molecular Cloning Techniques”, Methods in Enzymology, (1987), Volume 152, Academic Press, Inc., San Diego, Calif.; and Young and Davis, (1983), PNAS (USA) 80:1194.
  • incubation temperatures for hybridization of nucleic acids range from about 20° C. to 75° C.
  • a preferred temperature range for hybridization is from about 50° C. to 54° C.
  • the hybridization temperature for longer probes is preferably from about 55° C. to 65° C. and for shorter probes is less than 52° C.
  • Rehybridization may be performed in a variety of time frames.
  • hybridization of SNP-ASOs and RCGs is performed for at least 30 minutes.
  • either or both of the SNP-ASO and the RCG are labeled.
  • the label may be added directly to the SNP-ASO or the RCG during synthesis of the oligonucleotide or during generation of RCG fragments.
  • a PCR reaction performed using labeled primers or labeled nucleotides will produce a labeled product.
  • Labeled nucleotides e.g., fluorescein-labeled CTP
  • Methods for attaching labels to nucleic acids are well known to those of ordinary skill in the art and, in addition to the PCR method, include, for example, nick translation and end-labeling.
  • Labels suitable for use in the methods of the present invention include any type of label detectable by standard means, including spectroscopic, photochemical, biochemical, electrical, optical, or chemical methods.
  • Preferred types of labels include fluorescent labels such as fluorescein.
  • a fluorescent label is a compound comprising at least one fluorophore.
  • Commercially available fluorescent labels include, for example, fluorescein phosphoramidites such as fluoreprime (Pharmacia, Piscataway, N.J.), fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), rhodamine, polymethadine dye derivative, phosphores, Texas red, green fluorescent protein, CY3, and CY5.
  • Polynucleotides can be labeled with one or more spectrally distinct fluorescent labels.
  • “Spectrally distinct” fluorescent labels are labels which can be distinguished from one another based on one or more of their characteristic absorption spectra, emission spectra, fluorescent lifetimes, or the like. Spectrally distinct fluorescent labels have the advantage that they may be used in combination (“multiplexed”).
  • Radionuclides such as 3H, 125I, 35S, 14C, or 32P are also useful labels according to the methods of the invention. A plurality of radioactively distinguishable radionuclides can be used. Such radionuclides can be distinguished, for example, based on the type of radiation (e.g.
  • the 32P signal can be detected using a phosphoimager, which currently has a resolution of approximately 50 microns.
  • Other known techniques, such as chemiluminescence or colorimetric (enzymatic color reaction) detection, can also be used.
  • multiplexing refers to the use of a set of distinct fluorescent labels in a single assay.
  • fluorescent labels have been described extensively in the art, such as the fluorescent labels described in PCT Published Patent Application WO98/31834.
  • Fluorescent primers are a preferred method of labeling polynucleotides.
  • the fluorescent tag is stable for more than a year. Radioactively labeled primers are stable for a shorter period.
  • fluorescent primers may be used in combination if they are spectrally distinct, as discussed above. This allows multiple hybridizations to be detected in a single hybridization mixture. As a result, the total number of reactions needed for a genome-wide scan is reduced. For example, for analysis of 1000 loci, 2000 hybridizations are needed (1000 loci × 2 polymorphisms/locus). The use of 4 fluorescently-labeled oligonucleotides will cut this number 4-fold and thus only 500 hybridizations will be needed.
  • In order to determine the genotype of an individual at a SNP locus, it is desirable to employ SNP allele-specific oligonucleotide hybridization. Preferably, two hybridization mixtures are prepared for each locus (or they can be performed together).
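  • The arithmetic in this example generalizes to any number of loci and spectrally distinct dyes. The short sketch below reproduces the 2000 and 500 figures; the function name is illustrative, and rounding up when the division is not exact is an assumption.

```python
import math


def hybridizations_needed(loci: int, alleles_per_locus: int = 2, dyes: int = 1) -> int:
    """Number of hybridization reactions when `dyes` labels are multiplexed per reaction."""
    if dyes < 1:
        raise ValueError("at least one dye is required")
    return math.ceil(loci * alleles_per_locus / dyes)
```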
  • the first hybridization mixture contains a labeled (e.g., radioactive or fluorescent) SNP-ASO (typically 17-21 nucleotide residues in length centered around the polymorphic residue).
  • a 20-50 fold excess of non-labeled oligonucleotides corresponding to another allele referred to herein as a “complementary SNP-ASO” is included in the hybridization mixture.
  • Non-labeled complementary SNP-ASO can be avoided by using SNP-ASO containing a neutral base as described below.
  • the SNP-ASO that was labeled in the first mixture is not labeled, and the non-labeled SNP-ASO is labeled instead.
  • Hybridization is performed in the presence of a hybridization buffer. The melting temperature of oligonucleotides can be determined empirically for each experiment.
  • The two oligonucleotides corresponding to different alleles of the same SNP (the SNP-ASO and the complementary SNP-ASO) are referred to herein as a pair of allele-specific oligonucleotides (ASOs). Further experimental details regarding selecting and making SNP-ASOs are provided in the Examples section below.
  • the methods described above are based on conjugation of genomic DNA fragments (i.e. a RCG) to a solid support.
  • Hybridization analysis can also be performed with the SNP-ASO conjugated to the support (e.g. in an array).
  • the oligonucleotide array is hybridized with one or more RCGs. Attaching of the SNP-ASOs or RCGs onto the support may be performed by any method known in the art. Many methods for attaching oligonucleotides to surfaces in arrays have been described, see, e.g. PCT Published Patent Application WO97/29212, U.S. Pat. Nos. 4,588,682; 5,667,976; and 5,760,130. Other methods include, for example, using arrays of metal pins. Additionally, RCGs may be attached to the surface by the methods disclosed in the Examples below.
  • An “array” as used herein is a set of molecules arranged in a specific order with respect to a surface.
  • the array is composed of polynucleotides (e.g. either SNP-ASOs or RCGs) attached to the surface.
  • Oligonucleotide arrays can be used to screen nucleic acid samples for a target nucleic acid, which can be labeled with a detectable marker.
  • a fluorescent signal resulting from hybridization between a target nucleic acid and a substrate-bound oligonucleotide provides information relating to the identity of the target nucleic acid by reference to the location of the oligonucleotide in the array on the substrate.
  • Such a hybridization assay can generate thousands of signals which exhibit different signal strengths. These signals correspond to particular oligonucleotides of the array. Different signal strengths will arise based on the amount of labeled target nucleic acid hybridized with an oligonucleotide of the array. This amount, in turn, can be influenced by the proportion of AT-rich regions and GC-rich regions within the oligonucleotide (which determines thermal stability). The relative amounts of hybridized target nucleic acid can also be influenced by, for example, the number of different probes arrayed on the substrate, the length of the target nucleic acid, and the degree of hybridization between mismatched residues.
  • Oligonucleotide arrays, in some embodiments, have a density of at least 500 features per square centimeter, but in practice can have much lower densities.
  • A “feature”, as used herein, is an area of a substrate on which oligonucleotides having a single sequence are immobilized.
  • the oligonucleotide arrays of the invention may be produced by any method known in the art. Many such arrays are commercially available, and many methods have been described for producing them.
  • One preferred method for producing arrays includes spatially directed oligonucleotide synthesis. Spatially directed oligonucleotide synthesis may be performed using light-directed oligonucleotide synthesis, microlithography, application by ink jet, microchannel deposition to specific locations, and sequestration with physical barriers. Each of these methods is well-known in the art and has been described extensively. For instance, the light-directed oligonucleotide synthesis method has been disclosed in U.S. Pat. Nos.
  • This technique involves modification of the surface of the solid support with linkers and photolabile protecting groups using a photolithographic mask to produce reactive (e.g. hydroxyl) groups in the illuminated regions.
  • A 3′-O-phosphoramidite-activated deoxynucleoside having a 5′-hydroxyl-protected group is supplied to the surface such that coupling occurs at sites that were exposed to light.
  • the substrate is rinsed, and the surface is illuminated with a second mask, and another activated deoxynucleotide is presented to the surface. The cycle is repeated until the desired set of products is obtained.
  • nucleotides can be capped.
  • Another method involves mechanically protecting portions of the surface and selectively deprotecting/coupling materials to the exposed portions of the surface, such as the method described in U.S. Pat. No. 5,384,261.
  • the mechanical means is generally referred to as a mask.
  • Other methods for array preparation are described in PCT Published Patent Applications WO97/39151, WO98/20967, and WO98/10858, which describe an automated apparatus for the chemical synthesis of molecular arrays, U.S. Pat. No. 5,143,854, Fodor et al., Science (1991), 251:767-777 and Kozal et al., Nature Medicine, v. 2, p. 753-759 (1996).
  • Hybridizing a SNP-ASO with an array of RCGs is followed by detection of hybridization.
  • Part of the genotyping methods described herein is to determine if a positive or negative signal exists for each hybridization for an individual and then based on this information, determine the genotype for the corresponding SNP locus. This step is relatively straightforward, but varies depending on the method of detection. Essentially, all of the detection methods described here (fluorescent, radioactive, etc.) can be reduced to a digital image file, e.g. using a microarray reader or phosphoimager. Presently, there are several software products which will overlay a grid on an image and determine the signal strength value for each element of the grid.
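  • The grid-overlay step described above can be sketched as follows. This is a minimal illustration only, assuming the image is a rectangular list of pixel rows whose dimensions are exact multiples of the grid dimensions; the function names and the threshold value are hypothetical and do not correspond to any particular software product.

```python
def grid_signal(image, rows, cols):
    """Mean pixel intensity for each cell of a rows x cols grid laid over
    `image` (a list of equal-length pixel rows). Assumes the image height
    and width are exact multiples of `rows` and `cols`."""
    cell_h = len(image) // rows
    cell_w = len(image[0]) // cols
    signals = []
    for r in range(rows):
        row_values = []
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * cell_h, (r + 1) * cell_h)
                      for x in range(c * cell_w, (c + 1) * cell_w)]
            row_values.append(sum(pixels) / len(pixels))
        signals.append(row_values)
    return signals

def call_signals(signals, threshold):
    """Reduce each grid element to a positive (True) or negative (False) call."""
    return [[value >= threshold for value in row] for row in signals]

# A toy 4 x 4 "image" containing two bright spots on a dark background.
image = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 8, 8],
    [0, 0, 8, 8],
]
signals = grid_signal(image, rows=2, cols=2)
calls = call_signals(signals, threshold=5)  # threshold chosen arbitrarily
```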
  • the array having labeled SNP-ASOs (or labeled RCGs) hybridized thereto can be analyzed using automated equipment.
  • Automated equipment for analyzing arrays can include an excitation radiation source which emits radiation at a first wavelength, an optical detector, and a stage for securing the surface supporting the array.
  • the excitation source emits excitation radiation which is focused on at least one area of the array and which induces emission from fluorescent labels.
  • the signal is preferably in the form of radiation having a different wavelength than the excitation radiation.
  • Emitted radiation is collected by a detector, which generates a signal proportional to the amount of radiation sensed thereon.
  • the array may then be moved so that a different area can be exposed to the radiation source to produce a signal.
  • a two-dimensional image of the array is obtained.
  • the movement of the array is accomplished using automated equipment, such as a multi-axis translation stage, such as one which moves the array at a constant velocity.
  • the array may remain stationary, and devices may be employed to cause scanning of the light over the stationary array.
  • One type of detection method includes a CCD imaging system, e.g. when the nucleic acids are labeled with fluorescent probes.
  • CCD imaging systems for use with array detection have been described. For instance, a photodiode detector may be placed on the opposite side of the array from the excitation source. Alternatively, a CCD camera may be used in place of the photodiode detector to image the array.
  • One advantage of using these systems is rapid read time. In general, an entire 50 × 50 centimeter array can be read in about 30 seconds or less using standard equipment. If more powerful equipment and efficient dyes are used, the read time may be reduced to less than 5 seconds.
  • a computer can be used to transform the data into a displayed image which varies in color depending on the intensity of light emission at a particular location.
  • Any type of commercial software which can perform this type of data analysis can be used.
  • the data analysis involves the steps of determining the intensity of the fluorescence emitted as a function of the position on the substrate, removing the outliers, and calculating the relative binding affinity.
  • One or more of the presence, absence, and intensity of signal corresponding to a label is used to assess the presence or absence of an SNP corresponding to the label in the RCG.
  • The presence and absence of one or more SNPs in a RCG can be used to assign a genotype to the individual.
  • The following depicts the genotype analysis of 3 individuals at a given locus at which an A/G polymorphism occurs:

    Individual   SNP 1 Allele “A”   SNP 1 Allele “G”   Genotype
    Larry        +                  −                  A/A
    Moe          −                  +                  G/G
    Curly        +                  +                  A/G
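  • The genotype assignment depicted above can be sketched as follows. The function name and the handling of the case in which neither allele gives a signal are assumptions made for illustration.

```python
def call_genotype(signal_a, signal_g, allele_a="A", allele_g="G"):
    """Assign a genotype from the positive/negative hybridization signals
    obtained with the two allele-specific oligonucleotides of one SNP."""
    if signal_a and signal_g:
        return f"{allele_a}/{allele_g}"   # both alleles detected: heterozygous
    if signal_a:
        return f"{allele_a}/{allele_a}"
    if signal_g:
        return f"{allele_g}/{allele_g}"
    return None  # no signal for either allele: no call (assumption)

# (signal for allele "A", signal for allele "G") for each individual
signals = {"Larry": (True, False), "Moe": (False, True), "Curly": (True, True)}
genotypes = {name: call_genotype(a, g) for name, (a, g) in signals.items()}
```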
  • SNP analysis can be used to determine whether an individual has or will develop a particular phenotypic trait and whether the presence or absence of a specific allele correlates with a particular phenotypic trait.
  • genomic samples are isolated from a group of individuals which exhibit the particular phenotypic trait, and the samples are analyzed for the presence of common SNPs.
  • the genomic sample obtained from each individual is used to prepare a RCG. These RCGs are screened using panels of SNPs in a high throughput method of the invention to determine whether the presence or absence of a particular allele is associated with the phenotype.
  • If a particular polymorphic allele is present in 30% of individuals who develop Alzheimer's disease, then an individual having that allele has a higher likelihood of developing Alzheimer's disease.
  • the likelihood can also depend on several factors such as whether individuals not afflicted with Alzheimer's disease have this allele and whether other factors are associated with the development of Alzheimer's disease.
  • This type of analysis can be useful for determining a probability that a particular phenotype will be exhibited.
  • multiple SNPs associated with a particular phenotype can be analyzed. Although values can be calculated, it is enough to identify that a difference exists.
  • SNPs which segregate with a particular disease.
  • Multiple polymorphic sites may be detected and examined to identify a physical linkage between them or between a marker (SNP) and a phenotype. Both of these are useful for mapping a genetic locus linked to or associated with a phenotypic trait to a chromosomal position and thereby revealing one or more genes associated with the phenotypic trait. If two polymorphic sites segregate randomly, then they are either on separate chromosomes or are distant enough from one another on the same chromosome that they do not co-segregate. If two sites co-segregate with significant frequency, then they are linked to one another on the same chromosome.
  • linkage analyses are useful for developing genetic maps. See e.g., Lander et al., PNAS (USA) 83, 7353-7357 (1986), Lander et al., Genetics 121, 185-199 (1989).
  • the invention is also useful for identifying polymorphic sites which do not segregate, i.e., when one sibling has a chromosomal region that includes a polymorphic site and another sibling does not have that region.
  • Linkage analysis is often performed on family members which exhibit high rates of a particular phenotype or on patients suffering from a particular disease.
  • Biological samples are isolated from each subject exhibiting a phenotypic trait, as well as from subjects which do not exhibit the phenotypic trait. These samples are each used to generate individual RCGs and the presence or absence of polymorphic markers is determined using panels of SNPs. The data can be analyzed to determine whether the various SNPs are associated with the phenotypic trait and whether or not any SNPs segregate with the phenotypic trait.
  • the methods of the invention are also useful for assessing loss of heterozygosity in a tumor.
  • Loss of heterozygosity in a tumor is useful for determining the status of the tumor, such as whether the tumor is an aggressive, metastatic tumor.
  • The method is generally performed by isolating genomic DNA from tumor samples obtained from a plurality of subjects having tumors of the same type, as well as from normal (i.e., non-cancerous) tissue obtained from the same subjects. These genomic DNA samples are used to generate RCGs which can be hybridized with a SNP-ASO, for example using the surface array technology described herein.
  • the absence of a SNP allele in the RCG generated from the tumor compared to the RCG generated from normal tissue indicates whether loss of heterozygosity has occurred. If a SNP allele is associated with a metastatic state of a cancer, the absence of the SNP allele can be compared to its presence or absence in a non-metastatic tumor sample or a normal tissue sample.
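  • The tumor-versus-normal comparison described above can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and it assumes that a locus is informative only when the normal tissue is heterozygous and that a tumor sample giving no signal at all is treated as a failed assay rather than as a loss.

```python
def loss_of_heterozygosity(normal_alleles, tumor_alleles):
    """Compare the SNP alleles detected in the RCG from normal tissue with
    those detected in the RCG from tumor tissue of the same subject.
    Returns True if an allele present in normal tissue is absent from the
    tumor, False if both are retained, and None if the locus is
    uninformative (normal tissue homozygous, or no tumor signal)."""
    normal = set(normal_alleles)
    tumor = set(tumor_alleles)
    if len(normal) < 2 or not tumor:
        return None
    return not normal <= tumor

# Alleles detected at one locus: (normal tissue, tumor tissue) per subject.
samples = {
    "subject1": ({"A", "G"}, {"A"}),
    "subject2": ({"A", "G"}, {"A", "G"}),
    "subject3": ({"G"}, {"G"}),
}
results = {s: loss_of_heterozygosity(n, t) for s, (n, t) in samples.items()}
```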
  • a database of SNPs which occur in normal and tumor tissues can be generated and an occurrence of SNPs in a patient's sample can be compared with the database for diagnostic or prognostic purposes.
  • metastasis is a major cause of treatment failure in cancer patients. If metastasis can be detected early, it can be treated aggressively in order to slow the progression of the disease. Metastasis is a complex process involving detachment of cells from a primary tumor, movement of the cells through the circulation, and eventual colonization of tumor cells at local or distant tissue sites. Additionally, it is desirable to be able to detect a pre-disposition for development of a particular cancer such that monitoring and early treatment may be initiated. Many cancers and tumors are associated with genetic alterations.
  • Solid tumors progress from tumorigenesis through a metastatic stage and into a stage at which several genetic aberrations can occur (see, e.g., Smith et al., Breast Cancer Res. Treat., 18 Suppl. 1, S5-14, 1991). Genetic aberrations are believed to alter the tumor such that it can progress to the next stage, i.e., by conferring proliferative advantages, the ability to develop drug resistance or enhanced angiogenesis, proteolysis, or metastatic capacity. These genetic aberrations are referred to as “loss of heterozygosity.” Loss of heterozygosity can be caused by a deletion or recombination resulting in a genetic mutation which plays a role in tumor progression.
  • Loss of heterozygosity for tumor suppressor genes is believed to play a role in tumor progression. For instance, it is believed that mutations in the retinoblastoma tumor suppressor gene located at chromosome 13q14 cause progression of retinoblastomas, osteosarcomas, small cell lung cancer, and breast cancer. Likewise, the short arm of chromosome 3 has been shown to be associated with cancers such as small cell lung cancer, renal cancer and ovarian cancers. For instance, ulcerative colitis is a disease which is associated with increased risk of cancer presumably involving a multistep progression involving accumulated genetic changes (U.S. Pat. No. 5,814,444).
  • the methods of the invention are particularly advantageous for studying loss of heterozygosity because thousands of tumor samples can be screened at one time. Additionally, the methods can be used to identify new regions of loss that have not previously been identified in tumors.
  • the methods of the invention are useful for generating a genomic pattern for an individual genome of a subject.
  • the genomic pattern of a genome indicates the presence or absence of polymorphisms, for example, SNPs, within a genome.
  • Genomic DNA is unique to each individual subject (except identical twins). Accordingly, the more polymorphisms that are analyzed for a given genome of a subject, the higher probability of generating a unique genomic pattern for the individual from which the sample was isolated.
  • the genomic pattern can be used for a variety of purposes, such as for identification with respect to forensic analysis or population identification, or paternity or maternity testing.
  • the genomic pattern may also be used for classification purposes as well as to identify patterns of polymorphisms within different populations of subjects.
  • Genomic patterns may be used for many purposes, including forensic analysis and paternity or maternity testing.
  • The use of genomic information for forensic analysis has been described in many references, see e.g., National Research Council, The Evaluation of Forensic DNA Evidence (Eds. Pollard et al., National Academy Press, DC, 1996).
  • Forensic analysis of DNA is based on determination of the presence or absence of alleles of polymorphic regions within a genomic sample. The more polymorphisms that are analyzed, the higher probability of identifying the correct individual from which the sample was isolated.
  • When a biological sample, such as blood or sperm, is found at a crime scene, DNA can be isolated and RCGs can be prepared. This RCG can then be screened with a panel of SNPs to generate a genomic pattern.
  • the genomic pattern can be matched with a genomic pattern produced from a suspect or compared to a database of genomic patterns which has been compiled.
  • the SNPs used in the analysis are those in which the frequency of the polymorphic variation (allelic frequency) has been determined, such that a statistical analysis can be used to determine the probability that the sample genome matches the suspect's genome or a genome within the database.
  • x and y in the equation represent the frequency that an allele A or B will occur in a haploid genome.
  • the calculation can be extended for more polymorphic forms at a given locus.
  • the predictability increases with the number of polymorphic forms tested.
  • a binomial expansion is used to calculate P(ID).
  • the probabilities of each locus can be multiplied to provide the cumulative probability of identity and from this the cumulative probability of non-identity for a particular number of loci can be calculated. This value indicates the likelihood that random individuals have the same loci.
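  • A minimal sketch of this calculation follows. It assumes a biallelic locus whose genotypes occur in Hardy-Weinberg proportions, for which a commonly used expression is P(ID) = (x²)² + (2xy)² + (y²)², with x and y the frequencies of alleles A and B; this specific form and the function names are assumptions made for illustration and are not taken from the text.

```python
from math import prod

def p_identity(x):
    """Probability that two random individuals share a genotype at one
    biallelic locus, assuming Hardy-Weinberg proportions (assumption).
    `x` is the frequency of allele A; allele B has frequency 1 - x."""
    y = 1.0 - x
    genotype_frequencies = (x * x, 2 * x * y, y * y)  # AA, AB, BB
    return sum(f * f for f in genotype_frequencies)

def cumulative_p_identity(allele_frequencies):
    """Multiply the per-locus probabilities of identity across loci."""
    return prod(p_identity(x) for x in allele_frequencies)

def cumulative_p_non_identity(allele_frequencies):
    """Probability that two random individuals differ at one or more loci."""
    return 1.0 - cumulative_p_identity(allele_frequencies)

single_locus = p_identity(0.5)                 # 0.375
ten_loci = cumulative_p_identity([0.5] * 10)   # 0.375 ** 10
```

    With equal allele frequencies the per-locus value is 0.375, so under these assumptions the cumulative probability of identity falls quickly as more loci are tested.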
  • the same type of quantitative analysis can be used to determine whether a subject is a parent of a particular child. This type of information is useful in paternity testing, animal breeding studies, and identification of babies or children whose identity has been confused, e.g., through adoption or inadequate record keeping in a hospital, or through separation of families by occurrences such as earthquake or war.
  • the genomic pattern may be used to generate a genomic classification code (GNC).
  • the GNC may be represented by one or more data signals and stored as part of a data structure on a computer-readable medium, for example, a database.
  • the stored GNCs may be used to characterize, classify, or identify the subjects for which the GNCs were generated.
  • Each GNC may be generated by representing the presence or absence of each polymorphism with a computer-readable signal. These signals may then be encoded, for example, by performing a function on the signals.
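  • One way to picture this encoding is sketched below. The one-bit-per-polymorphism layout, the hexadecimal encoding, the function names, and the SNP labels are all hypothetical choices made for illustration; the text does not prescribe a particular encoding function.

```python
def genomic_pattern_bits(present, panel):
    """Represent presence (1) or absence (0) of each polymorphism in `panel`
    as one bit of an integer; bit i corresponds to panel[i]."""
    bits = 0
    for i, polymorphism in enumerate(panel):
        if polymorphism in present:
            bits |= 1 << i
    return bits

def encode_gnc(bits, panel_size):
    """Encode the bit pattern as a fixed-width hexadecimal string
    (one possible "function performed on the signals")."""
    width = (panel_size + 3) // 4  # hex digits needed for panel_size bits
    return format(bits, f"0{width}x")

# Hypothetical panel of allele-specific probes and the alleles detected.
panel = ["snp1_A", "snp1_G", "snp2_C", "snp2_T"]
detected = {"snp1_A", "snp2_C", "snp2_T"}

bits = genomic_pattern_bits(detected, panel)  # 0b1101
gnc = encode_gnc(bits, len(panel))            # "d"
```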
  • the GNCs may be used as part of a classification or identification system for subjects such as, for example, humans, plants, or animals.
  • a data structure may include a plurality of entries, for example, data records or table entries, where each entry identifies an individual.
  • Each entry may include the GNC generated for the individual as well as other information.
  • the GNC or portions thereof may then be stored in an index data structure, for example, another table.
  • a portion of a GNC may be indexed so that each GNC may be further classified by a portion of its genomic pattern as opposed to only the entire genomic pattern.
  • The data structures may then be searched to identify an individual who has committed a crime. For example, if a biological sample from the individual (such as blood) is recovered from the crime scene, the GNC of the individual may be generated by the methods described herein, and a database of records including GNCs searched until a match is found.
  • the GNCs may be used to classify individuals within a group such as soldiers in the armed forces, cattle in a herd, or produce within a specific crop.
  • the armed forces may generate a database containing the GNC of each soldier, and the database could be used to identify the soldier if necessary.
  • a database could be generated where records and indexes of the database include the GNCs of individual animals within a herd of cattle, so that lost or stolen animals could later be identified and returned to the proper owner.
  • the code may optionally be converted into a bar code or other human- or machine-readable form.
  • each line of a bar code may indicate the presence of specific polymorphisms or groups of specific polymorphisms for a particular subject.
  • Taxonomic identification is useful for determining the presence and identity of a pathogenic organism such as a virus, bacterium, protozoan, or multicellular parasite in a tissue sample.
  • Bacteria and other pathogenic organisms are identified based on morphology, determination of nutritional requirements or fermentation patterns, determination of antibiotic resistance, comparison of isoenzyme patterns, or determination of sensitivity to bacteriophage strains. These types of methods generally require approximately 48 to 72 hours to identify the pathogenic organism. More recently, methods for identifying pathogenic organisms have been focused on genotype analysis, for instance, using RFLPs. RFLP analysis has been performed using hybridization methods (such as Southern blots) and PCR assays.
  • FIG. 5 shows a computer system 100 for storing and manipulating genomic information.
  • the computer system 100 includes a genomic database 102 which includes a plurality of records 104 a - n storing information corresponding to a plurality of genomes.
  • Each of the records 104 a - n may store genetic information about each genome or an RCG generated therefrom.
  • the genomes for which information is stored in the genomic database 102 may be any kind of genomes from any type of subject.
  • The genomes may represent distinct genomes of individual members of a species or of particular classes of individuals, e.g., army, prisoners, etc.
  • An example of the format of a record 200 in the genomic database 102 (i.e., one of the records 104 a - n ) is shown in FIG. 6A.
  • The record 200 includes a genome identifier (Genome ID) 202 that identifies the genome corresponding to the record 200 . If enough polymorphisms of the genome were analyzed to generate the spectral pattern (such that the possibility that the GNC uniquely identifies the genome is high), or if a group to which the genome belongs has few enough members, then the GNC of the genome could serve as the Genome ID 202 .
  • The record 200 may also include genomic information fields 204 a - n .
  • the genomic information may be any information associated with the genome identified by the Genome ID 202 such as, for example, a GNC, a portion of a GNC, the presence or absence of a particular SNP, a genetic attribute (genotype), a physical attribute (phenotype), a name, a taxonomic identifier, a classification of the genome, a description of the individual from which the genome was taken, a disease of the individual, a mutation, a color, etc.
  • Each information field 204 a - n may be used as an entry in an index data structure that has a structure similar to record 200 .
  • each entry of the index data structure may include an indexed information field as a first data element, and one or more Genome IDs 202 as additional elements, such that all elements that share a common attribute are stored in a common data structure.
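  • The record and index structures described above might be modeled as in the following sketch. The class name, field names, and example values are hypothetical and chosen only to mirror the Genome ID 202 and information fields 204 a-n; the text does not specify a concrete layout.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class GenomeRecord:
    """One record: a Genome ID plus arbitrary genomic information fields."""
    genome_id: str
    info: dict = field(default_factory=dict)

def build_index(records, field_name):
    """Index entries keyed by the value of one information field; each entry
    lists the Genome IDs of all records sharing that value."""
    index = defaultdict(list)
    for record in records:
        if field_name in record.info:
            index[record.info[field_name]].append(record.genome_id)
    return dict(index)

records = [
    GenomeRecord("G001", {"gnc": "d", "snp1": "A/G"}),
    GenomeRecord("G002", {"gnc": "5", "snp1": "A/A"}),
    GenomeRecord("G003", {"gnc": "7", "snp1": "A/G"}),
]
snp1_index = build_index(records, "snp1")
```

    Looking up `snp1_index["A/G"]` then returns the Genome IDs that share that attribute without scanning every record.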
  • the format of the record 200 shown in FIG. 6A is merely an example of a format that may be used to represent genomes in the genomic database 102 .
  • the amount of information stored for each record 200 , the number of records 200 , and the number of fields indexed may vary.
  • each information field 204 a - n may include one or more fields itself, and each of these fields themselves may include more fields, etc.
  • In FIG. 6B, an embodiment of the information field 204 a is shown.
  • the information field 204 a includes a plurality of fields 206 a - m for storing more information about the information represented by information field 204 a .
  • Although the following description refers to the fields 206 a - m of the gene ID 204 a , such description is equally applicable to information fields 204 b -n.
  • Each of the fields 206 a - m may represent a portion of the GNC, a particular SNP of the genomic pattern from which the GNC was generated, a group of such SNPs, a description of the GNC, a description of one of the SNPs, etc.
  • the fields 206 a - m of the gene ID 204 a may store any kind of value that is capable of being stored in a computer readable medium such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value.
  • a user may perform a query on the genomic database 102 to search for genomic information of interest, for example, all genomes having a GNC that matches the GNC of a murder suspect.
  • a biological sample contains a particular sequence. That sequence can be compared with sequences in the database to identify information such as which individual the sample was isolated from, or whether the genetic sequence corresponds to a particular phenotypic trait.
  • the user may search the genomic database 102 for genetic matches to identify an individual, genotypes which correlate with a particular phenotype, genotypes associated with various classes of individuals etc.
  • a user may provide user input 106 indicating genomic information for which to search to a query user interface 108 .
  • the user input 106 may, for example, indicate an SNP for which to search using a standard character-based notation.
  • the query user interface 108 may, for example, provide a graphical user interface (GUI) which allows the user to select from a list of types of accessible genomic information using an input device such as a keyboard or a mouse.
  • the query user interface 108 generates a search query 110 based on the user input 106 .
  • a search engine 112 receives the search query 110 and generates a mask 114 based on the search query.
  • Example formats of the mask 114 and ways in which the mask 114 may be used to determine whether the genomic information specified by the mask 114 matches genomic information of genomes in the genomic database 102 are described in more detail below with respect to FIG. 7.
  • the search engine 112 determines whether the genomic information specified by the mask 114 matches genomic information of genomes stored in the genomic database 102 .
  • the search engine 112 generates search results 116 indicating whether the genomic database 102 includes genomes having the genomic information specified by the mask 114 .
  • the search results 116 may also indicate which genomes in the genomic database 102 have the genomic information specified by the mask 114 .
  • the search results 116 may indicate which genomes in the genomic database 102 include the specified sequence, GNC, or SNP. If the user input 106 specified particular genetic information concerning a genome (e.g., enough to identify an individual), the search results 116 may indicate which individual genome listed in the genomic database 102 matches the particular information, thus identifying the individual from whom the sample was taken. Similarly, if the user input 106 specified genetic sequences which are not adequate to specifically identify the individual, the search results 116 may still be adequate to identify a class of individuals that have genomes in the genomic database 102 that match the genetic sequence. For example, the search results may indicate that the genomic information of genomes of all Caucasian males matches the specified genetic sequence.
  • FIG. 7 illustrates a process 300 that may be used by the search engine 112 to generate the search results 116 .
  • the search engine 112 receives the search query 110 from the query user interface 108 (step 302 ).
  • The search engine 112 generates the mask 114 based on the search query 110 (step 304 ).
  • the search engine 112 performs a binary operation on one or more of the records 104 a -n in the genomic database 102 using the mask 114 (step 306 ).
  • the search engine 112 generates the search results 116 based on the results of the binary operation performed in step 306 (step 308 ).
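  • Steps 302 through 308 can be sketched as follows. The mask format is not given in this passage, so the sketch assumes that each record stores its genomic pattern as an integer bit field (one bit per polymorphism), that the query lists the bit positions of the polymorphisms that must be present, and that the binary operation is a bitwise AND; the function names and record identifiers are hypothetical.

```python
def build_mask(query_positions):
    """Step 304: build a bit mask with a 1 at each queried position."""
    mask = 0
    for position in query_positions:
        mask |= 1 << position
    return mask

def run_search(database, query_positions):
    """Steps 302-308: receive the query, build the mask, apply a bitwise AND
    to each record, and report which genomes match."""
    mask = build_mask(query_positions)
    matches = [genome_id for genome_id, bits in database.items()
               if (bits & mask) == mask]  # step 306: binary operation
    return {"found": bool(matches), "genome_ids": matches}  # step 308

# Hypothetical database: Genome ID -> presence/absence bit pattern.
database = {"genome-1": 0b1101, "genome-2": 0b0101, "genome-3": 0b1111}

# Search for genomes in which polymorphisms 0 and 3 are both present.
results = run_search(database, query_positions=[0, 3])
```

    A bitwise comparison of this kind touches each record once and needs no parsing of the stored pattern, which is one reason a mask is a natural fit for presence/absence data.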
  • a computer system for implementing the system 100 of FIG. 5 as a computer program typically includes a main unit connected to both an output device which displays information to a user and an input device which receives input from a user.
  • the main unit generally includes a processor connected to a memory system via an interconnection mechanism.
  • the input device and output device also are connected to the processor and memory system via the interconnection mechanism.
  • One or more output devices may be connected to the computer system.
  • Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output.
  • One or more input devices may be connected to the computer system.
  • Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, communication device, and data input devices such as sensors. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
  • the computer system may be a general purpose computer system which is programmable using a computer programming language, such as for example, C++, Java, or other language, such as a scripting language or assembly language.
  • the computer system may also include specially programmed, special purpose hardware such as, for example, an application-specific integrated circuit (ASIC).
  • The processor is typically a commercially available processor, of which the series x86, Celeron, and Pentium processors, available from Intel, and similar devices from AMD and Cyrix, the 680x0 series microprocessors available from Motorola, the PowerPC microprocessor from IBM and the Alpha-series processors from Digital Equipment Corporation, are examples. Many other processors are available.
  • Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services.
  • the processor and operating system define a computer platform for which application programs in high-level programming languages are written.
  • a memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory, and tape are examples.
  • the disk may be removable such as, for example, a floppy disk or a read/write CD, or permanent, known as a hard drive.
  • A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program.
  • the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM).
  • the integrated circuit memory element allows for faster access to the information by the processor than does the disk.
  • the processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed.
  • a variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the invention is not limited to any particular mechanism. It should also be understood that the invention is not limited to a particular memory system.
  • the invention is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g. 108 , 112 ) in FIG. 5 may be a separate module of a computer program, or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g. 102 , 106 , 110 , 114 , and 116 ) may be stored in a memory system or transmitted between computer systems. The invention is not limited to any particular implementation using software, hardware, firmware, or any combination thereof.
  • the various elements of the system may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.
  • Various steps of the process, for example, steps 302 , 304 , 306 , and 308 of FIG. 7, may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output.
  • Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.
  • compositions are a plurality of RCGs immobilized on a surface, where the plurality of RCGs are prepared by DOP-PCR.
  • Another composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by using RCGs as described above.
  • kits having a container housing a set of PCR primers for reducing the complexity of a genome and a container housing a set of SNP-ASOs, particularly wherein the SNPs are present with a frequency of at least 50 or 55% in a RCG made using the primer set.
  • the set of PCR primers are primers for DOP-PCR and preferably the DOP-PCR primer has the tag-(N) x -TARGET structure described herein, i.e., wherein the TARGET includes at least 7 arbitrarily selected nucleotide residues, wherein x is an integer from 3 to 9, and wherein each N is any nucleotide residue and wherein tag is a polynucleotide as described above.
  • the SNPs in the kit are attached to a surface such as a slide.
  • SNPs identified according to the methods of the invention using the B1 5′ rev primer include the following:

    Locus  ASO                     Allele  Strain      SEQ ID #
    1      tttatg A agg C ataaaaa  A       129/        14
           tttatg G agg C ataaaaa  B       BS-DBA      15
           tttatg A agg T ataaaaa  C       Spre        16
    2      ctgggctg T attcattt     A       129-DBA     17
           ctgggctg C attcattt     B       B6          18
           tctGcctcc TG agtgct     C       B6-129-DBA  19
           tct A cctcc CA agtgct   D       Spre        20
    3      tagctaga A tcaagctt     A       BG          21
           tagctaga G tcaagctt     B       DBA-Spre    22
    4      gctgtgc AAC aatcac      A
  • SNPs identified using the BJ1 DOP-PCR Primer include: SNPs present within DOP-PCR using primer BJ1

                                    Genotype of CEPH individuals:
    ASO name  ASO sequence          12-01  104-01  884-01  1331-01  SEQ ID #
    3A-G      CATCTATAGGTTCACT      GT     TT      TT      TT       580
    3A-T      CATCTATATGTTCACTT                                     581
    5A-C      GCCAACAACATTGAGA      GG     CG      GG      GG       582
    5A-G      GCCAACAAGATTGAGAG                                     583
    7A-C      GGGTCGTGCGTCCCCC      TT     CT      TT      TT       584
    7A-T      GGGTCGTGTCCCCCT                                       585
    9A-A      ATTGTCTCACATTTCT      AA     GG      AA      AA       586
    9A-G      ATTGTCTCGCATTTCTT                                     587
    12A-C     GGTGTGGTCGCAGPAG      CC     CC      CT      CT       588
    12A-T     GGTGTGGTTGCAGAAGG                                     589
    15A-A     TCATTGCC
  • the invention also encompasses a composition comprising a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a (N) x -TARGET polynucleotide structure as described above, i.e., wherein the TARGET portion is identical in all of the DNA fragments of each RCG, the portion includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein each N is any nucleotide residue.
  • the TARGET portion includes at least 8 nucleotide residues.
  • the invention includes a method for performing DOP-PCR.
  • the prior art DOP-PCR technique was originally developed to amplify the entire genome in cases where DNA was in short supply. This method is accomplished using a primer set wherein each primer has an arbitrarily selected six-nucleotide-residue portion at its 3′ end.
  • the complexity of the resultant product is extremely high because this arbitrarily selected portion is short, and the reaction results in amplification of the genome.
  • By increasing the length of the arbitrarily selected portion of the DOP-PCR primer from 6 nucleotide residues to 7, and preferably 8 or more, nucleotide residues, the complexity of the genome is significantly reduced.
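The effect of lengthening the arbitrarily selected portion can be illustrated with a rough calculation. The sketch below is not part of the disclosed method; it assumes a random genome with equal base frequencies and an approximate haploid human genome size of 3 × 10^9 bases, and the function name `expected_sites` is hypothetical. The number of exact priming sites is only a proxy for product complexity, since amplification also requires two suitably oriented sites within an amplifiable distance.

```python
# Rough estimate only: assumes a random genome with equal base frequencies.
GENOME_SIZE = 3_000_000_000  # assumed approximate haploid human genome size (bases)


def expected_sites(target_len: int, genome_size: int = GENOME_SIZE) -> float:
    """Expected number of exact matches to one fixed sequence of target_len
    bases in a random genome, counting both strands."""
    return 2 * genome_size / 4 ** target_len


for k in (6, 7, 8, 9):
    print(f"arbitrary portion of {k} residues: ~{expected_sites(k):,.0f} expected sites")
```

Under these assumptions each additional fixed residue reduces the expected number of priming sites four-fold.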
  • High allele frequency SNPs are estimated to occur in the human genome once every kilobase or less (Cooper et al., 1985).
  • a method for identifying these SNPs is illustrated in FIG. 1.
  • inter-Alu PCR was performed on genomes isolated from three unrelated individuals. The PCR products were cloned, and a mini library was made for each of the 3 individuals. The library clone inserts were PCR-amplified and spotted on nylon filters. Clones were matched by hybridization into two sets of identical clones from each individual, for a total of 6 clones per matched clone set. These sets of clones were sequenced, and the sequences were compared in order to identify SNPs.
  • This method of identifying SNPs has several advantages over the prior art PCR amplification methods. For instance, a higher quality sequence is obtained from cloned DNA than is obtained from cycle sequencing of PCR products. Additionally, every sequence represents a specific allele, rather than potentially representing a heterozygote. Finally, sequencing ambiguities, Taq polymerase errors, and other sources of sequence error particular to one representation of the sequence are reduced by application of an algorithm which requires that the same variant sequence be present in at least 2 of the 6 clones sampled.
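The "at least 2 of the 6 clones" rule can be sketched as follows. This is a minimal illustration, not the algorithm actually used: it assumes the six clone sequences are already aligned and of equal length, it interprets the rule as requiring every reported allele to be supported by at least two clones, and the function name `call_snps` and the example sequences are hypothetical.

```python
from collections import Counter


def call_snps(aligned_clones, min_clones=2):
    """Return {position: {base: count}} for alignment columns in which at
    least two different bases are each seen in >= min_clones clones."""
    snps = {}
    for pos, column in enumerate(zip(*aligned_clones)):
        counts = Counter(column)
        supported = {base: n for base, n in counts.items() if n >= min_clones}
        if len(supported) >= 2:
            snps[pos] = supported
    return snps


# Hypothetical matched clone set: two clones from each of three individuals.
clones = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACATTAGC",
    "ACGTTAGC",
    "ACGTTCGC",  # single-clone difference at position 5: treated as an error
]
print(call_snps(clones))
```

In this example only position 2 is reported, because the difference at position 5 appears in a single clone and is therefore attributed to sequencing or polymerase error.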
  • the Alu PCR method for identifying SNPs can be performed using genomic DNA obtained from independent individuals, unrelated or related. Briefly, Alu PCR is performed which yields a product having an estimated complexity of approximately 100 different single copy genomic DNA sequences and an average sequence length of between about 500 base pairs and 1 kilobase pairs. The PCR products are cloned, and a mini library is made for each individual. Approximately 800 clones are selected from each library and transferred into a 96-well dish. Filter replicas of each plate are hybridized with PCR probes from individual clones selected from one of the libraries in order to create a matched clone set of 6 clones, 2 from each individual. Many sets of clones can be isolated from these libraries. The clones can be sequenced and compared to identify SNPs.
  • An Alu primer designated primer 8C was designed to produce an Alu PCR product having a complexity of approximately 100 independent products.
  • Primer 8C (having the nucleotide sequence CTT GCA GTG AGC CGA GATC; SEQ ID NO: 3) is complementary with base pairs 218-237 of the Alu consensus sequence (Britten et al., 1994).
  • the last base pair of the primer was selected to correspond to base pair 237 of the consensus sequence, a nucleotide which has been shown to be highly variable among Alu sequences.
  • Primer 8C therefore produces a product having complexity lower than that produced using Alu primers which match a segment of the Alu sequence in which there is little variation in nucleotide sequence among Alu family members.
  • the filters were washed in 2× standard saline citrate (SSC), 0.1% SDS at room temperature for 15 minutes, followed by 2 washes in 0.1× SSC, 0.1% SDS at 65° C. for 45 minutes each. The filters were then exposed to Kodak X-OMAT X-ray film overnight.
  • FIG. 2 shows the data obtained for identification of SNPs.
  • the results of the gel electrophoresis of inter-Alu PCR genomic DNA products prepared using the 8C primer are shown in FIG. 2A.
  • Mini libraries were prepared from the Alu PCR genomic DNA products. Colonies were picked from the libraries, and inserts were amplified. The inserts were separated by gel electrophoresis to demonstrate that each was a single insert. The gel is shown in FIG. 2B.
  • Inter-Alu PCR was performed using genomic DNA obtained from 136 members of 8 CEPH families (numbers 102, 884, 1331, 1332, 1347, 1362, 1413, and 1416) using the 8C Alu primer, as described above.
  • the products from these reactions were denatured by alkali treatment (10-fold addition of 0.5 M NaOH, 2.0 M NaCl, 25 mM EDTA) and dot blotted onto multiple Hybond™ N+ filters (Amersham) using a 96-well dot blot apparatus (Schleicher and Schuell).
  • a set of two allele-specific oligonucleotides, consisting of two 17-residue oligonucleotides centered on the polymorphic nucleotide residue, was synthesized.
  • Each filter was hybridized with 1 picomole of ³²P-kinase-labeled allele-specific oligonucleotide and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the opposite allele (Shuber et al., 1993). Hybridizations were carried out overnight at 52° C.
  • the results of the genotyping and mapping are shown in FIG. 3.
  • the genotype data determined from CEPH families number 884 and 1347 were compared to the CEPH genotype database version 8.1 (http://www.cephb.fr/cephdb/) by calculating a 2 point lod score using the computer software program MultiMap version 2.0 running on a Sparc Ultra I computer.
  • This analysis revealed a linkage to marker D3S1292 with a lod score of 5.419 at a theta value of 0.0.
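For orientation, a two-point lod score compares the likelihood of the observed meioses at a given recombination fraction (theta) with the likelihood under free recombination (theta = 0.5). The sketch below shows the simplest phase-known case only; it is not the MultiMap computation, and the function name and the count of 18 meioses are hypothetical values chosen for illustration (the number of informative meioses is not stated in the text).

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Phase-known two-point lod score: log10 of L(theta) / L(0.5)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # At theta = 0, any observed recombinant makes the likelihood zero.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + meioses * math.log10(2))


# Hypothetical example: 18 informative meioses, no recombinants, theta = 0.0
print(round(two_point_lod(0, 18, 0.0), 3))  # 5.419
```

Under this simplified model, each fully informative non-recombinant meiosis contributes log10(2), or about 0.301, to the lod score at theta = 0.0.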
  • PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel (Research Genetics).
  • FIG. 3C The DNA was not available for one individual in this pedigree, and that square is left blank. Mapping of CCRSNP1 was performed by two independent methods. First, genotype data from informative CEPH families numbers 884 and 1347 were compared to the CEPH genotype database version 8.1 by calculation of a 2 point lod score. Secondly, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel. The markers yielding the highest lod scores in these analyses were D3S1292 and D3S3445, respectively, as shown in FIG. 3D.
  • the percentage of SNPs detected using the above-described methods is dependent on the number of chromosomes sampled, as well as the allele frequency.
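This dependence can be illustrated with a simple binomial model. The sketch below is an assumption-laden illustration rather than a calculation taken from the disclosure: it treats the sampled chromosomes as independent draws from the population, and the function name `detection_probability` and the `min_copies` parameter are hypothetical. Setting `min_copies=2` mimics a rule requiring each allele to be seen in at least two clones.

```python
from math import comb


def detection_probability(minor_freq: float, n_chromosomes: int,
                          min_copies: int = 1) -> float:
    """Probability that both alleles of a biallelic SNP are each observed at
    least min_copies times among n_chromosomes independent chromosomes."""
    p, q, n = minor_freq, 1.0 - minor_freq, n_chromosomes
    return sum(
        comb(n, k) * p ** k * q ** (n - k)
        for k in range(min_copies, n - min_copies + 1)
    )


for freq in (0.05, 0.1, 0.3, 0.5):
    print(f"minor allele frequency {freq}: "
          f"P(detect | 6 chromosomes, >=2 copies) = "
          f"{detection_probability(freq, 6, min_copies=2):.3f}")
```

The model shows why high allele frequency SNPs are preferentially recovered when only a few chromosomes are sampled.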
  • Allele-specific oligonucleotides are synthesized based on standard protocols (Shuber et al., 1997). Briefly, polynucleotides of 17 bases centering on the polymorphic site are synthesized for each allele of a SNP. Hybridization of DNA dots of IRS or DOP-PCR products affixed to a membrane was performed with end-labeled allele-specific oligonucleotides under TMAC buffer conditions. These conditions are known to equalize the contribution of AT and GC base pairs to melting temperature, thereby providing a uniform temperature for hybridization of allele-specific oligonucleotides independent of nucleotide composition.
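The design of a pair of 17-base allele-specific oligonucleotides centered on the polymorphic site can be sketched as follows. This is an illustrative helper only; the function name `make_asos` and the flanking sequences are hypothetical and are not taken from the sequence listing.

```python
def make_asos(flank5: str, alleles, flank3: str, length: int = 17):
    """Return one oligonucleotide per allele, with the polymorphic base
    placed at the center of an oligonucleotide of the given odd length."""
    if length % 2 == 0:
        raise ValueError("length must be odd so the SNP can sit at the center")
    half = length // 2
    if len(flank5) < half or len(flank3) < half:
        raise ValueError(f"need at least {half} flanking bases on each side")
    left = flank5[-half:]   # bases immediately 5' of the polymorphic site
    right = flank3[:half]   # bases immediately 3' of the polymorphic site
    return {allele: left + allele + right for allele in alleles}


# Hypothetical flanking sequence around an A/G polymorphism.
asos = make_asos("TTGACCATGCAGTA", ("A", "G"), "CGGATTCAAGTCCA")
for allele, oligo in asos.items():
    print(allele, oligo)
```

Each oligonucleotide has eight flanking bases on either side of the polymorphic residue, so the two members of a pair differ only at the central position.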
  • genotypes of CEPH progenitors and their offspring are determined.
  • the Mendelian segregation of each SNP marker confirms its identity as a SNP marker and provides an estimate of its relative allele frequency and, hence, its likely usefulness as a genetic marker.
  • Markers which yield complex segregation patterns or show very low allele frequencies on CEPH progenitors are set aside for future analysis, and remaining markers are further characterized.
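A basic check for Mendelian segregation in a parent-offspring trio can be sketched as below. It is a simplified illustration for an autosomal biallelic marker with genotypes written as two-letter strings (for example "GT"); the function name `mendelian_consistent` is hypothetical, and real pedigree analysis would also need to handle missing data and genotyping error.

```python
from itertools import product


def mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's genotype can be formed from one paternal allele
    and one maternal allele."""
    possible = {"".join(sorted(pair)) for pair in product(father, mother)}
    return "".join(sorted(child)) in possible


print(mendelian_consistent("GT", "TT", "GT"))  # True
print(mendelian_consistent("GT", "TT", "GG"))  # False
```

Markers for which many offspring fail such a check would be among those showing complex segregation patterns.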
  • Allele frequencies are determined by hybridization with the standard worldwide panel which U.S. NIH currently is making available to researchers for standardization of allele frequency comparison. Allele-specific oligonucleotide methodology used for genetic mapping is used to determine allele frequency.
  • Genomic DNA of a well-characterized set of subjects is PCR-amplified using appropriate primers. These DNA samples serve as the substrate for system development.
  • the DNA is spotted onto multiple glass slides for genotyping. This process can be carried out using a microarray spotting apparatus which can spot greater than 1,000 samples within a square centimeter area or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific oligonucleotide under TMAC conditions analogous to those described above.
  • the genotype of each individual is determined by the presence or absence of a signal for a selected set of allele-specific oligonucleotides. A schematic of the method is shown in FIG. 4.
  • PCR products are attached to the slide using any methods for attaching DNA to a surface that are known in the art.
  • PCR products may be spotted onto poly-L-lysine-coated glass slides, and crosslinked by UV irradiation prior to hybridization.
  • a second, more preferred method which has been developed according to the invention, involves use of oligonucleotides having a 5′ amino group for each of the PCR reactions described above.
  • the PCR products are spotted onto silane-coated slides in the presence of NaOH to covalently attach the products to the slide. This method is advantageous because a covalent bond is formed, which produces a stable attachment to the surface.
  • SNP-ASO are hybridized under TMAC hybridization conditions with the RCGs covalently conjugated to the surface.
  • the allele-specific oligonucleotides are labeled at their 5′-ends with a fluorescent dye (e.g., Cy3).
  • detection of the fluorescent oligonucleotides is performed in one of two ways. Fluorescent images can be captured using a fluorescence microscope equipped with a CCD camera and automated stage capabilities. Alternatively, the data can be obtained using a microarray scanner (e.g. one made by Genetic Microsystems). A microarray scanner provides image analysis which can be converted to a digital (e.g. +/−) signal for each sample using any of several available software applications (e.g., NIH Image, ScanAnalyze, etc.).
  • the high signal/noise ratio for this analysis makes the determination of data in this mode straightforward and readily automated.
  • These data, once exported, can be manipulated to conform with a format which can be analyzed by any of several human genetics applications such as CRI-MAP and LINKAGE software.
  • the methods may involve use of two or more fluorescent dyes or other labels which can be spectrally differentiated to reduce the number of samples which need to be analyzed. For instance, if four spectrally distinct fluorescent dyes (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be performed in a single hybridization mixture.
  • the initial step of the SNP identification method and the genotyping approach described above is to reduce the complexity of genomic DNA in a reproducible manner.
  • the purpose of this step with respect to genotyping is to allow genotyping of multiple SNPs using the products of a single PCR reaction.
  • a PCR primer was synthesized which bears homology to a repetitive sequence present within the genome of the species to be analyzed (e.g., Alu sequence in humans).
  • the inter-repeat sequence can be amplified.
  • the method has the advantage that the complexity of the resultant PCR product can be controlled by how closely the nucleotide sequence of the chosen primer matches the consensus nucleotide sequence of the repeat element (that is, the closer to the repeat consensus, the more complex the PCR product).
  • a 50 microliter reaction for each sample was set up as follows:

    distilled, deionized H2O (ddH2O)             30.75 µl
    10X PCR Buffer                                5 µl
      (500 mM KCl, 100 mM Tris-HCl pH 8.3, 15 mM MgCl2, 0.1% gelatin)
    1.25 mM dNTPs                                 7.5 µl
    20 µM Primer 8C                               1.5 µl
    Taq polymerase (1.25 units)                   0.25 µl
    Template (50 ng genomic DNA in ddH2O)         5.0 µl
    Total                                         50 µl
  • the PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler under the following conditions: 1 min. 94° C. 30 sec. 94° C.
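As a consistency check on the 50 microliter inter-Alu PCR mixture described above, the component volumes can be summed and the final concentrations of the dNTPs, primer, and MgCl2 computed by simple dilution. The sketch below uses only the volumes and stock concentrations given in the recipe; the variable and function names are hypothetical.

```python
# Volumes in microliters, taken from the inter-Alu PCR recipe above.
REACTION = {
    "ddH2O": 30.75,
    "10X PCR buffer": 5.0,
    "1.25 mM dNTPs": 7.5,
    "20 uM primer 8C": 1.5,
    "Taq polymerase": 0.25,
    "template": 5.0,
}


def final_concentration(stock: float, volume: float, total_volume: float) -> float:
    """Simple dilution: stock concentration scaled by volume / total volume."""
    return stock * volume / total_volume


total = sum(REACTION.values())
dntp_mM = final_concentration(1.25, REACTION["1.25 mM dNTPs"], total)
primer_uM = final_concentration(20.0, REACTION["20 uM primer 8C"], total)
mgcl2_mM = final_concentration(15.0, REACTION["10X PCR buffer"], total)

print(f"total volume: {total} ul")
print(f"dNTPs: {dntp_mM:.4f} mM, primer: {primer_uM:.2f} uM, MgCl2: {mgcl2_mM:.2f} mM")
```

The volumes sum to the stated 50 microliters, and the 10X buffer is diluted to 1X in the final reaction.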
  • RCGs were also prepared using DOP-PCR with the following primer (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4) (wherein N is any nucleotide).
  • DOP-PCR uses a single primer which is typically composed of 3 parts, herein designated tag-(N) x -TARGET.
  • the TARGET portion is a polynucleotide which comprises at least 7, and preferably at least 8, arbitrarily-selected nucleotide residues, x is an integer from 0 to 9, and N is any nucleotide residue.
  • Tag is a polynucleotide as described above.
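The three-part tag-(N)x-TARGET structure can be made concrete by splitting the primer given above (CTC GAG NNN NNN AAG CGA TG) into its parts. The sketch below is illustrative only; the function name `parse_dop_primer` is hypothetical, and the parser assumes x is at least 1, because with x = 0 the boundary between tag and TARGET cannot be recovered from the sequence alone.

```python
import re

# tag = leading fixed bases, n = run of degenerate N residues, target = 3' fixed bases
PRIMER_RE = re.compile(r"^(?P<tag>[ACGT]*)(?P<n>N+)(?P<target>[ACGT]+)$")


def parse_dop_primer(primer: str):
    """Split a DOP-PCR primer into (tag, x, TARGET); spaces are ignored."""
    cleaned = primer.replace(" ", "").upper()
    match = PRIMER_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not a tag-(N)x-TARGET primer: {primer!r}")
    return match["tag"], len(match["n"]), match["target"]


tag, x, target = parse_dop_primer("CTC GAG NNN NNN AAG CGA TG")
print(tag, x, target, len(target))
```

For this primer the tag is CTCGAG, x is 6, and the TARGET portion is the 8-residue sequence AAGCGATG, consistent with the preference for a TARGET of at least 8 residues.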
  • the initial rounds of DOP-PCR were performed at a low temperature, because the specificity of the reaction is determined primarily by the nucleotide sequence of the TARGET portion and the (N)x residues. A slow ramp time during these cycles ensures that the primers do not detach from the template prior to chain extension. Subsequent amplification rounds were carried out at a higher annealing temperature because the 5′ end of the DOP-PCR primer can also contribute to primer annealing.
  • the DOP-PCR method was performed using a reaction mixture comprising the following ingredients:

    distilled deionized H2O                                     24 µl
    10X PCR Buffer                                              5 µl
    1.25 mM dNTPs                                               8 µl
    20 µM Primer DOP-BJ1 (SEQ ID No. 4)                         7.5 µl
    Taq polymerase (1.25 units)                                 0.5 µl
    Template (50 ng genomic DNA in distilled deionized H2O)     5 µl
    Total                                                       50 µl
  • the PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler using the following reaction conditions: 1 min. 94° C. 1 min. 94° C.
  • Another method for attaching nucleic acids to a support involves the use of microarrays. This method attaches minute quantities of PCR product samples onto a glass slide. The number of samples that can be spotted is greater than 1000/cm², and therefore over 10,000 samples can be analyzed simultaneously on a glass slide. To accomplish this, pre-cleaned glass slides were placed in a mixture of 80 ml dry xylene, 32 ml 96% 3-glycidoxy-propyltrimethoxy silane, and 160 µl 99% N-ethyldiisopropylamine at 80° C. overnight. The slides were rinsed for 5 minutes in ethyl acetate and dried at 80° C. for 30 minutes.
  • Hybridization of SNP-ASOs (2 for each locus) to IRS-PCR or DOP-PCR products of several individuals has been performed.
  • the final step in this process is to determine if a positive or negative signal exists for each hybridization for an individual and then, based on this information, determine the genotype for that particular locus.
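The conversion of positive or negative hybridization signals into a genotype for one biallelic locus can be sketched as follows. This is a simplified illustration: it assumes the two SNP-ASO signals have already been reduced to booleans (for example by thresholding spot intensity), and the function name `call_genotype` is hypothetical.

```python
from typing import Optional


def call_genotype(signal_a: bool, signal_b: bool,
                  allele_a: str, allele_b: str) -> Optional[str]:
    """Genotype from the +/- signals of the two allele-specific oligonucleotides."""
    if signal_a and signal_b:
        return allele_a + allele_b   # both alleles detected: heterozygote
    if signal_a:
        return allele_a * 2          # only the first allele detected
    if signal_b:
        return allele_b * 2          # only the second allele detected
    return None                      # neither ASO hybridized: no call


print(call_genotype(True, True, "G", "T"))    # GT
print(call_genotype(False, True, "G", "T"))   # TT
print(call_genotype(False, False, "G", "T"))  # None
```

A missing signal for both oligonucleotides is reported as no call rather than as a genotype, since it may reflect a failed amplification or spotting rather than the absence of both alleles.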
  • all of the detection methods described herein can be reduced to a digital image file, for example using a microarray reader or using a phosphorimager.
  • Genomic DNA isolated from approximately 40 individuals was subjected to DOP-PCR using primer BJ1 (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4).
  • 100 microliters of the DOP-PCR mixture was precipitated by addition of 10 microliters of 3M sodium acetate (pH 5.2) and 110 microliters of isopropanol and was stored at −20° C. for at least 1 hour.
  • the samples were spun down in a microcentrifuge for 30 minutes and the supernatant was removed.
  • the pellets were rinsed with 70% ethanol and spun again for 30 minutes. The supernatant was removed and the pellets were air-dried overnight at room temperature.
  • the pellets were then resuspended in 12 microliters of distilled water and stored at −20° C. until denatured by the addition of 3 microliters of 2N NaOH/50 mM EDTA and maintained at 37° C. for 20 minutes and then at room temperature for 15 minutes.
  • the samples were then spotted onto nylon-coated glass slides using a Genetic Microsystems GMS417 microarrayer. Upon completion of the spotting, the slides were placed in an 80° C. vacuum oven for 2 hours, and then stored at room temperature.
  • a set of 2 allele-specific SNP-ASOs, consisting of two 17mers centered around a polymorphic nucleotide residue, was synthesized.
  • Hyb Buffer 3M TMAC/0.5% SDS/1 mM EDTA/10 mM NaPO4/5× Denhardt's solution/40 µg/ml yeast RNA
  • Hybridizations were carried out overnight at 52° C.
  • the slides were washed twice for 30 minutes at room temperature in TMAC Wash Buffer (3M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4 pH 6.8) followed by 20 minutes at 54° C.
  • the slides were exposed to Kodak BioMax MR X-ray film. The results are shown in FIG. 8.
  • the genotypes were determined by the hybridization patterns shown in FIG. 8 wherein loci are indicated.

Abstract

The invention encompasses methods and products related to genotyping. The method of genotyping of the invention is based on the use of single nucleotide polymorphisms (SNPs) to perform high throughput genome scans. The high throughput method can be performed by hybridizing SNP allele-specific oligonucleotides and a reduced complexity genome (RCG). The invention also relates to methods of preparing the SNP specific oligonucleotides and RCGs, methods of fingerprinting, determining allele frequency for a SNP, characterizing tumors, generating a genomic classification code for a genome, identifying previously unknown SNPs, and related compositions and kits.

Description

    RELATED APPLICATIONS
  • [0001] This application claims priority to U.S. Provisional Application No. 60/101,757, filed Sep. 25, 1998, the entire contents of which are hereby incorporated by reference.
  • GOVERNMENT SUPPORT
  • [0002] The present invention was supported in part by a grant from the United States National Institutes of Health under contract/grant number 5-R01-HG00299-18; the National Cancer Institute of Canada under contract/grant # 009645;007477; National Research Foundation DHHS, NIH, NCI, 5 F32 CA73118-03 and NIH Predoctoring Grant T32 GM07287. The U.S. Government may retain certain rights in the invention.
  • FIELD OF THE INVENTION
  • [0003] The present invention relates to methods and products associated with genotyping. In particular, the invention relates to methods of detecting single nucleotide polymorphisms and reduced complexity genomes for use in genotyping methods as well as to various methods of genotyping, fingerprinting, and genomic analysis. The invention also relates to products and kits, such as panels of single nucleotide polymorphism allele specific oligonucleotides, reduced complexity genomes, and databases for use in the methods of the invention.
  • BACKGROUND OF THE INVENTION
  • [0004] Genomic DNA varies significantly from individual to individual, except in identical siblings. Many human diseases arise from genomic variations. The genetic diversity amongst humans and other life forms explains the heritable variations observed in disease susceptibility. Diseases arising from such genetic variations include Huntington's disease, cystic fibrosis, Duchenne muscular dystrophy, and certain forms of breast cancer. Each of these diseases is associated with a single gene mutation. Diseases such as multiple sclerosis, diabetes, Parkinson's, Alzheimer's disease, and hypertension are much more complex. These diseases may be due to polygenic (multiple gene influences) or multifactorial (multiple gene and environmental influences) causes. Many of the variations in the genome do not result in a disease trait. However, as described above, a single mutation can result in a disease trait.
  • [0005] The ability to scan the human genome to identify the location of genes which underlie or are associated with the pathology of such diseases is an enormously powerful tool in medicine and human biology. Several types of sequence variations, including insertions and deletions, differences in the number of repeated sequences, and single base pair differences result in genomic diversity.
  • [0006] Single base pair differences, referred to as single nucleotide polymorphisms (SNPs), are the most frequent type of variation in the human genome (occurring at approximately 1 in 10² bases). A SNP is a genomic position at which two or more alternative nucleotide alleles occur at a relatively high frequency (greater than 1%) in a population. SNPs are well-suited for studying sequence variation because they are relatively stable (i.e., exhibit low mutation rates) and because single nucleotide variations can be responsible for inherited traits.
  • [0007] Polymorphisms identified using microsatellite-based analysis, for example, have been used for a variety of purposes. Use of genetic linkage strategies to identify the locations of single Mendelian factors has been successful in many cases (Benomar et al. (1995), Nat. Genet., 10:84-8; Blanton et al. (1991), Genomics, 11:857-69). Identification of chromosomal locations of tumor suppressor genes has generally been accomplished by studying loss of heterozygosity in human tumors (Cavenee et al. (1983), Nature, 305:779-784; Collins et al. (1996), Proc. Natl. Acad. Sci. USA, 93:14771-14775; Koufos et al. (1984), Nature, 309:170-172; and Legius et al. (1993), Nat. Genet., 3:122-126). Additionally, use of genetic markers to infer the chromosomal locations of genes contributing to complex traits, such as type I diabetes (Davis et al. (1994), Nature, 371:130-136; Todd et al. (1995), Proc. Natl. Acad. Sci. USA, 92:8560-8565), has become a focus of research in human genetics.
  • [0008] Although substantial progress has been made in identifying the genetic basis of many human diseases, current methodologies used to develop this information are limited by prohibitive costs and the extensive amount of work required to obtain genotype information from large sample populations. These limitations make identification of complex gene mutations contributing to disorders such as diabetes extremely difficult. Techniques for scanning the human genome to identify the locations of genes involved in disease processes began in the early 1980s with the use of restriction fragment length polymorphism (RFLP) analysis (Botstein et al. (1980), Am. J. Hum. Genet., 32:314-31; Nakamura et al. (1987), Science, 235:1616-22). RFLP analysis involves Southern blotting and other techniques. Southern blotting is both expensive and time-consuming when performed on large numbers of samples, such as those required to identify a complex genotype associated with a particular phenotype. Some of these problems were avoided with the development of polymerase chain reaction (PCR) based microsatellite marker analysis. Microsatellite markers are simple sequence length polymorphisms (SSLPs) consisting of di-, tri-, and tetra-nucleotide repeats.
  • [0009] Other types of genomic analysis are based on use of markers which hybridize with hypervariable regions of DNA having multiallelic variation and high heterozygosity. The variable regions which are useful for fingerprinting genomic DNA are tandem repeats of a short sequence referred to as a minisatellite. Polymorphism is due to allelic differences in the number of repeats, which can arise as a result of mitotic or meiotic unequal exchanges or by DNA slippage during replication.
  • [0010] The most commonly used method for genotyping involves Weber markers, which are abundant interspersed repetitive DNA sequences, generally of the form (dC-dA)n·(dG-dT)n. Weber markers exhibit length polymorphisms and are therefore useful for identifying individuals in paternity and forensic testing, as well as for mapping genes involved in genetic diseases. In the Weber method of genotyping, generally 400 Weber or microsatellite markers are used to scan each genome using PCR. Using these methods, if 5,000 individual genomes are scanned, 2 million PCR reactions are performed (5,000 genomes×400 markers). The number of PCR reactions may be reduced by multiplexing, in which, for instance, four different sets of primers are reacted simultaneously in a single PCR, thus reducing the total number of PCRs for the example provided to 500,000. The 500,000 PCR mixtures are separated by polyacrylamide gel electrophoresis (PAGE). If the samples are run on a 96-lane gel, approximately 5,200 gels must be run to analyze all 500,000 PCR reaction mixtures. PCR products can be identified by their position on the gels, and the differences in length of the products can be determined by analyzing the gels. One problem with this type of analysis is that “stuttering” tends to occur, causing a smeared result and making the data difficult to interpret and score.
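The workload figures in this example follow from simple arithmetic, reproduced in the sketch below. The variable names are hypothetical; the input numbers are those given in the text, and the exact gel count is rounded up because a partially filled gel must still be run.

```python
import math

genomes = 5_000         # individual genomes scanned
markers = 400           # Weber/microsatellite markers per genome
multiplex = 4           # primer sets reacted together in one PCR
lanes_per_gel = 96

total_pcr = genomes * markers                # single-plex reactions
multiplexed_pcr = total_pcr // multiplex     # reactions after 4-fold multiplexing
gels = math.ceil(multiplexed_pcr / lanes_per_gel)

print(total_pcr, multiplexed_pcr, gels)
```

The exact result is 5,209 gels, which the text rounds to approximately 5,200.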
  • [0011] More recent advances in genotyping are based on automated technologies utilizing DNA chips, such as the Affymetrix HuSNP Chip™ analysis system. The HuSNP Chip™ is a disposable array of DNA molecules on a chip (400,000 per half inch square slide). The single stranded DNA molecules bound to the slide are present in an ordered array of molecules having known sequences, some of which are complementary to one allele of a SNP-containing portion of a genome. If the same 5,000 individual genome study described above is performed using the Affymetrix HuSNP Chip™ analysis system, approximately 5,000 gene chips having 1,000 or more SNPs per chip would be required. Prior to the chip scan, the genomic DNA samples would be amplified by PCR in a similar manner to conventional microsatellite genotyping. The gene chip method is also expensive and time-intensive.
  • SUMMARY OF THE INVENTION
  • [0012] The present invention relates to methods and products for identifying points of genetic diversity in genomes of a broad spectrum of species. In particular, the invention relates to a high throughput method of genotyping of SNPs in a genome (e.g. a human genome) using reduced complexity genomes (RCGs) and, in some exemplary embodiments, using SNP allele specific oligonucleotides (SNP-ASO) and specific hybridization reactions performed, for example, on a surface. The method of genotyping, in some aspects of the invention, is accomplished by scanning a RCG for the presence or absence of a SNP allele. Using this method, tens of thousands of genomes from one species may be simultaneously assayed for the presence or absence of each allele of a SNP. The methods can be automated, and the results can be recorded using a microarray scanner or other detection/recordation devices.
  • [0013] The invention encompasses several improvements over prior art methods. For instance, a genome-wide scan of thousands of individuals can be carried out at a fraction of the cost and time required by many prior art genotyping methods.
  • [0014] The invention, in one aspect, is a method for detecting the presence of a SNP allele in a genomic sample. The method, in one aspect, includes preparing a RCG from a genomic sample and analyzing the RCG for the presence of the SNP allele. In some aspects, the analysis is performed using a hybridization reaction involving a SNP allele specific oligonucleotide (SNP-ASO) which is complementary to a given allele of the SNP and the RCG. If the allele of the SNP is present in the genomic sample, then the SNP-ASO hybridizes with the RCG.
  • [0015] In some aspects, the method is a method for determining a genotype of a genome, whereby the genotype is identified by the presence or absence of alleles of the SNP in the RCG. In other aspects, the method is a method for characterizing a tumor, wherein the RCG is isolated from a genome obtained from a tumor of a subject and wherein the tumor is characterized by the presence or absence of an allele of the SNP in the RCG.
  • [0016] In other aspects, the method is a method for determining allelic frequency for a SNP, and further comprises determining the number of arbitrarily selected genomes from a population which include each allele of the SNP in order to determine the allelic frequency of the SNP in the population.
  • [0017] In some embodiments, the hybridization reaction is performed on a surface and the RCG or the SNP-ASO is immobilized on the surface. In yet other embodiments, the SNP-ASO is hybridized with a plurality of RCGs in individual reactions.
  • [0018] In other aspects, the method includes performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with a plurality of RCGs from the plurality of genomes, and determining the genotype based on whether the SNP-ASO hybridizes with at least some of the RCGs.
  • The RCG may be a PCR-derived RCG or a native RCG. In some embodiments, the RCG is prepared by performing degenerate oligonucleotide priming-PCR (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotides, wherein x is an integer from 0 to 9, and wherein N is any nucleotide. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). Preferably, the method of genotyping is performed to determine genotypes at more than one locus. In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. [0019]
  • The methods can be performed on a support. Preferably, the support is a solid support such as a glass slide, a membrane such as a nitrocellulose membrane, etc. [0020]
  • In yet other embodiments, the RCG is prepared by interspersed repeat sequence-PCR (IRS-PCR), arbitrarily primed-PCR (AP-PCR), adapter-PCR, or multiple primed DOP-PCR. [0021]
  • In a preferred embodiment, the methods are useful for determining a genotype associated with or linked to a specific phenotype, and the distinct isolated genomes or RCGs are associated with a common phenotype. [0022]
  • The SNP-ASOs used according to the methods of the invention are polynucleotides including one allele of two possible nucleotides at the polymorphic site. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotides. In a preferred embodiment, the SNP-ASO is composed of from about 10 to 25 nucleotides. [0023]
  • According to one embodiment, the SNP-ASO is labeled. The methods can, optionally, also include addition of an excess of non-labeled SNP-ASO in which the polymorphic nucleotide residue corresponds to a different allele of the SNP and which is added during the hybridization step. Additionally, a parallel reaction may be performed wherein the labeling of the two SNP-ASOs is reversed. The label on the SNP-ASO in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the SNP-ASO is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products. [0024]
  • According to one embodiment, the RCG is labeled. The label on the RCG in one embodiment is a radioactive isotope. In this embodiment, the labeled hybridized products on the surface may be exposed to an X-ray film to produce a signal on the film which corresponds to the radioactively labeled hybridization products. In another embodiment, the RCG is labeled with a fluorescent molecule. In this embodiment, the labeled hybridized products on the surface may be exposed to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products. [0025]
  • In one embodiment, a plurality of different SNP-ASOs are attached to the surface. In another embodiment, the plurality includes at least 500 different SNP-ASOs. In yet another embodiment, the plurality includes at least 1000. [0026]
  • In another embodiment, a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule. In various embodiments, the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight. [0027]
  • In yet another embodiment, the plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule. All of the RCGs having a spectrally distinct fluorescent molecule can be hybridized with a single support. In various embodiments the number of spectrally distinct fluorescent molecules is two, three, four, five, six, seven, or eight. [0028]
  • According to other aspects, the invention encompasses methods for characterizing a tumor by assessing the loss of heterozygosity, determining allelic frequency for a SNP, generating a genomic pattern for an individual genome, and generating a genomic classification code for a genome. [0029]
  • In one aspect, the method for characterizing a tumor includes isolating genomic DNA from tumor samples obtained from a plurality of subjects, preparing a plurality of RCGs from the genomic DNA, performing a hybridization reaction involving a SNP-ASO and the plurality of RCGs (e.g. immobilized on a surface), and identifying the presence of a SNP allele in the genomic DNA based on whether the SNP-ASO hybridizes with at least some of the RCGs in order to characterize the tumor. One or more of the RCGs or one or more of the SNP-ASOs can be immobilized on a surface. [0030]
  • In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method, in one aspect, includes preparing a plurality of RCGs, analyzing the RCGs for the presence of one or more SNP alleles, and identifying a genomic pattern of SNPs for each RCG by determining the presence or absence therein of SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving a panel of SNP-ASOs (e.g. ones which are each complementary to one allele of a SNP) and the plurality of RCGs. The genomic pattern can be identified by determining the presence or absence of a SNP allele for each RCG by detecting whether the SNP-ASOs hybridize with the RCGs. In one embodiment, a plurality of SNP-ASOs are hybridized with the support, and each SNP-ASO of the panel is hybridized with a different support than the other SNP-ASO. [0031]
  • In some embodiments, the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG. In other embodiments, the genomic classification code is also generated from the allelic frequency of the SNPs. In yet other embodiments, the genomic pattern is a visual pattern. The genomic pattern may be in physical or electronic form. [0032]
  • In another aspect, the invention is a method for generating a genomic pattern for an individual genome. The method includes identifying a genomic pattern of SNP alleles for each RCG by determining the presence or absence therein of selected SNP alleles. [0033]
  • A method for generating a genomic classification code for a genome is provided in another aspect of the invention. The method includes preparing a RCG, analyzing the RCG for the presence of one or more SNP alleles (e.g. ones of known allelic frequency), identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and generating a genomic classification code for the RCG based on the presence or absence (and, optionally, the allelic frequency) of the SNP alleles. In some embodiments, the analysis involves performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. corresponding to SNP alleles of known allelic frequency), each of which is complementary to one allele of a SNP. The genomic pattern is identified based on whether each SNP-ASO hybridizes with the RCG. [0034]
  • The method for determining allelic frequency for a SNP, in another aspect, includes preparing a plurality of RCGs from distinct isolated genomes, performing a hybridization reaction involving one RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization with each of the plurality of RCGs, and determining the number of RCGs which include each allele of the SNP in order to determine the allelic frequency of the SNP. In other embodiments the RCGs are immobilized on the surface. [0035]
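  • As an illustration of the counting step described above, the following minimal sketch tallies allele frequencies from per-genome hybridization calls. The function name `allele_frequencies`, the encoding of a call as the set of alleles detected in one RCG, the assumption of diploid genomes, and the sample data are all hypothetical and are introduced only for illustration.

```python
from collections import Counter

def allele_frequencies(calls):
    """Estimate allele frequencies from per-genome hybridization calls.

    `calls` holds one entry per RCG; each entry is the set of SNP alleles
    whose SNP-ASO hybridized with that RCG. Assumes diploid genomes: one
    detected allele is scored as a homozygote (two copies), two detected
    alleles as a heterozygote (one copy each).
    """
    copies = Counter()
    for detected in calls:
        if len(detected) == 1:
            (allele,) = detected
            copies[allele] += 2
        elif len(detected) == 2:
            for allele in detected:
                copies[allele] += 1
        else:
            raise ValueError(f"unexpected call: {detected!r}")
    total = sum(copies.values())
    return {allele: n / total for allele, n in copies.items()}

# Hypothetical calls for four genomes at a single G/A SNP.
example_calls = [{"G"}, {"G", "A"}, {"A"}, {"G", "A"}]
print(allele_frequencies(example_calls))
```

In this made-up example each allele accounts for four of the eight chromosomes scored, so both frequencies come out at 0.5.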
  • In another aspect, the method for generating a genomic pattern for an individual genome includes preparing a plurality of RCGs, performing a hybridization reaction involving a RCG and a surface having a SNP-ASO immobilized thereon, repeating the hybridization step with each of the plurality of RCGs, and identifying a genomic pattern of SNPs for each RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with each RCG. [0036]
  • The method for generating a genomic classification code for a genome, in another aspect, includes preparing a RCG, performing a hybridization reaction involving the RCG and a panel of SNP-ASOs (e.g. immobilized on a surface), identifying a genomic pattern of SNPs for the RCG by determining the presence therein of SNPs based on whether each SNP-ASO hybridizes with the RCG, and generating a genomic classification code for the RCG based on the identities of the SNPs which hybridize with the RCG, the identities of the SNPs which do not hybridize with the RCG, and, optionally, also based on the allelic frequency of the SNPs. In one embodiment, each SNP-ASO of the panel is immobilized on a separate surface. In another embodiment, more than one SNP-ASO of the panel is immobilized on the same surface, each SNP-ASO being immobilized on a distinct area of the surface. [0037]
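  • One simple way to picture a code derived from the hybridization pattern is a binary string with one position per SNP-ASO in the panel. The sketch below is only an assumed encoding; the function name `classification_code` and the SNP-ASO identifiers are hypothetical, and the text does not prescribe any particular format for the code.

```python
def classification_code(panel, hybridized):
    """Return a binary string with one character per SNP-ASO in `panel`.

    A position is '1' if that SNP-ASO hybridized with the RCG and '0'
    if it did not. `panel` fixes the order of the positions.
    """
    return "".join("1" if aso in hybridized else "0" for aso in panel)

# Hypothetical panel of four SNP-ASOs (two SNPs, two alleles each).
panel = ["snp1_G", "snp1_A", "snp2_C", "snp2_T"]
code = classification_code(panel, {"snp1_G", "snp2_C", "snp2_T"})
print(code)  # 1011
```

Under this assumed encoding, the example genome reads as homozygous for the G allele of the first SNP and heterozygous at the second.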
  • In an embodiment, the genomic classification code is encoded as one or more computer-readable signals on a computer-readable medium. In other aspects of the invention, compositions are provided. According to one aspect, the composition is a plurality of RCGs immobilized on a surface, wherein the RCGs are prepared by a method including the step of performing DOP-PCR using a DOP primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). [0038]
  • According to another aspect, the composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by a method including preparing a set of primers from a RCG, performing PCR using the set of primers on a plurality of isolated genomes to yield DNA products, isolating and, optionally, sequencing the DNA products, and identifying a SNP based on the sequences of the PCR products. In one embodiment, the plurality of isolated genomes includes at least four isolated genomes. [0039]
  • According to another aspect of the invention, a kit is provided. The kit includes a container housing a set of PCR primers for reducing the complexity of a genome, and a container housing a set of SNP-ASOs. The SNPs which correspond to the SNP-ASOs of the kit are preferably present within a RCG made using the PCR primers of the kit with a frequency of at least 50%. [0040]
  • In one embodiment, the set of PCR primers are primers for DOP-PCR. Preferably, the degenerate oligonucleotide primer has a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g., 6, 7, 8 or 9). [0041]
  • In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR. [0042]
  • The SNP-ASOs of the invention are polynucleotides including one of the alternative nucleotides at a polymorphic nucleotide residue of a SNP. In one embodiment, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In a preferred embodiment the SNP-ASO is composed of from about 10 to 25 nucleotide residues. In another embodiment, the SNP-ASOs are labeled with a fluorescent molecule. [0043]
  • According to yet another aspect of the invention, a composition is provided. The composition includes a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence is identical in all of the DNA fragments of each RCG, wherein the TARGET nucleotide sequence includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). [0044]
  • In one aspect, the invention is a method for identifying a SNP. The method includes preparing a set of primers from a RCG, wherein the RCG is composed of a first set of PCR products, PCR-amplifying a plurality of isolated genomes using the set of primers to yield a second set of PCR products, isolating, and optionally, sequencing the PCR products, and identifying a SNP based on the sequences of one or both sets of PCR products. In one embodiment, the plurality of isolated genomes is a pool of genomes. Preferably, the isolated genomes are RCGs. RCGs can be prepared in a variety of ways, but it is preferred, in some aspects, that the RCG is prepared by DOP-PCR. [0045]
  • In one embodiment, the method of preparing the set of primers is performed by at least: preparing a RCG, separating the first set of PCR products into individual PCR products, determining the nucleotide sequence of each end of at least one of the PCR products, and generating primers for use in the subsequent PCR step based on the sequence of the ends of the PCR product(s). [0046]
  • The set of PCR products may be separated by any means known in the art for separating polynucleotides. In a preferred embodiment, the set of PCR products is separated by gel electrophoresis. Preferably, one or more libraries are prepared from segments of the gel containing several PCR products and clones are isolated from the library, each clone including a PCR product from the library. In other embodiments, the set of PCR products is separated by high pressure liquid chromatography or column chromatography. [0047]
  • The RCG used to generate primers or PCR products for identifying SNPs can be prepared by PCR methods. Preferably, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). In other embodiments, the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. [0048]
  • In yet other embodiments, the RCG is prepared by IRS-PCR, AP-PCR, or adapter-PCR. [0049]
  • In a preferred embodiment of the invention, the set of primers is composed of a plurality of polynucleotides, each polynucleotide including a tag-(N)x-TARGET nucleotide sequence, wherein TARGET is the same sequence in each polynucleotide in the set of primers. The sequence of (N)x is different in each primer within a set of primers. In some embodiments, the set of primers includes at least 4³, 4⁴, 4⁵, 4⁶, 4⁷, 4⁸, or 4⁹ different primers in the set. [0050]
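  • Because N may be any of the four nucleotides, a primer with x degenerate positions expands to 4 to the power x distinct sequences that share the same tag and TARGET. The sketch below enumerates such a set. The function name `expand_degenerate_primer` is hypothetical, and splitting SEQ ID NO: 4 (CTCGAGNNNNNNAAGCGATG) into a 6-residue tag, six N positions, and an 8-residue TARGET is an assumption made for illustration.

```python
from itertools import product

def expand_degenerate_primer(tag, x, target):
    """Return every concrete primer of the form tag-(N)x-TARGET.

    Each of the x degenerate positions takes each of A, C, G, and T,
    so the returned list has 4**x entries.
    """
    return [tag + "".join(ns) + target for ns in product("ACGT", repeat=x)]

# Assumed reading of SEQ ID NO: 4 as tag = CTCGAG, x = 6, TARGET = AAGCGATG.
primers = expand_degenerate_primer("CTCGAG", 6, "AAGCGATG")
print(len(primers))  # 4096
print(primers[0])
```

Every primer in the list differs only in its (N)x segment, which is the property the paragraph above describes.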
  • In another aspect, the invention is a method for generating a RCG using DOP-PCR. The method includes the step of performing DOP-PCR using a degenerate oligonucleotide primer having an (N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue. In various embodiments, the TARGET nucleotide sequence includes 8, 9, 10, 11, or 12 nucleotide residues. In other embodiments, x is an integer from 3 to 9 (e.g. 6, 7, 8, or 9). [0051]
  • According to one embodiment, the tag includes 6 nucleotide residues. Preferably the RCG is used in a genotyping procedure. In other embodiments, the RCG is analyzed to detect a polymorphism. The analysis step may be performed using mass spectroscopy. [0052]
  • In another aspect, the invention is a method for assessing whether a subject is at risk for developing a disease. The method includes the steps of using the methods of the invention to identify a plurality of SNPs that occur in at least, for example, 10% of genomes obtained from individuals afflicted with the disease and determining whether one or more of those SNPs occurs in the subject. In the method, the affected individuals are compared with the unaffected individuals. Important information can be generated from the observation that there is a difference between affected and unaffected individuals alone. [0053]
  • In other aspects, the invention is a method for identifying a set of one or more SNPs associated with a disease or disease risk. The method includes the steps of preparing individual RCGs obtained from subjects afflicted with a disease, using the same set of primers to prepare each RCG, and comparing the SNP allele frequency identified in those RCGs with the frequency of the same SNP alleles in normal (i.e., non-afflicted) subjects to identify SNPs associated with the disease. In other aspects, the invention is a method for identifying a set of SNPs randomly distributed throughout the genome. The set of SNPs is used as a panel of genetic markers to perform a genome-wide scan for linkage analysis. [0054]
  • In an embodiment, a computer-readable medium having computer-readable signals stored thereon is provided. The signals define a data structure that includes one or more data components. Each data component includes a first data element defining a genomic classification code that identifies a corresponding genome. Each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome. [0055]
  • In an optional aspect of this embodiment, the genomic classification code is a unique identifier of the corresponding genome. [0056]
  • In an optional aspect of this embodiment, the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, where the pattern indicates the presence or absence of each single nucleotide polymorphism. [0057]
  • In another optional aspect of this embodiment, each data component also includes one or more data elements, each data element defining an attribute of the corresponding genome. [0058]
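  • The data structure described in the preceding paragraphs can be pictured as a record that pairs a genomic classification code with optional attributes of the genome. The following sketch is one assumed layout; the class name `GenomeRecord`, its field names, and the sample values are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass, field

@dataclass
class GenomeRecord:
    """One data component: a classification code plus optional attributes."""
    classification_code: str                        # first data element
    attributes: dict = field(default_factory=dict)  # further data elements

# Hypothetical records; codes follow an assumed one-bit-per-SNP-allele scheme.
records = [
    GenomeRecord("1011", {"species": "human", "sample": "A"}),
    GenomeRecord("0110", {"species": "human", "sample": "B"}),
]

# When the code is a unique identifier, it can serve directly as a lookup key.
index = {record.classification_code: record for record in records}
match = index.get("1011")
print(match.attributes["sample"] if match else "no match")  # A
```

Keying the index on the code reflects the optional aspect in which the code uniquely identifies the genome; if codes were not unique, each key would need to map to a list of records instead.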
  • Each of the embodiments of the invention can encompass various recitations made herein. It is, therefore, anticipated that each of the recitations of the invention involving any one element or combinations of elements can, optionally, be included in each aspect of the invention. [0059]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a schematic flow chart depicting a method according to the invention for identifying SNPs. [0060]
  • FIG. 2 shows data depicting the process of identifying a SNP: (a) depicts a gel in which inter-Alu PCR genomic DNA products prepared from the 8C primer (which has the nucleotide sequence SEQ ID NO:3) were separated; (b) depicts a gel in which inserts from the library clones were separated; and (c) depicts a filter having two positive or matched clones. [0061]
  • FIG. 3 depicts the results of a genotyping and mapping experiment: (a) depicts hybridization results obtained using G allele ASO; (b) depicts hybridization results obtained using A allele ASO; (c) is a pedigree of CEPH family #884 with genotypes indicated from (a) and (b); and (d) is a map of chromosome 31q21-23. [0062]
  • FIG. 4 is a schematic flow chart depicting a method according to the invention for detecting SNPs. [0063]
  • FIG. 5 is a block diagram of a computer system for storing and manipulating genomic information. [0064]
  • FIG. 6A is an example of a record for storing information about a genome and/or genes or SNPs within the genome. [0065]
  • FIG. 6B is an example of a record for storing genomic information. [0066]
  • FIG. 6C is an example of a record for storing information about genes or SNPs within a genome. [0067]
  • FIG. 7 is a flow chart of a method for determining whether genomic information of a sample genome such as SNPs match that of another genome. [0068]
  • FIG. 8 depicts results obtained from a hybridization reaction involving RCGs prepared by DOP-PCR and SNP-ASOs immobilized on a surface in a microarray format. [0069]
  • BRIEF DESCRIPTION OF THE SEQUENCES
  • [0070]
    SEQ. ID. NO. 1 is CAGNNNCTG
    SEQ. ID. NO. 2 is TTTTTTTTTTCAG
    SEQ. ID. NO. 3 is CTT GCA GTG AGC CGA GATC
    SEQ. ID. NO. 4 is CTCGAGNNNNNNAAGCGATG
    SEQ. ID. NO. 5-697 are nucleotide sequences containing SNPs.
  • DETAILED DESCRIPTION OF THE INVENTION
  • The invention relates in some aspects to genotyping methods involving detection of one or more single nucleotide polymorphisms (SNPs) in a reduced complexity genome (RCG) prepared from the genome of a subject. The invention includes methods of identifying SNPs associated with a disease or with pre-disposition to a disease. The invention further includes methods of screening RCGs prepared from one or more subjects in a population. Such screening can be used, for example, to determine whether the subject is afflicted with, or is likely to become afflicted with, a disorder, to determine allelic frequencies in the population, or to determine degrees of interrelation among subjects in the population. Additional aspects and details of the compositions, kits, and methods of the invention are described in the following sections. [0071]
  • The invention involves several discoveries which have led to new advances in the field of genotyping. The invention is based on the development of high throughput methods for analyzing genomic diversity. The methods combine use of SNPs, methods for reducing the complexity of genomes, and high throughput screening methods. As discussed in the background of the invention, many prior art methods for genotyping are based on use of hypervariable markers such as Weber markers, which predominantly detect differences in numbers of repeats. Use of a high throughput SNP analysis method is advantageous in view of the Weber marker system for several reasons. For instance, the results of a Weber analysis system are displayed in the form of a gel, which is difficult to read and must be scored by a professional. The high throughput SNP analysis method of the invention provides a binary result which indicates the presence or absence of the SNP in the sample genome. Additionally, the method of the invention requires significantly less work and is considerably less expensive to perform. As described in the background of the invention, the Weber system requires the performance of 500,000 PCR reactions and use of 5,200 gels to analyze 5,000 genomes. The same study performed using the methods of the invention could be performed without using gels. Additionally, SNPs are not species-specific and therefore the methods of the invention can be performed on diverse species and are not limited to humans. It is more tedious to perform inter-species analysis using Weber markers than using the methods of the invention. [0072]
  • Some prior art methods do use SNPs for genotyping but the high throughput method of the invention has advantages over these methods as well. Affymetrix utilizes a HuSNP Chip™ system having an ordered array of SNPs immobilized on a surface for analyzing nucleic acids. This system is, however, prohibitively expensive for performing large studies such as the 5,000 genome study described above. [0073]
  • The invention is useful for identifying polymorphisms within a genome. Another use for the invention involves identification of polymorphisms associated with a plurality of distinct genomes. The distinct genomes may be isolated from populations which are related by some phenotypic characteristic, familial origin, physical proximity, race, class, etc. In other cases, the genomes are selected at random from populations such that they have no relation to one another other than being selected from the same population. In one preferred embodiment, the method is performed to determine the genotype (e.g. SNP content) of subjects having a specific phenotypic characteristic, such as a genetic disease or other trait. Other uses for the methods of the invention involve identification or characterization of a subject, such as in paternity and maternity testing, immigration and inheritance disputes, breeding tests in animals, zygosity testing in twins, tests for inbreeding in humans and animals, evaluation of transplant suitability, such as with bone marrow transplants, identification of human and animal remains, quality control of cultured cells, and forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials. The methods of the invention may also be used to characterize the genetic makeup of a tumor by testing for loss of heterozygosity or to determine the allelic frequency of a particular SNP. Additionally, the methods may be used to generate a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome and to determine the allelic frequency of the SNPs. Each of these uses is discussed in more detail herein. [0074]
  • The genotyping methods of the invention are based on use of RCGs that can be reproducibly produced. These RCGs are used to identify SNPs, and can be screened individually for the presence or absence of the SNP alleles. [0075]
  • The invention, in some aspects, is based on the finding that the complexity of the genome can be reduced using various PCR and other genome complexity reduction methods and that RCGs made using such methods can be scanned for the presence of SNPs. One problem with using SNP-ASOs to screen a whole genome (i.e. a genome, the complexity of which has not been reduced) is that the signal to noise (S/N) ratio is low due to the high complexity of the genome and the low relative frequency of occurrence of a particular SNP-specific sequence within the whole genome. When an entire genome of a complex organism is used as the target for allele-specific oligonucleotide hybridization, the target sequence to be detected (e.g. for a SNP-ASO of about 17 nucleotide residues) represents only approximately 10⁻⁸ to 10⁻⁹ of the DNA sample, i.e. on the order of 1 part in 10⁸. It has been discovered, according to the invention, that the complexity of the genome can be reduced in a reproducible manner and that the resulting RCG is useful for identifying the presence of SNPs in the whole genome and for genotyping methods. Reduction in complexity allows genotyping of multiple SNPs following performance of a single PCR reaction, reducing the number of experimental manipulations that must be performed. The RCG is a reliable representation of a specific subfraction of the whole genome, and can be analyzed as though it were a genome of considerably lower complexity. [0076]
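  • To put the dilution described above in concrete terms, the short calculation below compares the share of a DNA sample taken up by a 17-residue target in a whole genome and in a reduced complexity genome. The haploid genome size of 3×10⁹ base pairs and the 1% RCG fraction are assumed round figures chosen for illustration, and the calculation further assumes that the SNP locus is retained in the RCG.

```python
def target_fraction(target_len, dna_len):
    """Fraction of a DNA sample (in base pairs) occupied by one target site."""
    return target_len / dna_len

GENOME_BP = 3e9       # assumed approximate haploid genome size of a complex organism
RCG_FRACTION = 0.01   # hypothetical RCG representing 1% of the genome
ASO_LEN = 17          # target length cited in the text

whole = target_fraction(ASO_LEN, GENOME_BP)
reduced = target_fraction(ASO_LEN, GENOME_BP * RCG_FRACTION)

print(f"whole genome: {whole:.2e}")
print(f"RCG:          {reduced:.2e}")
print(f"enrichment:   {reduced / whole:.0f}-fold")
```

Under these assumptions the target rises from a few parts per billion of the sample to a few parts per ten million, which is the sense in which the RCG behaves like a genome of much lower complexity.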
  • RCGs are prepared from isolated genomes. An “isolated genome” as used herein is genomic DNA that is isolated from a subject and may include the entire genomic DNA. For instance, an isolated genome may be a RCG, or it may be an entire genomic DNA sample. Genomic DNA is a population of DNA that comprises the entire genetic component of a species excluding, where applicable, mitochondrial and chloroplast DNA. Of course, the methods of the invention can be used to analyze mitochondrial, chloroplast, etc., DNA as well. Depending on the particular species of the subject, the genomic DNA can vary in complexity. For instance, species which are relatively low on the evolutionary scale, such as bacteria, can have genomic DNA which is significantly less complex than species higher on the evolutionary scale. Bacteria such as E. coli have approximately 2.4×10⁹ grams per mole of haploid genome, and bacterial genomes having a size of less than about 5 million base pairs (5 megabases) are known. Genomes of intermediate complexity, such as those of plants, for instance, rice, have a genome size of approximately 700-1,000 megabases. Genomes of highest complexity, such as maize or humans, have a genome size of approximately 10-10. Humans have approximately 7.4×10¹² grams per mole of haploid genome. [0077]
  • A “subject” as used herein refers to any type of DNA-containing organism, and includes, for example, bacteria, viruses, fungi, animals, including vertebrates and invertebrates, and plants. [0078]
  • A “RCG” as used herein is a reproducible fraction of an isolated genome which is composed of a plurality of DNA fragments. The RCG can be composed of random or nonrandom segments or arbitrary or non-arbitrary segments. The term “reproducible fraction” refers to a portion of the genome which encompasses less than the entire native genome. If a reproducible fraction is produced twice or more using the same experimental conditions the fractions produced in each repetition include at least 50% of the same sequences. In some embodiments the fractions include at least 70%, 80%, 90%, 95%, 97%, or 99% of the same sequences, depending on how the fractions are produced. For instance, if a RCG is produced by PCR another RCG can be generated under identical experimental conditions having at a minimum greater than 90% of the sequences in the first RCG. Other methods for preparing a RCG such as size selection are still considered to be reproducible but often produce less than 99% of the same sequences. [0079]
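  • The reproducibility criterion in the definition above can be read as an overlap measure between two preparations made under the same conditions. The sketch below computes that overlap for two sets of fragment identifiers. The function names, the choice to measure overlap relative to the first preparation, and the fragment labels are assumptions made for illustration.

```python
def shared_fraction(first, second):
    """Fraction of the sequences in `first` that also appear in `second`."""
    first, second = set(first), set(second)
    if not first:
        raise ValueError("first preparation contains no sequences")
    return len(first & second) / len(first)

def is_reproducible(first, second, threshold=0.5):
    """Apply the 'at least 50% of the same sequences' test (threshold adjustable)."""
    return shared_fraction(first, second) >= threshold

# Hypothetical fragment identifiers from two preparations.
run_1 = {"frag_a", "frag_b", "frag_c", "frag_d"}
run_2 = {"frag_a", "frag_b", "frag_c", "frag_e"}

print(shared_fraction(run_1, run_2))            # 0.75
print(is_reproducible(run_1, run_2))            # True
print(is_reproducible(run_1, run_2, 0.9))       # False
```

Raising the threshold to 0.9 mirrors the stricter figure the text gives for RCGs produced by PCR.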
  • A “plurality” of elements, as used throughout the application, refers to 2 or more of the element. A “DNA fragment” is a polynucleotide sequence obtained from a genome at any point along the genome and encompassing any sequence of nucleotides. The DNA fragments of the invention can be generated according to either of two types of mechanisms, and thus there are two types of RCGs: PCR-generated RCGs and native RCGs. [0080]
  • PCR-generated RCGs are randomly primed. That is, each of the polynucleotide fragments in the PCR-generated RCG has common sequences at or near the 5′ and 3′ ends of the fragment, but the remaining nucleotides within the fragments do not have any sequence relation to one another. (When a tag is used in the primer, all of the 5′ and 3′ ends are identical. When a tag is not used, the 5′ and 3′ ends have a series of N's followed by the TARGET sequence, reading in a 5′ to 3′ direction. The TARGET sequence is identical in each primer, with the exception of multiple-primed DOP-PCR.) Thus, each polynucleotide fragment in a RCG includes a common 5′ and 3′ sequence which is determined by the constant region of the primer used to generate the RCG. For instance, if the RCG is generated using DOP-PCR (described in more detail below), each polynucleotide fragment would have near the 5′ or 3′ end nucleotides that are determined by the “TARGET nucleotide sequence”. The TARGET nucleotide sequence is a sequence which is selected arbitrarily but which is constant within a set or subset (e.g. multiple primed DOP-PCR) of primers. Thus, each polynucleotide fragment can have the same nucleotide sequence near the 5′ and 3′ end arising from the same TARGET nucleotide sequence. In some cases more than one primer can be used to generate the RCG. When more than one primer is used, each member of the RCG would have a 5′ and 3′ end in common with at least one other member of the RCG and, more preferably, each member of the RCG would have a 5′ and 3′ end in common with at least 5% of the other members of the RCG. For example, if a RCG is prepared using DOP-PCR with 2 different primers having different TARGET nucleotide sequences, a population containing four sets of PCR products having common ends could be generated. 
One set of PCR products could be generated having the TARGET nucleotide sequence of the first primer at or near both the 5′ and 3′ ends and another set could be generated having the TARGET nucleotide sequence of the second primer at or near both the 5′ and 3′ ends. Another set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 5′ end and the TARGET nucleotide sequence of the first primer at or near the 3′ end. A fourth set of PCR products could be generated having the TARGET nucleotide sequence of the second primer at or near the 3′ end and the TARGET nucleotide sequence of the first primer at or near the 5′ end. The PCR generated genomes are composed of synthetic DNA fragments. [0081]
  • The DNA fragments of the native RCGs have arbitrary sequences. That is, the polynucleotide fragments in the native RCG do not necessarily have any sequence relation to one another. These sequences are selected based on other properties, such as size or secondary characteristics. These sequences are referred to as native RCGs because they are prepared from native nucleic acid preparations rather than being synthesized. Thus they are native, non-synthetic DNA fragments. The fragments of the native RCG may share some sequence relation to one another (e.g. if produced by restriction enzymes). In some embodiments they do not share any sequence relation to one another. [0082]
  • In some preferred embodiments, the RCG includes a plurality of DNA fragments ranging in size from approximately 200 to 2,000 nucleotide residues. In a preferred embodiment, a RCG includes from 95 to 0.05% of the intact native genome. The fraction of the isolated genome which is present in the RCG of the invention represents at most 90% of the isolated genome, and in preferred embodiments, contains less than 50%, 40%, 30%, 20%, 10%, 5%, or 1% of the genome. A RCG preferably includes between 0.05 and 1% of the intact native genome. In a preferred embodiment, the RCG encompasses 10% or less of an intact native genome of a complex organism. [0083]
  • Genomic DNA can be isolated from a tissue sample, a whole organism, or a sample of cells. Additionally, the isolated genomes of the invention are preferably substantially free of proteins that interfere with PCR or hybridization processes, and are also substantially free of proteins that damage DNA, such as nucleases. Preferably, the isolated genomes are also free of non-protein inhibitors of polymerase function (e.g. heavy metals) and non-protein inhibitors of hybridization when the PCR-generated RCGs are formed. Proteins may be removed from the isolated genomes by many methods known in the art. For instance, proteins may be removed using a protease, such as proteinase K or pronase, by using a strong detergent such as sodium dodecyl sulfate (SDS) or sodium lauryl sarcosinate (SLS) to lyse the cells from which the isolated genomes are obtained, or both. Lysed cells may be extracted with phenol and chloroform to produce an aqueous phase containing nucleic acid, including the isolated genomes, which can be precipitated with ethanol. [0084]
  • Several methods can be used to generate PCR-generated RCGs, including IRS-PCR, AP-PCR, DOP-PCR, multiple-primed PCR, and adaptor-PCR. Hybridization conditions for particular PCR methods are selected in the context of the primer type and primer length to yield a set of DNA fragments which is a percentage of the genome, as defined above. PCR methods have been described in many references, see e.g., U.S. Pat. Nos. 5,104,792; 5,106,727; 5,043,272; 5,487,985; 5,597,694; 5,731,171; 5,599,674; and 5,789,168. Basic PCR methods have been described in e.g., Saiki et al., Science, 230: 1350 (1985) and U.S. Pat. Nos. 4,683,195, 4,683,202 (both issued Jul. 28, 1987) and U.S. Pat. No. 4,800,159 (issued Jan. 24, 1989). [0085]
  • The PCR methods described herein are performed according to PCR methods well-known in the art. For instance, U.S. Pat. No. 5,333,675, issued to Mullis et al. describes an apparatus and method for performing automated PCR. In general, performance of a PCR method results in amplification of a selected region of DNA by providing two DNA primers, each of which is complementary to a portion of one strand within the selected region of DNA. The primer is hybridized to a template strand of nucleic acid in the presence of deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP) and a chain extender enzyme, such as DNA polymerase. The primers are hybridized with the separated strands, forming DNA molecules that are single stranded except for the region hybridized with the primer, where they are double stranded. The double stranded regions are extended by the action of the chain extender enzyme (e.g. DNA polymerase) to form an extended double stranded molecule between the original two primers. The double stranded DNA molecules are separated to produce single strands which can then be re-hybridized with the primers. The process is repeated for a number of cycles to generate a series of DNA strands having the same nucleotide sequence between and including the primers. [0086]
  • Chain extender enzymes are well known in the art and include, for example, E. coli DNA polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, recombinant modified T7 DNA polymerase, reverse transcriptase, and other enzymes. Heat stable enzymes are particularly preferred as they are useful in automated thermal cycle equipment. Heat stable polymerases include, for example, DNA polymerases isolated from Bacillus stearothermophilus (Bio-Rad), Thermus thermophilus (Finnzymes, ATCC number 27634), Thermus species (ATCC number 31674), Thermus aquaticus strain TV11518 (ATCC number 25105), Sulfolobus acidocaldarius, described by Bukhrashuili et al., Biochim. Biophys. Acta, 1008:102-07 (1989), Thermus filiformis (ATCC number 43280), Taq DNA polymerase, commercially available from Perkin-Elmer-Cetus (Norwalk, Conn.), Promega (Madison, Wis.) and Stratagene (La Jolla, Calif.), and AmpliTaq™ DNA polymerase, a recombinant Thermus aquaticus Taq DNA polymerase, available from Perkin-Elmer-Cetus and described in U.S. Pat. No. 4,889,818. [0087]
  • Preferably, the PCR-based RCG generation methods performed according to the invention are automated and performed using thermal cyclers. Many types of thermal cyclers are well-known in the art. For instance, M.J. Research (Watertown, Mass.) provides a thermal cycler having a peltier heat pump to provide precise uniform temperature control in the thermal cyclers; DeltaCycler thermal cyclers from Ericomp (San Diego, Calif.) also are peltier-based and include automatic ramping control, time/temperature extension programming and a choice of tube or microplate configurations. The RoboCycler™ by Stratagene (La Jolla, Calif.) incorporates robotics to produce rapid temperature transitions during cycling and well-to-well uniformity between samples; and a particularly preferred cycler, is the Perkin-Elmer Applied Biosystems (Foster City, Calif.) ABI Prism™ 877 Integrated Thermal cycler, which is operated through a programmable interface that automates liquid handling and thermocycling processes for fluorescent DNA sequencing and PCR reactions. The Perkin-Elmer Applied Biosystems machine is designed specifically for high-throughput genotyping projects and fully automates genotyping steps, including PCR product pooling. [0088]
  • Degenerate oligonucleotide primed-PCR (DOP-PCR) involves use of a single primer set, wherein each primer of the set is typically composed of 3 parts. A DOP-PCR primer as used herein can have the following structure: [0089]
  • 5′ tag-(N)x-TARGET 3′ [0090]
  • The “TARGET” nucleotide sequence includes at least 5 arbitrarily selected nucleotide residues that are the same for each primer of the set. x is an integer from 0 to 9, and N is any nucleotide residue. The value of x is preferably the same for each primer of a DOP-PCR primer set. In other embodiments, the TARGET nucleotide sequence includes at least 6 or 7 and preferably at least 8, 9, or 10 arbitrarily-selected nucleotides. The tag is optional. [0091]
  • A “TARGET nucleotide sequence”, as used herein, is selected arbitrarily. A set of primers is used to generate a particular RCG. Each primer in the set includes the same TARGET nucleotide sequence as the other primers. Of course, sets of primers having different TARGET sequences can be combined. [0092]
  • The “tag”, as used herein, is a sequence which is useful for processing the RCG but not necessary. The tag, unlike the other sequences in the primer, does not necessarily hybridize with genomic DNA during the initial round of genomic PCR amplification. In later amplification rounds, the tag hybridizes with PCR-amplified DNA. Thus, the tag does not contribute to the sequence initially recognized by the primer. Since the tag does not participate in the initial hybridization reaction with genomic DNA, but is involved in the primer extension process, the PCR products that are formed (i.e., the reproducible DNA fragments) include the tag sequence. Thus, the end products are DNA fragments that have a sequence identical to a sequence found in the genome except for the tag sequence. The tag is useful because in later rounds of PCR it allows use of a higher annealing temperature than could otherwise be used with shorter oligonucleotides. The arbitrarily selected TARGET sequence is positioned at the 3′ end of the primer. This sequence, although arbitrarily selected, is the same for each primer in a set of DOP-PCR primers. From 0 to 9 nucleotide residues (“N” in the formula above) are located at the 5′ end of the TARGET sequence in the DOP-PCR primers of the invention. Each of these residues can be independently selected from naturally-occurring or artificial nucleotide residues. By way of example, each “N” residue can be an inosine or methylcytosine residue. In the formula, “x” is an integer that can be from 0 to 9, and is preferably from 3 to 9 (e.g. 3, 4, 5, 6, 7, 8, or 9). Each set of DOP-PCR primers of the invention can thus contain up to 4^x unique primers (i.e., 1, 4, 16, 64, . . . , 262144 primers for x=0, 1, 2, 3, . . . , 9). Finally, a base pair tag can be positioned at the 5′ end of the primer. This tag can optionally include a restriction enzyme site. In general, inclusion of a tag sequence in the DOP-PCR primers of the invention is preferred, but not necessary. [0093]
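The relationship between x and the size of a DOP-PCR primer set can be illustrated with a minimal Python sketch. The function name `dop_pcr_primer_set` and the tag and TARGET sequences are hypothetical and chosen only for illustration. The sketch also assumes each N is one of the four standard bases, whereas the text also allows artificial residues such as inosine.

```python
from itertools import product

def dop_pcr_primer_set(target, x, tag=""):
    """Enumerate primers of the form 5' tag-(N)x-TARGET 3'.

    Assumption: each N is one of A, C, G, T, so the set holds 4**x primers.
    """
    if not 0 <= x <= 9:
        raise ValueError("x must be an integer from 0 to 9")
    return [tag + "".join(n) + target for n in product("ACGT", repeat=x)]

# Hypothetical tag and TARGET sequences, for illustration only.
primers = dop_pcr_primer_set(target="ATGTGG", x=3, tag="CCGACTCGAG")
print(len(primers))  # number of primers in the set for x = 3
print(primers[0])    # first primer in the enumeration
```

Every primer in the returned list shares the same tag and TARGET and differs only in the (N)x stretch, which is the property that gives all fragments of a PCR-generated RCG common ends.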
  • The initial rounds of DOP-PCR are preferably performed at a low temperature given that the specificity of the reaction will be determined by only the 3′ TARGET nucleotide sequence. A slow ramp time during these cycles ensures that the primers do not detach from the template before being extended. Subsequent rounds are carried out at a higher annealing temperature because in the subsequent rounds the 5′ end of the DOP-PCR primer (the tag) is able to contribute to the primer annealing. A PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C. [0094]
  • Because DOP-PCR involves a randomly chosen sequence, the resultant PCR products are generated from genome sequences arbitrarily distributed throughout the genome and will generally not be clustered within specific sites of the genome. Additionally, creation of new sets of DOP-PCR-amplified DNA fragments can be easily accomplished by changing the sequence, length, or both, of the primer. RCGs having greater or lesser complexity can be generated by selecting DOP-PCR primers having shorter or longer, respectively, TARGET and (N)x nucleotide sequences. This approach can also be used with multiple DOP-PCR primers such as in the “multiple-primed DOP-PCR” method (described below). Finally, use of arbitrarily chosen sequences of DOP-PCR is useful in many species because the arbitrarily-selected sequences are not species-specific, as with some forms of PCR which require use of a specific known sequence. [0095]
  • Another method for generating a PCR-generated RCG involves interspersed repeat sequence PCR (IRS-PCR). Mammalian chromosomes include both repeated and unique sequences. Some of the repeated sequences are short interspersed repeated sequences (IRS's) and others are long IRS's. One major family of short IRS's found in humans includes Alu repeat sequences. Amplification using a single Alu primer will occur whenever two Alu elements lie in inverted orientation to each other on opposite strands. There are believed to be approximately 900,000 Alu repeats in a human haploid genome. Another type of IRS sequence is the L1 element (the most common is L1Hs), which is present in 10^4-10^5 copies in a human genome. Because the L1 sequence is present less abundantly in the genome than the Alu sequence, fewer amplification products are produced upon amplification using an L1 primer. In IRS-PCR, a primer which has homology to a repetitive sequence present on opposite strands within the genome of the species to be analyzed is used. When two repeat elements having the primer sequence are present in a head-to-head fashion within a limited distance (approximately 2000 nucleotide residues), the inter-repeat sequence can be amplified. The method has the advantage that the complexity of the resulting PCR products can be controlled by how homologous the primer chosen is with the repeat consensus (that is, the more homologous the primer is with the repeat consensus sequence, the more complex the PCR product will be). [0096]
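The head-to-head requirement for single-primer IRS-PCR can be sketched in Python as follows. This is a simplified illustration, not the method of the invention: it assumes exact primer matches (real low-stringency annealing tolerates mismatches), and the function names and toy sequences are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_all(seq, sub):
    """Start positions of every (possibly overlapping) occurrence of sub."""
    hits, i = [], seq.find(sub)
    while i != -1:
        hits.append(i)
        i = seq.find(sub, i + 1)
    return hits

def single_primer_amplicons(genome, primer, max_len=2000):
    """(start, end) spans amplifiable with one primer.

    A span qualifies when the primer sequence occurs on the top strand and
    its reverse complement occurs downstream (i.e. the primer also matches
    the opposite strand in inverted orientation) within max_len residues.
    """
    forward = find_all(genome, primer)
    inverted = find_all(genome, revcomp(primer))
    spans = []
    for i in forward:
        for j in inverted:
            end = j + len(primer)
            if j > i and end - i <= max_len:
                spans.append((i, end))
    return spans

# Toy example: two inverted copies of a 5-residue "repeat" 10 residues apart.
toy_genome = "TT" + "ACGGA" + "C" * 10 + "TCCGT" + "AA"
print(single_primer_amplicons(toy_genome, "ACGGA"))
```

Two copies of the primer site in the same orientation produce no span in this sketch, which mirrors the requirement that the repeats lie in inverted orientation on opposite strands.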
  • In general, an IRS-PCR primer has a sequence wherein at least a portion of the primer is homologous with (e.g. 50%, 75%, 90%, 95% or more identical to) the consensus nucleotide sequence of an IRS of the subject. [0097]
  • In mammalian genomes, small interspersed repeat sequences (SINES) are present in extremely high copy number and are often configured such that a single copy sequence of between 500 nucleotide residues and 1000 nucleotide residues is situated between two repeats which are oriented in a head-to-head or tail-to-tail manner. Genomic DNA sequences having this configuration are substrates for Alu PCR in human DNA and B1 and B2 PCR in the mouse. The precise number of products which are represented in a specific Alu, B1, or B2 PCR reaction depends on the choice of primer used for the reaction. This variation in product complexity is due to the variation in sequence among the large number of representative sequences of the IRS family in each species. A detailed study of this variation was described by Britten (Britten, R. J. (1994), Proc. Natl. Acad. Sci. USA, 91:5992-5996). In the Britten study, the sequence variation for each nucleotide residue of the Alu consensus sequence was analyzed for 1574 human Alu sequences. The complexity of Alu PCR products generated by amplification using a given Alu PCR primer can be predicted to a significant extent based on the degree to which the nucleotide sequence of the primer matches consensus nucleotide sequences. As a general rule, Alu PCR products become progressively less complex as the primer sequence diverges from the Alu consensus. Because two hybridized primers are required at each site for which Alu PCR is to be accomplished, it is predictable that linear variation in the number of genomic sites to which a primer may bind will be reflected in the complexity of PCR products, which is roughly proportional to the square of primer binding efficiency. This prediction conforms to experimental results, permitting synthesis of Alu PCR products having a wide range of product complexity values. [0098] 
Therefore, when it is desirable to reduce the number of PCR products obtained using Alu PCR, the primer sequence should be designed to diverge by a predictable amount from the Alu consensus sequence.
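The "roughly proportional to the square" relationship can be made concrete with a short Python sketch. The function name and the example binding fractions are illustrative assumptions, not measured values. The sketch assumes that both ends of a product must be bound independently, each with the same probability.

```python
def relative_product_complexity(binding_fraction):
    """Relative number of amplifiable sites when both primer sites must bind.

    Assumption: the two ends bind independently with the same probability,
    so the fraction of amplifiable sites is the square of that probability.
    """
    if not 0.0 <= binding_fraction <= 1.0:
        raise ValueError("binding_fraction must be between 0 and 1")
    return binding_fraction ** 2

# Illustrative binding fractions only.
for fraction in (1.0, 0.5, 0.1):
    print(fraction, relative_product_complexity(fraction))
```

Under this assumption, a primer that binds half of the repeat copies would be expected to yield about one quarter of the products, which is why a modest divergence from the consensus gives a large reduction in complexity.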
  • Another method for generating a RCG involves arbitrarily primed PCR (AP-PCR). AP-PCR utilizes short oligonucleotides as PCR primers to amplify a discrete subset of portions of a high complexity genome. For AP-PCR, the primer sequence is arbitrary and is selected without knowledge of the sequence of the target nucleic acids to be amplified. The arbitrary primer is generally 50-60% G+C. The AP-PCR method is similar to the DOP-PCR method described above, except that the AP-PCR primer consists of only the arbitrarily-selected nucleotides and not the 5′ flanking degenerate residues (i.e. the (N)x residues described for the DOP-PCR primers) or the tag. The genome may be primed using a single arbitrary primer or a combination of two or more arbitrary primers, each having a different, but optionally related, sequence. [0099]
  • AP-PCR is performed under low stringency hybridization conditions, allowing hybridization of the primer with targets with which the primer can exhibit a substantial degree of mismatching. A PCR cycle performed under low stringency hybridization conditions generally is from about 35° C. to about 55° C. Mismatches refer to non complementary nucleotide bases in the primer, relative to the template with which it is hybridized. [0100]
  • AP-PCR methods have been used previously in combination with gel electrophoresis to determine genotypes. AP-PCR products are generally fractionated on a high resolution polyacrylamide gel, and the presence or absence of specific bands is used to genotype a specific locus. In general, the difference between the presence and absence of a band is a consequence of a single nucleotide DNA sequence difference in one of the primer binding sites for a given single copy sequence. [0101]
  • The product complexity obtained using a given primer or primer set can be determined by several methods. For instance, the product complexity can be determined using PCR amplification of a panel of human yeast artificial chromosome (YAC) DNA samples from a CEPH 1 library. These YACs each carry a human DNA segment approximately 300-400 kilobase pairs in length. Product complexity for each primer set can be inferred by comparing the number of bands produced per YAC when analyzed on agarose gel with an IRS-PCR product of known complexity. Additionally, for products of relatively low complexity, electrophoresis on polyacrylamide gels can establish the product complexity, compared to a standard. Alternatively, an effective way to estimate the complexity of the product is to carry out a reannealing reaction using resistance to S1 nuclease-catalyzed degradation to determine the rate of reannealing of internally labeled, denatured, double-stranded DNA product. Comparison with reannealing rates of standards of known complexity permits accurate estimation of product complexity. Each of these three methods may be used for IRS-PCR. The second and third methods are best for AP-PCR and DOP-PCR which, unlike IRS-PCR, will not selectively amplify human DNA from a crude YAC DNA preparation. [0102]
  • The complexity of PCR products generated by AP-PCR can be regulated by selecting the primer sequence length, the number of primers in a primer set, or some combination of these. By choosing the appropriate combination, AP-PCR may also be used to reduce the complexity of a genome for SNP identification and genotyping, as described herein. AP-PCR markers are different from Alu PCR primers, have a different genomic distribution, and can therefore complement an IRS-PCR genome complexity-reducing method. The methods can be used in combination to produce complementary information from genome scans. [0103]
  • One PCR method for preparing RCGs is an adapter-linker amplification PCR method (previously described in e.g., Saunders et al., Nuc. Acids Res., 17: 9027 (1990); Johnson, Genomics, 6: 243 (1990); and PCT Application WO90/00434, published Aug. 9, 1990). In this method, genomic DNA is digested using a restriction enzyme, and a set of linkers is ligated onto the ends of the resulting DNA fragments. PCR amplification of genomic DNA is accomplished using a primer which can bind with the adapter linker sequence. Two possible variations of this procedure which can be used to limit genome complexity are (a) to use a restriction enzyme which produces a set of fragments which vary in length such that only a subset (e.g. those smaller than a PCR-amplifiable length) are amplified; and (b) to digest the genomic DNA using a restriction enzyme that produces an overhang of random nucleotide sequence (e.g., AlwN1, which recognizes CAGNNNCTG (SEQ ID NO: 1) and cleaves between NNN and CTG). Adapters are constructed to anneal with only a subset of the products. For example, in the case of AlwN1, adapters having a specific 3 nucleotide residue overhang (corresponding to the random 3 base pair sequence produced by the restriction enzyme digestion) would be used to yield a 64-fold (4^3) reduction in complexity. Fragments which have an overhang sequence complementary to the adapter overhang are the only ones which are amplified. [0104]
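The 64-fold figure follows from the 4^3 possible three-residue overhangs. A minimal Python sketch, with hypothetical function names and a made-up test sequence, shows how AlwN1 sites and their NNN residues could be listed on one strand. It is a simplification: a fuller model would consider both ends of each fragment and both strands.

```python
import re
from itertools import product

# CAGNNNCTG, written as a lookahead so that overlapping sites are also found.
ALWN1_SITE = re.compile(r"(?=CAG([ACGT]{3})CTG)")

def alwn1_overhangs(seq):
    """Return the NNN residues of every CAGNNNCTG site on the given strand."""
    return [m.group(1) for m in ALWN1_SITE.finditer(seq)]

def all_overhangs():
    """All possible three-residue overhang sequences."""
    return ["".join(p) for p in product("ACGT", repeat=3)]

# Made-up sequence containing three AlwN1 sites.
sequence = "TTCAGAAACTGGGCAGTCACTGAACAGAAACTG"
overhangs = alwn1_overhangs(sequence)
selected = [o for o in overhangs if o == "AAA"]  # sites matching one adapter

print(overhangs)
print(len(all_overhangs()))  # number of distinct overhangs
print(len(selected))
```

If overhang sequences were uniformly distributed, an adapter matching a single overhang would be expected to capture about 1 in 64 ends.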
  • Another method for generating RCGs is based on the development of native RCGs. Several methods can be used to generate native RCGs, including DNA fragment size selection, isolating a fraction of DNA from a sample which has been denatured and reannealed, pH-separation, separation based on secondary structure, etc. [0105]
  • Size selection can be used to generate a RCG by separating polynucleotides in a genome into different fractions wherein each fraction contains polynucleotides of an approximately equal size. One or more fractions can be selected and used as the RCG. The number of fractions selected will depend on the method used to fragment the genome and to fractionate the pieces of the genome, as well as the total number of fractions. In order to increase the complexity of the RCG, more fractions are selected. One method of generating a RCG involves fragmenting a genome into arbitrarily sized pieces and separating the pieces on a gel (or by HPLC or another size fractionation method). A portion of the gel is excised, and DNA fragments contained in the portion are isolated. Typically, restriction enzymes can be used to produce DNA fragments in a reproducible manner. [0106]
  • Separation based on secondary structure can be accomplished in a manner similar to size selection. Different fractions of a genome having secondary structure can be separated on a gel. One or more fractions are excised from the gel, and DNA fragments are isolated therefrom. [0107]
  • Another method for creating a native RCG involves isolating a fraction of DNA from a sample which has been denatured and reannealed. A genomic DNA sample is denatured, and denatured nucleic acid molecules are allowed to reanneal under selected conditions. Some conditions allow more of the DNA to be reannealed than other conditions. These conditions are well known to those of ordinary skill in the art. Either the reannealed or the remaining denatured fractions can be isolated. It is desirable to select the smaller of these two fractions in order to generate RCG. The reannealing conditions used in the particular reaction determine which fraction is the smaller fraction. Variations of this method can also be used to generate RCGs. For instance, once a portion of the fraction is allowed to reanneal, the double stranded DNA may be removed (e.g., using column chromatography), the remaining DNA can then be allowed to partially reanneal, and the reannealed fraction can be isolated and used. This variation is particularly useful for removing repetitive elements of the DNA, which rapidly reanneal. [0108]
  • The amount of isolated genome used in the method of preparing RCGs will vary, depending on the complexity of the initial isolated genome. Genomes of low complexity, such as bacterial genomes having a size of less than about 5 million base pairs (5 megabases), usually are used in an amount from approximately 10 picograms to about 250 nanograms. A more preferred range is from 30 picograms to about 7.5 nanograms, and even more preferably, about 1 nanogram. Genomes of intermediate complexity, such as plants (for instance, rice, having a genome size of approximately 700-1,000 megabases) can be used in a range of from approximately 0.5 nanograms to 250 nanograms. More preferably, the amount is between 1 nanogram and 50 nanograms. Genomes of highest complexity (such as maize or humans, having a genome size of approximately 3,000 megabases) can be used in an amount from approximately 1 nanogram to 250 nanograms (e.g. for PCR). [0109]
  • In addition to the DOP-PCR methods described above, PCR-generated RCGs can be prepared using DOP-PCR involving multiple primers, which is referred to herein as “multiple-primed-DOP-PCR”. Multiple-primed-DOP-PCR involves the use of at least two primers which are arranged similarly to the single primers discussed above and are typically composed of 3 parts. A multiple-primed-DOP-PCR primer as used herein has the following structure: [0110]
  • tag-(N)x-TARGET2 [0111]
  • The TARGET2 nucleotide sequence includes at least 5, and preferably at least 6, TARGET nucleotide residues, x is an integer from 0-9, and N is any nucleotide residue. [0112]
  • The sequence chosen arbitrarily and positioned at the 3′ end of the primer can be manipulated in multiple-primed-DOP-PCR to produce a different end product than for DOP-PCR because use of two or more sets of primers adds another level of diversity, thus producing a RCG or amplified genome, depending on the primers chosen. Each of the at least two sets of primers of multiple-primed-DOP-PCR has a different TARGET sequence. Similar to the single primer of DOP-PCR, a set of primers is generated for each of the at least two primers, and every primer within a single set has the same TARGET sequence as the other primers of the set. This TARGET sequence is flanked at its 5′ end by 0 to 9 nucleotide residues (“N”s). The set of N's will differ from primer to primer within a set of primers. A set of primers may include up to 4^x different primers, each primer having a unique (N)x sequence. Finally, a tag can be positioned at the 5′ end. [0113]
  • In other aspects of the invention, methods for identifying SNPs can be performed using RNA genomes rather than RCGs. RNA genomes differ from RCGs in that they are generated from RNA rather than from DNA. An RNA genome can be, for instance, a cDNA preparation made by reverse transcription of RNA obtained from cells of a subject (e.g. human ovarian carcinoma cells). Thus, an RNA genome can be composed of DNA sequences, as long as the DNA is derived from RNA. RNA can also be used directly. [0114]
  • The genotyping and other methods of the invention can also be performed using an RNA genotyping method. This method involves use of RNA, rather than DNA, as the source of nucleic acid for genotyping. In this embodiment, RNA is reverse transcribed (e.g. using a reverse transcriptase) to produce cDNA for use as an RNA genome. The RNA method has at least one advantage over DNA-based methods. SNPs in coding regions (cSNPs) are more likely to be directly involved in detectable phenotypes and are thus more likely to be informative with regard to how such phenotypes can be affected. Furthermore, since this method can require only a reverse transcription step, it is amenable to high-throughput analysis. In a preferred embodiment, a reverse transcriptase primer which only binds a subset of RNA species (e.g. a dT primer having a 3-base anchor, e.g. TTTTTTTTTT CAG; SEQ ID NO: 2) is used to further reduce RNA genome complexity (48-fold using the dT 3-base anchor primer). In the RNA-genotyping method of the invention the RNA/cDNA sample can be attached to a surface and hybridized with a SNP-ASO. [0115]
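The text does not state how the 48-fold figure is derived. One reading that reproduces it, offered here only as an assumption, is that the first anchor base cannot be T (a T in that position would simply lengthen the oligo-dT run), which leaves 3 × 4 × 4 = 48 distinct three-base anchors. A short Python sketch with a hypothetical function name enumerates them:

```python
from itertools import product

def anchored_dt_anchors():
    """Enumerate 3-base anchors for an anchored oligo-dT primer.

    Assumption (not stated in the text): the first anchor base is A, C, or G,
    because a T there would be indistinguishable from the dT run.
    """
    return ["".join(p) for p in product("ACG", "ACGT", "ACGT")]

anchors = anchored_dt_anchors()
print(len(anchors))      # number of distinct anchors under this assumption
print("CAG" in anchors)  # the anchor of the example primer TTTTTTTTTT CAG
```

Under this reading, a primer carrying a single anchor would be expected to prime roughly 1 in 48 polyadenylated RNA species, assuming anchors are evenly represented.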
  • In another aspect, the invention includes a method for identifying a SNP. Genomic fragments which include SNPs can be prepared according to the invention by preparing a set of primers from a RCG (e.g., a RCG composed of a set of PCR products), performing PCR using the set of primers to amplify a plurality of isolated genomes to produce DNA products, and identifying SNPs included in the DNA products. The presence of a SNP in the DNA product can be identified using methods such as direct sequencing, e.g. using dideoxy chain termination or Maxam-Gilbert sequencing (see e.g., Sambrook et al., “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, 1989, New York; or Zyskind et al., Recombinant DNA Laboratory Manual, Acad. Press, 1988), denaturing gradient gel electrophoresis to identify different sequence-dependent melting properties and electrophoretic migration of SNP-containing DNA fragments (see e.g., Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, Freeman and Co., NY, 1992), and conformation analysis to differentiate sequences based on differences in electrophoretic migration patterns of single stranded DNA products (see e.g., Orita et al., Proc. Nat. Acad. Sci. 86, 2766-2770, 1989). In preferred embodiments, the SNPs are identified based on the sequences of the polymerase chain reaction products identified using sequencing methods. [0116]
  • A “single nucleotide polymorphism” or “SNP” as used herein is a single base pair (i.e., a pair of complementary nucleotide residues on opposite genomic strands) within a DNA region wherein the identities of the paired nucleotide residues vary from individual to individual. At the variable base pair in the SNP, two or more alternative base pairings occur at a relatively high frequency (greater than 1%) in a subject (e.g. human) population. [0117]
  • A “polymorphic region” is a region or segment of DNA the nucleotide sequence of which varies from individual to individual. The two DNA strands which are complementary to one another except at the variable position are referred to as alleles. A polymorphism is allelic because some members of a species have one allele and other members have a variant allele and some have both. When only one variant sequence exists, a polymorphism is referred to as a diallelic polymorphism. There are three possible genotypes in a diallelic polymorphic DNA in a diploid organism. These three genotypes arise because it is possible that a diploid individual's DNA may be homozygous for one allele, homozygous for the other allele, or heterozygous (i.e. having one copy of each allele). When other mutations are present, it is possible to have triallelic or higher order polymorphisms. These multiple mutation polymorphisms produce more complicated genotypes. [0118]
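The genotype counts described above can be checked with a brief Python sketch. The function name is hypothetical, and the allele labels are placeholders rather than alleles of any particular SNP. For n alleles in a diploid organism, the number of unordered genotypes is n(n+1)/2.

```python
from itertools import combinations_with_replacement

def diploid_genotypes(alleles):
    """List the unordered pairs of alleles a diploid individual can carry."""
    return list(combinations_with_replacement(alleles, 2))

# Placeholder allele labels for a diallelic and a triallelic polymorphism.
print(diploid_genotypes("AG"))        # two homozygotes and one heterozygote
print(len(diploid_genotypes("AGT")))  # count for a triallelic polymorphism
```

The diallelic case gives the three genotypes named in the text. A triallelic polymorphism gives six, which illustrates why higher order polymorphisms produce more complicated genotypes.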
  • SNPs are well-suited for studying sequence variation because they are relatively stable (i.e. they exhibit low mutation rates) and because it appears that SNPs can be responsible for inherited traits. These properties make SNPs particularly useful as genetic markers for identifying disease-associated genes. SNPs are also useful for such purposes as linkage studies in families, determining linkage disequilibrium in isolated populations, performing association analysis of patients and controls, and loss of heterozygosity studies in tumors. [0119]
  • An exemplary method for identifying SNPs is presented in the Examples below. Briefly, DOP-PCR is performed using genomic DNA obtained from an individual. The products are separated on an agarose gel. The products are separated by approximate length into approximately 8 segments having sizes of about 400-1000 base pairs, and libraries are made from each of the segments. This approach prevents domination of the library by one or two abundant products. Plasmid DNA is isolated from individual colonies containing portions of the library. Inserts are isolated and the ends of the inserts are sequenced using vector primers. A new set of primers is then synthesized based on these insert sequences to allow PCR to be performed using RCG obtained from one or more individuals or from a pool of individuals. The DNA products generated by the PCR are sequenced and inspected for the presence of two nucleotide residues at one location, an indication that a polymorphism exists at that position within one of the alleles. [0120]
  • A “primer” as used herein is a polynucleotide which hybridizes with a target nucleic acid with which it is complementary and which is capable of acting as an initiator of nucleic acid synthesis under conditions for primer extension. Primer extension conditions include hybridization between the primer and template, the presence of free nucleotides, a chain extender enzyme, e.g., DNA polymerase, and appropriate temperature and pH. [0121]
  • In preferred embodiments, a set of primers is prepared by at least the following steps: preparing a RCG, composed of a set of PCR products, separating the set of PCR products into individual PCR products, determining the sequence of each end of at least one of the PCR products, and generating the set of primers for use in the subsequent PCR step based on the sequence of the ends of the insert(s). [0122]
  • A “set of PCR products”, as used herein, is a plurality of synthetic polynucleotide sequences, each polynucleotide sequence being different from one another except for a stretch of nucleotides in the 5′ and 3′ regions of the polynucleotides which are identical in each polynucleotide. These regions correspond to the primers used to generate the RCG and the sequence in these regions varies depending on what primer is used. When a DOP PCR primer is used, the sequence that varies in each primer preferably has a sequence N_x, wherein x is 512 and N is any nucleotide. A set of DNA products is different from a “set of PCR products” as used herein and refers to DNA generated by PCR using specific primers which amplify a specific locus. [0123]
  • Once the sequence of a primer is known, the primer may be purified from a nucleic acid preparation which includes it, or it may be prepared synthetically. For instance, nucleic acid fragments may be isolated from nucleic acid sequences in genomes, plasmids, or other vectors by site-specific cleavage, etc. Alternatively, the primers may be prepared by de novo chemical synthesis, such as by using phosphotriester or phosphodiester synthetic methods, such as those described in U.S. Pat. No. 4,356,270; Itakura et al. (1989), Ann. Rev. Biochem., 53:323-56; and Brown et al. (1979), Meth. Enzymol., 68:109. Primers may also be prepared using recombinant technology, such as that described in Sambrook, “Molecular Cloning: A Laboratory Manual,” Cold Spring Harbor Laboratory, p.390-401 (1982). [0124]
  • The term “nucleotide residue” refers to a single monomeric unit of a nucleic acid such as DNA or RNA. The term “base pair” refers to two nucleotide residues which are complementary to one another and are capable of hydrogen bonding with one another. Traditional base pairs are between G:C and T:A. The letters G, C, T, U and A refer to (deoxy)guanosine, (deoxy)cytidine, (deoxy)thymidine, uridine, and (deoxy)adenosine, respectively. The term “nucleic acids” as used herein refers to a class of molecules including single stranded and double stranded deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and polynucleotides. Nucleic acids within the scope of the invention include naturally occurring and synthetic nucleic acids, nucleic acid analogs, modified nucleic acids, nucleic acids containing modified nucleotides, modified nucleic acid analogs, and mixtures of any of these. [0125]
  • SNPs identified or detected in the genotyping methods described herein can also be identified by other methods known in the art. Many methods have been described for identifying SNPs (see, e.g., WO95/12607; Botstein et al., Am. J. Hum. Genet., 32:314-331 (1980)). In some embodiments, it is preferred that SNPs be identified using the same method that will subsequently be used for genotype analysis. [0126]
  • As discussed briefly above, the SNPs and RCGs of the invention are useful for a variety of purposes. For instance, SNPs and RCGs are useful for performing genotyping analysis; for identification of a subject, such as in paternity or maternity testing, in immigration and inheritance disputes, in breeding tests in animals, in zygosity testing in twins, in tests for inbreeding in humans and animals; in evaluation of transplant suitability such as with bone marrow transplants; in identification of human and animal remains; in quality control of cultured cells; in forensic testing such as forensic analysis of semen samples, blood stains, and other biological materials; in characterization of the genetic makeup of a tumor by testing for loss of heterozygosity; in determining the allelic frequency of a particular SNP; and in generating a genomic classification code for a genome by identifying the presence or absence of each of a panel of SNPs in the genome of a subject and optionally determining the allelic frequency of the SNPs. [0127]
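  • A genomic classification code of the kind mentioned above could, for illustration, be written as one presence/absence digit per SNP in a fixed panel. The sketch below is an assumption about one possible encoding; the panel identifiers and the function name are hypothetical, and the optional allelic-frequency component is omitted.

```python
def classification_code(panel, detected):
    """Encode presence (1) or absence (0) of each panel SNP, in panel order."""
    return "".join("1" if snp in detected else "0" for snp in panel)


panel = ["snp_001", "snp_002", "snp_003", "snp_004"]  # hypothetical identifiers
detected = {"snp_001", "snp_004"}                     # SNPs found in one subject
print(classification_code(panel, detected))
```

  • Because the panel order is fixed, two subjects can be compared simply by comparing their code strings position by position.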
  • A preferred use of the invention is in a high throughput method of genotyping. “Genotyping” is the process of identifying the presence or absence of specific genomic sequences within genomic DNA. Distinct genomes may be isolated from individuals of populations which are related by some phenotypic characteristic, by familial origin, by physical proximity, by race, by class, etc. in order to identify polymorphisms (e.g. ones associated with a plurality of distinct genomes) which are correlated with the phenotype, family, location, race, class, etc. Alternatively, distinct genomes may be isolated at random from populations such that they have no relation to one another other than their origin in the population. Identification of polymorphisms in such genomes indicates the presence or absence of the polymorphisms in the population as a whole, but not necessarily correlated with a particular phenotype. [0128]
  • Although genotyping is often used to identify a polymorphism associated with a particular phenotypic trait, this correlation is not necessary. Genotyping only requires that a polymorphism, which may or may not reside in a coding region, is present. When genotyping is used to identify a phenotypic characteristic, it is presumed that the polymorphism affects the phenotypic trait being characterized. A phenotype may be desirable, detrimental, or, in some cases, neutral. [0129]
  • Polymorphisms identified according to the methods of the invention can contribute to a phenotype. Some polymorphisms occur within a protein coding sequence and thus can affect the protein structure, thereby causing or contributing to an observed phenotype. Other polymorphisms occur outside of the protein coding sequence but affect the expression of the gene. Still other polymorphisms merely occur near genes of interest and are useful as markers of that gene. A single polymorphism can cause or contribute to more than one phenotypic characteristic and, likewise, a single phenotypic characteristic may be due to more than one polymorphism. In general, multiple polymorphisms occurring within a gene correlate with the same phenotype. Additionally, whether an individual is heterozygous or homozygous for a particular polymorphism can affect the presence or absence of a particular phenotypic trait. [0130]
  • Phenotypic correlation is performed by identifying an experimental population of subjects exhibiting a phenotypic characteristic and a control population which do not exhibit that phenotypic characteristic. Polymorphisms which occur within the experimental population of subjects sharing a phenotypic characteristic and which do not occur in the control population are said to be polymorphisms which are correlated with a phenotypic trait. Once a polymorphism has been identified as being correlated with a phenotypic trait, genomes of subjects which have potential to develop a phenotypic trait or characteristic can be screened to determine occurrence or non-occurrence of the polymorphism in the subjects' genomes in order to establish whether those subjects are likely to eventually develop the phenotypic characteristic. These types of analyses are generally carried out on subjects at risk of developing a particular disorder such as Huntington's disease or breast cancer. [0131]
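  • The comparison of an experimental population with a control population can be sketched in its simplest form as a set difference. The sketch assumes each subject is represented by the set of polymorphism identifiers detected in that subject; the identifiers and the function name are hypothetical. In practice a statistical association test would usually replace this strict present/absent rule.

```python
def correlated_polymorphisms(experimental, control):
    """Polymorphisms seen in the experimental group and never in controls.

    Each argument is a list of sets, one set of polymorphism IDs per subject.
    """
    seen_in_cases = set().union(*experimental)
    seen_in_controls = set().union(*control)
    return seen_in_cases - seen_in_controls


cases = [{"snp_a", "snp_b"}, {"snp_b", "snp_c"}]   # subjects with the trait
controls = [{"snp_a"}, {"snp_c"}]                  # subjects without the trait
print(correlated_polymorphisms(cases, controls))
```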
  • A phenotypic trait encompasses any type of genetic disease, condition, or characteristic, the presence or absence of which can be positively determined in a subject. Phenotypic traits that are genetic diseases or conditions include multifactorial diseases of which a component may be genetic (e.g. owing to occurrence in the subject of a SNP), and predisposition to such diseases. These diseases include, but are not limited to, asthma, cancer, autoimmune diseases, inflammation, blindness, ulcers, heart or cardiovascular diseases, nervous system disorders, and susceptibility to infection by pathogenic microorganisms or viruses. Autoimmune diseases include, but are not limited to, rheumatoid arthritis, multiple sclerosis, diabetes, systemic lupus erythematosus, and Graves' disease. Cancers include, but are not limited to, cancers of the bladder, brain, breast, colon, esophagus, kidney, hematopoietic system (e.g. leukemia), liver, lung, oral cavity, ovary, pancreas, prostate, skin, stomach, and uterus. A phenotypic characteristic includes any attribute of a subject other than a disease or disorder, the presence or absence of which can be detected. Such characteristics can, in some instances, be associated with occurrence of a SNP in a subject which exhibits the characteristic. Examples of characteristics include, but are not limited to, susceptibility to drug or other therapeutic treatments, appearance, height, color (e.g. of flowering plants), strength, speed (e.g. of race horses), hair color, etc. Many examples of phenotypic traits associated with genetic variation have been described, see e.g., U.S. Pat. No. 5,908,978 (which identifies association of disease resistance in certain species of plants associated with genetic variations) and U.S. Pat. No. 5,942,392 (which describes genetic markers associated with development of Alzheimer's disease). [0132]
  • Identification of associations between genetic variations (e.g. occurrence of SNPs) and phenotypic traits is useful for many purposes. For example, identification of a correlation between the presence of a SNP allele in a subject and the ultimate development by the subject of a disease is particularly useful for administering early treatments, or instituting lifestyle changes (e.g., reducing cholesterol or fatty foods in order to avoid cardiovascular disease in subjects having a greater-than-normal predisposition to such disease), or closely monitoring a patient for development of cancer or other disease. It may also be useful in prenatal screening to identify whether a fetus is afflicted with or is predisposed to develop a serious disease. Additionally, this type of information is useful for screening animals or plants bred for the purpose of enhancing or exhibiting desired characteristics. [0133]
  • One method for determining a genotype associated with a plurality of genomes is screening for the presence or absence of a SNP in a plurality of RCGs. For example, such screening may be performed using a hybridization reaction including a SNP-ASO and the RCGs. Either the SNP-ASO or the RCGs can, optionally, be immobilized on a surface. The genotype is determined based on whether the SNP-ASO hybridizes with at least some of the RCGs. Other methods for determining a genotype involve methods which are not based on hybridization, including, but not limited to, mass spectrometric methods. Methods for performing mass spectrometry using nucleic acid samples have been described. See e.g., U.S. Pat. No. 5,885,775. The components of the RCG can be analyzed by mass spectrometry to identify the presence or absence of a SNP allele in the RCG. [0134]
  • A “SNP-ASO”, as used herein, is an oligonucleotide which includes one of two alternative nucleotides at a polymorphic site within its nucleotide sequence. In some embodiments, it is preferred that the oligonucleotide include only a single mismatched nucleotide residue, namely the polymorphic residue, relative to an allele of a SNP. In other cases, however, the oligonucleotide may contain additional nucleotide mismatches such as neutral bases or may include nucleotide analogs. This is described in more detail below. In preferred embodiments, the SNP-ASO is composed of from about 10 to 50 nucleotide residues. In more preferred embodiments, it is composed of from about 10 to 25 nucleotide residues. [0135]
  • Oligonucleotides may be purchased from commercial sources such as Genosys, Inc., Houston, Tex. or, alternatively, may be synthesized de novo on an Applied Biosystems 381 A DNA synthesizer or equivalent type of machine. [0136]
  • The oligonucleotides may be labeled by any method known in the art. One preferred method is end-labeling, which can be performed as described in Maniatis et al., “Molecular Cloning: A Laboratory Manual”, Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y. (1982). [0137]
  • It is possible that in organisms having a relatively non-complex genome, only a minimal complexity reduction step is necessary, and the genomic DNA may be directly analyzed or minimally reduced. This is particularly useful for screening tissue isolates to detect the presence of a bacterium or to identify the bacteria. Additionally, it is possible that, upon development of certain technical advances (e.g., more stringent hybridization, more sensitive detection equipment), even complex genomes may not need an extensive complexity reduction step. [0138]
  • Preferably, automated genotyping is performed. In general, genomic DNA of a well-characterized set of subjects, such as the CEPH families, is processed using PCR with appropriate primers to produce RCGs. The DNA is spotted onto one or more surfaces (e.g., multiple glass slides) for genotyping. This process can be performed using a microarray spotting apparatus which can spot more than 1,000 samples within a square centimeter area, or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific SNP oligonucleotide under TMAC conditions analogous to those described below. The genotype of each individual can be determined by detecting the presence or absence of a signal for a selected set of SNP-ASOs. A schematic of the method is shown in FIG. 4. [0139]
  • Once the complexity of genomic DNA obtained from an individual has been reduced, the resulting genomic DNA fragments can be attached to a solid support in order to be analyzed by hybridization. The RCG fragments may be attached to the slide by any method for attaching DNA to a surface. Methods for immobilizing nucleic acids have been described extensively, e.g., in U.S. Pat. Nos. 5,679,524; 5,610,287; 5,919,626; and 5,445,934. For instance, DNA fragments may be spotted onto poly-L-lysine-coated glass slides, and then crosslinked by UV irradiation. A second, more preferred method, which has been developed, involves including a 5′ amino group on each of the DNA fragments of the RCG. The DNA fragments are spotted onto silane-coated slides in the presence of NaOH in order to covalently attach the fragments to the slide. This method is advantageous because a covalent bond is formed between the fragments and the surface. Another method for accomplishing DNA fragment immobilization is to spot the RCG fragments onto a nylon membrane. Other methods of binding DNA to surfaces are possible and are well known to those of ordinary skill in the art. For instance, attachment to amino-alkyl-coated slides can be used. More detailed methods are described in the Examples below. [0140]
  • The surface to which the oligonucleotide arrays are conjugated is preferably a rigid or semi-rigid support which may, optionally, have appropriate light absorbing or transmitting characteristics for use with commercially available detection equipment. Substrates which are commonly used and which have appropriate light absorbing or transmitting characteristics include, but are not limited to, glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, and polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Additionally, the surface of the support may be non-coated or coated with a variety of materials. Coatings include, but are not limited to, polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, and membranes. [0141]
  • In one embodiment the SNP-ASOs are hybridized under standard hybridization conditions with RCGs covalently conjugated to a surface. Briefly, SNP-ASOs are labeled at their 5′ ends. A hybridization mixture containing the SNP-ASOs and, optionally, an isostabilizing agent, denaturing agent, or renaturation accelerant is brought into contact with an array of RCGs immobilized on the surface, and the mixture and the surface are incubated under appropriate hybridization conditions. The SNP-ASOs which do not hybridize are removed by washing the array with a wash mixture (such as a hybridization buffer) to leave only hybridized SNP-ASOs attached to the surface. After washing, detection of the label (e.g., a fluorescent molecule) is performed. For example, an image of the surface can be captured (e.g., using a fluorescence microscope equipped with a CCD camera and automated stage capabilities, phosphoimager, etc.). The label may also, or instead, be detected using a microarray scanner (e.g. one made by Genetic Microsystems). A microarray scanner provides image analysis which can be converted to a binary (i.e. +/−) signal for each sample using, for example, any of several available software applications (e.g., NIH Image, ScanAlyze, etc.) in a data format. The high signal/noise ratio for this analysis allows determination of data in this mode to be straightforward and easily automated. These data, once exported, can be manipulated to generate a format which can be directly analyzed by human genetics software applications (such as CRI-MAP and LINKAGE). Additionally, the methods may utilize two or more fluorescent dyes which can be spectrally differentiated to reduce the number of samples to be analyzed. For instance, if four fluorescent dyes having spectral distinctions (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be carried out under a single hybridization condition. In other embodiments discussed in more detail below, the SNP-ASOs are conjugated to a surface and hybridized with RCGs. [0142]
  • Conditions for optimal hybridization are described below in the Examples. In general, the SNP-ASO is present in a hybridization mixture at a concentration of from about 0.005 nanomoles per liter (nM) to about 50 nM. More preferably, the concentration is from 0.5 nanomoles per liter to 1 nanomole per liter. A preferred concentration for radioactively labeled SNP-ASO is 0.66 nanomoles per liter. The mixture preferably also includes a hybridization optimizing agent in order to improve signal discrimination between genomic sequences which are identically complementary to the SNP-ASO and those which contain a single mismatched nucleotide (as well as any neutral base etc. substitutions). Isostabilizing agents are compounds such as betaines and lower tetraalkyl ammonium salts which reduce the sequence dependence of DNA thermal melting transitions. These types of compounds also increase discrimination between matched and mismatched SNPs/genomes. A denaturing agent may also be included in the hybridization mixture. A denaturing agent is a composition that lowers the melting temperature of double stranded nucleic acid molecules, generally by reducing hydrogen bonding between bases or preventing hydration of nucleic acid molecules. Denaturing agents are well-known in the art and include, for example, DMSO, formaldehyde, glycerol, urea, formamide, and chaotropic salts. The hybridization conditions in general are those used commonly in the art, such as those described in Sambrook et al., “Molecular Cloning: A Laboratory Manual”, (1989), 2nd Ed., Cold Spring Harbor, N.Y.; Berger and Kimmel, “Guide to Molecular Cloning Techniques”, Methods in Enzymology, (1987), Volume 152, Academic Press, Inc., San Diego, Calif.; and Young and Davis, (1983), PNAS (USA) 80:1194. [0143]
  • In general, incubation temperatures for hybridization of nucleic acids range from about 20° C. to 75° C. For probes 17 nucleotide residues and longer, a preferred temperature range for hybridization is from about 50° C. to 54° C. The hybridization temperature for longer probes is preferably from about 55° C. to 65° C. and for shorter probes is less than 52° C. Rehybridization may be performed in a variety of time frames. Preferably, hybridization of SNP-ASOs and RCGs is performed for at least 30 minutes. [0144]
  • Preferably, either or both of the SNP-ASO and the RCG are labeled. The label may be added directly to the SNP-ASO or the RCG during synthesis of the oligonucleotide or during generation of RCG fragments. For instance, a PCR reaction performed using labeled primers or labeled nucleotides will produce a labeled product. Labeled nucleotides (e.g., fluorescein-labeled CTP) are commercially available. Methods for attaching labels to nucleic acids are well known to those of ordinary skill in the art and, in addition to the PCR method, include, for example, nick translation and end-labeling. [0145]
  • Labels suitable for use in the methods of the present invention include any type of label detectable by standard means, including spectroscopic, photochemical, biochemical, electrical, optical, or chemical methods. Preferred types of labels include fluorescent labels such as fluorescein. A fluorescent label is a compound comprising at least one fluorophore. Commercially available fluorescent labels include, for example, fluorescein phosphoramidites such as fluoreprime (Pharmacia, Piscataway, N.J.), fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), rhodamine, polymethine dye derivative, phosphors, Texas red, green fluorescent protein, CY3, and CY5. Polynucleotides can be labeled with one or more spectrally distinct fluorescent labels. “Spectrally distinct” fluorescent labels are labels which can be distinguished from one another based on one or more of their characteristic absorption spectra, emission spectra, fluorescent lifetimes, or the like. Spectrally distinct fluorescent labels have the advantage that they may be used in combination (“multiplexed”). Radionuclides such as ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P are also useful labels according to the methods of the invention. A plurality of radioactively distinguishable radionuclides can be used. Such radionuclides can be distinguished, for example, based on the type of radiation (e.g. α, β, or γ radiation) emitted by the radionuclides. The ³²P signal can be detected using a phosphoimager, which currently has a resolution of approximately 50 microns. Other known techniques, such as chemiluminescence or colorimetric (enzymatic color reaction) detection, can also be used. [0146]
  • By using spectrally distinct fluorescent probes, it is possible to analyze more than one locus in a single hybridization mixture. The term “multiplexing” refers to the use of a set of distinct fluorescent labels in a single assay. Such fluorescent labels have been described extensively in the art, such as the fluorescent labels described in PCT Published Patent Application WO98/31834. [0147]
  • Fluorescent primers are a preferred method of labeling polynucleotides. The fluorescent tag is stable for more than a year. Radioactively labeled primers are stable for a shorter period. In addition, fluorescent primers may be used in combination if they are spectrally distinct, as discussed above. This allows multiple hybridizations to be detected in a single hybridization mixture. As a result, the total number of reactions needed for a genome-wide scan is reduced. For example, for analysis of 1000 loci, 2000 hybridizations are needed (1000 loci×2 polymorphisms/loci). The use of 4 fluorescently-labeled oligonucleotides will cut this number 4-fold and thus only 500 hybridizations will be needed. [0148]
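  • The arithmetic in the example above can be reproduced with a small helper. The sketch assumes one labeled SNP-ASO per allele and that every reaction can carry as many probes as there are spectrally distinct dyes; the function and parameter names are hypothetical.

```python
import math


def hybridizations_needed(loci, alleles_per_locus=2, dyes=1):
    """Number of hybridization reactions when `dyes` probes share one reaction."""
    total_probes = loci * alleles_per_locus
    return math.ceil(total_probes / dyes)


print(hybridizations_needed(1000))          # 2000 reactions with a single label
print(hybridizations_needed(1000, dyes=4))  # 500 reactions with four dyes
```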
  • In order to determine the genotype of an individual at a SNP locus, it is desirable to employ SNP allele-specific oligonucleotide hybridization. Preferably, two hybridization mixtures are prepared for each locus (or they can be performed together). The first hybridization mixture contains a labeled (e.g., radioactive or fluorescent) SNP-ASO (typically 17-21 nucleotide residues in length centered around the polymorphic residue). To increase specificity, a 20-50 fold excess of non-labeled oligonucleotides corresponding to another allele (referred to herein as a “complementary SNP-ASO”) is included in the hybridization mixture. Use of the non-labeled complementary SNP-ASO can be avoided by using SNP-ASO containing a neutral base as described below. In the second hybridization mixture, the SNP-ASO that was labeled in the first mixture is not labeled, and the non-labeled SNP-ASO is labeled instead. Hybridization is performed in the presence of a hybridization buffer. The melting temperature of oligonucleotides can be determined empirically for each experiment. The pair of 2 oligonucleotides corresponding to different alleles of the same SNP (the SNP-ASOs and the complementary SNP-ASO) are referred to herein as a pair of allele-specific oligonucleotides (ASOs). Further experimental details regarding selecting and making SNP-ASOs are provided in the Examples section below. [0149]
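  • Construction of a pair of allele-specific oligonucleotides centered on the polymorphic residue might be sketched as follows. This is an illustration only: the flanking sequences are made up, the function name is hypothetical, and an odd probe length is assumed so that the polymorphic residue falls exactly in the middle. Strand choice, labeling, and melting temperature are not addressed.

```python
def make_aso_pair(upstream, alleles, downstream, length=17):
    """Build allele-specific oligonucleotides centered on the polymorphic site.

    upstream / downstream: flanking sequence 5' and 3' of the SNP.
    alleles: the two alternative bases, e.g. ("A", "G").
    length: odd probe length (3 or more) so the SNP sits in the middle.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("use an odd length of at least 3")
    flank = length // 2
    if len(upstream) < flank or len(downstream) < flank:
        raise ValueError("not enough flanking sequence")
    left = upstream[-flank:]    # bases immediately 5' of the SNP
    right = downstream[:flank]  # bases immediately 3' of the SNP
    return tuple(left + allele + right for allele in alleles)


pair = make_aso_pair("GATTACAGATTACA", ("A", "G"), "CCGGTTAACCGGTT", length=17)
for aso in pair:
    print(aso, len(aso))
```

  • The two oligonucleotides differ only at the central position, which corresponds to the pair of ASOs described above.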
  • In addition to the method described above, several other methods of allele specific hybridization may be used for hybridizing SNP-ASOs with RCGs. One method is to increase discrimination of SNPs in DNA hybridization by means of artificial mismatches. Artificial mismatches are inserted into oligonucleotide probes using a neutral base such as the base analog 3-nitropyrrole. A significant enhancement of discrimination is generally obtained, with a strong dependence of the enhancement on the spacing between mismatches. [0150]
  • In general, the methods described above are based on conjugation of genomic DNA fragments (i.e. a RCG) to a solid support. Hybridization analysis can also be performed with the SNP-ASO conjugated to the support (e.g. in an array). The oligonucleotide array is hybridized with one or more RCGs. Attaching of the SNP-ASOs or RCGs onto the support may be performed by any method known in the art. Many methods for attaching oligonucleotides to surfaces in arrays have been described, see, e.g. PCT Published Patent Application WO97/29212, U.S. Pat. Nos. 4,588,682; 5,667,976; and 5,760,130. Other methods include, for example, using arrays of metal pins. Additionally, RCGs may be attached to the surface by the methods disclosed in the Examples below. [0151]
  • An “array” as used herein is a set of molecules arranged in a specific order with respect to a surface. Preferably the array is composed of polynucleotides (e.g. either SNP-ASOs or RCGs) attached to the surface. Oligonucleotide arrays can be used to screen nucleic acid samples for a target nucleic acid, which can be labeled with a detectable marker. A fluorescent signal resulting from hybridization between a target nucleic acid and a substrate-bound oligonucleotide provides information relating to the identity of the target nucleic acid by reference to the location of the oligonucleotide in the array on the substrate. Such a hybridization assay can generate thousands of signals which exhibit different signal strengths. These signals correspond to particular oligonucleotides of the array. Different signal strengths will arise based on the amount of labeled target nucleic acid hybridized with an oligonucleotide of the array. This amount, in turn, can be influenced by the proportion of AT-rich regions and GC-rich regions within the oligonucleotide (which determines thermal stability). The relative amounts of hybridized target nucleic acid can also be influenced by, for example, the number of different probes arrayed on the substrate, the length of the target nucleic acid, and the degree of hybridization between mismatched residues. Oligonucleotide arrays, in some embodiments, have a density of at least 500 features per square centimeter, but in practice can have much lower densities. A feature, as used herein, is an area of a substrate on which oligonucleotides having a single sequence are immobilized. [0152]
  • The oligonucleotide arrays of the invention may be produced by any method known in the art. Many such arrays are commercially available, and many methods have been described for producing them. One preferred method for producing arrays includes spatially directed oligonucleotide synthesis. Spatially directed oligonucleotide synthesis may be performed using light-directed oligonucleotide synthesis, microlithography, application by ink jet, microchannel deposition to specific locations, and sequestration with physical barriers. Each of these methods is well-known in the art and has been described extensively. For instance, the light-directed oligonucleotide synthesis method has been disclosed in U.S. Pat. Nos. 5,143,854; 5,489,678; and 5,571,639; and PCT applications having publication numbers WO90/15070; WO92/10092; and WO94/12305. This technique involves modification of the surface of the solid support with linkers and photolabile protecting groups using a photolithographic mask to produce reactive (e.g. hydroxyl) groups in the illuminated regions. A 3′-O-phosphoramidite-activated deoxynucleoside having a 5′-hydroxyl-protected group is supplied to the surface such that coupling occurs at sites that were exposed to light. The substrate is rinsed, and the surface is illuminated with a second mask, and another activated deoxynucleotide is presented to the surface. The cycle is repeated until the desired set of products is obtained. After the cycle is finished, the nucleotides can be capped. Another method involves mechanically protecting portions of the surface and selectively deprotecting/coupling materials to the exposed portions of the surface, such as the method described in U.S. Pat. No. 5,384,261. The mechanical means is generally referred to as a mask. Other methods for array preparation are described in PCT Published Patent Applications WO97/39151, WO98/20967, and WO98/10858, which describe an automated apparatus for the chemical synthesis of molecular arrays; U.S. Pat. No. 5,143,854; Fodor et al., Science (1991), 251:767-777; and Kozal et al., Nature Medicine, v. 2, p. 753-759 (1996). [0153]
  • Hybridizing a SNP-ASO with an array of RCGs (or hybridizing a RCG with an array of SNP-ASOs) is followed by detection of hybridization. Part of the genotyping methods described herein is to determine if a positive or negative signal exists for each hybridization for an individual and then, based on this information, determine the genotype for the corresponding SNP locus. This step is relatively straightforward, but varies depending on the method of detection. Essentially, all of the detection methods described here (fluorescent, radioactive, etc.) can be reduced to a digital image file, e.g. using a microarray reader or phosphoimager. Presently, there are several software products which will overlay a grid on an image and determine the signal strength value for each element of the grid. These values can be imported into a computer program, such as the Microsoft Corporation spreadsheet program designated Microsoft Excel™, with which simple analysis can be performed to assign each signal a manipulable value (e.g. 1 or 0 or + or −). Once this is accomplished, an individual's genotype can be described in terms of the pattern of hybridization of RCG fragments obtained from the individual with selected SNP-ASOs corresponding to disease-associated SNPs. [0154]
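  • The conversion of signal strengths into a genotype can be sketched as a thresholding step followed by a lookup on the two binary results for a locus. The threshold value, the signal scale, and the function name below are assumptions for illustration; a real analysis would calibrate the cutoff against controls on the same array.

```python
def call_genotype(signal_allele_1, signal_allele_2, threshold):
    """Turn two ASO signal strengths into a genotype call for one SNP locus."""
    hit_1 = signal_allele_1 >= threshold  # binary (+/-) result for allele 1
    hit_2 = signal_allele_2 >= threshold  # binary (+/-) result for allele 2
    if hit_1 and hit_2:
        return "heterozygous"
    if hit_1:
        return "homozygous allele 1"
    if hit_2:
        return "homozygous allele 2"
    return "no call"


# Hypothetical grid values for one individual at three loci.
signals = {"locus_1": (950, 40), "locus_2": (880, 910), "locus_3": (30, 25)}
for locus, (s1, s2) in signals.items():
    print(locus, call_genotype(s1, s2, threshold=200))
```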
  • The array having labeled SNP-ASOs (or labeled RCGs) hybridized thereto can be analyzed using automated equipment. Automated equipment for analyzing arrays can include an excitation radiation source which emits radiation at a first wavelength, an optical detector, and a stage for securing the surface supporting the array. The excitation source emits excitation radiation which is focused on at least one area of the array and which induces emission from fluorescent labels. The signal is preferably in the form of radiation having a different wavelength than the excitation radiation. Emitted radiation is collected by a detector, which generates a signal proportional to the amount of radiation sensed thereon. The array may then be moved so that a different area can be exposed to the radiation source to produce a signal. Once each area of the array has been scanned, a two-dimensional image of the array is obtained. Preferably, the movement of the array is accomplished using automated equipment, such as a multi-axis translation stage, such as one which moves the array at a constant velocity. In alternative embodiments, the array may remain stationary, and devices may be employed to cause scanning of the light over the stationary array. [0155]
  • One type of detection method includes a CCD imaging system, e.g. when the nucleic acids are labeled with fluorescent probes. Other detectors are well known to those of skill in the art and may also, or alternatively, be used. CCD imaging systems for use with array detection have been described. For instance, a photodiode detector may be placed on the opposite side of the array from the excitation source. Alternatively, a CCD camera may be used in place of the photodiode detector to image the array. One advantage of using these systems is rapid read time. In general, an entire 50×50 centimeter array can be read in about 30 seconds or less using standard equipment. If more powerful equipment and efficient dyes are used, the read time may be reduced to less than 5 seconds. [0156]
  • Once the data is obtained, e.g. as a two-dimensional image, a computer can be used to transform the data into a displayed image which varies in color depending on the intensity of light emission at a particular location. Any type of commercial software which can perform this type of data analysis can be used. In general, the data analysis involves the steps of determining the intensity of the fluorescence emitted as a function of the position on the substrate, removing the outliers, and calculating the relative binding affinity. One or more of the presence, absence, and intensity of signal corresponding to a label is used to assess the presence or absence of a SNP corresponding to the label in the RCG. The presence and absence of one or more SNPs in a RCG can be used to assign a genotype to the individual. For example, the following depicts the genotype analysis of 3 individuals at a given locus at which an A/G polymorphism occurs: [0157]
    Individual    SNP 1 Allele “A”    SNP 1 Allele “G”    Genotype
    Larry         +                   −                   A/A
    Moe           −                   +                   G/G
    Curly         +                   +                   A/G
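  • The genotype assignment depicted in the table can be sketched as a short routine, offered here only as an illustrative example. The function name `genotype` and the use of `None` for a locus with no signal on either allele are assumptions made for the sketch.

```python
from typing import Optional


def genotype(signal_a: bool, signal_g: bool) -> Optional[str]:
    """Assign a genotype at an A/G locus from allele-specific signals."""
    if signal_a and signal_g:
        return "A/G"   # both ASOs hybridize: heterozygous
    if signal_a:
        return "A/A"   # only the "A" ASO hybridizes
    if signal_g:
        return "G/G"   # only the "G" ASO hybridizes
    return None        # neither ASO hybridizes: no call (assumed convention)


# (signal for allele "A", signal for allele "G") per individual.
panel = {"Larry": (True, False), "Moe": (False, True), "Curly": (True, True)}
for name, (a, g) in panel.items():
    print(name, genotype(a, g))
```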
  • As mentioned above, SNP analysis can be used to determine whether an individual has or will develop a particular phenotypic trait and whether the presence or absence of a specific allele correlates with a particular phenotypic trait. In order to determine which SNPs are related to a particular phenotypic trait, genomic samples are isolated from a group of individuals which exhibit the particular phenotypic trait, and the samples are analyzed for the presence of common SNPs. The genomic sample obtained from each individual is used to prepare a RCG. These RCGs are screened using panels of SNPs in a high throughput method of the invention to determine whether the presence or absence of a particular allele is associated with the phenotype. In some cases, it may be possible to predict the likelihood that a particular subject will exhibit the related phenotype. If a particular polymorphic allele is present in 30% of individuals who develop Alzheimer's disease, then an individual having that allele has a higher likelihood of developing Alzheimer's disease. The likelihood can also depend on several factors such as whether individuals not afflicted with Alzheimer's disease have this allele and whether other factors are associated with the development of Alzheimer's disease. This type of analysis can be useful for determining a probability that a particular phenotype will be exhibited. In order to increase the predictive ability of this type of analysis, multiple SNPs associated with a particular phenotype can be analyzed. Although values can be calculated, it is enough to identify that a difference exists. [0158]
  • It is also possible to identify SNPs which segregate with a particular disease. Multiple polymorphic sites may be detected and examined to identify a physical linkage between them or between a marker (SNP) and a phenotype. Both of these are useful for mapping a genetic locus linked to or associated with a phenotypic trait to a chromosomal position and thereby revealing one or more genes associated with the phenotypic trait. If two polymorphic sites segregate randomly, then they are either on separate chromosomes or are distant enough, with respect to one another on the same chromosome that they do not co-segregate. If two sites co-segregate with significant frequency, then they are linked to one another on the same chromosome. These types of linkage analyses are useful for developing genetic maps. See e.g., Lander et al., PNAS (USA) 83, 7353-7357 (1986), Lander et al., Genetics 121, 185-199 (1989). The invention is also useful for identifying polymorphic sites which do not segregate, i.e., when one sibling has a chromosomal region that includes a polymorphic site and another sibling does not have that region. [0159]
  • Linkage analysis is often performed on family members which exhibit high rates of a particular phenotype or on patients suffering from a particular disease. Biological samples are isolated from each subject exhibiting a phenotypic trait, as well as from subjects which do not exhibit the phenotypic trait. These samples are each used to generate individual RCGs and the presence or absence of polymorphic markers is determined using panels of SNPs. The data can be analyzed to determine whether the various SNPs are associated with the phenotypic trait and whether or not any SNPs segregate with the phenotypic trait. [0160]
  • Methods for analyzing linkage data have been described in many references, including Thompson & Thompson, Genetics in Medicine (5th edition), W. B. Saunders Co., Philadelphia, 1991; and Strachan, “Mapping the Human Genome” in The Human Genome (Bios Scientific Publishers Ltd., Oxford) chapter 4, and summarized in PCT published patent application WO98/18967 by Affymetrix, Inc. Linkage analysis by calculating log of the odds values (LOD values) reveals the likelihood of linkage between a marker and a genetic locus at a given recombination fraction, compared to the value when the marker and genetic locus are not linked. The recombination fraction indicates the likelihood that markers are linked. Computer programs and mathematical tables have been developed for calculating LOD scores of different recombination fraction values and determining the recombination fraction based on a particular LOD score, respectively. See e.g., Lathrop, PNAS, USA 81, 3443-3446 (1984); Smith et al., Mathematical Tables for Research Workers in Human Genetics (Churchill, London, 1961); Smith, Ann. Hum. Genet. 32, 127-150 (1968). Use of LOD values for genetic mapping of phenotypic traits is described in PCT published patent application WO98/18967 by Affymetrix, Inc. In general, a positive LOD score value indicates that two genetic loci are linked and a LOD score of +3 or greater is strong evidence that two loci are linked. A negative value suggests that the linkage is less likely. [0161]
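  • As a non-limiting illustration, a LOD score for the simplest textbook case can be computed as shown below. The sketch assumes phase-known, fully informative meioses, so that the likelihood at recombination fraction θ is θ^R (1−θ)^(N−R) for R recombinants among N meioses, compared against θ = 0.5 (no linkage). The function names and the 0.01 grid spacing are assumptions; the programs and tables cited above handle general pedigrees.

```python
import math
from typing import Tuple


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of the odds of linkage at `theta` versus no linkage (theta = 0.5)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def best_theta(recombinants: int, meioses: int) -> Tuple[float, float]:
    """Scan theta = 0.01 .. 0.50 and return (theta, LOD) with the highest LOD."""
    grid = [i / 100 for i in range(1, 51)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])


theta, lod = best_theta(recombinants=1, meioses=10)
print(f"theta = {theta:.2f}, LOD = {lod:.2f}")
```

With one recombinant in ten meioses the scan peaks at θ = 0.10 with a LOD of roughly 1.6, which is positive but below the +3 level described above as strong evidence of linkage.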
  • The methods of the invention are also useful for assessing loss of heterozygosity in a tumor. Loss of heterozygosity in a tumor is useful for determining the status of the tumor, such as whether the tumor is an aggressive, metastatic tumor. The method is generally performed by isolating genomic DNA from tumor samples obtained from a plurality of subjects having tumors of the same type, as well as from normal (i.e., non-cancerous) tissue obtained from the same subjects. These genomic DNA samples are used to generate RCGs which can be hybridized with a SNP-ASO, for example using the surface array technology described herein. The absence of a SNP allele in the RCG generated from the tumor, when that allele is present in the RCG generated from normal tissue, indicates that loss of heterozygosity has occurred. If a SNP allele is associated with a metastatic state of a cancer, the absence of the SNP allele can be compared to its presence or absence in a non-metastatic tumor sample or a normal tissue sample. A database of SNPs which occur in normal and tumor tissues can be generated and an occurrence of SNPs in a patient's sample can be compared with the database for diagnostic or prognostic purposes. [0162]
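  • The tumor-versus-normal comparison might be sketched as follows, purely as an example. The function name, the locus labels, and the representation of each sample as a mapping from locus to the set of detected alleles are assumptions of the sketch.

```python
from typing import Dict, List, Set


def loss_of_heterozygosity(normal: Dict[str, Set[str]],
                           tumor: Dict[str, Set[str]]) -> List[str]:
    """Return loci that are heterozygous in normal tissue but show only
    one of those two alleles in the matched tumor sample."""
    lost = []
    for locus, normal_alleles in normal.items():
        tumor_alleles = tumor.get(locus)
        if tumor_alleles is None:
            continue  # locus not scored in the tumor sample
        heterozygous_in_normal = len(normal_alleles) == 2
        one_allele_retained = (len(tumor_alleles) == 1
                               and tumor_alleles < normal_alleles)
        if heterozygous_in_normal and one_allele_retained:
            lost.append(locus)
    return sorted(lost)


# Hypothetical allele calls for one subject.
normal_calls = {"SNP1": {"A", "G"}, "SNP2": {"C"}, "SNP3": {"C", "T"}}
tumor_calls = {"SNP1": {"A"}, "SNP2": {"C"}, "SNP3": {"C", "T"}}
print(loss_of_heterozygosity(normal_calls, tumor_calls))
```

Loci that are homozygous in normal tissue (SNP2 in the example) are uninformative and are skipped, since loss of one chromosomal copy cannot be seen there by allele presence alone.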
  • It is useful to be able to differentiate non-metastatic primary tumors from metastatic tumors, because metastasis is a major cause of treatment failure in cancer patients. If metastasis can be detected early, it can be treated aggressively in order to slow the progression of the disease. Metastasis is a complex process involving detachment of cells from a primary tumor, movement of the cells through the circulation, and eventual colonization of tumor cells at local or distant tissue sites. Additionally, it is desirable to be able to detect a pre-disposition for development of a particular cancer such that monitoring and early treatment may be initiated. Many cancers and tumors are associated with genetic alterations. For instance, extensive cytogenetic analysis of hematologic malignancies such as lymphomas and leukemias has been described, see e.g., Solomon et al., Science 254, 1153-1160, 1991. Many solid tumors have complex genetic abnormalities requiring more complex analysis. [0163]
  • Solid tumors progress from tumorigenesis through a metastatic stage and into a stage at which several genetic aberrations can occur. See e.g., Smith et al., Breast Cancer Res. Treat., 18 Suppl. 1, S5-14, 1991. Genetic aberrations are believed to alter the tumor such that it can progress to the next stage, i.e., by conferring proliferative advantages, the ability to develop drug resistance or enhanced angiogenesis, proteolysis, or metastatic capacity. These genetic aberrations are referred to as “loss of heterozygosity.” Loss of heterozygosity can be caused by a deletion or recombination resulting in a genetic mutation which plays a role in tumor progression. Loss of heterozygosity for tumor suppressor genes is believed to play a role in tumor progression. For instance, it is believed that mutations in the retinoblastoma tumor suppressor gene located in chromosome 13q14 cause progression of retinoblastomas, osteosarcomas, small cell lung cancer, and breast cancer. Likewise, the short arm of chromosome 3 has been shown to be associated with cancers such as small cell lung cancer, renal cancer and ovarian cancers. In addition, ulcerative colitis is a disease which is associated with increased risk of cancer, presumably through a multistep progression involving accumulated genetic changes (U.S. Pat. No. 5,814,444). It has been shown that patients afflicted with long duration ulcerative colitis exhibit an increased risk of cancer, and that one early marker is loss of heterozygosity of a region of the distal short arm of chromosome 8. This region is the site of a putative tumor suppressor gene that may also be implicated in prostate and breast cancer. Loss of heterozygosity can easily be detected by performing the methods of the invention routinely on patients afflicted with ulcerative colitis. Similar analyses can be performed using samples obtained from other tumors known or believed to be associated with loss of heterozygosity. [0164]
  • The methods of the invention are particularly advantageous for studying loss of heterozygosity because thousands of tumor samples can be screened at one time. Additionally, the methods can be used to identify new regions of loss that have not previously been identified in tumors. [0165]
  • The methods of the invention are useful for generating a genomic pattern for an individual genome of a subject. The genomic pattern of a genome indicates the presence or absence of polymorphisms, for example, SNPs, within a genome. Genomic DNA is unique to each individual subject (except identical twins). Accordingly, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated. The genomic pattern can be used for a variety of purposes, such as for identification with respect to forensic analysis or population identification, or paternity or maternity testing. The genomic pattern may also be used for classification purposes as well as to identify patterns of polymorphisms within different populations of subjects. [0166]
  • Genomic patterns may be used for many purposes, including forensic analysis and paternity or maternity testing. The use of genomic information for forensic analysis has been described in many references, see e.g., National Research Council, The Evaluation of Forensic DNA Evidence (eds. Pollard et al., National Academy Press, Washington, DC, 1996). Forensic analysis of DNA is based on determination of the presence or absence of alleles of polymorphic regions within a genomic sample. The more polymorphisms that are analyzed, the higher the probability of identifying the correct individual from which the sample was isolated. [0167]
  • In an embodiment of the invention, when a biological sample, such as blood or sperm, is found at a crime scene, DNA can be isolated and RCGs can be prepared. This RCG can then be screened with a panel of SNPs to generate a genomic pattern. The genomic pattern can be matched with a genomic pattern produced from a suspect or compared to a database of genomic patterns which has been compiled. Preferably, the SNPs used in the analysis are those in which the frequency of the polymorphic variation (allelic frequency) has been determined, such that a statistical analysis can be used to determine the probability that the sample genome matches the suspect's genome or a genome within the database. The probability that two individuals have the same polymorphic or allelic form at a given genetic site is described in detail in PCT published patent application WO98/18967, the entire contents of which are hereby incorporated by reference. Briefly, this probability defined as P(ID) can be determined by the equation: [0168]
  • $P(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2$
  • x and y in the equation represent the frequencies with which alleles A and B, respectively, occur in a haploid genome. [0169]
  • The calculation can be extended for more polymorphic forms at a given locus. The predictability increases with the number of polymorphic forms tested. In a locus of n alleles, a binomial expansion is used to calculate P(ID). The probabilities of each locus can be multiplied to provide the cumulative probability of identity and from this the cumulative probability of non-identity for a particular number of loci can be calculated. This value indicates the likelihood that random individuals have the same loci. The same type of quantitative analysis can be used to determine whether a subject is a parent of a particular child. This type of information is useful in paternity testing, animal breeding studies, and identification of babies or children whose identity has been confused, e.g., through adoption or inadequate record keeping in a hospital, or through separation of families by occurrences such as earthquake or war. [0170]
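  • A minimal sketch of this calculation follows. It assumes Hardy-Weinberg genotype proportions and statistically independent loci; under those assumptions the probability that two random individuals share a genotype at a locus is the sum of the squared genotype frequencies, which reduces to the two-allele equation above when there are only alleles A and B. The function names and the example allele frequencies are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def p_identity(freqs: Sequence[float]) -> float:
    """P(ID) at one locus: sum of squared genotype frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygotes = sum(p ** 4 for p in freqs)              # (p^2)^2 terms
    heterozygotes = sum((2 * p * q) ** 2
                        for p, q in combinations(freqs, 2))  # (2pq)^2 terms
    return homozygotes + heterozygotes


def cumulative_p_identity(loci: Sequence[Sequence[float]]) -> float:
    """Multiply per-locus P(ID) values (assumes independent loci)."""
    return prod(p_identity(freqs) for freqs in loci)


single = p_identity([0.5, 0.5])
panel = cumulative_p_identity([[0.5, 0.5]] * 10)
print(f"P(ID), one locus: {single}")
print(f"Cumulative P(ID), ten loci: {panel:.2e}")
print(f"Cumulative probability of non-identity: {1 - panel:.6f}")
```

Because the per-locus values multiply, the cumulative probability of identity falls off geometrically as loci are added, which is why larger SNP panels give greater discriminating power.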
  • The genomic pattern may be used to generate a genomic classification code (GNC). [0171]
  • The GNC may be represented by one or more data signals and stored as part of a data structure on a computer-readable medium, for example, a database. The stored GNCs may be used to characterize, classify, or identify the subjects for which the GNCs were generated. Each GNC may be generated by representing the presence or absence of each polymorphism with a computer-readable signal. These signals may then be encoded, for example, by performing a function on the signals. [0172]
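  • One possible encoding is sketched below for illustration. The representation of each polymorphism as a single bit, the fixed SNP ordering, and the use of a SHA-256 digest as the "function performed on the signals" are assumptions of this example and are not specified above.

```python
import hashlib
from typing import Sequence


def encode_gnc(pattern: Sequence[bool]) -> str:
    """Represent presence (1) or absence (0) of each polymorphism,
    in a fixed agreed-upon SNP order, as a bit string."""
    return "".join("1" if present else "0" for present in pattern)


def gnc_to_int(bits: str) -> int:
    """Pack the bit string into a single integer for compact storage."""
    return int(bits, 2)


def gnc_digest(bits: str) -> str:
    """Example of encoding by applying a function (here SHA-256) to the signals."""
    return hashlib.sha256(bits.encode("ascii")).hexdigest()


pattern = [True, False, True, True]   # hypothetical four-SNP genomic pattern
bits = encode_gnc(pattern)
print(bits, gnc_to_int(bits), gnc_digest(bits)[:12])
```

A digest gives a fixed-length identifier but discards the per-SNP structure, so the raw bit string would still be needed wherever portions of a GNC are to be searched or indexed.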
  • Accordingly, the GNCs may be used as part of a classification or identification system for subjects such as, for example, humans, plants, or animals. As discussed above, the more polymorphisms that are analyzed for a given genome of a subject, the higher the probability of generating a unique genomic pattern for the individual from which the sample was isolated, and consequently, the higher the probability that the GNC uniquely identifies an individual. In such a system, a data structure may include a plurality of entries, for example, data records or table entries, where each entry identifies an individual. Each entry may include the GNC generated for the individual as well as other information. The GNC or portions thereof may then be stored in an index data structure, for example, another table. A portion of a GNC may be indexed so that each GNC may be further classified by a portion of its genomic pattern as opposed to only the entire genomic pattern. [0173]
  • The data structures may then be searched to identify an individual who has committed a crime. For example, if a biological sample from the individual (such as blood) is recovered from the crime scene, the GNC of the individual may be generated by the methods described herein, and a database of records including GNCs searched until a match is found. Similarly, the GNCs may be used to classify individuals within a group such as soldiers in the armed forces, cattle in a herd, or produce within a specific crop. For example, the armed forces may generate a database containing the GNC of each soldier, and the database could be used to identify the soldier if necessary. Likewise, a database could be generated where records and indexes of the database include the GNCs of individual animals within a herd of cattle, so that lost or stolen animals could later be identified and returned to the proper owner. [0174]
  • The code may optionally be converted into a bar code or other human- or machine-readable form. For example, each line of a bar code may indicate the presence of specific polymorphisms or groups of specific polymorphisms for a particular subject. [0175]
  • Additionally, it is useful to be able to identify the genus, species, or other taxonomic classification to which an organism belongs. The methods of the invention can accomplish this in a high throughput manner. Taxonomic identification is useful for determining the presence and identity of a pathogenic organism such as a virus, bacterium, protozoan, or multicellular parasite in a tissue sample. In most hospitals, bacteria and other pathogenic organisms are identified based on morphology, determination of nutritional requirements or fermentation patterns, determination of antibiotic resistance, comparison of isoenzyme patterns, or determination of sensitivity to bacteriophage strains. These types of methods generally require approximately 48 to 72 hours to identify the pathogenic organism. More recently, methods for identifying pathogenic organisms have been focused on genotype analysis, for instance, using RFLPs. RFLP analysis has been performed using hybridization methods (such as Southern blots) and PCR assays. [0176]
  • The information generated according to the methods of the invention, and in particular the GNCs, can be included in a data structure, for example, a database, on a computer-readable medium, wherein the information is correlated with other information pertaining to the genomes or the subjects or types of subjects from which the genomes are obtained. FIG. 5 shows a computer system 100 for storing and manipulating genomic information. The computer system 100 includes a genomic database 102 which includes a plurality of records 104 a-n storing information corresponding to a plurality of genomes. Each of the records 104 a-n may store genetic information about each genome or an RCG generated therefrom. The genomes for which information is stored in the genomic database 102 may be any kind of genomes from any type of subject. For example, the genomes may represent distinct genomes of individual members of a species or of particular classes of individuals (e.g., army, prisoners, etc.). [0177]
  • An example of the format of a record 200 in the genomic database 102 (i.e., one of the records 104 a-n) is shown in FIG. 6A. As shown in FIG. 6A, the record 200 includes a genome identifier (Genome ID) 202 that identifies the genome corresponding to the record 200. If enough polymorphisms of the genome were analyzed to generate the spectral pattern (such that the probability that the GNC uniquely identifies the genome is high), or if a group to which the genome belongs has few enough members, then the GNC of the genome could serve as the Genome ID 202. The record 200 also may include genomic information fields 204 a-n. The genomic information may be any information associated with the genome identified by the Genome ID 202 such as, for example, a GNC, a portion of a GNC, the presence or absence of a particular SNP, a genetic attribute (genotype), a physical attribute (phenotype), a name, a taxonomic identifier, a classification of the genome, a description of the individual from which the genome was taken, a disease of the individual, a mutation, a color, etc. Each information field 204 a-n may be used as an entry in an index data structure that has a structure similar to record 200. For example, each entry of the index data structure may include an indexed information field as a first data element, and one or more Genome IDs 202 as additional elements, such that all elements that share a common attribute are stored in a common data structure. The format of the record 200 shown in FIG. 6A is merely an example of a format that may be used to represent genomes in the genomic database 102. The amount of information stored for each record 200, the number of records 200, and the number of fields indexed may vary. [0178]
  • Further, each information field 204 a-n may include one or more fields itself, and each of these fields themselves may include more fields, etc. Referring to FIG. 6B, an embodiment of the information field 204 a is shown. The information field 204 a includes a plurality of fields 206 a-m for storing more information about the information represented by information field 204 a. Although the following description refers to the fields 206 a-m of the information field 204 a, such description is equally applicable to information fields 204 b-n. For example, if information field 204 a represented a GNC of the genome corresponding to the Genome ID 202, then each of the fields 206 a-m may represent a portion of the GNC, a particular SNP of the genomic pattern from which the GNC was generated, a group of such SNPs, a description of the GNC, a description of one of the SNPs, etc. [0179]
  • The fields 206 a-m of the information field 204 a may store any kind of value that is capable of being stored in a computer readable medium such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value. [0180]
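  • The record and index structures described with respect to FIGS. 6A and 6B might be sketched as follows. This is only an illustrative in-memory model: the class name `GenomeRecord`, the field keys ("gnc", "strain"), and the example values are hypothetical and do not appear in the figures.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class GenomeRecord:
    """One record 200: a Genome ID 202 plus information fields 204 a-n."""
    genome_id: str
    info: Dict[str, str] = field(default_factory=dict)


def build_index(records: Iterable[GenomeRecord],
                field_name: str) -> Dict[str, Set[str]]:
    """Index data structure: indexed field value -> Genome IDs sharing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        if field_name in record.info:
            index[record.info[field_name]].add(record.genome_id)
    return dict(index)


# Hypothetical records; keys and values are placeholders.
records = [
    GenomeRecord("G001", {"gnc": "1011", "strain": "B6"}),
    GenomeRecord("G002", {"gnc": "0011", "strain": "DBA"}),
    GenomeRecord("G003", {"gnc": "1110", "strain": "B6"}),
]
strain_index = build_index(records, "strain")
print(strain_index)
```

Each index entry pairs an indexed field value with the Genome IDs that share it, mirroring the description of an indexed information field as a first data element followed by one or more Genome IDs.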
  • A user may perform a query on the genomic database 102 to search for genomic information of interest, for example, all genomes having a GNC that matches the GNC of a murder suspect. In another example, it may be known that a biological sample contains a particular sequence. That sequence can be compared with sequences in the database to identify information such as which individual the sample was isolated from, or whether the genetic sequence corresponds to a particular phenotypic trait. For example, the user may search the genomic database 102 for genetic matches to identify an individual, genotypes which correlate with a particular phenotype, genotypes associated with various classes of individuals, etc. Referring to FIG. 5, a user may provide user input 106 indicating genomic information for which to search to a query user interface 108. The user input 106 may, for example, indicate an SNP for which to search using a standard character-based notation. The query user interface 108 may, for example, provide a graphical user interface (GUI) which allows the user to select from a list of types of accessible genomic information using an input device such as a keyboard or a mouse. [0181]
  • The query user interface 108 generates a search query 110 based on the user input 106. A search engine 112 receives the search query 110 and generates a mask 114 based on the search query. Example formats of the mask 114 and ways in which the mask 114 may be used to determine whether the genomic information specified by the mask 114 matches genomic information of genomes in the genomic database 102 are described in more detail below with respect to FIG. 7. The search engine 112 determines whether the genomic information specified by the mask 114 matches genomic information of genomes stored in the genomic database 102. As a result of the search, the search engine 112 generates search results 116 indicating whether the genomic database 102 includes genomes having the genomic information specified by the mask 114. The search results 116 may also indicate which genomes in the genomic database 102 have the genomic information specified by the mask 114. [0182]
  • If, for example, the user input 106 specified a sequence of a gene, a GNC, or an SNP, the search results 116 may indicate which genomes in the genomic database 102 include the specified sequence, GNC, or SNP. If the user input 106 specified particular genetic information concerning a genome (e.g., enough to identify an individual), the search results 116 may indicate which individual genome listed in the genomic database 102 matches the particular information, thus identifying the individual from whom the sample was taken. Similarly, if the user input 106 specified genetic sequences which are not adequate to specifically identify the individual, the search results 116 may still be adequate to identify a class of individuals that have genomes in the genomic database 102 that match the genetic sequence. For example, the search results may indicate that the genomic information of genomes of all Caucasian males matches the specified genetic sequence. [0183]
  • FIG. 7 illustrates a process 300 that may be used by the search engine 112 to generate the search results 116. The search engine 112 receives the search query 110 from the query user interface 108 (step 302). The search engine 112 generates the mask 114 based on the search query 110 (step 304). The search engine 112 performs a binary operation on one or more of the records 104 a-n in the genomic database 102 using the mask 114 (step 306). The search engine 112 generates the search results 116 based on the results of the binary operation performed in step 306 (step 308). [0184]
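  • Steps 304 through 308 might be sketched as follows, by way of example only. The sketch assumes that each stored GNC is an integer whose bits record presence (1) or absence (0) of SNPs in a fixed order, and that the mask is a pair of integers: one marking which bit positions the query cares about and one giving the required values at those positions. The binary operation is then a bitwise AND followed by a comparison. These representations and all names are assumptions, since the format of the mask 114 is not fixed above.

```python
from typing import Dict, List, Tuple

Mask = Tuple[int, int]  # (bits the query cares about, required bit values)


def make_mask(query: Dict[str, bool], snp_order: List[str]) -> Mask:
    """Step 304: build a mask from a query of SNP -> required presence."""
    care, want = 0, 0
    for position, snp in enumerate(snp_order):
        if snp in query:
            bit = 1 << position
            care |= bit          # this position is constrained by the query
            if query[snp]:
                want |= bit      # and the SNP must be present
    return care, want


def search(database: Dict[str, int], mask: Mask) -> List[str]:
    """Steps 306-308: apply the bitwise operation to each record's GNC
    and collect the Genome IDs that match."""
    care, want = mask
    return sorted(genome_id for genome_id, gnc in database.items()
                  if (gnc & care) == want)


snp_order = ["SNP1", "SNP2", "SNP3"]                 # bit 0, bit 1, bit 2
database = {"g1": 0b101, "g2": 0b011, "g3": 0b111}   # hypothetical GNCs
mask = make_mask({"SNP1": True, "SNP2": False}, snp_order)
print(search(database, mask))
```

SNPs not named in the query are left unconstrained, so a query that specifies only a few SNPs returns a class of matching genomes rather than a single individual.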
  • A computer system for implementing the system 100 of FIG. 5 as a computer program typically includes a main unit connected to both an output device which displays information to a user and an input device which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also are connected to the processor and memory system via the interconnection mechanism. [0185]
  • One or more output devices may be connected to the computer system. Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output. One or more input devices may be connected to the computer system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet communication device, and data input devices such as sensors. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein. [0186]
  • The computer system may be a general purpose computer system which is programmable using a computer programming language, such as for example, C++, Java, or other language, such as a scripting language or assembly language. The computer system may also include specially programmed, special purpose hardware such as, for example, an application-specific integrated circuit (ASIC). In a general purpose computer system, the processor is typically a commercially available processor, of which the series x86, Celeron, and Pentium processors, available from Intel, and similar devices from AMD and Cyrix, the 680x0 series microprocessors available from Motorola, the PowerPC microprocessor from IBM and the Alpha-series processors from Digital Equipment Corporation, are examples. Many other processors are available. Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services. The processor and operating system define a computer platform for which application programs in high-level programming languages are written. [0187]
  • A memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory, and tape are examples. The disk may be removable such as, for example, a floppy disk or a read/write CD, or permanent, known as a hard drive. A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). The integrated circuit memory element allows for faster access to the information by the processor than does the disk. The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed. A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the invention is not limited to any particular mechanism. It should also be understood that the invention is not limited to a particular memory system. [0188]
  • The invention is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g. 108, 112) in FIG. 5 may be a separate module of a computer program, or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g. 102, 106, 110, 114, and 116) may be stored in a memory system or transmitted between computer systems. The invention is not limited to any particular implementation using software, hardware, firmware, or any combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor. Various steps of the process, for example, steps 302, 304, 306, and 308 of FIG. 7, may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two. [0189]
  • The invention also encompasses compositions. One composition of the invention is a plurality of RCGs immobilized on a surface, where the plurality of RCGs are prepared by DOP-PCR. Another composition is a panel of SNP-ASOs immobilized on a surface, wherein the SNPs are identified by using RCGs as described above. [0190]
  • The invention also includes kits having a container housing a set of PCR primers for reducing the complexity of a genome and a container housing a set of SNP-ASOs, particularly wherein the SNPs are present with a frequency of at least 50 or 55% in a RCG made using the primer set. In some kits, the set of PCR primers are primers for DOP-PCR and preferably the DOP-PCR primer has the tag-(N)x-TARGET structure described herein, i.e., wherein the TARGET includes at least 7 arbitrarily selected nucleotide residues, wherein x is an integer from 3 to 9, and wherein each N is any nucleotide residue and wherein tag is a polynucleotide as described above. In some embodiments the SNPs in the kit are attached to a surface such as a slide. [0191]
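  • The tag-(N)x-TARGET primer layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dop_primer` and `degeneracy` are hypothetical, and the tag and TARGET sequences in the usage note are made-up placeholders, not primers disclosed in this application.

```python
def dop_primer(tag: str, x: int, target: str) -> str:
    """Write a primer 5'->3' as tag + x degenerate 'N' positions + TARGET.

    The limits mirror the kit paragraph: x from 3 to 9 and a TARGET of
    at least 7 residues. The sequences passed in are the caller's choice.
    """
    if not 3 <= x <= 9:
        raise ValueError("x must be an integer from 3 to 9")
    if len(target) < 7:
        raise ValueError("TARGET must include at least 7 nucleotide residues")
    if set(tag + target) - set("ACGT"):
        raise ValueError("tag and TARGET must contain only A, C, G, or T")
    return tag + "N" * x + target


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by the N positions."""
    return 4 ** primer.count("N")
```

    For example, `dop_primer("CTCGAG", 6, "ATGTGGC")` (hypothetical tag and TARGET) yields `CTCGAGNNNNNNATGTGGC`, which stands for 4^6 = 4096 distinct oligonucleotides sharing one fixed TARGET.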
  • SNPs identified according to the methods of the invention using the B1 5′ rev primer include the following: [0192]
    locus ASO Allele Strain (SEQ ID#)
    1 tttatgAaggCataaaaa A 129/ 14
    tttatgGaggCataaaaa B BS-DBA 15
    tttatgAaggTataaaaa C Spre 16
    2 ctgggctgTattcattt A 129-DBA 17
    ctgggctgCattcattt B B6 18
    tctGcctccTGagtgct C B6-129-DBA 19
    tctAcctccCAagtgct D Spre 20
    3 tagctagaAtcaagctt A BG 21
    tagctagaGtcaagctt B DBA-Spre 22
    4 gctgtgcAACaaatcac A 129/ 23
    cagctgtgc---aaatcacc B B6 24
    5 tttcgtga-tgtttctat A 129-Spre 25
    tttcgtgaAtgtttcta B BG-DBA 26
    6 cactgtctAcatcttta A B6-129 27
    cactgtctCcatcttta B DBA-Spre 28
    7 taacattcTtgaagcca A 129-DBA-Spre 29
    taacattcCtgaagcca B B6 30
    8 gcttccaTttcctaagg A 129-DBA 31
    gcttccaCttcctaagg B B6 32
    9 aggaatgGcAataatcc A B6-129 33
    aggaatgGcGataatcc B DBA 34
    aggaatgAcAataatcc C Spre 35
    ttaaattcGtaaatgga D BG-129-DBA 36
    ttaaattcAtaaatgga E Spre 37
    10 taacattcTtgaagcca A 129-DBA-Spre 38
    taacattcCtgaagcca B B6 39
    11 ttcTGtgActccaCttg A 129 40
    ttcTGtgActccaTttg B BG-DBA 41
    ttcCCtgTctccaTttg C Spre 42
    12 gtagtttgCcaggaacc A 129-Spre 43
    gtagtttgTcaggaacc B BG-DBA 44
    13 tgctactcctctctactcg A 129 45
    tgctattcctctctgctcg B BG-DBA-Spre 46
    cttgatcaccctctgatga C BS-129--DBA 47
    cttggtcaccctctaatga D Spre 48
    14 gaggtggtgcagagtga A 129-DBA 49
    gaggtggcgcagagtga B B6 50
    gaggtggcccagagtga C Spre 51
    15 cccactgaaccgcacag A 129-DBA 52
    cccactgagctgcacag B B6 53
    cccactcagccgcacag C Spre 54
    16 tgaagacacagccagcc A 129-DBA 55
    tgaagacgcagccagcc B B6 56
    tgaagacgaagccagcc C Spre 57
    17 agaagttggtaccaggg A 129/FVB/F1/cast/spre 58
    agaagttgttaccaggg B B6 59
    18 tatgattacgtaatgtt A 129/B6/F1 60
    tatgattatgtaatgtt B FVB/F1 61
    19 atgattccagtgagtta A 129/B6 62
    atgattcctgtgagtta B FVB/F1 63
    catactattaacactggaa C Cast-129 64
    catattattaacacaggaa D Spre 65
    20 gtcaagaacaggcaata A 129/bG/fl/FVB 66
    gtcaagaataggcaata B f1 67
    cagactagggaaccttc C 129 68
    cagacgagggaaccttc E Spre 69
    cagactagggagccttc D Cast 70
    21 tgtccagttgtttgcat A 129/ 71
    tgtccagtcgtttgcat B b6/fvb/f1 72
    ggggtagccagtttggt C Cast-129 73
    ggggtagcaagtttggt D Spre 74
    22 caggaagctgtagctcc A 129/f1 75
    caggaagccgtagctcc B bE/fvb 76
    cctgagcctgtctacct C Cast-129 77
    cctgagcccgtctacct D Spre 78
    23 taacattcttgaagcca A 129/FVB/F1/cast/spre 79
    taacattcctgaagcca B B6 80
    24 ccaactgaaccgcacag A 129/FVB 81
    ccaactgagctgcacag B B6 82
    gagctagctcacacattct C Cast-129 83
    gagttagctcacacgttct D Spre 84
    25 acgggggggtggcgtta A 129/f1 85
    acggggggtggcgttaa B bG/fvb/cast/spre 86
    tagacagccagcgcgtcac C Cast-129 87
    tagatagccagcgcatcac D Spre 88
    26 gcttttcttgagagtggc A 129/b6 89
    gcttttctttagagtggc B fvb 90
    gcttttcgtgagagtggc C f1 91
    27 ctacagataaagttata A 129/bS/fvb/f1 92
    ctacagatgaagttata B f1 93
    tagacctgctgctatct C Cast-129 94
    tagacctgttgctatct D Spre 95
    28 tgttgttctggcctcca A 129/F1 96
    tgttgttttggcctcca B B6 97
    ttctgagaatttgttag C 129/B6 98
    ttctgagagtttgttag D F1/spre 99
    29 caggaagcagtagctcc A 129 100
    caggaagccgtagctcc B BG/FVB/F1 101
    agagtcaggtaagttgc C Cast-129 102
    agagtcagataagttgc D Spre 103
    30 agatttcaaaaagtttt A 129/b6 104
    agattccaaaaggtttt B f1 105
    agatttcaaaaagtttt C fvb 106
    cctgaggggagcaatca D Cast-129 107
    cctgagggaagcaatca E Spre 108
    31 aaggtaagataactaag A 129.f1 109
    aaggtaaggtaactaag B b6/fvbn 110
    ggactacacagagaaac C Cast-129 111
    ggactacatagagaaac D Spre 112
    32 cccaggctacacgaggg A 129/fVb/f1 113
    cccaggctacatgaggg B b6 114
    cttaccagttgtgagac C 129 115
    cttaccacttgtgagac D Spre 116
    cttaccagtcgtgagac E Cast 117
    33 ctgccctcaggtcttta A 129 118
    ctgccctccggtcttta B b6/fvbn 119
    gcaataaaattgtttta C Cast-129 120
    gcaatgagatcgtttta D Spre 121
    34 tgttctgtggagacccc A 129/fvbn/f1/cast/spre 122
    tgttctgtagagacccc B b6 123
    35 cacattgaatcaaagcc A 129/bG/fvbn/f1 124
    cacattgagtcaaagcc B f1 125
    ggactacccacccgttc C 129 126
    gcgactgc--acccattct E Spre 127
    gcgactgccccc--attct D Cast 128
    36 cctgggccagccaggaa A 129/b6/cast 129
    cctgggcctgccaggaa B fvbn/f1/spre 130
    37 ccccaggtaaccatctt A 129/f1 131
    ccccaggtgaccatctt B b6/fvbn/cast/spre 132
    ttctgtatattagctga C Cast-129 133
    tttctatattaa--ctgac D Spre 134
    38 ggacccggacggtcttc A 129/b6 135
    ggacccggtcggtcttc B bvb/f1 136
    gtccctaatgttagcat C Cast-129 137
    gtccccaatgtcagcat D Spre 138
    39 acgggggggtggcgtta A 129/f1 139
    acgggggg-tggcgttaa B b6/fvbn/cast/spre 140
    tagacagccagcgcgtcac C Cast 141
    tagatagccagcgcatcac D Spre 142
    40 gattcttcgtgttcctt A 129-b6-F1 143
    gattcttcatgttcctt B FVBN-Cast-Spre 144
    41 tgtaaaaacttagaata A 129/b6/f1 145
    tgtaaaaatttagaata B fvbn/cast/spre 146
    42 tgtgaaagcgctcccaa A 129/fvbn/f1/cast/spre 147
    tgtgaaagtgctcccaa B b6 148
    43 caaaggctcagagaatc A 129/b6/f1 149
    caaaggcttagagaatc B fvbn 150
    ttaattctctccaaaca C 129/b6/fvb/f1 151
    ttaaggctctccggaca D f1 152
    44 ctgccaccgtgcacaca A 129/b6 153
    ctgccaccatgcacaca B fvbn/f1 154
    ccaaatattctgattcc C 129-Spre 155
    ccaaatattcttttttt D Cast 156
    45 atgagctgaccctccct A 129/BG/F1 157
    atgagctgcccctccct B FVB 158
    acactaggtaaaagctc C 129/BS/FVB/F1 159
    acactaggcaaaagctc D F1 160
    agacaccacgaccgagg E 129-Spre 161
    agacaccaagaccgagg F Cast 162
    46 gcagcgtccggttaagt A 129/f1 163
    gcagcgtctggttaagt B bG/fvbn/f1 164
    cagatactacaaggatg C 129 165
    tacagatac---aaggatgc D SPRE/Cast 166
    47 tcagctagtgtatctgt A 129/FVB/F1 167
    tcacctagtgtatttgt B B6/F1 168
    ttttttatttttggatt C 129-Cast 169
    tttt-aatttttggattt D Spre 170
    48 gatattgttttcattta A 129/ 171
    gatattgtcttcattta B b6/fvbn/f1 172
    49 agacccggtgctggtgt A 129/b6 173
    agacccggcgctggtgt B fvbn/f1/cast 174
    50 cttctaagctttgtctt A 129/fvb/f1/cast/spre 175
    cttctaagttttgtctt B b6/f1 176
    51 agttggcaaccagcatg A 129/ 177
    agttggcatccagcatg B b6/fvbn/f1 178
    ggtgaaatggtaattac C 129-Cast 179
    ggtgaaatagtaattac D Spre 180
    52 acgggatataacgagtt A 129/FVB/F1 181
    acgggatacaacgagtt B BG/cast/spre 182
    gggatacaacgagtttc C 129-Cast 183
    gggatacaccgagtttc D Spre 184
    53 gtatcttgggtgtcctg A 129/FVB/F1 185
    gtaacttgggtgttctg B B6/F1/spre 186
    gggtgtcctgccccatc C 129 187
    gggtgttctgttttatc D Spre 188
    54 tgtccagttgttttgca A 129 189
    tgtccagtcgttttgca B B6/FVB/F1/spre 190
    aagacagccggaactct C 129 191
    aagacagcaggaactct D Spre 192
    55 tgataggaccaaagaga A 129/b6/f1 193
    cgataggactaaagaga B fvbn/f1 194
    tccaaagccagggccca C 129 195
    tccaaattcagggccca D Spre 196
    56 cctgggccagccagaag A 129/B6/cast 197
    cctgggcctgccagaag B FVB/F1/spre 198
    57 gattctctgagcctttg A 129/b6/f1 199
    gattctctaagcctttg B fvbn 200
    taccattttttagatga C 129 201
    taccatttcttagatga D Spre 202
    58 ctggaagggcagtgaat A 129 203
    tctggacgagggtgaat B B6/FVB 204
    59 tagttgcagcacaaatg A 129/B6 205
    tagttgtagcacaaatg B FVB/F1 206
    60 acactaccgcacagagc A 129/b6/fvbn/f1 207
    acactaccacacagagc B f1 208
    aataataagtaaataag C 129/ 209
    aataataaataaataag D cast 210
    61 tggcagtagttgttcat A 129/b6 211
    tggcagtaattgttcat B fvbn/f1 212
    aggtatgacgtcataag C 129-Cast 213
    aggtatgatgtcataag D Spre 214
    62 gttgttgttgaagattt A 129/fvbf1/f1 215
    ttgttgttg---aagattta B b6/f1 216
    gatagtacaggtgttgtca C 129... 217
    gatggtacaggtgtcgtca D Spre 218
    63 aatataatgtaacagga A 129/F1 219
    aatataatataacagga B BS/FVB/F1 220
    64 ttaaccatttatctgat A 129/FVB 221
    ttaaccatatatctgat B B6/F2 222
    65 agagcccagcaaagttc A 129/B6 223
    agagcccaacaaagttc B FVB/F1 224
    atcccgaaccggggaaaat C 129-b6 225
    atcccaaaccgggggaaat D cast-spre 226
    66 atgacaccaccacaacc A 129 227
    atgacaccgccacaacc B B6/FVB/F1 228
    67 aggcaaacagatataac A 129/FVB/F1 229
    aggcaaacggatataac B BE/cast/spre 230
    tgtattcactaataaga C 129-Cast 231
    tgtattcattaataaga D Spre 232
    68 ttggcgtatacttcata A 129/BG/F1 233
    ttggcgtacacttcata B FVB 234
    ctcaccacgctccatct C 129 235
    ctcaccaccctccatct D Cast-Spre 236
    69 atatctaaa----ggcacag A 129/FVB 237
    tatctacataaaggcac B B6/F1/cast/spre 238
    gtgtctcctagtctccc C B6-Cast 239
    gtgtctcccagtctccc D Spre 240
    70 atgagctgaccctccct A 129/B6/F3 241
    atgagctgcccctccct B FVB/F1 242
    ggacaacatttaattgg C 129-Cast 243
    ggacaacacttaattgg D Spre 244
    71 gctttaaaatttttatt A 129 245
    gctttaaattttttatt B B6/FVB/F1 246
    aaatttgttcctaaatg C 129 247
    aaatttgtacctaaatg D Cast-Spre 248
    72 gtgttgttctggcctcc A 129/FVB/spre 249
    gtgttgttttggcctcc B B6/F1 250
    73 tgaatgacaaaaagaca A 129/B6/FVB 251
    tgaatgacgaaaagaca B F1/cast 252
    B2 5′Rev ACTGAGCCATCTCWCCAG W=A+T
    101 acttaacttaagctggc A 129/ 253
    gtacttaa-----gctggcctg B b6/fvb/f1 254
    102 actctaatatcccacag A 129/fvbn/f1 255
    actctaatctcccacag B b6 256
    cggatcggctctagttc C 129/cast 257
    cggatcagctctagttc D spre 258
    103 tcaaaccaataaggagg A 129/b6/fvb/f1 259
    tcaaaccagtaaggagg B f1 260
    104 gtgtgtgtgtggggggg A 129/f1 261
    gtgtgtgtg---gggggggt B b6/fvbn 262
    cttaataataatttcat C 129/cast 263
    cttaataacaatttcat D spre 264
    105 gtgtctccatatgtgtg A 129/b6/f1 265
    gtgtctacacatgtgtg B fvbn 266
    106 aactcatcatgatggtt A 129/ 267
    aactcataatgatggtt B b6/fvbn/f1 268
    aactcatcacgatggtt C cast 269
    atcactcatagcccaga D 129/ 270
    atcacttatagcccaga F spre 271
    atcactcatatcccaga E cast 272
    107 catcttaccagcattga A 129/cast/spre 273
    catcttactagcattga B b6/fvbn/f1 274
    108 agtcagccggctctggc A 129/b6/f1 275
    agtcagccagctctggc B fvbn/f1 276
    gggtaggagtggggatgag C 129/ 277
    gggcaggagtgggggtgag E spre 278
    gggtaggagtgggggtgag D cast 279
    109 tcagtattgttcttctc A 129/f1/spre 280
    tcagtatttttcttctc B b6/fvbn/f1/cast 281
    110 agcagagactgagctcg A 129/ 282
    agcagagaccgagctcg B b6/fvbn/f1 283
    acaggggtcgattcgtc C 129/b6/fvbn/f1/cast 284
    acagggatcgattcgtc E spre 285
    acaggggtcgtttcgtc D f1 286
    111 tcccaaagcattcaagg A 129/b6/f1 287
    tcccaaagtattcaagg B fvbn/f2 288
    gaccagggttaatgact C 129/b6 289
    gaccagggctaatgact D cast/spre 290
    112 ctattaacagagtcgag A 129/b6/f1 300
    ctattaacggagtcgag B fvbn 301
    gtgatactggatgtctg C 129/b6 302
    gtgataccg-atgtctgg D cast/spre 303
    113 ctctctcgatagtctaa A 129/f1 304
    ctctctcgctagtctaa B b6/fvbn/f1/cast 305
    tctctcgatagtctaat C 129/ 306
    tctctcgctggtctaat D cast 307
    114 agatgcaaaattcttag A 129/ 308
    agatgcacagttcttag B b6/fvbn/f1 309
    115 ggaaaatgctcaggtag A 129/f1/cast/spre 310
    ggaaaatgttcaggtag B b6/fvbn 311
    116 tctgggcagagtgCagg A 129/ 312
    tctgggcagcgtgcagg B b6/fvb/f1 313
    117 tatggaacggttgcttc A 129/fvb 314
    tatggaactgttgcttc B b6/f1 315
    aagcctggtacccgctg C 129/cast 316
    aagcctggcacccgctg D spre 317
    118 cattcttctttttctga A 129/ 318
    cattcttcgttttctga B b6/fvbn/f1/cast/spre 319
    ctgcaggcttgtctgtg C 129/CAST 320
    ctgcaggtttgtctgtg D spre 321
    119 tgccatttcctataaca A 129/f1 322
    tgccatttgctataaca B b6/fvbn 323
    120 ccgccacacccgctcct A 129/b6 324
    ccgccacagccgctcct B fvbn/f1 325
    121 caaataatgctagttat A 129/b6/f1 326
    caaataatgttagttat B fvbn 327
    122 ggatgttgacacgctac A 129/fvbn/f1 328
    ggatgttgtcacgctac B b6/f1 329
    catgtgtc-caacgccat C 129/ 330
    catgtgtcacaacgcca D cast/spre 331
    123 aaaggggccttaaagga A 129/fvbf1/f1 332
    aaaggggctttaaagga B b6 333
    tgaaaagttcttttcat C 129/cast 334
    tgaaaagtacttttcat D spre 335
    124 cctctctatgtgtgagc A 129/b6/f1 336
    cctctctacgtgtgagc B fvbn 337
    gaagttttaggagattct-t C 129/ 338
    gaagatttaggagagtctc D spre 339
    125 agggatgtattttgtta A 129/fvbn/f1 340
    agggatgtgttttgtta B b6 341
    acaattcaaatgtatat C 129/cast 342
    acaattcatatgtatat D spre 343
    126 cttgcctaacctgcaca A 129/b6/f1 344
    cttgcctagcctgcaca B fvbn 345
    caacagc---acctcatatc C 129/b6/cast 346
    acagcggtgcctcgtat D spre 347
    127 actcacagtgtcagggc A 129/fvbn/f1/spre 348
    actcacagcgtcagggc B b6/cast 349
    128 ggctgctcctgtgtgtctg A 129/fvbf1/f1/cast 350
    ggctcttcctgtgtgtctg B b6 351
    ggctgctcctgtgtttctg C spre 352
    129 aatagatgcccttctga A 129/f1 353
    aatagatgccctcttga B b6/fvbn 354
    aatcgatgcccttctga C spre 355
    130 ttggtctagcaggtagc A 129/fvbf1/f1 356
    ttggtctaccaggtagc B b6 357
    agccttggctcttaaaa C 129/cast 358
    agccttggttcttaaaa D spre 359
    131 agtctctggcgcctttg A 129/fvbn/f1/Cast/spre 360
    agtctctgccgcctttg B b6 361
    132 tagcaggaggcacagctta A 129/ 362
    aagcaggaggcacaactta B b6 363
    aagcaggaggcacagctta C fvb/f1/CAST 364
    tagcaggaggcacagcttg D spre 365
    133 aggagagaccggactcc A 129/fvb/f1 366
    aggagagagcggactcc B b6 367
    134 tacaagtcatccttcct A 129/b6/f1 368
    tacaagtcgtccttcct B fvbn/f1 369
    atacctccctcagacaa C 129/cast 370
    atacctcc-tcagacaag D spre 371
    135 aaacaaacaaacaaacc A 129/b6/f1/cast/spre 372
    aaacaaaccaacaaacc B fvbn 373
    gtgcgccaccatgacca C 129/cast 374
    gtgcgccatcatgacca D spre 375
    136 ggctttcccattagtgg A 129/ 376
    ggctttcctattagtgg B b6/fvbn/f1 377
    ccctcacctctctctca C 129/cast 378
    ccctcacccctctctca D spre 379
    137 aatctctcgcgttcatt A 129/fvbn/f1 380
    aatctctcacgttcatt B b6 381
    138 aatgataccgatcctta A 129/f1 382
    aatgatacagatcctta B b6/fvbn 383
    ataaaactgcattcgtg C 129/b6 384
    ataaaactacattcgtg D cast/spre 385
    B1Musch AGTTCCAGGACAGCCAGG
    201 atatctccgactttgaa A 129/cast 386
    atatctccaactttgaa B b6/fvb/f1/spre 387
    tggccctgcagagtctg C 129-Cast 388
    tggctctgcagag-ctgg D Spre 389
    202 caatggatc---aaagatgc A 129-FVB-F1 390
    atggatcaacaaagatg B B6 391
    gctgcctc--aaggtataa C 129/be 392
    ctgcctcttaaggtata D cast/spre 393
    203 acctatggctcctcatc A 129/b6/f1 394
    acctatggttcctcatc B fvb 395
    tcttctcccctgcttta C 129-Cast 396
    tcttctcac-tgctttag D spre 397
    204 ccgc-ataaaaagctgag A FVB-F1 398
    ccgccataaaa-gctgag B B6-F1 399
    agaatatagggtttttt C 129/cast 400
    tagaatacag--ttttttt D spre 401
    205 agagttgctgtgcaggg A 129/b6/f1 402
    agagttgccgtgcaggg B fvb/cast 403
    agagttgcagtgcaggg C spre 404
    206 taagcagtgttcttggc A 129-B6-F1 405
    taagcagtattcttggc B FVBN 406
    tcttctcccctgcttta C 129/cast 407
    tcttctcac-tgctttag D spre 408
    207 tttttttttattattga A 129/fvb/f1 409
    tttttttt-attattgaa B b6 410
    tgtggtacgcacatctg C 129-Cast 411
    tgtggtacacacatctg D Spre 412
    208 agactcttagacttctg A 129/f1 413
    agactcttaggcttctg B b6/fvb/f1 414
    agactcataagcttctg C spre 415
    agactcttaggcttctg D cast 416
    209 cacgtacccgaacgtga A 129-B6 417
    cacgtacctgaacgtga B FVB-F1 418
    attacggtttgtcgtca C 129/CAST 419
    attacggttggtcgtca D spre 420
    210 ccaagatacgaaaccag A 129/f1/cast/spre 421
    ccaagatatgaaaccag B b6 422
    211 tgcaatgaccagcaacc A 129/b6 423
    tgcaacgaccagcaacc B fvb/f1/cast 424
    tgtaacgaccaacaact C spre 425
    212 tctaaagggaaagatgg A 129-FVB 426
    tctaaagg-aaagatgga B B6-F1 427
    213 ctggactcatacataca A 129-FVB-F1 428
    ctggactcgtacataca B B6-F1-Cast/SPRE 429
    agtttggtcccctggac C 129/FVB/BG-F1-Cast 430
    agtttggtttcctggac D Spre 431
    214 tatagcttcatgtaaaa A 129/fvb/f1/cast/spre 432
    tatagctttatgtaaaa B b6 433
    215 tttttttt-attattgaa A 129 434
    tttttttttattattga B B6-FVB-F1 435
    actcattgccaatttaa C 129 436
    actcattcagaatttaa D spre/CAST 437
    216 atgcgtaatgggggcta A 129 438
    atgcgtaacgggggcta B bS/fvb/f1/cast/SPRE 439
    ataattgctcttttaaa C 129/b6/fvb/f1/cast 440
    gtaattgctcttttaaa D spre 441
    217 tctgattagtgatggat A 129-F1 442
    tctgatta-tgatggatt B B6 443
    agcagagtgtctcgtaa C 129 444
    agcagagtatctcgtaa D spre/CAST 445
    218 gctggcagatatcggta A 129/b6/f1 446
    gctggcaggtatcggta B fvb/cast 447
    219 aactgcaatgaccagca A 129-B6 448
    aactgcaacgaccagca B FVB-F1 449
    gctggtcattgcagttt C 129 450
    gttggtcgttacagttt D spre 451
    gctggtcgttgcagttt E cast 452
    220 gctggcagatatcggta A 129-B6-F1 453
    gctggcaggtatcggta B FVB 454
    atagaaagtccaccgtc C 129/cast 455
    atagaaagcccaccgtc D spre 456
    221 ttagtgaccgtgtaaac A 129/b6/f1 457
    ttagtgactgtgtaaac B fvb 458
    ggggaggagctttgttc C 129-Cast 459
    ggggaggatctttgttc D Spre 460
    222 ggcctggacacaaaagc A 129/fvb/f1 461
    ggcctggaaacaaaagc B b6 462
    cccttttctagtattgt C 129 463
    cccttttccagtattgt D Cast-Spre 464
    223 gaattggttttaggaat A 129-F1-Cast-Spre 465
    gaattggtattaggaat B B6 466
    224 acccagctttccatggt A 129/f1 467
    acccagctctccatggt B b6/fvb/CAST 468
    225 tcacgttcgggtacgtg A 129/b6/f1 469
    tcacgttcaggtacgtg B fvb/f1 470
    tgccttccggttggcaa C 129-Cast 471
    tgccttccagttggcaa D spre 472
    226 ttttatcatacaattgc A 129-F1 473
    ttttatcagacaattgc B B6-FVB-F1 474
    227 atcttctcttctttgag A 129/f1 475
    atcttctcctctttgag B b6/fvb 476
    cagtcctctgctttctC C 129-Cast 477
    cagtcctcagctttctc D Spre 478
    228 ccaagatacgaaaccag A 129/f1/spre 479
    ccaagatatgaaaccag B b6 480
    229 ggtattcaagggttact A 129/cast/spre 481
    ggtattca-gggttactg B b6/fvb 1bp del. 482
    230 acctatggctcctcatc A 129/b6/f1/cast 483
    acctatggttcctcatc B fvb 484
    231 ttttatcatacaattgc A 129/f1 485
    ttttatcagacaattgc B b6/fvb 486
    232 aaccagggcttaagtct A 129 487
    aaccagggattaagtct B b6/fvb/f1 488
    cagaaaaacagatatac C 129-BG-FVB-F1 489
    cagaaaaagagatatac D Spre 490
    234 tctgagcgtgagtgctg A 129/fvb 491
    tctgagcgcgagtgctg B b6/f1/cast/spre 492
    acctcagaagcggaggt C 129-B6-FVB-F1 493
    acctcggaaggggaggt D Spre 494
    acctcggaagcggaggt E Cast 495
    235 taactcgatcgctatca A 129-BG-F1 496
    taactcgcttgctatca B FVBN-Cast 497
    taactcgctcgctatca C Spre 498
    236 gaatttctcaacttctt A 129/fvb/f1/spre 499
    gaatttctgaacttctt B b6/f1 500
    237 caggggtccccaatttg A 129/f1/SPRE 500
    caggggtctccaatttg B b6/fvb 501
    238 ttttgctgtgc-aggcta A 129-B6-F1 502
    ttttactgtgccaggct B FVB 503
    gacagccctgtctcaaa C 129/cast 504
    agagaaaccctgtctca D spre 505
    239 gcaccggtctgagcagt A 129/f1 506
    gcaccggtttgagcagt B b6/fvb/f1 507
    ccgtgcccctgaacaat C 129-B6-FVB-F1-Cast 508
    ccgtgcccttgaacaat D Spre 509
    240 tcacgttcgggtacgtg A 129/b6/f1 510
    tcacgttcaggtacgtg B fvb/f1 511
    tgattcgctgggactct C 129-Cast 512
    tgattcgccgggactct D Spre 513
    241 ttgatatccgaggcctt A 129/bE/fvb/f1 514
    ttgatatctgaggcctt B f1/cAST/SPRE 515
    242 tccctgggccaagcata A 129/b6/fvb 516
    tccctgggtcaagcata B f1 517
    243 ttatggctgaggatcac A 129-B6-F1-Cast 518
    ttatggctgcggatcat B FVB 519
    ttatggcaggggatcac C Spre 520
    244 ctctctgcgctgaagca A 129/b6 521
    ctctctgctctgaagca B fvb/f1 522
    agatacagagatgtgtt C 129-BE-FVB-F1 523
    agatactgaggtgtgtt D Spre 524
    245 cgacatctggcagatgt A 129/f1 525
    cgacatctagcagatgt B b6/fvb 526
    gtcacaaatagtatttc C 129/cast 527
    gtcacaaagagtatttc D spre 528
    246 aaggtgtgtgcgtgtgt A 129/f1 529
    aaggtgtgcgcgtgtgt B fvb 530
    247 agtcttttttttcctga A 129-B6-FVB 531
    tagtc-tttttttt-cctgaa B F1 532
    248 caggctgtgggaggctt A 129/b6/f1 533
    caggctgcggaaggctt B fvb 534
    ctgtaagtcattcaata C 129-B6-FVB-F1-Cast 535
    ctgtaagtaattcaata D Spre 536
    249 caggggtccccaatttg A 129/f1 537
    caggggtctccaatttg B b6/fvb 538
    250 gactcatggccgccttg A 129 539
    gactcattgccgcctgg B B6-FVB-F1 540
    gactcctggccgcctgg C F1 541
    gactcctggctgcctgg D Spre 542
    gactcctggccgcctgg E Cast 543
    251 acagggga-ggaaggaag A 129 544
    acaggggaaggaaggaa B b6/fvb/f1 545
    252 ttgatatagattgattc A 129/b6/f1 546
    ttgatatatattgattc B fvb/f1 547
    atagaacagcaaagtaa C 129-B6-FVB-F1-Cast 548
    atagaacaacaaagtaa D Spre 549
    253 aacaagcatctatggat A 129/fvb/f1 550
    aacaagcacctatggat B b6 551
    DOP
    300 gagcaggttaagcgatg A 129/ 552
    gagcaggtgaagcgatg B B6 553
    301 ggcttccagcttgattc A 129/ 554
    ggcttccaacttgattc B B6 555
    302 agatagggatgaatccc A 129/ 556
    agataggggtgaatccc B B6 557
    303 tcattcaccgtttattg A 129/ 558
    tcattcactgtttattg B B6 559
    304 ctgacatactgcttagg A 129/ 560
    ctgacatattgcttagg B B6 561
    305 ctaggaaagcctaaatt A 129/ 562
    ctaggaaaacctaaatt B B6 563
    306 atgtcaggattttaaga A 129/ 564
    atgtcagggttttaaga B B6 565
    307 ggtttccaattggaaag A 129/ 566
    ggtttccagttggaaag B B6 567
    308 cgaggagtgcaaagcga A 129/ 568
    cgaggagtccaaagcga B B6 569
    309 tgtgtgtgtgtctgtct A 129/ 570
    tgtgtgtgcgtctgtct B B6 571
    310 gcaagatgcagctgcat A 129/ 572
    gcaagatgtagctgcat B B6 573
    311 gctggggctattctgta A 129/ 574
    gctggggccattctgta B B6 575
    312 caataacggacctgcct A 129/ 576
    caataacgaacctgcct B B6 577
    313 tagcctctctacatagg A 129/ 578
    tagcctctgtacatagg B B6 579
  • Other SNPs identified using the BJ1 DOP-PCR Primer include: [0193]
    SNPs present within DOP-PCR using primer BJ1
    Genotype of CEPH individuals:
    ASO name ASO sequence 12-01 104-01 884-01 1331-01 SEQ ID #
    3A-G CATCTATAGGTTCACT GT TT TT TT 580
    3A-T CATCTATATGTTCACTT 581
    5A-C GCCAACAACATTGAGA GG CG GG GG 582
    5A-G GCCAACAAGATTGAGAG 583
    7A-C GGGTCGTGCGTCCCCC TT CT TT TT 584
    7A-T GGGTCGTGTGTCCCCCT 585
    9A-A ATTGTCTCACATTTCT AA GG AA AA 586
    9A-G ATTGTCTCGCATTTCTT 587
    12A-C GGTGTGGTCGCAGAAG CC CC CT CT 588
    12A-T GGTGTGGTTGCAGAAGG 589
    15A-A TCATTGCCACACTTG AA GG AA GG 590
    15A-G TCATTGCCGCACTTGPA 591
    20A-A ATCTGTCTACAATGAT AG GG AA AG 592
    20A-G ATCTGTCTGCAATGATC 593
    22A-A GGCTGGGCACAGTGGC AA GG AA AA 594
    22A-G GGCTGGGCCCAGTGGCT 595
    34A-A CAGCCTGGAGAACAAG CC CC CC AC 596
    34A-C CAGCCTGGCGAACAAGT 597
    39A-C TTTGACACCCGGAAGC CT CC CC CC 598
    39A-T TTTGACACTCGGAAGCT 599
    40A-C CTGCCTTTCATACTGC CT TT CT TT 600
    40A-T CTGCCTTTTATACTGCC 601
    40B-C ACAATAGACGTTCCCC TT CT TT CT 602
    40B-T ACAATAGATGTTCCCCG 603
    41A-A GGTGTTTGATTTGTAC CC AC CC CC 604
    41A-C GGTGTTTGCTTTGTACT 605
    42A-A TCCAACTCAAAAAATG AT AA AT AT 606
    42A-T TCCAACTCTAAAAATGT 607
    44A-C GGGCCGCTCACAGTCC CC CT CC CC 608
    44A-T GGGCCGCTTACAGTCCA 609
    44B-C GCATGGCTCGTGGGTT CT CT TT CT 610
    44B-T GCATGGCTTGTGGGTTT 611
    46A-G GTTGGGAAGTGGAGCG GG TT GG TT 612
    46A-T GTTGGGAATTGGAGCGG 613
    50A-A AAGGGATGAGGATGTG AG AA AA AG 614
    50A-G AAGGGATGGGGATGTGA 615
    50B-A TCCTCGAGAGCTTTGC AG AG AA AG 616
    50B-G TCCTCGAGGGCTTTGCT 617
    51A-C TGACAATGCGTGCCC CT CC CC CC 618
    51A-T TGACAATGTGTGCCCAA 619
    53A-A TCCATGTCATAGATTT AG AA AA AA 620
    53A-G TCCATGTCGTAGATTTC 621
    66A-A TGGAGGACAGTGGAGGG TT TT TT AT 622
    66A-T TGGAGGACTGTGGAGGG 623
    69A-C ACCCATTTCCTGAAAA TT CT TT TT 624
    69A-T ACCCATTTTCTGAAAAT 625
    71A-G CTGAGTTCGGCACTGC TT GG GG TT 626
    71A-T CTGAGTTCTGCACTGCT 627
    71B-G ACCAGTTTGGCTCAAA GG TT TT GG 628
    71B-T ACCAGTTTTGCTCAAAG 629
    72A-A CCAATCAGAACGTGCA AA GG GG AA 630
    72A-G CCAATCAGAGCGTGCAG 631
    73A-A ACCCACACAGACACTG AA AT TT AT 632
    73A-T ACCCACACTGACACTGC 633
    81A-C GGACAAAGCGCTGGTG TT CT CC CT 634
    81A-T GGACAAAGTGCTGGTGT 635
    81C-C AGCTGGTCCCCCTMCCC TT CT CC CC 636
    81C-T AGCTGGTCTCCCTMCCC 637
    90A-A GGTGTAGTAAGCACAG AA AA AC AA 638
    90A-C GGTGTAGTCAGCACAGC 639
    91A-C AGCGAACACGGGGG CC CC TT CC 640
    91A-T AGCGAACATGGGGGAAA 641
    98D-A GTGACAGCACCAAACT GG AG GG GG 642
    98D-G GTGACAGCGCCAAACTT 643
    101A-C GTCTGTTGCTGTTATT TT TT TT CT 644
    101A-T GTCTGTTGTTGTTATTT 645
    111A-A ACCAGCATAGCCCAGA GG GG GG AG 646
    111A-G ACCAGCATGGCCCAGAG 647
    111B-A CGTAGGAGACAAGACC GG GG GG AG 648
    111B-G CGTAGGAGGCAAGACCT 649
    117A-A CTCTGCTGAATCTCCCA GG GG AG 650
    117A-G CTCTGCTGGATCTCCCA 651
    124A-A AAGCAAAGACTGATTC TT AT TT TT 652
    124A-T AAGCAAAGTCTGATTCA 653
    125A-A AGGCAGCTAGAGGGAG CC AA AC AA 654
    125A-C AGGCAGCTCGAGGGAGA 655
    130C-C TTCCATTCCGTTCAAT TT TT TT CC 656
    130C-T TTCCATTCTGTTCAATT 657
    130D-C TATTGTTACTGATTTT CT CT CT TT 658
    130D-T TATTGTTATTGATTTTG 659
    136A-A GAGCTTTCAGAGGCTG AA AG AG AG 660
    136A-G GAGCTTTCGGAGGCTGA 661
    137A-A GGGGGAAGATATGGAG GG AG AA AG 662
    137A-G GGGGGAAGGTATGGAGT 663
    143A-C CATGGCCTCGTGGGTT TC TC TT TC 664
    143A-T CATGGCCTTGTGGGTTT 665
    147B-A GGGKAGGGAGACCAGC AA AG GG GG 666
    147B-G GGGKAGGGGGACCAGCT 667
    147C-A GCAGTGTCAGTGTGGG TT AT AA AT 668
    147C-T GCAGTGTCTGTGTGGGT 669
    147D-A ACACCAGCACTTTGAT AA AG GG AG 670
    147D-G ACACCAGCGCTTTGATC 671
    151A-A CCTTCTGCAACCACAC GG GG AG AG 672
    151A-G CCTTCTGCGACCACACC 673
    163A-A AAATTCGCAGGAGCCG GG AG GG GG 674
    163A-G AAATTCGCGGGAGCCGA 675
    164B-A AGGTCTAGACGCTCAC AG GG AG GG 676
    164B-G AGGTCTAGGCGCTCACC 677
    164C-A GGAGGAACACTTCAAA GG AG GG GG 678
    164C-G GGAGGAACGCTTCAAAC 679
    170A-A TTTGTGCTATACCTTG AA AG AG AG 680
    170A-G TTTGTGCTGTACCTTGA 681
    179A-C ATGATGCACACACCCT CT CC TT CC 682
    179A-T ATGATGCATACACCCTG 683
    181B-C TATTGCTCCGCCTCCT CT TT CC TT 684
    181B-T TATTGCTCTGCCTCCTC 685
    181D-C CTCAGAGACTGTGTGC CG CC CC CC 686
    181D-G CTCAGAGAGTGTGTGCC 687
    187A-C ATCTTCTGCGTCACTC CT CT CC CC 688
    187A-T ATCTTCTGTGTCACTCA 689
    187B-A CAGCATCTAGTAACCA AG AA GG AG 690
    187B-G CAGCATCTGGTAACCAC 691
    190A-C ATTAGTGCCAAATACA CC CC CT CT 692
    190A-T ATTAGTGCTAAATACAT 693
    195B-A TGCTCCACAGCAGCCG AT TT TT TT 694
    195B-T TGCTCCACTGCAGCCGT 695
    196A-A TAGGGGAGAATCTGTT CC AC AC AA 696
    196A-C TAGGGGAGCATCTGTTT 697
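  • Each pair of ASOs in the table above differs at the polymorphic residue. A minimal sketch for locating that residue, assuming the two ASOs of a pair are aligned at their 5′ ends as they are written in the table (the function name `variant_positions` is hypothetical):

```python
def variant_positions(aso_a: str, aso_b: str):
    """Return (index, base_a, base_b) for every mismatch between two ASOs.

    The comparison runs over the shared length only, so an ASO that is one
    residue longer at its 3' end does not produce a spurious mismatch.
    Indices are 0-based from the 5' end.
    """
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(aso_a.upper(), aso_b.upper()))
        if a != b
    ]
```

    Applied to the 3A-G and 3A-T sequences listed above, `variant_positions("CATCTATAGGTTCACT", "CATCTATATGTTCACTT")` returns `[(8, 'G', 'T')]`, i.e. the ninth residue carries the G/T polymorphism.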
  • The invention also encompasses a composition comprising a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment including a (N)x-TARGET polynucleotide structure as described above, i.e., wherein the TARGET portion is identical in all of the DNA fragments of each RCG, the TARGET portion includes at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein each N is any nucleotide residue. Preferably the TARGET portion includes at least 8 nucleotide residues. [0194]
  • In other aspects, the invention includes a method for performing DOP-PCR. The prior art DOP-PCR technique was originally developed to amplify the entire genome in cases where DNA was in short supply. That method is accomplished using a primer set wherein each primer has an arbitrarily selected six nucleotide residue portion at its 3′ end. Because this portion is so short, the complexity of the resultant product is extremely high, and the reaction amplifies essentially the entire genome. By increasing the length of the arbitrarily selected portion of the DOP-PCR primer from 6 nucleotide residues to 7, and preferably 8 or more, nucleotide residues, the complexity of the genome is significantly reduced. [0195]
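  • The effect of lengthening the arbitrarily selected portion can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an idealized random sequence with equal base frequencies, which real genomes do not follow, so the numbers indicate only the trend; the function name is hypothetical.

```python
def expected_site_spacing(target_len: int) -> int:
    """Expected distance in bases between perfect matches to one fixed
    sequence of length target_len, on one strand of an idealized random
    sequence in which A, C, G, and T are equally likely."""
    return 4 ** target_len


# Each added residue makes matching sites about four times rarer.
for k in (6, 7, 8):
    print(k, expected_site_spacing(k))
```

    Under this simplified model a 6-residue portion matches about once every 4,096 bases, a 7-residue portion about once every 16,384 bases, and an 8-residue portion about once every 65,536 bases, so fewer pairs of sites lie close enough together to be amplified.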
  • EXAMPLES Example 1 Identification and Isolation of SNPs
  • High allele frequency SNPs are estimated to occur in the human genome once every kilobase or less (Cooper et al., 1985). A method for identifying these SNPs is illustrated in FIG. 1. As shown in FIG. 1, inter-Alu PCR was performed on genomes isolated from three unrelated individuals. The PCR products were cloned, and a mini library was made for each of the 3 individuals. The library clone inserts were PCR-amplified and spotted on nylon filters. Clones were matched by hybridization into two sets of identical clones from each individual, for a total of 6 clones per matched clone set. These sets of clones were sequenced, and the sequences were compared in order to identify SNPs. This method of identifying SNPs has several advantages over the prior art PCR amplification methods. For instance, a higher quality sequence is obtained from cloned DNA than is obtained from cycle sequencing of PCR products. Additionally, every sequence represents a specific allele, rather than potentially representing a heterozygote. Finally, sequencing ambiguities, Taq polymerase errors, and other sources of sequence error particular to one representation of the sequence are reduced by application of an algorithm which requires that the same variant sequence be present in at least 2 of the 6 clones sampled. [0196]
  • In general, the Alu PCR method for identifying SNPs can be performed using genomic DNA obtained from independent individuals, unrelated or related. Briefly, Alu PCR is performed which yields a product having an estimated complexity of approximately 100 different single copy genomic DNA sequences and an average sequence length of between about 500 base pairs and 1 kilobase pair. The PCR products are cloned, and a mini library is made for each individual. Approximately 800 clones are selected from each library and transferred into a 96-well dish. Filter replicas of each plate are hybridized with PCR probes from individual clones selected from one of the libraries in order to create a matched clone set of 6 clones, 2 from each individual. Many sets of clones can be isolated from these libraries. The clones can be sequenced and compared to identify SNPs. [0197]
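  • The "at least 2 of the 6 clones" rule can be sketched as follows. This is a minimal illustration, not the algorithm actually used: it assumes the clone sequences of a matched set have already been aligned to equal length, and the function name `call_snps` and the toy sequences in the usage note are hypothetical.

```python
from collections import Counter


def call_snps(clones, min_support=2):
    """Report positions where two or more different bases are each seen
    in at least min_support clones of a matched clone set.

    A base seen in only one clone is treated as a likely sequencing or
    polymerase error and is ignored. Positions are 0-based.
    """
    length = len(clones[0])
    if any(len(c) != length for c in clones):
        raise ValueError("clone sequences must be pre-aligned to equal length")
    snps = []
    for pos in range(length):
        counts = Counter(c[pos].upper() for c in clones)
        supported = sorted(b for b, n in counts.items() if n >= min_support)
        if len(supported) >= 2:
            snps.append((pos, supported))
    return snps
```

    With the toy set `["ACGTA", "ACGTA", "ACATA", "ACATA", "ACGTA", "ACGTT"]`, the call returns `[(2, ['A', 'G'])]`: the G/A difference at position 2 is supported by two clones and is kept, while the single T at position 4 is discarded as a probable error.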
  • Methods [0198]
  • An Alu primer designated primer 8C was designed to produce an Alu PCR product having a complexity of approximately 100 independent products. Primer 8C (having the nucleotide sequence CTT GCA GTG AGC CGA GATC; SEQ ID NO: 3) is complementary with base pairs 218-237 of the Alu consensus sequence (Britten et al., 1994). In order to reduce the complexity of the product, however, the last base pair of the primer was selected to correspond to base pair 237 of the consensus sequence, a nucleotide which has been shown to be highly variable among Alu sequences. Primer 8C therefore produces a product having complexity lower than that produced using Alu primers which match a segment of the Alu sequence in which there is little variation in nucleotide sequence among Alu family members. [0199]
  • Preliminary experiments were conducted to estimate the complexity of the product produced by Alu PCR reaction with primer 8C on the CEPH Mega Yacs. These preliminary experiments confirmed that primer 8C produced a lower number of Alu PCR products than other Alu PCR primers closely matching less variable sequences in the Alu consensus. [0200]
  • Three libraries of Alu PCR products were produced from inter-Alu PCR reactions involving genomic DNA derived from three unrelated CEPH individuals designated 201, 1701, and 2301. The reactions were performed at an annealing temperature of 58° C. for 32 cycles using the 8C Alu primer. Each set of PCR reaction products was purified by phenol:chloroform extraction followed by ethanol precipitation. The products were shotgun cloned into the T-vector pCR2.1 (Invitrogen); electroporated into [0201] E. Coli strain DH10B Electromax ampicillin-containing LB agar plates. 768 colonies were picked from each of the three libraries into eight 96-well format plates containing LB+ampicillin and grown overnight. The following day, an equal volume of glycerol was added and the plates were stored at −80° C. An initial survey of the picked clones indicated an average insert size of between 500 base pairs and 1 kilobase pair.
  • To identify matching clones in each library, 1 microliter of an overnight culture made from each library plate well was subjected to PCR amplification using vector-derived primers. Amplified inserts were spotted onto Hybond™ N+ filters (Amersham) using a 96-pin replicating device such that each filter had 384 products present in duplicate. The DNA was subjected to alkali denaturation by standard methods and fixed by baking at 80° C. for 2 hours. Individual inserts derived from the library were radiolabeled by random hexamer priming and used as probes against the three libraries (6 filters per probe). Hybridization was carried out overnight at 42° C. in buffer containing 50% formamide as described in Sambrook et al. The following day, the filters were washed in 2× standard saline citrate (SSC), 0.1% SDS at room temperature for 15 minutes, followed by 2 washes in 0.1× SSC, 0.1% SDS at 65° C. for 45 minutes each. The filters were then exposed to Kodak X-OMAT X-ray film overnight. [0202]
  • Results [0203]
  • FIG. 2 shows the data obtained for identification of SNPs. The results of the gel electrophoresis of inter-Alu PCR genomic DNA products prepared using the 8C primer are shown in FIG. 2A. Mini libraries were prepared from the Alu PCR genomic DNA products. Colonies were picked from the libraries, and inserts were amplified. The inserts were separated by gel electrophoresis to demonstrate that each was a single insert. The gel is shown in FIG. 2B. [0204]
  • Once the individual amplified inserts were spotted on Hybond™ N+ filters, individual inserts were radiolabeled by random hexamer priming and used as probes against the entire contents of the three mini libraries. One of the filters, having 2 positive or matched clones, is shown in FIG. 2C. [0205]
  • Screening 330 base pairs of genomic DNA by the matched clone method led to the identification of 6 SNPs: 4 in single copy DNA and 2 in the flanking Alu sequence. These observations were consistent with the projected rate of SNP occurrence of 1 high frequency SNP per 1,000 base pairs or less. The single copy SNPs identified are presented below in Table I. [0206]
    TABLE I
    CEPH
    Individual  1                            2                            3                            4
    201         taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
                taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
    1701        taagtAtacaa (SEQ ID NO: 6)   cccacAgagaa (SEQ ID NO: 8)   aattgCttccc (SEQ ID NO: 9)   aaattCaatgt (SEQ ID NO: 11)
                taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgTttccc (SEQ ID NO: 10)  aaattCaatgt (SEQ ID NO: 11)
    2301        taagtGtacaa (SEQ ID NO: 5)   cccacAgagaa (SEQ ID NO: 8)   aattgCttccc (SEQ ID NO: 9)   aaattAaatgt (SEQ ID NO: 12)
                taagtGtacaa (SEQ ID NO: 5)   cccacGgagaa (SEQ ID NO: 7)   aattgTttccc (SEQ ID NO: 10)  aaattCaatgt (SEQ ID NO: 11)
  • To verify the identities of the SNPs shown in Table I, specific primers were synthesized which permitted amplification of each single copy locus. Cycle sequencing was then performed on PCR products from each of the three unrelated individuals, and the site of the putative SNP was examined. In all cases, the genotype of the individual derived by cycle sequencing was consistent with the genotype observed in the matched clone set. [0207]
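  • As an illustration of how the entries of Table I translate into genotypes, the following minimal Python sketch records the capitalized (polymorphic) base from each of the two sequences listed per individual and reports which individuals are heterozygous at each SNP. It assumes that the two rows shown for each individual represent that individual's two observed sequences; the names TABLE_I, genotype, and heterozygotes are hypothetical and are not part of the described method.

```python
# Polymorphic base (capitalized in Table I) for the two sequences listed
# for each CEPH individual, keyed by SNP column number (1-4).
TABLE_I = {
    "201":  {1: ("G", "G"), 2: ("G", "G"), 3: ("C", "C"), 4: ("C", "C")},
    "1701": {1: ("A", "G"), 2: ("A", "G"), 3: ("C", "T"), 4: ("C", "C")},
    "2301": {1: ("G", "G"), 2: ("A", "G"), 3: ("C", "T"), 4: ("A", "C")},
}


def genotype(alleles):
    """Return a sorted genotype string such as 'A/G'."""
    a, b = sorted(alleles)
    return f"{a}/{b}"


def heterozygotes(table, snp):
    """List the individuals carrying two different alleles at the given SNP."""
    return [ind for ind, snps in table.items() if len(set(snps[snp])) == 2]


if __name__ == "__main__":
    for snp in (1, 2, 3, 4):
        calls = {ind: genotype(snps[snp]) for ind, snps in TABLE_I.items()}
        print(snp, calls, "heterozygous:", heterozygotes(TABLE_I, snp))
```

  • Under these assumptions, every SNP in Table I is heterozygous in at least one of the three individuals, which is what allows the matched clone comparison to reveal it.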
  • Example 2 Allele-Specific Oligonucleotide Hybridization to Alu PCR SNPs
  • Methods [0208]
  • Inter-Alu PCR was performed using genomic DNA obtained from 136 members of 8 CEPH families (numbers 102, 884, 1331, 1332, 1347, 1362, 1413, and 1416) using the 8C Alu primer, as described above. The products from these reactions were denatured by alkali treatment (10-fold addition of 0.5 M NaOH, 2.0 M NaCl, 25 mM EDTA) and dot blotted onto multiple Hybond™ N+ filters (Amersham) using a 96-well dot blot apparatus (Schleicher and Schuell). For each SNP, a set of two allele-specific oligonucleotides, each a 17-residue oligonucleotide centered on the polymorphic nucleotide residue, was synthesized. Each filter was hybridized with 1 picomole of 32P kinase-labeled allele-specific oligonucleotide and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the opposite allele (Shuber et al., 1993). Hybridizations were carried out overnight at 52° C. in 10 mL TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4, pH 6.8, 5× Denhardt's solution, 40 micrograms/milliliter yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC wash buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na3PO4 pH 6.8) followed by 20 minutes at 52° C. (52° C.-52° C. is optimal). The blots were then exposed to Kodak X-OMAT AR X-ray film for 8-24 hours and genotypes were determined by the hybridization pattern. [0209]
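  • The construction of a pair of allele-specific oligonucleotides centered on the polymorphic residue can be sketched as follows. This is a minimal illustration only: the function name make_asos and its parameters are hypothetical, the sketch emits the sequence of the strand it is given (a reverse complement would be needed to target the opposite strand), and it does not model labeling or competitor ratios.

```python
def make_asos(flank5, alleles, flank3, length=17):
    """Build one oligonucleotide per allele, centered on the polymorphic base.

    flank5 / flank3: sequence immediately 5' and 3' of the polymorphic site.
    alleles: iterable of single-base alleles, e.g. "GA".
    length: total oligonucleotide length; must be odd so the SNP is central.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the SNP sits at the center")
    half = length // 2
    if len(flank5) < half or len(flank3) < half:
        raise ValueError(f"need at least {half} flanking bases on each side")
    left = flank5[-half:].lower()
    right = flank3[:half].lower()
    # The polymorphic base is capitalized, following the style of Table I.
    return {a.upper(): left + a.upper() + right for a in alleles}


if __name__ == "__main__":
    # The 11-base contexts of SEQ ID NOs: 5 and 6 are reproduced with length=11.
    print(make_asos("taagt", "GA", "tacaa", length=11))
```

  • With the default length of 17, eight flanking bases are taken from each side of the polymorphic residue.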
  • Results [0210]
  • The results of the genotyping and mapping are shown in FIG. 3. In order to determine the map location of the SNP, the genotype data determined from CEPH families number 884 and 1347 were compared to the CEPH genotype database version 8.1 (http://www.cephb.fr/cephdb/) by calculating a two-point lod score using the computer software program MultiMap version 2.0 running on a Sparc Ultra I computer. This analysis revealed a linkage to marker D3S1292 with a lod score of 5.419 at a theta value of 0.0. To confirm this location, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel (Research Genetics). This analysis placed marker CCRSNP1 at 4.40 cR from D3S3445 with a lod score greater than 15.0. Integrated maps from the genetic location database (Collins et al., 1996) indicated that the locations of the markers identified by these two independent methods are overlapping. These results support the mapping of even low frequency polymorphisms by two-point linkage to markers previously established on CEPH families. [0211]
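  • For readers unfamiliar with two-point lod scores, the following Python sketch computes the score for the simplest textbook case of phase-known, fully informative meioses, in which the likelihood at recombination fraction theta is theta^k (1 − theta)^(n − k) for k recombinants among n meioses. This is a simplified illustration and is not the computation performed by MultiMap, which handles general pedigrees; the function name two_point_lod is hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """LOD = log10[L(theta) / L(0.5)] for phase-known, informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # At theta = 0 any observed recombinant makes the likelihood zero.
        if recombinants > 0:
            return float("-inf")
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + nonrecombinants * math.log10(1.0 - theta))
    log_null = meioses * math.log10(0.5)
    return log_likelihood - log_null


if __name__ == "__main__":
    # Purely arithmetic illustration: n nonrecombinant meioses at theta = 0
    # give n * log10(2).
    print(round(two_point_lod(0, 18, 0.0), 3))
```

  • As a purely arithmetic observation, 18 nonrecombinant phase-known meioses at theta = 0.0 would give a score of about 5.419 under this simplified model; this is offered only to show the scale of the statistic and is not a statement about the underlying family data.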
  • Of the dot blots performed on each CEPH family PCR, two families were informative at this SNP locus, namely families number 884 and 1347. The dot blot is shown in FIG. 3A. Lines are drawn around signals representing CEPH family 884 on the dot blot shown in FIGS. 3A and 3B. Allele-specific oligonucleotide hybridizations were performed on the filters shown in FIGS. 3A and 3B under TMAC buffer conditions with G allele-specific oligonucleotide (FIG. 3A) and A allele-specific oligonucleotide (FIG. 3B). The pedigree of CEPH family number 884 with genotypes as scored from the filter shown in FIGS. 3A and 3B is shown in FIG. 3C. The DNA was not available for one individual in this pedigree, and that square is left blank. Mapping of CCRSNP1 was performed by two independent methods. First, genotype data from informative CEPH families numbers 884 and 1347 were compared to the CEPH genotype database version 8.1 by calculation of a two-point lod score. Second, PCR amplification of the CCRSNP1 marker was performed on the Gene Bridge 4 radiation hybrid panel. The markers yielding the highest lod scores in these analyses were D3S1292 and D3S3445, respectively, as shown in FIG. 3D. [0212]
  • The percentage of SNPs detected using the above-described methods is dependent on the number of chromosomes sampled, as well as the allele frequency. [0213]
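  • The dependence on sample size and allele frequency can be made concrete with a simple probability model. The sketch below assumes that chromosomes are sampled independently from a population in which the minor allele has frequency p, and that a SNP is detected only if both alleles appear among the n sampled chromosomes, giving a detection probability of 1 − p^n − (1 − p)^n. This model and the function name detection_probability are illustrative assumptions, not part of the described method.

```python
def detection_probability(minor_allele_freq, chromosomes):
    """Probability that both alleles are seen among independently sampled chromosomes."""
    if not 0.0 <= minor_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if chromosomes < 0:
        raise ValueError("number of chromosomes must be non-negative")
    p = minor_allele_freq
    q = 1.0 - p
    # The SNP is missed when every sampled chromosome carries the same allele.
    return 1.0 - p ** chromosomes - q ** chromosomes


if __name__ == "__main__":
    # Three diploid individuals contribute six chromosomes.
    for freq in (0.5, 0.3, 0.1, 0.01):
        print(freq, round(detection_probability(freq, 6), 4))
```

  • Under this model, sampling more chromosomes raises the detection probability for every allele frequency, and rare alleles are the most likely to be missed in a small panel.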
  • Example 3 Confirmation of SNP Identity
  • Allele-specific oligonucleotides are synthesized based on standard protocols (Shuber et al., 1997). Briefly, polynucleotides of 17 bases centering on the polymorphic site are synthesized for each allele of a SNP. DNA dots of IRS-PCR or DOP-PCR products are affixed to a membrane and then hybridized to end-labeled allele-specific oligonucleotides under TMAC buffer conditions. These conditions are known to equalize the contribution of AT and GC base pairs to melting temperature, thereby providing a uniform temperature for hybridization of allele-specific oligonucleotides independent of nucleotide composition. [0214]
  • Using this methodology, genotypes of CEPH progenitors and their offspring are determined. The Mendelian segregation of each SNP marker confirms its identity as a SNP marker and provides a crude estimate of its relative allele frequency and, hence, its likely usefulness as a genetic marker. Markers which yield complex segregation patterns or show very low allele frequencies on CEPH progenitors are set aside for future analysis, and remaining markers are further characterized. [0215]
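  • A check for Mendelian segregation of the kind described above can be sketched in a few lines of Python. The sketch assumes an autosomal, biallelic marker with fully observed parental genotypes; the function names mendelian_consistent and inconsistent_children are hypothetical.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def inconsistent_children(father, mother, children):
    """Return the names of children whose genotypes violate Mendelian segregation."""
    return [name for name, gt in children.items()
            if not mendelian_consistent(father, mother, gt)]


if __name__ == "__main__":
    kids = {"child1": ("G", "A"), "child2": ("G", "G"), "child3": ("A", "A")}
    # With an A/G father and a G/G mother, an A/A child is not possible.
    print(inconsistent_children(("A", "G"), ("G", "G"), kids))
```

  • A marker for which several offspring are flagged in this way would fall into the category of complex segregation patterns that are set aside for later analysis.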
  • Example 4 Development of Detailed Information on Map Position and Allele Frequency for Each SNP
  • Two complementary methods are used to establish genetic map position for each marker. Each marker is genotyped on a number of CEPH families. The result is compared, using MultiMap (Matise et al., 1993, as described above) or other appropriate software, against the CEPH database to determine by linkage the most likely position of the SNP marker. [0216]
  • Allele frequencies are determined by hybridization with the standard worldwide panel which U.S. NIH currently is making available to researchers for standardization of allele frequency comparison. Allele-specific oligonucleotide methodology used for genetic mapping is used to determine allele frequency. [0217]
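  • Once genotypes have been scored on a panel, allele frequencies follow by simple allele counting. The following minimal sketch assumes diploid genotypes supplied as pairs of allele symbols; the function name allele_frequencies is hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting alleles across diploid genotypes."""
    counts = Counter(allele for gt in genotypes for allele in gt)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    panel = [("A", "G"), ("G", "G"), ("G", "G"), ("A", "A")]
    print(allele_frequencies(panel))
```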
  • Example 5 Development of a System for Scoring Genotype Using SNPs
  • After the identification of a set of SNPs, automated genotyping is performed. Genomic DNA of a well-characterized set of subjects, such as the CEPH families, is PCR-amplified using appropriate primers. These DNA samples serve as the substrate for system development. The DNA is spotted onto multiple glass slides for genotyping. This process can be carried out using a microarray spotting apparatus which can spot greater than 1,000 samples within a square centimeter area or more than 10,000 samples on a typical microscope slide. Each slide is hybridized with a fluorescently tagged allele-specific oligonucleotide under TMAC conditions analogous to those described above. The genotype of each individual is determined by the presence or absence of a signal for a selected set of allele-specific oligonucleotides. A schematic of the method is shown in FIG. 4. [0218]
  • PCR products are attached to the slide using any methods for attaching DNA to a surface that are known in the art. For instance, PCR products may be spotted onto poly-L-lysine-coated glass slides, and crosslinked by UV irradiation prior to hybridization. A second, more preferred method, which has been developed according to the invention, involves use of oligonucleotides having a 5′ amino group for each of the PCR reactions described above. The PCR products are spotted onto silane-coated slides in the presence of NaOH to covalently attach the products to the slide. This method is advantageous because a covalent bond is formed, which produces a stable attachment to the surface. [0219]
  • SNP-ASOs are hybridized under TMAC hybridization conditions with the RCGs covalently conjugated to the surface. The allele-specific oligonucleotides are labeled at their 5′-ends with a fluorescent dye (e.g., Cy3). After washing, detection of the fluorescent oligonucleotides is performed in one of two ways. Fluorescent images can be captured using a fluorescence microscope equipped with a CCD camera and automated stage capabilities. Alternatively, the data can be obtained using a microarray scanner (e.g., one made by Genetic Microsystems). A microarray scanner provides image analysis which can be converted to a digital (e.g., +/−) signal for each sample using any of several available software applications (e.g., NIH Image, ScanAlyze, etc.). The high signal/noise ratio for this analysis allows the determination of data in this mode to be straightforward and automated. These data, once exported, can be manipulated to conform with a format which can be analyzed by any of several human genetics applications such as CRI-MAP and LINKAGE software. Additionally, the methods may involve use of two or more fluorescent dyes or other labels which can be spectrally differentiated to reduce the number of samples which need to be analyzed. For instance, if four spectrally distinct fluorescent dyes (e.g., ABI Prism dyes 6-FAM, HEX, NED, ROX) are used, then four hybridization reactions can be performed in a single hybridization mixture. [0220]
  • Example 6 Reduction of Genome Complexity Using IRS-PCR or DOP-PCR
  • The initial step of the SNP identification method and the genotyping approach described above is to reduce the complexity of genomic DNA in a reproducible manner. The purpose of this step with respect to genotyping is to allow genotyping of multiple SNPs using the products of a single PCR reaction. Using the IRS-PCR approach, a PCR primer was synthesized which bears homology to a repetitive sequence present within the genome of the species to be analyzed (e.g., Alu sequence in humans). When two repeat elements bearing the primer sequence are present in a head-to-head fashion within a limited distance (approximately 2 kilobase pairs), the inter-repeat sequence can be amplified. The method has the advantage that the complexity of the resultant PCR product can be controlled by how closely the chosen primer sequence matches the consensus nucleotide sequence of the repeat element (that is, the closer to the repeat consensus, the more complex the PCR product). [0221]
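  • The principle of single-primer inter-repeat amplification can be illustrated with a small in silico sketch. The code below scans a template strand for exact occurrences of the primer and of its reverse complement and reports every pair oriented toward one another within a maximum product length. It is a deliberately simplified model: it assumes exact matching only (real priming tolerates some mismatches), and the function names and the 2,000 base pair default are illustrative assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def find_all(text, pattern):
    """Return start indices of all (possibly overlapping) exact matches."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def inter_repeat_amplicons(template, primer, max_len=2000):
    """Return (start, end) intervals amplifiable with a single primer.

    A product requires the primer sequence on the given strand and, downstream
    of it, the primer's reverse complement (a site on the opposite strand),
    with the whole product no longer than max_len.
    """
    template = template.upper()
    primer = primer.upper()
    forward_sites = find_all(template, primer)
    reverse_sites = find_all(template, reverse_complement(primer))
    amplicons = []
    for f in forward_sites:
        for r in reverse_sites:
            end = r + len(primer)
            if r >= f + len(primer) and end - f <= max_len:
                amplicons.append((f, end))
    return amplicons


if __name__ == "__main__":
    toy = "TT" + "AAGCCT" + "GATTACA" * 3 + "AGGCTT" + "CC"
    print(inter_repeat_amplicons(toy, "AAGCCT"))
```

  • In this model, a primer that matches fewer repeat copies yields fewer site pairs and therefore a less complex product, which mirrors the relationship between primer choice and complexity described above.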
  • In detail, a 50 microliter reaction for each sample was set up as follows: [0222]
    distilled, deionized H2O (ddH2O) 30.75 μl
    10X PCR Buffer 5 μl
    (500 mM KCl, 100 mM Tris-HCl pH 8.3, 15 mM MgCl2, 0.1% gelatin)
    1.25 mM dNTPs 7.5 μl
    20 μM Primer 8C 1.5 μl
    Taq polymerase (1.25 units) 0.25 μl
    Template (50 ng genomic DNA in ddH2O) 5.0 μl
    Total 50 μl
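  • When many samples are processed, the per-reaction volumes listed above are typically combined into a master mix without template. The following sketch scales those volumes and computes final concentrations from the stock values given in the table. The 10% overage, the dictionary name IRS_PCR_MIX_UL, and the function names are illustrative assumptions and do not appear in the protocol.

```python
# Per-reaction volumes in microliters, taken from the reaction table above.
IRS_PCR_MIX_UL = {
    "ddH2O": 30.75,
    "10X PCR buffer": 5.0,
    "1.25 mM dNTPs": 7.5,
    "20 uM primer 8C": 1.5,
    "Taq polymerase": 0.25,
}
TEMPLATE_UL = 5.0  # added separately to each tube


def reaction_volume():
    """Total volume of one reaction, including template."""
    return sum(IRS_PCR_MIX_UL.values()) + TEMPLATE_UL


def master_mix(n_samples, overage=0.10):
    """Scale the template-free components for n samples plus pipetting overage.

    The overage fraction is a hypothetical convenience, not part of the protocol.
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in IRS_PCR_MIX_UL.items()}


def final_concentration(stock, volume_ul, total_ul):
    """Final concentration after dilution, in the same units as the stock."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    print(reaction_volume())
    print(master_mix(96))
    print(final_concentration(20.0, 1.5, reaction_volume()))  # primer, in uM
```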
  • The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler under the following conditions: [0223]
     1 min. 94° C.
    30 sec. 94° C.|
    45 sec. 58° C.|32 cycles
    90 sec. 72° C.|
    10 min. 72° C.
    Hold  4° C.
  • An aliquot of the reaction mixture was separated on an agarose gel to confirm successful amplification. [0224]
  • RCGs were also prepared using DOP-PCR with the following primer (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4) (wherein N is any nucleotide). DOP-PCR uses a single primer which is typically composed of 3 parts, herein designated tag-(N)x-TARGET. The TARGET portion is a polynucleotide which comprises at least 7, and preferably at least 8, arbitrarily-selected nucleotide residues, x is an integer from 0 to 9, and N is any nucleotide residue. Tag is a polynucleotide as described above. [0225]
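  • The three-part structure of the DOP-PCR primer can be examined programmatically. The sketch below splits the primer of SEQ ID NO: 4 into its tag, degenerate run, and TARGET portions, counts the number of distinct sequences the degenerate positions represent, and converts the primer into a regular expression for locating exact candidate sites. It assumes a single contiguous run of N residues and exact matching; the function names are hypothetical.

```python
import re

# Only the symbols used by the primer described here are mapped.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def normalize(primer):
    """Upper-case the primer and remove the spaces used for readability."""
    return primer.upper().replace(" ", "")


def split_dop_primer(primer):
    """Split a primer into (tag, N-run, TARGET), assuming one contiguous N run."""
    match = re.fullmatch(r"([ACGT]*)(N*)([ACGT]*)", normalize(primer))
    if match is None:
        raise ValueError("primer does not have the form tag-(N)x-TARGET")
    return match.groups()


def degeneracy(primer):
    """Number of distinct sequences represented by the degenerate positions."""
    return 4 ** normalize(primer).count("N")


def primer_to_regex(primer):
    """Compile a regular expression matching every sequence the primer represents."""
    return re.compile("".join(IUPAC[b] for b in normalize(primer)))


if __name__ == "__main__":
    dop = "CTC GAG NNN NNN AAG CGA TG"
    print(split_dop_primer(dop))
    print(degeneracy(dop))
```

  • For the primer of SEQ ID NO: 4, the TARGET portion is the 8-residue sequence AAGCGATG and x is 6, consistent with the ranges stated above.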
  • The initial rounds of DOP-PCR were performed at a low temperature, because the specificity of the reaction is determined primarily by the nucleotide sequence of the TARGET portion and the (N)x residues. A slow ramp time during these cycles ensures that the primers do not detach from the template prior to chain extension. Subsequent amplification rounds were carried out at a higher annealing temperature because the 5′ end of the DOP-PCR primer can also contribute to primer annealing. [0226]
  • The DOP-PCR method was performed using a reaction mixture comprising the following ingredients: [0227]
    distilled deionized H2O 24 μl
    10X PCR Buffer 5 μl
    1.25 mM dNTPs 8 μl
    20 μM Primer DOP-BJ1 (SEQ ID NO: 4) 7.5 μl
    Taq polymerase (1.25 units) 0.5 μl
    Template (50 ng genomic DNA in distilled deionized H2O) 5 μl
    Total 50 μl
  • The PCR reaction was performed, for example, in a Perkin Elmer 9600 thermal cycler using the following reaction conditions: [0228]
      1 min. 94° C.
      1 min. 94° C.|
    1.5 min. 45° C.|5 cycles
      2 min. ramp to 72° C.|
      3 min. 72° C.|
      1 min. 94° C.|
    1.5 min. 58° C.|35 cycles
      3 min. 72° C.|
     10 min. 72° C.
    Hold  4° C.
  • Example 7 Attachment of PCR Products to a Solid Support
  • Once the complexity of the genomic DNA from an individual has been reduced, it can be attached to a solid support in order to facilitate hybridization analysis. One method of attaching DNA to a solid support involves spotting PCR products onto a nylon membrane. This protocol was performed as follows: [0229]
  • Upon completion of the PCR reaction (typically in a 50 μl reaction mixture), a 10-fold amount of denaturing solution (500 mM NaOH, 2.0 M NaCl, 25 mM EDTA) and a small amount (5 μl) of India Ink were added. Sixty microliters of product was applied to a pre-wetted Hybond™ N+ membrane (Amersham) using a Schleicher and Schuell 96-well dot blot apparatus. The membrane was immediately removed and placed DNA side up on top of Whatman 3MM paper saturated with 2× SSC for 2 minutes. The filters were air-dried and the DNA was fixed to the membrane by baking in an 80° C. oven for 2 hours. The membranes were then used for hybridization. [0230]
  • Another method for attaching nucleic acids to a support involves the use of microarrays. This method attaches minute quantities of PCR product samples onto a glass slide. The number of samples that can be spotted is greater than 1000/cm2, and therefore over 10,000 samples can be analyzed simultaneously on a glass slide. To accomplish this, pre-cleaned glass slides were placed in a mixture of 80 ml dry xylene, 32 ml 96% 3-glycidoxy-propyltrimethoxy silane, and 160 μl 99% N-ethyldiisopropylamine at 80° C. overnight. The slides were rinsed for 5 minutes in ethyl acetate and dried at 80° C. for 30 minutes. An equal volume of 0.8 M NaOH (0.6 M NaOH and 0.6-0.8 M KOH also work) was added directly to the PCR product (which contained a 5′ amino group incorporated into the PCR primer) and the components were mixed. The resulting solution was spotted onto a glass slide under humid conditions. At the earliest opportunity, the slide was placed in a humid chamber overnight at 37° C. The next day, the slide was removed from the humid chamber and kept at 37° C. for an additional 1 hour. The slide was incubated in an 80° C. oven for 2.5 hours, and then washed for 5 minutes in 0.1% SDS. The slide was washed for an additional 5 minutes in ddH2O and air dried. Attachment to the slide was monitored by OliGreen staining (obtained from Molecular Probes), which specifically binds single-stranded DNA. [0231]
  • Example 8 Hybridization Using Allele Specific Oligonucleotides for Each SNP
  • In order to determine the genotype of an individual at a selected SNP locus, we employed allele-specific oligo hybridizations. Using this method, 2 hybridization reactions were performed at each locus. The first hybridization reaction involved a labeled (radioactive or fluorescent) SNP-ASO (typically 17 nucleotide residues) centered around and complementary to one allele of the SNP. To increase specificity, a 20 to 50-fold excess of non-labeled SNP-ASO complementary to the opposite allele of the SNP was included in the hybridization mixture. For the second hybridization, the allele specificity of the previously labeled and non-labeled SNP-ASOs was reversed. Hybridization occurred in the presence of TMAC buffer, which has the property that oligonucleotides of the same length have the same annealing temperature. [0232]
  • Specifically, for analysis of each SNP, a pair of SNP allele-specific oligos (SNP-ASOs) consisting of two 17mers centered around the polymorphic nucleotide was synthesized. Each filter was hybridized with 20 pmol 33P kinase-labeled SNP-ASO (0.66 pmol/ml) and a 50-fold excess of non-labeled competitor oligonucleotide complementary to the other allele of the SNP. Hybridizations were performed overnight at 52° C. in 10 ml TMAC buffer (3.0 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4 pH 6.8, 5× Denhardt's solution, 40 μg/ml yeast RNA). Blots were washed for 20 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM Na3PO4 pH 6.8) followed by 20 minutes washing at 52° C. The blots were exposed to Kodak X-OMAT AR X-ray film for 8-24 hours, and genotypes were determined by analyzing the hybridization pattern. [0233]
  • Example 9 Scoring the Hybridization Pattern for Each Sample to Determine Genotype
  • Hybridization of SNP-ASOs (2 for each locus) to IRS-PCR or DOP-PCR products of several individuals has been performed. The final step in this process is to determine if a positive or negative signal exists for each hybridization for an individual and then, based on this information, determine the genotype for that particular locus. Essentially, all of the detection methods described herein can be reduced to a digital image file, for example using a microarray reader or using a phosphorimager. Presently, there are several software products which will overlay a grid onto the image and determine the signal strength value at each element of the grid. These values are imported into a spreadsheet program, like Microsoft Excel™, and simple analysis is performed to assign each signal a + or − value. Once this is accomplished, an individual's genotype can be determined by its pattern of hybridization to the SNP alleles present at a given locus. [0234]
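  • The scoring logic described above can be sketched as follows: each of the two hybridization signals for a sample is converted to a + or − value by comparison with a threshold, and the pair of values determines the genotype. The single fixed threshold, the allele labels, and the function names are illustrative assumptions; in practice the cutoff would be chosen from the signal distribution of the particular experiment.

```python
def is_positive(signal, threshold):
    """Assign a + (True) or - (False) value to a grid element's signal."""
    return signal >= threshold


def call_genotype(signal_allele1, signal_allele2, threshold, alleles=("G", "A")):
    """Call a genotype from the two allele-specific hybridization signals.

    Returns 'G/G', 'G/A', or 'A/A' for the default allele labels, or None
    when neither probe gives a positive signal (no call).
    """
    first, second = alleles
    pos1 = is_positive(signal_allele1, threshold)
    pos2 = is_positive(signal_allele2, threshold)
    if pos1 and pos2:
        return f"{first}/{second}"
    if pos1:
        return f"{first}/{first}"
    if pos2:
        return f"{second}/{second}"
    return None


if __name__ == "__main__":
    # Hypothetical signal values for three samples, each as (G probe, A probe).
    samples = {"s1": (950, 40), "s2": (700, 820), "s3": (30, 25)}
    for name, (g_signal, a_signal) in samples.items():
        print(name, call_genotype(g_signal, a_signal, threshold=200))
```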
  • Example 10 Genomic Analysis Using DOP-PCR
  • Genomic DNA isolated from approximately 40 individuals was subjected to DOP-PCR using primer BJ1 (CTC GAG NNN NNN AAG CGA TG) (SEQ ID NO: 4). 100 microliters of each DOP-PCR mixture was precipitated by addition of 10 microliters of 3 M sodium acetate (pH 5.2) and 110 microliters of isopropanol, and the samples were stored at −20° C. for at least 1 hour. The samples were spun down in a microcentrifuge for 30 minutes and the supernatant was removed. The pellets were rinsed with 70% ethanol and spun again for 30 minutes. The supernatant was removed and the pellets were air-dried overnight at room temperature. [0235]
  • The pellets were then resuspended in 12 microliters of distilled water and stored at −20° C. until denatured by the addition of 3 microliters of 2N NaOH/50 mM EDTA and maintained at 37° C. for 20 minutes and then at room temperature for 15 minutes. The samples were then spotted onto nylon-coated glass slides using a Genetic Microsystems GMS417 microarrayer. Upon completion of the spotting, the slides were placed in an 80° C. vacuum oven for 2 hours, and then stored at room temperature. A set of 2 allele-specific SNP-ASOs consisting of two 17mers centered around a polymorphic nucleotide residue was synthesized. Each slide was prehybridized for 1 hour in Hyb Buffer (3 M TMAC/0.5% SDS/1 mM EDTA/10 mM NaPO4/5× Denhardt's solution/40 μg/ml yeast RNA) followed by hybridization with 0.66 picomoles per milliliter 33P kinase-labeled SNP-ASO and a 50-fold excess of cold-competitor SNP-ASO of the opposite allele in Hyb Buffer. Hybridizations were carried out overnight at 52° C. The slides were washed twice for 30 minutes at room temperature in TMAC Wash Buffer (3 M TMAC, 0.6% SDS, 1 mM EDTA, 10 mM NaPO4 pH 6.8) followed by 20 minutes at 54° C. The slides were exposed to Kodak BioMax MR X-ray film. The results are shown in FIG. 8. The genotypes were determined by the hybridization patterns shown in FIG. 8 wherein loci are indicated. [0236]
  • The foregoing written specification is considered to be sufficient to enable one skilled in the art to practice the invention. The present invention is not limited in scope by the examples provided, since the examples are intended as illustrations of various aspects of the invention and other functionally equivalent embodiments are within the scope of the invention. Various modifications of the invention in addition to those shown and described herein will become apparent to those skilled in the art from the foregoing description and fall within the scope of the appended claims. The advantages and objects of the invention are not necessarily encompassed by each embodiment of the invention. [0237]
  • All references, patents and patent publications that are recited in this application are incorporated in their entirety herein by reference. [0238]
  • 1 691 1 9 DNA Homo Sapiens variation (4)...(6) n = a, c, g, or t 1 cagnnnctg 9 2 13 DNA Homo Sapiens 2 tttttttttt cag 13 3 19 DNA Homo Sapiens 3 cttgcagtga gccgagatc 19 4 20 DNA Homo Sapiens variation (7)...(12) n = a, c, g, or t 4 ctcgagnnnn nnaagcgatg 20 5 11 DNA Homo Sapiens 5 taagtgtaca a 11 6 11 DNA Homo Sapiens 6 taagtataca a 11 7 11 DNA Homo Sapiens 7 cccacggaga a 11 8 11 DNA Homo Sapiens 8 cccacagaga a 11 9 11 DNA Homo Sapiens 9 aattgcttcc c 11 10 11 DNA Homo Sapiens 10 aattgtttcc c 11 11 11 DNA Homo Sapiens 11 aaattcaatg t 11 12 11 DNA Homo Sapiens 12 aaattaaatg t 11 13 24 DNA Homo Sapiens 13 attaaaggcg tgcgccacca tgcc 24 14 18 DNA Homo Sapiens 14 tttatgaagg cataaaaa 18 15 18 DNA Homo Sapiens 15 tttatggagg cataaaaa 18 16 18 DNA Homo Sapiens 16 tttatgaagg tataaaaa 18 17 17 DNA Homo Sapiens 17 ctgggctgta ttcattt 17 18 17 DNA Homo Sapiens 18 ctgggctgca ttcattt 17 19 17 DNA Homo Sapiens 19 tctgcctcct gagtgct 17 20 17 DNA Homo Sapiens 20 tctacctccc aagtgct 17 21 17 DNA Homo Sapiens 21 tagctagaat caagctt 17 22 17 DNA Homo Sapiens 22 tagctagagt caagctt 17 23 17 DNA Homo Sapiens 23 gctgtgcaac aaatcac 17 24 17 DNA Homo Sapiens 24 cagctgtgca aatcacc 17 25 17 DNA Homo Sapiens 25 tttcgtgatg tttctat 17 26 17 DNA Homo Sapiens 26 tttcgtgaat gtttcta 17 27 17 DNA Homo Sapiens 27 cactgtctac atcttta 17 28 17 DNA Homo Sapiens 28 cactgtctcc atcttta 17 29 17 DNA Homo Sapiens 29 taacattctt gaagcca 17 30 17 DNA Homo Sapiens 30 taacattcct gaagcca 17 31 17 DNA Homo Sapiens 31 gcttccattt cctaagg 17 32 17 DNA Homo Sapiens 32 gcttccactt cctaagg 17 33 17 DNA Homo Sapiens 33 aggaatggca ataatcc 17 34 17 DNA Homo Sapiens 34 aggaatggcg ataatcc 17 35 17 DNA Homo Sapiens 35 aggaatgaca ataatcc 17 36 17 DNA Homo Sapiens 36 ttaaattcgt aaatgga 17 37 17 DNA Homo Sapiens 37 ttaaattcat aaatgga 17 38 17 DNA Homo Sapiens 38 taacattctt gaagcca 17 39 17 DNA Homo Sapiens 39 taacattcct gaagcca 17 40 17 DNA Homo Sapiens 40 ttctgtgact ccacttg 17 41 17 DNA Homo Sapiens 41 ttctgtgact ccatttg 17 42 17 DNA 
Homo Sapiens 42 ttccctgtct ccatttg 17 43 17 DNA Homo Sapiens 43 gtagtttgcc aggaacc 17 44 17 DNA Homo Sapiens 44 gtagtttgtc aggaacc 17 45 17 DNA Homo Sapiens 45 tgctactcct ctactcg 17 46 17 DNA Homo Sapiens 46 tgctattcct ctgctcg 17 47 17 DNA Homo Sapiens 47 cttgatcacc ctgatga 17 48 17 DNA Homo Sapiens 48 cttggtcacc ctaatga 17 49 17 DNA Homo Sapiens 49 gaggtggtgc agagtga 17 50 17 DNA Homo Sapiens 50 gaggtggcgc agagtga 17 51 17 DNA Homo Sapiens 51 gaggtggccc agagtga 17 52 17 DNA Homo Sapiens 52 cccactgaac cgcacag 17 53 17 DNA Homo Sapiens 53 cccactgagc tgcacag 17 54 17 DNA Homo Sapiens 54 cccactcagc cgcacag 17 55 17 DNA Homo Sapiens 55 tgaagacaca gccagcc 17 56 17 DNA Homo Sapiens 56 tgaagacgca gccagcc 17 57 17 DNA Homo Sapiens 57 tgaagacgaa gccagcc 17 58 17 DNA Homo Sapiens 58 agaagttggt accaggg 17 59 17 DNA Homo Sapiens 59 agaagttgtt accaggg 17 60 17 DNA Homo Sapiens 60 tatgattacg taatgtt 17 61 17 DNA Homo Sapiens 61 tatgattatg taatgtt 17 62 17 DNA Homo Sapiens 62 atgattccag tgagtta 17 63 17 DNA Homo Sapiens 63 atgattcctg tgagtta 17 64 17 DNA Homo Sapiens 64 catactatta actggaa 17 65 17 DNA Homo Sapiens 65 catattatta acaggaa 17 66 17 DNA Homo Sapiens 66 gtcaagaaca ggcaata 17 67 17 DNA Homo Sapiens 67 gtcaagaata ggcaata 17 68 17 DNA Homo Sapiens 68 cagactaggg aaccttc 17 69 17 DNA Homo Sapiens 69 cagacgaggg aaccttc 17 70 17 DNA Homo Sapiens 70 cagactaggg agccttc 17 71 17 DNA Homo Sapiens 71 tgtccagttg tttgcat 17 72 17 DNA Homo Sapiens 72 tgtccagtcg tttgcat 17 73 17 DNA Homo Sapiens 73 ggggtagcca gtttggt 17 74 17 DNA Homo Sapiens 74 ggggtagcaa gtttggt 17 75 17 DNA Homo Sapiens 75 caggaagctg tagctcc 17 76 17 DNA Homo Sapiens 76 caggaagccg tagctcc 17 77 17 DNA Homo Sapiens 77 cctgagcctg tctacct 17 78 17 DNA Homo Sapiens 78 cctgagcccg tctacct 17 79 17 DNA Homo Sapiens 79 taacattctt gaagcca 17 80 17 DNA Homo Sapiens 80 taacattcct gaagcca 17 81 17 DNA Homo Sapiens 81 ccaactgaac cgcacag 17 82 17 DNA Homo Sapiens 82 ccaactgagc tgcacag 17 83 17 DNA Homo Sapiens 83 gagctagctc 
acattct 17 84 17 DNA Homo Sapiens 84 gagttagctc acgttct 17 85 17 DNA Homo Sapiens 85 acgggggggt ggcgtta 17 86 17 DNA Homo Sapiens 86 acggggggtg gcgttaa 17 87 17 DNA Homo Sapiens 87 tagacagcca gcgtcac 17 88 17 DNA Homo Sapiens 88 tagatagcca gcatcac 17 89 18 DNA Homo Sapiens 89 gcttttcttg agagtggc 18 90 18 DNA Homo Sapiens 90 gcttttcttt agagtggc 18 91 18 DNA Homo Sapiens 91 gcttttcgtg agagtggc 18 92 17 DNA Homo Sapiens 92 ctacagataa agttata 17 93 17 DNA Homo Sapiens 93 ctacagatga agttata 17 94 17 DNA Homo Sapiens 94 tagacctgct gctatct 17 95 17 DNA Homo Sapiens 95 tagacctgtt gctatct 17 96 17 DNA Homo Sapiens 96 tgttgttctg gcctcca 17 97 17 DNA Homo Sapiens 97 tgttgttttg gcctcca 17 98 17 DNA Homo Sapiens 98 ttctgagaat ttgttag 17 99 17 DNA Homo Sapiens 99 ttctgagagt ttgttag 17 100 17 DNA Homo Sapiens 100 caggaagcag tagctcc 17 101 17 DNA Homo Sapiens 101 caggaagccg tagctcc 17 102 17 DNA Homo Sapiens 102 agagtcaggt aagttgc 17 103 17 DNA Homo Sapiens 103 agagtcagat aagttgc 17 104 17 DNA Homo Sapiens 104 agatttcaaa aagtttt 17 105 17 DNA Homo Sapiens 105 agattccaaa aggtttt 17 106 17 DNA Homo Sapiens 106 agatttcaaa aagtttt 17 107 17 DNA Homo Sapiens 107 cctgagggga gcaatca 17 108 17 DNA Homo Sapiens 108 cctgagggaa gcaatca 17 109 17 DNA Homo Sapiens 109 aaggtaagat aactaag 17 110 17 DNA Homo Sapiens 110 aaggtaaggt aactaag 17 111 17 DNA Homo Sapiens 111 ggactacaca gagaaac 17 112 17 DNA Homo Sapiens 112 ggactacata gagaaac 17 113 17 DNA Homo Sapiens 113 cccaggctac acgaggg 17 114 17 DNA Homo Sapiens 114 cccaggctac atgaggg 17 115 17 DNA Homo Sapiens 115 cttaccagtt gtgagac 17 116 17 DNA Homo Sapiens 116 cttaccactt gtgagac 17 117 17 DNA Homo Sapiens 117 cttaccagtc gtgagac 17 118 17 DNA Homo Sapiens 118 ctgccctcag gtcttta 17 119 17 DNA Homo Sapiens 119 ctgccctccg gtcttta 17 120 17 DNA Homo Sapiens 120 gcaataaaat tgtttta 17 121 17 DNA Homo Sapiens 121 gcaatgagat cgtttta 17 122 17 DNA Homo Sapiens 122 tgttctgtgg agacccc 17 123 17 DNA Homo Sapiens 123 tgttctgtag agacccc 17 124 17 DNA Homo 
Sapiens 124 cacattgaat caaagcc 17 125 17 DNA Homo Sapiens 125 cacattgagt caaagcc 17 126 17 DNA Homo Sapiens 126 ggactaccca cccgttc 17 127 17 DNA Homo Sapiens 127 gcgactgcac ccattct 17 128 17 DNA Homo Sapiens 128 gcgactgccc ccattct 17 129 17 DNA Homo Sapiens 129 cctgggccag ccaggaa 17 130 17 DNA Homo Sapiens 130 cctgggcctg ccaggaa 17 131 17 DNA Homo Sapiens 131 ccccaggtaa ccatctt 17 132 17 DNA Homo Sapiens 132 ccccaggtga ccatctt 17 133 17 DNA Homo Sapiens 133 ttctgtatat tagctga 17 134 17 DNA Homo Sapiens 134 tttctatatt aactgac 17 135 17 DNA Homo Sapiens 135 ggacccggac ggtcttc 17 136 17 DNA Homo Sapiens 136 ggacccggtc ggtcttc 17 137 17 DNA Homo Sapiens 137 gtccctaatg ttagcat 17 138 17 DNA Homo Sapiens 138 gtccccaatg tcagcat 17 139 17 DNA Homo Sapiens 139 acgggggggt ggcgtta 17 140 17 DNA Homo Sapiens 140 acggggggtg gcgttaa 17 141 17 DNA Homo Sapiens 141 tagacagcca gcgtcac 17 142 17 DNA Homo Sapiens 142 tagatagcca gcatcac 17 143 17 DNA Homo Sapiens 143 gattcttcgt gttcctt 17 144 17 DNA Homo Sapiens 144 gattcttcat gttcctt 17 145 17 DNA Homo Sapiens 145 tgtaaaaact tagaata 17 146 17 DNA Homo Sapiens 146 tgtaaaaatt tagaata 17 147 17 DNA Homo Sapiens 147 tgtgaaagcg ctcccaa 17 148 17 DNA Homo Sapiens 148 tgtgaaagtg ctcccaa 17 149 17 DNA Homo Sapiens 149 caaaggctca gagaatc 17 150 17 DNA Homo Sapiens 150 caaaggctta gagaatc 17 151 17 DNA Homo Sapiens 151 ttaattctct ccaaaca 17 152 17 DNA Homo Sapiens 152 ttaaggctct ccggaca 17 153 17 DNA Homo Sapiens 153 ctgccaccgt gcacaca 17 154 17 DNA Homo Sapiens 154 ctgccaccat gcacaca 17 155 17 DNA Homo Sapiens 155 ccaaatattc tgattcc 17 156 17 DNA Homo Sapiens 156 ccaaatattc ttttttt 17 157 17 DNA Homo Sapiens 157 atgagctgac cctccct 17 158 17 DNA Homo Sapiens 158 atgagctgcc cctccct 17 159 17 DNA Homo Sapiens 159 acactaggta aaagctc 17 160 17 DNA Homo Sapiens 160 acactaggca aaagctc 17 161 17 DNA Homo Sapiens 161 agacaccacg accgagg 17 162 17 DNA Homo Sapiens 162 agacaccaag accgagg 17 163 17 DNA Homo Sapiens 163 gcagcgtccg gttaagt 17 164 17 DNA Homo 
Sapiens 164 gcagcgtctg gttaagt 17 165 17 DNA Homo Sapiens 165 cagatactac aaggatg 17 166 17 DNA Homo Sapiens 166 tacagataca aggatgc 17 167 17 DNA Homo Sapiens 167 tcagctagtg tatctgt 17 168 17 DNA Homo Sapiens 168 tcacctagtg tatttgt 17 169 17 DNA Homo Sapiens 169 ttttttattt ttggatt 17 170 17 DNA Homo Sapiens 170 ttttaatttt tggattt 17 171 17 DNA Homo Sapiens 171 gatattgttt tcattta 17 172 17 DNA Homo Sapiens 172 gatattgtct tcattta 17 173 17 DNA Homo Sapiens 173 agacccggtg ctggtgt 17 174 17 DNA Homo Sapiens 174 agacccggcg ctggtgt 17 175 17 DNA Homo Sapiens 175 cttctaagct ttgtctt 17 176 17 DNA Homo Sapiens 176 cttctaagtt ttgtctt 17 177 17 DNA Homo Sapiens 177 agttggcaac cagcatg 17 178 17 DNA Homo Sapiens 178 agttggcatc cagcatg 17 179 17 DNA Homo Sapiens 179 ggtgaaatgg taattac 17 180 17 DNA Homo Sapiens 180 ggtgaaatag taattac 17 181 17 DNA Homo Sapiens 181 acgggatata acgagtt 17 182 17 DNA Homo Sapiens 182 acgggataca acgagtt 17 183 17 DNA Homo Sapiens 183 gggatacaac gagtttc 17 184 17 DNA Homo Sapiens 184 gggatacacc gagtttc 17 185 17 DNA Homo Sapiens 185 gtatcttggg tgtcctg 17 186 17 DNA Homo Sapiens 186 gtaacttggg tgttctg 17 187 17 DNA Homo Sapiens 187 gggtgtcctg ccccatc 17 188 17 DNA Homo Sapiens 188 gggtgttctg ttttatc 17 189 17 DNA Homo Sapiens 189 tgtccagttg ttttgca 17 190 17 DNA Homo Sapiens 190 tgtccagtcg ttttgca 17 191 17 DNA Homo Sapiens 191 aagacagccg gaactct 17 192 17 DNA Homo Sapiens 192 aagacagcag gaactct 17 193 17 DNA Homo Sapiens 193 tgataggacc aaagaga 17 194 17 DNA Homo Sapiens 194 cgataggact aaagaga 17 195 17 DNA Homo Sapiens 195 tccaaagcca gggccca 17 196 17 DNA Homo Sapiens 196 tccaaattca gggccca 17 197 17 DNA Homo Sapiens 197 cctgggccag ccagaag 17 198 17 DNA Homo Sapiens 198 cctgggcctg ccagaag 17 199 17 DNA Homo Sapiens 199 gattctctga gcctttg 17 200 17 DNA Homo Sapiens 200 gattctctaa gcctttg 17 201 17 DNA Homo Sapiens 201 taccattttt tagatga 17 202 17 DNA Homo Sapiens 202 taccatttct tagatga 17 203 17 DNA Homo Sapiens 203 ctggaagggc agtgaat 17 204 17 DNA Homo 
Sapiens 204 tctggacgag ggtgaat 17 205 17 DNA Homo Sapiens 205 tagttgcagc acaaatg 17 206 17 DNA Homo Sapiens 206 tagttgtagc acaaatg 17 207 17 DNA Homo Sapiens 207 acactaccgc acagagc 17 208 17 DNA Homo Sapiens 208 acactaccac acagagc 17 209 17 DNA Homo Sapiens 209 aataataagt aaataag 17 210 17 DNA Homo Sapiens 210 aataataaat aaataag 17 211 17 DNA Homo Sapiens 211 tggcagtagt tgttcat 17 212 17 DNA Homo Sapiens 212 tggcagtaat tgttcat 17 213 17 DNA Homo Sapiens 213 aggtatgacg tcataag 17 214 17 DNA Homo Sapiens 214 aggtatgatg tcataag 17 215 17 DNA Homo Sapiens 215 gttgttgttg aagattt 17 216 17 DNA Homo Sapiens 216 ttgttgttga agattta 17 217 17 DNA Homo Sapiens 217 gatagtacag gttgtca 17 218 17 DNA Homo Sapiens 218 gatggtacag gtcgtca 17 219 17 DNA Homo Sapiens 219 aatataatgt aacagga 17 220 17 DNA Homo Sapiens 220 aatataatat aacagga 17 221 17 DNA Homo Sapiens 221 ttaaccattt atctgat 17 222 17 DNA Homo Sapiens 222 ttaaccatat atctgat 17 223 17 DNA Homo Sapiens 223 agagcccagc aaagttc 17 224 17 DNA Homo Sapiens 224 agagcccaac aaagttc 17 225 17 DNA Homo Sapiens 225 atcccgaacc ggaaaat 17 226 17 DNA Homo Sapiens 226 atcccaaacc gggaaat 17 227 17 DNA Homo Sapiens 227 atgacaccac cacaacc 17 228 17 DNA Homo Sapiens 228 atgacaccgc cacaacc 17 229 17 DNA Homo Sapiens 229 aggcaaacag atataac 17 230 17 DNA Homo Sapiens 230 aggcaaacgg atataac 17 231 17 DNA Homo Sapiens 231 tgtattcact aataaga 17 232 17 DNA Homo Sapiens 232 tgtattcatt aataaga 17 233 17 DNA Homo Sapiens 233 ttggcgtata cttcata 17 234 17 DNA Homo Sapiens 234 ttggcgtaca cttcata 17 235 17 DNA Homo Sapiens 235 ctcaccacgc tccatct 17 236 17 DNA Homo Sapiens 236 ctcaccaccc tccatct 17 237 16 DNA Homo Sapiens 237 atatctaaag gcacag 16 238 17 DNA Homo Sapiens 238 tatctacata aaggcac 17 239 17 DNA Homo Sapiens 239 gtgtctccta gtctccc 17 240 17 DNA Homo Sapiens 240 gtgtctccca gtctccc 17 241 17 DNA Homo Sapiens 241 atgagctgac cctccct 17 242 17 DNA Homo Sapiens 242 atgagctgcc cctccct 17 243 17 DNA Homo Sapiens 243 ggacaacatt taattgg 17 244 17 DNA Homo 
Sapiens 244 ggacaacact taattgg 17 245 17 DNA Homo Sapiens 245 gctttaaaat ttttatt 17 246 17 DNA Homo Sapiens 246 gctttaaatt ttttatt 17 247 17 DNA Homo Sapiens 247 aaatttgttc ctaaatg 17 248 17 DNA Homo Sapiens 248 aaatttgtac ctaaatg 17 249 17 DNA Homo Sapiens 249 gtgttgttct ggcctcc 17 250 17 DNA Homo Sapiens 250 gtgttgtttt ggcctcc 17 251 17 DNA Homo Sapiens 251 tgaatgacaa aaagaca 17 252 17 DNA Homo Sapiens 252 tgaatgacga aaagaca 17 253 18 DNA Homo Sapiens 253 actgagccat ctcwccag 18 254 17 DNA Homo Sapiens 254 acttaactta agctggc 17 255 17 DNA Homo Sapiens 255 gtacttaagc tggcctg 17 256 17 DNA Homo Sapiens 256 actctaatat cccacag 17 257 17 DNA Homo Sapiens 257 actctaatct cccacag 17 258 17 DNA Homo Sapiens 258 cggatcggct ctagttc 17 259 17 DNA Homo Sapiens 259 cggatcagct ctagttc 17 260 17 DNA Homo Sapiens 260 tcaaaccaat aaggagg 17 261 17 DNA Homo Sapiens 261 tcaaaccagt aaggagg 17 262 17 DNA Homo Sapiens 262 gtgtgtgtgt ggggggg 17 263 17 DNA Homo Sapiens 263 gtgtgtgtgg ggggggt 17 264 17 DNA Homo Sapiens 264 cttaataata atttcat 17 265 17 DNA Homo Sapiens 265 cttaataaca atttcat 17 266 17 DNA Homo Sapiens 266 gtgtctccat atgtgtg 17 267 17 DNA Homo Sapiens 267 gtgtctacac atgtgtg 17 268 17 DNA Homo Sapiens 268 aactcatcat gatggtt 17 269 17 DNA Homo Sapiens 269 aactcataat gatggtt 17 270 17 DNA Homo Sapiens 270 aactcatcac gatggtt 17 271 17 DNA Homo Sapiens 271 atcactcata gcccaga 17 272 17 DNA Homo Sapiens 272 atcacttata gcccaga 17 273 17 DNA Homo Sapiens 273 atcactcata tcccaga 17 274 17 DNA Homo Sapiens 274 catcttacca gcattga 17 275 17 DNA Homo Sapiens 275 catcttacta gcattga 17 276 17 DNA Homo Sapiens 276 agtcagccgg ctctggc 17 277 17 DNA Homo Sapiens 277 agtcagccag ctctggc 17 278 17 DNA Homo Sapiens 278 gggtaggagt ggatgag 17 279 17 DNA Homo Sapiens 279 gggcaggagt gggtgag 17 280 17 DNA Homo Sapiens 280 gggtaggagt gggtgag 17 281 17 DNA Homo Sapiens 281 tcagtattgt tcttctc 17 282 17 DNA Homo Sapiens 282 tcagtatttt tcttctc 17 283 17 DNA Homo Sapiens 283 agcagagact gagctcg 17 284 17 DNA 
Homo Sapiens 284 agcagagacc gagctcg 17 285 17 DNA Homo Sapiens 285 acaggggtcg attcgtc 17 286 17 DNA Homo Sapiens 286 acagggatcg attcgtc 17 287 17 DNA Homo Sapiens 287 acaggggtcg tttcgtc 17 288 17 DNA Homo Sapiens 288 tcccaaagca ttcaagg 17 289 17 DNA Homo Sapiens 289 tcccaaagta ttcaagg 17 290 17 DNA Homo Sapiens 290 gaccagggtt aatgact 17 291 17 DNA Homo Sapiens 291 gaccagggct aatgact 17 292 17 DNA Homo Sapiens 292 ctattaacag agtcgag 17 293 17 DNA Homo Sapiens 293 ctattaacgg agtcgag 17 294 17 DNA Homo Sapiens 294 gtgatactgg atgtctg 17 295 17 DNA Homo Sapiens 295 gtgataccga tgtctgg 17 296 17 DNA Homo Sapiens 296 ctctctcgat agtctaa 17 297 17 DNA Homo Sapiens 297 ctctctcgct agtctaa 17 298 17 DNA Homo Sapiens 298 tctctcgata gtctaat 17 299 17 DNA Homo Sapiens 299 tctctcgctg gtctaat 17 300 17 DNA Homo Sapiens 300 agatgcaaaa ttcttag 17 301 17 DNA Homo Sapiens 301 agatgcacag ttcttag 17 302 17 DNA Homo Sapiens 302 ggaaaatgct caggtag 17 303 17 DNA Homo Sapiens 303 ggaaaatgtt caggtag 17 304 17 DNA Homo Sapiens 304 tctgggcaga gtgcagg 17 305 17 DNA Homo Sapiens 305 tctgggcagc gtgcagg 17 306 17 DNA Homo Sapiens 306 tatggaacgg ttgcttc 17 307 17 DNA Homo Sapiens 307 tatggaactg ttgcttc 17 308 17 DNA Homo Sapiens 308 aagcctggta cccgctg 17 309 17 DNA Homo Sapiens 309 aagcctggca cccgctg 17 310 17 DNA Homo Sapiens 310 cattcttctt tttctga 17 311 17 DNA Homo Sapiens 311 cattcttcgt tttctga 17 312 17 DNA Homo Sapiens 312 ctgcaggctt gtctgtg 17 313 17 DNA Homo Sapiens 313 ctgcaggttt gtctgtg 17 314 17 DNA Homo Sapiens 314 tgccatttcc tataaca 17 315 17 DNA Homo Sapiens 315 tgccatttgc tataaca 17 316 17 DNA Homo Sapiens 316 ccgccacacc cgctcct 17 317 17 DNA Homo Sapiens 317 ccgccacagc cgctcct 17 318 17 DNA Homo Sapiens 318 caaataatgc tagttat 17 319 17 DNA Homo Sapiens 319 caaataatgt tagttat 17 320 17 DNA Homo Sapiens 320 ggatgttgac acgctac 17 321 17 DNA Homo Sapiens 321 ggatgttgtc acgctac 17 322 17 DNA Homo Sapiens 322 catgtgtcca acgccat 17 323 17 DNA Homo Sapiens 323 catgtgtcac aacgcca 17 324 17 DNA 
Homo Sapiens 324 aaaggggcct taaagga 17 325 17 DNA Homo Sapiens 325 aaaggggctt taaagga 17 326 17 DNA Homo Sapiens 326 tgaaaagttc ttttcat 17 327 17 DNA Homo Sapiens 327 tgaaaagtac ttttcat 17 328 17 DNA Homo Sapiens 328 cctctctatg tgtgagc 17 329 17 DNA Homo Sapiens 329 cctctctacg tgtgagc 17 330 17 DNA Homo Sapiens 330 gaagttttag gattctt 17 331 17 DNA Homo Sapiens 331 gaagatttag gagtctc 17 332 17 DNA Homo Sapiens 332 agggatgtat tttgtta 17 333 17 DNA Homo Sapiens 333 agggatgtgt tttgtta 17 334 17 DNA Homo Sapiens 334 acaattcaaa tgtatat 17 335 17 DNA Homo Sapiens 335 acaattcata tgtatat 17 336 17 DNA Homo Sapiens 336 cttgcctaac ctgcaca 17 337 17 DNA Homo Sapiens 337 cttgcctagc ctgcaca 17 338 17 DNA Homo Sapiens 338 caacagcacc tcatatc 17 339 17 DNA Homo Sapiens 339 acagcggtgc ctcgtat 17 340 17 DNA Homo Sapiens 340 actcacagtg tcagggc 17 341 17 DNA Homo Sapiens 341 actcacagcg tcagggc 17 342 17 DNA Homo Sapiens 342 ggctgctcct gtgtctg 17 343 17 DNA Homo Sapiens 343 ggctcttcct gtgtctg 17 344 17 DNA Homo Sapiens 344 ggctgctcct gtttctg 17 345 17 DNA Homo Sapiens 345 aatagatgcc cttctga 17 346 17 DNA Homo Sapiens 346 aatagatgcc ctcttga 17 347 17 DNA Homo Sapiens 347 aatcgatgcc cttctga 17 348 17 DNA Homo Sapiens 348 ttggtctagc aggtagc 17 349 17 DNA Homo Sapiens 349 ttggtctacc aggtagc 17 350 17 DNA Homo Sapiens 350 agccttggct cttaaaa 17 351 17 DNA Homo Sapiens 351 agccttggtt cttaaaa 17 352 17 DNA Homo Sapiens 352 agtctctggc gcctttg 17 353 17 DNA Homo Sapiens 353 agtctctgcc gcctttg 17 354 17 DNA Homo Sapiens 354 tagcaggagg cagctta 17 355 17 DNA Homo Sapiens 355 aagcaggagg caactta 17 356 17 DNA Homo Sapiens 356 aagcaggagg cagctta 17 357 17 DNA Homo Sapiens 357 tagcaggagg cagcttg 17 358 17 DNA Homo Sapiens 358 aggagagacc ggactcc 17 359 17 DNA Homo Sapiens 359 aggagagagc ggactcc 17 360 17 DNA Homo Sapiens 360 tacaagtcat ccttcct 17 361 17 DNA Homo Sapiens 361 tacaagtcgt ccttcct 17 362 17 DNA Homo Sapiens 362 atacctccct cagacaa 17 363 17 DNA Homo Sapiens 363 atacctcctc agacaag 17 364 17 DNA 
Homo Sapiens 364 aaacaaacaa acaaacc 17 365 17 DNA Homo Sapiens 365 aaacaaacca acaaacc 17 366 17 DNA Homo Sapiens 366 gtgcgccacc atgacca 17 367 17 DNA Homo Sapiens 367 gtgcgccatc atgacca 17 368 17 DNA Homo Sapiens 368 ggctttccca ttagtgg 17 369 17 DNA Homo Sapiens 369 ggctttccta ttagtgg 17 370 17 DNA Homo Sapiens 370 ccctcacctc tctctca 17 371 17 DNA Homo Sapiens 371 ccctcacccc tctctca 17 372 17 DNA Homo Sapiens 372 aatctctcgc gttcatt 17 373 17 DNA Homo Sapiens 373 aatctctcac gttcatt 17 374 17 DNA Homo Sapiens 374 aatgataccg atcctta 17 375 17 DNA Homo Sapiens 375 aatgatacag atcctta 17 376 17 DNA Homo Sapiens 376 ataaaactgc attcgtg 17 377 17 DNA Homo Sapiens 377 ataaaactac attcgtg 17 378 18 DNA Homo Sapiens 378 agttccagga cagccagg 18 379 17 DNA Homo Sapiens 379 atatctccga ctttgaa 17 380 17 DNA Homo Sapiens 380 atatctccaa ctttgaa 17 381 17 DNA Homo Sapiens 381 tggccctgca gagtctg 17 382 17 DNA Homo Sapiens 382 tggctctgca gagctgg 17 383 17 DNA Homo Sapiens 383 caatggatca aagatgc 17 384 17 DNA Homo Sapiens 384 atggatcaac aaagatg 17 385 17 DNA Homo Sapiens 385 gctgcctcaa ggtataa 17 386 17 DNA Homo Sapiens 386 ctgcctctta aggtata 17 387 17 DNA Homo Sapiens 387 acctatggct cctcatc 17 388 17 DNA Homo Sapiens 388 acctatggtt cctcatc 17 389 17 DNA Homo Sapiens 389 tcttctcccc tgcttta 17 390 17 DNA Homo Sapiens 390 tcttctcact gctttag 17 391 17 DNA Homo Sapiens 391 ccgcataaaa agctgag 17 392 17 DNA Homo Sapiens 392 ccgccataaa agctgag 17 393 17 DNA Homo Sapiens 393 agaatatagg gtttttt 17 394 17 DNA Homo Sapiens 394 tagaatacag ttttttt 17 395 17 DNA Homo Sapiens 395 agagttgctg tgcaggg 17 396 17 DNA Homo Sapiens 396 agagttgccg tgcaggg 17 397 17 DNA Homo Sapiens 397 agagttgcag tgcaggg 17 398 17 DNA Homo Sapiens 398 taagcagtgt tcttggc 17 399 17 DNA Homo Sapiens 399 taagcagtat tcttggc 17 400 17 DNA Homo Sapiens 400 tcttctcccc tgcttta 17 401 17 DNA Homo Sapiens 401 tcttctcact gctttag 17 402 17 DNA Homo Sapiens 402 ttttttttta ttattga 17 403 17 DNA Homo Sapiens 403 ttttttttat tattgaa 17 404 17 
DNA Homo Sapiens 404 tgtggtacgc acatctg 17 405 17 DNA Homo Sapiens 405 tgtggtacac acatctg 17 406 17 DNA Homo Sapiens 406 agactcttag acttctg 17 407 17 DNA Homo Sapiens 407 agactcttag gcttctg 17 408 17 DNA Homo Sapiens 408 agactcataa gcttctg 17 409 17 DNA Homo Sapiens 409 agactcttag gcttctg 17 410 17 DNA Homo Sapiens 410 cacgtacccg aacgtga 17 411 17 DNA Homo Sapiens 411 cacgtacctg aacgtga 17 412 17 DNA Homo Sapiens 412 attacggttt gtcgtca 17 413 17 DNA Homo Sapiens 413 attacggttg gtcgtca 17 414 17 DNA Homo Sapiens 414 ccaagatacg aaaccag 17 415 17 DNA Homo Sapiens 415 ccaagatatg aaaccag 17 416 17 DNA Homo Sapiens 416 tgcaatgacc agcaacc 17 417 17 DNA Homo Sapiens 417 tgcaacgacc agcaacc 17 418 17 DNA Homo Sapiens 418 tgtaacgacc aacaact 17 419 17 DNA Homo Sapiens 419 tctaaaggga aagatgg 17 420 17 DNA Homo Sapiens 420 tctaaaggaa agatgga 17 421 17 DNA Homo Sapiens 421 ctggactcat acataca 17 422 17 DNA Homo Sapiens 422 ctggactcgt acataca 17 423 17 DNA Homo Sapiens 423 agtttggtcc cctggac 17 424 17 DNA Homo Sapiens 424 agtttggttt cctggac 17 425 17 DNA Homo Sapiens 425 tatagcttca tgtaaaa 17 426 17 DNA Homo Sapiens 426 tatagcttta tgtaaaa 17 427 17 DNA Homo Sapiens 427 ttttttttat tattgaa 17 428 17 DNA Homo Sapiens 428 ttttttttta ttattga 17 429 17 DNA Homo Sapiens 429 actcattgcc aatttaa 17 430 17 DNA Homo Sapiens 430 actcattcag aatttaa 17 431 17 DNA Homo Sapiens 431 atgcgtaatg ggggcta 17 432 17 DNA Homo Sapiens 432 atgcgtaacg ggggcta 17 433 17 DNA Homo Sapiens 433 ataattgctc ttttaaa 17 434 17 DNA Homo Sapiens 434 gtaattgctc ttttaaa 17 435 17 DNA Homo Sapiens 435 tctgattagt gatggat 17 436 17 DNA Homo Sapiens 436 tctgattatg atggatt 17 437 17 DNA Homo Sapiens 437 agcagagtgt ctcgtaa 17 438 17 DNA Homo Sapiens 438 agcagagtat ctcgtaa 17 439 17 DNA Homo Sapiens 439 gctggcagat atcggta 17 440 17 DNA Homo Sapiens 440 gctggcaggt atcggta 17 441 17 DNA Homo Sapiens 441 aactgcaatg accagca 17 442 17 DNA Homo Sapiens 442 aactgcaacg accagca 17 443 17 DNA Homo Sapiens 443 gctggtcatt gcagttt 17 444 17 
DNA Homo Sapiens 444 gttggtcgtt acagttt 17 445 17 DNA Homo Sapiens 445 gctggtcgtt gcagttt 17 446 17 DNA Homo Sapiens 446 gctggcagat atcggta 17 447 17 DNA Homo Sapiens 447 gctggcaggt atcggta 17 448 17 DNA Homo Sapiens 448 atagaaagtc caccgtc 17 449 17 DNA Homo Sapiens 449 atagaaagcc caccgtc 17 450 17 DNA Homo Sapiens 450 ttagtgaccg tgtaaac 17 451 17 DNA Homo Sapiens 451 ttagtgactg tgtaaac 17 452 17 DNA Homo Sapiens 452 ggggaggagc tttgttc 17 453 17 DNA Homo Sapiens 453 ggggaggatc tttgttc 17 454 17 DNA Homo Sapiens 454 ggcctggaca caaaagc 17 455 17 DNA Homo Sapiens 455 ggcctggaaa caaaagc 17 456 17 DNA Homo Sapiens 456 cccttttcta gtattgt 17 457 17 DNA Homo Sapiens 457 cccttttcca gtattgt 17 458 17 DNA Homo Sapiens 458 gaattggttt taggaat 17 459 17 DNA Homo Sapiens 459 gaattggtat taggaat 17 460 17 DNA Homo Sapiens 460 acccagcttt ccatggt 17 461 17 DNA Homo Sapiens 461 acccagctct ccatggt 17 462 17 DNA Homo Sapiens 462 tcacgttcgg gtacgtg 17 463 17 DNA Homo Sapiens 463 tcacgttcag gtacgtg 17 464 17 DNA Homo Sapiens 464 tgccttccgg ttggcaa 17 465 17 DNA Homo Sapiens 465 tgccttccag ttggcaa 17 466 17 DNA Homo Sapiens 466 ttttatcata caattgc 17 467 17 DNA Homo Sapiens 467 ttttatcaga caattgc 17 468 17 DNA Homo Sapiens 468 atcttctctt ctttgag 17 469 17 DNA Homo Sapiens 469 atcttctcct ctttgag 17 470 17 DNA Homo Sapiens 470 cagtcctctg ctttctc 17 471 17 DNA Homo Sapiens 471 cagtcctcag ctttctc 17 472 17 DNA Homo Sapiens 472 ccaagatacg aaaccag 17 473 17 DNA Homo Sapiens 473 ccaagatatg aaaccag 17 474 17 DNA Homo Sapiens 474 ggtattcaag ggttact 17 475 17 DNA Homo Sapiens 475 ggtattcagg gttactg 17 476 17 DNA Homo Sapiens 476 acctatggct cctcatc 17 477 17 DNA Homo Sapiens 477 acctatggtt cctcatc 17 478 17 DNA Homo Sapiens 478 ttttatcata caattgc 17 479 17 DNA Homo Sapiens 479 ttttatcaga caattgc 17 480 17 DNA Homo Sapiens 480 aaccagggct taagtct 17 481 17 DNA Homo Sapiens 481 aaccagggat taagtct 17 482 17 DNA Homo Sapiens 482 cagaaaaaca gatatac 17 483 17 DNA Homo Sapiens 483 cagaaaaaga gatatac 17 484 17 
DNA Homo Sapiens 484 tctgagcgtg agtgctg 17 485 17 DNA Homo Sapiens 485 tctgagcgcg agtgctg 17 486 17 DNA Homo Sapiens 486 acctcagaag cggaggt 17 487 17 DNA Homo Sapiens 487 acctcggaag gggaggt 17 488 17 DNA Homo Sapiens 488 acctcggaag cggaggt 17 489 17 DNA Homo Sapiens 489 taactcgatc gctatca 17 490 17 DNA Homo Sapiens 490 taactcgctt gctatca 17 491 17 DNA Homo Sapiens 491 taactcgctc gctatca 17 492 17 DNA Homo Sapiens 492 gaatttctca acttctt 17 493 17 DNA Homo Sapiens 493 gaatttctga acttctt 17 494 17 DNA Homo Sapiens 494 caggggtccc caatttg 17 495 17 DNA Homo Sapiens 495 caggggtctc caatttg 17 496 17 DNA Homo Sapiens 496 ttttgctgtg caggcta 17 497 17 DNA Homo Sapiens 497 ttttactgtg ccaggct 17 498 17 DNA Homo Sapiens 498 gacagccctg tctcaaa 17 499 17 DNA Homo Sapiens 499 agagaaaccc tgtctca 17 500 17 DNA Homo Sapiens 500 gcaccggtct gagcagt 17 501 17 DNA Homo Sapiens 501 gcaccggttt gagcagt 17 502 17 DNA Homo Sapiens 502 ccgtgcccct gaacaat 17 503 17 DNA Homo Sapiens 503 ccgtgccctt gaacaat 17 504 17 DNA Homo Sapiens 504 tcacgttcgg gtacgtg 17 505 17 DNA Homo Sapiens 505 tcacgttcag gtacgtg 17 506 17 DNA Homo Sapiens 506 tgattcgctg ggactct 17 507 17 DNA Homo Sapiens 507 tgattcgccg ggactct 17 508 17 DNA Homo Sapiens 508 ttgatatccg aggcctt 17 509 17 DNA Homo Sapiens 509 ttgatatctg aggcctt 17 510 17 DNA Homo Sapiens 510 tccctgggcc aagcata 17 511 17 DNA Homo Sapiens 511 tccctgggtc aagcata 17 512 17 DNA Homo Sapiens 512 ttatggctga ggatcac 17 513 17 DNA Homo Sapiens 513 ttatggctgc ggatcat 17 514 17 DNA Homo Sapiens 514 ttatggcagg ggatcac 17 515 17 DNA Homo Sapiens 515 ctctctgcgc tgaagca 17 516 17 DNA Homo Sapiens 516 ctctctgctc tgaagca 17 517 17 DNA Homo Sapiens 517 agatacagag atgtgtt 17 518 17 DNA Homo Sapiens 518 agatactgag gtgtgtt 17 519 17 DNA Homo Sapiens 519 cgacatctgg cagatgt 17 520 17 DNA Homo Sapiens 520 cgacatctag cagatgt 17 521 17 DNA Homo Sapiens 521 gtcacaaata gtatttc 17 522 17 DNA Homo Sapiens 522 gtcacaaaga gtatttc 17 523 17 DNA Homo Sapiens 523 aaggtgtgtg cgtgtgt 17 524 17 
DNA Homo Sapiens 524 aaggtgtgcg cgtgtgt 17 525 17 DNA Homo Sapiens 525 agtctttttt ttcctga 17 526 17 DNA Homo Sapiens 526 tagtcttttt tcctgaa 17 527 17 DNA Homo Sapiens 527 caggctgtgg gaggctt 17 528 17 DNA Homo Sapiens 528 caggctgcgg aaggctt 17 529 17 DNA Homo Sapiens 529 ctgtaagtca ttcaata 17 530 17 DNA Homo Sapiens 530 ctgtaagtaa ttcaata 17 531 17 DNA Homo Sapiens 531 caggggtccc caatttg 17 532 17 DNA Homo Sapiens 532 caggggtctc caatttg 17 533 17 DNA Homo Sapiens 533 gactcatggc cgccttg 17 534 17 DNA Homo Sapiens 534 gactcattgc cgcctgg 17 535 17 DNA Homo Sapiens 535 gactcctggc cgcctgg 17 536 17 DNA Homo Sapiens 536 gactcctggc tgcctgg 17 537 17 DNA Homo Sapiens 537 gactcctggc cgcctgg 17 538 17 DNA Homo Sapiens 538 acaggggagg aaggaag 17 539 17 DNA Homo Sapiens 539 acaggggaag gaaggaa 17 540 17 DNA Homo Sapiens 540 ttgatataga ttgattc 17 541 17 DNA Homo Sapiens 541 ttgatatata ttgattc 17 542 17 DNA Homo Sapiens 542 atagaacagc aaagtaa 17 543 17 DNA Homo Sapiens 543 atagaacaac aaagtaa 17 544 17 DNA Homo Sapiens 544 aacaagcatc tatggat 17 545 17 DNA Homo Sapiens 545 aacaagcacc tatggat 17 546 17 DNA Homo Sapiens 546 gagcaggtta agcgatg 17 547 17 DNA Homo Sapiens 547 gagcaggtga agcgatg 17 548 17 DNA Homo Sapiens 548 ggcttccagc ttgattc 17 549 17 DNA Homo Sapiens 549 ggcttccaac ttgattc 17 550 17 DNA Homo Sapiens 550 agatagggat gaatccc 17 551 17 DNA Homo Sapiens 551 agataggggt gaatccc 17 552 17 DNA Homo Sapiens 552 tcattcaccg tttattg 17 553 17 DNA Homo Sapiens 553 tcattcactg tttattg 17 554 17 DNA Homo Sapiens 554 ctgacatact gcttagg 17 555 17 DNA Homo Sapiens 555 ctgacatatt gcttagg 17 556 17 DNA Homo Sapiens 556 ctaggaaagc ctaaatt 17 557 17 DNA Homo Sapiens 557 ctaggaaaac ctaaatt 17 558 17 DNA Homo Sapiens 558 atgtcaggat tttaaga 17 559 17 DNA Homo Sapiens 559 atgtcagggt tttaaga 17 560 17 DNA Homo Sapiens 560 ggtttccaat tggaaag 17 561 17 DNA Homo Sapiens 561 ggtttccagt tggaaag 17 562 17 DNA Homo Sapiens 562 cgaggagtgc aaagcga 17 563 17 DNA Homo Sapiens 563 cgaggagtcc aaagcga 17 564 17 
DNA Homo Sapiens 564 tgtgtgtgtg tctgtct 17 565 17 DNA Homo Sapiens 565 tgtgtgtgcg tctgtct 17 566 17 DNA Homo Sapiens 566 gcaagatgca gctgcat 17 567 17 DNA Homo Sapiens 567 gcaagatgta gctgcat 17 568 17 DNA Homo Sapiens 568 gctggggcta ttctgta 17 569 17 DNA Homo Sapiens 569 gctggggcca ttctgta 17 570 17 DNA Homo Sapiens 570 caataacgga cctgcct 17 571 17 DNA Homo Sapiens 571 caataacgaa cctgcct 17 572 17 DNA Homo Sapiens 572 tagcctctct acatagg 17 573 17 DNA Homo Sapiens 573 tagcctctgt acatagg 17 574 17 DNA Homo Sapiens 574 catctatagg ttcactt 17 575 17 DNA Homo Sapiens 575 catctatatg ttcactt 17 576 17 DNA Homo Sapiens 576 gccaacaaca ttgagag 17 577 17 DNA Homo Sapiens 577 gccaacaaga ttgagag 17 578 17 DNA Homo Sapiens 578 gggtcgtgcg tccccct 17 579 17 DNA Homo Sapiens 579 gggtcgtgtg tccccct 17 580 17 DNA Homo Sapiens 580 attgtctcac atttctt 17 581 17 DNA Homo Sapiens 581 attgtctcgc atttctt 17 582 17 DNA Homo Sapiens 582 ggtgtggtcg cagaagg 17 583 17 DNA Homo Sapiens 583 ggtgtggttg cagaagg 17 584 17 DNA Homo Sapiens 584 tcattgccac acttgaa 17 585 17 DNA Homo Sapiens 585 tcattgccgc acttgaa 17 586 17 DNA Homo Sapiens 586 atctgtctac aatgatc 17 587 17 DNA Homo Sapiens 587 atctgtctgc aatgatc 17 588 17 DNA Homo Sapiens 588 ggctgggcac agtggct 17 589 17 DNA Homo Sapiens 589 ggctgggcgc agtggct 17 590 17 DNA Homo Sapiens 590 cagcctggag aacaagt 17 591 17 DNA Homo Sapiens 591 cagcctggcg aacaagt 17 592 17 DNA Homo Sapiens 592 tttgacaccc ggaagct 17 593 17 DNA Homo Sapiens 593 tttgacactc ggaagct 17 594 17 DNA Homo Sapiens 594 ctgcctttca tactgcc 17 595 17 DNA Homo Sapiens 595 ctgcctttta tactgcc 17 596 17 DNA Homo Sapiens 596 acaatagacg ttccccg 17 597 17 DNA Homo Sapiens 597 acaatagatg ttccccg 17 598 17 DNA Homo Sapiens 598 ggtgtttgat ttgtact 17 599 17 DNA Homo Sapiens 599 ggtgtttgct ttgtact 17 600 17 DNA Homo Sapiens 600 tccaactcaa aaaatgt 17 601 17 DNA Homo Sapiens 601 tccaactcta aaaatgt 17 602 17 DNA Homo Sapiens 602 gggccgctca cagtcca 17 603 17 DNA Homo Sapiens 603 gggccgctta cagtcca 17 604 17 
DNA Homo Sapiens 604 gcatggctcg tgggttt 17 605 17 DNA Homo Sapiens 605 gcatggcttg tgggttt 17 606 17 DNA Homo Sapiens 606 gttgggaagt ggagcgg 17 607 17 DNA Homo Sapiens 607 gttgggaatt ggagcgg 17 608 17 DNA Homo Sapiens 608 aagggatgag gatgtga 17 609 17 DNA Homo Sapiens 609 aagggatggg gatgtga 17 610 17 DNA Homo Sapiens 610 tcctcgagag ctttgct 17 611 17 DNA Homo Sapiens 611 tcctcgaggg ctttgct 17 612 17 DNA Homo Sapiens 612 tgacaatgcg tgcccaa 17 613 17 DNA Homo Sapiens 613 tgacaatgtg tgcccaa 17 614 17 DNA Homo Sapiens 614 tccatgtcat agatttc 17 615 17 DNA Homo Sapiens 615 tccatgtcgt agatttc 17 616 17 DNA Homo Sapiens 616 tggaggacag tggaggg 17 617 17 DNA Homo Sapiens 617 tggaggactg tggaggg 17 618 17 DNA Homo Sapiens 618 acccatttcc tgaaaat 17 619 17 DNA Homo Sapiens 619 acccattttc tgaaaat 17 620 17 DNA Homo Sapiens 620 ctgagttcgg cactgct 17 621 17 DNA Homo Sapiens 621 ctgagttctg cactgct 17 622 17 DNA Homo Sapiens 622 accagtttgg ctcaaag 17 623 17 DNA Homo Sapiens 623 accagttttg ctcaaag 17 624 17 DNA Homo Sapiens 624 ccaatcagaa cgtgcag 17 625 17 DNA Homo Sapiens 625 ccaatcagag cgtgcag 17 626 17 DNA Homo Sapiens 626 acccacacag acactgc 17 627 17 DNA Homo Sapiens 627 acccacactg acactgc 17 628 17 DNA Homo Sapiens 628 ggacaaagcg ctggtgt 17 629 17 DNA Homo Sapiens 629 ggacaaagtg ctggtgt 17 630 17 DNA Homo Sapiens 630 agctggtccc cctmccc 17 631 17 DNA Homo Sapiens 631 agctggtctc cctmccc 17 632 17 DNA Homo Sapiens 632 ggtgtagtaa gcacagc 17 633 17 DNA Homo Sapiens 633 ggtgtagtca gcacagc 17 634 17 DNA Homo Sapiens 634 agcgaacacg ggggaaa 17 635 17 DNA Homo Sapiens 635 agcgaacatg ggggaaa 17 636 17 DNA Homo Sapiens 636 gtgacagcac caaactt 17 637 17 DNA Homo Sapiens 637 gtgacagcgc caaactt 17 638 17 DNA Homo Sapiens 638 gtctgttgct gttattt 17 639 17 DNA Homo Sapiens 639 gtctgttgtt gttattt 17 640 17 DNA Homo Sapiens 640 accagcatag cccagag 17 641 17 DNA Homo Sapiens 641 accagcatgg cccagag 17 642 17 DNA Homo Sapiens 642 cgtaggagac aagacct 17 643 17 DNA Homo Sapiens 643 cgtaggaggc aagacct 17 644 17 
DNA Homo Sapiens 644 ctctgctgaa tctccca 17 645 17 DNA Homo Sapiens 645 ctctgctgga tctccca 17 646 17 DNA Homo Sapiens 646 aagcaaagac tgattca 17 647 17 DNA Homo Sapiens 647 aagcaaagtc tgattca 17 648 17 DNA Homo Sapiens 648 aggcagctag agggaga 17 649 17 DNA Homo Sapiens 649 aggcagctcg agggaga 17 650 17 DNA Homo Sapiens 650 ttccattccg ttcaatt 17 651 17 DNA Homo Sapiens 651 ttccattctg ttcaatt 17 652 17 DNA Homo Sapiens 652 tattgttact gattttg 17 653 17 DNA Homo Sapiens 653 tattgttatt gattttg 17 654 17 DNA Homo Sapiens 654 gagctttcag aggctga 17 655 17 DNA Homo Sapiens 655 gagctttcgg aggctga 17 656 17 DNA Homo Sapiens 656 gggggaagat atggagt 17 657 17 DNA Homo Sapiens 657 gggggaaggt atggagt 17 658 17 DNA Homo Sapiens 658 catggcctcg tgggttt 17 659 17 DNA Homo Sapiens 659 catggccttg tgggttt 17 660 17 DNA Homo Sapiens 660 gggkagggag accagct 17 661 17 DNA Homo Sapiens 661 gggkaggggg accagct 17 662 17 DNA Homo Sapiens 662 gcagtgtcag tgtgggt 17 663 17 DNA Homo Sapiens 663 gcagtgtctg tgtgggt 17 664 17 DNA Homo Sapiens 664 acaccagcac tttgatc 17 665 17 DNA Homo Sapiens 665 acaccagcgc tttgatc 17 666 17 DNA Homo Sapiens 666 ccttctgcaa ccacacc 17 667 17 DNA Homo Sapiens 667 ccttctgcga ccacacc 17 668 17 DNA Homo Sapiens 668 aaattcgcag gagccga 17 669 17 DNA Homo Sapiens 669 aaattcgcgg gagccga 17 670 17 DNA Homo Sapiens 670 aggtctagac gctcacc 17 671 17 DNA Homo Sapiens 671 aggtctaggc gctcacc 17 672 17 DNA Homo Sapiens 672 ggaggaacac ttcaaac 17 673 17 DNA Homo Sapiens 673 ggaggaacgc ttcaaac 17 674 17 DNA Homo Sapiens 674 tttgtgctat accttga 17 675 17 DNA Homo Sapiens 675 tttgtgctgt accttga 17 676 17 DNA Homo Sapiens 676 atgatgcaca caccctg 17 677 17 DNA Homo Sapiens 677 atgatgcata caccctg 17 678 17 DNA Homo Sapiens 678 tattgctccg cctcctc 17 679 17 DNA Homo Sapiens 679 tattgctctg cctcctc 17 680 17 DNA Homo Sapiens 680 ctcagagact gtgtgcc 17 681 17 DNA Homo Sapiens 681 ctcagagagt gtgtgcc 17 682 17 DNA Homo Sapiens 682 atcttctgcg tcactca 17 683 17 DNA Homo Sapiens 683 atcttctgtg tcactca 17 684 17 
DNA Homo Sapiens 684 cagcatctag taaccac 17 685 17 DNA Homo Sapiens 685 cagcatctgg taaccac 17 686 17 DNA Homo Sapiens 686 attagtgcca aatacat 17 687 17 DNA Homo Sapiens 687 attagtgcta aatacat 17 688 17 DNA Homo Sapiens 688 tgctccacag cagccgt 17 689 17 DNA Homo Sapiens 689 tgctccactg cagccgt 17 690 17 DNA Homo Sapiens 690 taggggagaa tctgttt 17 691 17 DNA Homo Sapiens 691 taggggagca tctgttt 17

Claims (148)

We claim:
1. A method for detecting the presence or absence of a single nucleotide polymorphism (SNP) allele in a genomic sample, the method comprising:
preparing a reduced complexity genome (RCG) from the genomic sample, and
analyzing the RCG for the presence or absence of a SNP allele.
2. The method of claim 1, wherein the analysis comprises hybridizing a SNP-ASO and the RCG, wherein the SNP-ASO is complementary to one allele of a SNP, whereby the allele of the SNP is present in the genomic sample if the SNP-ASO hybridizes with the RCG, and wherein the presence or absence of the SNP is used to characterize the genomic sample.
3. The method of claim 2, wherein the RCG is immobilized on a surface.
4. The method of claim 2, wherein the SNP-ASO is immobilized on a surface.
5. The method of claim 2, wherein the SNP-ASO is individually hybridized with a plurality of RCGs.
6. The method of claim 1, wherein the RCG is a PCR-derived RCG.
7. The method of claim 1, wherein the RCG is a native RCG.
8. The method of any one of claims 1-7, wherein the method further comprises identifying a genotype of the genomic sample, whereby the genotype is identified by the presence or absence of the alleles of the SNP in the RCG.
9. The method of any one of claims 1-7, wherein the genomic sample is obtained from a tumor.
10. The method of claim 9, wherein a plurality of RCGs are prepared from genomic samples isolated from a plurality of subjects and the plurality of RCGs are analyzed for the presence of the SNP.
11. The method of claim 8, wherein the presence or absence of the SNP allele is analyzed in a plurality of genomic samples selected randomly from a population, the method further comprising determining the allelic frequency of the SNP allele in the population by comparing the number of genomic samples in which the allele is detected and the number of genomic samples analyzed.
12. The method of claim 1, wherein the RCG is prepared by performing degenerate oligonucleotide priming-polymerase chain reaction (DOP-PCR) using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues, wherein x is an integer from 0-9, and wherein each N is any nucleotide residue, and wherein the tag is a polynucleotide having from about 0 to about 20 nucleotides.
13. The method of claim 12, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
14. The method of claim 6, wherein the RCG is prepared by interspersed repeat sequence-polymerase chain reaction (IRS-PCR).
15. The method of claim 6, wherein the RCG is prepared by arbitrarily primed-polymerase chain reaction (AP-PCR).
16. The method of claim 6, wherein the RCG is prepared by adapter-polymerase chain reaction.
17. The method of claim 2, wherein at least a fraction of the SNP-ASO is labeled.
18. The method of claim 17, wherein an excess of a non-labeled SNP-ASO is added during the hybridization step, wherein the non-labeled oligonucleotide is complementary to a different allele of the same SNP than the labeled SNP-ASO.
19. The method of claim 17, further comprising performing a parallel hybridization reaction wherein the RCG is hybridized with a labeled SNP-ASO, wherein the oligonucleotide is complementary to a different allele of the same SNP than the labeled SNP-ASO.
20. The method of claim 19, wherein the two SNP-ASOs are distinguishably labeled.
21. The method of claim 17, wherein an excess of non-labeled SNP-ASO is present during the hybridization.
22. The method of claim 2, wherein the SNP-ASO is composed of from about 10 to about 50 nucleotide residues.
23. The method of claim 22, wherein the SNP-ASO is composed of from about 10 to about 25 nucleotide residues.
24. The method of claim 17, wherein the label is a radioactive isotope.
25. The method of claim 24, further comprising the step of exposing the RCG to a film to produce a signal on the film which corresponds to the radioactively labeled hybridization products if the SNP is present in the RCG.
26. The method of claim 17, wherein the label is a fluorescent molecule.
27. The method of claim 26, further comprising the step of exposing the RCG to an automated fluorescence reader to generate an output signal which corresponds to the fluorescently labeled hybridization products if the SNP is present in the RCG.
28. The method of claim 17, wherein a plurality of SNP-ASOs are labeled with fluorescent molecules, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule.
29. The method of claim 28, wherein the number of SNP-ASOs having a spectrally distinct fluorescent molecule is at least two.
30. The method of claim 28, wherein the number is selected from the group consisting of three, four and eight.
31. The method of claim 2, wherein a plurality of RCGs are labeled with fluorescent molecules, each RCG being labeled with a spectrally distinct fluorescent molecule, and wherein all of the RCGs having a spectrally distinct fluorescent molecule.
32. The method of claim 1, wherein the RCG is prepared by performing degenerate oligonucleotide priming-polymerase chain reaction using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues, wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein the tag is a polynucleotide having from about 0 to about 20 nucleotides.
33. The method of claim 32, wherein the TARGET nucleotide sequence includes at least 5 nucleotide residues.
34. The method of claim 32, wherein the TARGET nucleotide sequence includes at least 6 nucleotide residues.
35. The method of claim 2, wherein the RCG is labeled.
36. The method of claim 4, wherein a plurality of different SNP-ASOs are attached to the surface.
37. The method of claim 1, wherein the RCG is prepared by performing multiple primed DOP-PCR.
38. The method of claim 2, wherein the genomic sample is characterized by generating a genomic pattern based on the presence or absence of the allele of the SNP in the genomic sample.
39. The method of claim 38, wherein the genomic pattern is a genomic classification code.
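By way of non-limiting illustration only, the genotype identification of claim 8 and the allelic frequency determination of claim 11 may be sketched in Python as follows. The function names, the representation of each hybridization result as a boolean, and the sample data are hypothetical and do not appear in the specification; the sketch assumes that a hybridization signal has already been reduced to a present/absent call for each allele.

```python
from typing import Iterable


def call_genotype(allele_a_hybridized: bool, allele_b_hybridized: bool) -> str:
    """Infer a genotype from two allele-specific hybridization calls.

    Assumption: each boolean is a present/absent call obtained by hybridizing
    one SNP-ASO (one per allele of the same SNP) with the RCG.
    """
    if allele_a_hybridized and allele_b_hybridized:
        return "A/B"      # both alleles detected
    if allele_a_hybridized:
        return "A/A"      # only allele A detected
    if allele_b_hybridized:
        return "B/B"      # only allele B detected
    return "no call"      # neither SNP-ASO hybridized with the RCG


def allele_detection_frequency(detected: Iterable[bool]) -> float:
    """Compare the number of samples in which the allele is detected
    with the number of samples analyzed."""
    calls = list(detected)
    if not calls:
        raise ValueError("no genomic samples analyzed")
    return sum(calls) / len(calls)


# Hypothetical calls for one SNP allele in four randomly selected samples.
samples = [True, True, False, True]
print(call_genotype(True, False))           # A/A
print(allele_detection_frequency(samples))  # 0.75
```

The frequency computed here is simply the comparison recited in claim 11, namely detected samples over analyzed samples; any further statistical treatment is outside the scope of this sketch.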
40. A method for characterizing a tumor, the method comprising:
isolating genomic DNA from tumor samples obtained from a plurality of subjects,
preparing a RCG from each genomic DNA,
performing a hybridization reaction with a SNP-ASO and the plurality of RCGs, wherein the SNP-ASO is complementary to one allele of a SNP, and
characterizing the tumor based on whether the SNP-ASO hybridizes with at least some of the RCGs, whereby if the SNP oligonucleotide hybridizes with at least some of the RCGs, then the allele of the SNP is present in the genomic DNA of the tumor.
41. The method of claim 40, wherein the hybridization reaction is performed with a plurality of SNP-ASOs immobilized on a surface, and wherein the hybridization is performed on the plurality of RCGs, each RCG being analyzed separately.
42. The method of claim 40, wherein the RCGs are prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
43. The method of claim 42, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
44. The method of claim 40, wherein the RCGs are PCR-generated RCGs.
45. The method of claim 40, wherein the RCGs are native RCGs.
46. The method of claim 40, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
47. A method for generating a genomic pattern for an individual genome, the method comprising:
preparing a RCG from the individual genome,
analyzing the RCG for the presence or absence of at least one SNP allele, and
generating a genomic pattern for the individual genome based on the presence or absence of SNP alleles.
48. The method of claim 47, wherein analyzing the RCG involves hybridizing the RCG with a panel of SNP-ASOs, each of which is complementary to one allele of a SNP, and identifying the genomic pattern by determining the ability of the RCG to hybridize with each SNP-ASO.
49. The method of claim 47, wherein the genomic pattern is a genomic classification code which is generated from the pattern of SNP alleles for each RCG.
50. The method of claim 49, wherein the genomic classification code is also generated using the allelic frequency of the SNPs.
51. The method of claim 47, wherein the genomic pattern is a visual pattern.
52. The method of claim 47, wherein the genomic pattern is a digital pattern.
53. The method of claim 48, wherein the SNP-ASOs are immobilized on a surface.
54. The method of claim 47, further comprising performing a parallel reaction wherein the hybridization reaction is performed using a panel of labeled complementary SNP-ASOs.
55. The method of claim 54, wherein the RCG is immobilized on a surface and wherein each SNP-ASO of the panel is hybridized with a separate surface.
56. The method of claim 54, wherein the RCG is immobilized on a surface and wherein a plurality of SNP-ASOs of the panel are hybridized with a single surface, each SNP-ASO being labeled with a spectrally distinct fluorescent molecule.
57. The method of claim 47, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
58. The method of claim 47, wherein the RCG is a PCR-generated RCG.
59. The method of claim 47, wherein the RCG is a native RCG.
60. The method of claim 47, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
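As an illustration of the "digital pattern" of claims 47 and 52, the sketch below encodes the presence or absence of each SNP allele as one bit. It is a minimal example under stated assumptions, not the applicants' implementation: the function name `genomic_pattern`, the SNP identifiers, and the hybridization calls are all hypothetical, and the claims do not prescribe any particular encoding.

```python
from typing import Dict, List


def genomic_pattern(hybridized: Dict[str, bool], panel: List[str]) -> str:
    """Return one character per SNP-ASO in the panel, in panel order.

    '1' means the RCG hybridized with that SNP-ASO (allele present),
    '0' means it did not (allele absent). A SNP-ASO missing from the
    calls is treated as absent, which is an assumption of this sketch.
    """
    return "".join("1" if hybridized.get(snp, False) else "0" for snp in panel)


# Hypothetical panel and hybridization calls, for illustration only.
panel = ["snp_A", "snp_B", "snp_C", "snp_D"]
calls = {"snp_A": True, "snp_B": False, "snp_C": True, "snp_D": True}

pattern = genomic_pattern(calls, panel)
print(pattern)  # 1011
```

Fixing the panel order is what makes two patterns comparable position by position.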
61. A method for generating a genomic classification code for a genome, the method comprising:
preparing a RCG from the genome,
analyzing the RCG for the presence or absence of SNP alleles of known allelic frequency, and
identifying a genomic pattern of SNP alleles for the RCG by determining the presence or absence therein of SNP alleles, and
generating a genomic classification code for the RCG based on the presence or absence and the allelic frequency of the SNP alleles.
62. The method of claim 61, wherein the RCG is hybridized with a panel of SNP-ASOs of known allelic frequency, each of which is complementary to one allele of a SNP, and the genomic pattern is identified based on whether each SNP-ASO hybridizes with the RCG.
63. The method of claim 62, wherein the SNP-ASOs are immobilized on a surface.
64. The method of claim 62, wherein the RCG is immobilized on a surface.
65. The method of claim 61, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
66. The method of claim 61, wherein the RCG is a PCR-generated RCG.
67. The method of claim 61, wherein the RCG is a native RCG.
68. The method of claim 61, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
69. A composition, comprising:
a plurality of RCGs immobilized in an ordered array on a surface.
70. The composition of claim 69, wherein the RCGs are prepared by the method of claim 125.
71. The composition of claim 69, wherein the RCGs are PCR-generated RCGs.
72. The composition of claim 69, wherein the RCGs are native RCGs.
73. A kit, comprising:
a container housing a set of polymerase chain reaction primers for reducing the complexity of a genome, and
a container housing a set of SNP-ASOs, wherein the SNPs are present with a frequency of at least 50% in a RCG made using the set of primers.
74. The kit of claim 73, wherein the SNP-ASOs are attached to a surface.
75. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for DOP-PCR.
76. The kit of claim 75, wherein the degenerate oligonucleotide primer has a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
77. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 8 nucleotide residues.
78. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 9 nucleotide residues.
79. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 10 nucleotide residues.
80. The kit of claim 76, wherein the TARGET nucleotide sequence includes at least 11 nucleotide residues.
81. The kit of claim 76, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
82. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for IRS-PCR.
83. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for AP-PCR.
84. The kit of any one of claims 73 or 74, wherein the set of polymerase chain reaction primers are primers for adapter-polymerase chain reaction.
85. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are composed of from 10 to 50 nucleotide residues.
86. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are composed of from 10 to 25 nucleotide residues.
87. The kit of any one of claims 73 or 74, wherein the SNP-ASOs are labeled with a fluorescent molecule.
88. The kit of claim 75, wherein the degenerate oligonucleotide primer has a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes fewer than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
89. The kit of claim 73, wherein the set of polymerase chain reaction primers are primers for multiple-primed DOP-PCR.
90. A composition comprising:
a plurality of RCGs immobilized on a surface, wherein the RCGs are composed of a plurality of DNA fragments, each DNA fragment comprising a (N)x-TARGET nucleotide portion, wherein the nucleotide sequence of TARGET is identical in each of the DNA fragments, wherein TARGET is a polynucleotide consisting of at least 7 nucleotide residues, wherein x is an integer from 0 to 9, and wherein N is any nucleotide residue.
91. The composition of claim 90, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
92. The composition of claim 90, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
93. The composition of claim 90, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
94. The composition of claim 90, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
95. The composition of claim 90, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
96. The composition of any one of claims 90-95, wherein x is from 3 to 9.
97. The composition of any one of claims 90-95, wherein x is 6.
98. The composition of any one of claims 90-95, wherein x is 7.
99. The composition of any one of claims 90-95, wherein x is 8.
100. The composition of any one of claims 90-95, wherein x is 9.
101. A method for identifying a SNP, the method comprising:
preparing a set of primers from a RCG, wherein the RCG comprises a set of polymerase chain reaction (PCR) products,
performing PCR using the set of primers on at least one isolated genome to produce a set of DNA products, and
identifying a SNP on the set of DNA products.
102. The method of claim 101, wherein the plurality of isolated genomes is a pool of genomes.
103. The method of claim 101, wherein the isolated genomes are RCGs.
104. The method of claim 103, wherein the RCG is prepared by DOP-PCR.
105. The method of claim 101, wherein the step of preparing the set of primers is performed by at least the following steps:
preparing a RCG and separating the set of PCR products in the RCG into individual PCR products,
determining the sequence of each end of at least one of the PCR products, and
generating primers for use in the subsequent PCR step based on the sequence of the ends of the inserts.
106. The method of claim 105, wherein the set of PCR products are separated by gel electrophoresis.
107. The method of claim 106, further comprising the step of preparing libraries from segments of the gel containing several PCR products and isolating clones from the library, each clone including a PCR product containing plasmid from the library.
108. The method of claim 105, wherein the set of PCR products are separated by high pressure liquid chromatography.
109. The method of claim 105, wherein the set of PCR products are separated by column chromatography.
110. The method of claim 101, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
111. The method of claim 110, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
112. The method of claim 110, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
113. The method of claim 110, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
114. The method of claim 110, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
115. The method of claim 110, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
116. The method of claim 101, wherein the RCG is prepared by IRS-PCR.
117. The method of claim 101, wherein the RCG is prepared by AP-PCR.
118. The method of claim 101, wherein the RCG is prepared by adapter-polymerase chain reaction.
119. The method of claim 101, wherein the RCG is prepared by performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes less than 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
120. The method of claim 101, wherein x is greater than one.
121. The method of claim 101, wherein the first and second steps of PCR products are generated using the same primers.
122. A composition comprising:
a panel of SNP-ASOs immobilized on a surface, wherein the SNP-ASOs are prepared by the method of claim 101.
123. The composition of claim 122, wherein each SNP-ASO is immobilized in a discrete area of the surface.
124. The composition of claim 122, further comprising a panel of complementary SNP-ASOs immobilized on discrete areas of the surface.
125. A method for obtaining a RCG using DOP-PCR, the method comprising:
performing DOP-PCR using a degenerate oligonucleotide primer having a tag-(N)x-TARGET nucleotide sequence, wherein the TARGET nucleotide sequence includes at least 7 TARGET nucleotide residues and wherein x is an integer from 0 to 9, wherein each N is any nucleotide residue, and wherein each tag is a polynucleotide having from 0 to about 20 nucleotide residues.
126. The method of claim 125, wherein the TARGET nucleotide sequence includes 8 nucleotide residues.
127. The method of claim 125, wherein the TARGET nucleotide sequence includes 9 nucleotide residues.
128. The method of claim 125, wherein the TARGET nucleotide sequence includes 10 nucleotide residues.
129. The method of claim 125, wherein the TARGET nucleotide sequence includes 11 nucleotide residues.
130. The method of claim 125, wherein the TARGET nucleotide sequence includes 12 nucleotide residues.
131. The method of any one of claims 125-130, wherein x is from 3 to 9.
132. The method of any one of claims 125-130, wherein x is 6.
133. The method of any one of claims 125-130, wherein x is 7.
134. The method of any one of claims 125-130, wherein x is 8.
135. The method of any one of claims 125-130, wherein x is 9.
136. The method of claim 125, wherein the tag includes 6 nucleotide residues.
137. The method of any one of claims 125-136, further comprising using the RCG in a genotyping procedure.
138. The method of any one of claims 125-136, further comprising analyzing the RCG to detect a polymorphism.
139. The method of claim 138 wherein the RCG is analyzed using mass spectroscopy.
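The tag-(N)x-TARGET primer layout recited in claim 125 can be sketched in a few lines. The code below is illustrative only: the function names and the example tag and TARGET sequences are hypothetical, and the "about 20" upper bound on tag length is deliberately left unchecked because the claim language is approximate. It builds the primer string, counts how many concrete sequences the degenerate positions represent (4 raised to the power x), and enumerates them.

```python
from itertools import product
from typing import Iterator

BASES = "ACGT"


def build_dop_primer(tag: str, x: int, target: str) -> str:
    """Assemble a tag-(N)x-TARGET primer, written 5' to 3'.

    Enforces the numeric limits recited in claim 125: x from 0 to 9 and
    a TARGET of at least 7 residues.
    """
    if not 0 <= x <= 9:
        raise ValueError("x must be an integer from 0 to 9")
    if len(target) < 7:
        raise ValueError("TARGET must include at least 7 residues")
    if any(base not in BASES for base in tag + target):
        raise ValueError("tag and TARGET must contain only A, C, G, T")
    return tag + "N" * x + target


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by the degenerate primer."""
    return 4 ** primer.count("N")


def expand(primer: str) -> Iterator[str]:
    """Yield every concrete sequence, replacing each N with A, C, G and T."""
    slots = [BASES if base == "N" else base for base in primer]
    for combo in product(*slots):
        yield "".join(combo)


# Hypothetical sequences: a 6-residue tag (cf. claim 136), x = 6, 8-residue TARGET.
primer = build_dop_primer("CTCGAG", 6, "TAGCCTAG")
print(primer)              # CTCGAGNNNNNNTAGCCTAG
print(degeneracy(primer))  # 4096
```

Because the TARGET portion is fixed while the N positions vary, lengthening TARGET narrows the set of priming sites, which is the lever the dependent claims adjust.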
140. A method for assessing whether a subject is at risk for developing a disease, the method comprising:
preparing a RCG from a genomic sample obtained from the subject and characterizing the sample by the method of claim 1, based on whether a plurality of SNP alleles that occur in at least 10% of genomes obtained from individuals afflicted with the disease occur in the reduced complexity genome of the subject.
141. A method for identifying a set of SNP alleles associated with a disease, the method comprising:
preparing individual RCGs obtained from subjects afflicted with a disease using the same set of primers to prepare each RCG, and
comparing individual genetic loci in the RCGs with the same individual genetic loci in normal subjects to identify SNPs associated with the disease.
142. A digital information product for representing genomic information, the product comprising:
a computer-readable medium having computer-readable signals stored thereon, wherein the signals define a data structure, the data structure including one or more data components,
wherein each data component includes:
a first data element defining a genomic classification code that identifies a corresponding genome, and wherein each genomic classification code classifies the corresponding genome based on one or more single nucleotide polymorphisms of the corresponding genome.
143. The digital information product of claim 142, wherein the genomic classification code is a unique identifier of the corresponding genome.
144. The digital information product of claim 142, wherein the genomic classification code is based on a pattern of the single nucleotide polymorphisms of the corresponding genome, the pattern indicating the presence or absence of each single nucleotide polymorphism.
145. The digital information product of claim 142, wherein each data component also includes:
one or more data elements, each data element defining an attribute of the corresponding genome.
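The data structure of claims 142-145 (data components, each holding a genomic classification code and optional attribute elements) might be modelled as follows. This is a sketch under assumptions: the class names, field names, attribute keys, and the choice of JSON as the stored form are all hypothetical and are not specified by the claims.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataComponent:
    """One data component: a classification code plus optional attributes."""
    classification_code: str                                   # first data element
    attributes: Dict[str, str] = field(default_factory=dict)   # further data elements


@dataclass
class GenomicDataStructure:
    """Container for the data components stored on the medium."""
    components: List[DataComponent] = field(default_factory=list)

    def find(self, code: str) -> List[DataComponent]:
        """Return every component whose classification code equals `code`."""
        return [c for c in self.components if c.classification_code == code]

    def to_json(self) -> str:
        """Serialize the whole structure; JSON is an arbitrary choice here."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical record, for illustration only.
store = GenomicDataStructure()
store.components.append(DataComponent("1011", {"sample_type": "tumor"}))
print(store.to_json())
```

If the code is meant to be a unique identifier, as in claim 143, `find` would return at most one component per code.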
146. A process for making a digital information product comprising computer data signals defining a genomic classification code for a genome, the process comprising:
preparing a reduced complexity genome,
performing a hybridization reaction with the reduced complexity genome and at least one surface having a panel of single nucleotide polymorphism oligonucleotides immobilized thereon,
identifying a genomic pattern of single nucleotide polymorphisms for the reduced complexity genome by determining the presence therein of single nucleotide polymorphisms based on whether each single nucleotide polymorphism oligonucleotide hybridizes to the reduced complexity genome,
generating a genomic classification code for the reduced complexity genome based on the genomic pattern of the single nucleotide polymorphisms, and
encoding the genomic classification code as one or more computer data signals on a computer-readable medium.
147. A process for making a digital information product comprising computer data signals defining a genomic classification code for a genome, the process comprising:
preparing a reduced complexity genome,
performing a hybridization reaction with a panel of single nucleotide polymorphism oligonucleotides of known allelic frequency and a surface having the reduced complexity genome immobilized thereon,
identifying a genomic pattern of single nucleotide polymorphisms for the reduced complexity genome by determining the presence therein of single nucleotide polymorphisms based on whether each single nucleotide polymorphism oligonucleotide hybridizes to the reduced complexity genome,
generating a genomic classification code for the reduced complexity genome based on the pattern and the allelic frequency of the single nucleotide polymorphisms, and
encoding the genomic classification code as one or more computer data signals on a computer-readable medium.
148. A method for performing linkage analysis, comprising:
preparing individual RCGs obtained from members of one or more families,
determining the presence or absence of SNP alleles in the RCGs, and
comparing the RCGs of the family members by comparing the presence or absence of the SNP alleles in the RCGs of the family members.
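The comparison step of claim 148 can be pictured as a pairwise comparison of which SNP alleles are present in each family member's RCG. The sketch below covers only that comparison and is not a full linkage analysis (it computes no LOD scores, for example). The function name, member labels, and allele identifiers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_alleles(family: Dict[str, Set[str]]) -> Dict[Tuple[str, str], List[str]]:
    """For each pair of family members, list the SNP alleles present in both.

    `family` maps a member label to the set of SNP alleles detected in
    that member's RCG. Pairs are formed from alphabetically sorted labels.
    """
    return {
        (a, b): sorted(family[a] & family[b])
        for a, b in combinations(sorted(family), 2)
    }


# Hypothetical presence calls, for illustration only.
family = {
    "parent1": {"snp_A", "snp_B"},
    "parent2": {"snp_C"},
    "child": {"snp_A", "snp_C"},
}

for pair, alleles in shared_alleles(family).items():
    print(pair, alleles)
```

Alleles shared between affected relatives and absent from unaffected ones would be the natural candidates to follow up, though the claim itself does not specify how the comparison is scored.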
US10/676,154 1998-09-25 2003-09-29 Methods and products related to genotyping and DNA analysis Abandoned US20040081996A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/676,154 US20040081996A1 (en) 1998-09-25 2003-09-29 Methods and products related to genotyping and DNA analysis
US12/186,673 US20090098551A1 (en) 1998-09-25 2008-08-06 Methods and products related to genotyping and dna analysis
US14/164,770 US20140243229A1 (en) 1998-09-25 2014-01-27 Methods and products related to genotyping and dna analysis

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US10175798P 1998-09-25 1998-09-25
US09/404,912 US6703228B1 (en) 1998-09-25 1999-09-24 Methods and products related to genotyping and DNA analysis
US10/676,154 US20040081996A1 (en) 1998-09-25 2003-09-29 Methods and products related to genotyping and DNA analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/404,912 Continuation US6703228B1 (en) 1998-09-25 1999-09-24 Methods and products related to genotyping and DNA analysis

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US12/186,673 Continuation US20090098551A1 (en) 1998-09-25 2008-08-06 Methods and products related to genotyping and dna analysis

Publications (1)

Publication Number Publication Date
US20040081996A1 (en) 2004-04-29

Family

ID=31890742

Family Applications (4)

Application Number Title Priority Date Filing Date
US09/404,912 Expired - Lifetime US6703228B1 (en) 1998-09-25 1999-09-24 Methods and products related to genotyping and DNA analysis
US10/676,154 Abandoned US20040081996A1 (en) 1998-09-25 2003-09-29 Methods and products related to genotyping and DNA analysis
US12/186,673 Abandoned US20090098551A1 (en) 1998-09-25 2008-08-06 Methods and products related to genotyping and dna analysis
US14/164,770 Abandoned US20140243229A1 (en) 1998-09-25 2014-01-27 Methods and products related to genotyping and dna analysis

Country Status (1)

Country Link
US (4) US6703228B1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040038234A1 (en) * 2000-06-30 2004-02-26 Gut Ivo Glynne Sample generation for genotyping by mass spectrometry
US20040191813A1 (en) * 2002-12-23 2004-09-30 Laurakay Bruhn Comparative genomic hybridization assays using immobilized oligonucleotide features and compositions for practicing the same
US20050079532A1 (en) * 2003-09-12 2005-04-14 Perlegen Sciences, Inc. Methods and systems for identifying predisposition to the placebo effect
US20050100911A1 (en) * 2003-08-06 2005-05-12 Perlegen Sciences, Inc. Methods for enriching populations of nucleic acid samples
US20060183132A1 (en) * 2005-02-14 2006-08-17 Perlegen Sciences, Inc. Selection probe amplification
US20090098551A1 (en) * 1998-09-25 2009-04-16 Massachusetts Institute Of Technology Methods and products related to genotyping and dna analysis
US20090124514A1 (en) * 2003-02-26 2009-05-14 Perlegen Sciences, Inc. Selection probe amplification
US20110178110A1 (en) * 2008-05-15 2011-07-21 University Of Southern California Genotype and Expression Analysis for Use in Predicting Outcome and Therapy Selection
US20150232924A1 (en) * 2005-06-23 2015-08-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9951384B2 (en) 2012-01-13 2018-04-24 Data2Bio Genotyping by next-generation sequencing
US10023907B2 (en) 2006-04-04 2018-07-17 Keygene N.V. High throughput detection of molecular markers based on AFLP and high through-put sequencing
US10106850B2 (en) 2005-12-22 2018-10-23 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US10233494B2 (en) 2005-09-29 2019-03-19 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10316364B2 (en) 2005-09-29 2019-06-11 Keygene N.V. Method for identifying the source of an amplicon
US20220010371A1 (en) * 2012-03-26 2022-01-13 The Johns Hopkins University Rapid aneuploidy detection

Families Citing this family (99)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5710000A (en) * 1994-09-16 1998-01-20 Affymetrix, Inc. Capturing sequences adjacent to Type-IIs restriction sites for genomic library mapping
US5854033A (en) * 1995-11-21 1998-12-29 Yale University Rolling circle replication reporter systems
US20080064108A1 (en) * 1997-12-10 2008-03-13 Tony Baker Urine Preservation System
US7569342B2 (en) * 1997-12-10 2009-08-04 Sierra Molecular Corp. Removal of molecular assay interferences
EP1001037A3 (en) * 1998-09-28 2003-10-01 Whitehead Institute For Biomedical Research Pre-selection and isolation of single nucleotide polymorphisms
AU2144000A (en) * 1998-10-27 2000-05-15 Affymetrix, Inc. Complexity management and analysis of genomic dna
US20020119448A1 (en) * 1999-06-23 2002-08-29 Joseph A. Sorge Methods of enriching for and identifying polymorphisms
AU1574801A (en) * 1999-10-26 2001-05-08 Genometrix Genomics Incorporated Process for requesting biological experiments and for the delivery of experimental information
US6323009B1 (en) * 2000-06-28 2001-11-27 Molecular Staging, Inc. Multiply-primed amplification of nucleic acid sequences
US20020187490A1 (en) * 2001-06-07 2002-12-12 Michigan State University Microbial identification chip based on DNA-DNA hybridization
US20050256649A1 (en) * 2001-12-21 2005-11-17 Roses Allen D High throughput correlation of polymorphic forms with multiple phenotypes within clinical populations
US7553619B2 (en) * 2002-02-08 2009-06-30 Qiagen Gmbh Detection method using dissociated rolling circle amplification
US6977162B2 (en) * 2002-03-01 2005-12-20 Ravgen, Inc. Rapid analysis of variations in a genome
CA2477611A1 (en) * 2002-03-01 2003-09-12 Ravgen, Inc. Rapid analysis of variations in a genome
US20030186280A1 (en) * 2002-03-28 2003-10-02 Affymetrix, Inc. Methods for detecting genomic regions of biological significance
US7727720B2 (en) * 2002-05-08 2010-06-01 Ravgen, Inc. Methods for detection of genetic disorders
US20070178478A1 (en) * 2002-05-08 2007-08-02 Dhallan Ravinder S Methods for detection of genetic disorders
US7442506B2 (en) * 2002-05-08 2008-10-28 Ravgen, Inc. Methods for detection of genetic disorders
US7176002B2 (en) * 2002-05-16 2007-02-13 Applera Corporation Universal-tagged oligonucleotide primers and methods of use
US9388459B2 (en) * 2002-06-17 2016-07-12 Affymetrix, Inc. Methods for genotyping
EP1535232A2 (en) * 2002-06-28 2005-06-01 Applera Corporation A system and method for snp genotype clustering
AU2003278028B2 (en) 2002-10-02 2008-12-18 The University Of British Columbia Compositions for treatment of prostate and other cancers
US8722872B2 (en) * 2002-10-02 2014-05-13 The University Of British Columbia Compositions and methods for treatment of prostate and other cancers
US7459273B2 (en) * 2002-10-04 2008-12-02 Affymetrix, Inc. Methods for genotyping selected polymorphism
US20040121338A1 (en) * 2002-12-19 2004-06-24 Alsmadi Osama A. Real-time detection of rolling circle amplification products
US9487823B2 (en) * 2002-12-20 2016-11-08 Qiagen Gmbh Nucleic acid amplification
US7955795B2 (en) * 2003-06-06 2011-06-07 Qiagen Gmbh Method of whole genome amplification with reduced artifact production
US20040185475A1 (en) * 2003-01-28 2004-09-23 Affymetrix, Inc. Methods for genotyping ultra-high complexity DNA
US20040259125A1 (en) * 2003-02-26 2004-12-23 Omni Genetics, Inc. Methods, systems and apparatus for identifying genetic differences in disease and drug response
US8043834B2 (en) 2003-03-31 2011-10-25 Qiagen Gmbh Universal reagents for rolling circle amplification and methods of use
US20040248103A1 (en) * 2003-06-04 2004-12-09 Feaver William John Proximity-mediated rolling circle amplification
US20040259100A1 (en) * 2003-06-20 2004-12-23 Illumina, Inc. Methods and compositions for whole genome amplification and genotyping
US20050181394A1 (en) * 2003-06-20 2005-08-18 Illumina, Inc. Methods and compositions for whole genome amplification and genotyping
US7670810B2 (en) 2003-06-20 2010-03-02 Illumina, Inc. Methods and compositions for whole genome amplification and genotyping
US8114978B2 (en) 2003-08-05 2012-02-14 Affymetrix, Inc. Methods for genotyping selected polymorphism
US20060246472A1 (en) * 2003-11-13 2006-11-02 Council Of Scientific And Industrial Research Method for the detection of predisposition to high altitude pulmonary edema
EP1623996A1 (en) * 2004-08-06 2006-02-08 Deutsches Krebsforschungszentrum Stiftung des öffentlichen Rechts Improved method of selecting a desired protein from a library
US8309303B2 (en) * 2005-04-01 2012-11-13 Qiagen Gmbh Reverse transcription and amplification of RNA with simultaneous degradation of DNA
US7452671B2 (en) * 2005-04-29 2008-11-18 Affymetrix, Inc. Methods for genotyping with selective adaptor ligation
JP2007040905A (en) * 2005-08-04 2007-02-15 Hitachi High-Technologies Corp Chromatographic data processor
EP1762627A1 (en) 2005-09-09 2007-03-14 Qiagen GmbH Method for the activation of a nucleic acid for performing a polymerase reaction
CA2623938A1 (en) * 2005-09-30 2007-04-05 Perlegen Sciences, Inc. Methods and compositions for screening and treatment of disorders of blood glucose regulation
US11306351B2 (en) 2005-12-21 2022-04-19 Affymetrix, Inc. Methods for genotyping
DE102006020885A1 (en) * 2006-05-05 2007-11-08 Qiagen Gmbh Inserting a tag sequence into a nucleic acid comprises using an anchor oligonucleotide comprising a hybridizing anchor sequence and a nonhybridizing tag-template sequence
US9790538B2 (en) * 2013-03-07 2017-10-17 Apdn (B.V.I.) Inc. Alkaline activation for immobilization of DNA taggants
US10741034B2 (en) 2006-05-19 2020-08-11 Apdn (B.V.I.) Inc. Security system and method of marking an inventory item and/or person in the vicinity
US20080108712A1 (en) * 2006-09-27 2008-05-08 Washington University In St. Louis Nucleotide sequence associated with acute coronary syndrome and mortality
DK2518162T3 (en) * 2006-11-15 2018-06-18 Biospherex Llc Multi-tag sequencing and ecogenomic analysis
US20080131887A1 (en) * 2006-11-30 2008-06-05 Stephan Dietrich A Genetic Analysis Systems and Methods
US20080228700A1 (en) 2007-03-16 2008-09-18 Expanse Networks, Inc. Attribute Combination Discovery
US20080293589A1 (en) * 2007-05-24 2008-11-27 Affymetrix, Inc. Multiplex locus specific amplification
WO2009023676A1 (en) 2007-08-12 2009-02-19 Integrated Dna Technologies, Inc. Microarray system with improved sequence specificity
WO2009023733A1 (en) * 2007-08-13 2009-02-19 Trustees Of Tufts College Methods and microarrays for detecting enteric viruses
US9388457B2 (en) 2007-09-14 2016-07-12 Affymetrix, Inc. Locus specific amplification using array probes
US20090099789A1 (en) * 2007-09-26 2009-04-16 Stephan Dietrich A Methods and Systems for Genomic Analysis Using Ancestral Data
US9336177B2 (en) * 2007-10-15 2016-05-10 23Andme, Inc. Genome sharing
US9074244B2 (en) 2008-03-11 2015-07-07 Affymetrix, Inc. Array-based translocation and rearrangement assays
CN102187344A (en) * 2008-09-12 2011-09-14 纳维哲尼克斯公司 Methods and systems for incorporating multiple environmental and genetic risk factors
US8108406B2 (en) 2008-12-30 2012-01-31 Expanse Networks, Inc. Pangenetic web user behavior prediction system
EP3276526A1 (en) 2008-12-31 2018-01-31 23Andme, Inc. Finding relatives in a database
WO2010126614A2 (en) 2009-04-30 2010-11-04 Good Start Genetics, Inc. Methods and compositions for evaluating genetic markers
US20100287189A1 (en) * 2009-05-05 2010-11-11 Pioneer Hi-Bred International, Inc. Acceleration of tag placement using custom hardware
WO2011084470A1 (en) * 2009-12-15 2011-07-14 Mycare,Llc Health care device and systems and methods for using the same
AU2011229918B2 (en) * 2010-03-24 2015-02-05 Parker Proteomics, Llc Methods for conducting genetic analysis using protein polymorphisms
US9163281B2 (en) 2010-12-23 2015-10-20 Good Start Genetics, Inc. Methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction
US9228233B2 (en) 2011-10-17 2016-01-05 Good Start Genetics, Inc. Analysis methods
EP2812820A4 (en) * 2012-02-06 2015-09-23 Mycare Llc Methods for searching genomic databases
US8209130B1 (en) 2012-04-04 2012-06-26 Good Start Genetics, Inc. Sequence assembly
US8812422B2 (en) 2012-04-09 2014-08-19 Good Start Genetics, Inc. Variant database
US20140102899A1 (en) * 2012-04-09 2014-04-17 Applied Dna Sciences, Inc. Plasma treatment for dna binding
US10227635B2 (en) 2012-04-16 2019-03-12 Molecular Loop Biosolutions, Llc Capture reactions
US9266370B2 (en) 2012-10-10 2016-02-23 Apdn (B.V.I) Inc. DNA marking of previously undistinguished items for traceability
US9977708B1 (en) 2012-11-08 2018-05-22 23Andme, Inc. Error correction in ancestry classification
US9213947B1 (en) 2012-11-08 2015-12-15 23Andme, Inc. Scalable pipeline for local ancestry inference
US20140258299A1 (en) * 2013-03-07 2014-09-11 Boris A. Vinatzer Method for Assigning Similarity-Based Codes to Life Form and Other Organisms
US9963740B2 (en) 2013-03-07 2018-05-08 APDN (B.V.I.), Inc. Method and device for marking articles
EP2971159B1 (en) 2013-03-14 2019-05-08 Molecular Loop Biosolutions, LLC Methods for analyzing nucleic acids
US8847799B1 (en) 2013-06-03 2014-09-30 Good Start Genetics, Inc. Methods and systems for storing sequence read data
US10450601B2 (en) 2013-09-26 2019-10-22 Toyo Kohan Co., Ltd. Buffer composition for hybridization use, and hybridization method
CA2926436A1 (en) 2013-10-07 2015-04-16 Judith Murrah Multimode image and spectral reader
US10851414B2 (en) 2013-10-18 2020-12-01 Good Start Genetics, Inc. Methods for determining carrier status
WO2015057565A1 (en) 2013-10-18 2015-04-23 Good Start Genetics, Inc. Methods for assessing a genomic region of a subject
US10745825B2 (en) 2014-03-18 2020-08-18 Apdn (B.V.I.) Inc. Encrypted optical markers for security applications
CN106103121B (en) 2014-03-18 2019-12-06 亚普蒂恩(B.V.I.)公司 Encrypted optical marker for security applications
WO2015175530A1 (en) 2014-05-12 2015-11-19 Gore Athurva Methods for detecting aneuploidy
WO2016040446A1 (en) 2014-09-10 2016-03-17 Good Start Genetics, Inc. Methods for selectively suppressing non-target sequences
US10429399B2 (en) 2014-09-24 2019-10-01 Good Start Genetics, Inc. Process control for increased robustness of genetic assays
US10760182B2 (en) 2014-12-16 2020-09-01 Apdn (B.V.I.) Inc. Method and device for marking fibrous materials
EP4095261A1 (en) 2015-01-06 2022-11-30 Molecular Loop Biosciences, Inc. Screening for structural variants
EP3289106A4 (en) 2015-05-01 2019-03-20 Griffith University Diagnostic methods
US10395759B2 (en) 2015-05-18 2019-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for copy number variant detection
WO2017180302A1 (en) 2016-04-11 2017-10-19 Apdn (B.V.I.) Inc. Method of marking cellulosic products
US10274440B2 (en) * 2016-06-22 2019-04-30 International Business Machines Corporation Method to facilitate investigation of chemical constituents in chemical analysis data
US10995371B2 (en) 2016-10-13 2021-05-04 Apdn (B.V.I.) Inc. Composition and method of DNA marking elastomeric material
WO2018156352A1 (en) 2017-02-21 2018-08-30 Apdn (B.V.I) Inc. Nucleic acid coated submicron particles for authentication
JP2020515978A (en) * 2017-03-29 2020-05-28 ナントミクス,エルエルシー Multi-sequence file signature hash
FR3072495B1 (en) * 2017-10-18 2021-08-27 Centre Nat Rech Scient METHOD AND SYSTEM FOR GENERATING A UNIQUE IDENTIFIER OF A SUBJECT FROM ITS DNA
US11848073B2 (en) * 2019-04-03 2023-12-19 University Of Central Florida Research Foundation, Inc. Methods and system for efficient indexing for genetic genealogical discovery in large genotype databases
US11817176B2 (en) 2020-08-13 2023-11-14 23Andme, Inc. Ancestry composition determination

Citations (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4675283A (en) * 1984-07-19 1987-06-23 Massachusetts Institute Of Technology Detection and isolation of homologous, repeated and amplified nucleic acid sequences
US4683195A (en) * 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4683202A (en) * 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4800159A (en) * 1986-02-07 1989-01-24 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences
US4965188A (en) * 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US5093245A (en) * 1988-01-26 1992-03-03 Applied Biosystems Labeling by simultaneous ligation and restriction
US5182377A (en) * 1988-09-09 1993-01-26 Hoffmann-La Roche Inc. Probes for detection of human papillomavirus
US5501964A (en) * 1992-11-12 1996-03-26 Cold Spring Harbor Laboratory Methods for producing probes capable of distinguishing DNA from related sources
US5508178A (en) * 1989-01-19 1996-04-16 Rose; Samuel Nucleic acid amplification using single primer
US5512439A (en) * 1988-11-21 1996-04-30 Dynal As Oligonucleotide-linked magnetic particles and uses thereof
US5565340A (en) * 1995-01-27 1996-10-15 Clontech Laboratories, Inc. Method for suppressing DNA fragment amplification during PCR
US5578467A (en) * 1992-01-10 1996-11-26 Life Technologies, Inc. Use of deoxyinosine containing primers to balance primer efficiency in the amplification of nucleic acid molecules
US5589330A (en) * 1994-07-28 1996-12-31 Genzyme Corporation High-throughput screening method for sequence or genetic alterations in nucleic acids using elution and sequencing of complementary oligonucleotides
US5712127A (en) * 1996-04-29 1998-01-27 Genescape Inc. Subtractive amplification
US5763239A (en) * 1996-06-18 1998-06-09 Diversa Corporation Production and use of normalized DNA libraries
US5807522A (en) * 1994-06-17 1998-09-15 The Board Of Trustees Of The Leland Stanford Junior University Methods for fabricating microarrays of biological samples
US5837832A (en) * 1993-06-25 1998-11-17 Affymetrix, Inc. Arrays of nucleic acid probes on biological chips
US5851770A (en) * 1994-04-25 1998-12-22 Variagenics, Inc. Detection of mismatches by resolvase cleavage using a magnetic bead support
US5858656A (en) * 1990-04-06 1999-01-12 Queen's University Of Kingston Indexing linkers
US5861242A (en) * 1993-06-25 1999-01-19 Affymetrix, Inc. Array of nucleic acid probes on biological chips for diagnosis of HIV and methods of using the same
US5888737A (en) * 1997-04-15 1999-03-30 Lynx Therapeutics, Inc. Adaptor-based sequence analysis
US6004783A (en) * 1994-03-18 1999-12-21 The General Hospital Corporation Cleaved amplified RFLP detection methods
US6013445A (en) * 1996-06-06 2000-01-11 Lynx Therapeutics, Inc. Massively parallel signature sequencing by ligation of encoded adaptors
US6027877A (en) * 1993-11-04 2000-02-22 Gene Check, Inc. Use of immobilized mismatch binding protein for detection of mutations and polymorphisms, purification of amplified DNA samples and allele identification
US6027945A (en) * 1997-01-21 2000-02-22 Promega Corporation Methods of isolating biological target materials using silica magnetic particles
US6033861A (en) * 1997-11-19 2000-03-07 Incyte Genetics, Inc. Methods for obtaining nucleic acid containing a mutation
US6040166A (en) * 1985-03-28 2000-03-21 Roche Molecular Systems, Inc. Kits for amplifying and detecting nucleic acid sequences, including a probe
US6045994A (en) * 1991-09-24 2000-04-04 Keygene N.V. Selective restriction fragment amplification: fingerprinting
US6060245A (en) * 1996-12-13 2000-05-09 Stratagene Methods and adaptors for generating specific nucleic acid populations
US6060240A (en) * 1996-12-13 2000-05-09 Arcaris, Inc. Methods for measuring relative amounts of nucleic acids in a complex mixture and retrieval of specific sequences therefrom
US6103463A (en) * 1992-02-19 2000-08-15 The Public Health Research Institute Of The City Of New York, Inc. Method of sorting a mixture of nucleic acid strands on a binary array
US6107023A (en) * 1988-06-17 2000-08-22 Genelabs Technologies, Inc. DNA amplification and subtraction techniques
US6124090A (en) * 1989-01-19 2000-09-26 Behringwerke Ag Nucleic acid amplification using single primer
US6156502A (en) * 1995-12-21 2000-12-05 Beattie; Kenneth Loren Arbitrary sequence oligonucleotide fingerprinting
US6197557B1 (en) * 1997-03-05 2001-03-06 The Regents Of The University Of Michigan Compositions and methods for analysis of nucleic acids
US6232067B1 (en) * 1998-08-17 2001-05-15 The Perkin-Elmer Corporation Adapter directed expression analysis
US6277606B1 (en) * 1993-11-09 2001-08-21 Cold Spring Harbor Laboratory Representational approach to DNA analysis
US6287825B1 (en) * 1998-09-18 2001-09-11 Molecular Staging Inc. Methods for reducing the complexity of DNA sequences
US6297006B1 (en) * 1997-01-16 2001-10-02 Hyseq, Inc. Methods for sequencing repetitive sequences and for determining the order of sequence subfragments
US6306643B1 (en) * 1998-08-24 2001-10-23 Affymetrix, Inc. Methods of using an array of pooled probes in genetic analysis
US6361947B1 (en) * 1998-10-27 2002-03-26 Affymetrix, Inc. Complexity management and analysis of genomic DNA
US6368799B1 (en) * 1997-06-13 2002-04-09 Affymetrix, Inc. Method to detect gene polymorphisms and monitor allelic expression employing a probe array
US6383742B1 (en) * 1997-01-16 2002-05-07 Radoje T. Drmanac Three dimensional arrays for detection or quantification of nucleic acid species
US6472185B2 (en) * 1997-01-10 2002-10-29 Pioneer Hi-Bred International, Inc. Use of selective DNA fragment amplification products for hybridization-based genetic fingerprinting, marker assisted selection, and high-throughput screening
US6509160B1 (en) * 1994-09-16 2003-01-21 Affymetrix, Inc. Methods for analyzing nucleic acids using a type IIs restriction endonuclease
US6514768B1 (en) * 1999-01-29 2003-02-04 Surmodics, Inc. Replicable probe array
US20030113737A1 (en) * 2001-01-24 2003-06-19 Genomic Expression Aps Assay and kit for analyzing gene expression
US6632611B2 (en) * 2001-07-20 2003-10-14 Affymetrix, Inc. Method of target enrichment and amplification
US6703228B1 (en) * 1998-09-25 2004-03-09 Massachusetts Institute Of Technology Methods and products related to genotyping and DNA analysis

Family Cites Families (95)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4588682A (en) 1982-12-13 1986-05-13 Integrated Genetics, Inc. Binding nucleic acid to a support
US5721098A (en) 1986-01-16 1998-02-24 The Regents Of The University Of California Comparative genomic hybridization
CA1284931C (en) 1986-03-13 1991-06-18 Henry A. Erlich Process for detecting specific nucleotide variations and genetic polymorphisms present in nucleic acids
US5604099A (en) 1986-03-13 1997-02-18 Hoffmann-La Roche Inc. Process for detecting specific nucleotide variations and genetic polymorphisms present in nucleic acids
US4829098A (en) 1986-06-19 1989-05-09 Washington Research Foundation Immobilized biomolecules and method of making same
US5034428A (en) 1986-06-19 1991-07-23 Board Of Regents Of The University Of Washington Immobilized biomolecules and method of making same
US6270961B1 (en) 1987-04-01 2001-08-07 Hyseq, Inc. Methods and apparatus for DNA sequencing and DNA identification
US5202231A (en) 1987-04-01 1993-04-13 Drmanac Radoje T Method of sequencing of genomes by hybridization of oligonucleotide probes
US5032502A (en) 1988-01-21 1991-07-16 The United States Of America As Represented By The United States Department Of Energy Purification of polymorphic components of complex genomes
US4963663A (en) 1988-12-23 1990-10-16 University Of Utah Genetic identification employing DNA probes of variable number tandem repeat loci
EP0333465B1 (en) 1988-03-18 1994-07-13 Baylor College Of Medicine Mutation detection by competitive oligonucleotide priming
CA1339731C (en) 1988-10-12 1998-03-17 Charles T. Caskey Multiplex genomic dna amplification for deletion detection
US4946980A (en) 1988-10-17 1990-08-07 Dow Corning Corporation Preparation of organosilanes
US5869237A (en) 1988-11-15 1999-02-09 Yale University Amplification karyotyping
US5639611A (en) 1988-12-12 1997-06-17 City Of Hope Allele specific polymerase chain reaction
US5043272A (en) 1989-04-27 1991-08-27 Life Technologies, Incorporated Amplification of nucleic acid sequences using oligonucleotides of random sequence as primers
US5106727A (en) 1989-04-27 1992-04-21 Life Technologies, Inc. Amplification of nucleic acid sequences using oligonucleotides of random sequences as primers
US5143854A (en) 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
US5744101A (en) 1989-06-07 1998-04-28 Affymax Technologies N.V. Photolabile nucleoside protecting groups
US5192659A (en) 1989-08-25 1993-03-09 Genetype Ag Intron sequence analysis method for detection of adjacent and remote locus alleles as haplotypes
US5104792A (en) 1989-12-21 1992-04-14 The United States Of America As Represented By The Department Of Health And Human Services Method for amplifying unknown nucleic acid sequences
US6013431A (en) 1990-02-16 2000-01-11 Molecular Tool, Inc. Method for determining specific nucleotide variations by primer extension in the presence of mixture of labeled nucleotides and terminators
US5126239A (en) 1990-03-14 1992-06-30 E. I. Du Pont De Nemours And Company Process for detecting polymorphisms on the basis of nucleotide differences
US5667976A (en) 1990-05-11 1997-09-16 Becton Dickinson And Company Solid supports for nucleic acid hybridization assays
WO1992007095A1 (en) 1990-10-15 1992-04-30 Stratagene Arbitrarily primed polymerase chain reaction method for fingerprinting genomes
US5518900A (en) 1993-01-15 1996-05-21 Molecular Tool, Inc. Method for generating single-stranded DNA molecules
US5762876A (en) 1991-03-05 1998-06-09 Molecular Tool, Inc. Automatic genotype determination
US5578443A (en) 1991-03-06 1996-11-26 Regents Of The University Of Minnesota DNA sequence-based HLA typing method
US5994056A (en) 1991-05-02 1999-11-30 Roche Molecular Systems, Inc. Homogeneous methods for nucleic acid amplification and detection
US5220004A (en) 1991-05-07 1993-06-15 Cetus Corporation Methods and reagents for Gγ-globin typing
US5599921A (en) 1991-05-08 1997-02-04 Stratagene Oligonucleotide families useful for producing primers
FR2679255B1 (en) 1991-07-17 1993-10-22 Bio Merieux METHOD OF IMMOBILIZING A NUCLEIC FRAGMENT BY PASSIVE FIXING ON A SOLID SUPPORT, SOLID SUPPORT THUS OBTAINED AND ITS USE.
DE4129653A1 (en) 1991-09-06 1993-03-11 Boehringer Mannheim Gmbh PROCESS FOR DETECTION OF SIMILAR NUCLEIC ACIDS
WO1993009245A1 (en) 1991-10-31 1993-05-13 University Of Pittsburgh Reverse dot blot hybridization using tandem head-to-tail monomers containing probes synthesized by staggered complementary primers
JP3509859B2 (en) 1991-11-07 2004-03-22 ナノトロニクス,インコーポレイテッド Hybridization of chromophore and fluorophore conjugated polynucleotides to create donor-donor energy transfer systems
US5605662A (en) 1993-11-01 1997-02-25 Nanogen, Inc. Active programmable electronic devices for molecular biological analysis and diagnostics
US5787032A (en) 1991-11-07 1998-07-28 Nanogen Deoxyribonucleic acid(DNA) optical storage using non-radiative energy transfer between a donor group, an acceptor group and a quencher group
US5632957A (en) 1993-11-01 1997-05-27 Nanogen Molecular biological diagnostic systems including electrodes
US5849486A (en) 1993-11-01 1998-12-15 Nanogen, Inc. Methods for hybridization analysis utilizing electrically controlled hybridization
US5663062A (en) 1992-04-03 1997-09-02 Stratagene Oligonucleotide libraries useful for producing primers
US5981176A (en) 1992-06-17 1999-11-09 City Of Hope Method of detecting and discriminating between nucleic acid sequences
GB9214873D0 (en) 1992-07-13 1992-08-26 Medical Res Council Process for categorising nucleotide sequence populations
US5633134A (en) 1992-10-06 1997-05-27 Ig Laboratories, Inc. Method for simultaneously detecting multiple mutations in a DNA sample
EP0695366A4 (en) 1993-04-16 1999-07-28 F B Investments Pty Ltd Method of random amplification of polymorphic dna
US5695933A (en) 1993-05-28 1997-12-09 Massachusetts Institute Of Technology Direct detection of expanded nucleotide repeats in the human genome
US5858659A (en) 1995-11-29 1999-01-12 Affymetrix, Inc. Polymorphism detection
US20020048749A1 (en) 1998-04-15 2002-04-25 Robert J. Lipshutz Methods for polymorphism identifcation and profiling
US5731171A (en) 1993-07-23 1998-03-24 Arch Development Corp. Sequence independent amplification of DNA
US5702890A (en) 1993-07-26 1997-12-30 K.O. Technology, Inc. Inhibitors of alternative alleles of genes as a basis for cancer therapeutic agents
AU7551594A (en) 1993-07-29 1995-02-28 MURASHIGE, Kate H. Method for recognition of the nucleotide sequence of a purified dna segment
US5946431A (en) 1993-07-30 1999-08-31 Molecular Dynamics Multi-functional photometer with movable linkage for routing light-transmitting paths using reflective surfaces
DK88893D0 (en) 1993-07-30 1993-07-30 Radiometer As A METHOD AND APPARATUS FOR DETERMINING THE CONTENT OF A CONSTITUENT OF BLOOD OF AN INDIVIDUAL
US5597694A (en) 1993-10-07 1997-01-28 Massachusetts Institute Of Technology Interspersed repetitive element-bubble amplification of nucleic acids
US6045996A (en) 1993-10-26 2000-04-04 Affymetrix, Inc. Hybridization assays on oligonucleotide arrays
ES2240970T3 (en) 1993-11-03 2005-10-16 Orchid Biosciences, Inc. SIMPLE NUCLEOTIDE POLYMORPHYSMS AND ITS USE IN GENETIC ANALYSIS.
US5610287A (en) 1993-12-06 1997-03-11 Molecular Tool, Inc. Method for immobilizing nucleic acid molecules
AU1682595A (en) 1994-01-21 1995-08-08 North Carolina State University Methods for within family selection in woody perennials using genetic markers
EP0754240B1 (en) 1994-02-07 2003-08-20 Beckman Coulter, Inc. Ligase/polymerase-mediated genetic bit analysis of single nucleotide polymorphisms and its use in genetic analysis
CA2159907C (en) * 1994-02-14 2006-12-12 Rogier Maria Bertina A method for screening for the presence of a genetic defect associated with thrombosis and/or poor anticoagulant response to activated protein c
EP0668361B1 (en) 1994-02-22 2000-04-19 Mitsubishi Chemical Corporation Oligonucleotide and method for analyzing base sequence of nucleic acid
FR2716894B1 (en) 1994-03-07 1996-05-24 Pasteur Institut Genetic markers used jointly for the diagnosis of Alzheimer's disease, diagnostic method and kit.
US5545527A (en) 1994-07-08 1996-08-13 Visible Genetics Inc. Method for testing for mutations in DNA from a patient sample
US5834189A (en) 1994-07-08 1998-11-10 Visible Genetics Inc. Method for evaluation of polymorphic genetic sequences, and the use thereof in identification of HLA types
US5849483A (en) 1994-07-28 1998-12-15 Ig Laboratories, Inc. High throughput screening method for sequences or genetic alterations in nucleic acids
US5834181A (en) 1994-07-28 1998-11-10 Genzyme Corporation High throughput screening method for sequences or genetic alterations in nucleic acids
US5604097A (en) 1994-10-13 1997-02-18 Spectragen, Inc. Methods for sorting polynucleotides using oligonucleotide tags
US5512441A (en) 1994-11-15 1996-04-30 American Health Foundation Quantative method for early detection of mutant alleles and diagnostic kits for carrying out the method
JPH10509594A (en) 1994-11-28 1998-09-22 イー・アイ・デユポン・ドウ・ヌムール・アンド・カンパニー Composite microsatellite primers for detection of genetic polymorphism
US5959098A (en) 1996-04-17 1999-09-28 Affymetrix, Inc. Substrate preparation process
US5866337A (en) 1995-03-24 1999-02-02 The Trustees Of Columbia University In The City Of New York Method to detect mutations in a nucleic acid using a hybridization-ligation procedure
US5576180A (en) 1995-05-01 1996-11-19 Centre De Recherche De L'hopital Ste-Justine Primers and methods for simultaneous amplification of multiple markers for DNA fingerprinting
AU5972996A (en) 1995-06-02 1996-12-18 Incyte Pharmaceuticals, Inc. Improved method for obtaining full-length cdna sequences
US6015675A (en) 1995-06-06 2000-01-18 Baylor College Of Medicine Mutation detection by competitive oligonucleotide priming
US5814444A (en) 1995-06-07 1998-09-29 University Of Washington Methods for making and using single-chromosome amplfication libraries
US5707806A (en) 1995-06-07 1998-01-13 Genzyme Corporation Direct sequence identification of mutations by cleavage- and ligation-associated mutation-specific sequencing
WO1997022719A1 (en) 1995-12-18 1997-06-26 Washington University Method for nucleic acid analysis using fluorescence resonance energy transfer
US5789168A (en) 1996-05-01 1998-08-04 Visible Genetics Inc. Method for amplification and sequencing of nucleic acid polymers
US5795722A (en) 1997-03-18 1998-08-18 Visible Genetics Inc. Method and kit for quantitation and nucleic acid sequencing of nucleic acid analytes in a sample
WO1997029212A1 (en) 1996-02-08 1997-08-14 Affymetrix, Inc. Chip-based speciation and phenotypic characterization of microorganisms
WO1997031327A1 (en) 1996-02-26 1997-08-28 Motorola Inc. Personal human genome card and methods and systems for producing same
US5945675A (en) 1996-03-18 1999-08-31 Pacific Northwest Research Foundation Methods of screening for a tumor or tumor progression to the metastatic state
US5811239A (en) 1996-05-13 1998-09-22 Frayne Consultants Method for single base-pair DNA sequence variation detection
EP0912761A4 (en) 1996-05-29 2004-06-09 Cornell Res Foundation Inc Detection of nucleic acid sequence differences using coupled ligase detection and polymerase chain reactions
EP0943012A4 (en) 1996-09-19 2004-06-30 Affymetrix Inc Identification of molecular sequence signatures and methods involving the same
US6037124A (en) 1996-09-27 2000-03-14 Beckman Coulter, Inc. Carboxylated polyvinylidene fluoride solid supports for the immobilization of biomolecules and methods of use thereof
US5885775A (en) 1996-10-04 1999-03-23 Perseptive Biosystems, Inc. Methods for determining sequences information in polynucleotides using mass spectrometry
US5856104A (en) 1996-10-28 1999-01-05 Affymetrix, Inc. Polymorphisms in the glucose-6 phosphate dehydrogenase locus
EP0941366A2 (en) 1996-11-06 1999-09-15 Whitehead Institute For Biomedical Research Biallelic markers
US6114116A (en) 1996-12-02 2000-09-05 Lemieux; Bertrand Brassica polymorphisms
CA2276462C (en) 1996-12-31 2007-06-12 Genometrix Incorporated Multiplexed molecular analysis system apparatus and method
US6048689A (en) 1997-03-28 2000-04-11 Gene Logic, Inc. Method for identifying variations in polynucleotide sequences
US5760130A (en) 1997-05-13 1998-06-02 Molecular Dynamics, Inc. Aminosilane/carbodiimide coupling of DNA to glass substrate
US5919626A (en) 1997-06-06 1999-07-06 Orchid Bio Computer, Inc. Attachment of unmodified nucleic acids to silanized solid phase surfaces
US5888778A (en) 1997-06-16 1999-03-30 Exact Laboratories, Inc. High-throughput screening method for identification of genetic mutations or disease-causing microorganisms using segmented primers
AU729134B2 (en) * 1997-07-22 2001-01-25 Qiagen Genomics, Inc. Amplification and other enzymatic reactions performed on nucleic acid arrays

Patent Citations (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4675283A (en) * 1984-07-19 1987-06-23 Massachusetts Institute Of Technology Detection and isolation of homologous, repeated and amplified nucleic acid sequences
US4683202A (en) * 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
US4683202B1 (en) * 1985-03-28 1990-11-27 Cetus Corp
US6040166A (en) * 1985-03-28 2000-03-21 Roche Molecular Systems, Inc. Kits for amplifying and detecting nucleic acid sequences, including a probe
US4683195A (en) * 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4683195B1 (en) * 1986-01-30 1990-11-27 Cetus Corp
US4800159A (en) * 1986-02-07 1989-01-24 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences
US4965188A (en) * 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US5093245A (en) * 1988-01-26 1992-03-03 Applied Biosystems Labeling by simultaneous ligation and restriction
US5366877A (en) * 1988-01-26 1994-11-22 Applied Biosystems, Inc. Restriction/ligation labeling for primer initiated multiple copying of DNA ssequences
US6107023A (en) * 1988-06-17 2000-08-22 Genelabs Technologies, Inc. DNA amplification and subtraction techniques
US5182377A (en) * 1988-09-09 1993-01-26 Hoffmann-La Roche Inc. Probes for detection of human papillomavirus
US5512439A (en) * 1988-11-21 1996-04-30 Dynal As Oligonucleotide-linked magnetic particles and uses thereof
US5508178A (en) * 1989-01-19 1996-04-16 Rose; Samuel Nucleic acid amplification using single primer
US6124090A (en) * 1989-01-19 2000-09-26 Behringwerke Ag Nucleic acid amplification using single primer
US5858656A (en) * 1990-04-06 1999-01-12 Queen's University Of Kingston Indexing linkers
US6045994A (en) * 1991-09-24 2000-04-04 Keygene N.V. Selective restriction fragment amplification: fingerprinting
US5578467A (en) * 1992-01-10 1996-11-26 Life Technologies, Inc. Use of deoxyinosine containing primers to balance primer efficiency in the amplification of nucleic acid molecules
US6103463A (en) * 1992-02-19 2000-08-15 The Public Health Research Institute Of The City Of New York, Inc. Method of sorting a mixture of nucleic acid strands on a binary array
US5876929A (en) * 1992-11-12 1999-03-02 Cold Spring Harbor Laboratory Representational approach to DNA analysis
US5501964A (en) * 1992-11-12 1996-03-26 Cold Spring Harbor Laboratory Methods for producing probes capable of distinguishing DNA from related sources
US5837832A (en) * 1993-06-25 1998-11-17 Affymetrix, Inc. Arrays of nucleic acid probes on biological chips
US5861242A (en) * 1993-06-25 1999-01-19 Affymetrix, Inc. Array of nucleic acid probes on biological chips for diagnosis of HIV and methods of using the same
US6027877A (en) * 1993-11-04 2000-02-22 Gene Check, Inc. Use of immobilized mismatch binding protein for detection of mutations and polymorphisms, purification of amplified DNA samples and allele identification
US6277606B1 (en) * 1993-11-09 2001-08-21 Cold Spring Harbor Laboratory Representational approach to DNA analysis
US6004783A (en) * 1994-03-18 1999-12-21 The General Hospital Corporation Cleaved amplified RFLP detection methods
US5851770A (en) * 1994-04-25 1998-12-22 Variagenics, Inc. Detection of mismatches by resolvase cleavage using a magnetic bead support
US5807522A (en) * 1994-06-17 1998-09-15 The Board Of Trustees Of The Leland Stanford Junior University Methods for fabricating microarrays of biological samples
US5589330A (en) * 1994-07-28 1996-12-31 Genzyme Corporation High-throughput screening method for sequence or genetic alterations in nucleic acids using elution and sequencing of complementary oligonucleotides
US20050026212A1 (en) * 1994-09-16 2005-02-03 Affymetrix, Inc. Capturing sequences adjacent to type-IIS restriction sites for genomic library mapping
US6509160B1 (en) * 1994-09-16 2003-01-21 Affymetrix, Inc. Methods for analyzing nucleic acids using a type IIs restriction endonuclease
US5565340A (en) * 1995-01-27 1996-10-15 Clontech Laboratories, Inc. Method for suppressing DNA fragment amplification during PCR
US5759822A (en) * 1995-01-27 1998-06-02 Clontech Laboratories, Inc. Method for suppressing DNA fragment amplification during PCR
US6156502A (en) * 1995-12-21 2000-12-05 Beattie; Kenneth Loren Arbitrary sequence oligonucleotide fingerprinting
US5712127A (en) * 1996-04-29 1998-01-27 Genescape Inc. Subtractive amplification
US6013445A (en) * 1996-06-06 2000-01-11 Lynx Therapeutics, Inc. Massively parallel signature sequencing by ligation of encoded adaptors
US5763239A (en) * 1996-06-18 1998-06-09 Diversa Corporation Production and use of normalized DNA libraries
US6001574A (en) * 1996-06-18 1999-12-14 Diversa Corporation Production and use of normalized DNA libraries
US6060240A (en) * 1996-12-13 2000-05-09 Arcaris, Inc. Methods for measuring relative amounts of nucleic acids in a complex mixture and retrieval of specific sequences therefrom
US6060245A (en) * 1996-12-13 2000-05-09 Stratagene Methods and adaptors for generating specific nucleic acid populations
US6472185B2 (en) * 1997-01-10 2002-10-29 Pioneer Hi-Bred International, Inc. Use of selective DNA fragment amplification products for hybridization-based genetic fingerprinting, marker assisted selection, and high-throughput screening
US6297006B1 (en) * 1997-01-16 2001-10-02 Hyseq, Inc. Methods for sequencing repetitive sequences and for determining the order of sequence subfragments
US6383742B1 (en) * 1997-01-16 2002-05-07 Radoje T. Drmanac Three dimensional arrays for detection or quantification of nucleic acid species
US6027945A (en) * 1997-01-21 2000-02-22 Promega Corporation Methods of isolating biological target materials using silica magnetic particles
US6197557B1 (en) * 1997-03-05 2001-03-06 The Regents Of The University Of Michigan Compositions and methods for analysis of nucleic acids
US5888737A (en) * 1997-04-15 1999-03-30 Lynx Therapeutics, Inc. Adaptor-based sequence analysis
US6368799B1 (en) * 1997-06-13 2002-04-09 Affymetrix, Inc. Method to detect gene polymorphisms and monitor allelic expression employing a probe array
US6033861A (en) * 1997-11-19 2000-03-07 Incyte Genetics, Inc. Methods for obtaining nucleic acid containing a mutation
US6232067B1 (en) * 1998-08-17 2001-05-15 The Perkin-Elmer Corporation Adapter directed expression analysis
US6306643B1 (en) * 1998-08-24 2001-10-23 Affymetrix, Inc. Methods of using an array of pooled probes in genetic analysis
US6287825B1 (en) * 1998-09-18 2001-09-11 Molecular Staging Inc. Methods for reducing the complexity of DNA sequences
US6703228B1 (en) * 1998-09-25 2004-03-09 Massachusetts Institute Of Technology Methods and products related to genotyping and DNA analysis
US6361947B1 (en) * 1998-10-27 2002-03-26 Affymetrix, Inc. Complexity management and analysis of genomic DNA
US7267966B2 (en) * 1998-10-27 2007-09-11 Affymetrix, Inc. Complexity management and analysis of genomic DNA
US6514768B1 (en) * 1999-01-29 2003-02-04 Surmodics, Inc. Replicable probe array
US20030113737A1 (en) * 2001-01-24 2003-06-19 Genomic Expression Aps Assay and kit for analyzing gene expression
US6632611B2 (en) * 2001-07-20 2003-10-14 Affymetrix, Inc. Method of target enrichment and amplification

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090098551A1 (en) * 1998-09-25 2009-04-16 Massachusetts Institute Of Technology Methods and products related to genotyping and dna analysis
US20040038234A1 (en) * 2000-06-30 2004-02-26 Gut Ivo Glynne Sample generation for genotyping by mass spectrometry
US8232055B2 (en) 2002-12-23 2012-07-31 Agilent Technologies, Inc. Comparative genomic hybridization assays using immobilized oligonucleotide features and compositions for practicing the same
US20040191813A1 (en) * 2002-12-23 2004-09-30 Laurakay Bruhn Comparative genomic hybridization assays using immobilized oligonucleotide features and compositions for practicing the same
US20090124514A1 (en) * 2003-02-26 2009-05-14 Perlegen Sciences, Inc. Selection probe amplification
US20050100911A1 (en) * 2003-08-06 2005-05-12 Perlegen Sciences, Inc. Methods for enriching populations of nucleic acid samples
US20050079532A1 (en) * 2003-09-12 2005-04-14 Perlegen Sciences, Inc. Methods and systems for identifying predisposition to the placebo effect
US7335474B2 (en) 2003-09-12 2008-02-26 Perlegen Sciences, Inc. Methods and systems for identifying predisposition to the placebo effect
US20060183132A1 (en) * 2005-02-14 2006-08-17 Perlegen Sciences, Inc. Selection probe amplification
US9898576B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9896721B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9447459B2 (en) 2005-06-23 2016-09-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9453256B2 (en) * 2005-06-23 2016-09-27 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9493820B2 (en) * 2005-06-23 2016-11-15 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10235494B2 (en) 2005-06-23 2019-03-19 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9898577B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US20150232924A1 (en) * 2005-06-23 2015-08-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10978175B2 (en) 2005-06-23 2021-04-13 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10095832B2 (en) 2005-06-23 2018-10-09 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10538806B2 (en) 2005-09-29 2020-01-21 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10316364B2 (en) 2005-09-29 2019-06-11 Keygene N.V. Method for identifying the source of an amplicon
US11649494B2 (en) 2005-09-29 2023-05-16 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10233494B2 (en) 2005-09-29 2019-03-19 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10106850B2 (en) 2005-12-22 2018-10-23 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US11008615B2 (en) 2005-12-22 2021-05-18 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US10023907B2 (en) 2006-04-04 2018-07-17 Keygene N.V. High throughput detection of molecular markers based on AFLP and high through-put sequencing
US20110178110A1 (en) * 2008-05-15 2011-07-21 University Of Southern California Genotype and Expression Analysis for Use in Predicting Outcome and Therapy Selection
US10704091B2 (en) 2012-01-13 2020-07-07 Data2Bio Genotyping by next-generation sequencing
US9951384B2 (en) 2012-01-13 2018-04-24 Data2Bio Genotyping by next-generation sequencing
US20220010371A1 (en) * 2012-03-26 2022-01-13 The Johns Hopkins University Rapid aneuploidy detection

Also Published As

Publication number Publication date
US6703228B1 (en) 2004-03-09
US20140243229A1 (en) 2014-08-28
US20090098551A1 (en) 2009-04-16

Similar Documents

Publication Publication Date Title
US6703228B1 (en) Methods and products related to genotyping and DNA analysis
EP1056889B1 (en) Methods related to genotyping and dna analysis
Hacia et al. Mutational analysis using oligonucleotide microarrays
Halushka et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis
EP1609875B1 (en) DNA typing with short tandem repeat polymorphisms and identification of polymorphic short tandem repeats
Shuber et al. High throughput parallel analysis of hundreds of patient samples for more than 100 mutations in multiple disease genes
Kim et al. SNP genotyping: technologies and biomedical applications
EP1124990B1 (en) Complexity management and analysis of genomic dna
EP2722395B1 (en) Multiplexed analysis of polymorphic loci by concurrent interrogation and enzyme-mediated detection
US6582908B2 (en) Oligonucleotides
US6821724B1 (en) Methods of genetic analysis using nucleic acid arrays
US6291182B1 (en) Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
US20030104410A1 (en) Human microarray
US20060177863A1 (en) Biallelic markers for use in constructing a high density disequilibrium map of the human genome
JP2014507164A (en) Method and system for haplotype determination
KR20020064298A (en) Methods for generating databases and databases for identifying polymorphic genetic markers
US20070003938A1 (en) Hybridization of genomic nucleic acid without complexity reduction
Abel et al. Genome-wide SNP association: identification of susceptibility alleles for osteoarthritis
US20040023237A1 (en) Methods for genomic analysis
Scheel et al. Yellow pages to the transcriptome
WO1999054500A9 (en) Biallelic markers for use in constructing a high density disequilibrium map of the human genome
US20040023275A1 (en) Methods for genomic analysis
WO1999058721A1 (en) Multiplex dna amplification using chimeric primers
US20080026367A9 (en) Methods for genomic analysis
Drmanac et al. Sequencing by hybridization arrays

Legal Events

Date Code Title Description
AS Assignment
Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSET
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LANDERS, JOHN;KLANDERMAN, BARBARA JORDAN;HOUSMAN, DAVID E.;AND OTHERS;REEL/FRAME:022823/0290;SIGNING DATES FROM 20080812 TO 20090521

AS Assignment
Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF
Free format text: CONFIRMATORY LICENSE;ASSIGNOR:MASSACHUSETTS INSTITUTE OF TECHNOLOGY;REEL/FRAME:025854/0732
Effective date: 20110211

STCB Information on status: application discontinuation
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION