US20050282227A1 - Treatment discovery based on CGH analysis - Google Patents

Treatment discovery based on CGH analysis

Info

Publication number
US20050282227A1
Authority
US
United States
Prior art keywords
treatment
signatures
malady
phenotypic
tissue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/215,483
Inventor
James Minor
Wilson Woo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agilent Technologies Inc
Original Assignee
Agilent Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agilent Technologies Inc
Priority to US11/215,483
Publication of US20050282227A1
Assigned to AGILENT TECHNOLOGIES, INC. Assignment of assignors' interest (see document for details). Assignors: MINOR, JAMES M., WOO, WILSON

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S977/00 Nanotechnology
    • Y10S977/70 Nanostructure
    • Y10S977/788 Of specified organic or carbon-based composition
    • Y10S977/789 Of specified organic or carbon-based composition in array format
    • Y10S977/79 Of specified organic or carbon-based composition in array format with heterogeneous nanostructures
    • Y10S977/791 Molecular array
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S977/00 Nanotechnology
    • Y10S977/839 Mathematical algorithms, e.g. computer software, specifically adapted for modeling configurations or properties of nanostructure

Definitions

  • Comparative genomic hybridization is a technique that is used to evaluate variations in genomic copy number in cells.
  • genomic DNA is isolated from normal reference cells, as well as from test cells (e.g., tumor cells).
  • the two nucleic acids are differentially labeled and then simultaneously hybridized in situ to metaphase chromosomes of a reference cell.
  • Chromosomal regions in the test cells which are at increased or decreased copy number can be identified by detecting regions where the ratio of signal from the two distinguishably labeled nucleic acids is altered. For example, those regions that have been decreased in copy number in the test cells will show relatively lower signal from the test DNA than the reference shows, compared to other regions of the genome. Regions that have been increased in copy number in the test cells will show relatively higher signal from the test DNA.
  • aCGH: array-based comparative genomic hybridization
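The ratio test described above can be sketched in a few lines. This is a minimal illustration, not the method of the application: the log2 transform, the 0.3 cutoff, the function name `call_copy_number` and the signal values are all assumptions chosen for the example.

```python
import math

def call_copy_number(test_signal, ref_signal, threshold=0.3):
    """Classify one region from the test/reference signal ratio.

    The log2 scale and the threshold are hypothetical choices for this
    sketch; the text only says that an altered ratio indicates a change.
    """
    log_ratio = math.log2(test_signal / ref_signal)
    if log_ratio > threshold:
        return "gain"       # relatively higher signal from the test DNA
    if log_ratio < -threshold:
        return "loss"       # relatively lower signal from the test DNA
    return "no change"

# Invented (test, reference) signal pairs for three regions.
regions = {
    "region_A": (2000.0, 1000.0),
    "region_B": (500.0, 1000.0),
    "region_C": (1020.0, 1000.0),
}
calls = {name: call_copy_number(t, r) for name, (t, r) in regions.items()}
print(calls)
```

A symmetric log-scale cutoff treats a doubling and a halving alike, which is why the ratio is log-transformed before thresholding in this sketch.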
  • Methods, systems and computer readable media for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady include the steps of: (a) measuring gene expression values of at least one sample of tissue exhibiting the tissue malady and at least one reference sample tissue that does not exhibit the malady, using at least one CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady; (b) generating gene expression signatures from differential expression values of ratios of the measured gene expression values between the at least one sample exhibiting the malady and the at least one reference sample, across all samples, respectively; (c) treating the at least one tissue sample exhibiting the malady with a treatment; (d) measuring a treatment-response value with respect to each of the tissue samples treated, as effected by the treatment; (e) generating a phenotypic signature representing the treatment-response values of each of the tissue samples treated; (f) repeating steps (c)-(e
  • Methods, systems and computer readable media are provided for screening a combination of treatments to select treatments for tissue exhibiting a malady, including the steps of: (a) providing differential expression levels of tissue samples exhibiting the malady relative to at least one reference tissue sample from respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the malady; (b) for respective differential expression levels from respective features of respective CGH arrays for each tissue sample exhibiting the malady, providing a gene expression signature representing the differential expression level for each tissue sample for gene expression levels from that feature, respectively; (c) providing a treatment-response value, for each tissue sample exhibiting the malady having been treated with a treatment, as effected by the treatment; (d) generating a phenotypic signature representing the treatment-response values of each of the tissue samples having been treated; (e) repeating steps (c)-(d) with a different treatment at least once so that multiple phenotypic signatures have
  • Methods, systems and computer readable media for augmenting an original or existing single treatment or treatment combination for a disease with at least one additional treatment that covers gene activity of the disease not addressed by the original or existing treatment include the steps of: (a) providing differential expression levels of diseased tissue samples relative to at least one reference tissue for respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the disease; (b) for respective features of respective CGH arrays for each diseased tissue sample, providing a gene expression signature representing the differential expression level for each tissue sample for that feature, respectively; (c) treating the diseased tissue samples with the original or existing single treatment or combination treatment; (d) measuring a treatment-response value with respect to each of the diseased tissue samples as effected by the original or existing single or combination treatment; (e) generating a phenotypic signature representing the treatment-response values of each of the diseased tissue samples as effected by the original or existing single or combination treatment; (f) treating the diseased tissue samples with
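The signature-building steps listed above (ratios of malady to reference values across samples, and treatment-response values across the same samples) can be sketched as follows. This is only an illustration under stated assumptions: the log2 scale, the function names and all numeric values are hypothetical, and the text does not specify how the ratios are scaled.

```python
import math

def expression_signature(malady_values, reference_values):
    """One array feature: log2(malady / reference) for each sample, in order.

    Using log2 is an assumption; the text only refers to ratios of values.
    """
    return [math.log2(m / r) for m, r in zip(malady_values, reference_values)]

def phenotypic_signature(responses_by_sample, sample_order):
    """Treatment-response values arranged in the same sample order."""
    return [responses_by_sample[s] for s in sample_order]

samples = ["s1", "s2", "s3", "s4"]                 # hypothetical sample names
malady = [400.0, 100.0, 800.0, 200.0]              # invented measurements
reference = [200.0, 200.0, 200.0, 200.0]
responses = {"s3": 0.9, "s1": 0.4, "s4": 0.1, "s2": -0.3}  # invented responses

gene_sig = expression_signature(malady, reference)
pheno_sig = phenotypic_signature(responses, samples)
print(gene_sig)
print(pheno_sig)
```

Keeping both signatures in one shared sample order is what makes them comparable position by position.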
  • FIG. 1A is a block diagram of a cancer transcriptome model.
  • FIG. 1B shows a modification of the model of FIG. 1A.
  • FIG. 2 is a flowchart illustrating steps that may be performed for determining potential treatments for diseases or other maladies that may be caused by altered mRNA resulting from chromosome alteration.
  • FIG. 3 shows exemplary plots of gene expression signatures of measurements of genes in altered or abnormal tissue, relative to “normal” tissue across multiple samples.
  • FIG. 4 shows an example of a gene expression signature plotted together with a phenotypic response signature or profile.
  • FIG. 5 shows a matrix in which data points used to generate gene expression signatures, phenotypic response signatures and inverted phenotypic response signatures have been inserted as cell values of the matrix.
  • FIG. 6 illustrates plots of three treatment/phenotypic profiles from samples having been treated with three different treatments that were determined to be in synchronization with the gene expression profile shown.
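One way to picture the matrix of FIG. 5 and the "synchronization" of FIG. 6 is sketched below. It rests on assumptions not confirmed by the text shown here: inversion is taken to mean negation, synchronization is scored with a Pearson correlation, and the row names and values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length signatures."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gene_signature = [1.0, -1.0, 2.0, 0.0]             # invented values
treatments = {                                     # invented phenotypic signatures
    "treatment_1": [0.9, -1.1, 2.2, 0.1],
    "treatment_2": [-1.0, 1.0, -2.0, 0.0],
}

# Rows of the matrix: the gene signature, each phenotypic signature,
# and each inverted (here: negated, an assumption) phenotypic signature.
matrix = {"gene": gene_signature}
for name, sig in treatments.items():
    matrix[name] = sig
    matrix[name + "_inverted"] = [-v for v in sig]

scores = {row: pearson(gene_signature, values)
          for row, values in matrix.items() if row != "gene"}
for row, score in scores.items():
    print(row, round(score, 3))
```

Including the inverted rows lets a treatment whose response runs opposite to the gene signature show up as a strong positive match rather than being overlooked.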
  • FIG. 7 is a schematic representation of an ellipsoid that represents the plot of a cluster of vectors from a matrix, such as the matrix shown in FIG. 5.
  • this is a schematic representation, as, in reality, data ellipsoids are hyper-dimensional with complicated radial and angular geometries.
  • FIG. 8 is a block diagram illustrating an example of a computer system that may be employed in carrying out the present invention.
  • a “genotype” refers to the actual makeup of one or more genes (DNA) in living tissue.
  • a genotypic signature is a textual or electronic representation that directly identifies the genotype.
  • a “phenotype” is related to a genotype, in that it is some sort of physical expression resulting from a blueprint provided by the genotype.
  • a phenotypic signature is a textual or electronic representation of values representing the expression that defines the phenotype.
  • mRNA expression might be considered to be either a genotype or phenotype, as it is in a gray area where the genotype executes the phenotype. It is referred to herein as genotype/phenotype.
  • a “treatment” refers to the administration of an agent to living tissue (generally a diseased tissue) that has some measurable effect on protein production by that tissue, which effect can be inferred by measurement of gene expression levels of the tissue, using microarray technology.
  • Treatments may include, but are not limited to, drugs, compounds, genetic sequences used to target specific locations of the genetic makeup of the tissue, radiation, heat, cryogenics, or any other kind of application that produces an effect as described above.
  • transcriptome refers to the set of all messenger RNA (mRNA) (or transcripts) in one or a population of biological cells.
  • altered transcriptome refers to a transcriptome resulting from a tissue sample containing one or more regions of abnormality in one or more chromosomes (e.g., amplification, deletion).
  • cancer transcriptome refers to the altered transcriptome of a biological cell sample that is cancerous.
  • a “biopolymer” is a polymer of one or more types of repeating units. Biopolymers are typically found in biological systems and particularly include polysaccharides (such as carbohydrates), and peptides (which term is used to include polypeptides and proteins) and polynucleotides as well as their analogs such as those compounds composed of or containing amino acid analogs or non-amino acid groups, or nucleotide analogs or non-nucleotide groups.
  • polynucleotides in which the conventional backbone has been replaced with a non-naturally occurring or synthetic backbone, and nucleic acids (or synthetic or naturally occurring analogs) in which one or more of the conventional bases has been replaced with a group (natural or synthetic) capable of participating in Watson-Crick type hydrogen bonding interactions.
  • Polynucleotides include single or multiple stranded configurations, where one or more of the strands may or may not be completely aligned with another.
  • a “nucleotide” refers to a sub-unit of a nucleic acid and has a phosphate group, a 5 carbon sugar and a nitrogen containing base, as well as functional analogs (whether synthetic or naturally occurring) of such sub-units which in the polymer form (as a polynucleotide) can hybridize with naturally occurring polynucleotides in a sequence specific manner analogous to that of two naturally occurring polynucleotides.
  • a “biopolymer” includes DNA (including cDNA), RNA, oligonucleotides, and PNA and other polynucleotides as described in U.S. Pat. No.
  • oligonucleotide generally refers to a nucleotide multimer of about 10 to 100 nucleotides in length, while a “polynucleotide” includes a nucleotide multimer having any number of nucleotides.
  • a “biomonomer” references a single unit, which can be linked with the same or other biomonomers to form a biopolymer (for example, a single amino acid or nucleotide with two linking groups one or both of which may have removable protecting groups).
  • A chemical “array” (also “chemical array”, “microarray” or “bioarray”), unless a contrary intention appears, includes any one-, two- or three-dimensional arrangement of addressable regions bearing a particular chemical moiety or moieties (for example, biopolymers such as polynucleotide sequences) associated with that region, where the chemical moiety or moieties are immobilized on the surface in that region.
  • By “immobilized” is meant that the moiety or moieties are stably associated with the substrate surface in the region, such that they do not separate from the region under conditions of using the array, e.g., hybridization and washing and stripping conditions.
  • the moiety or moieties may be covalently or non-covalently bound to the surface in the region.
  • each region may extend into a third dimension in the case where the substrate is porous while not having any substantial third dimension measurement (thickness) in the case where the substrate is non-porous.
  • An array may contain more than ten, more than one hundred, more than one thousand, more than ten thousand, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm².
  • features may have widths (that is, diameter, for a round spot) in the range of from about 10 μm to about 1.0 cm.
  • each feature may have a width in the range of about 1.0 μm to about 1.0 mm, such as from about 5.0 μm to about 500 μm, and including from about 10 μm to about 200 μm.
  • Non-round features may have area ranges equivalent to that of circular features with the foregoing width (diameter) ranges.
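The area equivalence between round and non-round features is simple arithmetic, sketched here. The function names are hypothetical, and the square shape is just one example of a non-round feature.

```python
import math

def circular_feature_area_um2(diameter_um):
    """Area in square micrometres of a round feature of the given diameter."""
    return math.pi * (diameter_um / 2.0) ** 2

def equivalent_square_side_um(diameter_um):
    """Side of a square feature with the same area as the round feature."""
    return math.sqrt(circular_feature_area_um2(diameter_um))

# Diameters taken from the narrowest width range quoted above (in micrometres).
for d in (10.0, 200.0):
    print(d, round(circular_feature_area_um2(d), 1),
          round(equivalent_square_side_um(d), 2))
```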
  • a given feature is made up of chemical moieties, e.g., nucleic acids, that bind to (e.g., hybridize to) the same target (e.g., target nucleic acid), such that a given feature corresponds to a particular target.
  • Interfeature areas will typically (but not essentially) be present which do not carry any polynucleotide. Such interfeature areas typically will be present where the arrays are formed by processes involving drop deposition of reagents but may not be present when, for example, light directed synthesis fabrication processes are used. It will be appreciated though, that the interfeature areas, when present, could be of various sizes and configurations.
  • An array is “addressable” in that it has multiple regions (sometimes referenced as “features” or “spots” of the array) of different moieties (for example, different polynucleotide sequences) such that a region at a particular predetermined location (an “address”) on the array will detect a particular target or class of targets (although a feature may incidentally detect non-targets of that feature).
  • the target for which each feature is specific is, in representative embodiments, known.
  • An array feature is generally homogenous in composition and concentration and the features may be separated by intervening spaces (although arrays without such separation can be fabricated).
  • the “target” will be referenced as a moiety in a mobile phase (typically fluid), to be detected by probes (“target probes”) which are bound to the substrate at the various regions.
  • either of the “target” or “target probes” may be the one which is to be detected by the other (thus, either one could be an unknown mixture of polynucleotides to be detected by binding with the other).
  • Additional sets of probes and analogous terms refer to the multiple regions of different moieties supported by or intended to be supported by the array surface.
  • sample as used herein relates to a material or mixture of materials, containing one or more components of interest.
  • Samples include, but are not limited to, samples obtained from an organism or from the environment (e.g., a soil sample, water sample, etc.) and may be directly obtained from a source (e.g., such as a biopsy or from a tumor) or indirectly obtained e.g., after culturing and/or one or more processing steps.
  • samples are a complex mixture of molecules, e.g., comprising at least about 50 different molecules, at least about 100 different molecules, at least about 200 different molecules, at least about 500 different molecules, at least about 1000 different molecules, at least about 5000 different molecules, at least about 10,000 molecules, etc. Given the significant, somewhat chaotic alteration of the genome by cancer, one expects some abnormal distribution in expression profiles of those tissue samples affected by cancer.
  • genome refers to all nucleic acid sequences (coding and non-coding) and elements present in any virus, single cell (prokaryote and eukaryote) or each cell type in a metazoan organism.
  • genome also applies to any naturally occurring or induced variation of these sequences that may be present in a mutant or disease variant of any virus or cell type.
  • sequences include, but are not limited to, those involved in the maintenance, replication, segregation, and higher order structures (e.g. folding and compaction of DNA in chromatin and chromosomes), or other functions, if any, of the nucleic acids as well as all the coding regions and their corresponding regulatory elements needed to produce and maintain each particle, cell or cell type in a given organism.
  • the human genome consists of approximately 3.0 × 10⁹ base pairs of DNA organized into distinct chromosomes.
  • the genome of a normal diploid somatic human cell consists of 22 pairs of autosomes (chromosomes 1 to 22) and either chromosomes X and Y (males) or a pair of chromosome Xs (female) for a total of 46 chromosomes.
  • a genome of a cancer cell may contain variable numbers of each chromosome in addition to deletions, rearrangements and amplification of any sub-chromosomal region or DNA sequence.
  • a “genome” refers to nuclear nucleic acids, excluding mitochondrial nucleic acids; however, in other aspects, the term does not exclude mitochondrial nucleic acids.
  • the “mitochondrial genome” is used to refer specifically to nucleic acids found in mitochondrial fractions.
  • By “genomic source” is meant the initial nucleic acids that are used as the original nucleic acid source from which the probe nucleic acids are produced, e.g., as a template in the nucleic acid amplification and/or labeling protocols.
  • If a surface-bound polynucleotide or probe “corresponds to” a chromosomal region, the polynucleotide usually contains a sequence of nucleic acids that is unique to that chromosomal region. Accordingly, a surface-bound polynucleotide that corresponds to a particular chromosomal region usually specifically hybridizes to a labeled nucleic acid made from that chromosomal region, relative to labeled nucleic acids made from other chromosomal regions.
  • array layout refers to one or more physical, chemical or biological characteristics of the array, such as positioning of some or all the features within the array and on a substrate, one or more feature dimensions, or some indication of an identity or function (for example, chemical or biological) of a moiety at a given location, or how the array should be handled (for example, conditions under which the array is exposed to a sample, or array reading specifications or controls following sample exposure).
  • oligonucleotide bound to a surface of a solid support or “probe bound to a solid support” or a “target bound to a solid support” refers to an oligonucleotide or mimetic thereof, e.g., PNA, LNA or UNA molecule that is immobilized on a surface of a solid substrate, where the substrate can have a variety of configurations, e.g., a sheet, bead, particle, slide, wafer, web, fiber, tube, capillary, microfluidic channel or reservoir, or other structure.
  • the collections of oligonucleotide elements employed herein are present on a surface of the same planar support, e.g., in the form of an array.
  • probe and “target” are relative terms and that a molecule considered as a probe in certain assays may function as a target in other assays.
  • test nucleic acid sample or “test nucleic acids” refer to nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is being assayed.
  • test genomic acids or a “test genomic sample” refers to genomic nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is being assayed.
  • a “reference nucleic acid sample” or “reference nucleic acids” refers to nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is known.
  • “reference genomic acids” or a “reference genomic sample” refers to genomic nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is known.
  • a “reference nucleic acid sample” may be derived independently from a “test nucleic acid sample,” i.e., the samples can be obtained from different organisms or different cell populations of the same organism.
  • a reference nucleic acid is present in a “test nucleic acid sample” which comprises one or more sequences whose quantity or identity or degree of representation in the sample is unknown while containing one or more sequences (the reference sequences) whose quantity or identity or degree of representation in the sample is known.
  • the reference nucleic acid may be naturally present in a sample (e.g., present in the cell from which the sample was obtained) or may be added to or spiked in the sample.
  • If a surface-bound polynucleotide or probe “corresponds to” a chromosome, the polynucleotide usually contains a sequence of nucleic acids that is unique to that chromosome. Accordingly, a surface-bound polynucleotide that corresponds to a particular chromosome usually specifically hybridizes to a labeled nucleic acid made from that chromosome, relative to labeled nucleic acids made from other chromosomes. Array features, because they usually contain surface-bound polynucleotides, can also correspond to a chromosome.
  • non-cellular chromosome composition is a composition of chromosomes synthesized by mixing pre-determined amounts of individual chromosomes. These synthetic compositions can include selected concentrations and ratios of chromosomes that do not naturally occur in a cell, including any cell grown in tissue culture. Non-cellular chromosome compositions may contain more than an entire complement of chromosomes from a cell, and, as such, may include extra copies of one or more chromosomes from that cell. Non-cellular chromosome compositions may also contain less than the entire complement of chromosomes from a cell.
  • CGH or “Comparative Genomic Hybridization” refers generally to techniques for identification of chromosomal alterations (such as in cancer cells, for example). Using CGH, ratios between tumor or test sample and normal or control sample enable the detection of chromosomal amplifications and deletions of regions that may include oncogenes and tumor suppressive genes, for example.
  • a “CGH array” or “aCGH array” refers to an array that can be used to compare DNA samples for relative differences in copy number.
  • an aCGH array can be used in any assay in which it is desirable to scan a genome with a sample of nucleic acids.
  • an aCGH array can be used in location analysis as described in U.S. Pat. No. 6,410,243, the entirety of which is incorporated herein.
  • a CGH array provides probes for screening or scanning a genome of an organism and comprises probes from a plurality of regions of the genome.
  • the array comprises probe sequences for scanning an entire chromosome arm, wherein probe targets are separated by at least about 500 bp, at least about 1 kb, at least about 5 kb, at least about 10 kb, at least about 25 kb, at least about 50 kb, at least about 100 kb, at least about 250 kb, at least about 500 kb or at least about 1 Mb.
  • the array comprises probe sequences for scanning an entire chromosome, a set of chromosomes, or the complete complement of chromosomes forming the organism's genome.
  • By “resolution” is meant the spacing on the genome between sequences found in the probes on the array.
  • all sequences in the genome can be present in the array.
  • the spacing between different locations of the genome that are represented in the probes may also vary, and may be uniform, such that the spacing is substantially the same between sampled regions, or non-uniform, as desired.
  • An assay performed at low resolution on one array, e.g., comprising probe targets separated by larger distances, may be repeated at higher resolution on another array, e.g., comprising probe targets separated by smaller distances.
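The link between probe spacing (resolution) and probe count can be made concrete with a short sketch. The 50 Mb region length and the function names are assumptions for illustration; only the spacing values come from the list above.

```python
def probes_needed(region_length_bp, spacing_bp):
    """Probe targets needed to span a region at uniform spacing (both ends included)."""
    return region_length_bp // spacing_bp + 1

def spacings(positions_bp):
    """Gaps between neighbouring probe targets, i.e. the local resolution."""
    ordered = sorted(positions_bp)
    return [b - a for a, b in zip(ordered, ordered[1:])]

arm_length_bp = 50_000_000   # hypothetical chromosome arm length

low_res = probes_needed(arm_length_bp, 1_000_000)   # about 1 Mb spacing
high_res = probes_needed(arm_length_bp, 10_000)     # about 10 kb spacing
print(low_res, high_res)
print(spacings([30_000, 0, 10_000]))   # non-uniform spacing example
```

The jump in probe count between the two spacings is the practical reason for scanning at low resolution first and repeating at higher resolution on a second array.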
  • both coding and non-coding genomic regions are included as probes, whereby “coding region” refers to a region comprising one or more exons that is transcribed into an mRNA product and from there translated into a protein product, while by non-coding region is meant any sequences outside of the exon regions, where such regions may include regulatory sequences, e.g., promoters, enhancers, untranslated but transcribed regions, introns, origins of replication, telomeres, etc.
  • one can have at least some of the probes directed to non-coding regions and others directed to coding regions. In certain embodiments, one can have all of the probes directed to non-coding sequences.
  • individual probes comprise sequences that do not normally occur together, e.g., to detect gene rearrangements, for example, as in a case where the mangled or abnormal genome, resulting from cancer, is expected to be less successful at expressing the normal distribution of mRNA that is normally observed for tissue of the same type that has not been affected by cancer, and therefore also less successful at producing translation mRNA that is normally observed.
  • At least 5% of the polynucleotide probes on the solid support hybridize to regulatory regions of a nucleotide sample of interest while other embodiments may have at least 30% of the polynucleotide probes on the solid support hybridize to exonic regions of a nucleotide sample of interest. In yet other embodiments, at least 50% of the polynucleotide probes on the solid support hybridize to intergenic (e.g., non-coding) regions of a nucleotide sample of interest. In certain aspects, probes on the array represent random selection of genomic sequences (e.g., both coding and non-coding).
  • particular regions of the genome are selected for representation on the array, e.g., such as CpG islands, genes belonging to particular pathways of interest or whose expression and/or copy number are associated with particular physiological responses of interest (e.g., disease, such as cancer, drug resistance, toxicological responses and the like).
  • particular genes are identified as being of interest, intergenic regions proximal to those genes are included on the array along with, optionally, all or portions of the coding sequence corresponding to the genes.
  • At least about 100 bp, 500 bp, 1,000 bp, 5,000 bp, 10,000 kb or even 100,000 kb of genomic DNA upstream of a transcriptional start site is represented on the array in discrete or overlapping sequence probes.
  • at least one probe sequence comprises a motif sequence to which a protein of interest (e.g., such as a transcription factor) is known or suspected to bind.
  • repetitive sequences are excluded as probes on the arrays. However, in another aspect, repetitive sequences are included.
  • nucleic acids to use as probes may be influenced by prior knowledge of the association of a particular chromosome or chromosomal region with certain disease conditions.
  • International Application WO 93/18186 provides a list of exemplary chromosomal abnormalities and associated diseases, which are described in the scientific literature. Alternatively, whole genome screening to identify new regions subject to frequent changes in copy number can be performed using the methods of the present invention discussed further below.
  • the array can include probes which “tile” a particular region (e.g., which have been identified in a previous assay or from a genetic analysis of linkage), by which is meant that the probes correspond to a region of interest as well as genomic sequences found at defined intervals on either side, i.e., 5′ and 3′ of, the region of interest, where the intervals may or may not be uniform, and may be tailored with respect to the particular region of interest and the assay objective. In other words, the tiling density may be tailored based on the particular region of interest and the assay objective.
  • tiled arrays and assays employing the same are useful in a number of applications, including applications where one identifies a region of interest at a first resolution, and then uses tiled array tailored to the initially identified region to further assay the region at a higher resolution, e.g., in an iterative protocol.
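The tiling idea described above (probes across a region of interest plus its 5′ and 3′ flanks, with a density chosen per assay) can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `tile_probe_starts`, the integer coordinate convention, and the flank and step values are all assumptions; only the 60-nucleotide probe length echoes a figure given elsewhere in the text.

```python
def tile_probe_starts(region_start, region_end, flank, step, probe_len=60):
    """Return start coordinates of probes tiling a region plus its flanks.

    Hypothetical helper: probes start every `step` bases, beginning at
    `region_start - flank`; the last probe must end at or before
    `region_end + flank`.
    """
    if step <= 0 or probe_len <= 0:
        raise ValueError("step and probe_len must be positive")
    first = max(0, region_start - flank)
    last_start = region_end + flank - probe_len
    return list(range(first, last_start + 1, step))


# Iterative protocol: a coarse first pass, then a denser pass on the same region.
coarse = tile_probe_starts(10_000, 12_000, flank=2_000, step=1_000)
fine = tile_probe_starts(10_000, 12_000, flank=500, step=100)
```

A non-uniform interval could be obtained by concatenating calls with different `step` values for the flanks and for the region itself.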
  • the array includes probes to sequences associated with diseases associated with chromosomal imbalances for prenatal testing.
  • the array comprises probes complementary to all or a portion of chromosome 21 (e.g., to detect Down's syndrome), all or a portion of the X chromosome (e.g., to detect an X chromosome deficiency as in Turner's Syndrome) and/or all or a portion of the Y chromosome (e.g., to detect Klinefelter Syndrome, i.e., duplication of an X chromosome and the presence of a Y chromosome), all or a portion of chromosome 7 (e.g., to detect Williams Syndrome), all or a portion of chromosome 8 (e.g., to detect Langer-Giedion Syndrome), all or a portion of chromosome 15 (e.g., to detect Prader-Willi or Angelman's Syndrome), all or a portion of chromosome 22 (e.g., to detect Di George
  • arrays including probes to sequences whose duplications or deletions are associated with specific types of cancer, e.g., breast cancer, prostate cancer and the like.
  • the selection of such arrays may be based on patient information such as familial inheritance of particular genetic abnormalities.
  • an array for scanning an entire genome is first contacted with a sample and then a higher-resolution array is selected based on the results of such scanning.
  • Themed arrays also can be fabricated for use in gene expression assays, for example, to detect expression of genes involved in selected pathways of interest, or genes associated with particular diseases of interest.
  • a plurality of probes on the array is selected to have a duplex Tm within a predetermined range. For example, in one aspect, at least about 50% of the probes have a duplex Tm within a temperature range of about 75° C. to about 85° C. In one embodiment, at least 80% of said polynucleotide probes have a duplex Tm within a temperature range of about 75° C. to about 85° C., within a range of about 77° C. to about 83° C., within a range of from about 78° C. to about 82° C. or within a range from about 79° C. to about 82° C.
  • At least about 50% of probes on an array have a range of Tm's of less than about 4° C., less than about 3° C., or even less than about 2° C., e.g., less than about 1.5° C., less than about 1.0° C. or about 0.5° C.
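As a rough illustration of the melting-temperature criterion, the sketch below checks what fraction of a probe set falls inside a Tm window. It assumes the duplex Tm of each probe has already been computed by some other means; the function names and the example values are hypothetical.

```python
def fraction_in_tm_window(tms, low=75.0, high=85.0):
    """Fraction of probes whose duplex Tm (in deg C) lies in [low, high]."""
    if not tms:
        raise ValueError("no Tm values supplied")
    inside = sum(1 for t in tms if low <= t <= high)
    return inside / len(tms)


def meets_tm_criterion(tms, low=75.0, high=85.0, min_fraction=0.5):
    """True if at least `min_fraction` of the probes fall inside the window."""
    return fraction_in_tm_window(tms, low, high) >= min_fraction


# Made-up Tm values for six probes; four of them lie between 75 and 85 deg C.
example_tms = [74.0, 76.5, 79.0, 80.2, 81.7, 86.3]
passes_50 = meets_tm_criterion(example_tms, min_fraction=0.5)
passes_80 = meets_tm_criterion(example_tms, min_fraction=0.8)
```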
  • the probes on the microarray in certain embodiments have a nucleotide length in the range of at least 30 nucleotides to 200 nucleotides, or in the range of at least about 30 to about 150 nucleotides. In other embodiments, at least about 50% of the polynucleotide probes on the solid support have the same nucleotide length, and that length may be about 60 nucleotides.
  • longer polynucleotides may be used as probes.
  • cDNAs, or inserts from phage, BACs (bacterial artificial chromosomes) or plasmid clones can be arrayed. Probes may therefore also range from about 201-5000 bases in length, from about 5001-50,000 bases in length, or from about 50,001-200,000 bases in length, depending on the platform used. If other polynucleotide features are present on a subject array, they may be interspersed with, or in a separately-hybridizable part of the array from the subject oligonucleotides.
  • probes on the array comprise at least coding sequences.
  • probes represent sequences from an organism such as Drosophila melanogaster, Caenorhabditis elegans, yeast, zebrafish, a mouse, a rat, a domestic animal, a companion animal, a primate, a human, etc.
  • probes representing sequences from different organisms are provided on a single substrate, e.g., on a plurality of different arrays.
  • a “CGH assay” using an aCGH array can be generally performed as follows.
  • a population of nucleic acids contacted with an aCGH array comprises at least two sets of nucleic acid populations, which can be derived from different sample sources.
  • a target population contacted with the array comprises a set of target molecules from a reference sample and from a test sample.
  • the reference sample is from an organism having a known genotype and/or phenotype, while the test sample has an unknown genotype and/or phenotype or a genotype and/or phenotype that is known and is different from that of the reference sample.
  • the reference sample is from a healthy patient while the test sample is from a patient suspected of having cancer or known to have cancer.
  • a target population being contacted to an array in a given assay comprises at least two sets of target populations that are differentially labeled (e.g., by spectrally distinguishable labels).
  • control target molecules in a target population are also provided as two sets, e.g., a first set labeled with a first label and a second set labeled with a second label corresponding to first and second labels being used to label reference and test target molecules, respectively.
  • control target molecules in a population are present at a level comparable to a haploid amount of a gene represented in the target population.
  • control target molecules are present at a level comparable to a diploid amount of a gene.
  • control target molecules are present at a level that is different from a haploid or diploid amount of a gene represented in the target population.
  • the relative proportions of complexes formed labeled with the first label vs. the second label can be used to evaluate relative copy numbers of targets found in the two samples.
  • test and reference populations of nucleic acids may be applied separately to separate but identical arrays (e.g., having identical probe molecules) and the signals from each array can be compared to determine relative copy numbers of the nucleic acids in the test and reference populations.
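The comparison of first-label and second-label signal is commonly summarized as a log ratio per probe. The sketch below is one minimal way to do that, not a procedure stated in the text: the function names, the ±0.3 calling threshold, and the intensity values are illustrative assumptions.

```python
import math


def log2_ratios(test_signal, reference_signal):
    """Per-probe log2(test / reference); all intensities must be positive."""
    if len(test_signal) != len(reference_signal):
        raise ValueError("signal lists must have the same length")
    return [math.log2(t / r) for t, r in zip(test_signal, reference_signal)]


def call_copy_change(ratio, threshold=0.3):
    """Label a probe as gain, loss, or no change (threshold is an assumption)."""
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "no change"


# Made-up intensities for three probes.
ratios = log2_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
calls = [call_copy_change(r) for r in ratios]
```

The same ratio applies whether the two populations are hybridized together on one array or separately on identical arrays, provided the signals are comparably scaled.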
  • Arrays can be fabricated using drop deposition from pulse jets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide.
  • Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference.
  • Other drop deposition methods can be used for fabrication, as previously described herein.
  • photolithographic array fabrication methods may be used. Interfeature areas need not be present, particularly when the arrays are made by photolithographic methods.
  • Following receipt by a user, an array will typically be exposed to a sample (for example, a fluorescently labeled polynucleotide or protein containing sample) and the array then read. Reading of the array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at multiple regions on each feature of the array.
  • a scanner may be used for this purpose which is similar to the AGILENT MICROARRAY SCANNER manufactured by Agilent Technologies, Palo Alto, Calif.
  • Other suitable apparatus and methods are described in U.S. patent applications: Ser. No. 10/087,447 “Reading Dry Chemical Arrays Through The Substrate” by Corson et al.; and in U.S. Pat. Nos.
  • arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,251,685, U.S. Pat. No. 6,221,583 and elsewhere).
  • a result obtained from the reading may be used in accordance with the techniques of the present invention in screening and finding multiple drug treatment therapies.
  • a result of the reading (whether further processed or not) may be forwarded (such as by communication) to a remote location if desired, and received there for further use (such as further processing).
  • a “gene expression profile”, “expression profile”, “gene expression signature” or “expression signature” as used herein refers to a serial pattern (e.g., time series data) generated from a plot of gene expression values for a gene across all samples processed for gene expression. For example, ten different cell lines may be processed on ten different arrays. A gene expression profile for a particular gene “A” is generated by plotting differential gene expression values for gene “A”, versus a control or reference, for each of the ten cell lines against those cell lines, and then forming a serial pattern to define the expression profile.
  • the two items are in different locations, e.g., in different rooms or buildings, and may be at least one mile, ten miles, or at least one hundred miles apart.
  • “Communicating” information references transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data.
  • a “processor” references any hardware and/or software combination which will perform the functions required of it.
  • any processor herein may be a programmable digital microprocessor such as available in the form of a mainframe, server, or personal computer (desktop or portable).
  • suitable programming can be communicated from a remote location to the processor, or previously saved in a computer program product (such as a portable or fixed computer readable storage medium, whether magnetic, optical or solid state device based).
  • a magnetic or optical disk may carry the programming, and can be read by a suitable disk reader communicating with each processor at its corresponding station.
  • CGH arrays may be custom designed to map out chromosome alterations, as noted above.
  • With chromosome alteration, there is a resultant alteration in transcripts (mRNA) produced by the alteration in genetic material on the chromosome.
  • the altered mRNA in turn causes an abnormal or altered protein expression, relative to what would be seen in tissue having normal, non-altered chromosomes, which altered expression can lead to diseases and other abnormalities in the subject from which the tissue having altered chromosomal material originated.
  • FIG. 1A is a block diagram 100 of a cancer transcriptome model.
  • the abnormal or altered mRNA resulting from the altered regions of the chromosomes of the cancerous tissue is at least a factor in causing the formation of a cancer tumor.
  • the cancerous transcriptome can be characterized as a tumor gene expression profile.
  • the tumor gene expression profile includes tumor gene expression 102 that results in altered or abnormal protein expression 104 .
  • the abnormal protein expression 104 influences, causes, enhances and/or continues tumor growth 106 .
  • the present invention derives model 110 in FIG. 1B , which assumes that there is a proportional connection/relationship between tumor gene expression and tumor growth 106 .
  • FIG. 2 is a flowchart 200 illustrating steps that may be performed for determining potential treatments for diseases or other maladies that may be caused by altered mRNA resulting from chromosome alteration.
  • By designing one or more CGH arrays to effectively map the chromosomes of the tissue to be studied, gene sequences and possible variations in gene sequences can be predicted based on location analysis performed from results outputted by such CGH arrays.
  • CGH implies what kinds of modifications, if any, have occurred in the chromosomes, such as amplifications and/or deletions. Based on these modifications, prediction of a comprehensive set of candidate mRNA sequences that may be expressed by the altered chromosome or chromosomes is made possible.
  • transcription initializers e.g., promoters/primers or transcription factors that act upstream of the exons in enabling transcription
  • Such assumption will tend to be foreign to editing processes, likely producing mangled (abnormal) RNA sequences.
  • These abnormal sequences may modify cell processes in many ways, including abnormal translations and/or interference with other cellular processes. If transcription initializers are altered, then it is likely that the genes that originate the same are missing, or altered in some other way so as to render them dysfunctional.
  • the custom array is designed to measure the altered transcriptome.
  • the custom array would be designed to measure the predicted cancer transcriptome of the particular cancer being studied, as well as expected mRNA sequences for “normal” tissues of the same type that have not been affected by cancer.
  • the custom array may be designed to capture as many potential variations in mRNA sequences as may be outputted by the altered tissue, given the potential variations in alteration of the chromosomes.
  • For treatment of cancer or other disease or malady causally related to chromosome alteration, it may be useful to know not only the locations where the chromosome(s) have been altered, but also what transcripts are expressed that may be causally related to the disease. For example, certain transcripts in a cancer transcriptome may cause the production of proteins that are necessary to allow a tumor to exist and/or proliferate.
  • each cell line of the altered or abnormal tissues is run on one of such custom arrays to measure the altered transcriptome at event 204 via the probes on the array that are designed for mutations in mRNA predicted to be present in the altered tissues, as noted.
  • gene expression signatures are generated at event 206 .
  • each differential expression of each gene measured in the altered tissue, relative to the “normal” tissue is measured and plotted, to generate gene expression signatures 302 across the cell lines, as exemplified in FIG. 3 .
  • samples of each of the cell lines are provided with a treatment, and experimentation is performed to dose the treatment at a level that reduces the disease or malady by a predetermined amount at event 208 .
  • a dose of the treatment may be provided that reduces tumor growth by 50% in one day.
  • the treatment given may be a drug or compound or combinations of drugs or compounds, genetic sequences, radiation, thermal, electrical, magnetic, or other forms of treatment expected to reduce the disease state, or combinations of the same.
  • a phenotypic response profile resultant from the treatment is then generated from measuring effects of the treatment on the cell lines (phenotypic response) in event 210 . If a predetermined amount of treatment was applied to the cell lines for a predetermined time, then the amount of reduction in tumor growth is measured with respect to each cell line, and those amount values make up the response profile.
  • the phenotypic response profile shows the signature of the variation in the phenotypic response (in this example, tumor reduction) to the treatment across the various cell lines, as the treatment impacts disease-active proteins in the cell line samples, the levels of which vary across samples. The production of the disease-active proteins is regulated by certain genes, as noted in the models of FIGS. 1A and 1B.
  • the phenotypic response profile is generated by the amount of treatment required, with respect to each cell line, to reduce the tumor growth in that cell line by the predetermined amount, which varies across samples, thereby creating a meaningful phenotypic response profile for the treated samples/cell lines.
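One way to picture the dose-based response profile is to interpolate, for each cell line, the dose at which growth reduction reaches the predetermined amount. The sketch below assumes a handful of measured dose/reduction pairs per cell line and uses simple linear interpolation; the function name, the cell line labels, and all numbers are invented for illustration, and the text does not prescribe an interpolation method.

```python
def dose_for_reduction(doses, reductions, target=0.5):
    """Linearly interpolate the dose at which reduction first reaches `target`.

    `doses` must be increasing and `reductions` are fractions in [0, 1].
    Returns None if the target is never bracketed by the measurements.
    """
    points = list(zip(doses, reductions))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 <= target <= r1:
            if r1 == r0:
                return d0
            return d0 + (target - r0) * (d1 - d0) / (r1 - r0)
    return None


# Hypothetical dose-response measurements for three cell lines.
measurements = {
    "line_1": ([1.0, 2.0, 4.0], [0.2, 0.4, 0.8]),
    "line_2": ([1.0, 2.0, 4.0], [0.5, 0.7, 0.9]),
    "line_3": ([1.0, 2.0, 4.0], [0.1, 0.2, 0.3]),
}
# The response profile: one required-dose value per cell line.
response_profile = {
    name: dose_for_reduction(d, r) for name, (d, r) in measurements.items()
}
```

A cell line that never reaches the target (here `line_3`) yields `None` and would need a wider dose range before it can contribute to the profile.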
  • the response profile (treatment-impact profile) 402 is next compared with the gene expression signatures 302 of the untreated cell lines to look for expression signatures 302 that may be in synchronization or one hundred eighty degrees out of synchronization (anti-synchronization) with the treatment impact profile 402 .
  • Synchronization depends only on the waveform of two signatures that are being compared, and not on the scale of each signature. However, covariance similarity will be larger for signatures with larger amplitudes, whereas correlation will not be. Hence, covariance-based clustering may be more meaningful.
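The distinction between correlation and covariance drawn above can be checked numerically. The sketch below uses only the standard library; the signatures are made-up values, and the helper names are assumptions.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length signatures."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


profile = [1.0, -1.0, 2.0, -2.0]       # made-up signature
scaled = [10 * v for v in profile]     # same waveform, larger amplitude
inverted = [-v for v in profile]       # 180 degrees out of phase

r_scaled = correlation(profile, scaled)       # close to +1 (synchronized)
r_inverted = correlation(profile, inverted)   # close to -1 (anti-synchronized)
cov_same = covariance(profile, profile)
cov_scaled = covariance(profile, scaled)      # ten times cov_same
```

Correlation is unchanged by the tenfold amplitude change, while covariance grows with it, which is the property the text relies on when it prefers covariance-based similarity.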
  • replicates of each tissue sample (cell line) may be run against each treatment, and an average response computed for each, to reduce the noise in the phenotypic signatures.
  • replicate CGH arrays may be processed to reduce noise in the gene expression signatures resulting therefrom by generating an average gene expression signature for each gene from the average of the responses in the replicates.
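Replicate averaging amounts to an element-wise mean across replicate signatures. A minimal sketch, with a hypothetical function name and made-up values:

```python
def average_replicates(replicates):
    """Element-wise mean of replicate signatures (one value per cell line)."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    length = len(replicates[0])
    if any(len(r) != length for r in replicates):
        raise ValueError("all replicates must cover the same cell lines")
    return [sum(values) / len(replicates) for values in zip(*replicates)]


# Two made-up replicate signatures across three cell lines.
averaged = average_replicates([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```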
  • phenotypic signatures may be processed using “self-self” prediction techniques such as described in co-pending, commonly owned application Ser. No. 10/400,372 filed Mar. 27, 2003 and titled “Method and System for Predicting Multi-Variable Outcomes”, and in Application Ser. No. 60/368,586 filed Mar. 29, 2002 and titled “Generalized Similarity Least Squares Predictor”, both of which are incorporated, in their entireties, by reference thereto.
  • a phenotypic signature that is significantly in synchronization with or in anti-synchronization with a gene expression signature can be considered to be causally related to the gene from which the gene expression signature is derived. That is, when the treatment is applied, the resultant changes in the tissues (the phenotypic response) are proportional to the expression in that gene (from which the synchronized or anti-synchronized gene expression signature was generated) in the abnormal tissue relative to the expression of the same gene in the normal tissue, as it is hypothesized that a change (reduction or increase) in the proteins generated from expression of that gene has occurred in response to the treatment.
  • the phenotypic response in this example, reduction of tumor growth, may be correlated with a decrease in diseased or otherwise abnormal (tissue malady) gene expression (synchronization), to cause a decrease in proteins that proliferate the disease, or an increase in gene expression (anti-synchronization) to cause an increase in proteins that inhibit tumor growth or to regulate (reduce) the production of proteins that cause the disease to proliferate.
  • Although the phase links between the phenotypic response and gene expression signatures from diseased tissues are likely more complicated, protein relationships cannot be identified at this time, as noted elsewhere in this disclosure. Approximations are therefore made according to the techniques described herein, based on identifying synchronized and anti-synchronized signatures between phenotypic response signatures and gene expression signatures from tissues known to have a malady.
  • a plurality of treatments is provided to produce an equal number of phenotypic response signatures 402 , to search for one or more potentially effective treatments for the abnormality being studied.
  • the phenotypic signatures are inverted.
  • FIG. 5 shows a matrix 500 in which the data points used to generate each of the gene expression signatures 302, phenotypic response signatures 402 and inverted phenotypic response signatures 402 i have been inserted as cell values of the matrix 500, with each row representing a signature, and each column representing one of the cell lines.
  • each treatment-induced phenotypic signature 402 and inverse signature 402 i in FIG. 5 is compared with differential-expression signatures 302 , to map treatments to the transcriptome locations of the correlated genes in the abnormal tissue samples.
  • An approach to try and find relationships among data on this order of magnitude may include generating a matrix containing the data points used to generate each of the gene expression signatures 302 , and another matrix containing the data points used to generate the phenotypic response signatures 402 and inverted phenotypic response signatures 402 i and producing a correlation matrix having a number of values equal to the product of the number of gene expression signatures times the sum of the number of phenotypic response signatures and inverted phenotypic response signatures.
  • Such a correlation matrix is generated by calculating the inner product of the values of the phenotypic response signatures 402 and inverse phenotypic response signatures 402 i with the values of the gene expression signatures 302 , and typically centering by means and scaling by standard deviation.
  • the values in the correlation matrix are a cross product of normalized or standardized vectors, and may be optionally centered by subtracting a vector mean value.
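For comparison with the clustering approach favored later in the text, the correlation-matrix route can be sketched as an inner product of standardized rows. This is only an illustration of the conventional approach being described; the function names and data are assumptions.

```python
from statistics import mean, stdev


def zscore(row):
    """Center a signature by its mean and scale by its sample standard deviation."""
    m, s = mean(row), stdev(row)
    if s == 0:
        raise ValueError("constant signature cannot be standardized")
    return [(v - m) / s for v in row]


def correlation_matrix(gene_rows, pheno_rows):
    """Rows index gene signatures; columns index phenotypic (and inverted) signatures."""
    genes = [zscore(r) for r in gene_rows]
    phenos = [zscore(r) for r in pheno_rows]
    n = len(gene_rows[0])
    return [
        [sum(a * b for a, b in zip(g, p)) / (n - 1) for p in phenos]
        for g in genes
    ]


# Made-up data: two genes, one phenotypic signature and its inverse, four cell lines.
gene_rows = [[1, 2, 3, 4], [4, 3, 2, 1]]
pheno = [2, 4, 6, 8]
pheno_rows = [pheno, [-v for v in pheno]]
corr = correlation_matrix(gene_rows, pheno_rows)
```

The result has (number of genes) × (number of phenotypic plus inverted signatures) entries, which is the size noted above; each entry is a single scalar per pair of signatures.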
  • various clustering techniques may be applied to try and identify blocking patterns in the data, or “clusters”, which might indicate that certain treatments are affecting certain groups of genes.
  • clustering techniques may be applied to treatment and gene expression matrices prior to performing the cross product operation, in an effort to reduce the sizes to something less than what is started with.
  • the present inventor believes that important information is dropped or lost when such data is processed into a correlation matrix, such as by performing a cross product/inner product procedure to obtain the correlation matrix, as described above.
  • phase information is lost, as is well-known, particularly among electrical engineers.
  • Because the phenotypic responses to the treatments at issue are measured as the output (i.e., the phenotypic signature) and the inputs are the gene expression ratio values, there is a protein link between the input and output, as noted above with regard to FIGS. 1A and 1B, i.e., the gene expression instructs the generation of proteins which affect the disease state. Treatments impact disease proteins to produce the phenotypic output response.
  • the present inventor believes that it is important to consider the phase relationships between the inputs (gene expression) and outputs (phenotypic responses), as important factors based on the protein links between the inputs and outputs.
  • the approach taken by the present invention is akin to a lumped parameters modeling approach.
  • the genotypic signature (gene expression signature) is modeled as the input 102
  • the phenotypic signature is modeled as the output 106
  • a “black box” 104 represents a transfer function that transforms input 102 to output 106 .
  • input 102 and output 106 may be single inputs and outputs, e.g., a single gene and single drug impact or single protein production, or multiple inputs and outputs, e.g., groups of genes producing groups of proteins and/or multiple drug impacts.
  • the present invention is adapted to further modifications within the black box as additional information becomes known about the relationships between genes and the production of proteins resultant therefrom.
  • This will enable the input and output to be modeled, for example, like an electrical circuit, with black box 104 containing “resistances”, “inductances”, “capacitances”, “emfs”, etc., that model the best knowledge that is gained with regard to the relationships between the genes and proteins produced therefrom, which will enable the modeling of an accurate phase relationship between the gene expression signatures and the phenotypic response signatures.
  • Such complete modeling knowledge is currently not known however, and it may be at least five years, or more, before such knowledge is obtained in sufficient detail.
  • this technique assumes that an input signature is either in phase (0°, synchronized), or out of phase (i.e., 180° out of phase, anti-synchronized) with the output signature, as noted above.
  • this methodology assembles all of the signatures, after appropriate transformation such as by use of Log transforms, into one large list of profiles or matrix of profile values 500 , properly normalized, as shown in FIG. 5 .
  • the rows of matrix 500 include the gene expression signatures 302 and phenotypic response signatures 402 and inverted phenotypic response signatures 402 i.
  • clustering procedures are performed over the entire matrix 500 as a whole.
  • Each of the signatures is normalized for comparison purposes.
  • One method of normalizing used is by Z-scoring, although other normalization methods may be substituted.
  • weighting techniques may be applied so that highly up-regulated features do not receive over-amplified attention during the clustering process. Further, noisy profiles may be weighted with relatively reduced weight.
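The assembly of a single matrix of normalized profiles, including an inverted copy of each phenotypic signature so that anti-synchronized pairs become near neighbors, might look like the sketch below. It assumes the inputs have already been log-transformed, uses Z-scoring as named above, and omits the optional weighting; the function names, row labels, and values are hypothetical.

```python
from statistics import mean, stdev


def zscore(row):
    """Z-score a signature (sample standard deviation)."""
    m, s = mean(row), stdev(row)
    if s == 0:
        raise ValueError("constant signature cannot be standardized")
    return [(v - m) / s for v in row]


def build_profile_matrix(gene_sigs, pheno_sigs):
    """Return rows of (label, kind, normalized values) in one combined list.

    Each phenotypic signature contributes two rows: itself and its inverse.
    """
    rows = []
    for name, sig in gene_sigs.items():
        rows.append((name, "gene", zscore(sig)))
    for name, sig in pheno_sigs.items():
        z = zscore(sig)
        rows.append((name, "treatment", z))
        rows.append((name + "_inv", "treatment_inverted", [-v for v in z]))
    return rows


# Made-up signatures across three cell lines.
gene_sigs = {"geneA": [0.1, 0.5, 0.9], "geneB": [1.0, 0.2, 0.4]}
pheno_sigs = {"treatment1": [2.0, 3.0, 4.0]}
matrix_rows = build_profile_matrix(gene_sigs, pheno_sigs)
```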
  • Clustering of the information may be performed using any cluster technique that is capable of clustering similar signatures, such as K-means or other known techniques. However, it is preferred to perform the clustering using the tools and techniques described in co-pending, commonly owned application Ser. No. 09/986,746 filed Nov. 9, 2001 and titled “System and Method for Dynamic Data Clustering” which application is incorporated herein, in its entirety, by reference thereto. Similarity may be defined by the Euclidean distance between the normalized vectors in matrix 500. These dynamic clustering techniques are scalable and parallelizable, so that they can handle large scale problems such as those presented in the context of the present invention.
  • clusters which contain mixtures of phenotypic response signatures 402 , 402 i and gene expression signatures 302 .
  • the clustering occurs among genes in phase and sometimes with genes out of phase.
  • Some clusters may contain only phenotypic response signatures 402 , 402 i and some clusters may contain only gene expression signatures 302 .
  • the clusters of interest to the present techniques contain both types of signatures to imply gene-drug associations.
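The search for clusters containing both kinds of signature can be illustrated with plain K-means, which the text names as an acceptable technique. The dynamic clustering method of application Ser. No. 09/986,746 is not reproduced here; this sketch substitutes a basic Lloyd iteration with caller-supplied starting centers, and every name and data value is an assumption.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_centers, iterations=20):
    """Basic Lloyd's algorithm; returns (labels, centers)."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))
            for v in vectors
        ]
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def mixed_clusters(labels, kinds):
    """Cluster ids holding at least one gene row and one non-gene row."""
    seen = {}
    for lab, kind in zip(labels, kinds):
        seen.setdefault(lab, set()).add(kind == "gene")
    return sorted(lab for lab, flags in seen.items() if flags == {True, False})


# Made-up normalized rows: three gene signatures and two treatment signatures.
vectors = [[1.0, 1.0], [1.1, 0.9], [-1.0, -1.0], [-0.9, -1.1], [5.0, 5.0]]
kinds = ["gene", "treatment", "gene", "gene", "treatment"]
labels, centers = kmeans(vectors, [vectors[0], vectors[2], vectors[4]])
of_interest = mixed_clusters(labels, kinds)
```

Here only the first cluster mixes a gene row with a treatment row, so only it would suggest a gene-treatment association; the gene-only and treatment-only clusters are set aside.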
  • FIG. 6 illustrates plots of three treatment/phenotypic profiles 602 , 604 , 606 resultant in NCI Lung Cancer Cell Lines having been treated with three different treatments, that were determined, by dynamic data clustering techniques described in application Ser. No. 09/986,746, to be in synchronization with gene expression profile 608 .
  • Visual observation of these plots makes it evident that the peaks and valleys of the phenotypic profiles 602 , 604 , 606 generally follow the same contour evident in the gene expression profile 608 .
  • FIG. 7 is a schematic representation of an ellipsoid 700 which represents the plot of a cluster of vectors from a matrix such as matrix 500 or a similar matrix assembled, when the vectors are plotted in high dimensional space.
  • the clustering techniques described above are designed to find and identify such clusters, as well as locate the densest part of each ellipsoid cluster, shown as diagonal or “ridge” 611 in FIG. 7 .
  • the system uses force functions to converge a mathematical probe to the densest part of a profile cluster.
  • the system not only identifies the clusters but defines a point of reference for all profiles in the cluster, with respect to each cluster identified.
  • the distance of each profile from the center of the cluster it belongs to can be defined by any viable distance metric.
  • An example of a distance metric used is Euclidean distance.
  • the relative angular positions of the profiles may also be consequential for selecting combinations of effective treatments.
  • Treatment 512 is noted as being a member of a group of similar treatments that are the closest to the densest location 610 of the ellipsoid 700 .
  • Although the distances measured do not identify angularity between profiles, or which side of the ridge 611 (dominant, principal axis running through the densest location) a particular vector (profile) lies on, the distance value does give an indication of how close to the ridge 611 and densest location 610 that vector lies.
  • the distance for treatment 512 is shown close to ridge 611 in FIG. 7 .
  • Another treatment vector 514 was measured to be further from ridge 611 as shown in FIG. 7 .
  • a third treatment vector 516 is shown at a somewhat intermediate distance from ridge 611, relative to distances for 512 and 514.
  • the treatment vector 518 is shown at an intermediate distance between the distances for 512 and 514 .
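Ranking treatment profiles by Euclidean distance from a cluster's reference point can be sketched as below. The coordinates are invented and do not come from FIG. 7, the labels merely echo its reference numerals, and distance to a single center is used as a simplification of distance to the ridge.

```python
import math


def rank_by_distance(center, profiles):
    """Return (distance, label) pairs sorted from nearest to farthest."""
    return sorted((math.dist(center, vec), label) for label, vec in profiles.items())


# Hypothetical cluster center and treatment vectors in two dimensions.
center = [0.0, 0.0]
profiles = {
    "treatment_512": [0.1, 0.0],
    "treatment_514": [3.0, 4.0],
    "treatment_516": [1.0, 0.0],
}
ranking = rank_by_distance(center, profiles)
order = [label for _, label in ranking]
```

As the text notes, such a ranking says nothing about angular position, so two treatments at equal distance may still lie on opposite sides of the ridge.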
  • One approach is to find a combination of treatments, such as those represented in FIG. 7 , that show relationships to different genes in the clustered profile, so that each treatment appears to be related to strategic combinations of genes involved in the disease process.
  • the treatments are then combined and tested together, each in a low dose/amount to observe whether a combinatorial synergistic effect on this disease is achieved by the combination.
  • the low dose/amount combination reduces the side effects of each individual treatment in the combination.
  • any combination where an adverse reaction occurs among two or more treatments in the combination used would be discarded as being unsuitable for use as a potential treatment combination.
  • Useful combinations will specifically and effectively target the disease process being studied.
  • events 208 and 210 may be repeated with additional different treatments, and analysis of phenotypic responses from these additional treatments may be carried out to select alternative treatments or additional treatments to be added to the treatment regimen from which the most recent tests were conducted. Selection may be made by again selecting those treatments that produce phenotypic profiles that are in synchronization or anti-synchronization with gene expression profiles of interest (gene expression profiles of those genes thought to be affecting or affected by the abnormal condition).
  • various combinations of treatments in a treatment family (such as, for example, drugs in the drug family, or related compounds in a compound family) of each identified treatment vector may be tested in the manner described above, as related treatments in a treatment family (e.g., drugs in a drug family) will generally fall within similar relative distances from the densest location of the ellipsoid.
  • the combinations of treatments may be predicted to try as potential combinations for multiple treatments in patients which will address the broadest spectrum of genes related to the production of proteins seen as elevated or inhibited when a tissue is in the disease state. By testing the predicted combinations, useful combinations for treatment in patients will be much easier to identify.
  • the present invention is a forward-looking way of choosing and predicting specific combinations of treatments to test, e.g., using high-throughput (HTP) screening of treatment combinations, and as such, greatly reduces the time to finding successful combinations, which currently have only been discovered accidentally, through hindsight and experiences gained through individual treatments.
  • the treatments identified are targeted to the genes involved in the disease process/malady. Because of this, the chances of significant side effects are reduced. For those combinations found to be effective in the sample tissues, further testing, such as animal testing would be warranted to study any effects the treatments may have on normal tissues within an organism, since the testing with the disease tissue samples only proves that the combination of drugs applied is effective at treating the diseased tissues. For the cancer examples discussed above, testing with the tissue samples would only show that the combination of treatments effectively kills the cancer cells, or not. Animal testing would further show the effects of the treatment combination on the normal tissues in the organism, to see if the animal survives the treatment combination.
  • a technique for excluding potential treatments may also be carried out.
  • One example of such an exclusion technique is to generate at least one phenotypic signature representing treatment-response values of each of the tissue samples exhibiting the malady, resultant from treating the tissue samples exhibiting the malady, with at least one treatment having known undesirable characteristics (e.g., is toxic to normal tissues, or ineffective, or has other undesirable side effects, etc.) for treatment of the tissues exhibiting the malady, using the techniques described above.
  • The one or more phenotypic signatures are then included with all other signatures (e.g., the signatures generated in matrix 500) and then subjected to clustering as described above.
  • Any phenotypic signature representing treatment-response values that are close to a phenotypic signature generated from a treatment known to have undesirable effects is then eliminated from candidacy for selection as a potential treatment.
  • a predefined distance may be established as a threshold, so that any phenotypic signature that is less than or equal to the predefined distance from a location of a phenotypic signature resulting from treatment with a treatment having known undesirable characteristics is eliminated.
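  • The exclusion technique described above can be sketched as a simple distance filter. This is a minimal sketch: it assumes that signatures are numeric vectors of equal length and that Euclidean distance is the measure of closeness, and the function and variable names are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def exclude_near_undesirable(candidates, undesirable, max_distance):
    """Drop candidate signatures lying within max_distance of any undesirable one.

    candidates:   {treatment name: phenotypic signature (list of floats)}
    undesirable:  list of signatures from treatments with known undesirable effects
    max_distance: the predefined threshold distance (a value chosen by the user)
    """
    kept = {}
    for name, signature in candidates.items():
        too_close = any(dist(signature, bad) <= max_distance for bad in undesirable)
        if not too_close:
            kept[name] = signature
    return kept


# Hypothetical signatures over three tissue samples.
candidates = {
    "treatment_1": [0.9, 0.1, 0.4],
    "treatment_2": [0.2, 0.8, 0.7],
    "treatment_3": [1.0, 0.0, 0.5],
}
toxic_reference = [[1.0, 0.1, 0.5]]

remaining = exclude_near_undesirable(candidates, toxic_reference, max_distance=0.2)
print(sorted(remaining))
```

Signatures at exactly the threshold distance are eliminated as well, matching the "less than or equal to" wording above.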
  • a phenotypic signature representing the treatment-response values of each of the tissue samples exhibiting the malady, as effected by the original or existing single or combination treatment may be generated according to the techniques described above. Those tissue samples may then be treated with one or more treatments not included in the original or existing single or combination treatment and then one or more phenotypic signatures may be generated from the treatment-response values of the tissues resulting from the one or more treatments.
  • Clustering may then be performed, as described above, based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together, including both the phenotypic signature from the original treatment and the phenotypic signature(s) from the additional treatment(s).
  • At least one additional treatment may then be selected for incorporating with the original treatment, by identifying the treatment-response phenotypic signature(s) caused by the additional treatment(s), and which are clustered with phenotypic signatures identifying the treatment-response phenotypic signature(s) caused by the treatment or treatments in the original treatment, as well as with gene expression signatures representing differential expression levels representative of the diseased tissue samples, but separated from the phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, so as to address malady-related gene activity not currently addressed by the treatment or treatments in the original or existing treatment.
  • Protein pathways implicated by differential gene expression levels when comparing treated tissues to non-treated tissue and among diseased and non-diseased tissues are used to produce phase relations between treatment responses and expression profiles across the tissue samples being tested.
  • An example of such is the twenty cancer cell lines referred to above.
  • all phase-related profiles are normalized and clustered. The number and sizes of the resulting clusters may indicate their relative importance as to effective treatments.
  • the structure of each cluster infers treatment-gene associations to guide multi-treatment selections.
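  • The normalization and clustering of profiles, and the use of cluster sizes as a rough indicator of importance, can be sketched as follows. This is an illustration only: z-score normalization, Euclidean distance, and single-linkage grouping under a fixed link distance are stand-ins assumed for the sketch, not the clustering method of the invention, and all names and values are hypothetical.

```python
from math import dist, sqrt


def zscore(profile):
    """Normalize a non-constant profile to zero mean and unit standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / n)
    return [(v - mean) / sd for v in profile]


def cluster(profiles, max_link):
    """Group profiles whose normalized forms lie within max_link of a cluster member.

    Returns a list of sets of profile names, largest cluster first.
    """
    normed = {name: zscore(values) for name, values in profiles.items()}
    clusters = []
    for name in normed:
        # Every existing cluster that this profile links to gets merged with it.
        linked = [c for c in clusters
                  if any(dist(normed[name], normed[m]) <= max_link for m in c)]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(clusters, key=len, reverse=True)


# Hypothetical profiles across four tissue samples.
profiles = {
    "gene_1": [1, 2, 3, 4],
    "treatment_a": [2, 4, 6, 8],   # same shape as gene_1 after normalization
    "gene_2": [4, 3, 2, 1],
    "treatment_b": [9, 7, 5, 3],   # same shape as gene_2 after normalization
    "gene_3": [1, 3, 1, 3],
}

groups = cluster(profiles, max_link=0.5)
print([len(g) for g in groups])
```

In this toy example each treatment falls into the cluster of the gene whose profile it tracks, which is the kind of treatment-gene association the cluster structure is meant to suggest.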
  • FIG. 8 illustrates a typical computer system in accordance with an embodiment of the present invention.
  • the computer system 800 may include any number of processors 802 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 806 (typically a random access memory, or RAM) and primary storage 804 (typically a read only memory, or ROM).
  • primary storage 804 acts to transfer data and instructions uni-directionally to the CPU and primary storage 806 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable computer-readable media such as those described above.
  • a mass storage device 808 is also coupled bi-directionally to CPU 802 and provides additional data storage capacity and may include any of the computer-readable media described above.
  • Mass storage device 808 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk that is slower than primary storage. It will be appreciated that the information retained within the mass storage device 808 may, in appropriate cases, be incorporated in standard fashion as part of primary storage 806 as virtual memory. A specific mass storage device such as a CD-ROM 814 may also pass data uni-directionally to the CPU.
  • CPU 802 is also coupled to an interface 810 that includes one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers.
  • CPU 802 optionally may be coupled to a computer or telecommunications network using a network connection as shown generally at 812. With such a network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps.
  • the above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.
  • instructions for clustering vectors may be stored on mass storage device 808 or 814 and executed on CPU 802 in conjunction with primary memory 806.
  • embodiments of the present invention further relate to computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations.
  • the media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts.
  • Examples of computer-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM, CD-RW, DVD-ROM, or DVD-RW disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
  • Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.

Abstract

Methods, systems and computer readable media for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady. Gene expression values of at least one sample of tissue exhibiting the tissue malady and at least one reference sample tissue that does not exhibit the malady are measured using at least one CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady. Gene expression signatures are generated from differential expression values of ratios of the measured gene expression values between the at least one sample exhibiting the malady and the at least one reference sample, across all samples, respectively. The tissue samples exhibiting the malady are treated with a treatment, and a treatment-response value is measured with respect to each of the tissue samples treated, as effected by the treatment. A phenotypic signature representing the treatment-response values of each of the tissue samples treated is generated for characterizing the effects of the treatment on the tissues treated. Processing may be repeated with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments. A clustering operation is then based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together, and treatments are selected by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the at least one tissue sample exhibiting the malady.

Description

    CROSS-REFERENCE
  • This application is a continuation-in-part application of application Ser. No. 10/640,081, filed Aug. 13, 2003, which is incorporated herein by reference in its entirety and to which application we claim priority under 35 USC §120.
  • BACKGROUND OF THE INVENTION
  • Many genomic and genetic studies are directed to the identification of differences in gene dosage or expression among cell populations for the study and detection of disease. For example, many malignancies involve the gain or loss of DNA sequences (alterations in copy number), sometimes entire chromosomes, that may result in activation of oncogenes or inactivation of tumor suppressor genes. Identification of the genetic events leading to neoplastic transformation and subsequent progression can facilitate efforts to define the biological basis for disease, improve prognostication of therapeutic response, and permit earlier tumor detection. In addition, perinatal genetic problems frequently result from loss or gain of chromosome segments such as trisomy 21 or the micro deletion syndromes. Trisomy of chromosome 13 results in Patau syndrome. Abnormal numbers of sex chromosomes result in various developmental disorders. Thus, methods of prenatal detection of such abnormalities can be helpful in early diagnosis of disease.
  • Comparative genomic hybridization (CGH) is a technique that is used to evaluate variations in genomic copy number in cells. In one implementation of CGH, genomic DNA is isolated from normal reference cells, as well as from test cells (e.g., tumor cells). The two nucleic acids are differentially labeled and then simultaneously hybridized in situ to metaphase chromosomes of a reference cell. Chromosomal regions in the test cells which are at increased or decreased copy number can be identified by detecting regions where the ratio of signal from the two distinguishably labeled nucleic acids is altered. For example, those regions that have been decreased in copy number in the test cells will show relatively lower signal from the test DNA than the reference shows, compared to other regions of the genome. Regions that have been increased in copy number in the test cells will show relatively higher signal from the test DNA.
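  • The ratio-based reading of CGH signals described above can be illustrated with a short sketch. The use of a log2 ratio and the 0.3 calling threshold are assumptions chosen for illustration; the text above states only that an altered ratio of the two signals marks a change in copy number.

```python
from math import log2


def call_copy_number(test_signal, ref_signal, threshold=0.3):
    """Label a region as gain, loss, or normal from test and reference signals.

    threshold is a hypothetical cutoff on the log2 ratio, not a value from the text.
    """
    ratio = log2(test_signal / ref_signal)
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "normal"


# Hypothetical (test, reference) signal intensities for three regions.
regions = {
    "region_1": (200.0, 100.0),  # test signal doubled
    "region_2": (50.0, 100.0),   # test signal halved
    "region_3": (105.0, 100.0),  # essentially unchanged
}

calls = {name: call_copy_number(t, r) for name, (t, r) in regions.items()}
print(calls)
```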
  • A recent technology development introduced an oligonucleotide array platform for array based comparative genomic hybridization (aCGH) analyses. Such approaches offer benefits over immobilized chromosome approaches, including a higher resolution, as defined by the ability of the assay to localize chromosomal alterations to specific areas of the genome. For further detailed description regarding aCGH technology, the reader is referred to co-pending application Ser. No. 10/744,595 filed Dec. 22, 2003 and titled “Comparative Genomic Hybridization Assays Using Immobilized Oligonucleotide Features and Compositions for Practicing the Same”, which is incorporated herein, in its entirety, by reference thereto.
  • There is a continuing need for techniques and systems for using array CGH technology not only for analyzing disease states and associated locations of chromosomal alterations, but also for discovering potential treatments for the underlying diseases.
  • SUMMARY OF THE INVENTION
  • Methods, systems and computer readable media for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady, are provided to include the steps of: (a) measuring gene expression values of at least one sample of tissue exhibiting the tissue malady and at least one reference sample tissue that does not exhibit the malady, using at least one CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady; (b) generating gene expression signatures from differential expression values of ratios of the measured gene expression values between the at least one sample exhibiting the malady and the at least one reference sample, across all samples, respectively; (c) treating the at least one tissue sample exhibiting the malady with a treatment; (d) measuring a treatment-response value with respect to each of the tissue samples treated, as effected by the treatment; (e) generating a phenotypic signature representing the treatment-response values of each of the tissue samples treated; (f) repeating steps (c)-(e) with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments; (g) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and (h) selecting treatments by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the at least one tissue sample exhibiting the malady.
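  • Steps (b) and (e) above can be sketched as the construction of vectors over a common ordering of samples, so that gene expression signatures and phenotypic signatures can later be clustered together. The data layout, the function names, and the choice of a log2 transform of the ratios are assumptions made for this illustration.

```python
from math import log2


def gene_expression_signatures(diseased, reference):
    """Build one signature per array feature from diseased-to-reference ratios.

    diseased:  {sample name: {feature: measured value}}
    reference: {feature: measured value in the reference sample}
    Returns {feature: [log2 ratio for each sample, in sorted sample order]}.
    """
    samples = sorted(diseased)
    return {
        feature: [log2(diseased[s][feature] / reference[feature]) for s in samples]
        for feature in reference
    }


def phenotypic_signature(responses):
    """Order treatment-response values by sample name, matching the layout above.

    responses: {sample name: treatment-response value for one treatment}
    """
    return [responses[s] for s in sorted(responses)]


# Hypothetical measurements for two diseased samples and one reference.
diseased = {
    "sample_1": {"feature_1": 200.0, "feature_2": 50.0},
    "sample_2": {"feature_1": 100.0, "feature_2": 25.0},
}
reference = {"feature_1": 100.0, "feature_2": 100.0}

signatures = gene_expression_signatures(diseased, reference)
treatment_x = phenotypic_signature({"sample_2": 0.1, "sample_1": 0.7})
print(signatures, treatment_x)
```

Keeping every signature in the same sample order is what allows the two kinds of signature to be compared position by position in the clustering step.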
  • Methods, systems and computer readable media are provided for screening a combination of treatments to select treatments for tissue exhibiting a malady, to include the steps of: (a) providing differential expression levels of tissue samples exhibiting the malady relative to at least one reference tissue sample from respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the malady; (b) for respective differential expression levels from respective features of respective CGH arrays for each tissue sample exhibiting the malady, providing a gene expression signature representing the differential expression level for each tissue sample for gene expression levels from that feature, respectively; (c) providing a treatment-response value, for each tissue sample exhibiting the malady having been treated with a treatment, as effected by the treatment; (d) generating a phenotypic signature representing the treatment-response values of each of the tissue samples having been treated; (e) repeating steps (c)-(d) with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments; (f) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and (g) selecting treatments by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the tissue samples exhibiting the malady.
  • Methods, systems and computer readable media for augmenting an original or existing single treatment or treatment combination for a disease with at least one additional treatment that covers gene activity of the disease not addressed by the original or existing treatment are provided to include the steps of: (a) providing differential expression levels of diseased tissue samples relative to at least one reference tissue for respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the disease; (b) for respective features of respective CGH arrays for each diseased tissue sample, providing a gene expression signature representing the differential expression level for each tissue sample for that feature, respectively; (c) treating the diseased tissue samples with the original or existing single treatment or combination treatment; (d) measuring a treatment-response value with respect to each of the diseased tissue samples as effected by the original or existing single or combination treatment; (e) generating a phenotypic signature representing the treatment-response values of each of the diseased tissue samples as effected by the original or existing single or combination treatment; (f) treating the diseased tissue samples with a treatment that is not included in the original or existing single or combination treatment; (g) measuring a treatment-response value with respect to each of the diseased tissue samples as effected by the treatment that is not included in the original or existing single or combination treatment; (h) generating a phenotypic signature representing the treatment-response values of each of the diseased tissue samples as effected by the treatment that is not included in the original or existing single or combination treatment; (i) repeating steps (f)-(h) with a different treatment that is also not included in the original or existing single or combination treatment at least 
once so that multiple phenotypic signatures have been generated for multiple treatments not included in the original or existing single or combination treatment; (j) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and (k) selecting at least one treatment by identifying the treatment-response phenotypic signatures caused by the at least one treatment, and which are clustered with phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, as well as with gene expression signatures representing differential expression levels representative of the diseased tissue samples, but separated from the phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, so as to address disease-gene activity not currently addressed by the treatment or treatments in the original or existing treatment.
  • These and other advantages and features of the invention will become apparent to those persons skilled in the art upon reading the details of the methods, systems and computer readable media as more fully described below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1A is a block diagram of a cancer transcriptome model.
  • FIG. 1B shows a modification of the model of FIG. 1A.
  • FIG. 2 is a flowchart illustrating steps that may be performed for determining potential treatments for diseases or other maladies that may be caused by altered mRNA resulting from chromosome alteration.
  • FIG. 3 shows exemplary plots of gene expression signatures of measurements of genes in altered or abnormal tissue, relative to “normal” tissue across multiple samples.
  • FIG. 4 shows an example of a gene expression signature plotted together with a phenotypic response signature or profile.
  • FIG. 5 shows a matrix in which data points used to generate gene expression signatures, phenotypic response signatures and inverted phenotypic response signatures have been inserted as cell values of the matrix.
  • FIG. 6 illustrates plots of three treatment/phenotypic profiles from samples having been treated with three different treatments that were determined to be in synchronization with the gene expression profile shown.
  • FIG. 7 is a schematic representation of an ellipsoid that represents the plot of a cluster of vectors from a matrix, such as the matrix shown in FIG. 5. As noted, this is a schematic representation, as, in reality, data ellipsoids are hyper-dimensional with complicated radial and angular geometries.
  • FIG. 8 is a block diagram illustrating an example of a computer system that may be employed in carrying out the present invention.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Before the present systems, methods and computer readable media are described, it is to be understood that this invention is not limited to particular treatments, drugs, diseases, methods, method steps, statistical methods, hardware or software described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
  • Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
  • Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
  • It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a compound” includes a plurality of such compounds and reference to “the expression profile” includes reference to one or more expression profiles and equivalents thereof known to those skilled in the art, and so forth.
  • The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
  • DEFINITIONS
  • A “genotype” refers to the actual makeup of one or more genes (DNA) in living tissue. A genotypic signature is a textual or electronic representation that directly identifies the genotype.
  • A “phenotype” is related to a genotype, in that it is some sort of physical expression resulting from a blueprint provided by the genotype. A phenotypic signature is a textual or electronic representation of values representing the expression that defines the phenotype. mRNA expression might be considered to be either a genotype or phenotype, as it is in a gray area where the genotype executes the phenotype. It is referred to herein as genotype/phenotype.
  • A “treatment” refers to the administration of an agent to living tissue (generally a diseased tissue) that has some measurable effect on protein production by that tissue, which effect can be inferred by measurement of gene expression levels of the tissue, using microarray technology. “Treatments” may refer to, but are not limited to drugs, compounds, genetic sequences used to target specific locations of the genetic makeup of the tissue, radiation, heat, cryogenics, or any other kind of application that produces an effect as described above.
  • The term “transcriptome” refers to the set of all messenger RNA (mRNA) (or transcripts) in one or a population of biological cells.
  • The term “altered transcriptome” refers to a transcriptome resulting from a tissue sample containing one or more regions of abnormality in one or more chromosome (e.g., amplification, deletion). A “cancer transcriptome” refers to the altered transcriptome of a biological cell sample that is cancerous.
  • A “biopolymer” is a polymer of one or more types of repeating units. Biopolymers are typically found in biological systems and particularly include polysaccharides (such as carbohydrates), and peptides (which term is used to include polypeptides and proteins) and polynucleotides as well as their analogs such as those compounds composed of or containing amino acid analogs or non-amino acid groups, or nucleotide analogs or non-nucleotide groups. This includes polynucleotides in which the conventional backbone has been replaced with a non-naturally occurring or synthetic backbone, and nucleic acids (or synthetic or naturally occurring analogs) in which one or more of the conventional bases has been replaced with a group (natural or synthetic) capable of participating in Watson-Crick type hydrogen bonding interactions. Polynucleotides include single or multiple stranded configurations, where one or more of the strands may or may not be completely aligned with another.
  • A “nucleotide” refers to a sub-unit of a nucleic acid and has a phosphate group, a 5 carbon sugar and a nitrogen containing base, as well as functional analogs (whether synthetic or naturally occurring) of such sub-units which in the polymer form (as a polynucleotide) can hybridize with naturally occurring polynucleotides in a sequence specific manner analogous to that of two naturally occurring polynucleotides. For example, a “biopolymer” includes DNA (including cDNA), RNA, oligonucleotides, and PNA and other polynucleotides as described in U.S. Pat. No. 5,948,902 and references cited therein (all of which are incorporated herein by reference), regardless of the source. An “oligonucleotide” generally refers to a nucleotide multimer of about 10 to 100 nucleotides in length, while a “polynucleotide” includes a nucleotide multimer having any number of nucleotides. A “biomonomer” references a single unit, which can be linked with the same or other biomonomers to form a biopolymer (for example, a single amino acid or nucleotide with two linking groups one or both of which may have removable protecting groups).
  • A “chemical array”, “microarray”, “bioarray” or “array”, unless a contrary intention appears, includes any one, two or three-dimensional arrangement of addressable regions bearing a particular chemical moiety or moieties (for example, biopolymers such as polynucleotide sequences) associated with that region, where the chemical moiety or moieties are immobilized on the surface in that region. By “immobilized” is meant that the moiety or moieties are stably associated with the substrate surface in the region, such that they do not separate from the region under conditions of using the array, e.g., hybridization and washing and stripping conditions. As is known in the art, the moiety or moieties may be covalently or non-covalently bound to the surface in the region. For example, each region may extend into a third dimension in the case where the substrate is porous while not having any substantial third dimension measurement (thickness) in the case where the substrate is non-porous. An array may contain more than ten, more than one hundred, more than one thousand, more than ten thousand features, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm². For example, features may have widths (that is, diameter, for a round spot) in the range of from about 10 μm to about 1.0 cm. In other embodiments each feature may have a width in the range of about 1.0 μm to about 1.0 mm, such as from about 5.0 μm to about 500 μm, and including from about 10 μm to about 200 μm. Non-round features may have area ranges equivalent to that of circular features with the foregoing width (diameter) ranges. A given feature is made up of chemical moieties, e.g., nucleic acids, that bind to (e.g., hybridize to) the same target (e.g., target nucleic acid), such that a given feature corresponds to a particular target.
At least some, or all, of the features are of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, or 20% of the total number of features). Interfeature areas will typically (but not essentially) be present which do not carry any polynucleotide. Such interfeature areas typically will be present where the arrays are formed by processes involving drop deposition of reagents but may not be present when, for example, light directed synthesis fabrication processes are used. It will be appreciated though, that the interfeature areas, when present, could be of various sizes and configurations. An array is “addressable” in that it has multiple regions (sometimes referenced as “features” or “spots” of the array) of different moieties (for example, different polynucleotide sequences) such that a region at a particular predetermined location (an “address”) on the array will detect a particular target or class of targets (although a feature may incidentally detect non-targets of that feature). The target for which each feature is specific is, in representative embodiments, known. An array feature is generally homogenous in composition and concentration and the features may be separated by intervening spaces (although arrays without such separation can be fabricated).
  • In the case of an array, the “target” will be referenced as a moiety in a mobile phase (typically fluid), to be detected by probes (“target probes”) which are bound to the substrate at the various regions. However, either of the “target” or “target probes” may be the one which is to be detected by the other (thus, either one could be an unknown mixture of polynucleotides to be detected by binding with the other). “Addressable sets of probes” and analogous terms refer to the multiple regions of different moieties supported by or intended to be supported by the array surface.
  • The term “sample” as used herein relates to a material or mixture of materials, containing one or more components of interest. Samples include, but are not limited to, samples obtained from an organism or from the environment (e.g., a soil sample, water sample, etc.) and may be directly obtained from a source (e.g., such as a biopsy or from a tumor) or indirectly obtained e.g., after culturing and/or one or more processing steps. In one embodiment, samples are a complex mixture of molecules, e.g., comprising at least about 50 different molecules, at least about 100 different molecules, at least about 200 different molecules, at least about 500 different molecules, at least about 1000 different molecules, at least about 5000 different molecules, at least about 10,000 molecules, etc. Given the significant, somewhat chaotic alteration of the genome by cancer, one expects some abnormal distribution in expression profiles of those tissue samples affected by cancer.
  • The term “genome” refers to all nucleic acid sequences (coding and non-coding) and elements present in any virus, single cell (prokaryote and eukaryote) or each cell type in a metazoan organism. The term genome also applies to any naturally occurring or induced variation of these sequences that may be present in a mutant or disease variant of any virus or cell type. These sequences include, but are not limited to, those involved in the maintenance, replication, segregation, and higher order structures (e.g. folding and compaction of DNA in chromatin and chromosomes), or other functions, if any, of the nucleic acids as well as all the coding regions and their corresponding regulatory elements needed to produce and maintain each particle, cell or cell type in a given organism.
  • For example, the human genome consists of approximately 3.0×10⁹ base pairs of DNA organized into distinct chromosomes. The genome of a normal diploid somatic human cell consists of 22 pairs of autosomes (chromosomes 1 to 22) and either chromosomes X and Y (males) or a pair of X chromosomes (females) for a total of 46 chromosomes. A genome of a cancer cell may contain variable numbers of each chromosome in addition to deletions, rearrangements and amplification of any sub-chromosomal region or DNA sequence. In certain aspects, a “genome” refers to nuclear nucleic acids, excluding mitochondrial nucleic acids; however, in other aspects, the term does not exclude mitochondrial nucleic acids. In still other aspects, the “mitochondrial genome” is used to refer specifically to nucleic acids found in mitochondrial fractions.
  • By “genomic source” is meant the initial nucleic acids that are used as the original nucleic acid source from which the probe nucleic acids are produced, e.g., as a template in the nucleic acid amplification and/or labeling protocols.
  • If a surface-bound polynucleotide or probe “corresponds to” a chromosomal region, the polynucleotide usually contains a sequence of nucleic acids that is unique to that chromosomal region. Accordingly, a surface-bound polynucleotide that corresponds to a particular chromosomal region usually specifically hybridizes to a labeled nucleic acid made from that chromosomal region, relative to labeled nucleic acids made from other chromosomal regions.
  • An “array layout” or “array characteristics”, refers to one or more physical, chemical or biological characteristics of the array, such as positioning of some or all the features within the array and on a substrate, one or more feature dimensions, or some indication of an identity or function (for example, chemical or biological) of a moiety at a given location, or how the array should be handled (for example, conditions under which the array is exposed to a sample, or array reading specifications or controls following sample exposure).
  • The phrase “oligonucleotide bound to a surface of a solid support” or “probe bound to a solid support” or a “target bound to a solid support” refers to an oligonucleotide or mimetic thereof, e.g., PNA, LNA or UNA molecule that is immobilized on a surface of a solid substrate, where the substrate can have a variety of configurations, e.g., a sheet, bead, particle, slide, wafer, web, fiber, tube, capillary, microfluidic channel or reservoir, or other structure. In certain embodiments, the collections of oligonucleotide elements employed herein are present on a surface of the same planar support, e.g., in the form of an array. It should be understood that the terms “probe” and “target” are relative terms and that a molecule considered as a probe in certain assays may function as a target in other assays.
  • As used herein, a “test nucleic acid sample” or “test nucleic acids” refers to nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is being assayed. Similarly, “test genomic acids” or a “test genomic sample” refers to genomic nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is being assayed.
  • As used herein, a “reference nucleic acid sample” or “reference nucleic acids” refers to nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is known. Similarly, “reference genomic acids” or a “reference genomic sample” refers to genomic nucleic acids comprising sequences whose quantity or degree of representation (e.g., copy number) or sequence identity is known. A “reference nucleic acid sample” may be derived independently from a “test nucleic acid sample,” i.e., the samples can be obtained from different organisms or different cell populations of the same organism. However, in certain embodiments, a reference nucleic acid is present in a “test nucleic acid sample” which comprises one or more sequences whose quantity or identity or degree of representation in the sample is unknown while containing one or more sequences (the reference sequences) whose quantity or identity or degree of representation in the sample is known. The reference nucleic acid may be naturally present in a sample (e.g., present in the cell from which the sample was obtained) or may be added to or spiked in the sample.
  • If a surface-bound polynucleotide or probe “corresponds to” a chromosome, the polynucleotide usually contains a sequence of nucleic acids that is unique to that chromosome. Accordingly, a surface-bound polynucleotide that corresponds to a particular chromosome usually specifically hybridizes to a labeled nucleic acid made from that chromosome, relative to labeled nucleic acids made from other chromosomes. Array features, because they usually contain surface-bound polynucleotides, can also correspond to a chromosome.
  • A “non-cellular chromosome composition” is a composition of chromosomes synthesized by mixing pre-determined amounts of individual chromosomes. These synthetic compositions can include selected concentrations and ratios of chromosomes that do not naturally occur in a cell, including any cell grown in tissue culture. Non-cellular chromosome compositions may contain more than an entire complement of chromosomes from a cell, and, as such, may include extra copies of one or more chromosomes from that cell. Non-cellular chromosome compositions may also contain less than the entire complement of chromosomes from a cell.
  • “CGH” or “Comparative Genomic Hybridization” refers generally to techniques for identification of chromosomal alterations (such as in cancer cells, for example). Using CGH, ratios between a tumor or test sample and a normal or control sample enable the detection of chromosomal amplifications and deletions of regions that may include oncogenes and tumor suppressor genes, for example.
  • A “CGH array” or “aCGH array” refers to an array that can be used to compare DNA samples for relative differences in copy number. In general, an aCGH array can be used in any assay in which it is desirable to scan a genome with a sample of nucleic acids. For example, an aCGH array can be used in location analysis as described in U.S. Pat. No. 6,410,243, the entirety of which is incorporated herein. In certain aspects, a CGH array provides probes for screening or scanning a genome of an organism and comprises probes from a plurality of regions of the genome. In one aspect, the array comprises probe sequences for scanning an entire chromosome arm, wherein probe targets are separated by at least about 500 bp, at least about 1 kb, at least about 5 kb, at least about 10 kb, at least about 25 kb, at least about 50 kb, at least about 100 kb, at least about 250 kb, at least about 500 kb or at least about 1 Mb. In another aspect, the array comprises probe sequences for scanning an entire chromosome, a set of chromosomes, or the complete complement of chromosomes forming the organism's genome. By “resolution” is meant the spacing on the genome between sequences found in the probes on the array. In some embodiments (e.g., using a large number of probes of high complexity) all sequences in the genome can be present in the array. The spacing between different locations of the genome that are represented in the probes may also vary, and may be uniform, such that the spacing is substantially the same between sampled regions, or non-uniform, as desired. An assay performed at low resolution on one array, e.g., comprising probe targets separated by larger distances, may be repeated at higher resolution on another array, e.g., comprising probe targets separated by smaller distances.
  • In certain aspects, in constructing the arrays, both coding and non-coding genomic regions are included as probes, whereby “coding region” refers to a region comprising one or more exons that is transcribed into an mRNA product and from there translated into a protein product, while by non-coding region is meant any sequences outside of the exon regions, where such regions may include regulatory sequences, e.g., promoters, enhancers, untranslated but transcribed regions, introns, origins of replication, telomeres, etc. In certain embodiments, one can have at least some of the probes directed to non-coding regions and others directed to coding regions. In certain embodiments, one can have all of the probes directed to non-coding sequences. In certain embodiments, one can have all of the probes directed to coding sequences. In certain other aspects, individual probes comprise sequences that do not normally occur together, e.g., to detect gene rearrangements, for example, as in a case where the mangled or abnormal genome, resulting from cancer, is expected to be less successful at expressing the normal distribution of mRNA that is normally observed for tissue of the same type that has not been affected by cancer, and therefore also less successful at producing translation mRNA that is normally observed.
  • In some embodiments, at least 5% of the polynucleotide probes on the solid support hybridize to regulatory regions of a nucleotide sample of interest while other embodiments may have at least 30% of the polynucleotide probes on the solid support hybridize to exonic regions of a nucleotide sample of interest. In yet other embodiments, at least 50% of the polynucleotide probes on the solid support hybridize to intergenic (e.g., non-coding) regions of a nucleotide sample of interest. In certain aspects, probes on the array represent a random selection of genomic sequences (e.g., both coding and non-coding). However, in other aspects, particular regions of the genome are selected for representation on the array, e.g., such as CpG islands, genes belonging to particular pathways of interest or whose expression and/or copy number are associated with particular physiological responses of interest (e.g., disease, such as cancer, drug resistance, toxicological responses and the like). In certain aspects, where particular genes are identified as being of interest, intergenic regions proximal to those genes are included on the array along with, optionally, all or portions of the coding sequence corresponding to the genes. In one aspect, at least about 100 bp, 500 bp, 1,000 bp, 5,000 bp, 10,000 bp or even 100,000 bp of genomic DNA upstream of a transcriptional start site is represented on the array in discrete or overlapping sequence probes. In certain aspects, at least one probe sequence comprises a motif sequence to which a protein of interest (e.g., such as a transcription factor) is known or suspected to bind.
  • In certain aspects, repetitive sequences are excluded as probes on the arrays. However, in another aspect, repetitive sequences are included.
  • The choice of nucleic acids to use as probes may be influenced by prior knowledge of the association of a particular chromosome or chromosomal region with certain disease conditions. International Application WO 93/18186 provides a list of exemplary chromosomal abnormalities and associated diseases, which are described in the scientific literature. Alternatively, whole genome screening to identify new regions subject to frequent changes in copy number can be performed using the methods of the present invention discussed further below.
  • In some embodiments, previously identified regions from a particular chromosomal region of interest are used as probes. In certain embodiments, the array can include probes which “tile” a particular region (e.g., which have been identified in a previous assay or from a genetic analysis of linkage), by which is meant that the probes correspond to a region of interest as well as genomic sequences found at defined intervals on either side, i.e., 5′ and 3′ of, the region of interest, where the intervals may or may not be uniform, and may be tailored with respect to the particular region of interest and the assay objective. In other words, the tiling density may be tailored based on the particular region of interest and the assay objective. Such “tiled” arrays and assays employing the same are useful in a number of applications, including applications where one identifies a region of interest at a first resolution, and then uses a tiled array tailored to the initially identified region to further assay the region at a higher resolution, e.g., in an iterative protocol.
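  • By way of illustration only, the tiling described above can be sketched as follows. This is a minimal sketch that assumes uniform intervals and simple integer base-pair coordinates; the function name `tile_region` and the parameters `flank` and `step` are hypothetical and are not part of the disclosure, which also contemplates non-uniform intervals.

```python
def tile_region(region_start, region_end, flank, step):
    """Return probe start coordinates covering a region of interest plus
    flanking sequence on both the 5' and 3' sides, at a uniform interval.

    All names and the uniform spacing are illustrative assumptions.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if region_end < region_start:
        raise ValueError("region_end must not precede region_start")
    first = max(0, region_start - flank)  # do not run off the start of the chromosome
    last = region_end + flank
    return list(range(first, last + 1, step))


# First pass at a coarse resolution, then a second pass at a finer one.
coarse = tile_region(1_000_000, 1_010_000, flank=5_000, step=5_000)
fine = tile_region(1_000_000, 1_010_000, flank=5_000, step=1_000)
print(len(coarse), len(fine))
```

  • Calling the same function again with a smaller `step` mirrors the iterative protocol in which a region identified at a first resolution is assayed again at a higher resolution.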
  • In certain aspects, the array includes probes to sequences associated with diseases associated with chromosomal imbalances for prenatal testing. For example, in one aspect, the array comprises probes complementary to all or a portion of chromosome 21 (e.g., to detect Down's syndrome), all or a portion of the X chromosome (e.g., to detect an X chromosome deficiency as in Turner's Syndrome) and/or all or a portion of the Y chromosome (e.g., to detect duplication of an X chromosome and the presence of a Y chromosome, as in Klinefelter Syndrome), all or a portion of chromosome 7 (e.g., to detect Williams Syndrome), all or a portion of chromosome 8 (e.g., to detect Langer-Giedion Syndrome), all or a portion of chromosome 15 (e.g., to detect Prader-Willi or Angelman's Syndrome), and/or all or a portion of chromosome 22 (e.g., to detect DiGeorge syndrome).
  • Other “themed” arrays may be fabricated, for example, arrays including probes for sequences whose duplications or deletions are associated with specific types of cancer (e.g., breast cancer, prostate cancer and the like). The selection of such arrays may be based on patient information such as familial inheritance of particular genetic abnormalities. In certain aspects, an array for scanning an entire genome is first contacted with a sample and then a higher-resolution array is selected based on the results of such scanning.
  • Themed arrays also can be fabricated for use in gene expression assays, for example, to detect expression of genes involved in selected pathways of interest, or genes associated with particular diseases of interest.
  • In one embodiment, a plurality of probes on the array is selected to have a duplex Tm within a predetermined range. For example, in one aspect, at least about 50% of the probes have a duplex Tm within a temperature range of about 75° C. to about 85° C. In one embodiment, at least 80% of said polynucleotide probes have a duplex Tm within a temperature range of about 75° C. to about 85° C., within a range of about 77° C. to about 83° C., within a range of from about 78° C. to about 82° C. or within a range from about 79° C. to about 82° C. In one aspect, at least about 50% of probes on an array have a range of Tm's of less than about 4° C., less than about 3° C., or even less than about 2° C., e.g., less than about 1.5° C., less than about 1.0° C. or about 0.5° C.
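  • As a hedged illustration of the Tm criterion above, the following sketch checks what fraction of a candidate probe set falls within a chosen duplex Tm window. The Tm values are made-up example numbers assumed to have been computed beforehand by whatever method is preferred; the function name `fraction_in_tm_range` is hypothetical.

```python
def fraction_in_tm_range(tms, low=75.0, high=85.0):
    """Fraction of probes whose duplex Tm (deg C) lies in [low, high].

    The default window follows the 75 to 85 deg C example in the text;
    the function itself is an illustrative assumption.
    """
    if not tms:
        raise ValueError("at least one Tm value is required")
    inside = sum(1 for t in tms if low <= t <= high)
    return inside / len(tms)


# Illustrative (not measured) duplex Tm values for ten candidate probes.
probe_tms = [76.2, 80.1, 84.9, 72.0, 79.5, 81.3, 88.4, 78.0, 80.0, 82.2]
fraction = fraction_in_tm_range(probe_tms)
meets_80_percent = fraction >= 0.8
print(fraction, meets_80_percent)
```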
  • The probes on the microarray, in certain embodiments have a nucleotide length in the range of at least 30 nucleotides to 200 nucleotides, or in the range of at least about 30 to about 150 nucleotides. In other embodiments, at least about 50% of the polynucleotide probes on the solid support have the same nucleotide length, and that length may be about 60 nucleotides.
  • In certain aspects, longer polynucleotides may be used as probes. In addition to the oligonucleotide probes described above, cDNAs, or inserts from phage BACs (bacterial artificial chromosomes) or plasmid clones, can be arrayed. Probes may therefore also range from about 201-5000 bases in length, from about 5001-50,000 bases in length, or from about 50,001-200,000 bases in length, depending on the platform used. If other polynucleotide features are present on a subject array, they may be interspersed with, or in a separately-hybridizable part of the array from the subject oligonucleotides.
  • In still other aspects, probes on the array comprise at least coding sequences.
  • In one aspect, probes represent sequences from an organism such as Drosophila melanogaster, Caenorhabditis elegans, yeast, zebrafish, a mouse, a rat, a domestic animal, a companion animal, a primate, a human, etc. In certain aspects, probes representing sequences from different organisms are provided on a single substrate, e.g., on a plurality of different arrays.
  • A “CGH assay” using an aCGH array can be generally performed as follows. In one embodiment, a population of nucleic acids contacted with an aCGH array comprises at least two sets of nucleic acid populations, which can be derived from different sample sources. For example, in one aspect, a target population contacted with the array comprises a set of target molecules from a reference sample and from a test sample. In one aspect, the reference sample is from an organism having a known genotype and/or phenotype, while the test sample has an unknown genotype and/or phenotype or a genotype and/or phenotype that is known and is different from that of the reference sample. For example, in one aspect, the reference sample is from a healthy patient while the test sample is from a patient suspected of having cancer or known to have cancer.
  • In one embodiment, a target population being contacted to an array in a given assay comprises at least two sets of target populations that are differentially labeled (e.g., by spectrally distinguishable labels). In one aspect, control target molecules in a target population are also provided as two sets, e.g., a first set labeled with a first label and a second set labeled with a second label corresponding to first and second labels being used to label reference and test target molecules, respectively.
  • In one aspect, the control target molecules in a population are present at a level comparable to a haploid amount of a gene represented in the target population. In another aspect, the control target molecules are present at a level comparable to a diploid amount of a gene. In still another aspect, the control target molecules are present at a level that is different from a haploid or diploid amount of a gene represented in the target population. The relative proportions of complexes formed labeled with the first label vs. the second label can be used to evaluate relative copy numbers of targets found in the two samples.
  • In certain aspects, test and reference populations of nucleic acids may be applied separately to separate but identical arrays (e.g., having identical probe molecules) and the signals from each array can be compared to determine relative copy numbers of the nucleic acids in the test and reference populations.
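  • The comparison of test and reference signals to evaluate relative copy number may be sketched as follows. This is only an illustrative sketch: the use of a log2 ratio, the cut-off value of 0.3, and the names `log2_ratios` and `call_copy_change` are assumptions made for the example and are not specified in this disclosure.

```python
import math


def log2_ratios(test_signal, reference_signal):
    """Per-probe log2(test / reference) for probes present in both inputs.

    Probes with a missing or non-positive signal are skipped.
    """
    ratios = {}
    for probe, test_value in test_signal.items():
        ref_value = reference_signal.get(probe)
        if ref_value is None or ref_value <= 0 or test_value <= 0:
            continue
        ratios[probe] = math.log2(test_value / ref_value)
    return ratios


def call_copy_change(log_ratio, threshold=0.3):
    """Label a log ratio as gain, loss or no change (threshold is an assumed value)."""
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "no change"


# Illustrative signal intensities for three probes.
test_signal = {"probe1": 2000.0, "probe2": 1000.0, "probe3": 500.0}
reference_signal = {"probe1": 1000.0, "probe2": 1000.0, "probe3": 1000.0}

ratios = log2_ratios(test_signal, reference_signal)
calls = {probe: call_copy_change(r) for probe, r in ratios.items()}
print(calls)
```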
  • Arrays can be fabricated using drop deposition from pulse jets of either polynucleotide precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained polynucleotide. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic array fabrication methods may be used. Interfeature areas need not be present particularly when the arrays are made by photolithographic methods as described in those patents.
  • Following receipt by a user, an array will typically be exposed to a sample (for example, a fluorescently labeled polynucleotide or protein containing sample) and the array then read. Reading of the array may be accomplished by illuminating the array and reading the location and intensity of resulting fluorescence at multiple regions on each feature of the array. For example, a scanner may be used for this purpose which is similar to the AGILENT MICROARRAY SCANNER manufactured by Agilent Technologies, Palo Alto, Calif. Other suitable apparatus and methods are described in U.S. patent applications: Ser. No. 10/087,447 “Reading Dry Chemical Arrays Through The Substrate” by Corson et al.; and in U.S. Pat. Nos. 6,518,556; 6,486,457; 6,406,849; 6,371,370; 6,355,921; 6,320,196; 6,251,685; and 6,222,664. However, arrays may be read by any other method or apparatus than the foregoing, with other reading methods including other optical techniques (for example, detecting chemiluminescent or electroluminescent labels) or electrical techniques (where each feature is provided with an electrode to detect hybridization at that feature in a manner disclosed in U.S. Pat. No. 6,251,685, U.S. Pat. No. 6,221,583 and elsewhere). A result obtained from the reading may be used in accordance with the techniques of the present invention in screening and finding multiple drug treatment therapies. A result of the reading (whether further processed or not) may be forwarded (such as by communication) to a remote location if desired, and received there for further use (such as further processing).
  • A “gene expression profile”, “expression profile”, “gene expression signature” or “expression signature” as used herein refers to a serial pattern (e.g., time series data) generated from a plot of gene expression values for a gene across all samples processed for gene expression. For example, ten different cell lines may be processed on ten different arrays. A gene expression profile for a particular gene “A” is generated by plotting the differential gene expression value for gene “A”, versus a control or reference, for each of the ten cell lines against those cell lines, and then forming a serial pattern to define the expression profile.
  • When one item is indicated as being “remote” from another, this means that the two items are in different locations, e.g., in different rooms or buildings, and may be at least one mile, ten miles, or at least one hundred miles apart.
  • “Communicating” information references transmitting the data representing that information as electrical signals over a suitable communication channel (for example, a private or public network). “Forwarding” an item refers to any means of getting that item from one location to the next, whether by physically transporting that item or otherwise (where that is possible) and includes, at least in the case of data, physically transporting a medium carrying the data or communicating the data.
  • A “processor” references any hardware and/or software combination which will perform the functions required of it. For example, any processor herein may be a programmable digital microprocessor such as available in the form of a mainframe, server, or personal computer (desktop or portable). Where the processor is programmable, suitable programming can be communicated from a remote location to the processor, or previously saved in a computer program product (such as a portable or fixed computer readable storage medium, whether magnetic, optical or solid state device based). For example, a magnetic or optical disk may carry the programming, and can be read by a suitable disk reader communicating with each processor at its corresponding station.
  • Reference to a singular item includes the possibility that there are plural of the same item present.
  • “May” means optionally.
  • Methods recited herein may be carried out in any order of the recited events which is logically possible, as well as the recited order of events.
  • CGH arrays may be custom designed to map out chromosome alterations, as noted above. As a result of chromosome alteration, there is a resultant alteration in transcripts (mRNA) produced by the alteration in genetic material on the chromosome. The altered mRNA in turn causes an abnormal or altered protein expression, relative to what would be seen in tissue having normal, non-altered chromosomes, which altered expression can lead to diseases and other abnormalities in the subject from which the tissue having altered chromosomal material originated.
  • FIG. 1A is a block diagram 100 of a cancer transcriptome model. The abnormal or altered mRNA resulting from the altered regions of the chromosomes of the cancerous tissue is at least a factor in causing the formation of a cancer tumor. The cancerous transcriptome can be characterized as a tumor gene expression profile. The tumor gene expression profile includes tumor gene expression 102 that results in altered or abnormal protein expression 104. The abnormal protein expression 104 influences, causes, enhances and/or continues tumor growth 106.
  • Because the field of proteomics is at an early stage, it is not currently sophisticated enough to analyze the protein expression 104 causally with respect to tumor growth 106. However, based on the model 100 of FIG. 1A, the present invention derives model 110 in FIG. 1B, which assumes that there is a proportional connection/relationship between tumor gene expression and tumor growth 106.
  • FIG. 2 is a flowchart 200 illustrating steps that may be performed for determining potential treatments for diseases or other maladies that may be caused by altered mRNA resulting from chromosome alteration. By designing one or more CGH arrays to effectively map the chromosomes of the tissue to be studied, gene sequences and possible variations in gene sequences can be predicted based on location analysis performed from results outputted by such CGH arrays. CGH implies what kinds of modifications, if any, have occurred in the chromosomes, such as amplifications and/or deletions. Based on these modifications, prediction of a comprehensive set of candidate mRNA sequences that may be expressed by the altered chromosome or chromosomes is made possible. For example, initially an assumption may be made that transcription initializers (e.g., promoters/primers or transcription factors that act upstream of the exons in enabling transcription) have not been altered, but have possibly been relocated. Consequently, a prediction of the transcriptions with altered sequences is made. Such altered sequences will tend to be foreign to editing processes, likely producing mangled (abnormal) RNA sequences. These abnormal sequences may modify cell processes in many ways, including abnormal translations and/or interference with other cellular processes. If transcription initializers are altered, then it is likely that the genes that originate the same are missing, or altered in some other way so as to render them dysfunctional. The custom array is designed to measure the altered transcriptome. For example, for studying a particular type of cancer, the custom array would be designed to measure the predicted cancer transcriptome of the particular cancer being studied, as well as expected mRNA sequences for “normal” tissues of the same type that have not been affected by cancer. There may be considerable variety in the alterations of the chromosomes for a particular malady being studied. For example, alterations may vary considerably in different stages of cancer, etc. Therefore, the custom array may be designed to capture as many as possible of the potential variations in mRNA sequences that may be outputted by the altered tissue, given the potential variations in alteration of the chromosomes.
  • For treatment of cancer or other disease or malady causally related to chromosome alteration, it may be useful to know not only the locations where the chromosome(s) have been altered, but also what transcripts are expressed that may be causally related to the disease. For example, certain transcripts in a cancer transcriptome may cause the production of proteins that are necessary to allow a tumor to exist and/or proliferate.
  • After designing a custom array to measure the potential mRNA sequences that may be expressed in the abnormal tissue to be studied at event 202, or, alternatively, being provided with such a custom array, each cell line of the altered or abnormal tissues is run on one of such custom arrays to measure the altered transcriptome at event 204 via the probes on the array that are designed for mutations in mRNA predicted to be present in the altered tissues, as noted. Based on the measured differential expression ratios, gene expression signatures are generated at event 206. Thus, each differential expression of each gene measured in the altered tissue, relative to the “normal” tissue is measured and plotted, to generate gene expression signatures 302 across the cell lines, as exemplified in FIG. 3. Note that for simplicity in illustration and explanation, only three such gene expression signatures 302 (i.e., 302 a, 302 b, 302 c) are shown in FIG. 3. Typically, there may be thousands, if not tens of thousands of such gene expression signatures 302 generated.
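  • The generation of gene expression signatures at event 206 may be sketched as follows. This is a minimal sketch under stated assumptions: the differential expression is taken as a log2 ratio of abnormal to “normal” intensity (the disclosure refers only to differential expression ratios), and the names `expression_signatures`, `abnormal` and `normal`, as well as the intensity values, are hypothetical.

```python
import math


def expression_signatures(abnormal, normal, cell_lines):
    """Build one signature per gene: the differential expression value of
    that gene in each cell line, listed in the order given by cell_lines.

    abnormal: {cell_line: {gene: intensity}} measured on the altered tissue
    normal:   {gene: intensity} for unaffected tissue of the same type
    The log2 ratio is an assumed choice of differential expression measure.
    """
    return {
        gene: [math.log2(abnormal[line][gene] / normal[gene]) for line in cell_lines]
        for gene in sorted(normal)
    }


# Illustrative intensities for two genes across three cell lines.
cell_lines = ["line1", "line2", "line3"]
normal = {"geneA": 100.0, "geneB": 200.0}
abnormal = {
    "line1": {"geneA": 200.0, "geneB": 200.0},
    "line2": {"geneA": 400.0, "geneB": 100.0},
    "line3": {"geneA": 100.0, "geneB": 50.0},
}

signatures = expression_signatures(abnormal, normal, cell_lines)
print(signatures)
```

  • Each list plays the role of one gene expression signature 302; in practice there would be thousands of such lists rather than two.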
  • Next, samples of each of the cell lines are provided with a treatment, and experimentation is performed to dose the treatment at a level that reduces the disease or malady by a predetermined amount at event 208. For example, for cancer lines, a dose of the treatment may be provided that reduces tumor growth by 50% in one day. Of course, this is only an example of the predetermined amount, as various other standards may be chosen. Alternatively, a predefined amount and duration of treatment may be provided, and the effect of such on each cell line may be measured. The treatment given may be a drug or compound or combinations of drugs or compounds, genetic sequences, radiation, thermal, electrical magnetic, or other forms of treatment expected to reduce the disease state, or combinations of the same. Once the desired treatment dosage has been determined, this treatment dosage is applied to the cell lines and, at about the time over which the predetermined reduction amount has been determined to occur, the treated cell lines/samples are processed, each on one of the custom arrays, for example.
  • A phenotypic response profile resultant from the treatment is then generated from measuring effects of the treatment on the cell lines (phenotypic response) in event 210. If a predetermined amount of treatment was applied to the cell lines for a predetermined time, then the amount of reduction in tumor growth is measured with respect to each cell line, and those amount values make up the response profile. The phenotypic response profile shows the signature of the variation in the phenotypic response (in this example, tumor reduction) to the treatment across the various cell lines, as the treatment impacts disease-active proteins in the cell lines samples, the levels of which disease-active proteins vary across samples. The production of the disease-active proteins is regulated by certain genes, as noted in the models of FIGS. 1A and 1B. If the treatment was provided to determine the amount of treatment that was required to reduce the tumor growth in the cell lines by a predetermined amount over a predetermined time, then the phenotypic response profile is generated by the amount of treatment required, with respect to each cell line, to reduce the tumor growth in that cell line by the predetermined amount, which varies across samples, thereby creating a meaningful phenotypic response profile for the treated samples/cell lines.
  • The response profile (treatment-impact profile) 402 is next compared with the gene expression signatures 302 of the untreated cell lines to look for expression signatures 302 that may be in synchronization or one hundred eighty degrees out of synchronization (anti-synchronization) with the treatment impact profile 402. Note that only one gene expression signature 302 is represented in plot 400 of FIG. 4 for illustration of the comparison with the response profile 402. However, all gene expression signatures 302 may be compared in this manner. For each cell line, the measured value of the effect of the treatment is plotted, and response profile 402 is generated by interconnecting the plotted data points with straight lines, as shown in FIG. 4. Synchronization depends only on the waveform of two signatures that are being compared, and not on the scale of each signature. However, covariance similarity will be larger for signatures with larger amplitudes, whereas correlation will not be. Hence, covariance-based clustering may be more meaningful.
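  • The distinction drawn above between correlation and covariance can be illustrated with a minimal sketch. The signature values and the function names `covariance` and `correlation` are hypothetical and chosen only for illustration; they are not taken from the disclosure. The sketch shows that correlation depends only on the waveform (it is about +1 for a synchronized pair and about −1 for an anti-synchronized pair, regardless of amplitude), whereas covariance grows with amplitude.

```python
from statistics import mean

def covariance(x, y):
    """Population covariance of two equal-length signatures."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)

# Hypothetical values across five cell lines (illustrative only).
gene = [1.0, 3.0, 2.0, 5.0, 4.0]
response = [2.0, 6.0, 4.0, 10.0, 8.0]    # same waveform as gene, twice the amplitude
inverted = [-v for v in response]         # anti-synchronized waveform
amplified = [10.0 * v for v in response]  # same waveform, much larger amplitude

corr_sync = correlation(gene, response)
corr_anti = correlation(gene, inverted)
corr_amplified = correlation(gene, amplified)
cov_small = covariance(gene, response)
cov_large = covariance(gene, amplified)
```

Under these assumed values, the correlation is unchanged by amplification while the covariance increases tenfold, which is the amplitude sensitivity the passage attributes to covariance-based similarity.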
  • Additionally, replicates may be run of each tissue sample (cell line) against each treatment, to perform an average response with regard to each, to reduce the noise in the phenotypic signatures. Likewise, replicate CGH arrays may be processed to reduce noise in the gene expression signatures resulting therefrom by generating an average gene expression signature for each gene from the average of the responses in the replicates. Additionally or alternatively, phenotypic signatures may be processed using “self-self” prediction techniques such as described in co-pending, commonly owned application Ser. No. 10/400,372 filed Mar. 27, 2003 and titled “Method and System for Predicting Multi-Variable Outcomes”, and in Application Ser. No. 60/368,586 filed Mar. 29, 2002 and titled “Generalized Similarity Least Squares Predictor”, both of which are incorporated, in their entireties, by reference thereto.
  • A phenotypic signature that is significantly in synchronization with or in anti-synchronization with a gene expression signature can be considered to be causally related to the gene from which the gene expression signature is derived. That is, when the treatment is applied, the resultant changes in the tissues (the phenotypic response) are proportional to the expression in that gene (from which the synchronized or anti-synchronized gene expression signature was generated) in the abnormal tissue relative to the expression of the same gene in the normal tissue, as it is hypothesized that a change (reduction or increase) in the proteins generated from expression of that gene has occurred in response to the treatment. Note that the phenotypic response, in this example, reduction of tumor growth, may be correlated with a decrease in diseased or otherwise abnormal (tissue malady) gene expression (synchronization), to cause a decrease in proteins that proliferate the disease, or an increase in gene expression (anti-synchronization) to cause an increase in proteins that inhibit tumor growth or to regulate (reduce) the production of proteins that cause the disease to proliferate. Although the phase link between the phenotypic response and gene expression signatures from diseased tissues is likely more complicated, since protein relationships are not able to be identified at this time, as noted elsewhere in this disclosure, approximations are made according to the techniques described herein based on identifying synchronized and anti-synchronized signatures, between phenotypic response signatures and gene expression signatures from tissues known to have a malady.
  • Typically, a plurality of treatments (with or without replicates) is provided to produce an equal number of phenotypic response signatures 402, to search for one or more potentially effective treatments for the abnormality being studied. To test for anti-synchronization, the phenotypic signatures are inverted.
  • FIG. 5 shows a matrix 500 in which the data points used to generate each of the gene expression signatures 302, phenotypic response signatures 402 and inverted phenotypic response signatures 402 i have been inserted as cell values of the matrix 500, with each row representing a signature, and each column representing one of the cell lines. Thus, after appropriate transformation (e.g., smoothing and adjustment for phase shifts, if known) and normalization, each treatment-induced phenotypic signature 402 and inverse signature 402 i in FIG. 5 is compared with differential-expression signatures 302, to map treatments to the transcriptome locations of the correlated genes in the abnormal tissue samples.
  • An approach to try and find relationships among data on this order of magnitude may include generating a matrix containing the data points used to generate each of the gene expression signatures 302, and another matrix containing the data points used to generate the phenotypic response signatures 402 and inverted phenotypic response signatures 402 i and producing a correlation matrix having a number of values equal to the product of the number of gene expression signatures times the sum of the number of phenotypic response signatures and inverted phenotypic response signatures. Such a correlation matrix is generated by calculating the inner product of the values of the phenotypic response signatures 402 and inverse phenotypic response signatures 402 i with the values of the gene expression signatures 302, and typically centering by means and scaling by standard deviation. The inner product may be calculated as follows:
    $$\frac{\vec{D}_j \cdot \vec{G}_k}{N} = i_{j,k} \qquad (1)$$
    where
    • $\vec{D}_j$ is the vector representing the phenotypic expression or inverted phenotypic expression across samples (cell lines) 402, 402 i, respectively, when treated with the jth treatment in the list, where j ranges from 1 to (m+m), since both signatures and inverted signatures are considered, where m is an integer representing the total number of treatments;
    • $\vec{G}_k$ is the vector representing the gene expression signature 302 for the kth gene expression signature (gene), where k ranges from 1 to N;
      • N is the number of samples, e.g., 20 cell lines; and
      • $i_{j,k}$ is the value in a correlation matrix filling the jth column and the kth row.
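  • Equation (1) can be sketched as follows, assuming each vector is first centered by its mean and scaled by its population standard deviation, as the passage describes. The function names `standardize` and `correlation_matrix` and all numeric values are hypothetical. The result is indexed as `matrix[k][j]`, i.e., kth row and jth column, and the columns cover the m phenotypic signatures followed by their m inverted counterparts.

```python
from statistics import mean, pstdev

def standardize(vector):
    """Center by the mean and scale by the population standard deviation."""
    m, s = mean(vector), pstdev(vector)
    return [(v - m) / s for v in vector]

def correlation_matrix(phenotypic, genes):
    """Return matrix[k][j] = (D_j . G_k) / N over standardized vectors.

    Columns j cover the phenotypic signatures followed by their inversions.
    """
    n_samples = len(genes[0])
    with_inverted = phenotypic + [[-v for v in d] for d in phenotypic]
    d_vectors = [standardize(d) for d in with_inverted]
    g_vectors = [standardize(g) for g in genes]
    return [
        [sum(a * b for a, b in zip(d, g)) / n_samples for d in d_vectors]
        for g in g_vectors
    ]

# Hypothetical data: 2 treatments and 2 genes measured across 5 cell lines.
phenotypic = [[2.0, 6.0, 4.0, 10.0, 8.0], [5.0, 1.0, 4.0, 2.0, 3.0]]
genes = [[1.0, 3.0, 2.0, 5.0, 4.0], [4.0, 4.0, 1.0, 2.0, 3.0]]

matrix = correlation_matrix(phenotypic, genes)
```

Each entry is a single scalar per gene-treatment pair, so the column for an inverted signature is simply the negative of the column for the corresponding original signature.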
  • Thus, the values in the correlation matrix are a cross product of normalized or standardized vectors, and may be optionally centered by subtracting a vector mean value. Once the correlation matrix is produced, then various clustering techniques may be applied to try and identify blocking patterns in the data, or “clusters” which might indicate that certain treatments are affecting certain groups of genes. Alternatively, clustering techniques may be applied to the treatment and gene expression matrices prior to performing the cross product operation, in an effort to reduce their sizes below the sizes started with.
  • The present inventor believes that important information is dropped or lost when such data is processed into a correlation matrix, such as by performing a cross product/inner product procedure to obtain the correlation matrix, as described above. When such an operation is performed, phase information is lost, as is well-known, particularly among electrical engineers. Because the phenotypic responses to the treatments at issue are measured as the output (i.e., phenotypic signature) and the inputs are the gene expression ratio values, there is a protein link between the input and output, as noted above with regard to FIGS. 1A and 1B, i.e., the gene expression instructs the generation of proteins which affect the disease state. Treatments impact disease proteins to produce the phenotypic output response. Thus, the present inventor believes that it is important to consider the phase relationships between the inputs (gene expression) and outputs (phenotypic responses), as important factors based on the protein links between the inputs and outputs.
  • Hence, the approach taken by the present invention is akin to a lumped parameters modeling approach. As noted above with regard to FIG. 1A, the genotypic signature (gene expression signature) is modeled as the input 102, the phenotypic signature is modeled as the output 106, and a “black box” 104 represents a transfer function that transforms input 102 to output 106. Note that input 102 and output 106 may be single inputs and outputs, e.g., a single gene and single drug impact or single protein production, or multiple inputs and outputs, e.g., groups of genes producing groups of proteins and/or multiple drug impacts. The present invention is adapted to further modifications within the black box as additional information becomes known about the relationships between genes and the production of proteins resultant therefrom. This will enable the input and output to be modeled, for example, like an electrical circuit, with black box 104 containing “resistances”, “inductances”, “capacitances”, “emfs”, etc., that model the best knowledge that is gained with regard to the relationships between the genes and proteins produced therefrom, which will enable the modeling of an accurate phase relationship between the gene expression signatures and the phenotypic response signatures. Such complete modeling knowledge is currently not known however, and it may be at least five years, or more, before such knowledge is obtained in sufficient detail. In the meantime, this technique assumes that an input signature is either in phase (0°, synchronized), or out of phase (i.e., 180° out of phase, anti-synchronized) with the output signature, as noted above.
  • Rather than taking a cross product and forming a correlation matrix, this methodology assembles all of the signatures, after appropriate transformation such as by use of Log transforms, into one large list of profiles or matrix of profile values 500, properly normalized, as shown in FIG. 5. Continuing with the example discussed above, the rows of matrix 500 include the gene expression signatures 302 and phenotypic response signatures 402 and inverted phenotypic response signatures 402 i.
  • Rather than clustering within a single zone (i.e., the zone containing the gene expression signatures 302 or the zone containing the phenotypic response signatures 402 and/or the inverted phenotypic response signatures 402 i), or performing a cross product correlation to combine these zones (thereby losing valuable information, as discussed above), clustering procedures are performed over the entire matrix 500 as a whole. Each of the signatures is normalized for comparison purposes. One normalization method used is Z-scoring, although other normalization methods may be substituted. Also, weighting techniques may be applied so that highly up-regulated features do not receive over-amplified attention during the clustering process. Further, noisy profiles may be weighted with relatively reduced weight.
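  • The assembly of a single matrix of Z-scored profiles may be sketched as below. This is only an illustration under stated assumptions: the function names `z_score` and `build_profile_matrix`, the row labels, and the signature values are hypothetical, and the optional weighting and Log transform steps are omitted. Each row is one signature and each column one cell line, with gene expression signatures, phenotypic response signatures and inverted phenotypic response signatures stacked together.

```python
from statistics import mean, pstdev

def z_score(signature):
    """Normalize a signature to zero mean and unit (population) standard deviation."""
    m, s = mean(signature), pstdev(signature)
    return [(v - m) / s for v in signature]

def build_profile_matrix(gene_signatures, phenotypic_signatures):
    """Stack gene, phenotypic and inverted phenotypic rows into one labeled matrix."""
    rows = []
    for name, signature in gene_signatures.items():
        rows.append((("gene", name), z_score(signature)))
    for name, signature in phenotypic_signatures.items():
        normalized = z_score(signature)
        rows.append((("treatment", name), normalized))
        # The inverted row is used to test for anti-synchronization.
        rows.append((("treatment_inverted", name), [-v for v in normalized]))
    return rows

# Hypothetical signatures across five cell lines.
gene_signatures = {"gene_1": [1.0, 3.0, 2.0, 5.0, 4.0],
                   "gene_2": [4.0, 4.0, 1.0, 2.0, 3.0]}
phenotypic_signatures = {"treatment_a": [2.0, 6.0, 4.0, 10.0, 8.0],
                         "treatment_b": [5.0, 1.0, 4.0, 2.0, 3.0]}

profile_matrix = build_profile_matrix(gene_signatures, phenotypic_signatures)
```

Keeping a label with each row allows a later clustering step to report which clusters mix gene rows with treatment rows.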
  • Clustering of the information may be performed using any cluster technique that is capable of clustering similar signatures, such as K-means or other known techniques. However, it is preferred to perform the clustering using the tools and techniques described in co-pending, commonly owned application Ser. No. 09/986,746 filed Nov. 9, 2001 and titled “System and Method for Dynamic Data Clustering” which application is incorporated herein, in its entirety, by reference thereto. Similarity may be defined by the Euclidean distance between the normalized vectors in matrix 500. These dynamic clustering techniques are scalable, and parallelizable, so that they can handle large scale problems such as those presented in the context of the present invention. The result of these clustering operations gives clusters which contain mixtures of phenotypic response signatures 402, 402 i and gene expression signatures 302. Sometimes the clustering occurs among genes in phase and sometimes with genes out of phase. Some clusters may contain only phenotypic response signatures 402, 402 i and some clusters may contain only gene expression signatures 302. The clusters of interest to the present techniques contain both types of signatures to imply gene-drug associations.
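  • As a hedged illustration of clustering over the whole matrix, the sketch below uses plain K-means with Euclidean distance, which the passage names as one usable technique. It is a stand-in and not the dynamic data clustering of application Ser. No. 09/986,746. The deterministic farthest-point initialization, the function names, the labels and the vectors (assumed already normalized) are all assumptions made for illustration. Clusters containing both gene rows and treatment rows are reported as candidate associations.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(vectors, k, iterations=20):
    """Plain K-means with farthest-point initialization (assumed, for repeatability)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def mixed_clusters(labels, assignment):
    """Keep only clusters that contain both gene rows and treatment rows."""
    clusters = {}
    for label, cluster_id in zip(labels, assignment):
        clusters.setdefault(cluster_id, []).append(label)
    return {cid: members for cid, members in clusters.items()
            if any(kind == "gene" for kind, _ in members)
            and any(kind != "gene" for kind, _ in members)}

# Hypothetical normalized rows over four cell lines.
labels = [("gene", "A"), ("gene", "B"), ("treatment", "X"),
          ("treatment_inverted", "X"), ("treatment", "Y"), ("treatment_inverted", "Y")]
vectors = [[1.0, -1.0, 1.0, -1.0], [-1.0, -1.0, 1.0, 1.0], [0.9, -1.1, 1.0, -0.8],
           [-0.9, 1.1, -1.0, 0.8], [1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]]

assignment = k_means(vectors, k=4)
associations = mixed_clusters(labels, assignment)
```

With these assumed rows, treatment X clusters in phase with gene A, and the inverted signature of treatment Y clusters with gene B, which corresponds to an anti-synchronized association.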
  • FIG. 6 illustrates plots of three treatment/phenotypic profiles 602, 604, 606 resultant in NCI Lung Cancer Cell Lines having been treated with three different treatments, that were determined, by dynamic data clustering techniques described in application Ser. No. 09/986,746, to be in synchronization with gene expression profile 608. Visual observation of these plots makes it evident that the peaks and valleys of the phenotypic profiles 602, 604, 606 generally follow the same contour evident in the gene expression profile 608.
  • FIG. 7 is a schematic representation of an ellipsoid 700 which represents the plot of a cluster of vectors from a matrix such as matrix 500 or a similar matrix assembled, when the vectors are plotted in high dimensional space. The clustering techniques described above are designed to find and identify such clusters, as well as locate the densest part of each ellipsoid cluster, shown as diagonal or “ridge” 611 in FIG. 7. Using a dynamic data clustering system as disclosed in application Ser. No. 09/986,746, the system uses force functions to converge a mathematical probe to the densest part of a profile cluster. Thus, the system not only identifies the clusters but defines a point of reference for all profiles in the cluster, with respect to each cluster identified. The distance of each profile from the center of the cluster it belongs to can be defined by any viable distance metric. An example of a distance metric used is Euclidean distance. The relative angular positions of the profiles may also be consequential for selecting combinations of effective treatments.
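  • Ranking the profiles of one cluster by their Euclidean distance from a reference point can be sketched as follows. The force-function probe of application Ser. No. 09/986,746 is not reproduced here; as a loudly labeled assumption, the densest location is approximated by the cluster medoid (the member with the smallest summed distance to all other members). The profile names and two-dimensional coordinates are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def density_center(cluster):
    """Assumed proxy for the densest location: the medoid of the cluster."""
    profiles = list(cluster.values())
    return min(profiles, key=lambda v: sum(euclidean(v, w) for w in profiles))

def rank_by_distance(cluster):
    """Return (name, distance) pairs ordered from nearest to farthest from the center."""
    center = density_center(cluster)
    distances = [(name, euclidean(v, center)) for name, v in cluster.items()]
    return sorted(distances, key=lambda pair: pair[1])

# Hypothetical cluster members (low-dimensional for readability).
cluster = {
    "gene_1": [0.2, 0.0],
    "treat_a": [0.1, 0.0],
    "treat_b": [0.0, 0.0],
    "treat_c": [0.6, 0.0],
    "treat_d": [1.0, 0.0],
}

ranking = rank_by_distance(cluster)
```

As the text notes, such a scalar distance indicates how near a profile lies to the densest location but not its angular position, so a fuller treatment would also record direction relative to the principal axis.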
  • Treatment 512 is noted as being a member of a group of similar treatments that are the closest to the densest location 610 of the ellipsoid 700. Although the distances measured do not identify angularity between profiles, or which side of the ridge 611 (dominant, principal axis running through the densest location) a particular vector (profile) lies on, the distance value does give an indication of how close to the ridge 611 and densest location 610 that vector lies. For example, the distance for treatment 512 is shown close to ridge 611 in FIG. 7. Another treatment vector 514 was measured to be further from ridge 611 as shown in FIG. 7. A third treatment vector 516 is shown at a somewhat intermediate distance from ridge 611, relative to distances for 512 and 514, and the treatment vector 518 is shown at an intermediate distance between the distances for 512 and 514.
  • One approach is to find a combination of treatments, such as those represented in FIG. 7, that show relationships to different genes in the clustered profile, so that each treatment appears to be related to strategic combinations of genes involved in the disease process. The treatments are then combined and tested together, each in a low dose/amount to observe whether a combinatorial synergistic effect on this disease is achieved by the combination. Serendipitously, the low dose/amount combination reduces the side effects of each individual treatment in the combination. Of course any combination where an adverse reaction occurs among two or more treatments in the combination used would be discarded as being unsuitable for use as a potential treatment combination. Useful combinations will specifically and effectively target the disease process being studied.
  • When testing with one or more identified treatments that provide symmetric and/or anti-symmetric phenotypic response profiles relative to one or more gene expression profiles thought to be causally connected to the abnormality being studied (e.g., tumor growth), if the results do not affect all of the genes thought to be responsible for the abnormality, then events 208 and 210 may be repeated with additional different treatments, and analysis of phenotypic responses from these additional treatments may be carried out to select alternative treatments or additional treatments to be added to the treatment regimen from which the most recent tests were conducted. Selection may be made by again selecting those treatments that produce phenotypic profiles that are in synchronization or anti-synchronization with gene expression profiles of interest (gene expression profiles of those genes thought to be affecting or affected by the abnormal condition).
  • Further, by identifying the treatment vectors as in the example shown above, various combinations of treatments in a treatment family (such as, for example, drugs in the drug family, or related compounds in a compound family) of each identified treatment vector may be tested in the manner described above, as related treatments in a treatment family (e.g., drugs in a drug family) will generally fall within similar relative distances from the densest location of the ellipsoid. In this way, the combinations of treatments may be predicted to try as potential combinations for multiple treatments in patients which will address the broadest spectrum of genes related to the production of proteins seen as elevated or inhibited when a tissue is in the disease state. By testing the predicted combinations, useful combinations for treatment in patients will be much easier to identify. The present invention is a forward-looking way of choosing and predicting specific combinations of treatments to test, e.g., using high-throughput (HTP) screening of treatment combinations, and as such, greatly reduces the time to finding successful combinations, which currently have only been discovered accidentally, through hindsight and experiences gained through individual treatments.
  • The treatments identified are targeted to the genes involved in the disease process/malady. Because of this, the chances of significant side effects are reduced. For those combinations found to be effective in the sample tissues, further testing, such as animal testing would be warranted to study any effects the treatments may have on normal tissues within an organism, since the testing with the disease tissue samples only proves that the combination of drugs applied is effective at treating the diseased tissues. For the cancer examples discussed above, testing with the tissue samples would only show that the combination of treatments effectively kills the cancer cells, or not. Animal testing would further show the effects of the treatment combination on the normal tissues in the organism, to see if the animal survives the treatment combination.
  • A technique for excluding potential treatments may also be carried out. One example of such an exclusion technique is to generate at least one phenotypic signature representing treatment-response values of each of the tissue samples exhibiting the malady, resultant from treating the tissue samples exhibiting the malady, with at least one treatment having known undesirable characteristics (e.g., is toxic to normal tissues, or ineffective, or has other undesirable side effects, etc.) for treatment of the tissues exhibiting the malady, using the techniques described above. This one or more phenotypic signature(s) are then included with all other signatures (e.g., the signatures generated in matrix 500) and then subjected to clustering as described above. Any phenotypic signature representing treatment-response values that are close to a phenotypic signature generated from a treatment known to have undesirable effects is then eliminated from candidacy for selection as a potential treatment. For example, a predefined distance may be established as a threshold, so that any phenotypic signature that is less than or equal to the predefined distance from a location of a phenotypic signature resulting from treatment with a treatment having known undesirable characteristics is eliminated.
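  • The distance-threshold form of the exclusion technique can be sketched as below. The function name `exclude_near_undesirable`, the threshold value, and the signatures (assumed already normalized) are hypothetical. A candidate is eliminated when its phenotypic signature lies at a distance less than or equal to the predefined threshold from any signature produced by a treatment with known undesirable characteristics.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exclude_near_undesirable(candidates, undesirable, threshold):
    """Keep only candidates farther than `threshold` from every undesirable signature."""
    kept = {}
    for name, signature in candidates.items():
        if all(euclidean(signature, bad) > threshold for bad in undesirable.values()):
            kept[name] = signature
    return kept

# Hypothetical normalized phenotypic signatures over three tissue samples.
candidates = {
    "treat_a": [1.0, -1.0, 0.0],
    "treat_b": [0.9, -0.9, 0.1],
    "treat_c": [-1.0, 1.0, 0.0],
}
undesirable = {"toxic_reference": [1.0, -1.0, 0.1]}

remaining = exclude_near_undesirable(candidates, undesirable, threshold=0.5)
```

In this assumed example, the two candidates whose signatures closely track the toxic reference are discarded and only the dissimilar candidate remains.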
  • Alteration of existing treatments for tissue maladies may also be performed. For example, a phenotypic signature representing the treatment-response values of each of the tissue samples exhibiting the malady, as effected by the original or existing single or combination treatment may be generated according to the techniques described above. Those tissue samples may then be treated with one or more treatments not included in the original or existing single or combination treatment and then one or more phenotypic signatures may be generated from the treatment-response values of the tissues resulting from the one or more treatments. Clustering may then be performed, as described above, based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together, including both the phenotypic signature from the original treatment and the phenotypic signature(s) from the additional treatment(s). At least one additional treatment may then be selected for incorporating with the original treatment, by identifying the treatment-response phenotypic signature(s) caused by the additional treatment(s), and which are clustered with phenotypic signatures identifying the treatment-response phenotypic signature(s) caused by the treatment or treatments in the original treatment, as well as with gene expression signatures representing differential expression levels representative of the diseased tissue samples, but separated from the phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, so as to address malady-related gene activity not currently addressed by the treatment or treatments in the original or existing treatment.
  • Protein pathways, implicated by differential gene expression levels when comparing treated tissues to non-treated tissue and among diseased and non-diseased tissues are used to produce phase relations between treatment responses and expression profiles across the tissue samples being tested. An example of such is the twenty cancer cell lines referred to above. Using the techniques described above, all phase-related profiles are normalized and clustered. The number and sizes of the resulting clusters may indicate their relative importance as to effective treatments. The structure of each cluster infers treatment-gene associations to guide multi-treatment selections.
  • FIG. 8 illustrates a typical computer system in accordance with an embodiment of the present invention. The computer system 800 may include any number of processors 802 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 806 (typically a random access memory, or RAM) and primary storage 804 (typically a read only memory, or ROM). As is well known in the art, primary storage 804 acts to transfer data and instructions uni-directionally to the CPU and primary storage 806 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable computer-readable media such as those described above. A mass storage device 808 is also coupled bi-directionally to CPU 802 and provides additional data storage capacity and may include any of the computer-readable media described above. Mass storage device 808 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk that is slower than primary storage. It will be appreciated that the information retained within the mass storage device 808, may, in appropriate cases, be incorporated in standard fashion as part of primary storage 806 as virtual memory. A specific mass storage device such as a CD-ROM 814 may also pass data uni-directionally to the CPU.
  • CPU 802 is also coupled to an interface 810 that includes one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices such as, of course, other computers. CPU 802 optionally may be coupled to a computer or telecommunications network using a network connection as shown generally at 812. With such a network connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps. The above-described devices and materials will be familiar to those of skill in the computer hardware and software arts.
  • The hardware elements described above may implement the instructions of multiple software modules for performing the operations of this invention. For example, instructions for clustering vectors may be stored on mass storage device 808 or 814 and executed on CPU 802 in conjunction with primary memory 806.
  • In addition, embodiments of the present invention further relate to computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. The media and program instructions may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM, CDRW, DVD-ROM, or DVD-RW disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
  • While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, treatment, tissue sample, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.

Claims (28)

1. A method for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady, said method comprising the steps of:
(a) measuring gene expression values of at least one sample of tissue exhibiting the tissue malady and at least one reference sample tissue that does not exhibit the malady, using at least one CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady;
(b) generating gene expression signatures from differential expression values of ratios of the measured gene expression values between the at least one sample exhibiting the malady and the at least one reference sample, across all samples, respectively;
(c) treating the at least one tissue sample exhibiting the malady with a treatment;
(d) measuring a treatment-response value with respect to each of the tissue samples treated, as effected by the treatment;
(e) generating a phenotypic signature representing the treatment-response values of each of the tissue samples treated;
(f) repeating steps (c)-(e) with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments;
(g) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and
(h) selecting treatments by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the at least one tissue sample exhibiting the malady.
2. The method of claim 1, further comprising designing the CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady; and
providing the CGH array for said measuring gene expression values.
3. The method of claim 1, wherein each said treatment is selected from the group consisting of: a drug, a combination of drugs, a compound, a combination of compounds, radiation, a genetic sequence, a combination of genetic sequences, heat, cryogenics and a combination of two or more of any of the previous members in this group.
4. The method of claim 1, further comprising the steps of:
labeling the phenotypic signatures as “in phase” signatures;
generating “out of phase” signatures by inverting the “in phase” signatures; and
including the “out of phase” signatures with the “in phase” signatures and the gene expression signatures when performing steps (g) and (h).
5. The method of claim 1, wherein said clustering operation includes finding a density center of a cluster, and calculating distances of the phenotypic signatures, belonging to the cluster, from the density center.
6. The method of claim 5, wherein the selection of treatments is made to address a broad spectrum of genes involved in the process of the malady.
7. The method of claim 6, wherein the treatments are selected by selecting treatment-response signatures within a cluster and having varying distances from the density center.
8. The method of claim 1, wherein said phenotypic signatures are normalized prior to said clustering.
9. The method of claim 4, wherein said phenotypic signatures are normalized prior to said clustering.
10. The method of claim 1, wherein said CGH array is processed on a two-color, two channel microarray apparatus to measure said gene expression values.
11. The method of claim 1, wherein said gene expression values are measured on a single channel microarray apparatus, wherein one of said CGH arrays per sample is used to process each sample exhibiting the malady and one of said CGH arrays per sample is used to process each reference sample.
12. The method of claim 1, wherein each treatment-response value comprises a concentration level or amount of the treatment used to block or retard the progress of the malady by a predetermined percentage over a predetermined period of time after treatment.
13. The method of claim 1, wherein each treatment-response value comprises a value characterizing the amount of blocking or retardation of the malady over a predetermined period of time after treatment with a fixed amount of the treatment.
14. The method of claim 1, further comprising generating at least one phenotypic signature representing treatment-response values of each of the tissue samples exhibiting the malady, resultant from treating the tissue samples exhibiting the malady, with at least one treatment having known undesirable characteristics for treatment of the tissues exhibiting the malady;
including that at least one phenotypic signature resulting from said treatment having known undesirable characteristics with all other signatures included in performing the clustering step (g); and
discarding any phenotypic signature representing treatment-response values from candidacy for the selection step (h) when the phenotypic signature is less than or equal to a predefined distance from a location of the at least one phenotypic signature resulting from treatment with a treatment having known undesirable characteristics.
15. The method of claim 14, wherein said known undesirable characteristics comprise an unacceptable level of toxicity.
16. The method of claim 14, wherein said known undesirable characteristics comprise an insufficient efficacy.
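The discarding step of claim 14 can be illustrated with a minimal sketch. It assumes signatures are plain numeric vectors and that "distance" is Euclidean distance in signature space; the claims fix neither choice, and the names `discard_near_undesirable` and `min_distance` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discard_near_undesirable(candidates, undesirable, min_distance):
    """Drop candidate phenotypic signatures that lie too close to a bad one.

    candidates   -- dict mapping treatment name to its phenotypic signature
    undesirable  -- list of signatures from treatments with known
                    undesirable characteristics (e.g. toxicity)
    min_distance -- the predefined distance; a candidate at a distance
                    less than or equal to this value is discarded
    """
    kept = {}
    for name, signature in candidates.items():
        # Keep only candidates strictly farther than min_distance from
        # every undesirable signature.
        if all(euclidean(signature, bad) > min_distance for bad in undesirable):
            kept[name] = signature
    return kept
```

A candidate lying exactly at the predefined distance is discarded, matching the "less than or equal to" wording of the claim.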
17. A method for screening a combination of treatments to select treatments for tissue exhibiting a malady, said method comprising the steps of:
(a) providing differential expression levels of tissue samples exhibiting the malady relative to at least one reference tissue sample from respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the malady;
(b) for respective differential expression levels from respective features of respective CGH arrays for each tissue sample exhibiting the malady, providing a gene expression signature representing the differential expression level for each tissue sample for gene expression levels from that feature, respectively;
(c) providing a treatment-response value, for each tissue sample exhibiting the malady having been treated with a treatment, as effected by the treatment;
(d) generating a phenotypic signature representing the treatment-response values of each of the tissue samples having been treated;
(e) repeating steps (c)-(d) with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments;
(f) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and
(g) selecting treatments by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the tissue samples exhibiting the malady.
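Steps (f) and (g) of claim 17 can be sketched as follows. The claim does not name a clustering algorithm, so this example substitutes a simple one as an assumption: signatures whose Pearson correlation meets a threshold are linked, and clusters are the resulting connected groups. Every signature is assumed to be a vector with one entry per tissue sample; the function names and the default threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cluster_together(signatures, threshold):
    """Link signatures with correlation >= threshold; return name -> cluster label."""
    names = list(signatures)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(signatures[a], signatures[b]) >= threshold:
                parent[find(a)] = find(b)
    return {name: find(name) for name in names}


def select_treatments(gene_sigs, pheno_sigs, threshold=0.9):
    """Return treatments whose phenotypic signature clusters with a gene signature."""
    # Gene expression and phenotypic signatures are clustered together.
    combined = {("gene", k): v for k, v in gene_sigs.items()}
    combined.update({("treatment", k): v for k, v in pheno_sigs.items()})
    labels = cluster_together(combined, threshold)
    gene_clusters = {labels[key] for key in combined if key[0] == "gene"}
    return sorted(k for k in pheno_sigs if labels[("treatment", k)] in gene_clusters)
```

The key point the sketch carries over from the claim is that both kinds of signature enter a single clustering run, so a treatment is selected by its co-membership with gene expression signatures rather than by a separate comparison.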
18. A method of augmenting an original or existing single treatment or treatment combination for a disease with at least one additional treatment that covers gene activity of the disease not addressed by the original or existing treatment, said method comprising the steps of:
(a) providing differential expression levels of diseased tissue samples relative to at least one reference tissue for respective features of CGH arrays designed to measure gene sequences and possible variations in gene sequences attributable to the disease;
(b) for respective features of respective CGH arrays for each diseased tissue sample, providing a gene expression signature representing the differential expression level for each tissue sample for that feature, respectively;
(c) treating the diseased tissue samples with the original or existing single treatment or combination treatment;
(d) measuring a treatment-response value with respect to each of the diseased tissue samples as effected by the original or existing single or combination treatment;
(e) generating a phenotypic signature representing the treatment-response values of each of the diseased tissue samples as effected by the original or existing single or combination treatment;
(f) treating the diseased tissue samples with a treatment that is not included in the original or existing single or combination treatment;
(g) measuring a treatment-response value with respect to each of the diseased tissue samples as effected by the treatment that is not included in the original or existing single or combination treatment;
(h) generating a phenotypic signature representing the treatment-response values of each of the diseased tissue samples as effected by the treatment that is not included in the original or existing single or combination treatment;
(i) repeating steps (f)-(h) with a different treatment that is also not included in the original or existing single or combination treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments not included in the original or existing single or combination treatment;
(j) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and
(k) selecting at least one treatment by identifying the treatment-response phenotypic signatures caused by the at least one treatment, and which are clustered with phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, as well as with gene expression signatures representing differential expression levels representative of the diseased tissue samples, but separated from the phenotypic signatures identifying the treatment-response phenotypic signatures caused by the treatment or treatments in the original treatment, so as to address disease-gene activity not currently addressed by the treatment or treatments in the original or existing treatment.
19. The method of claim 18, wherein each said treatment is selected from the group consisting of: a drug, a combination of drugs, a compound, a combination of compounds, radiation, a genetic sequence, a combination of genetic sequences, heat, cryogenics and a combination of two or more of any of the previous members in this group.
20. The method of claim 18, further comprising the steps of:
labeling the phenotypic signatures as “in phase” signatures;
generating “out of phase” signatures by inverting the “in phase” signatures; and
including the “out of phase” signatures with the “in phase” signatures and the gene expression signatures when performing steps (j) and (k).
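The "out of phase" signatures of claim 20 can be sketched as below. The claim does not define the inversion, so this example assumes it means reflecting each value about the signature's mean, which turns a positive correlation into a negative one of equal magnitude. The function names are hypothetical.

```python
def invert_signature(signature):
    """Reflect each value about the signature's mean (assumed meaning of 'inverting')."""
    mean = sum(signature) / len(signature)
    return [2 * mean - value for value in signature]


def with_out_of_phase(pheno_sigs):
    """Return in-phase and out-of-phase versions of every phenotypic signature.

    Keys are (treatment name, phase label) pairs so both versions can be
    passed to the clustering step alongside the gene expression signatures.
    """
    combined = {}
    for name, signature in pheno_sigs.items():
        combined[(name, "in phase")] = list(signature)
        combined[(name, "out of phase")] = invert_signature(signature)
    return combined
```

Including the inverted copies lets a correlation-based clustering group a treatment with genes whose expression moves opposite to the treatment response, not only with those that move with it.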
21. A system for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady, said system comprising:
means for generating a gene expression signature representing differential expression levels of each of a plurality of tissue samples exhibiting the malady, relative to at least one reference tissue from gene expression values determined from respective features of CGH arrays designed to measure gene sequences of the tissues and possible variations in gene sequences attributable to the malady;
means for measuring a treatment-response value with respect to each of the tissue samples exhibiting the malady, after treating each tissue sample exhibiting the malady with a treatment;
means for generating a phenotypic signature representing the treatment-response values of each of the tissue samples having been treated; and
means for performing a clustering operation while considering the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together.
22. The system of claim 21, further comprising at least one of said CGH arrays designed to measure gene sequences of the tissues and possible variations in gene sequences attributable to the malady.
23. The system of claim 21, further comprising microarray apparatus for processing the tissue samples exhibiting the malady and the at least one reference tissue to obtain the differential expression levels of the diseased tissues relative to the at least one reference tissue.
24. The system of claim 21, wherein multiple treatments are successively and independently applied to treat the tissues exhibiting the malady, with respective treatment-response values measured for each and a treatment-response phenotypic signature is generated for each treatment applied.
25. The system of claim 21, further comprising means for generating out-of-phase phenotypic signatures by inverting said phenotypic signatures.
26. The system of claim 25, wherein said means for clustering includes said out-of-phase phenotypic signatures with said gene expression signatures and said phenotypic signatures of the treatment-response values when performing said clustering operation.
27. The system of claim 21, further comprising means for determining a center of density of a cluster identified by said means for clustering, and means for determining a distance of a phenotypic signature found to belong to said cluster, from said center of density.
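The two determinations recited in claim 27 can be sketched as follows. The claim does not say how the "center of density" is computed, so this example uses the arithmetic centroid of the cluster members as a stand-in, with Euclidean distance; both are assumptions, as are the function names.

```python
import math


def center_of_density(members):
    """Centroid of a cluster's signatures, used here as a proxy for its center of density."""
    if not members:
        raise ValueError("cluster has no members")
    count = len(members)
    length = len(members[0])
    return [sum(sig[i] for sig in members) / count for i in range(length)]


def distance_from_center(signature, members):
    """Euclidean distance from one signature to the center of its cluster."""
    return math.dist(signature, center_of_density(members))
```

Under these assumptions, a smaller distance marks a phenotypic signature as more central to the cluster it was assigned to, which gives one way to rank the treatments found in the same cluster.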
28. A computer readable medium carrying one or more sequences of instructions for discovering a combination of treatments to reduce the progress of, or eliminate a tissue malady, wherein execution of one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
(a) measuring gene expression values of at least one sample of tissue exhibiting the tissue malady and at least one reference sample tissue that does not exhibit the malady, using at least one CGH array designed to measure gene sequences and possible variations in gene sequences attributable to the malady;
(b) generating gene expression signatures from differential expression values of ratios of the measured gene expression values between the at least one sample exhibiting the malady and the at least one reference sample, across all samples, respectively;
(c) treating the at least one tissue sample exhibiting the malady with a treatment;
(d) measuring a treatment-response value with respect to each of the tissue samples treated, as effected by the treatment;
(e) generating a phenotypic signature representing the treatment-response values of each of the tissue samples treated;
(f) repeating steps (c)-(e) with a different treatment at least once so that multiple phenotypic signatures have been generated for multiple treatments;
(g) performing a clustering operation based on the gene expression signatures of the differential expression levels and the phenotypic signatures of the treatment-response values together; and
(h) selecting treatments by identifying the treatment-response phenotypic signatures caused by those treatments, and which are clustered with gene expression signatures representing differential expression levels representative of the at least one tissue sample exhibiting the malady.
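Step (b) of claim 28 builds signatures from ratios of measured values between malady and reference samples. A minimal sketch follows, assuming one reference sample, strictly positive measurements, and a log2 transform of each ratio; the claim requires only ratios, so the logarithm and the function name are assumptions.

```python
import math


def gene_expression_signatures(diseased, reference):
    """Build one signature per array feature from diseased/reference ratios.

    diseased  -- list of samples, each a list of measured values per feature
    reference -- list of measured values per feature for the reference sample
    Returns a list with one signature per feature; each signature holds the
    log2 ratio for that feature across all diseased samples.
    """
    signatures = []
    for feature in range(len(reference)):
        signatures.append(
            [math.log2(sample[feature] / reference[feature]) for sample in diseased]
        )
    return signatures
```

Each resulting signature has one entry per tissue sample, the same length as a phenotypic signature of treatment-response values, which is what allows the two kinds of signature to be clustered together in step (g).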
US11/215,483 2003-08-13 2005-08-30 Treatment discovery based on CGH analysis Abandoned US20050282227A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/215,483 US20050282227A1 (en) 2003-08-13 2005-08-30 Treatment discovery based on CGH analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/640,081 US7348144B2 (en) 2003-08-13 2003-08-13 Methods and system for multi-drug treatment discovery
US11/215,483 US20050282227A1 (en) 2003-08-13 2005-08-30 Treatment discovery based on CGH analysis

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/640,081 Continuation-In-Part US7348144B2 (en) 2003-08-13 2003-08-13 Methods and system for multi-drug treatment discovery

Publications (1)

Publication Number Publication Date
US20050282227A1 (en) 2005-12-22

Family

ID=34136015

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/640,081 Expired - Fee Related US7348144B2 (en) 2003-08-13 2003-08-13 Methods and system for multi-drug treatment discovery
US11/215,483 Abandoned US20050282227A1 (en) 2003-08-13 2005-08-30 Treatment discovery based on CGH analysis

Country Status (3)

Country Link
US (2) US7348144B2 (en)
EP (1) EP1654686A2 (en)
WO (1) WO2005017804A2 (en)

Also Published As

Publication number Publication date
US7348144B2 (en) 2008-03-25
WO2005017804A3 (en) 2006-03-23
WO2005017804A2 (en) 2005-02-24
US20050037363A1 (en) 2005-02-17
EP1654686A2 (en) 2006-05-10

Legal Events

Date Code Title Description
AS Assignment

Owner name: AGILENT TECHNOLOGIES, INC., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MINOR, JAMES M.;WOO, WILSON;REEL/FRAME:017390/0160

Effective date: 20050824

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE