WO2009109753A2 - Multiplex selection and sequencing - Google Patents

Multiplex selection and sequencing

Publication number
WO2009109753A2
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
array
dna
molecules
capture
Application number
PCT/GB2009/000601
Other languages
French (fr)
Other versions
WO2009109753A3 (en)
Inventor
Kalim Mir
Original Assignee
Kalim Mir
Application filed by Kalim Mir
Publication of WO2009109753A2
Publication of WO2009109753A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • the invention provides technology and methods for selectively characterising specific sub-sets of molecules from mixtures of molecules. Methods of the invention are particularly relevant to reactions and analyses carried out in a multiplex format. It is particularly relevant for multiplex DNA sequencing of many selected loci/ genes and/or of many individuals.
  • the invention can be implemented for research (genetics, genomics, biology, microbiology, cell biology, stem cell science, agriculture, medicine etc.) or for forensic, diagnostic, prognostic and screening applications.
  • Biological cells and organisms contain a plurality of molecular species that determine their biology. Although a comprehensive picture of the molecular inventory of cells and organisms is an appealing concept, in many cases information on all molecular species is not necessarily needed. For example, in some cases certain sub-sets of molecular species may be prioritized as being more worthy of investigation than others. Often it is more cost and time effective to concentrate on such prioritized molecules than to obtain a comprehensive molecular view of the cell, tissue or organism. In other cases, particular genes or loci have already been identified as susceptibility loci, such as BRCA1, whose sequencing may be necessary.
  • In test samples that are amenable to analysis, such as secreted fluids or amenable cells, particular subsets of the molecular inventory may be useful as biomarkers of particular diseases under investigation.
  • the test samples may contain molecular species originating from organs or tissues that may be in early to late stages of disease progression. Detection and characterisation of the relevant sub-sets of molecules, without interference from sets of molecules which are not linked to the study, is needed.
  • SNPs: Single Nucleotide Polymorphisms
  • In studies such as those of the Wellcome Trust Case Control Consortium, the regions of association that are mapped are typically 200 kb to megabases in length and must be further characterised to determine the causative polymorphisms and/or mutations that underpin the disease.
  • In HIV infection it is desirable to monitor the sequence of variants within an infected subject's viral population, while avoiding sequencing of the patient's own genomic DNA.
  • Multiplexing, that is, the ability to process or analyse a large number of parameters in parallel, is needed.
  • a high multiplicity of molecular species can be addressed by highly parallel platforms such as microarrays or GeneChips.
  • Sanger sequencing is currently used for typing BRCA1 and BRCA2 mutations for breast cancer screening, but its cost is high and throughput low. In particular Sanger sequencing is not economically viable for mass screening.
  • One candidate technology for resequencing is Sequencing by Hybridisation (SbH) on GeneChips.
  • this technology is deficient for the task in a number of ways. Firstly, the sequence coverage of regions on any given sequence can be less than 70%, therefore a significant part of relevant sequences would be missed. Secondly, while GeneChips are able to call base substitutions such as missense and nonsense mutations or polymorphisms, they are unable to call insertions and deletions without greatly increasing the complexity of the array.
  • next generation sequencing technologies, due to their random 'shotgun' arrangement, were not developed with selective sequencing in mind. Therefore selection must be done before the sample is applied to the platform. This typically involves multi-step manipulations based around locus-specific PCR, microarray affinity selection or microarray-generated selectors used in solution.
  • the objectives of the invention are to provide technology to: Characterize one or a few genomic regions in one or a few individuals; Characterize one or a few genomic regions in a large number of individuals; Characterize a large number of genomic regions in one or a few individuals; Characterize a large number of genomic regions in a large number of individuals
  • the problem that the invention seeks to solve is how to sequence many samples together effectively and in a streamlined manner; how to selectively sequence only the desired members of a population of molecules; and how to provide a technology that is appropriate and cost-effective for both large and small-scale tasks.
  • the invention describes how to access different chosen subsets from a mixture of molecules. This includes how to separate, how to purify, how to enrich, how to detect, how to selectively process and how to characterise.
  • the characterisation in particular involves nucleic acid sequencing but may also include genotyping and enumerating.
  • the solid-phase selection and characterisation concept is relevant for biologically and medically relevant molecules and molecular complexes, including biomolecules such as carbohydrates, lipids, polypeptides, DNA, and RNA.
  • selective sequencing may involve systematically sequencing substantially all of a genome; nevertheless certain parts of the genome, such as certain repetitive elements would need to be de-selected; and other contaminating genomes such as microbial genomes that may contaminate samples may need to be de-selected.
  • the biomolecules may be from a single cell. In another embodiment the biomolecules may be from a few cells. In a further embodiment the biomolecules may be from fewer than 10,000 cells. In a still further embodiment the biomolecules may be from more than 10,000 cells.
  • targets are selectively captured by probes attached to a surface. With short oligos, e.g. 25mers, the template is progressively lost as the sequencing cycles proceed. Therefore, following sequence-specific binding, the template is fixed to the capture probe.
  • the captured molecules are fixed to the capture entity by a Flap-mediated reaction.
  • the capture molecules are highly stable, being either long or composed of more stable base-pairing nucleotides, and do not require fixing.
  • beads carrying amplified DNA fragments are selected on an array.
  • the sample DNA is fragmented, bound to a primer on a bead and emulsion PCR is undertaken.
  • the beads are released and added to a spatially addressable array of probes, under conditions that allow the probes and the sequences on the beads to form specific interactions. Unbound beads are washed away. Sequencing is then conducted on the beads that have been captured on the surface.
  • a multiplicity of capture probes targeting different species is attached to a surface.
  • a plurality of different target molecules are captured according to their individual sequence by sequence specific probes on a surface and then the captured molecules are copied by a surface attached primer.
  • the captured molecule is amplified by bridge amplification to create a DNA colony or cluster on the surface.
  • a first allele is captured by an array probe and a second allele is characterised by fluorescence labelling.
  • haplotypes are isolated by allele specific hybridisation followed by characterisation.
  • a flexible method of selective sequencing in which the number of probes on the array and the number of cycles conducted are determined by the size of the region to be sequenced. For short regions the number of cycles is small, saving on reagents, whereas the tiling of the array is more closely spaced; the cost of a closely tiled array for a short sequence will be comparable with the cost of a less closely tiled array for a long sequence. In addition to savings in cost there will be savings in time.
  • an algorithm for calculating the optimal array capture and sequencing design.
  • a system comprising an algorithm for calculating the optimal array/cycle design, a database containing DNA to be sequenced, a computer processor and memory to perform the calculation, a device that incorporates an array manufacture device and a device for carrying out sequencing by synthesis, and a control algorithm to direct operation of the device according to the optimal array/cycle design.
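The array/cycle trade-off described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each probe reads one base per sequencing cycle, that probes are tiled at a spacing equal to the read length, and that cost is linear in the number of probes and cycles. The function name `best_design` and both cost parameters are hypothetical and are not taken from the patent.

```python
import math

def best_design(region_length, cost_per_probe, cost_per_cycle, max_cycles=100):
    """Return (cycles, probes, cost) minimising an assumed linear cost model."""
    best = None
    for cycles in range(1, max_cycles + 1):
        # Assumption: tiling spacing equals read length (one base per cycle),
        # so the region needs ceil(length / cycles) probes to be covered.
        probes = math.ceil(region_length / cycles)
        cost = probes * cost_per_probe + cycles * cost_per_cycle
        if best is None or cost < best[2]:
            best = (cycles, probes, cost)
    return best

short_region = best_design(100, cost_per_probe=1.0, cost_per_cycle=10.0)
long_region = best_design(1000, cost_per_probe=1.0, cost_per_cycle=10.0)
```

Under these assumed costs the short region is served by fewer cycles and more closely spaced probes than the long region, which is the qualitative behaviour the text describes.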
  • certain sub-sets of molecules are suppressed to allow detection of targeted molecules.
  • the detection of the highly abundant globin mRNA should be suppressed in order to detect other molecular species.
  • the molecules that are bound by the array of probes are not to be characterised. Rather, the molecules that do not bind to the array are to be sequenced. Hence the de-selection process, like a filter, catches the undesired molecular species but lets through the desirable species. This process can be iterated several times. The molecules that do not bind to the array can be amplified before re-iteration of the filtration process. In a final selection step the sample is bound to a selection array and then sequenced.
  • the multiplex addressing system involves attaching a tag that carries information that is extrinsic to the molecules being tagged and does not reflect any characteristic of the molecules being tagged. This includes addition of a specific sequence tag (so that the identity of each sample can be decoded by analysing the tag sequence). It also includes the addition of a label, which can include an atom, molecule or material (such as a sub-microscopic particle). The label may have properties that can be detected and differentiated. These include electromagnetic, magnetic, fluorescent, light scattering, Raman, plasmonic, electrochemical, electrical and electronic properties.
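Decoding sample identity from a sequence tag can be sketched as follows. This is a minimal illustration that assumes the tag sits at the start of each read, that all tags have the same length, and that tags must match exactly; the function name `demultiplex` and the example tags are hypothetical.

```python
from collections import defaultdict

def demultiplex(reads, tag_to_sample):
    """Group reads by a leading sample tag and strip the tag from each read."""
    # Assumption: every tag has the same length and is read first.
    tag_len = len(next(iter(tag_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        # Reads whose tag is not recognised are kept apart, not discarded.
        sample = tag_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)

tags = {"ACGT": "sample_1", "TGCA": "sample_2"}
groups = demultiplex(["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"], tags)
```

Because the tag is extrinsic to the tagged molecule, the remaining insert sequence can be analysed without reference to how the sample was labelled.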
  • the present invention streamlines this process thus:
  • the process of the invention comprises:
  • the preferred embodiment is encapsulated in the following: A method for selectively sequencing portions of one or more complex genome sample(s) comprising the steps:
  • sequencing of the selected sample DNA may comprise but is not limited to the steps
  • each library comprises a defined base, a label exclusively identifying the defined base and a number of non-defined bases, wherein the defined base is different from any previous defined base
  • Characterization means determining one or more properties of the target molecule.
  • the property may be the sequence of bases, the number of a particular molecule or molecule type or any other feature of the target molecule.
  • the meaning of the term selection includes decomplexing, isolating, purifying, enriching, marking, tagging or labelling.
  • the selection enriches the target molecules sufficiently to make further characterization worthwhile, i.e. the majority of the molecules will be the target molecules and any non-targeted molecules will be in a minority.
  • the characterisation will be worthwhile if subsequent DNA sequencing is at least 75% accurate at the first base, and preferably at least 95-99.99% accurate. Selection is the opposite of de-selection or filtering.
  • Selectors, selection probes and capture probes are oligonucleotides that are designed to bind to specific molecules in a sample.
  • capture is used synonymously with hybridization and binding and in the case of DNA includes the case where substantial Watson-Crick base-pairing takes place.
  • the capture probe may also function as a primer.
  • genomic regions can be substituted with genes, ENCODE regions, exons, the exome, genic regions, RNA, miRNA, and other biomolecular species that can be characterized. The few or large number of genomic regions may be contiguous or non-contiguous.
  • Figure 1 Schematic illustrating the capture of a bead carrying clonal DNA (right) and the non-capture of non-selected clonal beads.
  • Figure 2 Scheme illustrating selective capture of a targeted molecule by hybridisation and washing away of non-targeted molecules. This is followed by sequencing of the selected molecule.
  • Figure 3 Schematic illustrating the process of reading two bases on array-attached oligonucleotides using the preferred sequencing chemistry.
  • Figure 4 The results of first and second base sequencing of the PhiX174 genome by the four-colour ligation-based method described in Figures 2 and 3.
  • Figure 5 Flap-mediated Ligation products run on a gel.
  • Reaction components: Oligos A, B, C; 2.5 U FideliTaq polymerase; 5 U Ampligase (Epicentre); 1x PCR buffer (10 mM MgCl2, 1 mM NAD), 50 nM KCl
  • Oligo B 5'-TAC CAT TCT GCT TTT ATT TTT TTT TTT TTT TTT-NH2
  • Incoming oligos C-Cy3, G-Cy5, A-Alexa594, T-OG489
  • washing is done at high stringency to remove any template molecules that are not covalently bound by ligation.
  • the high signal for D indicates that ligation only occurs where template has been fixed into place by the Flap reaction.
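The dye assignments listed above for the incoming oligos (C-Cy3, G-Cy5, A-Alexa594, T-OG489) suggest how a four-colour signal might be turned into a base call. The sketch below is an assumption-laden illustration, not the patent's method: it takes one dictionary of per-dye intensities per cycle, calls the brightest dye, and reports `N` when the brightest channel is not clearly above the runner-up. The function name `call_bases` and the `min_ratio` threshold are hypothetical.

```python
DYE_TO_BASE = {"Cy3": "C", "Cy5": "G", "Alexa594": "A", "OG489": "T"}

def call_bases(cycles, min_ratio=2.0):
    """Call one base per cycle from per-dye intensities; 'N' if ambiguous."""
    calls = []
    for intensities in cycles:
        ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
        (top_dye, top), (_, second) = ranked[0], ranked[1]
        # Assumed rule: accept the call only if the top channel is at least
        # min_ratio times brighter than the second-brightest channel.
        if top >= min_ratio * max(second, 1e-9):
            calls.append(DYE_TO_BASE[top_dye])
        else:
            calls.append("N")
    return "".join(calls)

example = call_bases([
    {"Cy3": 900, "Cy5": 100, "Alexa594": 80, "OG489": 50},
    {"Cy3": 100, "Cy5": 120, "Alexa594": 90, "OG489": 110},
])
```

The intensity values are invented for the example; real data would need background subtraction and cross-talk correction before a rule like this could be applied.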
  • the aim of this invention is to streamline the selective characterisation of molecules, where the preferred molecules are polynucleotides and the preferred characterisation is sequencing and/or enumeration.
  • the non-selected molecules are also characterised to the extent that they do not contain the sequences that have been selected, hence the invention comprises:
  • a method comprising the steps of i) Selection of one or more entities from a plurality of entities, ii) characterisation of selected entities and thereby the plurality of entities.
  • the selected molecules are amplified and applied back to the same array or to another array.
  • the array may be of increasing stringency or conditions may be of increasing stringency. This is repeated until a high degree of enrichment of targeted molecules is achieved.
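How quickly repeated rounds of capture and amplification enrich the target can be estimated with a short calculation. The sketch assumes that each round retains a fixed fraction of target and of non-target molecules, and that amplification between rounds preserves their ratio. The function name and the retention values used in the example are hypothetical.

```python
def target_fraction(initial_fraction, target_retention, offtarget_retention, rounds):
    """Fraction of molecules that are target after repeated capture rounds."""
    target = initial_fraction
    offtarget = 1.0 - initial_fraction
    for _ in range(rounds):
        # Each round keeps a fixed share of each class (assumed constant).
        target *= target_retention
        offtarget *= offtarget_retention
    return target / (target + offtarget)

# Hypothetical case: target starts at 0.1% of the sample, half of it is
# retained per round, and 0.5% of non-target molecules carry over.
after_one = target_fraction(0.001, 0.5, 0.005, rounds=1)
after_two = target_fraction(0.001, 0.5, 0.005, rounds=2)
```

With these invented numbers the target is still a minority after one round and a clear majority after two, which illustrates why the process is repeated until enrichment is high.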
  • increased stringency of the array can be achieved by providing a second array to which targets bind less well under the conditions of the first array. For example, if the oligonucleotide length is shortened from 60 nt to 50 nt, the more stable hybrids will be retained in preference to the less stable hybrids (including mismatches).
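The effect of shortening a probe from 60 nt to 50 nt can be illustrated with a rough melting temperature estimate. The formula below is a commonly quoted empirical approximation for oligonucleotides longer than about 13 nt; it, the 0.05 M sodium default and the example sequence are assumptions for illustration and do not come from the patent. Nearest-neighbour models would give more reliable values.

```python
import math

def approx_tm(seq, na_molar=0.05):
    """Rough melting temperature (deg C) of a perfectly matched DNA duplex."""
    n = len(seq)
    gc_percent = 100.0 * sum(seq.upper().count(base) for base in "GC") / n
    # Empirical approximation: salt term, GC-content term, length penalty.
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n

probe_60 = "ACGT" * 15          # hypothetical 60 nt probe, 50% GC
probe_50 = probe_60[:50]        # same sequence shortened to 50 nt, 50% GC
tm_drop = approx_tm(probe_60) - approx_tm(probe_50)
```

Under this estimate the shorter probe melts a little over 2 °C lower at the same GC content, so at a fixed hybridisation temperature only the more stable hybrids remain bound.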
  • surface attached probes are used to select target molecules from a complex mixture, by forming nucleic interactions (including hybridization) between probe and complementary target and thereby isolating the target molecules to spatially addressable locations on a surface.
  • the probes can be arranged as clusters of molecules of the same identity or clusters of a set of identities. Alternatively, the probes can be arranged as isolated single molecules within an array.
  • the probes are arranged as spatially addressable microarrays and are manufactured by methods known in the art, including light-directed spatial patterning (using photolabile protecting groups or protecting groups labile to photo-generated acids), ink jet synthesis, electronic control of deposition and deposition by robotic spotting of oligonucleotides.
  • the enriched molecules are captured on the array and characterised.

Accessing specific sequence by hybridisation

  • the target DNA is double stranded and measures need to be taken to allow the probe to access the sequence.
  • the target DNA can be fragmented into short pieces to facilitate hybridisation. The fragments can also be heat denatured before hybridisation, for example by heating to 95°C for 5 minutes. Extreme heat denaturation can be done by boiling the target DNA for 1, 2 or 3 minutes. After heat denaturation the sample can be used directly or snap-cooled on ice. Alkali denaturation can be carried out, followed by neutralization.
  • the addition of exonuclease, at a concentration where statistically only one exonuclease binds to the end of each molecule, can be used to generate single strands by exposure for a limited time.
  • Single-stranded binding proteins and helicases can be used to keep the DNA single stranded following denaturation.
  • the following probe structure is used.
  • a first oligonucleotide has a target sequence specific region and a common region.
  • a second oligonucleotide is complementary to the common region only. Both the first and second oligonucleotide must be attached to the surface. Alternatively, the second oligonucleotide is attached to a moiety that enables it to be attached to the first oligonucleotide. For this crosslinking can be used.
  • the first and second oligos can be incorporated within a single hairpin oligonucleotide, where the first and second oligos are separated by a loop sequence.
  • a region of the target may overlap the common region. This hangs off like a flap and can be cleaved by Flap endonucleases or Taq polymerase (which also has Flap cleavage activity). Following cleavage the template is ligated to the second oligonucleotide. Thus the target is permanently linked after capture.
  • the probes are preferably able to form strong interactions with the target molecules.
  • the strong interactions can be formed by using long oligonucleotide probes.
  • the probe is preferably 60 to 70 nucleotides in length, or longer depending on Tm. In some cases primers as long as 120 bases may be preferable.
  • Shorter probes are progressively less favourable but may still be useable and in certain contexts they may be desirable.
  • the strong interactions are needed so that selection can be done using high temperature, typically at or above 64 or 65 °C (in other instances, especially for AT-rich targets, temperatures at or above 55 °C can be used), and by using reaction buffers that enable good discrimination between match and mismatch.
  • this is to impart high stringency, so that interactions with mismatched sequences are disfavoured compared to interactions with the perfect match.
  • it may be desirable to be able to achieve discrimination of alleles for example where haplotype specific sequencing is required.
  • washes are undertaken to remove non-specifically bound molecules.
  • the washes can be at the same stringency as the hybridization or preferably at a lower stringency in order to retain as much as possible of the captured molecule, as specificity is already imparted during the high stringency hybridisation.
  • the captured targets are then ready for further characterization including sequencing. It is preferable to not allow complete drying of the slide or chip carrying the array.
  • the array may be placed in a humid environment or under low stringency buffer and low temperature. To avoid unnecessary wastage of reagents during sequencing it is preferable to check whether the target is bound to the probes. This may be done by methods such as interferometry, without the need for labeling.
  • the target molecules may carry a detectable label or may be stained with DNA stains such as SYBR Gold or SYBR Green I in the case of double-stranded DNA, and OliGreen and SYBR Green II in the case of single strands or RNA. It is preferable to begin the sequencing reaction immediately following the capture and wash reactions. The sequencing may involve less stringent conditions than the capture. It is preferable not to subject the microarray to high stringency conditions, as the template will become denatured and lost. It is desirable to fix the template in place before sequencing commences (see below); however, when the target-probe hybrid is stable enough, the template-probe complex may be considered for practical purposes as fixed.
  • the capture probe may also act as primer.
  • In order to act as a primer the capture probe should have a free 3' end in the case of extension with a DNA polymerase. In the case where extension is via ligation, the capture probe could have either a free 3' or 5' end depending on the sequencing biochemistry being used. Preferably the free end of the probe-primer will point away from, or have minimal interactions with, the surface. It is also desirable that adjacent probe-primers are not able to interact with each other, and that the probe-primer is not able to fold back on itself such that it self-primes or is unable to prime the target. An oligo separate from the capture probe may be used for priming sequencing.
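The fold-back condition mentioned above can be screened for with a simple string check. The sketch below flags a probe whose 3'-terminal bases are the reverse complement of a stretch further upstream in the same probe, so that the 3' end could anneal to it and self-prime. The stem length of 6 and the exact-match rule are assumptions, loop length is ignored, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def can_self_prime(probe, stem=6):
    """True if the 3'-terminal `stem` bases can pair with an upstream stretch."""
    probe = probe.upper()
    three_prime = probe[-stem:]
    upstream = probe[:-stem]
    # The 3' end pairs with any upstream site equal to its reverse complement.
    return reverse_complement(three_prime) in upstream

risky = can_self_prime("GGATCCAAAAAAAAGGATCC")
safe = can_self_prime("ACACACACACACACACACAC")
```

A probe flagged by such a check could be shifted along the target by a few bases or redesigned before the array is made.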
  • the primer may be an oligonucleotide that is bound at a distal site to the capture primer.
  • capture may be at low or an intermediate stringency but sufficient specificity is achieved by using higher stringency conditions for washing steps carried out after the capture (e.g. 30 °C capture, 37 °C washing).
  • the capture and sequencing can occur in a single step without washing in between. This is preferably done under conditions where hybridization is of reasonable stringency but at which the enzyme can also work effectively.
  • Thermophilic (including extremophilic) enzymes can be employed for the sequencing, allowing temperatures in excess of 55°C to be used. Also, non-thermostable enzymes can be used at temperatures in excess of those recommended.
  • thermocycling, which can be used with both thermophilic and non-thermophilic enzymes, can serve to provide a temperature difference between the capture (lower) and the extension step (higher) or capture (higher) and extension (lower) within a homogeneous reaction, i.e. without exchange of reagents.
  • promiscuous interactions will be destroyed by, for example, including an enzyme that recognizes mismatches and cleaves them.
  • the reaction would require adequate mixing to remove de-annealed mismatched oligos from the vicinity of the probe.
  • the capture conditions must be balanced (i.e. not too strong) so that the correct target sequences can be captured, including where they vary from the probe complement due to polymorphism.
  • Known polymorphisms can also be tackled by providing mixed bases at the relevant positions in the oligonucleotides so that probes for polymorphic molecules are provided.
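Providing mixed bases at known polymorphic positions is equivalent to synthesising a small set of explicit probes. The sketch below expands a probe written with standard IUPAC ambiguity codes into every concrete sequence it represents; the function name `expand_probe` and the example probe are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_probe(degenerate):
    """List every explicit sequence encoded by a degenerate probe."""
    choices = (IUPAC[base] for base in degenerate.upper())
    return ["".join(combo) for combo in product(*choices)]

variants = expand_probe("ACRT")   # R stands for A or G at a known SNP site
```

The number of explicit probes multiplies with each mixed position, so this approach is practical mainly when a probe spans only a few known polymorphisms.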
  • the capture and selection is done directly from genomic DNA or RNAs from cells without any prior selection by PCR or other sequence selection methods known in the art.
  • DNA/RNAs may be isolated from any type of cell including those from buccal swabs, blood and biopsies. It may however be desirable to fractionate different types of biomolecules, proteins from nucleic acids and RNA from DNA. It may also be desirable to select certain sub-sets of polynucleotides. For example, RNA molecules that generically contain a poly A tail may be fractionated from other RNAs; and miRNAs may be fractionated from other RNAs. It may also be desirable to de-select highly repetitive DNA from genomic DNA; this may be done by isolation of specific non-repetitive Cot fractions. Highly repetitive DNA may be used to mop up repetitive DNA during the capture process or to clean up the sample before the capture process.
  • the target can be directly extracted from one or more cells and then subjected to capture on a random array or on an ordered microarray.
  • the contents of the cell(s) may be released directly onto the capture array.
  • sequencing from as few as around 10,000 cells is useful because this is the typical amount harvested from clinical biopsies. Studying single cells is useful in developmental biology, for example, and has clinical applications in pre-implantation testing. In some cases it may be useful to sequence from a single cell or to enumerate the quantities of specific molecules. Because the amount of captured material may be small when the target molecules are released from one or a few cells, capture can be coupled with generic amplification in solution, on-chip amplification or single molecule analysis.
  • each specific molecule may be at too low a concentration to provide sufficient captured molecules for detectable sequencing. There may also be a high degree of cross-reaction during capture. To retain specificity, intermediate to high stringency conditions must be used. A certain degree of mismatching can be tolerated when sequencing is at the single molecule level. Here, during further characterization, the presence of non-targeted molecules can be accounted for due to their variant characteristics or sequence; these will be revealed after a certain amount of sequence information has been obtained.
  • mismatched capture can be minimized by selectively destabilizing the mismatch template by selective cleavage (see above).
  • a hybrid with a mismatch may be cleaved chemically or by DNA repair enzymes. After cleavage the mismatched template can be separated from the probe; this can be done at denaturing conditions that would remove the cleaved but not the intact molecule, or it can be done by using enzymes with exonuclease activity that begin processive degradation of the target strand from the site of cleavage.
  • a single capture probe molecule may lead to sequencing of one of the following cases: two homozygous sequences, two sequences which are heterozygous at one or more positions, or two sequences which are completely different or have significant stretches of difference due to a translocation or rearrangement event at the location in one of the copies.
  • where a region is segmentally duplicated to give more than two similar regions, even when the sequence covered by the probe is similar enough to retain hybridisation, the sequence to be sequenced may diverge significantly and different molecules captured within the same spot would give significantly different sequences. This can be resolved at the level of single molecule sequencing but it poses a problem for bulk sequencing. For bulk sequencing, known duplicated regions may be deselected or masked, as would certain classes of repetitive DNA. It should be noted that in some instances single molecule sequencing may be equivalent to the case where a single molecule is amplified, as is the case with DNA colonies and polonies.
  • the selection array should be able to target any sequence effectively. This can be challenging because some probes are not as effective as others. There is substantial variation in the efficiency of capture. This is largely due to differences in base composition and sequence. These differences can be ironed out by making the probes long enough that a stable hybrid can be made (e.g. 60-120mers) or by making probes isothermal by adjusting Tm with a flexible length. The differences can also be ironed out by using buffers such as tetramethylammonium chloride, betaine and CTAB.
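Making probes isothermal by varying their length can be sketched as a simple search: extend the probe one base at a time until its estimated Tm reaches a common goal. The Tm estimate is a commonly quoted empirical approximation, and it, the 0.05 M sodium default, the 70 °C goal and the function names are all assumptions for illustration; only the 60 to 120 nt length range comes from the text.

```python
import math

def approx_tm(seq, na_molar=0.05):
    """Rough melting temperature (deg C); an empirical approximation."""
    n = len(seq)
    gc_percent = 100.0 * sum(seq.upper().count(base) for base in "GC") / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n

def isothermal_probe(target, start, tm_goal, min_len=60, max_len=120):
    """Shortest probe from `start` whose estimated Tm reaches tm_goal, or None."""
    for length in range(min_len, max_len + 1):
        probe = target[start:start + length]
        if len(probe) < length:
            break  # ran off the end of the target sequence
        if approx_tm(probe) >= tm_goal:
            return probe
    return None

gc_rich = isothermal_probe("GC" * 100, 0, tm_goal=70.0)
balanced = isothermal_probe("ACGT" * 50, 0, tm_goal=70.0)
at_rich = isothermal_probe("AT" * 100, 0, tm_goal=70.0)
```

In this toy example the GC-rich target is satisfied at the minimum length, the balanced target needs a few extra bases, and the AT-only target cannot reach the goal within 120 nt, which is the situation where a lower capture temperature or an equalising buffer would be considered.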
  • each specific species that is to be captured may be at too low a concentration to provide sufficient captured molecules for detectable sequencing.
  • 0.1% or fewer of the molecules are typically captured from solution by surface attached probes (Harris et al.).
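The consequence of a 0.1% capture yield for small samples can be seen with a quick estimate. The sketch assumes two copies of each locus per diploid cell and no amplification before capture; the function name and the copy number default are assumptions for illustration.

```python
def captured_molecules(cells, copies_per_cell=2, capture_fraction=0.001):
    """Expected number of molecules of one locus captured without amplification."""
    return cells * copies_per_cell * capture_fraction

from_biopsy = captured_molecules(10_000)   # about 20 molecules per locus
from_single_cell = captured_molecules(1)   # far fewer than one molecule
```

Even a biopsy-sized sample of around 10,000 cells yields only a handful of captured molecules per locus under these assumptions, and a single cell yields essentially none, which is why capture is coupled with amplification or single molecule detection.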
  • the signal may not be observable over background signal, particularly using equipment and detection methods set up for bulk analysis. Detection sensitivity can be improved by reducing background.
  • Substrates with low intrinsic fluorescence should be used. The signal due to background scattered light can be removed, for example, by using evanescent illumination or by time-gating the detection so that short-lived signal from light scattering is not detected. Detectors with low intrinsic noise must be used.
  • the signal due to random nonspecific attachment of fluorescent material to the surface needs to be minimized; this can be done, for example, by choosing an appropriate surface chemistry (e.g. polyelectrolyte multilayers) or by treating surfaces with casein, BSA, salmon sperm DNA, and/or Denhardt's solution. Also, on a microarray, transfer of probe molecules to non-spotted inter-spot areas during slide processing needs to be minimised. Finally, if single molecule detection is used then detection can be digital, which by its nature will reduce noise, and artefactual signal can be removed.
  • it may also be useful to increase the amount of DNA in the sample in a non-locus-specific manner, so that it can be detected more easily.
  • This can be done by one of the whole genome amplification methods, notably Multiple Displacement Amplification (MDA), Rolling Circle Amplification (RCA) and OmniPlex technology.
  • the complexity and content of the sample remains largely unaltered but the amount of sample increases.
  • the sample may be generically amplified in a way that also fractionates the molecules; this can be done by amplifying DNA of a certain size range after restriction digestion.
  • An alternative approach is to clonally amplify the sample using emulsion PCR (see below). The high concentration of genomic DNA that is generated by generic amplification is needed to drive the capture reaction.
  • reagents used to carry out the sequencing reaction may be subject to signal amplification.
  • This may be direct labeling with bright labels such as Qdots, Fluospheres, or phycoerythrin and its FRET conjugates (all available from Life Technologies).
  • polylabelled dendrimers or branched structures may be directly attached.
  • amplification by antibody layering, branched DNA or dendrimers (Genisphere Inc.) can be conducted after the reaction.
  • a four-colour sequencing scheme can be implemented using Phycoerythrin and PE-FRET conjugates and Quantum Dots.
  • it is desirable to fix the template in the locality of the primer, or so that it is physically contacting the primer. This can be done, for example, either by co-immobilising primer and template at the same place on a surface so that they are able to make contact easily, or by having them become attached together. In this way, if the primer and template denature they are quickly able to re-anneal, and because they are both fixed they do not diffuse away from each other.
  • the template and primer can be linked together by crosslinking, including the use of psoralen and/or UV light irradiation. If sample molecules are tailed with a homopolymer sequence such as poly T using terminal transferase, and a homopolymer poly A sequence linked to psoralen is provided in the primer, primer and template could become attached upon UV irradiation.
  • the psoralen may also be pre-attached to the template before hybridisation to the surface oligo.
  • Other available crosslinking systems could be used in the same way.
  • both template and primer could be biotinylated at multiple sites. This can be done with the Photoprobe biotinylation kit from Vector Labs. Streptavidin is then added to link the molecules together.
  • the probes can also be linked by an intercalating, or groove binding molecule. This binder may become covalently attached to the two strands.
  • the probes may have a thymidine tail to which the target becomes crosslinked by binding of a psoralen derivative from solution (e.g. trioxsalen).
  • crosslinking or streptavidin-biotin interaction can be used in direct array based selective sequencing if the linking part is supplementary to the selection sequence and does not prevent selection.
  • the target remains double stranded and is bound to the immobilised capture probe by a RecA mediated reaction.
  • the target is partially denatured and the probe is a PNA or LNA and is able to bind by strand invasion or by outcompeting the renaturation of the native complementary strand to the sequence that is targeted.
  • a number of DNA mimics/analogues including Locked Nucleic Acid (LNA), Peptide Nucleic Acid (PNA), Ethylene Bridged Nucleic Acid (ENA), Methylene Bridged Nucleic Acid (BNA), guanidinium nucleic acid (DNG), morpholino and methylphosphonate can be used to bind double or single stranded target DNA.
  • an intercalator dye such as acridine can be used to increase stability.
  • the intercalator may cap one of the ends of the capture probe, preferably a free end, and can be added during oligo synthesis (Glen Research).
  • the partially double-stranded "sticky" primer complex comprises two oligonucleotides annealed together with one oligonucleotide overhanging the other and carrying the sequence to which the target molecule binds by complementary base-pairing. This then allows the longer oligonucleotide to prime an extension reaction with the captured target molecule as template.
  • the drawback of this approach is that it can target only the ends of molecules. Hence, substantially all the sequences of interest must be located close to the ends of fragmented DNA.
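  • As a minimal sketch of the end restriction noted above, the following Python checks whether a target fragment can pair with the overhang of a sticky primer complex, which in this simplified picture happens only when the complementary site lies at the fragment terminus. The function names, the choice of the 3' terminus and the example sequences are illustrative assumptions and are not taken from the description.

```python
# Illustrative sketch only: names and sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_at_terminus(target: str, overhang: str) -> bool:
    """True if the target's 3' terminal bases are complementary to the overhang."""
    return target.endswith(reverse_complement(overhang))


def site_is_internal(target: str, overhang: str) -> bool:
    """True if the complementary site occurs in the target but not at its 3' terminus."""
    site = reverse_complement(overhang)
    return site in target and not target.endswith(site)


overhang = "AAGGCC"                 # single-stranded part of the longer oligonucleotide
end_fragment = "ACGTACGGCCTT"       # site at the 3' terminus: can pair with the overhang
internal_fragment = "ACGGCCTTACGT"  # same site, but internal: missed by this approach
```

  Under these assumptions only `end_fragment` satisfies `site_at_terminus`, which is why the sequences of interest must lie close to the ends of the fragmented DNA.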
  • the sticky probes are not created by hybridisation of two primers, but by designing an oligonucleotide that will form a stem loop structure.
  • This stem loop can be attached to the surface by an amino group modifying one of the nucleosides in the loop.
  • the capture probe does not prime sequencing but a separate primer is provided for the sequencing reaction.
  • This may be a random primer or a primer that binds to a PBS ligated to one end of the template or a promoter.
  • the template may be adapted at one end with a stem loop capable of priming sequencing. This can be done by ligation of a priming structure, such as a self-priming stem loop, onto the end of the template. This can be done before or after capture.
  • a primase can be used to create a primer on the template.
  • where the capture primer does not initiate the sequencing, the capture can be used to facilitate the fixing of the template to the surface.
  • One preferred embodiment of the present invention, preferably using the sticky probes described above, provides the ability to ligate to a priming sequence bound anywhere along a target molecule (this is in contrast to Gunderson's approach described above, which only binds to the end of a molecule).
  • the sequences at the 5' and 3' ends overhang the duplex.
  • One overhang is the template for sequencing.
  • the other overhang prevents ligation to the sticky primer complex.
  • we can overcome this by clipping off the overhang end that is not targeted for sequencing, so that an end is generated that is able to ligate with the sticky primer complex.
  • This can be achieved by the Flap reaction, using Flap enzymes. This type of enzyme is used in the Invader genotyping assay (Third Wave Inc). Taq DNA polymerase also has a Flap activity that can be used.
  • when the sticky primer complex is added to the template in the presence of a DNA ligase and a Flap enzyme, the overhang can be clipped off and the target molecule is covalently linked to the primer complex by ligation.
  • the covalently linked target molecule can then act as a template for a template-directed extension reaction using the longer oligonucleotide of the sticky primer complex as a primer.
  • the extension can be achieved by a DNA ligase.
  • the clipping and ligation can occur in the same step as the first ligation step of the sequencing reaction. In another embodiment sequencing and flap-ligation reactions occur separately.
  • the template can be bound to the sticky primer complex in one reaction or alternatively, hybridisation can be done first followed by washing steps, before the flap/ligation reaction is implemented.
  • Selective capture on random arrays can be done in two different ways. The first is described immediately below and the second is described as part of the description of clonally amplified beads.
  • capture probes are arrayed randomly on a surface, at a low density (e.g. one molecule per 3-5 microns); primers complementary to P1 and anti-P2 are co-immobilised on the surface at high density (primers P1 and P2 are the two bridge amplification primers).
  • the target molecules are fragmented (e.g. to around 200bp) and adapted with anti-P1 and P2 primers.
  • the target molecules are hybridised to the selector probes (60mer) at high stringency (e.g. 65-75°C in Agilent Hybridisation buffer). After washing, buffer at low stringency is added (e.g.
  • first strand synthesis is carried out (e.g. using Taq DNA polymerase under standard buffer; or see Turcatti et al).
  • the duplex is then denatured using heat or chemical (e.g. Urea, Formamide buffer) denaturation and then both the original template and the newly synthesized strand are hybridised to complementary primers and second strand synthesis is carried out. This process is carried out for a number of cycles (e.g. 20-40) until substantially spatially distinct DNA colonies are formed. These colonies can then be enumerated and/or sequenced.
  • the capture probe serves as primer to make a cDNA copy.
  • an oligonucleotide primer binding site is ligated to the 3' end of the strand and after denaturation, a primer is annealed to synthesise a second strand.
  • the primer is preferably also localised on the surface.
  • One aim of this invention is to be able to selectively sequence the parts of the genome of interest without having to perform separate selection steps by locus-specific PCR or some form of laborious multi-step array-derived selection. Rather, in a preferred embodiment the selection and sequencing should occur on a single platform. In one embodiment the selection is on a microarray. In another embodiment the selection is on a random array of selection probes.
  • the embodiments of the invention related to selection and sequencing described above can be carried out by either bulk analysis or single molecule analysis.
  • Analysing single molecules allows the analysis to be digital.
  • In bulk/ensemble analysis, where measurements are analogue, there may be difficulty in picking up true signal over background when the signal is low. Where the measurements are digital, because each signal comes from an individually detectable molecule, true signal can be differentiated from background and molecules can be quantitated by recording and counting discrete signals.
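  • The contrast between analogue and digital read-out can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the per-location intensity values and the threshold are hypothetical, and real single molecule detection would involve image analysis that is not shown here.

```python
# Illustrative sketch only: the threshold and intensity values are hypothetical.

def analogue_signal(intensities):
    """Ensemble read-out: a single summed value, background included."""
    return sum(intensities)


def digital_count(intensities, threshold):
    """Single molecule read-out: count discrete events brighter than the threshold."""
    return sum(1 for value in intensities if value > threshold)


# Per-location intensities: three single molecule signals among background.
locations = [3, 2, 50, 4, 48, 3, 2, 55, 4, 3]
background_only = [3, 2, 4, 3, 2, 4, 3]

total = analogue_signal(locations)               # background contributes to this sum
molecules = digital_count(locations, threshold=20)
```

  In the analogue value the background is folded into one number, whereas the digital count reports discrete events and returns zero for `background_only`.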
  • the amount of sample and the concentration of sample material can be increased by generic amplification methods including MDA, OmniPlex and RCA as described above.
  • sample preparation methods employed in some of the next generation DNA sequencing methods, e.g. 454, ligation-based "Polony" sequencing and SOLiD sequencing, namely clonal amplification on a bead using emulsion PCR as first described by Dressman et al, can be used.
  • Clonal beads can also be generated by the non-emulsion PCR method described by Brenner et al. The digital analysis can then be carried out by counting beads.
  • the beads may be magnetic beads such as Dynal beads available from Life Technologies.
  • the beads may also be non-magnetic beads.
  • the existing surface-based bead sequencing methods attach the beads on the surface of a glass slide via a chemical interaction. This process does not involve hybridisation capture, sequence specific selection or spatial addressing. What differentiates the present invention from these methods is that in one favoured embodiment of the present invention, beads carrying the clonally amplified unselected molecules are captured by hybridisation in a sequence specific manner according to the probe sequences that are provided as selectors.
  • the capture is to spatial addresses on a surface (e.g. in a microarray format). These addresses comprise specific spots or features within a microarray.
  • the capture is to random positions on the surface (although location of molecules on the random array may be according to a specific periodicity and may be tightly packed).
  • the beads can be captured in hybridisation buffer at temperatures ranging from 25°C to 65°C depending on probe oligonucleotide length.
  • the capture also depends on bead size; beads of 20 nm diameter could be captured with no mixing. Larger beads, carrying many probes, may require mixing and relatively higher temperatures, and it may be preferable to use a lower density of molecules so that fewer interactions form.
  • the problem as bead diameter increases is that, if there is a higher number of probe molecules on the surface and the majority of these find complements on the surface, it may be difficult for a mismatched bead to dissociate from the surface to find its correct target; the binding may be cooperative and would require a high temperature for release.
  • RNA/DNA from as little as a single cell is clonally amplified.
  • the beads may be coded according to which sample they are from.
  • each bead comprises a plurality of molecules that are available both for capture and to serve as templates for extension.
  • the beads within a spot can be counted. Because the sequence of the probes at specific locations on the surface is known, some sequence information about the target DNA molecules on the bead is revealed by their spatial immobilisation alone. As well as getting sequence information from the beads isolated in the spot, the number of occurrences of beads within the spot is obtained. If the probes and experimental conditions are well designed, so that all sequences interact with relatively similar effectiveness, the number of captured molecules can be enumerated to provide an indication of the quantity of each molecule within the sample. For genomic DNA this will indicate whether there is aneuploidy, polyploidy, segmental duplication or copy number variation at a particular locus in the sample.
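  • One way the bead counts per spot might be turned into a copy number indication is sketched below. This is a hedged illustration: comparison against a reference sample, normalisation to the median ratio, the default of two reference copies and all names and counts are assumptions introduced here, not part of the description.

```python
from statistics import median

# Illustrative sketch only: the reference-sample normalisation is an assumption.


def copy_number_estimates(sample_counts, reference_counts, reference_copies=2):
    """Estimate copies per locus from bead counts per spot, relative to a reference."""
    ratios = {
        locus: sample_counts[locus] / reference_counts[locus]
        for locus in sample_counts
        if reference_counts.get(locus, 0) > 0  # skip spots with no reference beads
    }
    typical = median(ratios.values())  # assumes most loci are at the reference copy number
    return {
        locus: reference_copies * ratio / typical
        for locus, ratio in ratios.items()
    }


# Hypothetical bead counts per spot.
sample = {"locus1": 100, "locus2": 100, "locus3": 150, "locus4": 100, "locus5": 50}
reference = {locus: 100 for locus in sample}
estimates = copy_number_estimates(sample, reference)
```

  With these made-up counts, `locus3` comes out near three copies and `locus5` near one, which is the kind of signal that would suggest a duplication or a deletion.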
  • the DNA is first fragmented and then subjected to emulsion PCR. Fragmentation is done randomly. Alternatively, fragmentation is done by a restriction enzyme; in this case we would know what fragments to expect. Preferably one or more primers for amplification are attached to a bead.
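  • The point that restriction digestion yields predictable fragments can be illustrated with a small in silico digest. This is a sketch under stated assumptions: it handles a linear sequence, the top strand only and non-overlapping recognition sites; the function name and the example sequence are hypothetical. EcoRI (recognition site GAATTC, cutting after the first G) is used purely as a familiar example.

```python
# Illustrative sketch only: top strand of a linear molecule, non-overlapping sites.

def digest(sequence, site, cut_offset):
    """Split sequence at every occurrence of site, cutting cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


# Hypothetical sequence containing two EcoRI sites (G^AATTC).
example = "AAGAATTCTTGAATTCGG"
expected_fragments = digest(example, site="GAATTC", cut_offset=1)
```

  Knowing the expected fragments in advance allows capture probes to be designed against them.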
  • the capture probes may capture at different sites on the same fragments or fragments of different size can be captured on the same array. Following selective capture, sequencing can commence.
  • Each spot captures a specific targeted bead (via interaction of DNA/RNA on bead with DNA on surface) and beads that are not targeted are not captured (unless some mismatching occurs).
  • Subsequences of the sequence present on each bead may be represented on more than one spot, where each spot captures a different section of the DNA on the bead.
  • the DNA/RNA on the bead may extend to 100s of bases in length.
  • Some sequence motifs may be represented on more than one bead.
  • several beads will be captured on each spot, each of these beads carrying the sequence complementary to the probe on the array.
  • the method involves sequence-specific capture of beads carrying clonal copies of nucleic acid species onto an array followed by optional enumeration of the number of beads per array element and/or sequence determination of the molecules on the beads.
  • (i) each sample nucleic acid molecule (which may be a fragment) is bound to a primer on a bead and clonally amplified (e.g. in an individual droplet within a water in oil emulsion according to methods described in the art); (ii) the beads are released from the droplets and added to a spatially addressable array of probes, under conditions that allow the probes and the sequence on the bead to form specific interactions by which the beads are sorted to specific locations on the array; (iii) unbound beads are removed; (iv) the occurrence of particular nucleic acid sequences in the sample can be quantitated digitally by enumerating the number of beads captured per array element (by detecting beads with or without fluorescence); (v) the sequence on each bead can be determined by using AB's SOLiD, Sequencing by Ligation (SBL) chemistry or other suitable chemistry.
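  • Steps (ii) to (iv) above can be sketched as a toy sorting and counting routine. This is only an illustration: hybridisation is reduced to an exact match against the reverse complement of the probe, and the array element names, probe sequences and bead sequences are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: exact-match "hybridisation", hypothetical sequences.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def sort_beads(bead_sequences, probes):
    """Count beads captured per array element; unmatched beads are treated as washed away."""
    counts = Counter({element: 0 for element in probes})
    for bead_seq in bead_sequences:
        for element, probe in probes.items():
            if reverse_complement(probe) in bead_seq:
                counts[element] += 1
                break  # in this sketch a bead is retained at one element only
    return counts


probes = {"spot_A": "AAGGCC", "spot_B": "ACCTGA"}
beads = ["TTGGCCTTAA", "CCGGCCTTGG", "AATCAGGTCC", "AAAAAAAAAA"]
counts = sort_beads(beads, probes)
```

  The per-element counts are the digital quantitation of step (iv); the last bead matches no probe and is not counted, corresponding to step (iii).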
  • the selection can be done on a spatially addressable microarray or any other type of array.
  • the beads have sufficient number and distribution of molecules to enable a fraction of the molecules to engage in the capture process and a portion to act as templates for DNA sequencing.
  • the results of quantitation may inform subsequent steps, such as whether to sequence. This may act as a quality control step, before entering into the long-run time commitment and expense of DNA sequencing.
  • the method provides the flexibility to systematically address every detectable molecular species of a particular type (e.g. mRNA, whole genomic DNA) in the sample or to selectively target a subset (e.g. exonic genomic regions, candidate genes) per array.
  • the method enables selective sequencing without having to perform locus specific PCR, array hybridisation and elution etc. prior to entering the in vitro cloning step.
  • the method significantly streamlines the selective sequencing pipeline compared to other methods, reducing 5 steps to 3 steps. This means increase in speed and reduction in cost and also offers the possibility of easier automation of the selective sequencing pipeline.
  • the new method has the additional benefit of enabling digital quantitation of the sample.
  • the method also offers an intermediate checkpoint before sequencing commences; the enumeration of beads per array element will report on the representation of species within the nucleic acid sample. For example, if some particular nucleic acids are overrepresented and others are underrepresented or absent, this may rule out further sequencing.
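  • A checkpoint of this kind could, for instance, be expressed as a simple rule on the bead counts per array element. The sketch below is hypothetical throughout: the fold-change cut-off, the tolerated fraction of flagged elements and the function names are assumptions chosen for illustration, not values given in the description.

```python
from statistics import median

# Illustrative sketch only: all thresholds are hypothetical.


def representation_report(counts, max_fold=4.0):
    """Flag array elements that are absent, under-represented or over-represented."""
    typical = median(counts.values())
    report = {"absent": [], "under": [], "over": []}
    for element, n in counts.items():
        if n == 0:
            report["absent"].append(element)
        elif n < typical / max_fold:
            report["under"].append(element)
        elif n > typical * max_fold:
            report["over"].append(element)
    return report


def proceed_to_sequencing(counts, max_flagged_fraction=0.1):
    """Return True if few enough elements are flagged to justify a sequencing run."""
    report = representation_report(counts)
    flagged = sum(len(elements) for elements in report.values())
    return flagged / len(counts) <= max_flagged_fraction


even_sample = {"e1": 100, "e2": 110, "e3": 95, "e4": 105}
skewed_sample = {"e1": 100, "e2": 120, "e3": 90, "e4": 0, "e5": 1000}
```

  With these made-up counts the even sample passes the check and the skewed sample does not, so the expense of a long sequencing run could be avoided for the latter.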
  • each sample genomic DNA is optionally tagged so that one can be distinguished from another, and digital detection and analysis are performed.
  • the selector probes for bead capture are not arranged in a spatially addressable array but are distributed randomly on a surface. In this case there will not be an immediate opportunity after hybridisation to enumerate the number of each molecule in the sample, but it will be possible to do this after sequence is obtained.
  • probe molecules are spread randomly on a surface in such a manner that when the target molecule is immobilised it can be individually resolved. Then each probe molecule binds to a single target molecule and immobilises it onto the surface.
  • each target molecule is first clonally amplified (by for example bead-based emulsion PCR). Then the single probe molecule binds to one of the complementary molecules on the bead and immobilises it to the surface.
  • the beads must not be able to non-specifically bind to the surface; the surface may contain a repulsive coating, for example it may be negatively charged or may have a negative electrical field around it.
  • a single molecule interaction must be able to hold the bead onto the surface during capture.
  • the beads can be attached to the surface by a separate means, for example by changing the pH so that the charge on the surface of the slide is altered to enable chemical groups on the bead to bind to the surface.
  • the single molecules on the surface may comprise a dendrimer carrying probe replicates at each location; this is then able to bind the bead strongly.
  • the single molecule probe on the surface may include tandem replicates of the probe sequence created by rolling circle amplification. This can be created by providing the selection probes as circles, containing a PBS and the complement to the selector (anti-selector) sequence. A primer complementary to the PBS common to the anti-selectors then binds to the circles and rolling circle amplification is carried out.
  • the anti-selector probes can be created by methods known in the art using reagents available from Glen Research and as described by Eric Kool.
  • the method for selection comprises the following steps: i. Take unselected DNA/RNA sample ii. Do clonal amplification on beads by emulsion PCR iii. Add bead to microarray and allow sequence specific interaction of DNA/RNA on bead with DNA on spots across the array.
  • the DNA on the beads can be subjected to sequencing. After capture of the beads sequencing can be initiated by binding of a primer to the primer binding site on the clonally amplified templates.
  • the primer could be crosslinked to the template or may be created by the ligation of a stem-loop to the template on the bead.
  • ligation-based sequencing can proceed directly from the capture probes in the 3' to 5' direction.
  • the selection primer would need to have a phosphate group at its end for each cycle of sequencing by extension.
  • the 3' immobilisation will initiate sequencing away from the surface, towards the bead.
  • ligation using 5' phosphorylated oligonucleotides
  • polymerase based sequencing can be used.
  • extension will be away from the bead, towards the surface.
  • the beads are immobilised within a gel or a hydrogel; the hydrogel may have waveguiding properties which may intensify illumination and restrict it to around the beads.
  • the target molecules can be adapted with a primer binding site before or after immobilisation, and a primer separate from the capture probe is used to carry out an extension reaction for sequencing.
  • sequencing by hybridisation can be carried out by iteratively adding a complete set of oligos to the array, as described by Pihlak et al.
  • DNA on beads can be clonally amplified as DNA nanoballs (DNB) which can be produced by rolling circle amplification.
  • the DNA selected from several populations can be multiplexed and sequenced together.
  • the beads generated from one individual can be labelled with a particular colour. This may be by hybridisation to still vacant primer, hybridisation to a second (non-primer) sequence on the bead, or interaction of a label in some other way, e.g. non-specific interaction or adsorption.
  • This approach can be multiplexed in two ways: the beads from a specific preparation can be labelled with specific dyes; beads from separate preparations can be placed at separate sub-arrays separated by barriers.
  • the process starts with a single mRNA molecule in a water-oil droplet.
  • the array is made by reference to the consensus gene sequence or the known information about previously described or predicted RNA sequences.
  • the RNA is hybridised to a primer, which may bind to the poly A tail, to the region around the polyadenylation signal, to some other characteristic sequence or motif (e.g. Cap), or to an adaptor tag that has been ligated to the molecule using T4 RNA ligase.
  • the primer then synthesizes a complementary strand.
  • a second strand is then synthesized and the process repeated until many complementary copies of the single molecule have been generated and are attached to the bead.
  • the RNA can be converted to a cDNA copy before the process begins.
  • the beads are contacted with an array, preferably a spatially addressable array.
  • the interaction of the beads to the surface is a chemical one.
  • the interaction between bead DNA and the surface is by sequence specific molecular interaction, such as hybridisation or annealing.
  • the array is composed of a spatially addressable array comprising spots/features targeting different molecular species in the target mixture. The array may aim to systematically address every molecular species in the sample or it may selectively target a subset.
  • the sample can be characterised by counting the number of beads isolated at each location. If a mRNA species is in low abundance then only a few beads will be isolated within the spot. If the abundance is high then a large number will be isolated per spot. Alternatively, the systematically and selectively arranged beads can be subjected to DNA sequencing for further characterisation.
  • it will be possible to detect alternative transcripts by obtaining sequencing reads from separated sites on the captured molecule. This will be possible by, for example, punctuating extensions with labelled reagents (from which sequence is obtained) by running unlabelled extension reactions (e.g. comprising contiguous ligation of unlabelled degenerate oligonucleotides or unlabelled dNTPs or NTPs).
  • the mixture of probes is then randomly arrayed out on a surface.
  • the probes may be attached to the surface at a high density so that single molecules are not individually resolvable.
  • the probes may be attached to the surface at a low density so that single molecules are individually resolvable.
  • the probes may be arrayed in a completely random way, in that there is neither specific order in the spatial arrangement of the molecule nor order in which molecule is where on the surface.
  • the mixture of probes is arrayed in an ordered single molecule array in which each molecule is located in a non-random fashion but the position of any specific probe is not specified (is random). This array of sequence specific probes can then be used in three different ways.
  • the array is contacted with the target mixture and selected target molecules are separated from non-targeted molecules.
  • the target molecules bind to the array and the non-target molecules do not and therefore are removed during a washing process.
  • the washing can typically be done under stringency conditions which favour retention of perfect matches and remove mismatched molecules from the array.
  • the captured molecules can be removed from the array and sequenced according to any available sequencing method. Alternatively the captured molecules are sequenced directly on the array. In this case the probes need to be arrayed in a manner that each probe or the resulting probe-target complex can be resolved individually according to the method of detection.
  • the method of detection may be optical. A higher density of molecules can be used when the detection method uses scanning probe microscopy or electron microscopy.
  • the captured molecules are amplified on the array before sequencing.
  • DNA or RNA colonies can be produced which are individually resolvable. These colonies can then be counted and/or subjected to sequencing according to available methods.
  • Amplification may be by thermocycling or by isothermal reactions. Where the reaction is isothermal it may use helicases (BioHelix, USA) or flushing in of a denaturant, including formamide, urea, alkali solution, acidic solution, DMSO and other denaturants of duplexes; this is a substitute for the heating step in PCR.
  • amplification may be by bridge amplification.
  • they can be amplified by rolling circle amplification or any other amplification method that can be applied on a surface.
  • the bridge reaction can be conducted by having primers complementary to a primer binding site which has been attached to the target.
  • the capture primer itself may be bifurcating and so may have the second primer already attached.
  • certain sub-sets of molecules are suppressed to allow detection of targeted molecules.
  • the detection of the highly abundant globin mRNA should be suppressed in order to detect other molecular species.
  • the molecules that are bound by the array of probes are not to be characterised. Rather, the molecules that do not bind to the array are to be sequenced. Hence the de-selection process, like a filter, catches the undesired molecular species but lets through the desirable species. This process can be iterated several times. The molecules that do not bind to the array can be amplified before re-iteration of the filtration process. In a final selection step the sample is bound to a selection array and then sequenced.
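  • The effect of iterating the de-selection step can be illustrated with a simple depletion model. The model is an assumption introduced here for illustration: each pass is taken to remove a fixed fraction of the undesired species and none of the desired species, and the function name and example values are hypothetical.

```python
# Illustrative sketch only: a fixed capture efficiency per pass is an assumption.

def unwanted_fraction_after_filtering(unwanted_fraction, capture_efficiency, rounds):
    """Fraction of the flow-through that is still undesired after repeated filtration.

    The result is undefined when every molecule is unwanted and capture is complete.
    """
    unwanted = unwanted_fraction * (1.0 - capture_efficiency) ** rounds
    wanted = 1.0 - unwanted_fraction  # assumed to pass through the filter unchanged
    return unwanted / (unwanted + wanted)


# Hypothetical case: 90% of molecules are an abundant undesired species,
# and each pass over the de-selection array captures 90% of what remains of it.
after_one_pass = unwanted_fraction_after_filtering(0.9, 0.9, rounds=1)
after_two_passes = unwanted_fraction_after_filtering(0.9, 0.9, rounds=2)
```

  In this model each additional pass lowers the share of the undesired species in the material carried forward to the final selection array.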
  • sequencing can be conducted on the template by one of several methods.
  • One example is sequencing by synthesis using a polymerase (see Harris et al; Bentley et al; Eid et al) or a ligase (see Shendure et al).
  • Another is sequencing by hybridisation and combinatorial decoding (Walt et al).
  • sequencing by hybridisation can be conducted by iteratively hybridising a complete set of oligonucleotides to the target, as has been described by Pihlak et al.
  • sequencing is achieved by a single extension reaction from an array that tiles through every base of the sequence.
  • This tiling may be by using only reference sequence probes for capture and four coloured base specific nucleotides or oligos. Alternatively, tiling may contain four variants (A, C, G, T) for each position. Sequencing may be by enzymatic extension with the variant position at or near the point of extension.
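  • A tiling design with four variants per position can be sketched as follows. This is a minimal illustration under stated assumptions: the variant base is placed at the last base of each probe (one possible reading of "at or near the point of extension"), probes are written on the same strand as the reference for simplicity, and the function name and example sequence are hypothetical.

```python
# Illustrative sketch only: variant at the probe's last base, same strand as reference.
BASES = "ACGT"


def tiling_probes(reference, probe_length):
    """Return (position, variant_base, probe) for every tiled position and base."""
    probes = []
    for pos in range(probe_length - 1, len(reference)):
        context = reference[pos - probe_length + 1:pos]  # bases preceding the variant
        for base in BASES:
            probes.append((pos, base, context + base))
    return probes


design = tiling_probes("ACGTAC", probe_length=3)
```

  Each interrogated position contributes four probes, one of which matches the reference base; positions too close to the start of the reference to fit a full-length probe are not tiled in this sketch.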
  • Sequencing may be stepwise sequencing by synthesis where either one base is added at a time or where all four bases are added simultaneously but may carry a terminator which may need to be removed at each cycle. Sequencing may be by non-stepwise sequencing by synthesis including realtime or continuous sequencing.
  • In sequencing by synthesis, as the extension proceeds it is important to overcome attrition. This can be due to the probe-primer detaching from the surface. Steps need to be taken to form strong attachment to the surface (e.g. as described by Turcatti et al). Another reason for attrition is that the target molecule detaches from the primer. Steps need to be taken to keep the target in place. Another reason for attrition is that a molecule is no longer available for further extension.
  • the primer may become attached to the surface in an orientation which prevents it from being extended; the surface may need to be engineered to prevent this.
  • a 3D hydrogel matrix may be a preferable substrate (see Mir et al).
  • the primer may become damaged; the reaction may be supplemented with DNA repair enzymes or other relevant reagents, for example glycerol, beta-mercaptoethanol and ascorbic acid, which may counter light induced DNA breakage.
  • stepwise yield can be maintained if any cleavage steps that are used are effective.
  • the extending primer-target complex needs to be kept in an environment which is conducive to the enzyme action, avoiding "sticky" interactions with the surface or components of the reaction mix.
  • the primers should be synthesized without capping. This is to prevent n-1 products within a spot from initiating synthesis from different positions on the template. Extension may involve the provision of supplements such as single strand-binding protein, DMSO, betaine and sodium ascorbate.
  • bright labels may need to be used. This may include high absorption coefficient, high quantum yield, high photostability and good solubility; Dyes should preferably have minimal dark states. It may also include fluorescent nanoparticles such as Fluospheres, Transfluospheres and Quantum Dots (all available from Life Technologies, CA) or light scattering particles such as gold and silver particles of various dimensions.
  • sequencing is followed on an array of single molecules.
  • Single molecule analysis is particularly favourable for sequencing-by-synthesis. This is because it enables reactions with less than 100% stepwise yields to be followed without the confounding effects of dephasing.
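  • The dephasing problem that single molecule analysis avoids can be illustrated with a simple calculation for an ensemble. The model is a simplification assumed here: a molecule that fails to extend in any cycle is treated as out of phase from then on, so the in-phase fraction after n cycles is the stepwise yield raised to the power n. The function names and the 50% cut-off are hypothetical.

```python
# Illustrative sketch only: a simplified ensemble dephasing model.

def in_phase_fraction(stepwise_yield, cycles):
    """Fraction of molecules in an ensemble that are still in phase after the given cycles."""
    return stepwise_yield ** cycles


def longest_useful_read(stepwise_yield, min_in_phase=0.5):
    """Largest cycle number at which the in-phase fraction is still at least min_in_phase."""
    if not 0.0 < stepwise_yield < 1.0:
        raise ValueError("stepwise_yield must be strictly between 0 and 1")
    cycles = 0
    while in_phase_fraction(stepwise_yield, cycles + 1) >= min_in_phase:
        cycles += 1
    return cycles


fraction_99 = in_phase_fraction(0.99, 100)  # about 0.37 of molecules remain in phase
read_99 = longest_useful_read(0.99)         # 68 cycles at 99% stepwise yield
read_90 = longest_useful_read(0.90)         # 6 cycles at 90% stepwise yield
```

  When each molecule is followed individually, its own extension history is recorded, so a missed step shortens that molecule's read without blurring the signal from the others.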
  • Sequencing on single molecule microarrays could also offer the following advantages: longer read lengths, because monitoring individual molecules circumvents the problem of out-of-phase or asynchronous extension; high statistical significance of base calling, because it is based on the consensus base-call of many copies of each target region co-immobilised within a single microarray spot; detection of a rare allele in the presence of a majority allele (this is important for detection of cancer related alleles and will also enable highly accurate allele frequency determination in pooled samples); use of low amounts of sample material due to single molecule sensitivity (this is particularly important when using limited amounts of biopsy material); and the possibility of sequencing directly from genomic DNA, which will produce savings in cost and time.
  • the array is composed of spots comprising a plurality of molecules and each spot on the array may target a particular subset of target molecules, for example to capture each of the exons associated with a particular gene. Therefore, probes of different sequence will be present within the same spot.
  • each spot in the array will carry a plurality of sequences and/or sequence lengths to capture one or more selected targets. Sequencing of each of the plurality of sequences can be followed at the single molecule level within the spot. When sequencing is selective, for many applications, because the genome has been decomplexed within an array spot (i.e. a known species or a plurality of known sequences have been enriched), the read length can be very short and still be useful.
  • the oligonucleotide libraries provided in solution are coded at different positions along their length at each cycle.
  • 4 oligonucleotide libraries are provided where the nucleoside at the oligo end at the ligation junction is a defined base but subsequent bases are degenerate, being randomized and/or are universal bases.
  • the oligonucleotide can be completely removed. This can be done by making the nucleotide at the ligated oligo end an RNA base, which can be cleaved by alkali or by RNases such as a cocktail of RNases and/or RNase H.
  • the oligonucleotide that ligates from solution is removed but the primer remains attached to the template (and may have been previously fixed to the template, especially when alkali is used for removal).
  • a second base is read by adding a second set of oligonucleotides, where the second base is defined and the 1st base and all others except the second base are degenerate.
  • the RNA nucleoside is degenerate and the defined base is DNA. This process is iterated for all the positions along the sequence.
  • 10 sets of oligo libraries will need to be used.
  • Each set of oligos will comprise 4 libraries and in each library the defined base will be one base from A, C, G, or T and each specification will carry a corresponding label that can be differentiated or distinguished from the other 3 labels.
  • it may also be possible to extend the read-length further, especially if the interrogation bases furthest away from the ligation junction are cycled first, i.e. in the first cycle base 20 is defined; ligation conditions for the distal interrogation positions may need to be more stringent than those closer to the ligation junction and can be optimized by those with basic skills in the art.
  • the label is typically attached to the non-ligating end of the solution oligo and can therefore terminate extension at each ligation step.
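The library scheme described above (one set per interrogated position, four libraries per set, one label per defined base) can be enumerated as follows. This is a sketch under assumptions: positions are counted from the ligation junction, `N` stands for a degenerate or universal base, and the base-to-label pairing is borrowed from the worked examples (C-Cy5, G-Cy3, T-Oregon Green, A-Alexa594). The function names are hypothetical.

```python
BASES = "ACGT"
# Label pairing as used in the worked examples of this specification.
LABELS = {"A": "Alexa594", "C": "Cy5", "G": "Cy3", "T": "Oregon Green"}


def library_set(position, length=10):
    """Four libraries interrogating `position` (1-based, assumed to be counted
    from the ligation junction); every other position is degenerate (N)."""
    if not 1 <= position <= length:
        raise ValueError("position must lie within the oligo")
    return {
        base: "N" * (position - 1) + base + "N" * (length - position)
        for base in BASES
    }


def all_sets(length=10):
    """One set of four libraries per position along the oligo."""
    return [library_set(p, length) for p in range(1, length + 1)]


for base, pattern in library_set(2).items():
    print(pattern, LABELS[base])
```

For a 10mer this gives 10 sets of 4 libraries, i.e. 40 labelled libraries in total, matching the count given in the text.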
  • the homopolymer problem which is a feature of Roche 454 and Helicos sequencing is eliminated.
  • the addition of all four bases simultaneously ensures fewer addition and wash steps per cycle and lessens the chance of mis-incorporation of bases compared to approaches where each base is added sequentially.
  • One way to prevent the oligo probes from being degraded along with residual sample from a previous run is to use exonucleases that are specific for the probe termini attached to the surface, or the oligo chemistry may be an analogue that is resistant to degradation, such as PNA.
  • Selection microarrays for the exome or candidate genes for cancer or heart disease could be mass produced and each used many times. A large candidate gene based case-control study would require only a handful of selection microarrays.
  • the sequence can take into account the sequence of the selection probe, especially when the probe is an allele-specific probe. Preferably this will have a different weighting than sequence obtained by extension.
  • extension will sequence through the probe-primer sequence.
  • the probe-primer will not be sequenced through by primer extension but will be used for sequence reconstruction, a different confidence or quality score will be attached to sequence obtained from hybridisation than that from extension.
  • a sequencing system where instrument time can be tailored to short sequencing runs and reagents can be used in small amounts, such that small sequencing runs are as cost-effective as long sequencing runs. If not as cost effective, then at least significantly improved over what is currently available. This can be achieved by inexpensive chip substrates or substrates of which portions can be used on different occasions.
  • An instrument that does not require operations that are redundant for the small sequencing run and reagents that can be aliquoted and used in small volumes, data acquisition and processing and base calling algorithm that can operate in the context of a small sequencing run.
  • One solution to this problem is a system comprising an algorithm for calculating the optimal array/cycle design, a database containing DNA to be sequenced, a computer processor and memory to perform calculation/computation, a device that incorporates an array manufacture device and a device for carrying out sequencing by synthesis, control algorithm to direct operation of device according to the optimal array/cycle design.
  • the array may be made by in situ synthesis. Methods include light-directed synthesis and ink-jet synthesis. In the context of resequencing the process starts with information about the region or regions of the genome or other sequences such as RNA populations, that need to be sequenced. Primers are designed based on the reference sequence and knowledge known in the art about what makes a good primer. This is balanced with the number of cycles that are implemented. For a short sequence, a tiling array of primers may be designed that tile at every consecutive position. Then the sequencing process requires just one cycle. To increase confidence two or more cycles may be carried out so that redundant sequence information is obtained.
  • primers may be tiled at spacings of several bases apart (depending on the read length achievable by the sequencing process; if it is 1000 bases then the primers will target sequences 1000 bases apart in the sample). The number of sequencing cycles is then at least as many as the gap between the tiled primers, so that the intervening sequence can be obtained. Preferably the sequencing should cover or continue beyond the next primer. This is because primer binding, although it provides some sequence information, may cover up sites of polymorphism which are not discriminated by the primer due to mismatching. Indeed the conditions should be such that the primers are not stopped from working by a mismatch unless allele specificity needs to be obtained by the primer (for allele specific capture).
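The trade-off between primer spacing and the number of sequencing cycles described above can be sketched as a small design calculation. The sketch assumes that each cycle reads one base and that primers are placed at a fixed spacing from the start of the region; the names `TilingDesign` and `design_tiling` and the example values are hypothetical and do not represent the optimisation algorithm of the system itself.

```python
from dataclasses import dataclass


@dataclass
class TilingDesign:
    primer_starts: list       # 0-based start offsets of the tiled primers
    min_cycles: int           # cycles needed to reach the next primer
    read_through_cycles: int  # cycles needed to read past the next primer


def design_tiling(region_length, spacing, primer_length):
    """Tile primers every `spacing` bases across a region and report how many
    single-base cycles are needed (one base per cycle is an assumption)."""
    if region_length <= 0 or spacing <= 0 or primer_length <= 0:
        raise ValueError("all arguments must be positive")
    starts = list(range(0, region_length, spacing))
    return TilingDesign(starts, spacing, spacing + primer_length)


# Primers at every consecutive position: one cycle covers the region.
print(design_tiling(100, 1, 20).min_cycles)
# Primers every 25 bases: fewer features, more cycles.
print(design_tiling(1000, 25, 20))
```

Reading through the next primer (`read_through_cycles`) corresponds to the stated preference that sequencing continue beyond the next primer, so that polymorphisms hidden under a primer are still read by extension.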
  • the sample genomic DNA is made single stranded and then passed over the array to perform a hybridization reaction.
  • the array may be inside a microfluidic conduit. Efficient mixing methods are used to ensure fast hybridization kinetics. If necessary the sample DNA can be iteratively enriched by amplification of selected DNA and passing over the array repeatedly. Following selection the unreacted DNA is washed away with appropriate stringency washes. That the targeted regions have reacted can be tested by imaging the array by using intercalating dyes and highly sensitive CCD based imaging.
  • a system for carrying out sequencing cycles and for detecting the signal involves: (A) Software-controlled automated CCD-based imaging system; (B) Flow Cell; (C) Software-controlled automated syringe-pump based reagent exchange system; (D) Integrated microfluidic fluid exchange and CCD imaging; (E) Streaming raw data into a hard drive.
  • the use of high resolution stage movements, high numerical aperture objective lenses and EM-CCD cameras enables analysis to be done at the single molecule level. The raw data generated will be processed. As an alternative the Polonator from Dover Systems can be used to carry out the work.
  • the instrument can be adapted for imaging slides or alternatively the array can be created on compatible substrates.
  • a 4 laser microarray scanner can be used or a single or dual laser scanner with appropriate emission filters, depending on the labeling system used.
  • Software for extracting information from the images includes the tissue microarray drop-in available for MetaMorph software; GenePix or similar software can also be used.
  • Regions that remain refractory to capture will be explored by testing various high affinity probe chemistries using spotted arrays. Once the best probes have been selected, an optimised array will contain 30-60 probes per exon tiled at every 5-10 bases in the sequence and made using Febit, Roche/Nimblegen or Agilent technology.
  • microarrays enable complex mixtures (e.g. mRNA populations) to be resolved by sorting of specific sequences by hybridization to unique spatial addresses on a surface. This allows microarrays to systematically address whole chromosomes or to select multiple specific genomic regions of choice.
  • sequencing on microarrays has been limited to obtaining a single base or less of information per feature.
  • sequencing by extension or iterative hybridisation with a complete set of oligos on arrays has the potential to read 100s of bases of sequence from every array feature.
  • a microarray approach uses the reference sequence at the outset to design a target specific array. Therefore, in order to be informative, the random array approach requires 20-50 bases to anchor the sequence read to a location in the genome.
  • microarray-based resequencing starts with a known location and therefore the sequence read is informative right from the start; sequence assembly is not required, only base calling.
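The anchoring requirement for random (shotgun) reads can be related to genome size with a simple lower bound. The sketch below assumes an idealised random sequence, in which a read of length k is expected to be unique once 4^k exceeds the genome size; real genomes contain repeats and reads contain errors, which is consistent with the larger 20-50 base figure given above. The human genome size of about 3.2 billion bases is an approximation supplied here, and the function name is hypothetical.

```python
def min_unique_read_length(genome_size):
    """Smallest k for which 4**k exceeds `genome_size` (idealised lower bound
    for placing a read uniquely in a random sequence of that size)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    k = 0
    while 4 ** k <= genome_size:
        k += 1
    return k


print(min_unique_read_length(3_200_000_000))  # approximate human genome size
print(min_unique_read_length(5386))           # Phi X174 genome
```

In an array-based approach the location is fixed by the capture probe, so none of the read has to be spent on anchoring.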
  • Example 1 Capture and sequence directly from unamplified genomic DNA Fragmentation of Phi X174 single stranded DNA
  • Fifty microlitres of single stranded Phi X174 DNA (1ug/ul, New England Biolabs, Ipswich, MA, USA) was mixed with 50ul of nuclease-free water (Ambion, Austin, TX, USA) in a 1.5ml Eppendorf tube and placed on ice. The tube was raised to the sonicator probe (Digital Sonifier, Branson, Danbury, CT, USA) taking care not to touch the sides of the tube. A cycle consisted of a sonication burst for 20 seconds at 10% amplitude followed by a pause on ice for 40 seconds. This cycle was carried out a total of 7 times.
  • a 2% agarose gel analysis of the sample against suitable fragment size markers was performed to check that the average fragment size was around 300 bases (range 100-600 bases). If necessary, further cycles would be carried out until the gel analysis confirmed that the average fragment size was close to the expected value. This protocol can be applied to double stranded template as well.
  • an 80ul hybridisation mix containing the following was prepared: 40ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 500ng (1ul at 500ng/ul) sonicated single stranded Phi X174 genome (New England Biolabs, Ipswich, MA, USA) or beta-globin DNA and 39ul nuclease free water (Ambion, Austin, TX, USA).
  • Genomic DNA (32 µl of 100 ng/µl in 10 mM Tris-HCl/0.1 mM EDTA, pH 7.5) is then denatured at 98°C for 3 min, followed by 4°C for 10 min, and suspended in 8 µl of 20x SSPE.
  • Hybridisation is carried out at 30°C for 3 hours before the slide is washed twice at 37°C for 10 min in 2x saline phosphate/EDTA (SSPE) (3M NaCl/20mM Na2HPO4/20mM EDTA). 1st Ligation cycle on arrays
  • For the first cycle of ligation a set of 10mer degenerate oligos with only one base defined in position 10 was used. This base was a cleavable RNA base.
  • the mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top.
  • the Agilent hybridization chamber was assembled and placed at 46°C for 1 h in an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature.
  • the slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired.
  • the scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 43; OG 60, Cy3 52, Alexa 50.
  • the ligation products were cleaved by incubating the array with an 80ul mix containing 68ul 1x TED buffer (10mM Tris pH 7.0, 5mM EDTA, 2mM DTT) and 12ul (12U) RiboShredder RNAse blend (Epicentre, Madison, WI, USA).
  • the mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top.
  • the Agilent hybridization chamber was assembled and placed at 37°C for 1 h in an oven incubator fitted with an oscillating table. The slide was washed using the same reagents and wash protocol described above.
  • the Agilent hybridization chamber was assembled as per standard instructions and placed at 65°C for 24 hours inside an Agilent oven incubator fitted with a rotating rack (rotisserie style). At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, NJ, USA).
  • a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
  • a 35ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 3.5ul (1400U 5600 Cohesive End U) T4 DNA ligase (NEB), 3.5ul (35U) T4 PNK kinase (NEB), 200pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Gottingen, Germany) and G-Cy3 (IBA), 800 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 35ul.
  • the mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top.
  • the Agilent hybridization chamber was assembled and placed at 46°C for 1h in an Agilent oven incubator fitted with a rotating rack. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature.
  • the slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired.
  • the scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 56; OG 67, Cy3 55, Alexa 50.
  • the mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top.
  • the Agilent hybridization chamber was assembled as per standard instructions and placed at 65°C for 24 hours inside an Agilent oven incubator fitted with a rotating rack (rotisserie style). At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, NJ, USA).
  • a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
  • a 40ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 4ul (1600U 6400 Cohesive End U) T4 DNA ligase (NEB), 4ul (40U) T4 PNK kinase (NEB), 400pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Gottingen, Germany) and G-Cy3 (IBA), 1600 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 40ul.
  • the mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top.
  • the Agilent hybridization chamber was assembled and placed at 46°C for 1h in an Agilent oven incubator fitted with a rotating rack. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature.
  • the slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired.
  • the scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 56; OG 67, Cy3 55, Alexa 50.
  • The dyes used above can be replaced with Atto dyes.
  • Atto 647N is a good substitute for Cy5 and Atto 488 is a good substitute for Oregon Green.
  • 8mers and 9mers can be used instead of 10mers.
  • suitable combinations of wavelengths from a Qdot sampler kit can be used (QD525, QD565, QD585, QD605, QD655, QD705; all at 1uM, all streptavidin conjugates).
  • Streptavidin-Phycoerythrin conjugate (SAPE) and SAPE-Alexa conjugates (Invitrogen) can be used: streptavidin, Alexa Fluor® 647-R-phycoerythrin conjugate; streptavidin, Alexa Fluor® 610-R-phycoerythrin conjugate; streptavidin, Alexa Fluor® 680-R-phycoerythrin conjugate; streptavidin, R-phycoerythrin conjugate (SAPE). These are not only brighter but also only require a single wavelength to be used for excitation.
  • As these labels are available as streptavidin conjugates, they can be linked to biotinylated oligos (where biotin has replaced the label in the above table) with a different label used for each library. Fluospheres (Invitrogen) can also be used. The above types of labels can be mixed and matched.
  • Fragmentation of Betaglobin PCR product
  • the 400 bp PCR product of a betaglobin exon (BGE3) was fragmented by DNase I digestion.
  • the 90ul reaction mix comprised the following: 17.4ug of PCR product BGE3 (36ul at 438ng/ul) was mixed with 9.5ul 5mM CaCl2 (Fluka, Gillingham, UK), 1ul of diluted DNase I enzyme (0.025U/ul New England Biolabs, Ipswich, MA, USA) and 43.5ul nuclease-free water (Ambion, Austin, TX, USA). The mix was incubated at room temperature for 40 minutes.
  • Hybridisation and sequencing were done as for Example 1. A hybridisation temperature of 55°C can be used. Protocol for oligonucleotide library containing RNA in position 9
  • An 80ul hybridisation mix containing the following was prepared: 40ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 568ng (2ul at 284ng/ul) fragmented betaglobin PCR product, and 38ul nuclease-free water (Ambion, Austin, TX, USA).
  • the mix was denatured at 95°C for 7 minutes. After a pulse centrifugation the mix was placed on ice.
  • the Agilent hybridization chamber and slide gasket had been pre-warmed to 55°C.
  • the hybridization mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top.
  • the Agilent hybridization chamber was assembled as per standard instructions and placed at 55°C for 22 hours inside an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, N.J., USA).
  • a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
  • An 80ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 8ul (3200U 12800 Cohesive End U) T4 DNA ligase (NEB), 8ul (80U) T4 PNK kinase (NEB), 400pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Gottingen, Germany) and G-Cy3 (IBA), 1600 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 80ul.
  • the mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top.
  • the Agilent hybridization chamber was assembled and placed at 46°C for 1h in an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature.
  • the slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired.
  • the scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 58; OG 71, Cy3 65, Alexa 56.
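The three ligation mixes described above (35ul, 40ul and 80ul) use the same enzyme proportions: T4 DNA ligase and T4 PNK each make up one tenth of the final volume, which corresponds to 400 U/ul and 10 U/ul respectively. The sketch below scales the enzyme additions to a chosen final volume. Only the enzymes are scaled, because the oligonucleotide amounts quoted above do not follow the volume linearly (400 and 1600 pmoles are used in both the 40ul and the 80ul mix). The function name is hypothetical.

```python
LIGASE_UNITS_PER_UL = 400  # derived from 1400 U in 3.5 ul (35 ul mix)
PNK_UNITS_PER_UL = 10      # derived from 35 U in 3.5 ul (35 ul mix)


def enzyme_additions(final_volume_ul):
    """Ligase and PNK volumes and units for a ligation mix of the given size,
    keeping each enzyme at one tenth of the final volume as in the examples."""
    if final_volume_ul <= 0:
        raise ValueError("final volume must be positive")
    enzyme_ul = final_volume_ul / 10
    return {
        "ligase_ul": enzyme_ul,
        "ligase_units": enzyme_ul * LIGASE_UNITS_PER_UL,
        "pnk_ul": enzyme_ul,
        "pnk_units": enzyme_ul * PNK_UNITS_PER_UL,
    }


for volume in (35, 40, 80):
    print(volume, enzyme_additions(volume))
```

The remaining volume is made up with 1x ligase buffer components, labelled oligonucleotides and nuclease-free water as stated in each protocol.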
  • Hybridisation is undertaken for 66h at 65°C in 2x hybridization buffer (10x SSPE, 10x Denhardt's, 10mM EDTA, 0.2% SDS). Wash once with 1M NaCl, 10mM Tris-Cl pH 7.5 and 1mM EDTA. Wash twice at 20°C for 15 minutes with 1x SSC/0.1% SDS
  • Example 6 Whole genome amplification and capture
  • 1-10ng of genomic DNA is taken and MDA is performed according to the manufacturer's protocol to produce either 10ug (Qiagen; REPLI-g™ Mini kit) or 40ug (Qiagen; REPLI-g™ Midi kit) of genomic DNA.
  • the capture reaction is then performed as described in Examples 1, 2 and 3. Extra measures for background reduction as described in the specification and bright labels and/or signal amplification can be used.
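The amplification achieved by MDA in this example follows directly from the input and yield figures quoted above. The short calculation below is a worked illustration; the function name is hypothetical and the figures are those stated in Example 6.

```python
def fold_amplification(input_ng, yield_ug):
    """Fold increase in DNA mass, converting the yield from ug to ng."""
    if input_ng <= 0:
        raise ValueError("input mass must be positive")
    return yield_ug * 1000 / input_ng


# Inputs of 1 ng and 10 ng against the 10 ug and 40 ug kit yields.
for input_ng in (1, 10):
    for yield_ug in (10, 40):
        print(input_ng, "ng ->", yield_ug, "ug:",
              fold_amplification(input_ng, yield_ug), "fold")
```

The quoted figures therefore correspond to an amplification of between 1000-fold (10ng to 10ug) and 40000-fold (1ng to 40ug).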
  • Example 7 Extracting DNA from a single cell and capture after Multiple Displacement Amplification
  • DNA from sperm is extracted and two rounds of MDA are performed before sequencing
  • Example 8 Extracting DNA from a single cell and direct analysis by capture
  • RNA is directly extracted onto a nano-array of capture probes containing probes complementary to the PolyA sequence and adjacent variable sequence. Also DNA or RNA is selectively extracted directly onto a surface and becomes attached to the surface in a random manner
  • Each 30 µl extraction contained 300-500 ng genomic DNA, 1x H-Buffer, which contains a polymerase, dNTPs and biotinylated dNTPs (Qiagen, Cat. # 4340004, 2x initial concentration) and DNase-free water. The DNA is denatured at 95°C for 15 min, followed by a 20 min incubation at 64°C during which the allele-specific oligos anneal and are extended, incorporating dNTPs.
  • the copy can then be sequenced. Or a different primer can be used to sequence the captured template.
  • Sample DNA (optionally amplified, fragmented genomic DNA; up to 4ug) is added to 45 mM Tris-HCl at pH 8.8, 11 mM ammonium sulphate, 4.5 mM MgCl2, 6.7 mM 2-mercaptoethanol, 4.4 µM EDTA, plus 2 µg/mL single-stranded high-molecular-weight herring sperm DNA and 1.5 µM helper oligo in a volume of 20ul. Up to 16ug of DNA can be processed in an 80ul hybridization volume. The hybridization solution is denatured at 96°C (preferably directly on the array) for 75 seconds to 150 seconds.
  • a thermocycler capable of taking slides, such as the G-Storm, is used and the temperature is stepped down from 75 to 66 to 65°C over 3 minutes and held at 65°C for a minimum of 2 minutes. Alternatively, after step down the temperature can be held for longer.
  • Array is washed at room temperature with 50 µL 45 mM Tris-HCl at pH 8.8, 11 mM ammonium sulphate, 4.5 mM MgCl2, 6.7 mM 2-mercaptoethanol, 4.4 µM EDTA, containing 10 µg/mL BSA (Ultra Pure, Ambion), and then with the same composition solution for 1 min at 65°C.
  • the helper oligonucleotide targets the opposite strand of the duplex to the one targeted by the array probe
  • the helper oligonucleotide may be longer than the array probe or may form more stable base-pairs, such as C-5 propyne modified DNA, LNA or PNA.
  • the helper probe may initiate a solution based primer extension reaction.
  • the helper oligo may span the region of interest with the array oligo lying at the centre. Preferably any unreacted reactor oligos are removed before hybridisation. This can be done by passing the DNA through a chromospin or other column with appropriate cut-off.
  • the helper oligo may not be complementary to the array oligo but may be a probe complementary to the non-targeted allele.
  • Example 11 Clonal Bead Amplification and Capture Clonal bead amplification of human genomic DNA is carried out as described in Porreca GJ, Shendure J, Church GM. Polony DNA sequencing. Curr Protoc Mol Biol. 2006 Nov; Unit 7.8.
  • Basic protocol 1 and 2 described in this reference are carried out.
  • Basic Protocol 3 is optional.
  • the recovered beads are then resuspended in hybridization buffer and hybridized to the array.
  • the beads and/or the array are blocked before hybridisation.
  • the blocking of the bead can be done with biocytin, BSA, casein or Denhardt's solution. Reagent kits are available from Dover Systems (Salem, New Hampshire).
  • the magnetic bead can be subjected to switching magnetic field to facilitate mixing and release of beads bound to the wrong surface attached-probes.
  • Aminated capture and amplification oligonucleotides are coated onto a BTA coated glass slide (see Turcatti et al).
  • the capture reaction is carried out as described in Examples 1, 2 and 3.
  • the stringency of the wash required can be established empirically by testing increasing temperatures of wash components. A wash temperature range between 4 and 90 °C can be investigated. Colony amplification is then carried out according to Turcatti et al as above.

Abstract

The present invention relates to methods and platforms for the selection and characterisation of molecules. The characterisation may involve sequencing constituent sub-units of a polymer.

Description

Multiplex Selection and Sequencing
Field of the Invention
The invention provides technology and methods for selectively characterising specific sub-sets of molecules from mixtures of molecules. Methods of the invention are particularly relevant to reactions and analyses carried out in a multiplex format. It is particularly relevant for multiplex DNA sequencing of many selected loci/genes and/or of many individuals. The invention can be implemented for research (genetics, genomics, biology, microbiology, cell biology, stem cell science, agriculture, medicine etc) or forensics, diagnostic, prognostic and screening applications.
Background
Biological cells and organisms contain a plurality of molecular species that determine their biology. Although a comprehensive picture of the molecular inventory of cells and organisms is an appealing concept, in many cases information on all molecular species is not necessarily needed. For example, in some cases certain sub-sets of molecular species may be prioritized as being more worthy of investigation than others. Often it is more cost and time effective to concentrate on such prioritized molecules than to obtain a comprehensive molecular view of the cell, tissue or organism. In other cases, particular genes or loci have already been identified as susceptibility loci, such as BRCA1, whose sequencing may be necessary. In test samples that are amenable for analysis such as secreted fluids or amenable cells particular subsets of the molecular inventory may be useful as biomarkers of particular diseases under investigation. The test samples may contain molecular species originating from organs or tissues that may be in early to late stages of disease progression. The detection and characterisation of relevant sub-sets of molecules without interference from sets of molecules which are not linked to the study are needed. Furthermore, in order to understand the biology or pathology of a set of cells it may be useful to compare two or more cells, tissues or organisms. For example in the case of humans, in order to find molecular species that are associated with a particular trait, condition, phenotype or state it may be useful to compare molecules from one or more individuals that carry the trait with one or more that do not. Such comparisons of Single Nucleotide Polymorphisms (SNPs) when done at a large-scale have revealed genomic regions associated with a number of diseases (Wellcome Trust Case Control Consortium).
The regions of association that are mapped in these studies are typically 200Kb to megabases in length and must be further characterised to determine the causative polymorphisms and/or mutations that underpin the disease. In certain disease contexts it is desirable to sequence particular single or multiple regions of the genome in order to diagnose a genetic disorder, classify its sub-type or to monitor its progression. In the case of HIV infection it is desirable to monitor the sequence of variants within an infected subject's viral population, avoiding sequencing of the patient's own genomic DNA.
Because of the need for mass screening of selected regions of the genome, a technology that can be adapted to high throughput is urgently needed. This can be achieved by streamlining of the sequencing pipeline and multiplexing of the reactions, as will be described. The new technology needs to be highly sensitive, both in terms of being able to detect different types of mutations including deletions, insertions, inversions or substitutions and being able to work with small amounts of sample material or target molecules comprising only a tiny fraction of the total material. Furthermore, in order for mass screening technology to be adopted by national health services in for example the EU countries, it needs to be particularly cost effective and fast.
Multiplexing, that is the ability to process or analyse a large number of parameters in parallel, is needed. A high multiplicity of molecular species can be addressed by highly parallel platforms such as the microarray or Genechips.
Sanger sequencing is currently used for typing BRCA1 and BRCA2 mutations for breast cancer screening, but its cost is high and throughput low. In particular Sanger sequencing is not economically viable for mass screening. Despite continuing efforts towards miniaturization (3,4), the electrophoresis-based implementation of the Sanger method cannot compete with the parallelism of surface-based platforms. One candidate technology for resequencing is Sequencing by Hybridisation (SbH) on GeneChips. However, this technology is deficient for the task in a number of ways. Firstly, the sequence coverage of regions on any given sequence can be less than 70%, therefore a significant part of relevant sequences would be missed. Secondly, while GeneChips are able to call base substitutions such as missense and nonsense mutations or polymorphisms, they are unable to call insertions and deletions without greatly increasing the complexity of the array.
Recently, a number of "2nd generation" sequencing technologies have emerged (Shendure, J. et al.; Margulies, M. et al) and have been brought to the market by companies such as Roche, Applied Biosystems (AB) and Illumina. See articles by Metzker, Voelkerding or Bayley for review. These technologies offer faster, potentially cheaper and smaller-sized sequencing formats. For these new methods the sequencing pipeline comprises new "in vitro" forms of cloning followed by cyclical sequencing reactions. While these technologies are suited to large-scale sequencing, particularly the complete sequencing of small genomes, they are untested in the diagnostic setting and in their present form are not cost-effective for this purpose. Moreover, because they are shotgun methods they are not adapted for sequencing particular genomic regions of choice. For this reason an extra selection step must be introduced prior to in vitro cloning. This adds significantly to the cost and time it takes to undertake the sequencing. Moreover, these technologies require a large degree of oversampling to achieve sufficient accuracy. Hence to sequence a human genome 50X sequencing oversampling may be needed. Hence it would be cost effective, and useful, if technologies were available that sequence regions of the genome selectively. For one or a few species locus-specific PCR may be sufficient, however when the number of regions runs into the tens (positionally mapped loci), hundreds (candidate genes) and more (all exons), then PCR is not cost and time effective. It would be advantageous if the new techniques could be implemented in ways that are partially or wholly systematic.
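The cost argument for selective sequencing can be made concrete with a short calculation of the raw sequence that must be generated at a given oversampling. This is a worked illustration only: the human genome size of about 3.2 billion bases is an approximation supplied here, the 1 Mb region is a hypothetical example within the 200Kb-to-megabases range mentioned above, and the function name is hypothetical.

```python
def raw_bases_required(target_size_bp, oversampling):
    """Raw bases to generate so that a target is covered `oversampling` times."""
    if target_size_bp <= 0 or oversampling <= 0:
        raise ValueError("target size and oversampling must be positive")
    return target_size_bp * oversampling


whole_genome = raw_bases_required(3_200_000_000, 50)  # approximate genome size
one_region = raw_bases_required(1_000_000, 50)        # hypothetical 1 Mb locus

print(whole_genome, one_region, whole_genome / one_region)
```

At the same 50X oversampling, a single 1 Mb region needs several thousand times less raw sequence than the whole genome, which is the saving that selection aims to capture.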
Roche 454 sequencing has been used to sequence the genome of James Watson at a cost close to a million dollars, which is about 10 times lower than the cost of Sanger sequencing in large sequencing centers (11). Although commercial systems now claim to have brought the cost of human genome resequencing down to around $100 000, innovation must continue towards sequencing human genomes for $1000 or less. Until this goal is reached, because the current generation sequencing technologies do not include an integral means for selecting the molecules that are sequenced, their utility in human genetics is hindered. This is unless a separate technology is used first, adding significant cost and time to the process, for selecting sequences of interest. Lovett showed that cDNAs encoded by large genomic regions can be selected by hybridisation (Lovett M, Kere J and Hinton LM, PNAS 88: 9628-9632). An array based method for selection of sequences before DNA sequencing has been described in WO 2007/057652. Other relevant publications include Callow et al and Dahl et al. However these methods do not perform selection and sequencing directly on a single platform.
When a large number of individuals need to be sequenced it would be advantageous to pool the samples and sequence them together. To obtain sequence information that is specific to each individual in the population, the DNA coming from each different sample needs to be distinguishable. This is now implemented in commercial 2nd Generation sequencing systems by adding sequence tags as an identifier of each individual, which can be decoded upon sequencing.
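The decoding of pooled reads by sequence tag can be sketched as follows. This is a minimal illustration only, assuming that each read begins with a fixed-length tag followed by the sample sequence; the tag sequences, tag length and sample names are hypothetical and do not come from any commercial system.

```python
from collections import defaultdict

# Hypothetical tag-to-sample table (illustrative values only).
SAMPLE_TAGS = {
    "ACGT": "individual_1",
    "TGCA": "individual_2",
    "GATC": "individual_3",
}
TAG_LENGTH = 4  # assumed fixed tag length


def demultiplex(reads, tags=SAMPLE_TAGS, tag_length=TAG_LENGTH):
    """Group pooled reads by the sequence tag found at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        # Reads whose tag is not in the table are kept apart rather than guessed.
        sample = tags.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled = ["ACGTTTGACC", "TGCATTGACC", "ACGTTTGATC", "NNNNTTGACC"]
result = demultiplex(pooled)
```

Exact matching is used here for simplicity; a practical scheme would likely need tags that tolerate sequencing errors.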
Present next generation sequencing technologies, due to their random 'shotgun' arrangement, were not developed with selective sequencing in mind. Therefore selection must be done before the sample is applied to the platform. This typically involves multi-step manipulations based around locus-specific PCR, microarray affinity selection or microarray generated selectors used in solution. We show in the present invention that the same array oligonucleotides can be used first as a probe to capture a template from solution and can then act as a primer to extend DNA bases cyclically. This suggests the possibility of selection and sequencing on a single platform; studies have shown that single base extension at ~500K genomic locations from primers attached to a surface can be done directly on whole genome amplified DNA, without the need for locus specific PCR (Steemers et al). If selective sequencing can be applied seamlessly to a large number of individuals it would find wide application in human genetics; for example, genomic regions pinpointed by high-density SNP scans of case and control populations could be further investigated by selective sequencing in individuals to find causative mutations. Indeed, it would be possible to start genetic studies with the sequencing of all exons and control regions if not whole human genomes.
Summary of invention
The objectives of the invention are to provide technology to: Characterize one or a few genomic regions in one or a few individuals; Characterize one or a few genomic regions in a large number of individuals; Characterize a large number of genomic regions in one or a few individuals; Characterize a large number of genomic regions in a large number of individuals
The problem that the invention seeks to solve is how to sequence effectively and in a streamlined manner many samples together; how to selectively sequence only the desired members of a population of molecules; how to provide a technology that is appropriate and cost-effective for both large and small-scale tasks.
The invention describes how to access different chosen subsets from a mixture of molecules. This includes how to separate, how to purify, how to enrich, how to detect, how to selectively process and how to characterise. The characterisation in particular involves nucleic acid sequencing but may also include genotyping and enumerating. In broad terms the solid-phase selection and characterisation concept is relevant for biologically and medically relevant molecules and molecular complexes, including biomolecules such as carbohydrates, lipids, polypeptides, DNA, and RNA.
In some cases selective sequencing may involve systematically sequencing substantially all of a genome; nevertheless certain parts of the genome, such as certain repetitive elements would need to be de-selected; and other contaminating genomes such as microbial genomes that may contaminate samples may need to be de-selected.
The particular aim of the work is to begin with unamplified biomolecules (e.g. DNA/RNA) and then perform selection and analysis directly without needing to perform locus-specific PCR reactions. In one embodiment the biomolecules may be from a single cell. In another embodiment the biomolecules may be from a few cells. In a further embodiment the biomolecules may be from less than 10,000 cells. In a still further embodiment the biomolecules may be from greater than 10,000 cells.
In a further embodiment selection and sequencing is done directly on an array. In one preferred embodiment targets are selectively captured by probes attached to a surface. With short oligos, e.g. 25mers, the template is progressively lost as cycles proceed. Therefore following sequence specific binding the template is fixed to the probe. In a further embodiment the captured molecules are fixed to the capture entity by a FLAP mediated reaction. In a further embodiment, the capture molecules are highly stable, being either long or composed of more stably base-pairing nucleotides, and do not require fixing.
In a preferred embodiment beads carrying amplified DNA fragments are selected on an array. Here the sample DNA is fragmented, bound to a primer on a bead and emulsion PCR is undertaken. Following this, the beads are released and added to a spatially addressable array of probes, under conditions that allow the probes and the sequence on each bead to form specific interactions. Unbound beads are washed away. Sequencing is then conducted on the beads that have been captured on the surface. In one embodiment a multiplicity of capture probes targeting different species is attached to a surface. In a further embodiment, a plurality of different target molecules are captured according to their individual sequence by sequence specific probes on a surface and then the captured molecules are copied by a surface attached primer. In a related preferred embodiment, the captured molecule is amplified by bridge amplification to create a DNA colony or cluster on the surface.
In a further embodiment a first allele is captured by array probe and a second allele is characterised by fluorescence labelling. In a further preferred embodiment haplotypes are isolated by allele specific hybridisation followed by characterisation.
In a further embodiment a flexible method of selective sequencing is provided in which the number of probes on the array and the number of cycles conducted is determined by the size of the region to be sequenced. For short regions the number of cycles is few, saving on reagents; whereas the tiling of the array is more closely spaced; the cost of closely tiled array for a short sequence will be comparable with the cost of less closely tiled array for a long sequence. In addition to savings in cost there will be savings in time.
In a further embodiment an algorithm is provided for calculating the optimal array capture and sequencing design. Also provided is a system comprising an algorithm for calculating the optimal array/cycle design, a database containing the DNA to be sequenced, a computer processor and memory to perform the calculation, a device that incorporates an array manufacture device and a device for carrying out sequencing by synthesis, and a control algorithm to direct operation of the device according to the optimal array/cycle design.
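One way such an array/cycle calculation might look is sketched below. It is not the algorithm of the invention, only a toy cost model under stated assumptions: each probe reads one base per cycle, probes are tiled end to end so that a region of a given length needs ceil(length / cycles) probes, and the per-probe and per-cycle costs are hypothetical constants.

```python
import math


def design_cost(region_length, cycles, cost_per_probe, cost_per_cycle):
    """Return (number of probes, total cost) for a given number of cycles."""
    # Assumption: tiling spacing equals the read length obtained from each probe.
    probes = math.ceil(region_length / cycles)
    return probes, probes * cost_per_probe + cycles * cost_per_cycle


def best_design(region_length, max_cycles, cost_per_probe, cost_per_cycle):
    """Search all cycle counts up to max_cycles and keep the cheapest design."""
    best = None
    for cycles in range(1, max_cycles + 1):
        probes, cost = design_cost(
            region_length, cycles, cost_per_probe, cost_per_cycle
        )
        if best is None or cost < best[2]:
            best = (cycles, probes, cost)
    return best  # (cycles, probes, total cost)


long_region = best_design(1000, 50, cost_per_probe=1.0, cost_per_cycle=10.0)
short_region = best_design(100, 50, cost_per_probe=1.0, cost_per_cycle=10.0)
```

Under these assumed costs the short region is served best by fewer cycles and closer tiling than the long region, which is the trade-off described above.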
In a further embodiment certain sub-sets of molecules are suppressed to allow detection of targeted molecules. For example in blood the detection of the highly abundant globin mRNA should be suppressed in order to detect other molecular species. In a further embodiment the molecules that are bound by the array of probes are not to be characterised. Rather the molecules that do not bind to the array are to be sequenced. Hence the de-selection process, like a filter, catches the undesired molecular species but lets through the desirable species. This process can be iterated several times. The molecules that do not bind to the array can be amplified before re-iteration of the filtration process. In a final selection step the sample is bound to a selection array and then sequenced.
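The effect of iterating the filtration can be illustrated with a simple calculation. The sketch below assumes that every pass removes the same fraction of undesired molecules and loses the same (smaller) fraction of desired molecules; the parameter values are hypothetical and chosen only to show the trend.

```python
def purity_after_passes(initial_fraction, removal, loss, passes):
    """Fraction of desired molecules in the flow-through after repeated passes.

    initial_fraction: starting fraction of desired molecules in the sample
    removal: fraction of undesired molecules caught by the array per pass
    loss: fraction of desired molecules lost per pass
    """
    desired = initial_fraction * (1 - loss) ** passes
    undesired = (1 - initial_fraction) * (1 - removal) ** passes
    return desired / (desired + undesired)


# Hypothetical example: 1% desired molecules, 90% removal and 10% loss per pass.
purities = [purity_after_passes(0.01, 0.9, 0.1, n) for n in range(4)]
```

Because amplification between passes scales desired and undesired molecules alike in this model, it changes the amount of material but not the purity.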
The multiplex addressing system involves attaching a tag that carries information that is extrinsic to the molecules being tagged and does not reflect any characteristic of the molecules being tagged. This includes addition of a specific sequence tag (so that the identity of each sample can be decoded by analysing the tag sequence). It also includes the addition of a label, which can include an atom, molecule and material (such as a sub-microscopic particle). The label may have properties that can be detected and differentiated. These include electromagnetic, magnetic, fluorescent, light scattering, Raman, plasmonic, electrochemical, electrical and electronic properties.
The existing state of the art for selection involves the following steps:
1) Extract Sample Nucleic Acid; 2) Array Hybridisation; 3) Elution; 4) Clonal Amplification; 5) Attach to surface and Sequence.
The present invention streamlines this process thus:
1) Extract Sample Nucleic Acid; 2) Whole Genome or Clonal Amplification; 3) Capture and Sequence
Alternatively the process of the invention comprises:
1) Extract Sample Nucleic Acid; 2) Capture on array and perform Colony amplification; 3) Sequence
The preferred embodiment is encapsulated in the following: A method for selectively sequencing portions of one or more complex genome sample(s) comprising the steps:
(i) Obtaining sample genomic DNA and an array of selection probes complementary to the portions of the genome to be selected
(ii) Optionally amplifying the sample genome (s) in a non-locus specific manner and optionally amplifying the genome clonally on beads
(iii) preparing the sample DNA so that it is amenable to hybridisation including but not limited to fragmentation, digestion and/or denaturation
(iv) hybridising the sample DNA to the array of selection probes under conditions that deselect hybridisation of non-targeted DNA
(v) optionally fixing the selected sample DNA to the array probes
(vi) carrying out wash steps to remove de-selected DNA
(vii) carrying out sequencing of the selected sample DNA while it remains in association with the array probes; wherein sequencing of the selected sample DNA may comprise but is not limited to the steps
(a) hybridisation of four libraries of oligonucleotides to the selection probe/ sample DNA complexes across the array, where each library comprises a defined base, a label exclusively identifying the defined base and a number of non-defined bases
(b) carrying out a template-directed ligation reaction using the selection probe as a primer and the sample DNA as a template where optionally there is no reagent exchange between steps a and b
(c) carrying out one or more wash steps
(d) detection of label across the array, base calling and add to compiled sequence
(e) removal of extended portion of the primer to regenerate the non-extended primer- template duplex
(f) hybridisation of four libraries of oligonucleotides to the selection probe/ sample DNA complexes across the array, where each library comprises a defined base, a label exclusively identifying the defined base and a number of non-defined bases, wherein the defined base is different from any previous defined base
(g) repeat b-f
Definitions
For the purpose of this invention the following definitions are used. Characterization means determining one or more properties of the target molecule. The property may be the sequence of bases, the number of a particular molecule or molecule type or any other feature of the target molecule. The meaning of the term selection includes decomplexing, isolating, purifying, enriching, marking, tagging or labelling. The selection enriches the target molecules sufficiently to make further characterization worthwhile, i.e. the majority of the molecules will be the target molecules and any non-targeted molecules will be in a minority. The characterisation will be worthwhile if the accuracy in subsequent DNA sequencing is at least 75% at first base but is preferably at least 95-99.99%. Selection is the opposite of de-selection or filtering. Selectors, selection probes and capture probes are oligonucleotides that are designed to bind to specific molecules in a sample. The term capture is used synonymously with hybridization and binding and in the case of DNA includes the case where substantial Watson-Crick base-pairing takes place. The capture probe may also function as a primer. The term genomic regions can be substituted with genes, ENCODE regions, exons, the exome, genic regions, RNA, miRNA, and other biomolecular species that can be characterized. The few or large number of genomic regions may be contiguous or non contiguous.
Description of Drawings
Figure 1: Schematic illustrating the capture of a bead carrying clonal DNA (right) and the non-capture of non-selected clonal beads. A spatially addressable array on which beads are immobilised according to the clonal sequence amplified on their surface. The presence or absence of particular sequences can be visualized and the abundance of each captured sequence can be digitally quantified (i.e. counted).
Figure 2: Scheme illustrating selective capture of targeted molecule by hybridisation and washing away of non-targeted molecules. This is followed by sequencing of the selected molecule.
Figure 3: Schematic illustrating the process of reading two bases on array attached oligonucleotides using the preferred sequencing chemistry.
Figure 4: The results of first and second base sequencing of the PhiX174 genome by the four colour ligation-based method described in Figure 2 and 3.
Figure 5: Flap-mediated Ligation products run on a gel.
Reaction components: Oligos A, B, C; 2.5U FideliTaq polymerase; 5U Ampligase (Epicentre); 1x PCR buffer (10mM MgCl2, 1mM NAD), 50nM KCl
Reaction conditions: 95°C 15 min, 60°C 20 min, 50°C overnight.
Bands: D are cleavage fragments, E is a 44mer marker, F is a 24mer marker, G is Flap product
Figure 6: Fixation of template to primer complex by Flap mediated ligation
Following template (oligo A) attachment to the primer complex by the Flap reaction and stringent washing a ligation reaction was performed with labeled incoming oligonucleotides (the first step of a ligation-mediated sequencing scheme). The result of this was that only the spot which contained both oligo C and B lit up. Where only oligo B was spotted the signal was close to background
Oligo A, BG11-Tem55-flap
Oligo C, 5'-NH2-A12N
Oligo B, 5'-TAC CAT TCT GCT TTT ATT TTT TTT TTT TTT-NH2
D: Co-immobilised with oligo B and C
E: immobilised with oligo B only
F: Incoming oligos: C-Cy3, G-Cy5, A-Alexa594, T-OG489
After the FLAP reaction, washing is done at high stringency to remove any template molecules that are not covalently bound by ligation. The high signal for D indicates that ligation only occurs where template has been fixed into place by the Flap reaction.
Detailed Description of Invention
The examples and embodiments described in this invention should not be taken as limiting the scope of the invention that is claimed: Processes described in the context of one example or embodiment can be used in a different embodiment or example of the invention. The methods described in the references are incorporated herein and teach the prior art to those skilled in the art.
The aim of this invention is to streamline the selective characterisation of molecules, where the preferred molecules are polynucleotides and the preferred characterisation is sequencing and/or enumeration.
When selected molecules from a plurality of entities have been characterised, the non-selected molecules are also characterised to the extent that they do not contain the sequences that have been selected, hence the invention comprises:
A method comprising the steps of i) Selection of one or more entities from a plurality of entities, ii) characterisation of selected entities and thereby the plurality of entities.
Selection/Capture and Sequencing on a Single Platform
In one embodiment of the invention surface attached probes are used to select target molecules from a complex mixture, by forming nucleic interactions (including hybridization) between probe and complementary target and thereby isolating the target molecules to spatially addressable locations on a surface. The probes can be arranged as clusters of molecules of the same identity or clusters of a set of identities. Alternatively, the probes can be arranged as isolated single molecules within an array.
In one embodiment, the probes are arranged as spatially addressable microarrays and are manufactured by methods known in the art including light-directed spatial patterning (using photolabile protecting groups or protecting groups labile to photo-generated acids), ink jet synthesis, electronic control of deposition and deposition by robotic spotting of oligonucleotides. In the following, methods that are illustrated in the context of microarrays can also be applied to other platforms such as random arrays.
In a further embodiment, following a first selection, the selected molecules are amplified and applied back to same array or another array. The array may be of increasing stringency or conditions may be of increasing stringency. This is repeated until a high degree of enrichment of targeted molecules is achieved. The stringency of the array can be achieved by providing a second array to which targets bind less well under the conditions of the first array. For example if the oligonucleotide length is shortened from 60 nt to 50nt, the more stable hybrids will be retained in preference to the less stable hybrids (including mismatches). In a final step the enriched molecules are captured on the array and characterised.
Accessing specific sequence by hybridisation
In most cases the target DNA is double stranded and measures need to be taken to allow the probe to access the sequence. The target DNA can be fragmented into short pieces to facilitate hybridisation. The fragments can also be heat denatured before hybridisation, for example by heating to 95°C for 5 minutes. Extreme heat denaturation can be done by boiling the target DNA for 1, 2 or 3 minutes. After heat denaturation the sample can be used directly or snap-cooled on ice. Alkali denaturation can be carried out followed by neutralization. The addition of exonuclease, at a concentration where statistically only one exonuclease binds to the end of each molecule, can be used to generate single strands by exposure for a limited time. Single-stranded binding proteins and helicases can be used to keep the DNA single stranded following denaturation.
In order to be able to target any sequence and also to covalently link the template to the primer, the following probe structure is used. A first oligonucleotide has a target sequence specific region and a common region. A second oligonucleotide is complementary to the common region only. Both the first and second oligonucleotide must be attached to the surface. Alternatively, the second oligonucleotide is attached to a moiety that enables it to be attached to the first oligonucleotide. For this crosslinking can be used. For example a psoralen can cross link to T in the first oligo (the first oligo must contain a T within reach of the psoralen). This mode allows the approach to be applied to large microarrays made by in situ synthesis. Alternatively, both first and second oligo can be incorporated within a single hairpin oligonucleotide, where the first and second oligo are separated by a loop sequence. After binding of the template to the first oligonucleotide, a region of the target may overlap the common region. This hangs off like a flap and can be cleaved by Flap endonucleases or Taq polymerase (which also has Flap cleavage activity). Following cleavage the template is ligated to the second oligonucleotide. Thus the target is permanently linked after capture.
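The two-oligonucleotide probe structure can be sketched in terms of sequence composition alone. This is an illustration under assumptions: the function names are hypothetical, the example sequences are arbitrary, and the order in which the target-specific and common regions are joined is chosen for illustration rather than taken from the description.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_probe_pair(target_site, common_region):
    """Compose the first and second oligonucleotides of the probe complex.

    first oligo: region complementary to the target site, joined to the
                 common region (joining order is an assumption)
    second oligo: complementary to the common region only
    """
    first_oligo = reverse_complement(target_site) + common_region
    second_oligo = reverse_complement(common_region)
    return first_oligo, second_oligo


first, second = build_probe_pair("AACG", "GGAT")
```

Only the target-specific region changes from probe to probe; the common region and therefore the second oligonucleotide are shared across the array.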
Capture at high stringency followed by extension
The probes are preferably able to form strong interactions with the target molecules. The strong interactions can be formed by using long oligonucleotide probes. The length of probe is preferably 60 to 70 nucleotides in length or longer depending on Tm. In some cases primers as long as 120 bases may be preferable.
Shorter probes are progressively less favourable but may still be useable and in certain contexts they may be desirable. The strong interactions are needed so that selection can be done using high temperature, typically at or above 64 or 65°C (in other instances, especially for AT rich targets, temperatures at or above 55°C can be used) and by using reaction buffers that enable good discrimination between match and mismatch. There are two reasons for this. Firstly, it is to impart high stringency, so that interactions with mismatched sequences are disfavoured compared to interactions with the perfect match. In some instances it may be desirable to be able to achieve discrimination of alleles, for example where haplotype specific sequencing is required. In most instances it is desirable that interactions with the perfect match are strong and interactions with non-perfect matches are as low as possible, notwithstanding mismatching due to polymorphisms in the target not represented in the probe sequence. Secondly the high temperature minimizes re-annealing of complementary strands in solution and minimizes the formation of secondary structure; this is also achieved by supplements to the reaction buffer which can include Formamide, DMSO and Urea. Hybridisation can be conducted using standard microarray buffers from Agilent Inc.
After selection by hybridization to the microarray, washes are undertaken to remove non-specifically bound molecules. The washes can be at the same stringency as the hybridization or preferably at a lower stringency in order to retain as much as possible of the captured molecule, as specificity is already imparted during the high stringency hybridisation. The captured targets are then ready for further characterization including sequencing. It is preferable to not allow complete drying of the slide or chip carrying the array. The array may be placed in a humid environment or under low stringency buffer and low temperature. To avoid unnecessary wastage of reagents during sequencing it is preferable to check if the target is bound to the probes. This may be done by methods such as interferometry without the need for labeling. Alternatively, the target molecules may carry a detectable label or may be stained with DNA stains such as SYBR Gold or SYBR Green I in the case of double-stranded DNA and OliGreen and SYBR Green II in the case of single strands or RNA. It is preferable to begin the sequencing reaction immediately following the capture and wash reactions. The sequencing may involve less stringent conditions than the capture. It is preferable not to subject the microarray to high stringency conditions as template will be lost or become denatured. It is desirable to fix the template in place before sequencing commences (see below); however, when the target-probe complex is stable enough it may be considered for practical purposes as fixed. In the case where sequencing involves extension from a primer, as is done by sequencing by synthesis and Sanger dideoxy sequencing, the capture probe may also act as primer. In order to act as a primer the capture probe should have a free 3' end in the case of extension with a DNA Polymerase.
In the case where extension is via ligation, the capture probe could have either a free 3' or 5' end depending on the sequencing biochemistry being used. Preferably the free end of the probe-primer will point away from or have minimal interactions with the surface. Also it is desirable that adjacent probe-primers are not able to interact with each other. It is also desirable that the probe-primer is not able to fold back on itself, since it could then self-prime or be unable to prime the target. An oligo separate from the capture probe may be used for priming sequencing. The primer may be an oligonucleotide that is bound at a distal site to the capture primer.
Capture under low stringency, washing at higher stringency
In some instances, especially when short oligonucleotide probes are used (<60mers), capture may be at low or an intermediate stringency but sufficient specificity is achieved by using higher stringency conditions for washing steps carried out after the capture (e.g. 30°C capture, 37°C washing).
Capture and extension under a single condition
The capture and sequencing can occur in a single step without washing in between. This is preferably done under conditions where hybridization is of reasonable stringency but at which the enzyme can also work effectively. Thermophilic (including extremophilic) enzymes can be employed for the sequencing, allowing temperatures in excess of 55°C to be used. Also non-thermostable enzymes can be used at temperatures in excess of those recommended. A primer extension reaction on captured molecules can be conducted at 46°C for 1 hour using T4 DNA ligase, which is normally reacted at <= 37°C; a ligation can be conducted for 10-30 minutes at 55°C; even higher temperatures can be used as long as exposure is only for a short time (e.g. thermocycling between 65 and 46°C). Moreover, thermocycling, which can be used with both thermophilic and non-thermophilic enzymes, can serve to provide a temperature difference between the capture (lower) and the extension step (higher) or capture (higher) and extension (lower) within a homogeneous reaction, i.e. without exchange of reagents.
Methods to reduce promiscuous capture
To counter the effect of promiscuous hybridization, especially when using single conditions for hybridization and extension, where capture is at lower stringency, promiscuous interactions will be destroyed by, for example, including an enzyme that recognizes mismatches and cleaves them. The reaction would require adequate mixing to remove de-annealed mismatched oligos from the vicinity of the probe.
In some cases promiscuous hybridization may be from a template that is just one nucleotide different but in other cases it should be substantially different (e.g. >=3nt) as small differences may need to be tolerated due to the existence of polymorphisms in the target DNA. Generally, the capture conditions must be balanced (i.e. not too strong) so that the correct target sequences can be captured, including where they vary from the probe complement due to polymorphism. Known polymorphisms can also be tackled by providing mixed bases at the relevant positions in the oligonucleotides so that probes for polymorphic molecules are provided.
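The use of mixed bases at known polymorphic positions can be illustrated by expanding a degenerate probe into the explicit set of sequences it represents. The sketch below assumes that mixed positions are written with the standard IUPAC ambiguity codes; the function name and example sequences are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_probe(degenerate_probe):
    """List every concrete probe sequence encoded by a mixed-base probe."""
    choices = (IUPAC[base] for base in degenerate_probe)
    return ["".join(combo) for combo in product(*choices)]


# A probe with one A/G polymorphic position yields two concrete probes.
variants = expand_probe("ACRT")
```

The number of concrete probes grows multiplicatively with each mixed position, so this approach suits regions with few known polymorphisms.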
Direct selection and sequencing on an array
In a preferred embodiment the capture and selection is done directly from genomic DNA or RNAs from cells without any prior selection by PCR or other sequence selection methods known in the art. DNA/RNAs may be isolated from any type of cell including those from buccal swabs, blood and biopsies. It may however be desirable to fractionate different types of biomolecules, proteins from nucleic acids and RNA from DNA. It may also be desirable to select certain sub-sets of polynucleotides. For example, RNA molecules that generically contain a poly A tail may be fractionated from other RNAs; and miRNAs may be fractionated from other RNAs. Also, it may be desirable to de-select highly repetitive DNA from genomic DNA; this may be done by isolation of specific non-repetitive Cot fractions. Highly repetitive DNA may be used to mop up repetitive DNA during the capture process or to clean up the sample before the capture process.
The target can be directly extracted from one or more cells and then subjected to capture on a random array or on an ordered microarray. The contents of the cell(s) may be released directly onto the capture array. In some cases it is desirable to sequence from one or a few cells. Moreover, sequencing from as few as around 10,000 cells is useful because this is the typical amount harvested from clinical biopsies. Studying single cells is useful in developmental biology, for example, and has clinical applications in pre-implantation testing. In some cases it may be useful to sequence from a single cell or to enumerate the quantities of specific molecules. Because the amount of capture material may be small when the target molecules are released from one or a few cells, capture can be coupled with generic amplification in solution or on-chip amplification or single molecule analysis.
Specificity from highly complex genomic DNA
Capture of a large number of molecules from a complex mixture when the complexity is high is challenging. Each specific molecule may be at too low a concentration to provide sufficient captured molecules for detectable sequencing. There may also be a high degree of cross-reaction during capture. To retain specificity intermediate to high stringency conditions must be used. A certain degree of mismatching can be tolerated when sequencing is at the single molecule level. Here during further characterization, the presence of non-targeted molecules can be accounted for due to their variant characteristics or sequence, which will be revealed after a certain amount of sequence information has been obtained.
To overcome cross reaction, mismatched capture can be minimized by selectively destabilizing the mismatched template by selective cleavage (see above). For example, a hybrid with a mismatch may be cleaved chemically or by DNA repair enzymes. After cleavage the mismatched template can be separated from the probe; this can be done at denaturing conditions that would remove the cleaved but not the intact molecule or it can be done by using enzymes with exonuclease activity that begin processive degradation of the target strand from the site of cleavage. Because of redundancy in the genome (there are two chromosome copies and there is significant segmental duplication), a single capture probe molecule may lead to sequencing of one of the following cases: two homozygous sequences, two sequences which are heterozygous at one or more positions, or two sequences which are completely different or have significant stretches of difference due to a translocation or rearrangement event at the location in one of the copies. Where a region is segmentally duplicated to give more than two similar regions, even when the sequence covered by the probe is similar enough to retain hybridisation, the sequence to be sequenced may diverge significantly and different molecules captured within the same spot would give significantly different sequences. This can be resolved at the level of single molecule sequencing but it poses a problem for bulk sequencing. For bulk sequencing known duplicated regions may be deselected or masked, as would certain classes of repetitive DNA. It should be noted that in some instances single molecule sequencing may be equivalent to the case where a single molecule is amplified, as is the case with DNA colonies and polonies.
The selection array should be able to target any sequence effectively. This can be challenging because some probes are not as effective as others. There is substantial variation in the efficiency of capture. This is largely due to differences in base composition and sequence. These differences can be ironed out by making the probes long enough so that a stable hybrid can be made (e.g. 60-120mers) or by making probes isothermal by adjusting Tm with a flexible length. The differences can also be ironed out by using buffers such as tetramethylammonium chloride, betaine and CTAB.
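Adjusting Tm with a flexible probe length can be sketched as follows. The Tm estimate used here is a widely quoted salt-adjusted approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 675/N, and is an assumption of this example rather than the calculation used in the invention; the function names, sodium concentration and length limits are likewise illustrative.

```python
import math


def approx_tm(seq, sodium_molar=0.05):
    """Approximate melting temperature (deg C) of an upper-case DNA oligo."""
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return (81.5 + 16.6 * math.log10(sodium_molar)
            + 0.41 * gc_percent - 675.0 / len(seq))


def isothermal_probe(template, start, target_tm, min_len=25, max_len=120):
    """Pick the probe starting at `start` whose estimated Tm is closest to target_tm."""
    best = None
    for length in range(min_len, max_len + 1):
        candidate = template[start:start + length]
        if len(candidate) < length:
            break  # ran off the end of the template
        diff = abs(approx_tm(candidate) - target_tm)
        if best is None or diff < best[0]:
            best = (diff, candidate)
    return None if best is None else best[1]


# AT-rich sequence needs a longer probe than GC-rich sequence for the same target Tm.
at_probe = isothermal_probe("AT" * 100, 0, target_tm=70.0)
gc_probe = isothermal_probe("GC" * 100, 0, target_tm=70.0)
```

A nearest-neighbour thermodynamic model would give better estimates; the simple formula is used only to show how length can compensate for base composition.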
Obtaining sufficient signal when number of individual molecules is low
Another challenge of sequencing directly from genomic DNA is that each specific species that is to be captured may be at too low a concentration to provide sufficient captured molecules for detectable sequencing. Typically 0.1% or fewer of the molecules are captured from solution by surface-attached probes (Harris et al). In some cases the signal may not be observable over background signal, particularly using equipment and detection methods set up for bulk analysis. Detection sensitivity can be improved by reducing background. A substrate with low intrinsic fluorescence should be used. The signal due to background scattered light can be removed, for example by using evanescent illumination or by time-gating the detection so that short-lived signal from light scattering is not detected. Detectors with low intrinsic noise must be used. Also the signal due to random nonspecific attachment of fluorescent material to the surface needs to be minimized; this can be done for example by choosing an appropriate surface chemistry (e.g. polyelectrolyte multilayers) or by treating surfaces with casein, BSA, salmon sperm DNA, and/or Denhardt's solution. Also, on a microarray, transfer of probe molecules to non-spotted inter-spot areas during slide processing needs to be minimised. Finally, if single molecule detection is used then detection can be digital, which by its nature will reduce noise, and artefactual signal can be removed.
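The scale of the problem can be seen with a short back-of-envelope calculation. The sketch below is illustrative only: the function name is hypothetical, the figure of roughly 3.3 pg per haploid human genome is an assumed approximation, and the 0.1% capture fraction is the upper figure quoted above.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome (pg)


def captured_copies(input_ng: float, capture_fraction: float = 0.001) -> float:
    """Expected number of copies of one single-copy locus captured on the surface.

    input_ng: mass of genomic DNA applied, in nanograms.
    capture_fraction: fraction of molecules captured from solution (0.1% = 0.001).
    """
    genome_copies = input_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg -> copies
    return genome_copies * capture_fraction


if __name__ == "__main__":
    for ng in (10, 100, 1000):
        print(f"{ng} ng input: about {captured_copies(ng):.0f} captured copies per locus")
```

With these assumptions even 1 microgram of unamplified genomic DNA yields only a few hundred captured copies of any single-copy locus, which is why low background or single molecule detection is needed.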
Despite the measures described above, it may also be useful to increase the amount of DNA in the sample in a non-locus-specific manner, so that it can be detected more easily. This can be done by one of the whole genome amplification methods, notably Multiple Displacement Amplification (MDA), Rolling Circle Amplification (RCA) and OmniPlex technology. Here the complexity and content of the sample remain largely unaltered but the amount of sample increases. Alternatively the sample may be generically amplified in a way that also fractionates the molecules; this can be done by amplifying DNA of a certain size range after restriction digestion. An alternative approach is to clonally amplify the sample using emulsion PCR (see below). The high concentration of genomic DNA that is generated by generic amplification is needed to drive the capture reaction.
If the concentration of specific sequences is still low, or as an alternative to pre-amplification, reagents used to carry out the sequencing reaction (e.g. nucleotides or oligonucleotides) may be subject to signal amplification. This may be direct labelling with bright labels such as Qdots, Fluospheres, or phycoerythrin and its FRET conjugates (all available from Life Technologies). In addition polylabelled dendrimers or branched structures may be directly attached. Alternatively, amplification can be conducted after the reaction by antibody layering or by branched DNA or dendrimers (Genisphere Inc.). A four-colour sequencing scheme can be implemented using phycoerythrin and PE-FRET conjugates and Quantum Dots.
Fixing the Template
To avoid attrition during sequencing (see below) it is desirable to fix the template in the locality of the primer or so that it is physically contacting the primer. This can be done, for example, either by co-immobilising primer and template at the same place on a surface so that they are able to make contact easily or by having them become attached together. In this way, if the primer and template denature they are quickly able to re-anneal, and because they are both fixed they do not diffuse away from each other.
The template and primer can be linked together by crosslinking, including the use of psoralen and/or UV light irradiation. If sample molecules are tailed with a homopolymer sequence such as poly T using terminal transferase, and a homopolymer poly A sequence linked to psoralen is provided in the primer, the primer and template could become attached to each other upon UV irradiation. The psoralen may also be pre-attached to the template before hybridisation to the surface oligo. Other available crosslinking systems could be used in the same way. Also, both template and primer could be biotinylated at multiple sites. This can be done with the photoprobe biotinylation kit from Vector Labs. Then streptavidin is added to link the molecules together. They can also be linked by an intercalating or groove binding molecule. This binder may become covalently attached to the two strands. The probes may have a thymidine tail to which the target becomes crosslinked by binding of a psoralen derivative from solution (e.g. trioxalen).
The above two approaches, crosslinking or streptavidin-biotin interaction can be used in direct array based selective sequencing if the linking part is supplementary to the selection sequence and does not prevent selection.
In one embodiment the target remains double stranded and is bound to the immobilised capture probe by a RecA mediated reaction. In another embodiment the target is partially denatured and the probe is a PNA or LNA and is able to bind by strand invasion or by outcompeting the renaturation of the native complementary strand to the sequence that is targeted. A number of DNA mimics/analogues including Locked Nucleic Acid (LNA), Peptide Nucleic Acid (PNA), Ethylene Bridged Nucleic Acid (ENA), Methylene Bridged Nucleic Acid (BNA), guanidinium nucleic acid (DNG), morpholino and methylphosphonate can be used to bind double or single stranded target DNA. In addition an intercalator dye such as acridine can be used to increase stability. The intercalator may cap one of the ends of the capture probe, preferably a free end, and can be added during oligo synthesis (Glen Research).
Gunderson et al (Genome Research) have published a method for ligating the ends of target DNA molecules to a partially double-stranded probe on a surface. The partially double-stranded "sticky" primer complex comprises two oligonucleotides annealed together with one oligonucleotide overhanging the other and carrying the sequence to which the target molecule binds by complementary base-pairing. This then allows the longer oligonucleotide to prime an extension reaction with the captured target molecule as template. The drawback of this approach is that it can target only the ends of molecules. Hence, substantially all the sequences of interest must be located close to the ends of fragmented DNA. As an alternative, in one embodiment of the invention, the sticky probes are not created by hybridisation of two primers, but are created by designing an oligonucleotide that will form a stem loop structure. This stem loop can be attached to the surface by an amino group modifying one of the nucleosides in the loop.
In one embodiment, the capture probe does not prime sequencing but a separate primer is provided for the sequencing reaction. This may be a random primer, or a primer that binds to a PBS ligated to one end of the template or to a promoter. The template may be adapted at one end with a stem loop capable of priming sequencing. This can be done by ligation of a priming structure such as a self-priming stem loop onto the end of the template, before or after capture. Alternatively, a primase can be used to create a primer on the template. In the case where the capture primer does not initiate the sequencing, the capture can be used to facilitate the fixing of the template to the surface.
FLAP capture
One preferred embodiment of the present invention, preferably using the sticky probes described above, provides the ability to ligate to a priming sequence bound anywhere along a target molecule (this is in contrast to Gunderson's approach described above, which only binds to the end of a molecule). Normally, when hybridisation occurs between an oligonucleotide probe and an internal sequence of a molecule (distal from the ends of the molecule), the sequences at the 5' and 3' ends overhang the duplex. One overhang is the template for sequencing. The other overhang prevents ligation to the sticky primer complex. In the present invention we can overcome this by clipping off the overhanging end that is not targeted for sequencing, so that an end is generated that is able to ligate with the sticky primer complex. This can be achieved by the Flap reaction, using Flap enzymes. This type of enzyme is used in the Invader genotyping assay (Third Wave Inc). Taq DNA polymerase also has a Flap activity that can be used. When the sticky primer complex is added to the template in the presence of a DNA ligase and a Flap enzyme, the overhang can be clipped off and the target molecule is covalently linked to the primer complex by ligation. The covalently linked target molecule can then act as a template for a template-directed extension reaction using the longer oligonucleotide of the sticky primer complex as a primer. The extension can be achieved by a DNA ligase. In one embodiment the clipping and ligation can occur in the same step as the first ligation step of the sequencing reaction. In another embodiment the sequencing and flap-ligation reactions occur separately. The template can be bound to the sticky primer complex in one reaction or, alternatively, hybridisation can be done first, followed by washing steps, before the flap/ligation reaction is implemented.
Random arrays
Selective capture on random arrays can be done in two different ways. The first is described immediately below and the second is described as part of the description of clonally amplified beads.
Capture in sequence specific manner with selection probes and amplify in situ to make DNA colonies
In one embodiment, capture probes are arrayed randomly on a surface at a low density (e.g. one molecule per 3-5 microns); primers complementary to P1 and anti-P2 are co-immobilised on the surface at high density (primers P1 and P2 are the two bridge amplification primers). The target molecules are fragmented (e.g. to around 200bp) and adapted with anti-P1 and P2 primers. The target molecules are hybridised to the selector probes (60mer) at high stringency (e.g. 65-75°C in Agilent Hybridisation buffer). After washing, buffer at low stringency is added (e.g. at 30-55°C), one of the primers P1 and anti-P2 is able to hybridise to the captured molecule, and a first strand synthesis is carried out (e.g. using Taq DNA polymerase under standard buffer; or see Turcatti et al). The duplex is then denatured using heat or chemical (e.g. urea, formamide buffer) denaturation, and then both the original template and the newly synthesized strand are hybridised to complementary primers and second strand synthesis is carried out. This process is carried out for a number of cycles (e.g. 20-40) until substantially spatially distinct DNA colonies are formed. These colonies can then be enumerated and/or sequenced.
Alternatively after capture the capture probe serves as primer to make a cDNA copy. Then an oligonucleotide primer binding site is ligated to the 3' end of the strand and after denaturation, a primer is annealed to synthesise a second strand. The primer is preferably also localised on the surface.
Selectively capturing beads of clonally amplified DNA
One aim of this invention is to be able to selectively sequence the parts of the genome of interest without having to perform separate selection steps by locus-specific PCR or some form of laborious multi-step array-derived selection. Rather, in a preferred embodiment the selection and sequencing should occur on a single platform. In one embodiment the selection is on a microarray. In another embodiment the selection is on a random array of selection probes.
The embodiments of the invention related to selection and sequencing described above can be carried out by either bulk analysis or single molecule analysis. Single molecule analysis allows the analysis to be digital. In bulk/ensemble analysis, where measurements are analogue, there may be difficulty in picking up true signal over background when the signal is low. Where the measurements are digital, because each signal comes from an individually detectable molecule, true signal can be differentiated from background and molecules can be quantitated by recording and counting discrete signals.
When working with complex genomic DNA from a large complex genome such as the human genome, the amount of sample and the concentration of sample material can be increased by generic amplification methods including MDA, OmniPlex and RCA as described above. However, in an alternative and preferred embodiment that combines both digital analysis and amplification, the sample preparation methods employed in some of the next generation DNA sequencing methods (e.g. 454, ligation-based "Polony" sequencing and SOLiD sequencing), namely clonal amplification on a bead using emulsion PCR as first described by Dressman et al, can be used. Clonal beads can also be generated by the non-emulsion PCR method described by Brenner et al. The digital analysis can then be carried out by counting beads.
In vitro clonal amplification on beads produces beads of different flavours (i.e. derived from different single molecules). The array then captures beads of different flavours onto the surface.
The beads may be magnetic beads such as Dynal Beads available from Life Technologies. The beads may also be non-magnetic beads.
The existing surface-based bead sequencing methods attach the beads to the surface of a glass slide via a chemical interaction. This process does not involve hybridisation capture, sequence specific selection or spatial addressing. What differentiates the present invention from these methods is that in one favoured embodiment of the present invention, beads carrying the clonally amplified unselected molecules are captured by hybridisation in a sequence specific manner according to the probe sequences that are provided as selectors. In one embodiment the capture is to spatial addresses on a surface (e.g. in a microarray format). These addresses comprise specific spots or features within a microarray. In another embodiment the capture is to random positions on the surface (although the location of molecules on the random array may be according to a specific periodicity and may be tightly packed).
The beads can be captured in hybridisation buffer at temperatures ranging from 25°C to 65°C depending on probe oligonucleotide length. The capture also depends on bead size; beads of 20nm diameter could be captured with no mixing. Larger beads, carrying many probes, may require mixing and relatively higher temperatures, and it may be preferable to use a lower density of molecules so that fewer interactions form. The problem as beads increase in diameter is that if there is a higher number of probe molecules on the surface and if the majority of these find complements on the surface, it may be difficult for a mismatched bead to dissociate from the surface to find its correct target; the binding may be cooperative and would require a high temperature for release. Thermocycling can facilitate getting the right bead to the right spot; typically high denaturation temperatures for a duration longer than usual can be used. In one embodiment RNA/DNA from as little as a single cell is clonally amplified. The beads may be coded according to which sample they are from.
After capture the bead-DNA and surface-oligo interaction can be locked. As each bead comprises a plurality of molecules, molecules are available both for capture and to serve as templates for extension.
Bead Capture on a microarray
In the case of capture onto a microarray the beads within a spot can be counted. Because the sequence of the probes at specific locations on the surface is known, some sequence information about the target DNA molecules on the bead is revealed just by their spatial immobilisation. As well as getting sequence information from the beads isolated in the spot, the number of occurrences of beads within the spot is obtained. If the probes and experimental conditions are well designed, so that all sequences interact with relatively similar effectiveness, the number of captured molecules can be enumerated to provide an indication of the quantity of each molecule within the sample. For genomic DNA this will indicate whether there is aneuploidy, polyploidy, segmental duplication or copy number variation at a particular locus in the sample. Initially, before enumeration, just the density of beads within a spot will indicate whether there are more or fewer copies of that part of the genome compared to any other spot on the surface. In order to obtain information quickly, not all beads within a spot will need to be addressed in order to obtain a consensus sequence.
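The enumeration described above can be sketched as a simple calculation. The following is a minimal illustration, assuming that all probes capture with similar efficiency and that the median spot represents the normal diploid state; the function name, locus names and the median normalisation are assumptions chosen for illustration rather than a prescribed analysis.

```python
from statistics import median


def relative_copy_number(bead_counts: dict[str, int],
                         baseline_copies: float = 2.0) -> dict[str, float]:
    """Estimate copy number per locus from the number of beads counted per spot.

    The median count across spots is taken to correspond to `baseline_copies`
    (assumed diploid); this is an illustrative normalisation only.
    """
    reference = median(bead_counts.values())
    if reference == 0:
        raise ValueError("median bead count is zero; cannot normalise")
    return {locus: baseline_copies * count / reference
            for locus, count in bead_counts.items()}


if __name__ == "__main__":
    counts = {"locus_A": 100, "locus_B": 100, "locus_C": 150,
              "locus_D": 50, "locus_E": 100}   # hypothetical bead counts
    for locus, copies in relative_copy_number(counts).items():
        print(locus, copies)
```

In this hypothetical example the spot with 150 beads suggests a gain (about three copies) and the spot with 50 beads suggests a loss (about one copy) relative to the other loci.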
For selective capture of regions of genomic DNA, the DNA is first fragmented and then subjected to emulsion PCR. Fragmentation may be done randomly. Alternatively, fragmentation is done by a restriction enzyme, in which case the expected fragments are known. Preferably one or more primers for amplification are attached to a bead. The capture probes may capture at different sites on the same fragments, or fragments of different size can be captured on the same array. Following selective capture, sequencing can commence.
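Where a restriction enzyme is used, the expected fragments can be predicted from a reference sequence. The sketch below is a minimal in silico digest of one strand; the function name is hypothetical, and the EcoRI site (GAATTC, cut after the first G) is used only as a familiar example, not as an enzyme named in this description.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `sequence` at every occurrence of `site`.

    cut_offset is the number of bases of the site left on the upstream
    fragment (1 for EcoRI, G^AATTC). Only the top strand is modelled.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])   # remainder after the last cut
    return fragments


if __name__ == "__main__":
    print(digest("AAAGAATTCCCCGAATTCTT"))
```

Knowing the fragment boundaries in advance allows capture probes to be placed within fragments of a size suitable for emulsion PCR.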
Each spot captures a specific targeted bead (via interaction of DNA/RNA on the bead with DNA on the surface) and beads that are not targeted are not captured (unless some mismatching occurs). Subsequences of the sequence present on each bead may be represented on more than one spot, where each spot captures a different section of the DNA on the bead. The DNA/RNA on the bead may extend to hundreds of bases in length. Some sequence motifs may be represented on more than one bead. Typically several beads will be captured on each spot, each of these beads carrying the sequence complementary to the probe on the array.
In summary, the method involves sequence-specific capture of beads carrying clonal copies of nucleic acid species onto an array followed by optional enumeration of the number of beads per array element and/or sequence determination of the molecules on the beads.
The method has application in DNA sequencing wherein: (i) each sample nucleic acid molecule (which may be a fragment) is bound to a primer on a bead and clonally amplified (e.g. in an individual droplet within a water in oil emulsion according to methods described in the art); (ii) the beads are released from the droplets and added to a spatially addressable array of probes, under conditions that allow probes and sequence on bead to form specific interactions by which the beads are sorted to specific location on the array; (iii) unbound beads are removed; (iv) The occurrence of particular nucleic acid sequences in the sample can be quantitated digitally by enumerating the number of beads captured per array element (by detecting beads with or without fluorescence); (v) the sequence on each bead can be determined by using AB's SOLiD, Sequencing by Ligation (SBL) chemistry or other suitable chemistry.
The selection can be done on a spatially addressable microarray or any other type of array. The beads have sufficient number and distribution of molecules to enable a fraction of the molecules to engage in the capture process and a portion to act as templates for DNA sequencing. The results of quantitation may inform subsequent steps, such as whether to sequence. This may act as a quality control step, before entering into the long-run time commitment and expense of DNA sequencing.
The following iterates some of the benefits of the new approach. The method provides the flexibility to systematically address every detectable molecular species of a particular type (e.g. mRNA, whole genomic DNA) in the sample or to selectively target a subset (e.g. exonic genomic regions, candidate genes) per array. The method enables selective sequencing without having to perform locus specific PCR, array hybridisation and elution etc. prior to entering the in vitro cloning step. The method significantly streamlines the selective sequencing pipeline compared to other methods, reducing 5 steps to 3 steps. This means an increase in speed and a reduction in cost, and also offers the possibility of easier automation of the selective sequencing pipeline. In addition the new method has the benefit of enabling digital quantitation of the sample. The method also offers an intermediate checkpoint before sequencing commences; the enumeration of beads per array element will report on the representation of species within the nucleic acid sample. For example, if some particular nucleic acids are overrepresented and others are underrepresented or absent, this may rule out further sequencing.
If more than one genome is to be sequenced, each sample genomic DNA is optionally tagged so that one can be distinguished from another when performing digital detection and analysis.
Bead Capture on a random array
In one embodiment, the selector probes for bead capture are not arranged in a spatially addressable array but are distributed randomly on a surface. In this case there will not be an immediate opportunity after hybridisation to enumerate the number of each molecule in the sample, but it will be possible to do this after sequence is obtained.
In this case probe molecules are spread randomly on a surface in such a manner that when the target molecule is immobilised it can be individually resolved. Then each probe molecule binds to a single target molecule and immobilises it onto the surface. In one embodiment each target molecule is first clonally amplified (for example by bead-based emulsion PCR). Then the single probe molecule binds to one of the complementary molecules on the bead and immobilises it to the surface. For this selection to be successful the beads must not be able to bind non-specifically to the surface; the surface may carry a repulsive coating, for example it may be negatively charged or may have a negative electrical field around it. A single molecule interaction must be able to hold the bead onto the surface during capture. Once the capture reaction is completed the beads can be attached to the surface by a separate means, for example by changing the pH so that the charge on the surface of the slide is altered to enable chemical groups on the bead to bind to the surface. In one embodiment the single molecules on the surface may comprise a dendrimer carrying probe replicates at each location; this is then able to bind the bead strongly. Similarly the single molecule probe on the surface may include tandem replicates of the probe sequence created by rolling circle amplification. This can be created by providing the selection probes as circles, containing a PBS and the complement to the selector (anti-selector) sequence. A primer complementary to the PBS common to the anti-selectors then binds to the circles and rolling circle amplification is carried out. The anti-selector probes can be created by methods known in the art using reagents available from Glen Research and as described by Eric Kool.
The method for selection comprises the following steps: i. Take unselected DNA/RNA sample ii. Do clonal amplification on beads by emulsion PCR iii. Add bead to microarray and allow sequence specific interaction of DNA/RNA on bead with DNA on spots across the array.
Once the beads are captured on the array the DNA on the beads can be subjected to sequencing. After capture of the beads sequencing can be initiated by binding of a primer to the primer binding site on the clonally amplified templates. As described above, the primer could be crosslinked to the template or may be created by the ligation of a stem-loop to the template on the bead.
For ligation sequencing, where a 3' immobilised selection primer is used, ligation-based sequencing can proceed directly from the capture probes in the 3' to 5' direction. The selection primer would need to have a phosphate group at its end for each cycle of sequencing by extension. The 3' immobilisation will initiate sequencing away from the surface, towards the bead. Where a 5' immobilised selection primer is used, either ligation (using 5' phosphorylated oligonucleotides) or polymerase based sequencing can be used. In this case extension will be away from the bead, towards the surface. In one embodiment the beads are immobilised within a gel or a hydrogel; the hydrogel may have waveguiding properties which may intensify illumination and restrict it to around the beads.
Alternatively, the target molecules can be adapted with a primer binding site before or after immobilisation, and a primer separate from the capture probe is used to carry out an extension reaction for sequencing. Alternatively, sequencing by hybridisation can be used, in which a complete set of oligos is added iteratively to the array as described by Pihlak et al.
In addition to DNA on beads, the DNA can be clonally amplified as DNA nanoballs (DNB) which can be produced by rolling circle amplification.
Multiplexing bead populations
The DNA selected from several populations can be multiplexed and sequenced together. The beads generated from one individual can be labelled with a particular colour. This may be by hybridisation to a still vacant primer, by hybridisation to a second (non-primer) sequence on the bead, or by interaction of a label in some other way, e.g. non-specific interaction or adsorption. This approach can be multiplexed in two ways: the beads from a specific preparation can be labelled with specific dyes; or beads from separate preparations can be placed at separate sub-arrays separated by barriers.
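Where beads from different preparations carry different colours, the per-spot bead counts can be separated by sample after imaging. The following minimal sketch assumes that each detected bead has already been reduced to a (spot, colour) pair; the function name, the colour names and the sample names are hypothetical.

```python
from collections import Counter, defaultdict


def counts_by_sample(beads, colour_to_sample):
    """Tally beads per spot for each sample, using the bead colour as the sample code.

    beads: iterable of (spot_id, colour) pairs from image analysis.
    colour_to_sample: mapping of colour label to sample name.
    Beads with an unknown colour are grouped under "unassigned".
    """
    table = defaultdict(Counter)
    for spot_id, colour in beads:
        sample = colour_to_sample.get(colour, "unassigned")
        table[sample][spot_id] += 1
    return {sample: dict(counter) for sample, counter in table.items()}


if __name__ == "__main__":
    detected = [("spot1", "red"), ("spot1", "green"), ("spot1", "red"),
                ("spot2", "green"), ("spot2", "blue")]
    print(counts_by_sample(detected, {"red": "individual_1",
                                      "green": "individual_2"}))
```

The resulting table gives a separate digital count per array element for each individual, even though all beads were captured on the same array.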
Digital RNA profiling and sequencing
The process starts with a single mRNA molecule in a water-oil droplet. The array is made by reference to the consensus gene sequence or the known information about previously described or predicted RNA sequences. The RNA is hybridised to a primer, which may bind to the poly A tail or the region around the polyadenylation signal, or to some other characteristic sequence or motif (e.g. Cap), or to an adaptor tag that has been ligated to the molecule using T4 RNA ligase. The primer is then extended to synthesize a complementary strand. A second strand is then synthesized and the process repeated until many complementary copies of the single molecule have been generated and are attached to the bead. Alternatively the RNA can be converted to a cDNA copy before the process begins. After amplification the beads are contacted with an array, preferably a spatially addressable array. In the case of existing methods (e.g. SOLiD), the interaction of the beads with the surface is a chemical one. In the present invention, the interaction between bead DNA and the surface is by sequence specific molecular interaction, such as hybridisation or annealing. In a preferred embodiment the array is a spatially addressable array comprising spots/features targeting different molecular species in the target mixture. The array may aim to systematically address every molecular species in the sample or it may selectively target a subset.
The sample can be characterised by counting the number of beads isolated at each location. If a mRNA species is in low abundance then only a few beads will be isolated within the spot. If the abundance is high then a large number will be isolated per spot. Alternatively, the systematically and selectively arranged beads can be subjected to DNA sequencing for further characterisation.
In one embodiment it will be possible to detect alternative transcripts by obtaining sequencing reads from separated sites on the captured molecule. This will be possible, for example, by punctuating extensions with labelled reagents (from which sequence is obtained) with runs of unlabelled extension reactions (e.g. comprising contiguous ligation of unlabelled degenerate oligonucleotides or incorporation of unlabelled dNTPs or NTPs).
Solid-phase selection from a random array of selection probes
The mixture of probes is then randomly arrayed out on a surface. The probes may be attached to the surface at a high density so that single molecules are not individually resolvable. Alternatively, the probes may be attached to the surface at a low density so that single molecules are individually resolvable. The probes may be arrayed in a completely random way, in that there is neither specific order in the spatial arrangement of the molecule nor order in which molecule is where on the surface. Alternatively the mixture of probes are arrayed in an ordered single molecule array in which each molecule is located in a non-random fashion but the position of any specific probe is not specified (is random). This array of sequence specific probes can then be used in three different ways.
The array is contacted with the target mixture and selected target molecules are separated from non-targeted molecules. The target molecules bind to the array and the non-target molecules do not and are therefore removed during a washing process. The washing can typically be done under stringency conditions which favour retention of perfect matches and remove mismatched molecules from the array.
The captured molecules can be removed from the array and sequenced according to any available sequencing method. Alternatively the captured molecules are sequenced directly on the array. In this case the probes need to be arrayed in a manner that each probe or the resulting probe-target complex can be resolved individually according to the method of detection. The method of detection may be optical. A higher density of molecules can be used when the detection method uses scanning probe microscopy or electron microscopy.
Alternatively the captured molecules are amplified on the array before sequencing. When the single molecules are individually resolvable, DNA or RNA colonies can be produced which are individually resolvable. These colonies can then be counted and/or subjected to sequencing according to available methods. Amplification may be by thermocycling or by isothermal reactions. Where the reaction is isothermal it may use helicases (Biohelix, USA) or flushing in of a denaturant, such as formamide, urea, alkali solution, acidic solution, DMSO or other denaturants of duplexes; this is a substitute for the heating step in PCR.
For example, amplification may be by bridge amplification. Alternatively the captured molecules can be amplified by rolling circle amplification or any other amplification method that can be applied on a surface. The bridge reaction can be conducted by having primers complementary to a primer binding site which has been attached to the target. The capture primer itself may be bifurcating and so may have the second primer already attached.
The capture filter
In a further embodiment certain sub-sets of molecules are suppressed to allow detection of targeted molecules. For example in blood the detection of the highly abundant globin mRNA should be suppressed in order to detect other molecular species.
In a further embodiment the molecules that are bound by the array of probes are not to be characterised. Rather, the molecules that do not bind to the array are to be sequenced. Hence the de-selection process, like a filter, catches the undesired molecular species but lets through the desirable species. This process can be iterated several times. The molecules that do not bind to the array can be amplified before re-iteration of the filtration process. In a final selection step the sample is bound to a selection array and then sequenced.
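The effect of iterating the filter can be illustrated with a simple depletion model. The sketch below assumes that each pass removes a fixed fraction of every de-selected species and that desired species pass through without loss; the function name, the species names and the 90% per-pass efficiency are hypothetical, and amplification between passes is ignored.

```python
def deplete(abundance: dict[str, float], unwanted: set[str],
            capture_efficiency: float, rounds: int) -> dict[str, float]:
    """Model repeated passes of a sample over a de-selection (filter) array.

    Each round removes `capture_efficiency` (0..1) of every species in
    `unwanted`; all other species are assumed to pass through unchanged.
    """
    remaining = dict(abundance)
    for _ in range(rounds):
        for species in unwanted:
            if species in remaining:
                remaining[species] *= (1.0 - capture_efficiency)
    return remaining


if __name__ == "__main__":
    sample = {"globin_mRNA": 1000.0, "rare_transcript": 1.0}  # hypothetical copies
    for n in range(4):
        print(n, deplete(sample, {"globin_mRNA"}, 0.9, n))
```

Under these assumptions three passes at 90% efficiency bring a species that started 1000-fold in excess down to a level comparable with the rare species, which illustrates why the process may be iterated.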
Sequencing
Once the target is captured, sequencing can be conducted on the template by one of several methods. One example is sequencing by synthesis using a polymerase (see Harris et al; Bentley et al; Eid et al) or a ligase (see Shendure et al). Another is sequencing by hybridisation and combinatorial decoding (Walt et al). Alternatively, sequencing by hybridisation can be conducted by iteratively hybridising a complete set of oligonucleotides to the target, as has been described by Pihlak et al. In one embodiment, following selection from a complex mixture, sequencing is achieved by a single extension reaction from an array that tiles through every base of the sequence. This tiling may be by using only reference sequence probes for capture and four coloured base specific nucleotides or oligos. Alternatively, the tiling may contain the 4 variants (A, C, G, T) for each position. Sequencing may be by enzymatic extension with the variant position at or near the point of extension.
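A tiling set containing the four variants for each position can be generated directly from the reference sequence. The following minimal sketch places the variant base at the 3' terminal position of each probe (one reading of "at or near the point of extension"); the function name, the 25-base probe length and the choice to write probes in the same sense as the reference are assumptions for illustration, and strand and complementarity are not modelled.

```python
def tiling_variant_probes(reference: str, probe_len: int = 25) -> list[dict]:
    """Build four probes (A, C, G, T at the last base) for each tiled position.

    Each probe shares the preceding probe_len - 1 reference bases and differs
    only at its final base, which interrogates reference position `position`.
    """
    reference = reference.upper()
    probes = []
    for position in range(probe_len - 1, len(reference)):
        prefix = reference[position - probe_len + 1:position]
        for base in "ACGT":
            probes.append({
                "position": position,
                "base": base,
                "sequence": prefix + base,
                "is_reference": base == reference[position],
            })
    return probes


if __name__ == "__main__":
    for probe in tiling_variant_probes("ACGTACGTA", probe_len=5)[:4]:
        print(probe)
```

For a region of length L this gives 4 x (L - probe_len + 1) probes, of which one per position matches the reference.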
Sequencing may be stepwise sequencing by synthesis, where either one base is added at a time or all four bases are added simultaneously but may carry a terminator which may need to be removed at each cycle. Sequencing may also be non-stepwise sequencing by synthesis, including real-time or continuous sequencing. During sequencing by synthesis, as the extension proceeds it is important to overcome attrition. One cause is the probe-primer detaching from the surface. Steps need to be taken to form a strong attachment to the surface (e.g. as described by Turcatti et al). Another reason for attrition is that the target molecule detaches from the primer. Steps need to be taken to keep the target in place. Another reason for attrition is that a molecule is no longer available for further extension. The primer may become attached to the surface in an orientation which prevents it from being extended; the surface may need to be engineered to prevent this. A 3D hydrogel matrix may be a preferable substrate (see Mir et al). The primer may become damaged; the reaction may be supplemented with DNA repair enzymes or other relevant reagents, for example glycerol, beta-mercaptoethanol and ascorbic acid, which may counter light-induced DNA breakage. For sequencing by synthesis it is also important to maintain the stepwise yield as high as possible; this can be achieved by using enzymes that are highly processive and are able to extend many primers in the course of one cycle. Stepwise yield can also be maintained if any cleavage steps that are used are effective. Also, in general, the extending primer-target complex needs to be kept in an environment which is conducive to the enzyme action, avoiding "sticky" interactions with the surface or components of the reaction mix. The primers should be synthesized without capping. This is to prevent n-1 products within a spot from initiating synthesis from different positions on the template.
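The importance of stepwise yield can be illustrated with a simple calculation. The following is a minimal sketch only, assuming that every cycle succeeds independently with the same probability; the function name and the example yields are illustrative and are not measured values.

```python
def in_phase_fraction(stepwise_yield: float, cycles: int) -> float:
    """Fraction of primers that have been extended correctly in every one of
    `cycles` cycles, assuming each cycle succeeds independently with
    probability `stepwise_yield` (a simplifying assumption)."""
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return stepwise_yield ** cycles


# Illustrative yields only: show how quickly the in-step population shrinks.
for y in (0.99, 0.95, 0.90):
    print(y, [round(in_phase_fraction(y, n), 3) for n in (10, 25, 50)])
```

Under this model even a small loss per cycle compounds over a long read, which is why short reads on pre-selected targets are less affected by attrition.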
Extension may involve the provision of supplements such as Single strand-binding protein, DMSO, Betaine, Sodium ascorbate. In order to obtain high signal during the sequencing, bright labels may need to be used. This may include high absorption coefficient, high quantum yield, high photostability and good solubility; Dyes should preferably have minimal dark states. It may also include fluorescent nanoparticles such as Fluospheres, Transfluospheres and Quantum Dots (all available from Life Technologies, CA) or light scattering particles such as gold and silver particles of various dimensions.
In one embodiment, after capture, sequencing is followed on an array of single molecules. Single molecule analysis is particularly favourable for sequencing-by-synthesis. This is because it enables reactions with less than 100% stepwise yields to be followed without the confounding effects of dephasing. Sequencing on single molecule microarrays could also offer the following advantages: longer read lengths, because monitoring individual molecules circumvents the problem of out-of-phase or asynchronous extension; high statistical significance of base calling, because it is based on the consensus base-call of many copies of each target region co-immobilised within a single microarray spot; detection of a rare allele in the presence of a majority allele, which is important for detection of cancer related alleles and will also enable highly accurate allele frequency determination in pooled samples; use of low amounts of sample material due to single molecule sensitivity, which is particularly important when using limited amounts of biopsy material; and the possibility of sequencing directly from genomic DNA, which will produce savings in cost and time. In a further embodiment the array is composed of spots comprising a plurality of molecules and each spot on the array may target a particular subset of target molecules, for example to capture each of the exons associated with a particular gene. Therefore, probes of different sequence will be present within the same spot. In a further embodiment each spot in the array will carry a plurality of sequences and/or sequence lengths to capture one or more selected targets. Sequencing of each of the plurality of sequences can be followed at the single molecule level within the spot.
When sequencing is selective, for many applications, because the genome has been decomplexed within an array spot (i.e. a known species or a plurality of known sequences have been enriched), the read length can be very short and still be useful. In particular, most second generation sequencing methods require a minimum read-length of around 20 bases to be able to map the read onto the genome reference sequence; as the proposed method already pre-selects a part of the genome, this minimum read-length requirement is avoided. A read-length of 8-14 bases is sufficient for many applications and will pick up substitutions, small insertion/deletions (indels) and the sequence following a breakpoint that has linked two sequences together in a way not apparent from the reference genome. When the sample to be sequenced in the spot is of reduced complexity compared to the whole genome, shorter sequence reads will be needed to reconstruct sequence. Fewer sequencing cycles will be needed and therefore the process will be cheaper and quicker. This is useful for diagnostic sequencing.
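The relationship between sample complexity and useful read length can be sketched numerically. The following is a rough lower bound only, assuming random sequence so that a read of length L can take 4^L values; real genomes contain repeats and need longer reads. The function name and the 10 kb example target are hypothetical.

```python
def min_unique_read_length(target_bases: int, strands: int = 2) -> int:
    """Smallest read length L for which 4**L is at least the number of
    possible read start positions (target size times number of strands).
    A lower bound for random sequence, not a guarantee for real genomes."""
    if target_bases < 1 or strands < 1:
        raise ValueError("target_bases and strands must be positive")
    positions = target_bases * strands
    length = 1
    while 4 ** length < positions:
        length += 1
    return length


# Whole genome of 3.2 billion bases versus a hypothetical 10 kb selected target.
print(min_unique_read_length(3_200_000_000))
print(min_unique_read_length(10_000))
```

The bound for a small selected target falls within the 8-14 base range mentioned above, while the whole-genome bound approaches the read lengths that mapping-based methods require.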
Having only to obtain a short sequence read-length substantially circumvents the problem of attrition described above. It also opens the way for an alternative, and preferred, means to sequencing by synthesis: undertaking ligation based sequencing on the selected molecules. This process starts by selecting the sequences of interest on the array such that the majority of sequences in a particular array spot are the targeted molecules. For obtaining 10 bases of sequence information, a way of sequencing on selected genomic DNA is therefore not to do iterative extension of the primer by ligation of labeled oligos from solution (as described in Mir et al), where each cycle increases the length of the primer beyond the previous extension length as in sequencing-by-synthesis. Instead, after reading each individual base the primer is regenerated back to its original position before another cycle is undertaken. Then, for reading every subsequent base in the selected sample target, the oligonucleotide libraries provided in solution are coded at different positions along their length at each cycle. For example, for reading the first base, 4 oligonucleotide libraries are provided where the nucleoside at the oligo end at the ligation junction is a defined base but subsequent bases are degenerate, being randomized and/or universal bases. After the identity of the best ligating oligos is determined, the oligonucleotide can be completely removed. This can be done by making the nucleotide at the ligated oligo end an RNA base, which can be cleaved by alkali or by RNases, such as a cocktail of RNases and/or RNase H. Hence the oligonucleotide that ligates from solution is removed but the primer remains attached to the template (and may have been previously fixed to the template, especially when alkali is used for removal).
Then a second base is read by adding a second set of oligonucleotides, where the second base is defined and the first base and all others except the second base are degenerate. Here the RNA nucleoside is degenerate and the defined base is DNA. This process is iterated for all the positions along the sequence. Hence for a 10 base read, 10 sets of oligo libraries will need to be used. Each set of oligos will comprise 4 libraries; in each library the defined base will be one of A, C, G, or T, and each defined base will carry a corresponding label that can be distinguished from the other 3 labels.
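The library design described above can be sketched as follows. This is an illustrative sketch only: positions are counted outward from the ligation junction (an assumed convention), "N" stands for a degenerate or universal position, and the RNA/DNA chemistry and the labels are not modelled. The function names are hypothetical.

```python
BASES = "ACGT"


def library_patterns(read_length: int, cycle: int) -> dict:
    """Return the four oligo patterns used in one cycle, keyed by the defined
    base. Position 1 is taken to be the base at the ligation junction."""
    if not 1 <= cycle <= read_length:
        raise ValueError("cycle must lie between 1 and read_length")
    patterns = {}
    for base in BASES:
        oligo = ["N"] * read_length   # every position degenerate...
        oligo[cycle - 1] = base       # ...except the one interrogated this cycle
        patterns[base] = "".join(oligo)
    return patterns


def all_libraries(read_length: int) -> list:
    """One set of four libraries per cycle, covering every read position."""
    return [library_patterns(read_length, c) for c in range(1, read_length + 1)]


# A 10 base read needs 10 sets of 4 libraries.
for cycle, libs in enumerate(all_libraries(10), start=1):
    print(cycle, libs)
```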
It may also be possible to extend the read-length further, especially if the interrogation bases furthest away from the ligation junction are cycled first, i.e. in the first cycle base 20 is the defined base; ligation conditions for the distal interrogation positions may need to be more stringent than those closer to the ligation junction and can be optimized by those with basic skills in the art.
The label is typically attached to the non-ligating end of the solution oligo and can therefore terminate extension at each ligation step. Hence, the homopolymer problem which is a feature of Roche 454 and Helicos sequencing is eliminated. Also the addition of all four bases simultaneously ensures fewer addition and wash steps per cycle and lessens the chance of mis-incorporation of bases compared to approaches where each base is added sequentially.
One advantage of sequencing in this way is that the arrays will be re-usable. After every capture and sequencing, at a final step the primer can be regenerated and then the template can be denatured off the array, so that the array is regenerated for further use. This will require the oligos to be attached via strong non-labile linkages. In order to make sure that DNA from one sample has been completely removed and will not contaminate the next sample to undergo selection, exonucleases may need to be used. One way to prevent the oligo probes from being degraded along with residual sample from a previous run is to use exonucleases that are specific for the probe termini attached to the surface; alternatively the oligo chemistry may be an analogue that is resistant to degradation, such as PNA. Selection microarrays for the exome or candidate genes for cancer or heart disease could be mass produced and each used many times. A large candidate gene based case-control study would require only a handful of selection microarrays.
The sequence reconstruction can take into account the sequence of the selection probe, especially when the probe is an allele-specific probe. Preferably this will have a different weighting than sequence obtained by extension. In some embodiments extension will sequence through the probe-primer sequence. In other embodiments the probe-primer will not be sequenced through by primer extension but will be used for sequence reconstruction; a different confidence or quality score will be attached to sequence obtained from hybridisation than to that obtained from extension.
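One way to picture the different weighting of hybridisation-derived and extension-derived sequence is a weighted vote at each position. The following is a hypothetical sketch, not a prescribed base calling algorithm; the weights, source names and function name are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def weighted_base_call(observations, weights):
    """Combine base observations for one position.

    observations: list of (base, source) tuples, e.g. ("A", "extension").
    weights: dict mapping each source to its (assumed) weight.
    Returns the winning base and its share of the total weight."""
    totals = defaultdict(float)
    for base, source in observations:
        totals[base] += weights[source]
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("no weighted observations supplied")
    best = max(totals, key=totals.get)
    return best, totals[best] / total_weight


# Hypothetical weights: extension evidence counts twice as much as probe hybridisation.
weights = {"extension": 1.0, "hybridisation": 0.5}
obs = [("A", "extension")] * 3 + [("G", "hybridisation")] * 2
print(weighted_base_call(obs, weights))
```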
When the sample to be sequenced in the spot is of reduced complexity, compared to the shotgun sequencing of whole genomes, shorter sequence reads will be needed to reconstruct sequence. Fewer sequencing cycles will be needed and therefore the process will be cheaper and quicker. This is useful for diagnostic sequencing.
System for making optimal array design and carrying out sequencing by synthesis
It would be highly advantageous to have a single flexible sequencing system for sequencing as little or as much as required. It would be useful if the cost of sequencing per base was the same, whether one needed to sequence 100 kb or 100 Mb. The 2nd generation sequencing systems that currently exist are cost effective when a lot of sequencing needs to be done but not as effective when a small amount of sequencing needs to be done. Diverse samples can be added to the same plate for sequencing by ABI capillary array sequencing but this is less flexible for the second generation sequencing. Also it would be helpful if, rather than having to wait for a number of samples to be ready for sequencing before an instrument run, an instrument run could occur for just a single sequence.
What is needed is a sequencing system in which instrument time can be tailored to short sequencing runs and reagents can be used in small amounts, such that small sequencing runs are as cost-effective as long sequencing runs or, if not as cost-effective, then at least significantly improved over what is currently available. This can be achieved by inexpensive chip substrates or substrates of which portions can be used on different occasions; an instrument that does not require operations that are redundant for the small sequencing run; reagents that can be aliquoted and used in small volumes; and data acquisition, processing and base calling algorithms that can operate in the context of a small sequencing run.
One solution to this problem is a system comprising an algorithm for calculating the optimal array/cycle design, a database containing the DNA to be sequenced, a computer processor and memory to perform the calculation/computation, a device that incorporates an array manufacture device and a device for carrying out sequencing by synthesis, and a control algorithm to direct operation of the device according to the optimal array/cycle design.
The array may be made by in situ synthesis. Methods include light-directed synthesis and ink-jet synthesis. In the context of resequencing the process starts with information about the region or regions of the genome, or other sequences such as RNA populations, that need to be sequenced. Primers are designed based on the reference sequence and knowledge known in the art about what makes a good primer. This is balanced with the number of cycles that are implemented. For a short sequence, a tiling array of primers may be designed that tile at every consecutive position. Then the sequencing process requires just one cycle. To increase confidence two or more cycles may be carried out so that redundant sequence information is obtained. In the case of a long sequence, primers may be tiled at spacings of several bases apart (depending on the read length achievable by the sequencing process; if it is 1000 bases then the primers will target sequences 1000 bases apart in the sample). The number of sequencing cycles is then at least as many as the gap between the tiled primers, so that the intervening sequence can be obtained. Preferably the sequencing should cover or continue beyond the next primer. This is because primer binding, although it provides some sequence information, may cover up sites of polymorphism which are not discriminated by the primer due to mismatching. Indeed the conditions should be such that the primers are not stopped from working by a mismatch unless allele specificity needs to be obtained by the primer (for allele specific capture).
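The trade-off between primer spacing and cycle number can be sketched as below. This is a simplified illustration, assuming that one sequencing cycle reads one base, that primers are spaced at the achievable read length, and that primer quality rules are ignored; the function name and the example region are hypothetical.

```python
def design_tiling(region_length: int, primer_length: int, max_read_length: int):
    """Return (primer start positions, minimum cycles, preferred cycles).

    Primers are spaced `max_read_length` bases apart. The minimum number of
    cycles fills the gap between adjacent primers; the preferred number reads
    through the sequence covered by the next primer as well."""
    if max_read_length < 1 or primer_length < 1:
        raise ValueError("lengths must be positive")
    if region_length < primer_length:
        raise ValueError("region is shorter than one primer")
    spacing = max_read_length
    last_start = region_length - primer_length
    starts = list(range(0, last_start + 1, spacing))
    min_cycles = max(spacing - primer_length, 1)  # just bridge the gap
    preferred_cycles = spacing                    # also cover the next primer
    return starts, min_cycles, preferred_cycles


# Long region: primers 1000 bases apart, many cycles.
print(design_tiling(5000, 25, 1000))
# Short region: primers at every consecutive position, a single cycle.
print(design_tiling(30, 25, 1))
```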
The sample genomic DNA is made single stranded and then passed over the array to perform a hybridization reaction. The array may be inside a microfluidic conduit. Efficient mixing methods are used to ensure fast hybridization kinetics. If necessary the sample DNA can be iteratively enriched by amplification of selected DNA and passing over the array repeatedly. Following selection the unreacted DNA is washed away with appropriate stringency washes. That the targeted regions have reacted can be tested by imaging the array by using intercalating dyes and highly sensitive CCD based imaging.
Set up of automated reagent addition and imaging system
A system is set up for carrying out sequencing cycles and for detecting the signal. This involves: (A) software-controlled automated CCD-based imaging system; (B) flow cell; (C) software-controlled automated syringe-pump based reagent exchange system; (D) integrated microfluidic fluid exchange and CCD imaging; (E) streaming of raw data onto a hard drive. The use of high resolution stage movements, a high numerical aperture objective lens and EM-CCD cameras enables analysis to be done at the single molecule level. The raw data generated will be processed. As an alternative the Polonator from Dover Systems can be used to carry out the work. Although current iterations of this technology require coverglass thickness with a coated back as available from Dover Systems, the instrument can be adapted for imaging slides or alternatively the array can be created on compatible substrates. For manual sequencing a 4 laser microarray scanner can be used, or a single or dual laser scanner with appropriate emission filters, depending on the labeling system used. Software for extracting information from the images includes the tissue microarray drop-in available for MetaMorph software; GenePix or similar software can also be used.
Capturing problem sequences
Regions that remain refractory to capture will be explored by testing various high affinity probe chemistries using spotted arrays. Once the best probes have been selected, an optimised array will contain 30-60 probes per exon tiled at every 5-10 bases in the sequence and made using Febit, Roche/Nimblegen or Agilent technology.
The case for sequencing on microarrays
While random array approaches offer high density, they are not tailored to targeted resequencing of selected regions of the genome. By contrast, microarrays enable complex mixtures (e.g. mRNA populations) to be resolved by sorting of specific sequences by hybridization to unique spatial addresses on a surface. This allows microarrays to systematically address whole chromosomes or to select multiple specific genomic regions of choice. However, to date, sequencing on microarrays has been limited to obtaining a single base or less of information per feature. In contrast, sequencing by extension or iterative hybridisation with a complete set of oligos on arrays has the potential to read 100s of bases of sequence from every array feature. When such sequencing is coupled to target capture on a microarray the intrinsic scalability of microarray technology promises massive sequencing throughput. A 2007 GeneChip™ study employed 180 million probe features on a single whole wafer array (Frazer et al). Moore's law, which predicts the rate of miniaturization achievable by the underlying photolithography technology, would suggest that by 2009 we may have 360 million features. It is not then inconceivable that on such an array a read as short as 10 bases could be applied to the systematic resequencing of a whole human genome of 3.2 billion bases.
Where random arrays use the reference sequence to compile the sequence, a microarray approach uses the reference sequence at the outset to design a target specific array. Therefore, in order to be informative, the random array approach requires 20-50 bases to anchor the sequence read to a location in the genome. On the other hand, microarray-based resequencing starts with a known location and therefore the sequence read is informative right from the start; sequence assembly is not required, only base calling.
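The feature-count argument made above can be checked with simple arithmetic. The sketch below assumes single-fold, non-overlapping coverage of one strand, which is a simplification; the function name is illustrative.

```python
def bases_per_feature(genome_bases: int, features: int) -> int:
    """Read length each feature must deliver for the features to cover the
    genome once, with no overlap (rounded up to a whole base)."""
    if genome_bases < 1 or features < 1:
        raise ValueError("arguments must be positive")
    return -(-genome_bases // features)  # integer ceiling division


GENOME = 3_200_000_000  # human genome size quoted in the text
for features in (180_000_000, 360_000_000):
    print(features, bases_per_feature(GENOME, features))
```

With 360 million features each feature needs to read about 9 bases, consistent with the statement that a read as short as 10 bases could suffice.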
Examples
The invention will now be described in the following non-limiting examples. It should be borne in mind that the following examples can be further optimised and the composition and concentrations of reagents used can be adjusted by those skilled in the art. Additional components may be added as known in the art and as exemplified in the patents and publications referenced in this document. As many of the required procedures are standard molecular biology procedures, the lab manual Sambrook and Russell, Molecular Cloning: A Laboratory Manual, CSHL Press (www.MolecularCloning.com) can be consulted. Also Eckstein, editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford, 1991) and M. J. Gait (ed.), 1984, Oligonucleotide Synthesis; B. D. Hames & S. J. Higgins (eds.) can be consulted for DNA synthesis. The following handbooks provide useful practical information: Handbook of Fluorescent Probes (Molecular Probes, www.probes.com); Handbook of Optical Filters for Fluorescence Microscopy (www.chroma.com); New High Throughput Technologies for DNA Sequencing and Genomics, Ed. Keith Mitchelson, Perspectives in Bioanalysis Vol 2.
Example 1: Capture and sequence directly from unamplified genomic DNA
Fragmentation of Phi X174 single stranded DNA
Fifty microlitres of single stranded Phi X174 DNA (1ug/ul, New England Biolabs, Ipswich, MA, USA) was mixed with 50ul of nuclease-free water (Ambion, Austin, TX, USA) in a 1.5ml Eppendorf tube and placed on ice. The tube was raised to the sonicator probe (Digital Sonifier, Branson, Danbury, CT, USA) taking care not to touch the sides of the tube. A cycle consisted of a sonication burst for 20 seconds at 10% amplitude followed by a pause on ice for 40 seconds. This cycle was carried out a total of 7 times. A 2% agarose gel analysis of the sample against suitable fragment size markers was performed to check that the average fragment size was around 300 bases (range 100-600 bases). If necessary, further cycles would be carried out until the gel analysis confirmed that the average fragment size was close to the expected value. This protocol can be applied to double stranded template as well.
Hybridisation of template on arrays
For each array to be addressed on the 8x 15K Agilent custom microarray slide an 80ul hybridisation mix containing the following was prepared: 40ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 500ng (1ul at 500ng/ul) sonicated single stranded Phi X174 genome (New England Biolabs, Ipswich, MA, USA) or beta-globin DNA and 39ul nuclease free water (Ambion, Austin, TX, USA). The mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top. The Agilent hybridization chamber was assembled as per standard instructions and placed at 65°C for 18-20 hours inside an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was opened in Wash 1 (0.2% SDS / 1x SSC) at room temperature and then briefly dipped in Wash 1 and Wash 2 (1x SSC) also at room temperature. The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, N.J., USA). For double-stranded beta-globin DNA incubation can be carried out at 55°C.
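The composition of the hybridisation mix above can be checked with a short calculation. This is a convenience sketch only; the function and dictionary keys are illustrative names, while the volumes and the DNA stock concentration are those quoted in the protocol.

```python
def summarise_mix(volumes_ul: dict, dna_key: str, dna_ng_per_ul: float):
    """Return (total volume in ul, DNA mass in ng, final DNA ng/ul)."""
    total_ul = sum(volumes_ul.values())
    if total_ul <= 0:
        raise ValueError("mix has no volume")
    dna_ng = volumes_ul[dna_key] * dna_ng_per_ul
    return total_ul, dna_ng, dna_ng / total_ul


hyb_mix = {
    "2x hybridisation buffer": 40.0,
    "sonicated Phi X174 DNA": 1.0,   # stock at 500 ng/ul
    "nuclease-free water": 39.0,
}
total, mass, conc = summarise_mix(hyb_mix, "sonicated Phi X174 DNA", 500.0)
print(total, mass, conc)  # 80 ul total, buffer at 1x, 500 ng DNA
```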
Alternative Hybridisation protocol for unfragmented DNA
An array is spotted in 50% DMSO at 55% humidity onto aminopropyltrimethoxysilane (APTES) coated slides, which are then blocked with, for example, sonicated salmon sperm DNA. Genomic DNA (32 μl of 100 ng/μl in 10 mM Tris-HCl/0.1 mM EDTA, pH 7.5) is then denatured at 98°C for 3 min, followed by 4°C for 10 min, and suspended in 8 μl of 20x SSPE. Hybridisation is carried out at 30°C for 3 hours before the slide is washed twice at 37°C for 10 min in 2x saline phosphate/EDTA (SSPE) (3M NaCl/20mM Na2HPO4/20mM EDTA).
1st Ligation cycle on arrays
For the first cycle of ligation a set of 10mer degenerate oligos with only one base defined in position 10 was used. This base was a cleavable RNA base.
An 80ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 8ul 50% PEG 4000 (Fermentas, Vilnius, Lithuania), 0.8ul Triton X-100 (Sigma Aldrich, St Louis, MO, USA), 2ul (800U = 3200 Cohesive End U) T4 DNA ligase (NEB), 2ul (20U) T4 PNK kinase (NEB), 100pmoles of 10mer degenerate oligonucleotide RNA position 10 C-Cy3 (IBA, Gottingen, Germany) and G-Cy5 (IBA), 400 pmoles 10mer degenerate oligonucleotide RNA position 10 U-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 80ul.
The mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top. The Agilent hybridization chamber was assembled and placed at 46°C for 1 h in an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature. The slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired. The scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 43; OG 60, Cy3 52, Alexa 50.
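The enzyme quantities in the ligation mix above imply fixed stock activities, which makes it straightforward to scale the mix to other volumes. The sketch below derives the per-microlitre activities purely from the figures quoted in this protocol (2ul = 800U = 3200 Cohesive End U of ligase; 2ul = 20U of PNK); they are not taken from a supplier datasheet, and the function name is illustrative.

```python
# Stock activities inferred from the quantities quoted in the protocol.
LIGASE_U_PER_UL = 800 / 2      # T4 DNA ligase units per ul
CEU_PER_UNIT = 3200 / 800      # cohesive end units per ligase unit
PNK_U_PER_UL = 20 / 2          # T4 PNK units per ul


def enzyme_units(ligase_ul: float, pnk_ul: float) -> dict:
    """Convert pipetted enzyme volumes into activity units."""
    ligase_u = ligase_ul * LIGASE_U_PER_UL
    return {
        "ligase_U": ligase_u,
        "ligase_cohesive_end_U": ligase_u * CEU_PER_UNIT,
        "pnk_U": pnk_ul * PNK_U_PER_UL,
    }


print(enzyme_units(2, 2))   # the 80ul mix described above
print(enzyme_units(8, 8))   # the same mix with four times the enzyme
```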
Cleavage reactions on arrays
The ligation products were cleaved by incubating the array with an 80ul mix containing 68ul 1x TED buffer (10mM Tris pH 7.0, 5mM EDTA, 2mM DTT) and 12ul (12U) RiboShredder RNAse blend (Epicentre, Madison, WI, USA). The mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top. The Agilent hybridization chamber was assembled and placed at 37°C for 1 h in an oven incubator fitted with an oscillating table. The slide was washed using the same reagents and wash protocol described above.
2nd Ligation cycle on arrays
For the second cycle of ligation a new set of 10mer randomer oligos (IBA) with a specific DNA base in position 9 were used. Position 10 is still an RNA base but, instead of being a specific base, it is now degenerate. A new ligation mix containing all the reagents described above, with the new oligos taking the place of the old ones, was prepared and placed onto the Agilent gasket slide. After another round of incubation and washes the slide was scanned once more.
Oligonucleotide Libraries with RNA in position 10
Figure imgf000025_0001
Example 2: Capture and sequence in the presence of non-targeted DNA (added complexity)
Hybridisation of Phi X174 template on arrays in the presence of competing M13 DNA
For each array to be addressed on the 8x 15K Agilent custom microarray slide a 35ul hybridisation mix containing the following was prepared: 17.5ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 200ng (0.2ul at 1ug/ul) sonicated single stranded Phi X174 template (New England Biolabs, Ipswich, MA, USA) and 4.3ug (17.3ul at 250ng/ul) sonicated M13mp18 DNA (NEB). The mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top. The Agilent hybridization chamber was assembled as per standard instructions and placed at 65°C for 24 hours inside an Agilent oven incubator fitted with a rotating rack (rotisserie style). At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, N.J., USA).
Ligation cycle on arrays
For the ligation reaction a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
A 35ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 3.5ul (1400U = 5600 Cohesive End U) T4 DNA ligase (NEB), 3.5ul (35U) T4 PNK kinase (NEB), 200pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Gottingen, Germany) and G-Cy3 (IBA), 800 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 35ul.
The mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top. The Agilent hybridization chamber was assembled and placed at 46°C for 1h in an Agilent oven incubator fitted with a rotating rack. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature. The slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired. The scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 56; OG 67, Cy3 55, Alexa 50.
Hybridisation of Phi X174 template on arrays in the presence of competing human genomic DNA
For each array to be addressed on the 8x 15K Agilent custom microarray slide a 40ul hybridisation mix containing the following was prepared: 20ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 1ug (1ul at 1ug/ul) sonicated single stranded Phi X174 template (New England Biolabs, Ipswich, MA, USA), 1ug (5.2ul at 190ng/ul) sonicated human genomic DNA (Promega, Madison, WI, USA) and 13.8ul nuclease-free water (Ambion, Austin, TX, USA). The mix was denatured at 98°C for 5 minutes. After a pulse centrifugation the mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top. The Agilent hybridization chamber was assembled as per standard instructions and placed at 65°C for 24 hours inside an Agilent oven incubator fitted with a rotating rack (rotisserie style). At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, N.J., USA).
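The amount of competing DNA in the two hybridisation mixes of this example can be compared by mass. The sketch below uses only the masses quoted in the protocols and does not attempt to convert them to molar or genome-copy ratios; the function name is illustrative.

```python
def target_fraction(target_ng: float, competitor_ng: float):
    """Return (mass fraction of target, fold mass excess of competitor)."""
    if target_ng <= 0 or competitor_ng < 0:
        raise ValueError("target mass must be positive, competitor non-negative")
    total = target_ng + competitor_ng
    return target_ng / total, competitor_ng / target_ng


# Phi X174 target with M13mp18 competitor: 200 ng against 4.3 ug.
print(target_fraction(200, 4300))
# Phi X174 target with human genomic DNA competitor: 1 ug against 1 ug.
print(target_fraction(1000, 1000))
```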
Ligation cycle on arrays
For the ligation reaction a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
A 40ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 4ul (1600U = 6400 Cohesive End U) T4 DNA ligase (NEB), 4ul (40U) T4 PNK kinase (NEB), 400pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Gottingen, Germany) and G-Cy3 (IBA), 1600 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 40ul.
The mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top. The Agilent hybridization chamber was assembled and placed at 46°C for 1h in an Agilent oven incubator fitted with a rotating rack. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature. The slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired. The scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 56; OG 67, Cy3 55, Alexa 50.
Oligonucleotide Libraries with RNA in position 9
Figure imgf000027_0001
The above labels can be replaced with Atto Dyes. In particular Atto 647N is a good substitute for Cy5 and Atto 488 is a good substitute for Oregon Green. In addition 8mers and 9mers can be used instead of 10mers. In addition, to get a brighter signal, suitable combinations of wavelengths from a Qdot sampler kit (Invitrogen) can be used (QD525, QD565, QD585, QD605, QD655, QD705; all at 1uM, all streptavidin conjugates). Alternatively, Streptavidin-Phycoerythrin conjugate (SAPE) and SAPE-Alexa conjugates (Invitrogen) can be used: streptavidin, Alexa Fluor® 647-R-phycoerythrin conjugate; streptavidin, Alexa Fluor® 610-R-phycoerythrin conjugate; streptavidin, Alexa Fluor® 680-R-phycoerythrin conjugate; streptavidin, R-phycoerythrin conjugate (SAPE). These are not only brighter but also only require a single wavelength to be used for excitation. Because these labels are available as streptavidin conjugates, they can be linked to biotinylated oligos (where biotin has replaced the label in the above table) with a different label used for each library. Fluospheres (Invitrogen) can also be used. The above types of labels can be mixed and matched.
Example 3: Capture and Sequence of Double Stranded DNA
Fragmentation of Betaglobin PCR product
The 400 bp PCR product of a betaglobin exon (BGE3,) was fragmented by DNase I digestion. The ΘOμl reaction mix comprised the following: 17.4ug of PCR product BGE3 (36ul at 438ng/ul) was mixed with 9.5ul 5mM CaCI2 (Fluka, Gillingham, UK), 1ul of diluted DNase I enzyme (0.025U/ul New England Biolabs, Ipswich, MA, USA) and 43.5ul nuclease-free water (Ambion, Austin, TX, USA). The mix was incubated at room temperature for 40 minutes. Then 1OuI of 0.1% SDS were added to the mix and incubated for IOmins at 950C in a Thermomixer Comfort (Eppendorf, Hamburg, Germany) to inactivate the DNase I enzyme. The DNA was ethanol precipitation protocol according to Sambrook & Russell, Molecular Cloning, 2001. The DNA pellet was dissolved in 45μl nuclease free water and its concentration checked on the NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).
Hybridisation and sequencing were done as for Example 1. A hybridisation temperature of 55 °C can be used.
Protocol for oligonucleotide library containing RNA in position 9
Hybridisation of Betaglobin PCR product template on arrays
For each array to be addressed on the 8x 15K Agilent custom microarray slide, an 80ul hybridisation mix containing the following was prepared: 40ul 2x hybridisation buffer (Agilent Technologies, Santa Clara, CA, USA), 568ng (2ul at 284ng/ul) fragmented betaglobin PCR product, and 38ul nuclease-free water (Ambion, Austin, TX, USA). The mix was denatured at 95°C for 7 minutes. After a pulse centrifugation the mix was placed on ice. The Agilent hybridization chamber and slide gasket had been pre-warmed to 55°C. The hybridization mix was placed on the Agilent slide gasket and the Agilent microarray slide was laid on top. The Agilent hybridization chamber was assembled as per standard instructions and placed at 55°C for 22 hours inside an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1 and Wash 2 (1x SSC). The slide was then dried using a compressed air canister (Dust Off, Falcon Safety Products, Branchburg, N.J., USA).
Ligation cycle on arrays
For the ligation reaction a set of 10mer degenerate oligos with only one DNA base defined in position 10 and a cleavable RNA base in position 9 was used.
An 80ul ligation mix containing the following was prepared: 1x T4 ligase buffer (50mM Tris-HCl, 10mM MgCl2, 10mM DTT, 1mM ATP, 25ug/ml BSA, pH 7.5, NEB), 8ul (3200U = 12800 Cohesive End U) T4 DNA ligase (NEB), 8ul (80U) T4 PNK kinase (NEB), 400pmoles of 10mer degenerate oligonucleotide RNA position 9 C-Cy5 (IBA, Göttingen, Germany) and G-Cy3 (IBA), 1600 pmoles 10mer degenerate oligonucleotide RNA position 9 T-Oregon Green (IBA) and A-Alexa594 (IBA). Nuclease-free water (Ambion) was added as required to achieve a final mix volume of 80ul.
The mix was placed on an Agilent gasket slide and then the Agilent microarray slide was placed on top. The Agilent hybridization chamber was assembled and placed at 46°C for 1h in an oven incubator fitted with an oscillating table. At the end of the incubation the slide sandwich was prised open in Wash 1 (0.2% SDS / 1x SSC) and then briefly dipped in Wash 1, Wash 2 (1x SSC) and Wash 3 (0.1x SSC). All wash solutions were kept at room temperature. The slide was air dried and scanned on the ProscanArray 4-color scanner (Perkin Elmer, Waltham, MA, USA) and scans for each fluorophore were acquired. The scanner settings were: constant laser power at 80% for all dyes and the following PMT values: Cy5 58, OG 71, Cy3 65, Alexa 56.
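Because each of the four oligonucleotide libraries carries a distinct dye for its defined base, the scans for one array feature can in principle be turned into a base call by finding the dominant channel. The Python sketch below illustrates that idea only. The function name, the `min_ratio` cut-off and the example intensities are hypothetical, and it assumes the four channels have already been background-corrected and normalised against each other (the scans above were taken at different PMT settings, so raw values are not directly comparable).

```python
# Dye-to-base assignment for the four 10mer libraries, as listed in the
# ligation mix (defined base at position 10 of the ligated oligo).
DYE_TO_OLIGO_BASE = {"Cy5": "C", "Cy3": "G", "OregonGreen": "T", "Alexa594": "A"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities, min_ratio=1.5):
    """Return (oligo_base, template_base) for one array feature.

    intensities: dict mapping dye name -> normalised signal.
    min_ratio: hypothetical cut-off; the strongest channel must exceed the
    runner-up by this factor, otherwise the call is 'N'.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (top_dye, top), (_, second) = ranked[0], ranked[1]
    if top <= 0 or top < min_ratio * max(second, 0.0):
        return "N", "N"
    oligo_base = DYE_TO_OLIGO_BASE[top_dye]
    # The ligated oligo is complementary to the captured template strand.
    return oligo_base, COMPLEMENT[oligo_base]


# Made-up intensities for illustration; not measured data.
clear_feature = {"Cy5": 5200, "Cy3": 410, "OregonGreen": 380, "Alexa594": 295}
ambiguous_feature = {"Cy5": 1000, "Cy3": 900, "OregonGreen": 100, "Alexa594": 80}
print(call_base(clear_feature))
print(call_base(ambiguous_feature))
```

A ratio test is used rather than an absolute threshold so that the call does not depend on the overall brightness of the feature; any real cut-off would need to be set empirically.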
Example 4: Capture and Sequence of RNA
Hybridisation is undertaken for 66h at 65°C in 2x hybridization buffer (10x SSPE, 10x Denhardt's, 10mM EDTA, 0.2% SDS). Wash once with 1M NaCl, 10mM Tris-Cl pH 7.5 and 1mM EDTA. Wash twice at 20°C for 15 minutes with 1x SSC/0.1% SDS.
Example 5: FLAP capture
See description of Figure 1 and 2
Example 6: Whole genome amplification and capture
1-10ng of genomic DNA is taken and MDA is performed according to the manufacturer's protocol to produce either 10ug (Qiagen; REPLI-g™ Mini kit) or 40ug (Qiagen; REPLI-g™ Midi kit) of genomic DNA. The capture reaction is then performed as described in Examples 1, 2 and 3. Extra measures for background reduction, as described in the specification, and bright labels and/or signal amplification can be used.
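The input and yield figures above imply a large fold increase in DNA mass. As a rough illustration (a sketch with an assumed helper name, not part of the protocol), the range can be computed as follows:

```python
def fold_amplification(input_ng, yield_ug):
    """Fold increase in DNA mass, converting the yield from ug to ng."""
    return yield_ug * 1000.0 / input_ng


# Stated yields for the two kits, with 1-10 ng of input DNA.
for kit, yield_ug in (("Mini", 10.0), ("Midi", 40.0)):
    low = fold_amplification(10.0, yield_ug)   # most input, smallest fold
    high = fold_amplification(1.0, yield_ug)   # least input, largest fold
    print(f"{kit}: {low:,.0f}x to {high:,.0f}x")
```

On these figures the amplification is between about one thousand and forty thousand fold, which is why the background reduction measures mentioned above may be useful.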
Example 7: Extracting DNA from a single cell and capture after Multiple Displacement Amplification
DNA from sperm is extracted and two rounds of MDA are performed before sequencing.
Example 8: Extracting DNA from a single cell and direct analysis by capture
RNA is directly extracted onto a nano-array of capture probes containing probes complementary to the PolyA sequence and adjacent variable sequence. Also, DNA or RNA is selectively extracted directly onto a surface and becomes attached to the surface in a random manner.
Example 9: Capture facilitated by extension
Each 30 μl extraction contained 300-500 ng genomic DNA, 1x H-Buffer, which contains a polymerase, dNTPs and biotinylated dNTPs (Qiagen, Cat. # 4340004, 2x initial concentration), and DNase-free water. The DNA is denatured at 95°C for 15 min, followed by a 20 min incubation at 64°C during which the allele-specific oligos anneal and are extended, incorporating dNTPs. The copy can then be sequenced, or a different primer can be used to sequence the captured template.
Example 10: Capture in presence of Helper DNA
Fragmented genomic sample DNA (optionally amplified; up to 4ug) is added to 45 mM Tris-HCl at pH 8.8, 11 mM ammonium sulphate, 4.5 mM MgCl2, 6.7 mM 2-mercaptoethanol, 4.4 μM EDTA, plus 2μg/mL single-stranded high-molecular-weight herring sperm DNA and 1.5 μM helper oligo in a volume of 20ul. Up to 16ug of DNA can be processed in an 80ul hybridization volume. The hybridization solution is denatured at 96°C (preferably directly on the array) for 75 seconds to 150 seconds. A thermocycler capable of taking slides, such as the G-Storm, is used, and the temperature is stepped down from 75 to 66 to 65°C over 3 minutes and held at 65°C for a minimum of 2 minutes. Alternatively, after the step down the temperature can be held for longer. The array is washed at room temperature with 50 μL 45 mM Tris-HCl at pH 8.8, 11 mM ammonium sulphate, 4.5 mM MgCl2, 6.7 mM 2-mercaptoethanol, 4.4 μM EDTA, containing 10 μg/mL BSA (Ultra Pure, Ambion), and then with a solution of the same composition for 1 min at 65°C. In general the helper oligonucleotide targets the opposite strand of the duplex to the array probe. The helper oligonucleotide may be longer than the array probe or may form more stable base-pairs, such as C-5 propyne modified DNA, LNA or PNA. Also the helper probe may initiate a solution-based primer extension reaction. The helper oligo may span the region of interest with the array oligo lying at the centre. Preferably any unreacted reactor oligos are removed before hybridisation. This can be done by passing the DNA through a chromospin or other column with appropriate cut-off. Alternatively, where allele-specific capture is required because a specific haplotype is to be sequenced, the helper oligo may not be complementary to the array oligo but may be a probe complementary to the non-targeted allele.
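The denaturation, step-down and hold described above can be written down as a simple temperature schedule. The Python sketch below is one possible reading, not an instrument program: the `Step` type and `capture_profile` function are hypothetical names, no thermocycler API is used, and it assumes that the 3-minute step-down is divided equally between the 75, 66 and 65 °C plateaus, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    seconds: float


def capture_profile(denature_s=75.0, hold_s=120.0):
    """Build the denature / step-down / hold schedule as a list of Steps."""
    if not 75.0 <= denature_s <= 150.0:
        raise ValueError("denaturation time should be 75-150 s")
    if hold_s < 120.0:
        raise ValueError("final hold should be at least 2 minutes")
    step_down = [75.0, 66.0, 65.0]
    # Assumption: the 3-minute step-down is split equally between plateaus.
    per_step = 180.0 / len(step_down)
    profile = [Step(96.0, denature_s)]
    profile += [Step(t, per_step) for t in step_down]
    profile.append(Step(65.0, hold_s))
    return profile


profile = capture_profile()
for step in profile:
    print(f"{step.temp_c:5.1f} C for {step.seconds:5.0f} s")
```

Passing a larger `hold_s` models the alternative of holding the temperature for longer after the step-down.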
Example 11: Clonal Bead Amplification and Capture
Clonal bead amplification of human genomic DNA is carried out as described in Porreca GJ, Shendure J, Church GM. Polony DNA sequencing. Curr Protoc Mol Biol. 2006 Nov; Unit 7.8. Basic Protocols 1 and 2 described in this reference are carried out. Basic Protocol 3 is optional. The recovered beads are then resuspended in hybridization buffer and hybridized to the array. Optionally, if high background is obtained, the beads and/or the array are blocked before hybridisation. The beads can be blocked with biocytin, BSA, casein or Denhardt's solution. Reagent kits are available from Dover Systems (Salem, New Hampshire). In order to obtain efficient hybridization, the magnetic beads can be subjected to a switching magnetic field to facilitate mixing and the release of beads bound to the wrong surface-attached probes.
Example 12: Capture and Colony Amplification
Aminated capture and amplification oligonucleotides are coated onto a BTA coated glass slide (see Turcatti et al). The capture reaction is carried out as described in Examples 1, 2 and 3. The stringency of the wash required can be established empirically by testing increasing temperatures of wash components. A wash temperature range between 4 and 90 °C can be investigated. Colony amplification is then carried out according to Turcatti et al as above.
References
The references are incorporated herein to represent the cited documents in their entirety so that the current state of the art related to the field of invention can be practiced by those with basic skills in the art.
Next-Generation Sequencing of Plasma/Serum DNA: An Emerging Research and Molecular Diagnostic Tool. Lo YM, Chiu RW. Clin Chem. 2009 Feb 20. [Epub ahead of print]
Next-Generation Sequencing: From Basic Research to Diagnostics. Voelkerding KV, Dames SA, Durtschi JD. Clin Chem. 2009 Feb 26. [Epub ahead of print]
Fluorescent Detection and Isolation of DNA Variants Using Stabilized RecA-Coated Oligonucleotides. Michael C. Rice, Brandy M. Heckman, Yi Liu, and Eric B. Kmiec. Genome Res. 2004 January; 14(1): 116-125.
Z. Guo, L. Hood, M. Malkki, and E. W. Petersdorf. Long-range multilocus haplotype phasing of the mhc. Proceedings of the National Academy of Sciences of the United States of America, 103(18):6964-6969, May 2006
Gnirke et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotech, 27(2): 182-189, February 2009.
Y.Z. Xu and E.T. Kool, Tetrahedron Lett., 1997, 38, 5595-5598.
Dressman D, Yan H, Traverso G, Kinzler KW, and Vogelstein B. Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proceedings of the National Academy of Sciences of the United States of America 100(15):8817-22, 2003 Jul 22
Braslavsky et al. Sequence information can be obtained from single DNA molecules. PNAS 100, 3960-4 (2003); Callow et al, Nucleic Acids Research 2004, 32: e21; and Dahl et al, Nucleic Acids Research 2005, 33: e71
BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Milan Fedurco, Anthony Romieu, Scott Williams, Isabelle Lawrence, and Gerardo Turcatti. Nucleic Acids Res. 2006; 34(3): e22
Nagy et al, Tissue Antigens 2007, p 176-180.
Jeffreys AJ and CA May, Genome Research 2003 13: 2316-2324
Robert B Brown and Julie Audet. Current techniques for single-cell lysis. J R Soc Interface. 2008 October 6; 5(Suppl.2): 131-138.
Single-Cell Chemical Lysis Method for Analyses of Intracellular Molecules Using an Array of Picoliter-Scale Microwells. Yasuhiro Sasuga, Tomoyuki Iwasawa, Kayoko Terada, Yoshihiro Oe, Hiroyuki Sorimachi, Osamu Ohara, Yoshie Harada. Analytical Chemistry 2008 80 (23), 9141-9149
Mitra, R.D. et al. Digital genotyping and haplotyping with polymerase colonies. Proc. Natl. Acad. Sci. USA 100, 5926-5931 (2003)
F. Dahl, M. Gullberg, J. Stenberg, U. Landegren, and M. Nilsson. Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments. Nucleic Acids Res 2005 33:e71
C. Adessi, G. Matton, G. Ayala, G. Turcatti, J.-J. Mermod, P. Mayer, and E. Kawashima. Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms. Nucleic Acids Res., October 15, 2000; 28(20): e87 - e87.
M. H. Shapero, K. K. Leuther, A. Nguyen, M. Scott, and K. W. Jones. SNP Genotyping by Multiplexed Solid-Phase Amplification and Fluorescent Minisequencing. Genome Res., November 1, 2001; 11(11): 1926 - 1934.
Mir KU, Southern EM. Sequence variation in genes and genomic DNA: methods for large-scale analysis. Annu. Rev. Genomics Hum. Genet. (2000) 1:329-360
Blazej RG, Kumaresan P, Mathies RA. Microfabricated bioprocessor for integrated nanoliter-scale Sanger DNA sequencing. Proc. Natl Acad. Sci. USA (2006) 103:7240-7245
Fredlake CP, Hert DG, Mardis ER, Barron AE. What is the future of electrophoresis in large-scale genomic sequencing? Electrophoresis (2006) 27:3689-3702
Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM. Accurate multiplex polony sequencing of an evolved bacterial genome. Science (2005) 309:1728-1732
Frazer KA, Eskin E, Kang HM, Bogue MA, Hinds DA, Beilharz EJ, Gupta RV, Montgomery J, Morenzoni MM, Nilsen GB, et al. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature (2007) 448:1050-1053
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature (2005) 437:376-380
Pihlak A, Bauren G, Hersoug E, Lönnerberg P, Metsis A, Linnarsson S. Rapid genome sequencing with short universal tiling probes. Nat. Biotechnol. (2008) 26:676-684
Metzker ML. Emerging technologies in DNA sequencing. Genome Res. (2005) 15:1767-1776.
Epstein JR, Ferguson JA, Lee KH, Walt DR. Combinatorial decoding: an approach for universal DNA array fabrication J. Am. Chem. Soc. (2003) 125:13753-13759
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature (2008) 452:872-876
Ecker DJ, Vickers TA, Hanecak R, Driver V, Anderson K. Rational screening of oligonucleotide combinatorial libraries for drug discovery. Nucleic Acids Res. (1993) 21:1853-1856.
Stavis SM, Edel JB, Li YG, Samiee KT, Luo D, Craighead HG. Detection and identification of nucleic acid engineered fluorescent labels in submicrometre fluidic channels. Nanotechnology (2005) 16:S314-S323
Li Y, Cu YT, Luo D. Multiplexed detection of pathogen DNA with DNA-based fluorescence nanobarcodes. Nat Biotechnol. (2005) 23:885-88
Mauger F, Jaunay O, Chamblain V, Reichert F, Bauer K, Gut IG, Gelfand DH. SNP genotyping using alkali cleavage of RNA/DNA chimeras and MALDI time-of-flight mass spectrometry. Nucleic Acids Res. (2006) 34:e18
Tian J, Gong H, Sheng N, Zhou X, Gulari E, Gao X, Church G. Accurate multiplex gene synthesis from programmable DNA microchips. Nature (2004) 432:1050-1054
Lee HJ, Wark AW, Li Y, Corn RM. Fabricating RNA microarrays with RNA-DNA surface ligation chemistry. Anal. Chem. (2005) 77:7832-7837
Mir KU. Ultrasensitive RNA profiling: counting single molecules on microarrays. Genome Res. (2006) 16:1195-1197
Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, et al. Single-molecule DNA sequencing of a viral genome. Science (2008) 320:106-109
Deng JY, Zhang XE, Mang Y, Zhang ZP, Zhou YF, Liu Q, Lu HB, Fu ZJ. Oligonucleotide ligation assay-based DNA chip for multiplex detection of single nucleotide polymorphism. Biosens. Bioelectron (2004) 19:1277-1283.
Fedurco M, Romieu A, Williams S, Lawrence I, Turcatti G. BTA, a novel reagent for DNA attachment on glass and efficient generation of solid-phase amplified DNA colonies. Nucleic Acids Res. (2006) 34:e22
Shchepinov MS, Denissenko MF, Smylie KJ, Worl RJ, Leppin AL, Cantor CR, Rodi CP. Matrix-induced fragmentation of P3-N5' phosphoramidate-containing DNA: high-throughput MALDI-TOF analysis of genomic sequence polymorphisms. Nucleic Acids Res. (2001) 29:3864-72.
Landegren U, Kaiser R, Sanders J, Hood L. A ligase-mediated gene detection technique. Science (1988) 241:1077-80
Cowie S, Drmanac S, Swanson D, Delgrosso K, Huang S, du Sart D, Drmanac R, Surrey S, Fortina P. Identification of APC gene mutations in colorectal cancer using universal microarray-based combinatorial sequencing-by-hybridization. Hum. Mutat. (2004) 24:261-271
Gunderson KL, Huang XC, Morris MS, Lipshutz RJ, Lockhart DJ, Chee MS. Mutation detection by ligation to complete n-mer DNA arrays. Genome Res. (1998) 8:1142-53
Housby JN, Southern EM. Fidelity of DNA ligation: a novel experimental approach based on the polymerisation of libraries of oligonucleotides. Nucleic Acids Res. (1998) 26:4259-4266
Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. (2000) 18:630-634.
Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, Zeng K, Malek JA, Costa G, McKernan K, et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res. (2008) 18:1051-63.
Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science (2001) 294:1719-1723.
Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E, Law J, Berntsen T, Chadha M, Hui H, et al. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat. Methods (2004) 1:109-111
Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science (1998) 280:1077-1082
Whiteford N, Haslam N, Weber G, Prugel-Bennett A, Essex JW, Roach PL, Bradley M, Neylon C. An analysis of the feasibility of short read sequencing. Nucleic Acid Res. (2005) 33:e171.
Ju J, Kim DH, Bi L, Meng Q, Bai X, Li Z, Li X, Marma MS, Shi S, Wu J, et al. Four-color DNA sequencing by synthesis using cleavable fluorescent nucleotide reversible terminators. Proc. Natl Acad. Sci. USA (2006) 103:19635-19640
Turcatti G, Romieu A, Fedurco M, Tairi AP. A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis. Nucleic Acids Res. (2008) 36:e25
Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods (2007) 4:903-905
Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, Middle CM, Rodesch MJ, Albert TJ, Hannon GJ, et al. Genome-wide in situ exon capture for selective resequencing. Nat. Genet. (2007) 39:1522-7.
Dahl F, Gullberg M, Stenberg J, Landegren U, Nilsson M. Multiplex amplification enabled by selective circularization of genomic DNA fragments. Nucleic Acid Res. (2005) 33:e71
Porreca GJ, Zhang K, Li JB, Xie B, Austin D, Vassallo SL, LeProust EM, Peck BJ, Emig CJ, Dahl F, et al. Multiplex amplification of large sets of human exons. Nat. Methods (2007) 4:931-936.
Steemers FJ, Chang W, Lee G, Barker DL, Shen R, Gunderson KL. Whole-genome genotyping with the single-base extension assay. Nat. Methods (2006) 3:31-33.
Wellcome Trust Case Control Consortium. Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls. Nature (2007) 447:661-678.
Mir KU, Qi H, Salata O, Scozzafava G. Nucleic Acids Res. 2009 Jan;37(1):e5. Epub 2008 Nov 16. Sequencing by Cyclic Ligation and Cleavage (CycLiC) directly on a microarray captured template.
Mir KU, Southern EM. Sequence variation in genes and genomic DNA: methods for large-scale analysis. Annu Rev Genomics Hum Genet. 2000;1:329-60.
David R. Bentley et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53-59 (6 November 2008)
Methods of Target Enrichment, Gormley NA and West JW WO 2007/057652
RABANI, Ely, Michael; (WO/1996/027025) DEVICE, COMPOUNDS, ALGORITHMS, AND METHODS OF MOLECULAR CHARACTERIZATION AND MANIPULATION WITH MOLECULAR PARALLELISM
WHITE, Wanda, L.B. (WO/2004/092341) DUAL HYBRIDIZATION OF COMPLEX NUCLEIC ACID SAMPLES FOR SEQUENCING AND SINGLE-NUCLEOTIDE POLYMORPHISM IDENTIFICATION
(WO/2006/117541) DEVICES AND PROCESSES FOR ANALYSING INDIVIDUAL CELLS. SOUTHERN, Edwin; (GB). MEULEMAN, Wouter; (GB). LUEERSSEN, Dietrich, Wilhelm, Karl; (GB). MILNER, Natalie; (GB).
(WO/2005/040425) NUCLEIC ACID SEQUENCING METHODS. SHCHEPINOV, Mikhail, S.; (GB); MIR, Kalim; (GB).
(WO/2002/074988) ARRAYS AND METHODS OF USE. MIR, Kalim; (GB).

Claims
1 A method comprising the steps of i) selection of one or more entities from a plurality of entities, ii) characterisation of selected entities
2 A method of characterising one or more molecules from a mixture comprising the steps of
• Taking one or more population(s) of molecules
• Selecting one or more molecule(s) from the population(s)
• Characterising one or more selected molecule(s)
3 A method according to 2 where, if more than one population is selected, the different populations are differentially tagged
4 A method according to claim 1 and 2 where characterisation comprises enumerating molecules
5 A method according to claim 1 and 2 where characterisation comprises the sequencing of molecules
6 A method according to 4 and 5 where polynucleotides are characterised
7 A method according to claim 1 and 2 where the selection is carried out on an array
8 A method according to 7 where the array comprises capture probes
9 A method according to 8 where the capture probes serve as primers for extension
10 A method according to 7 where the capture probe is distinct to the primer for extension
11 A method according to 1 and 2 where primer extension is undertaken on the selected molecules
12 A method according to 11 wherein the primer extension characterises the molecules
13 A method according to 7 where the array is a spatially addressable array
14 A method according to 7 and 8 where the target molecules are attached to the selector probes at high stringency
15 A method according to 14 where capture temperature is selected from 50-55°C, 55-60°C, 60-65°C, 65-70°C or 70-75°C or a combination thereof
16 A method according to 8 where capture is thermocycled
17 A method according to 7 where the target molecules are attached to a bead, nanoparticle, or a DNA nano-ball
18 A method according to 17 where the bead binds to the array at a location determined by the sequence of molecules on its surface
19 A method according to 4 and 5 where both selection and sequencing and/or enumeration are carried out directly on an array
20 A method according to 8 where the captured molecules are fixed to the surface
21 A method according to 20 where fixing is by UV irradiation
22 A method according to 20 where fixing is via a flap mediated reaction
23 A method for selectively sequencing portions of one or more complex genome sample(s) comprising the steps:
(i) Obtaining sample genomic DNA and an array of selection probes complementary to the portions of the genome to be selected
(ii) Optionally amplifying the sample genome (s) in a non-locus specific manner
(iii) preparing the sample DNA so that it is amenable to hybridisation including but not limited to fragmentation, digestion and/or denaturation
(iv) hybridising the sample DNA to the array of selection probes under conditions that deselect hybridisation of non-targeted DNA
(v) optionally fixing the selected sample DNA to the array probes
(vi) carrying out wash steps to remove de-selected DNA
(vii) carrying out sequencing of the selected sample DNA while it remains in association with the array probes; wherein sequencing of the selected sample DNA may comprise but is not limited to the steps
(a) hybridisation of four libraries of oligonucleotides to the selection probe/ sample DNA complexes across the array, where each library comprises a defined base, a label exclusively identifying the defined base and a number of non-defined bases
(b) carrying out a template-directed ligation reaction using the selection probe as a primer and the sample DNA as a template where optionally there is no reagent exchange between steps a and b
(c) carrying out one or more wash steps
(d) detection of label across the array
(e) removal of extended portion of the primer to regenerate the non-extended primer- template duplex
(f) hybridisation of four libraries of oligonucleotides to the selection probe/ sample DNA complexes across the array, where each library comprises a defined base, a label exclusively identifying the defined base and a number of non-defined bases, wherein the defined base is different from any previous defined base
(g) repeat b-f
24 A method according to 23 where a-g is replaced with sequencing by synthesis
25 A method according to 23 where a-g is replaced with sequencing by hybridisation
26 A method according to 5 where if more than one genome is to be sequenced optionally tagging each sample genomic DNA so that one can be distinguished from another and performing digital detection and analysis
PCT/GB2009/000601 2008-03-04 2009-03-04 Multiplex selection and sequencing WO2009109753A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0804024A GB0804024D0 (en) 2008-03-04 2008-03-04 Multiplex selection
GB0804024.8 2008-03-04

Publications (2)

Publication Number Publication Date
WO2009109753A2 true WO2009109753A2 (en) 2009-09-11
WO2009109753A3 WO2009109753A3 (en) 2010-03-11

Family

ID=39315935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2009/000601 WO2009109753A2 (en) 2008-03-04 2009-03-04 Multiplex selection and sequencing

Country Status (2)

Country Link
GB (1) GB0804024D0 (en)
WO (1) WO2009109753A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993021340A1 (en) * 1992-04-22 1993-10-28 Medical Research Council Dna sequencing method
WO2001023610A2 (en) * 1999-09-29 2001-04-05 Solexa Ltd. Polynucleotide sequencing
US20020081588A1 (en) * 1998-06-24 2002-06-27 Therasense, Inc. Multi-sensor array for electrochemical recognition of nucleotide sequences and methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MIR KALIM U ET AL: "Sequencing by Cyclic Ligation and Cleavage (CycLiC) directly on a microarray captured template." NUCLEIC ACIDS RESEARCH JAN 2009, vol. 37, no. 1, January 2009 (2009-01), page e5, XP002542688 ISSN: 1362-4962 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8741564B2 (en) 2011-05-04 2014-06-03 Htg Molecular Diagnostics, Inc. Quantitative nuclease protection assay (QNPA) and sequencing (QNPS) improvements
US20150361422A1 (en) * 2014-06-16 2015-12-17 Agilent Technologies, Inc. High throughput gene assembly in droplets
WO2015195257A1 (en) * 2014-06-16 2015-12-23 Agilent Technologies, Inc. High throughput gene assembly in droplets
CN106460231A (en) * 2014-06-16 2017-02-22 安捷伦科技有限公司 High throughput gene assembly in droplets
US10746734B2 (en) 2015-10-07 2020-08-18 Selma Diagnostics Aps Flow system and methods for digital counting
US11693001B2 (en) 2015-10-07 2023-07-04 Selma Diagnostics Aps Flow system and methods for digital counting
US11035854B2 (en) 2016-07-29 2021-06-15 Selma Diagnostics Aps Methods in digital counting
US10982351B2 (en) 2016-12-23 2021-04-20 Grail, Inc. Methods for high efficiency library preparation using double-stranded adapters
US11274344B2 (en) 2017-03-30 2022-03-15 Grail, Inc. Enhanced ligation in sequencing library preparation
WO2018183897A1 (en) * 2017-03-31 2018-10-04 Grail, Inc. Higher target capture efficiency using probe extension
US11118222B2 (en) 2017-03-31 2021-09-14 Grail, Inc. Higher target capture efficiency using probe extension
CN112840036A (en) * 2018-08-15 2021-05-25 深圳华大生命科学研究院 Gene chip and preparation method thereof

Also Published As

Publication number Publication date
WO2009109753A3 (en) 2010-03-11
GB0804024D0 (en) 2008-04-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09718218

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09718218

Country of ref document: EP

Kind code of ref document: A2