WO2009055508A1 - A method and system for obtaining ordered, segmented sequence fragments along a nucleic acid molecule


Publication number
WO2009055508A1
WO2009055508A1 PCT/US2008/080843 US2008080843W WO2009055508A1 WO 2009055508 A1 WO2009055508 A1 WO 2009055508A1 US 2008080843 W US2008080843 W US 2008080843W WO 2009055508 A1 WO2009055508 A1 WO 2009055508A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
polymerase
acid molecule
acceptor
fret
Application number
PCT/US2008/080843
Inventor
Susan Hardin
Mitsu Sreedhar Reddy
Tommie Lloyd Lincecum, Jr.
Anelia Kraltcheva
Uma Nagaswamy
Alok N. Bandekar
Hongyi Wang
Original Assignee
Life Technologies Corporation
Application filed by Life Technologies Corporation
Priority to JP2010531213A (patent JP2011505119A)
Priority to EP08843207A (patent EP2203568A1)
Publication of WO2009055508A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • TITLE: A METHOD AND SYSTEM FOR OBTAINING ORDERED, SEGMENTED SEQUENCE FRAGMENTS ALONG A NUCLEIC ACID MOLECULE
  • nucleic acids. More particularly, provided herein are methods, systems and compositions suitable for real-time single molecule sequencing using ordered, segmented sequence fragments along a nucleic acid molecule.
  • DNA deoxyribonucleic acid
  • first-generation methods require large quantities of the target DNA molecule to be sequenced using time and resource intensive processes.
  • Maxam-Gilbert sequencing involves the chemical cleavage of end-labeled fragments of DNA. The resulting fragments are then size separated by gel electrophoresis, and the sequence of the original end-labeled fragments is determined by analyzing the pattern of fragments produced by the gel. Read lengths using this approach are typically limited to approximately 500 nucleotides. Furthermore, such methods are lengthy, and frequently require amplification of the target DNA to obtain sufficient amounts of starting material.
  • DNA sequencing methodologies generally involve monitoring the activity of a sequencing enzyme, such as DNA polymerase, as it replicates a test DNA molecule by polymerizing monomeric subunits, such as dNTPs, to extend a primer into a newly synthesized DNA strand that complements the test molecule of interest.
  • the polymerization products are analyzed after the sequencing reaction has been terminated, thereby adding to the length of the process.
  • Sanger-dideoxy sequencing involves elongation of an end-labeled nucleotide primer with random incorporation of chain terminating dideoxy nucleotides in four separate DNA polymerase reactions.
  • the extension products must be size separated by gel electrophoresis and the nucleotide sequence may be determined from analyzing the pattern of fragments in the gel.
  • the use of four different fluorescently labeled dideoxynucleotides enables the sequencing reactions to be size separated in a single gel lane, facilitating automated sequence determination. Read lengths utilizing this approach are limited to approximately 1000 nucleotides, and the process can take a few hours to half a day to perform.
  • DNA strands thereby facilitating accurate assembly of contiguous extended nucleic acid sequences.
  • these methods readily facilitate high throughput sequencing in parallel, and ultimately allow the simultaneous sequencing of an entire genome rapidly and cheaply.
  • methods for sequencing at least a portion of a nucleic acid molecule in real time or near real time comprising the steps of displaying a nucleic acid molecule; manipulating the nucleic acid molecule to form one or more polymerase-accessible priming sites along the length of the nucleic acid molecule, wherein the one or more priming sites are separated from each other by a length of nucleotides sufficient to permit independent detection and resolution of sequencing activity occurring at each priming site by a detection system; contacting at least a portion of the nucleic acid molecule with a polymerase solution and one or more detectably labeled components under such conditions that extension occurs from at least one priming site; monitoring signals emitted during the extension reaction by at least one detectably labeled component; and analyzing the signals in real or near real time to determine the sequence of at least a portion of the nucleic acid molecule.
  • At least one of the detectably labeled components comprises a Förster resonance energy transfer (FRET) donor.
  • FRET: Förster resonance energy transfer
  • At least one of the detectably labeled components comprises a FRET acceptor.
  • At least one of the detectably labeled components comprises both a FRET donor and a FRET acceptor.
  • At least one of the detectably labeled components is a polymerase operably linked to a FRET donor.
  • the signals emitted during the extension reaction are a result of
  • the signals emitted during the extension reaction are signals resulting from FRET between a FRET donor and the FRET acceptor.
  • the signals emitted during the extension reaction are FRET signals resulting from energy transfer between at least one intercalated dye molecule and at least one nucleotide labeled with a FRET acceptor.
  • At least one of the detectably labeled components is an intercalating dye.
  • At least one detectably labeled component is a nucleotide operably linked to a FRET acceptor.
  • the FRET acceptor is attached to a portion of the nucleotide that is released upon incorporation of the nucleotide into a nascent nucleotide strand that is synthesized by the polymerase.
  • the FRET acceptor is attached to a portion of the nucleotide that becomes incorporated into a nascent nucleotide strand synthesized by the polymerase, and the sequencing method further comprises the step of removing the acceptor after incorporation.
  • removing the acceptor after incorporation comprises photobleaching the acceptor after incorporation or, alternatively, photocleaving the acceptor after incorporation.
  • displaying the single nucleic acid molecule comprises immobilizing the nucleic acid molecule by attachment to a substrate.
  • immobilizing a polynucleotide strand further comprises providing a substrate including a surface having a layer formulated to immobilize a polynucleotide strand or a plurality of polynucleotide strands in an elongated form.
  • each immobilized polynucleotide strand is attached to the substrate by at least one attachment site.
  • the immobilized polynucleotide strand is immobilized at a plurality of attachment sites situated along its length so that the strand is fixed to the substrate in an elongated form to minimize strand movement during subsequent processing steps.
  • displaying the single nucleic acid molecule comprises introducing the molecule into a nanostructure adapted to receive and display the molecule.
  • manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises annealing one or more oligonucleotide primers along the length of the nucleic acid molecule.
  • one or more oligonucleotide primers is a random primer.
  • one or more oligonucleotide primers is a site-specific primer.
  • manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises contacting the nucleic acid molecule with a nicking reagent adapted to form a plurality of polymerase-accessible nick sites along the length of the nucleic acid molecule.
  • manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises treating the DNA with chemical or enzymatic nicking agents.
  • the polymerase solution comprises at least one type of detectably labeled nucleotide.
  • the detectably labeled nucleotides are added separately from the polymerase solution.
  • the detectably labeled nucleotides are added prior to, or after, the addition of the polymerase solution.
  • the polymerase solution comprises at least two, three or four types of detectably labeled nucleotides.
  • the detectable label of at least one type of detectably labeled nucleotide is a chromophore, fluorophore or luminophore.
  • the detectable label of at least one type of detectably-labeled nucleotide is selected from the group consisting of: ROX, Cy3, Cy5, xanthine dye, fluorescein, cyanine, rhodamine, coumarin, acridine, Texas Red dye,
  • the polymerase solution comprises a polymerase.
  • the polymerase is an RNA polymerase, DNA polymerase or reverse transcriptase.
  • the DNA polymerase is a Klenow fragment of DNA polymerase I, PM29 DNA polymerase, B54 DNA polymerase, 9°N DNA polymerase, Vent DNA polymerase, Deep Vent DNA polymerase, E. coli DNA polymerase I, T7 DNA polymerase, T4 DNA polymerase, Thermus aquaticus DNA polymerase, or Thermococcus litoralis DNA polymerase.
  • the polymerase solution comprises at least one type of detectably labeled nucleotide comprising three, four or more phosphate groups.
  • At least one detectably labeled component comprises a detectably labeled nucleotide, wherein the detectable label is operably linked to a terminal phosphate in the polyphosphate chain of the detectably labeled nucleotide.
  • At least one detectably labeled component comprises a nucleotide operably linked to at least two separate detectable labels.
  • the polymerase solution further comprises a detectably labeled polymerase.
  • At least one of the detectably labeled components is a polymerase operably linked to a nanocrystal or other FRET donor.
  • the nucleic acid molecule comprises chromosomal DNA.
  • the nucleic acid molecule comprises an intact chromosome.
  • the sequencing method further comprises sequencing one or more additional nucleotide strands in parallel with sequencing a first nucleotide strand according to the methods disclosed herein.
  • the detectably labeled components comprise one type of detectably labeled nucleotide and a detectably labeled polymerase.
  • the detectably labeled components comprise a fluorescent moiety that non-specifically associates with the template nucleic acid molecule along the length of the molecule.
  • the fluorescent moiety is a FRET donor.
  • the fluorescent moiety is an intercalating dye.
  • the intercalating dye becomes absorbed into the polynucleotide strand and becomes fluorescently active upon absorption.
  • the polymerase-accessible priming sites are separated by a length of nucleotides sufficient to separate the polymerase-accessible sites by a distance sufficient to permit independent detection of the polymerases on the polymerase-accessible nick sites via a detection system.
  • Also provided herein are methods for sequencing at least a portion of a nucleic acid molecule in real time or near real time comprising the steps of immobilizing a nucleic acid molecule on a substrate; nicking the immobilized nucleic acid molecule to form one or more polymerase-accessible nick sites along the length of the strand; adding an intercalating dye and a polymerase solution, wherein the polymerase solution further comprises a polymerase and one or more detectably labeled nucleotides, under conditions such that an extension reaction is initiated at one or more polymerase-accessible nick sites along the length of the immobilized nucleic acid molecule; monitoring signals emitted during the extension reaction at one or more polymerase-accessible nick sites; and analyzing the signals in real or near real time to determine the sequence of at least some portion of the nucleic acid molecule.
  • the extension reaction extends the polymerase-accessible nick site by a plurality of nucleotides, or by at least 10, 20, 50, 100, 250, 500 or 1000 nucleotides.
  • the extension reaction is monitored by a monitoring subsystem capable of visualizing extension activity along the strand at one or more polymerase-accessible nick sites.
  • the extension reaction is monitored through detection of FRET signals arising from energy transfer from at least one intercalated dye molecule and at least one detectable label of a detectably labeled nucleotide.
  • the sequencing method further comprises the step of converting the detected events into a sequence of identified nucleotides complementary to the non-nicked single strand at the nick site.
  • the distance separating the nick sites is between about 1 Kb to about 250 Kb, between about 2 Kb to about 200 Kb, between about 3 Kb to about 100 Kb, between about 3 Kb to about 50 Kb, between about 3 Kb to about 10 Kb, between about 3 Kb to about 5 Kb, or between about 5 Kb to about 10 Kb.
  • a system for sequencing a nucleotide strand by obtaining ordered sequence fragments along a polynucleotide strand comprising a reaction chamber comprising a substrate on which at least one polynucleotide strand can be immobilized and nicked; a monitoring subsystem capable of detecting signals from extension activity occurring at the nick sites along the at least one polynucleotide strand; and an analyzing subsystem that converts the signals detected from extension activity into sequence information and then maps sequence fragments along the length of the at least one polynucleotide strand in such a manner that ordered sequence fragment information is obtained for nucleic acid identification and classification.
  • a system for sequencing DNA by obtaining ordered sequence fragments along a polynucleotide strand comprising a reaction chamber comprising a substrate on which at least one polynucleotide strand can be nicked and immobilized; a monitoring subsystem capable of detecting signals from extension activity occurring at the nick sites along the at least one polynucleotide strand; and an analyzing subsystem that converts the signals detected from extension activity into sequence information and then maps sequence fragments along the length of the at least one polynucleotide strand in such a manner that ordered sequence fragment information is obtained for nucleic acid identification and classification.
  • Figure 1 depicts a visual characterization of one embodiment of the sequencing methods and systems of the present disclosure.
  • Figure 2 depicts fluorescent spectra of four intercalating dyes.
  • Figure 3A depicts SYBR Green I average intensity from a user-defined Region of Interest (ROI) containing a DNA fragment, relative to average background intensity.
  • Figure 3B depicts YOYO-1 average intensity within a user-defined ROI, relative to average background intensity.
  • Figure 4A depicts spectra of YOYO-1 and four fluorescent acceptors.
  • Figure 4B depicts spectra of a quantum dot (Qdot 525) and four fluorescent acceptors.
  • Figure 5 depicts images of background fluorescence from acceptor-labeled nucleotide on glass substrates coated with H — I — h and PEBN layers.
  • Figure 6 depicts images of DNA nicked with a site-specific nickase, incubated with acceptor-labeled nucleotide (dU-Cy5) and polymerase, mixed with the intercalating dye
  • Figure 7 depicts images of DNA nicked with a site-specific nickase, incubated with acceptor-labeled nucleotide (dU-A1610) and polymerase, mixed with the intercalating dye YOYO-1 and immobilized on a PEBN-coated surface.
  • Figure 8 depicts images of DNA nicked with a site-specific nickase, incubated with acceptor-labeled nucleotide (dU-Cy5) and polymerase, mixed with the intercalating dye
  • Figure 9 depicts images of DNA nicked with a site-specific nickase, immobilized on a PEBN-coated surface, incubated with acceptor-labeled nucleotides and polymerase, and mixed with the intercalating dye YOYO-1.
  • Figure 10 depicts pictorially one exemplary embodiment of the sequencing compositions and methods disclosed herein, using a donor labeled polymerase, acceptor labeled nucleotides, and unlabeled surface-immobilized DNA template.
  • Figure 11 illustrates the detection of incorporation events that occur using the methodology of Figure 10, and depicts a comparison of various attributes of donor segments, made between different polymerases binding to immobilized duplex on a surface.
  • Figure 12 depicts a schematic for monitoring FRET emissions arising from incorporation of base-labeled nucleotides in real time.
  • Figure 13 depicts assessment of acceptor signal using SYBR Green I as the FRET donor.
  • Figure 14 depicts assessment of acceptor signal using SYBR Green I as the FRET donor in a single molecule assay.
  • Figure 15 depicts assessment of acceptor signal using SYBR Green I as the FRET donor.
  • Figure 16 depicts images of Lambda (λ) DNA incubated with a mixture containing DNase I, acceptor-labeled nucleotides and polymerase, then mixed with the intercalating dye YOYO-1, and immobilized on PEBN-coated surfaces.
  • Figure 17 depicts a real-time incorporation trace at right showing 20 ASN with base-labeled dNTP ("BL dNTP"); donor dipping is clearly detectable in the trace.
  • Figure 18 depicts the overlay of the segmented object after automatic segmentation and registration of the fluorescence image to identify FRET events.
  • Panel 18(A) depicts an average intensity image in the donor channel; the user-defined box represents the region of interest (ROI) in the image.
  • Panel 18(B) depicts the overlay of the segmented object on top of the average intensity image in the donor channel.
  • Panel 18(C) depicts an average intensity image in the acceptor channel; the user-defined box represents the ROI in the image.
  • Panel 18(D) depicts the overlay of the segmented object on top of the average intensity image in the acceptor channel.
  • Figure 19: Panel 19(A) depicts the overlay of the segmented object on top of the average intensity image in the donor channel.
  • Panel 19(B) depicts the overlay of the segmented object on top of the average intensity image in the acceptor channel.
  • Panel 19(C) depicts the overlay of the registered object with respect to the donor channel on top of the average intensity image in the acceptor channel.
  • Figure 20: The topmost graph of Panel 20(A) depicts the normalized intensity profile of the segmented object in the donor channel.
  • the middle graph of Panel 20(A) depicts the normalized intensity profile of the segmented object in the acceptor channel.
  • Panel 20(B) depicts co-localization of the detected points in the acceptor channel via Argon 488 nm or Red HeNe excitation, confirming the accuracy to the level of pixel registration of the automated analysis.
  • the term “a” or “an” means “at least one” or “one or more”.
  • the terms “comprising” (and any form or variant of comprising, such as “comprise” and “comprises”), “having” (and any form or variant of having, such as “have” and “has”), “including” (and any form or variant of including, such as “includes” and “include”), or “containing” (and any form or variant of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, unrecited additives, components, integers, elements or method steps.
  • compositions and methods of this invention have been described in terms of preferred embodiments, these embodiments are in no way intended to limit the scope of the claims, and it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the methods described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
  • the sequencing methods, systems and compositions of the present disclosure collectively provide rapid sequencing of a single polymeric molecule of interest, such as a nucleic acid, by monitoring of emitted signals. More specifically, the present disclosure provides a method for obtaining ordered, segmented sequence fragments along a polynucleotide strand.
  • the nucleic acid molecule to be sequenced is oriented, or otherwise displayed, in a spatially addressable format before or after being subjected to an extension (i.e., sequencing) reaction in situ. Either prior to or after display and/or extension, the nucleic acid molecule is also treated so as to introduce or form a plurality of polymerase-accessible priming sites along the length of the molecule, where adjacent priming sites on a strand are separated by a length of nucleotides sufficient to permit independent detection and resolution by a detection system.
  • Treatment of the nucleic acid molecule to introduce priming sites may be performed before, after or concurrently with the elongation/display step and/or the extension step; the order of steps is immaterial.
  • the nucleic acid molecule is contacted with a polymerase solution and with other components of the sequencing machinery under conditions such that the polymerase extends the nucleic acid strand from at least one priming site by polymerizing nucleotides onto a free 3' end of the nucleic acid molecule.
  • the polymerase and/or components of the sequencing machinery are operably linked to, or otherwise associated with, detectable labels that emit signals as the sequencing reaction proceeds. These signals are detected and analyzed in real time or near real time to obtain sequence information for at least some portion of the nucleic acid molecule.
  • the nucleic acid molecule to be sequenced is DNA or RNA; however, in some cases it can also be a polymer comprising nucleotide analogs capable of polymerization by a polymerase.
  • the nucleic acid is displayed by maintaining it in a spatially addressable format, such that signals emitted from a specific and discrete point, portion, region or terminus of the nucleic acid molecule can be visualized, resolved, assigned to their site of origin on the nucleic acid molecule, and tracked over time.
  • Any suitable method may be used to display the nucleic acid molecule, including but not limited to fixation or immobilization of the molecule on a surface, suspending the nucleic acid molecule in a laminar flow stream, passing the nucleic acid molecule to a nanopore, confining the nucleic acid molecule within a waveguide or within a suitable nanostructure, e.g., a nanotube, nanowell, nanochannel or the like, adapted to receive and display the nucleic acid molecule, or using optical tweezers to hold and restrict the nucleic acid molecule during the detection step.
  • the nucleic acid molecule may be displayed by moving the molecule relative to a detection station, such that signals emitted from the molecule are tracked along the length of the molecule and assigned to their point of origin along the nucleic acid molecule.
  • the nucleic acid molecule is displayed by fixing or otherwise immobilizing the nucleic acid molecule on a two-dimensional surface on a substrate. Any suitable substrate may be used for immobilization of the nucleic acid molecule, including substrates that exhibit non-specific adherence to nucleic acid, or substrates to which nucleic acid molecules can be bound.
  • the nucleic acid molecule is immobilized by contacting it with a substrate including a surface having a layer formulated to immobilize a polynucleotide strand or a plurality of polynucleotide strands in an elongated form.
  • the nucleic acid molecule may be bound to the surface of the substrate via a plurality of attachment sites situated along its length, so that the strand is fixed to the substrate in an elongated form to minimize strand movement during subsequent processing steps.
  • individual polymeric molecules are displayed and elongated using a nanofluidic device comprising a nanochannel array, wherein the entire sample population is elongated and displayed in a spatially addressable format.
  • the use of nanofluidic devices for separation and isolation of test polymeric molecules bypasses the requirement for immobilization or attachment of sequencing components to a substrate and also enables the sequencing of intact chromosomes, thereby exponentially increasing the amount of sequencing information obtained from a single reaction and also enabling analysis of such "macro" structural features as methylation, inversions, indels and tandem repeats.
  • nanofluidic devices that permit the simultaneous observation of a high number of macromolecules in a multitude of channels can be employed.
  • Such devices increase the amount of sequence information obtainable from a single experiment and decrease the cost of sequencing of an entire genome. See, for example, U.S. Published App. No. 2004/0197843.
  • semiconductor nanocrystals or analogs thereof operably linked to polymerase activity, polymer sequence data can be generated as labeled monomers are incorporated into a newly synthesized polymer strand by a polymerase, thus enabling the sequencing of polymers in real time.
  • the nanofluidic-based sequencing methods disclosed herein can be used to rapidly obtain both "raw" sequence at the single nucleic acid molecule level as well as validation of incoming sequence information via simultaneous priming at multiple points along the template strand.
  • Manipulation of the DNA includes without limitation any method of treatment that results in formation of one or more priming sites along the length of a nucleic acid molecule, while preserving the ability to obtain resolvable sequence information from the molecule and maintaining the structural integrity of the molecule, i.e., will not induce fragmentation, degradation, disruption or complete breakage of the nucleic acid molecule.
  • the manipulation is performed in such a manner as to ensure that independent sequence information can be determined from each priming site spaced along the length of the DNA strand.
  • the manipulation is refined to obtain optimal spacing of priming sites, such that the priming sites are separated from each other by a length of nucleotides sufficient to make their termini accessible to a polymerase for subsequent extension.
  • the separation length, or distance, between the priming sites can vary from about 1 Kb (kilobase) to about 250 Kb along the nucleic acid strand. In certain embodiments, the separation distance between adjacent priming sites is between about 3 Kb and about 5 Kb along the length of the nucleic acid molecule.
  • One method of introducing priming sites on a test nucleic acid molecule is by annealing the nucleic acid molecule with one or more primers.
  • the primers may either be random, or may be adapted to bind only to certain portions of the nucleic acid molecule.
  • Another typical method used to introduce priming sites into a nucleic acid molecule involves nicking of the molecule by chemical or enzymatic means, or with nicking reagents.
  • Suitable nicking reagents include without limitation any reagent capable of creating a nick in either strand of the polynucleotide, such as, for example, exonucleases, DNases, chemical reagents such as the glycogen product from Fermentas International, Inc., of Ontario, Canada, or any other chemical or biological system capable of introducing nick sites into one or both strands of a nucleic acid molecule.
  • the limited nicking reaction can be performed in solution prior to nucleic acid immobilization as described in Zasloff and Camerini-Otero, 1980, or the limited nicking reaction can be performed after nucleic acid immobilization to ensure that most if not all of the nick sites are polymerase-accessible, if an enzyme is used to nick the polynucleotide.
  • the frequency of extendable nicks is characterized by incorporating a base-labeled nucleotide at the nick sites in solution.
  • the resulting nucleic acid comprising nicked termini is then immobilized on a substrate and visualized using a single-molecule detection system by either direct excitation of the acceptor, or by detection of FRET between a donor dye used to stain the DNA (e.g., SYBR Green I, YOYO-1 or similar intercalating or groove-binding dye) and the incorporated acceptor, as described herein.
  • the nucleic acid molecule must be contacted with a polymerase solution and at least one detectably labeled component. Such contacting may be achieved by any suitable means, in any phase, in any order of addition of reagents, and under any suitable conditions that permit polymerization of nucleotides to ultimately occur.
  • the polymerase solution comprises a polymerase, or any agent that is capable of polymerizing monomeric subunits into polymers, and a suitable buffer.
  • other components of the sequencing machinery such as nucleotides or nucleotide analogs that can be polymerized into the extending strand by the polymerase, are included in the polymerase solution.
  • these additional components may be added to the extension reaction, or otherwise contacted with the nucleic acid template, at any time in the procedure.
  • any and all components of the sequencing/extension reaction may be added to, or otherwise contacted with, the nucleic acid molecule to be sequenced in any order whatsoever that permits productive extension via incorporation of nucleotides into the extending strand.
  • the polymerase and/or other components of the sequencing machinery are operably linked to, or otherwise associated with, a detectable label.
  • the nucleotides of the polymerase solution are detectably labeled.
  • a high efficiency FRET event occurs via energy transfer to the acceptor from donor intercalated dye molecules located 5' (i.e., upstream), or 3' (i.e., downstream), or both 5' and 3', of the nucleotide incorporation site.
  • the detectable signal is a FRET signal generated between a FRET donor moiety and a FRET acceptor moiety.
  • FRET donor: first excited molecule
  • FRET acceptor: second molecule
  • the process of energy transfer results in a reduction (quenching) of fluorescence intensity and excited state lifetime of the FRET donor, and can produce an increase in the emission intensity of the FRET acceptor.
  • FRET occurs only when two appropriately labeled molecules or moieties are sufficiently proximal to each other to transfer energy. Visualization of a FRET event can be achieved via detection of the FRET signal induced by energy transfer from the FRET donor dye moiety to the FRET acceptor moiety.
  • the FRET acceptor is attached to a nucleotide, and the FRET donor is operably linked to, or otherwise directly or indirectly associated with, any component of the sequencing machinery such as the polynucleotide backbone (e.g., phosphate groups), the polynucleotide bases, or the polymerase.
  • the FRET donor is constantly replenished as the sequencing reaction progresses, thus allowing extended read lengths to be obtained.
  • a replenishing donor is a dye that binds with high affinity in a sequence-independent fashion to the nucleic acid, including but not limited to intercalating dyes.
  • the replenishing donor can be a labeled DNA polymerase, which can be replenished by exchanging enzymes as the sequencing reaction progresses.
  • One embodiment of a system using a replenishing donor-labeled polymerase is shown in Figure 10, and involves the use of an unlabeled DNA template immobilized via attachment to a surface, with donor-labeled polymerase and gamma-labeled dNTPs.
  • the donor can be replenished by exchanging enzymes. Further, there is no concern of the duplex disassociating from the enzyme complex. Moreover, incorporation will only occur and be detected when a donor (enzyme) binds to the duplex. This is indicated by the detection of FRET signals between the donor-labeled polymerase and the acceptor-labeled gamma-dNTPs.
  • the use of less processive polymerases is beneficial to the experimental setup because it allows for more rapid exchange of the donor and the donor is less likely to photo-bleach. Experiments are being carried out to determine the most appropriate enzyme for this method.
  • mutant polymerases that have increased activity with gamma-modified dNTPs but exhibit decreased processivity may be used.
  • an ultra-stable donor such as a nanocrystal may be used in place of a replenishing donor.
  • One exemplary embodiment of this method includes a donor nanocrystal stably attached to the polymerase.
  • the FRET donor is an intercalating dye molecule or other fluorescent moiety that has a high affinity for polynucleotides and spontaneously intercalates itself between the bases of, or otherwise associates itself with, a nucleic acid molecule, producing increased fluorescence in the intercalated or associated state.
  • intercalated dye molecules are displaced from a direction 3' of the polymerase and absorbed into a 5' direction of the DNA, thereby constantly replenishing as the polymerization reaction proceeds.
  • When the acceptor-labeled nucleotide is positioned within the polymerase active site for incorporation into the newly synthesized DNA strand by the polymerase, it undergoes FRET with the donor, resulting in emission of a FRET signal that can be detected and characterized. As the acceptor attached to the incorporated nucleotide is removed, a second detectably labeled nucleotide will enter the active site and produce a second high efficiency FRET event.
  • As the polymerase extends the newly synthesized strand by successively adding labeled monomers to the free 3' end of the strand in a template-dependent fashion, the identity of each successive incoming monomer bound and incorporated by the polymerase will be identifiable by the emission spectrum of the FRET acceptor attached to that particular monomer. Accordingly, the base sequence of the newly synthesized strand can be identified by detection and characterization of the time sequence of FRET events, as described below.
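The idea of reading a base sequence from the time sequence of FRET events can be sketched as follows. This is only an illustration: the channel names and the channel-to-base assignment are hypothetical, and a real detection system would first have to detect the events and resolve their emission spectra.

```python
# Hypothetical assignment of acceptor emission channels to bases.
# The source only states that each monomer carries a spectrally
# identifiable acceptor; the specific mapping here is an assumption.
CHANNEL_TO_BASE = {"ch1": "A", "ch2": "C", "ch3": "G", "ch4": "T"}


def call_bases(events):
    """Turn (time, channel) FRET events into a base string ordered by time.

    Events whose channel is not in the mapping are reported as 'N'.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return "".join(CHANNEL_TO_BASE.get(channel, "N") for _, channel in ordered)


if __name__ == "__main__":
    observed = [(2.0, "ch3"), (0.5, "ch1"), (1.1, "ch2")]
    print(call_bases(observed))
```

Sorting by time stamp before mapping reflects the point made above that the order of the FRET events, not just their spectra, carries the sequence information.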
  • the template DNA strand is treated to introduce a multiplicity of priming sites along the length of the strand in such a manner that the priming sites are optimally spaced apart to allow independent detection and resolution.
  • the limits of detection and resolution will depend on the capabilities of the particular detection system employed in the disclosed methods.
  • each sequencing complex along the strand provides not only sequence information about a region contained within the extended fragment, but also information about the placement of each sequence read relative to others obtained from the same strand. In other words, each sequence read along the strand is both discrete and ordered.
  • nucleotide polymerase enzymes in the polymerase solution recognize and bind to the priming sites and initiate extension at the priming site by polymerization of nucleotides and elongation from the priming site
  • detection of signals is performed. Real-time sequencing is achieved by monitoring emissions from the detectable labels attached to various components of the polymerase solution as the extension reaction proceeds.
  • the progress of the sequencing or extension reaction can also be tracked by detecting the dip in donor intensity that accompanies any FRET event involving energy transfer between the FRET donor and acceptor moieties.
  • the ability to detect a dip in donor intensity likely depends on a variety of conditions. Dips in donor intensities can be monitored using standard detection systems.
  • the number of donor fluorophores associated with the nucleic acid can be varied to maximize acceptor FRET.
  • The spacing between a donor fluorophore and the acceptor on the incorporating nucleotide should be smaller than the Förster distance (R0) of the donor-acceptor pair so that high FRET results.
  • the FRET efficiency is greater than about 80%. If too few donor fluorophores interact with the nucleic acid, the donor fluorophores can be spaced too far apart for an adequate FRET signal-to-noise ratio and adequate FRET detection. However, too many intercalated donor fluorophores may result in signal quenching.
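The relationship between donor-acceptor spacing and FRET efficiency can be made concrete with the standard Förster expression E = 1 / (1 + (r/R0)^6). The sketch below is not taken from the disclosure; the 5 nm Förster distance is an assumed example value, and the functions simply evaluate the textbook formula to show how close the pair must be for an efficiency of about 80%.

```python
def fret_efficiency(r, r0):
    """Förster efficiency E = 1 / (1 + (r / R0)**6) at donor-acceptor distance r."""
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


def max_distance_for_efficiency(e_min, r0):
    """Largest distance at which the efficiency is still at least e_min."""
    if not 0.0 < e_min < 1.0:
        raise ValueError("e_min must lie strictly between 0 and 1")
    # Solve e_min = 1 / (1 + (r / r0)**6) for r.
    return r0 * ((1.0 - e_min) / e_min) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0_nm = 5.0  # assumed Förster distance, for illustration only
    print(fret_efficiency(r0_nm, r0_nm))            # efficiency at r = R0
    print(max_distance_for_efficiency(0.8, r0_nm))  # spacing needed for 80%
```

Under this formula the efficiency is exactly 50% at r = R0, and reaching 80% requires a spacing of roughly 0.79 R0, which is why the spacing must be well inside the Förster distance rather than merely comparable to it.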
  • polymerases used in the methods and systems of this disclosure are first analyzed with regard to donor duration and donor signal frequency over the collection time.
  • the donor signals are assigned as segments of excited (digital unit), and dark (digital zero) depending on their intensities compared to the noise level.
  • the excited donor segments are denoted by a horizontal dark green bar and the dark regions are denoted by horizontal black bars (figure below).
  • the number of donor segments of the excited state is extracted for every donor in the field of view and attributes of these segments such as the duration, intensity and frequency are analyzed. A comparison of these attributes of donor segments, made between different polymerases binding to immobilized duplex on a surface, is shown in Figure 11.
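The segment analysis described above (assigning each frame of a donor trace to an excited or dark state by comparison with the noise level, then extracting duration, intensity and frequency) can be sketched as follows. This is a minimal illustration under assumed inputs: a single fixed noise threshold and a uniform frame time, neither of which is specified in the source.

```python
def excited_segments(trace, noise_level, frame_time=1.0):
    """Return (start_index, duration, mean_intensity) for each excited run.

    A frame counts as excited (digital one) when its intensity exceeds
    noise_level, and as dark (digital zero) otherwise.
    """
    segments = []
    start = None
    for i, value in enumerate(trace):
        excited = value > noise_level
        if excited and start is None:
            start = i  # a new excited segment begins
        elif not excited and start is not None:
            run = trace[start:i]
            segments.append((start, len(run) * frame_time, sum(run) / len(run)))
            start = None
    if start is not None:  # trace ended while still excited
        run = trace[start:]
        segments.append((start, len(run) * frame_time, sum(run) / len(run)))
    return segments


def segment_frequency(segments, n_frames, frame_time=1.0):
    """Number of excited segments per unit of collection time."""
    return len(segments) / (n_frames * frame_time)


if __name__ == "__main__":
    donor_trace = [0, 5, 6, 0, 0, 7, 0]  # made-up intensities
    found = excited_segments(donor_trace, noise_level=1)
    print(found, segment_frequency(found, len(donor_trace)))
```

Running the same extraction on traces recorded with different polymerases would give the per-enzyme duration, intensity and frequency attributes that the comparison relies on.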
  • priming sites and/or extension can be performed prior to display; in other embodiments, priming and/or extension is performed after the nucleotide is immobilized. In yet other embodiments, formation of priming sites and/or immobilization may precede extension. All of these permutations and combinations, as well as any others that preserve the spirit and scope of the invention, may be used according to the present disclosure, and are contemplated to be within the spirit and scope of the present invention.
  • Removal of the detectable label of a nucleotide following incorporation of the nucleotide into the newly synthesized DNA strand by the polymerase can be accomplished by any suitable means. Typically, removal is accomplished by enzymatic cleavage upon incorporation, as will occur when the detectable label comprising the FRET acceptor is attached to a portion of the nucleotide that is released during incorporation (e.g., a pyrophosphate group with or without an associated linker; a fluorophore) as a natural byproduct of polymerase activity. Such labels are commonly referred to as "non-persistent" acceptors.
  • the detectable label comprising the FRET acceptor is a "persistent" label, i.e., it remains attached to the portion of the nucleotide that is incorporated into the elongating nucleotide strand, and thus is also incorporated into the newly synthesized portion of the nucleotide strand.
  • the acceptor will have to be either photobleached or photocleaved after incorporation, or the acceptor will contribute to the signals emitted by the next incoming nucleotide until the persistent acceptor permanently photobleaches.
  • any suitable polymerase may be used that is capable of polymerizing monomeric subunits into polymers.
  • the polymerase is a nucleotide polymerase, i.e., a polymerase that can polymerize nucleotides such as DNA or RNA polymerases that polymerize DNA, RNA or mixed sequences, into extended nucleic acid polymers.
  • the nucleotide polymerase will elongate a pre-existing polynucleotide strand, typically a primer, by polymerizing nucleotides on to the 3' end of the strand.
  • polymerases that can be isolated from their host in sufficient amounts for purification and use and/or genetically engineered into other organisms for expression, isolation and purification in amounts sufficient for use, as well as mutants or variants of native polymerases having one or more amino acids replaced by amino acids amenable to attaching an atomic or molecular label that has a detectable property.
  • Exemplary polymerases include without limitation DNA polymerases, RNA polymerases and reverse transcriptases.
  • the polymerase is a DNA polymerase.
  • Suitable nucleotide polymerases that may be used to practice the methods disclosed herein include without limitation any naturally occurring nucleotide polymerases as well as mutated, truncated, modified, genetically engineered or fusion variants of such polymerases.
  • Known conventional naturally occurring DNA polymerases include without limitation bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases.
  • Suitable bacterial DNA polymerases include without limitation E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase (including mutants thereof, such as mutants lacking exonuclease activity), Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase.
  • Suitable eukaryotic DNA polymerases include without limitation the DNA polymerases ⁇ , ⁇ , ⁇ , ⁇ , ⁇ , ⁇ , ⁇ , ⁇ , and K, as well as the Rev1 polymerase (terminal deoxycytidyl transferase) and terminal deoxynucleotidyl transferase (TdT).
  • Suitable viral DNA polymerases include without limitation T4 DNA polymerase, T7 DNA polymerase, Phi29 DNA polymerase (also referred to herein as Phi-29 polymerase) and mutated and/or engineered Phi29 DNA polymerases, including mutants lacking exonuclease activity.
  • Suitable archaeal DNA polymerases include without limitation the thermostable and/or thermophilic DNA polymerases such as, for example, Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase as well as Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase or Vent DNA polymerase, Pyrococcus sp. GB-D polymerase ("Deep Vent" DNA polymerase, New England Biolabs), Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp.
  • RNA polymerases include, without limitation, T7, T3 and SP6 RNA polymerases.
  • Suitable reverse transcriptases include without limitation reverse transcriptases from HIV, HTLV-I, HTLV-II, FeLV, FIV, SIV, AMV, MMTV and MoMuLV, as well as the commercially available "Superscript" reverse transcriptases (Invitrogen) and telomerases.
  • the methods and systems disclosed herein may also be practiced using any subunits, mutated, modified, truncated, genetically engineered or fusion variants of naturally occurring polymerases (wherein the mutation involves the replacement of one or more or many amino acids with other amino acids, the insertion or deletion of one or more or many amino acids, or the conjugation of parts of one or more polymerases), non-naturally occurring polymerases, synthetic molecules, or any molecular assembly that can polymerize a polymer having a pre-determined or specified or templated sequence of monomers.
  • polymerases that retain the desired levels of processivity when conjugated to a donor or acceptor fluorophore are preferred.
  • fidelity refers to the accuracy of nucleotide polymerization by a given template-dependent nucleotide polymerase.
  • the fidelity of a nucleotide polymerase is typically measured as the error rate, i.e., the frequency of incorporation of a nucleotide in a manner that violates the widely known Watson-Crick base pairing rules.
  • the accuracy or fidelity of DNA polymerization is influenced not only by the polymerase activity of a given enzyme, but also by the 3'-5' exonuclease activity of a DNA polymerase.
  • the fidelity or error rate of a DNA polymerase may be measured using assays known to the art. See, for example, Lundberg et al., 1991, Gene, 108:1-6. By suitable selection and engineering of the nucleotide polymerase, the error rate of the single-molecule sequencing methods disclosed herein can be further improved.
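The definition of error rate as the frequency of incorporations that violate Watson-Crick pairing can be illustrated with a small calculation. This sketch is not one of the assays referred to above; it assumes the template and the synthesized strand are already aligned position by position, which is a simplification introduced here for illustration.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def error_rate(template, synthesized):
    """Fraction of positions where the synthesized base is not the
    Watson-Crick complement of the aligned template base."""
    if len(template) != len(synthesized) or not template:
        raise ValueError("strands must be non-empty and of equal length")
    errors = sum(
        1 for t, s in zip(template, synthesized) if COMPLEMENT.get(t) != s
    )
    return errors / len(template)


if __name__ == "__main__":
    print(error_rate("ATGC", "TACG"))  # every position pairs correctly
    print(error_rate("ATGC", "TACC"))  # one mispair in four positions
```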
  • the polymerase used in this sequencing technology is engineered to possess either a strong strand displacement activity or 5' to 3' exonuclease activity to remove the downstream strand, thereby facilitating DNA synthesis.
  • With a highly processive polymerase such as an engineered Phi29 polymerase, the downstream strand is displaced. However, because the 5' terminated strand cannot serve as a template in the absence of added primer, no secondary sequence information, which would confound sequence data analysis, will be detected from this site.
  • an unwinding agent such as a topoisomerase or a gyrase may be added to the extension reaction to facilitate optimal sequencing performance.
  • the immobilized DNA is typically linearized through attachment to the surface at various points along its length, and therefore consists of a series of closed DNA domains.
  • One option for circumventing the "closed" state of the DNA involves inclusion of a topoisomerase and/or a gyrase to modulate the number of DNA supercoils that may be introduced during the sequencing reaction (See, e.g., Champoux, 2001). The requirement for inclusion of such enzymes will reflect both sequence read length and the degree to which the DNA is immobilized onto the surface.
  • the extension reaction is supplemented by addition of an agent that reduces formation of secondary structures that hinder progress of the extension reaction.
  • formation of secondary structures is undesirable because such structures bind dye molecules and exhibit increased fluorescence intensity, whereas dye molecules that are in solution, or that are intercalated into the displaced single strand, exhibit reduced fluorescence intensity.
  • the presence of secondary structure in the displaced strand results in inappropriate dye intercalation into the displaced strand, and consequently inappropriate detection of fluorescence.
  • additional dyes may be used to identify the dye, or dye combinations that produce the highest quality sequence information.
  • agents that stabilize the displaced strand and prevent formation of secondary structure may be included within the extension reaction mixture.
  • a reagent that may optionally be added to the extension reaction mixture to prevent formation of unwanted secondary structure is single strand binding protein, also known as SSBP.
  • the number of donor fluorophores that associate with the DNA is optimized to identify a staining concentration that produces high FRET.
  • the spacing between a donor fluorophore and an acceptor on the incorporated nucleotide should be smaller than the Förster distance (R0) of the donor-acceptor pair so that high FRET results, with greater than 80% FRET being preferred. If too few fluorophores interact with the DNA, they will not be spaced closely enough to produce high FRET with the acceptor fluorophore. However, if too many donor fluorophores intercalate or bind the DNA, fluorophore quenching may occur.
  • Suitable depolymerizing agents for use in the disclosed methods and compositions include, without limitation, any depolymerizing agent that depolymerizes monomers in a step-wise fashion such as exonucleases in the case of DNA, RNA or mixed DNA/RNA polymers, proteases in the case of polypeptides and enzymes or enzyme systems that sequentially depolymerize polysaccharides.
  • the FRET donor is a dye molecule intercalated between the bases of the template nucleic acid molecule, or otherwise associated with the nucleic acid molecule.
  • Suitable intercalating dyes include, without limitation, any detectable moiety capable of inserting, interposing, or otherwise intercalating into single- or double-stranded polynucleotides.
  • the intercalating dye may be a fluorescent dye or may be fluorescent dye conjugated to a molecule that is primarily an intercalator. Intercalating dyes are well known to the person of ordinary skill in the art.
  • intercalating dyes suitable for use in the disclosed methods and compositions include, without limitation, mono- and bis-intercalating dyes, phenanthridines and acridines, such as ethidium bromide, propidium iodide, hexidium iodide, dihydroethidium, ethidium homodimers, acridine orange, 9-amino-6-chloro-2-methoxyacridine; indoles and imidazoles such as DAPI, bisbenzimide dyes, Actinomycin D, Nissl stains, hydroxystilbamidine, SYBR Green I (also referred to herein simply as "SYBR Green"), SYBR Green II, SYBR GOLD, YO (Oxazole Yellow), TO (Thiazole Orange), PG (PicoGreen), dyes from ATTO-TEC GmbH of Siegen, Germany, intercalating dyes, BEBO, BETO and BOXTO, BO, BO-PRO, TO
  • the use of SYBR Green I and other intercalating dyes as replenishing FRET donors will increase donor lifetime and intensity, and more importantly will increase acceptor intensity.
  • Use of multiple donors at a dye-to-base pair ratio of approximately 1:5-7 results in the punctuation of DNA with dye molecules that can serve as donors for the growing DNA strand (See, e.g., Howell et al., 2002; Takatsu et al., 2004).
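To give a feel for what a ratio on the order of one dye per 5 to 7 base pairs means physically, the arithmetic can be sketched as below. The 0.34 nm rise per base pair is the usual approximate value for B-form DNA and is an assumption brought in here, not a figure from the disclosure; intercalation itself lengthens the duplex somewhat, so the spacings are rough estimates.

```python
RISE_PER_BP_NM = 0.34  # approximate axial rise per base pair in B-form DNA


def donor_count(template_bp, bp_per_dye):
    """Approximate number of donor dyes bound along a template."""
    if template_bp < 0 or bp_per_dye <= 0:
        raise ValueError("template_bp must be >= 0 and bp_per_dye must be > 0")
    return template_bp // bp_per_dye


def mean_spacing_nm(bp_per_dye):
    """Approximate mean distance between neighbouring dyes along the helix."""
    return bp_per_dye * RISE_PER_BP_NM


if __name__ == "__main__":
    for bp_per_dye in (5, 6, 7):
        print(bp_per_dye, donor_count(1000, bp_per_dye), mean_spacing_nm(bp_per_dye))
```

At these densities neighbouring dyes sit roughly 1.7 to 2.4 nm apart, so some donor is always within a few nanometres of the incorporation site.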
  • nucleotide or nucleotide analogs or their variants, as used herein, refer to any compounds that can be polymerized and/or incorporated into a newly synthesized strand by a naturally occurring, genetically modified or engineered nucleotide polymerase.
  • Suitable nucleotides or other monomers for use in the methods and compositions include, without limitation, any monomer that can be step-wise polymerized and/or incorporated into an elongating nucleotide strand or other polynucleotide polymer by a polymerase or other polymerizing agent, including but not limited to ribonucleotides, deoxyribonucleotides, modified ribonucleotides, modified deoxyribonucleotides, ribonucleotide polyphosphates, deoxyribonucleotide polyphosphates, modified ribonucleotide polyphosphates, modified deoxyribonucleotide polyphosphates, peptide nucleotides, modified peptide nucleotides, and modified phosphate-sugar backbone nucleotides, and any analogs or variants of the foregoing, including analogs or variants having atomic and/or molecular labels attached thereto, or mixtures or combinations thereof.
  • any suitable monomers capable of polymerization by a naturally occurring, genetically engineered, or synthetic polymerase may be used, including, for example, amino acids (natural or synthetic) for protein or protein analog synthesis, and monosaccharides or polysaccharides for carbohydrate synthesis.
  • the labeled nucleotide monomer has three, four or more phosphates.
  • the nucleotide or nucleotide analogs comprise more than one detectable label moiety per nucleotide molecule.
  • the nucleotides or nucleotide analogs may comprise a persistent acceptor, a non-persistent acceptor, or both a persistent and non-persistent acceptor group conjugated to the same nucleotide molecule.
  • dual nucleotides are known in the art. See, e.g., U.S. Provisional App. No. 60/891,029, filed February 21, 2007 and U.S. App. No. 12/035,352, filed February 21, 2008, herein incorporated by reference in their entirety.
  • the nucleotide is conjugated or otherwise operably linked to a detectable label using suitable methods.
  • Any suitable methods for detectably labeling nucleotides may be employed including but not limited to those described in U.S. Patent Nos. 7,041,892, 7,052,839, 7,125,671 and 7,223,541; U.S. Pub. Nos. 2007/072196 and 2008/0091005; Sood et al., 2005, J. Am. Chem. Soc. 127:2394-2395; Arzumanov et al., 1996, J. Biol. Chem.
  • operably link refers to chemical fusion or bonding or association of sufficient stability to withstand conditions encountered in the method of nucleotide sequencing utilized, between a combination of different molecules or moieties, such as, but not limited to: between a linker and a nucleotide; between a linker and a dye moiety; and the like.
  • dye labels may be conjugated to the terminal phosphate of deoxyribonucleotide polyphosphates using a linker and/or spacer using suitable techniques.
  • Suitable linkers include, for example, any compound or moiety that can act as a molecular bridge to operably link two different molecules.
  • Exemplary linkers include, but are not limited to, chemical chains, chemical compounds (e.g., reagents), and the like.
  • the linkers may include, but are not limited to, homobifunctional linkers and heterobifunctional linkers.
  • heterobifunctional linkers contain one end having a first reactive functionality to specifically link to a first molecule, and an opposite end having a second reactive functionality to specifically link to a second molecule.
  • the linker may vary in length and composition for optimizing properties such as stability, length, FRET efficiency, resistance to certain chemicals and/or temperature parameters, and be of sufficient stereo-selectivity or size to operably link a detectable label to a nucleotide such that the resultant conjugate is useful in optimizing a polymerization reaction.
  • Linkers can be employed using standard chemical techniques and include, but are not limited to, amine linkers for attaching labels to nucleotides (see, for example, U.S. Pat. No.
  • a linker typically contains a primary or secondary amine for operably linking a label to a nucleotide; and a rigid hydrocarbon arm added to a nucleotide base (see, for example, Science 282:1020-21, 1998).
  • any detectable label that is suitable for attachment to the polymerase, the nucleic acid molecule and/or the nucleotides may be used, including but not limited to luminescent, photoluminescent, electroluminescent, bioluminescent, chemiluminescent, fluorescent and/or phosphorescent labels.
  • the label comprises a FRET donor and/or a FRET acceptor.
  • the FRET donor and/or the FRET acceptor is typically a fluorophore or fluorescent label; however the FRET donor and/or FRET acceptor may also be a luminophore, chemiluminophore, bioluminophore or other label, or a quencher that can participate in this reaction, as described below.
  • the FRET labels may be referred to as fluorophores or fluorescent labels for convenience, but this in no way is meant to exclude the possibility of using a quencher or limit the donor and/or acceptor only to fluorescent labels.
  • the detectable labels used in the disclosed methods and compositions may undergo other types of energy transfer with each other, including but not limited to luminescence resonance energy transfer, bioluminescence resonance energy transfer, chemiluminescence resonance energy transfer, and similar types of energy transfer not strictly following Förster theory, such as the nonoverlapping energy transfer when nonoverlapping acceptors are utilized. See, for example, Anal. Chem. 2005, 77: 1483-1487.
  • Suitable detectable labels for use in the disclosed methods and compositions include, without limitation, any atomic structure, molecule or other moiety amenable to attachment to a specific site in a polymerizing agent or dNTP, including but not limited to Europium shift agents, NMR active atoms or the like; fluorescent dyes such as Rhodol dyes, d-Rhodamine acceptor dyes including but not limited to dichloro[R110], dichloro[R6G], dichloro[TAMRA], dichloro[ROX] or the like, fluorescein donor dye including but not limited to fluorescein, 6-FAM, or the like; Acridine including but not limited to Acridine orange, Acridine yellow, Proflavin, or the like; aromatic hydrocarbon including but not limited to 2-Methylbenzoxazole, Ethyl p-dimethylaminobenzoate, Phenol, benzene, toluene, or the like; Arylmethine Dyes including but
  • miscellaneous dyes including but not limited to 4',6-Diamidino-2-phenylindole (DAPI), 7-Benzylamino-4-nitrobenz-2-oxa-1,3-diazole, dansyl glycine, Hoechst 33258, Lucifer yellow CH, Piroxicam, Quinine sulfate, Squarylium dye III, or the like; oligophenylenes including but not limited to 2,5-Diphenyloxazole (PPO), Biphenyl, POPOP, p-Quaterphenyl, p-Terphenyl, or the like; oxazines including but not limited to Cresyl violet perchlorate, Nile Blue, Nile Red, Ni
  • any molecule, nano-structure, or other chemical structure that is capable of chemical modification and includes a detectable property capable of being detected by a detection system may be used in the disclosed methods and systems.
  • detectable structures can include those presently known, those currently being designed, and those that will be prepared in the future.
  • the nucleotide comprises a releasable or non-persistent label that can be removed via suitable means prior to incorporation of the next nucleotide by the polymerase into the newly synthesized strand.
  • suitable non-persistent or releasable labels include detectable moieties operably linked to the base, sugar or alpha phosphate of a nucleotide or nucleotide analog.
  • the FRET acceptor label is attached to a nucleotide phosphate group that is cleaved and released upon incorporation of the underlying nucleotide into the primer strand, for example the β-phosphate, the γ-phosphate, or the terminal phosphate of the incoming nucleotide.
  • the signal from the label (or, for embodiments wherein the label is a FRET donor, the FRET signal between the FRET donor and the FRET acceptor moieties) ceases after the nucleotide is incorporated and the label (or FRET signal) diffuses away.
  • a detectable signal indicative of nucleotide incorporation is generated as each incoming nucleotide hybridizes to a complementary nucleotide in the target nucleic acid molecule and becomes incorporated into the newly synthesized strand.
  • the accompanying dip in donor intensity can also be detected to confirm the occurrence of FRET. While detection of donor dipping can be useful by providing independent corroboration of a FRET event, it can be dispensed with in embodiments where the acceptor signals are sufficiently intense and well-defined.
  • the nucleotide comprises a persistent label, which is not released upon incorporation of the nucleotide into the nascent nucleotide strand synthesized by the polymerase.
  • suitable persistent labels include without limitation any FRET acceptor moiety operably linked to the base, sugar, or internal phosphate, of the nucleotide or nucleotide analog, for example, the alpha phosphate.
  • Persistently-labeled nucleotides may be used when a stable signal is preferred and their use enables the reaction to be performed in advance of immobilization on the support for viewing in the detection system, which improves reaction efficiency.
  • the persistently-labeled nucleotide may be a dideoxynucleotide to ensure that a single nucleotide is incorporated at the reaction site.
  • Non-persistently-labeled nucleotides or nucleotides containing both a persistent and a non-persistent label are used when the detection of the non-persistent signal is preferred.
  • the use of non-persistently-labeled nucleotides or nucleotides containing both a persistent and a non-persistent label requires that the extension reaction is performed and detected in real-time or near real-time on the detection system to associate the nonpersistent label with a particular nucleic acid strand.
  • new fluorophores are positioned as new donors when they insert into the newly synthesized, double-stranded DNA.
  • donor-acceptor pairs are typically selected such that there is overlap between the emission spectrum of the donor and excitation spectrum of the acceptor. Dyes and dye concentrations are chosen such that optimized donor emission and maximized acceptor intensities are obtained.
  • certain combinations of DNA-associated donor dyes produce higher intensity acceptor signals when paired with the spectrally-resolved acceptors used in other sequencing technologies based on determination of base identity (i.e., the donor fluorophores must be good FRET partners with the acceptors used to label the nucleotides), and these donor dyes may need to be present in particular ratios to maximize these effects.
  • Any suitable FRET donor:acceptor pair may be used in the disclosed methods and compositions, including but not limited to a fluorescein, cyanine, rhodamine, coumarin, acridine, Texas Red dye, BODIPY, Alexa Fluor, GFP, or a derivative or modification of any of the foregoing. See, for example, U.S. Published App. No. 2008/0091995.
  • excitation of the donor produces energy in its emission spectrum that is then picked up by the acceptor in its excitation spectrum, leading to the emission of light from the acceptor in its emission spectrum.
  • excitation of the donor sets off a chain reaction, leading to emission from the acceptor when the two are sufficiently close to each other.
  • the label operably linked or attached to the nucleotide may be a quencher. Quenchers are useful as acceptors in FRET applications, because they produce a signal through the reduction or quenching of fluorescence from the donor fluorophore.
  • quenchers have an absorption spectrum and large extinction coefficients; however, the quantum yield for quenchers is extremely low, such that the quencher emits little to no light upon excitation.
  • illumination of the donor fluorophore excites the donor, and if an appropriate acceptor is not close enough to the donor, the donor emits light. This light signal is reduced or abolished when FRET occurs between the donor and a quencher acceptor, resulting in little or no light emission from the quencher.
  • interaction or proximity between a donor and quencher-acceptor may be detected by the reduction or absence of donor light emission.
  • quenchers include the QSY dyes available from Molecular Probes (Eugene, OR).
  • One exemplary method involves the use of quenchers in conjunction with fluorescent labels.
  • certain nucleotides in the reaction mixture are labeled with a fluorescent label, while the remaining nucleotides are labeled with one or more quenchers.
  • each of the nucleotides in the reaction mixture is labeled with one or more quenchers.
  • Discrimination of the nucleotide bases is based on the wavelength and/or intensity of light emitted from the FRET acceptor, as well as the intensity of light emitted from the FRET donor. If no signal is detected from the FRET acceptor, a corresponding reduction in light emission from the FRET donor indicates incorporation of a nucleotide labeled with a quencher. The degree of intensity reduction may be used to distinguish between different quenchers.
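The discrimination logic described above can be sketched as a simple decision rule. The Python below is an illustrative sketch only: the function name, the channel names, the threshold, and both lookup tables are hypothetical and are not taken from the disclosure.

```python
from typing import Optional

# Hypothetical lookup tables (assumptions for illustration only).
# Fluorescent acceptors are identified by the emission channel in which they appear.
ACCEPTOR_CHANNEL_TO_BASE = {"red": "A", "far_red": "C"}
# Each quencher is characterized by the fractional drop in donor intensity it causes.
QUENCHER_DROP_TO_BASE = [(0.3, "G"), (0.7, "T")]  # (expected drop, base)


def call_base(acceptor_channel: Optional[str],
              donor_baseline: float,
              donor_observed: float,
              min_drop: float = 0.15) -> Optional[str]:
    """Return the base implied by one incorporation event, or None if no event."""
    # Case 1: a fluorescent acceptor emitted light; identify it by its channel.
    if acceptor_channel is not None:
        return ACCEPTOR_CHANNEL_TO_BASE.get(acceptor_channel)
    # Case 2: no acceptor signal; look for a reduction in donor emission,
    # which indicates a quencher-labeled nucleotide.
    drop = (donor_baseline - donor_observed) / donor_baseline
    if drop < min_drop:
        return None  # donor intensity essentially unchanged
    # Choose the quencher whose expected drop is closest to the observed drop.
    return min(QUENCHER_DROP_TO_BASE, key=lambda q: abs(q[0] - drop))[1]
```

In this sketch the degree of donor intensity reduction is what separates one quencher from another, mirroring the last sentence of the passage above.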
  • the intercalating dye and the detectable label of the nucleotide will be selected and/or designed to ensure that the presence of such labels does not unduly hinder the progress of the polymerization reaction as determined by speed, error rate, fidelity, processivity and average read length of the newly synthesized strand.
  • the sequencing reaction is initiated by the addition of a suitable polymerase and labeled nucleotides. Suitable temperatures and the addition of other components such as divalent metal ions can be determined and optimized based on the particular nucleotide polymerase and the target nucleic acid sequences. Illumination of the reaction site permits observation of the detectable signals, e.g., FRET signals, which indicate the nucleotide incorporation event.
  • The signals emitted by various components of the polymerase reaction mixture as the polymerase incorporates nucleotide(s) into an extending strand in a template-directed fashion can be detected by means of any suitable system capable of detecting and/or monitoring such signals.
  • the detection system will achieve these functions by first generating and transmitting an incident wavelength to the polynucleotides isolated within nanostructures, and then collecting and analyzing the emissions from the reactants.
  • a typical sequencing system comprises a detection subsystem capable of viewing a field on the substrate. The view field can be adjusted to view one or a plurality of elongated and immobilized nucleic acids.
  • the sequencing system also comprises a monitoring subsystem capable of detecting nucleotide incorporation events occurring at the nick sites of the nucleotide strand to be sequenced.
  • the system also comprises an analyzing subsystem that converts the detected events into sequencing information and then maps the sequence fragments along the length of the nucleic acid so that ordered sequence fragment information is obtained for nucleic acid identification and classification including partial or fragmentary sequence information.
  • detection systems suitable for use according to the present disclosure include without limitation the systems described in U.S. Published App. No. 2008/0241951 and 2008/0241938, herein incorporated by reference in their entirety.
  • a detection system of the present invention comprises at least two elements, namely an excitation source and a detector.
  • the excitation source generates and transmits incident radiation used to excite the reactants contained in the array.
  • the source of the incident light can be a laser, a laser diode, a light-emitting diode (LED), an ultraviolet light bulb, and/or a white light source.
  • more than one source can be employed simultaneously.
  • the use of multiple sources is particularly desirable in applications that employ multiple different reagent compounds having differing excitation spectra, consequently allowing detection of more than one fluorescent signal to track the interactions of more than one molecule, or more than one type of molecule, simultaneously.
  • Any suitable detection strategies can be employed to determine the identity of the nitrogenous base of the incoming nucleotides, depending on the nature of the labeling strategy that is employed.
  • Exemplary labeling and detection strategies include but are not limited to those disclosed in U.S. Patent Nos. 6,423,551 and 6,864,626; U.S. Pub. Nos. 2005/0003464, 2006/0176479, 2006/0177495, 2007/0109536, 2007/0111350, 2007/0116868, 2007/0250274 and 2008/08825. Detection of emissions during the polymerization reaction permits the discrimination of independent interactions between uniquely labeled moieties, reactants or subunits.
  • the label linked to the nucleotide undergoes a transition to an 'excited state' whereby it emits photons over a spectral range characterized by the identity of the emitting moiety.
  • the donor moiety must be sufficiently excited in order for FRET to occur.
  • Emissions may be detected using any suitable device.
  • detectors include but are not limited to microscopes, optical readers, high-efficiency photon detection systems, photodiodes (e.g., avalanche photodiodes (APD), APD arrays, etc.), cameras, charge-coupled devices (CCD), electron-multiplying charge-coupled devices (EMCCD), intensified charge-coupled devices (ICCD), photomultiplier tubes (PMT), a multi-anode PMT, and a microscope equipped with any of the foregoing detectors.
  • the subject arrays contain various alignment aides or keys to facilitate a proper spatial placement of each spatially addressable array location and the excitation sources, the photon detectors, or the optical transmission element as described below.
  • characteristic signals from different independently labeled nucleotides are simultaneously detected and resolved using a suitable detection method capable of discriminating between the respective labels.
  • the characteristic signals from each nucleotide are distinguished by resolving the characteristic spectral properties of the different labels. See, for example, Lakowicz, J.R., 2006, Principles of Fluorescence Spectroscopy, Third Edition.
  • Spectral detection may also optionally be combined and/or replaced by other detection methods capable of discriminating between chemically similar or different labels in parallel, including, but not limited to, polarization, lifetime, Raman, intensity, ratiometric, time-resolved anisotropy, fluorescence recovery after photobleaching (FRAP) and parallel multi-color imaging.
  • an image splitter such as, for example, a dichroic mirror, filter, grating, prism, etc.
  • multiple cameras or detectors may be used to view the sample through optical elements (such as, for example, dichroic mirrors, filters, gratings, prisms, etc.) of different wavelength specificity.
  • suitable methods to distinguish emission events include, but are not limited to, correlation/anti-correlation analysis, fluorescent lifetime measurements, anisotropy, time-resolved methods and polarization detection.
  • Suitable imaging methodologies that may be implemented for detection of emissions include, but are not limited to, confocal laser scanning microscopy, Total Internal Reflection (TIR), Total Internal Reflection Fluorescence (TIRF), near-field scanning microscopy, far-field confocal microscopy, wide-field epi-illumination, light scattering, dark field microscopy, photoconversion, wide field fluorescence, single and/or multi-photon excitation, spectral wavelength discrimination, evanescent wave illumination, scanning two-photon, scanning wide field two-photon, Nipkow spinning disc, multi-foci multi-photon, and/or other forms of microscopy.
  • the detection system may optionally include one or more optical transmission elements that serve to collect and/or direct the incident wavelength to the reactant array; to transmit and/or direct the signals emitted from the reactants to the photon detector; and/or to select and modify the optical properties of the incident wavelengths or the emitted wavelengths from the reactants.
  • suitable optical transmission elements and optical detection systems include but are not limited to diffraction gratings, arrayed wave guide gratings (AWG), optic fibers, optical switches, mirrors, lenses (including microlens and nanolens), collimators.
  • Other examples include optical attenuators, polarization filters (e.g., dichroic filters), wavelength filters (low-pass, band-pass, or high-pass), wave-plates, and delay lines.
  • the detection system comprises optical transmission elements suitable for channeling light from one location to another in either an altered or unaltered state.
  • optical transmission devices include optical fibers, diffraction gratings, arrayed waveguide gratings (AWG), optical switches, mirrors (including dichroic mirrors), lenses (including microlens and nanolens), collimators, filters, prisms, and any other devices that guide the transmission of light through proper refractive indices and geometries.
  • the detection system comprises an optical train that directs signals from an organized array onto different locations of an array-based detector to simultaneously detect multiple different optical signals from each of multiple different locations.
  • the optical trains typically include optical gratings and/or wedge prisms to simultaneously direct and separate signals having differing spectral characteristics from each spatially addressable location in an array to different locations on an array-based detector, e.g., a CCD.
  • detection is performed using multifluorescence imaging wherein each of the different types of nucleotide is operably linked to a label with different spectral properties from the rest, thereby permitting the simultaneous detection of incorporation of all different nucleotide types.
  • each of the different types of nucleotide may be operably linked to a FRET acceptor fluorophore, wherein each fluorophore has been selected such that the overlapping of the absorption and emission spectra between the different fluorophores, as well as the overlapping between the absorption and emission maxima of the different fluorophores, is minimized.
  • Detection of different nucleotide labels is performed by observing two or more targets at the same time, wherein the emissions from each label are separated in the detection path.
  • Such separation is typically accomplished through use of suitable filters, including but not limited to band pass filters, image splitting prisms, band cutoff filters, wavelength dispersion prisms and dichroic mirrors, that can selectively detect specific emission wavelengths.
  • filters may optionally be used in combination with suitable diffraction gratings.
  • the detection system utilizes tunable excitation and/or tunable emission fluorescence imaging.
  • for tunable excitation, light from a light source passes through a tuning section and condenser prior to irradiating the sample.
  • for tunable emission, emissions from the sample are imaged onto a detector after passing through imaging optics and a tuning section. The user may control the tuning sections to optimize performance of the system.
  • a number of labeling and detection strategies are available for base discrimination using the FRET technique.
  • different fluorescent labels may be used for each type of nucleotide present in the extension reaction with discrimination between the different labels based on the wavelength and/or the intensity of the light emitted from the fluorescent label.
  • a second strategy involves the use of fluorescent labels and quenchers.
  • certain nucleotides in the reaction mixture are labeled with a fluorescent label, while the remaining nucleotides are labeled with one or more quenchers.
  • each of the nucleotides in the reaction mixture is labeled with one or more quenchers.
  • Discrimination of the nucleotide bases is based on the wavelength and/or intensity of light emitted from the FRET acceptor, as well as the intensity of light emitted from the FRET donor. If no signal is detected from the FRET acceptor, a corresponding reduction in light emission from the FRET donor indicates incorporation of a nucleotide labeled with a quencher. The degree of intensity reduction may be used to distinguish between different quenchers.
  • the signal from the detector is converted into a digital signal with an A-D converter and an image of the sample is reconstructed on a monitor.
  • the user can optionally select a composite image that combines the images derived at a number of different wavelengths into a single image.
  • the user can also specify that an artificial color system is to be used in which particular probes are artificially associated with specific colors. In an alternate artificial color system the user can designate specific colors for specific emission intensities.
  • Any combination of the above described labeling and detection strategies may be employed together in the same sequencing reaction. Depending on the number of distinguishable labels and quenchers used in any of the above strategies, the identities of one, two, three or four nucleotides may be determined in a single sequencing reaction.
  • Multiple sequencing reactions may then be run, rotating the identities of the nucleotides determined in each reaction, to determine the identities of the remaining nucleotides.
  • these reactions may be run at the same time, in parallel, to allow for complete sequencing in a reduced amount of time.
  • the identities of the incorporated nucleotides may be determined rapidly, for example in real time or near real time, as extension of the primer strand occurs, through FRET interactions between an intercalating dye donor moiety and a FRET acceptor moiety attached to the incoming nucleotide as it is incorporated into the newly synthesized strand by the polymerase.
  • the raw data generated by the detector represents multiple time-dependent fluorescence data streams comprising wavelength and intensity information.
  • the data may be analyzed using suitable methods to correlate the particular spectral characteristics of the emissions with the identity of the incorporated base.
  • such analysis is performed by means of a suitable information processing and control system.
  • the information processing and control system comprises a computer or microprocessor attached to or incorporating a data storage unit containing data collected from the detection system.
  • the information processing and control system may maintain a database associating specific spectral emission characteristics with specific nucleotides.
  • the information processing and control system may record the emissions detected by the detector and may correlate those emissions with incorporation of a particular nucleotide.
  • the information processing and control system may also maintain a record of nucleotide incorporations that indicates the sequence of the template molecule.
  • the information processing and control system may also perform standard procedures known in the art, such as subtraction of background signals.
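The functions attributed to the information processing and control system above (a table associating spectral characteristics with nucleotides, background subtraction, correlation of emissions with incorporations, and a running record of incorporations) can be sketched as follows. This is a minimal sketch under stated assumptions: the reference peak wavelengths, the threshold, and all function names are hypothetical and are not specified in the disclosure.

```python
# Hypothetical reference table: emission peak wavelength (nm) for the label
# attached to each nucleotide type. The values are placeholders.
REFERENCE_PEAKS_NM = {"A": 570.0, "C": 610.0, "G": 650.0, "T": 690.0}


def subtract_background(intensity, background):
    """Remove the background signal, clamping the result at zero."""
    return max(intensity - background, 0.0)


def call_sequence(events, background=0.0, min_signal=10.0):
    """Convert time-ordered emission events into a record of incorporations.

    events: iterable of (peak_wavelength_nm, intensity) tuples in time order.
    Returns the incorporated bases as a string, in order of incorporation.
    """
    record = []
    for peak_nm, intensity in events:
        # Skip events that are too dim after background subtraction.
        if subtract_background(intensity, background) < min_signal:
            continue
        # Correlate the emission with the nucleotide whose reference peak is nearest.
        base = min(REFERENCE_PEAKS_NM,
                   key=lambda b: abs(REFERENCE_PEAKS_NM[b] - peak_nm))
        record.append(base)
    return "".join(record)
```

The returned string records the nucleotides incorporated into the newly synthesized strand; the template sequence would be its complement.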
  • An exemplary information processing and control system may incorporate a computer comprising a bus for communicating information and a processor for processing information.
  • the processor is selected from the Pentium®, Celeron®, Itanium®, or Pentium Xeon® family of processors (Intel Corp., Santa Clara, Calif.). Alternatively, other processors may be used.
  • the computer may further comprise a random access memory (RAM) or other dynamic storage device, a read only memory (ROM) and/or other static storage and a data storage device such as a magnetic disk or optical disc and its corresponding drive.
  • the information processing and control system may also comprise other peripheral devices known in the art, such as a display device (e.g., cathode ray tube or Liquid Crystal Display), an alphanumeric input device (e.g., keyboard), a cursor control device (e.g., mouse, trackball, or cursor direction keys) and a communication device (e.g., modem, network interface card, or interface device used for coupling to Ethernet, token ring, or other types of networks).
  • the detection system may also be coupled to the bus.
  • Data from the detection unit may be processed by the processor and the data stored in the main memory.
  • Data on emission profiles for standard nucleotides may also be stored in main memory or in ROM.
  • the processor may compare the emission spectra from nucleotides in the polymerase reaction against the stored emission profiles for standard nucleotides to identify the type of nucleotide precursor incorporated into the newly synthesized strand.
  • the processor may analyze the data from the detection system to determine the sequence of the template nucleic acid.
  • the data will typically be reported to a data analysis operation.
  • the data obtained by the detection system will typically be analyzed using a digital computer.
  • the computer will be appropriately programmed for receipt and storage of the data from the detection system, as well as for analysis and reporting of the data gathered.
  • custom designed software packages may be used to analyze the data obtained from the detection system.
  • data analysis may be performed using an information processing and control system and publicly available software packages.
  • available software for DNA sequence analysis includes the PRISM™ DNA Sequencing Analysis Software (Applied Biosystems, Foster City, Calif.), the Sequencher™ package (Gene Codes, Ann Arbor, Mich.), and a variety of software packages available through the National Biotechnology Information Facility at website www.nbif.org/links/1.4.1.php.
  • Data collection allows data to be assembled from partial information to obtain sequence information from multiple polymerase molecules in order to determine the overall sequence of the template or target molecule.
  • the method further comprises sequencing one or more additional nucleic acid molecules, for example a second nucleic acid, in parallel with sequencing the first nucleic acid.
  • the rate of nucleotide sequencing determination (based on a single read of a nucleic acid template) is equal to or greater than 10 nucleotides per second, typically equal to or greater than 100 nucleotides per second.
  • the sequencing error rate will be equal to or less than 1 in 100,000 bases.
  • the error rate of nucleotide sequence determination is equal to or less than 1 in 10 bases, 1 in 20 bases, 3 in 100 bases, 1 in 100 bases, 1 in 1000 bases, or 1 in 10,000 bases.
  • test DNA will comprise a complete and intact chromosome.
  • methods disclosed herein may be performed in a multiplex fashion (including in array format), such that additional nucleic acid molecules are sequenced in parallel with a first nucleic acid molecule.
  • primer(s) that direct sequencing complexes to particular areas along the DNA strand are used to specifically determine sequence at those sites.
  • the DNA is denatured and hybridized to at least one site-specific primer, and extension is initiated via addition of appropriate components, such as polymerase, nucleotides (especially base-labeled nucleotides that produce long duration signals) and reaction buffer.
  • the extension products are then displayed and visualized.
  • donor-labeled primers may be used to more specifically identify sequence at multiple sites of incorporation.
  • several differentially labeled primers can be used to produce 'multiplexed' sequence information at resolvable sites along the DNA strand.
  • the priming sites need not be resolvable by the detection system.
  • resolution can be dispensed with as long as the signals emitted from the primer-incorporated nucleotide located on the strand do not affect the fluorescence of the other primer-incorporated nucleotides. Because the number of bases that span a pixel in current detection systems is approximately 700, many primer probes can be used to interrogate potential SNPs within a single pixel. Acceptor signal-to-noise ratios and FRET distance constraints define this upper limit.
  • No information site should interfere with data from any other information site (i.e., primer-incorporated acceptor-labeled FRET pairs are distributed along the strand at >10 nm separation, which is no closer than ~30 bp), whereas the incorporated acceptor is closely positioned to the donor on the primer to produce a high FRET event (i.e., each primer-incorporated acceptor-labeled nucleotide is at or within the R0 of the donor-acceptor pair).
  • the donor and acceptor fluorophores must not be so close that they are quenched.
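The spacing constraints above follow from the steep distance dependence of FRET. The standard Förster relation gives the transfer efficiency as E = 1 / (1 + (r/R0)^6), where R0 is the Förster radius at which E is 50%. The Python sketch below evaluates this relation; the R0 value of 5 nm is an assumed, illustrative figure (the disclosure does not state one), and the 0.34 nm rise per base pair is the usual value for B-form DNA.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Foerster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


R0_NM = 5.0            # assumed Foerster radius, for illustration only
RISE_PER_BP_NM = 0.34  # approximate axial rise per base pair in B-form DNA

# An acceptor at the Foerster radius transfers with 50% efficiency.
at_r0 = fret_efficiency(R0_NM, R0_NM)

# A neighbouring information site 10 nm away contributes very little crosstalk.
at_10_nm = fret_efficiency(10.0, R0_NM)

# 10 nm along the helix corresponds to roughly 30 base pairs.
bp_in_10_nm = 10.0 / RISE_PER_BP_NM
```

Under these assumptions, an acceptor within R0 of its own donor yields a high FRET event, while a site 10 nm (about 30 bp) away sees an efficiency of only 1/65, or roughly 1.5%.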
  • the donor is typically present within 15 bases from the 3' end of the primer.
  • Primers are long enough to specifically hybridize to their target site (i.e., an 8-base primer is likely to be unique in a 65 Kb template, with the occurrence frequency being related to base composition and type of DNA being sequenced - coding versus non-coding regions; Hardin et al., U.S. Patent No. 6,083,695).
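The 8-base and 65 Kb figures above are related by simple counting: there are 4^8 = 65,536 possible 8-mers. A short Python sketch of the arithmetic follows. It assumes independent, equally frequent bases and considers one strand only, which is an idealization; as the passage notes, real occurrence frequencies depend on base composition and sequence type. The function name is hypothetical.

```python
def expected_occurrences(primer_len: int, template_len: int) -> float:
    """Expected number of exact matches of one primer on one strand of a
    template, assuming independent and equally frequent bases."""
    positions = template_len - primer_len + 1  # possible start positions
    return positions / 4 ** primer_len


distinct_8mers = 4 ** 8                           # 65,536 possible 8-base sequences
eight_mer = expected_occurrences(8, 65_536)       # close to one expected match
long_primer = expected_occurrences(25, 65_536)    # effectively zero chance matches
```

Under this model an 8-base primer is expected to occur about once in a 65 Kb template, whereas a 25-base primer is vanishingly unlikely to match by chance.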
  • Longer primers increase hybridization specificity and reduce the need to highly purify specific genomes or regions thereof.
  • a preferred primer length is 25-50 bases with a minimal spacing of one base between each primer.
  • a polymerase that is not significantly affected by the presence of the donor fluorophore within or immediately 5' to its binding site on the primer/DNA template is used, and this enzyme is additionally deficient in 3' to 5' exonuclease activity and strand displacement activity.
  • the template-ordered extension products may optionally be ligated at a low concentration (to favor intramolecular ligation events) to create a covalently closed linear DNA strand composed of annealed primers that are extended by a single base.
  • each donor-acceptor pair is optimally spaced to produce a distinct high FRET event.
  • a further variation of this method produces donor-acceptor pairs that are well separated after first incorporating the persistently labeled nucleotide 3' of the primer, by adding natural nucleotides to complete synthesis to the 5' end of the next annealed primer, followed by performing the optional ligation reaction.
  • DNA strands of approximately 100 Kb can be viewed in one field of view with existing real-time sequencing systems, and the field of view can be moved to increase the length of examined DNA.
  • nick spacing of 3-5 Kb produces resolvable complexes and reduces the risk that a sequence read from one strand will encounter a nick on the opposite strand, thereby terminating extension.
  • the relative distance of visible markers can also be used to determine which DNA site is being determined.
  • the immobilized DNA can be stained with a dye that is not involved in producing the detectable FRET event.
  • the double- or single-stranded nature of the DNA must be taken into account when one needs information about the immobilized DNA strand.
  • the methods and systems of the present disclosure can be combined with reported techniques where integrated fluorescence intensity measurements coupled with quantile analysis provide an accurate measure of the amount of DNA (Li et al., 2007). Analogous to a whole genome shotgun sequencing strategy, the entire genome sequence can be determined according to the present disclosure by sequencing many individual copies of the same or overlapping DNA fragments.
  • donor energy transfer capabilities are continuously optimized throughout the extension reaction, because new donor fluorophores constantly intercalate into the nascent strand, effectively positioning a new donor at a distance that will produce a higher efficiency FRET event relative to the more upstream donor that may have photobleached or, as a result of nucleotide incorporation and enzyme translocation, become too distant from the acceptor-labeled dNTP bound at the enzymatic active site.
  • the acceptor signal is increased as compared to signal generated in other systems that use a single donor fluorophore.
  • the disclosed methods and systems involve the use of tracking software that searches along a donor intensity trajectory for acceptor signals (sequence information) originating from different regions along the same DNA strand, thereby permitting accurate placement of the relative locations of each independent sequence along the DNA strand and resulting in the simultaneous generation of multiple discrete and ordered sequence "reads" along the length of a single nucleic acid strand.
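The ordering step described above can be illustrated with a short Python sketch. It is not the disclosed tracking software: the `Read` class, its fields, and the function name are hypothetical, and the conversion factor reuses the approximate figure of 700 bases per pixel quoted elsewhere in this description.

```python
from dataclasses import dataclass

BASES_PER_PIXEL = 700  # approximate figure quoted in this description


@dataclass
class Read:
    pixel: float    # position of the acceptor signal along the strand, in pixels from one end
    sequence: str   # bases called at that site


def order_reads(reads):
    """Sort reads by position along the strand and attach approximate base offsets."""
    ordered = sorted(reads, key=lambda r: r.pixel)
    return [(round(r.pixel * BASES_PER_PIXEL), r.sequence) for r in ordered]
```

For example, reads detected at pixels 12.0, 3.5 and 40.0 would be reported in the order 3.5, 12.0, 40.0, at approximate offsets of 2,450, 8,400 and 28,000 bases from the reference end of the strand.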
  • sequencing of long DNA strands will facilitate the identification of genomic rearrangements and improve the assembly accuracy of chromosomal sequences (e.g., correctly identifying independent HIV genomes; associating sequence reads with the correct maternal/paternal chromosome).
  • Sequence reads obtained according to the present disclosure can produce haplotype information and thus further facilitate accurate genome assembly. Production of haplotype information is especially important because it is shown to have more power than individual nucleotide variation in the context of association studies and in predicting disease risks (Stephens, Schneider et al., 2001; HapMap Project).
  • the first diploid genome sequence of a single human demonstrates that maternal and paternal chromosomes are 99.5% similar when genetic variation due to insertion and deletion is taken into account (Levy, Sutton et al., 2007). The combination of longer read lengths and discrete, ordered reads will facilitate correct assembly of the maternal and paternal chromosome sequences.
  • the methods, compositions and systems disclosed herein for sequencing of long DNA strands are capable of facilitating the identification of genomic rearrangements within the strand, improving the assembly accuracy of chromosomal sequences (especially in regions sharing a great deal of similarity), and improving copy number variation determination (especially for longer repeats).
  • the first diploid genome sequence of a single human demonstrates that maternal and paternal chromosomes are 99.5% similar when genetic variation due to insertion and deletion is taken into account (Levy et al., 2007). Thus, it will be critical to carefully track sequence information associated with each chromosome.
  • Example 1 Assessment of Intercalating Dyes for use as FRET donors
  • Various intercalating dyes were tested for use as a donor in donor-based detection of acceptor signals. These dyes are advantageous in that many donors could be present and exchanged and/or replenished during the extension reaction, thus allowing for extended donor lifetime.
  • the dyes tested were SYBR Green I, YOYO-1, YO-PRO-1, SYBR Gold, and SYBR Green I with YOYO-1. Representative spectra are shown in Figure 2. Fluorescence intensities were observed using YOYO-1 or SYBR Green I with short primer/template duplexes and with linear genomic DNA.
  • PEBN-coated substrates were prepared using a modified version of the procedure disclosed by Braslavsky et al., 2003, PNAS Vol. 100, No. 7, pp. 3960-3964. Briefly, glass coverslips were treated overnight in an alkaline base bath, rinsed in distilled water and then cleaned with 2% Micro-90 for 60 minutes with sonication and heat, followed by boiling in RCA solution (H2O : 30% NH4OH : 30% H2O2 (6:4:1)) for 60 minutes (2 x 30 minutes). The cleaned glass coverslips were then immersed in 2 mg/ml polyallylamine for 10 minutes and rinsed five times in water, followed by an immersion in 2 mg/ml polyacrylic acid for 10 minutes and five rinses in water.
  • the polyallylamine and polyacrylic immersions were repeated one more time.
  • the coverslips were then rinsed in water and coated with a 5 mM EDC-Biotin amine solution in 10 mM MES buffer, pH 5.5, for 30 minutes.
  • the slides were then rinsed in MES buffer for 5 minutes, in water for 5 minutes and in a solution of 10 mM Tris, pH 8.0, 10 mM NaCl for 5 minutes.
  • the slides were coated with Neutravidin by incubating for 30 minutes in a solution comprising 1 mg/ml Neutravidin.
  • Figure 3 depicts average fluorescence intensities obtained from SYBR
  • the DNA was incubated with increasing concentrations of SYBR Green I (0.01X, 0.1X, 0.3X, 0.6X, 1X, 2X and 5X SYBR Green I in 1X KB buffer supplemented with 50 mM BME). Following addition of each successive concentration of SYBR Green I to the reaction chamber, the chamber and contents were irradiated with an Argon 488 nm laser at 500 µW power, and data were collected at 25 ms integration time.
  • FIG. 3A depicts the SYBR Green I average intensity in a given region of interest (ROI) relative to average background intensity at each given concentration of SYBR Green I.
  • SYBR Green I dye did not exhibit high background fluorescence in the absence of DNA even at higher dye concentrations.
  • increased fluorescence of SYBR Green I was observed in the presence of DNA, especially at higher dye concentrations.
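The quantity plotted in the titration, average intensity in a region of interest (ROI) relative to average background intensity, can be computed as in the Python sketch below. The function names and the use of plain pixel lists are assumptions for illustration; the disclosure does not describe the analysis code, and no measured values are reproduced here.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of pixel intensities."""
    return sum(values) / len(values)


def roi_to_background_ratio(roi_pixels, background_pixels):
    """Average ROI intensity divided by average background intensity.

    roi_pixels: intensities of pixels inside the region of interest (on DNA).
    background_pixels: intensities of pixels in a DNA-free region of the same image.
    """
    return mean(roi_pixels) / mean(background_pixels)
```

Evaluating this ratio once per dye concentration gives a curve of the kind described for the titration: a dye with low background fluorescence in the absence of DNA yields ratios well above 1 wherever DNA is present.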
  • An identical titration was carried out using the intercalator YOYO-1 in the place of SYBR Green I (Figure 3B).
  • FIG. 3B depicts the YOYO-1 average intensity in a given region of interest (ROI) relative to average background intensity at each given concentration of YOYO-1.
  • the nicked product was then incubated with 6.8 nM acceptor-labeled nucleotide dU-A1610 in the presence of a mutant form of Klenow polymerase that comprises the mutation D424A and lacks exonuclease activity (hereinafter referred to as "Klenow(exo) polymerase”) at a final enzyme concentration of 0.476 nM at 37 °C for 40 minutes.
  • the nicked and labeled DNA was then incubated with the intercalator dye YOYO-1 and then immobilized via injection of the dye-DNA mixture at 10 pM concentration into a glass chamber formed of surfaces derivatized either with +-+-+ (Figure 5A) or PEBN (Figure 5B).
  • Acceptor intensity on the coated surfaces was visualized using fluorescence microscopy via direct excitation with a Yellow HeNe laser (for visualization of acceptor-labeled nucleotide) or an Argon 488 nm laser (for visualization of YOYO-1 labeled DNA) in the same field of view.
  • Example 4 Detection of incorporation of acceptor labeled nucleotides based on fluorescence overlap with intercalating dye donors
  • the nicked product was then incubated with 6.8 nM dU-Cy5 (Figure 6) or dU-A1610 (Figure 7) in the presence of Klenow(exo-) polymerase at 37 °C for 40 minutes, following which 10 pM of the labeled λ DNA was contacted with an imaging mix containing 300 nM YOYO-1 and 50 mM BME in 1X KB buffer.
  • the entire mixture was injected into a glass chamber formed by PEBN-coated glass surfaces as described in Example 1.
  • visualization of YOYO-1 containing regions was achieved using an Argon 488nm laser, and visualization of acceptor-containing regions using a Red HeNe laser (Figure 6).
  • FIG. 6 shows donor fluorescence; the second panel depicts fluorescence images of the labeled λ DNA as seen in the acceptor channel due to fluorescence 'bleed' into said channel, gathered following excitation with an Argon 488nm laser. Fluorescent acceptors are visible as regions of increased fluorescence intensity as indicated by white arrows.
  • the third panel shows the same field of view imaged using Red HeNe excitation to visualize the location of incorporated acceptor labels.
  • the fourth panel shows the composite image generated via overlay of the second and third panels, confirming the location of incorporated acceptor.
  • Figure 8 depicts results of a study identical to that of Figure 6, except that the DNA comprising incorporated acceptors was immobilized on +-+-+ surfaces instead of PEBN surfaces prior to visualization.
  • Example 5 Detection of incorporation of acceptor-labeled nucleotides into surface-immobilized DNA
  • λ DNA was nicked as described in Example 2, and then immobilized on a PEBN coated surface as described in Example 1.
  • the nicked and immobilized DNA was then contacted with an extension reaction mix containing 300-900nM YOYO-1, 6.8nM dU-Cy5 and Klenow(exo-) in KB buffer.
  • the extension mix was replaced by buffer containing 25mM Tris pH 7.6 and 50mM BME. (In some experiments, the YOYO-1 dye was included in the Tris-BME buffer instead of in the extension mix).
  • the reactions contained 200nM primer/template duplex, 2.5uM base-labeled NTP and 600nM Klenow(exo).
  • the reaction was initiated by addition of the enzyme.
  • Typical results are shown in Figure 12.
  • Panel 12(A) shows a schematic for the use of native (non-engineered) duplex comprising multiple intercalated SYBR Green I dye donors. In this test system, FRET occurs between the intercalated donors and the incorporated base labeled dNTPs.
  • Panel 12(B) depicts a graphed time series of fluorescence signals of both donor and acceptor groups detected using a fluorometer.
  • the X axis represents time in seconds; the Y axis represents fluorescence intensity in arbitrary units (AU).
  • Panel 12(C) depicts a bar graph of normalized FRET data from a series of individual incorporation experiments performed in a cuvette. The donor-acceptor pairs are specified on the X axis.
  • the Y axis on the right shows the normalized increase in acceptor signal due to FRET.
  • the data are normalized by applying the formula [(I_A after enzyme injection) - (I_A before enzyme injection)] / (donor intensity at start), where I_A is the acceptor intensity.
  • the FRET efficiency, as well as normalized acceptor intensity, using SYBR Green I as a donor are higher than corresponding FRET efficiencies and normalized acceptor intensities for the Alexa 488 donor samples.
  • Panel 12(D) depicts a bar graph of the fold increase in acceptor intensity obtained using SYBR Green I, relative to acceptor intensity obtained using A1488 as the donor.
  • a biotinylated primer template duplex consisting either of a biotinylated derivative of the engineered duplex (which contains a single donor group, A1488, at the -7 position on the primer) or a biotinylated derivative of the native duplex (which does not contain any intrinsic donor, but into which SYBR Green I donor molecules have been intercalated via co-incubation of the native duplex with SYBR Green I at 0.1X concentration) was immobilized on a PEG surface via attachment of the biotin on the template strand of each duplex.
  • Each surface-immobilized duplex was subjected to an extension reaction in situ by injecting into the chamber an extension mixture containing 150nM Klenow(exo-) polymerase, 0.5uM of base labeled dNTPs (dUTP-ROX or dUTP-Alexa610), 50mM Tris pH 7.2, 2mM MnSO4, 10mM Na2SO4, 2mM DTT, 0.1% Triton X-100 and 0.01% Tween-20.
  • the reactions were allowed to occur for 10 minutes followed by a 1M NaCl rinse; the samples were then rinsed either with buffer alone, or with buffer supplemented with 0.1X SYBR Green I.
  • the samples were excited using an Argon 488nm laser at 460µW and the data were collected at 300ms integration time for 150 frames.
  • the emitted signals were detected by a Roper Scientific back-illuminated EMCCD camera (Cascade 1), with an inverted Nikon microscope (TE 2000U), and a 60x oil objective.
  • the emitted light was separated using dichroic (560nm, 650nm) and band pass filters (535/50nm; 620/40nm).
  • FRETAN software was used to obtain donor and acceptor traces and perform
  • the FRETAN software is an automated analysis software that identifies each of the spots in the donor channel (taking into consideration noise thresholds), subtracts the background fluorescence, and identifies anti-correlated changes in the time courses of fluorescence at each acceptor wavelength to identify single pair FRET events, and computes approximately 50 attributes associated with FRET.
  • For the FRETAN software, see U.S. Provisional App. No. 60/765,693 filed February 6, 2006, and U.S. Published App. No. 2007/0250274 A1, published October 25, 2007, herein incorporated by reference in their entirety.
  • Panel 13(A) shows a schematic of the Alexa 488 FRET system.
  • Panel 13(B) shows an example trace of FRET between the donor Alexa 488 and an incorporated base labeled dUTP-ROX.
  • Panel 13(C) shows a schematic of the SYBR Green I donor FRET system.
  • Panel 13(D) shows an example trace of FRET between the intercalating dye SYBR Green I as the donor and an incorporated base labeled dUTP-ROX.
  • the acceptor signals detected with SYBR green I as the donor are brighter compared to the signals detected with Alexa488 as the donor.
  • Figure 14 shows single molecule FRET data comparing the FRET efficiency and acceptor intensities using A1488 or the intercalating dye SYBR Green as a donor. Scatter plots of acceptor intensity (on the Y axis) and FRET efficiency (on the X axis) for acceptor Alexa 610 and acceptor ROX are shown in Panels (A) and (B), respectively. The lighter grey circles indicate data points obtained using Alexa 488 as the donor and darker stars in both plots A & B indicate data points obtained using SYBR Green as the donor.
  • Panel (C) shows a schematic of Alexa488 and SYBR Green I-driven FRET.
  • Panel (D) shows a bar graph of Acceptor intensities driven by Alexa 488 or SYBR Green above two user-defined thresholds, i.e., 1500 AU or 2000 AU.
  • the darker bars (on the left) are acceptor signals above 1500 AU and lighter grey bars (on the right) are acceptor signals above 2000 AU.
  • only a very small percentage of acceptor intensities is higher than the user-defined cut-off thresholds (i.e., 1500 AU or 2000 AU) when A1488 is used as the donor, whereas acceptor intensities using SYBR Green I as the donor are consistently higher than both thresholds.
  • Panel 15(A) depicts a time series of acceptor intensity over a 30-minute period with SYBR Green present only 5' of the incorporated Alexa 610 acceptor.
  • Panel 15(B) depicts a time series of acceptor intensity over a 30-minute period with SYBR Green I present both 5' and 3' of the incorporated Alexa 610 acceptor.
  • Lambda (λ) DNA was randomly nicked via incubation with DNase I, an enzyme that introduces random single-strand nicks into the DNA backbone.
  • the nicked DNA was labeled in solution with dU-Cy5 or dU-A1594 by incubating 1.33nM of DNA with 6.8nM dU-Cy5 or dU-A1594 in 1X KB buffer in the presence of 0.002 units/µl DNase I and 457nM Klenow(exo) polymerase.
  • the reaction was incubated at 37°C for 40min, and then stopped by adding DNase I stop solution (Promega).
  • 13pM of the labeled DNA was added to imaging mix containing 1X KB, 50mM BME and 300nM YOYO-1.
  • the imaging mix containing labeled DNA was added to the dry surface of PEBN-coated glass chambers.
  • the bound DNA was then washed with 25mM Tris pH 7.6 containing 50mM BME to remove excess YOYO-1 and unincorporated nucleotides.
  • the bound DNA was further washed with an oxygen-scavenger containing solution made of 25mM Tris pH 7.6, 50mM BME, 1mg/ml glucose oxidase, 0.04mg/ml catalase and 0.4% glucose.
  • Regions of FRET activity were imaged using an Argon 488nm laser (for detection of YOYO-1 labeled DNA), or a Red HeNe laser (for detection of Cy5), or a Yellow HeNe laser (for A1594 detection).
  • the images collected with the two lasers were overlaid using MetaMorph software to confirm the presence of incorporated acceptor label. The overlay results are shown in Figure 16.
  • Example 8 Use of the disclosed methods to screen HIV genotype within particular patients
  • diagnostic SNPs within the HIV genome are determined.
  • Primers are constructed along each strand of the HIV genome (the HIV RNA genome may be either directly interrogated, or converted into dsDNA and then ssDNA prior to interrogation) and, preferably, the 3' end of each primer terminates at the site of a candidate SNP. Some primers may terminate at non-variant sites to serve as internal controls for sequencing reaction efficiency and accuracy. If dsDNA is present in the hybridization, snap cooling is preferred to promote primer-template duplex formation, rather than slow cooling which favors reannealing of the template strands.
  • the HIV population may be screened and therapeutic regime prescribed and modified, as needed.
  • This method can be used to determine the relationship of SNPs in any nucleic acid for any application - i.e., cancer or predicting predisposition to any genetically influenced or determined disease.
  • Example 9 Use of the disclosed methods to assign detected sequence variations to particular genotypes
  • sequence reads may produce the following four variants, each of which can be analyzed to determine a consensus read for this region of the HIV genome (see
  • Variant#1 ACTGTATACGTACGATGCTATGCATCGATTCGTAC
  • Variant#2 ACTGTATACGTACGGTGCTATGCATCGATTCGTAC
  • Variant#3 CATCGATTCGTACGTGCCTCGAGTTTCTG
  • Variant#4 CATCGATTCGTACGTGCCTCGAGCCTCTG
  • 'NN' in different combinations may be very different. For example, substituting 'A' with 'CC' in the above consensus may be a genotype resistant to a particular drug therapy, whereas 'A' with 'TT' may be effectively treated with the same therapy.
  • the methods and systems of the present disclosure address the central problem of how to align 1, 2, 3 and 4 because they provide important information about the relationship between these different short sequence reads, i.e., whether they occur on the same or different viral genome.
  • Example 10 Software analysis of fluorescence data gathered from Lambda
  • Data was processed as follows. A user-defined region of interest (ROI) in an average image of the Lambda (λ) DNA volume was segmented. Thresholding and spatial connectivity information were used to automatically segment the ROI in the average image of Lambda (λ) DNA in the donor channel. Specifically, automatic thresholding followed by a largest connected component analysis method was used to segment the Lambda (λ) DNA in the donor channel, as shown in Figure 18, Panels (A) and (B). Using the information about the spatial extent of the ROI in the donor channel, the Lambda (λ) DNA in the acceptor channel was segmented in a similar way, as shown in Figure 18, Panels (C) and (D).
  • the segmented ROI of both channels was registered using standard image registration techniques, as shown in Figure 19.
  • signals were extracted at every corresponding spatial location in donor and acceptor channel ROI and the normalized intensity of every spatially corresponding point in both channels was compared.
  • a criterion was defined as a function of donor intensity and acceptor intensity at a particular point to determine the eligibility of that spatial coordinate as an incorporated label in the Lambda (λ) DNA. Notably, a higher intensity was observed in the acceptor channel only at those points, as compared to the donor channel.
  • Figure 20 shows co-localization of the detected points in the acceptor channel via Argon 488nm or Red HeNe excitation, confirming the accuracy — to the level of pixel registration — of the automated analysis.
  • compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, these embodiments are in no way intended to limit the scope of the claims, and it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the methods described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
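The automated analysis described in Example 10 (segmenting the donor-channel ROI by thresholding and largest connected component analysis, then comparing normalized donor and acceptor intensities point by point) can be illustrated with a minimal sketch. The source does not specify the thresholding rule, the connectivity, the normalization or the decision criterion, so the mean-plus-k-standard-deviations cutoff, the 4-connectivity, the normalization by ROI mean, the `margin` parameter and all function names below are assumptions made for illustration only. The two channels are assumed to be already registered to the same pixel grid.

```python
from collections import deque


def threshold_mask(image, k=1.0):
    """Mark pixels brighter than mean + k * std (an assumed thresholding rule)."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((v - mean) ** 2 for v in pixels) / len(pixels)) ** 0.5
    cutoff = mean + k * std
    return [[v > cutoff for v in row] for row in image]


def largest_component(mask):
    """Return the (row, col) pixels of the largest 4-connected region in mask."""
    rows, cols = len(mask), len(mask[0])
    seen, best = set(), set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first search over one connected region.
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def normalize(image, roi):
    """Divide each ROI pixel by the ROI mean intensity (assumed normalization)."""
    mean = sum(image[r][c] for r, c in roi) / len(roi)
    return {(r, c): image[r][c] / mean for r, c in roi}


def candidate_label_sites(donor, acceptor, k=1.0, margin=0.5):
    """Flag ROI pixels whose normalized acceptor signal exceeds the donor's.

    The ROI is segmented in the donor channel and reused for the acceptor
    channel; `margin` is a hypothetical tolerance, not a value from the source.
    """
    roi = largest_component(threshold_mask(donor, k))
    d, a = normalize(donor, roi), normalize(acceptor, roi)
    return sorted(p for p in roi if a[p] - d[p] > margin)
```

Passing a donor image and a registered acceptor image (as nested lists of intensities) to `candidate_label_sites` returns the pixel coordinates that satisfy this assumed criterion; real data would likely need a more robust threshold and sub-pixel registration.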

Abstract

Disclosed herein are novel methods for sequencing ordered segments of a single nucleic acid molecule in real time, by displaying a nucleic acid molecule in elongated format, manipulating the molecule to form a plurality of polymerase-accessible priming sites along the length of the nucleic acid molecule, initiating an extension reaction at one or more priming sites; monitoring signals emitted during the extension reaction; and analyzing the signals to determine the sequence of the molecule.

Description

UNITED STATES PATENT APPLICATION
TITLE: A METHOD AND SYSTEM FOR OBTAINING ORDERED,
SEGMENTED SEQUENCE FRAGMENTS ALONG A NUCLEIC ACID MOLECULE
INVENTOR: Alok N. Bandekar, Anelia Kraltcheva, Susan H. Hardin, Tommie Lloyd Lincecum, Jr., Uma Nagaswamy, Mitsu Sreedhar Reddy, Hongyi Wang
ASSIGNEE: INVITROGEN CORP.
[0001] This application claims priority to U.S. provisional application no. 60/981,803, filed on October 22, 2007.
FIELD OF THE INVENTION
[0002] Provided herein are methods, systems and compositions for sequencing nucleic acids. More particularly, provided herein are methods, systems and compositions suitable for real-time single molecule sequencing using ordered, segmented sequence fragments along a nucleic acid molecule.
BACKGROUND OF THE INVENTION
[0003] One of the most widely studied biological polymers is deoxyribonucleic acid (DNA), and most DNA studies involve sequence analysis. Traditional sequencing methods, commonly referred to as "first-generation" methods, require large quantities of the target DNA molecule to be sequenced using time and resource intensive processes. For example, Maxam-Gilbert sequencing involves the chemical cleavage of end-labeled fragments of DNA. The resulting fragments are then size separated by gel electrophoresis, and the sequence of the original end-labeled fragments is determined by analyzing the pattern of fragments produced by the gel. Read lengths using this approach are typically limited to approximately 500 nucleotides. Furthermore, such methods are lengthy, and frequently require amplification of the target DNA to obtain sufficient amounts of starting material.
[0004] Other traditional DNA sequencing methodologies generally involve monitoring the activity of a sequencing enzyme, such as DNA polymerase, as it replicates a test DNA molecule by polymerizing monomeric subunits, such as dNTPs, to extend a primer into a newly synthesized DNA strand that complements the test molecule of interest. The polymerization products are analyzed after the sequencing reaction has been terminated, thereby adding to the length of the process. For example, Sanger-dideoxy sequencing involves elongation of an end-labeled nucleotide primer with random incorporation of chain terminating dideoxy nucleotides in four separate DNA polymerase reactions. As with the chemically cleaved DNA fragments in the Maxam-Gilbert method, the extension products must be size separated by gel electrophoresis and the nucleotide sequence may be determined from analyzing the pattern of fragments in the gel. Originally performed with radionucleotide labeled primers, today the use of four different fluorescently labeled dideoxynucleotides enables the sequencing reactions to be size separated in a single gel lane, facilitating automated sequence determination. Read lengths utilizing this approach are limited to approximately 1000 nucleotides, and the process can take a few hours to half a day to perform.
[0005] Collectively, these first-generation methods are hampered by the requirement for a relatively large amount of DNA substrate, the need for complex liquid handling steps, short read-lengths (typically on the order of 500-1000 nucleotides), and the complexity of the underlying biochemistry. In addition, these approaches are not well-suited for rapid sequencing of nucleic acid molecules. Thus, there is a need in the art for rapid polymeric sequencing methods and compositions, for example, sequencing from small amounts of target molecules or from a single nucleic acid molecule more rapidly than is currently feasible with conventional sequencing methods.
[0006] The last decade has seen the emergence of the so-called "next-generation" or "second generation" methods, characterized by increased sequencing throughput and data generation rates, associated with lower sequencing costs per base, faster throughput and greater sensitivity. Still, the goal of real-time sequencing of a single target molecule remains elusive.
[0007] More recently, so-called "third generation" sequencing methods seek to sequence single target molecules in real time. These methods involve the monitoring of signals emitted by luminophores, fluorophores or other labels attached to various components of the sequencing machinery during the sequencing reaction. Typically, these methods involve monitoring of a single sequence "read" from a template, often in a "paired end read" format where priming of the sequencing reaction occurs at each terminus of a nucleic acid template. Some of these methods also require confinement of the sequencing reaction and/or the zone of signal detection to a narrow fixed region, so as to minimize interference from the environment. Finally, these methods generally do not provide information concerning secondary or tertiary structure, nor do they provide information concerning haplotype-based variations. Accordingly, there is still a need in the art for strategies that combine sequencing and DNA identity mapping for efficient and fast screening of DNA for certain macro and micro characteristics.
SUMMARY OF THE INVENTION
[0008] The methods and related compositions described herein represent significant advances over the current methods. For example, the disclosed methods permit the sequencing of long DNA strands, thereby facilitating accurate assembly of contiguous extended nucleic acid sequences. Moreover, these methods readily facilitate high throughput sequencing in parallel, and ultimately allow the simultaneous sequencing of an entire genome rapidly and cheaply.
[0009] More specifically, provided herein are methods for sequencing at least a portion of a nucleic acid molecule in real time or near real time, comprising the steps of displaying a nucleic acid molecule; manipulating the nucleic acid molecule to form one or more polymerase-accessible priming sites along the length of the nucleic acid molecule, wherein the one or more priming sites are separated from each other by a length of nucleotides sufficient to permit independent detection and resolution of sequencing activity occurring at each priming site by a detection system; contacting at least a portion of the nucleic acid molecule with a polymerase solution and one or more detectably labeled components under such conditions that extension occurs from at least one priming site; monitoring signals emitted during the extension reaction by at least one detectably labeled component; and analyzing the signals in real or near real time to determine the sequence of at least a portion of the nucleic acid molecule.
[0010] Optionally, at least one of the detectably labeled components comprises a Förster resonance energy transfer (FRET) donor.
[0011] Optionally, at least one of the detectably labeled components comprises a FRET acceptor.
[0012] Optionally, at least one of the detectably labeled components comprises both a FRET donor and a FRET acceptor.
[0013] In some embodiments, at least one of the detectably labeled components is a polymerase operably linked to a FRET donor.
[0014] In some embodiments, the signals emitted during the extension reaction are a result of FRET occurring between at least one detectably labeled component comprising a FRET donor group, and at least one detectably labeled component comprising a FRET acceptor group.
[0015] In some embodiments, the signals emitted during the extension reaction are signals resulting from FRET between a FRET donor and the FRET acceptor.
[0016] In some embodiments, the signals emitted during the extension reaction are FRET signals resulting from energy transfer between at least one intercalated dye molecule and at least one nucleotide labeled with a FRET acceptor.
[0017] In some embodiments, at least one of the detectably labeled components is an intercalating dye.
[0018] In some embodiments, at least one detectably labeled component is a nucleotide operably linked to a FRET acceptor. In one embodiment, the FRET acceptor is attached to a portion of the nucleotide that is released upon incorporation of the nucleotide into a nascent nucleotide strand that is synthesized by the polymerase.
[0019] In yet another embodiment, the FRET acceptor is attached to a portion of the nucleotide that becomes incorporated into a nascent nucleotide strand synthesized by the polymerase, and the sequencing method further comprises the step of removing the acceptor after incorporation. In some embodiments, removing the acceptor after incorporation comprises photobleaching the acceptor after incorporation or, alternatively, photocleaving the acceptor after incorporation.
[0020] In some embodiments, displaying the single nucleic acid molecule comprises immobilizing the nucleic acid molecule by attachment to a substrate. Optionally, immobilizing a polynucleotide strand further comprises providing a substrate including a surface having a layer formulated to immobilize a polynucleotide strand or a plurality of polynucleotide strands in an elongated form. Optionally, each immobilized polynucleotide strand is attached to the substrate by at least one attachment site. In some embodiments, the immobilized polynucleotide strand is immobilized at a plurality of attachment sites situated along its length so that the strand is fixed to the substrate in an elongated form to minimize strand movement during subsequent processing steps.
[0021] In some embodiments, displaying the single nucleic acid molecule comprises introducing the molecule into a nanostructure adapted to receive and display the molecule.
[0022] In some embodiments, manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises annealing one or more oligonucleotide primers along the length of the nucleic acid molecule. In one embodiment, one or more oligonucleotide primers is a random primer. In some embodiments, one or more oligonucleotide primers is a site-specific primer.
[0023] In one embodiment, manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises contacting the nucleic acid molecule with a nicking reagent adapted to form a plurality of polymerase-accessible nick sites along the length of the nucleic acid molecule.
[0024] In some embodiments, manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises treating the DNA with chemical or enzymatic nicking agents.
[0025] In some embodiments, the polymerase solution comprises at least one type of detectably labeled nucleotide. Alternatively, the detectably labeled nucleotides are added separately from the polymerase solution. In some embodiments, the detectably labeled nucleotides are added prior to, or after, the addition of the polymerase solution.
[0026] In some embodiments, the polymerase solution comprises at least two, three or four types of detectably labeled nucleotides.
[0027] Optionally, the detectable label of at least one type of detectably labeled nucleotide is a chromophore, fluorophore or luminophore. Optionally, the detectable label of at least one type of detectably-labeled nucleotide is selected from the group consisting of: ROX, Cy3, Cy5, xanthine dye, fluorescein, cyanine, rhodamine, coumarin, acridine, Texas Red dye, BODIPY, ALEXA, GFP, and a derivative or modification of any of the foregoing.
[0028] In some embodiments, the polymerase solution comprises a polymerase. Optionally, the polymerase is an RNA polymerase, DNA polymerase or reverse transcriptase. In some embodiments, the DNA polymerase is a Klenow fragment of DNA polymerase I, Phi29 DNA polymerase, B54 DNA polymerase, 9°N DNA polymerase, Vent DNA polymerase, Deep Vent DNA polymerase, E. coli DNA polymerase I, T7 DNA polymerase, T4 DNA polymerase, Thermus aquaticus DNA polymerase, or Thermococcus litoralis DNA polymerase.
[0029] In some embodiments, the polymerase solution comprises at least one type of detectably labeled nucleotide comprising three, four or more phosphate groups.
[0030] In some embodiments, at least one detectably labeled component comprises a detectably labeled nucleotide, wherein the detectable label is operably linked to a terminal phosphate in the polyphosphate chain of the detectably labeled nucleotide.
[0031] In some embodiments, at least one detectably labeled component comprises a nucleotide operably linked to at least two separate detectable labels.
[0032] In some embodiments, the polymerase solution further comprises a detectably labeled polymerase.
[0033] In some embodiments, at least one of the detectably labeled components is a polymerase operably linked to a nanocrystal or other FRET donor.
[0034] In some embodiments, the nucleic acid molecule comprises chromosomal DNA.
[0035] In some embodiments, the nucleic acid molecule comprises an intact chromosome.
[0036] In some embodiments, the sequencing method further comprises sequencing one or more additional nucleotide strands in parallel with sequencing a first nucleotide strand according to the methods disclosed herein.
[0037] In some embodiments, the detectably labeled components comprise one type of detectably labeled nucleotide and a detectably labeled polymerase.
[0038] In some embodiments, the detectably labeled components comprise a fluorescent moiety that non-specifically associates with the template nucleic acid molecule along the length of the molecule. Optionally, the fluorescent moiety is a FRET donor.
[0039] In some embodiments, the fluorescent moiety is an intercalating dye. In some embodiments, the intercalating dye becomes absorbed into the polynucleotide strand and becomes fluorescently active upon absorption.
[0040] In some embodiments, the polymerase-accessible priming sites are separated by a length of nucleotides sufficient to separate the polymerase-accessible sites by a distance sufficient to permit independent detection of the polymerases on the polymerase-accessible nick sites via a detection system.
[0041] Also provided herein are methods for sequencing at least a portion of a nucleic acid molecule in real time or near real time, comprising the steps of immobilizing a nucleic acid molecule on a substrate; nicking the immobilized nucleic acid molecule to form one or more polymerase-accessible nick sites along the length of the strand; adding an intercalating dye and a polymerase solution, wherein the polymerase solution further comprises a polymerase and one or more detectably labeled nucleotides, under conditions such that an extension reaction is initiated at one or more polymerase-accessible nick sites along the length of the immobilized nucleic acid molecule; monitoring signals emitted during the extension reaction at one or more polymerase-accessible nick sites; and analyzing the signals in real or near real time to determine the sequence of at least some portion of the nucleic acid molecule.
[0042] In some embodiments, the extension reaction extends the polymerase-accessible nick site by a plurality of nucleotides, or by at least 10, 20, 50, 100, 250, 500 or 1000 nucleotides.
[0043] In some embodiments, the extension reaction is monitored by a monitoring subsystem capable of visualizing extension activity along the strand at one or more polymerase-accessible nick sites.
[0044] In some embodiments, the extension reaction is monitored through detection of FRET signals arising from energy transfer from at least one intercalated dye molecule and at least one detectable label of a detectably labeled nucleotide.
[0045] In some embodiments, the sequencing method further comprises the step of converting the detected events into a sequence of identified nucleotides complementary to the non-nicked single strand at the nick site.
[0046] In some embodiments, the distance separating the nick sites is between about 1 Kb to about 250 Kb, between about 2 Kb to about 200 Kb, between about 3 Kb to about 100 Kb, between about 3 Kb to about 50 Kb, between about 3 Kb to about 10 Kb, between about 3 Kb to about 5 Kb, or between about 5 Kb to about 10 Kb.
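To give a sense of how these nick-site separations relate to independent optical detection, the following sketch converts a separation in kilobases into an approximate physical distance along an elongated strand and compares it with a diffraction-limited resolution estimate. It relies on two general facts that are not taken from this disclosure: B-form DNA rises about 0.34 nm per base pair, and the Abbe limit is roughly wavelength / (2 × numerical aperture). The default wavelength (520 nm), numerical aperture (1.4), the `stretch` factor and all function names are illustrative assumptions. Intercalating dyes and incomplete elongation change the real strand length, so the numbers are indicative only.

```python
RISE_NM_PER_BP = 0.34  # approximate rise per base pair of B-form DNA


def contour_length_nm(kilobases, stretch=1.0):
    """Approximate length of a DNA segment; stretch=1.0 means fully elongated B-form."""
    return kilobases * 1000 * RISE_NM_PER_BP * stretch


def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Diffraction-limited lateral resolution estimate (Abbe criterion)."""
    return wavelength_nm / (2 * numerical_aperture)


def resolvable(kilobases, wavelength_nm=520.0, numerical_aperture=1.4, stretch=1.0):
    """True if two sites this far apart exceed the assumed optical resolution."""
    separation = contour_length_nm(kilobases, stretch)
    return separation > abbe_limit_nm(wavelength_nm, numerical_aperture)
```

Under these assumptions, a 3 Kb separation corresponds to roughly one micrometer of fully stretched strand, which is several times the estimated resolution limit, whereas sites a few hundred base pairs apart would not be separable.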
[0047] Also provided herein is a system for sequencing a nucleotide strand by obtaining ordered sequence fragments along a polynucleotide strand, comprising a reaction chamber comprising a substrate on which at least one polynucleotide strand can be immobilized and nicked; a monitoring subsystem capable of detecting signals from extension activity occurring at the nick sites along the at least one polynucleotide strand; and an analyzing subsystem that converts the signals detected from extension activity into sequence information and then maps sequence fragments along the length of the at least one polynucleotide strand in such a manner that ordered sequence fragment information is obtained for nucleic acid identification and classification.
[0048] Also provided herein is a system for sequencing DNA by obtaining ordered sequence fragments along a polynucleotide strand, comprising a reaction chamber comprising a substrate on which at least one polynucleotide strand can be nicked and immobilized; a monitoring subsystem capable of detecting signals from extension activity occurring at the nick sites along the at least one polynucleotide strand; and an analyzing subsystem that converts the signals detected from extension activity into sequence information and then maps sequence fragments along the length of the at least one polynucleotide strand in such a manner that ordered sequence fragment information is obtained for nucleic acid identification and classification.
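As a rough illustration of what the analyzing subsystem's mapping step might look like, the sketch below orders per-site reads by their measured position along each immobilized strand and estimates the gaps between neighboring sites. The `SiteRead` record, its fields, the function names and the conversion factor of 0.34 micrometers per kilobase (fully stretched B-form DNA) are hypothetical and are not taken from the disclosure, which does not specify a data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRead:
    """One sequence fragment read at a nick site (hypothetical record)."""
    strand_id: str      # which immobilized strand the read came from
    position_um: float  # measured position of the nick site along the strand
    bases: str          # bases called from the extension signals


def order_fragments(reads):
    """Group reads by strand and order them by position along the strand."""
    ordered = {}
    for read in sorted(reads, key=lambda r: (r.strand_id, r.position_um)):
        ordered.setdefault(read.strand_id, []).append(read)
    return ordered


def gaps_kb(strand_reads, um_per_kb=0.34):
    """Estimate the distance in kilobases between consecutive ordered reads.

    um_per_kb assumes a fully stretched B-form strand; adjust for real data.
    """
    return [(b.position_um - a.position_um) / um_per_kb
            for a, b in zip(strand_reads, strand_reads[1:])]
```

Keeping the strand identifier and position with each read is what lets short fragments be reported as ordered segments of one molecule rather than as unrelated reads.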
DESCRIPTION OF THE DRAWINGS
[0049] The present disclosure can be better understood with reference to the following detailed description together with the appended illustrative drawings in which like elements are numbered the same:
[0050] Figure 1 depicts a visual characterization of one embodiment of the sequencing methods and systems of the present disclosure.
[0051] Figure 2 depicts fluorescent spectra of four intercalating dyes.
[0052] Figure 3A depicts SYBR Green I average intensity from a user-defined Region of Interest (ROI) containing a DNA fragment, relative to average background intensity.
[0053] Figure 3B depicts YOYO-1 average intensity within a user-defined ROI, relative to average background intensity.
[0054] Figure 4A depicts spectra of YOYO-1 and four fluorescent acceptors.
[0055] Figure 4B depicts spectra of a quantum dot (Qdot 525) and four fluorescent acceptors.
[0056] Figure 5 depicts images of background fluorescence from acceptor-labeled nucleotide on glass substrates coated with H — I — h and PEBN layers.
[0057] Figure 6 depicts images of DNA nicked with a site-specific nickase, incubated with acceptor-labeled nucleotide (dU-Cy5) and polymerase, mixed with the intercalating dye YOYO-1 and immobilized on a PEBN-coated surface.
[0058] Figure 7 depicts images of DNA nicked with a site-specific nickase, incubated with FRET acceptor-labeled nucleotide (dU-A1610) and polymerase, mixed with the intercalating dye YOYO-1 and immobilized on a PEBN-coated surface.
[0059] Figure 8 depicts images of DNA nicked with a site-specific nickase, incubated with acceptor-labeled nucleotide (dU-Cy5) and polymerase, mixed with the intercalating dye YOYO-1 and immobilized on a surface coated with successive layers of polyallylamine and polyacrylic acid.
[0060] Figure 9 depicts images of DNA nicked with a site-specific nickase, immobilized on a PEBN-coated surface, incubated with acceptor-labeled nucleotides and polymerase, and mixed with the intercalating dye YOYO-1.
[0061] Figure 10 depicts pictorially one exemplary embodiment of the sequencing compositions and methods disclosed herein, using a donor-labeled polymerase, acceptor-labeled nucleotides, and unlabeled surface-immobilized DNA template.
[0062] Figure 11 illustrates the detection of incorporation events that occur using the methodology of Figure 10, and depicts a comparison of various attributes of donor segments, made between different polymerases binding to immobilized duplex on a surface.
[0063] Figure 12 depicts a schematic for monitoring FRET emissions arising from incorporation of base-labeled nucleotides in real time.
[0064] Figure 13 depicts assessment of acceptor signal using SYBR Green I as the FRET donor.
[0065] Figure 14 depicts assessment of acceptor signal using SYBR Green I as the FRET donor in a single molecule assay.
[0066] Figure 15 depicts assessment of acceptor signal using SYBR Green I as the FRET donor.
[0067] Figure 16 depicts images of Lambda (λ) DNA incubated with a mixture containing DNase I, acceptor-labeled nucleotides and polymerase, then mixed with the intercalating dye YOYO-1, and immobilized on PEBN-coated surfaces.
[0068] Figure 17 depicts a real-time incorporation trace at right showing 20 ASN with base-labeled dNTP ("BL dNTP"); donor dipping is clearly detectable in the trace.
[0069] Figure 18 depicts the overlay of the segmented object after automatic segmentation and registration of the fluorescence image to identify FRET events. Panel 18(A) depicts an average intensity image in the donor channel and the user-defined box represents the region of interest (ROI) in the image. Panel 18(B) depicts the overlay of the segmented object on top of the average intensity image in the donor channel. Panel 18(C) depicts an average intensity image in the acceptor channel and the user-defined box represents the ROI in the image. Panel 18(D) depicts the overlay of the segmented object on top of the average intensity image in the acceptor channel.
[0070] Figure 19: Panel 19(A) depicts the overlay of the segmented object on top of the average intensity image in the donor channel. Panel 19(B) depicts the overlay of the segmented object on top of the average intensity image in the acceptor channel. Panel 19(C) depicts the overlay of the registered object with respect to the donor channel on top of the average intensity image in the acceptor channel.
[0071] Figure 20: The top-most graph of Panel 20(A) depicts the normalized intensity profile of the segmented object in the donor channel. The middle graph of Panel 20(A) depicts the normalized intensity profile of the segmented object in the acceptor channel. The bottom-most graph of Panel 20(A) depicts the thresholded intensity function, computed using the formula E = NI_A / (NI_A + NI_D), where NI_A and NI_D are the fluorescence intensities of the acceptor and donor molecules respectively. Panel 20(B) depicts co-localization of the detected points in the acceptor channel via Argon 488 nm or Red HeNe excitation, confirming the accuracy to the level of pixel registration of the automated analysis.
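As an illustration only, the ratio E = NI_A / (NI_A + NI_D) described for Figure 20 could be computed and thresholded along the following lines. This is a minimal sketch: the function names, the threshold value of 0.5 and the sample intensities are assumptions made for the example and are not taken from the disclosure.

```python
def fret_ratio(ni_acceptor, ni_donor):
    """Return E = NI_A / (NI_A + NI_D) for each paired point.

    Points where both intensities are zero are assigned 0.0 to avoid
    division by zero (an assumption of this sketch).
    """
    ratios = []
    for a, d in zip(ni_acceptor, ni_donor):
        total = a + d
        ratios.append(a / total if total > 0 else 0.0)
    return ratios


def threshold_events(ratios, threshold=0.5):
    """Return indices whose ratio meets or exceeds a hypothetical threshold."""
    return [i for i, e in enumerate(ratios) if e >= threshold]


# Hypothetical normalized intensity profiles along a segmented object.
donor = [1.0, 0.9, 0.2, 1.0, 0.1]
acceptor = [0.0, 0.1, 0.8, 0.0, 0.9]

ratios = fret_ratio(acceptor, donor)
events = threshold_events(ratios)
print(events)
```

In practice the threshold would be chosen relative to the measured noise level of the detection system rather than fixed in advance.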
DETAILED DESCRIPTION
[0072] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of ordinary skill in the art to which these inventions belong. All patents, patent applications, published applications, treatises and other publications referred to herein, both supra and infra, are incorporated by reference in their entirety. If a definition and/or description is set forth herein that is contrary to or otherwise inconsistent with any definition set forth in the patents, patent applications, published applications, and other publications that are herein incorporated by reference, the definition and/or description set forth herein prevails over the definition that is incorporated by reference.
[0073] The practice of the present disclosure will employ, unless otherwise indicated, conventional techniques of molecular biology, microbiology and recombinant DNA techniques, which are within the skill of the art. Such techniques are explained fully in the literature. See, for example, Sambrook, J., and Russell, D.W., 2001, Molecular Cloning: A Laboratory Manual, Third Edition; Ausubel, F.M., et al., eds., 2002, Short Protocols In Molecular Biology, Fifth Edition.
[0074] As used herein, the term "a" or "an" means "at least one" or "one or more".
[0075] As used herein, the terms "comprising" (and any form or variant of comprising, such as "comprise" and "comprises"), "having" (and any form or variant of having, such as "have" and "has"), "including" (and any form or variant of including, such as "includes" and "include"), or "containing" (and any form or variant of containing, such as "contains" and "contain"), are inclusive or open-ended and do not exclude additional, unrecited additives, components, integers, elements or method steps.
[0076] As used herein, the terms "a," "an," and "the" and similar referents used herein are to be construed to cover both the singular and the plural unless their usage in context indicates otherwise. Accordingly, the use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims or specification may mean "one," but it is also consistent with the meaning of "one or more," "at least one," and "one or more than one."
[0077] All of the compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, these embodiments are in no way intended to limit the scope of the claims, and it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the methods described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
[0078] The sequencing methods, systems and compositions of the present disclosure collectively provide rapid sequencing of a single polymeric molecule of interest, such as a nucleic acid, by monitoring of emitted signals. More specifically, the present disclosure provides a method for obtaining ordered, segmented sequence fragments along a polynucleotide strand.
[0079] To perform nucleotide sequencing according to the present disclosure, the nucleic acid molecule to be sequenced is oriented, or otherwise displayed, in a spatially addressable format before or after being subjected to an extension (i.e., sequencing) reaction in situ. Either prior to or after display and/or extension, the nucleic acid molecule is also treated so as to introduce or form a plurality of polymerase-accessible priming sites along the length of the molecule, where adjacent priming sites on a strand are separated by a length of nucleotides sufficient to permit independent detection and resolution by a detection system. Treatment of the nucleic acid molecule to introduce priming sites may be performed before, after or concurrently with the elongation/display step and/or the extension step; the order of steps is immaterial. At some point in the process, the nucleic acid molecule is contacted with a polymerase solution and with other components of the sequencing machinery under conditions such that the polymerase extends the nucleic acid strand from at least one priming site by polymerizing nucleotides onto a free 3' end of the nucleic acid molecule. The polymerase and/or components of the sequencing machinery are operably linked to, or otherwise associated with, detectable labels that emit signals as the sequencing reaction proceeds. These signals are detected and analyzed in real time or near real time to obtain sequence information for at least some portion of the nucleic acid molecule. An overview of a sequencing method and system typical of the present disclosure is presented in Figure 1.
[0080] Typically, the nucleic acid molecule to be sequenced is DNA or RNA; however, in some cases it can also be a polymer comprising nucleotide analogs capable of polymerization by a polymerase.
[0081] Typically, the nucleic acid is displayed by maintaining it in a spatially addressable format, such that signals emitted from a specific and discrete point, portion, region or terminus of the nucleic acid molecule can be visualized, resolved, assigned to their site of origin on the nucleic acid molecule, and tracked over time. Any suitable method may be used to display the nucleic acid molecule, including but not limited to fixation or immobilization of the molecule on a surface, suspending the nucleic acid molecule in a laminar flow stream, passing the nucleic acid molecule through a nanopore, confining the nucleic acid molecule within a waveguide or within a suitable nanostructure, e.g., a nanotube, nanowell, nanochannel or the like, adapted to receive and display the nucleic acid molecule, or using optical tweezers to hold and restrict the nucleic acid molecule during the detection step. Alternatively, the nucleic acid molecule may be displayed by moving the molecule relative to a detection station, such that signals emitted from the molecule are tracked along the length of the molecule and assigned to their point of origin along the nucleic acid molecule.
[0082] In one embodiment, the nucleic acid molecule is displayed by fixing or otherwise immobilizing the nucleic acid molecule on a two-dimensional surface on a substrate. Any suitable substrate may be used for immobilization of the nucleic acid molecule, including substrates that exhibit non-specific adherence to nucleic acid, or substrates to which nucleic acid molecules can be bound. Preferably, the nucleic acid molecule is immobilized by contacting it with a substrate including a surface having a layer formulated to immobilize a polynucleotide strand or a plurality of polynucleotide strands in an elongated form.
The nucleic acid molecule may be bound to the surface of the substrate via a plurality of attachment sites situated along its length, so that the strand is fixed to the substrate in an elongated form to minimize strand movement during subsequent processing steps.
[0083] In other embodiments, individual polymeric molecules are displayed and elongated using a nanofluidic device comprising a nanochannel array, wherein the entire sample population is elongated and displayed in a spatially addressable format. As disclosed herein, the use of nanofluidic devices to isolate and sequence a target polymer of interest, in combination with FRET-based analysis, provides significant advantages. For example, the use of nanofluidic devices for separation and isolation of test polymeric molecules bypasses the requirement for immobilization or attachment of sequencing components to a substrate and also enables the sequencing of intact chromosomes, thereby exponentially increasing the amount of sequencing information obtained from a single reaction and also enabling analysis of such "macro" structural features as methylation, inversions, indels and tandem repeats. In some embodiments, nanofluidic devices that permit the simultaneous observation of a high number of macromolecules in a multitude of channels can be employed. Such devices increase the amount of sequence information obtainable from a single experiment and decrease the cost of sequencing of an entire genome. See, for example, U.S. Published App. No. 2004/0197843. Furthermore, by using semiconductor nanocrystals or analogs thereof operably linked to polymerase activity, polymer sequence data can be generated as labeled monomers are incorporated into a newly synthesized polymer strand by a polymerase, thus enabling the sequencing of polymers in real time.
Moreover, the nanofluidic-based sequencing methods disclosed herein can be used to rapidly obtain both "raw" sequence at the single nucleic acid molecule level as well as validation of incoming sequence information via simultaneous priming at multiple points along the template strand.
[0084] Manipulation of the DNA includes without limitation any method of treatment that results in formation of one or more priming sites along the length of a nucleic acid molecule, while preserving the ability to obtain resolvable sequence information from the molecule and maintaining the structural integrity of the molecule, i.e., without inducing fragmentation, degradation, disruption or complete breakage of the nucleic acid molecule. Preferably, the manipulation is performed in such a manner as to ensure that independent sequence information can be determined from each priming site spaced along the length of the DNA strand. Optionally, the manipulation is refined to obtain optimal spacing of priming sites, such that the priming sites are separated from each other by a length of nucleotides sufficient to make their termini accessible to a polymerase for subsequent extension. The separation length, or distance, between the priming sites can vary from about 1 Kb (kilobase) to about 250 Kb along the nucleic acid strand. In certain embodiments, the separation distance between adjacent priming sites is between about 3 Kb and about 5 Kb along the length of the nucleic acid molecule.
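By way of illustration, the spacing criterion above can be checked computationally once candidate priming-site positions are known. The following is a minimal sketch, assuming positions are given in base pairs along the strand and using a hypothetical minimum separation of 3 Kb; the greedy selection rule and all names are assumptions of the example, not part of the disclosed method.

```python
def select_resolvable_sites(positions_bp, min_sep_bp=3000):
    """Greedily keep sites that lie at least min_sep_bp from the last kept site."""
    kept = []
    for pos in sorted(positions_bp):
        if not kept or pos - kept[-1] >= min_sep_bp:
            kept.append(pos)
    return kept


def spacings(positions_bp):
    """Return distances between adjacent sites, in base pairs."""
    ordered = sorted(positions_bp)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Hypothetical nick positions (bp) along a roughly 48.5 Kb template.
nicks = [500, 1200, 4100, 4600, 9000, 48000]

kept = select_resolvable_sites(nicks)
print(kept)
print(spacings(kept))
```

Sites closer together than the chosen minimum are simply skipped here; an actual analysis might instead flag them as unresolvable by the particular detection system in use.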
[0085] One suitable method of forming priming sites on a test nucleic acid molecule is by annealing the nucleic acid molecule with one or more primers. The primers may either be random, or may be adapted to bind only to certain portions of the nucleic acid molecule.
[0086] Another typical method used to introduce priming sites into a nucleic acid molecule involves nicking of the molecule by chemical or enzymatic means, or with nicking reagents. Suitable nicking reagents include without limitation any reagent capable of creating a nick in either strand of the polynucleotide, such as, for example, exonucleases, DNases, chemical reagents such as the glycogen product from Fermentas International, Inc., of Ontario, Canada, or any other chemical or biological system capable of introducing nick sites into one or both strands of a nucleic acid molecule. The limited nicking reaction can be performed in solution prior to nucleic acid immobilization as described in Zasloff and Camerini-Otero, 1980, or the limited nicking reaction can be performed after nucleic acid immobilization to ensure that most if not all of the nick sites are polymerase-accessible, if an enzyme is used to nick the polynucleotide.
[0087] In certain embodiments, following the nicking step, the frequency of extendable nicks is characterized by incorporating a base-labeled nucleotide at the nick sites in solution. The resulting nucleic acid comprising nicked termini is then immobilized on a substrate and visualized using a single-molecule detection system by either direct excitation of the acceptor, or by detection of FRET between a donor dye used to stain the DNA (e.g., SYBR Green I, YOYO-1 or similar intercalating or groove-binding dye) and the incorporated acceptor, as described herein.
[0088] In the methods and systems of the present disclosure, the nucleic acid molecule must be contacted with a polymerase solution and at least one detectably labeled component. Such contacting may be achieved by any suitable means, in any phase, in any order of addition of reagents, and under any suitable conditions that permit polymerization of nucleotides to ultimately occur.
[0089] Typically, the polymerase solution comprises a polymerase, or any agent that is capable of polymerizing monomeric subunits into polymers, and a suitable buffer. In some embodiments, other components of the sequencing machinery, such as nucleotides or nucleotide analogs that can be polymerized into the extending strand by the polymerase, are included in the polymerase solution. Alternatively, these additional components may be added to the extension reaction, or otherwise contacted with the nucleic acid template, at any time in the procedure. The exact order of addition of any components of the extension reaction is immaterial; any and all components of the sequencing/extension reaction may be added to, or otherwise contacted with, the nucleic acid molecule to be sequenced in any order whatsoever that permits productive extension via incorporation of nucleotides into the extending strand.
[0090] Typically, the polymerase and/or other components of the sequencing machinery are operably linked to, or otherwise associated with, a detectable label. In some embodiments, the nucleotides of the polymerase solution are detectably labeled. When a labeled nucleotide enters the enzyme active site, a high efficiency FRET event occurs via energy transfer to the acceptor from donor intercalated dye molecules located 5' (i.e., upstream), 3' (i.e., downstream), or both 5' and 3', of the nucleotide incorporation site.
[0091] Typically, the detectable signal is a FRET signal generated between a FRET donor moiety and a FRET acceptor moiety. Förster resonance energy transfer (FRET) is a distance-dependent interaction between the electronic excited states of two molecules, during which energy is transferred non-radiatively from the first excited molecule (called a FRET donor) to the second molecule, called a FRET acceptor, which may then emit a photon. The process of energy transfer results in a reduction (quenching) of fluorescence intensity and excited state lifetime of the FRET donor, and can produce an increase in the emission intensity of the FRET acceptor. FRET occurs only when two appropriately labeled molecules or moieties are sufficiently proximal to each other to transfer energy. Visualization of a FRET event can be achieved via detection of the FRET signal induced by energy transfer from the FRET donor dye moiety to the FRET acceptor moiety.
[0092] In some embodiments, the FRET acceptor is attached to a nucleotide, and the FRET donor is operably linked to, or otherwise directly or indirectly associated with, any component of the sequencing machinery such as the polynucleotide backbone (e.g., phosphate groups), the polynucleotide bases, or the polymerase.
[0093] Preferably, the FRET donor is constantly replenished as the sequencing reaction progresses, thus allowing extended read lengths to be obtained. One example of a replenishing donor is a dye that binds with high affinity in a sequence-independent fashion to the nucleic acid, including but not limited to intercalating dyes. Alternatively, the replenishing donor can be a labeled DNA polymerase, which can be replenished by exchanging enzymes as the sequencing reaction progresses. One embodiment of a system using a replenishing donor-labeled polymerase is shown in Figure 10, and involves the use of an unlabeled DNA template immobilized via attachment to a surface, with donor-labeled polymerase and gamma-labeled dNTPs. Such a system provides several benefits. For example, the donor can be replenished by exchanging enzymes. Further, there is no concern of the duplex dissociating from the enzyme complex. Moreover, incorporation will only occur and be detected when a donor (enzyme) binds to the duplex. This is indicated by the detection of FRET signals between the donor-labeled polymerase and the acceptor-labeled gamma-dNTPs. For this embodiment, the use of less processive polymerases is beneficial to the experimental setup because it allows for more rapid exchange of the donor and the donor is less likely to photo-bleach. Experiments are being carried out to determine the most appropriate enzyme for this method. For example, mutant polymerases that have increased activity with gamma-modified dNTPs but exhibit decreased processivity may be used.
[0094] Alternatively, in some embodiments an ultra-stable donor such as a nanocrystal may be used in place of a replenishing donor. One exemplary embodiment of this method includes a donor nanocrystal stably attached to the polymerase.
[0095] In one typical embodiment, the FRET donor is an intercalating dye molecule or other fluorescent moiety that has a high affinity for polynucleotides and spontaneously intercalates itself between the bases of, or otherwise associates itself with, a nucleic acid molecule, producing increased fluorescence in the intercalated or associated state. As the polymerase moves along the nucleotide template, intercalated dye molecules are displaced from a direction 3' of the polymerase and absorbed into a 5' direction of the DNA, so that the donor is constantly replenished as the polymerization reaction proceeds. When the acceptor-labeled nucleotide is positioned within the polymerase active site for incorporation into the newly synthesized DNA strand by the polymerase, it undergoes FRET with the donor, resulting in emission of a FRET signal that can be detected and characterized. As the acceptor attached to the incorporated nucleotide is removed, a second detectably labeled nucleotide will enter the active site and produce a second high efficiency FRET event. As the polymerase extends the newly synthesized strand by successively adding labeled monomers to the free 3' end of the strand in a template-dependent fashion, the identity of each successive incoming monomer bound and incorporated by the polymerase will be identifiable by the emission spectrum of the FRET acceptor attached to that particular monomer. Accordingly, the base sequence of the newly synthesized strand can be identified by detection and characterization of the time sequence of FRET events, as described below.
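The idea of reading a base sequence from the time sequence of FRET events can be illustrated with a short sketch. It assumes, purely for the example, that each of the four nucleotides carries an acceptor detected in its own emission channel and that each event has already been reduced to a time stamp plus per-channel intensities; the channel names and the channel-to-base mapping are hypothetical.

```python
# Hypothetical assignment of acceptor emission channels to bases.
CHANNEL_TO_BASE = {"ch1": "A", "ch2": "C", "ch3": "G", "ch4": "T"}


def call_bases(events):
    """Call one base per FRET event, in order of event time.

    events: list of (time_s, {channel_name: intensity}) tuples.
    The base is taken from the channel with the highest intensity.
    """
    calls = []
    for _time_s, intensities in sorted(events, key=lambda e: e[0]):
        brightest = max(intensities, key=intensities.get)
        calls.append(CHANNEL_TO_BASE[brightest])
    return "".join(calls)


# Hypothetical events recorded at one priming site (deliberately unsorted).
events = [
    (0.8, {"ch1": 5, "ch2": 40, "ch3": 3, "ch4": 2}),
    (0.2, {"ch1": 50, "ch2": 4, "ch3": 6, "ch4": 3}),
    (1.5, {"ch1": 2, "ch2": 3, "ch3": 5, "ch4": 60}),
]

sequence = call_bases(events)
print(sequence)
```

A real analysis would also need to handle spectral cross-talk between acceptors and events too short or too dim to call, which this sketch ignores.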
[0096] Typically, the template DNA strand is treated to introduce a multiplicity of priming sites along the length of the strand in such a manner that the priming sites are optimally spaced apart to allow independent detection and resolution. The limits of detection and resolution will depend on the capabilities of the particular detection system employed in the disclosed methods.
[0097] One significant advantage of the disclosed methods and systems arises from the treatment of the nucleic acid molecule to produce multiple, optimally spaced, priming sites along the length of the molecule, each of which can be primed and monitored as an independent extension event. By simultaneously obtaining multiple "reads" from a single strand, the information obtained from each study is exponentially enhanced. Specifically, each sequencing complex along the strand provides not only sequence information about a region contained within the extended fragment, but also information about the placement of each sequence read relative to others obtained from the same strand. In other words, each sequence read along the strand is both discrete and ordered.
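To make the notion of discrete, ordered reads concrete, the following minimal sketch sorts reads by the measured position of their priming site along the displayed strand. The `Read` record, the use of micrometers as the position unit and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    position_um: float  # hypothetical measured position of the priming site along the strand
    sequence: str       # bases called at that site


def order_reads(reads):
    """Return reads sorted by their position along the strand."""
    return sorted(reads, key=lambda r: r.position_um)


# Hypothetical reads obtained simultaneously from one strand.
reads = [Read(12.4, "GATT"), Read(3.1, "ACGT"), Read(7.9, "TTAG")]

ordered = order_reads(reads)
print([r.sequence for r in ordered])
```

The ordering itself carries information: even where the reads do not overlap, their relative placement constrains how they map onto the molecule.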
[0098] Following the contacting step, wherein the nucleotide polymerase enzymes in the polymerase solution recognize and bind to the priming sites and initiate extension at the priming site by polymerization of nucleotides and elongation from the priming site, detection of signals is performed. Real-time sequencing is achieved by monitoring emissions from the detectable labels attached to various components of the polymerase solution as the extension reaction proceeds.
[0099] In some embodiments, the progress of the sequencing or extension reaction can also be tracked by detecting the dip in donor intensity that accompanies any FRET event involving energy transfer between the FRET donor and acceptor moieties. The ability to detect a dip in donor intensity likely depends on a variety of conditions. Dips in donor intensities can be monitored using standard detection systems.
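One simple way to flag donor dips in an intensity trace is sketched below. It assumes the trace is a list of donor intensities per frame, takes the median of the trace as the baseline, and treats any run of frames falling below a fixed fraction of that baseline as a dip; both the baseline estimate and the 50% drop criterion are assumptions of the example rather than parameters given in the disclosure.

```python
from statistics import median


def find_donor_dips(trace, drop_fraction=0.5):
    """Return (start, end) frame indices of runs below the dip cutoff.

    The cutoff is baseline * (1 - drop_fraction), with the baseline
    estimated as the median of the whole trace.
    """
    baseline = median(trace)
    cutoff = baseline * (1.0 - drop_fraction)
    dips, start = [], None
    for i, value in enumerate(trace):
        if value < cutoff and start is None:
            start = i                      # a dip begins
        elif value >= cutoff and start is not None:
            dips.append((start, i - 1))    # the dip ended on the previous frame
            start = None
    if start is not None:                  # trace ended while still in a dip
        dips.append((start, len(trace) - 1))
    return dips


# Hypothetical donor intensities, one value per frame.
trace = [100, 98, 102, 40, 35, 99, 101, 30, 100, 97]

dips = find_donor_dips(trace)
print(dips)
```

Each dip found this way would be expected to coincide with an acceptor signal in the corresponding channel, which offers a cross-check on the FRET event.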
[00100] Optionally, following the nicking reaction, the number of donor fluorophores associated with the nucleic acid can be varied to maximize acceptor FRET. Optimal spacing between a donor fluorophore and an acceptor on the incorporating nucleotide should be closer than the Förster radius (R0) of the donor-acceptor pair so that high FRET results. In certain embodiments, the FRET efficiency is greater than about 80%. If too few donor fluorophores interact with the nucleic acid, the donor fluorophores can be spaced too far apart for an adequate FRET signal-to-noise ratio and, hence, for adequate FRET detection. However, too many intercalated donor fluorophores may result in signal quenching.
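The relationship between donor-acceptor distance and FRET efficiency can be made concrete using the standard Förster relation E = 1 / (1 + (r/R0)^6). The sketch below uses that textbook relation, which is not recited in the disclosure, to estimate how close the acceptor must be for an efficiency of about 80%; the R0 value of 6 nm is a hypothetical placeholder, since the actual value depends on the donor-acceptor pair.

```python
def fret_efficiency(r_nm, r0_nm):
    """Standard Förster relation: E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def max_distance_for_efficiency(e_min, r0_nm):
    """Largest separation r giving efficiency >= e_min (inverse of the relation above)."""
    return r0_nm * (1.0 / e_min - 1.0) ** (1.0 / 6.0)


r0 = 6.0  # hypothetical Förster radius in nm, for illustration only
r_max = max_distance_for_efficiency(0.8, r0)

print(fret_efficiency(r0, r0))   # efficiency at r = R0
print(r_max / r0)                # fraction of R0 needed for 80% efficiency
```

Under this relation the efficiency is 50% at r = R0, and an efficiency of 80% requires a separation of roughly 0.79 R0, consistent with the statement that the spacing should be closer than R0.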
[00101] Optionally, polymerases used in the methods and systems of this disclosure are first analyzed with regard to donor duration and donor signal frequency over the collection time. The donor signals are assigned as segments of excited (digital unit) and dark (digital zero) states, depending on their intensities compared to the noise level. The excited donor segments are denoted by horizontal dark green bars and the dark regions are denoted by horizontal black bars (figure below). The number of donor segments in the excited state is extracted for every donor in the field of view, and attributes of these segments such as the duration, intensity and frequency are analyzed. A comparison of these attributes of donor segments, made between different polymerases binding to immobilized duplex on a surface, is shown in Figure 11.
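The segment analysis described above can be sketched as follows. The example binarizes a donor trace against a noise level, collects each excited segment, and reports its duration and mean intensity together with the overall segment frequency. The frame time of 0.1 s, the noise level and the sample trace are hypothetical values chosen for the illustration.

```python
def segment_donor_trace(trace, noise_level, frame_time_s=0.1):
    """Return excited segments (intensity above noise_level) of a donor trace.

    Each segment is a dict with its start frame, duration in seconds and
    mean intensity. A sentinel value closes a segment that runs to the end.
    """
    segments, start = [], None
    for i, value in enumerate(list(trace) + [float("-inf")]):
        excited = value > noise_level
        if excited and start is None:
            start = i
        elif not excited and start is not None:
            chunk = trace[start:i]
            segments.append({
                "start": start,
                "duration_s": len(chunk) * frame_time_s,
                "mean_intensity": sum(chunk) / len(chunk),
            })
            start = None
    return segments


def segment_frequency(segments, n_frames, frame_time_s=0.1):
    """Number of excited segments per second of collection time."""
    return len(segments) / (n_frames * frame_time_s)


# Hypothetical donor intensities for one donor in the field of view.
trace = [2, 50, 60, 55, 3, 1, 70, 80, 2, 1]

segs = segment_donor_trace(trace, noise_level=10.0)
freq = segment_frequency(segs, len(trace))
print(segs)
print(freq)
```

Running the same analysis on traces collected with different polymerases would give the kind of per-enzyme comparison of duration, intensity and frequency that the paragraph describes.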
[00102] It will be readily appreciated by persons of skill in the art of nucleotide sequencing that the specific order of steps in the disclosed methods is not defined or mandatory, and can be varied at will. For example, in some embodiments, formation of priming sites and/or extension can be performed prior to display; in other embodiments, priming and/or extension is performed after the nucleotide is immobilized. In yet other embodiments, formation of priming sites and/or immobilization may precede extension. All of these permutations and combinations, as well as any others that preserve the spirit and scope of the invention, may be used according to the present disclosure, and are contemplated to be within the spirit and scope of the present invention.
[00103] Removal of the detectable label of a nucleotide following incorporation of the nucleotide into the newly synthesized DNA strand by the polymerase can be accomplished by any suitable means. Typically, removal is accomplished by enzymatic cleavage upon incorporation, as will occur when the detectable label comprising the FRET acceptor is attached to a portion of the nucleotide that is released during incorporation (e.g., a pyrophosphate group with or without an associated linker; a fluorophore) as a natural byproduct of polymerase activity. Such labels are commonly referred to as "non-persistent" acceptors. In other embodiments, the detectable label comprising the FRET acceptor is a "persistent" label, i.e., it remains attached to the portion of the nucleotide that is incorporated into the elongating nucleotide strand, and thus is also incorporated into the newly synthesized portion of the nucleotide strand. When persistent acceptors are used, then the acceptor will have to be either photobleached or photocleaved after incorporation, or the acceptor will contribute to the signals emitted by the next incoming nucleotide until the persistent acceptor permanently photobleaches.
[00104] Any suitable polymerase may be used that is capable of polymerizing monomeric subunits into polymers. Preferably, the polymerase is a nucleotide polymerase, i.e., a polymerase that can polymerize nucleotides, such as DNA or RNA polymerases that polymerize DNA, RNA or mixed sequences, into extended nucleic acid polymers. Generally, the nucleotide polymerase will elongate a pre-existing polynucleotide strand, typically a primer, by polymerizing nucleotides on to the 3' end of the strand. Also preferred are polymerases that can be isolated from their hosts in sufficient amounts for purification and use and/or genetically engineered into other organisms for expression, isolation and purification in amounts sufficient for use, as well as mutants or variants of native polymerases having one or more amino acids replaced by amino acids amenable to attaching an atomic or molecular label, which have a detectable property. Exemplary polymerases include without limitation DNA polymerases, RNA polymerases and reverse transcriptases.
[00105] In a preferred embodiment, the polymerase is a DNA polymerase. Suitable nucleotide polymerases that may be used to practice the methods disclosed herein include without limitation any naturally occurring nucleotide polymerases as well as mutated, truncated, modified, genetically engineered or fusion variants of such polymerases. Known conventional naturally occurring DNA polymerases include without limitation bacterial DNA polymerases, eukaryotic DNA polymerases, archaeal DNA polymerases, viral DNA polymerases and phage DNA polymerases. Suitable bacterial DNA polymerases include without limitation E. coli DNA polymerases I, II, III, IV and V, the Klenow fragment of E. coli DNA polymerase (including mutants thereof, such as mutants lacking exonuclease activity), Clostridium stercorarium (Cst) DNA polymerase, Clostridium thermocellum (Cth) DNA polymerase and Sulfolobus solfataricus (Sso) DNA polymerase. Suitable eukaryotic DNA polymerases include without limitation the DNA polymerases α, δ, ε, η, ζ, β, σ, λ, μ, ι, and κ, as well as the Rev1 polymerase (terminal deoxycytidyl transferase) and terminal deoxynucleotidyl transferase (TdT). Suitable viral DNA polymerases include without limitation T4 DNA polymerase, T7 DNA polymerase, Phi29 DNA polymerase (also referred to herein as Phi-29 polymerase) and mutated and/or engineered Phi29 DNA polymerases, including mutants lacking exonuclease activity. Suitable archaeal DNA polymerases include without limitation the thermostable and/or thermophilic DNA polymerases such as, for example, DNA polymerases isolated from Thermus aquaticus (Taq) DNA polymerase, Thermus filiformis (Tfi) DNA polymerase, Thermococcus zilligi (Tzi) DNA polymerase, Thermus thermophilus (Tth) DNA polymerase, Thermus flavus (Tfl) DNA polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus furiosus (Pfu) DNA polymerase as well as Turbo Pfu DNA polymerase, Thermococcus litoralis (Tli) DNA polymerase or Vent DNA polymerase, Pyrococcus sp.
GB-D polymerase ("Deep Vent" DNA polymerase, New England Biolabs), Thermotoga maritima (Tma) DNA polymerase, Bacillus stearothermophilus (Bst) DNA polymerase, Pyrococcus kodakaraensis (KOD) DNA polymerase, Pfx DNA polymerase, Thermococcus sp. JDF-3 (JDF-3) DNA polymerase, Thermococcus gorgonarius (Tgo) DNA polymerase, Thermococcus acidophilium DNA polymerase; Sulfolobus acidocaldarius DNA polymerase; Thermococcus sp. 9° N-7 DNA polymerase; Pyrodictium occultum DNA polymerase; Methanococcus voltae DNA polymerase; Methanococcus thermoautotrophicum DNA polymerase; Methanococcus jannaschii DNA polymerase; Desulfurococcus strain TOK DNA polymerase (D. Tok Pol); Pyrococcus abyssi DNA polymerase; Pyrococcus horikoshii DNA polymerase; Pyrococcus islandicum DNA polymerase; Thermococcus fumicolans DNA polymerase; Aeropyrum pernix DNA polymerase; and the heterodimeric DNA polymerase DP1/DP2. Similarly, suitable RNA polymerases include, without limitation, T7, T3 and SP6 RNA polymerases. Suitable reverse transcriptases include without limitation reverse transcriptases from HIV, HTLV-I, HTLV-II, FeLV, FIV, SIV, AMV, MMTV and MoMuLV, as well as the commercially available "Superscript" reverse transcriptases (Invitrogen) and telomerases. In addition to naturally occurring polymerases, the methods and systems disclosed herein may also be practiced using any subunits or mutated, modified, truncated, genetically engineered or fusion variants of naturally occurring polymerases (wherein the mutation involves the replacement of one or more or many amino acids with other amino acids, the insertion or deletion of one or more or many amino acids, or the conjugation of parts of one or more polymerases), non-naturally occurring polymerases, synthetic molecules or any molecular assembly that can polymerize a polymer having a pre-determined or specified or templated sequence of monomers.
In particular, polymerases that retain the desired levels of processivity when conjugated to a donor or acceptor fluorophore are preferred. Also preferred are polymerases that are selected and/or engineered to exhibit high fidelity with low error rates. The term "fidelity" as used herein refers to the accuracy of nucleotide polymerization by a given template-dependent nucleotide polymerase. The fidelity of a nucleotide polymerase is typically measured as the error rate, i.e., the frequency of incorporation of a nucleotide in a manner that violates the widely known Watson-Crick base pairing rules. The accuracy or fidelity of DNA polymerization is influenced not only by the polymerase activity of a given enzyme, but also by the 3'-5' exonuclease activity of a DNA polymerase. The fidelity or error rate of a DNA polymerase may be measured using assays known to the art. See, for example, Lundberg et al., 1991, Gene, 108:1-6. By suitable selection and engineering of the nucleotide polymerase, the error rate of the single-molecule sequencing methods disclosed herein can be further improved.
[00106] Optionally, the polymerase used in this sequencing technology is engineered to possess either a strong strand displacement activity or 5' to 3' exonuclease activity to remove the downstream strand, thereby facilitating DNA synthesis. When a highly processive polymerase is used, such as an engineered Phi29 polymerase, the downstream strand is displaced. However, because the 5' terminated strand cannot serve as a template in the absence of added primer, no secondary sequence information, which would confound sequence data analysis, will be detected from this site.
[00107] In some embodiments, such as when the nucleic acid is immobilized, an unwinding agent such as a topoisomerase or a gyrase may be added to the extension reaction to facilitate optimal sequencing performance. The immobilized DNA is typically linearized through attachment to the surface at various points along its length, and therefore consists of a series of closed DNA domains. One option for circumventing the "closed" state of the DNA involves inclusion of a topoisomerase and/or a gyrase to modulate the number of DNA supercoils that may be introduced during the sequencing reaction (See, e.g., Champoux, 2001). The requirement for inclusion of such enzymes will reflect both sequence read length and the degree to which the DNA is immobilized onto the surface. For templates having longer read lengths, and therefore an increased number of attachment sites to the surface, inclusion of topoisomerase and/or gyrase will more quickly increase the number or impact of helical windings generated during sequencing. These situations will benefit from inclusion of an enzyme that can maintain DNA supercoiling at levels that support efficient replication.
[00108] In some embodiments, the extension reaction is supplemented by addition of an agent that reduces formation of secondary structures that hinder progress of the extension reaction. The presence of such secondary structure is undesirable because such structures bind dye molecules, which then exhibit increased fluorescence intensity as compared to dye molecules that are in solution or that are associated with the displaced single strand, which exhibit reduced fluorescence intensity. (See, e.g., Zipper et al., 2004). In such embodiments, the presence of secondary structure in the displaced strand results in inappropriate dye intercalation into the displaced strand, and consequently inappropriate detection of fluorescence.
When such problems arise, additional dyes may be tested to identify the dye, or dye combinations, that produce the highest quality sequence information. Alternatively, agents that stabilize the displaced strand and prevent formation of secondary structure may be included within the extension reaction mixture. One example of a reagent that may optionally be added to the extension reaction mixture to prevent formation of unwanted secondary structure is single strand binding protein, also known as SSBP.
[00109] In certain embodiments, the number of donor fluorophores that associate with the DNA is optimized to identify a staining concentration that produces high FRET. Generally, the spacing between a donor fluorophore and an acceptor on the incorporated nucleotide should be less than the Förster radius (R0) of the donor-acceptor pair so that high FRET results, with greater than 80% FRET being preferred. If too few fluorophores interact with the DNA, they will not be spaced closely enough to produce high FRET with the acceptor fluorophore. However, if too many donor fluorophores intercalate or bind the DNA, fluorophore quenching may occur.
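By way of non-limiting illustration, the relationship between donor-acceptor spacing and FRET efficiency discussed above can be sketched with the standard Förster expression E = 1 / (1 + (r/R0)^6). The function names and the R0 value of 5.0 nm used below are assumptions chosen only for the example and are not taken from this disclosure.

```python
def fret_efficiency(r_nm, r0_nm):
    """Standard Forster relation: efficiency for donor-acceptor distance r and Forster radius R0."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def max_distance_for_efficiency(e_min, r0_nm):
    """Largest donor-acceptor distance that still yields at least e_min efficiency.

    Obtained by solving E = 1 / (1 + (r/R0)^6) for r.
    """
    return r0_nm * ((1.0 - e_min) / e_min) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Forster radius in nm, for illustration only
    print(fret_efficiency(r0, r0))                # exactly 0.5 when r equals R0
    print(max_distance_for_efficiency(0.8, r0))   # about 0.79 * R0
```

Under this relation, a spacing equal to R0 gives only 50% efficiency, so the preferred greater-than-80% regime corresponds to a spacing of roughly 0.79 R0 or less.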
[00110] Suitable depolymerizing agents for use in the disclosed methods and compositions include, without limitation, any depolymerizing agent that depolymerizes monomers in a step-wise fashion such as exonucleases in the case of DNA, RNA or mixed DNA/RNA polymers, proteases in the case of polypeptides and enzymes or enzyme systems that sequentially depolymerize polysaccharides.
[00111] In some embodiments, the FRET donor is a dye molecule intercalated between the bases of the template nucleic acid molecule, or otherwise associated with the nucleic acid molecule. Suitable intercalating dyes include, without limitation, any detectable moiety capable of inserting, interposing, or otherwise intercalating into single- or double-stranded polynucleotides. The intercalating dye may be a fluorescent dye or may be a fluorescent dye conjugated to a molecule that is primarily an intercalator. Intercalating dyes are well known to the person of ordinary skill in the art. Examples of intercalating dyes suitable for use in the disclosed methods and compositions include, without limitation, mono- and bis-intercalating dyes, phenanthridines and acridines, such as ethidium bromide, propidium iodide, hexidium iodide, dihydroethidium, ethidium homodimers, acridine orange, 9-amino-6-chloro-2-methoxyacridine; indoles and imidazoles such as DAPI, bisbenzimide dyes, Actinomycin D, Nissl stains, hydroxystilbamidine, SYBR Green I (also referred to herein simply as "SYBR Green"), SYBR Green II, SYBR GOLD, YO (Oxazole Yellow), TO (Thiazole Orange), PG (PicoGreen), dyes from ATTO-TEC GmbH of Siegen, Germany, intercalating dyes, BEBO, BETO and BOXTO, BO, BO-PRO, TO-PRO, YO-PRO, BOBO-3, intercalating oxazole yellow DNA dye quinolinium, 4-[(3-methyl-2(3H)-benzoxazolylidene)methyl]-1-[3-(trimethylammonio)propyl]-, diiodide (YO-PRO) and its homodimer (YOYO). See, for example, intercalating dyes disclosed in WO/1997/017471 and U.S. Patent Nos. 5,734,058, and 6,015,902. In certain embodiments, the number of donor fluorophores that associate with the DNA is optimized to identify a staining concentration that produces high FRET. For example, the use of SYBR Green I and other intercalating dyes as replenishing FRET donors will increase donor lifetime and intensity, and more importantly will increase acceptor intensity.
Use of multiple donors at a dye-to-base pair ratio of ~1:5-7 results in the punctuation of DNA with dye molecules that can serve as donors for the growing DNA strand (See, e.g., Howell et al., 2002; Takatsu et al., 2004).
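As a rough, non-limiting illustration of what a dye-to-base pair ratio of about 1:5-7 implies, the following sketch estimates the average spacing between intercalated donors along the helix axis. It assumes an axial rise of approximately 0.34 nm per base pair (the usual textbook value for B-form DNA) and evenly distributed dye; it ignores the helix lengthening caused by intercalation. The constant and function names are hypothetical.

```python
# Assumed axial rise per base pair for B-form DNA (textbook approximation, in nm).
RISE_PER_BP_NM = 0.34


def mean_donor_spacing_nm(bp_per_dye):
    """Average axial distance between neighboring donors if one dye binds every bp_per_dye base pairs."""
    return bp_per_dye * RISE_PER_BP_NM


def worst_case_axial_distance_nm(bp_per_dye):
    """Largest axial distance from any point on the helix to its nearest donor (half the spacing)."""
    return mean_donor_spacing_nm(bp_per_dye) / 2.0


if __name__ == "__main__":
    for bp_per_dye in (5, 6, 7):
        print(bp_per_dye, mean_donor_spacing_nm(bp_per_dye), worst_case_axial_distance_nm(bp_per_dye))
```

Under these assumptions the donors sit roughly 1.7 to 2.4 nm apart, so an incorporation site is never more than about 1.2 nm (measured along the axis) from the nearest donor.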
[00112] Any suitable nucleotides or nucleotide analogs may be used for the disclosed methods and compositions. The terms "nucleotide" or "nucleotide analogs" or their variants, as used herein, refer to any compounds that can be polymerized and/or incorporated into a newly synthesized strand by a naturally occurring, genetically modified or engineered nucleotide polymerase. Suitable nucleotides or other monomers for use in the methods and compositions include, without limitation, any monomer that can be step-wise polymerized and/or incorporated into an elongating nucleotide strand or other polynucleotide polymer by a polymerase or other polymerizing agent, including but not limited to ribonucleotides, deoxyribonucleotides, modified ribonucleotides, modified deoxyribonucleotides, ribonucleotide polyphosphates, deoxyribonucleotide polyphosphates, modified ribonucleotide polyphosphates, modified deoxyribonucleotide polyphosphates, peptide nucleotides, modified peptide nucleotides, and modified phosphate-sugar backbone nucleotides, and any analogs or variants of the foregoing, including analogs or variants having atomic and/or molecular labels attached thereto, or mixtures or combinations thereof. For sequencing of non-nucleic acid polymers (for example, a protein) any suitable monomers capable of polymerization by a naturally occurring, genetically engineered, or synthetic polymerase may be used, including, for example, amino acids (natural or synthetic) for protein or protein analog synthesis, and monosaccharides or polysaccharides for carbohydrate synthesis. In some embodiments, the labeled nucleotide monomer has three, four or more phosphates. In some embodiments, the nucleotide or nucleotide analogs comprise more than one detectable label moiety per nucleotide molecule.
For example, the nucleotides or nucleotide analogs may comprise a persistent acceptor, a non-persistent acceptor, or both a persistent and non-persistent acceptor group conjugated to the same nucleotide molecule. Examples of such "dual" nucleotides are known in the art. See, e.g., U.S. Provisional App. No. 60/891,029, filed February 21, 2007 and U.S. App. No. 12/035,352, filed February 21, 2008, herein incorporated by reference in their entirety.
[00113] Preferably, the nucleotide is conjugated or otherwise operably linked to a detectable label using suitable methods. Any suitable methods for detectably labeling nucleotides may be employed including but not limited to those described in U.S. Patent Nos. 7,041,892, 7,052,839, 7,125,671 and 7,223,541; U.S. Pub. Nos. 2007/072196 and 2008/0091005; Sood et al., 2005, J. Am. Chem. Soc. 127:2394-2395; Arzumanov et al., 1996, J. Biol. Chem. 271:24389-24394; and Kumar et al., 2005, Nucleosides, Nucleotides & Nucleic Acids, 24(5):401-408. As used herein, the term "operably link" and its variants refer to chemical fusion or bonding or association of sufficient stability to withstand conditions encountered in the method of nucleotide sequencing utilized, between a combination of different molecules or moieties, such as, but not limited to: between a linker and a nucleotide; between a linker and a dye moiety; and the like. For example, dye labels may be conjugated to the terminal phosphate of deoxyribonucleotide polyphosphates using a linker and/or spacer using suitable techniques. Suitable linkers include, for example, any compound or moiety that can act as a molecular bridge to operably link two different molecules. Exemplary linkers include, but are not limited to, chemical chains, chemical compounds (e.g., reagents), and the like. The linkers may include, but are not limited to, homobifunctional linkers and heterobifunctional linkers. For example, heterobifunctional linkers contain one end having a first reactive functionality to specifically link to a first molecule, and an opposite end having a second reactive functionality to specifically link to a second molecule. 
Depending on such factors as the molecules to be linked and the conditions in which the method of strand synthesis is performed, the linker may vary in length and composition for optimizing properties such as stability, length, FRET efficiency, resistance to certain chemicals and/or temperature parameters, and be of sufficient stereo-selectivity or size to operably link a detectable label to a nucleotide such that the resultant conjugate is useful in optimizing a polymerization reaction. Linkers can be employed using standard chemical techniques and include, but are not limited to, amine linkers for attaching labels to nucleotides (see, for example, U.S. Pat. No. 5,151,507); linkers that typically contain a primary or secondary amine for operably linking a label to a nucleotide; and a rigid hydrocarbon arm added to a nucleotide base (see, for example, Science 282:1020-21, 1998).
[00114] Any detectable label that is suitable for attachment to the polymerase, the nucleic acid molecule and/or the nucleotides may be used, including but not limited to luminescent, photoluminescent, electroluminescent, bioluminescent, chemiluminescent, fluorescent and/or phosphorescent labels. Typically, the label comprises a FRET donor and/or a FRET acceptor. The FRET donor and/or the FRET acceptor is typically a fluorophore or fluorescent label; however, the FRET donor and/or FRET acceptor may also be a luminophore, chemiluminophore, bioluminophore or other label, or a quencher that can participate in this reaction, as described below. In this description, the FRET labels may be referred to as fluorophores or fluorescent labels for convenience, but this in no way is meant to exclude the possibility of using a quencher or limit the donor and/or acceptor only to fluorescent labels. Alternatively, the detectable labels used in the disclosed methods and compositions may undergo other types of energy transfer with each other, including but not limited to luminescence resonance energy transfer, bioluminescence resonance energy transfer, chemiluminescence resonance energy transfer, and similar types of energy transfer not strictly following Förster theory, such as the nonoverlapping energy transfer when nonoverlapping acceptors are utilized. See, for example, Anal. Chem. 2005, 77: 1483-1487.
Suitable detectable labels for use in the disclosed methods and compositions include, without limitation, any atomic structure, molecule or other moiety amenable to attachment to a specific site in a polymerizing agent or dNTP, including but not limited to Europium shift agents, NMR active atoms or the like; fluorescent dyes such as Rhodol dyes, d-Rhodamine acceptor dyes including but not limited to dichloro[R110], dichloro[R6G], dichloro[TAMRA], dichloro[ROX] or the like, fluorescein donor dye including but not limited to fluorescein, 6-FAM, or the like; Acridine including but not limited to Acridine orange, Acridine yellow, Proflavin, or the like; aromatic hydrocarbon including but not limited to 2-Methylbenzoxazole, Ethyl p-dimethylaminobenzoate, Phenol, benzene, toluene, or the like; Arylmethine Dyes including but not limited to Auramine O, Crystal violet, Malachite Green or the like; Coumarin dyes including but not limited to 7-Methoxycoumarin-4-acetic acid, Coumarin 1, Coumarin 30, Coumarin 314, Coumarin 343, Coumarin 6 or the like; Cyanine Dye including but not limited to 1,1'-diethyl-2,2'-cyanine iodide, Cryptocyanine, Indocarbocyanine (C3) dye, Indodicarbocyanine (C5) dye, Indotricarbocyanine (C7) dye, Oxacarbocyanine (C3) dye, Oxadicarbocyanine (C5) dye, Oxatricarbocyanine (C7) dye, Pinacyanol iodide, Stains all, Thiacarbocyanine (C3) dye, Thiadicarbocyanine (C5) dye, Thiatricarbocyanine (C7) dye, or the like; Dipyrrin dyes including but not limited to N,N'-Difluoroboryl-1,9-dimethyl-5-(4-iodophenyl)-dipyrrin, N,N'-Difluoroboryl-1,9-dimethyl-5-[(4-(2-trimethylsilylethynyl), N,N'-Difluoroboryl-1,9-dimethyl-5-phenyldipyrrin, or the like; Merocyanines including but not limited to 4-(dicyanomethylene)-2-methyl-6-(p-dimethylaminostyryl)-4H-pyran (DCM), 4-Dimethylamino-4'-nitrostilbene, Merocyanine 540, or the like; miscellaneous dyes including but not limited to 4',6-Diamidino-2-phenylindole (DAPI), 7-Benzylamino-4-nitrobenz-2-oxa-1,3-diazole, Dansyl glycine, Hoechst 33258, Lucifer yellow CH, Piroxicam, Quinine sulfate, Squarylium dye III, or the like; oligophenylenes including but not limited to 2,5-Diphenyloxazole (PPO), Biphenyl, POPOP, p-Quaterphenyl, p-Terphenyl, or the like; oxazines including but not limited to Cresyl violet perchlorate, Nile Blue, Nile Red, Oxazine 1, Oxazine 170, or the like; polycyclic aromatic hydrocarbons including but not limited to 9,10-Bis(phenylethynyl)anthracene, 9,10-Diphenylanthracene, Anthracene, Naphthalene, Perylene, Pyrene, or the like; polyene/polyynes including but not limited to 1,2-diphenylacetylene, 1,4-diphenylbutadiene, 1,4-diphenylbutadiyne, 1,6-Diphenylhexatriene, Beta-carotene, Stilbene, or the like; Redox-active Chromophores including but not limited to Anthraquinone, Azobenzene, Benzoquinone, Ferrocene, Riboflavin, Tris(2,2'-bipyridyl)ruthenium(II), Tetrapyrrole, Bilirubin, Chlorophyll a, Chlorophyll b, Diprotonated-tetraphenylporphyrin, Hematin, Magnesium octaethylporphyrin (MgOEP), Magnesium phthalocyanine (MgPc), Magnesium tetramesitylporphyrin (MgTMP), Magnesium tetraphenylporphyrin (MgTPP), Octaethylporphyrin, Phthalocyanine (Pc), Porphin, Tetra-t-butylazaporphine, Tetra-t-butylnaphthalocyanine, Tetrakis(2,6-dichlorophenyl)porphyrin, Tetrakis(o-aminophenyl)porphyrin, Tetramesitylporphyrin (TMP), Tetraphenylporphyrin (TPP), Vitamin B12, Zinc octaethylporphyrin (ZnOEP), Zinc phthalocyanine (ZnPc), Zinc tetramesitylporphyrin (ZnTMP), Zinc tetramesitylporphyrin radical cation, Zinc tetraphenylporphyrin (ZnTPP), or the like; Cy3, Cy3B, Cy5, Cy5.5, Atto590, Atto610, Atto611, Atto611x, Atto620,
Atto655, Alexa488, Alexa546, Alexa594, Alexa610, Alexa610x, Alexa633, Alexa647, Alexa660, Alexa680, Alexa700, Bodipy630, DY610, DY615, DY630, DY632, DY634, DY647, DY680, DyLight647, HiLyte647, HiLyte680, LightCycler (LC) 640, Oyster650, ROX, TMR, TMR5, TMR6; xanthenes including but not limited to Eosin Y, Fluorescein, Rhodamine 123, Rhodamine 6G, Rhodamine B, Rose bengal, Sulforhodamine 101, or the like; or mixtures or combination thereof or synthetic derivatives thereof or FRET fluorophore-quencher pairs including but not limited to DLO-FB1 (5'-FAM/3'-BHQ-1), DLO-TEB1 (5'-TET/3'-BHQ-1), DLO-JB1 (5'-JOE/3'-BHQ-1), DLO-HB1 (5'-HEX/3'-BHQ-1), DLO-C3B2 (5'-Cy3/3'-BHQ-2), DLO-TAB2 (5'-TAMRA/3'-BHQ-2), DLO-RB2 (5'-ROX/3'-BHQ-2), DLO-C5B3 (5'-Cy5/3'-BHQ-3), DLO-C55B3 (5'-Cy5.5/3'-BHQ-3), MBO-FB1 (5'-FAM/3'-BHQ-1), MBO-TEB1 (5'-TET/3'-BHQ-1), MBO-JB1 (5'-JOE/3'-BHQ-1), MBO-HB1 (5'-HEX/3'-BHQ-1), MBO-C3B2 (5'-Cy3/3'-BHQ-2), MBO-TAB2 (5'-TAMRA/3'-BHQ-2), MBO-RB2 (5'-ROX/3'-BHQ-2), MBO-C5B3 (5'-Cy5/3'-BHQ-3), MBO-C55B3 (5'-Cy5.5/3'-BHQ-3) or similar FRET pairs available from Biosearch Technologies, Inc. of Novato, CA; fluorescent quantum dots (stable long lived fluorescent donors); labels with NMR active groups; Raman active labels; labels with spectral features that can be easily identified such as IR, far IR, near IR, visible, UV, far UV or the like. It should be recognized that any molecule, nano-structure, or other chemical structure that is capable of chemical modification and includes a detectable property capable of being detected by a detection system may be used in the disclosed methods and systems. Such detectable structures can include those presently known, those currently being designed, and those that will be prepared in the future.
[00115] In some embodiments, the nucleotide comprises a releasable or non-persistent label that can be removed via suitable means prior to incorporation of the next nucleotide by the polymerase into the newly synthesized strand. Examples of suitable non-persistent or releasable labels include detectable moieties operably linked to the base, sugar or alpha phosphate of a nucleotide or nucleotide analog. The use of releasably labeled nucleotides wherein the label can be cleaved and removed via suitable means has been described, for example, in U.S. Pub. Nos. US2005/0244827 and US2004/0244827, as well as U.S. Patent Nos. 7,345,159; 6,664,079; and 7,223,568. Typically, the FRET acceptor label is attached to a nucleotide phosphate group that is cleaved and released upon incorporation of the underlying nucleotide into the primer strand, for example the β-phosphate, the γ-phosphate, or the terminal phosphate of the incoming nucleotide. By cleaving the phosphate and releasing the label upon incorporation of the incoming nucleotide, the signal from the label (or, for embodiments wherein the label is a FRET donor, the FRET signal between the FRET donor and the FRET acceptor moieties) ceases after the nucleotide is incorporated and the label (or FRET signal) diffuses away. Thus, in these embodiments, a detectable signal indicative of nucleotide incorporation is generated as each incoming nucleotide hybridizes to a complementary nucleotide in the target nucleic acid molecule and becomes incorporated into the newly synthesized strand. In addition to detecting the increase in acceptor intensity to determine whether and when a FRET event has occurred, the accompanying dip in donor intensity can also be detected to confirm the occurrence of FRET. While detection of donor dipping can be useful by providing independent corroboration of a FRET event, it can be dispensed with in embodiments where the acceptor signals are sufficiently intense and well-defined.
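One possible way to combine the acceptor increase with the corroborating donor dip described above is sketched below, purely for illustration. Acceptor frames above a threshold are grouped into candidate events, and each event is flagged as confirmed only if the mean donor intensity during the event falls below the donor baseline by a chosen fraction. The function names, the threshold, and the 20% dip fraction are hypothetical choices and are not parameters of the disclosed system.

```python
def _summarize(donor, start, end, donor_baseline, dip_fraction):
    """Return (start, end, confirmed) for frames [start, end) of a candidate event."""
    mean_donor = sum(donor[start:end]) / (end - start)
    confirmed = mean_donor < donor_baseline * (1.0 - dip_fraction)
    return (start, end, confirmed)


def detect_fret_events(donor, acceptor, acceptor_threshold, donor_baseline, dip_fraction=0.2):
    """Find runs of frames where the acceptor exceeds a threshold.

    Each run is reported with a flag saying whether the donor dipped by at
    least dip_fraction relative to donor_baseline during the run.
    """
    events = []
    start = None
    for i, a in enumerate(acceptor):
        if a > acceptor_threshold and start is None:
            start = i  # a candidate event begins
        elif a <= acceptor_threshold and start is not None:
            events.append(_summarize(donor, start, i, donor_baseline, dip_fraction))
            start = None
    if start is not None:  # the trace ended while an event was still in progress
        events.append(_summarize(donor, start, len(acceptor), donor_baseline, dip_fraction))
    return events


if __name__ == "__main__":
    donor = [100, 100, 60, 55, 100, 100, 100, 98, 100]
    acceptor = [5, 5, 50, 55, 5, 5, 40, 42, 5]
    print(detect_fret_events(donor, acceptor, acceptor_threshold=20, donor_baseline=100))
```

In the example traces, the first acceptor burst coincides with a donor dip and is confirmed, while the second does not and is reported as unconfirmed; where acceptor signals are sufficiently intense, the confirmation flag could simply be ignored.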
[00116] In certain embodiments that comprise so-called "non-persistent" acceptors that release the label upon incorporation, successive extensions can each be detected without interference from nucleotides previously incorporated into the complementary strand. In other embodiments, the nucleotide comprises a persistent label, which is not released upon incorporation of the nucleotide into the nascent nucleotide strand synthesized by the polymerase. Examples of suitable persistent labels include without limitation any FRET acceptor moiety operably linked to the base, sugar, or internal phosphate, of the nucleotide or nucleotide analog, for example, the alpha phosphate. Persistently-labeled nucleotides may be used when a stable signal is preferred and their use enables the reaction to be performed in advance of immobilization on the support for viewing in the detection system, which improves reaction efficiency. The persistently-labeled nucleotide may be a dideoxynucleotide to ensure that a single nucleotide is incorporated at the reaction site.
[00117] Non-persistently-labeled nucleotides or nucleotides containing both a persistent and a non-persistent label are used when the detection of the non-persistent signal is preferred. The use of non-persistently-labeled nucleotides or nucleotides containing both a persistent and a non-persistent label requires that the extension reaction be performed and detected in real-time or near real-time on the detection system to associate the non-persistent label with a particular nucleic acid strand.
[00118] One advantage of the use of intercalating dyes, such as SYBR Green I, as the donor fluorophore is that use of such dyes allows the donor fluorophore to be effectively replenished as new dye molecules constantly intercalate both upstream and downstream of the sequencing site. As a result of this replenishment, new fluorophores are positioned as new donors when they insert into the newly synthesized, double-stranded DNA. When conducting such "donor replacement sequencing", donor-acceptor pairs are typically selected such that there is overlap between the emission spectrum of the donor and excitation spectrum of the acceptor. Dyes and dye concentrations are chosen such that optimized donor emission and maximized acceptor intensities are obtained. In certain embodiments, certain combinations of DNA-associated donor dyes produce higher intensity acceptor signals when paired with the spectrally-resolved acceptors used in other sequencing technologies based on determination of base identity (i.e., the donor fluorophores must be good FRET partners with the acceptors used to label the nucleotides), and these donor dyes may need to be present in particular ratios to maximize these effects. Any suitable FRET donor:acceptor pair may be used in the disclosed methods and compositions, including but not limited to a fluorescein, cyanine, rhodamine, coumarin, acridine, Texas Red dye, BODIPY, Alexa Fluor, GFP, or a derivative or modification of any of the foregoing. See, for example, U.S. Published App. No. 2008/0091995.
[00119] Although the energy transfer from the donor to the acceptor does not involve emission of light, it may be thought of in the following terms: excitation of the donor produces energy in its emission spectrum that is then picked up by the acceptor in its excitation spectrum, leading to the emission of light from the acceptor in its emission spectrum. In effect, excitation of the donor sets off a chain reaction, leading to emission from the acceptor when the two are sufficiently close to each other.
[00120] In addition to spectral overlap between the donor and acceptor, other factors affecting FRET efficiency include the quantum yield of the donor and the extinction coefficient of the acceptor. The FRET signal may be maximized by selecting high yielding donors and high absorbing acceptors, with the greatest possible spectral overlap between the two. See, e.g., Piston, D.W., and Kremers, G.J., 2007, Trends Biochem. Sci., 32:407.
[00121] In other embodiments, the label operably linked or attached to the nucleotide may be a quencher. Quenchers are useful as acceptors in FRET applications, because they produce a signal through the reduction or quenching of fluorescence from the donor fluorophore. As with conventional fluorescent labels, quenchers have an absorption spectrum and large extinction coefficients; however, the quantum yield for quenchers is extremely low, such that the quencher emits little to no light upon excitation. For example, in a FRET detection system, illumination of the donor fluorophore excites the donor, and if an appropriate acceptor is not close enough to the donor, the donor emits light. This light signal is reduced or abolished when FRET occurs between the donor and a quencher acceptor, resulting in little or no light emission from the quencher. Thus, interaction or proximity between a donor and quencher-acceptor may be detected by the reduction or absence of donor light emission. Examples of quenchers include the QSY dyes available from Molecular Probes (Eugene, OR).
[00122] One exemplary method involves the use of quenchers in conjunction with fluorescent labels. In this strategy, certain nucleotides in the reaction mixture are labeled with a fluorescent label, while the remaining nucleotides are labeled with one or more quenchers. Alternatively, each of the nucleotides in the reaction mixture is labeled with one or more quenchers.
Discrimination of the nucleotide bases is based on the wavelength and/or intensity of light emitted from the FRET acceptor, as well as the intensity of light emitted from the FRET donor. If no signal is detected from the FRET acceptor, a corresponding reduction in light emission from the FRET donor indicates incorporation of a nucleotide labeled with a quencher. The degree of intensity reduction may be used to distinguish between different quenchers.
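The mixed fluorophore/quencher discrimination strategy described above may be illustrated, without limitation, by the following sketch. An event is first assigned by its brightest acceptor channel; if no acceptor signal is detected, the fractional drop in donor intensity is compared against per-quencher ranges. The channel names, base assignments, threshold and quenching ranges below are all hypothetical values chosen for the example.

```python
def classify_incorporation(acceptor_by_channel, donor_drop_fraction,
                           channel_to_base, quencher_ranges, acceptor_threshold):
    """Assign a base to one incorporation event.

    acceptor_by_channel: dict of channel name -> acceptor intensity
    donor_drop_fraction: fractional reduction in donor intensity during the event
    channel_to_base:     dict of channel name -> base for fluorescently labeled nucleotides
    quencher_ranges:     dict of base -> (low, high) donor-drop range for quencher-labeled nucleotides
    Returns the base, or None if the event cannot be assigned.
    """
    # Fluorescent acceptors: use the brightest channel if it is above threshold.
    channel, intensity = max(acceptor_by_channel.items(), key=lambda kv: kv[1])
    if intensity >= acceptor_threshold:
        return channel_to_base[channel]
    # No acceptor emission: distinguish quenchers by the degree of donor reduction.
    for base, (low, high) in quencher_ranges.items():
        if low <= donor_drop_fraction < high:
            return base
    return None


if __name__ == "__main__":
    channel_to_base = {"red": "A", "far_red": "C"}        # hypothetical label assignment
    quencher_ranges = {"G": (0.3, 0.6), "T": (0.6, 1.01)}  # hypothetical quenching ranges
    print(classify_incorporation({"red": 120, "far_red": 10}, 0.5,
                                 channel_to_base, quencher_ranges, 50))
    print(classify_incorporation({"red": 3, "far_red": 4}, 0.45,
                                 channel_to_base, quencher_ranges, 50))
```

The first call is assigned from the acceptor channel and the second from the donor reduction alone; in practice the ranges would have to be calibrated for the particular donor, quenchers and detection system.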
[00123] Preferably, the intercalating dye and the detectable label of the nucleotide will be selected and/or designed to ensure that the presence of such labels does not unduly hinder the progress of the polymerization reaction as determined by speed, error rate, fidelity, processivity and average read length of the newly synthesized strand.
[00124] Typically, the sequencing reaction is initiated by the addition of a suitable polymerase and labeled nucleotides. Suitable temperatures and the addition of other components such as divalent metal ions can be determined and optimized based on the particular nucleotide polymerase and the target nucleic acid sequences. Illumination of the reaction site permits observation of the detectable signals, e.g., FRET signals, which indicate the nucleotide incorporation event.
[00125] The signals emitted by various components of the polymerase reaction mixture as the polymerase incorporates nucleotide(s) into an extending strand in a template-directed fashion can be detected by means of any suitable system capable of detecting and/or monitoring such signals. Typically, the detection system will achieve these functions by first generating and transmitting an incident wavelength to the polynucleotides isolated within nanostructures, and then collecting and analyzing the emissions from the reactants. A typical sequencing system comprises a detection subsystem capable of viewing a field on the substrate. The view field can be adjusted to view one or a plurality of elongated and immobilized nucleic acids. The sequencing system also comprises a monitoring subsystem capable of detecting nucleotide incorporation events occurring at the nick sites of the nucleotide strand to be sequenced. The system also comprises an analyzing subsystem that converts the detected events into sequencing information and then maps the sequence fragments along the length of the nucleic acid so that ordered sequence fragment information is obtained for nucleic acid identification and classification including partial or fragmentary sequence information. Examples of detection systems suitable for use according to the present disclosure include without limitation the systems described in U.S. Published App. No. 2008/0241951 and 2008/0241938, herein incorporated by reference in their entirety.
[00126] In some embodiments, a detection system of the present invention comprises at least two elements, namely an excitation source and a detector. The excitation source generates and transmits incident radiation used to excite the reactants contained in the array. Depending on the intended application, the source of the incident light can be a laser, laser diode, a light-emitting diode (LED), an ultra-violet light bulb, and/or a white light source.
Where desired, more than one source can be employed simultaneously. The use of multiple sources is particularly desirable in applications that employ multiple different reagent compounds having differing excitation spectra, consequently allowing detection of more than one fluorescent signal to track the interactions of more than one or one type of molecules simultaneously.
[00127] Any suitable detection strategies can be employed to determine the identity of the nitrogenous base of the incoming nucleotides, depending on the nature of the labeling strategy that is employed. Exemplary labeling and detection strategies include but are not limited to those disclosed in U.S. Patent Nos. 6,423,551 and 6,864,626; U.S. Pub. Nos. 2005/0003464, 2006/0176479, 2006/0177495, 2007/0109536, 2007/0111350, 2007/0116868, 2007/0250274 and 2008/08825. Detection of emissions during the polymerization reaction permits the discrimination of independent interactions between uniquely labeled moieties, reactants or subunits. On exposure to suitable chemical, electrical, electromagnetic energy (potentially any light source, typically a laser) or upon resonance as in FRET, the label linked to the nucleotide undergoes a transition to an 'excited state' whereby it emits photons over a spectral range characterized by the identity of the emitting moiety. The donor moiety must be sufficiently excited in order for FRET to occur.
[00128] Emissions may be detected using any suitable device. A wide variety of detectors are available in the art. Representative detectors include but are not limited to microscopes, optical readers, high-efficiency photon detection systems, photodiodes (e.g., avalanche photodiodes (APD), APD arrays, etc.), cameras, charge-coupled devices (CCD), electron-multiplying charge-coupled devices (EMCCD), intensified charge-coupled devices (ICCD), photomultiplier tubes (PMT), a multi-anode PMT, and a microscope equipped with any of the foregoing detectors. Where desired, the subject arrays contain various alignment aids or keys to facilitate a proper spatial placement of each spatially addressable array location and the excitation sources, the photon detectors, or the optical transmission element as described below.
[00129] Typically, characteristic signals from different independently labeled nucleotides are simultaneously detected and resolved using a suitable detection method capable of discriminating between the respective labels. Typically, the characteristic signals from each nucleotide are distinguished by resolving the characteristic spectral properties of the different labels. See, for example, Lakowicz, J.R., 2006, Principles of Fluorescence Spectroscopy, Third Edition. Spectral detection may also optionally be combined with and/or replaced by other detection methods capable of discriminating between chemically similar or different labels in parallel, including, but not limited to, polarization, lifetime, Raman, intensity, ratiometric, time-resolved anisotropy, fluorescence recovery after photobleaching (FRAP) and parallel multi-color imaging. See, for example, Lakowicz, supra. In the latter technique, use of an image splitter (such as, for example, a dichroic mirror, filter, grating, prism, etc.) to separate the spectral components characteristic of each label is preferred to allow the same detector, typically a CCD, to collect the images in parallel. Optionally, multiple cameras or detectors may be used to view the sample through optical elements (such as, for example, dichroic mirrors, filters, gratings, prisms, etc.) of different wavelength specificity. Other suitable methods to distinguish emission events include, but are not limited to, correlation/anti-correlation analysis, fluorescent lifetime measurements, anisotropy, time-resolved methods and polarization detection. Suitable imaging methodologies that may be implemented for detection of emissions include, but are not limited to, confocal laser scanning microscopy, Total Internal Reflection (TIR), Total Internal Reflection Fluorescence (TIRF), near-field scanning microscopy, far-field confocal microscopy, wide-field epi-illumination, light scattering, dark field microscopy, photoconversion, wide field fluorescence, single and/or multi-photon excitation, spectral wavelength discrimination, evanescent wave illumination, scanning two-photon, scanning wide field two-photon, Nipkow spinning disc, multi-foci multi-photon, and/or other forms of microscopy.
[00130] The detection system may optionally include one or more optical transmission elements that serve to collect and/or direct the incident wavelength to the reactant array; to transmit and/or direct the signals emitted from the reactants to the photon detector; and/or to select and modify the optical properties of the incident wavelengths or the emitted wavelengths from the reactants. Illustrative examples of suitable optical transmission elements and optical detection systems include but are not limited to diffraction gratings, arrayed waveguide gratings (AWG), optic fibers, optical switches, mirrors, lenses (including microlens and nanolens), and collimators. Other examples include optical attenuators, polarization filters (e.g., dichroic filters), wavelength filters (low-pass, band-pass, or high-pass), wave-plates, and delay lines.
[00131] Typically, the detection system comprises optical transmission elements suitable for channeling light from one location to another in either an altered or unaltered state. Non-limiting examples of such optical transmission devices include optical fibers, diffraction gratings, arrayed waveguide gratings (AWG), optical switches, mirrors (including dichroic mirrors), lenses (including microlens and nanolens), collimators, filters, prisms, and any other devices that guide the transmission of light through proper refractive indices and geometries.
[00132] In one embodiment, the detection system comprises an optical train that directs signals from an organized array onto different locations of an array-based detector to simultaneously detect multiple different optical signals from each of multiple different locations. In particular, the optical trains typically include optical gratings and/or wedge prisms to simultaneously direct and separate signals having differing spectral characteristics from each spatially addressable location in an array to different locations on an array-based detector, e.g., a CCD. By separately directing signals from each array location to different locations on a detector, and additionally separating the component signals from each array location, one can simultaneously monitor multiple signals from each array location.
[00133] In one embodiment, detection is performed using multifluorescence imaging wherein each of the different types of nucleotide is operably linked to a label with different spectral properties from the rest, thereby permitting the simultaneous detection of incorporation of all different nucleotide types. For example, each of the different types of nucleotide may be operably linked to a FRET acceptor fluorophore, wherein each fluorophore has been selected such that the overlapping of the absorption and emission spectra between the different fluorophores, as well as the overlapping between the absorption and emission maxima of the different fluorophores, is minimized. Detection of the different nucleotide labels is performed by observing two or more targets at the same time, wherein the emissions from each label are separated in the detection path. Such separation is typically accomplished through use of suitable filters, including but not limited to band pass filters, image splitting prisms, band cutoff filters, wavelength dispersion prisms and dichroic mirrors, that can selectively detect specific emission wavelengths. Such filters may optionally be used in combination with suitable diffraction gratings.
[00134] Alternatively, multifluorescence studies involving differently labeled nucleotide types may be performed by observing each label separately, requiring selection of special filter combinations for each excitation line and each emission band. In one embodiment, the detection system utilizes tunable excitation and/or tunable emission fluorescence imaging. For tunable excitation, light from a light source passes through a tuning section and condenser prior to irradiating the sample. For tunable emissions, emissions from the sample are imaged onto a detector after passing through imaging optics and a tuning section. The user may control the tuning sections to optimize performance of the system.
[00135] A number of labeling and detection strategies are available for base discrimination using the FRET technique. For example, different fluorescent labels may be used for each type of nucleotide present in the extension reaction with discrimination between the different labels based on the wavelength and/or the intensity of the light emitted from the fluorescent label.
[00136] A second strategy involves the use of fluorescent labels and quenchers. In this strategy, certain nucleotides in the reaction mixture are labeled with a fluorescent label, while the remaining nucleotides are labeled with one or more quenchers. Alternatively, each of the nucleotides in the reaction mixture is labeled with one or more quenchers. Discrimination of the nucleotide bases is based on the wavelength and/or intensity of light emitted from the FRET acceptor, as well as the intensity of light emitted from the FRET donor. If no signal is detected from the FRET acceptor, a corresponding reduction in light emission from the FRET donor indicates incorporation of a nucleotide labeled with a quencher. The degree of intensity reduction may be used to distinguish between different quenchers. [00137] Typically, the signal from the detector is converted into a digital signal with an
A-D converter and an image of the sample is reconstructed on a monitor. The user can optionally select a composite image that combines the images derived at a number of different wavelengths into a single image. The user can also specify that an artificial color system is to be used in which particular probes are artificially associated with specific colors. In an alternate artificial color system the user can designate specific colors for specific emission intensities. [00138] Any combination of the above described labeling and detection strategies may be employed together in the same sequencing reaction. Depending on the number of distinguishable labels and quenchers used in any of the above strategies, the identities of one, two, three or four nucleotides may be determined in a single sequencing reaction. Multiple sequencing reactions may then be run, rotating the identities of the nucleotides determined in each reaction, to determine the identities of the remaining nucleotides. In some embodiments, these reactions may be run at the same time, in parallel, to allow for complete sequencing in a reduced amount of time.
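By way of illustration only, the label-and-quencher strategy of paragraph [00136] might be reduced to a simple decision rule such as the following Python sketch. The threshold values, the calibration table `QUENCHER_DROP`, and the function name `classify_event` are hypothetical and are not taken from the present disclosure; a practical system would derive such values from calibration data.

```python
from typing import Optional

# Hypothetical calibration: fractional drop in donor intensity expected
# for each quencher (illustrative values only).
QUENCHER_DROP = {"quencher_A": 0.3, "quencher_B": 0.7}


def classify_event(acceptor: float, donor: float, donor_baseline: float,
                   acceptor_threshold: float = 100.0,
                   min_drop: float = 0.15) -> Optional[str]:
    """Classify one incorporation event from acceptor and donor intensities.

    Returns "fluorescent" when an acceptor signal is present, the name of
    the best-matching quencher when the donor dims without acceptor signal,
    or None when neither condition is met.
    """
    if acceptor >= acceptor_threshold:
        return "fluorescent"
    # No acceptor signal: look for a reduction in donor emission.
    drop = (donor_baseline - donor) / donor_baseline
    if drop < min_drop:
        return None
    # The degree of intensity reduction distinguishes between quenchers.
    return min(QUENCHER_DROP, key=lambda q: abs(QUENCHER_DROP[q] - drop))
```

In this sketch an acceptor signal takes precedence over donor dimming, which is one possible design choice; other orderings or a joint statistical model could equally be used.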
[00139] The identities of the incorporated nucleotides may be determined rapidly, for example in real time or near real time, as extension of the primer strand occurs, through FRET interactions between an intercalating dye donor moiety and a FRET acceptor moiety attached to the incoming nucleotide as it is incorporated into the newly synthesized strand by the polymerase.
[00140] Typically, the raw data generated by the detector represents multiple time-dependent fluorescence data streams comprising wavelength and intensity information. Once the emissions are detected and gathered, the data may be analyzed using suitable methods to correlate the particular spectral characteristics of the emissions with the identity of the incorporated base. In some embodiments, such analysis is performed by means of a suitable information processing and control system. Preferably, the information processing and control system comprises a computer or microprocessor attached to or incorporating a data storage unit containing data collected from the detection system. The information processing and control system may maintain a database associating specific spectral emission characteristics with specific nucleotides. The information processing and control system may record the emissions detected by the detector and may correlate those emissions with incorporation of a particular nucleotide. The information processing and control system may also maintain a record of nucleotide incorporations that indicates the sequence of the template molecule. The information processing and control system may also perform standard procedures known in the art, such as subtraction of background signals.
[00141] An exemplary information processing and control system may incorporate a computer comprising a bus for communicating information and a processor for processing information. In one embodiment, the processor is selected from the Pentium®, Celeron®, Itanium®, or Pentium Xeon® family of processors (Intel Corp., Santa Clara, Calif.). Alternatively, other processors may be used. The computer may further comprise a random access memory (RAM) or other dynamic storage device, a read only memory (ROM) and/or other static storage and a data storage device such as a magnetic disk or optical disc and its corresponding drive. The information processing and control system may also comprise other peripheral devices known in the art, such as a display device (e.g., cathode ray tube or Liquid Crystal Display), an alphanumeric input device (e.g., keyboard), a cursor control device (e.g., mouse, trackball, or cursor direction keys) and a communication device (e.g., modem, network interface card, or interface device used for coupling to Ethernet, token ring, or other types of networks).
[00142] In particular embodiments, the detection system may also be coupled to the bus. Data from the detection unit may be processed by the processor and the data stored in the main memory. Data on emission profiles for standard nucleotides may also be stored in main memory or in ROM. The processor may compare the emission spectra from the nucleotides in the polymerase reaction with the stored emission profiles to identify the type of nucleotide precursor incorporated into the newly synthesized strand. The processor may analyze the data from the detection system to determine the sequence of the template nucleic acid.
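The comparison of observed emissions against stored emission profiles could, for example, be sketched in Python as follows. This is a minimal illustration under stated assumptions: the four-channel reference profiles in `REFERENCE` are invented values, cosine similarity is only one of many possible matching criteria, and the names `cosine` and `call_base` are hypothetical.

```python
import math
from typing import Optional, Sequence

# Hypothetical stored emission profiles: relative intensity of each
# nucleotide label in four detection channels (illustrative values only).
REFERENCE = {
    "A": [0.9, 0.1, 0.0, 0.0],
    "C": [0.1, 0.8, 0.1, 0.0],
    "G": [0.0, 0.1, 0.8, 0.1],
    "T": [0.0, 0.0, 0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two intensity vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def call_base(observed: Sequence[float],
              background: Sequence[float]) -> Optional[str]:
    """Subtract background, then return the best-matching reference base."""
    signal = [max(o - b, 0.0) for o, b in zip(observed, background)]
    if not any(signal):
        return None  # nothing above background in any channel
    return max(REFERENCE, key=lambda base: cosine(signal, REFERENCE[base]))
```

Because cosine similarity compares the shape of the spectrum rather than its absolute brightness, this sketch is insensitive to overall intensity changes; an intensity-based criterion would be needed where labels are distinguished by brightness.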
[00143] It is appreciated that a differently-equipped information processing and control system than those described above may be used for certain implementations. Therefore, the configuration of the system may vary in different embodiments. It should also be noted that, while the processes described herein may be performed under the control of a programmed processor, in alternative embodiments, the processes may be fully or partially implemented by any programmable or hardcoded logic, such as Field Programmable Gate Arrays (FPGAs), TTL logic, or Application Specific Integrated Circuits (ASICs), for example. Additionally, the method may be performed by any combination of programmed general purpose computer components and/or custom hardware components.
[00144] Following the data gathering operation, the data will typically be reported to a data analysis operation. To facilitate the analysis operation, the data obtained by the detection system will typically be analyzed using a digital computer. Typically, the computer will be appropriately programmed for receipt and storage of the data from the detection system, as well as for analysis and reporting of the data gathered.
[00145] Any suitable base-calling algorithms may be employed. See, for example, U.S. Provisional App. No. 61/037,285. In certain embodiments, custom designed software packages may be used to analyze the data obtained from the detection system. In alternative embodiments, data analysis may be performed using an information processing and control system and publicly available software packages. Non-limiting examples of available software for DNA sequence analysis include the PRISM.TM. DNA Sequencing Analysis Software (Applied Biosystems, Foster City, Calif.), the Sequencher.TM. package (Gene Codes, Ann Arbor, Mich.), and a variety of software packages available through the National Biotechnology Information Facility at website www.nbif .org/links/1.4. J .php. Data collection allows data to be assembled from partial information to obtain sequence information from multiple polymerase molecules in order to determine the overall sequence of the template or target molecule.
[00146] Additionally, in certain instances it is useful to perform reactions with reference controls, similar to microarray assays. Comparison of signal(s) between the reference sequence and the test sample is used to identify differences and similarities in sequences or sequence composition. Such reactions can be used for fast screening of DNA polymers to determine degrees of homology between the polymers, to determine polymorphisms in DNA polymers, or to identify pathogens.
[00147] In some embodiments, the method further comprises sequencing one or more additional nucleic acid molecules, for example a second nucleic acid, in parallel with sequencing the first nucleic acid. In other embodiments, the rate of nucleotide sequencing determination (based on a single read of a nucleic acid template) is equal to or greater than 10 nucleotides per second, typically equal to or greater than 100 nucleotides per second. [00148] Typically, the sequencing error rate will be equal to or less than 1 in 100,000 bases. In some embodiments, the error rate of nucleotide sequence determination is equal to or less than 1 in 10 bases, 1 in 20 bases, 3 in 100 bases, 1 in 100 bases, 1 in 1000 bases, and 1 in 10,000 bases. In another preferred embodiment, the test DNA will comprise a complete and intact chromosome. Optionally, the methods disclosed herein may be performed in a multiplex fashion (including in array format), such that additional nucleic acid molecules are sequenced in parallel with a first nucleic acid molecule.
[00149] In an alternative embodiment, primer(s) that direct sequencing complexes to particular areas along the DNA strand are used to specifically determine sequence at those sites. To achieve such site-specific primer-based sequencing, the DNA is denatured and hybridized to at least one site-specific primer, and extension is initiated via addition of appropriate components, such as polymerase, nucleotides (especially base-labeled nucleotides that produce long duration signals) and reaction buffer. The extension products are then displayed and visualized. Alternatively, donor-labeled primers may be used to more specifically identify sequence at multiple sites of incorporation. In yet another embodiment, several differentially labeled primers can be used to produce 'multiplexed' sequence information at resolvable sites along the DNA strand.
[00150] In some embodiments, the priming sites need not be resolvable by the detection system. For example, in assays to determine the presence or absence of a particular sequence or combination of single nucleotide polymorphisms on the DNA strand (haplotype), resolution can be dispensed with as long as the signals emitted from the primer-incorporated nucleotide located on the strand do not affect the fluorescence of the other primer-incorporated nucleotides. Because the number of bases that span a pixel in current detection systems is approximately 700, many primer probes can be used to interrogate potential SNPs within a single pixel. Acceptor signal to noise ratios and FRET distance constraints define this upper limit. No information site should interfere with data from any other information site (i.e., primer-incorporated acceptor-labeled FRET pairs are distributed along the strand at >10 nm separation, which is no closer than ~30 bp) whereas the incorporated acceptor is closely positioned to the donor on the primer to produce a high FRET event (i.e., each primer-incorporated acceptor-labeled nucleotide is at or within the R0 of the donor-acceptor pair). However, if non-incorporated nucleotides are removed prior to DNA immobilization this distance may be increased because of the improved acceptor signal to noise ratio. In any case, the donor and acceptor fluorophores must not be so close that they are quenched. Thus, the donor is typically present within 15 bases from the 3' end of the primer. Primers are long enough to specifically hybridize to their target site (i.e., an 8 base primer is likely to be unique in a 65 Kb template, with the occurrence frequency being related to base composition and type of DNA being sequenced - coding versus non-coding regions; Hardin et al., U.S. Patent No. 6,083,695). Longer primers increase hybridization specificity and reduce the need to highly purify specific genomes or regions thereof. A preferred primer length is 25-50 bases with a minimal spacing of one base between each primer. A polymerase that is not significantly affected by the presence of the donor fluorophore within or immediately 5' to its binding site on the primer/DNA template is used, and this enzyme is additionally deficient in 3' to 5' exonuclease activity and strand displacement activity. Additionally, if a persistently labeled nucleotide is incorporated 3' of each annealed target-specific primer, the template-ordered extension products may be optionally ligated at a low concentration (to favor intramolecular ligation events) to create a covalently-closed linear DNA strand that is comprised of annealed primers that are extended by a single base. As above, each donor-acceptor pair is optimally spaced to produce a distinct high FRET event. A further variation of this method produces donor-acceptor pairs that are well separated after first incorporating the persistently labeled nucleotide 3' of the primer, by adding natural nucleotides to complete synthesis to the 5' end of the next annealed primer, followed by performing the optional ligation reaction.
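The spacing and proximity figures in the preceding paragraph follow from simple arithmetic, which the Python sketch below works through. It assumes an axial rise of about 0.34 nm per base pair for B-form DNA and the standard Förster relation E = 1/(1 + (r/R0)^6); the R0 value of 5 nm used in the usage note and the function names are illustrative assumptions, not values from the present disclosure. The primer estimate assumes equal base frequencies on a single strand, whereas the text notes that actual frequencies depend on base composition.

```python
RISE_NM_PER_BP = 0.34  # approximate axial rise per base pair, B-form DNA


def nm_to_bp(distance_nm: float) -> float:
    """Convert a contour distance along duplex DNA to base pairs."""
    return distance_nm / RISE_NM_PER_BP


def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Forster relation: transfer efficiency at donor-acceptor distance r."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def expected_random_hits(primer_len: int, template_len: int) -> float:
    """Expected chance matches of a primer on one strand of a random template."""
    return (template_len - primer_len + 1) / 4 ** primer_len
```

For instance, `nm_to_bp(10.0)` gives roughly 29 bp, consistent with the approximately 30 bp separation stated above. With a hypothetical R0 of 5 nm, `fret_efficiency(5.0, 5.0)` is 0.5, while `fret_efficiency(10.0, 5.0)` falls below 0.02, which illustrates why sites spaced more than 10 nm apart contribute little cross-talk. Since 4^8 = 65,536, `expected_random_hits(8, 65536)` is close to 1.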
[00151] As described above, one significant benefit of the current sequencing strategy is that it will allow discrete and ordered reads to be produced along the length of the strand. Using standard microscopy techniques, DNA strands of approximately 100 Kb can be viewed in one field of view with existing real-time sequencing systems, and the field of view can be moved to increase the length of examined DNA. Using the system and methods of the present disclosure, nick spacing of 3-5 Kb produces resolvable complexes. Because the disclosed systems and methods will provide read lengths of approximately 1 Kb, this spacing also reduces the risk that a sequencing read from one strand will encounter a nick on the opposite strand, thereby terminating extension. Moreover, because each pixel of data obtained in a standard detector corresponds to a physical distance of approximately 700 bp, sequencing reactions from nicks separated by less than 1000 bases may produce acceptor signals that cannot be confidently associated with a specific sequencing site. Optimization of the nicking treatment to produce nicks approximately 3-5 Kb apart will eliminate this problem, and moreover may also yield additional information regarding the strand from which the sequence originates. Given that approximately 700 bases span a single pixel in a standard detection system, read lengths greater than 1000 bases may additionally provide information about the strand from which the sequence originates due to a single pixel shift in photon density, further facilitating correct genome assembly.
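The relationship between nick spacing and pixel resolution described above can be illustrated with a short Python sketch. The constants restate the approximate figures given in the text (about 700 bp per pixel and a 1000-base minimum separation); the function names and the example nick positions are hypothetical.

```python
from typing import List, Sequence, Tuple

BP_PER_PIXEL = 700        # approximate span of one detector pixel, per the text
MIN_SEPARATION_BP = 1000  # nicks closer than this may not be resolvable


def pixel_index(position_bp: int) -> int:
    """Pixel along the stretched strand in which a given base position falls."""
    return position_bp // BP_PER_PIXEL


def unresolved_pairs(nick_positions_bp: Sequence[int]) -> List[Tuple[int, int]]:
    """Return adjacent nick pairs separated by less than the minimum spacing."""
    ordered = sorted(nick_positions_bp)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a < MIN_SEPARATION_BP]
```

For example, nicks at positions 0, 4000, 4600 and 9000 yield one flagged pair, (4000, 4600), whose members fall in adjacent pixels 5 and 6. A strand of approximately 100 Kb spans about 143 pixels under these assumptions.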
[00152] In another embodiment, the relative distance of visible markers (such as the distance from the end of the DNA strand) can also be used to determine which DNA site is being determined. In applications in which the donor is present on the primer, the immobilized DNA can be stained with a dye that is not involved in producing the detectable FRET event. For staining/visualization purposes, the double-or single-stranded nature of the DNA must be taken into account when one needs information about the immobilized DNA strand. [00153] Optionally, as an additional confirmation of distance between sequencing sites, the methods and systems of the present disclosure can be combined with reported techniques where integrated fluorescence intensity measurements coupled with quantile analysis provides an accurate measure for the amount of DNA (Li et al., 2007). Analogous to a whole genome shotgun sequencing strategy, the entire genome sequence can be determined according to the present disclosure by sequencing many individual copies of the same or overlapping DNA fragments.
[00154] There are several additional advantages of the real-time sequencing methods and systems disclosed herein. For example, with respect to certain embodiments, such as embodiments using an intercalating dye donor and acceptor-labeled nucleotides, there is no requirement for labeling of the polymerase, thereby increasing the speed of polymerization and decreasing the handling effort associated with the sequencing process. Furthermore, donor energy transfer capabilities are continuously optimized throughout the extension reaction, because new donor fluorophores constantly intercalate into the nascent strand, effectively positioning a new donor at a distance that will produce a higher efficiency FRET event relative to the more upstream donor that may have photobleached or, as a result of nucleotide incorporation and enzyme translocation, become too distant from the acceptor-labeled dNTP bound at the enzymatic active site. Similarly, the acceptor signal is increased as compared to signal generated in other systems that use a single donor fluorophore. Finally, the disclosed methods and systems involve the use of tracking software that searches along a donor intensity trajectory for acceptor signals (sequence information) originating from different regions along the same DNA strand, thereby permitting accurate placement of the relative locations of each independent sequence along the DNA strand and resulting in the simultaneous generation of multiple discrete and ordered sequence "reads" along the length of a single nucleic acid strand.
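One simplified way to picture the search for acceptor signals along a strand is shown in the Python sketch below. It is not the tracking software of the present disclosure; it merely assumes that acceptor intensity has already been sampled pixel by pixel along the donor-stained strand, and it reports the centre of each contiguous run of pixels at or above a threshold. The function name, the threshold and the 700 bp per pixel default are assumptions made for illustration.

```python
from typing import List, Sequence


def locate_read_sites(acceptor_profile: Sequence[float], threshold: float,
                      bp_per_pixel: int = 700) -> List[int]:
    """Return ordered strand positions (in bp) of acceptor-signal regions.

    acceptor_profile holds one acceptor intensity per pixel, sampled in
    order along the strand. Each contiguous run of pixels at or above the
    threshold is treated as one sequencing site, located at the run centre.
    """
    sites = []
    start = None
    # A trailing sentinel closes any run that reaches the end of the strand.
    for i, value in enumerate(list(acceptor_profile) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            centre_px = (start + i - 1) / 2
            sites.append(int(centre_px * bp_per_pixel))
            start = None
    return sites
```

Because the profile is traversed in strand order, the returned positions are already ordered, which mirrors the ordered placement of independent reads described above.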
[00155] Furthermore, use of the methods and systems disclosed herein will expand the repertoire of information generated through the sequencing process. For example, sequencing of long DNA strands according to the present disclosure will facilitate the identification of genomic rearrangements and improve the assembly accuracy of chromosomal sequences (e.g., correctly identifying independent HIV genomes; associating sequence reads with the correct maternal/paternal chromosome).
[00156] Sequence reads obtained according to the present disclosure can produce haplotype information and thus further facilitate accurate genome assembly. Production of haplotype information is especially important because it is shown to have more power than individual nucleotide variation in the context of association studies and in predicting disease risks (Stephens, Schneider et al, 2001; HapMap Project). The first diploid genome sequence of a single human demonstrates that maternal and paternal chromosomes are 99.5% similar when genetic variation due to insertion and deletion is taken into account (Levy, Sutton et al., 2007). The combination of longer read lengths and discrete, ordered reads will facilitate correct assembly of the maternal and paternal chromosome sequences. For example, rearrangements involving duplication of genomic regions can be very difficult to identify via sequence information alone if the sequences are highly conserved, because one must determine whether an independent sequence read represents information from the maternal or paternal chromosome or whether it is from a different region of the genome. Currently, and for many fold more than $1000, very accurate and deep sequence coverage may allow this distinction to be made if the borders of the genomic breakpoints are identified. However, because donor replacement sequencing strategies will directly couple sequence information with mapping information, these genomic variations will be identified during 'routine' $1000 whole genome sequencing.
[00157] Furthermore, the methods, compositions and systems disclosed herein for sequencing of long DNA strands are capable of facilitating the identification of genomic rearrangements within the strand, improving the assembly accuracy of chromosomal sequences (especially in regions sharing a great deal of similarity), and improving copy number variation determination (especially for longer repeats). The first diploid genome sequence of a single human demonstrates that maternal and paternal chromosomes are 99.5% similar when genetic variation due to insertion and deletion is taken into account (Levy et al., 2007). Thus, it will be critical to carefully track sequence information associated with each chromosome.
EXAMPLES
[00158] Example 1: Assessment of Intercalating Dyes for use as FRET donors
[00159] Various intercalating dyes were tested for use as a donor in donor-based detection of acceptor signals. These dyes are advantageous in that many donors could be present and exchanged and/or replenished during the extension reaction, thus allowing for extended donor lifetime. The dyes tested were SYBR Green I, YOYO-1, YO-PRO-1, SYBR GOLD, and SYBR Green I with YOYO-1. Representative spectra are shown in Figure 2. Fluorescence intensities were observed using YOYO-1 or SYBR Green I with short primer/template duplexes and with linear genomic DNA.
[00160] Next, a titration curve of fluorescence from each dye (YOYO-1 and SYBR Green I) following incubation with immobilized linear Lambda (λ) DNA was generated in order to assess signal-to-noise ratios and evaluate levels of background fluorescence of the dye after binding to immobilized DNA. Typical results are shown in Figures 3A and 3B.
[00161] To generate the data shown in Figures 3A and 3B, substrates suitable for DNA immobilization were prepared by coating a glass substrate with a PEBN layer or with +-+-+ layers, wherein "+" indicates a polyallylamine layer and "-" indicates a polyacrylic acid layer.
[00162] PEBN-coated substrates were prepared using a modified version of the procedure disclosed by Braslavsky et al., 2003, PNAS Vol. 100, No. 7, pp. 3960-3964. Briefly, glass coverslips were treated overnight in alkaline base-bath, rinsed in distilled water and then cleaned with 2% Micro-90 for 60 minutes with sonication and heat, followed by boiling in RCA solution (H2O: 30% NH4OH: 30% H2O2 (6:4:1)) for 60 minutes (2x 30 minutes). The cleaned glass coverslips were then immersed in 2mg/ml polyallylamine for 10 minutes, and rinsed five times in water followed by an immersion in 2mg/ml polyacrylic acid for 10 minutes and rinsed five times in water. The polyallylamine and polyacrylic acid immersions were repeated one more time. The coverslips were then rinsed in water and coated with a 5mM EDC-Biotin amine solution in 10mM MES buffer, pH 5.5 for 30 minutes. The slides were then rinsed in MES buffer for 5 minutes, in water for 5 minutes and in a solution of 10mM Tris, pH 8.0, 10mM NaCl for 5 minutes. Finally, the slides were coated with Neutravidin by incubating for 30 minutes in a solution comprising 1mg/ml Neutravidin.
[00163] To prepare substrates coated with +-+-+ layers, glass coverslips were treated overnight in an alkaline base-bath, then rinsed in distilled water and cleaned with 2% Micro-90 for 60 minutes with sonication and heat, followed by boiling in RCA solution (H2O: 30% NH4OH: 30% H2O2 (6:4:1)) for 60 minutes (2x 30 minutes). The cleaned glass coverslips were then immersed in 2mg/ml polyallylamine for 10 minutes and rinsed five times in water, then immersed in 2mg/ml polyacrylic acid for 10 minutes and rinsed five times in water. The polyallylamine and polyacrylic acid immersions were repeated one more time. Finally, the coverslips were immersed in 2mg/ml of polyallylamine for 10 minutes.
[00164] Figure 3 depicts average fluorescence intensities obtained from SYBR Green I (Figure 3A) and YOYO-1 (Figure 3B) in regions where stained DNA is present, relative to intensities in DNA-free regions. Briefly, levels of average fluorescence intensity from DNA-bound SYBR Green I were compared with levels of average background fluorescence intensity of SYBR Green I in the absence of DNA. Intact Lambda (λ) DNA at a final concentration of 10pM in 1X KB (50mM Tris pH7.2; 10mM MgSO4; 0.1mM DTT) was first immobilized via injection into a glass chamber formed between a glass slide and a glass coverslip coated with a PEBN layer using the coating process described above. The DNA was incubated with increasing concentrations of SYBR Green I (0.01X; 0.1X; 0.3X; 0.6X; 1X; 2X and 5X SYBR Green I in 1X KB buffer supplemented with 50mM BME). Following addition of each successive concentration of SYBR Green I to the reaction chamber, the chamber and contents were irradiated with an Argon 488nm laser at 500 μW power, and data were collected at 25ms integration time.
[00165] Data were analyzed using software as described in Example 10, below. Briefly, the images were visually evaluated to identify the regions depicting dye-stained DNA strands. Regions of interest (ROI) consisting of DNA bound to SYBR Green I, as well as regions of the image containing background, were manually selected and fluorescence intensities from both ROI and background regions were compared.
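The ROI-versus-background comparison described above can be illustrated with a minimal Python sketch. The function name, the pixel values and the use of a simple ratio of means are assumptions made for illustration only; the disclosure does not state how the analysis software computed the comparison.

```python
from statistics import mean


def roi_to_background_ratio(roi_pixels, background_pixels):
    """Return mean ROI intensity divided by mean background intensity.

    Both arguments are flat sequences of pixel intensities (arbitrary units)
    taken from manually selected regions of the same image.
    """
    background = mean(background_pixels)
    if background <= 0:
        raise ValueError("mean background intensity must be positive")
    return mean(roi_pixels) / background


# Hypothetical pixel values for one dye concentration (not measured data).
roi = [1200, 1350, 1280, 1310]
background = [300, 320, 310, 290]
ratio = roi_to_background_ratio(roi, background)
print(f"ROI/background = {ratio:.2f}")
```

Repeating the calculation for each dye concentration in the titration would give one relative-intensity value per concentration, which is the kind of quantity plotted in Figure 3.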
[00166] Representative results obtained with SYBR Green I are shown in Figure 3A, which depicts the SYBR Green I average intensity in a given region of interest (ROI) relative to average background intensity at each given concentration of SYBR Green I. As can be seen from Figure 3A, SYBR Green I dye did not exhibit high background fluorescence in the absence of DNA even at higher dye concentrations. However, increased fluorescence of SYBR Green I was observed in the presence of DNA, especially at higher dye concentrations.
[00167] An identical titration was carried out using the intercalator YOYO-1 in place of SYBR Green I (Figure 3B). Representative results are shown in Figure 3B, which depicts the YOYO-1 average intensity in a given region of interest (ROI) relative to average background intensity at each given concentration of YOYO-1. As can be seen from Figure 3B, YOYO-1 dye did not exhibit high background fluorescence in the absence of DNA even at higher dye concentrations. However, increased fluorescence of YOYO-1 was observed in the presence of DNA, especially at higher dye concentrations. (The ROI values obtained at 300pM and 600pM concentrations of YOYO-1 are not included because the fluorescence intensity observed at these concentrations was too low.)
[00168] Example 2: Characterization of various coated surfaces by estimation of background binding of labeled nucleotides
[00169] Glass surfaces coated with +-+-+ and PEBN, respectively, were tested for their propensity to non-specifically bind acceptor-labeled nucleotide and thus contribute to harmful background signal and interfere with detection of fluorescence emissions from dye-labeled DNA. Figure 5 depicts images of acceptor background arising from non-specific interaction of the acceptor-labeled nucleotide with either a +-+-+ surface (Figure 5A) or a PEBN surface (Figure 5B). For these studies, Lambda (λ) DNA was first nicked via incubation with the site-specific nicking enzyme Nb.BbvCI according to a procedure described in Xiao et al., Nucl. Acids Res. Vol. 35(3) e16, pp. 1-12 (2007). Briefly, 8.5μl Lambda (λ) DNA having a concentration of 500ng/ul (15.62nM) was nicked in a 50μl or 100μl final reaction volume comprising 1X Nb.BbvCI buffer (final DNA concentration: 1.33nM). The reaction was incubated at 37°C for 1 hour. The nicked product was then incubated with 6.8nM acceptor-labeled nucleotide dU-A1610 in the presence of a mutant form of Klenow polymerase that comprises the mutation D424A and lacks exonuclease activity (hereinafter referred to as "Klenow(exo-) polymerase") at a final enzyme concentration of 0.476nM at 37°C for 40 minutes. The nicked and labeled DNA was then incubated with the intercalator dye YOYO-1 and then immobilized via injection of the dye-DNA mixture at 10pM concentration into a glass chamber formed of surfaces derivatized with either +-+-+ (Figure 5A) or PEBN (Figure 5B). Acceptor intensity on the coated surfaces was visualized using fluorescence microscopy via direct excitation with a Yellow HeNe laser (for visualization of acceptor-labeled nucleotide) or an Argon 488nm laser (for visualization of YOYO-1 labeled DNA) in the same field of view. As shown in Figure 5A, a high level of background fluorescence from the labeled nucleotide was observed for +-+-+ surfaces even in regions that did not contain DNA. Similarly, coatings containing polyacrylic acid or Tween-20 provided either no or only moderate alleviation of non-specific binding of labeled nucleotides to the surface (data not shown). In contrast, visualization of labeled Lambda (λ) DNA on a PEBN surface resulted in low acceptor channel background, presumably due to decreased background binding of the acceptor-labeled nucleotide to the PEBN surface, as shown in Figure 5B.
[00170] Example 3: Detection of FRET between acceptor-labeled nucleotides and intercalating dye donors
[00171] The ability of YOYO-1 and/or a quantum dot to form effective FRET pairs with various dye-labeled nucleotides was assessed by assembling spectra wherein either YOYO-1 (Figure 4A) or a commercially available quantum dot (Qdot 525 from Invitrogen Corp., Figure 4B) was evaluated as a FRET donor, in conjunction with acceptor dyes Cy3, ROX, Alexa Fluor 610 and Cy5. The assembled spectra are shown in Figures 4A and 4B. Donor and acceptor compounds were mixed and subjected to excitation using a 457 nm laser. Under these conditions, it was possible to detect real-time incorporation of dUTP base-labeled with Cy3, Alexa Fluor 610 or Cy5, independently, on a primer/template duplex immobilized on a coverslip. Experiments were also performed to study the use of quantum dots as a donor that is efficiently excited at lower wavelengths. Use of a lower-wavelength laser line permits a 5-fold increase in the concentration of acceptors in solution with lower background; increasing the concentration up to 20-fold still results in acceptable noise levels (data not shown).
[00172] Example 4: Detection of incorporation of acceptor labeled nucleotides based on fluorescence overlap with intercalating dye donors
[00173] Visualization of solution-based incorporation of acceptor-labeled nucleotide into nicked DNA was performed as follows. Briefly, λ DNA was incubated with the site-specific nicking enzyme Nb.BbvCI for 1 hour at 37°C according to the procedure of Xiao et al., as described in Example 2. The nicked product was then incubated with 6.8nM dU-Cy5 (Figure 6) or dU-A1610 (Figure 7) in the presence of Klenow(exo-) polymerase at 37°C for 40 minutes, following which 10pM of the labeled λ DNA was contacted with an imaging mix containing 300nM YOYO-1 and 50mM BME in 1X KB buffer. The entire mixture was injected into a glass chamber formed by PEBN-coated glass surfaces as described in Example 1. For studies using dU-Cy5, visualization of YOYO-1 containing regions was achieved using an Argon 488nm laser, and visualization of acceptor-containing regions using a Red HeNe laser (Figure 6). Images were collected using a Nikon Eclipse TE-2000 microscope for 100 frames at 25 ms integration time; images obtained using Argon 488nm excitation and Red HeNe excitation were overlaid with each other to visualize regions containing incorporated fluorescent label, as shown in Figure 6.
[00174] For studies using dU-A1610 (Figure 7), visualization of YOYO-1 containing regions was achieved using an Argon 488nm laser, and visualization of acceptor-containing regions using a Yellow HeNe laser. Images were collected using a Nikon Eclipse TE-2000 microscope for 25 frames at 25 ms integration time; images obtained using Argon 488nm excitation (YOYO-1 stained DNA) and Yellow HeNe (dU-A1610) excitation were overlaid with each other to visualize regions containing incorporated fluorescent label, as shown in Figure 7.
[00175] As shown in Figures 6 and 7, regions of possible FRET emissions were identified via overlap with regions of incorporated acceptor (indicated by white arrows). For example, the first panel of Figure 6 shows donor fluorescence; the second panel depicts fluorescence images of the labeled λ DNA as seen in the acceptor channel due to fluorescence 'bleed' into said channel, gathered following excitation with an Argon 488nm laser. Fluorescent acceptors are visible as regions of increased fluorescence intensity, as indicated by white arrows. The third panel shows the same field of view imaged using Red HeNe excitation to visualize the location of incorporated acceptor labels. The fourth panel shows the composite image generated via overlay of the second and third panels, confirming the location of incorporated acceptor.
[00176] Figure 8 depicts results of a study identical to that of Figure 6, except that the DNA comprising incorporated acceptors was immobilized on +-+-+ surfaces instead of PEBN surfaces prior to visualization.
[00177] Example 5: Detection of incorporation of acceptor-labeled nucleotides into surface-immobilized DNA
[00178] Briefly, λ DNA was nicked as described in Example 2, and then immobilized on a PEBN-coated surface as described in Example 1. The nicked and immobilized DNA was then contacted with an extension reaction mix containing 300-900nM YOYO-1, 6.8nM dU-Cy5 and Klenow(exo-) in KB buffer. After 20 to 40 minutes, the extension mix was replaced by buffer containing 25mM Tris pH 7.6 and 50mM BME. (In some experiments, the YOYO-1 dye was included in the Tris-BME buffer instead of in the extension mix.) Incorporation products were then visualized using fluorescence microscopy following excitation with an Argon 488nm laser (for YOYO-1) or Red HeNe laser (for dU-Cy5). The images collected using Argon 488nm and Red HeNe excitation were overlaid to determine the presence of incorporated fluorescent label. Representative results are shown in Figure 9.
[00179] Example 6: Visualization of FRET activity between intercalating dye donors and acceptor labeled nucleotides
[00180] The following studies analyzed and compared FRET signals arising from a single DNA duplex engineered to contain a single FRET donor (A1488), and a native DNA duplex containing multiple intercalated dye donors. The engineered duplex comprised an Alexa 488 (A1488) moiety at the -7 position on the primer strand; the distance between the FRET donor and the FRET acceptor on the engineered DNA duplex was 8 bases or 27Å.
[00181] Extension reactions were performed in a fluorometer cuvette, wherein a single base-labeled acceptor molecule was incorporated and the increase in donor and/or acceptor signals as a function of time was monitored using a Cary Eclipse fluorometer in the kinetic scan mode. The reactions contained 200nM primer/template duplex, 2.5uM base-labeled NTP and 600nM Klenow(exo-). The reaction was initiated by addition of the enzyme. Acceptor signal intensities obtained from a duplex labeled with A1488 or SYBR Green I donors, and ROX, A1594, A1610 or Cy5 acceptors, were compared with each other.
[00182] Typical results are shown in Figure 12. Panel 12(A) shows a schematic for the use of native (non-engineered) duplex comprising multiple intercalated SYBR Green I dye donors. In this test system, FRET occurs between the intercalated donors and the incorporated base-labeled dNTPs.
[00183] Panel 12(B) depicts a graphed time series of fluorescence signals of both donor and acceptor groups detected using a fluorometer. The X axis represents time in seconds; the Y axis represents fluorescence intensity in arbitrary units (AU).
[00184] Panel 12(C) depicts a bar graph of normalized FRET data from a series of individual incorporation experiments performed in a cuvette. The donor-acceptor pairs are specified on the X axis. The Y axis on the left displays the FRET efficiency calculated using the formula E = I_A / (I_A + I_D), where I_A and I_D are the fluorescence intensities of the acceptor and donor molecules respectively, measured in AU. The Y axis on the right shows the normalized increase in acceptor signal due to FRET. The data are normalized by applying the formula [(I_A after enzyme injection) - (I_A before enzyme injection)] / (donor intensity at start). As can be seen from this panel, the FRET efficiency, as well as normalized acceptor intensity, using SYBR Green I as a donor, are higher than corresponding FRET efficiencies and normalized acceptor intensities for the Alexa 488 donor samples.
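As a worked illustration of the two quantities plotted in Panel 12(C), the following Python sketch evaluates the FRET efficiency formula and the normalized acceptor increase. The function names and the intensity readings are hypothetical, and the grouping of the normalization formula (difference in acceptor intensity divided by starting donor intensity) is an assumed reading of the text.

```python
def fret_efficiency(acceptor_intensity, donor_intensity):
    """Apparent FRET efficiency E = I_A / (I_A + I_D); intensities in AU."""
    total = acceptor_intensity + donor_intensity
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    return acceptor_intensity / total


def normalized_acceptor_increase(acceptor_after, acceptor_before, donor_start):
    """(I_A after enzyme - I_A before enzyme) / donor intensity at start."""
    if donor_start <= 0:
        raise ValueError("starting donor intensity must be positive")
    return (acceptor_after - acceptor_before) / donor_start


# Hypothetical cuvette readings in AU (not data from Figure 12).
efficiency = fret_efficiency(acceptor_intensity=150.0, donor_intensity=450.0)
increase = normalized_acceptor_increase(
    acceptor_after=150.0, acceptor_before=30.0, donor_start=600.0
)
print(f"E = {efficiency:.2f}, normalized increase = {increase:.2f}")
```

Dividing by the starting donor intensity puts samples with different donor brightness on a common scale, which is presumably why the right-hand axis is normalized this way.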
[00185] Panel 12(D) depicts a bar graph of the fold increase in acceptor intensity obtained using SYBR Green I, relative to acceptor intensity obtained using A1488 as the donor.
[00186] As can be seen from Figure 12, the acceptor signals obtained using SYBR Green as the donor are higher than acceptor signals obtained using the single-donor labeled engineered duplex, for all three acceptors tested. The maximum increase in acceptor intensity following excitation was detected with the acceptor ROX (3-fold increase).
[00187] Figure 13 shows a similar study conducted in a single-molecule format. Briefly, approximately 50pM of a biotinylated primer/template duplex, consisting either of a biotinylated derivative of the engineered duplex (which contains a single donor group, A1488, at the -7 position on the primer) or a biotinylated derivative of the native duplex (which does not contain any intrinsic donor, but into which SYBR Green I donor molecules have been intercalated via co-incubation of the native duplex with SYBR Green I at 0.1X concentration), was immobilized on a PEG surface via attachment of the biotin on the template strand of each duplex. Each surface-immobilized duplex was subjected to an extension reaction in situ by injecting into the chamber an extension mixture containing 150nM Klenow(exo-) polymerase, 0.5uM of base-labeled dNTPs (dUTP-ROX or dUTP-Alexa610), 50mM Tris pH 7.2, 2mM MnSO4, 10mM Na2SO4, 2mM DTT, 0.1% Triton X-100 and 0.01% Tween-20. The reactions were allowed to proceed for 10 minutes, followed by a 1M NaCl rinse; the samples were then rinsed either with buffer alone, or with buffer supplemented with 0.1X SYBR Green I. The samples were excited using an Argon 488nm laser at 460uW and the data were collected at 300ms integration time for 150 frames. The emitted signals were detected by a Roper Scientific back-illuminated EMCCD camera (Cascade 1), with an inverted Nikon microscope (TE 2000U), and a 60x oil objective. The emitted light was separated using dichroic (560nm, 650nm) and band pass filters (535/50nm; 620/40nm).
[00188] FRETAN software was used to obtain donor and acceptor traces and perform FRET analysis. The FRETAN software is automated analysis software that identifies each of the spots in the donor channel (taking into consideration noise thresholds), subtracts the background fluorescence, identifies anti-correlated changes in the time courses of fluorescence at each acceptor wavelength to identify single pair FRET events, and computes approximately 50 attributes associated with FRET. For further details regarding the FRETAN software, see U.S. Provisional App. No. 60/765,693 filed February 6, 2006, and U.S. Published App. No. 2007/0250274 A1, published October 25, 2007, herein incorporated by reference in their entirety. Using the FRETAN software, the FRET efficiency of the individual molecules was calculated using the formula E = I_A / (I_A + I_D), where I_A and I_D are the fluorescence intensities of the acceptor and donor molecules respectively. Custom designed PERL and MATLAB scripts were used for extracting and graphing the data.
[00189] Results are shown in Figure 13, panels A-D, which depict single molecule FRET time traces generated using the FRETAN software. Panel 13(A) shows a schematic of the Alexa 488 FRET system. Panel 13(B) shows an example trace of FRET between the donor Alexa 488 and an incorporated base-labeled dUTP-ROX. Panel 13(C) shows a schematic of the SYBR Green I donor FRET system. Panel 13(D) shows an example trace of FRET between the intercalating dye SYBR Green I as the donor and an incorporated base-labeled dUTP-ROX. The acceptor signals detected with SYBR Green I as the donor are brighter compared to the signals detected with Alexa 488 as the donor.
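The idea of flagging anti-correlated donor and acceptor time courses can be sketched in Python as follows. This is not the FRETAN algorithm, whose internals are not described here; the Pearson-correlation test, the cutoff of -0.5, the function names and the traces are all assumptions chosen to make the concept concrete.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length traces."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("traces must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant trace has no defined correlation")
    return cov / sqrt(var_x * var_y)


def is_anticorrelated(donor, acceptor, cutoff=-0.5):
    """True when the donor and acceptor traces move in opposite directions.

    The cutoff is an illustrative, user-chosen value.
    """
    return pearson(donor, acceptor) <= cutoff


# Hypothetical background-subtracted traces (AU per frame): the donor dims
# while the acceptor brightens during frames 3-5, as expected for FRET.
donor = [900, 910, 880, 400, 390, 410, 890, 905]
acceptor = [100, 95, 110, 600, 620, 590, 105, 98]
print(is_anticorrelated(donor, acceptor))
```

A production analysis would work on windows around candidate events and account for noise thresholds, as the text indicates; the whole-trace correlation here only illustrates the anti-correlation signature.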
[00190] Figure 14 shows single molecule FRET data comparing the FRET efficiency and acceptor intensities using A1488 or the intercalating dye SYBR Green as a donor. Scatter plots of acceptor intensity (on the Y axis) and FRET efficiency (on the X axis) for acceptor Alexa 610 and acceptor ROX are shown in Panels (A) and (B), respectively. The lighter grey circles indicate data points obtained using Alexa 488 as the donor and the darker stars in both plots A and B indicate data points obtained using SYBR Green as the donor. Panel (C) shows a schematic of Alexa 488 and SYBR Green I-driven FRET. Finally, Panel (D) shows a bar graph of acceptor intensities driven by Alexa 488 or SYBR Green above two user-defined thresholds, i.e., 1500 AU or 2000 AU. The darker bars (on the left) are acceptor signals above 1500 AU and the lighter grey bars (on the right) are acceptor signals above 2000 AU.
[00191] As shown in Figure 14, only a very small percentage of acceptor intensities are higher than the user-defined cut-off thresholds (i.e., 1500 AU or 2000 AU) when A1488 is used as the donor, whereas acceptor intensities using SYBR Green I as the donor are consistently higher than both thresholds.
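The threshold comparison summarized in Panel (D) amounts to counting the fraction of single-molecule acceptor intensities above each cut-off. A minimal Python sketch follows; the function name and the intensity lists are hypothetical and are not data from Figure 14.

```python
def fraction_above(intensities, threshold):
    """Fraction of acceptor intensities (AU) strictly above a threshold."""
    if not intensities:
        raise ValueError("at least one intensity is required")
    return sum(1 for value in intensities if value > threshold) / len(intensities)


# Hypothetical per-molecule acceptor intensities in AU.
alexa488_driven = [800, 1200, 950, 1600, 1100]
sybr_driven = [1800, 2400, 2100, 1700, 2600]

for threshold in (1500, 2000):
    print(
        threshold,
        fraction_above(alexa488_driven, threshold),
        fraction_above(sybr_driven, threshold),
    )
```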
[00192] In order to test whether the presence of the intercalating dye both 5' and 3' of the acceptor increases the acceptor intensity, solution-based extension reactions were performed in the fluorescence cuvette, and the increase in both donor and acceptor intensity was tracked as a function of time for 30 minutes. The reactions contained 200nM primer/template duplex, 2.5uM dNTPs and 600nM of a mutant Phi29 polymerase lacking exonuclease activity (hereinafter referred to as "Phi29(exo) polymerase"). In one experiment, based on the template sequence, natural G, A and C and base-labeled U-A1610 were added and the reaction was started by addition of Phi29(exo) polymerase. In another experiment, natural G, A and C and base-labeled U-A1610 were added and the reaction was started by addition of Phi29(exo) polymerase; the reaction was monitored for approximately 10 minutes to allow the natural and base-labeled nucleotides to be incorporated, and was then spiked with natural T so that the entire length of the template could be extended. The reaction was monitored for an additional 20 minutes. Both reactions were performed in the presence of SYBR Green I at 1X concentration. In the first experiment the SYBR Green I is present only 3' of the incorporated acceptor. In the second experiment the SYBR Green I is present both 3' and 5' of the incorporated acceptor.
[00193] Representative results are shown in Figure 15. Panel 15(A) depicts a time series of acceptor intensity over a 30-minute period with SYBR Green present only 5' of the incorporated Alexa 610 acceptor. Panel 15(B) depicts a time series of acceptor intensity over a 30-minute period with SYBR Green I present both 5' and 3' of the incorporated Alexa 610 acceptor. As can be seen from these results, the presence of donors on both the 3' and 5' side of the acceptor increases the acceptor intensity.
[00194] Example 7: Incorporation of acceptor-labeled nucleotide into randomly nicked DNA
[00195] Lambda (λ) DNA was randomly nicked via incubation with DNase I, an enzyme that introduces random single-strand nicks into the DNA backbone. The nicked DNA was labeled in solution with dU-Cy5 or dU-A1594 by incubating 1.33nM of DNA with 6.8nM dU-Cy5 or dU-A1594 in 1X KB buffer in the presence of 0.002units/μl DNase I and 457nM Klenow(exo-) polymerase. The reaction was incubated at 37°C for 40min, and then stopped by adding DNase I stop solution (Promega). 13pM of the labeled DNA was added to imaging mix containing 1X KB, 50mM BME and 300nM YOYO-1. The imaging mix containing labeled DNA was added to the dry surface of PEBN-coated glass chambers. The bound DNA was then washed with 25mM Tris pH 7.6 containing 50mM BME to remove excess YOYO-1 and unincorporated nucleotides. For reactions including Cy5-labeled nucleotide, the bound DNA was further washed with an oxygen-scavenger containing solution made of 25mM Tris pH 7.6, 50mM BME, 1mg/ml glucose oxidase, 0.04mg/ml catalase and 0.4% glucose. Regions of FRET activity were imaged using an Argon 488nm laser (for detection of YOYO-1 labeled DNA), or a Red HeNe laser (for detection of Cy5), or a Yellow HeNe laser (for A1594 detection). The images collected with the two lasers were overlaid using MetaMorph software to confirm the presence of incorporated acceptor label. The overlay results are shown in Figure 16.
[00196] Example 8: Use of the disclosed methods to screen HIV genotype within particular patients
[00197] In a non-limiting example, diagnostic SNPs within the HIV genome are determined. Primers are constructed along each strand of the HIV genome (the HIV RNA genome may be either directly interrogated, or converted into dsDNA and then ssDNA prior to interrogation) and, preferably, the 3' end of each primer terminates at the site of a candidate SNP. Some primers may terminate at non-variant sites to serve as internal controls for sequencing reaction efficiency and accuracy. If dsDNA is present in the hybridization, snap cooling is preferred to promote primer-template duplex formation, rather than slow cooling which favors reannealing of the template strands.
[00198] When the information about individual HIV genomes is determined in this manner (preferably using persistently-labeled nucleotides or dideoxynucleotides), this information allows health care providers to prescribe a patient's therapeutic regimen based on the population of HIV genomes within an individual patient rather than on information about the total HIV sequence content within the individual. This is important because there is currently no cost-effective way to obtain deep sequencing for the HIV genomes that comprise the population of HIV within a patient. This problem is a result of the diversity of the HIV sequences within an individual and the fact that no current sequencing technology can completely sequence an entire HIV genome. Because of the high variability within the HIV population, it is extremely difficult to correctly build consensus sequences for each of the HIV genotypes within a single patient. This, in turn, is a significant problem because the relationship between SNPs within an HIV genome is predictive of therapy outcome. Using the sequencing methods described herein, the HIV population may be screened and a therapeutic regimen prescribed and modified, as needed. This method can be used to determine the relationship of SNPs in any nucleic acid for any application, e.g., cancer or predicting predisposition to any genetically influenced or determined disease.
[00199] Example 9: Use of the disclosed methods to assign detected sequence variations to particular genotypes
[00200] Generally, when conventional sequencing methods are used, it is very difficult to assign particular sequence variations to the same source genome if such variations are further apart than the read length of the sequencing system. For example, when sequencing HIV genomes from patient tissue samples, it is difficult to know to which variant a particular variation should properly be assigned, a problem that is exacerbated by the high number of polymorphisms in each HIV genome. The key issue is the proper assembly of the individual sequence reads into a consensus genome for a particular genotype when the variations are not within the same sequence read (i.e., the read may belong to variant A, but not variant B, even though other parts of these variants may be very similar). This situation is depicted diagrammatically in Figure 17.
[00201] For example, sequence reads may produce the following four variants, each of which can be analyzed to determine a consensus read for this region of the HIV genome (see Figure 17 for proper alignment):
[00202] Variant#1: ACTGTATACGTACGATGCTATGCATCGATTCGTAC
[00203] Variant#2: ACTGTATACGTACGGTGCTATGCATCGATTCGTAC
[00204] Variant#3: CATCGATTCGTACGTGCCTCGAGTTTCTG
[00205] Variant#4: CATCGATTCGTACGTGCCTCGAGCCTCTG
[00206] Consensus: ACTGTATACGTACGNTGCTATGCATCGATTCGTAC
[00207] GTGCCTCGAGNNTCTG
[00208] The biological activities of sequence variants comprising nucleotides 'N' or 'NN' in different combinations may be very different. For example, a genome having 'A' at position 'N' together with 'CC' at positions 'NN' in the above consensus may be a genotype resistant to a particular drug therapy, whereas a genome having 'A' together with 'TT' may be effectively treated with the same therapy.
[00209] The methods and systems of the present disclosure address the central problem of how to align variants 1, 2, 3 and 4 because they provide important information about the relationship between these different short sequence reads, i.e., whether they occur on the same or different viral genomes.
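The consensus shown in paragraphs [00206] and [00207] can be reproduced with a short Python sketch. The helper names are hypothetical, and the approach (a column-wise consensus of pre-aligned reads, then joining the two consensus segments on their longest suffix/prefix overlap) is a simplification assumed for illustration, not the assembly method of any particular sequencing pipeline.

```python
def consensus(reads):
    """Column-wise consensus of equal-length, pre-aligned reads.

    Positions where the reads disagree are reported as 'N'.
    """
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be pre-aligned and of equal length")
    return "".join(
        column[0] if len(set(column)) == 1 else "N" for column in zip(*reads)
    )


def merge_on_overlap(left, right):
    """Join two sequences on the longest suffix of left that prefixes right."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("sequences do not overlap")


variant1 = "ACTGTATACGTACGATGCTATGCATCGATTCGTAC"
variant2 = "ACTGTATACGTACGGTGCTATGCATCGATTCGTAC"
variant3 = "CATCGATTCGTACGTGCCTCGAGTTTCTG"
variant4 = "CATCGATTCGTACGTGCCTCGAGCCTCTG"

left = consensus([variant1, variant2])
right = consensus([variant3, variant4])
merged = merge_on_overlap(left, right)
print(merged)
```

The merged consensus records that positions 'N' and 'NN' vary, but not which base at 'N' occurs on the same genome as which pair at 'NN'. That lost linkage is the information the ordered-fragment approach is intended to supply.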
[00210] Example 10: Software analysis of fluorescence data gathered from Lambda (λ) DNA
[00211] In initial studies, donor bleed into the acceptor channel frequently masked detection of acceptor signals. Thus, analysis software that extracts acceptor signals from the donor bleed was developed.
[00212] Data were processed as follows. A user-defined region of interest (ROI) in an average image of the Lambda (λ) DNA volume was segmented. Thresholding and spatial connectivity information were used to automatically segment the ROI in the average image of Lambda (λ) DNA in the donor channel. Specifically, automatic thresholding followed by largest connected component analysis was used to segment the Lambda (λ) DNA in the donor channel, as shown in Figure 18, Panels (A) and (B). Using the information about the spatial extent of the ROI in the donor channel, the Lambda (λ) DNA in the acceptor channel was segmented in a similar way, as shown in Figure 18, Panels (C) and (D).
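The segmentation step described above (thresholding followed by largest connected component analysis) can be sketched in plain Python. The mean-intensity threshold, the 4-connectivity, the function name and the toy image are assumptions; the text does not specify which automatic thresholding method or connectivity was used.

```python
from collections import deque


def largest_component_mask(image, threshold=None):
    """Boolean mask of the largest 4-connected region brighter than threshold.

    image is a list of equal-length rows of pixel intensities. When no
    threshold is given, the mean image intensity is used (an assumed stand-in
    for the unspecified automatic thresholding method).
    """
    rows, cols = len(image), len(image[0])
    if threshold is None:
        threshold = sum(map(sum, image)) / (rows * cols)
    foreground = [[value > threshold for value in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not foreground[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and foreground[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    mask = [[False] * cols for _ in range(rows)]
    for y, x in best:
        mask[y][x] = True
    return mask


# Toy average image: a bright strand plus one isolated bright pixel.
image = [
    [10, 10, 10, 10, 10, 10],
    [10, 90, 95, 92, 10, 10],
    [10, 10, 10, 88, 10, 80],
    [10, 10, 10, 10, 10, 10],
]
mask = largest_component_mask(image)
```

Keeping only the largest component discards isolated bright pixels, such as the one at row 2, column 5 in the toy image, so that the mask follows the extended DNA strand.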
[00213] Next, the segmented ROIs of both channels were registered using standard image registration techniques, as shown in Figure 19.
[00214] Following registration of the donor and acceptor channels, signals were extracted at every corresponding spatial location in the donor and acceptor channel ROI, and the normalized intensity of every spatially corresponding point in both channels was compared. A criterion was defined as a function of donor intensity and acceptor intensity at a particular point to determine the eligibility of that spatial coordinate as an incorporated label in the Lambda (λ) DNA. Notably, only at those points was a higher intensity observed in the acceptor channel than in the donor channel. Figure 20 shows co-localization of the detected points in the acceptor channel via Argon 488nm or Red HeNe excitation, confirming the accuracy of the automated analysis to the level of pixel registration.
[00215] All references cited herein are incorporated by reference in their entirety.
[00216] All of the compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, these embodiments are in no way intended to limit the scope of the claims, and it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the methods described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.

Claims

1. A method for sequencing at least a portion of a nucleic acid molecule in real-time or near real-time, comprising the steps of: a. displaying a nucleic acid molecule; b. manipulating the nucleic acid molecule to form one or more polymerase-accessible priming sites along the length of the nucleic acid molecule, wherein the one or more priming sites are separated from each other by a length of nucleotides sufficient to permit independent detection and resolution of sequencing activity occurring at each priming site by a detection system; c. contacting at least a portion of the nucleic acid molecule with a polymerase solution and one or more detectably labeled components under such conditions that extension occurs from at least one priming site; d. monitoring signals emitted during the extension reaction by at least one detectably labeled component; and e. analyzing the signals in real or near real time to determine the sequence of at least a portion of the nucleic acid molecule.
2. The method of claim 1, wherein at least one of the detectably labeled components comprises a FRET donor, a FRET acceptor, or both.
3. The method of claim 1, wherein at least one of the detectably labeled components is a polymerase operably linked to a FRET donor.
4. The method of claim 1, wherein, at least one of the detectably labeled components is an intercalating dye.
5. The method of claim 1, wherein at least one detectably labeled component is a nucleotide operably linked to a FRET acceptor.
6. The method of claim 1, wherein the FRET acceptor is attached to a portion of the nucleotide that is released upon incorporation of the nucleotide into a nascent nucleotide strand that is synthesized by the polymerase.
7. The method of claim 1, wherein displaying the single nucleic acid molecule comprises immobilizing the nucleic acid molecule by attachment to a substrate comprising a surface having layer formulated to immobilize a polynucleotide strand or a plurality of polynucleotide strands in an elongated form.
8. The method of claim 1, wherein displaying the single nucleic acid molecule comprises introducing the molecule into a nanostructure adapted to receive and display the molecule.
9. The method of claim 1, wherein manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises annealing one or more oligonucleotide primers along the length of the nucleic acid molecule.
10. The method of claim 1, wherein manipulating the nucleic acid molecule to form a plurality of polymerase-accessible priming sites further comprises contacting the nucleic acid molecule with a nicking reagent adapted to form a plurality of polymerase-accessible nick sites along the length of the nucleic acid molecule.
11. The method of claim 1, further comprising sequencing one or more additional nucleotide strands according to the method of claim 1 in parallel with sequencing a first nucleotide strand according to the method of claim 1.
12. A method for sequencing at least a portion of a nucleic acid molecule in real time or near real time, comprising the steps of: a. immobilizing a nucleic acid molecule on a substrate; b. nicking the immobilized nucleic acid molecule to form one or more polymerase-accessible nick sites along the length of the strand; c. adding an intercalating dye and a polymerase solution, wherein the polymerase solution further comprises a polymerase and one or more detectably labeled nucleotides, under conditions such that an extension reaction is initiated at one or more polymerase-accessible nick sites along the length of the immobilized nucleic acid molecule; d. monitoring signals emitted during the extension reaction at one or more polymerase-accessible nick sites; and e. analyzing the signals in real or near real time to determine the sequence of at least some portion of the nucleic acid molecule.
13. The method of claim 1 or claim 12, wherein the signals emitted during the extension reaction are signals resulting from Förster resonance energy transfer (FRET) from a FRET donor to the FRET acceptor.
14. The method of claim 1 or claim 12, wherein the signals emitted during the extension reaction are FRET signals resulting from energy transfer between at least one intercalated dye molecule and at least one nucleotide labeled with a FRET acceptor.
15. A system for sequencing a nucleotide strand according to the methods of claim 1 or claim 12, comprising: a. a reaction chamber comprising a substrate on which at least one polynucleotide strand can be immobilized and nicked; b. an optical monitoring subsystem capable of detecting signals from extension activity occurring at the nick sites along the at least one polynucleotide strand; and c. an analyzing subsystem that converts the signals detected from extension activity into sequence information and then maps sequence fragments along the length of the at least one polynucleotide strand in such a manner that ordered sequence fragment information is obtained for nucleic acid identification and classification.
PCT/US2008/080843 2007-10-22 2008-10-22 A method and system for obtaining ordered, segmented sequence fragments along a nucleic acid molecule WO2009055508A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2010531213A JP2011505119A (en) 2007-10-22 2008-10-22 Method and system for obtaining ordered and fragmented sequence fragments along a nucleic acid molecule
EP08843207A EP2203568A1 (en) 2007-10-22 2008-10-22 A method and system for obtaining ordered, segmented sequence fragments along a nucleic acid molecule

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US98180307P 2007-10-22 2007-10-22
US60/981,803 2007-10-22

Publications (1)

Publication Number Publication Date
WO2009055508A1 (en) 2009-04-30

Family

ID=40260414

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/080843 WO2009055508A1 (en) 2007-10-22 2008-10-22 A method and system for obtaining ordered, segmented sequence fragments along a nucleic acid molecule

Country Status (3)

Country Link
EP (1) EP2203568A1 (en)
JP (1) JP2011505119A (en)
WO (1) WO2009055508A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6083731B2 (en) * 2012-09-11 2017-02-22 国立大学法人埼玉大学 FRET type bioprobe and FRET measurement method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002004680A2 (en) * 2000-07-07 2002-01-17 Visigen Biotechnologies, Inc. Real-time sequence determination
US20030044781A1 (en) * 1999-05-19 2003-03-06 Jonas Korlach Method for sequencing nucleic acid molecules
US20040197843A1 (en) * 2001-07-25 2004-10-07 Chou Stephen Y. Nanochannel arrays and their preparation and use for high throughput macromolecular analysis
WO2005040425A2 (en) * 2003-10-20 2005-05-06 Isis Innovation Ltd Nucleic acid sequencing methods
WO2007014397A2 (en) * 2005-07-28 2007-02-01 Helicos Biosciences Corporation Consecutive base single molecule sequencing
US20070202521A1 (en) * 2006-02-14 2007-08-30 Applera Corporation Single Molecule DNA Sequencing Using Fret Based Dynamic Labeling

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030044781A1 (en) * 1999-05-19 2003-03-06 Jonas Korlach Method for sequencing nucleic acid molecules
US20060078937A1 (en) * 1999-05-19 2006-04-13 Jonas Korlach Sequencing nucleic acid using tagged polymerase and/or tagged nucleotide
WO2002004680A2 (en) * 2000-07-07 2002-01-17 Visigen Biotechnologies, Inc. Real-time sequence determination
US20040197843A1 (en) * 2001-07-25 2004-10-07 Chou Stephen Y. Nanochannel arrays and their preparation and use for high throughput macromolecular analysis
WO2005040425A2 (en) * 2003-10-20 2005-05-06 Isis Innovation Ltd Nucleic acid sequencing methods
WO2007014397A2 (en) * 2005-07-28 2007-02-01 Helicos Biosciences Corporation Consecutive base single molecule sequencing
US20070202521A1 (en) * 2006-02-14 2007-08-30 Applera Corporation Single Molecule DNA Sequencing Using Fret Based Dynamic Labeling

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BRASLAVSKY IDO ET AL: "Sequence information can be obtained from single DNA molecules", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE, WASHINGTON, DC.; US, vol. 100, no. 7, 1 April 2003 (2003-04-01), pages 3960 - 3964, XP002341053, ISSN: 0027-8424 *
CHAN TING-FUNG ET AL: "A simple DNA stretching method for fluorescence imaging of single DNA molecules.", NUCLEIC ACIDS RESEARCH 2006, vol. 34, no. 17, 2006, pages e113, XP002512223, ISSN: 1362-4962 *
XIAO MING ET AL: "Rapid DNA mapping by fluorescent single molecule detection.", NUCLEIC ACIDS RESEARCH 2007, vol. 35, no. 3, 14 December 2006 (2006-12-14), pages e16, XP002512222, ISSN: 1362-4962 *

Also Published As

Publication number Publication date
JP2011505119A (en) 2011-02-24
EP2203568A1 (en) 2010-07-07

Similar Documents

Publication Publication Date Title
US9587275B2 (en) Single molecule sequencing with two distinct chemistry steps
US9200320B2 (en) Real-time sequencing methods and systems
CN100462433C (en) Real-time sequence determination
US20110281740A1 (en) Methods for Real Time Single Molecule Sequencing
US20110165652A1 (en) Compositions, methods and systems for single molecule sequencing
US10954551B2 (en) Devices, systems, and methods for single molecule, real-time nucleic acid sequencing
US20070031875A1 (en) Signal pattern compositions and methods
US20160115473A1 (en) Multifunctional oligonucleotides
US20070196832A1 (en) Methods for mutation detection
WO2009055508A1 (en) A method and system for obtaining ordered, segmented sequence fragments along a nucleic acid molecule
US20230080657A1 (en) Methods for nucleic acid sequencing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08843207; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2010531213; Country of ref document: JP)
WWE Wipo information: entry into national phase (Ref document number: 2008843207; Country of ref document: EP)
NENP Non-entry into the national phase (Ref country code: DE)