CA2643162A1 - Polymer identification, compositional analysis and sequencing, based on property comparison - Google Patents



Publication number
CA2643162A1
Authority
CA
Canada
Prior art keywords
polymer
data structure
polymers
properties
chemical
Prior art date
Legal status
Granted
Application number
CA002643162A
Other languages
French (fr)
Other versions
CA2643162C (en)
Inventor
Ganesh Venkataraman
Zachary Shriver
Rahul Raman
Ram Sasisekharan
Nishla Keiser
Current Assignee
Massachusetts Institute of Technology
Original Assignee
Massachusetts Institute Of Technology
Ganesh Venkataraman
Zachary Shriver
Rahul Raman
Ram Sasisekharan
Nishla Keiser
Priority date
Filing date
Publication date
Application filed by Massachusetts Institute Of Technology, Ganesh Venkataraman, Zachary Shriver, Rahul Raman, Ram Sasisekharan, Nishla Keiser
Publication of CA2643162A1
Application granted
Publication of CA2643162C
Anticipated expiration
Expired - Lifetime


Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K1/00 General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures
    • G16B50/40 Encryption of genetic data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y10S707/99936 Pattern matching access
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Abstract

A data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units is disclosed. The data structure includes an identifier including a plurality of fields for storing values corresponding to properties of the polymer. In one embodiment, the fields are capable of storing binary values. The polymer may, for example, be a polysaccharide and the chemical units may be saccharides. A computer-implemented method is also disclosed for determining whether properties of a query sequence of chemical units match properties of a polymer of chemical units. The query sequence is represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier including a plurality of bit fields for storing values corresponding to properties of the query sequence. The polymer is represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier including a plurality of bit fields for storing values corresponding to properties of the polymer. The invention also relates to methods of sequencing polymers such as nucleic acids, polypeptides and polysaccharides and methods for identifying a polysaccharide-protein interaction. The invention also involves a notational system referred to as Property Encoded Nomenclature.

Description

64371-431D

SYSTEM AND METHOD FOR NOTATING POLYMERS

This application is a division of Canadian Patent Application No. 2,370,539, filed April 24, 2000.

Background
Various notational systems have been used to encode classes of chemical units.
In such systems, a unique code is assigned to each chemical unit in the class.
For example, in a conventional notational system for encoding amino acids, a single letter of the alphabet is assigned to each known amino acid. A polymer of chemical units can be represented, using such a notational system, as a set of codes corresponding to the chemical units. Such notational systems have been used to encode polymers, such as proteins, in a computer-readable format. A polymer that has been represented in a computer-readable format according to such a notational system can be processed by a computer.
Conventional notational schemes for representing chemical units have represented the chemical units as characters (e.g., A, T, G, and C for nucleic acids), and have represented polymers of chemical units as sequences or sets of characters. Various operations may be performed on such a notational representation of a chemical unit or of a polymer composed of chemical units. For example, a user may search a database of chemical units for a query sequence of chemical units. The user typically provides a character-based notational representation of the sequence in the form of a sequence of characters, which is compared against the character-based notational representations of the sequences of chemical units stored in the database. Character-based searching algorithms, however, are typically slow because they compare individual characters in the query sequence against individual characters in the sequences of chemical units stored in the database. The running time of such algorithms therefore grows with the length of the query sequence, resulting in particularly poor performance for long query sequences.

Summary
Polymers may be characterized by identifying properties of the polymers and comparing those properties to reference polymers, a process referred to herein as property encoded nomenclature (PEN). In one embodiment, the properties are encoded using a binary notation system, and the comparison is accomplished by comparing the binary representations of polymers. For instance, in one aspect a sample polymer is subjected to an experimental constraint to modify the polymer, and the modified polymer is compared to a reference database of polymers to identify a population of polymers having a property that is the same as or similar to a property of the sample polymer. The method may be repeated until the population of polymers in the reference database is reduced to one and the identity of the sample polymer is known.
WO 00/65521 PCT/US00/10990
In one aspect, the invention is directed to a notational system for representing polymers of chemical units. The notational system is referred to as property encoded nomenclature (PEN). According to one embodiment of the notational system, a polymer is assigned an identifier that includes information about properties of the polymer. For example, in one embodiment, properties of a disaccharide are each assigned a binary value, and an identifier for the disaccharide includes the binary values assigned to the properties of the disaccharide. In one embodiment, the identifier is capable of being expressed as a number, such as a single hexadecimal digit. The identifier may be stored in a computer-readable medium, such as in a data unit (e.g., record or table entry) of a polymer database. Polymer identifiers may be used in a number of ways. For example, the identifiers may be used to determine whether properties of a query sequence of chemical units match properties of a polymer of chemical units. One application of such matching is to quickly search a polymer database for a particular polymer of interest or for a polymer or polymers having specified properties.
In one aspect, the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units. In another aspect, the invention is directed to a computer-implemented method for generating such a data structure. The data structure may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer. At least one field may be a non-character-based field. Each field may be capable of storing a binary value.
The identifier may be a numerical identifier, such as a number that is representable as a single-digit hexadecimal number.
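To make the idea of a single-hex-digit identifier concrete, the Python sketch below packs four yes/no properties of a heparin-like disaccharide into one integer, one bit field per property. The property names and bit positions (`DisaccharideProperty`, `encode`, `as_hex_digit`) are hypothetical choices made for illustration; they are not the bit assignment defined by PEN.

```python
from enum import IntFlag


class DisaccharideProperty(IntFlag):
    """Hypothetical bit assignments: each property occupies one bit field."""
    IDURONIC = 0b0001    # uronic acid is iduronic (bit clear = glucuronic)
    SULFATE_2O = 0b0010  # 2-O-sulfated uronic acid
    SULFATE_6O = 0b0100  # 6-O-sulfated hexosamine
    N_SULFATE = 0b1000   # N-sulfated (bit clear = N-acetylated) hexosamine


def encode(*properties: DisaccharideProperty) -> int:
    """Combine property flags into a 4-bit numerical identifier."""
    identifier = 0
    for prop in properties:
        identifier |= prop
    return int(identifier)


def as_hex_digit(identifier: int) -> str:
    """Render a 4-bit identifier as a single hexadecimal digit."""
    if not 0 <= identifier <= 0xF:
        raise ValueError("identifier does not fit in four bits")
    return format(identifier, "X")


if __name__ == "__main__":
    ident = encode(DisaccharideProperty.IDURONIC,
                   DisaccharideProperty.SULFATE_2O,
                   DisaccharideProperty.N_SULFATE)
    print(ident, as_hex_digit(ident))  # 11 B
```

Because each field holds a single bit, four properties fit in one hexadecimal digit; a scheme that tracks more properties would simply need a wider identifier.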

The polymer may be any of a variety of polymers. For example, (1) the polymer may be a polysaccharide and the chemical units may be saccharides; (2) the polymer may be a nucleic acid and the chemical units may be nucleotides; or (3) the polymer may be a polypeptide and the chemical units may be amino acids. The properties may be properties of the chemical units in the polymer. For example, the properties may include charges of chemical units in the polymer, identities of chemical units in the polymer, conformations of chemical units in the polymer, or identities of substituents of chemical units in the polymer. The properties may also be properties of the polymer that are not properties of any individual chemical unit within the polymer. Example properties include a total charge of the polymer, a total number of sulfates of the polymer, dye binding of the polymer, a mass of the polymer, compositional ratios of substituents, compositional ratios of iduronic versus glucuronic acid, enzymatic sensitivity, degree of sulfation, charge, and chirality.
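Some polymer-level properties, such as the total number of sulfates, can be computed from unit-level identifiers. The sketch below assumes, purely for illustration, 4-bit unit identifiers in which bits 1 to 3 each flag one sulfate group, and it defines degree of sulfation as sulfates per unit; `SULFATE_BITS`, `total_sulfates` and `degree_of_sulfation` are hypothetical names.

```python
# Hypothetical 4-bit unit identifiers: bit 0 flags the uronic acid epimer,
# bits 1-3 each flag the presence of one sulfate group.
SULFATE_BITS = 0b1110


def total_sulfates(unit_identifiers: list[int]) -> int:
    """Count sulfate groups across a polymer by counting set sulfate bits."""
    return sum(bin(ident & SULFATE_BITS).count("1") for ident in unit_identifiers)


def degree_of_sulfation(unit_identifiers: list[int]) -> float:
    """Average number of sulfates per unit (0.0 for an empty polymer)."""
    if not unit_identifiers:
        return 0.0
    return total_sulfates(unit_identifiers) / len(unit_identifiers)


if __name__ == "__main__":
    tetrasaccharide = [0xB, 0x9]  # two disaccharide units, hypothetical codes
    print(total_sulfates(tetrasaccharide))  # 3
```

Here 0xB contributes two sulfate bits and 0x9 contributes one, so the polymer-level value is derived without storing it separately.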
In another aspect, the invention is directed to a computer-implemented method for determining whether properties of a query sequence of chemical units match properties of a polymer of chemical units. The query sequence may be represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the query sequence. The polymer may be represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the polymer. The method may include acts of generating at least one mask based on the values stored in the one or more bit fields of the first data structure, performing at least one binary operation on the values stored in the one or more bit fields of the second data structure using the at least one mask to generate at least one result, and determining whether the properties of the query sequence match the properties of the polymer based on the at least one result. The chemical units may, for example, be any of the chemical units described above. Similarly, the properties may be any of the properties described above.
In one embodiment, the act of generating includes an act of generating the at least one mask as a sequence of bits that is equivalent to the values stored in the one or more bit fields of the first data structure. In another embodiment, the act of generating includes an act of generating the at least one mask as a sequential repetition of the values stored in the one or more bit fields of the first data structure.
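The two mask-generation embodiments can be sketched as follows, assuming (hypothetically) that each chemical unit is described by a 4-bit identifier and that a polymer identifier is the concatenation of its unit identifiers. `equivalent_mask` copies the query bits unchanged; `repeated_mask` repeats them once per unit position so that a single mask lines up with every unit of the polymer. Both names are illustrative.

```python
UNIT_BITS = 4  # hypothetical width of one chemical unit's bit fields


def equivalent_mask(query_bits: int) -> int:
    """Mask whose bit sequence is identical to the query's stored values."""
    return query_bits


def repeated_mask(query_bits: int, repetitions: int,
                  width: int = UNIT_BITS) -> int:
    """Mask formed by sequentially repeating the query's bit values."""
    if query_bits >= 1 << width:
        raise ValueError("query bits do not fit in one unit field")
    mask = 0
    for _ in range(repetitions):
        mask = (mask << width) | query_bits  # shift left, append one copy
    return mask


if __name__ == "__main__":
    print(format(repeated_mask(0b1010, 3), "012b"))  # 101010101010
```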
In a further embodiment, the at least one mask includes a plurality of masks, and the act of performing at least one binary operation includes acts of performing a logical AND operation on the values stored in the one or more bit fields of the second data structure using each of the plurality of masks to generate a plurality of intermediate results, and combining the plurality of intermediate results using at least one logical OR operation to generate the at least one result. In one embodiment, the act of determining includes an act of determining that the properties of the query sequence match the properties of the polymer when the at least one result has a non-zero value.
In a further embodiment, the at least one binary operation includes at least one logical AND operation.
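The plurality-of-masks embodiment (AND with each mask, OR the intermediate results, report a match on a non-zero result) might look like the following sketch. It assumes the same hypothetical layout of 4-bit unit identifiers concatenated into one integer; `unit_masks` and `matches` are illustrative names. With the non-zero rule, a match means some query bit is present in some unit; a stricter variant would instead require `(polymer_bits & mask) == mask` for at least one mask.

```python
from functools import reduce
from operator import or_

UNIT_BITS = 4  # hypothetical width of one unit identifier


def unit_masks(query_bits: int, n_units: int, width: int = UNIT_BITS) -> list[int]:
    """One mask per unit position: the query bits shifted into that position."""
    return [query_bits << (width * i) for i in range(n_units)]


def matches(query_bits: int, polymer_bits: int, n_units: int) -> bool:
    """AND the polymer with each mask, OR the intermediate results together,
    and report a match when the combined result is non-zero."""
    intermediates = [polymer_bits & m for m in unit_masks(query_bits, n_units)]
    combined = reduce(or_, intermediates, 0)
    return combined != 0


if __name__ == "__main__":
    polymer = 0x9B  # two 4-bit units, 0xB (low) and 0x9 (high), hypothetical
    print(matches(0b0100, polymer, 2))  # False: no unit has bit 2 set
    print(matches(0b0010, polymer, 2))  # True: unit 0xB has bit 1 set
```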
In another aspect, the invention is directed to a database, tangibly embodied in a computer-readable medium, for storing information descriptive of one or more polymers.
The database may include one or more data units (e.g., records or table entries) corresponding to the one or more polymers; each of the data units may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer.
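A database of such data units can be queried for polymers having specified properties with one bitwise test per record. In this sketch the `PolymerRecord` type, the `find_with_properties` helper and the "every required bit is set" criterion are assumptions made for illustration, and the identifiers are arbitrary example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolymerRecord:
    """One data unit of a hypothetical polymer database."""
    name: str
    identifier: int  # packed property bit fields


def find_with_properties(records: list[PolymerRecord], required: int) -> list[str]:
    """Return names of records whose identifier has every `required` bit set."""
    return [r.name for r in records if (r.identifier & required) == required]


if __name__ == "__main__":
    db = [PolymerRecord("p1", 0b1011), PolymerRecord("p2", 0b0001),
          PolymerRecord("p3", 0b1111)]
    print(find_with_properties(db, 0b1010))  # ['p1', 'p3']
```

Each comparison is a single integer AND regardless of how many properties the query specifies, in contrast to character-by-character matching.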
In another embodiment, the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a chemical unit of a polymer.
The data structure may comprise an identifier including one or more fields.
Each field may be for storing a value corresponding to one or more properties of the chemical unit. At least one field may store a non-character-based value such as, for example, a binary or decimal value.
In a system including a database of properties of polymers of chemical units, a method for determining the composition of a sample polymer of chemical units having a known molecular weight and length is provided according to one aspect of the invention.
The method includes the steps of:
(A) selecting, from the database, candidate polymers of chemical units having the same length as the sample polymer of chemical units and having molecular weights similar to the molecular weight of the sample polymer of chemical units;
(B) performing an experiment on the sample polymer of chemical units;
(C) measuring properties of the sample polymer of chemical units resulting from the experiment; and
(D) eliminating, from the candidate polymers of chemical units, polymers of chemical units having properties that do not correspond to the experimental results.
In some embodiments, the method also includes the step of (E) repeatedly performing step (D) until the number of candidate polymers of chemical units falls below a predetermined threshold.
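Steps (A) to (E) amount to a filter-and-eliminate loop. The sketch below assumes each experiment is summarized as a pair: a function predicting the measured property for a candidate, and the value actually measured on the sample. The `Candidate` type, the function names, the example data and the 1 Da default tolerance (chosen to echo the approximately one-dalton MALDI-MS accuracy described in this document) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    """A database entry: a hypothetical sequence and its molecular weight."""
    sequence: str
    mass: float  # Da


# (predicted-property function, value measured on the sample)
Experiment = tuple[Callable[[Candidate], object], object]


def select_candidates(database: list[Candidate], length: int, mass: float,
                      tolerance: float = 1.0) -> list[Candidate]:
    """Step (A): same length as the sample, molecular weight within tolerance."""
    return [c for c in database
            if len(c.sequence) == length and abs(c.mass - mass) <= tolerance]


def narrow(candidates: list[Candidate], experiments: list[Experiment],
           threshold: int) -> list[Candidate]:
    """Steps (B)-(E): eliminate candidates whose predicted property differs
    from the measured one, stopping once fewer than `threshold` remain."""
    for predict, measured in experiments:
        candidates = [c for c in candidates if predict(c) == measured]
        if len(candidates) < threshold:
            break
    return candidates


if __name__ == "__main__":
    db = [Candidate("ABA", 400.0), Candidate("AAB", 400.0),
          Candidate("BAA", 400.5), Candidate("ABB", 450.0),
          Candidate("AB", 250.0)]
    pool = select_candidates(db, length=3, mass=400.2)
    experiments = [(lambda c: c.sequence[0], "A"),   # e.g. identity of first unit
                   (lambda c: c.sequence[-1], "B")]  # e.g. identity of last unit
    print([c.sequence for c in narrow(pool, experiments, threshold=2)])  # ['AAB']
```

With a threshold of two, the loop stops as soon as a single candidate remains, which corresponds to identifying the sequence.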
In other aspects, the invention is a method for identifying a population of polymers of chemical units having the same property as a sample polymer of chemical units. The method includes the steps of determining a property of a sample polymer of chemical units, and comparing the property of the sample polymer to a reference database of polymers of known sequence and known properties to identify a population of polymers of chemical units having the same property as the sample polymer of chemical units, wherein the reference database of polymers includes identifiers corresponding to the chemical units of the polymers, each of the identifiers including a field storing a value corresponding to the property.
In one embodiment, the step of determining a property of the sample polymer involves the use of mass spectrometry, such as matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS), or collision-activated dissociation mass spectrometry (CAD), to determine the molecular weight of the polymer. MALDI-MS, for instance, may be used to determine the molecular weight of the polymer with an accuracy of approximately one dalton.
In other embodiments, the step of identifying a property of the polymer may involve reducing the polymer into fragments several units in length that can be detected by strong ion exchange chromatography. The fragments of the polymer may then be compared to the reference database polymers.
According to other aspects, the invention is a method for identifying a subpopulation of polymers having a property in common with a sample polymer of chemical units. The method involves the steps of applying an experimental constraint to the polymer to modify the polymer, detecting a property of the modified polymer, identifying a population of polymers of chemical units having the same molecular length as the sample polymer, and identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer. The steps may be repeated on the modified polymer to identify a second subpopulation within the subpopulation of polymers having a second property in common with the twice-modified polymer. Each of the steps may then be repeated until the number of polymers within the subpopulation falls below a predetermined threshold.
The method may be performed to identify the sequence of the polymer. In this case the predetermined threshold of polymers within the subpopulation is two polymers.
In yet another aspect, the invention is a method for identifying a subpopulation of polymers having a property in common with a sample polymer of chemical units.
The method involves the steps of applying an experimental constraint to the polymer to modify the polymer, detecting a first property of the modified polymer, identifying a population of polymers of chemical units having a second property in common with the sample polymer, and identifying a subpopulation of the identified population of polymers having the same first property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer.
In one embodiment, the experimental constraints applied to the polymer are different for each repetition. The experimental constraint may be any manipulation that alters the polymer in such a manner that it is possible to derive structural information about the polymer or a unit of the polymer. In some embodiments, the experimental constraint applied to the polymer may be any one or more of the following constraints: enzymatic digestion, e.g., with an exoenzyme, an endoenzyme, or a restriction endonuclease; chemical digestion; chemical modification; interaction with a binding compound; chemical peeling (i.e., removal of a monosaccharide unit); and enzymatic modification, for instance sulfation at a particular position with a heparan sulfate sulfotransferase.
The property of the polymer that is detected by the method of the invention may be any structural property of a polymer or unit. For instance, the property of the polymer may be the molecular weight or length of the polymer. In other embodiments, the property may be the compositional ratios of substituents or units, type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation (e.g., position of helices), spatial distribution of substituents, ratio of one set of modifications to another set of modifications (e.g., relative amounts of 2-O sulfation to N-sulfation, or ratio of iduronic acid to glucuronic acid), and binding sites for proteins.

The properties of the modified polymer may be detected in any suitable manner, depending on the property and polymer being analyzed. In one embodiment, the step of detection involves mass spectrometry, such as matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS), or collision-activated dissociation mass spectrometry (CAD). Alternatively, the step of detection involves strong ion exchange chromatography, for example, if the polymer has been digested into several smaller fragments composed of several units each.
The method is based on a comparison of the sample polymer with a population of polymers of the same length or having at least one property in common. In some embodiments the population of polymers of chemical units includes every polymer sequence having the molecular weight of the sample polymer. In other embodiments the population of polymers of chemical units includes less than every polymer sequence having the molecular weight of the sample polymer. According to some embodiments the step of identifying includes selecting the population of polymers of chemical units from a database including molecular weights of polymers of chemical units.
Preferably the database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
According to another aspect of the invention, a method for compositional analysis of a sample polymer is provided. The method includes the steps of applying an experimental constraint to the sample polymer to modify the sample polymer, detecting a property of the modified sample polymer, and comparing the modified sample polymer to a reference database of polymers of the same size as the sample polymer, wherein the polymers of the reference database have also been subjected to the same experimental constraint as the sample polymer, and wherein the comparison provides a compositional analysis of the sample polymer.
In some embodiments, the compositional analysis reveals the number and type of units within the polymer. In other embodiments, the compositional analysis reveals the identity of a sequence of chemical units of the polymer.
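When the constraint is complete degradation into individual units, only the composition of the polymer survives, so a comparison against a same-size reference database can reveal the number and type of units but not their order. The sketch assumes single-letter unit codes (here `I` and `G`, standing hypothetically for iduronic and glucuronic acid residues) and models the digest with `collections.Counter`; `complete_digest` and `compositional_matches` are illustrative names.

```python
from collections import Counter


def complete_digest(sequence: str) -> Counter:
    """Hypothetical constraint: degrade a polymer completely into its units,
    leaving only the composition (unit code -> count)."""
    return Counter(sequence)


def compositional_matches(observed: Counter, reference: list[str]) -> list[str]:
    """Reference polymers of the sample's size whose digest equals `observed`."""
    size = sum(observed.values())
    return [seq for seq in reference
            if len(seq) == size and complete_digest(seq) == observed]


if __name__ == "__main__":
    observed = Counter({"I": 2, "G": 1})  # hypothetical measured composition
    reference = ["IIG", "IGI", "GGI", "IGG", "IIGG"]
    print(compositional_matches(observed, reference))  # ['IIG', 'IGI']
```

Both `IIG` and `IGI` survive, which illustrates why complete degradation yields composition but further, partial constraints are needed to resolve sequence.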
As in the aspects of the invention described above, the properties of the polymer may be detected in any suitable manner, depending on the particular property and polymer being analyzed. In one embodiment, the step of detection involves mass spectrometry, such as matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS), or collision-activated dissociation mass spectrometry (CAD). Preferably, the experimental constraint applied to the polymer is an enzymatic or chemical reaction that involves incomplete enzymatic digestion of the polymer, and the steps of the method are repeated until the number of polymers within the reference database falls below a predetermined threshold. Alternatively, the step of detection involves capillary electrophoresis, particularly when the experimental constraint applied to the polymer involves complete degradation of the polymer into individual chemical units.
In one embodiment, the reference database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
According to yet another aspect of the invention, a method for sequencing a polymer is provided. The method includes the steps of applying an experimental constraint to the polymer to modify the polymer; detecting a property of the modified polymer; identifying a population of polymers having the same molecular length as the sample polymer and having molecular weights similar to the molecular weight of the sample polymer; identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer; and repeating the steps of applying an experimental constraint, detecting a property, and identifying a subpopulation, by applying additional experimental constraints to the polymer and identifying additional subpopulations of polymers, until the number of polymers within the subpopulation is one and the sequence of the polymer can be identified.
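The sequencing loop can be illustrated end to end on a toy problem. The sketch enumerates every sequence of the given length over a small alphabet, keeps those matching the observed molecular weight, and then applies a hypothetical exoenzyme constraint (removing `k` units from the right-hand end of the sequence string) to eliminate candidates until one remains. The unit masses, the 0.5 Da tolerance and the function names are assumptions for illustration, not values from the source.

```python
from itertools import product

# Hypothetical unit masses (Da); distinct values keep the toy problem solvable.
UNIT_MASS = {"A": 100.0, "B": 150.0, "C": 210.0}


def mass(seq: str) -> float:
    """Molecular weight of a sequence as the sum of its unit masses."""
    return sum(UNIT_MASS[unit] for unit in seq)


def exo_digest(seq: str, k: int) -> str:
    """Hypothetical constraint: remove k units from the right-hand end."""
    return seq[: max(len(seq) - k, 0)]


def sequence_polymer(length: int, observed_mass: float,
                     observed_digests: dict[int, float],
                     tol: float = 0.5) -> list[str]:
    """Narrow all sequences of `length` units down using mass constraints."""
    # Population: every sequence of this length with the observed mass.
    population = [
        "".join(units)
        for units in product(UNIT_MASS, repeat=length)
        if abs(mass("".join(units)) - observed_mass) <= tol
    ]
    # Apply each additional constraint, eliminating non-matching candidates.
    for k, digest_mass in observed_digests.items():
        if len(population) <= 1:
            break
        population = [
            seq for seq in population
            if abs(mass(exo_digest(seq, k)) - digest_mass) <= tol
        ]
    return population


if __name__ == "__main__":
    true_sequence = "BAC"  # stands in for the unknown sample
    digests = {k: mass(exo_digest(true_sequence, k)) for k in (1, 2)}
    print(sequence_polymer(3, mass(true_sequence), digests))  # ['BAC']
```

With these hypothetical masses, a true sequence of `BAC` (460 Da) leaves six permutations after the mass filter; the k = 1 digest narrows these to two (`ABC`, `BAC`), and the k = 2 digest leaves only `BAC`.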

In another aspect, the invention relates to a method for identifying a polysaccharide-protein interaction by contacting a protein-coated MALDI surface with a polysaccharide-containing sample to produce a polysaccharide-protein-coated MALDI surface, removing unbound polysaccharide from the polysaccharide-protein-coated MALDI surface, and performing MALDI mass spectrometry to identify the polysaccharide that specifically interacts with the protein coated on the MALDI surface.
In one embodiment, a MALDI matrix is added to the polysaccharide-protein-coated MALDI surface. In other embodiments, an experimental constraint may be applied to the polysaccharide bound on the polysaccharide-protein-coated MALDI surface before performing the MALDI mass spectrometry analysis. The experimental constraint applied to the polymer in some embodiments is digestion with an exoenzyme or digestion with an endoenzyme. In other embodiments, the experimental constraint applied to the polymer is selected from the group consisting of restriction endonuclease digestion, chemical digestion, chemical modification, and enzymatic modification.

Brief Description of the Drawings
FIG. 1 is a block diagram illustrating an example of a computer system for storing and manipulating polymer information.
FIG. 2A is a diagram illustrating an example of a record for storing information about a polymer and its constituent chemical units.
FIG. 2B is a diagram illustrating an example of a record for storing information about a polymer.
FIGS. 2C and 2D are diagrams illustrating examples of a record for storing information about constituent chemical units of a polymer.
FIG. 3 is a flow chart illustrating an example of a method for determining whether properties of a first polymer of chemical units match properties of a second polymer of chemical units.
FIG. 4 is a dataflow diagram of a system for sequencing a polymer.
FIG. 5 is a flow chart of a process for sequencing a polymer.
FIG. 6 is a flow chart of a process for sequencing a polymer using a genetic algorithm.
FIG. 7A-D is a set of diagrams depicting notation schemes for branched chain analysis.
FIG. 8 is a mass line diagram.
FIG. 9 is a mass-line diagram for (A) polysialic acid with NAN and (B) polysialic acid with NGN.
FIG. 10 is a graph (A) depicting cleavage by Hep III of either G(=), 1(0) or 12s(*) linkages, and a graph (B) depicting the same study as in (A) but where cleavage was performed with Hep I.
FIG. 11 is a graph depicting MALDI-MS analysis of the extended core structures derived from enzymatic treatment of a mixture of bi- and triantennary structures.
FIG. 12 is a graph depicting MALDI-MS analysis of the PSA polysaccharide. (A) Intact polysaccharide structure. (B) Treatment of [A] with sialidase from A. ureafaciens. (C) Digest of [B] with galactosidase from S. pneumoniae. (D) Digest of [C] with N-acetylhexosaminidase from S. pneumoniae. (E) Table of the analysis scheme with schematic structure and theoretical molecular masses. [O] = mannose; [*] = fucose; [IM] = N-acetylglucosamine; [l]] = galactose; and [,&] = N-acetylneuraminic acid. Peaks marked with an asterisk are impurities, and the analyte peak is detected both as M-H (m/z 2369.5) and as a monosodiated adduct (M+Na-2H, m/z 2392.6).
FIG. 13 is a graph depicting the results of enzymatic degradation of the saccharide chain directly off of PSA. (A) PSA before the addition of exoenzymes. (B) Treatment of (A) with sialidase results in a mass decrease of 287 Da, consistent with the loss of one sialic acid residue. (C) Treatment of (B) with galactosidase. (D) Upon digestion of (C) with hexosaminidase, a decrease of 393 Da indicates the loss of two N-acetylglucosamine residues.
FIG. 14 is a graph depicting the results of treatment of biantennary and triantennary saccharides with endoglycosidase F2. (A) Treatment of the biantennary saccharide results in a mass decrease of 348.6, indicating cleavage between the GlcNAc residues. (B) Treatment of the triantennary saccharide with the same substituents results in no cleavage, showing that EndoF2 primarily cleaves biantennary structures.
(C) EndoF2 treatment of heat-denatured PSA. There is a mass reduction of 1709.7 Da in the molecular mass of PSA (compare B4C and B3a), indicating that the normal glycan structure of PSA is biantennary.

Detailed Description
The invention relates in some aspects to methods for characterizing polymers to identify structural properties of the polymers, such as the charge, the nature and number of units of the polymer, the nature and number of chemical substituents on the units, and the stereospecificity of the polymer. The structural properties of polymers may provide useful information about the function of the polymer. For instance, the properties of the polymer may reveal the entire sequence of units of the polymer, which is useful for identifying the polymer. Similarly, if the sequence of the polymer was previously unknown, the structural properties of the polymer are useful for comparing the polymer to known polymers having known functions. The properties of the polymer may also reveal that a polymer has a net charge or has regions that are charged. This information is useful for identifying compounds with which the polymer may interact, or for predicting which regions of a polymer may be involved in a binding interaction or have a specific function.
Many methods have been described in the prior art for identifying polymers, and in particular for identifying the sequence of units of polymers. Once the sequence of a polymer is identified, the sequence information is stored in a database and may be used to compare the polymer with other sequenced polymers. Databases such as GENBANK enable the storage and retrieval of information relating to the sequences of nucleic acids which have been identified by researchers all over the world. These databases typically store information using notational systems that encode classes of chemical units by assigning a unique code to each chemical unit in the class. For example, a conventional notational system for encoding amino acids assigns a single letter of the alphabet to each known amino acid. Such databases represent a polymer of chemical units using a set of codes corresponding to the chemical units. Searches of such databases have typically been performed using character-based comparison algorithms.
New methods for identifying structural properties of polymers which can utilize Bioinformatics and which differ from the prior art methods of assigning a character to each unit of a polymer have been discovered. These methods are referred to as PEN
(property encoded nomenclature). In one aspect, the invention is based on the identification and characterization of properties of a polymer, rather than units of the polymer, and the use of numeric identifiers to classify those properties and to facilitate information processing relating to the polymer.


The ability to identify properties of polymers and to manipulate the information concerning the properties of the polymer provides many advantages over prior art methods of characterizing polymers and of bioinformatics. For instance, the methods of the invention may be used to identify structural information about, and to analyze, complex polymers such as polysaccharides, which were previously very difficult to analyze using prior art methods.
The heterogeneity and the high degree of variability of the polysaccharide building blocks have hindered prior art attempts to sequence these complex molecules.
With the advent of extremely sensitive techniques such as High Pressure Liquid Chromatography (HPLC), Capillary Electrophoresis (CE) and Mass Spectrometry (MS) to isolate and characterize large biomolecules, significant advances have been made in isolating and purifying polysaccharide fragments containing specific sequences, but extensive experimental manipulation is still required to identify sequence information. Additionally, most of these approaches require substantial prior information about the sequence in order to design the experimental manipulations that will enable the sequencing of the polysaccharide. The methods of the invention, in contrast, provide simple and rapid methods for identifying sequence information. Many other advantages will be clear from the description of the preferred embodiments set forth below.
The present invention will be better understood in view of the following detailed description of a particular embodiment thereof, taken in conjunction with the attached drawings.
FIG. 1 shows an example of a computer system 100 for storing and manipulating polymer information. The computer system 100 includes a polymer database 102 which includes a plurality of records 104a-n storing information corresponding to a plurality of polymers. Each of the records 104a-n may store information about properties of the corresponding polymer, properties of the corresponding polymer's constituent chemical units, or both. The polymers for which information is stored in the polymer database 102 may be any kind of polymers. For example, the polymers may include polysaccharides, nucleic acids, or polypeptides.
A "polymer" as used herein is a compound having a linear and/or branched backbone of chemical units which are secured together by linkages. In some but not all cases the backbone of the polymer may be branched. The term "backbone" is given its usual meaning in the field of polymer chemistry. The polymers may be heterogeneous in backbone composition, thereby containing any possible combination of polymer units linked together, such as peptide-nucleic acids. In some embodiments the polymers are homogeneous in backbone composition and are, for example, a nucleic acid, a polypeptide, a polysaccharide, a carbohydrate, a polyurethane, a polycarbonate, a polyurea, a polyethyleneimine, a polyarylene sulfide, a polysiloxane, a polyimide, a polyacetate, a polyamide, a polyester, or a polythioester.
WO 00/65521 PCT/US00/10990
A "polysaccharide" is a biopolymer comprised of linked saccharide or sugar units. A "nucleic acid" as used herein is a biopolymer comprised of nucleotides, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). A polypeptide as used herein is a biopolymer comprised of linked amino acids.
As used herein with respect to linked units of a polymer, "linked" or "linkage" means two entities are bound to one another by any physicochemical means. Any linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced. Natural linkages, which are those ordinarily found in nature connecting the chemical units of a particular polymer, are most common. Natural linkages include, for instance, amide, ester and thioester linkages. The chemical units of a polymer analyzed by the methods of the invention may, however, be linked by synthetic or modified linkages. Polymers in which the units are linked by covalent bonds will be most common, but polymers with hydrogen-bonded or other non-covalent linkages are also included.
The polymer is made up of a plurality of chemical units. A "chemical unit" as used herein is a building block or monomer which can be linked directly or indirectly to other building blocks or monomers to form a polymer. The polymer preferably is a polymer of at least two different linked units. The particular type of unit will depend on the type of polymer. For instance, DNA is a biopolymer comprised of a deoxyribose phosphate backbone composed of units of purines and pyrimidines such as adenine, cytosine, guanine, thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine, and other naturally and non-naturally occurring nucleobases and substituted and unsubstituted aromatic moieties. RNA is a biopolymer comprised of a ribose phosphate backbone composed of units of purines and pyrimidines such as those described for DNA, but wherein uracil is substituted for thymine. DNA units may be linked to the other units of the polymer by their 5' or 3' hydroxyl group, thereby forming an ester linkage. RNA units may be linked to the other units of the polymer by their 5', 3' or 2' hydroxyl group, thereby forming an ester linkage. Alternatively, DNA or RNA units having a terminal 5', 3' or 2' amino group may be linked to the other units of the polymer by the amino group, thereby forming an amide linkage.
Whenever a nucleic acid is represented by a sequence of letters, it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes adenosine, "C" denotes cytidine, "G" denotes guanosine, "T" denotes thymidine, and "U" denotes uridine unless otherwise noted.
The chemical units of a polypeptide are amino acids, including the 20 naturally occurring amino acids as well as modified amino acids. Amino acids may exist as amides or free acids and are linked to the other units in the backbone of the polymers through their α-amino group, thereby forming an amide linkage to the polymer.
A polysaccharide is a polymer composed of monosaccharides linked to one another. In many polysaccharides the basic building block of the polysaccharide is actually a disaccharide unit which can be repeating or non-repeating. Thus, a unit when used with respect to a polysaccharide refers to a basic building block of a polysaccharide and can include a monomeric building block (monosaccharide) or a dimeric building block (disaccharide).
A "plurality of chemical units" is at least two units linked to one another.
The polymers may be native or naturally occurring polymers, which occur in nature, or non-naturally occurring polymers, which do not exist in nature. The polymers typically include at least a portion of a naturally occurring polymer. The polymers can be isolated or synthesized de novo. For example, the polymers can be isolated from natural sources (e.g., purified, as by cleavage and gel separation) or may be synthesized, e.g., (i) amplified in vitro by, for example, polymerase chain reaction (PCR); (ii) synthesized by, for example, chemical synthesis; or (iii) recombinantly produced by cloning, etc.
FIG. 2A illustrates an example of the format of a data unit 200 in the polymer database 102 (i.e., one of the records 104a-n). As shown in FIG. 2A, the data unit 200 may include a polymer identifier (ID) 202 that identifies the polymer corresponding to the data unit 200. The polymer ID 202 is described in more detail below with respect to FIG. 2B. The data unit 200 also may include one or more chemical unit identifiers (IDs) 204a-n corresponding to chemical units that are constituents of the polymer corresponding to the data unit 200. The chemical unit IDs 204a-n are described in more detail below with respect to FIG. 2C. The format of the data unit 200 shown in FIG. 2A is merely an example of a format that may be used to represent polymers in the polymer database 102. Polymers may be represented in the polymer database in other ways. For example, the data unit 200 may include only the polymer ID 202 or only one or more of the chemical unit IDs 204a-n.
FIG. 2B illustrates an example of the polymer ID 202. The polymer ID 202 may include one or more fields 202a-n for storing information about properties of the polymer corresponding to the data unit 200 (FIG. 2A). Similarly, FIG. 2C illustrates an example of the chemical unit ID 204a. The chemical unit ID 204a may include one or more fields 206a-m for storing information about properties of the chemical unit corresponding to the chemical unit ID 204a. Although the following description refers to the fields 206a-m of the chemical unit ID 204a, such description is equally applicable to the fields 202a-n of the polymer ID 202 (and to the fields of the chemical unit IDs 204b-n).
The fields 206a-m of the chemical unit ID 204a may store any kind of value that is capable of being stored in a computer readable medium, such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value.
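As a rough illustration of the record layout of FIGS. 2A-2C, the sketch below models a data unit 200 as Python dataclasses. The class and attribute names (`DataUnit`, `PolymerID`, `ChemicalUnitID`, `fields`, `polymer_database`) are hypothetical, and holding every field as a Python `int` or `float` is an assumption that simply mirrors the binary, hexadecimal, integer and floating point values listed above; the stored values are placeholders, not data from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical model of the records in polymer database 102 (FIGS. 2A-2C).
Value = Union[int, float]  # binary, hexadecimal and decimal values all fit here


@dataclass
class PolymerID:
    """Polymer ID 202: one value per polymer-level property (fields 202a-n)."""
    fields: List[Value]


@dataclass
class ChemicalUnitID:
    """Chemical unit ID 204a-n: one value per unit property (fields 206a-m)."""
    fields: List[Value]


@dataclass
class DataUnit:
    """Data unit 200: a polymer ID, chemical unit IDs, or both."""
    polymer_id: Optional[PolymerID] = None
    unit_ids: List[ChemicalUnitID] = field(default_factory=list)

    def __post_init__(self) -> None:
        # FIG. 2A allows either part to be omitted, but not both.
        if self.polymer_id is None and not self.unit_ids:
            raise ValueError("a data unit needs a polymer ID or chemical unit IDs")


# Records 104a-n of the hypothetical database; the values are placeholders.
polymer_database: List[DataUnit] = [
    DataUnit(unit_ids=[ChemicalUnitID([0, 1, 1, 0, 1])]),
    DataUnit(polymer_id=PolymerID([2.0]),
             unit_ids=[ChemicalUnitID([0, 0, 0, 0, 1])]),
]
```

Keeping the polymer-level and unit-level fields in separate classes follows the text's distinction between fields 202a-n and fields 206a-m, while both share the same numeric value type.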
Each field 206a-m may store information about any property of the corresponding chemical unit. Thus, the invention is useful for identifying properties of polymers. A "property" as used herein is a characteristic (e.g., a structural characteristic) of the polymer that provides information (e.g., structural information) about the polymer.
When the term property is used with respect to any polymer except a polysaccharide, the property provides information other than the identity of a unit of the polymer or of the polymer itself. A compilation of several properties of a polymer may provide sufficient information to identify a chemical unit or even the entire polymer, but a property of the polymer does not itself encompass the chemical identity of the chemical unit or polymer.
When the term property is used with respect to polysaccharides, to define a polysaccharide property, it has the same meaning as described above, except that, due to the complexity of the polysaccharide, a property may identify a type of monomeric building block of the polysaccharide. Chemical units of polysaccharides are much more complex than chemical units of other polymers, such as nucleic acids and polypeptides. The polysaccharide unit has more variables in addition to its basic chemical structure than other chemical units. For example, the polysaccharide may be acetylated or sulfated at several sites on the chemical unit, or it may be charged or uncharged.
Thus, one property of a polysaccharide may be the identity of one or more basic building blocks of the polysaccharide.
A basic building block alone, however, may not provide information about the charge and the nature of substituents of the saccharide or disaccharide. For example, a uronic acid building block may be iduronic or glucuronic acid. Each of these building blocks may have additional substituents that add complexity to the structure of the chemical unit. A single property may not identify such additional substituents, charges, etc., in addition to identifying a complete building block of a polysaccharide. This information, however, may be assembled from several properties.
Thus, a property of a polymer as used herein does not encompass an amino acid or nucleotide but does encompass a saccharide or disaccharide building block of a polysaccharide.
The type of property that provides information about a polymer may depend on the type of polymer being analyzed. For instance, if the polymer is a polysaccharide, properties such as charge, molecular weight, nature and degree of sulfation or acetylation, and type of saccharide may provide information about the polymer.
Properties may include, but are not limited to, charge, chirality, nature of substituents, quantity of substituents, molecular weight, molecular length, compositional ratios of substituents or units, type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation (e.g., position of helices), spatial distribution of substituents, ratio of one set of modifications to another set of modifications (e.g., relative amounts of 2-O sulfation to N-sulfation, or ratio of iduronic acid to glucuronic acid), and binding sites for proteins. Other properties may be identified by those of ordinary skill in the art. A substituent, as used herein, is an atom or group of atoms that substitutes on a unit but is not itself a unit.
A property of a polymer may be identified by any means known in the art. The procedure used to identify a property may depend on the type of property.
Molecular weight, for instance, may be determined by several methods, including mass spectrometry. The use of mass spectrometry for determining the molecular weight of polymers is well known in the art. Mass spectrometry has been used as a powerful tool to characterize polymers because of its accuracy (±1 Dalton) in reporting the masses of fragments generated (e.g., by enzymatic cleavage), and also because only pM sample concentrations are required. For example, matrix-assisted laser desorption-ionization mass spectrometry (MALDI-MS) has been described for identifying the molecular weight of polysaccharide fragments in publications such as Rhomberg, A. J. et al., PNAS USA, v. 95, p. 4176-4181 (1998); Rhomberg, A. J. et al., PNAS USA, v. 95, p. 12237 (1998); and Ernst, S. et al., PNAS USA, v. 95, p. 4182-4187 (1998).

Other types of mass spectrometry known in the art, such as electrospray MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD), can also be used to identify the molecular weight of the polymer or of polymer fragments.

Mass spectrometry data may be a valuable tool for ascertaining the sizes of polymer fragments after the polymer has undergone degradation with enzymes or chemicals. After the molecular weight of a polymer is identified, it may be compared to the molecular weights of other known polymers. Because masses obtained from the mass spectrometry data are accurate to one Dalton (1 D), the size of one or more polymer fragments obtained by enzymatic digestion may be precisely determined, and the number of substituents (i.e., sulfate and acetate groups present) may be determined.
One technique for comparing molecular weights is to generate a mass line and compare the molecular weight of the unknown polymer to the mass line to determine a subpopulation of polymers which have the same molecular weight. A "mass line" as used herein is an information database, preferably in the form of a graph or chart, which stores information for each possible type of polymer having a unique sequence based on the molecular weight of the polymer. Thus, a mass line may describe the number of polymers having a particular molecular weight. A two-unit nucleic acid molecule (i.e., a nucleic acid having two chemical units) has 16 (4²) possible polymers at a molecular weight corresponding to two nucleotides. A two-unit polysaccharide (i.e., a disaccharide) has 32 possible polymers at a molecular weight corresponding to two saccharides.
Thus, a mass line may be generated by uniquely assigning a particular mass to a particular length of a given fragment (all possible di-, tetra-, hexa- and octasaccharides, up to a hexadecasaccharide) and tabulating the results (an example is shown in FIG. 8).
Table 1 below shows an example of a computed set of values for a polysaccharide. From Table 1, the number of chemical units of a polymer may be determined from the minimum difference in mass between a fragment of length n+1 and a fragment of length n. For example, if the repeat is a disaccharide unit, a fragment of length n has 2n monosaccharide units; n=1 may correspond to the length of a disaccharide, n=2 may correspond to the length of a tetrasaccharide, etc.

Table 1
Fragment length n    Minimum difference in mass between fragments of length n+1 and n (Da)
1                    101.13
2                    13.03
3                    13.03
4                    9.01
5                    9.01
6                    4.99
7                    4.99
8                    0.97
9                    0.97

Because mass spectrometry data indicate the mass of a fragment to 1 D accuracy, a length may be assigned uniquely to a fragment by looking up its mass on the mass line.
Further, it may be determined from the mass line that, within a fragment of a particular length longer than a disaccharide, masses differ by a minimum of 4.02 D, indicating that two acetate groups (84.08 D) replaced a sulfate group (80.06 D). Therefore, the number of sulfates and acetates of a polymer fragment may be determined from its mass in the mass spectrometry data, and that number may be assigned to the polymer fragment.
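The 4.02 D spacing can be checked with a small mass line built from the Table 2 disaccharide masses. This is a sketch under stated assumptions, not the authors' procedure: it takes a fragment's mass as the sum of its Table 2 unit masses and omits any end-group correction (for example, water), which is the same for every fragment of a given length and therefore shifts the whole line without changing its spacing. The function names are hypothetical. The code enumerates every sequence of n disaccharide units, groups them by mass, and looks up the sulfate composition for an observed mass within ±1 D.

```python
from collections import Counter, defaultdict
from itertools import product

# Table 2 masses in centidaltons (0.01 Da) so that sums stay exact integers.
BASE = 37933       # I-HNAc or G-HNAc, 379.33 Da
N_SULFATE = 3802   # N-acetyl -> N-sulfo, +38.02 Da
O_SULFATE = 8006   # each 2-O, 6-O or 3-O sulfate, +80.06 Da


def unit_counts(code: int) -> tuple:
    """(N-sulfates, O-sulfates) of a 5-bit unit ordered I/G, 2X, 6X, 3X, NX."""
    return code & 0b00001, bin(code & 0b01110).count("1")


def mass_line(n_units: int) -> dict:
    """Map each summed mass to Counter{(N-sulfates, O-sulfates): n_sequences}."""
    line = defaultdict(Counter)
    for seq in product(range(32), repeat=n_units):
        n_s = sum(unit_counts(c)[0] for c in seq)
        o_s = sum(unit_counts(c)[1] for c in seq)
        mass = n_units * BASE + n_s * N_SULFATE + o_s * O_SULFATE
        line[mass][(n_s, o_s)] += 1
    return dict(sorted(line.items()))


def min_spacing(line: dict) -> int:
    """Smallest gap between neighbouring masses on the line (centidaltons)."""
    masses = list(line)
    return min(b - a for a, b in zip(masses, masses[1:]))


def assign(mass: int, line: dict, tol: int = 100) -> list:
    """Compositions within +/- tol (default 1 Da) of a mass.

    Each result is (N-sulfates, O-sulfates); the number of N-acetyl groups
    is the number of units minus the N-sulfate count.
    """
    return [comp for m, comps in line.items() if abs(m - mass) <= tol
            for comp in comps]


tetra = mass_line(2)                             # two disaccharide units
print(min_spacing(tetra) / 100)                  # 4.02
print(assign(2 * BASE + 2 * N_SULFATE, tetra))   # [(2, 0)]
```

Working in integer hundredths of a Dalton avoids floating-point rounding when many masses are summed; the ±1 D default tolerance follows the mass spectrometry accuracy quoted above and is comfortably below the 4.02 D spacing.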
In addition to molecular weight, other properties may be determined using methods known in the art. The compositional ratios of substituents or chemical units (quantity and type of total substituents or chemical units) may be determined using methodology known in the art, such as capillary electrophoresis. A polymer may be subjected to an experimental constraint such as enzymatic or chemical degradation to separate each of the chemical units of the polymer. These units then may be separated using capillary electrophoresis to determine the quantity and type of substituents or chemical units present in the polymer. Additionally, the number of substituents or chemical units can be determined using calculations based on the molecular weight of the polymer.
In the method of capillary gel electrophoresis, reaction samples may be analyzed in small-diameter, gel-filled capillaries. The small diameter of the capillaries (50 μm) allows for efficient dissipation of the heat generated during electrophoresis. Thus, high field strengths (400 V/cm) can be used without excessive Joule heating, lowering the separation time to about 20 minutes per reaction run and thereby increasing resolution over conventional gel electrophoresis. Additionally, many capillaries may be analyzed in parallel, increasing the throughput of polymer information.
In addition to being useful for identifying a property, compositional analysis also may be used to determine the presence and composition of an impurity as well as the composition of the main component of the polymer sample. Such determinations may be accomplished if the impurity does not have a composition identical to that of the polymer. Determining whether an impurity is present may involve accurately integrating the area under each peak that appears in the electropherogram and normalizing the peaks to the smallest of the major peaks. The sum of the normalized peaks should be equal to or close to one. If it is not, then one or more impurities are present. Impurities may be detected even in unknown samples if at least one of the disaccharide units of the impurity differs from every disaccharide unit of the unknown.
If an impurity is present, one or more aspects of the composition of the components may be determined using capillary electrophoresis. Because all known disaccharide units may be baseline-separated by the capillary electrophoresis method described above, and because migration times are typically determined by electrophoresis (i.e., as opposed to electroosmotic flow) and are reproducible, the various saccharide units of a polymer fragment may be reliably assigned. Consequently, both the composition of the major peak and the composition of a minor contaminant may be assigned. The composition for both the major and minor components of a solution may be assigned as described below.
One example of such assignment of compositions involves determining the composition of the major AT-III binding HLGAG decasaccharide (+DDD4-7) and its minor contaminant (+D5D4-7), present in solution in a 9:1 ratio. Complete digestion of this 9:1 mixture with heparinases yields 4 peaks: three representative of the major decasaccharide (viz., D, 4, and -7), which are also present in the contaminant, and one peak, 5, that is present only in the contaminant. In other words, the area of each peak for D, 4, and -7 represents an additive combination of a contribution from the major decasaccharide and a contribution from the contaminant, whereas the peak for 5 represents only the contaminant.
To assign the compositions of the contaminant and the major component, the area under the 5 peak may be used as a starting point. This area represents the area under the peak for one disaccharide unit of the contaminant. Subtracting this area from the total areas of 4 and -7, and subtracting twice this area from the area under D, yields a 1:1:3 ratio of 4:-7:D. Such a ratio confirms the composition of the major component and indicates that the composition of the impurity is two Ds, one 4, one -7 and one 5.
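The subtraction just described can be written as a short routine. This is a minimal sketch, not the authors' procedure: the function name `resolve_major` is hypothetical, and the peak areas in the example are invented for illustration by assuming a 9:1 mixture and an equal detector response per disaccharide unit, so that each area is proportional to the number of units contributing to that peak.

```python
from typing import Dict


def resolve_major(areas: Dict[str, float], marker: str,
                  contaminant: Dict[str, int]) -> Dict[str, float]:
    """Remove a contaminant's contribution and normalize to the smallest peak.

    areas: integrated peak areas keyed by disaccharide code.
    marker: code of the peak found only in the contaminant; its area is
        taken as the area of one contaminant disaccharide unit.
    contaminant: number of each disaccharide in one contaminant molecule.
    """
    per_unit = areas[marker]
    major = {code: area - contaminant.get(code, 0) * per_unit
             for code, area in areas.items() if code != marker}
    smallest = min(major.values())
    return {code: area / smallest for code, area in major.items()}


# Invented areas for a 9:1 mixture of DDD4-7 (major) and D5D4-7 (contaminant),
# assuming one unit of area per disaccharide per molecule.
areas = {"D": 9 * 3 + 1 * 2, "4": 9 + 1, "-7": 9 + 1, "5": 1}
print(resolve_major(areas, "5", {"D": 2, "4": 1, "-7": 1}))
# {'D': 3.0, '4': 1.0, '-7': 1.0}, i.e. 4:-7:D = 1:1:3
```

The marker peak does the work here: because only the contaminant produces it, its area calibrates how much of every shared peak belongs to the contaminant.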
Methods of identifying other types of properties will be readily apparent to those of skill in the art and may depend on the type of property and the type of polymer.
For example, hydrophobicity may be determined using reverse-phase high-pressure liquid chromatography (RP-HPLC). Enzymatic sensitivity may be identified by exposing the polymer to an enzyme and determining a number of fragments present after such exposure. The chirality may be determined using circular dichroism.
Protein binding sites may be determined by mass spectrometry, isothermal calorimetry and NMR. Enzymatic modification (not degradation) may be determined in a similar manner as enzymatic degradation, i.e., by exposing a substrate to the enzyme and using MALDI-MS to determine whether the substrate is modified. For example, a sulfotransferase may transfer a sulfate group to an HS chain, with a concomitant mass increase of 80 Da.
Conformation may be determined by modeling and nuclear magnetic resonance (NMR).
The relative amounts of sulfation may be determined by compositional analysis or approximately determined by Raman spectroscopy.
In some aspects the invention is useful for generating, searching and manipulating information about polymers. In this aspect the complete building block of a polymer is assigned a unique numeric identifier, which may be used to classify the complete building block. For instance, if a polysaccharide is being analyzed, each numeric identifier would represent a complete building block of a polysaccharide, including the exact chemical structure as defined by the basic building block of a polysaccharide and all of its substituents, charges, etc. A basic building block refers to a basic structure of the polymer unit, e.g., a basic ring structure of a polysaccharide such as iduronic acid or glucuronic acid, but does not include substituents, charges, etc. The information is generated and processed in the same manner as described above with respect to "properties" of polymers.
Currently, saccharide fragments are detected in capillary electrophoresis by monitoring at 232 nm, the wavelength at which the Δ4,5 double bond generated upon heparinase cleavage absorbs. However, other detection methods are possible.
First, nitrous acid cleavage of heparin fragments, followed by reduction with [3H]sodium borohydride, yields degraded fragments carrying a 3H radioactive tag. This tag may be followed both by capillary electrophoresis (counting radioactivity) and by mass spectrometry (by the increase in mass). Another method of using radioactivity would be to label the heparin fragment with 35S. As with 3H-labeled fragments, 35S-labeled fragments may be useful for radioactive detection (CE) or for measurement of mass differences (MS).
This detection will be especially powerful in the case of 35S, because human sulfotransferases may be used to label a specific residue, which gives additional structural information.
Nitrous acid degraded fragments, unlike heparinase-derived fragments, do not have a UV-absorbing chromophore. As we have shown, MALDI-MS will record the mass of heparin fragments regardless of how they are derived. For CE, two methods may be used to monitor fragments that lack a suitable chromophore. The first is indirect detection of fragments: we may detect heparin fragments with our CE methodology using a suitable background absorber, e.g., 1,5-naphthalenedisulfonic acid. The second method involves chelation of metal ions by saccharides. The saccharide-metal complexes may be detected using UV-Vis spectroscopy, just as the unsaturated double bond is monitored.
Other groups have begun the process of raising antibodies to specific HLGAG sequences. We have previously shown that proteins, e.g., angiogenin and FGF, may be used as the complexing agent instead of a synthetic, basic peptide. By extension, antibodies could be used as complexing agents for MALDI-MS analysis. This would enable us to determine whether specific sequences are present in an unknown sample simply by observing, using MALDI-MS, whether an antibody with a given sequence specificity complexes with the unknown.
Finally, using mass tags, we may distinguish the reducing end of a glycosaminoglycan from the non-reducing end. All of these tags involve selective chemistry with the anomeric OH (present at the reducing end of the polymer); thus, labeling occurs at the reducing end of the chain. One common tag is 2-aminobenzoic acid, which is fluorescent. In general, tagging involves chemistry of the following types: (1) reaction of amines with the anomeric position to form imines (e.g., 2-aminobenzoic acid), (2) reaction of hydrazines to form hydrazones, and (3) reaction of semicarbazides with the anomeric OH to form semicarbazones. Commonly used tags (other than 2-aminobenzoic acid) include the following compounds:
1. semicarbazide
2. Girard's P reagent
3. Girard's T reagent
4. p-aminobenzoic ethyl ester
5. biotin-x-hydrazide
6. 2-aminobenzamide
7. 2-aminopyridine
8. anthranilic acid
9. 5-[(4,6-dichlorotriazin-2-yl)amino]fluorescein
10. 8-aminonaphthalene-1,3,6-trisulfonic acid
11. 2-aminoacridone

FIG. 2D illustrates an example of the chemical unit ID 204a. The chemical unit ID 204a contains one or more fields 212a-e for storing information about properties of a polymer. Although the invention encompasses all polymers, its use is described in more detail with respect to polysaccharides because of the complex nature of polysaccharides. The invention, however, is not limited to polysaccharides.
The heterogeneity of heparin-like glycosaminoglycan (HLGAG) fragments and the high degree of variability in their saccharide building blocks have hindered attempts to sequence these complex molecules. Heparin-like glycosaminoglycans (HLGAGs), which include heparin and heparan sulfate, are complex polysaccharide molecules made up of disaccharide repeat units comprising a hexosamine and a glucuronic/iduronic acid that are linked by α/β 1→4 glycosidic linkages. These defining units may be modified by sulfation at the N, 3-O and 6-O positions of the hexosamine, 2-O sulfation of the uronic acid, and C5 epimerization that converts glucuronic acid to iduronic acid.
The disaccharide unit of HLGAG may be represented as:
$$(\alpha 1\rightarrow 4)\ \mathrm{I/G}_{2\mathrm{O}X}\ (\alpha/\beta 1\rightarrow 4)\ \mathrm{H}_{3\mathrm{O}X,\,6\mathrm{O}X,\,\mathrm{N}Y}\ (\alpha 1\rightarrow 4)$$
where X may be sulfated (-SO3H) or unsulfated (-H), and Y may be sulfated (-SO3H) or acetylated (-COCH3) or, in rare cases, neither sulfated nor acetylated.

The fields 212a-e may store any kind of value, such as, for example, single-bit values, single-digit hexadecimal values, or decimal values. In one embodiment, the chemical unit ID 204a includes each of the following fields: (1) a field 212a for storing a value indicating whether the chemical unit contains an iduronic or a glucuronic acid (I/G); (2) a field 212b for storing a value indicating whether the 2X position of the iduronic or glucuronic acid is sulfated or unsulfated; (3) a field 212c for storing a value indicating whether the 6X position of the hexosamine is sulfated or unsulfated; (4) a field 212d indicating whether the 3X position of the hexosamine is sulfated or unsulfated; and (5) a field 212e indicating whether the NX position of the hexosamine is sulfated or acetylated.
Optionally, each of the fields 212a-e may be represented as a single bit.
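The single-bit layout can be sketched as follows, assuming the fields are packed in the Table 2 column order (I/G, 2X, 6X, 3X, NX) with 1 in field 212a meaning glucuronic acid; that assignment is only one of those the text allows. `UnitFields`, `pack` and `unpack` are hypothetical names used for illustration.

```python
from typing import NamedTuple


class UnitFields(NamedTuple):
    """Fields 212a-e of chemical unit ID 204a, one bit each."""
    ig: int  # 212a: 0 = iduronic, 1 = glucuronic (Table 2 convention)
    x2: int  # 212b: 2-O position of the uronic acid sulfated
    x6: int  # 212c: 6-O position of the hexosamine sulfated
    x3: int  # 212d: 3-O position of the hexosamine sulfated
    nx: int  # 212e: 1 = N-sulfated, 0 = N-acetylated


def pack(u: UnitFields) -> int:
    """Pack the fields into one integer, 212a as the most significant bit."""
    value = 0
    for bit in u:  # NamedTuple iterates in declaration order, 212a first
        if bit not in (0, 1):
            raise ValueError("each field holds a single bit")
        value = (value << 1) | bit
    return value


def unpack(value: int) -> UnitFields:
    """Inverse of pack for a 5-bit value."""
    if not 0 <= value < 32:
        raise ValueError("a chemical unit ID is a 5-bit value")
    return UnitFields(*((value >> shift) & 1 for shift in range(4, -1, -1)))


print(pack(UnitFields(ig=0, x2=1, x6=1, x3=0, nx=1)))  # 13, i.e. I2S-HNS,6S
```

Because the packed value is an ordinary integer, the five properties of a unit can be stored, compared and copied as one number.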
Table 2 illustrates an example of a data structure having a plurality of entries, where each entry represents an HLGAG encoded in accordance with Fig. 2D. Bit values for each of the fields 212a-e may be assigned in any known manner. For example, with respect to field 212a (I/G), a value of one may indicate Iduronic and a value of zero may indicate Glucuronic, or vice versa.

Table 2
I/G  2X  6X  3X  NX  Code  Disaccharide      Mass (Da)
0    0   0   0   0   0     I-HNAc            379.33
0    0   0   0   1   1     I-HNS             417.35
0    0   0   1   0   2     I-HNAc,3S         459.39
0    0   0   1   1   3     I-HNS,3S          497.41
0    0   1   0   0   4     I-HNAc,6S         459.39
0    0   1   0   1   5     I-HNS,6S          497.41
0    0   1   1   0   6     I-HNAc,3S,6S      539.45
0    0   1   1   1   7     I-HNS,3S,6S       577.47
0    1   0   0   0   8     I2S-HNAc          459.39
0    1   0   0   1   9     I2S-HNS           497.41
0    1   0   1   0   A     I2S-HNAc,3S       539.45
0    1   0   1   1   B     I2S-HNS,3S        577.47
0    1   1   0   0   C     I2S-HNAc,6S       539.45
0    1   1   0   1   D     I2S-HNS,6S        577.47
0    1   1   1   0   E     I2S-HNAc,3S,6S    619.51
0    1   1   1   1   F     I2S-HNS,3S,6S     657.53
1    0   0   0   0   -0    G-HNAc            379.33
1    0   0   0   1   -1    G-HNS             417.35
1    0   0   1   0   -2    G-HNAc,3S         459.39
1    0   0   1   1   -3    G-HNS,3S          497.41
1    0   1   0   0   -4    G-HNAc,6S         459.39
1    0   1   0   1   -5    G-HNS,6S          497.41
1    0   1   1   0   -6    G-HNAc,3S,6S      539.45
1    0   1   1   1   -7    G-HNS,3S,6S       577.47
1    1   0   0   0   -8    G2S-HNAc          459.39
1    1   0   0   1   -9    G2S-HNS           497.41
1    1   0   1   0   -A    G2S-HNAc,3S       539.45
1    1   0   1   1   -B    G2S-HNS,3S        577.47
1    1   1   0   0   -C    G2S-HNAc,6S       539.45
1    1   1   0   1   -D    G2S-HNS,6S        577.47
1    1   1   1   0   -E    G2S-HNAc,3S,6S    619.51
1    1   1   1   1   -F    G2S-HNS,3S,6S     657.53

Representing an HLGAG using bit fields may have a number of advantages. Because each property of an HLGAG may have one of two possible states, a binary bit is ideally suited for storing information representing an HLGAG property. Bit fields may be used to store such information in a computer readable medium (e.g., a computer memory or storage device), for example, by packing multiple bits (representing multiple fields) into a single byte or sequence of bytes. Furthermore, bit fields may be stored and manipulated quickly and efficiently by digital computer processors, which typically store information using bits and which typically can quickly perform operations (e.g., shift, AND, OR) on bits. For example, as described in more detail below, a plurality of properties each stored as a bit field can be searched more quickly than with typical character-based searching methods.
Further, using bit fields to represent properties of HLGAGs permits a user to more easily incorporate additional properties (e.g., 4-O sulfation vs. no sulfation) into a chemical unit ID 204a by adding extra bits to represent the additional properties.
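As a minimal sketch of the bit-field representation, the following Python packs the five properties of a disaccharide into one integer, using the field order and bit values of Table 2 (I/G = 0 for iduronic acid; a set bit means sulfated at 2X, 6X or 3X, or N-sulfated at NX). The names `FIELDS`, `encode_unit` and `decode_unit` are hypothetical, and the extra `"4X"` field only illustrates how a further property such as 4-O sulfation could be appended as a new low-order bit.

```python
# Field order and bit positions follow Table 2: I/G is the most significant bit.
FIELDS = ("IG", "2X", "6X", "3X", "NX")


def encode_unit(props, fields=FIELDS):
    """Pack a dict of 0/1 property values into an integer, one bit per field."""
    code = 0
    for name in fields:
        bit = props.get(name, 0)
        if bit not in (0, 1):
            raise ValueError(f"field {name} must be 0 or 1, got {bit!r}")
        code = (code << 1) | bit
    return code


def decode_unit(code, fields=FIELDS):
    """Unpack an integer back into a dict of 0/1 property values."""
    props = {}
    for position, name in enumerate(reversed(fields)):
        props[name] = (code >> position) & 1
    return props


# I2S-HNS: iduronic acid (I/G = 0), 2-O-sulfated, N-sulfated -> 01001 (code 9 in Table 2).
i2s_hns = encode_unit({"2X": 1, "NX": 1})
print(format(i2s_hns, "05b"), i2s_hns)

# Extending the scheme: a hypothetical 4-O sulfation bit appended as a sixth field.
EXTENDED = FIELDS + ("4X",)
print(format(encode_unit({"2X": 1, "NX": 1, "4X": 1}, EXTENDED), "06b"))
```

Appending a field only shifts the existing bits left by one position, so codes built with the extended layout are not interchangeable with five-bit codes; a database would need to use one layout consistently.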
In one embodiment, the four fields 212b-e (each of which may store a single-bit value) may be represented as a single hexadecimal (base 16) digit, where each of the fields 212b-e represents one bit of that digit. Using hexadecimal numbers to represent disaccharide units is convenient for both representation and processing, because hexadecimal digits are a common form of representation used by conventional computers.
Optionally, the five fields 212a-e of the record may be represented as a signed hexadecimal digit, in which the fields 212b-e collectively encode a single-digit hexadecimal number as described above and the I/G field 212a is used as a sign bit.
In such a signed representation, the hexadecimal numbers 0-F may be used to code chemical units containing iduronic acid and the hexadecimal numbers -0 to -F may be used to code units containing glucuronic acid. The chemical unit ID 204a may, however, be encoded using other forms of representation, such as a two's-complement representation.
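The signed hexadecimal notation can be sketched as follows, assuming the Table 2 layout in which the I/G bit is the most significant of five bits and a value of 1 denotes glucuronic acid. `to_signed_hex`, `from_signed_hex` and `twos_complement_value` are hypothetical helper names. For contrast, `twos_complement_value` reads the same five bits as a two's-complement integer; two's complement has no separate negative zero, so G-HNAc (10000) reads as -16 there rather than -0.

```python
def to_signed_hex(code):
    """Render a 5-bit unit code as a signed hex digit: I/G bit -> sign, low 4 bits -> digit."""
    if not 0 <= code < 32:
        raise ValueError("code must fit in five bits")
    sign = "-" if code & 0b10000 else ""
    return sign + format(code & 0b01111, "X")


def from_signed_hex(text):
    """Inverse of to_signed_hex; '-0' is kept distinct from '0'."""
    negative = text.startswith("-")
    digit = int(text.lstrip("-"), 16)
    if not 0 <= digit < 16:
        raise ValueError("expected a single hexadecimal digit")
    return (0b10000 if negative else 0) | digit


def twos_complement_value(code):
    """Interpret the same five bits as a two's-complement integer (range -16..15)."""
    return code - 32 if code & 0b10000 else code


print(to_signed_hex(0b01010), to_signed_hex(0b11010), to_signed_hex(0b10000))
print(twos_complement_value(0b11010))
```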
The fields 212a-e of the chemical unit ID 204a may be arranged in any order.
For example, a Gray code system may be used to code HLGAGs. In a Gray code numbering scheme, each successive value differs from the previous value in only a single bit position. In the case of HLGAGs, the values representing HLGAGs may therefore be arranged so that any two neighboring values differ in the value of only one property. An example of a Gray code system used to code HLGAGs is shown in Table 3.

I/G 2X  6X  3X  NX  NUMERIC  DISACC          MASS (Da)
16  8   4   2   1   VALUE
0   0   0   0   0   0        I-HNAc          379.33
0   0   0   0   1   1        I-HNS           417.35
0   0   0   1   1   3        I-HNS,3S        497.41
0   0   0   1   0   2        I-HNAc,3S       459.39
0   0   1   1   0   6        I-HNAc,3S,6S    539.45
0   0   1   1   1   7        I-HNS,3S,6S     577.47
0   0   1   0   1   5        I-HNS,6S        497.41
0   0   1   0   0   4        I-HNAc,6S       459.39
0   1   1   0   0   12       I2S-HNAc,6S     539.45
0   1   1   0   1   13       I2S-HNS,6S      577.47
0   1   1   1   1   15       I2S-HNS,3S,6S   657.53
0   1   1   1   0   14       I2S-HNAc,3S,6S  619.51
0   1   0   1   0   10       I2S-HNAc,3S     539.45
0   1   0   1   1   11       I2S-HNS,3S      577.47
0   1   0   0   1   9        I2S-HNS         497.41
0   1   0   0   0   8        I2S-HNAc        459.39
1   1   0   0   0   24       G2S-HNAc        459.39
1   1   0   0   1   25       G2S-HNS         497.41
1   1   0   1   1   27       G2S-HNS,3S      577.47
1   1   0   1   0   26       G2S-HNAc,3S     539.45
1   1   1   1   0   30       G2S-HNAc,3S,6S  619.51
1   1   1   1   1   31       G2S-HNS,3S,6S   657.53
1   1   1   0   1   29       G2S-HNS,6S      577.47
1   1   1   0   0   28       G2S-HNAc,6S     539.45
1   0   1   0   0   20       G-HNAc,6S       459.39
1   0   1   0   1   21       G-HNS,6S        497.41
1   0   1   1   1   23       G-HNS,3S,6S     577.47
1   0   1   1   0   22       G-HNAc,3S,6S    539.45
1   0   0   1   0   18       G-HNAc,3S       459.39
1   0   0   1   1   19       G-HNS,3S        497.41
1   0   0   0   1   17       G-HNS           417.35
1   0   0   0   0   16       G-HNAc          379.33

Table 3 illustrates that use of a Gray coding scheme arranges the disaccharide building blocks such that neighboring table entries differ from each other in the value of only a single property. One advantage of using Gray codes to encode HLGAGs is that biosynthesis of HLGAG fragments may follow a specific sequence of modifications starting from the basic building block G-HNAc, so units related by a single modification step can occupy neighboring codes.
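The ordering in Table 3 matches the standard reflected binary Gray code, in which row i has the bit pattern i XOR (i >> 1). The sketch below regenerates that ordering for the five fields of Table 3 and checks that each pair of neighboring codes differs in exactly one bit, i.e., one property. The helper names `gray` and `gray_order` are illustrative.

```python
def gray(i):
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)


def gray_order(n_bits=5):
    """All n-bit codes in Gray-code order."""
    return [gray(i) for i in range(1 << n_bits)]


order = gray_order()
# Neighboring entries differ in exactly one bit position (one property).
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(order, order[1:]))
print([format(c, "05b") for c in order[:8]])
```

Reading the printed patterns against the I/G, 2X, 6X, 3X, NX columns reproduces the first eight rows of Table 3, from I-HNAc (00000) to I-HNAc,6S (00100).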
In Table 3, bit weights of 8, 4, 2, and 1 are used to calculate the numerical equivalent of a hexadecimal number, with the most significant bit (I/G) being used as a sign bit. For example, the hexadecimal code A (01010 binary) is equal to 8*1 + 4*0 + 2*1 + 1*0 = 10.
In another embodiment, the weights of the fields 212a-e may be changed, thereby implementing an alternative weighting system. For example, bit fields 212a-e may have weights of 16, 8, 4, -2, and -1, respectively, as shown in Table 4.

I/G 2X  NX  3X  6X  VALUE  DISACC          MASS (Da)
16  8   4   -2  -1
0   0   0   0   0   0      I-HNAc          379.33
0   0   0   0   1   -1     I-HNAc,6S       459.39
0   0   0   1   0   -2     I-HNAc,3S       459.39
0   0   0   1   1   -3     I-HNAc,3S,6S    539.45
0   0   1   0   0   4      I-HNS           417.35
0   0   1   0   1   3      I-HNS,6S        497.41
0   0   1   1   0   2      I-HNS,3S        497.41
0   0   1   1   1   1      I-HNS,3S,6S     577.47
0   1   0   0   0   8      I2S-HNAc        459.39
0   1   0   0   1   7      I2S-HNAc,6S     539.45
0   1   0   1   0   6      I2S-HNAc,3S     539.45
0   1   0   1   1   5      I2S-HNAc,3S,6S  619.51
0   1   1   0   0   12     I2S-HNS         497.41
0   1   1   0   1   11     I2S-HNS,6S      577.47
0   1   1   1   0   10     I2S-HNS,3S      577.47
0   1   1   1   1   9      I2S-HNS,3S,6S   657.53
1   0   0   0   0   16     G-HNAc          379.33
1   0   0   0   1   15     G-HNAc,6S       459.39
1   0   0   1   0   14     G-HNAc,3S       459.39
1   0   0   1   1   13     G-HNAc,3S,6S    539.45
1   0   1   0   0   20     G-HNS           417.35
1   0   1   0   1   19     G-HNS,6S        497.41
1   0   1   1   0   18     G-HNS,3S        497.41
1   0   1   1   1   17     G-HNS,3S,6S     577.47
1   1   0   0   0   24     G2S-HNAc        459.39
1   1   0   0   1   23     G2S-HNAc,6S     539.45
1   1   0   1   0   22     G2S-HNAc,3S     539.45
1   1   0   1   1   21     G2S-HNAc,3S,6S  619.51
1   1   1   0   0   28     G2S-HNS         497.41
1   1   1   0   1   27     G2S-HNS,6S      577.47
1   1   1   1   0   26     G2S-HNS,3S      577.47
1   1   1   1   1   25     G2S-HNS,3S,6S   657.53

Modifying the weights of the bits may be used to score the disaccharide units.
For example, a database of sequences may be created and the different disaccharide units may be scored based on their relative abundance in the sequences present in the database.
Some units, for example, I-HNAc,3S,6S, which rarely occur in naturally occurring HLGAGs, may receive a low score under a scheme in which the bits are weighted as shown in Table 4 (a value of -3, the lowest in that table).
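The numeric values in Tables 3 and 4 are weighted sums of the bit columns. A minimal sketch, assuming the bits are listed in each table's own column order; `weighted_value` is a hypothetical helper, and the weights used are the fixed ones printed in the table headers, not weights derived from abundance data.

```python
def weighted_value(bits, weights):
    """Sum of bit * weight over the columns of one table row."""
    if len(bits) != len(weights):
        raise ValueError("one weight per bit is required")
    return sum(b * w for b, w in zip(bits, weights))


# Unsigned weights for the four low-order fields: 1010 -> 8*1 + 4*0 + 2*1 + 1*0 = 10.
print(weighted_value([1, 0, 1, 0], [8, 4, 2, 1]))

# Table 4 weights, columns I/G, 2X, NX, 3X, 6X.
TABLE4_WEIGHTS = [16, 8, 4, -2, -1]
rows = {
    "I-HNAc,3S,6S": [0, 0, 0, 1, 1],
    "I-HNS": [0, 0, 1, 0, 0],
    "G2S-HNS": [1, 1, 1, 0, 0],
}
for name, bits in rows.items():
    print(name, weighted_value(bits, TABLE4_WEIGHTS))
```

With the negative weights on 3X and 6X, the unit carrying both of those sulfates but neither the N-sulfate nor the 2-O-sulfate (I-HNAc,3S,6S) ends up with the minimum score.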
Optionally, the sulfation and acetylation positions may be arranged in the order shown in Table 2: I/G, 2X, 6X, 3X, NX. These positions may, however, be arranged differently, so that the same set of codes represents different disaccharide units.
Table 5, for example, shows an arrangement in which the positions are ordered I/G, 2X, NX, 3X, 6X.

I/G 2X  NX  3X  6X  CODE  DISACC          MASS (Da)
0   0   0   0   0   0     I-HNAc          379.33
0   0   0   0   1   1     I-HNAc,6S       459.39
0   0   0   1   0   2     I-HNAc,3S       459.39
0   0   0   1   1   3     I-HNAc,3S,6S    539.45
0   0   1   0   0   4     I-HNS           417.35
0   0   1   0   1   5     I-HNS,6S        497.41
0   0   1   1   0   6     I-HNS,3S        497.41
0   0   1   1   1   7     I-HNS,3S,6S     577.47
0   1   0   0   0   8     I2S-HNAc        459.39
0   1   0   0   1   9     I2S-HNAc,6S     539.45
0   1   0   1   0   A     I2S-HNAc,3S     539.45
0   1   0   1   1   B     I2S-HNAc,3S,6S  619.51
0   1   1   0   0   C     I2S-HNS         497.41
0   1   1   0   1   D     I2S-HNS,6S      577.47
0   1   1   1   0   E     I2S-HNS,3S      577.47
0   1   1   1   1   F     I2S-HNS,3S,6S   657.53
1   0   0   0   0   -0    G-HNAc          379.33
1   0   0   0   1   -1    G-HNAc,6S       459.39
1   0   0   1   0   -2    G-HNAc,3S       459.39
1   0   0   1   1   -3    G-HNAc,3S,6S    539.45
1   0   1   0   0   -4    G-HNS           417.35
1   0   1   0   1   -5    G-HNS,6S        497.41
1   0   1   1   0   -6    G-HNS,3S        497.41
1   0   1   1   1   -7    G-HNS,3S,6S     577.47
1   1   0   0   0   -8    G2S-HNAc        459.39
1   1   0   0   1   -9    G2S-HNAc,6S     539.45
1   1   0   1   0   -A    G2S-HNAc,3S     539.45
1   1   0   1   1   -B    G2S-HNAc,3S,6S  619.51
1   1   1   0   0   -C    G2S-HNS         497.41
1   1   1   0   1   -D    G2S-HNS,6S      577.47
1   1   1   1   0   -E    G2S-HNS,3S      577.47
1   1   1   1   1   -F    G2S-HNS,3S,6S   657.53
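Because a code depends only on the order in which the fields are read, the same unit receives different codes under Tables 2 and 5, and the same code denotes different units. A small sketch, assuming one bit per property and the two field orders stated in the text; `code_for` and `signed_hex` are hypothetical helpers.

```python
TABLE2_ORDER = ("IG", "2X", "6X", "3X", "NX")
TABLE5_ORDER = ("IG", "2X", "NX", "3X", "6X")


def code_for(props, order):
    """Read 0/1 properties in the given field order, most significant first."""
    code = 0
    for name in order:
        code = (code << 1) | props.get(name, 0)
    return code


def signed_hex(code):
    """Signed hexadecimal label used by Tables 2 and 5 (I/G bit as the sign)."""
    return ("-" if code & 0b10000 else "") + format(code & 0b1111, "X")


i2s_hns = {"2X": 1, "NX": 1}  # I2S-HNS
print(signed_hex(code_for(i2s_hns, TABLE2_ORDER)))  # code under Table 2
print(signed_hex(code_for(i2s_hns, TABLE5_ORDER)))  # code under Table 5
```

Here I2S-HNS is coded 9 under Table 2 but C under Table 5, which is why a database must record which field order its codes use.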

It has been observed that disaccharide units in some HLGAG sequences are neither N-sulfated nor N-acetylated. Such disaccharide units may be represented using the chemical unit ID 204a in any of a number of ways.
If the properties of a chemical unit are represented by bit fields, disaccharide units that contain a free amine in the N position may be represented by, for example, adding a bit field. For example, referring to FIG. 2D, an additional field NY may be used in the chemical unit ID 204a. An NY field having a value of zero may correspond to a free amine, and an NY field having a value of one may correspond to N-acetylation, or vice versa. Further, a value of one in the NX field 212e may correspond to N-sulfation.
Optionally, disaccharide units that contain a free amine in the N position may be represented using a tristate field. For example, the field 212e (NX) in the chemical unit ID 204a may be a tristate field having three permissible values: a value of zero may correspond to a free amine, a value of one may correspond to N-acetylation, and a value of two may correspond to N-sulfation. Similarly, the values of any of the fields 212a-e may be represented using a number system with a base higher than two.
For example, if the value of the field 212e (NX) is represented by a single-digit number having a base of three, then the field 212e may store three permissible values.
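A mixed-radix sketch of the tristate option: the four binary fields keep base 2 while NX becomes a base-3 digit (0 = free amine, 1 = N-acetyl, 2 = N-sulfate, following the example values above), giving codes 0 to 47. `LAYOUT`, `encode_mixed` and `decode_mixed` are hypothetical names, and placing NX as the least significant digit is an assumption made for illustration.

```python
# (field name, radix); NX is the tristate field, placed last (least significant).
LAYOUT = (("IG", 2), ("2X", 2), ("6X", 2), ("3X", 2), ("NX", 3))
NX_STATES = {0: "free amine", 1: "N-acetyl", 2: "N-sulfate"}


def encode_mixed(props):
    """Combine field values into one integer using each field's own radix."""
    code = 0
    for name, radix in LAYOUT:
        value = props.get(name, 0)
        if not 0 <= value < radix:
            raise ValueError(f"{name} must be in 0..{radix - 1}")
        code = code * radix + value
    return code


def decode_mixed(code):
    """Recover field values by repeated division, least significant field first."""
    props = {}
    for name, radix in reversed(LAYOUT):
        code, props[name] = divmod(code, radix)
    return props


unit = {"2X": 1, "NX": 0}  # 2-O-sulfated iduronic acid with a free amine
code = encode_mixed(unit)
print(code, NX_STATES[decode_mixed(code)["NX"]])
```

The cost of the third state is that codes no longer map onto whole bits, so the single-AND masking described later would need to be replaced by arithmetic field extraction (as in `decode_mixed`) for the NX position.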

Referring to Fig. 1, a user may perform a query on the polymer database 102 to search for particular information. For example, a user may search the polymer database 102 for specified polymers, specified chemical units, or polymers or chemical units having specified properties. A user may provide to a query user interface 108 user input 106 indicating properties for which to search. The user input 106 may, for example, indicate one or more chemical units, a polymer of chemical units, or one or more properties to search for, using, for example, a standard character-based notation. The query user interface 108 may, for example, provide a graphical user interface (GUI) that allows the user to select from a list of properties using an input device such as a keyboard or a mouse.
The query user interface 108 may generate a search query 110 based on the user input 106. A search engine 112 may receive the search query 110 and generate a mask 114 based on the search query. Example formats of the mask 114, and example techniques for determining whether properties specified by the mask 114 match properties of polymers in the polymer database 102, are described in more detail below in connection with Fig. 3.
The search engine 112 may determine whether properties specified by the mask 114 match properties of polymers stored in the polymer database 102.
Subsequently, the search engine 112 may generate search results 116 indicating whether the polymer database 102 includes polymers having the properties specified by the mask 114. The search results 116 also may indicate the polymers in the polymer database 102 that have the properties specified by the mask 114. For example, if the user input 106 specified properties of a chemical unit, the search results 116 may indicate which polymers in the polymer database 102 include the specified chemical unit.
Alternatively, if the user input 106 specified particular chemical unit properties, the search results 116 may indicate polymers in the polymer database 102 that include chemical units having the specified chemical unit properties. Similarly, if the user input 106 specified particular polymer properties, the search results 116 may indicate which polymers in the polymer database 102 have the specified polymer properties.

Fig. 3 is a flowchart illustrating an example of a process 300 that may be used by the search engine 112 to generate the search results 116. In act 302, the search engine 112 may receive a search query 110 from the query user interface 108. Next, in act 304, the search engine 112 may generate a mask 114 based on the search query 110.
In a following act 306, the search engine 112 may perform a binary operation on one or more of the records 104a-n in the polymer database 102 by applying the mask 114.
Next, in act 308, the search engine 112 may generate the search results 116 based on the results of the binary operation performed in step 306.
The process 300 will now be described in more detail with respect to an embodiment in which the fields 206a-m of the chemical unit 204a are binary fields. In act 302, the received search query 110 may request a search of the polymer database 102 for a particular chemical unit, e.g., the chemical unit I2S-HNS. If, for example, the coding scheme shown in Table 1 is used to encode chemical units in the polymer database, the chemical unit I2S-HNS may be represented by a binary value of 01001. To generate the mask 114 for this chemical unit (act 304), the search engine 112 may use the binary value of the chemical unit, i.e., 01001, as the value of the mask 114. As a result, the values of the bits of the mask 114 may specify the properties of the chemical unit I2S-HNS. For example, the value of zero in the leftmost bit position may indicate iduronic acid, and the value of one in the next bit position may indicate that the 2X position is sulfated.
The search engine 112 may use this mask 114 to determine whether polymers in the polymer database 102 contain the chemical unit I2S-HNS. To make this determination, the search engine 112 may perform a binary operation on the data units 104a-n of the polymer database 102 using the mask 114 (act 306). For example, the search engine 112 may perform a logical AND operation on each chemical unit of each of the polymers in the polymer database 102 using the mask 114. If the result of the logical AND operation on a particular chemical unit is equal to the value of the mask 114, then the chemical unit may satisfy the search query 110, and, in act 308, the search engine 112 may indicate a successful match in the search results 116. The search engine 112 may generate additional information in the search results 116, such as the polymer identifier of the polymer containing the matching chemical unit.
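A minimal sketch of acts 304 to 308 for a single-unit query, assuming each chemical unit is stored as a 5-bit integer in the Table 2 layout and each polymer as a list of such integers; the database contents, polymer IDs and helper names are hypothetical. Note that the test `(unit & mask) == mask` only checks that every bit set in the mask is also set in the unit, so it also accepts units carrying additional modifications (for example, I2S-HNS,6S for an I2S-HNS query). The `exact` option therefore uses an XOR test, which is zero only when all five bits agree.

```python
def unit_matches(unit, mask, exact=False):
    """AND test for 'has at least these set bits'; XOR test for an identical code."""
    if exact:
        return (unit ^ mask) == 0
    return (unit & mask) == mask


def search_unit(database, mask, exact=False):
    """Return (polymer_id, position) for every unit that satisfies the mask."""
    hits = []
    for polymer_id, units in database.items():
        for position, unit in enumerate(units):
            if unit_matches(unit, mask, exact):
                hits.append((polymer_id, position))
    return hits


# Hypothetical database: polymer ID -> list of 5-bit unit codes (Table 2 layout).
db = {
    "P1": [0b01101, 0b01101, 0b00111, 0b01101],  # D D 7 D
    "P2": [0b01001, 0b00001],                    # I2S-HNS, I-HNS
}
I2S_HNS = 0b01001
print(search_unit(db, I2S_HNS))              # AND test: also hits D (01101)
print(search_unit(db, I2S_HNS, exact=True))  # exact match only
```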
In response to receiving the search query in act 302, the search engine 112 also may generate, in act 304, a mask 114 that indicates one or more properties of a particular polymer or chemical unit. To generate the mask 114 for such a search query, the search engine 112 may set each bit position in the mask that corresponds to a property specified by the search query to the value specified by the search query. Consider, for example, a search query 110 that indicates a search for all chemical units in which both the 2X position and the 6X position are sulfated. To generate a mask corresponding to this search query, the search engine 112 may set the bit positions of the mask corresponding to the 2X and 6X positions to a value corresponding to being sulfated. Using the coding scheme shown above in Table 1, for example, in which the 2X and 6X positions have bit positions 3 and 2, respectively (counting from the rightmost position, beginning at bit position zero), the mask corresponding to this search query is 01100. The two bits of this mask that have a value of one correspond to the bit positions in Table 1 for the 2X and 6X positions.
To determine whether the one or more properties of a particular chemical unit in the polymer database 102 match the one or more properties specified by the mask 114, the search engine 112 may perform a logical AND operation on the chemical unit identifier of the chemical unit in the polymer database 102 using the mask 114. To generate search results for this chemical unit (i.e., act 308), the search engine 112 may compare the result of the logical AND operation to the mask 114. If the values of the bit positions of the result that correspond to the properties specified by the search query are equal to the values of the same bit positions of the mask 114, then the chemical unit has the properties specified by the search query 110, and the search engine 112 indicates a successful match in the search results 116.

For example, consider the search query 110 described above, which indicates a search for all chemical units in which both the 2X position and the 6X position are sulfated. Using the coding scheme of Table 1, the bit positions corresponding to the 2X and 6X positions are bit positions 3 and 2. Therefore, after performing a logical AND operation on the chemical unit identifier of a chemical unit using the mask 114, the search engine 112 compares bit positions 3 and 2 of the result of the logical AND operation to bit positions 3 and 2 of the mask. If the values in both bit positions are equal, then the chemical unit has the properties specified by the mask 114.
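To compare only the bit positions named in a query, including positions whose required value is zero (for example, iduronic acid, I/G = 0, under Table 2), one approach is to carry a second "care" mask alongside the value mask. This is a sketch under that assumption; when a query requires only set bits, as in the 2X/6X example, the care mask equals the value mask and the test reduces to the AND-and-compare procedure described above. Bit positions follow Table 2 (I/G = 4, 2X = 3, 6X = 2, 3X = 1, NX = 0); `build_query` and `has_properties` are hypothetical names.

```python
BIT = {"IG": 4, "2X": 3, "6X": 2, "3X": 1, "NX": 0}


def build_query(required):
    """Return (care, value) masks: care marks the positions the query constrains."""
    care = value = 0
    for name, bit_value in required.items():
        care |= 1 << BIT[name]
        if bit_value:
            value |= 1 << BIT[name]
    return care, value


def has_properties(unit, care, value):
    """True when the unit's constrained bit positions equal the required values."""
    return (unit & care) == value


# 2X and 6X sulfated: care == value == 01100, as in the example above.
care, value = build_query({"2X": 1, "6X": 1})
print(format(value, "05b"), [u for u in range(32) if has_properties(u, care, value)])

# Iduronic acid (I/G = 0) with 2X and 6X sulfated: the zero is only testable via care.
care, value = build_query({"IG": 0, "2X": 1, "6X": 1})
print([format(u, "05b") for u in range(32) if has_properties(u, care, value)])
```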

The techniques described above for generating the mask 114 and searching with the mask 114 also may be used to perform searches with respect to sequences of chemical units or entire polymers. For example, if the search query 110 indicates a sequence of chemical units, the search engine 112 may fill the mask 114 with a sequence of bits corresponding to the concatenation of the binary encodings of the specified sequence of chemical units. The search engine 112 may then perform a binary AND operation on the polymer identifiers in the polymer database 102 using the mask 114, and generate the search results 116 as described above.
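For a multi-unit query, the concatenation can be illustrated by packing the 5-bit codes of a polymer into one integer, first unit most significant, and comparing the packed query with each 5-bit-aligned window of the packed polymer. This is a sketch assuming the Table 2 codes and exact matching; the sliding-window loop, the packing order and the helper names are illustrative choices rather than the described implementation. Under Table 2, D = 01101, 7 = 00111, A = 01010 and -7 = 10111.

```python
UNIT_BITS = 5


def pack(units):
    """Concatenate 5-bit unit codes into one integer, first unit most significant."""
    packed = 0
    for unit in units:
        packed = (packed << UNIT_BITS) | unit
    return packed


def find_subsequence(polymer, query):
    """Return every unit offset at which the packed query equals the polymer's bits."""
    n, k = len(polymer), len(query)
    if k == 0 or k > n:
        return []
    target = pack(query)
    packed = pack(polymer)
    window = (1 << (UNIT_BITS * k)) - 1
    hits = []
    for offset in range(n - k + 1):
        shift = UNIT_BITS * (n - k - offset)
        if ((packed >> shift) & window) == target:
            hits.append(offset)
    return hits


D, SEVEN, A, MINUS_SEVEN = 0b01101, 0b00111, 0b01010, 0b10111
polymer = [D, D, SEVEN, D, A, D, MINUS_SEVEN]  # DD7DAD-7
print(find_subsequence(polymer, [SEVEN, D]))   # query "7D"
print(find_subsequence(polymer, [D]))
```

Each window comparison is a single shift, AND and equality test on integers, regardless of how many units the query contains; the loop runs once per possible alignment within the polymer.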
The techniques described above for generating the mask 114 and searching with the mask 114 are provided merely as an example. Other techniques for generating and searching with the mask 114 may also be used. The search engine 112 also may use more than one mask for each search query 110, and the search engine 112 may perform multiple binary operations in parallel in order to improve computational efficiency. In addition, binary operations other than a logical AND may be used to determine whether properties of the polymers in the polymer database 102 match the properties specified by the mask 114. Other binary operations include, for example, logical OR and logical XOR (exclusive or). Such binary operations may be used alone or in combination with each other.
Using the techniques described above, the polymer database 102 may be searched quickly for particular chemical units. One advantage of the process 300, if used in conjunction with a chemical unit coding scheme that encodes properties of chemical units using binary values, is that a chemical unit identifier (e.g., the chemical unit identifier 204a) may be compared to a search query (in the form of a mask) using a single binary operation (e.g., a binary AND operation). As described above, conventional notation systems that use character-based notation to encode sequences of chemical units (e.g., systems which encode DNA sequences as sequences of characters) typically search for a sub-sequence of chemical units (represented by a first sequence of characters) within a super-sequence of chemical units (represented by a second sequence of characters) using character-based comparison. Such a comparison typically is slow because it sequentially compares each character in the first sequence of characters (corresponding to the sub-sequence) to characters in the second sequence until a match is found. Consequently, the speed of the search is related to the length of the sub-sequence: the longer the sub-sequence, the slower the search.
In contrast, the speed of the techniques described above for searching using binary operations may be constant with respect to the length of the sub-sequence that forms the basis of the search query. Because the search engine 112 can search for a query sequence of chemical units using a single binary operation (e.g., a logical AND operation) regardless of the length of the query sequence, searches may be performed more quickly than with conventional character-based methods, whose speed depends on the length of the query sequence. Further, the binary operations used by the search engine 112 may be performed more quickly because conventional computer processors are designed to perform binary operations on binary data.
A further advantage of the techniques described above for searching using binary operations is that encoding one or more properties of a polymer into the notational representation of the polymer enables the search engine 112 to quickly and directly search the polymer database 102 for particular properties of polymers. Because the properties of a polymer are encoded into the polymer's notational representation, the search engine 112 may determine whether the polymer has a specified property by determining whether the specified property is encoded in the polymer's notational representation. For example, as described above, the search engine 112 may determine whether the polymer has the specified property by performing a logical AND operation on the polymer's notational representation using the mask 114. This operation may be performed quickly by conventional computer processors and may be performed using only the polymer's notational representation and the mask, without reference to additional information about the properties of the polymer.
Some aspects of the techniques described herein for representing properties using binary notation may be useful for generating, searching and manipulating information about polysaccharides. Accordingly, each complete building block of a polymer may be assigned a unique numeric identifier, which may be used to classify the complete building block. For example, each numeric identifier may represent a complete building block of a polysaccharide, including the exact chemical structure as defined by the basic building block of the polysaccharide and all of its substituents, charges, etc. A basic building block refers to a basic ring structure, such as iduronic acid or glucuronic acid, but does not include substituents, charges, etc. Such building block information may be generated and processed in the same or a similar manner as described above with respect to "properties" of polymers.
A computer system that may implement the system 100 of FIG. 1 as a computer program typically may include a main unit connected to both an output device, which displays information to a user, and an input device, which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also may be connected to the processor and memory system via the interconnection mechanism.
One or more output devices may be connected to the computer system. Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output. One or more input devices also may be connected to the computer system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, communication device, and data input devices such as sensors. The subject matter disclosed herein is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
The computer system may be a general purpose computer system which is programmable using a computer programming language, such as C++, Java, or another language, such as a scripting language or assembly language. The computer system also may include specially programmed, special purpose hardware such as, for example, an Application-Specific Integrated Circuit (ASIC). In a general purpose computer system, the processor typically is a commercially available processor, of which the x86-series, Celeron, and Pentium processors available from Intel and similar devices from AMD and Cyrix, the 680X0-series microprocessors available from Motorola, the PowerPC microprocessor from IBM, and the Alpha-series processors from Digital Equipment Corporation are examples. Many other processors are available. Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management, memory management, communication control and related services. The processor and operating system define a computer platform for which application programs in high-level programming languages may be written.
A memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory and tape are examples. The disk may be removable, such as a "floppy disk," or permanent, known as a hard drive. A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). The integrated circuit memory element typically allows for faster access to the information by the processor than does the disk. The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed. A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the subject matter disclosed herein is not limited to such mechanisms. Further, the subject matter disclosed herein is not limited to a particular memory system.
The subject matter disclosed herein is not limited to a particular computer platform, particular processor, or particular high-level programming language.
Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g., 110, 120) in FIG. 1 may be a separate module of a computer program or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g., 104, 106, 110, 114 and 116) may be stored in a memory system or transmitted between computer systems. The subject matter disclosed herein is not limited to any particular implementation using software, hardware, or firmware, or any combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.
Various steps of the process may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.
Referring to FIG. 4, a system 400 for sequencing polymers is shown. The system 400 includes a polymer database 402, which includes a plurality of records storing information corresponding to a plurality of polymers. Each of the records may store information about properties of the corresponding polymer, properties of the corresponding polymer's constituent chemical units, or both. The polymers for which information is stored in the polymer database 402 may be any kind of polymers.
For example, the polymers may include polysaccharides, nucleic acids, or polypeptides. In one embodiment, each of the records in the polymer database 402 includes a polymer identifier (ID) that identifies the polymer corresponding to the record. The record also includes chemical unit identifiers (IDs) corresponding to chemical units that are constituents of the polymer corresponding to the record. Polymers may be represented in the polymer database in other ways. For example, records in the polymer database 402 may include only a polymer ID or may only include chemical unit IDs.
The polymer database 402 may be any kind of storage medium capable of storing information about polymers as described herein. For example, the polymer database 402 may be a flat file, a relational database, a table in a database, an object or structure in a computer-readable volatile or non-volatile memory, or any data accessible to a computer program, such as data stored in a resource fork of an application program file on a computer-readable storage medium.
In one embodiment, a polymer ID includes a plurality of fields for storing information about properties of the polymer corresponding to the record containing the polymer ID. Similarly, in one embodiment, chemical unit IDs include a plurality of fields for storing information about properties of the chemical unit corresponding to the chemical unit ID. Although the following description refers to the fields of chemical unit IDs, such description is equally applicable to the fields of polymer IDs.
The fields of chemical unit IDs may store any kind of value that is capable of being stored in a computer readable medium, such as a binary value, a hexadecimal value, an integral decimal value, or a floating point value. The fields may store information about any properties of the corresponding chemical unit.
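One way to model such records in memory, as a hedged sketch: each record holds an optional polymer ID and a tuple of chemical unit IDs, here 5-bit integers in the Table 2 layout. The class, its field names and the example ID `"example-1"` are hypothetical; the optional defaults reflect that a record may carry only a polymer ID or only chemical unit IDs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PolymerRecord:
    """Hypothetical record: an optional polymer ID plus the IDs of its chemical units."""
    polymer_id: Optional[str] = None
    unit_ids: Tuple[int, ...] = ()

    def composition(self):
        """Count how often each chemical unit ID occurs in the polymer."""
        return Counter(self.unit_ids)


# DD7DAD-7 with Table 2 codes: D = 01101, 7 = 00111, A = 01010, -7 = 10111.
record = PolymerRecord("example-1", (0b01101, 0b01101, 0b00111, 0b01101,
                                     0b01010, 0b01101, 0b10111))
print(dict(record.composition()))
```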
A compositional analyzer 408 receives as input a sample polymer 406 and generates as output polymer composition data 410 that is descriptive of the composition of the sample polymer. A compositional analyzer as used herein is any type of equipment or experimental procedure that may be used to identify a property of a polymer modified by an experimental constraint, such as those described above.
These include, for instance, but are not limited to, capillary electrophoresis, mass spectrometry, and chromatography. The polymer composition data 410 includes information about the sample polymer 406, such as the properties of the chemical units in the sample polymer 406 and the number of chemical units in the sample polymer 406. A sequencer 412 generates a candidate list 416 of a subpopulation of polymers that might match the sample polymer 406 in the process of sequencing the sample polymer 406, using information contained in a mass line 414 and the polymer database 402. A candidate list is also referred to herein as a "population" of polymers. At the end of the sequencing process, the candidate list 416 contains zero or more polymers that correspond to the sample polymer 406. A subpopulation of polymers is defined as a set of polymers having at least two properties in common with a sample polymer. It is useful to identify subpopulations of polymers in order to have an information set with which to compare the sample polymer 406.
Consider, for example, the sequence DD7DAD-7, which is a tetradecasaccharide (14 mer) of HLGAG containing 20 sulfate groups. The compositional analyzer 408 may, for example, perform compositional analysis of DD7DAD-7 by degrading the sequence to its disaccharide building blocks and analyzing the relative abundance of each unit using capillary electrophoresis to generate the polymer composition data 410.
The polymer composition data 410 in this case would show a major peak corresponding to ΔD, a peak about 1/2 the size of the major peak corresponding to Δ7, and another peak about 1/4 the size of the major peak corresponding to ΔA. Note that the Δ sign is used because degradation by heparinase creates a double bond between the C4 and C5 atoms in the uronic acid ring, thereby leading to the loss of the iduronic vs. glucuronic acid information. From the polymer composition data 410, it may be inferred that there are 4 ΔDs, 2 Δ7s and one ΔA in the sequence.
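The inference from peak sizes to unit counts can be written as a short calculation, assuming that peak size is proportional to the number of copies of each disaccharide and that the total number of disaccharide units (seven, for a 14-mer) is known. The relative sizes 1, 1/2 and 1/4 are those described above; `counts_from_peaks` is a hypothetical helper, and its rounding step assumes the measured ratios are close to whole-number proportions.

```python
def counts_from_peaks(relative_peaks, total_units):
    """Scale relative peak sizes so they sum to the known number of units, then round."""
    total_signal = sum(relative_peaks.values())
    counts = {unit: round(size / total_signal * total_units)
              for unit, size in relative_peaks.items()}
    if sum(counts.values()) != total_units:
        raise ValueError("peak ratios do not resolve to whole-number counts")
    return counts


peaks = {"ΔD": 1.0, "Δ7": 0.5, "ΔA": 0.25}
print(counts_from_peaks(peaks, total_units=7))
```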
Referring to FIG. 5, a process 500 that may be performed by the sequencer 412 to sequence the sample polymer 406 is shown. The sequencer 412 receives the polymer composition data 410 from the compositional analyzer 408. The sequencer 412 uses the polymer composition data 410 and the information contained in the polymer database 402 to generate an initial candidate list 416 of all possible polymers (1) having the same length as the sample polymer 406 and (2) having the same constituent chemical units as the sample polymer 406 (step 504).
For example, consider the sequence DD7DAD-7 mentioned above. The polymer composition data 410 indicates that the sequence includes 4 ΔDs, 2 Δ7s and one ΔA, and indicates that the length of the sample polymer 406 is seven. In this case, step 504 (generation of the candidate list 416) involves generating all possible sequences having the same length as the sample polymer 406 and having 4 ΔDs, 2 Δ7s and one ΔA. In one embodiment, the sequencer 412 uses a brute force method to generate all sequences having these characteristics by generating all sequences of length seven having 4 ΔDs, 2 Δ7s and one ΔA using standard combinatoric methods.
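A brute-force sketch of step 504: generate each distinct ordering of the measured composition. Only the composition is taken from the example above; the function name is hypothetical. Generating all permutations and de-duplicating with a set is simple but wasteful here (7! = 5040 orderings collapse to 7!/(4!·2!·1!) = 105 distinct sequences); a dedicated multiset-permutation routine would avoid producing the duplicates.

```python
from itertools import permutations


def candidate_sequences(composition):
    """All distinct sequences whose units occur with exactly the given counts."""
    units = [unit for unit, count in composition.items() for _ in range(count)]
    return sorted(set(permutations(units)))


composition = {"ΔD": 4, "Δ7": 2, "ΔA": 1}
candidates = candidate_sequences(composition)
print(len(candidates))
```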
The sequencer 412 then uses the data from the mass line 414 to progressively eliminate sequences from the list generated in step 504 until the number of sequences in the list reaches a predetermined threshold (e.g., one). To perform such elimination, in one embodiment, the sequencer 412 calculates the value of a predetermined property of each of the polymers in the candidate list 416 (step 506). The predetermined property may, for example, be the mass of the polymer. An example method for calculating the mass of a polymer is described in more detail below. The sequencer 412 compares the calculated values of the predetermined property of the polymers in the candidate list 416 to the value of the predetermined property of the sample polymer 406 (step 508).
The sequencer 412 eliminates from the candidate list 416 those candidate polymers whose predetermined property values do not match the value of the predetermined property of the sample polymer 406 within a predetermined range (step 508). For example, if the predetermined property is molecular weight, the predetermined range may be 1.5D.
The sequencer 412 applies an experimental constraint to the sample polymer 406 to modify the sample polymer 406 (step 510). An "experimental constraint" as used herein is a biochemical process performed on a polymer that results in a detectable modification of the polymer. Experimental constraints include, but are not limited to, enzymatic digestion (e.g., with an exoenzyme, an endoenzyme, or a restriction endonuclease); chemical digestion; chemical modification; interaction with a binding compound; chemical peeling (i.e., removal of a monosaccharide unit); and enzymatic modification, for instance sulfation at a particular position with a heparan sulfate sulfotransferase.
The sequencer 412 measures properties of the modified sample polymer 406 (step 512). The sequencer 412 eliminates from the candidate list 416 those candidate polymers having property values that do not match the property values of the experimental results 422 (step 514).
If the size of the candidate list 416 is less than a predetermined threshold (e.g., 1) (step 516), then the sequencer 412 is done (step 518). The contents of the candidate list 416 at this time represent the results of the sequencing process. The candidate list 416 may contain zero or more polymers, depending upon the contents of the polymer database 402 and the value of the predetermined threshold. If the size of the candidate list 416 is not less than the predetermined threshold (step 516), steps 510-516 are repeated until the size of the candidate list 416 falls below the predetermined threshold.
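The elimination loop of steps 506-518 can be sketched generically. In this minimal sketch, `sequence_by_elimination` and its parameters are hypothetical names. Each experimental constraint is modeled as a pair made of a function that simulates the experiment on a candidate and the observed result. The loop stops once the list has reached the threshold (here "at most `threshold` candidates", matching "reaches a predetermined threshold" in step 506's description) or the constraints run out. The 1.5 D default tolerance is the example range given above.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def sequence_by_elimination(
    candidates: Sequence[T],
    predict_mass: Callable[[T], float],
    observed_mass: float,
    constraints: Sequence[Tuple[Callable[[T], Hashable], Hashable]],
    tolerance: float = 1.5,
    threshold: int = 1,
) -> List[T]:
    """Eliminate candidates by mass, then by one experimental constraint at a time."""
    # Steps 506-508: keep candidates whose computed mass lies within `tolerance`.
    remaining = [c for c in candidates
                 if abs(predict_mass(c) - observed_mass) <= tolerance]
    # Steps 510-516: each constraint pairs a simulation of the experiment with
    # the observed result; candidates whose simulated result differs are dropped.
    for simulate, observed in constraints:
        if len(remaining) <= threshold:  # threshold reached: done (step 518)
            break
        remaining = [c for c in remaining if simulate(c) == observed]
    return remaining

# Toy usage with made-up masses and a made-up constraint.
masses = {"AB": 10.0, "BA": 10.0, "AA": 8.0, "BB": 12.0}
first_unit_is_a = (lambda seq: seq[0], "A")
print(sequence_by_elimination(list(masses), masses.__getitem__, 10.4, [first_unit_is_a]))
# ['AB']
```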
When the sequencer 412 is done (step 518), the sequencer 412 may, for example, display the candidate list 416 to the user on an output device such as a computer monitor.
Referring to FIG. 6, in another embodiment, the sequencer 412 uses a genetic algorithm process 600 to generate the initial candidate list 416 and to modify the candidate list 416 in order to arrive at a final candidate polymer that identifies the sequence of the sample polymer 406. The sequencer 412 generates a population of random sequences having the composition indicated by the polymer composition data 410 and the same length as the sample polymer 406 (step 602). The sequencer evaluates the fitness (score) of the polymers in the candidate list 416 using a scoring function based on degradation by an enzyme ENZ (step 604). The genetic algorithm process 600 uses the fitness values to decide which of the sequences in the candidate list 416 survive into the next generation and which have the highest chance of producing other sequences of equal or higher fitness through cross-over and mutation. The sequencer 412 then performs cross-over and mutation operations that carry fit sequences in the candidate list 416 into the next generation (step 606). If at least a predetermined number (e.g., three) of generations of the candidate list 416 include copies of the correct sequence with the maximum fitness (step 608), then the sequencer 412 is done sequencing. Otherwise, the sequencer 412 repeats steps 604-606 until the condition of step 608 is satisfied. Genetic algorithms use cross-over and mutation operations to randomly sample different regions of the search space.
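A compact sketch of process 600 follows, under several assumptions that are not in the text. The names `genetic_search`, `crossover` and `mutate` are hypothetical. Crossover keeps a prefix of one parent and fills the rest in the order of the other parent, so every child keeps the sample's composition. Mutation swaps two units, and the fittest half of each generation survives. "Maximum fitness for a predetermined number of generations" is read as "the top sequence has reached `max_fitness` for `stable` consecutive generations". The scoring function is passed in; the toy score in the usage example compares against a known answer and stands in for the enzyme-based score described above.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

def crossover(a: List[str], b: List[str], rng: random.Random) -> List[str]:
    """Keep a random prefix of parent `a`, then append the units still needed
    in the order they occur in parent `b`, so the child keeps the composition."""
    cut = rng.randrange(1, len(a))
    child = a[:cut]
    needed = Counter(a) - Counter(child)
    for unit in b:
        if needed[unit] > 0:
            child.append(unit)
            needed[unit] -= 1
    return child

def mutate(seq: List[str], rng: random.Random) -> List[str]:
    """Swap two positions; this also preserves the composition."""
    i, j = rng.sample(range(len(seq)), 2)
    child = list(seq)
    child[i], child[j] = child[j], child[i]
    return child

def genetic_search(units: Sequence[str], fitness: Callable[[List[str]], float],
                   max_fitness: float, pop_size: int = 10, generations: int = 200,
                   stable: int = 3, mutation_rate: float = 0.3, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    # Step 602: random sequences with the sample's composition and length.
    population = [rng.sample(list(units), len(units)) for _ in range(pop_size)]
    streak = 0
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)           # step 604
        streak = streak + 1 if fitness(population[0]) >= max_fitness else 0
        if streak >= stable:                                  # step 608
            break
        parents = population[: pop_size // 2]                # fittest half survives
        children = []
        while len(parents) + len(children) < pop_size:       # step 606
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy usage: this score knows the answer and is for illustration only.
target = ["+D", "+D", "+D", "-5"]

def toy_fitness(seq: List[str]) -> int:
    return sum(x == y for x, y in zip(seq, target))

best = genetic_search(["-5", "+D", "+D", "+D"], toy_fitness, max_fitness=4)
print(best)
```

Because the surviving half always includes the current best sequence, the best fitness never decreases from one generation to the next.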
In one embodiment, steps 510 and 512 are automated (e.g., carried out by a computer). For example, after the initial candidate list 416 has been generated (step 508), the sequencer 412 may divide the candidate list 416 into categories, preferably based on properties, such as hepI-cleavable, hepIII-cleavable, and nitrous acid-cleavable (here the property is enzymatic sensitivity). The sequencer 412 may then simulate the corresponding degradation or modification of the sequences present in each of the categories and search for those sequences that give fragments of unique masses.
Based on the population of sequences that can give fragments of unique masses upon degradation or modification, the sequencer 412 chooses the particular enzyme or chemical to use as the experimental constraint for eliminating candidate polymers from the candidate list 416 (step x). Although only hepI, hepIII, and nitrous acid are used in this example, other experimental constraints may be used, including exoenzymes and other HLGAG-degrading enzymes and chemicals. In another embodiment, the sequencer 412 uses a chemical characteristic to guide the choice of experimental constraint. For example, normalized frequencies of chemical units of known polymers containing I2S, G, HNS, and HNAc may be calculated.
For example, the normalized frequency f(I2S) of chemical units containing I2S may be calculated as f(I2S) = (number of disaccharide units containing I2S) / (total number of disaccharide units). An example set of normalized frequencies calculated for known sequences in this way is shown in Table 6 below.
Table 6
Sequence | f(I2S) | f(G) | f(HNS) | f(HNAc) | Constraints used for convergence
Octa2 (DDD-5) | 0.75 | 0.25 | 1 | 0 | Hep I and Hep III degradation
FGF binding (DDDDD) | 1 | 0 | 1 | 0 | Hep I normal and exhaustive degradation
ATIII binding (DDD4-7) | 0.6 | 0.2 | 0.8 | 0.2 | Hep I, Hep II and nitrous acid degradation

The "constraints used for convergence" column indicates constraints that have been shown empirically to achieve convergence for the corresponding known sequence.
Once compositional analysis has been performed on a sample (unknown) polymer, the relative frequencies of I2S, G, HNS, and HNAc in the sample may be compared to those of the known sequences in the table above in order to select a set of experimental constraints to apply to the sample polymer. A known sequence whose relative frequencies are similar to those of the sample polymer may then be selected, and the experimental constraints listed for that sequence in the table may then be applied to the sample polymer.
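The frequency calculation and the table lookup can be sketched together. In this minimal sketch, the names `frequencies`, `suggest_constraints` and `KNOWN` are hypothetical. It assumes that each hexadecimal unit symbol encodes 2-O sulfate, 6-O sulfate, 3-O sulfate and N-sulfate bits (as the binary codes used in the mass formula below imply) and that a "-" sign marks glucuronic acid. The text only says the profiles should be "similar", so measuring similarity by Euclidean distance is an assumption.

```python
import math
import re
from typing import Tuple

UNIT = re.compile(r"([+\-±]?)([0-9A-F])")  # sign + hex symbol (bits 2S, 6S, 3S, NS)

def frequencies(sequence: str) -> Tuple[float, float, float, float]:
    """Return (f(I2S), f(G), f(HNS), f(HNAc)) per disaccharide unit."""
    units = [(sign, int(digit, 16)) for sign, digit in UNIT.findall(sequence)]
    n = len(units)
    i2s = sum(1 for sign, code in units if code & 0b1000 and sign != "-")
    g = sum(1 for sign, _ in units if sign == "-")
    hns = sum(1 for _, code in units if code & 0b0001)
    return i2s / n, g / n, hns / n, (n - hns) / n

# Table 6: frequency profiles of known sequences and the constraints that converged.
KNOWN = {
    "Octa2": ((0.75, 0.25, 1.0, 0.0), "Hep I and Hep III degradation"),
    "FGF binding": ((1.0, 0.0, 1.0, 0.0), "Hep I normal and exhaustive degradation"),
    "ATIII binding": ((0.6, 0.2, 0.8, 0.2), "Hep I, Hep II and nitrous acid degradation"),
}

def suggest_constraints(sample: Tuple[float, float, float, float]) -> Tuple[str, str]:
    """Pick the known sequence whose profile is closest (Euclidean distance)."""
    name = min(KNOWN, key=lambda k: math.dist(sample, KNOWN[k][0]))
    return name, KNOWN[name][1]

print(frequencies("±DDD4-7"))                     # (0.6, 0.2, 0.8, 0.2)
print(suggest_constraints((0.7, 0.2, 0.9, 0.1)))  # ('Octa2', 'Hep I and Hep III degradation')
```

Because each frequency is a count divided by the chain length, the profile depends only on the composition, not on the order of the units. The composition is what compositional analysis provides.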


For example, Table 6 demonstrates that f(G) and f(HNAc) are important factors in the decision to use hepIII and nitrous acid, because nitrous acid clips after an HNS and hepIII clips after a disaccharide unit containing G. The disaccharide unit I2S-HNS,6S is the dominant unit in the heparin-like (i.e., highly sulfated) regions of HLGAG chains. Therefore, if a sequence is more heparin-like, hepI may be chosen as the default enzyme, and the information content of chemical units containing G and HNAc becomes important for choosing enzymes and chemicals other than hepI. Similarly, for low-sulfated regions of HLGAG chains, hepIII may be the default enzyme, and f(I2S) and f(HNS) become important for choosing hepI and nitrous acid. One may likewise calculate the positional sulfate or acetate distribution along the chain and derive criteria for using sulfotransferases or sulfatases to achieve convergence.
The polymer database 402 may include information indicating that sulfation at a position of a polymer contributes 80.06D to the mass of the polymer and that substitution of a sulfate for an acetate contributes an additional 38.02D to the mass of the polymer.
Therefore, the mass M of any polymer in the polymer database 402 may be calculated using the following formula:
$$M = 379.33 + [0\;\; 80.06\;\; 80.06\;\; 80.06\;\; 38.02] \cdot C,$$

where C is the vector containing the binary representation of the polymer and · denotes the dot product. For example, the mass of the disaccharide unit I2S-HNS,6S, which has the binary representation 01101, is

$$M = 379.33 + [0\;\; 80.06\;\; 80.06\;\; 80.06\;\; 38.02] \cdot [0\;\; 1\;\; 1\;\; 0\;\; 1] = 379.33 + 0 + 80.06 + 80.06 + 0 + 38.02 = 577.47\ \mathrm{D}.$$
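The mass formula translates directly into a dot product over the five bits. In this sketch, `unit_mass`, `WEIGHTS` and `BASE_MASS` are hypothetical names. The meaning of each bit position is inferred from the example codes rather than stated in the text.

```python
# Bit weights (daltons) from the formula above. The bit order is inferred from
# the example codes: uronic acid (no mass change), 2-O sulfate, 6-O sulfate,
# 3-O sulfate, N-sulfate (replacing N-acetyl).
WEIGHTS = (0.0, 80.06, 80.06, 80.06, 38.02)
BASE_MASS = 379.33

def unit_mass(code: str) -> float:
    """Mass of one disaccharide unit from its 5-bit binary code, e.g. '01101'."""
    if len(code) != 5 or set(code) - {"0", "1"}:
        raise ValueError(f"expected a 5-bit binary string, got {code!r}")
    bits = [int(ch) for ch in code]
    return BASE_MASS + sum(w * b for w, b in zip(WEIGHTS, bits))  # dot product

print(f"{unit_mass('01101'):.2f}")  # 577.47 (I2S-HNS,6S)
print(f"{unit_mass('00101'):.2f}")  # 497.41 (disulfated U-HNS,6S)
```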
HLGAG fragments may be degraded using enzymes such as heparin lyase enzymes or nitrous acid, and they may also be modified using different enzymes that transfer sulfate groups to the positions mentioned earlier or remove sulfate groups from those positions. The modifying enzymes are exolytic and non-processive, which means that they act only once on the non-reducing end and then release the heparin chain without sequentially modifying the rest of the chain. For each of the modifiable positions in the disaccharide unit there exists a modifying enzyme. An enzyme that adds a sulfate group is called a sulfotransferase, and an enzyme that removes a sulfate group is called a sulfatase. The modifying enzymes include 2-O sulfatase/sulfotransferase, 3-O sulfatase/sulfotransferase, 6-O sulfatase/sulfotransferase and N-deacetylase-N-sulfotransferase. The function of these enzymes is evident from their names; for example, a 2-O sulfotransferase transfers a sulfate group to the 2-O position of an iduronic acid (2-O sulfated glucuronic acid is a rare occurrence in HLGAG chains), and a 2-O sulfatase removes the sulfate group from the 2-O position of an iduronic acid.
HLGAG degrading enzymes include heparinase I, heparinase II, heparinase III, D-glucuronidase and L-iduronidase. The heparinases cleave at the glycosidic linkage before a uronic acid. Heparinase I clips at a glycosidic linkage before a 2-O sulfated iduronic acid. Heparinase III cleaves at a glycosidic linkage before an unsulfated glucuronic acid. Heparinase II cleaves at both heparinase I- and heparinase III-cleavable sites. After cleavage by the heparinases, the uronic acid before which the cleavage occurs loses the iduronic vs. glucuronic acid information, because a double bond is created between its C4 and C5 atoms.
Glucuronidase and iduronidase, as their names suggest, cleave at the glycosidic linkage after a glucuronic acid and an iduronic acid, respectively. Nitrous acid clips randomly at glycosidic linkages after an N-sulfated hexosamine and converts the six-membered hexosamine ring to a five-membered anhydromannitol ring.
The above rules for the enzymes may easily be encoded into a computer using binary arithmetic, as described above, so that the activity of an enzyme on a sequence can be simulated with simple binary operators to give the fragments that the enzymatic activity would form.
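The heparinase rules can be encoded with a bit test on each unit. This is a minimal sketch under stated assumptions. The helper names are hypothetical. Units are written as an optional sign (+ iduronic, - glucuronic, ± unresolved; unsigned internal units are treated as iduronic) followed by a hexadecimal symbol whose bits are 2S, 6S, 3S and NS. "Unsulfated" glucuronic acid is checked only through the 2-O sulfate bit. The loss of iduronic/glucuronic information at a newly formed non-reducing end is not modeled.

```python
import re
from typing import Callable, List

UNIT = re.compile(r"([+\-±]?)([0-9A-F])")
TWO_O_SULFATE = 0b1000  # 2-O sulfate bit of the hexadecimal unit symbol

def hep_i_site(sign: str, digit: str) -> bool:
    """Heparinase I: cleaves before a 2-O-sulfated iduronic acid."""
    return sign != "-" and (int(digit, 16) & TWO_O_SULFATE) != 0

def hep_iii_site(sign: str, digit: str) -> bool:
    """Heparinase III: cleaves before an unsulfated glucuronic acid."""
    return sign == "-" and (int(digit, 16) & TWO_O_SULFATE) == 0

def hep_ii_site(sign: str, digit: str) -> bool:
    """Heparinase II: cleaves at both heparinase I and heparinase III sites."""
    return hep_i_site(sign, digit) or hep_iii_site(sign, digit)

def digest(sequence: str, is_site: Callable[[str, str], bool]) -> List[str]:
    """Split a sequence (non-reducing end first) at every cleavable linkage."""
    units = UNIT.findall(sequence)
    fragments, current = [], [units[0]]
    for sign, digit in units[1:]:
        if is_site(sign, digit):  # the linkage before this unit is cleaved
            fragments.append(current)
            current = []
        current.append((sign, digit))
    fragments.append(current)
    return ["".join(sign + digit for sign, digit in frag) for frag in fragments]

print(digest("±DDD-5", hep_i_site))    # ['±D', 'D', 'D-5']
print(digest("±DDD-5", hep_iii_site))  # ['±DDD', '-5']
```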
These techniques may be used to construct a database of polysaccharide sequences. In some aspects, the invention is a database of polysaccharide sequences, together with motif search and sequence alignment algorithms for obtaining valuable information about the nature of the polysaccharide-protein interactions that are vital for the biological functioning of these molecules. The sequence information in the database of polysaccharide sequences may also be used to provide valuable insight into the sequence-structure relationships of these molecules.
In addition to the use of the methods of the invention for sequencing polymers, the methods may be used for any purpose in which it is desirable to identify structural properties related to a polymer. For instance, the methods of the invention may be used for analysis of low molecular weight heparin (LMWH). By limited digestion of LMWH and analysis by CE and MALDI-MS, we may obtain a "digest spectrum" of various preparations of LMWH, thus deriving information about their composition and its variations. Such information is of value for quality control of LMWH preparations.
The methods are also useful for understanding the role of HLGAGs in fundamental biological processes. MS has already been used to look at the presence of various proteins as a function of time in Drosophila development. In a similar fashion, HLGAG expression can be examined as a function of both position and time in Drosophila development. The methods may also be used as a diagnostic tool for human diseases. There is a group of human diseases called mucopolysaccharidoses (MPS), whose molecular basis lies mostly in the degradation pathway for HLGAGs.
For instance, mucopolysaccharidosis type I involves a defect in iduronidase, which clips unsulfated iduronate residues from HLGAG chains. Similarly, persons suffering from mucopolysaccharidosis type II (MPS II) lack iduronate-2-sulfatase. In each of these disorders, marked changes occur in the composition and sequence of cell surface HLGAGs. Our methodology could be used as a diagnostic for these disorders to identify which MPS syndrome a patient is suffering from.
Additionally, the methods of the invention are useful for mapping protein-binding HLGAG sequences. Analogous to fingerprinting DNA, the MALDI-MS sequencing approach may be used to specifically map HLGAG sequences that bind to selected proteins. This is achieved by sequencing the HLGAG chain both in the presence and in the absence of a target protein. Sequences protected from digestion in the presence of the protein are indicative of sequences that bind with high affinity to the target protein.
The methods of the invention may be used to analyze branched or unbranched polymers. Analysis of branched polymers is more difficult than analysis of unbranched polymers because branched carbohydrates are "information dense" molecules.
Branched polysaccharides include a few building blocks that can be combined in several different ways, thereby coding for many sequences. For instance, a trisaccharide can, in theory, give rise to over 6 million different sequences. Methods for analyzing branched polysaccharides in particular are advanced by the creation of an efficient nomenclature that is amenable to computational manipulation, and such a nomenclature for branched sugars has been developed according to the invention. Two types of numerical schemes that may encode the sequence information of these polysaccharides have been developed in order to bridge the widely used graphic (pictorial) representation and the proposed numerical scheme discussed below.
a. Byte-based (Binary-scheme) notation scheme: The first notation scheme is based on a binary numerical system. The binary representation, in conjunction with a tree-traversing algorithm, is used to represent all the possible combinations of the branched polysaccharides. The nodes (branch points) are easily amenable to computational searching through tree-traversing algorithms (Figure 7A). Figure 7A shows a notation scheme for branched sugars. Each monosaccharide unit can be represented as a node (N) in a tree. The building blocks can be defined as either (A), (B), or (C), where N1, N2, N3, and N4 are individual monosaccharides. Each of these combinations can be coded numerically to represent building blocks of information. By defining glycosylation patterns in this way, several tree traversal and searching algorithms from computer science may be applied to solve this problem.
A simpler version of this notation scheme is shown in Figure 7B. This simplified version may be extended to include all other possible modifications, including unusual structures. For example, an N-linked glycan in vertebrates contains a core region (the tri-mannosyl chitobiose moiety) and up to four branched chains extending from the core. In addition to the branched chains, the notation scheme also covers other modifications (such as addition of fucose to the core, fucosylation of the GlcNAc in the branches, or sialic acid on the branches). Thus, the superfamily of N-linked polysaccharides can be broadly represented by three modular units: (a) the core region: regular, fucosylated and/or bisected with a GlcNAc; (b) the number of branches: up to four branched chains, each with GlcNAc, Gal and Neu; and (c) modifications of the branch sugars. These modular units may be systematically combined to generate all possible combinations of the polysaccharide. The branches, and the sequences within them, can be represented as an n-bit binary code (0 and 1), where n is the number of monosaccharides in the branch. Figure 7C depicts a binary code containing the entire information regarding the branch. Since up to four branches are possible and each branch can be represented by a 3-bit binary code, a total of 12 binary bits is needed. The first bit represents the presence (binary 1) or absence (binary 0) of the GlcNAc residue adjoining the mannose. The second and third bits similarly represent the presence or absence of the Gal and Neu residues in the branch. Hence a complete chain containing GlcNAc-Gal-Neu is represented as binary 111, which is equivalent to decimal 7. The four branches can then be represented by a 4-digit decimal code, with the 1st digit for the first branch, the 2nd digit for the second branch, and so on (right).
This simple binary code does not contain the information regarding the linkage (α vs. β, and 1-6 vs. 1-3, etc.) to the core. This type of notation scheme, however, may easily be expanded to include additional bits for branch modifications. For instance, the presence of a 2-6 branched neuraminic acid on the GlcNAc in the branch can be encoded by an additional binary bit.
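The branch encoding can be sketched with shifts and ORs. In this sketch, `branch_code` and `encode_branches` are hypothetical names. Packing branch 1 into the most significant bits of the 12-bit value is an assumption, since the text fixes only the per-branch bit order (GlcNAc, Gal, Neu) and the digit order of the decimal code.

```python
from typing import Sequence, Tuple

def branch_code(glcnac: bool, gal: bool, neu: bool) -> int:
    """3-bit code for one branch: GlcNAc, Gal, Neu (most significant bit first)."""
    return (int(glcnac) << 2) | (int(gal) << 1) | int(neu)

def encode_branches(codes: Sequence[int]) -> Tuple[int, str]:
    """Pack up to four 3-bit branch codes into a 12-bit integer and a 4-digit decimal code."""
    if len(codes) > 4 or any(not 0 <= c <= 7 for c in codes):
        raise ValueError("expected at most four branch codes in the range 0-7")
    padded = list(codes) + [0] * (4 - len(codes))  # missing branches are absent (0)
    bits = 0
    for code in padded:  # branch 1 ends up in the most significant bits
        bits = (bits << 3) | code
    return bits, "".join(str(code) for code in padded)

full = branch_code(True, True, True)  # GlcNAc-Gal-Neu -> 0b111 == 7
bits, decimal = encode_branches(
    [full, branch_code(True, True, False), branch_code(True, False, False)])
print(format(bits, "012b"), decimal)  # 111110100000 7640
```

Extending the scheme with an extra modification bit, as suggested above, would only change the per-branch width from 3 to 4 bits in the shift.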
b. Prime Decimal Notation Scheme: Similar to the binary notation described above, a second computationally friendly numerical system, which involves the use of a prime number scheme, has been developed. The algebra of prime numbers is extensively used in areas of encoding, cryptography and computational data manipulations.
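The scheme relies on the fact that every integer greater than 1 has a unique set of prime divisors (the fundamental theorem of arithmetic); for the small numbers involved here, this factorization is also quick to compute. In this way, composition information may be rapidly and accurately analyzed.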
This scheme is illustrated by the following example: the prime numbers 2, 3, 5, 7, 11, 13, 17, 19, and 23 are assigned to nine common building blocks of polysaccharides. The composition of a polysaccharide chain may then be represented as the product of the primes that represent each of its building blocks.
For illustration, GlcNAc is assigned the number 3 and mannose the number 2. The core (3 mannose and 2 GlcNAc residues) is then represented in this scheme as 2 × 2 × 2 × 3 × 3 = 72. This notation relies on the mathematical principle that 72 can ONLY be expressed as the product of three 2s and two 3s. The prime divisors are therefore unique and can encode the composition information. This becomes a problem for very large numbers, but not for the size of numbers encountered in this analysis.
From this number the mass of the polysaccharide chain can be determined.
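Encoding, decoding and the mass calculation can be sketched in a few lines. In this sketch, the names are hypothetical, and only the two prime assignments given above (mannose = 2, GlcNAc = 3) are used. The residue masses (average hexose and HexNAc residue masses) and the single added water for a free chain are assumptions made for illustration, not values from the text.

```python
from collections import Counter
from typing import Dict

PRIME = {"Man": 2, "GlcNAc": 3}  # from the text; 5, 7, ..., 23 would cover other blocks
# Assumed average residue masses (Da) plus one water for a free chain; illustrative only.
RESIDUE_MASS = {"Man": 162.14, "GlcNAc": 203.19}
WATER = 18.015

def encode(composition: Dict[str, int]) -> int:
    number = 1
    for unit, count in composition.items():
        number *= PRIME[unit] ** count
    return number

def decode(number: int) -> Dict[str, int]:
    """Recover the composition by trial division (unique by prime factorization)."""
    composition = Counter()
    for unit, p in PRIME.items():
        while number % p == 0:
            composition[unit] += 1
            number //= p
    if number != 1:
        raise ValueError("number has a prime factor with no assigned building block")
    return dict(composition)

def chain_mass(number: int) -> float:
    return WATER + sum(RESIDUE_MASS[u] * n for u, n in decode(number).items())

core = encode({"Man": 3, "GlcNAc": 2})
print(core, decode(core))  # 72 {'Man': 3, 'GlcNAc': 2}
print(round(chain_mass(core), 3))
```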
The power of the computational approaches of the notation scheme may be used to systematically develop an exhaustive list of all possible combinations of the polysaccharide sequences. For instance, an unconstrained combinatorial list of possible sequences of size m^n, where m is the number of building blocks and n is the number of positions in the chain, may be used. In Figure 7C, there are 256 different saccharide combinations that are theoretically possible (4 combinations for each branch and 4 branches, giving 4^4 = 256).

A mass line of the 256 different polysaccharide structures may be plotted.
Then the rules of biosynthetic pathways may be used to further analyze the polysaccharide. In the example shown in Figure 7B, it is known that the first step of the biosynthetic pathway is the addition of GlcNAc to the 1-3-linked chain (branch 1). Thus, branch 1 must be present for any of the other branches to exist. Based on this rule, the 256 possible combinations may be reduced using a factorial approach, since branches 2, 3, and 4 can exist only if branch 1 is non-zero. Similar constraints can be incorporated at the notation level before generation of the master list of ensembles. With the notation scheme in place, experimental data can be generated (such as MALDI-MS, CE or chromatography), and those sequences that do not satisfy these data can be eliminated. An iterative procedure therefore enables rapid convergence to a solution.
To identify branching patterns, a combination of MALDI-MS and CE (or other techniques) may be used, as shown in the Examples. Elimination of the pendant arms of the branched polysaccharide may be achieved by the judicious use of exo- and endoenzymes. All antennary groups may be removed, retaining only the GlcNAc moieties extending from the mannose core and forming an "extended" core. In this way, information about branching is retained, while separation and identification of glycoforms is made simpler. One methodology that could be employed to form extended cores for most polysaccharide structures is the following. Addition of sialidases and fucosidases will remove capping and branching groups from the arms. Then application of endo-β-galactosidase will cleave the arms back to the extended core. For more unusual structures, other exoglycosidases are available, for instance xylases and glucosidases. By addition of a cocktail of degradation enzymes, any polysaccharide motif may be reduced to its corresponding "extended" core. "Extended" core structures will be identified by mass spectral analysis. There are unique mass signatures associated with an extended core motif depending on the number of pendant arms (Figure 7D).
Figure 7D shows a mass line of the "extended" core motifs generated upon exhaustive digestion of glycan structures by the enzyme cocktail. Shown are the expected masses of mono-, di-, tri- and tetraantennary structures, both with and without a fucose linked α1→6 to the core GlcNAc moiety (from left to right). All of the "extended" core structures have a unique mass signature that is easily resolved by MALDI-MS.
Quantification of the various glycan cores present may be completed by capillary electrophoresis, which has proven to be a rapid and sensitive means for quantifying polysaccharide structures [Kakehi, K. and S. Honda, Analysis of glycoproteins, glycopeptides and glycoprotein-derived polysaccharides by high-performance capillary electrophoresis. J Chromatogr A, 1996. 720(1-2): p. 377-93.]

Examples
Example 1: Identification of the number of fragments versus the fragment mass for di-, tetra-, and hexasaccharides.
The masses of all the possible disaccharide, tetrasaccharide and hexasaccharide fragments were calculated and are shown in the mass line shown in Figure 8.
The X axis shows the different possible masses of the di-, tetra- and hexasaccharides, and the Y axis shows the number of fragments having each particular mass. Although the tetrasaccharide and hexasaccharide mass ranges overlap considerably, the minimum difference between their masses is 13.03D. Note that the Y axis has been broken to omit values between 17 and 40, so that all the bars can be shown clearly.
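The 13.03D figure can be checked with the mass formula given earlier. This sketch assumes that the mass of an oligosaccharide is simply the sum of its disaccharide unit masses, with no end-group correction. That assumption is not stated in the text, but it does reproduce the 13.03D minimum. Masses are kept in hundredths of a dalton so that all sums are exact integers. The names are hypothetical, and the sketch computes only the distinct masses, not the fragment counts plotted on the Y axis.

```python
from itertools import combinations_with_replacement

# Masses in hundredths of a dalton so that every sum is an exact integer.
BASE = 37933                           # 379.33 D
WEIGHTS = (0, 8006, 8006, 8006, 3802)  # uronic, 2S, 6S, 3S, NS bits

def unit_mass(code: int) -> int:
    """Mass of one disaccharide unit from its 5-bit code (0-31)."""
    bits = [(code >> shift) & 1 for shift in (4, 3, 2, 1, 0)]
    return BASE + sum(w * b for w, b in zip(WEIGHTS, bits))

UNIT_MASSES = sorted({unit_mass(code) for code in range(32)})  # 8 distinct values

def oligo_masses(n_units: int) -> set:
    """Distinct masses of chains of n_units disaccharides (additive model)."""
    return {sum(c) for c in combinations_with_replacement(UNIT_MASSES, n_units)}

tetra, hexa = oligo_masses(2), oligo_masses(3)
min_gap = min(abs(h - t) for h in hexa for t in tetra)
print(f"{min_gap / 100:.2f} D")  # 13.03 D
```

The closest pair is a fully O-sulfated, N-acetylated tetrasaccharide (1239.02 D) and an O-unsulfated, fully N-sulfated hexasaccharide (1252.05 D).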

Example 2: Sequencing of an octasaccharide of HLGAG.
Using hepI, hepII, hepIII, nitrous acid, and exoenzymes such as 2-sulfatase, α-iduronidase, β-glucuronidase and N-deacetylase as experimental constraints, together with the computer algorithm described above, an octasaccharide (O2), two decasaccharides (FGF binding and ATIII binding) and a hexasaccharide sequence of HLGAG were sequenced.
1. Compositional Analysis of O2:

Compositional analysis of O2 was completed by exhaustive digestion of a 30 μM sample with heparinases I-III and analysis by capillary electrophoresis (CE).
Briefly, 200 nM of heparinases I-III in sodium phosphate buffer, pH 7.0, was added to 10 μL of polysaccharide. The reaction was allowed to proceed at 30 °C overnight. For CE analysis, the sample was brought to 25 μL. Naphthalene trisulfonic acid (2 μM) was run as an internal standard. Assignments of ΔU2S-HNS,6S and ΔU-HNS,6S were made on the basis that they comigrated with known standards. The internal standard migrated between 4 and 6 min, the trisulfated disaccharide ΔU2S-HNS,6S migrated between 6 and 8 min, and the disulfated disaccharide ΔU-HNS,6S migrated between 8 and 10 min.
Integration of the peaks indicated that the relative amounts of the two saccharides were 3:1.

The CE data for the O2 octasaccharide show a major peak corresponding to the commonly occurring trisulfated disaccharide (ΔU2S-HNS,6S) and a small peak corresponding to a disulfated disaccharide (ΔU-HNS,6S). The relative abundance of these disaccharide units obtained from the CE data shows that there are three ±Ds and one ±5. The number of possible sequences having these disaccharide units is 32. The possible combinations are shown in Table 7 below.
Possible sequences:

Table 7
(i) Heparinase I digest: sequence DDD5; fragments formed D (577), D (577), D5 (1074)
(ii) Heparinase III digest: sequence DDD-5; fragments formed DDD (1732), -5

2. Digestion of O2 with heparinase I:
Digestion of O2 was completed using both a short procedure and an exhaustive digest. "Short" digestion was defined as using 100 nM of heparinase I and a digestion time of 10 minutes. "Exhaustive" digestion was defined as overnight digestion with 200 nM enzyme. All digests were completed at room temperature. In the case of O2, both digest conditions yield the same results. Short digestion with heparinase I yields a pentasulfated tetrasaccharide (no acetyl groups) of m/z 5300.1 (1074.6) and a disaccharide of m/z 4802.6 (577.1) corresponding to a trisulfated disaccharide. This profile did not change upon exhaustive digestion of O2.
Upon treatment with heparinase I, O2 is clipped to form fragments with m/z 4802.6 and 5300.1. From the masses of these fragments, it was possible to determine uniquely that m/z 4802.6 corresponded to a trisulfated disaccharide and m/z 5300.1 to a pentasulfated tetrasaccharide. Since the disaccharide composition of the sequence was known, the only trisulfated disaccharide that may be formed is ±D, and the possible pentasulfated tetrasaccharides that may be formed are 5D, 5-D, D5 and D-5. After identification of the fragments, the next step was to arrange them to give the right sequence. Because this is cumbersome to do manually, a computer simulation was used to progressively eliminate sequences from the master list that did not fit the experimental data. Using the rule that heparinase I cleaves before an I2S, the heparinase I digestion was simulated on the computer to generate the fragments for all 32 sequences in the master list. From the list of fragments formed for each sequence, the computer was used to search for fragments that corresponded to the di- and tetrasaccharide observed in the mass spectrometry data. The sequences that gave fragments fitting the hepI mass spectrometry data are shown in Fig 8A.
It may be observed from Fig 8A that all the sequences have 3 Ds, which is consistent with the known rules for hepI digestion used to produce these fragments. It may also be observed that two arrangements give the same product profile, namely having ±5 (I-HNS,6S or G-HNS,6S) at the reducing end or having ±5 at the second position from the non-reducing end. To resolve this ambiguity, a second experimental constraint, digestion with hepIII, was used.
Table 7 provides a list of sequences that satisfy the product profiles of the hepI and hepIII digests of the octasaccharide O2. (a) shows the sequences that gave the di- and tetrasaccharide fragments observed in the mass spectrometry data; the fragments listed with their masses are those generated by computer simulation of the hepI digest. (b) shows the sequences in (a) that give the hexasaccharide fragment observed in the mass spectrometry data after hepIII digestion; the fragments and their masses were generated by computer simulation of hepIII digestion.
3. Digestion of O2 with heparinase III:
Digestion of O2 with heparinase III yielded a nonasulfated hexasaccharide of m/z 5958.7 (1731.9) and an unobserved disulfated disaccharide (inferred to conserve sulfates). Both short and exhaustive digests yielded the same profile.
Heparinase III treatment of O2 resulted in a major fragment of m/z 5958.7, which was uniquely identified as a hexasaccharide with 9 sulfate groups. The only sequence that satisfied the product profile of hepIII digestion was ±DDD-5, which is shown in Table 7. Table 7 shows that there should be a -5 (G-HNS,6S) at the reducing end. This is consistent with the rule used for hepIII digestion, i.e., hepIII clips before a G. The masses shown in the table are integers; the masses used to search for the required fragments were accurate to two decimal places. Thus it was possible to demonstrate convergence to the final sequence, starting from the list of all possible sequences and eliminating those that did not fit the experimental data. Because the starting point was the list of all sequences consistent with the composition, no possible sequence could have been missed during the analysis.
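The O2 analysis can be replayed end to end in a short simulation. The sketch below is built on several assumptions. Internal units carry an explicit + (iduronic) or - (glucuronic) sign, which the text omits for iduronic units. Fragment masses are the sum of the unit masses from the mass formula, with no end-group correction. Observed masses are matched within the 1.5 D example range. Every observed fragment must appear among the simulated ones, so the unobserved disaccharide is simply not required. All names are hypothetical. Under these rules, seven of the 32 sequences survive the heparinase I filter (the intermediate count depends on these assumptions), and heparinase III leaves only ±DDD-5.

```python
from itertools import permutations, product

BASE, O_SULFATE, N_SULFATE = 37933, 8006, 3802  # hundredths of a dalton

def unit_mass(digit: str) -> int:
    code = int(digit, 16)                   # bits: 2S, 6S, 3S, NS
    o_sulfates = bin(code >> 1).count("1")  # number of 2S/6S/3S groups
    return BASE + O_SULFATE * o_sulfates + N_SULFATE * (code & 1)

def hep_i(token: str) -> bool:    # before a 2-O-sulfated iduronic acid
    return token[0] == "+" and (int(token[1], 16) & 0b1000) != 0

def hep_iii(token: str) -> bool:  # before an unsulfated glucuronic acid
    return token[0] == "-" and (int(token[1], 16) & 0b1000) == 0

def fragment_masses(seq, is_site):
    """Masses (Da) of the fragments produced by a simulated digest."""
    masses, current = [], unit_mass(seq[0][1])
    for token in seq[1:]:
        if is_site(token):
            masses.append(current)
            current = 0
        current += unit_mass(token[1])
    masses.append(current)
    return [m / 100 for m in masses]

def master_list(composition):
    units = [u for u, k in composition.items() for _ in range(k)]
    for order in sorted(set(permutations(units))):
        for signs in product("+-", repeat=len(units) - 1):
            yield ("±" + order[0],) + tuple(s + u for s, u in zip(signs, order[1:]))

def consistent(seq, is_site, observed, tol=1.5):
    """True if every observed mass is within tol of some simulated fragment."""
    predicted = fragment_masses(seq, is_site)
    return all(any(abs(p - o) <= tol for p in predicted) for o in observed)

master = list(master_list({"D": 3, "5": 1}))
after_hep_i = [s for s in master if consistent(s, hep_i, [577.1, 1074.6])]
after_hep_iii = [s for s in after_hep_i if consistent(s, hep_iii, [1731.9])]
print(len(master), len(after_hep_i), len(after_hep_iii))  # 32 7 1
print("".join(after_hep_iii[0]).replace("+", ""))          # ±DDD-5
```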

Example 3: Sequencing of a basic fibroblast growth factor (FGF-2) binding saccharide.
MALDI-MS of a basic fibroblast growth factor (FGF-2) binding saccharide was performed to determine the mass and size of the saccharide as a complex with FGF-2 (G. Venkataraman et al., PNAS 96, 1892 (1999)). Dimers of FGF-2 bound to the saccharide (S), yielding a species with an m/z of 37,009. By subtracting the molecular weight of the FGF-2 dimer, the molecular mass of the saccharide was determined to be 2808, corresponding to a decasaccharide with 14 sulfates and an anhydromannitol at the reducing end.
1. Compositional Analysis:
Compositional analysis and CE of the FGF-2 binding saccharide were completed as described above. Compositional analysis of this sample resulted in two peaks corresponding to D (ΔU2S-HNS,6S) and D' (ΔU2S-Man6S) in the ratio 3:1. As this decasaccharide was derived by nitrous acid degradation of heparin, the uronic acid at the non-reducing end was not observed by CE (232 nm). Therefore, the non-reducing-end residue was identified as +D (I2S-HNS,6S) by sequencing with exoenzymes. The number of possible sequences with this composition is 16 (Table 8(i)). Of the 16 sequences, those that could result in the observed fragments upon heparinase I digestion of the decasaccharide are shown in Table 8(ii).

Table 8 (sections (i) and (ii)): candidate sequences, with the fragments formed and their masses.

2. Digestion with heparinase I and heparinase III:
To resolve the isomeric state of the internal uronic acid +D vs. -D, exhaustive digestion of the saccharide with heparinase I and heparinase III was performed.
Heparinase I exhaustive digestion of the saccharide results in only two species, corresponding to a trisulfated disaccharide (D) and its anhydromannitol derivative, while heparinase III did not cleave the decasaccharide at all.
Heparinase I digestion of the decasaccharide yielded a pentasulfated tetrasaccharide (m/z 5286.3) with an anhydromannitol at the reducing end and a trisulfated disaccharide of m/z 4304.0. Table 8 shows the convergence to the FGF-2 binding decasaccharide sequence, listing the sequences that satisfy the mass spectrometry product profiles of the FGF-2 binding saccharide on treatment with heparinase I (hepI).
Section (i) of Table 8 shows the master list of 16 sequences derived from compositional analysis and exoenzyme sequencing of the non-reducing end. The disaccharide unit at the non-reducing end was assigned as +D using exoenzymes, and the anhydromannitol group at the reducing end is denoted by a prime ('). The masses of the fragments resulting from digestion of the decasaccharide with heparinase I are shown in (ii), together with those sequences from (i) that satisfy the heparinase I digestion data.
Section (iii) of Table 8 shows the sequence of the decasaccharide from (ii) that satisfies the data from exhaustive digestion with heparinase I. This product profile can be obtained only if there is a hepI-cleavable site at every position in the decasaccharide, which led us to converge on the final sequence DDDDD' shown in section (iii) of Table 8. Taken together, the above data confirm the sequence of the FGF-2 binding decasaccharide to be DDDDD' [(I2SHNS,6S)4I2SMan6S].
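The exhaustive-digestion argument can be checked by brute force over the 16 candidates. A minimal sketch, assuming the simplified cleavage rule that heparinase I cuts only at I2S-containing linkages (the behavior described for heparinase I in Example 7) and a hypothetical string encoding; it reproduces the convergence from 16 candidates to one, not the intermediate Table 8(ii) list, which also uses the shorter-digest data:

```python
from itertools import product

def candidates() -> list[list[str]]:
    """The 16 candidates: +D fixed at the non-reducing end, then three internal
    D units and a reducing-end D', each either + (I2S) or - (G2S)."""
    return [["+D"] + [s + "D" for s in signs[:3]] + [signs[3] + "D'"]
            for signs in product("+-", repeat=4)]

def exhaustive_hepI(seq: list[str]) -> list[list[str]]:
    """Simplified exhaustive heparinase I digest: cut the linkage in front of every
    unit whose uronic acid is I2S. All units here are 2-O sulfated, so this reduces
    to 'cut before every + unit'."""
    fragments, current = [], [seq[0]]
    for unit in seq[1:]:
        if unit.startswith("+"):
            fragments.append(current)
            current = []
        current.append(unit)
    fragments.append(current)
    return fragments

def gives_only_disaccharides(seq: list[str]) -> bool:
    return all(len(fragment) == 1 for fragment in exhaustive_hepI(seq))

survivors = ["".join(s) for s in candidates() if gives_only_disaccharides(s)]
print(survivors)  # ["+D+D+D+D+D'"]
```

Any `-` unit stays attached to its neighbor and produces a tetrasaccharide or larger fragment, which is why only the all-iduronate sequence survives.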

Example 4: Sequencing of an AT-III binding saccharide.
An AT-III binding saccharide was used as an example of the determination of a complex sequence.
1. Compositional Analysis:

Compositional analysis and CE were completed as described above.
Compositional analysis of an AT-III binding saccharide indicated the presence of three building blocks, corresponding to ΔU2SHNS,6S (±D), ΔUHNAc,6S (±4) and ΔUHNS,3S,6S (±7), in the relative ratio of 3:1:1, respectively. The shortest polysaccharide that can be formed with this composition is a decasaccharide, consistent with the MALDI-MS data. The total number of possible tridecasulfated, singly acetylated decasaccharide sequences built from these disaccharide building blocks is 320 (Table 9).
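The figure of 320 follows if the five units are placed in every distinct order (5!/3! = 20) and every unit except the non-reducing end, whose Δ4,5 uronic acid has no I/G isomer, is given a + or - sign (2^4 = 16). A short Python sketch under that assumption, using a hypothetical string encoding; it also confirms that the composition carries 13 sulfates:

```python
from itertools import permutations, product

SULFATES = {"D": 3, "4": 1, "7": 3}  # sulfates per disaccharide unit (from the text)

def at3_master_list() -> list[str]:
    """Decasaccharides built from three D, one 4 and one 7 unit.

    The non-reducing unit carries a Δ4,5 uronic acid and so has no I/G sign;
    each of the other four units may be + (iduronate) or - (glucuronate).
    """
    orders = sorted(set(permutations("DDD47")))  # 5!/3! = 20 distinct orderings
    sequences = []
    for order in orders:
        for signs in product("+-", repeat=4):
            tail = "".join(sign + unit for sign, unit in zip(signs, order[1:]))
            sequences.append(order[0] + tail)
    return sequences

master = at3_master_list()
print(len(master))                        # 320
print(sum(SULFATES[u] for u in "DDD47"))  # 13 (tridecasulfated)
```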

Table 9. Possible sequences of the AT-III binding decasaccharide, sections (i)-(iii), with the fragments formed on digestion.

2. Digestion with heparinase I:
Digestion of this decasaccharide with heparinase I resulted in four fragments.
The major fragments include a decasulfated, singly acetylated octasaccharide (m/z 6419.7), a heptasulfated, singly acetylated hexasaccharide (m/z 5842.1), a hexasulfated tetrasaccharide (m/z 5383.1) and a trisulfated disaccharide (m/z 4805.3). Also present is a contaminant (*), a pentasulfated tetrasaccharide.
The sequence of the AT-III binding decasaccharide has been reported to be D4-7DD on the basis of NMR spectroscopy (T. Toida et al., J. Biol. Chem. 271, 32040 (1996)). Such a sequence should show the appearance of a tagged D or DD residue at the reducing end.
WO 00/65521 PCT/US00/10990
However, we found all the different experiments used in the elucidation of the decasaccharide sequence to be consistent with one another in showing a tagged 4-7 product and not a tagged D (or DD) product. Surprisingly, this saccharide did not contain the intact AT-III binding site that had been proposed. Confirmation of the proposed sequence was therefore sought through integral glycan sequencing (IGS) methodology, and the IGS result agreed with our analysis. A minor contaminant saccharide was also found.
Of the 320 possible sequences, only 52 satisfied the heparinase I digestion data (Table 9(i)). The mass spectrum of the exhaustive digest of the decasaccharide with heparinase I showed m/z values corresponding to a trisulfated disaccharide and an octasulfated hexasaccharide, further reducing the list of 52 sequences to 28 (Table 9(ii)).
3. Digestion with heparinase II:
To converge further on the sequence, a 'mass tag' was attached at the reducing end of the saccharide (Δm/z of 56.1, shown as 't'). This enabled identification of the saccharide sequence at and close to the reducing end. Typical yields for mass-tag labeling were 80-90%, as determined by CE. Treatment of the semicarbazide-tagged decasaccharide with heparinase II gave the following products:
m/z 5958.4 (nonasulfated hexasaccharide), m/z 5897.7 (tagged heptasulfated, singly acetylated hexasaccharide), m/z 5380.1 (hexasulfated tetrasaccharide), m/z 5320.9 (tagged tetrasulfated tetrasaccharide), m/z 5264.6 (tetrasulfated tetrasaccharide) and m/z 4805.0 (trisulfated disaccharide). The m/z values 5320.9 and 5897.7 corresponded to a tagged tetrasulfated tetrasaccharide and a tagged heptasulfated hexasaccharide, both containing the N-acetylglucosamine residue. This result indicated that ±4 (I/GHNAc,6S) is present at the reducing end or one unit from it, thereby limiting the number of possible sequences from 28 to 6 (Table 9(iii)).
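Tagged reducing-end fragments can be recognized in a peak list as partners of an untagged peak shifted by the tag mass. A minimal sketch; the 0.6 Da tolerance is an assumed value, and the function name is hypothetical:

```python
from itertools import combinations

TAG_SHIFT = 56.1  # semicarbazide mass tag, Δm/z given in the text

def tagged_pairs(peaks: list[float], shift: float = TAG_SHIFT,
                 tol: float = 0.6) -> list[tuple[float, float]]:
    """Return (untagged, tagged) pairs of peaks separated by the mass-tag shift."""
    pairs = []
    for low, high in combinations(sorted(peaks), 2):
        if abs((high - low) - shift) <= tol:
            pairs.append((low, high))
    return pairs

# Heparinase II products of the tagged decasaccharide (m/z, from the text)
hepII_peaks = [5958.4, 5897.7, 5380.1, 5320.9, 5264.6, 4805.0]
print(tagged_pairs(hepII_peaks))  # [(5264.6, 5320.9)]
```

Only the tetrasaccharide pair is matched here. The tagged hexasaccharide at m/z 5897.7 has no untagged partner in this list; it sits about 55.6 above the heparinase I hexasaccharide at m/z 5842.1.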
4. Digestion with nitrous acid:
Partial nitrous acid digestion of the tagged as well as the untagged decasaccharide provided no additional constraints but confirmed the heparinase II data.
Exhaustive nitrous acid digestion, however, left only the reducing-end tetrasaccharide (with and without the tag) as an unclipped product: exhaustive nitrous acid treatment of the decasaccharide gives essentially one tetrasulfated, singly acetylated anhydromannitol tetrasaccharide species (tagged, m/z 5241.5; untagged, m/z 5186.5).
This confirmed that ±4 (I/GHNAc,6S) is one unit away from the reducing end.
Sequential use of exoenzymes uniquely resolved the isomeric state of the uronic acid as +4 and the reducing-end disaccharide as -7, consistent with 4-7 being the key AT-III binding motif. Treatment of this tetrasaccharide with iduronidase (and not glucuronidase) resulted in a species of m/z 5007.8, corresponding to removal of the iduronate residue.
Further treatment with exoenzymes, only in the order glucosamine 6-O sulfatase, hexosaminidase and glucuronidase, resulted in complete digestion of the trisaccharide. Table 9 shows the convergence of the AT-III binding decasaccharide sequence from 320 possible sequences to 52, to 28, to 6 and finally to a single sequence.
Thus, the sequence of the AT-III binding decasaccharide was deduced to be DDD4-7 (ΔU2SHNS,6S-I2SHNS,6S-I2SHNS,6S-IHNAc,6S-GHNS,3S,6S).
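Applying the reducing-end constraint from the nitrous acid and exoenzyme data directly to the 320-sequence master list shows how much ambiguity that constraint removes on its own. A minimal sketch with a hypothetical encoding; because the constraints are applied in a different order from Table 9, the intermediate counts (52, 28, 6) are not reproduced. Four candidates remain, differing only in the I/G state of the two internal D units, so the final choice rests on the other data sets; the sequence deduced in the text is among them:

```python
from itertools import permutations, product

def at3_master_list() -> list[str]:
    # 20 distinct orderings of D, D, D, 4, 7; signs on the four units after the
    # unsigned Δ4,5 non-reducing end give 20 * 16 = 320 candidates.
    return [order[0] + "".join(s + u for s, u in zip(signs, order[1:]))
            for order in sorted(set(permutations("DDD47")))
            for signs in product("+-", repeat=4)]

def reducing_end_fits(seq: str) -> bool:
    """Nitrous acid + exoenzyme constraint: +4 one unit from the reducing end,
    -7 at the reducing end."""
    return seq.endswith("+4-7")

remaining = [s for s in at3_master_list() if reducing_end_fits(s)]
print(len(remaining))  # 4
print(remaining)       # ['D+D+D+4-7', 'D+D-D+4-7', 'D-D+D+4-7', 'D-D-D+4-7']
```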

Example 5: Sequencing of a Hexasaccharide of HLGAG.
pM H1 was treated with 2 mM nitrous acid in 20 mM HCl at room temperature for 20 minutes so that only limited degradation occurred. After 20 minutes, a two-fold molar excess of (arg-gly)19arg in saturated matrix solution was added, and 1 pmol of saccharide was spotted and used for mass spectrometric study. All saccharides were detected as non-covalent complexes with (arg-gly)19arg. The starting hexasaccharide was observed, as were a tetrasaccharide and a disaccharide; uncomplexed peptide was also observed. Hereafter two m/z values are reported: the first is the observed m/z value of the saccharide + peptide complex, and the second, in parentheses, is the m/z of the saccharide alone, obtained by subtracting the mass of the peptide.
After 20 minutes, nitrous acid treatment of H1 yielded starting material at m/z 5882.5 (1655.8), which corresponded to a hexasaccharide with 8 sulfates and an anhydromannitol at the reducing end; a species at m/z 5304.1 (1077.3), which corresponded to a tetrasaccharide with the anhydromannitol at the reducing end; and a species at m/z 4726.2 (499.4), which corresponded to a disulfated disaccharide with the anhydromannitol at the reducing end.
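The parenthesized values come from subtracting the peptide. A minimal Python sketch, assuming the peptide is (Arg-Gly)19Arg observed as a singly protonated complex and using standard average residue masses (an assumption, since the text does not give the values used); with these numbers the three saccharide masses are reproduced to within about 0.1 Da:

```python
# Average residue masses (Da). These are standard average values and an
# assumption of this sketch; the text does not state which masses were used.
ARG_RESIDUE = 156.19   # C6H12N4O
GLY_RESIDUE = 57.05    # C2H3NO
WATER = 18.02
PROTON = 1.007

def peptide_mh(n_repeats: int = 19) -> float:
    """[M+H]+ of the (Arg-Gly)n-Arg peptide."""
    residues = (n_repeats + 1) * ARG_RESIDUE + n_repeats * GLY_RESIDUE
    return residues + WATER + PROTON

def saccharide_mass(complex_mz: float, n_repeats: int = 19) -> float:
    """Saccharide mass = observed complex m/z minus the protonated peptide."""
    return complex_mz - peptide_mh(n_repeats)

observed = {"hexasaccharide": 5882.5, "tetrasaccharide": 5304.1, "disaccharide": 4726.2}
for name, mz in observed.items():
    print(name, round(saccharide_mass(mz), 1))
# hexasaccharide 1655.7   (text: 1655.8)
# tetrasaccharide 1077.3
# disaccharide 499.4
```

The two successive differences (about 578 Da each) correspond to the six extra sulfates being spread over two added disaccharide units, i.e. one trisulfated disaccharide per step.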
This sample was then subjected to exoenzyme analysis. Three exoenzymes were added: iduronate 2-O sulfatase, iduronidase, and glucosamine 6-O sulfatase.
The nitrous acid sample was neutralized by adding 1/5 volume of 200 mM sodium acetate, 1 mg/mL BSA, pH 6.0, after which the enzymes were added. Glucosamine 6-O sulfatase was added after digestion with the first two enzymes was complete.
Final enzyme concentrations were in the range of 20-40 milliunits/mL, and digestion was carried out at 37 °C for a minimum of two hours.
Upon incubation with iduronate 2-O sulfatase and iduronidase, the hexasaccharide and tetrasaccharide peaks shifted to lower mass, and the disaccharide was no longer detectable. The hexasaccharide gave a new species at m/z 5627.3 (1398.8), corresponding to loss of sulfate and iduronate.
The tetrasaccharide yielded a species at m/z 5049.3 (820.8), again corresponding to loss of the 2-O sulfate and of iduronate. These data showed that all the disaccharide building blocks contained an I2S.
Addition of glucosamine 6-O sulfatase and incubation overnight at 37 °C resulted in two new species: one at m/z 5546.8 (1318.3), resulting from loss of the sulfate at the 6-position of glucosamine, and the other at m/z 5224.7 (996.2), again corresponding to loss of a 6-O sulfate, here from a tetrasaccharide. These data showed that, except for the reducing-end disaccharide unit containing anhydromannitol, the other units contained HNS. The data indicated that the sequence is DDD', and the D' (anhydromannitol) unit indicates that this sequence was originally derived from nitrous acid degradation, unlike the other sequences, which were derived from degradation by the heparinases.
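Each exoenzyme step is read from a mass decrease. A minimal sketch, assuming standard average masses for a sulfate (SO3) and a uronic acid residue and an assumed 1 Da tolerance; the labels and function name are hypothetical:

```python
SULFATE = 80.06       # SO3, average mass (Da)
URONIC_ACID = 176.12  # uronic acid residue C6H8O6, average mass (Da)

EXPECTED_LOSSES = {
    "one sulfate": SULFATE,
    "2-O sulfate + terminal iduronate": SULFATE + URONIC_ACID,
}

def interpret_shift(before: float, after: float, tol: float = 1.0) -> str:
    """Label a mass decrease with the first expected loss that lies within tol."""
    delta = before - after
    for label, expected in EXPECTED_LOSSES.items():
        if abs(delta - expected) <= tol:
            return label
    return "unassigned"

# Saccharide-only masses (Da) from the text
print(interpret_shift(1655.8, 1398.8))  # 2-O sulfate + terminal iduronate
print(interpret_shift(1077.3, 820.8))   # 2-O sulfate + terminal iduronate
print(interpret_shift(1398.8, 1318.3))  # one sulfate
```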

Example 6: Sequencing of other complex polysaccharides
The sequencing approach may be readily extended to other complex polysaccharides by developing appropriate experimental constraints. For example, the dermatan/chondroitin mucopolysaccharides (DCMP), which consist of a disaccharide repeat unit, are amenable to a hexadecimal coding system and MALDI-MS. As for HLGAGs, in DCMP a given mass carries a unique signature of length and composition. For instance, the minimum difference between any disaccharide and any tetrasaccharide is 139.2 Da; therefore the length and the number of sulfates and acetates may be readily assigned for a given DCMP polysaccharide up to an octadecasaccharide. Similarly, for polysialic acids (PSA), present mostly as homopolymers of 5-N-acetylneuraminic acid (NAN) or 5-N-glycolylneuraminic acid (NGN), the hexadecimal coding system may be easily extended to NAN/NGN to encode the variations in the functional groups, enabling a sequencing approach for PSA.

1. Dermatan/chondroitin family of complex mucopolysaccharides
DCMP are found in dense connective tissues such as bone and cartilage. The basic repeat unit of the dermatan/chondroitin mucopolysaccharides (DCMP) may be represented as -(β1→4)U2X-(α/β1→3)GalNAc4X,6X-, where U is uronic acid and GalNAc is N-acetylated galactosamine. The uronic acid may be glucuronic acid (G) or iduronic acid (I) and may be sulfated at the 2-O position, and the galactosamine (GalNAc) may be sulfated at the 4-O or the 6-O position, resulting in 16 possible combinations, or building blocks, for DCMP. Like the heparinases that degrade HLGAGs, distinct chondroitinases and other chemical methods are available that clip at specific glycosidic linkages of DCMP and serve as experimental constraints. Furthermore, since DCMPs are acidic polysaccharides, the MALDI-MS techniques and methods used for HLGAGs may be readily extended to DCMPs.
PEN scheme and mass-identity relationships for DCMP: Table 10 shows the property-encoded nomenclature (PEN) of the 16 possible building blocks of the dermatan/chondroitin family of molecules. The sequencing approach enables one to establish important mass-identity relationships, as well as a master list of all possible DCMP sequences from disaccharides to dodecasaccharides. These are plotted as a mass line, as shown in Figure 8. As observed for HLGAGs, there is a unique signature associated with length and composition for a given mass. As described above, the minimum difference between any disaccharide and any tetrasaccharide was found to be 101 Da for HLGAGs. Interestingly, in the case of DCMP the minimum difference between any disaccharide and any tetrasaccharide is 139.2 Da. Therefore the length and the number of sulfates and acetates may be readily assigned for a given DCMP polysaccharide up to an octadecasaccharide.
I/G  2X  6X  4X  Code  Disaccharide        Mass (Da)
0    0   0   0   0     I-GalNAc            379.33
0    0   0   1   1     I-GalNAc,4S         459.39
0    0   1   0   2     I-GalNAc,6S         459.39
0    0   1   1   3     I-GalNAc,4S,6S      539.45
0    1   0   0   4     I2S-GalNAc          459.39
0    1   0   1   5     I2S-GalNAc,4S       539.45
0    1   1   0   6     I2S-GalNAc,6S       539.45
0    1   1   1   7     I2S-GalNAc,4S,6S    619.51
1    0   0   0   -0    G-GalNAc            379.33
1    0   0   1   -1    G-GalNAc,4S         459.39
1    0   1   0   -2    G-GalNAc,6S         459.39
1    0   1   1   -3    G-GalNAc,4S,6S      539.45
1    1   0   0   -4    G2S-GalNAc          459.39
1    1   0   1   -5    G2S-GalNAc,4S       539.45
1    1   1   0   -6    G2S-GalNAc,6S       539.45
1    1   1   1   -7    G2S-GalNAc,4S,6S    619.51
Table 10 shows the Property Encoding Numerical scheme used to code DCMPs.
The first column codes for the isomeric state of the uronic acid (0 corresponding to iduronic and 1 to glucuronic acid). The second column codes for the substitution at the 2-O position of the uronic acid (0, unsulfated; 1, sulfated). Columns 3 and 4 code for the substitution at the 6- and 4-positions of the galactosamine, respectively.
Column 5 shows the numeric code for the disaccharide unit, column 6 shows the disaccharide unit and column 7 shows the theoretical mass calculated for the disaccharide unit.
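Table 10 is regular enough to generate: the code is the binary value of the 2X, 6X and 4X bits, the sign marks glucuronic acid, and each sulfate adds 80.06 Da to the 379.33 Da unsulfated unit. A minimal sketch, assuming (as the Table 10 masses suggest) that a tetrasaccharide mass is simply the sum of two unit masses; under that assumption it reproduces the 139.2 Da minimum disaccharide-tetrasaccharide gap quoted above (139.15 Da before rounding):

```python
from itertools import combinations_with_replacement, product

BASE_MASS = 379.33  # unsulfated I/G-GalNAc unit (Table 10)
SULFATE = 80.06     # mass increment per sulfate implied by Table 10

def dcmp_code(glucuronic: bool, s2: bool, s6: bool, s4: bool) -> str:
    """PEN code: bits 2X, 6X, 4X give 0-7; a leading '-' marks glucuronic acid."""
    value = 4 * s2 + 2 * s6 + s4
    return f"-{value}" if glucuronic else str(value)

def dcmp_mass(s2: bool, s6: bool, s4: bool) -> float:
    """Theoretical unit mass; I and G isomers have the same mass."""
    return round(BASE_MASS + SULFATE * (s2 + s6 + s4), 2)

# Distinct disaccharide masses, and all tetrasaccharide masses as sums of two units
di = sorted({dcmp_mass(*bits) for bits in product((False, True), repeat=3)})
tetra = [a + b for a, b in combinations_with_replacement(di, 2)]
gap = min(abs(t - d) for t in tetra for d in di)

print(dcmp_code(False, True, False, True))  # 5  (I2S-GalNAc,4S)
print(di)                                    # [379.33, 459.39, 539.45, 619.51]
print(round(gap, 2))                         # 139.15
```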
Tools as experimental constraints: Similar to the heparinases that degrade HLGAGs, there are chondroitinases that degrade chondroitin-like and dermatan-like regions of DCMP. Chondroitinases B, C, AC and ABC have distinct specificities with some overlap, and for the most part the chondroitinases cover the entire range of linkages found in DCMP. Several chondroitinases have been isolated and cloned from different sources. In addition to the enzymes, a few well-established chemical methods, including nitrous acid treatment, may be used to investigate DCMP. Thus there are adequate tools (enzymatic and chemical) that function as 'experimental constraints' to enable DCMP sequencing. Below we use two DCMP sequences to illustrate DCMP sequencing.
A. Serpin HCF-2 binding DCMP (hexasaccharide):
The minimum-size DCMP binding to serpin HCF-2 was isolated, and its composition was determined using elaborate methods that included anion exchange chromatography, paper electrophoresis and paper chromatography. The sequencing strategy, through the integration of PEN and MS, established the identity of this serpin HCF-2 binding saccharide as a hexasaccharide with 6 sulfates and 3 acetates. The high degree of sulfation pointed to a dermatan-like saccharide. Since this saccharide was derived using partial N-deacetylation and nitrous acid treatment, it comprises a 5-membered anhydrotalitol ring at the reducing end. Compositional analysis of the saccharide may be obtained by degradation with the chondroitinases. The composition shows the presence of ΔU2SGalNAc,4S (±5) and ΔU2SaTal4S (aTal, anhydrotalitol; ±5') in a 2:1 ratio. This enabled the generation of a master list of 8 possible sequences, as shown in Table 11a. 2-Sulfatase and iduronidase treatment of the hexasaccharide produced a shift in the mass spectrum corresponding to the loss of a sulfate and an iduronate, thereby fixing I2S at the non-reducing end (Table 11b). To converge further, chondroitinase B (which acts on iduronate residues in dermatan-like regions) was used, and a single peak in the mass spectrum corresponding to a 2-sulfated disaccharide was observed. This led us to converge on the sequence +555' (I2S-GalNAc,4S-I2S-GalNAc,4S-I2S-aTal4S).
Table 11. (a) Master list of 8 sequences: +555', +55-5', +5-55', +5-5-5', -555', -55-5', -5-55', -5-5-5'. (b) Convergence using 2-sulfatase, iduronidase and chondroitinase B, with the fragments formed.
B. Hypothetical:
In this example a "hypothetical DCMP polysaccharide", more complex than the previous example, is used. Assume that MS yields a result interpreted as an octasaccharide with 8 sulfates and 4 acetates, and that compositional analysis points to three species, corresponding to ΔU2SGalNAc,4S (±5), ΔUGalNAc,6S (±2) and ΔU2SGalNAc,4S,6S (±7), in 2:1:1 relative abundance. This enables one to generate a master list of 96 possible sequences (Table 12a). Digestion of the saccharide sample with chondroitinase AC would be expected to give two products, with masses corresponding to two tetrasulfated tetrasaccharide units, thereby reducing the master list to 4 possible sequences (Table 12b). Complete deamination using hydrazinolysis and nitrous acid treatment would give 3 peaks, two corresponding to a disulfated disaccharide and the third to a trisulfated disaccharide.
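Both DCMP master lists follow the same counting rule used for the heparin examples. A minimal sketch with a hypothetical string encoding and a hypothetical `master_list` helper: for the serpin HCF-2 hexasaccharide every uronic acid, including the non-reducing one, has an I/G state (Table 11a lists both + and - forms at the non-reducing end), giving 2^3 = 8 sequences; for the hypothetical octasaccharide the count of 96 is reproduced when the non-reducing unit, written ΔU2S in the final sequence, is left unsigned (12 orderings × 2^3). The chondroitinase B step is simplified to "every uronic acid is iduronate", which is an assumption about how the single-peak result is read:

```python
from itertools import permutations, product

def master_list(units: str, reducing_end: str = "", nonreducing_signed: bool = True) -> list[str]:
    """Enumerate sequences for a DCMP composition (illustrative encoding).

    units: building-block codes whose order is unknown.
    reducing_end: optional unit fixed at the reducing end (e.g. "5'").
    nonreducing_signed: False when the non-reducing uronic acid is Δ4,5 (no I/G state).
    """
    sequences = []
    for order in sorted(set(permutations(units))):
        order = list(order) + ([reducing_end] if reducing_end else [])
        first = 0 if nonreducing_signed else 1
        head = "" if nonreducing_signed else order[0]
        for signs in product("+-", repeat=len(order) - first):
            sequences.append(head + "".join(s + u for s, u in zip(signs, order[first:])))
    return sequences

# A. Serpin HCF-2 hexasaccharide: 5, 5 and a reducing-end 5' (anhydrotalitol)
serpin = master_list("55", reducing_end="5'")
after_exo = [s for s in serpin if s.startswith("+")]  # I2S at the non-reducing end
after_chB = [s for s in after_exo if "-" not in s]     # simplified: all iduronate
print(len(serpin), len(after_exo), after_chB)          # 8 4 ["+5+5+5'"]

# B. Hypothetical octasaccharide: 5, 5, 2, 7 with an unsigned non-reducing end
hypothetical = master_list("5527", nonreducing_signed=False)
print(len(hypothetical))                               # 96
```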

Treatment of the degraded products with 2-sulfatase and iduronidase (and not glucuronidase) should give peaks corresponding to the loss of sulfate and iduronate residues. This would enable identification of the isomeric state of 5 and 7, thereby converging the master list to one sequence, 55-27 (ΔU2S-GalNAc,4S-I2S-GalNAc,4S-G-GalNAc,6S-I2S-GalNAc,4S,6S).
Table 12. (a) Master list of 96 sequences; (b) convergence using chondroitinase AC, complete deamination (hydrazinolysis and nitrous acid treatment), and 2-sulfatase and iduronidase, with the fragments formed.
It is important to reiterate that, as developed for HLGAGs, distinct or additional 'convergence strategies' or experimental constraints may be used to arrive at the 'unique' solution for DCMP.
2. Polysialic Acid
Polysialic acids are linear complex polysaccharides, found as a highly regulated post-translational modification of the neural cell adhesion molecule in mammals, that are present mostly as homopolymers of 5-N-acetylneuraminic acid (NAN) or 5-N-glycolylneuraminic acid (NGN). The monomeric units of NAN and NGN are linked by α2,8 glycosidic linkages and may be modified at the 4-O, 7-O and 9-O positions. The major modification is acetylation; in addition, much rarer modifications, including sulfation and lactonization, occur at the 9-O position. A deaminated form of neuraminic acid, namely 5-deamino-3,5-dideoxyneuraminic acid (KDN), has also been discovered.
The PEN-MS sequencing approach can be extended to polysialic acids; using NAN and NGN units, we illustrate how this is achieved.

PEN scheme and mass-identity relationships for PSA: PSA is composed of two different monomeric repeats, with variations in the modification of each unit.
The flexibility of the PEN allows easy adaptation from the dimeric repeats of HLGAG and DCMP to the monomeric repeat unit of PSA. The PEN scheme for PSA is shown in Table 13. The sequencing approach establishes important mass-identity relationships, as well as a master list of all combinations of NAN and NGN monomeric units. The mass lines for polymeric units of NAN and NGN are shown in Figures 9A and 9B. Note that there is considerable overlap in the masses observed for the higher-order oligomers of both NAN and NGN (Figures 9A and 9B). The minimum difference in mass between an n-mer and an (n+1)-mer stabilizes at 3.01 Da for NAN and 13 Da for NGN as we go to tetra-, penta- and hexasaccharides, thereby providing a safe margin for detection of these fragments using MS.

NAN/NGN  9X  7X  4X  Code  Saccharide unit    Mass (Da)
0        0   0   0   0     NAN                309.28
0        0   0   1   1     NAN4Ac             351.32
0        0   1   0   2     NAN7Ac             351.32
0        0   1   1   3     NAN4Ac,7Ac         393.36
0        1   0   0   4     NAN9Ac             351.32
0        1   0   1   5     NAN4Ac,9Ac         393.36
0        1   1   0   6     NAN7Ac,9Ac         393.36
0        1   1   1   7     NAN4Ac,7Ac,9Ac     435.40
1        0   0   0   -0    NGN                325.27
1        0   0   1   -1    NGN4Ac             367.32
1        0   1   0   -2    NGN7Ac             367.32
1        0   1   1   -3    NGN4Ac,7Ac         409.36
1        1   0   0   -4    NGN9Ac             367.32
1        1   0   1   -5    NGN4Ac,9Ac         409.36
1        1   1   0   -6    NGN7Ac,9Ac         409.36
1        1   1   1   -7    NGN4Ac,7Ac,9Ac     451.40
Shown in Table 13 is the Property Encoded Numerical scheme for PSA. Column 1 codes for whether the monomeric unit is NAN (0) or NGN (1). Columns 2, 3 and 4 code for the variations at the 9, 7 and 4 positions, respectively, where 1 corresponds to acetylated and 0 to unacetylated. Column 5 shows the numeric code for the PSA units: -0 to -7 was used instead of 8-F, so that the number codes for the variability in acetylation and the sign indicates whether the unit is NAN or NGN. Column 6 lists the monosaccharide represented by the code in column 5. Column 7 lists the theoretical mass calculated for the monomeric units shown in column 6.
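The PSA code follows the same bit arithmetic as Table 10, with acetylation bits in place of sulfation bits and the sign distinguishing NGN from NAN. A minimal sketch; the 42.04 Da acetyl increment is read off the NAN rows of Table 13, and the computed NGN masses agree with the table to within 0.01 Da:

```python
NAN_MASS = 309.28  # unsubstituted NAN (Table 13)
NGN_MASS = 325.27  # unsubstituted NGN (Table 13)
ACETYL = 42.04     # increment per O-acetyl group implied by Table 13

def psa_code(ngn: bool, ac9: bool, ac7: bool, ac4: bool) -> str:
    """PEN code: bits 9X, 7X, 4X give 0-7; a leading '-' marks NGN instead of NAN."""
    value = 4 * ac9 + 2 * ac7 + ac4
    return f"-{value}" if ngn else str(value)

def psa_mass(ngn: bool, ac9: bool, ac7: bool, ac4: bool) -> float:
    """Theoretical monomer mass: base unit plus 42.04 Da per O-acetyl group."""
    base = NGN_MASS if ngn else NAN_MASS
    return base + ACETYL * (ac9 + ac7 + ac4)

print(psa_code(False, True, False, True))           # 5   (NAN4Ac,9Ac)
print(psa_code(True, True, False, False))           # -4  (NGN9Ac)
print(round(psa_mass(False, True, True, True), 2))  # 435.4
```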
The mass line for the combinations of substituted/unsubstituted NAN-containing monomeric units in PSA is shown in Figure 9A. The X-axis represents the calculated masses for monosaccharides to hexasaccharides; the Y-axis shows the number of fragments of a particular length and composition that exist for a given mass.
The values 150-190 were omitted to improve the clarity of the other peaks. The minimum difference between any monosaccharide and any disaccharide is 165.2Da, between any di and any trisaccharide is 39.03Da, between any tri and any tetrasaccharide is 39.03Da and 3.01Da for all higher order saccharides.
The mass line for the combinations of substituted/unsubstituted NGN monomeric units in PSA is shown in Figure 9B. The X-axis represents the calculated masses for monosaccharides to hexasaccharides; the Y-axis shows the number of fragments of a particular length and composition that exist for a given mass. The values 150-190 were omitted to improve the clarity of the other peaks. The minimum difference between any monosaccharide and any disaccharide is 181.2 Da, between any di- and any trisaccharide is 55.03 Da, and 13 Da for higher-order saccharides.
Tools as experimental constraints: Several tools and detection methods are available for studying PSAs. Based on the properties of its building blocks, this class of linear polysaccharides is amenable to MS. Methods of purifying PSA polymers and obtaining their composition using HPLC, CE and mass spectrometry have very recently been established. Enzymatic tools from various sources have been used to study PSA extensively. Notable among them are the bacterial exosialidases, which cleave PSA polymers processively from the non-reducing end, and the bacteriophage-derived endoneuraminidase, which clips both NAN- and NGN-containing PSA linear polysaccharides endolytically. In addition to these enzymes, chemical methods such as hydrazinolysis followed by nitrous acid treatment, and periodate oxidation followed by sodium borohydride treatment, may be used as tools to degrade PSA polysaccharides into smaller polysaccharides.
Example 7: Variation of experimental conditions resulting in alteration of enzymatic reactions, and its effect on the methods of the invention.
Secondary specificities of the heparinases have been observed, especially under exhaustive degradation conditions. As part of ongoing investigations into the enzymology of heparinases, the relative rates of cleavage of I- and G-containing sites by heparinase I and III with defined substrates under different conditions have been measured. For instance, heparinase III cleaves at both I- and G-containing linkages but not at I2S [H. E. Conrad, Heparin Binding Proteins (Academic Press, San Diego, 1998)].
However, under the reaction conditions used in this study there is a dramatic (8-10 fold) difference in the rates of cleavage, with I-containing linkages being clipped more slowly than G-containing linkages (Figure 10A). Figure 10A shows cleavage by recombinant heparinase III of tetrasaccharides containing G (●), I (○) or I2S (+) linkages; each reaction was followed by capillary electrophoresis. With these substrates, heparinase III does not cleave I2S-containing glycosidic linkages and cleaves G-containing linkages roughly 10 times as fast as I-containing linkages. Under the "short" digest conditions, therefore, only G-containing saccharides are expected to be cleaved to an appreciable extent.
[Conditions for enzymatic digestion of HLGAG oligosaccharides were set forth above. Briefly, digests were designated either "short" or "exhaustive". Short digests were completed with 50 nM enzyme for 10 minutes. Exhaustive digests were completed using 200 nM enzyme for either four hours or overnight. Partial nitrous acid cleavage was completed using a modification of published procedures: to an aqueous solution of saccharide was added a 2x solution of sodium nitrite in HCl such that the concentration of nitrous acid was 2 mM and that of HCl was 20 mM. The reaction was allowed to proceed at room temperature, with aliquots quenched at various time points by addition of 1 µL of 200 mM sodium acetate, 1 mg/mL BSA, pH 6.0. Exhaustive nitrous acid digestion was completed by reacting saccharide with 4 mM nitrous acid in HCl overnight at room temperature. In both cases, the products of nitrous acid cleavage could be sampled directly by MALDI without further cleanup and without the need to reduce the anhydromannose residues to anhydromannitol. The entire panel of HLGAG-degrading exoenzymes was purchased from Oxford Glycosystems (Wakefield, MA) and used as suggested by the manufacturer.] For example, with the hexasaccharide ΔUHNH,6SGHNSIHNAc (which contains both I and G in a minimally sulfated region), cleavage occurs only at the G under "short" digest conditions, as shown in Table 14.
Table 14
Species                 m/z (+ peptide) observed
ΔUHNH,6SGHNSIHNAc       5442.1
ΔUHNSIHNAc              5023.6
ΔUHNH,6SGHNS            5061.7
Heparinase II was incubated with the hexasaccharide ΔUHNH,6SGHNSIHNAc, and only cleavage at the G and not the I was observed. Furthermore, we have found that the degree of sulfation does affect the kinetics of heparinase III degradation of oligosaccharides [S. Ernst et al., Crit. Rev. Biochem. Mol. Biol. 30, 387 (1995); S. Yamada et al., Glycobiology 4, 69 (1994); U. R. Desai, H. M. Wang, R. J. Linhardt, Biochemistry 32, 8140 (1993); R. J. Linhardt et al., Biochemistry 29, 2611 (1990)]. In the case of heparinase I, the enzyme does not clip either I- or G-containing glycosidic linkages within the context of our experimental procedures, whereas it readily clips I2S-containing polysaccharides (Figure 10B). Figure 10C shows the same study as in (A), except that heparinase I was used instead of heparinase III. With heparinase I, cleavage occurs only at I2S-containing linkages, not before I or G. There is only one report of heparinase I clipping G2S-containing linkages [S. Yamada, T. Murakami, H. Tsuda, K. Yoshida, K. Sugahara, J. Biol. Chem. 270, 8696 (1995)], which was tested with two tetrasaccharide substrates under conditions kinetically very different from the 'short' heparinase I digestion presented here.
Quite a few factors have severely limited and complicated prior studies and interpretation of heparinase substrate specificity experiments. First, not only is a homogeneous substrate preparation difficult, but analyzing the substrates and products has also been very challenging. Analysis has relied primarily on co-migration of the saccharides with known standards, and, as others and we have observed, oligosaccharides with different sulfation patterns do co-migrate, complicating unique assignments. Further, some oligosaccharides used in previous studies to assign substrate specificity for the heparinases were not homogeneous, complicating analysis. The development of the MALDI-MS procedure of the invention has enabled rapid and accurate determination of the saccharides.
The second problem is the preparation of pure wild-type heparinases from the native host. The wild-type heparinase is isolated from Flavobacterium heparinum, and this organism produces several complex-polysaccharide-degrading enzymes that often copurify with each other. For example, when examining the kinetics of heparinase III, we found that a commercial source of heparinase III was able to degrade the supposedly non-cleavable ΔU2SHNS,6SI2SHNS,6S. Furthermore, MS and CE analysis of the products indicated that one was specifically 2-O desulfated, suggesting a sulfatase contamination. Recombinant heparinase III produced and purified in our laboratory (and free of other heparin-degrading enzymes) does not cleave ΔU2SHNS,6SI2SHNS,6S, as expected. Thus, different enzyme preparations, differences in digestion conditions, differences in substrate size and composition, and often contaminating substrates, taken together with assignments based on co-elution, have made comparison of data very difficult and have led to contradictory findings.
Regardless of the outcome of heparinase substrate specificities, there are other methods that may be used to extract the isomeric state of the uronic acid (I, G, I2S or G2S). The uronic acid component of each disaccharide unit may be unambiguously ascertained by completing compositional analysis after exhaustive nitrous acid treatment.
By this method, compositional analysis of given oligosaccharides may be accomplished and the presence of G2S-, I2S-, I- and G-containing building blocks assessed.
With this information, rapid convergence to a single sequence could be completed by judicious application of the heparinases (regardless of their exact substrate specificity), since cleavage would give mass information on either side of the cleavage site.
Thus, in the octasaccharide case (Example 1), application of exhaustive nitrous acid would yield 1× ΔUMan6S, 2× I2SMan6S and 1× GMan6S. Then digestion of this octasaccharide, after tagging, with heparinase III under any conditions (forcing or non-forcing) would result in the formation of a hexasaccharide of m/z 5958.7 and a disaccharide, immediately fixing the sequence. A similar sequence of events may be used with heparinase I to converge to a single sequence for the octasaccharide.
While there are caveats to the use of any one particular system for sequence analysis, whether chemical degradation or enzymatic analysis, the sequencing strategy presented here is not critically dependent on any single technique.

One of the major strengths of the sequencing strategy of the invention is the flexibility of the approach: the integration of MALDI and the coding scheme enables adaptation to different experimental constraints [for example, the recently cloned mammalian heparanase is another possible experimental constraint; M. D. Hulett et al., Nat. Med. 5, 793 (1999); I. Vlodavsky et al., Nat. Med. 5, 803 (1999)]. As stated, additional or different sets of experimental constraints may be used not only to arrive at a unique solution but also to validate or confirm the solution obtained from a given set of experimental constraints.

Example 8: Methods for identifying protein-polysaccharide interactions and improved methods for sequencing.
To identify HLGAG sequences that bind to a particular protein, the most common methodology involves affinity fractionation of oligosaccharides using a particular HLGAG subset, namely porcine intestinal mucosa heparin.
Enzymatically or chemically derived heparin oligosaccharides of a particular length are passed over a column of immobilized protein. After washing, the bound fraction is eluted using high salt to disrupt interactions between the sulfates on the polysaccharide and basic residues on the protein; interactions which are crucial for binding. Eluted oligosaccharides are then characterized, typically by NMR. In this manner, sequences that bind to a number of proteins, including antithrombin III (AT-III), basic fibroblast growth factor (FGF-2), and endostatin have been identified.
While rigorous and well tested, this approach suffers from a number of limitations. First, column chromatography requires large (milligram) amounts of material for successful analysis. Of the entire family of HLGAGs, only heparin is available in these quantities. However, heparin, because of its high sulfate content, contains a limited number of sequences, biasing the selection procedure. Thus, there is no opportunity to sample or select for unusual sequences that might in fact bind with high affinity. In vivo, HLGAG-binding proteins sample and bind to the more structurally diverse heparan sulfate (HS) chains of proteoglycans at the cell surface, where heparin-like sequences (i.e., sequences with a high degree of sulfation) do not always predominate. Heparin, while structurally related to HS, is present in vivo only in mast cells. For these reasons, heparin is not always an appropriate analog of cell-surface HS, and in fact the exclusive use of heparin in affinity fractionation experiments has created confusion in the field. One example illustrates this point. FGF-2 binds to a specific subset of heparan sulfate sequences that contain a critical 2-O sulfated iduronate residue.
Column chromatography has separated a high affinity binder of FGF-2, the sequence(s) of which have been identified as oligosaccharides containing the predominant trisulfated disaccharide [I2SHNS,6S]n (n = 3-6). However, rigorous examination of the crystal structures of FGF-2, including co-crystals of FGF with HLGAG oligosaccharides, indicates that only three contacts between sulfates and basic residues on FGF-2 are important for high affinity binding. Heparin-based selection therefore favors sequences that are more heavily sulfated than high affinity binding actually requires.
Using the mass spectrometric approach of the invention, we have developed an improved way to identify polysaccharide-protein interactions. The advantage of this approach is that it is highly sensitive, requiring only picomoles of material, which may be isolated from in vivo sources. As described below, the approach may be used to identify and sequence oligosaccharides that bind to proteins. As a proof of concept, we show herein that this methodology is functionally equivalent to the established column affinity fractionation method for three proteins, FGF-1, FGF-2, and ATIII, using heparin oligosaccharides as a model system.
Furthermore, we show herein that this system can be extended such that heparan sulfate isolated from the cell surface can be used to isolate protein-binding saccharides, demonstrating that, for the first time, unbiased, biologically relevant HLGAGs can be used to identify binding sequences.
Methods:
Protein preparation and immobilization. ATIII was incubated overnight with excess porcine mucosal heparin, then biotinylated with EZ-Link Sulfo-NHS-Biotin (Pierce). Canon NP Type E transparency film was taped to the MALDI sample plate and used as a protein immobilization surface. FGF-1 and FGF-2 were immobilized by spotting 1 µL of aqueous solution on the film and air-drying. ATIII was immobilized by first drying 4 µg of neutravidin on the film surface, then adding biotinylated ATIII to the neutravidin spot. Heparin was removed by washing ten times with 1 M NaCl and ten times with water.
Saccharide binding, selection and analysis. Saccharides were derived from a partial digest of porcine mucosal heparin by heparinase I. The hexasaccharide fraction was obtained by size exclusion chromatography on Bio-Gel P-6 and lyophilized to dryness. Saccharides were bound to immobilized proteins by spotting 1 µL of aqueous solution on the protein spot for at least five minutes. Unbound saccharides were removed by washing with water fifteen times. For selection experiments, the spot was washed ten times with various NaCl concentrations, followed by ten water washes. Caffeic acid matrix in 50% acetonitrile with 2 pmol/µL (RG)19R was added to the spot prior to MALDI analysis. All saccharides were detected as noncovalent complexes with (RG)19R using the MALDI parameters described herein.

Saccharide digestion by heparinase I or III. Saccharides selected for FGF-2 binding were digested with heparinase I or III by spotting 8 µg of enzyme in water after selection was completed. The spot was kept wet for the desired digestion time by adding water as necessary. Caffeic acid matrix with 2 pmol/µL (RG)19R was added to the spot for MALDI analysis.
Isolation, Purification, and Selection of FGF binders from SMC heparan sulfate. Bovine aortic smooth muscle cells (SMCs) were grown to confluency. Cells were washed twice with PBS, and then 200 nM heparinase III was added for 1 hr. The supernatant was heated to 50 °C for 10 minutes to inactivate heparinase III and filtered. To remove polynucleotide contamination, the samples were treated with DNase and RNase at room temperature overnight. Heparan sulfate was isolated by binding to a DEAE filter, washing away unbound material, and eluting with 10 mM sodium phosphate, 1 M NaCl, pH 6.0. The material was then concentrated and buffer-exchanged into water using a 3,000 MWCO membrane. The retentate was lyophilized and reconstituted in water. 100 nM heparinase II was added, and aliquots were taken at 5, 10, 20, and 30 minutes post-addition. 1 µL was spotted on FGF. After drying, the sample was washed, 2 pmol/µL (RG)19R in matrix was added, and the sample was analyzed as outlined above.
Results:
Saccharide binding to FGF-2 and FGF-1. As a first step towards the development of a viable MALDI selection procedure, the FGF system was selected, using its prototypic members, viz. FGF-1 and FGF-2. Initial experiments involved the use of a purified polysaccharide (Hexa 1 of Table 21) that is known to bind with high affinity to FGF. With FGF-2, we found that Hexa 1 binds and is detected even after a salt wash of 0.5 M NaCl, consistent with the known affinity of Hexa 1 for FGF-2. In addition, when an equimolar mixture of Hexa 1 and Hexa 2 (a low affinity binder) was applied to FGF-2 and washed with 0.2 M NaCl to eliminate nonspecific binding, only Hexa 1 was observed. Together, these results indicate that, under the conditions of the experiment, immobilized FGF-2 retained the same binding specificity as FGF in solution. Further demonstrating that binding specificity was intact, heat denaturation of FGF resulted in the detection of no saccharide binders.

Table 15
Saccharide: Sequence
Hexa 1: (a) DDD or (b) DDMan6S
Hexa 2: ΔD4-7
Penta 1

FGF affinity fractionation of a hexasaccharide mixture derived from the enzymatic depolymerization of heparin was used to enrich for FGF binders. To determine whether specific binders could be selected from a more complex mixture using our methodology, a hexasaccharide fraction derived from incomplete heparinase I digestion of porcine intestinal mucosa heparin was spotted on immobilized FGF.
At least five unique structures were detected in the unfractionated hexasaccharide mixture.
Upon a salt wash, only two structures, 8- and 9-sulfated hexasaccharides, remained.
Importantly, the same results could alternately be achieved by enriching the spot for specific binders and competing off low affinity binders. FGF-1, which has been shown to have binding properties similar to those of FGF-2, could also select the octa- and nonasulfated hexasaccharides from a mixture.
Sequencing saccharides on the MALDI surface. The highly sensitive sequencing methodology of the invention was used to test whether we could derive structural information about FGF high affinity binders on target. The octa- and nonasulfated saccharides were subjected to enzymatic and chemical depolymerization. After saccharide selection, the saccharide sample was depolymerized by heparinase I to obtain sequence information. The nonasulfated hexasaccharide was reduced to a single trisulfated disaccharide, indicating that this saccharide is a repeat of [I2SHNS,6S]. Digestion of the octasulfated hexasaccharide yielded the trisulfated disaccharide and a pentasulfated tetrasaccharide. That this tetrasaccharide contains an unsulfated uronic acid was confirmed by heparinase III cleavage, which resulted in the disappearance of the tetrasaccharide. The sequencing assignments were confirmed by isolating the octa- and nonasulfated hexasaccharides and sequencing them using the methods described herein. Thus, the sequence of the nonasulfated hexasaccharide is DDD (ΔU2SHNS,6S-I2SHNS,6S-I2SHNS,6S) and the sequence of the octasulfated hexasaccharide is DD-5.
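The sulfate count behind the "nonasulfated" assignment can be checked mechanically from the expanded sequence. The Python sketch below assumes the usual HLGAG shorthand, in which "2S" after a uronic acid (ΔU, I, or G) marks a 2-O-sulfate and the glucosamine (H) substituents are written "NS" (N-sulfate), "NAc" (N-acetyl), and "3S" or "6S" (O-sulfates); the helper name `sulfate_count` is hypothetical.

```python
import re

# Sulfate-bearing tokens in the assumed shorthand: 2S/3S/6S are O-sulfates and
# NS is an N-sulfate. "NAc" (N-acetyl) contains none of these tokens.
SULFATE_TOKEN = re.compile(r"[236]S|NS")


def sulfate_count(sequence: str) -> int:
    """Count sulfate groups in a sequence written in the assumed shorthand."""
    return len(SULFATE_TOKEN.findall(sequence))


nonasulfated = "ΔU2SHNS,6S-I2SHNS,6S-I2SHNS,6S"
for disaccharide in nonasulfated.split("-"):
    print(disaccharide, sulfate_count(disaccharide))  # three sulfates per unit
print("total:", sulfate_count(nonasulfated))          # nine sulfates in all
```

Three trisulfated disaccharide units give nine sulfates, which is why heparinase I digestion to a single trisulfated disaccharide is consistent with a nonasulfated hexasaccharide.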

Saccharide Binding to Antithrombin III. ATIII is heavily glycosylated; therefore, we anticipated that it would not bind well to the MALDI plate. As an alternative strategy, avidin was immobilized on the plate and biotinylated AT-III was bound to the avidin. The ATIII biotinylation reaction was carried out in the presence of heparin to protect the protein's binding site for HLGAG oligosaccharides. After washing off the complexed heparin, Penta 1, which contains an intact AT-III pentasaccharide binding sequence, was used to verify that the protein was immobilized on the surface and was able to bind saccharides. Penta 1 binding to ATIII was observed up to washes of 0.5 M NaCl, consistent with it being a strong binder to ATIII.
Furthermore, this binding is also specific. Application of a solution of Hexa 1, Hexa 2, and Penta 1 to immobilized ATIII, followed by a 0.2 M salt wash to remove nonspecific binders, resulted in signal only for Penta 1. Interestingly, there was no signal from Hexa 2, which contains a partially intact ATIII binding site, suggesting that, under our selection conditions, only sequences with a full binding site are selected.
Selection of FGF-2 Binders in SMC HS. Heparan sulfate at the cell surface of SMCs is known to contain high affinity sites for FGF binding. In an effort to extend our initial studies with highly sulfated heparin, we sought to identify high affinity FGF binders in heparan sulfate proteoglycans at the cell surface of SMCs. To this end, SMCs were treated with either heparinase I or heparinase III, and the HLGAGs were isolated and purified. Consistent with the known substrate specificity of the enzymes, the composition of the released fragments differed. Fragments were then treated with heparinase II to reduce them in size. At certain time points, the digest was spotted on FGF-2, and the selection process was carried out as outlined above. Consistent with our findings with heparin, a single hexasaccharide was identified as a high affinity binder for FGF-2, namely the nonasulfated hexasaccharide with the sequence ΔDDD.
The above methodology describes an alternative protocol for the selection of saccharide binders to proteins. This methodology has been applied toward the identification of oligosaccharides derived from heparin that bind to two well-established systems, FGF and ATIII. As shown, this procedure produces results identical to those of the more established methodology of affinity fractionation. For FGF-1 and FGF-2, high affinity binders can be selected out of a pool of similar saccharides. In addition, ATIII can be used to select high affinity binders over binders that contain only a partial binding site.

This methodology has a number of critical advantages over prior art strategies.
First, it is possible to derive sequence information from the bound saccharides directly on target. Second, and more substantially, the analysis with both FGF and ATIII required only picomoles of both protein and saccharide. Such an advance makes it feasible to use the more biologically relevant HS isolated from the cell surface as a substrate, rather than highly sulfated heparin from mast cells.
Finally, while the Example demonstrated this technique for the chemically complex and information dense HLGAGs, it is widely applicable towards identifying other polysaccharide-protein interactions.
Example 9: Methods for identifying branching and methods for sequencing branched polysaccharides.
Increasing evidence indicates that glycosylation patterns are highly influenced by the phenotype of the cell. With the onset of disease, changes in glycan structure have been noted, especially in the degree of branching. For instance, in pathogenic versus normal prion proteins, there is a decrease in levels of glycans with bisecting GlcNAc residues and increased levels of tri- and tetraantennary structures. By judicious application of enzymatic and chemical degradation, the identity of branched chains may also be determined.
MS Analysis of Complex Glycan Structures: As shown in Figure 11, extended core structures were enzymatically generated from complex N-glycan structures and identified. MALDI-MS analysis was performed on the extended core structures derived from enzymatic treatment of a mixture of bi- and triantennary structures. 1 pmol of each saccharide was digested with an enzyme cocktail that included sialidase from A. ureafaciens and β-galactosidase from S. pneumoniae. The mass signature of 1462.4 indicates that one of the structures is biantennary with a core fucose moiety, while the mass signature of 1665.8 is indicative of a triantennary structure, also with a core fucose. In Figure 11, symbols denote mannose, fucose, N-acetylglucosamine, galactose, and N-acetylneuraminic acid.
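The mass signatures can be checked against candidate compositions with simple arithmetic. The Python sketch below is an illustration under stated assumptions: it uses standard average residue masses (free monosaccharide minus one water), it treats each observed value as a singly deprotonated [M-H]- ion as in the negative-mode PSA analysis that follows (the ion type for Figure 11 is not stated), and the names `glycan_mass` and `matches` are hypothetical. The extended cores are taken to be the Man3GlcNAc2 core with one GlcNAc left on each antenna plus a core fucose.

```python
# Average residue masses in Da (free monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.1406,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.1925,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.1412,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.2546,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.0153   # one water for the two chain ends of a free glycan
PROTON = 1.0073   # removed to form the [M-H]- ion


def glycan_mass(composition: dict[str, int]) -> float:
    """Average neutral mass of a free (reducing) glycan with these residue counts."""
    return WATER + sum(RESIDUE_MASS[residue] * n for residue, n in composition.items())


def matches(observed_m_minus_h: float, composition: dict[str, int], tol: float = 1.0) -> bool:
    """True if an observed [M-H]- value lies within tol Da of the prediction."""
    return abs(observed_m_minus_h + PROTON - glycan_mass(composition)) <= tol


# Extended cores: Man3GlcNAc2 core, one GlcNAc per antenna, one core fucose.
biantennary = {"Hex": 3, "HexNAc": 4, "dHex": 1}
triantennary = {"Hex": 3, "HexNAc": 5, "dHex": 1}
print(matches(1462.4, biantennary), matches(1665.8, triantennary))  # True True
print(matches(1462.4, triantennary), matches(1665.8, biantennary))  # False False
```

With a 1 Da tolerance both signatures fit their proposed compositions and not each other's, since the two cores differ by one GlcNAc (about 203 Da). The same function gives about 2370.1 Da for Hex5HexNAc4dHex1NeuAc2, the disialylated, core-fucosylated biantennary glycan discussed for PSA.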
MALDI-MS sequencing of the N-linked polysaccharide of PSA: Next, rapid sequencing of the glycan structure of PSA from normal prostate tissue was performed (Figure 12). Figure 12 shows data from MALDI-MS microsequencing of the PSA polysaccharide structure. MALDI-MS was completed using 500 fmol of saccharide. Analysis was completed with a saturated aqueous solution of 2,5-dihydroxybenzoic acid with 300 mM spermine as an additive. Analytes were detected in the negative mode at an accelerating voltage of 22 kV. 1 µL of matrix was added to 0.5 µL of aqueous sample and allowed to dry on the target. (A) MS of the intact polysaccharide structure. Peaks marked with an asterisk are impurities, and the analyte peak is detected both as M-H (m/z 2369.5) and as a monosodiated adduct (M+Na-2H, m/z 2392.6). (B) Treatment of [A] with sialidase from A. ureafaciens. 10 pmol of saccharide was incubated with enzyme overnight at 37 °C in 10 mM sodium acetate, pH 5.5, according to the manufacturer's instructions. Two new saccharides were seen: the first, at m/z 2078, corresponding to the loss of one sialic acid moiety, and the second, at m/z 1786.9, corresponding to the loss of two sialic acids from the non-reducing end. (C) Digest of [B] with galactosidase from S. pneumoniae. Digest procedures were completed essentially as described above. A single product at m/z 1462.8 indicated that two galactose residues were removed upon treatment of [B] with the enzyme. (D) Digest of [C] with N-acetylhexosaminidase from S. pneumoniae. One product was observed as both M-H (m/z 1056.3) and M+Na-2H (m/z 1078.1), corresponding to the loss of two N-acetylhexosamine units from [C]. A table of the analysis scheme, with schematic structures and theoretical molecular masses, is presented in the center of Figure 12. Shown are the parent polysaccharide and the enzymatically derived products seen in this analysis. Symbols denote mannose, fucose, N-acetylglucosamine, galactose, and N-acetylneuraminic acid.

Studies of the intact polysaccharide via NMR (large quantities of PSA were required for this study) yielded sequence information for the glycan [Belanger, A., van Halbeek, H., Graves, H.C.B., Grandbois, K., Stamey, T.A., Huang, L., Poppe, I., and Labrie, F., Prostate, 1995. 27: p. 187-197]. Similar to other N-linked glycoproteins, as stated above, PSA contains a core biantennary branched motif. Extending from each mannose arm of PSA is a trisaccharide unit. Together, these modifications indicate an expected molecular mass of 2370 Da for the intact polysaccharide. Using MALDI-MS and an exoglycosidase array, we have sequenced the putative structure of the N-linked polysaccharide on PSA (Figure 12). Analysis of the intact polysaccharide yields a molecular mass of 2370 Da (Figure 12A), identical to the predicted molecular mass based on its structure. In fact, for all structures and the enzymatic products derived from them, a mass accuracy of better than one Dalton is realized.
In initial studies, we found that maximum sensitivity was obtained with 2,5-dihydroxybenzoic acid as the matrix with spermine as an additive [Mechref, Y. and M.V. Novotny, Matrix-assisted laser desorption/ionization mass spectrometry of acidic glycoconjugates facilitated by the use of spermine as a co-matrix. J Am Soc Mass Spectrom, 1998. 9(12): p. 1293-302]. In this case, oligosaccharides were detected as negative ions. As outlined above, these conditions yielded maximal sensitivity (a limit of detection of around 500 fmol, or about 1.5 ng) and also a homogeneous signal that is free of detectable adducts. Of note, negative mode detection makes sialic acid-containing pendant arms amenable to analysis, but detection can also be done in the positive mode with different matrix conditions. Treatment of the polysaccharide with sialidase (specific cleavage of NeuAcα2-3,6,8 linkages) resulted in a mass decrease of about 583 Da (m/z 2369.5 to 1786.9), consistent with the cleavage of two sialic acid residues (Figure 12B). Treatment of this saccharide with β-galactosidase resulted in a further decrease of about 324 Da (m/z 1786.9 to 1462.8), confirming the presence of two galactose residues located proximal to the sialic acids (Figure 12C). Importantly, when the asialo structure of Figure 12B was treated with enzymes other than β-galactosidase, no reduction in mass was observed, confirming the identity of these units as β-linked galactose residues. Via systematic application of the exoglycosidases, we can "read through" the entire sequence of the putative glycan structure of PSA. In addition, not only can we "read through" the structure, but our methodology was able to complete the analysis using submicrogram amounts of material. Also, since we determined the mass at every step of "reading" the sequence, we had an internal control to ensure that our assumptions of enzyme specificity and N-glycan structure were correct.
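Each exoglycosidase step can be checked by asking whether the mass drop is a whole multiple of one residue mass. The Python sketch below uses average residue masses (free monosaccharide minus one water: the released monosaccharide carries away the water added by hydrolysis, so the chain itself loses only the residue mass) and the [M-H]- values listed for Figure 12. The function name `residues_lost` and the 1 Da tolerance are assumptions for illustration.

```python
# Average residue masses in Da (free monosaccharide minus one water).
RESIDUE_MASS = {"NeuAc": 291.2546, "Hex": 162.1406, "HexNAc": 203.1925}


def residues_lost(before: float, after: float, residue: str, tol: float = 1.0) -> int:
    """Whole number of `residue` units explaining the drop from `before` to `after`.

    Raises ValueError if the drop is not a multiple of the residue mass within
    `tol` Da, i.e. if the assumed enzyme specificity does not fit the data.
    """
    delta = before - after
    n = round(delta / RESIDUE_MASS[residue])
    if n < 1 or abs(delta - n * RESIDUE_MASS[residue]) > tol:
        raise ValueError(f"a {delta:.1f} Da drop is not a multiple of {residue}")
    return n


# [M-H]- values from Figure 12: intact, after sialidase, after
# beta-galactosidase, after N-acetylhexosaminidase.
steps = [
    ("sialidase", "NeuAc", 2369.5, 1786.9),
    ("beta-galactosidase", "Hex", 1786.9, 1462.8),
    ("N-acetylhexosaminidase", "HexNAc", 1462.8, 1056.3),
]
for enzyme, residue, before, after in steps:
    n = residues_lost(before, after, residue)
    print(f"{enzyme}: -{before - after:.1f} Da = {n} x {residue}")
```

Each step resolves to two residues (drops of about 582.6, 324.1, and 406.5 Da). Using free monosaccharide masses instead of residue masses would overstate each drop by about 18 Da per residue removed, and a drop that fits no whole number of residues raises an error, which is the "internal control" role the mass measurements play in the read-through.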
Direct Sequencing of the PSA Polysaccharide. Information about the structure of the sugar moiety of PSA can be derived not only by isolating the sugar and sequencing it (such as by using the above methodology), but also without removing the sugar from the protein. Figure 13 shows the results of sequencing the sugar of PSA (Sigma Chemical) by enzymatic degradation of the saccharide chain directly on the protein. 50 pmol (1.4 µg) of PSA was denatured by heat treatment at 80 °C for 20 minutes. Then the sample was sequentially treated with the exoenzymes (B-D). After overnight incubation at 37 °C, 1 pmol of the digested PSA was examined by mass spectrometry. Briefly, the aqueous sample was mixed with sinapinic acid in 30% acetonitrile, allowed to dry, and then examined by MALDI-TOF. All spectra were calibrated externally with a mixture of myoglobin, ovalbumin, and BSA to ensure accurate molecular mass determination.
(A) PSA before the addition of exoenzymes. The measured mass of 28,478 Da agreed well with the reported value of 28,470 Da. (B) Treatment of (A) with sialidase resulted in a mass decrease of 287 Da, consistent with the loss of one sialic acid residue. (C) Treatment of (B) with galactosidase. A further decrease of 321 Da indicated the loss of two galactose moieties. (D) Upon digestion of (C) with hexosaminidase, a decrease of 393 Da indicated the loss of two N-acetylglucosamine residues.
The protein had a measured mass of 28,478.3 Da (Figure 13A). Treatment of the intact protein with sialidase resulted in a decrease of 287 Da, consistent with the loss of one sialic acid residue (Figure 13B). Additional treatment with galactosidase resulted in a decrease in mass of 321 Da, consistent with the loss of two galactose residues (Figure 13C). Finally, treatment with N-acetylhexosaminidase resulted in cleavage of two GlcNAc moieties (Figure 13D).
Glycotyping of PSA by EndoF2 Treatment. EndoF2 is an endoglycanase that clips only biantennary structures; tri- and tetraantennary structures do not serve as substrates for this enzyme (Figure 14). In this way, EndoF2 treatment of a glycan structure, either attached to the protein or after isolation, was used to identify the branching type. This becomes especially important in light of the fact that aberrant changes in glycosylation patterns usually result in increased branching. In addition, EndoF2 was used to cleave glycan structures that were still attached to the protein of interest. Indeed, treatment of PSA with EndoF2 resulted in a mass shift consistent with the loss of a biantennary, complex type glycan structure. Figure 14 shows the results of treating biantennary and triantennary saccharides with endoglycanase F2. (A) Treatment of the biantennary saccharide resulted in a mass decrease of 348.6 Da, indicating cleavage between the GlcNAc residues. (B) Treatment of the triantennary saccharide with the same substituents resulted in no cleavage, showing that EndoF2 primarily cleaves biantennary structures. (C) EndoF2 treatment of heat-denatured PSA. There was a mass reduction of 1709.7 Da in the molecular mass of PSA (compare 11 C and 11 A), indicating that the normal glycan structure of PSA was biantennary.
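The 348.6 Da shift in Figure 14A can be rationalized with the same residue-mass arithmetic. Assuming, as the stated cleavage site implies, that EndoF2 cuts between the two core GlcNAc residues of a core-fucosylated saccharide, the fragment removed from the reducing end is GlcNAc carrying the core fucose, so the expected drop is the sum of the two average residue masses. The helper `endof2_cleaved` and its 1 Da tolerance are hypothetical.

```python
HEXNAC = 203.1925  # average residue mass of GlcNAc (Da)
DHEX = 146.1412    # average residue mass of fucose (Da)


def endof2_cleaved(mass_shift: float, core_fucose: bool = True, tol: float = 1.0) -> bool:
    """True if a mass decrease matches loss of the reducing-end GlcNAc.

    Cutting between the two core GlcNAc residues removes the reducing-end
    GlcNAc (plus its core fucose, if present) from a free saccharide; the
    water of hydrolysis leaves with that small fragment.
    """
    expected = HEXNAC + (DHEX if core_fucose else 0.0)
    return abs(mass_shift - expected) <= tol


print(endof2_cleaved(348.6))  # biantennary saccharide, Figure 14A
print(endof2_cleaved(0.0))    # triantennary saccharide, Figure 14B (no shift)
```

The expected drop is about 349.3 Da, within 1 Da of the observed 348.6 Da, whereas the absence of any shift for the triantennary saccharide reads as no cleavage.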

What is claimed is:

Claims (91)

1. A data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units, the data structure comprising:
an identifier including one or more fields, each field for storing a value corresponding to one or more properties of the polymer, wherein at least one field stores a non-character-based value.
2. The data structure of claim 1, wherein each of the fields is capable of storing a binary value.
3. The data structure of claim 1, wherein the identifier is representable as a single-digit hexadecimal number.
4. The data structure of claim 1, wherein the identifier is representable as a decimal value.
5. The data structure of claim 4, wherein the decimal value may be reduced to a plurality of prime divisors, wherein each prime divisor represents a building block of the polymer.
6. The data structure of claim 1, wherein the polymer of chemical units comprises a polysaccharide and wherein each of the chemical units is a saccharide.
7. The data structure of claim 1, wherein the polymer of chemical units comprises a nucleic acid and wherein each of the chemical units is a nucleotide.
8. The data structure of claim 1, wherein the polymer of chemical units comprises a polypeptide and wherein each of the chemical units is an amino acid.
9. The data structure of claim 1, wherein the one or more properties comprise one or more chemical unit properties, each chemical unit property being a property of one of the chemical units of the polymer.
10. The data structure of claim 9, wherein the one or more properties comprise one or more charges, each charge being a charge of one of the chemical units of the polymer.
11. The data structure of claim 9, wherein the one or more properties comprise one or more chemical unit identities, each chemical unit identity being an identity of a chemical unit of the polymer.
12. The data structure of claim 9, wherein the one or more properties comprise one or more conformations, each conformation being a conformation of a chemical unit of the polymer.
13. The data structure of claim 9, wherein the one or more properties comprise one or more substituent identities, each substituent identity being an identity of a substituent of a chemical unit of the polymer.
14. The data structure of claim 1, wherein the one or more properties comprise one or more properties of the polymer.
15. The data structure of claim 14, wherein the one or more properties comprise a total charge of the polymer.
16. The data structure of claim 14, wherein the one or more properties comprise a total number of sulfates of the polymer.
17. The data structure of claim 14, wherein the one or more properties comprise a dye-binding of the polymer.
18. The data structure of claim 14, wherein the one or more properties comprise one or more properties of a polysaccharide.
19. The data structure of claim 18, wherein the one or more properties of a polysaccharide include one or more compositional ratios of substituents.
20. The data structure of claim 18, wherein the one or more properties of a polysaccharide include one or more compositional ratios of iduronic versus glucuronic.
21. The data structure of claim 18, wherein the one or more properties of a polysaccharide include enzymatic sensitivity.
22. The data structure of claim 14, wherein the one or more properties comprise a mass of the polymer.
23. The data structure of claim 14, wherein the one or more properties comprise degree of sulfation.
24. The data structure of claim 14, wherein the one or more properties comprise charge.
25. The data structure of claim 14, wherein the one or more properties comprise chirality.
26. The data structure of claim 1, wherein the identifier comprises a numerical identifier.
27. A computer-implemented method for generating a data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units, the method comprising an act of:
generating an identifier including one or more fields for storing values, each value corresponding to one or more properties of the polymer, wherein at least one field stores a non-character-based value.
28. A computer-implemented method for determining whether properties of a query sequence of chemical units match properties of a polymer of chemical units, the query sequence being represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier that includes one or more fields, each field storing a value corresponding to one or more properties of the query sequence, the polymer being represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier that includes one or more fields, each field for storing a value corresponding to one or more properties of the polymer, the method comprising acts of:
(A) generating at least one mask based on the values stored in the one or more fields of the first data structure;
(B) performing at least one binary operation on the values stored in the one or more fields of the second data structure using the at least one mask to generate at least one result; and (C) determining whether the one or more properties of the query sequence match the one or more properties of the polymer based on the at least one result.
29. The method of claim 28, wherein each of the one or more fields of the first and second data structures is a bit field.
30. The method of claim 28, wherein the act (A) comprises an act of:
(A)(1) generating the at least one mask as a sequence of bits that is equivalent to the values stored in the fields of the first data structure.
31. The method of claim 28, wherein the act (A) comprises an act of:
(A)(1) generating the at least one mask as a sequential repetition of the values stored in the fields of the first data structure.
32. The method of claim 28, wherein the at least one mask comprises a plurality of masks and wherein the act (B) comprises acts of:
(B)(1) performing a logical AND operation on the values stored in the fields of the second data structure using each of the plurality of masks to generate a plurality of intermediate results; and (B)(2) combining the plurality of intermediate results using at least one logical OR operation to generate the at least one result.
33. The method of claim 28, wherein the act (C) comprises an act of:
(C)(1) determining that the one or more properties of the query sequence match the one or more properties of the polymer when the at least one result has a non-zero value.
34. The method of claim 28, wherein the at least one binary operation comprises at least one logical AND operation.
35. A database, tangibly embodied in a computer-readable medium, for storing information descriptive of one or more polymers, the database comprising:
one or more data units corresponding to the one or more polymers, each of the data units including an identifier that includes one or more fields, each field for storing a value corresponding to one or more properties of the polymer.
36. A method for determining whether complete building blocks of a query sequence of chemical units match complete building blocks of a polysaccharide, the query sequence being represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier that includes one or more fields, each field for storing a value corresponding to a complete building block of the query sequence, the polysaccharide being represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier that includes one or more fields, each field for storing a value corresponding to a complete building block of the polysaccharide, the method comprising acts of:
(A) generating at least one mask based on the values stored in the one or more fields of the first data structure;
(B) performing at least one binary operation on the values stored in the one or more fields of the second data structure using the at least one mask to generate at least one result; and (C) determining whether the complete building blocks of the query sequence match the complete building blocks of the polysaccharide based on the at least one result.
37. The method of claim 36, wherein each of the one or more fields of the first and second data structures is a bit field.
38. A data structure, tangibly embodied in a computer-readable medium, representing a polysaccharide, the data structure comprising:
an identifier including one or more fields, each field for storing a value corresponding to a complete building block of the polysaccharide.
39. The data structure of claim 38, wherein each of the one or more fields is capable of storing a binary value.
40. The data structure of claim 38, wherein the identifier is representable as a single-digit hexadecimal number.
41. The data structure of claim 38, wherein the identifier is representable as a decimal value.
42. The data structure of claim 41, wherein the decimal value can be reduced to a plurality of prime divisors, wherein each prime divisor represents a building block of the polysaccharide.
43. A data structure, tangibly embodied in a computer-readable medium, representing a chemical unit of a polymer, the data structure comprising:
an identifier including one or more fields, each field for storing a value corresponding to one or more properties of the chemical unit, wherein at least one field stores a non-character-based value.
44. The data structure of claim 43, wherein the one or more properties include a charge of the chemical unit.
45. The data structure of claim 43, wherein the one or more properties include an identity of the chemical unit.
46. The data structure of claim 43, wherein the one or more properties include a conformation of the chemical unit.
47. The data structure of claim 43, wherein the one or more properties include an identity of a substituent of the chemical unit.
48. The data structure of claim 43, wherein each of the fields is capable of storing a binary value.
49. The data structure of claim 43, wherein the identifier is representable as a single-digit hexadecimal number.
50. The data structure of claim 43, wherein the identifier is representable as a decimal value.
51. The data structure of claim 50, wherein the decimal value is a prime number.
52. The data structure of claim 51, wherein the polymer is a polysaccharide, and the prime number identifies the chemical unit as a building block of the polysaccharide.
53. The data structure of claim 43, wherein the polymer is a polysaccharide.
54. In a system including a database of values of properties of polymers of chemical units, a method for determining the composition of a sample polymer of chemical units having a known molecular length, comprising steps of:
(A) selecting, from the database, candidate polymers of chemical units having the same length as the sample polymer of chemical units and for which the value of a predetermined property is similar to the value of the predetermined property of the sample polymer of chemical units;
(B) performing an experiment on the sample polymer of chemical units;
(C) measuring properties of the sample polymer of chemical units resulting from the experiment; and (D) eliminating, from the candidate polymers of chemical units, polymers of chemical units having properties that do not correspond to the experimental results.
55. The method of claim 54, further comprising a step of:
(E) repeatedly performing the step (D) until the number of candidate polymers of chemical units falls below a predetermined threshold.
56. The method of claim 54, wherein the predetermined property is molecular weight.
57. A method for identifying a population of polymers of chemical units having the same property as a sample polymer of chemical units, comprising:
determining a property of a sample polymer of chemical units;
comparing the property of the sample polymer to a reference database of polymers of known sequence and known properties to identify a population of polymers of chemical units having the same property as a sample polymer of chemical units, wherein the reference database of polymers includes identifiers corresponding to the chemical units of the polymers, each of the identifiers including a field storing a value corresponding to the property.
58. The method of claim 57, wherein the step of determining a property of the sample polymer involves the use of mass spectrometry to determine the molecular weight of the polymer.
59. The method of claim 57, wherein the mass spectrometry is MALDI which detects molecular weight with an accuracy of approximately one Dalton.
60. The method of claim 57, wherein the polymer is reduced to at least two fragments and the property of the polymer is the size of the fragments and wherein the step of detection involves strong ion exchange chromatography.
61. The method of claim 59, wherein the MALDI analysis is performed on a MALDI surface having a protein coated thereon.
62. The method of claim 59, wherein the sample polymer is isolated from a cell surface.
63. A method for identifying a subpopulation of polymers having a property in common with a sample polymer of chemical units, comprising:
(A) applying an experimental constraint to the polymer to modify the polymer;
(B) detecting a property of the modified polymer;
(C) identifying a population of polymers of chemical units having the same molecular length as the sample polymer; and
(D) identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer.
64. The method of claim 63, further comprising repeating steps (A), (B), and (D) on the modified polymer to identify a second subpopulation within the subpopulation of polymers having a second property in common with the twice-modified polymer.
65. The method of claim 64, further comprising repeatedly performing the steps (A), (B), and (D) on the modified polymer until the number of polymers within the subpopulation falls below a predetermined threshold.
66. The method of claim 65, wherein the predetermined threshold of polymers within the subpopulation is two polymers and wherein the method is performed to identify the sequence of the polymer.
67. The method of claim 65, wherein the experimental constraints applied to the polymer are different for each repetition.
68. The method of claim 63, wherein the experimental constraint applied to the polymer is digestion with an exoenzyme.
69. The method of claim 63, wherein the experimental constraint applied to the polymer is digestion with an endoenzyme.
70. The method of claim 63, wherein the experimental constraint applied to the polymer is selected from the group consisting of restriction endonuclease digestion; chemical digestion; chemical modification; interaction with a binding compound; chemical peeling; and enzymatic modification.
71. The method of claim 63, wherein the property of the polymer is molecular weight.
72. The method of claim 63, wherein the population of polymers of chemical units includes every polymer sequence having the molecular weight of the sample polymer.
73. The method of claim 63, wherein the population of polymers of chemical units includes less than every polymer sequence having the molecular weight of the sample polymer.
74. The method of claim 63, wherein the step of detection involves the use of mass spectrometry to determine the molecular weight of the polymer.
75. The method of claim 74, wherein the mass spectrometry is matrix assisted laser desorption ionization which detects molecular weight with an accuracy of approximately one Dalton.
76. The method of claim 63, wherein the polymer is reduced to at least two fragments and the property of the polymer is the size of the fragments and wherein the step of detection involves strong ion exchange chromatography.
77. The method of claim 63, wherein the step of identifying includes selecting the population of polymers of chemical units from a database including molecular weights of polymers of chemical units.
78. The method of claim 77, wherein the database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
79. A method for compositional analysis of chemical units of a sample polymer, comprising:
(A) applying an experimental constraint to the sample polymer to modify the sample polymer;
(B) detecting a property of the modified sample polymer; and
(C) comparing the modified sample polymer to a reference database of polymers identical in size to the sample polymer, wherein the polymers of the reference database have also been subjected to the same experimental constraint as the sample polymer, wherein the comparison provides a compositional analysis of the sample polymer.
80. The method of claim 79, wherein the step of detection involves capillary electrophoresis.
81. The method of claim 79, wherein the experimental constraint applied to the polymer involves complete degradation of the polymer into individual chemical units, and wherein the compositional analysis reveals the number and type of units within the polymer.
82. The method of claim 79, wherein the step of detection involves matrix assisted laser desorption ionization mass spectrometry.
83. The method of claim 82, wherein the experimental constraint applied to the polymer involves incomplete enzymatic digestion of the polymer and wherein steps (A), (B), and (C) are repeated until the number of polymers within the reference database falls below a predetermined threshold, and wherein the compositional analysis reveals the identity of a sequence of chemical units of the polymer.
84. The method of claim 77, wherein the reference database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
85. A method for sequencing a polymer, comprising:
(A) applying an experimental constraint to the polymer to modify the polymer;
(B) detecting a property of the modified polymer;
(C) identifying a population of polymers having the same molecular length as the sample polymer and having molecular weights similar to the molecular weight of the sample polymer;
(D) identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer; and
(E) repeating steps (A), (B), and (D) by applying additional experimental constraints to the polymer and identifying additional subpopulations of polymers until the number of polymers within the subpopulation is one and the sequence of the polymer may be identified.
86. A method for identifying a polysaccharide-protein interaction, comprising:
contacting a protein-coated MALDI surface with a polysaccharide-containing sample to produce a polysaccharide-protein-coated MALDI surface, removing unbound polysaccharide from the polysaccharide-protein-coated MALDI surface, and performing MALDI mass spectrometry to identify the polysaccharide that specifically interacts with the protein coated on the MALDI surface.
87. The method of claim 86, wherein a MALDI matrix is added to the polysaccharide-protein-coated MALDI surface.
88. The method of claim 86, further comprising applying an experimental constraint to the polysaccharide bound on the polysaccharide-protein-coated MALDI surface before performing the MALDI mass spectrometry analysis.
89. The method of claim 88, wherein the experimental constraint applied to the polymer is digestion with an exoenzyme.
90. The method of claim 88, wherein the experimental constraint applied to the polymer is digestion with an endoenzyme.
91. The method of claim 88, wherein the experimental constraint applied to the polymer is selected from the group consisting of restriction endonuclease digestion; chemical digestion; chemical modification; and enzymatic modification.
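Claims 43 to 53 describe an identifier for each chemical unit. The identifier is built from fields that each hold a binary value, so four such fields fit in a single hexadecimal digit. The Python sketch below shows one way to do that packing. The four field names are hypothetical placeholders chosen for illustration, not the field assignment used in the patent.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UnitIdentifier:
    """Identifier for one chemical unit: four binary property fields.

    The field names are hypothetical placeholders for illustration only.
    """
    iduronic: bool    # hypothetical field: uronic acid epimer (1 = iduronic)
    sulfate_2o: bool  # hypothetical field: 2-O-sulfate present
    sulfate_6o: bool  # hypothetical field: 6-O-sulfate present
    sulfate_n: bool   # hypothetical field: N-sulfate present

    def encode(self) -> int:
        """Pack the four bits, most significant first, into 0..15."""
        value = 0
        for bit in (self.iduronic, self.sulfate_2o, self.sulfate_6o, self.sulfate_n):
            value = (value << 1) | int(bit)
        return value

    def to_hex(self) -> str:
        """Render the identifier as one uppercase hexadecimal digit."""
        return format(self.encode(), "X")

    @classmethod
    def decode(cls, value: int) -> "UnitIdentifier":
        """Unpack an integer in 0..15 back into the four fields."""
        if not 0 <= value <= 0xF:
            raise ValueError("identifier must fit in a single hex digit")
        return cls(
            iduronic=bool(value & 0b1000),
            sulfate_2o=bool(value & 0b0100),
            sulfate_6o=bool(value & 0b0010),
            sulfate_n=bool(value & 0b0001),
        )


def encode_polymer(units: Iterable[UnitIdentifier]) -> str:
    """Write a polymer as one hex digit per chemical unit, in chain order."""
    return "".join(unit.to_hex() for unit in units)


unit = UnitIdentifier(iduronic=True, sulfate_2o=True, sulfate_6o=False, sulfate_n=True)
print(unit.to_hex())                                     # D
print(encode_polymer([unit, UnitIdentifier.decode(0)]))  # D0
```

Four binary fields give exactly the sixteen values a single hex digit can hold. A scheme with more fields would need a wider identifier, for example the decimal value contemplated in claim 50.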
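Claims 54 to 56, 63 to 67 and 85 describe narrowing a candidate population in two stages:

- First, select polymers of the sample's length whose molecular weight is close to the measured value.
- Then, discard candidates whose predicted response to each experiment disagrees with what was observed, stopping once fewer than a threshold number remain.

The following is a minimal sketch under stated assumptions, not the patent's implementation. The unit labels and masses are hypothetical, and polymer mass is simplified to the plain sum of unit masses. Each experiment is modeled as a Python function that predicts an outcome for a candidate sequence. The one-dalton tolerance echoes the MALDI accuracy stated in claims 59 and 75.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Polymer = Tuple[str, ...]
# An experiment is modeled as (predict, observed): predict(candidate) returns
# the outcome the experiment would give if the candidate were the sample.
Experiment = Tuple[Callable[[Polymer], object], object]

# Hypothetical unit masses in daltons, for illustration only.
UNIT_MASS: Dict[str, float] = {"A": 379.0, "B": 459.0, "C": 539.0}


def build_database(length: int) -> Dict[Polymer, float]:
    """Enumerate every sequence of the given length with its (simplified) mass."""
    return {
        seq: sum(UNIT_MASS[u] for u in seq)
        for seq in product(UNIT_MASS, repeat=length)
    }


def select_candidates(database: Dict[Polymer, float], length: int,
                      observed_mass: float, tolerance: float) -> List[Polymer]:
    """Keep entries of the sample's length whose mass is within tolerance."""
    return [seq for seq, mass in database.items()
            if len(seq) == length and abs(mass - observed_mass) <= tolerance]


def eliminate(candidates: List[Polymer], experiments: Sequence[Experiment],
              threshold: int) -> List[Polymer]:
    """Apply experiments in turn until fewer than `threshold` candidates remain."""
    remaining = list(candidates)
    for predict, observed in experiments:
        if len(remaining) < threshold:
            break
        remaining = [seq for seq in remaining if predict(seq) == observed]
    return remaining


db = build_database(3)
candidates = select_candidates(db, length=3, observed_mass=1377.0, tolerance=1.0)
print(len(candidates))  # 7: BBB plus the six orderings of A, B, C

# Hypothetical experiments: one step reveals the first unit,
# a second step reveals the second unit.
experiments = [(lambda s: s[0], "A"), (lambda s: s[1], "C")]
print(eliminate(candidates, experiments, threshold=2))  # [('A', 'C', 'B')]
```

In this toy alphabet, A + C has the same mass as B + B. Mass alone therefore cannot separate the seven candidates, which is the situation the repeated experimental constraints are meant to resolve. A threshold of two corresponds to stopping at a single remaining sequence, as in claims 66 and 85.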
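Claim 81 covers complete degradation of the polymer into individual chemical units. This reveals the number and type of units but not their order. The sketch below assumes the degradation products have already been identified as unit labels. It compares the multiset of observed units against each reference sequence. The labels and the reference set are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple

Polymer = Tuple[str, ...]


def matches_composition(reference: Iterable[Polymer],
                        observed_units: Iterable[str]) -> List[Polymer]:
    """Return reference sequences whose unit counts equal the observed counts."""
    target = Counter(observed_units)
    return [seq for seq in reference if Counter(seq) == target]


# Hypothetical reference set: every length-3 sequence over units A, B, C.
reference = list(product("ABC", repeat=3))

# Hypothetical degradation result: one each of A, B and C, in arbitrary order.
hits = matches_composition(reference, ["C", "A", "B"])
print(len(hits))  # 6: composition fixes the counts, not the order
```

Composition alone leaves every ordering of the same units in play. Claim 83 instead pairs the comparison with incomplete enzymatic digestion and repetition when the goal is a sequence.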
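Claims 60, 69 and 76 involve cutting the polymer into fragments, for example with an endoenzyme, and using fragment size as the measured property. The sketch below works in three steps:

1. Simulate a hypothetical endoenzyme that cleaves after every occurrence of one unit type.
2. Predict the sorted fragment lengths for each candidate.
3. Keep the candidates whose prediction matches the observed sizes.

The cleavage rule and unit labels are assumptions for illustration. Real enzyme specificities and chromatographic size measurements are more involved.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Polymer = Tuple[str, ...]


def endo_digest(seq: Sequence[str], cleave_after: str) -> List[Polymer]:
    """Hypothetical endoenzyme: cut the chain after every `cleave_after` unit."""
    fragments: List[Polymer] = []
    current: List[str] = []
    for unit in seq:
        current.append(unit)
        if unit == cleave_after:
            fragments.append(tuple(current))
            current = []
    if current:  # trailing piece with no cleavage site at its end
        fragments.append(tuple(current))
    return fragments


def fragment_sizes(seq: Sequence[str], cleave_after: str) -> List[int]:
    """Sorted fragment lengths, standing in for a size-based separation."""
    return sorted(len(f) for f in endo_digest(seq, cleave_after))


def filter_by_fragments(candidates: List[Polymer], cleave_after: str,
                        observed_sizes: List[int]) -> List[Polymer]:
    """Keep candidates whose predicted fragment sizes match the observation."""
    target = sorted(observed_sizes)
    return [c for c in candidates if fragment_sizes(c, cleave_after) == target]


candidates = list(permutations("ABC"))
print(endo_digest(("A", "C", "B"), "C"))  # [('A', 'C'), ('B',)]
print(filter_by_fragments(candidates, "C", [2, 1]))
# [('A', 'C', 'B'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]
```

A single digest removes only the orderings that end in the cleavage site. Combining several such constraints narrows the population further, in the manner of claims 64 and 65.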
CA2643162A 1999-04-23 2000-04-24 Polymer identification, compositional analysis and sequencing, based on property comparison Expired - Lifetime CA2643162C (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US13074799P 1999-04-23 1999-04-23
US13079299P 1999-04-23 1999-04-23
US60/130,747 1999-04-23
US60/130,792 1999-04-23
US15994099P 1999-10-14 1999-10-14
US15993999P 1999-10-14 1999-10-14
US60/159,939 1999-10-14
US60/159,940 1999-10-14
CA002370539A CA2370539C (en) 1999-04-23 2000-04-24 System and method for notating polymers

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA002370539A Division CA2370539C (en) 1999-04-23 2000-04-24 System and method for notating polymers

Publications (2)

Publication Number Publication Date
CA2643162A1 (en) 2000-11-02
CA2643162C (en) 2018-01-02

Family

ID=27494876

Family Applications (2)

Application Number Title Priority Date Filing Date
CA002370539A Expired - Lifetime CA2370539C (en) 1999-04-23 2000-04-24 System and method for notating polymers
CA2643162A Expired - Lifetime CA2643162C (en) 1999-04-23 2000-04-24 Polymer identification, compositional analysis and sequencing, based on property comparison

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CA002370539A Expired - Lifetime CA2370539C (en) 1999-04-23 2000-04-24 System and method for notating polymers

Country Status (5)

Country Link
US (8) US7412332B1 (en)
EP (1) EP1190364A2 (en)
JP (1) JP4824170B2 (en)
CA (2) CA2370539C (en)
WO (1) WO2000065521A2 (en)

DE60334220D1 (en) 2002-06-03 2010-10-28 Massachusetts Inst Technology Rationally Constructed Lyases Derived from Chondroitinase B.
US20040147033A1 (en) 2002-12-20 2004-07-29 Zachary Shriver Glycan markers for diagnosing and monitoring disease
JP4606712B2 (en) * 2003-01-08 2011-01-05 マサチューセッツ インスティテュート オブ テクノロジー 2-O sulfatase compositions and related methods
WO2005026720A1 (en) 2003-09-04 2005-03-24 Parivid Llc Methods and apparatus for characterizing polymeric mixtures
US7851223B2 (en) 2004-02-27 2010-12-14 Roar Holding Llc Method to detect emphysema
US7507570B2 (en) * 2004-03-10 2009-03-24 Massachusetts Institute Of Technology Recombinant chondroitinase ABC I and uses thereof
US20060057638A1 (en) * 2004-04-15 2006-03-16 Massachusetts Institute Of Technology Methods and products related to the improved analysis of carbohydrates
US20060127950A1 (en) * 2004-04-15 2006-06-15 Massachusetts Institute Of Technology Methods and products related to the improved analysis of carbohydrates
WO2005110438A2 (en) * 2004-04-15 2005-11-24 Massachusetts Institute Of Technology Methods and products related to the intracellular delivery of polysaccharides
EP1768687A2 (en) * 2004-06-29 2007-04-04 Massachusetts Institute Of Technology Methods and compositions related to the modulation of intercellular junctions
US20060154894A1 (en) * 2004-09-15 2006-07-13 Massachusetts Institute Of Technology Biologically active surfaces and methods of their use
US20070020243A1 (en) * 2005-01-12 2007-01-25 Massachusetts Institute Of Technology Methods and compositions related to modulating the extracellular stem cell environment
WO2006105315A2 (en) * 2005-03-29 2006-10-05 Massachusetts Institute Of Technology Compositions and methods for regulating inflammatory responses
US20090105463A1 (en) * 2005-03-29 2009-04-23 Massachusetts Institute Of Technology Compositions of and Methods of Using Oversulfated Glycosaminoglycans
AU2006262145B2 (en) * 2005-06-22 2011-09-01 Gen-Probe Incorporated Method and algorithm for quantifying polynucleotides
WO2007120478A2 (en) * 2006-04-03 2007-10-25 Massachusetts Institute Of Technology Glycomic patterns for the detection of disease

Also Published As

Publication number Publication date
US20030191587A1 (en) 2003-10-09
WO2000065521A3 (en) 2001-10-25
US20080301178A1 (en) 2008-12-04
CA2370539C (en) 2009-01-06
US20040204869A1 (en) 2004-10-14
US7139666B2 (en) 2006-11-21
US20040197933A1 (en) 2004-10-07
JP2002543222A (en) 2002-12-17
JP4824170B2 (en) 2011-11-30
CA2370539A1 (en) 2000-11-02
CA2643162C (en) 2018-01-02
US7117100B2 (en) 2006-10-03
US7412332B1 (en) 2008-08-12
US6597996B1 (en) 2003-07-22
US7110889B2 (en) 2006-09-19
US20070066769A1 (en) 2007-03-22
WO2000065521A2 (en) 2000-11-02
US20090119027A1 (en) 2009-05-07
EP1190364A2 (en) 2002-03-27

Similar Documents

Publication Title
CA2370539C (en) System and method for notating polymers
US8018231B2 (en) Method for sequence determination using NMR
Tang et al. Automated interpretation of MS/MS spectra of oligosaccharides
EP1319183B1 (en) Methods and products related to low molecular weight heparin
Lawrence et al. Evolutionary differences in glycosaminoglycan fine structure detected by quantitative glycan reductive isotope labeling
Huang et al. LC-MSn analysis of isomeric chondroitin sulfate oligosaccharides using a chemical derivatization strategy
Pepi et al. Developments in mass spectrometry for glycosaminoglycan analysis: a review
Sugahara et al. Structural Studies on the Hexasaccharide Alditols Isolated from the Carbohydrate-Protein Linkage Region of Dermatan Sulfate Proteoglycans of Bovine Aorta: DEMONSTRATION OF IDURONIC ACID-CONTAINING COMPONENTS
Zamfir et al. Structural characterization of chondroitin/dermatan sulfate oligosaccharides from bovine aorta by capillary electrophoresis and electrospray ionization quadrupole time‐of‐flight tandem mass spectrometry
Hogan et al. Software for peak finding and elemental composition assignment for glycosaminoglycan tandem mass spectra
Wu et al. Negative electron transfer dissociation sequencing of 3-O-sulfation-containing heparan sulfate oligosaccharides
Yates et al. Recent innovations in the structural analysis of heparin
WO2002044714A2 (en) System and method for integrated analysis of data for characterizing carbohydrate polymers
Hounsell et al. Computer-assisted interpretation of 1H-nmr spectra in the analysis of the structure of oligosaccharides
Toukach et al. Computer-assisted structural analysis of regular glycopolymers on the basis of 13C NMR data
Roberts et al. tal Health, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Hogan et al. GAGrank: Software for glycosaminoglycan sequence ranking using a bipartite graph model
Raman et al. Informatics Concepts to Decode Structure-Function Relationships of Glycosaminoglycans
Bieganski et al. Motif explorer-a tool for interactive exploration of aminoacid sequence motifs
Turnbull Sequencing Heparan Sulfate Saccharides

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry
Effective date: 20200424