US20080301178A1 - Data structures representing polysaccharides and databases and methods related thereto - Google Patents
Data structures representing polysaccharides and databases and methods related thereto
- Publication number
- US20080301178A1 (application US12/133,334)
- Authority
- US
- United States
- Prior art keywords
- data structure
- polymer
- properties
- polysaccharide
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K1/00—General methods for the preparation of peptides, i.e. processes for the organic chemical preparation of peptides or proteins of any length
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/40—Encryption of genetic data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/20—Identification of molecular entities, parts thereof or of chemical compositions
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99943—Generating database or data structure, e.g. via user interface
Definitions
- notational systems have been used to encode classes of chemical units.
- a unique code is assigned to each chemical unit in the class.
- a polymer of chemical units can be represented, using such a notational system, as a set of codes corresponding to the chemical units.
- Such notational systems have been used to encode polymers, such as proteins, in a computer-readable format.
- a polymer that has been represented in a computer-readable format according to such a notational system can be processed by a computer.
- Character-based searching algorithms are typically slow because such algorithms search by comparing individual characters in the query sequence against individual characters in the sequences of chemical units stored in the database. The speed of such algorithms is therefore related to the length of the query sequence, resulting in particularly poor performance for long query sequences.
- the invention is directed to a notational system for representing polymers of chemical units.
- the notational system is referred to as Property encoded nomenclature (PEN).
- a polymer is assigned an identifier that includes information about properties of the polymer.
- properties of a disaccharide are each assigned a binary value, and an identifier for the disaccharide includes the binary values assigned to the properties of the disaccharide.
- the identifier is capable of being expressed as a number, such as a single hexadecimal digit.
- the identifier may be stored in a computer readable medium, such as in a data unit (e.g., a record or a table entry) of a polymer database.
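The encoding described above can be sketched as follows. This is a minimal illustration, not the notation defined by the application: the four property names and their bit order are hypothetical placeholders, since the text at this point does not say which properties are encoded or in what order. The sketch only shows how a set of binary property values can be packed into an identifier that is expressible as a single hexadecimal digit.

```python
from typing import Mapping, Sequence

# Hypothetical property names and bit order (most significant bit first).
# The source text does not fix either; these are placeholders for illustration.
PROPERTY_ORDER: Sequence[str] = ("prop_a", "prop_b", "prop_c", "prop_d")


def encode_identifier(properties: Mapping[str, bool],
                      order: Sequence[str] = PROPERTY_ORDER) -> int:
    """Pack one binary value per property into a single integer identifier."""
    value = 0
    for name in order:
        # Shift the bits collected so far and append this property's bit.
        value = (value << 1) | int(bool(properties.get(name, False)))
    return value


def to_hex_digit(identifier: int) -> str:
    """Express a 4-bit identifier as a single hexadecimal digit."""
    if not 0 <= identifier <= 0xF:
        raise ValueError("identifier does not fit in one hexadecimal digit")
    return format(identifier, "X")


# Arbitrary example values, chosen only to exercise the functions.
example = {"prop_a": True, "prop_b": False, "prop_c": True, "prop_d": True}
identifier = encode_identifier(example)   # binary 1011
hex_digit = to_hex_digit(identifier)      # "B"
```

Four binary properties fit in one hexadecimal digit; encoding more properties would need a wider identifier.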
- Polymer identifiers may be used in a number of ways. For example, the identifiers may be used to determine whether properties of a query sequence of chemical units match properties of a polymer of chemical units. One application of such matching is to quickly search a polymer database for a particular polymer of interest or for a polymer or polymers having specified properties.
- the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units.
- the invention is directed to a computer-implemented method for generating such a data structure.
- the data structure may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer. At least one field may be a non-character-based field. Each field may be capable of storing a binary value.
- the identifier may be a numerical identifier, such as a number that is representable as a single-digit hexadecimal number.
- the polymer may be any of a variety of polymers.
- for example, (1) the polymer may be a polysaccharide and the chemical units may be saccharides; (2) the polymer may be a nucleic acid and the chemical units may be nucleotides; or (3) the polymer may be a polypeptide and the chemical units may be amino acids.
- the properties may be properties of the chemical units in the polymer.
- the properties may include charges of chemical units in the polymer, identities of chemical units in the polymer, conformations of chemical units in the polymer, or identities of substituents of chemical units in the polymer.
- the properties may be properties of the polymer that are not properties of any individual chemical unit within the polymer.
- Example properties include a total charge of the polymer, a total number of sulfates of the polymer, a dye-binding of the polymer, a mass of the polymer, compositional ratios of substituents, compositional ratios of iduronic versus glucuronic, enzymatic sensitivity, degree of sulfation, charge, and chirality.
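As a rough illustration of polymer-level properties, the sketch below derives a total charge and a total number of sulfates for a polymer from per-unit values. The class names, field names, and numeric values are all hypothetical and carry no chemical meaning; the text does not say how such properties are computed or stored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UnitProperties:
    """Per-unit values (hypothetical fields, for illustration only)."""
    charge: int
    sulfate_count: int


@dataclass(frozen=True)
class PolymerProperties:
    """Properties of the polymer as a whole, not of any single unit."""
    total_charge: int
    total_sulfates: int


def summarize(units: Iterable[UnitProperties]) -> PolymerProperties:
    """Aggregate per-unit values into polymer-level properties."""
    units = list(units)
    return PolymerProperties(
        total_charge=sum(u.charge for u in units),
        total_sulfates=sum(u.sulfate_count for u in units),
    )


# Arbitrary illustrative values, not measured data.
summary = summarize([UnitProperties(-2, 1), UnitProperties(-3, 2)])
```

Other listed properties, such as mass or enzymatic sensitivity, would come from measurement rather than from summing unit values.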
- the invention is directed to a computer-implemented method for determining whether properties of a query sequence of chemical units match properties of a polymer of chemical units.
- the query sequence may be represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the query sequence.
- the polymer may be represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the polymer.
- the method may include acts of generating at least one mask based on the values stored in the one or more bit fields of the first data structure, performing at least one binary operation on the values stored in the one or more bit fields of the second data structure using the at least one mask to generate at least one result, and determining whether the properties of the query sequence match the properties of the polymer based on the at least one result.
- the chemical units may, for example, be any of the chemical units described above.
- the properties may be any of the properties described above.
- the act of generating includes an act of generating the at least one mask as a sequence of bits that is equivalent to the values stored in the one or more bit fields of the first data structure. In another embodiment, the act of generating includes an act of generating the at least one mask as a sequential repetition of the values stored in the one or more bit fields of the first data structure.
- the at least one mask includes a plurality of masks and the act of performing at least one binary operation includes acts of performing a logical AND operation on the values stored in the one or more bit fields of the second data structure using each of the plurality of masks to generate a plurality of intermediate results, and combining the plurality of intermediate results using at least one logical OR operation to generate the at least one result.
- the act of determining includes an act of determining that the properties of the query sequence match the properties of the polymer when the at least one result has a non-zero value.
- the at least one binary operation includes at least one logical AND operation.
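The matching method above can be sketched as follows, under stated assumptions. The sketch assumes each chemical unit occupies a fixed-width bit field and that a polymer is packed as consecutive fields in one integer; the function names and the packing are illustrative, not taken from the text. It follows the described acts literally: build one or more masks from the query bits, AND each mask with the polymer bits, OR the intermediate results, and report a match when the result is non-zero.

```python
from typing import Iterable


def mask_from_query(query_bits: int) -> int:
    """Mask equivalent to the values stored in the query's bit fields."""
    return query_bits


def repeated_mask(query_bits: int, field_width: int, repeats: int) -> int:
    """Mask formed by sequentially repeating the query's bit-field values."""
    if query_bits >> field_width:
        raise ValueError("query_bits does not fit in field_width bits")
    mask = 0
    for _ in range(repeats):
        mask = (mask << field_width) | query_bits
    return mask


def matches(polymer_bits: int, masks: Iterable[int]) -> bool:
    """AND the polymer bits with each mask, OR the results, test for non-zero."""
    result = 0
    for mask in masks:
        result |= polymer_bits & mask   # intermediate result per mask
    return result != 0


# Assumed packing: two 4-bit units, 0xB then 0x4, stored as 0xB4.
polymer = 0xB4
mask = repeated_mask(0b0100, field_width=4, repeats=2)   # 0x44
found = matches(polymer, [mask])                          # 0xB4 & 0x44 == 0x04
```

A non-zero result means the polymer shares at least one set bit with some mask. A stricter test that requires every queried bit to be present would compare `(polymer_bits & mask) == mask` instead; that variant is an alternative and is not described in the text.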
- the invention is directed to a database, tangibly embodied in a computer-readable medium, for storing information descriptive of one or more polymers.
- the database may include one or more data units (e.g., records or table entries) corresponding to the one or more polymers, each of the data units may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer.
- the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a chemical unit of a polymer.
- the data structure may comprise an identifier including one or more fields. Each field may be for storing a value corresponding to one or more properties of the chemical unit. At least one field may store a non-character-based value such as, for example, a binary or decimal value.
- aspects of the invention include the various combinations of one or more of the foregoing aspects of the invention, as well as the combinations of one or more of the various embodiments thereof as found in the following detailed description or as may be derived therefrom. It should be understood that the foregoing aspects of the invention also have corresponding computer-implemented processes which are also aspects of the present invention. It should also be understood that other embodiments of the present invention may be derived by those of ordinary skill in the art from the following detailed description of a particular embodiment of the invention.
- FIG. 1 is a block diagram illustrating an example of a computer system for storing and manipulating polymer information.
- FIG. 2A is a diagram illustrating an example of a record for storing information about a polymer and its constituent chemical units.
- FIG. 2B is a diagram illustrating an example of a record for storing information about a polymer.
- FIG. 2C is a diagram illustrating an example of a record for storing information about constituent chemical units of a polymer.
- FIG. 3 is a flow chart illustrating an example of a method for determining whether properties of a first polymer of chemical units match properties of a second chemical unit.
- FIG. 1 shows an example of a computer system 100 for storing and manipulating polymer information.
- the computer system 100 includes a polymer database 102 which includes a plurality of records 104 a - n storing information corresponding to a plurality of polymers.
- Each of the records 104 a - n may store information about properties of the corresponding polymer, properties of the corresponding polymer's constituent chemical units, or both.
- the polymers for which information is stored in the polymer database 102 may be any kind of polymers.
- the polymers may include polysaccharides, nucleic acids, or polypeptides.
- a “polymer” as used herein is a compound having a linear and/or branched backbone of chemical units which are secured together by linkages. In some but not all cases the backbone of the polymer may be branched.
- the term “backbone” is given its usual meaning in the field of polymer chemistry.
- the polymers may be heterogeneous in backbone composition thereby containing any possible combination of polymer units linked together such as peptide-nucleic acids.
- a polymer is homogeneous in backbone composition and is, for example, a nucleic acid, a polypeptide, a polysaccharide, a carbohydrate, a polyurethane, a polycarbonate, a polyurea, a polyethyleneimine, a polyarylene sulfide, a polysiloxane, a polyimide, a polyacetate, a polyamide, a polyester, or a polythioester.
- a “polysaccharide” is a biopolymer comprised of linked saccharide or sugar units.
- a “nucleic acid” as used herein is a biopolymer comprised of nucleotides, such as deoxyribose nucleic acid (DNA) or ribose nucleic acid (RNA).
- a polypeptide as used herein is a biopolymer comprised of linked amino acids.
- with respect to linked units of a polymer, “linked” or “linkage” means that two entities are bound to one another by any physicochemical means. Any linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced. Such linkages are well known to those of ordinary skill in the art. Natural linkages, which are those ordinarily found in nature connecting the chemical units of a particular polymer, are most common. Natural linkages include, for instance, amide, ester and thioester linkages. The chemical units of a polymer analyzed by the methods of the invention may be linked, however, by synthetic or modified linkages. Polymers in which the units are linked by covalent bonds will be most common, but polymers whose units are linked by other means, such as hydrogen bonds, are also included.
- the polymer is made up of a plurality of chemical units.
- a “chemical unit” as used herein is a building block or monomer which can be linked directly or indirectly to other building blocks or monomers to form a polymer.
- the polymer preferably is a polymer of at least two different linked units. The particular type of unit will depend on the type of polymer.
- DNA is a biopolymer comprised of a deoxyribose phosphate backbone composed of units of purines and pyrimidines such as adenine, cytosine, guanine, thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine, and other naturally and non-naturally occurring nucleobases, substituted and unsubstituted aromatic moieties.
- RNA is a biopolymer comprised of a ribose phosphate backbone composed of units of purines and pyrimidines such as those described for DNA but wherein uracil is substituted for thymine.
- DNA units may be linked to the other units of the polymer by their 5′ or 3′ hydroxyl group thereby forming an ester linkage.
- RNA units may be linked to the other units of the polymer by their 5′, 3′ or 2′ hydroxyl group thereby forming an ester linkage.
- DNA or RNA units having a terminal 5′, 3′ or 2′ amino group may be linked to the other units of the polymer by the amino group thereby forming an amide linkage.
- whenever a nucleic acid is represented by a sequence of letters, it will be understood that the nucleotides are in 5′→3′ order from left to right and that “A” denotes adenosine, “C” denotes cytidine, “G” denotes guanosine, “T” denotes thymidine, and “U” denotes uracil unless otherwise noted.
- the chemical units of a polypeptide are amino acids, including the 20 naturally occurring amino acids as well as modified amino acids.
- Amino acids may exist as amides or free acids and are linked to the other units in the backbone of the polymers through their α-amino group thereby forming an amide linkage to the polymer.
- a polysaccharide is a polymer composed of monosaccharides linked to one another.
- the basic building block of the polysaccharide is actually a disaccharide unit which can be repeating or non-repeating.
- a unit when used with respect to a polysaccharide refers to a basic building block of a polysaccharide and can include a monomeric building block (monosaccharide) or a dimeric building block (disaccharide).
- a “plurality of chemical units” is at least two units linked to one another.
- the polymers may be native or naturally-occurring polymers which occur in nature or non-naturally occurring polymers which do not exist in nature.
- the polymers typically include at least a portion of a naturally occurring polymer.
- the polymers can be isolated or synthesized de novo.
- the polymers can be isolated from natural sources, e.g., purified, as by cleavage and gel separation, or may be synthesized, e.g., (i) amplified in vitro by, for example, polymerase chain reaction (PCR); (ii) synthesized by, for example, chemical synthesis; or (iii) recombinantly produced by cloning.
- FIG. 2A illustrates an example of the format of a data unit 200 in the polymer database 102 (i.e., one of the data units 104 a - n ).
- the data unit 200 may include a polymer identifier (ID) 202 that identifies the polymer corresponding to the data unit 200 .
- the polymer ID 202 is described in more detail below with respect to FIG. 2B .
- the data unit 200 also may include one or more chemical unit identifiers (IDs) 204 a - n corresponding to chemical units that are constituents of the polymer corresponding to the data unit 200 .
- the chemical unit IDs 204 a - n are described in more detail below with respect to FIG. 2C .
- the format of the data unit 200 shown in FIG. 2A is merely an example of a format that may be used to represent polymers in the polymer database 102 .
- Polymers may be represented in the polymer database in other ways.
- the data unit 200 may include only the polymer ID 202 or may only include one or more of the chemical unit IDs 204 a - n.
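- the layout of FIG. 2A can be pictured with a minimal sketch such as the following. The class name `DataUnit`, its attribute names, and the use of plain integers for the identifiers are illustrative assumptions, not part of the described format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataUnit:
    """Hypothetical in-memory form of a data unit 200 (FIG. 2A)."""
    # Polymer ID 202: assumed here to be an integer holding packed property fields.
    polymer_id: int = 0
    # Chemical unit IDs 204a-n: one assumed integer code per constituent unit.
    chemical_unit_ids: List[int] = field(default_factory=list)


# Either part may be absent, mirroring the statement that a data unit may
# hold only the polymer ID or only chemical unit IDs.
unit = DataUnit(polymer_id=0x1, chemical_unit_ids=[0b01001, 0b01100])
ids_only = DataUnit(chemical_unit_ids=[0b01001])
```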
- FIG. 2B illustrates an example of the polymer ID 202 .
- the polymer ID 202 may include one or more fields 202 a - n for storing information about properties of the polymer corresponding to the data unit 200 ( FIG. 2A ).
- FIG. 2C illustrates an example of the chemical unit ID 204 a .
- the chemical unit ID 204 a may include one or more fields 206 a - m for storing information about properties of the chemical unit corresponding to the chemical unit ID 204 a .
- the fields 206 a - m of the chemical unit ID 204 a may store any kind of value that is capable of being stored in a computer readable medium, such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value.
- Each field 206 a - m may store information about any property of the corresponding chemical unit.
- a “property” as used herein is a characteristic (e.g., structural characteristic) of the polymer that provides information (e.g., structural information) about the polymer.
- a property provides information other than the identity of a unit of the polymer or the polymer itself.
- a compilation of several properties of a polymer may provide sufficient information to identify a chemical unit or even the entire polymer but the property of the polymer itself does not encompass the chemical basis of the chemical unit or polymer.
- When the term property is used with respect to polysaccharides, to define a polysaccharide property, it has the same meaning as described above except that due to the complexity of the polysaccharide, a property may identify a type of monomeric building block of the polysaccharide.
- Chemical units of polysaccharides are much more complex than chemical units of other polymers, such as nucleic acids and polypeptides.
- the polysaccharide unit has more variables in addition to its basic chemical structure than other chemical units.
- the polysaccharide may be acetylated or sulfated at several sites on the chemical unit, or it may be charged or uncharged.
- one property of a polysaccharide may be the identity of one or more basic building blocks of the polysaccharides.
- a basic building block alone may not provide information about the charge and the nature of substituents of the saccharide or disaccharide.
- a building block of uronic acid may be iduronic or glucuronic acid.
- Each of these building blocks may have additional substituents that add complexity to the structure of the chemical unit.
- a single property may not identify such additional substituents, charges, etc., in addition to identifying a complete building block of a polysaccharide.
- This information may be assembled from several properties.
- a property of a polymer as used herein does not encompass an amino acid or nucleotide but does encompass a saccharide or disaccharide building block of a polysaccharide.
- a type of property that provides information about a polymer may depend on a type of polymer being analyzed. For instance, if the polymer is a polysaccharide, properties such as charge, molecular weight, nature and degree of sulfation or acetylation, and type of saccharide may provide information about the polymer.
- Properties may include, but are not limited to, charge, chirality, nature of substituents, quantity of substituents, molecular weight, molecular length, compositional ratios of substituents or units, type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation (e.g., position of helices), spatial distribution of substituents, ratio of one set of modifications to another set of modifications (e.g., relative amounts of 2-O sulfation to N-sulfation or ratio of iduronic acid to glucuronic acid), and binding sites for proteins.
- a substituent, as used herein is an atom or group of atoms that substitute a unit, but are not themselves the units.
- a property of a polymer may be identified by any means known in the art.
- the procedure used to identify a property may depend on a type of property.
- Molecular weight for instance, may be determined by several methods including mass spectrometry.
- mass spectrometry for determining the molecular weight of polymers is well known in the art.
- Mass spectrometry has been used as a powerful tool to characterize polymers because of its accuracy (±1 Dalton) in reporting the masses of fragments generated (e.g., by enzymatic cleavage), and also because only pM sample concentrations are required.
- MALDI-MS matrix-assisted laser desorption ionization mass spectrometry
- other types of mass spectrometry known in the art, such as electrospray-MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD), can also be used to identify the molecular weight of the polymer or polymer fragments.
- the mass spectrometry data may be a valuable tool to ascertain information about the polymer fragment sizes after the polymer has undergone degradation with enzymes or chemicals. After a molecular weight of a polymer is identified, it may be compared to molecular weights of other known polymers. Because masses obtained from the mass spectrometry data are accurate to one Dalton (1 D), a size of one or more polymer fragments obtained by enzymatic digestion may be precisely determined, and a number of substituents (i.e., sulfates and acetate groups present) may be determined.
- a “mass line” as used herein is an information database, preferably in the form of a graph or chart which stores information for each possible type of polymer having a unique sequence based on the molecular weight of the polymer.
- a mass line may describe a number of polymers having a particular molecular weight.
- a two-unit nucleic acid molecule i.e., a nucleic acid having two chemical units
- a two-unit polysaccharide i.e., disaccharide
- a mass line may be generated by uniquely assigning a particular mass to a particular length of a given fragment (all possible di-, tetra-, hexa-, and octasaccharides, up to a hexadecasaccharide), and tabulating the results (an example is shown in FIG. 4 ).
- mass spectrometry data indicates the mass of a fragment to 1 D accuracy
- a length may be assigned uniquely to a fragment by looking up a mass on the mass line. Further, it may be determined from the mass line that, within a fragment of a particular length longer than a disaccharide, there is a minimum difference of 4.02 D in masses, indicating that two acetate groups (84.08 D) replaced a sulfate group (80.06 D). Therefore, the number of sulfates and acetates of a polymer fragment may be determined from the mass obtained from the mass spectrometry data, and such number may be assigned to the polymer fragment.
- compositional ratios of substituents or chemical units may be determined using methodology known in the art, such as capillary electrophoresis.
- a polymer may be subjected to an experimental constraint such as enzymatic or chemical degradation to separate each of the chemical units of the polymers. These units then may be separated using capillary electrophoresis to determine the quantity and type of substituents or chemical units present in the polymer. Additionally, a number of substituents or chemical units can be determined using calculations based on the molecular weight of the polymer.
- reaction samples may be analyzed by small-diameter, gel-filled capillaries.
- the small diameter of the capillaries (50 μm) allows for efficient dissipation of heat generated during electrophoresis.
- high field strengths (400 V/m) can be used without excessive Joule heating, lowering the separation time to about 20 minutes per reaction run and thereby increasing resolution over conventional gel electrophoresis.
- many capillaries may be analyzed in parallel, allowing amplification of generated polymer information.
- compositional analysis also may be used to determine a presence and composition of an impurity as well as a main property of the polymer. Such determinations may be accomplished if the impurity does not contain an identical composition as the polymer.
- Determining whether an impurity is present may involve accurately integrating the area under each peak that appears in the electrophoretogram and normalizing the peaks to the smallest of the major peaks. The sum of the normalized peaks should be equal to one or close to being equal to one. If it is not, then one or more impurities are present. Impurities may even be detected in unknown samples if at least one of the disaccharide units of the impurity differs from every disaccharide unit of the unknown.
- one or more aspects of a composition of the components may be determined using capillary electrophoresis. Because all known disaccharide units may be baseline-separated by the capillary electrophoresis method described above and because migration times typically are determined using electrophoresis (i.e., as opposed to electroosmotic flow) and are reproducible, reliable assignment to a polymer fragment of the various saccharide units may be achieved. Consequently, both a composition of the major peak and a composition of a minor contaminant may be assigned to a polymer fragment. The composition for both the major and minor components of a solution may be assigned as described below.
- compositions involve determining the composition of the major AT-III binding HLGAG decasaccharide (+DDD4-7) and its minor contaminant (+D5D4-7) present in solution in a 9:1 ratio.
- Complete digestion of this 9:1 mixture with heparinases yields four peaks: three representative of the major decasaccharide (viz., D, 4, and −7), which are also present in the contaminant, and one peak, 5, that is present only in the contaminant.
- the area of each peak for D, 4, and −7 represents an additive combination of the contribution from the major decasaccharide and the contribution from the contaminant, whereas the peak for 5 represents only the contaminant.
- the area under the 5 peak may be used as a starting point. This area represents the area under the peak for one disaccharide unit of the contaminant. Subtracting this area from the total areas of 4 and −7 and subtracting twice this area from the area under D yields a 1:1:3 ratio of 4:−7:D. Such a ratio confirms the composition of the major component and indicates that the composition of the impurity is two Ds, one 4, one −7 and one 5.
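- the subtraction described above can be checked with a short worked sketch. It assumes an idealized detector in which every disaccharide unit contributes the same area per mole, and it reads the unit counts directly from the codes DDD4-7 and D5D4-7; the variable and function names are illustrative only.

```python
from fractions import Fraction

# Disaccharide counts per molecule, read from the codes in the text.
major = {"D": 3, "4": 1, "-7": 1}                 # major decasaccharide DDD4-7
contaminant = {"D": 2, "5": 1, "4": 1, "-7": 1}   # minor contaminant D5D4-7


def peak_areas(ratio_major, ratio_minor):
    """Idealized peak areas for a mixture, assuming equal response per unit."""
    areas = {}
    for composition, weight in ((major, ratio_major), (contaminant, ratio_minor)):
        for unit, count in composition.items():
            areas[unit] = areas.get(unit, 0) + weight * count
    return areas


areas = peak_areas(9, 1)      # the 9:1 mixture from the text
one_unit = areas["5"]         # area of one disaccharide unit of the contaminant

# Remove the contaminant's contribution: once from 4 and -7, twice from D.
corrected = {
    "4": areas["4"] - one_unit,
    "-7": areas["-7"] - one_unit,
    "D": areas["D"] - 2 * one_unit,
}
smallest = min(corrected.values())
ratio = {unit: Fraction(area, smallest) for unit, area in corrected.items()}
```

- under these assumptions the corrected areas stand in the ratio 1:1:3 for 4:−7:D, matching the composition of the major component stated in the text.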
- hydrophobicity may be determined using reverse-phase high-pressure liquid chromatography (RP-HPLC).
- Enzymatic sensitivity may be identified by exposing the polymer to an enzyme and determining a number of fragments present after such exposure. The chirality may be determined using circular dichroism. Protein binding sites may be determined by mass spectrometry, isothermal calorimetry and NMR.
- Enzymatic modification (not degradation) may be determined in a similar manner as enzymatic degradation, i.e., by exposing a substrate to the enzyme and using MALDI-MS to determine if the substrate is modified.
- a sulfotransferase may transfer a sulfate group to an HS chain, with a concomitant mass increase of 80 Da.
- Conformation may be determined by modeling and nuclear magnetic resonance (NMR).
- the relative amounts of sulfation may be determined by compositional analysis or approximately determined by Raman spectroscopy.
- FIG. 2D illustrates an example of the chemical unit ID 204 a .
- the chemical unit ID 204 a contains one or more fields 212 a - e for storing information about properties of a heparin-like glycosaminoglycan (HLGAG).
- HLGAGs are complex polysaccharide molecules made up of disaccharide repeat units comprising hexoseamine and glucuronic/iduronic acid that are linked by α/β 1-4 glycosidic linkages.
- HLGAG disaccharide unit
- X may be sulfated (—SO 3 H) or unsulfated (—H)
- Y may be sulfated (—SO 3 H) or acetylated (—COCH 3 ) or, in rare cases, neither sulfated nor acetylated.
- the fields 212 a - e may store any kinds of values, such as, for example single-bit values, single-digit hexadecimal values, or decimal values.
- the chemical unit ID 204 a includes each of the following fields: (1) a field 212 a for storing a value indicating whether the polymer contains an iduronic or a glucuronic acid (I/G); (2) a field 212 b for storing a value indicating whether the 2X position of the iduronic or glucuronic acid is sulfated or unsulfated; (3) a field 212 c for storing a value indicating whether the 6X position of the hexoseamine is sulfated or unsulfated; (4) a field 212 d indicating whether the 3X position of the hexoseamine is sulfated or unsulfated; and (5) a field 212 e indicating whether the NX position of the hexoseamine is sulfated or acetylated.
- Table 2 illustrates an example of a data structure having a plurality of entries, where each entry represents an HLGAG encoded in accordance with FIG. 2D .
- Bit values for each of the fields 212 a - e may be assigned in any known manner. For example, with respect to field 212 a (I/G), a value of one may indicate Iduronic and a value of zero may indicate Glucuronic, or vice versa.
- Representing a HLGAG using a bit field may have a number of advantages. Because a property of an HLGAG may have one of two possible states, a binary bit is ideally-suited for storing information representing an HLGAG property. Bit fields may be used to store such information in a computer readable medium (e.g., a computer memory or storage device), for example, by packing multiple bits (representing multiple fields) into a single byte or sequence of bytes. Furthermore, bit fields may be stored and manipulated quickly and efficiently by digital computer processors, which typically store information using bits and which typically can quickly perform operations (e.g., shift, AND, OR) on bits. For example, as described in more detail below, a plurality of properties each stored as a bit field can be searched more quickly than searches conducted using typical character-based searching methods.
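- as a minimal sketch of such a bit field, the following packs the five properties into one integer and unpacks them again. The field order (I/G, 2X, 6X, 3X, NX, most significant bit first) and the convention that zero denotes iduronic acid are assumptions drawn from the surrounding description; the function names are hypothetical.

```python
# Assumed field order, most significant bit first.
FIELDS = ("I/G", "2X", "6X", "3X", "NX")


def encode_unit(props):
    """Pack a dict of 0/1 property values into a 5-bit integer."""
    code = 0
    for name in FIELDS:
        code = (code << 1) | (props.get(name, 0) & 1)
    return code


def decode_unit(code):
    """Unpack a 5-bit integer into a dict of 0/1 property values."""
    n = len(FIELDS)
    return {name: (code >> (n - 1 - i)) & 1 for i, name in enumerate(FIELDS)}


# Iduronic acid (I/G = 0), 2-O sulfated, N-sulfated.
code = encode_unit({"2X": 1, "NX": 1})
```

- adding a further property, such as 4-O sulfation, would amount to appending a name to `FIELDS`, which widens every code by one bit.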
- using bit fields to represent properties of HLGAGs permits a user to more easily incorporate additional properties (e.g., 4-O sulfation vs. unsulfation) into a chemical unit ID 204 a by adding extra bits to represent the additional properties.
- the four fields 212 b - e may be represented as a single hexadecimal (base 16) number where each of the fields 212 b - e represents one bit of the hexadecimal number.
- the five fields 212 a - e of the record 210 may be represented as a signed hexadecimal digit, in which the fields 212 b - 212 e collectively encode a single-digit hexadecimal number as described above and the I/G field is used as a sign bit.
- the hexadecimal numbers 0-F may be used to code chemical units containing iduronic acid and the hexadecimal numbers −0 to −F may be used to code units containing glucuronic acid.
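- the signed hexadecimal notation can be sketched as below, assuming a 5-bit code whose top bit is the I/G field (zero for iduronic acid, one for glucuronic acid) and whose lower four bits are the 2X, 6X, 3X and NX fields. The function names are hypothetical. The codes are kept as strings because the notation distinguishes −0 from 0, which an ordinary signed integer cannot do.

```python
SIGN_BIT = 0b10000   # assumed position of the I/G field
DIGIT_MASK = 0b01111  # assumed positions of 2X, 6X, 3X, NX


def to_signed_hex(code):
    """Format a 5-bit unit code as an optional minus sign plus one hex digit."""
    sign = "-" if code & SIGN_BIT else ""
    return f"{sign}{code & DIGIT_MASK:X}"


def from_signed_hex(text):
    """Parse a signed hex digit such as '9', '-D' or '-0' back to a 5-bit code."""
    negative = text.startswith("-")
    digit = int(text.lstrip("+-"), 16)
    return (SIGN_BIT if negative else 0) | digit


examples = [to_signed_hex(c) for c in (0b01001, 0b11101, 0b10000)]
```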
- the chemical unit ID 204 a may, however, be encoded using other forms of representations, such as by using a twos-complement representation.
- the fields 212 a - e of the chemical unit ID 204 a may be arranged in any order.
- a gray code system may be used to code HLGAGs.
- each successive value differs from the previous value only in a single bit position.
- the values representing HLGAGs may be arranged so that any two neighboring values differ in the value of only one property.
- An example of a gray code system used to code HLGAGs is shown in Table 3.
- Table 3 illustrates that use of a gray coding scheme arranges the disaccharide building blocks such that neighboring table entries differ from each other only in the value of a single property.
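- the single-bit-difference property can be illustrated with the standard reflected binary Gray code. This is only a sketch of the general technique; the actual ordering used in Table 3 is not reproduced here and may differ.

```python
def gray(n):
    """Return the n-th value of the reflected binary Gray code."""
    return n ^ (n >> 1)


# Sixteen 4-bit codes, e.g. for the 2X, 6X, 3X and NX fields of one unit.
codes = [gray(i) for i in range(16)]

# Count the differing bits between each pair of neighboring codes.
differences = [bin(a ^ b).count("1") for a, b in zip(codes, codes[1:])]
```

- every entry of `differences` is one, so neighboring codes differ in exactly one property.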
- One advantage of using gray codes to encode HLGAGs is that a biosynthesis of HLGAG fragments may follow a specific sequence of modifications starting from the basic building block G-H NAc .
- bit weights of 8, 4, 2, and 1 are used to calculate the numerical equivalent of a hexadecimal number with the most significant bit (I/G) being used as a sign bit.
- weights of each of the fields 212 a - e may be changed thereby implementing an alternative weighting system.
- bit fields 212 a - e may have weights of 16, 8, 4, −2, and −1, respectively, as shown in Table 4.
- Modifying the weights of the bits may be used to score the disaccharide units. For example, a database of sequences may be created and the different disaccharide units may be scored based on their relative abundance in the sequences present in the database. Some units, for example, I—H NAc.3S 6S , which rarely occur in naturally-occurring HLGAGs, may receive a low score based on a scheme in which the bits are weighted in the manner shown in Table 4.
- the sulfation and acetylation positions may be arranged as shown in Table 2: I/G, 2X, 6X, 3X, NX. These positions may, however, be arranged differently, resulting in the same set of codes representing different disaccharide units.
- Table 5 shows an arrangement in which the positions are arranged as I/G, 2X, NX, 3X, 6X.
- disaccharide units in some HLGAG sequences are neither N-sulfated nor N-acetylated.
- Such disaccharide units may be represented using the chemical unit ID 204 a in any of a number of ways.
- disaccharide units that contain a free amine in the N position may be represented by, for example, adding an additional bit field.
- an additional field NY may be used in the chemical unit ID 204 a .
- an NY field having a value of zero may correspond to a free amine
- an NY field having a value of one may correspond to N-acetylation, or vice versa.
- a value of one in the NX field 212 e may correspond to N-sulfation.
- disaccharide units that contain a free amine in the N position may be represented using a tristate field.
- the field 212 e (NX) in the chemical unit ID 204 a may be a tristate field having three permissible values.
- a value of zero may correspond to a free amine
- a value of one may correspond to N-acetylation
- a value of two could correspond to N-sulfation.
- the values of any of the fields 212 a - e may be represented using a number system with a base higher than two. For example, if the value of the field 212 e (NX) is represented by a single-digit number having a base of three, then the field 212 e may store three permissible values.
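- one way to picture a tristate NX field is a mixed-radix code in which four binary fields are combined with one base-three digit. The sketch below assumes the value assignment given above (zero for a free amine, one for N-acetylation, two for N-sulfation); the function names and the particular packing are illustrative assumptions.

```python
NX_STATES = {0: "free amine", 1: "N-acetylated", 2: "N-sulfated"}


def encode_tristate(ig, s2, s6, s3, nx):
    """Combine four binary fields with a base-3 NX digit into one integer."""
    if nx not in NX_STATES:
        raise ValueError("NX must be 0, 1 or 2")
    binary_part = (ig << 3) | (s2 << 2) | (s6 << 1) | s3
    return binary_part * 3 + nx


def decode_tristate(code):
    """Recover (I/G, 2X, 6X, 3X, NX) from a mixed-radix code."""
    binary_part, nx = divmod(code, 3)
    return (
        (binary_part >> 3) & 1,
        (binary_part >> 2) & 1,
        (binary_part >> 1) & 1,
        binary_part & 1,
        nx,
    )


# Iduronic acid, 2-O sulfated, free amine at the N position.
free_amine_unit = encode_tristate(0, 1, 0, 0, 0)
```

- a mixed-radix code of this kind no longer maps each property to its own bit, so mask-based bitwise tests would not apply to the NX digit directly; the additional-bit approach described earlier keeps that ability.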
- a user may perform a query on the polymer database 102 to search for particular information.
- a user may search the polymer database 102 for specified polymers, specified chemical units, or polymers or chemical units having specified properties.
- a user may provide to a query user interface 108 user input 106 indicating properties for which to search.
- the user input 106 may, for example, indicate one or more chemical units, a polymer of chemical units or one or more properties to search for using, for example, a standard character-based notation.
- the query user interface 108 may, for example, provide a graphical user interface (GUI) which allows the user to select from a list of properties using an input device such as a keyboard or a mouse.
- the query user interface 108 may generate a search query 110 based on the user input 106 .
- a search engine 112 may receive the search query 110 and generate a mask 114 based on the search query.
- Example formats of the mask 114 and example techniques to determine whether properties specified by the mask 114 match properties of polymers in the polymer database 102 are described in more detail below in connection to FIG. 3 .
- the search engine 112 may determine whether properties specified by the mask 114 match properties of polymers stored in the polymer database 102 . Subsequently, the search engine 112 may generate search results 116 based on the search indicating whether the polymer database 102 includes polymers having the properties specified by the mask 114 .
- the search results 116 also may indicate polymers in the polymer database 102 that have the properties specified by the mask 114 . For example, if the user input 106 specified properties of a chemical unit, the search results 116 may indicate which polymers in the polymer database 102 include the specified chemical unit. Alternatively, if the user input 106 specified particular chemical unit properties, the search results 116 may indicate polymers in the polymer database 102 that include chemical units having the specified chemical unit properties. Similarly, if the user input 106 specified particular polymer properties, the search results 116 may indicate which polymers in the polymer database 102 have the specified polymer properties.
- FIG. 3 is a flowchart illustrating an example of a process 300 that may be used by the search engine 112 to generate the search results 116 .
- the search engine 112 may receive a search query 110 from the query user interface 108 .
- the search engine 112 may generate a mask 114 based on the search query 110 .
- the search engine 112 may perform a binary operation on one or more of the records 104 a - n in the polymer database 102 by applying the mask 114 .
- the search engine 112 may generate the search results 116 based on the results of the binary operation performed in step 306 .
- the received search query 110 may indicate to search the polymer database 102 for a particular chemical unit, e.g. the chemical unit I 2S —H NS . If, for example, the coding scheme shown in Table 1 is used to encode chemical units in the polymer database, the chemical unit I 2S —H NS may be represented by a binary value of 01001.
- the search engine 112 may use the binary value of the chemical unit, i.e., 01001, as the value of the mask 114 .
- the values of the bits of the mask 114 may specify the properties of the chemical unit I 2S —H NS .
- the value of zero in the leftmost bit position may indicate Iduronic, and the value of one in the next bit position may indicate that the 2X position is sulfated.
- the search engine 112 may use this mask 114 to determine whether polymers in the polymer database 102 contain the chemical unit I 2S —H NS . To make this determination, the search engine 112 may perform a binary operation on the data units 104 a - n of the polymer database 102 using the mask 114 (step 306 ). For example, the search engine 112 may perform a logical AND operation on each chemical unit of each of the polymers in the polymer database 102 using the mask 114 . If the result of the logical AND operation on a particular chemical unit is equal to the value of the mask 114 , then the chemical unit may satisfy the search query 110 , and, in act 308 , the search engine 112 may indicate a successful match in the search results 116 . The search engine 112 may generate additional information in the search results 116 , such as the polymer identifier of the polymer containing the matching chemical unit.
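- the AND-and-compare test described above can be sketched as follows. The database contents and polymer names are hypothetical, and the 5-bit codes assume the field order I/G, 2X, 6X, 3X, NX with zero denoting iduronic acid.

```python
def matches(unit_code, mask):
    """True when ANDing the unit code with the mask reproduces the mask."""
    return (unit_code & mask) == mask


# Hypothetical database: polymer identifier -> list of chemical unit codes.
database = {
    "polymer-A": [0b01001, 0b00000],
    "polymer-B": [0b10001],
}

mask = 0b01001  # the code for I 2S -H NS under the assumed scheme

# Collect (polymer identifier, unit position) for every matching unit.
hits = [
    (polymer_id, position)
    for polymer_id, units in database.items()
    for position, unit in enumerate(units)
    if matches(unit, mask)
]
```

- note that this test accepts any unit whose set bits include those of the mask, so a code such as 01101 would also match the mask 01001.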
- In response to receiving the search query in act 302 , in act 304 the search engine 112 also may generate a mask 114 that indicates one or more properties of a particular polymer or chemical unit. To generate the mask 114 for such a search query, the search engine 112 may set each bit position in the mask that corresponds to a property specified by the search query to the value specified by the search query.
- consider, for example, a search query 110 that indicates a search for all chemical units in which both the 2X position and the 6X position are sulfated.
- the search engine 112 may set the bit positions of the mask corresponding to the 2X and 6X positions to a value corresponding to being sulfated.
- the mask corresponding to this search query is 01100.
- the two bits of this mask that have a value of one correspond to the bit positions in Table 1 corresponding to the 2X and 6X positions.
- the search engine 112 may perform a logical AND operation on the chemical unit identifier of the chemical unit in the polymer database 102 using the mask 114 .
- the search engine 112 may compare the result of the logical AND operation to the mask 114 . If the values of the bit positions of the logical AND operation corresponding to the properties specified by the search query are equal to the values of the same bit positions of the mask 114 , then the chemical unit has the properties specified by the search query 110 , and the search engine 112 indicates a successful match in the search results 116 .
- the search engine 112 compares bit positions 3 and 2 of the result of the logical AND operation to bit positions 3 and 2 of the mask. If the values in both bit positions are equal, then the chemical unit has the properties specified by the mask 114 .
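- building a mask from named properties and applying it can be sketched as below. The bit numbering (position 4 for I/G down to position 0 for NX) is an assumption consistent with the bit positions 3 and 2 mentioned above; the names `BIT` and `build_mask` and the sample codes are hypothetical.

```python
# Assumed bit position of each property, counted from the right starting at 0.
BIT = {"I/G": 4, "2X": 3, "6X": 2, "3X": 1, "NX": 0}


def build_mask(properties):
    """Set the bit of every property that the query requires to be one."""
    mask = 0
    for name in properties:
        mask |= 1 << BIT[name]
    return mask


mask = build_mask(["2X", "6X"])  # both positions sulfated

# Hypothetical unit codes to be screened.
units = [0b01100, 0b01101, 0b01001, 0b11110]
matched = [u for u in units if (u & mask) == mask]
```

- here the third code is rejected because its 6X bit is zero, while the other three are accepted regardless of their remaining bits.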
- the techniques described above for generating the mask 114 and searching with a mask 114 also may be used to perform searches with respect to sequences of chemical units or entire polymers. For example, if the search query 110 indicates a sequence of chemical units, the search engine 112 may fill the mask 114 with a sequence of bits corresponding to the concatenation of the binary encodings of the specified sequence of chemical units. The search engine 112 may then perform a binary AND operation on the polymer identifiers in the polymer database 102 using the mask 114 , and generate the search results 116 as described above.
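- a sequence search along these lines might be sketched as follows. Concatenating the 5-bit codes follows the description; sliding the mask along the polymer in unit-sized steps to locate a sub-sequence is an added assumption, as are the function names and sample data.

```python
UNIT_BITS = 5  # assumed width of one chemical unit code


def pack(codes):
    """Concatenate unit codes into one integer, first unit in the highest bits."""
    packed = 0
    for code in codes:
        packed = (packed << UNIT_BITS) | code
    return packed


def find_subsequence(polymer_codes, query_codes):
    """Return start positions where the query mask matches the polymer."""
    n, m = len(polymer_codes), len(query_codes)
    if m == 0 or m > n:
        return []
    polymer = pack(polymer_codes)
    mask = pack(query_codes)
    window = (1 << (UNIT_BITS * m)) - 1
    hits = []
    for start in range(n - m + 1):
        # Align the m-unit window that begins at this unit position.
        segment = (polymer >> (UNIT_BITS * (n - m - start))) & window
        if (segment & mask) == mask:
            hits.append(start)
    return hits


positions = find_subsequence([0b01001, 0b01100, 0b00001], [0b01100, 0b00001])
```

- each window comparison is a single AND followed by an equality test, whatever the length of the query.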
- the techniques described above for generating the mask 114 and searching with the mask 114 are provided merely as an example. Other techniques for generating and searching with the mask 114 may also be used.
- the search engine 112 also may use more than one mask for each search query 110 , and the search engine 112 may perform multiple binary operations in parallel in order to improve computational efficiency.
- binary operations other than a logical AND may be used to determine whether properties of the polymers in the polymer database 102 match the properties specified by the mask 114 .
- Other binary operations include, for example, logical OR and logical XOR (exclusive or). Such binary operations may be used alone or in combination with each other.
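- as a hedged illustration of how these operations might be combined, the sketch below uses OR to merge two single-property masks, AND to test that the required properties are present, and XOR to test for an exact match. The function names and the 5-bit layout (I/G, 2X, 6X, 3X, NX) are assumptions.

```python
def has_properties(unit, mask):
    """AND test: every bit set in the mask is also set in the unit."""
    return (unit & mask) == mask


def is_exact(unit, mask):
    """XOR test: the unit and the mask agree in every bit position."""
    return (unit ^ mask) == 0


MASK_2X = 0b01000
MASK_NX = 0b00001
combined = MASK_2X | MASK_NX  # OR merges the two single-property masks

candidate = 0b01101  # 2X, 6X and NX bits set
```

- for this candidate the AND test succeeds, since both required bits are set, while the XOR test fails because the 6X bit is also set.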
- the polymer database 102 may be searched quickly for particular chemical units.
- One advantage of the process 300 if used in conjunction with a chemical unit coding scheme that encodes properties of chemical units using binary values is that a chemical unit identifier (e.g., the chemical unit identifier 204 a ) may be compared to a search query (in the form of a mask) using a single binary operation (e.g., a binary AND operation).
- the speed of the techniques described above for searching using binary operations may be constant in relation to the length of a sub-sequence that is the basis for the search query.
- because the search engine 112 can search for a query sequence of chemical units using a single binary operation (e.g., a logical AND operation) regardless of the length of the query sequence, searches may be performed more quickly than with conventional character-based methods whose speed is related to the length of the query sequence.
- the binary operations used by the search engine 112 may be performed more quickly because conventional computer processors are designed to perform binary operations on binary data.
- a further advantage of the techniques described above for searching using binary operations is that encoding one or more properties of a polymer into the notational representation of the polymer enables the search engine 112 to quickly and directly search the polymer database 102 for particular properties of polymers. Because the properties of a polymer are encoded into the polymer's notational representation, the search engine 112 may determine whether the polymer has a specified property by determining whether the specified property is encoded in the polymer's notational representation. For example, as described above, the search engine 112 may determine whether the polymer has the specified property by performing a logical AND operation on the polymer's notational representation using the mask 114 . This operation may be performed quickly by conventional computer processors and may be performed using only the polymer's notational representation and the mask, without reference to additional information about the properties of the polymer.
- a complete building block of a polymer may be assigned a unique numeric identifier, which may be used to classify the complete building block.
- each numeric identifier may represent a complete building block of a polysaccharide, including the exact chemical structure as defined by the basic building block of a polysaccharide and all of its substituents, charges etc.
- a basic building block refers to a basic ring structure such as iduronic acid or glucuronic acid but does not include substituents, charges etc.
- building block information may be generated and processed in a same or similar manner as described above with respect to “properties” of polymers.
- a computer system that may implement the system 100 of FIG. 1 as a computer program typically may include a main unit connected to both an output device which displays information to a user and an input device which receives input from a user.
- the main unit generally includes a processor connected to a memory system via an interconnection mechanism.
- the input device and output device also may be connected to the processor and memory system via the interconnection mechanism.
- Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output.
- One or more input devices also may be connected to the computer system.
- Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, communication device, and data input devices such as sensors. The subject matter disclosed herein is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
- the computer system may be a general purpose computer system which is programmable using a computer programming language, such as C++, Java, or other language, such as a scripting language or assembly language.
- the computer system also may include specially-programmed, special purpose hardware such as, for example, an Application-Specific Integrated Circuit (ASIC).
- the processor typically is a commercially-available processor, of which the series x86, Celeron, and Pentium processors, available from Intel, and similar devices from AMD and Cyrix, the 680X0 series microprocessors available from Motorola, the PowerPC microprocessor from IBM and the Alpha-series processors from Digital Equipment Corporation, are examples. Many other processors are available.
- Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services.
- The processor and operating system define a computer platform for which application programs in high-level programming languages may be written.
- A memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory and tape are examples.
- The disk may be removable, such as a “floppy disk,” or permanent, known as a hard drive.
- A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program.
- The processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM).
- The integrated circuit memory element typically allows for faster access to the information by the processor than does the disk.
- The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed.
- A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the subject matter disclosed herein is not limited to such mechanisms. Further, the subject matter disclosed herein is not limited to a particular memory system.
- The subject matter disclosed herein is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g., 110, 120) in FIG. 1 may be separate modules of a computer program, or may be separate computer programs. Such modules may be operable on separate computers. Data (e.g., 104, 106, 110, 114 and 116) may be stored in a memory system or transmitted between computer systems. The subject matter disclosed herein is not limited to any particular implementation using software or hardware or firmware, or any combination thereof.
- The various elements of the system may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor.
- Various steps of the process may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output.
- Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.
Abstract
Description
- This application is a divisional of U.S. patent application Ser. No. 09/557,997, filed Apr. 24, 2000, currently pending, which claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application Nos. 60/130,747, filed Apr. 23, 1999; 60/130,792, filed Apr. 23, 1999; 60/159,939, filed Oct. 14, 1999; and 60/159,940, filed Oct. 14, 1999. The contents of each of these applications are incorporated herein by reference in their entirety.
- Various notational systems have been used to encode classes of chemical units. In such systems, a unique code is assigned to each chemical unit in the class. For example, in a conventional notational system for encoding amino acids, a single letter of the alphabet is assigned to each known amino acid. A polymer of chemical units can be represented, using such a notational system, as a set of codes corresponding to the chemical units. Such notational systems have been used to encode polymers, such as proteins, in a computer-readable format. A polymer that has been represented in a computer-readable format according to such a notational system can be processed by a computer.
- Conventional notational schemes for representing chemical units have represented the chemical units as characters (e.g., A, T, G, and C for nucleic acids), and have represented polymers of chemical units as sequences or sets of characters. Various operations may be performed on such a notational representation of a chemical unit or a polymer comprised of chemical units. For example, a user may search a database of chemical units for a query sequence of chemical units. The user typically provides a character-based notational representation of the sequence in the form of a sequence of characters, which is compared against the character-based notational representations of sequences of chemical units stored in the database. Character-based searching algorithms, however, are typically slow because such algorithms search by comparing individual characters in the query sequence against individual characters in the sequences of chemical units stored in the database. The speed of such algorithms is therefore related to the length of the query sequence, resulting in particularly poor performance for long query sequences.
- In one aspect, the invention is directed to a notational system for representing polymers of chemical units. The notational system is referred to as Property encoded nomenclature (PEN). According to one embodiment of the notational system, a polymer is assigned an identifier that includes information about properties of the polymer. For example, in one embodiment, properties of a disaccharide are each assigned a binary value, and an identifier for the disaccharide includes the binary values assigned to the properties of the disaccharide. In one embodiment, the identifier is capable of being expressed as a number, such as a single hexadecimal digit. The identifier may be stored in a computer readable medium, such as in a data unit (e.g., a record or a table entry) of a polymer database. Polymer identifiers may be used in a number of ways. For example, the identifiers may be used to determine whether properties of a query sequence of chemical units match properties of a polymer of chemical units. One application of such matching is to quickly search a polymer database for a particular polymer of interest or for a polymer or polymers having specified properties.
- In one aspect, the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a polymer of chemical units. In another aspect, the invention is directed to a computer-implemented method for generating such a data structure. The data structure may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer. At least one field may be a non-character-based field. Each field may be capable of storing a binary value. The identifier may be a numerical identifier, such as a number that is representable as a single-digit hexadecimal number.
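- As a minimal sketch of how such an identifier might be built, assume four binary properties of a disaccharide, each occupying one bit field. The property names and bit positions below are illustrative assumptions and are not the encoding defined in this specification. The resulting identifier fits in a single hexadecimal digit:

```python
# Hypothetical property names and bit positions, chosen only for illustration.
PROPERTY_BITS = {
    "iduronic": 3,     # 1 = iduronic acid, 0 = glucuronic acid
    "sulfated_2O": 2,  # 1 = sulfated at the 2-O position
    "sulfated_6O": 1,  # 1 = sulfated at the 6-O position
    "n_sulfated": 0,   # 1 = N-sulfated
}


def encode_unit(properties):
    """Pack binary properties into a 4-bit integer identifier."""
    identifier = 0
    for name, bit in PROPERTY_BITS.items():
        if properties.get(name, False):
            identifier |= 1 << bit
    return identifier


def decode_unit(identifier):
    """Recover the binary properties from a 4-bit identifier."""
    return {name: bool((identifier >> bit) & 1)
            for name, bit in PROPERTY_BITS.items()}


unit_id = encode_unit({"iduronic": True, "sulfated_2O": True, "n_sulfated": True})
print(format(unit_id, "X"))  # prints D (binary 1101)
```

- Because each property has its own bit, a property can be read back with a shift and a mask rather than by parsing characters.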
- The polymer may be any of a variety of polymers. For example, (1) the polymer may be a polysaccharide and the chemical units may be saccharides; (2) the polymer may be a nucleic acid and the chemical units may be nucleotides; or (3) the polymer may be a polypeptide and the chemical units may be amino acids.
- The properties may be properties of the chemical units in the polymer. For example, the properties may include charges of chemical units in the polymer, identities of chemical units in the polymer, conformations of chemical units in the polymer, or identities of substituents of chemical units in the polymer. The properties may be properties of the polymer that are not properties of any individual chemical unit within the polymer. Example properties include a total charge of the polymer, a total number of sulfates of the polymer, a dye-binding of the polymer, a mass of the polymer, compositional ratios of substituents, compositional ratios of iduronic versus glucuronic, enzymatic sensitivity, degree of sulfation, charge, and chirality.
- In another aspect, the invention is directed to a computer-implemented method for determining whether properties of a query sequence of chemical units match properties of a polymer of chemical units. The query sequence may be represented by a first data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the query sequence. The polymer may be represented by a second data structure, tangibly embodied in a computer-readable medium, including an identifier that may include one or more bit fields for storing values corresponding to properties of the polymer. The method may include acts of generating at least one mask based on the values stored in the one or more bit fields of the first data structure, performing at least one binary operation on the values stored in the one or more bit fields of the second data structure using the at least one mask to generate at least one result, and determining whether the properties of the query sequence match the properties of the polymer based on the at least one result. The chemical units may, for example, be any of the chemical units described above. Similarly, the properties may be any of the properties described above.
- In one embodiment, the act of generating includes an act of generating the at least one mask as a sequence of bits that is equivalent to the values stored in the one or more bit fields of the first data structure. In another embodiment, the act of generating includes an act of generating the at least one mask as a sequential repetition of the values stored in the one or more bit fields of the first data structure.
- In a further embodiment, the at least one mask includes a plurality of masks and the act of performing at least one binary operation includes acts of performing a logical AND operation on the values stored in the one or more bit fields of the second data structure using each of the plurality of masks to generate a plurality of intermediate results, and combining the plurality of intermediate results using at least one logical OR operation to generate the at least one result. In one embodiment, the act of determining includes an act of determining that the properties of the query sequence match the properties of the polymer when the at least one result has a non-zero value. In a further embodiment, the at least one binary operation includes at least one logical AND operation.
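- The mask-based matching described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the function names `repeat_mask` and `matches`, the 4-bit width per chemical unit, and the example bit patterns are all hypothetical.

```python
def repeat_mask(unit_bits, n_units, width=4):
    """Build a mask as a sequential repetition of one unit's bit pattern."""
    mask = 0
    for _ in range(n_units):
        mask = (mask << width) | unit_bits
    return mask


def matches(polymer_bits, masks):
    """AND the polymer bits with each mask, OR the intermediate results,
    and report a match when the combined result is non-zero."""
    result = 0
    for mask in masks:
        result |= polymer_bits & mask
    return result != 0


# Three hypothetical 4-bit unit identifiers packed into one integer.
polymer = 0xD47

# Query: does any unit have bit 2 set? Repeat the pattern 0x4 across 3 units.
query_mask = repeat_mask(0x4, 3)        # 0x444
print(matches(polymer, [query_mask]))   # True
print(matches(0x101, [query_mask]))     # False
```

- A non-zero result only shows that the polymer shares at least one set bit with a mask. A query that requires every bit of the mask to be present would instead compare `polymer_bits & mask` with `mask`.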
- In another aspect, the invention is directed to a database, tangibly embodied in a computer-readable medium, for storing information descriptive of one or more polymers. The database may include one or more data units (e.g., records or table entries) corresponding to the one or more polymers, each of the data units may include an identifier that may include one or more fields for storing values corresponding to properties of the polymer.
- In another embodiment, the invention is directed to a data structure, tangibly embodied in a computer-readable medium, representing a chemical unit of a polymer. The data structure may comprise an identifier including one or more fields. Each field may be for storing a value corresponding to one or more properties of the chemical unit. At least one field may store a non-character-based value such as, for example, a binary or decimal value.
- Other aspects of the invention include the various combinations of one or more of the foregoing aspects of the invention, as well as the combinations of one or more of the various embodiments thereof as found in the following detailed description or as may be derived therefrom. It should be understood that the foregoing aspects of the invention also have corresponding computer-implemented processes which are also aspects of the present invention. It should also be understood that other embodiments of the present invention may be derived by those of ordinary skill in the art from the following detailed description of a particular embodiment of the invention.
- FIG. 1 is a block diagram illustrating an example of a computer system for storing and manipulating polymer information.
- FIG. 2A is a diagram illustrating an example of a record for storing information about a polymer and its constituent chemical units.
- FIG. 2B is a diagram illustrating an example of a record for storing information about a polymer.
- FIG. 2C is a diagram illustrating an example of a record for storing information about constituent chemical units of a polymer.
- FIG. 3 is a flow chart illustrating an example of a method for determining whether properties of a first polymer of chemical units match properties of a second chemical unit.
- The present invention will be better understood in view of the following detailed description of a particular embodiment thereof, taken in conjunction with the attached drawings. All references cited herein are hereby expressly incorporated by reference.
- FIG. 1 shows an example of a computer system 100 for storing and manipulating polymer information. The computer system 100 includes a polymer database 102 which includes a plurality of records 104 a-n storing information corresponding to a plurality of polymers. Each of the records 104 a-n may store information about properties of the corresponding polymer, properties of the corresponding polymer's constituent chemical units, or both. The polymers for which information is stored in the polymer database 102 may be any kind of polymers. For example, the polymers may include polysaccharides, nucleic acids, or polypeptides.
- A “polymer” as used herein is a compound having a linear and/or branched backbone of chemical units which are secured together by linkages. In some but not all cases the backbone of the polymer may be branched. The term “backbone” is given its usual meaning in the field of polymer chemistry. The polymers may be heterogeneous in backbone composition thereby containing any possible combination of polymer units linked together such as peptide-nucleic acids. In an embodiment, a polymer is homogeneous in backbone composition and is, for example, a nucleic acid, a polypeptide, a polysaccharide, a carbohydrate, a polyurethane, a polycarbonate, a polyurea, a polyethyleneimine, a polyarylene sulfide, a polysiloxane, a polyimide, a polyacetate, a polyamide, a polyester, or a polythioester. A “polysaccharide” is a biopolymer comprised of linked saccharide or sugar units. A “nucleic acid” as used herein is a biopolymer comprised of nucleotides, such as deoxyribose nucleic acid (DNA) or ribose nucleic acid (RNA). A “polypeptide” as used herein is a biopolymer comprised of linked amino acids.
- As used herein with respect to linked units of a polymer, “linked” or “linkage” means two entities are bound to one another by any physicochemical means. Any linkage known to those of ordinary skill in the art, covalent or non-covalent, is embraced. Such linkages are well known to those of ordinary skill in the art. Natural linkages, which are those ordinarily found in nature connecting the chemical units of a particular polymer, are most common. Natural linkages include, for instance, amide, ester and thioester linkages. The chemical units of a polymer analyzed by the methods of the invention may be linked, however, by synthetic or modified linkages. Polymers in which the units are linked by covalent bonds will be most common, but polymers whose units are linked by other means, such as hydrogen bonds, are also included.
- The polymer is made up of a plurality of chemical units. A “chemical unit” as used herein is a building block or monomer which can be linked directly or indirectly to other building blocks or monomers to form a polymer. The polymer preferably is a polymer of at least two different linked units. The particular type of unit will depend on the type of polymer. For instance DNA is a biopolymer comprised of a deoxyribose phosphate backbone composed of units of purines and pyrimidines such as adenine, cytosine, guanine, thymine, 5-methylcytosine, 2-aminopurine, 2-amino-6-chloropurine, 2,6-diaminopurine, hypoxanthine, and other naturally and non-naturally occurring nucleobases, substituted and unsubstituted aromatic moieties. RNA is a biopolymer comprised of a ribose phosphate backbone composed of units of purines and pyrimidines such as those described for DNA but wherein uracil is substituted for thymidine. DNA units may be linked to the other units of the polymer by their 5′ or 3′ hydroxyl group thereby forming an ester linkage. RNA units may be linked to the other units of the polymer by their 5′, 3′ or 2′ hydroxyl group thereby forming an ester linkage. Alternatively, DNA or RNA units having a terminal 5′, 3′ or 2′ amino group may be linked to the other units of the polymer by the amino group thereby forming an amide linkage.
- Whenever a nucleic acid is represented by a sequence of letters it will be understood that the nucleotides are in 5′→3′ order from left to right and that “A” denotes adenosine, “C” denotes cytidine, “G” denotes guanosine, “T” denotes thymidine, and “U” denotes uracil unless otherwise noted.
- The chemical units of a polypeptide are amino acids, including the 20 naturally occurring amino acids as well as modified amino acids. Amino acids may exist as amides or free acids and are linked to the other units in the backbone of the polymers through their α-amino group thereby forming an amide linkage to the polymer.
- A polysaccharide is a polymer composed of monosaccharides linked to one another. In many polysaccharides the basic building block of the polysaccharide is actually a disaccharide unit which can be repeating or non-repeating. Thus, a unit when used with respect to a polysaccharide refers to a basic building block of a polysaccharide and can include a monomeric building block (monosaccharide) or a dimeric building block (disaccharide).
- A “plurality of chemical units” is at least two units linked to one another.
- The polymers may be native or naturally-occurring polymers which occur in nature or non-naturally occurring polymers which do not exist in nature. The polymers typically include at least a portion of a naturally occurring polymer. The polymers can be isolated or synthesized de novo. For example, the polymers can be isolated from natural sources e.g. purified, as by cleavage and gel separation or may be synthesized e.g., (i) amplified in vitro by, for example, polymerase chain reaction (PCR); (ii) synthesized by, for example, chemical synthesis; (iii) recombinantly produced by cloning, etc.
- FIG. 2A illustrates an example of the format of a data unit 200 in the polymer database 102 (i.e., one of the data units 104 a-n). As shown in FIG. 2A, the data unit 200 may include a polymer identifier (ID) 202 that identifies the polymer corresponding to the data unit 200. The polymer ID 202 is described in more detail below with respect to FIG. 2B. The data unit 200 also may include one or more chemical unit identifiers (IDs) 204 a-n corresponding to chemical units that are constituents of the polymer corresponding to the data unit 200. The chemical unit IDs 204 a-n are described in more detail below with respect to FIG. 2C. The format of the data unit 200 shown in FIG. 2A is merely an example of a format that may be used to represent polymers in the polymer database 102. Polymers may be represented in the polymer database in other ways. For example, the data unit 200 may include only the polymer ID 202 or may only include one or more of the chemical unit IDs 204 a-n.
- FIG. 2B illustrates an example of the polymer ID 202. The polymer ID 202 may include one or more fields 202 a-n for storing information about properties of the polymer corresponding to the data unit 200 (FIG. 2A). Similarly, FIG. 2C illustrates an example of the chemical unit ID 204 a. The chemical unit ID 204 a may include one or more fields 206 a-m for storing information about properties of the chemical unit corresponding to the chemical unit ID 204 a. Although the following description refers to the fields 206 a-m of the chemical unit ID 204 a, such description is equally applicable to the fields 202 a-n of the polymer ID 202 (and the fields of the chemical unit IDs 204 b-n).
- The fields 206 a-m of the chemical unit ID 204 a may store any kind of value that is capable of being stored in a computer readable medium, such as, for example, a binary value, a hexadecimal value, an integral decimal value, or a floating point value.
- Each field 206 a-m may store information about any property of the corresponding chemical unit. A “property” as used herein is a characteristic (e.g., structural characteristic) of the polymer that provides information (e.g., structural information) about the polymer. When the term property is used with respect to any polymer except a polysaccharide the property provides information other than the identity of a unit of the polymer or the polymer itself. A compilation of several properties of a polymer may provide sufficient information to identify a chemical unit or even the entire polymer but the property of the polymer itself does not encompass the chemical basis of the chemical unit or polymer.
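- The record layout of FIGS. 2A-2C can be sketched as a pair of small classes. This is only one possible representation: the class names, the choice of a list for the fields, and the placeholder values are assumptions made for illustration and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A field may hold any storable value, e.g. a binary, integer or float value.
FieldValue = Union[int, float]


@dataclass
class ChemicalUnitID:
    """Counterpart of a chemical unit ID (204 a-n) with its fields (206 a-m)."""
    fields: List[FieldValue]


@dataclass
class DataUnit:
    """Counterpart of a data unit 200: a polymer ID plus chemical unit IDs."""
    polymer_id: List[FieldValue] = field(default_factory=list)
    chemical_unit_ids: List[ChemicalUnitID] = field(default_factory=list)


# Placeholder values only: two polymer-level fields and two 4-bit unit IDs.
record = DataUnit(
    polymer_id=[-4, 1202.16],
    chemical_unit_ids=[ChemicalUnitID([1, 1, 0, 1]), ChemicalUnitID([0, 1, 0, 0])],
)
print(len(record.chemical_unit_ids))  # 2
```

- Both members default to empty lists, which mirrors the statement that a data unit may include only the polymer ID or only chemical unit IDs.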
- When the term property is used with respect to polysaccharides, to define a polysaccharide property, it has the same meaning as described above except that due to the complexity of the polysaccharide, a property may identify a type of monomeric building block of the polysaccharide. Chemical units of polysaccharides are much more complex than chemical units of other polymers, such as nucleic acids and polypeptides. The polysaccharide unit has more variables in addition to its basic chemical structure than other chemical units. For example, the polysaccharide may be acetylated or sulfated at several sites on the chemical unit, or it may be charged or uncharged. Thus, one property of a polysaccharide may be the identity of one or more basic building blocks of the polysaccharides.
- A basic building block alone, however, may not provide information about the charge and the nature of substituents of the saccharide or disaccharide. For example, a building block of uronic acid may be iduronic or glucuronic acid. Each of these building blocks may have additional substituents that add complexity to the structure of the chemical unit. A single property, however, may not identify such additional substituents, charges, etc., in addition to identifying a complete building block of a polysaccharide. This information, however, may be assembled from several properties. Thus, a property of a polymer as used herein does not encompass an amino acid or nucleotide but does encompass a saccharide or disaccharide building block of a polysaccharide.
- A type of property that provides information about a polymer may depend on a type of polymer being analyzed. For instance, if the polymer is a polysaccharide, properties such as charge, molecular weight, nature and degree of sulfation or acetylation, and type of saccharide may provide information about the polymer. Properties may include, but are not limited to, charge, chirality, nature of substituents, quantity of substituents, molecular weight, molecular length, compositional ratios of substituents or units, type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation (e.g., position of helices), spatial distribution of substituents, ratio of one set of modifications to another set of modifications (e.g., relative amounts of 2-O sulfation to N-sulfation or ratio of iduronic acid to glucuronic acid), and binding sites for proteins. Other properties may be identified by those of ordinary skill in the art. A substituent, as used herein, is an atom or group of atoms that is substituted onto a unit but is not itself a unit.
- A property of a polymer may be identified by any means known in the art. The procedure used to identify a property may depend on a type of property. Molecular weight, for instance, may be determined by several methods including mass spectrometry. The use of mass spectrometry for determining the molecular weight of polymers is well known in the art. Mass spectrometry has been used as a powerful tool to characterize polymers because of its accuracy (±1 Dalton) in reporting the masses of fragments generated (e.g., by enzymatic cleavage), and also because only pM sample concentrations are required. For example, matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) has been described for identifying the molecular weight of polysaccharide fragments in publications such as Rhomberg, A. J. et al., PNAS, USA, v. 95, p. 4176-4181 (1998); Rhomberg, A. J. et al., PNAS, USA, v. 95, p. 12232-12237 (1998); and Ernst, S. et al., PNAS, USA, v. 95, p. 4182-4187 (1998), each of which is hereby incorporated by reference. Other types of mass spectrometry known in the art, such as electrospray-MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD), can also be used to identify the molecular weight of the polymer or polymer fragments.
- The mass spectrometry data may be a valuable tool to ascertain information about the polymer fragment sizes after the polymer has undergone degradation with enzymes or chemicals. After a molecular weight of a polymer is identified, it may be compared to molecular weights of other known polymers. Because masses obtained from the mass spectrometry data are accurate to one Dalton (1 D), a size of one or more polymer fragments obtained by enzymatic digestion may be precisely determined, and a number of substituents (i.e., sulfate and acetate groups present) may be determined. One technique for comparing molecular weights is to generate a mass line and compare the molecular weight of the unknown polymer to the mass line to determine a subpopulation of polymers which have the same molecular weight. A “mass line” as used herein is an information database, preferably in the form of a graph or chart which stores information for each possible type of polymer having a unique sequence based on the molecular weight of the polymer. Thus, a mass line may describe a number of polymers having a particular molecular weight. A two-unit nucleic acid molecule (i.e., a nucleic acid having two chemical units) has 16 (4², four possible units at each of two positions) possible polymers at a molecular weight corresponding to two nucleotides. A two-unit polysaccharide (i.e., disaccharide) has 32 possible polymers at a molecular weight corresponding to two saccharides. Thus, a mass line may be generated by uniquely assigning a particular mass to a particular length of a given fragment (all possible di, tetra, hexa, octa, up to a hexadecasaccharide), and tabulating the results (an example is shown in FIG. 4).
- Table 1 below shows an example of a computed set of values for a polysaccharide. From Table 1, a number of chemical units of a polymer may be determined from the minimum difference in mass between a fragment of length n+1 and a fragment of length n. For example, if the repeat is a disaccharide unit, a fragment of length n has 2n monosaccharide units. For example, n=1 may correspond to a length of a disaccharide and n=2 may correspond to a length of a tetrasaccharide, etc.

TABLE 1
Fragment Length n    Minimum difference in mass between n + 1 and n (D)
1                    101.13
2                    13.03
3                    13.03
4                    9.01
5                    9.01
6                    4.99
7                    4.99
8                    0.97
9                    0.97

- Because mass spectrometry data indicates the mass of a fragment to 1 D accuracy, a length may be assigned uniquely to a fragment by looking up its mass on the mass line. Further, it may be determined from the mass line that, within a fragment of particular length higher than a disaccharide, there is a minimum difference of 4.02 D in masses, indicating that two acetate groups (84.08 D) replaced a sulfate group (80.06 D). Therefore, a number of sulfates and acetates of a polymer fragment may be determined from the mass obtained from the mass spectrometry data, and such number may be assigned to the polymer fragment.
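- The arithmetic behind the mass line can be sketched as follows. The sulfate and acetate masses are the values quoted in the text. The function names, the base mass of 1000.0 D and the search limits are hypothetical placeholders chosen for illustration and are not measured data.

```python
SULFATE_MASS = 80.06       # D, mass of one sulfate group as quoted in the text
ACETATE_MASS = 84.08 / 2   # D per acetate; the text quotes 84.08 D for two


def count_sequences(alphabet_size, length):
    """Number of distinct sequences of a given length, e.g. 4**2 = 16."""
    return alphabet_size ** length


def candidate_substituents(observed_mass, base_mass,
                           max_sulfates, max_acetates, tolerance=1.0):
    """List (sulfates, acetates) pairs whose computed mass lies within the
    tolerance (1 D by default) of the observed fragment mass."""
    candidates = []
    for sulfates in range(max_sulfates + 1):
        for acetates in range(max_acetates + 1):
            mass = base_mass + sulfates * SULFATE_MASS + acetates * ACETATE_MASS
            if abs(mass - observed_mass) <= tolerance:
                candidates.append((sulfates, acetates))
    return candidates


print(count_sequences(4, 2))                        # 16
print(round(2 * ACETATE_MASS - SULFATE_MASS, 2))    # 4.02
# Hypothetical unsubstituted fragment mass of 1000.0 D.
print(candidate_substituents(1202.16, 1000.0, max_sulfates=3, max_acetates=2))
```

- With these placeholder inputs the lookup returns a single pair, two sulfates and one acetate, because no other combination within the search limits falls within 1 D of the observed mass.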
- In addition to molecular weight, other properties may be determined using methods known in the art. The compositional ratios of substituents or chemical units (quantity and type of total substituents or chemical units) may be determined using methodology known in the art, such as capillary electrophoresis. A polymer may be subjected to an experimental constraint such as enzymatic or chemical degradation to separate each of the chemical units of the polymers. These units then may be separated using capillary electrophoresis to determine the quantity and type of substituents or chemical units present in the polymer. Additionally, a number of substituents or chemical units can be determined using calculations based on the molecular weight of the polymer.
- In the method of capillary gel-electrophoresis, reaction samples may be analyzed by small-diameter, gel-filled capillaries. The small diameter of the capillaries (50 μm) allows for efficient dissipation of heat generated during electrophoresis. Thus, high field strengths (400 V/m) can be used without excessive Joule heating, lowering the separation time to about 20 minutes per reaction run and thereby increasing resolution over conventional gel electrophoresis. Additionally, many capillaries may be analyzed in parallel, allowing amplification of generated polymer information.
- In addition to being useful for identifying a property, compositional analysis also may be used to determine a presence and composition of an impurity as well as a main property of the polymer. Such determinations may be accomplished if the impurity does not have the same composition as the polymer. Determining whether an impurity is present may involve accurately integrating an area under each peak that appears in the electrophoretogram and normalizing the peaks to the smallest of the major peaks. The sum of the normalized peaks should be equal to one or close to being equal to one. If it is not, then one or more impurities are present. Impurities even may be detected in unknown samples if at least one of the disaccharide units of the impurity differs from any disaccharide unit of the unknown.
- If an impurity is present, one or more aspects of a composition of the components may be determined using capillary electrophoresis. Because all known disaccharide units may be baseline-separated by the capillary electrophoresis method described above and because migration times typically are determined using electrophoresis (i.e., as opposed to electroosmotic flow) and are reproducible, reliable assignment to a polymer fragment of the various saccharide units may be achieved. Consequently, both a composition of the major peak and a composition of a minor contaminant may be assigned to a polymer fragment. The composition for both the major and minor components of a solution may be assigned as described below.
- One example of such assignment of compositions involves determining the composition of the major AT-III binding HLGAG decasaccharide (+DDD4-7) and its minor contaminant (+D5D4-7) present in solution in a 9:1 ratio. Complete digestion of this 9:1 mixture with heparinases yields four peaks: three representative of the major decasaccharide (viz., D, 4, and −7), which are also present in the contaminant, and one peak, 5, that is present only in the contaminant. In other words, the area of each peak for D, 4, and −7 represents an additive combination of a contribution from the major decasaccharide and a contribution from the contaminant, whereas the peak for 5 represents only the contaminant.
- To assign the composition of the contaminant and the major component, the area under the 5 peak may be used as a starting point. This area represents an area under the peak for one disaccharide unit of the contaminant. Subtracting this area from the area under each of the 4 and −7 peaks, and subtracting twice this area from the area under the D peak, yields a 1:1:3 ratio of 4:−7:D. Such a ratio confirms the composition of the major component and indicates that the composition of the impurity is two Ds, one 4, one −7 and one 5.
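- The subtraction described above may be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the method of any particular instrument: peak areas are idealized as exactly proportional to the number of disaccharide units, and the function names (`simulate_peak_areas`, `major_component_ratio`) are hypothetical.

```python
from fractions import Fraction

# Disaccharide counts per chain, read off the two sequences in the text.
MAJOR = {"D": 3, "4": 1, "-7": 1}                 # +DDD4-7
CONTAMINANT = {"D": 2, "5": 1, "4": 1, "-7": 1}   # +D5D4-7


def simulate_peak_areas(major_parts, contaminant_parts):
    """Idealized peak areas, assuming area is proportional to unit count."""
    areas = {}
    for unit, n in MAJOR.items():
        areas[unit] = areas.get(unit, 0) + n * major_parts
    for unit, n in CONTAMINANT.items():
        areas[unit] = areas.get(unit, 0) + n * contaminant_parts
    return areas


def major_component_ratio(areas):
    """Remove the contaminant's share, taking the 5 peak as one contaminant unit."""
    one_unit = areas["5"]
    corrected = {
        "4": areas["4"] - one_unit,       # contaminant holds one 4
        "-7": areas["-7"] - one_unit,     # contaminant holds one -7
        "D": areas["D"] - 2 * one_unit,   # contaminant holds two Ds
    }
    smallest = min(corrected.values())
    return {unit: Fraction(a, smallest) for unit, a in corrected.items()}


areas = simulate_peak_areas(major_parts=9, contaminant_parts=1)
ratio = major_component_ratio(areas)
print(areas)   # idealized areas for the 9:1 mixture
print(ratio)   # corrected 4 : -7 : D ratio
```

- For the 9:1 mixture, the idealized areas are 29 (D), 10 (4), 10 (−7) and 1 (5); after subtraction they become 27, 9 and 9, which reduces to the 1:1:3 ratio of 4:−7:D stated above.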
- Methods of identifying other types of properties may be readily apparent to those of skill in the art and may depend on the type of property and the type of polymer. For example, hydrophobicity may be determined using reverse-phase high-pressure liquid chromatography (RP-HPLC). Enzymatic sensitivity may be identified by exposing the polymer to an enzyme and determining a number of fragments present after such exposure. Chirality may be determined using circular dichroism. Protein binding sites may be determined by mass spectrometry, isothermal calorimetry and NMR. Enzymatic modification (not degradation) may be determined in a similar manner as enzymatic degradation, i.e., by exposing a substrate to the enzyme and using MALDI-MS to determine if the substrate is modified. For example, a sulfotransferase may transfer a sulfate group to an HS chain, producing a concomitant mass increase of 80 Da. Conformation may be determined by modeling and nuclear magnetic resonance (NMR). The relative amounts of sulfation may be determined by compositional analysis or approximately determined by Raman spectroscopy.
- FIG. 2D illustrates an example of the chemical unit ID 204 a. The chemical unit ID 204 a contains one or more fields 212 a-e for storing information about properties of a heparin-like glycosaminoglycan (HLGAG). HLGAGs are complex polysaccharide molecules made up of disaccharide repeat units comprising hexoseamine and glucuronic/iduronic acid that are linked by α/β 1-4 glycosidic linkages. These defining units may be modified by: sulfation at the N, 3-O and 6-O position of the hexoseamine, 2-O sulfation of the uronic acid, and C5 epimerization that converts the glucuronic acid to iduronic acid. The disaccharide unit of HLGAG may be represented as:
(α1→4)I/G2OX(α/β1→4)H3OX,NY 6OX(α1→4),
where X may be sulfated (-SO3H) or unsulfated (-H), and Y may be sulfated (-SO3H) or acetylated (-COCH3) or, in rare cases, neither sulfated nor acetylated.
- The fields 212 a-e may store any kinds of values, such as, for example, single-bit values, single-digit hexadecimal values, or decimal values.
- In one embodiment, the chemical unit ID 204 a includes each of the following fields: (1) a field 212 a for storing a value indicating whether the polymer contains an iduronic or a glucuronic acid (I/G); (2) a field 212 b for storing a value indicating whether the 2X position of the iduronic or glucuronic acid is sulfated or unsulfated; (3) a field 212 c for storing a value indicating whether the 6X position of the hexoseamine is sulfated or unsulfated; (4) a field 212 d indicating whether the 3X position of the hexoseamine is sulfated or unsulfated; and (5) a field 212 e indicating whether the NX position of the hexoseamine is sulfated or acetylated. Optionally, each of the fields 212 a-e may be represented as a single bit.
- Table 2 illustrates an example of a data structure having a plurality of entries, where each entry represents an HLGAG encoded in accordance with FIG. 2D. Bit values for each of the fields 212 a-e may be assigned in any known manner. For example, with respect to field 212 a (I/G), a value of one may indicate Iduronic and a value of zero may indicate Glucuronic, or vice versa.
TABLE 2
I/G | 2X | 6X | 3X | NX | ALPH CODE | DISACC | MASS (ΔU) |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | I-HNAc | 379.33 |
0 | 0 | 0 | 0 | 1 | 1 | I-HNS | 417.35 |
0 | 0 | 0 | 1 | 0 | 2 | I-HNAc,3S | 459.39 |
0 | 0 | 0 | 1 | 1 | 3 | I-HNS,3S | 497.41 |
0 | 0 | 1 | 0 | 0 | 4 | I-HNAc,6S | 459.39 |
0 | 0 | 1 | 0 | 1 | 5 | I-HNS,6S | 497.41 |
0 | 0 | 1 | 1 | 0 | 6 | I-HNAc,3S,6S | 539.45 |
0 | 0 | 1 | 1 | 1 | 7 | I-HNS,3S,6S | 577.47 |
0 | 1 | 0 | 0 | 0 | 8 | I2S-HNAc | 459.39 |
0 | 1 | 0 | 0 | 1 | 9 | I2S-HNS | 497.41 |
0 | 1 | 0 | 1 | 0 | A | I2S-HNAc,3S | 539.45 |
0 | 1 | 0 | 1 | 1 | B | I2S-HNS,3S | 577.47 |
0 | 1 | 1 | 0 | 0 | C | I2S-HNAc,6S | 539.45 |
0 | 1 | 1 | 0 | 1 | D | I2S-HNS,6S | 577.47 |
0 | 1 | 1 | 1 | 0 | E | I2S-HNAc,3S,6S | 619.51 |
0 | 1 | 1 | 1 | 1 | F | I2S-HNS,3S,6S | 657.53 |
1 | 0 | 0 | 0 | 0 | −0 | G-HNAc | 379.33 |
1 | 0 | 0 | 0 | 1 | −1 | G-HNS | 417.35 |
1 | 0 | 0 | 1 | 0 | −2 | G-HNAc,3S | 459.39 |
1 | 0 | 0 | 1 | 1 | −3 | G-HNS,3S | 497.41 |
1 | 0 | 1 | 0 | 0 | −4 | G-HNAc,6S | 459.39 |
1 | 0 | 1 | 0 | 1 | −5 | G-HNS,6S | 497.41 |
1 | 0 | 1 | 1 | 0 | −6 | G-HNAc,3S,6S | 539.45 |
1 | 0 | 1 | 1 | 1 | −7 | G-HNS,3S,6S | 577.47 |
1 | 1 | 0 | 0 | 0 | −8 | G2S-HNAc | 459.39 |
1 | 1 | 0 | 0 | 1 | −9 | G2S-HNS | 497.41 |
1 | 1 | 0 | 1 | 0 | −A | G2S-HNAc,3S | 539.45 |
1 | 1 | 0 | 1 | 1 | −B | G2S-HNS,3S | 577.47 |
1 | 1 | 1 | 0 | 0 | −C | G2S-HNAc,6S | 539.45 |
1 | 1 | 1 | 0 | 1 | −D | G2S-HNS,6S | 577.47 |
1 | 1 | 1 | 1 | 0 | −E | G2S-HNAc,3S,6S | 619.51 |
1 | 1 | 1 | 1 | 1 | −F | G2S-HNS,3S,6S | 657.53 |
- Representing an HLGAG using a bit field may have a number of advantages. Because a property of an HLGAG may have one of two possible states, a binary bit is ideally suited for storing information representing an HLGAG property. Bit fields may be used to store such information in a computer readable medium (e.g., a computer memory or storage device), for example, by packing multiple bits (representing multiple fields) into a single byte or sequence of bytes. Furthermore, bit fields may be stored and manipulated quickly and efficiently by digital computer processors, which typically store information using bits and which typically can quickly perform operations (e.g., shift, AND, OR) on bits. For example, as described in more detail below, a plurality of properties each stored as a bit field can be searched more quickly than searches conducted using typical character-based searching methods.
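- The packing of the five single-bit fields into one value may be sketched in Python as follows. This is an illustrative sketch only: it assumes the Table 2 field order (I/G, 2X, 6X, 3X, NX, most significant bit first) with zero indicating iduronic acid, and all function names are hypothetical. The signed code is printed with an ASCII hyphen as the sign.

```python
# Field order follows Table 2, most significant bit first: I/G, 2X, 6X, 3X, NX.
# Assumed convention (as in Table 2): I/G = 0 means iduronic, 1 means glucuronic.
FIELDS = ("IG", "2X", "6X", "3X", "NX")


def pack_unit(ig, x2, x6, x3, nx):
    """Pack five single-bit properties into one 5-bit integer."""
    value = 0
    for bit in (ig, x2, x6, x3, nx):
        value = (value << 1) | (bit & 1)
    return value


def unpack_unit(value):
    """Return a dict mapping each field name to its bit."""
    return {name: (value >> shift) & 1
            for name, shift in zip(FIELDS, range(4, -1, -1))}


def alpha_code(value):
    """Signed hexadecimal digit: low four bits as hex, I/G bit as the sign."""
    digit = format(value & 0b1111, "X")
    return "-" + digit if value & 0b10000 else digit


def disaccharide_name(value):
    """Build a name in the style of the DISACC column, e.g. I2S-HNS,6S."""
    f = unpack_unit(value)
    uronic = ("G" if f["IG"] else "I") + ("2S" if f["2X"] else "")
    hexosamine = "H" + ("NS" if f["NX"] else "NAc")
    extras = [s for s, on in (("3S", f["3X"]), ("6S", f["6X"])) if on]
    return uronic + "-" + ",".join([hexosamine] + extras)


unit = pack_unit(ig=0, x2=1, x6=1, x3=0, nx=1)
print(alpha_code(unit), disaccharide_name(unit))
```

- Because every property occupies its own bit, adding a further property amounts to widening the value by one bit; the shift and mask operations stay the same.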
- Further, using bit fields to represent properties of HLGAGs permits a user to more easily incorporate additional properties (e.g., 4-O sulfation vs. unsulfation) into a chemical unit ID 204 a by adding extra bits to represent the additional properties.
- In one embodiment, the four fields 212 b-e (each of which may store a single-bit value) may be represented as a single hexadecimal (base 16) number where each of the fields 212 b-e represents one bit of the hexadecimal number. Using hexadecimal numbers to represent disaccharide units is convenient both for representation and processing because hexadecimal digits are a common form of representation used by conventional computers.
- Optionally, the five fields 212 a-e of the record 210 may be represented as a signed hexadecimal digit, in which the fields 212 b-212 e collectively encode a single-digit hexadecimal number as described above and the I/G field is used as a sign bit. In such a signed representation, the hexadecimal numbers 0-F may be used to code chemical units containing iduronic acid and the hexadecimal numbers −0 to −F may be used to code units containing glucuronic acid. The chemical unit ID 204 a may, however, be encoded using other forms of representations, such as by using a twos-complement representation.
- The fields 212 a-e of the chemical unit ID 204 a may be arranged in any order. For example, a Gray code system may be used to code HLGAGs. In a Gray code numbering scheme, each successive value differs from the previous value only in a single bit position. For example, in the case of HLGAGs, the values representing HLGAGs may be arranged so that any two neighboring values differ in the value of only one property. An example of a Gray code system used to code HLGAGs is shown in Table 3.
TABLE 3
I/G (16) | 2X (8) | 6X (4) | 3X (2) | NX (1) | Numeric Value | DISACC | MASS (ΔU) |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | I-HNAc | 379.33 |
0 | 0 | 0 | 0 | 1 | 1 | I-HNS | 417.35 |
0 | 0 | 0 | 1 | 1 | 3 | I-HNS,3S | 497.41 |
0 | 0 | 0 | 1 | 0 | 2 | I-HNAc,3S | 459.39 |
0 | 0 | 1 | 1 | 0 | 6 | I-HNAc,3S,6S | 539.45 |
0 | 0 | 1 | 1 | 1 | 7 | I-HNS,3S,6S | 577.47 |
0 | 0 | 1 | 0 | 1 | 5 | I-HNS,6S | 497.41 |
0 | 0 | 1 | 0 | 0 | 4 | I-HNAc,6S | 459.39 |
0 | 1 | 1 | 0 | 0 | 12 | I2S-HNAc,6S | 539.45 |
0 | 1 | 1 | 0 | 1 | 13 | I2S-HNS,6S | 577.47 |
0 | 1 | 1 | 1 | 1 | 15 | I2S-HNS,3S,6S | 657.53 |
0 | 1 | 1 | 1 | 0 | 14 | I2S-HNAc,3S,6S | 619.51 |
0 | 1 | 0 | 1 | 0 | 10 | I2S-HNAc,3S | 539.45 |
0 | 1 | 0 | 1 | 1 | 11 | I2S-HNS,3S | 577.47 |
0 | 1 | 0 | 0 | 1 | 9 | I2S-HNS | 497.41 |
0 | 1 | 0 | 0 | 0 | 8 | I2S-HNAc | 459.39 |
1 | 1 | 0 | 0 | 0 | 24 | G2S-HNAc | 459.39 |
1 | 1 | 0 | 0 | 1 | 25 | G2S-HNS | 497.41 |
1 | 1 | 0 | 1 | 1 | 27 | G2S-HNS,3S | 577.47 |
1 | 1 | 0 | 1 | 0 | 26 | G2S-HNAc,3S | 539.45 |
1 | 1 | 1 | 1 | 0 | 30 | G2S-HNAc,3S,6S | 619.51 |
1 | 1 | 1 | 1 | 1 | 31 | G2S-HNS,3S,6S | 657.53 |
1 | 1 | 1 | 0 | 1 | 29 | G2S-HNS,6S | 577.47 |
1 | 1 | 1 | 0 | 0 | 28 | G2S-HNAc,6S | 539.45 |
1 | 0 | 1 | 0 | 0 | 20 | G-HNAc,6S | 459.39 |
1 | 0 | 1 | 0 | 1 | 21 | G-HNS,6S | 497.41 |
1 | 0 | 1 | 1 | 1 | 23 | G-HNS,3S,6S | 577.47 |
1 | 0 | 1 | 1 | 0 | 22 | G-HNAc,3S,6S | 539.45 |
1 | 0 | 0 | 1 | 0 | 18 | G-HNAc,3S | 459.39 |
1 | 0 | 0 | 1 | 1 | 19 | G-HNS,3S | 497.41 |
1 | 0 | 0 | 0 | 1 | 17 | G-HNS | 417.35 |
1 | 0 | 0 | 0 | 0 | 16 | G-HNAc | 379.33 |
- Table 3 illustrates that use of a Gray coding scheme arranges the disaccharide building blocks such that neighboring table entries differ from each other only in the value of a single property. One advantage of using Gray codes to encode HLGAGs is that a biosynthesis of HLGAG fragments may follow a specific sequence of modifications starting from the basic building block G-HNAc.
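- The ordering of Table 3 may be reproduced with a short Python sketch. The sketch assumes the standard reflected binary Gray code formula; the text does not state how the ordering was generated, so the formula is an assumption that happens to yield the same sequence of numeric values. The helper names are hypothetical.

```python
def gray_code(n):
    """Standard reflected binary Gray code of n."""
    return n ^ (n >> 1)


def differs_in_one_bit(a, b):
    """True when exactly one bit position differs between a and b."""
    return bin(a ^ b).count("1") == 1


# Visit all 32 five-bit unit codes (I/G, 2X, 6X, 3X, NX) in Gray order.
order = [gray_code(n) for n in range(32)]

# Every pair of neighbors differs in exactly one property.
assert all(differs_in_one_bit(a, b) for a, b in zip(order, order[1:]))

print(order[:8])  # [0, 1, 3, 2, 6, 7, 5, 4]
```

- The first eight values printed correspond to the Numeric Value column of the first eight rows of Table 3.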
- In Table 3, bit weights of 8, 4, 2, and 1 are used to calculate the numerical equivalent of a hexadecimal number with the most significant bit (I/G) being used as a sign bit. For example, the hexadecimal code A (01010 binary) is equal to 8*1+4*0+2*1+1*0=10.
- In another embodiment, the weights of each of the fields 212 a-e may be changed thereby implementing an alternative weighting system. For example, bit fields 212 a-e may have weights of 16, 8, 4, −2, and −1, respectively, as shown in Table 4.
-
TABLE 4
I/G (16) | 2X (8) | NX (4) | 3X (−2) | 6X (−1) | Value | DISACC | MASS (ΔU) |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | I-HNAc | 379.33 |
0 | 0 | 0 | 0 | 1 | −1 | I-HNAc,6S | 459.39 |
0 | 0 | 0 | 1 | 0 | −2 | I-HNAc,3S | 459.39 |
0 | 0 | 0 | 1 | 1 | −3 | I-HNAc,3S,6S | 539.45 |
0 | 0 | 1 | 0 | 0 | 4 | I-HNS | 417.35 |
0 | 0 | 1 | 0 | 1 | 3 | I-HNS,6S | 497.41 |
0 | 0 | 1 | 1 | 0 | 2 | I-HNS,3S | 497.41 |
0 | 0 | 1 | 1 | 1 | 1 | I-HNS,3S,6S | 577.47 |
0 | 1 | 0 | 0 | 0 | 8 | I2S-HNAc | 459.39 |
0 | 1 | 0 | 0 | 1 | 7 | I2S-HNAc,6S | 539.45 |
0 | 1 | 0 | 1 | 0 | 6 | I2S-HNAc,3S | 539.45 |
0 | 1 | 0 | 1 | 1 | 5 | I2S-HNAc,3S,6S | 619.51 |
0 | 1 | 1 | 0 | 0 | 12 | I2S-HNS | 497.41 |
0 | 1 | 1 | 0 | 1 | 11 | I2S-HNS,6S | 577.47 |
0 | 1 | 1 | 1 | 0 | 10 | I2S-HNS,3S | 577.47 |
0 | 1 | 1 | 1 | 1 | 9 | I2S-HNS,3S,6S | 657.53 |
1 | 0 | 0 | 0 | 0 | 16 | G-HNAc | 379.33 |
1 | 0 | 0 | 0 | 1 | 15 | G-HNAc,6S | 459.39 |
1 | 0 | 0 | 1 | 0 | 14 | G-HNAc,3S | 459.39 |
1 | 0 | 0 | 1 | 1 | 13 | G-HNAc,3S,6S | 539.45 |
1 | 0 | 1 | 0 | 0 | 20 | G-HNS | 417.35 |
1 | 0 | 1 | 0 | 1 | 19 | G-HNS,6S | 497.41 |
1 | 0 | 1 | 1 | 0 | 18 | G-HNS,3S | 497.41 |
1 | 0 | 1 | 1 | 1 | 17 | G-HNS,3S,6S | 577.47 |
1 | 1 | 0 | 0 | 0 | 24 | G2S-HNAc | 459.39 |
1 | 1 | 0 | 0 | 1 | 23 | G2S-HNAc,6S | 539.45 |
1 | 1 | 0 | 1 | 0 | 22 | G2S-HNAc,3S | 539.45 |
1 | 1 | 0 | 1 | 1 | 21 | G2S-HNAc,3S,6S | 619.51 |
1 | 1 | 1 | 0 | 0 | 28 | G2S-HNS | 497.41 |
1 | 1 | 1 | 0 | 1 | 27 | G2S-HNS,6S | 577.47 |
1 | 1 | 1 | 1 | 0 | 26 | G2S-HNS,3S | 577.47 |
1 | 1 | 1 | 1 | 1 | 25 | G2S-HNS,3S,6S | 657.53 |
- Modifying the weights of the bits may be used to score the disaccharide units. For example, a database of sequences may be created and the different disaccharide units may be scored based on their relative abundance in the sequences present in the database. Some units, for example, I-HNAc,3S,6S, which rarely occur in naturally-occurring HLGAGs, may receive a low score based on a scheme in which the bits are weighted in the manner shown in Table 4.
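- The weighting of Table 4 amounts to a weighted sum of the bits, which may be sketched in Python as follows. The function name `weighted_value` is hypothetical, and the sketch covers only the arithmetic; how weights would be chosen from abundance data is not specified in the text and is not modeled here.

```python
# Field order and weights as in Table 4: I/G, 2X, NX, 3X, 6X.
WEIGHTS = (16, 8, 4, -2, -1)


def weighted_value(bits, weights=WEIGHTS):
    """Weighted sum of a tuple of 0/1 bits."""
    if len(bits) != len(weights):
        raise ValueError("bits and weights must have the same length")
    return sum(b * w for b, w in zip(bits, weights))


print(weighted_value((0, 0, 0, 1, 1)))  # I-HNAc,3S,6S -> -3
print(weighted_value((0, 1, 1, 1, 1)))  # I2S-HNS,3S,6S -> 9
```

- With the conventional weights (16, 8, 4, 2, 1) the same function returns the ordinary binary value, so an alternative weighting system only requires a different weight tuple.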
- Optionally, the sulfation and acetylation positions may be arranged as shown in Table 2: I/G, 2X, 6X, 3X, NX. These positions may, however, be arranged differently, resulting in the same set of codes representing different disaccharide units. Table 5, for example, shows an arrangement in which the positions are arranged as I/G, 2X, NX, 3X, 6X.
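- The effect of rearranging the positions may be illustrated by a Python sketch that converts a code from one field arrangement to another. The names `decode`, `encode` and `recode` are hypothetical, and the two orderings are taken from the arrangements named above.

```python
TABLE2_ORDER = ("IG", "2X", "6X", "3X", "NX")
TABLE5_ORDER = ("IG", "2X", "NX", "3X", "6X")


def decode(value, order):
    """Split a code into named bits; the first name is the most significant bit."""
    width = len(order)
    return {name: (value >> (width - 1 - i)) & 1
            for i, name in enumerate(order)}


def encode(props, order):
    """Reassemble named bits into a code using the given field order."""
    value = 0
    for name in order:
        value = (value << 1) | props[name]
    return value


def recode(value, src, dst):
    """Translate a code between two field arrangements."""
    return encode(decode(value, src), dst)


# Code 4 in the first arrangement (only 6X set) becomes code 1 in the second.
print(recode(0b00100, TABLE2_ORDER, TABLE5_ORDER))
```

- The same disaccharide therefore carries a different code under each arrangement, while the set of codes in use is unchanged.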
-
TABLE 5
I/G | 2X | NX | 3X | 6X | ALPH CODE | DISACC | MASS (ΔU) |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | I-HNAc | 379.33 |
0 | 0 | 0 | 0 | 1 | 1 | I-HNAc,6S | 459.39 |
0 | 0 | 0 | 1 | 0 | 2 | I-HNAc,3S | 459.39 |
0 | 0 | 0 | 1 | 1 | 3 | I-HNAc,3S,6S | 539.45 |
0 | 0 | 1 | 0 | 0 | 4 | I-HNS | 417.35 |
0 | 0 | 1 | 0 | 1 | 5 | I-HNS,6S | 497.41 |
0 | 0 | 1 | 1 | 0 | 6 | I-HNS,3S | 497.41 |
0 | 0 | 1 | 1 | 1 | 7 | I-HNS,3S,6S | 577.47 |
0 | 1 | 0 | 0 | 0 | 8 | I2S-HNAc | 459.39 |
0 | 1 | 0 | 0 | 1 | 9 | I2S-HNAc,6S | 539.45 |
0 | 1 | 0 | 1 | 0 | A | I2S-HNAc,3S | 539.45 |
0 | 1 | 0 | 1 | 1 | B | I2S-HNAc,3S,6S | 619.51 |
0 | 1 | 1 | 0 | 0 | C | I2S-HNS | 497.41 |
0 | 1 | 1 | 0 | 1 | D | I2S-HNS,6S | 577.47 |
0 | 1 | 1 | 1 | 0 | E | I2S-HNS,3S | 577.47 |
0 | 1 | 1 | 1 | 1 | F | I2S-HNS,3S,6S | 657.53 |
1 | 0 | 0 | 0 | 0 | −0 | G-HNAc | 379.33 |
1 | 0 | 0 | 0 | 1 | −1 | G-HNAc,6S | 459.39 |
1 | 0 | 0 | 1 | 0 | −2 | G-HNAc,3S | 459.39 |
1 | 0 | 0 | 1 | 1 | −3 | G-HNAc,3S,6S | 539.45 |
1 | 0 | 1 | 0 | 0 | −4 | G-HNS | 417.35 |
1 | 0 | 1 | 0 | 1 | −5 | G-HNS,6S | 497.41 |
1 | 0 | 1 | 1 | 0 | −6 | G-HNS,3S | 497.41 |
1 | 0 | 1 | 1 | 1 | −7 | G-HNS,3S,6S | 577.47 |
1 | 1 | 0 | 0 | 0 | −8 | G2S-HNAc | 459.39 |
1 | 1 | 0 | 0 | 1 | −9 | G2S-HNAc,6S | 539.45 |
1 | 1 | 0 | 1 | 0 | −A | G2S-HNAc,3S | 539.45 |
1 | 1 | 0 | 1 | 1 | −B | G2S-HNAc,3S,6S | 619.51 |
1 | 1 | 1 | 0 | 0 | −C | G2S-HNS | 497.41 |
1 | 1 | 1 | 0 | 1 | −D | G2S-HNS,6S | 577.47 |
1 | 1 | 1 | 1 | 0 | −E | G2S-HNS,3S | 577.47 |
1 | 1 | 1 | 1 | 1 | −F | G2S-HNS,3S,6S | 657.53 |
- It has been observed that disaccharide units in some HLGAG sequences are neither N-sulfated nor N-acetylated. Such disaccharide units may be represented using the
chemical unit ID 204 a in any of a number of ways.
- If the properties of a chemical unit are represented by bit fields, disaccharide units that contain a free amine in the N position may be represented by, for example, adding an additional bit field. For example, referring to FIG. 2D, an additional field NY may be used in the chemical unit ID 204 a. For example, an NY field having a value of zero may correspond to a free amine, and an NY field having a value of one may correspond to N-acetylation, or vice versa. Further, a value of one in the NX field 212 e may correspond to N-sulfation.
- Optionally, disaccharide units that contain a free amine in the N position may be represented using a tristate field. For example, the field 212 e (NX) in the chemical unit ID 204 a may be a tristate field having three permissible values. For example, a value of zero may correspond to a free amine, a value of one may correspond to N-acetylation, and a value of two may correspond to N-sulfation. Similarly, the values of any of the fields 212 a-e may be represented using a number system with a base higher than two. For example, if the value of the field 212 e (NX) is represented by a single-digit number having a base of three, then the field 212 e may store three permissible values.
- Referring to
FIG. 1, a user may perform a query on the polymer database 102 to search for particular information. For example, a user may search the polymer database 102 for specified polymers, specified chemical units, or polymers or chemical units having specified properties. A user may provide to a query user interface 108 user input 106 indicating properties for which to search. The user input 106 may, for example, indicate one or more chemical units, a polymer of chemical units or one or more properties to search for using, for example, a standard character-based notation. The query user interface 108 may, for example, provide a graphical user interface (GUI) which allows the user to select from a list of properties using an input device such as a keyboard or a mouse.
- The query user interface 108 may generate a search query 110 based on the user input 106. A search engine 112 may receive the search query 110 and generate a mask 114 based on the search query. Example formats of the mask 114, and example techniques to determine whether properties specified by the mask 114 match properties of polymers in the polymer database 102, are described in more detail below in connection with FIG. 3.
- The search engine 112 may determine whether properties specified by the mask 114 match properties of polymers stored in the polymer database 102. Subsequently, the search engine 112 may generate search results 116 based on the search, indicating whether the polymer database 102 includes polymers having the properties specified by the mask 114. The search results 116 also may indicate polymers in the polymer database 102 that have the properties specified by the mask 114. For example, if the user input 106 specified properties of a chemical unit, the search results 116 may indicate which polymers in the polymer database 102 include the specified chemical unit. Alternatively, if the user input 106 specified particular chemical unit properties, the search results 116 may indicate polymers in the polymer database 102 that include chemical units having the specified chemical unit properties. Similarly, if the user input 106 specified particular polymer properties, the search results 116 may indicate which polymers in the polymer database 102 have the specified polymer properties.
- FIG. 3 is a flowchart illustrating an example of a process 300 that may be used by the search engine 112 to generate the search results 116. In act 302, the search engine 112 may receive a search query 110 from the query user interface 108. Next, in act 304, the search engine 112 may generate a mask 114 based on the search query 110. In a following act 306, the search engine 112 may perform a binary operation on one or more of the records 104 a-n in the polymer database 102 by applying the mask 114. Next, in act 308, the search engine 112 may generate the search results 116 based on the results of the binary operation performed in step 306.
- The
process 300 will now be described in more detail with respect to an embodiment in which the fields 206 a-m of the chemical unit 204 a are binary fields. In act 302, the received search query 110 may indicate to search the polymer database 102 for a particular chemical unit, e.g., the chemical unit I2S-HNS. If, for example, the coding scheme shown in Table 1 is used to encode chemical units in the polymer database, the chemical unit I2S-HNS may be represented by a binary value of 01001. To generate the mask 114 for this chemical unit (step 304), the search engine 112 may use the binary value of the chemical unit, i.e., 01001, as the value of the mask 114. As a result, the values of the bits of the mask 114 may specify the properties of the chemical unit I2S-HNS. For example, the value of zero in the leftmost bit position may indicate Iduronic, and the value of one in the next bit position may indicate that the 2X position is sulfated.
- The search engine 112 may use this mask 114 to determine whether polymers in the polymer database 102 contain the chemical unit I2S-HNS. To make this determination, the search engine 112 may perform a binary operation on the data units 104 a-n of the polymer database 102 using the mask 114 (step 306). For example, the search engine 112 may perform a logical AND operation on each chemical unit of each of the polymers in the polymer database 102 using the mask 114. If the result of the logical AND operation on a particular chemical unit is equal to the value of the mask 114, then the chemical unit may satisfy the search query 110, and, in act 308, the search engine 112 may indicate a successful match in the search results 116. The search engine 112 may generate additional information in the search results 116, such as the polymer identifier of the polymer containing the matching chemical unit.
- In response to receiving the search query in act 302, in act 304, the search engine 112 also may generate the mask 114 that indicates one or more properties of a particular polymer or chemical unit. To generate the mask 114 for such a search query, the search engine 112 may set each bit position in the mask corresponding to a property specified by the search query to the value specified by the search query. Consider, for example, a search query 110 that indicates a search for all chemical units in which both the 2X position and the 6X position are sulfated. To generate a mask corresponding to this search query, the search engine 112 may set the bit positions of the mask corresponding to the 2X and 6X positions to a value corresponding to being sulfated. Using the coding scheme shown above in Table 1, for example, in which the 2X and 6X positions have bit positions of 3 and 2 (counting from the rightmost position beginning at bit position zero), respectively, the mask corresponding to this search query is 01100. The two bits of this mask that have a value of one correspond to the bit positions in Table 1 corresponding to the 2X and 6X positions.
- To determine whether the one or more properties of a particular chemical unit in the polymer database 102 match the one or more properties specified by the mask 114, the search engine 112 may perform a logical AND operation on the chemical unit identifier of the chemical unit in the polymer database 102 using the mask 114. To generate search results for this chemical unit (i.e., act 308), the search engine 112 may compare the result of the logical AND operation to the mask 114. If the values of the bit positions of the logical AND operation corresponding to the properties specified by the search query are equal to the values of the same bit positions of the mask 114, then the chemical unit has the properties specified by the search query 110, and the search engine 112 indicates a successful match in the search results 116.
- For example, consider the
search query 110 described above, which indicates a search for all chemical units in which both the 2X position and the 6X position are sulfated. Using the coding scheme of Table 1, the bit positions corresponding to the 2X and 6X positions are bit positions 3 and 2. After performing the logical AND operation using the mask 114, the search engine 112 compares bit positions 3 and 2 of the result to bit positions 3 and 2 of the mask 114.
- The techniques described above for generating the
mask 114 and searching with a mask 114 also may be used to perform searches with respect to sequences of chemical units or entire polymers. For example, if the search query 110 indicates a sequence of chemical units, the search engine 112 may fill the mask 114 with a sequence of bits corresponding to the concatenation of the binary encodings of the specified sequence of chemical units. The search engine 112 may then perform a binary AND operation on the polymer identifiers in the polymer database 102 using the mask 114, and generate the search results 116 as described above.
- The techniques described above for generating the mask 114 and searching with the mask 114 are provided merely as an example. Other techniques for generating and searching with the mask 114 may also be used. The search engine 112 also may use more than one mask for each search query 110, and the search engine 112 may perform multiple binary operations in parallel in order to improve computational efficiency. In addition, binary operations other than a logical AND may be used to determine whether properties of the polymers in the polymer database 102 match the properties specified by the mask 114. Other binary operations include, for example, logical OR and logical XOR (exclusive or). Such binary operations may be used alone or in combination with each other.
- Using the techniques described above, the polymer database 102 may be searched quickly for particular chemical units. One advantage of the process 300, if used in conjunction with a chemical unit coding scheme that encodes properties of chemical units using binary values, is that a chemical unit identifier (e.g., the chemical unit identifier 204 a) may be compared to a search query (in the form of a mask) using a single binary operation (e.g., a binary AND operation). As described above, conventional notation systems that use character-based notation systems to encode sequences of chemical units (e.g., systems which encode DNA sequences as sequences of characters) typically search for a sub-sequence of chemical units (represented by a first sequence of characters) within a super-sequence of chemical units (represented by a second sequence of characters) and use character-based comparison. Such a comparison typically is slow because it sequentially compares each character in a first sequence of characters (corresponding to the sub-sequence) to characters in a second sequence until a match is found. Consequently, the speed of the search is related to the length of the sub-sequence, i.e., the longer the sub-sequence, the slower the search.
- In contrast, the speed of the techniques described above for searching using binary operations may be constant in relation to the length of a sub-sequence that is the basis for the search query. Because the search engine 112 can search for a query sequence of chemical units using a single binary operation (e.g., a logical AND operation) regardless of the length of the query sequence, searches may be performed more quickly than conventional character-based methods whose speed is related to the length of the query sequence. Further, the binary operations used by the search engine 112 may be performed more quickly because conventional computer processors are designed to perform binary operations on binary data.
- A further advantage of the techniques described above for searching using binary operations is that encoding one or more properties of a polymer into the notational representation of the polymer enables the search engine 112 to quickly and directly search the polymer database 102 for particular properties of polymers. Because the properties of a polymer are encoded into the polymer's notational representation, the search engine 112 may determine whether the polymer has a specified property by determining whether the specified property is encoded in the polymer's notational representation. For example, as described above, the search engine 112 may determine whether the polymer has the specified property by performing a logical AND operation on the polymer's notational representation using the mask 114. This operation may be performed quickly by conventional computer processors and may be performed using only the polymer's notational representation and the mask, without reference to additional information about the properties of the polymer.
- Some aspects of the techniques described herein for representing properties using binary notation may be useful for generating, searching and manipulating information about polysaccharides. Accordingly, a complete building block of a polymer may be assigned a unique numeric identifier, which may be used to classify the complete building block. For example, each numeric identifier may represent a complete building block of a polysaccharide, including the exact chemical structure as defined by the basic building block of a polysaccharide and all of its substituents, charges etc. A basic building block refers to a basic ring structure such as iduronic acid or glucuronic acid but does not include substituents, charges etc. Such building block information may be generated and processed in a same or similar manner as described above with respect to “properties” of polymers.
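- The mask-based matching of the process 300 described above may be illustrated with a minimal Python sketch. It is an assumption-laden example rather than the implementation of the search engine 112: the bit layout is taken from the 01100 example (2X at bit position 3, 6X at bit position 2), the small database is hypothetical, and the exact-match helper uses XOR, one of the other binary operations the text mentions, because an AND test alone checks only the bits that are set to one in the mask.

```python
# Bit positions counted from the rightmost bit (position zero).
BIT = {"NX": 0, "3X": 1, "6X": 2, "2X": 3, "IG": 4}


def property_mask(*names):
    """Build a mask with a one at each named position that must be set."""
    mask = 0
    for name in names:
        mask |= 1 << BIT[name]
    return mask


def matches(unit, mask):
    """A unit matches when ANDing it with the mask gives back the mask."""
    return (unit & mask) == mask


def exact_match(unit, code):
    """XOR is zero only when every bit of the unit equals the code."""
    return (unit ^ code) == 0


def search(database, mask):
    """Return (polymer_id, index) for every matching unit of every polymer."""
    return [(pid, i)
            for pid, units in database.items()
            for i, unit in enumerate(units)
            if matches(unit, mask)]


# Hypothetical database: each polymer is a list of 5-bit unit codes.
db = {
    "P1": [0b01101, 0b00100, 0b10111],
    "P2": [0b01001, 0b11100],
}
mask = property_mask("2X", "6X")   # 0b01100
hits = search(db, mask)
print(hits)  # [('P1', 0), ('P2', 1)]
```

- Each comparison is a single AND followed by an equality test, so the cost per chemical unit does not depend on how many properties the mask specifies.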
- A computer system that may implement the
system 100 of FIG. 1 as a computer program typically may include a main unit connected to both an output device which displays information to a user and an input device which receives input from a user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device also may be connected to the processor and memory system via the interconnection mechanism.
- One or more output devices may be connected to the computer system. Example output devices include a cathode ray tube (CRT) display, liquid crystal displays (LCD), printers, communication devices such as a modem, and audio output. One or more input devices also may be connected to the computer system. Example input devices include a keyboard, keypad, track ball, mouse, pen and tablet, communication device, and data input devices such as sensors. The subject matter disclosed herein is not limited to the particular input or output devices used in combination with the computer system or to those described herein.
- The computer system may be a general purpose computer system which is programmable using a computer programming language, such as C++, Java, or other language, such as a scripting language or assembly language. The computer system also may include specially-programmed, special purpose hardware such as, for example, an Application-Specific Integrated Circuit (ASIC). In a general purpose computer system, the processor typically is a commercially-available processor, of which the series x86, Celeron, and Pentium processors, available from Intel, and similar devices from AMD and Cyrix, the 680X0 series microprocessors available from Motorola, the PowerPC microprocessor from IBM and the Alpha-series processors from Digital Equipment Corporation, are examples. Many other processors are available. Such a microprocessor executes a program called an operating system, of which Windows NT, Linux, UNIX, DOS, VMS and OS8 are examples, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, accounting, compilation, storage assignment, data management and memory management, and communication control and related services. The processor and operating system define a computer platform for which application programs in high-level programming languages may be written.
- A memory system typically includes a computer readable and writeable nonvolatile recording medium, of which a magnetic disk, a flash memory and tape are examples. The disk may be removable, such as a “floppy disk,” or permanent, known as a hard drive. A disk has a number of tracks in which signals are stored, typically in binary form, i.e., a form interpreted as a sequence of ones and zeros. Such signals may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. Typically, in operation, the processor causes data to be read from the nonvolatile recording medium into an integrated circuit memory element, which is typically a volatile, random access memory such as a dynamic random access memory (DRAM) or static memory (SRAM). The integrated circuit memory element typically allows for faster access to the information by the processor than does the disk. The processor generally manipulates the data within the integrated circuit memory and then copies the data to the disk after processing is completed. A variety of mechanisms are known for managing data movement between the disk and the integrated circuit memory element, and the subject matter disclosed herein is not limited to such mechanisms. Further, the subject matter disclosed herein is not limited to a particular memory system.
- The subject matter disclosed herein is not limited to a particular computer platform, particular processor, or particular high-level programming language. Additionally, the computer system may be a multiprocessor computer system or may include multiple computers connected over a computer network. It should be understood that each module (e.g. 110, 120) in
FIG. 1 may be a separate module of a computer program, or may be a separate computer program. Such modules may be operable on separate computers. Data (e.g., 104, 106, 110, 114 and 116) may be stored in a memory system or transmitted between computer systems. The subject matter disclosed herein is not limited to any particular implementation using software or hardware or firmware, or any combination thereof. The various elements of the system, either individually or in combination, may be implemented as a computer program product tangibly embodied in a machine-readable storage device for execution by a computer processor. Various steps of the process may be performed by a computer processor executing a program tangibly embodied on a computer-readable medium to perform functions by operating on input and generating output. Computer programming languages suitable for implementing such a system include procedural programming languages, object-oriented programming languages, and combinations of the two.
- Having now described a few embodiments, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other embodiments are within the scope of one of ordinary skill in the art and are contemplated as falling within the scope of the invention.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/133,334 US20080301178A1 (en) | 1999-04-23 | 2008-06-04 | Data structures representing polysaccharides and databases and methods related thereto |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13079299P | 1999-04-23 | 1999-04-23 | |
US13074799P | 1999-04-23 | 1999-04-23 | |
US15994099P | 1999-10-14 | 1999-10-14 | |
US15993999P | 1999-10-14 | 1999-10-14 | |
US09/557,997 US7412332B1 (en) | 1999-04-23 | 2000-04-24 | Method for analyzing polysaccharides |
US12/133,334 US20080301178A1 (en) | 1999-04-23 | 2008-06-04 | Data structures representing polysaccharides and databases and methods related thereto |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/557,997 Division US7412332B1 (en) | 1999-04-23 | 2000-04-24 | Method for analyzing polysaccharides |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080301178A1 (en) | 2008-12-04 |
Family ID=27494876
Family Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/557,997 Expired - Lifetime US7412332B1 (en) | 1999-04-23 | 2000-04-24 | Method for analyzing polysaccharides |
US09/558,137 Expired - Lifetime US6597996B1 (en) | 1999-04-23 | 2000-04-24 | Method for indentifying or characterizing properties of polymeric units |
US10/356,349 Expired - Lifetime US7139666B2 (en) | 1999-04-23 | 2003-01-31 | Method for identifying or characterizing properties of polymeric units |
US10/760,133 Expired - Lifetime US7110889B2 (en) | 1999-04-23 | 2004-01-16 | Method for identifying or characterizing properties of polymeric units |
US10/759,520 Expired - Lifetime US7117100B2 (en) | 1999-04-23 | 2004-01-16 | Method for the compositional analysis of polymers |
US11/518,394 Abandoned US20070066769A1 (en) | 1999-04-23 | 2006-09-08 | Method for identifying or characterizing properties of polymeric units |
US12/133,334 Abandoned US20080301178A1 (en) | 1999-04-23 | 2008-06-04 | Data structures representing polysaccharides and databases and methods related thereto |
US12/260,992 Abandoned US20090119027A1 (en) | 1999-04-23 | 2008-10-29 | Method for identifying or characterizing properties of polymeric units |
Country Status (5)
Country | Link |
---|---|
US (8) | US7412332B1 (en) |
EP (1) | EP1190364A2 (en) |
JP (1) | JP4824170B2 (en) |
CA (2) | CA2643162C (en) |
WO (1) | WO2000065521A2 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040091471A1 (en) * | 2002-05-03 | 2004-05-13 | Myette James R. | Delta 4, 5 glycuronidase and uses thereof |
US20060067927A1 (en) * | 2004-06-29 | 2006-03-30 | Massachusetts Institute Of Technology | Methods and compositions related to the modulation of intercellular junctions |
US20070161073A1 (en) * | 2000-09-12 | 2007-07-12 | Massachusetts Institute Of Technology | Methods and products related to evaluating the quality of a polysaccharide |
US20070202563A1 (en) * | 2004-03-10 | 2007-08-30 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of analyzing therewith |
US20080071148A1 (en) * | 2006-04-03 | 2008-03-20 | Massachusetts Institute Of Technology | Glycomic patterns for the detection of disease |
US20080278164A1 (en) * | 2002-05-20 | 2008-11-13 | Massachusetts Institute Of Technology | Novel method for sequence determination using nmr |
US7709461B2 (en) | 2000-10-18 | 2010-05-04 | Massachusetts Institute Of Technology | Methods and products related to pulmonary delivery of polysaccharides |
US7842492B2 (en) | 2007-01-05 | 2010-11-30 | Massachusetts Institute Of Technology | Compositions of and methods of using sulfatases from flavobacterium heparinum |
US20110068262A1 (en) * | 2009-09-22 | 2011-03-24 | Keith Vorst | Systems and methods for determining recycled thermoplastic content |
US7939292B2 (en) | 2000-03-08 | 2011-05-10 | Massachusetts Institute Of Technology | Modified heparinase III and methods of sequencing therewith |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003527822A (en) * | 1998-08-27 | 2003-09-24 | マサチューセッツ インスティテュート オブ テクノロジー | Rationally designed heparinases from heparinases I and II |
US7056504B1 (en) | 1998-08-27 | 2006-06-06 | Massachusetts Institute Of Technology | Rationally designed heparinases derived from heparinase I and II |
US7412332B1 (en) * | 1999-04-23 | 2008-08-12 | Massachusetts Institute Of Technology | Method for analyzing polysaccharides |
US7226739B2 (en) * | 2001-03-02 | 2007-06-05 | Isis Pharmaceuticals, Inc | Methods for rapid detection and identification of bioagents in epidemiological and forensic investigations |
US20040214228A9 (en) * | 2001-09-14 | 2004-10-28 | Ganesh Venkataraman | Methods of evaluating glycomolecules for enhanced activities |
CA2459040A1 (en) * | 2001-09-14 | 2003-03-27 | Mimeon, Inc. | Methods of making glycolmolecules with enhanced activities and uses thereof |
EP2284535A1 (en) | 2002-03-11 | 2011-02-16 | Momenta Pharmaceuticals, Inc. | Low molecular weight heparins |
EP1345167A1 (en) * | 2002-03-12 | 2003-09-17 | BRITISH TELECOMMUNICATIONS public limited company | Method of combinatorial multimodal optimisation |
EP1532241B1 (en) * | 2002-06-03 | 2010-09-15 | Massachusetts Institute Of Technology | Rationally designed polysaccharide lyases derived from chondroitinase b |
US20040147033A1 (en) * | 2002-12-20 | 2004-07-29 | Zachary Shriver | Glycan markers for diagnosing and monitoring disease |
JP4606712B2 (en) * | 2003-01-08 | 2011-01-05 | マサチューセッツ インスティテュート オブ テクノロジー | 2-O sulfatase compositions and related methods |
US7407810B2 (en) * | 2003-09-04 | 2008-08-05 | Momenta Pharmaceuticals, Inc. | Methods and apparatus for characterizing polymeric mixtures |
US20050178959A1 (en) * | 2004-02-18 | 2005-08-18 | Viorica Lopez-Avila | Methods and compositions for assessing a sample by maldi mass spectrometry |
US20060057638A1 (en) * | 2004-04-15 | 2006-03-16 | Massachusetts Institute Of Technology | Methods and products related to the improved analysis of carbohydrates |
US20060127950A1 (en) | 2004-04-15 | 2006-06-15 | Massachusetts Institute Of Technology | Methods and products related to the improved analysis of carbohydrates |
WO2006083328A2 (en) * | 2004-09-15 | 2006-08-10 | Massachusetts Institute Of Technology | Biologically active surfaces and methods of their use |
US20060264713A1 (en) * | 2005-05-20 | 2006-11-23 | Christoph Pedain | Disease and therapy dissemination representation |
GB0514552D0 (en) * | 2005-07-15 | 2005-08-24 | Nonlinear Dynamics Ltd | A method of analysing representations of separation patterns |
GB0514553D0 (en) * | 2005-07-15 | 2005-08-24 | Nonlinear Dynamics Ltd | A method of analysing a representation of a separation pattern |
WO2007024743A2 (en) * | 2005-08-19 | 2007-03-01 | Centocor, Inc. | Proteolysis resistant antibody preparations |
US7767420B2 (en) | 2005-11-03 | 2010-08-03 | Momenta Pharmaceuticals, Inc. | Heparan sulfate glycosaminoglycan lyase and uses thereof |
US7756657B2 (en) * | 2006-11-14 | 2010-07-13 | Abb Inc. | System for storing and presenting sensor and spectrum data for batch processes |
US7301339B1 (en) * | 2006-12-26 | 2007-11-27 | Schlumberger Technology Corporation | Estimating the concentration of a substance in a sample using NMR |
US8069127B2 (en) * | 2007-04-26 | 2011-11-29 | 21 Ct, Inc. | Method and system for solving an optimization problem with dynamic constraints |
US9139876B1 (en) | 2007-05-03 | 2015-09-22 | Momenta Pharmacueticals, Inc. | Method of analyzing a preparation of a low molecular weight heparin |
EP2162836A1 (en) * | 2007-06-15 | 2010-03-17 | Agency for Science, Technology and Research | System and method for representing n-linked glycan structures |
US8093056B2 (en) * | 2007-06-29 | 2012-01-10 | Schlumberger Technology Corporation | Method and apparatus for analyzing a hydrocarbon mixture using nuclear magnetic resonance measurements |
US20100049445A1 (en) * | 2008-06-20 | 2010-02-25 | Eureka Genomics Corporation | Method and apparatus for sequencing data samples |
WO2010101628A2 (en) | 2009-03-02 | 2010-09-10 | Massachusetts Institute Of Technology | Methods and products for in vivo enzyme profiling |
WO2011090948A1 (en) | 2010-01-19 | 2011-07-28 | Momenta Pharmaceuticals, Inc. | Evaluating heparin preparations |
CN102869784A (en) | 2010-04-07 | 2013-01-09 | 动量制药公司 | High mannose glycans |
US20140166875A1 (en) | 2010-09-02 | 2014-06-19 | Wayne State University | Systems and methods for high throughput solvent assisted ionization inlet for mass spectrometry |
WO2012058248A2 (en) | 2010-10-25 | 2012-05-03 | Wayne State University | Systems and methods extending the laserspray ionization mass spectrometry concept from atmospheric pressure to vacuum |
WO2012115952A1 (en) | 2011-02-21 | 2012-08-30 | Momenta Pharmaceuticals, Inc. | Evaluating heparin preparations |
WO2012125553A2 (en) | 2011-03-12 | 2012-09-20 | Momenta Pharmaceuticals, Inc. | N-acetylhexosamine-containing n-glycans in glycoprotein products |
WO2012125808A1 (en) | 2011-03-15 | 2012-09-20 | Massachusetts Institute Of Technology | Multiplexed detection with isotope-coded reporters |
WO2013177385A1 (en) * | 2012-05-23 | 2013-11-28 | The Johns Hopkins University | Mass spectrometry imaging of glycans from tissue sections and improved analyte detection methods |
US9695244B2 (en) | 2012-06-01 | 2017-07-04 | Momenta Pharmaceuticals, Inc. | Methods related to denosumab |
WO2014149067A1 (en) | 2013-03-15 | 2014-09-25 | Momenta Pharmaceuticals, Inc. | Methods related to ctla4-fc fusion proteins |
WO2014186310A1 (en) | 2013-05-13 | 2014-11-20 | Momenta Pharmaceuticals, Inc. | Methods for the treatment of neurodegeneration |
ES2828985T3 (en) | 2013-06-07 | 2021-05-28 | Massachusetts Inst Technology | Affinity-based detection of synthetic ligand-encoded biomarkers |
WO2015057622A1 (en) | 2013-10-16 | 2015-04-23 | Momenta Pharmaceuticals, Inc. | Sialylated glycoproteins |
CN104572622B (en) * | 2015-01-05 | 2018-01-02 | 武汉传神信息技术有限公司 | A kind of screening technique of term |
US10381108B2 (en) * | 2015-09-16 | 2019-08-13 | Charles Jianping Zhou | Web search and information aggregation by way of molecular network |
CA3020324A1 (en) | 2016-04-08 | 2017-10-12 | Massachusetts Institute Of Technology | Methods to specifically profile protease activity at lymph nodes |
EP3452407B1 (en) | 2016-05-05 | 2024-04-03 | Massachusetts Institute Of Technology | Methods and uses for remotely triggered protease activity measurements |
KR102115390B1 (en) * | 2016-07-26 | 2020-05-27 | 주식회사 엘지화학 | Method for measuring a modified ratio of a polymer |
WO2018187688A1 (en) | 2017-04-07 | 2018-10-11 | Massachusetts Institute Of Technology | Methods to spatially profile protease activity in tissue and sections |
DE102018000650A1 (en) * | 2018-01-27 | 2019-08-01 | Friedrich-Schiller-Universität Jena | Method for the determination of impurities in polyalkylene ethers or polyalkyleneamines and its use |
WO2019173332A1 (en) | 2018-03-05 | 2019-09-12 | Massachusetts Institute Of Technology | Inhalable nanosensors with volatile reporters and uses thereof |
US11835522B2 (en) | 2019-01-17 | 2023-12-05 | Massachusetts Institute Of Technology | Sensors for detecting and imaging of cancer metastasis |
US20230222313A1 (en) * | 2022-01-12 | 2023-07-13 | Dell Products L.P. | Polysaccharide archival storage |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6194638B1 (en) * | 1998-06-23 | 2001-02-27 | Pioneer Hi-Bred International, Inc. | Alteration of hemicellulose concentration in plants |
Family Cites Families (128)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4692435A (en) | 1978-11-06 | 1987-09-08 | Choay, S.A. | Mucopolysaccharide composition having a regulatory action on coagulation, medicament containing same and process of preparation |
SE449753B (en) | 1978-11-06 | 1987-05-18 | Choay Sa | MUCOPOLYSACCARIDE COMPOSITION WITH REGULATORY EFFECTS ON COAGULATION, MEDICINAL CONTAINING ITS SAME AND PROCEDURE FOR PREPARING THEREOF |
CA1136620A (en) | 1979-01-08 | 1982-11-30 | Ulf P.F. Lindahl | Heparin fragments having selective anticoagulation activity |
US4281108A (en) | 1980-01-28 | 1981-07-28 | Hepar Industries, Inc. | Process for obtaining low molecular weight heparins endowed with elevated pharmacological properties, and product so obtained |
US4443545A (en) | 1980-08-25 | 1984-04-17 | Massachusetts Institute Of Technology | Process for producing heparinase |
US4341869A (en) | 1980-08-25 | 1982-07-27 | Massachusetts Institute Of Technology | Process for producing heparinase |
US4373023A (en) | 1980-10-14 | 1983-02-08 | Massachusetts Institute Of Technology | Process for neutralizing heparin |
US4396762A (en) | 1981-08-24 | 1983-08-02 | Massachusetts Institute Of Technology | Heparinase derived anticoagulants |
DE3202894A1 (en) | 1982-01-29 | 1983-08-11 | Otsuka Pharmaceutical Co. Ltd., Tokyo | Method for the determination of compounds containing tumour-associated glycoprotein, use of the method for cancer diagnosis and kit for use of the method |
US4551296A (en) * | 1982-03-19 | 1985-11-05 | Allied Corporation | Producing high tenacity, high modulus crystalline article such as fiber or film |
US4757056A (en) | 1984-03-05 | 1988-07-12 | Hepar Industries, Inc. | Method for tumor regression in rats, mice and hamsters using hexuronyl hexosaminoglycan-containing compositions |
US4679555A (en) | 1984-08-07 | 1987-07-14 | Key Pharmaceuticals, Inc. | Method and apparatus for intrapulmonary delivery of heparin |
US5106734A (en) | 1986-04-30 | 1992-04-21 | Novo Nordisk A/S | Process of using light absorption to control enzymatic depolymerization of heparin to produce low molecular weight heparin |
DE3787996T2 (en) | 1986-05-16 | 1994-03-03 | Italfarmaco Spa | Heparin free from E.D.T.A., fractions and fragments of heparin, processes for their preparation and pharmaceutical compositions containing them. |
US4784820A (en) | 1986-08-11 | 1988-11-15 | Allied-Signal Inc. | Preparation of solution of high molecular weight polymers |
US4745105A (en) | 1986-08-20 | 1988-05-17 | Griffin Charles C | Low molecular weight heparin derivatives with improved permeability |
US4942156A (en) | 1986-08-20 | 1990-07-17 | Hepar Industries, Inc. | Low molecular weight heparin derivatives having improved anti-Xa specificity |
US4830013A (en) | 1987-01-30 | 1989-05-16 | Minnesota Mining And Manufacturing Co. | Intravascular blood parameter measurement system |
FR2614026B1 (en) | 1987-04-16 | 1992-04-17 | Sanofi Sa | LOW MOLECULAR WEIGHT HEPARINS WITH REGULAR STRUCTURE, THEIR PREPARATION AND THEIR BIOLOGICAL APPLICATIONS |
SE8702254D0 (en) | 1987-05-29 | 1987-05-29 | Kabivitrum Ab | NOVEL HEPARIN DERIVATIVES |
US5169772A (en) | 1988-06-06 | 1992-12-08 | Massachusetts Institute Of Technology | Large scale method for purification of high purity heparinase from flavobacterium heparinum |
IT1234508B (en) | 1988-06-10 | 1992-05-19 | Alfa Wassermann Spa | HEPARIN DERIVATIVES AND PROCEDURE FOR THEIR PREPARATION |
US5204323B1 (en) | 1988-10-06 | 1995-07-18 | Ciba Geigy Corp | Hirudin antidotal compositions and methods |
GB8826448D0 (en) | 1988-11-11 | 1988-12-14 | Thrombosis Res Inst | Improvements in/relating to organic compounds |
US5766573A (en) | 1988-12-06 | 1998-06-16 | Riker Laboratories, Inc. | Medicinal aerosol formulations |
CA1340966C (en) | 1989-05-19 | 2000-04-18 | Thomas R. Covey | Method of protein analysis |
IT1237518B (en) | 1989-11-24 | 1993-06-08 | Renato Conti | SUPER-SULFATED HEPARINS |
GB8927546D0 (en) | 1989-12-06 | 1990-02-07 | Ciba Geigy | Process for the production of biologically active tgf-beta |
US5152784A (en) | 1989-12-14 | 1992-10-06 | Regents Of The University Of Minnesota | Prosthetic devices coated with a polypeptide with type IV collagen activity |
FR2663639B1 (en) | 1990-06-26 | 1994-03-18 | Rhone Poulenc Sante | LOW MOLECULAR WEIGHT POLYSACCHARIDE BLENDS PROCESS FOR PREPARATION AND USE. |
US5284558A (en) | 1990-07-27 | 1994-02-08 | University Of Iowa Research Foundation | Electrophoresis-based sequencing of oligosaccharides |
IT1245761B (en) | 1991-01-30 | 1994-10-14 | Alfa Wassermann Spa | PHARMACEUTICAL FORMULATIONS CONTAINING GLYCOSAMINOGLICANS ABSORBABLE ORALLY. |
JP3110064B2 (en) | 1991-03-06 | 2000-11-20 | 生化学工業株式会社 | Novel heparitinase, method for producing the same and bacteria producing the same |
US5262325A (en) | 1991-04-04 | 1993-11-16 | Ibex Technologies, Inc. | Method for the enzymatic neutralization of heparin |
JPH06507635A (en) | 1991-05-02 | 1994-09-01 | イエダ リサーチ アンド ディベロップメント カンパニー リミテッド | Compositions for the prevention and/or treatment of pathological processes |
EG20399A (en) | 1991-06-13 | 1999-02-28 | Dow Chemical Co | A soft segment isocyanate terminate prepolymer and polyurethane elastomer therefrom |
US5714376A (en) | 1991-10-23 | 1998-02-03 | Massachusetts Institute Of Technology | Heparinase gene from flavobacterium heparinum |
IT1254216B (en) | 1992-02-25 | 1995-09-14 | Opocrin Spa | POLYSACCHARIDIC DERIVATIVES OF HEPARIN, EPARAN SULPHATE, THEIR FRACTIONS AND FRAGMENTS, PROCEDURE FOR THEIR PREPARATION AND PHARMACEUTICAL COMPOSITIONS CONTAINING THEM |
US5453171A (en) | 1992-03-10 | 1995-09-26 | The Board Of Regents Of The University Of Michigan | Heparin-selective polymeric membrane electrode |
US5856928A (en) | 1992-03-13 | 1999-01-05 | Yan; Johnson F. | Gene and protein representation, characterization and interpretation process |
GB9206291D0 (en) | 1992-03-23 | 1992-05-06 | Cancer Res Campaign Tech | Oligosaccharides having growth factor binding affinity |
US5389539A (en) | 1992-11-30 | 1995-02-14 | Massachusetts Institute Of Technology | Purification of heparinase I, II, and III from Flavobacterium heparinum |
US5696100A (en) | 1992-12-22 | 1997-12-09 | Glycomed Incorporated | Method for controlling O-desulfation of heparin and compositions produced thereby |
GB9306255D0 (en) | 1993-03-25 | 1993-05-19 | Cancer Res Campaign Tech | Heparan sulphate oligosaccharides having hepatocyte growth factor binding affinity |
FR2704861B1 (en) | 1993-05-07 | 1995-07-28 | Sanofi Elf | Purified heparin fractions, process for obtaining them and pharmaceutical compositions containing them. |
US5744155A (en) | 1993-08-13 | 1998-04-28 | Friedman; Doron | Bioadhesive emulsion preparations for enhanced drug delivery |
EP0726773A1 (en) | 1993-11-17 | 1996-08-21 | Massachusetts Institute Of Technology | Method for inhibiting angiogenesis using heparinase |
US6013628A (en) | 1994-02-28 | 2000-01-11 | Regents Of The University Of Minnesota | Method for treating conditions of the eye using polypeptides |
US5607859A (en) | 1994-03-28 | 1997-03-04 | Massachusetts Institute Of Technology | Methods and products for mass spectrometric molecular weight determination of polyionic analytes employing polyionic reagents |
US5658749A (en) | 1994-04-05 | 1997-08-19 | Corning Clinical Laboratories, Inc. | Method for processing mycobacteria |
US5753445A (en) | 1994-04-26 | 1998-05-19 | The Mount Sinai Medical Center Of The City University Of New York | Test for the detection of anti-heparin antibodies |
CA2189038A1 (en) | 1994-05-06 | 1995-11-09 | Kevin R. Holme | O-desulfated heparin derivatives, methods of making and uses thereof |
US5681733A (en) | 1994-06-10 | 1997-10-28 | Ibex Technologies | Nucleic acid sequences and expression systems for heparinase II and heparinase III derived from Flavobacterium heparinum |
US5619421A (en) | 1994-06-17 | 1997-04-08 | Massachusetts Institute Of Technology | Computer-implemented process and computer system for estimating the three-dimensional shape of a ring-shaped molecule and of a portion of a molecule containing a ring-shaped structure |
US5997863A (en) | 1994-07-08 | 1999-12-07 | Ibex Technologies R And D, Inc. | Attenuation of wound healing processes |
US6309853B1 (en) | 1994-08-17 | 2001-10-30 | The Rockfeller University | Modulators of body weight, corresponding nucleic acids and proteins, and diagnostic and therapeutic uses thereof |
FR2723847A1 (en) | 1994-08-29 | 1996-03-01 | Debiopharm Sa | HEPARIN - BASED ANTITHROMBOTIC AND NON - HEMORRHAGIC COMPOSITIONS, PROCESS FOR THEIR PREPARATION AND THERAPEUTIC APPLICATIONS. |
US5687090A (en) | 1994-09-01 | 1997-11-11 | Aspen Technology, Inc. | Polymer component characterization method and process simulation apparatus |
WO1996011671A1 (en) | 1994-10-12 | 1996-04-25 | Focal, Inc. | Targeted delivery via biodegradable polymers |
JP2927401B2 (en) * | 1994-12-28 | 1999-07-28 | 日本ビクター株式会社 | Helical scan type information recording device |
US5569366A (en) | 1995-01-27 | 1996-10-29 | Beckman Instruments, Inc. | Fluorescent labelled carbohydrates and their analysis |
US5618917A (en) | 1995-02-15 | 1997-04-08 | Arch Development Corporation | Methods and compositions for detecting and treating kidney diseases associated with adhesion of crystals to kidney cells |
US5763427A (en) | 1995-03-31 | 1998-06-09 | Hamilton Civic Hospitals Research Development Inc. | Compositions and methods for inhibiting thrombogenesis |
US5597811A (en) | 1995-04-10 | 1997-01-28 | Amerchol Corporation | Oxirane carboxylic acid derivatives of polyglucosamines |
JP3318578B2 (en) | 1995-05-26 | 2002-08-26 | サーモディックス,インコーポレイティド | Methods for promoting endothelialization and implantable products |
US5824299A (en) | 1995-06-22 | 1998-10-20 | President & Fellows Of Harvard College | Modulation of endothelial cell proliferation with IP-10 |
US5770420A (en) * | 1995-09-08 | 1998-06-23 | The Regents Of The University Of Michigan | Methods and products for the synthesis of oligosaccharide structures on glycoproteins, glycolipids, or as free molecules, and for the isolation of cloned genetic sequences that determine these structures |
WO1997016556A1 (en) | 1995-10-30 | 1997-05-09 | Massachusetts Institute Of Technology | Rationally designed polysaccharide lyases derived from heparinase i |
US5752019A (en) * | 1995-12-22 | 1998-05-12 | International Business Machines Corporation | System and method for confirmationally-flexible molecular identification |
ATE247948T1 (en) | 1996-04-29 | 2003-09-15 | Dura Pharma Inc | METHOD FOR INHALING DRY POWDER |
US6228654B1 (en) | 1996-05-09 | 2001-05-08 | The Scripps Research Institute | Methods for structure analysis of oligosaccharides |
US5855913A (en) | 1997-01-16 | 1999-01-05 | Massachusetts Instite Of Technology | Particles incorporating surfactants for pulmonary drug delivery |
US5874064A (en) | 1996-05-24 | 1999-02-23 | Massachusetts Institute Of Technology | Aerodynamically light particles for pulmonary drug delivery |
US5985309A (en) | 1996-05-24 | 1999-11-16 | Massachusetts Institute Of Technology | Preparation of particles for inhalation |
USRE37053E1 (en) | 1996-05-24 | 2001-02-13 | Massachusetts Institute Of Technology | Particles incorporating surfactants for pulmonary drug delivery |
ES2231880T3 (en) | 1996-07-29 | 2005-05-16 | Paringenix, Inc. | PROCEDURE TO TREAT ASTHMA WITH O-DESULPHATED HEPARINE. |
DE69739085D1 (en) | 1996-09-19 | 2008-12-18 | Univ Michigan | POLYMERS CONTAIN POLYSACCHARIDES SUCH AS ALGINATES OR MODIFIED ALGINATES |
US5767269A (en) | 1996-10-01 | 1998-06-16 | Hamilton Civic Hospitals Research Development Inc. | Processes for the preparation of low-affinity, low molecular weight heparins useful as antithrombotics |
US5803726A (en) * | 1996-10-04 | 1998-09-08 | Bacon; David W. | Retractable, electric arc-ignited gas pilot for igniting flare stacks |
US5759767A (en) | 1996-10-11 | 1998-06-02 | Joseph R. Lakowicz | Two-photon and multi-photon measurement of analytes in animal and human tissues and fluids |
US6642360B2 (en) * | 1997-12-03 | 2003-11-04 | Genentech, Inc. | Secreted polypeptides that stimulate release of proteoglycans from cartilage |
GB9708278D0 (en) | 1997-04-24 | 1997-06-18 | Danisco | Composition |
US6190875B1 (en) | 1997-09-02 | 2001-02-20 | Insight Strategy & Marketing Ltd. | Method of screening for potential anti-metastatic and anti-inflammatory agents using mammalian heparanase as a probe |
US5968822A (en) | 1997-09-02 | 1999-10-19 | Pecker; Iris | Polynucleotide encoding a polypeptide having heparanase activity and expression of same in transduced cells |
US6268146B1 (en) * | 1998-03-13 | 2001-07-31 | Promega Corporation | Analytical methods and materials for nucleic acid detection |
US6190522B1 (en) | 1998-04-24 | 2001-02-20 | Board Of Regents, The University Of Texas System | Analysis of carbohydrates derivatized with visible dye by high-resolution polyacrylamide gel electrophoresis |
US5985576A (en) * | 1998-06-30 | 1999-11-16 | The United States Of America As Represented By The Secretary Of Agriculture | Species-specific genetic identification of Mycobacterium paratuberculosis |
JP2003527822A (en) * | 1998-08-27 | 2003-09-24 | マサチューセッツ インスティテュート オブ テクノロジー | Rationally designed heparinases from heparinases I and II |
US7056504B1 (en) | 1998-08-27 | 2006-06-06 | Massachusetts Institute Of Technology | Rationally designed heparinases derived from heparinase I and II |
CA2341157A1 (en) | 1998-08-31 | 2000-03-09 | University Of Washington | Stable isotope metabolic labeling for analysis of biopolymers |
US6291439B1 (en) | 1998-09-02 | 2001-09-18 | Biomarin Pharmaceuticals | Methods for diagnosing atherosclerosis by measuring endogenous heparin and methods for treating atherosclerosis using heparin |
US6333051B1 (en) | 1998-09-03 | 2001-12-25 | Supratek Pharma, Inc. | Nanogel networks and biological agent compositions thereof |
US6440705B1 (en) | 1998-10-01 | 2002-08-27 | Vincent P. Stanton, Jr. | Method for analyzing polynucleotides |
US6610484B1 (en) | 1999-01-26 | 2003-08-26 | Cytyc Health Corporation | Identifying material from a breast duct |
US6429302B1 (en) | 1999-02-02 | 2002-08-06 | Chiron Corporation | Polynucleotides related to pancreatic disease |
US7412332B1 (en) | 1999-04-23 | 2008-08-12 | Massachusetts Institute Of Technology | Method for analyzing polysaccharides |
JP3689842B2 (en) | 1999-05-28 | 2005-08-31 | 株式会社J−オイルミルズ | Monosaccharide analysis method for sugar composition |
US6569366B1 (en) | 2000-02-16 | 2003-05-27 | Teijin Limited | Process for producing meta-type wholly aromatic polyamide filaments |
AU4351201A (en) | 2000-03-08 | 2001-09-17 | Massachusetts Inst Technology | Heparinase iii and uses thereof |
PT1319183E (en) | 2000-09-12 | 2009-06-29 | Massachusetts Inst Technology | Methods and products related to low molecular weight heparin |
AU2440802A (en) * | 2000-10-18 | 2002-04-29 | Massachusetts Inst Technology | Methods and products related to pulmonary delivery of polysaccharides |
WO2002066952A2 (en) | 2000-10-19 | 2002-08-29 | Target Discovery, Inc | Mass defect labeling for the determination of oligomer sequences |
US20030008820A1 (en) * | 2001-03-27 | 2003-01-09 | Massachusetts Institute Of Technology | Methods and products related to FGF dimerization |
AU2002312146A1 (en) | 2001-05-30 | 2002-12-09 | Triad Therapeutics, Inc. | Nuclear magnetic resonance-docking of compounds |
US6766817B2 (en) * | 2001-07-25 | 2004-07-27 | Tubarc Technologies, Llc | Fluid conduction utilizing a reversible unsaturated siphon with tubarc porosity action |
CA2459040A1 (en) | 2001-09-14 | 2003-03-27 | Mimeon, Inc. | Methods of making glycolmolecules with enhanced activities and uses thereof |
US20040214228A9 (en) | 2001-09-14 | 2004-10-28 | Ganesh Venkataraman | Methods of evaluating glycomolecules for enhanced activities |
US7363168B2 (en) * | 2001-10-02 | 2008-04-22 | Stratagene California | Adaptive baseline algorithm for quantitative PCR |
EP2284535A1 (en) | 2002-03-11 | 2011-02-16 | Momenta Pharmaceuticals, Inc. | Low molecular weight heparins |
WO2003090696A2 (en) | 2002-04-25 | 2003-11-06 | Momenta Pharmaceuticals, Inc. | Methods and products for mucosal delivery |
EP1575534B1 (en) * | 2002-05-03 | 2013-04-10 | Massachusetts Institute Of Technology | D4,5 glycuronidase and uses thereof |
WO2004055491A2 (en) * | 2002-05-20 | 2004-07-01 | Massachusetts Institute Of Technology | Novel method for sequence determination using nmr |
EP1532241B1 (en) | 2002-06-03 | 2010-09-15 | Massachusetts Institute Of Technology | Rationally designed polysaccharide lyases derived from chondroitinase b |
US20040147033A1 (en) | 2002-12-20 | 2004-07-29 | Zachary Shriver | Glycan markers for diagnosing and monitoring disease |
JP4606712B2 (en) * | 2003-01-08 | 2011-01-05 | マサチューセッツ インスティテュート オブ テクノロジー | 2-O sulfatase compositions and related methods |
US7407810B2 (en) | 2003-09-04 | 2008-08-05 | Momenta Pharmaceuticals, Inc. | Methods and apparatus for characterizing polymeric mixtures |
US7851223B2 (en) | 2004-02-27 | 2010-12-14 | Roar Holding Llc | Method to detect emphysema |
US7507570B2 (en) * | 2004-03-10 | 2009-03-24 | Massachusetts Institute Of Technology | Recombinant chondroitinase ABC I and uses thereof |
WO2005110438A2 (en) * | 2004-04-15 | 2005-11-24 | Massachusetts Institute Of Technology | Methods and products related to the intracellular delivery of polysaccharides |
US20060127950A1 (en) * | 2004-04-15 | 2006-06-15 | Massachusetts Institute Of Technology | Methods and products related to the improved analysis of carbohydrates |
US20060057638A1 (en) * | 2004-04-15 | 2006-03-16 | Massachusetts Institute Of Technology | Methods and products related to the improved analysis of carbohydrates |
EP1768687A2 (en) * | 2004-06-29 | 2007-04-04 | Massachusetts Institute Of Technology | Methods and compositions related to the modulation of intercellular junctions |
WO2006083328A2 (en) * | 2004-09-15 | 2006-08-10 | Massachusetts Institute Of Technology | Biologically active surfaces and methods of their use |
JP2008526258A (en) * | 2005-01-12 | 2008-07-24 | マサチューセッツ・インスティテュート・オブ・テクノロジー | Methods and compositions relating to modulating the extracellular stem cell environment |
WO2006105313A2 (en) * | 2005-03-29 | 2006-10-05 | Massachusetts Institute Of Technology | Compositions of and methods of using oversulfated glycosaminoglycans |
WO2006105315A2 (en) * | 2005-03-29 | 2006-10-05 | Massachusetts Institute Of Technology | Compositions and methods for regulating inflammatory responses |
US7739054B2 (en) * | 2005-06-22 | 2010-06-15 | Gen-Probe Incorporated | Method and algorithm for quantifying polynucleotides |
WO2007120478A2 (en) * | 2006-04-03 | 2007-10-25 | Massachusetts Institute Of Technology | Glycomic patterns for the detection of disease |
2000
- 2000-04-24 US: application US09/557,997, publication US7412332B1, Expired - Lifetime
- 2000-04-24 EP: application EP00923599A, publication EP1190364A2, Withdrawn
- 2000-04-24 CA: application CA2643162A, publication CA2643162C, Expired - Lifetime
- 2000-04-24 CA: application CA002370539A, publication CA2370539C, Expired - Lifetime
- 2000-04-24 JP: application JP2000614193A, publication JP4824170B2, Expired - Lifetime
- 2000-04-24 US: application US09/558,137, publication US6597996B1, Expired - Lifetime
- 2000-04-24 WO: application PCT/US2000/010990, publication WO2000065521A2, Application Filing
2003
- 2003-01-31 US: application US10/356,349, publication US7139666B2, Expired - Lifetime
2004
- 2004-01-16 US: application US10/760,133, publication US7110889B2, Expired - Lifetime
- 2004-01-16 US: application US10/759,520, publication US7117100B2, Expired - Lifetime
2006
- 2006-09-08 US: application US11/518,394, publication US20070066769A1, Abandoned
2008
- 2008-06-04 US: application US12/133,334, publication US20080301178A1, Abandoned
- 2008-10-29 US: application US12/260,992, publication US20090119027A1, Abandoned
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6194638B1 (en) * | 1998-06-23 | 2001-02-27 | Pioneer Hi-Bred International, Inc. | Alteration of hemicellulose concentration in plants |
Non-Patent Citations (2)
Title |
---|
Bohne et al., "W3-SWEET: Carbohydrate Modeling By Internet", 1998, Journal of Molecular Modeling, Volume 4, pages 33-43. * |
Bruno et al., "Representation and searching of carbohydrate structures using graph-theoretic techniques", 1997, Carbohydrate Research, Volume 304, pages 61-67. * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7939292B2 (en) | 2000-03-08 | 2011-05-10 | Massachusetts Institute Of Technology | Modified heparinase III and methods of sequencing therewith |
US20070161073A1 (en) * | 2000-09-12 | 2007-07-12 | Massachusetts Institute Of Technology | Methods and products related to evaluating the quality of a polysaccharide |
US8512969B2 (en) | 2000-09-12 | 2013-08-20 | Massachusetts Institute Of Technology | Methods for analyzing a heparin sample |
US8173384B2 (en) | 2000-09-12 | 2012-05-08 | Massachusetts Institute Of Technology | Methods for analyzing or processing a heparin sample |
US7687479B2 (en) | 2000-09-12 | 2010-03-30 | Massachusetts Institute Of Technology | Methods and producing low molecular weight heparin |
US7709461B2 (en) | 2000-10-18 | 2010-05-04 | Massachusetts Institute Of Technology | Methods and products related to pulmonary delivery of polysaccharides |
US7951560B2 (en) | 2002-05-03 | 2011-05-31 | Massachusetts Institute Of Technology | Delta 4,5 glycuronidase compositions and methods related thereto |
US20060177885A1 (en) * | 2002-05-03 | 2006-08-10 | Massachusetts Institute Of Technology | Delta 4,5 glycuronidase and methods of analyzing therewith |
US20050214276A9 (en) * | 2002-05-03 | 2005-09-29 | Myette James R | Delta 4, 5 glycuronidase and uses thereof |
US20060183891A1 (en) * | 2002-05-03 | 2006-08-17 | Massachusetts Institute Of Technology | Delta 4,5 glycuronidase nucleic acid compositions |
US20040091471A1 (en) * | 2002-05-03 | 2004-05-13 | Myette James R. | Delta 4, 5 glycuronidase and uses thereof |
US20060177911A1 (en) * | 2002-05-03 | 2006-08-10 | Massachusetts Institute Of Technology | Delta 4,5 glycuronidase and methods of cleaving therewith |
US20060177910A1 (en) * | 2002-05-03 | 2006-08-10 | Massachusetts Institute Of Technology | Delta 4,5 glycuronidase and methods of hydrolyzing therewith |
US7695711B2 (en) | 2002-05-03 | 2010-04-13 | Massachusetts Institute Of Technology | Δ 4,5 glycuronidase nucleic acid compositions |
US8018231B2 (en) | 2002-05-20 | 2011-09-13 | Massachusetts Institute Of Technology | Method for sequence determination using NMR |
US20080278164A1 (en) * | 2002-05-20 | 2008-11-13 | Massachusetts Institute Of Technology | Novel method for sequence determination using nmr |
US7737692B2 (en) | 2002-05-20 | 2010-06-15 | Massachusetts Institute Of Technology | Method for sequence determination using NMR |
US20090045811A1 (en) * | 2002-05-20 | 2009-02-19 | Massachusetts Institute Of Technology | Novel method for sequence determination using nmr |
US7728589B2 (en) | 2002-05-20 | 2010-06-01 | Massachusetts Institute Of Technology | Method for sequence determination using NMR |
US8338119B2 (en) | 2004-03-10 | 2012-12-25 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of degrading therewith |
US7662604B2 (en) | 2004-03-10 | 2010-02-16 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of production |
US20070224670A1 (en) * | 2004-03-10 | 2007-09-27 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of production |
US7592152B2 (en) | 2004-03-10 | 2009-09-22 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of analyzing therewith |
US20070202563A1 (en) * | 2004-03-10 | 2007-08-30 | Massachusetts Institute Of Technology | Chondroitinase ABC I and methods of analyzing therewith |
US20060067927A1 (en) * | 2004-06-29 | 2006-03-30 | Massachusetts Institute Of Technology | Methods and compositions related to the modulation of intercellular junctions |
US8529889B2 (en) | 2004-06-29 | 2013-09-10 | Massachusetts Institute Of Technology | Methods and compositions related to the modulation of intercellular junctions |
US20080071148A1 (en) * | 2006-04-03 | 2008-03-20 | Massachusetts Institute Of Technology | Glycomic patterns for the detection of disease |
US7842492B2 (en) | 2007-01-05 | 2010-11-30 | Massachusetts Institute Of Technology | Compositions of and methods of using sulfatases from flavobacterium heparinum |
US20110033901A1 (en) * | 2007-01-05 | 2011-02-10 | Massachusetts Institute Of Technology | Compositions of and methods of using sulfatases from flavobacterium heparinum |
US8846363B2 (en) | 2007-01-05 | 2014-09-30 | James R. Myette | Compositions of and methods of using sulfatases from Flavobacterium heparinum |
US20110068262A1 (en) * | 2009-09-22 | 2011-03-24 | Keith Vorst | Systems and methods for determining recycled thermoplastic content |
US8063374B2 (en) * | 2009-09-22 | 2011-11-22 | California Polytechnic Corporation | Systems and methods for determining recycled thermoplastic content |
Also Published As
Publication number | Publication date |
---|---|
EP1190364A2 (en) | 2002-03-27 |
US6597996B1 (en) | 2003-07-22 |
WO2000065521A3 (en) | 2001-10-25 |
US7412332B1 (en) | 2008-08-12 |
CA2370539C (en) | 2009-01-06 |
US20030191587A1 (en) | 2003-10-09 |
US20040197933A1 (en) | 2004-10-07 |
CA2370539A1 (en) | 2000-11-02 |
JP2002543222A (en) | 2002-12-17 |
US7117100B2 (en) | 2006-10-03 |
WO2000065521A2 (en) | 2000-11-02 |
CA2643162A1 (en) | 2000-11-02 |
US7139666B2 (en) | 2006-11-21 |
CA2643162C (en) | 2018-01-02 |
US20070066769A1 (en) | 2007-03-22 |
US7110889B2 (en) | 2006-09-19 |
US20090119027A1 (en) | 2009-05-07 |
JP4824170B2 (en) | 2011-11-30 |
US20040204869A1 (en) | 2004-10-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7412332B1 (en) | Method for analyzing polysaccharides | |
US7016896B2 (en) | Pattern search method, pattern search apparatus and computer program therefor, and storage medium thereof | |
US20060286566A1 (en) | Detecting apparent mutations in nucleic acid sequences | |
Giegerich et al. | Efficient implementation of lazy suffix trees | |
Claverie et al. | [15] k-tuple frequency analysis: From intron/exon discrimination to T-cell epitope mapping | |
US20080154938A1 (en) | System and method for generation of computer index files | |
US7737692B2 (en) | Method for sequence determination using NMR | |
EP1578020A1 (en) | Data compressing method, program and apparatus | |
CN109830263A (en) | A kind of DNA storage method based on oligonucleotide sequence code storage | |
EP1609081A2 (en) | System and method for storing and accessing data in an interlocking trees datastore | |
CA2395327A1 (en) | Sequence database search with sequence search trees | |
CN1613073A (en) | Enhanced multiway radix tree | |
Buchler et al. | Protein heteronuclear NMR assignments using mean-field simulated annealing | |
Giegerich et al. | A comparison of imperative and purely functional suffix tree constructions | |
US6970892B2 (en) | Implementing standards-based file operations in proprietary operating systems | |
Floratos et al. | On the time complexity of the TEIRESIAS algorithm | |
US8639445B2 (en) | Identification of related residues in biomolecular sequences by multiple sequence alignment and phylogenetic analysis | |
Levy et al. | Xlandscape: the graphical display of word frequencies in sequences. | |
Behboudi et al. | RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences | |
Bruno et al. | Representation and searching of carbohydrate structures using graph-theoretic techniques | |
Dayringer et al. | Computer‐aided interpretation of mass spectra. STIRS prediction of rings‐plus‐double‐bonds values | |
Ihlenfeldt et al. | Augmenting connectivity information by compound name parsing: Automatic assignment of stereochemistry and isotope labeling | |
Hardy et al. | The sequence alignment software library at USC | |
Kaniwa et al. | Repeat finding techniques, data structures and algorithms in DNA sequences: a survey | |
Gusev et al. | COMPLEXITY DECOMPOSITIONS IN PROBLEMS OF COMPARISON OF SYMBOLIC SEQUENCES | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: MASSACHUSETTS INSTITUTE OF TECHNOLOGY, MASSACHUSET. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VENKATARAMAN, GANESH; SHRIVER, ZACHARY; RAMAN, RAHUL; AND OTHERS; REEL/FRAME: 021152/0270; SIGNING DATES FROM 20001024 TO 20001025 |
 | AS | Assignment | Owner name: NATIONAL INSTITUTES OF HEALTH (NIH), U.S. DEPT. OF. Free format text: CONFIRMATORY LICENSE; ASSIGNOR: MASSACHUSETTS INSTITUTE OF TECHNOLOGY; REEL/FRAME: 029442/0630. Effective date: 20121207 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
 | AS | Assignment | Owner name: NIH - DEITR, MARYLAND. Free format text: CONFIRMATORY LICENSE; ASSIGNOR: MASSACHUSETTS INSTITUTE OF TECHNOLOGY; REEL/FRAME: 066268/0357. Effective date: 20240127 |