WO1990012033A1

WO1990012033A1 - Construction and use of synthetic constructs encoding syndecan

Info

Publication number: WO1990012033A1
Application number: PCT/US1990/001496
Authority: WO
Inventors: Merton R. Bernfield; Scott Saunders
Original assignee: The Board Of Trustees Of The Leland Stanford Junior University
Priority date: 1989-03-29
Filing date: 1990-03-22
Publication date: 1990-10-18

Abstract

A purified mammalian peptide, and genetic information encoding such peptides, having a molecular weight of from about 31 kD to about 35 kD and comprising an amino terminus extracellular region, a carboxy terminus cytoplasmic region, and a transmembrane region between said cytoplasmic and extracellular regions, a dibasic sequence extracellularly adjacent the transmembrane region of the peptide, and at least one glycosylation site in the extracellular region including an Xac-Xaa-Ser-Gly-Xac sequence, wherein Xac is an acidic amino acid and Xaa is any amino acid. Additional peptides having this glycosylation site and genetic information useful for preparing a number of variations based on this peptide are also provided.

Description

CONSTRUCTION AND USE OF SYNTHETIC CONSTRUCTS ENCODING SYNDECAN

Work leading to the present invention was supported in part by a National Institutes of Health grant. The government has rights in this invention as a result of this support.

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates generally to the field of genetic engineering and more particularly to genes for proteoglycans, their insertion into recombinant DNA vectors, and the production of the resulting core proteins in recipient strains of microorganisms and the proteoglycan in recipient eukaryotic cells.

Description of the Background

The cellular behavior responsible for the development, repair and maintenance of tissues is regulated, in large part, by interactions between cells and their extracellular matrix. These interactions are mediated by cell surface molecules acting as receptors that bind the large insoluble matrix molecules and induce responses that result in changes of cellular phenotype. Several proteins associated with the cell surface can bind matrix components. These proteins differ in their specificity and affinity and in their mode of association with the cell surface. Some bind cells to single matrix ligands while others, such as some members of the integrin superfamily, appear to have multiple matrix ligands. Of the various matrix-binding proteins at the cell surface, only the integrins are known to be integral membrane proteins. The integrin fibronectin receptor codistributes both with extracellular fibronectin and with intracellular cytoskeletal components, apparently via an association of the receptor's cytoplasmic domain with the cytoskeletal protein talin.

The present inventors have studied a lipophilic proteoglycan containing both heparan sulfate and chondroitin sulfate that is found at the surface of mouse mammary epithelial cells and that behaves as a high affinity receptor specific for multiple components of the interstitial matrix. This proteoglycan has been given the name syndecan in the mouse. The proteoglycan binds the epithelial cells via its heparan sulfate chains to collagen types I, III, and V (Koda, J.E., Rapraeger, A., and Bernfield, M., J. Biol. Chem. (1985) 260: 8157-8162), fibronectin (Saunders, S. and Bernfield, M., J. Cell Biol. (1988) 106: 423-430), and thrombospondin. When its extracellular domain (ectodomain) is cross-linked at the cell surface, it associates intracellularly with the actin cytoskeleton (Rapraeger, A., Jalkanen, M., and Bernfield, M., J. Cell Biol. (1986) 103: 2683-2696), and the isolated proteoglycan binds directly or indirectly to F-actin (Rapraeger, A., and Bernfield, M., J. Biol. Chem. (1985) 260: 4103-4109). Cultured epithelial cells shed the ectodomain from their apical surfaces as a non-lipophilic proteoglycan that contains all of the glycosaminoglycan of the intact molecule and polarize the proteoglycan exclusively to their basolateral surfaces, a location consistent with its matrix receptor function. Upon suspension of these cells, the ectodomain is cleaved from the cell surface; the proteoglycan is not replaced while the cells are suspended (Jalkanen, M., Rapraeger, A., Saunders, S., and Bernfield, M., J. Cell Biol. (1987) 105: 3087-3096). The proteoglycan is mainly on epithelia in mature tissues (Hayashi, K., Hayashi, M., Jalkanen, M., Firestone, J.H., Trelstad, R.L., and Bernfield, M., J. Histochem. Cytochem. (1987) 35: 1079-1088), and some of the present inventors have previously proposed that it is a matrix anchor that stabilizes the morphology of epithelial sheets by linking the cytoskeleton to the extracellular matrix (Bernfield, M., Rapraeger, A., Jalkanen, M., and Banerjee, S.D., Basement Membranes (1985) 343-352).

Syndecan undergoes substantial regulation; its size, glycosaminoglycan composition and location at the cell surface vary between epithelial types, and its expression changes during development. The proteoglycan is located exclusively at the basolateral cell surface of simple epithelia but surrounds stratified epithelial cells. At basolateral cell surfaces, it appears to contain two heparan sulfate and two chondroitin sulfate chains, but where it surrounds cells, it contains only a single heparan sulfate chain and a single small chondroitin sulfate chain (Sanderson, R.D., and Bernfield, M., Proc. Natl. Acad. Sci. USA (1987) 23J3: 491-497). In self-renewing epithelial cell populations, such as the epidermis or vagina, the proteoglycan is lost when the cells terminally differentiate (Hayashi, K., Hayashi, M., Boutin, E., Cunha, G.R., Bernfield, M., and Trelstad, R.L., J. Lab. Invest. (1988) 58: 68-76). In embryos, the proteoglycan is transiently lost when epithelia change their shape and is transiently expressed by mesenchymal cells undergoing morphogenetic tissue interaction.

Heparan sulfate proteoglycans are ubiquitous on the surfaces of adherent cells and bind various ligands including extracellular matrix, growth factors, proteinase inhibitors, and lipoprotein lipase; see Fransson, L., Trends Biochem. Sci. (1987) 12: 406-411. However, despite much study of these molecules, no structure is known for the core protein of any such cell surface proteoglycan.

For general background on genetic engineering, see Watson, J.D., The Molecular Biology of the Gene, 4th Ed., Benjamin, Menlo Park, Calif., (1988).

SUMMARY OF THE INVENTION

Accordingly, it is an object of this invention to provide eukaryotic cells capable of providing useful quantities of syndecan and proteins of similar function from multiple species.

It is a further object of this invention to provide a recombinant DNA vector containing a heterologous segment encoding syndecan or a related protein that is capable of being inserted into a microorganism or eukaryotic cell and expressing the encoded protein.

It is still another object of this invention to provide a DNA or RNA segment of defined structure that can be produced synthetically or isolated from natural sources and that can be used in the production of the desired recombinant DNA vectors or that can be used to recover related genes from other sources.

It is yet another object of this invention to provide a peptide that can be produced synthetically in a laboratory or by a microorganism which will mimic the activity of natural syndecan core protein and which can be used to produce proteoglycans and glycosaminoglycans in eukaryotic cells in a reproducible and standardized manner.

These and other objects of the invention as will hereinafter become more readily apparent have been accomplished by providing an isolated peptide having a molecular weight of from about 31 kD to about 35 kD and comprising a hydrophilic amino terminus extracellular region, a hydrophilic carboxy terminus cytoplasmic region, and a hydrophobic transmembrane region between said cytoplasmic and extracellular regions, a dibasic sequence extracellularly adjacent the transmembrane region of the peptide, and at least one glycosylation site in the extracellular region including an Xac-Xaa-Ser-Gly-Xac sequence, wherein Xac is an acidic amino acid and Xaa is any amino acid and wherein said peptide is capable of functioning as a core protein for attachment of a heparan sulfate chain at said Ser.

Particularly preferred are peptides of: (a) a first formula

M-R-R-A-A-L-W-L-W-L-C-A-L-A-L-R-L-Q-P-A-
L-P-Q-I-V-A-V-N-V-P-P-E-D-Q-D-G-S-G-D-D-
S-D-N-F-S-G-S-G-T-G-A-L-P-D-T-L-S-R-Q-T-
P-S-T-W-K-D-V-W-L-L-T-A-T-P-T-A-P-E-P-T-
S-S-N-T-E-T-A-F-T-S-V-L-P-A-G-E-K-P-E-E-
G-E-P-V-L-H-V-E-A-E-P-G-F-T-A-R-D-K-E-K-
E-V-T-T-R-P-R-E-T-V-Q-L-P-I-T-Q-R-A-S-T-
V-R-V-T-T-A-Q-A-A-V-T-S-H-P-H-G-G-M-Q-P-
G-L-H-E-T-S-A-P-T-A-P-G-Q-P-D-H-Q-P-P-R-
V-E-G-G-G-T-S-V-I-K-E-V-V-E-D-G-T-A-N-Q-
L-P-A-G-E-G-S-G-E-Q-D-F-T-F-E-T-S-G-E-N-
T-A-V-A-A-V-E-P-G-L-R-N-Q-P-P-V-D-E-G-A-
T-G-A-S-Q-S-L-L-D-R-K-E-V-L-G-G-V-I-A-G-
G-L-V-G-L-I-F-A-V-C-L-V-A-F-M-L-Y-R-M-K-
K-K-D-E-G-S-Y-S-L-E-E-P-K-Q-A-N-G-G-A-Y-
Q-K-P-T-K-Q-E-E-F-Y-A

wherein A is alanine, C is cysteine, D is aspartate, E is glutamate, F is phenylalanine, G is glycine, H is histidine, I is isoleucine, K is lysine, L is leucine, M is methionine, N is asparagine, P is proline, Q is glutamine, R is arginine, S is serine, T is threonine, V is valine, W is tryptophan, and Y is tyrosine.

(b) a second formula in which 1 to 10 amino acids in said first formula are replaced by different amino acids,

(c) a third formula in which from 1 to 20 amino acids are absent from either the amino terminal, the carboxy terminal, or both terminals of said first formula or said second formula, or

(d) a fourth formula in which from 1 to 10 additional amino acids are attached sequentially to the amino terminal, carboxy terminal, or both terminals of said first formula or said second formula and salts of compounds having said formulas, wherein said peptide retains an Xac-Xaa-Ser-Gly-Xac sequence capable of acting as an attachment site for heparan sulfate chain synthesis.
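By way of illustration only, the Xac-Xaa-Ser-Gly-Xac attachment motif recited above can be located in the first formula by a simple pattern search. The following Python sketch is not part of the disclosed method: the function name `find_attachment_serines` and the use of a regular expression are assumptions made for this example, and the sequence is transcribed from the first formula.

```python
import re

# First formula, transcribed in rows of 20 residues (311 residues in total).
SYNDECAN = (
    "MRRAALWLWLCALALRLQPA" "LPQIVAVNVPPEDQDGSGDD" "SDNFSGSGTGALPDTLSRQT"
    "PSTWKDVWLLTATPTAPEPT" "SSNTETAFTSVLPAGEKPEE" "GEPVLHVEAEPGFTARDKEK"
    "EVTTRPRETVQLPITQRAST" "VRVTTAQAAVTSHPHGGMQP" "GLHETSAPTAPGQPDHQPPR"
    "VEGGGTSVIKEVVEDGTANQ" "LPAGEGSGEQDFTFETSGEN" "TAVAAVEPGLRNQPPVDEGA"
    "TGASQSLLDRKEVLGGVIAG" "GLVGLIFAVCLVAFMLYRMK" "KKDEGSYSLEEPKQANGGAY"
    "QKPTKQEEFYA"
)

# Xac = acidic residue (D or E), Xaa = any residue.
# The lookahead lets overlapping candidate sites be reported as well.
MOTIF = re.compile(r"(?=([DE].SG[DE]))")


def find_attachment_serines(sequence):
    """Return (serine_position, motif) pairs, positions counted from 1."""
    # The serine is the third residue of the five-residue motif.
    return [(m.start() + 3, m.group(1)) for m in MOTIF.finditer(sequence)]


if __name__ == "__main__":
    for position, motif in find_attachment_serines(SYNDECAN):
        print(f"Ser {position}: {motif}")
```

A purely sequence-based search of this kind only flags candidate sites; whether a chain is actually attached at a given serine has to be established experimentally.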

DNA and RNA molecules, recombinant DNA vectors, and modified microorganisms or eukaryotic cells comprising a nucleotide sequence that encodes any of the peptides indicated above are also part of the present invention. In particular, sequences comprising all or part of the following DNA sequence, a complementary DNA or RNA sequence, or a corresponding RNA sequence are especially preferred:

ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC CTCCCGCAAATTGTGGCTGTAAATGTTCCTCCTGAAGATCAGGATGGCTCTGGGGATGAC TCTGACAACTTCTCTGGCTCTGGCACAGGTGCTTTGCCAGATACTTTGTCACGGCAGACA CCTTCCACTTGGAAGGACGTGTGGCTGTTGACAGCCACGCCCACAGCTCCAGAGCCCACC AGCAGCAACACCGAGACTGCTTTTACCTCTGTCCTGCCAGCCGGAGAGAAGCCCGAGGAG GGAGAGCCTGTGCTCCATGTAGAAGCAGAGCCTGGCTTCACTGCTCGGGACAAGGAAAAG GAGGTCACCACCAGGCCCAGGGAGACCGTGCAGCTCCCCATCACCCAACGGGCCTCAACA GTCAGAGTCACCACAGCCCAGGCAGCTGTCACATCTCATCCGCACGGGGGCATGCAACCT GGCCTCCATGAGACCTCGGCTCCCACAGCACCTGGTCAACCTGACCATCAGCCTCCACGT GTGGAGGGTGGCGGCACTTCTGTCATCAAAGAGGTTGTCGAGGATGGAACTGCCAATCAG CTTCCCGCAGGAGAGGGCTCTGGAGAACAAGACTTCACCTTTGAAACATCTGGGGAGAAC ACAGCTGTGGCTGCCGTAGAGCCCGGCCTGCGGAATCAGCCCCCGGTGGACGAAGGAGCC ACAGGTGCTTCTCAGAGCCTTTTGGACAGGAAGGAAGTGCTGGGAGGTGTCATTGCCGGA GGCCTAGTGGGCCTCATCTTTGCTGTGTGCCTGGTGGCTTTCATGCTGTACCGGATGAAG AAGAAGGACGAAGGCAGCTACTCCTTGGAGGAGCCCAAACAAGCCAATGGCGGTGCCTAC CAGAAACCCACCAAGCAGGAGGAGTTCTACGCC.

DNA and RNA molecules containing segments of the larger sequence are also provided for use in carrying out preferred aspects of the invention relating to the production of such peptides by the techniques of genetic engineering and the production of oligonucleotide probes.

BRIEF DESCRIPTION OF THE FIGURES

The accompanying Figures are provided to illustrate the invention but are not considered to be limiting thereof unless so specified.

Figure 1 is a formula showing the cDNA sequence for syndecan and the corresponding amino acid sequence.

Figure 2 is a restriction map showing sequencing strategy of syndecan cDNA clones.

Figure 3 is a table showing potential glycosylation sites of the syndecan core protein and homology of these regions to the glycosylation sites of other proteins.

Figure 4 is a schematic diagram showing different regions of the syndecan core protein.

Figure 5 is a table showing DNA sequence similarities between murine syndecan and human insulin receptor.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Using a library from mouse mammary epithelial cells, we have molecularly cloned and sequenced full length cDNAs for a cell surface proteoglycan matrix receptor and have assessed the expression of its mRNA in various tissues. The 311 amino acid core protein has a unique sequence that contains several structural features consistent with its role as a matrix anchor and as an acceptor of two distinct types of glycosaminoglycan chains. The expression of its mRNA is tissue-type specific, and both the 5' and 3' untranslated regions of its cDNA show substantial sequence homology to those of the human insulin receptor cDNA. This core protein cDNA defines a new class of matrix receptor, an integral membrane proteoglycan, for which we propose the name syndecan (from the Greek, syndein, to bind together).

Using this information, a variety of recombinant DNA vectors capable of providing syndecan in reasonable quantities are provided. Additional recombinant DNA vectors of related structure that code for synthetic proteins having the key structural features identified herein as well as for proteins of the same family from other sources can be produced from the syndecan DNA using standard techniques of recombinant DNA technology. A transformant expressing syndecan has been produced as an example of this technology. The newly discovered sequence and structure information can be used, through transfection of eukaryotic cells, to prepare proteoglycans having cleavage sequences and attachment sites that allow ready production of pure proteoglycans and glycosaminoglycans.

Since there is a known and definite correspondence between amino acids in a peptide and the DNA sequence that codes for the peptide, the DNA sequence of a DNA or RNA molecule coding for syndecan (or any of the modified peptides later discussed) can be used to derive the amino acid sequence (and vice versa, at least to the extent that the degeneracy of the genetic code permits). Such a sequence of nucleotides is shown in Table 1 along with the corresponding amino acid sequence.

TABLE 1

1

ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC

M R R A A L W L W L C A L A L R L Q P A

21

CTCCCGCAAATTGTGGCTGTAAATGTTCCTCCTGAAGATCAGGATGGCTCTGGGGATGAC

L P Q I V A V N V P P E D Q D G S G D D

41

TCTGACAACTTCTCTGGCTCTGGCACAGGTGCTTTGCCAGATACTTTGTCACGGCAGACA

S D N F S G S G T G A L P D T L S R Q T

61

CCTTCCACTTGGAAGGACGTGTGGCTGTTGACAGCCACGCCCACAGCTCCAGAGCCCACC

P S T W K D V W L L T A T P T A P E P T

81

AGCAGCAACACCGAGACTGCTTTTACCTCTGTCCTGCCAGCCGGAGAGAAGCCCGAGGAG

S S N T E T A F T S V L P A G E K P E E

101

GGAGAGCCTGTGCTCCATGTAGAAGCAGAGCCTGGCTTCACTGCTCGGGACAAGGAAAAG

G E P V L H V E A E P G F T A R D K E K

121

GAGGTCACCACCAGGCCCAGGGAGACCGTGCAGCTCCCCATCACCCAACGGGCCTCAACA

E V T T R P R E T V Q L P I T Q R A S T

141

GTCAGAGTCACCACAGCCCAGGCAGCTGTCACATCTCATCCGCACGGGGGCATGCAACCT

V R V T T A Q A A V T S H P H G G M Q P

161

GGCCTCCATGAGACCTCGGCTCCCACAGCACCTGGTCAACCTGACCATCAGCCTCCACGT

G L H E T S A P T A P G Q P D H Q P P R

181

GTGGAGGGTGGCGGCACTTCTGTCATCAAAGAGGTTGTCGAGGATGGAACTGCCAATCAG

V E G G G T S V I K E V V E D G T A N Q

201

CTTCCCGCAGGAGAGGGCTCTGGAGAACAAGACTTCACCTTTGAAACATCTGGGGAGAAC

L P A G E G S G E Q D F T F E T S G E N

221

ACAGCTGTGGCTGCCGTAGAGCCCGGCCTGCGGAATCAGCCCCCGGTGGACGAAGGAGCC

T A V A A V E P G L R N Q P P V D E G A

241

ACAGGTGCTTCTCAGAGCCTTTTGGACAGGAAGGAAGTGCTGGGAGGTGTCATTGCCGGA

T G A S Q S L L D R K E V L G G V I A G

261

GGCCTAGTGGGCCTCATCTTTGCTGTGTGCCTGGTGGCTTTCATGCTGTACCGGATGAAG

G L V G L I F A V C L V A F M L Y R M K

281

AAGAAGGACGAAGGCAGCTACTCCTTGGAGGAGCCCAAACAAGCCAATGGCGGTGCCTAC

K K D E G S Y S L E E P K Q A N G G A Y

301

CAGAAACCCACCAAGCAGGAGGAGTTCTACGCCTGA

Q K P T K Q E E F Y A end

Nucleotide sequence of one strand of syndecan cDNA. The numbers refer to the amino acid sequence and corresponding DNA codon sequence beginning at the amino terminus of the protein. The stop codon is marked "end."
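The codon-by-codon correspondence set out in Table 1 can be checked mechanically. The sketch below is offered only as an illustration and assumes the standard genetic code; the helper name `translate` and the compact construction of the codon table are choices made for this example, not part of the disclosure. It translates the first and last rows of Table 1.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding strand read 5' to 3', stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # termination signal ("end" in Table 1)
            break
        peptide.append(residue)
    return "".join(peptide)


# Row 1 (codons 1-20) and the final row (codons 301-311 plus stop) of Table 1.
ROW_1 = "ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC"
ROW_301 = "CAGAAACCCACCAAGCAGGAGGAGTTCTACGCCTGA"

if __name__ == "__main__":
    print(translate(ROW_1))
    print(translate(ROW_301))
```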

The trinucleotides of Table 1, termed codons, are presented as DNA trinucleotides, as they exist in the genetic material of a living organism. Complementary trinucleotide DNA sequences having opposite strand polarity are functionally equivalent to the codons of Table 1, as is understood in the art. An important and well known feature of the genetic code is its redundancy, whereby, for most of the amino acids used to make proteins, more than one coding nucleotide triplet may be employed. Therefore, a number of different nucleotide sequences may code for a given amino acid sequence. Such nucleotide sequences are considered functionally equivalent since they can result in the production of the same amino acid sequence in all organisms, although certain strains may translate some sequences more efficiently than they do others. Occasionally, a methylated variant of a purine or pyrimidine may be found in a given nucleotide sequence. Such methylations do not affect the coding relationship in any way. The equivalent codons are shown in Table 2 below.

TABLE 2

GENETIC CODE

Alanine (Ala, A)          GCA, GCC, GCG, GCT
Arginine (Arg, R)         AGA, AGG, CGA, CGC, CGG, CGT
Asparagine (Asn, N)       AAC, AAT
Aspartic acid (Asp, D)    GAC, GAT
Cysteine (Cys, C)         TGC, TGT
Glutamic acid (Glu, E)    GAA, GAG
Glutamine (Gln, Q)        CAA, CAG
Glycine (Gly, G)          GGA, GGC, GGG, GGT
Histidine (His, H)        CAC, CAT
Isoleucine (Ile, I)       ATA, ATC, ATT
Leucine (Leu, L)          CTA, CTC, CTG, CTT, TTA, TTG
Lysine (Lys, K)           AAA, AAG
Methionine (Met, M)       ATG
Phenylalanine (Phe, F)    TTC, TTT
Proline (Pro, P)          CCA, CCC, CCG, CCT
Serine (Ser, S)           AGC, AGT, TCA, TCC, TCG, TCT
Threonine (Thr, T)        ACA, ACC, ACG, ACT
Tryptophan (Trp, W)       TGG
Tyrosine (Tyr, Y)         TAC, TAT
Valine (Val, V)           GTA, GTC, GTG, GTT
Termination signal (end)  TAA, TAG, TGA

Key: Each 3-letter triplet represents a trinucleotide of DNA having a 5' end on the left and a 3' end on the right. The letters stand for the purine or pyrimidine bases forming the nucleotide sequence.

A = adenine

G = guanine

C = cytosine

T = thymine

Since the DNA sequence of the gene has been fully identified, it is possible to produce a DNA gene entirely by synthetic chemistry, after which the gene can be inserted into any of the many available DNA vectors using known techniques of recombinant DNA technology. Thus the present invention can be carried out using reagents, plasmids, and microorganisms that are freely available and in the public domain at the time of filing of this patent application. For example, nucleotide sequences greater than 100 bases long could be readily synthesized in 1984 on an Applied Biosystems Model 380A DNA Synthesizer, as evidenced by commercial advertising of the same (e.g., Genetic Engineering News, November/December 1984, p. 3). Such oligonucleotides can readily be spliced using, among others, the techniques described later in this application to produce any nucleotide sequence described herein. For example, relatively short complementary oligonucleotide sequences with 3' or 5' segments that extend beyond the complementary sequences can be synthesized. By producing a series of such short segments with "sticky" ends that hybridize with the next short oligonucleotide, sequential oligonucleotides can be joined together by the use of ligases to produce a longer oligonucleotide that is beyond the reach of direct synthesis.
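The staggered-oligonucleotide approach just described can be sketched in a few lines. The following Python example is an illustration under stated assumptions, not a protocol from this specification: the oligonucleotide length of 20, the stagger of 10, and the names `reverse_complement` and `design_oligos` are all hypothetical choices. It divides a target duplex into top-strand and bottom-strand oligonucleotides whose break points are offset, so that each internal bottom-strand oligonucleotide bridges two adjacent top-strand oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def design_oligos(target, length=20, stagger=10):
    """Split a duplex into top and bottom oligos with offset break points.

    Top-strand oligos tile the target directly. Bottom-strand break points
    are shifted by `stagger`, which leaves single-stranded ("sticky")
    overhangs that pair with the neighbouring oligo before ligation.
    """
    top = [target[i:i + length] for i in range(0, len(target), length)]
    cuts = [0] + list(range(stagger, len(target), length)) + [len(target)]
    bottom = [reverse_complement(target[a:b]) for a, b in zip(cuts, cuts[1:])]
    return top, bottom


# First 60 nucleotides of the coding sequence in Table 1.
TARGET = "ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC"

if __name__ == "__main__":
    top, bottom = design_oligos(TARGET)
    for oligo in top:
        print("top   ", oligo)
    for oligo in bottom:
        print("bottom", oligo)
```

In practice the break points would also be chosen to avoid overhangs that can pair with more than one partner; that check is omitted here.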

Furthermore, automated equipment is also available that makes direct synthesis of any of the peptides disclosed herein readily available. In the same issue of Genetic Engineering News mentioned above, a commercially available automated peptide synthesizer having a coupling efficiency exceeding 99% is advertised (page 34). Such equipment provides ready access to the peptides of the invention, either by direct synthesis or by synthesis of a series of fragments that can be coupled using other known techniques. Recent advances in technology make synthesis of nucleotide sequences and peptides even more readily accessible.

In addition to the specific peptide sequence shown in Table 1, other peptides based on this sequence and representing minor variations thereof will have the biological activity of syndecan. In particular, proteins that lack the amino terminus first 17 amino acids are preferred since the first 17 amino acids appear to represent a signal sequence. Other variations can also be present. For example, up to 20 additional (i.e., not counting the 17-amino-acid leader sequence) amino acids can be absent from either or both terminals of the sequence given without losing ability to act as a core protein for synthesis of proteoglycans. Likewise, up to 10 additional amino acids can be present at either or both terminals. These variations are possible because the sites of glycosylation are located in more central regions of the molecule and the transmembrane region at the carboxy terminus does not need to be the full indicated length in order to be effective. Nevertheless, preferred compounds are those which more closely approach the specific formulas given (or the corresponding sequence that lacks a signal sequence) with 10 or fewer, more preferably 5 or fewer, absent amino acids being preferred for either terminal and 7 or fewer, more preferably 4 or fewer, additional amino acids being preferred for either terminal.
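The terminal variations discussed above can be examined against the sequence of Table 1 with a short script. The sketch below is illustrative only; the helper names `truncate` and `attachment_serines` are assumptions introduced for the example, and the check is purely sequence-based. It removes the 17-residue signal sequence, then removes a further 20 residues from each terminus, and reports which Xac-Xaa-Ser-Gly-Xac serines remain, numbered as in Table 1.

```python
import re

# Amino acid sequence of Table 1 in rows of 20 residues (311 in total).
FULL_LENGTH = (
    "MRRAALWLWLCALALRLQPA" "LPQIVAVNVPPEDQDGSGDD" "SDNFSGSGTGALPDTLSRQT"
    "PSTWKDVWLLTATPTAPEPT" "SSNTETAFTSVLPAGEKPEE" "GEPVLHVEAEPGFTARDKEK"
    "EVTTRPRETVQLPITQRAST" "VRVTTAQAAVTSHPHGGMQP" "GLHETSAPTAPGQPDHQPPR"
    "VEGGGTSVIKEVVEDGTANQ" "LPAGEGSGEQDFTFETSGEN" "TAVAAVEPGLRNQPPVDEGA"
    "TGASQSLLDRKEVLGGVIAG" "GLVGLIFAVCLVAFMLYRMK" "KKDEGSYSLEEPKQANGGAY"
    "QKPTKQEEFYA"
)
SIGNAL_LENGTH = 17  # residues described above as an apparent signal sequence
MOTIF = re.compile(r"(?=[DE].SG[DE])")  # Xac-Xaa-Ser-Gly-Xac


def truncate(peptide, n_term=0, c_term=0):
    """Remove n_term residues from the amino end and c_term from the carboxy end."""
    return peptide[n_term:len(peptide) - c_term]


def attachment_serines(peptide, first_residue=1):
    """Serine positions of the motif, numbered as in the full-length protein."""
    return [m.start() + 2 + first_residue for m in MOTIF.finditer(peptide)]


mature = truncate(FULL_LENGTH, n_term=SIGNAL_LENGTH)
shortest = truncate(mature, n_term=20, c_term=20)

if __name__ == "__main__":
    print(len(mature), attachment_serines(mature, SIGNAL_LENGTH + 1))
    print(len(shortest), attachment_serines(shortest, SIGNAL_LENGTH + 21))
```

Under this check, the largest amino-terminal deletion removes the motif at Ser 37 while the motifs at Ser 207 and Ser 217 are retained, which is one reason the smaller deletions are described as preferred.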

Within the central portion of the molecule, replacement of amino acids is more restricted in order that biological activity can be maintained. However, minor variations of the previously mentioned peptides and DNA molecules are also contemplated as being equivalent to those peptides and DNA molecules that are set forth in more detail, as will be appreciated by those skilled in the art. For example, it is reasonable to expect that an isolated replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid will not have a major effect on the biological activity of the resulting molecule, especially if the replacement does not involve an amino acid at one of the glycosylation or cleavage sites. Conservative replacements are those that take place within a family of amino acids that are related in their side chains. Genetically encoded amino acids are generally divided into four families: (1) acidic = aspartate, glutamate; (2) basic = lysine, arginine, histidine; (3) nonpolar = alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan; and (4) uncharged polar = glycine, asparagine, glutamine, cysteine, serine, threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified jointly as aromatic amino acids. Whether a change results in a functioning peptide can readily be determined by assessing the ability of the corresponding DNA coding for this peptide to produce this peptide in glycosylated form when introduced into eukaryotic cells. Examples of this process are described later in detail. If attachment of glycosaminoglycan chains occurs, the replacement is immaterial, and the molecule being tested is equivalent to those specifically described above. Peptides in which more than one replacement has taken place can readily be tested in the same manner. The number of replacements is not strictly limited, but 10 or fewer are preferred.
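The four side-chain families listed above lend themselves to a simple lookup. The following sketch is illustrative only; the names `FAMILIES` and `is_conservative` are introduced for this example, and the one-letter code C is used for the sulfur-containing member of the uncharged polar family. It reports whether a proposed replacement stays within a single family. It does not predict biological activity, which the text above says is to be determined by expression in eukaryotic cells.

```python
# Families as listed in the paragraph above, in one-letter code.
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "nonpolar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}
FAMILY_OF = {
    residue: name for name, members in FAMILIES.items() for residue in members
}


def is_conservative(original, replacement):
    """True when both residues differ but belong to the same family."""
    return (
        original != replacement
        and FAMILY_OF[original] == FAMILY_OF[replacement]
    )


if __name__ == "__main__":
    for pair in [("L", "I"), ("D", "E"), ("T", "S"), ("K", "D")]:
        print(pair, is_conservative(*pair))
```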

DNA molecules that code for such peptides can readily be determined from the list of codons in Table 2 and are likewise contemplated as being equivalent to the DNA sequence of Table 1. In fact, since there is a fixed relationship between DNA codons and amino acids in a peptide, any discussion in this application of a replacement or other change in a peptide is equally applicable to the corresponding DNA sequence or to the DNA molecule, recombinant vector, transformed microorganism, or transfected eukaryotic cells in which the sequence is located (and vice versa). Codons can be chosen for use in a particular host organism in accordance with the frequency with which a particular codon is utilized by that host, if desired, to increase the rate at which expression of the peptide occurs. In addition to the specific nucleotides listed in Table 1, DNA (or corresponding RNA) molecules of the invention can have additional nucleotides preceding or following those that are specifically listed. For example, poly A can be added to the 3'-terminal, a short (e.g., fewer than 20 nucleotides) sequence can be added to either terminal to provide a terminal sequence corresponding to a restriction endonuclease site, stop codons can follow the peptide sequence to terminate translation, and the like. Additionally, DNA molecules containing a promoter region or other control region upstream from the gene can be produced. All DNA molecules containing the sequences of the invention will be useful for at least one purpose since all can minimally be fragmented to produce oligonucleotide probes and be used in the isolation of additional DNA from biological sources. RNA molecules are said to correspond to DNA molecules if they encode the same amino acids and/or control sequences.
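To give a sense of how many equivalent DNA sequences the degeneracy of the code allows, and of how a host-specific codon choice might be applied, a minimal sketch follows. It assumes the standard genetic code; the function names and the `HYPOTHETICAL_PREFERENCE` table are invented for illustration and do not describe any real host organism.

```python
from itertools import product
from math import prod

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Invert the table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


def back_translate(peptide, preferred):
    """Pick one codon per residue, using `preferred` where it has an entry."""
    return "".join(preferred.get(aa, SYNONYMS[aa][0]) for aa in peptide)


# Illustrative preference table only; real tables come from host codon usage data.
HYPOTHETICAL_PREFERENCE = {"G": "GGC", "S": "AGC"}

if __name__ == "__main__":
    print(count_encodings("DGSGD"))
    print(back_translate("DGSGD", HYPOTHETICAL_PREFERENCE))
```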

Peptides of the invention can be prepared for the first time as purified preparations, either by direct synthesis or by using a cloned gene as described herein. By "purified" is meant, when referring to a peptide or DNA or RNA sequence, that the indicated molecule is present in the substantial absence of other biological macromolecules of the same type. The term "purified" as used herein preferably means at least 95% by weight, more preferably at least 99% by weight, and most preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000, can be present). The term "pure" as used herein preferably has the same numerical limits as "purified" immediately above. The term "isolated" as used herein refers to a peptide, DNA, or RNA molecule separated not only from other peptides, DNAs, or RNAs, respectively, that are present in the natural source of the macromolecule but also from other macromolecules and preferably refers to a macromolecule found in the presence of (if anything) only a solvent, buffer, ion or other low molecular weight component normally present in a solution of the same. "Isolated" and "purified" do not encompass either natural materials in their native state or natural materials that have been separated into components (e.g., in an acrylamide gel) but not obtained either as pure substances or as solutions.

Two protein sequences (or peptides derived from them of at least 30 amino acids in length) are homologous (as this term is preferably used in this specification) if they have an alignment score of >5 (in standard deviation units) using the program ALIGN with the mutation data matrix and a gap penalty of 6 (or greater). See Dayhoff, M.O., in Atlas of Protein Sequence and Structure, 1972, volume 5, National Biomedical Research Foundation, pp. 101-110, and Supplement 2 to this volume, pp. 1-10. The two sequences (or parts thereof, probably at least 30 amino acids in length) are more preferably homologous if their amino acids are greater than or equal to 50% identical when optimally aligned using the ALIGN program mentioned above. Two DNA sequences (or a DNA and RNA sequence) are homologous if they hybridize to one another using nitrocellulose filter hybridization (one sequence bound to the filter, the other as a ³²P-labeled probe) using hybridization conditions of 40-50% formamide, 37°-42° C, 4x SSC and wash conditions (after several room temperature washes with 2x SSC, 0.05% SDS) of stringency equivalent to 37° C with 1x SSC, 0.05% SDS. A number of preferred hybridization conditions are set forth in the examples that follow.
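The 50% identity criterion can be evaluated once two sequences have been aligned. The sketch below does not reproduce the ALIGN program or its scoring in standard deviation units; it only computes percent identity for two sequences that are assumed to be already aligned, with "-" marking gaps. Counting identity over the columns in which both sequences have a residue is an assumption of this example, since conventions for the denominator differ, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Two of the Xac-Xaa-Ser-Gly-Xac motifs from Table 1, trivially aligned.
    print(percent_identity("DGSGD", "EGSGE"))
```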

The phrase "replaced by" or "replacement" as used herein does not necessarily refer to any action that must take place but to the peptide that exists when an indicated "replacement" amino acid is present in the same position as the amino acid indicated to be present in a different formula (e.g., when leucine is present at position 5 instead of isoleucine). Salts of any of the macromolecules described herein will naturally occur when such molecules are present in (or isolated from) aqueous solutions of various pHs. All salts of peptides and other macromolecules having the indicated biological activity are considered to be within the scope of the present invention. Examples include alkali, alkaline earth, and other metal salts of carboxylic acid residues, acid addition salts (e.g., HCl) of amino residues, and zwitterions formed by reactions between carboxylic acid and amino residues within the same molecule.

Hydrophobic and hydrophilic regions can be determined by standard procedures from amino acid sequences, for example by plotting hydrophobicity according to the procedure of Kyte and Doolittle, J. Mol. Biol. (1982) 157: 105-132. Plotted values averaged over groups of seven contiguous residues that are positive indicate hydrophobic regions, while negative values indicate hydrophilic regions.
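The seven-residue averaging described above can be sketched as follows. The hydropathy values are those of the Kyte and Doolittle scale; the function name `hydropathy_profile` and the choice to analyse only residues 241 to 300 of Table 1 are assumptions made to keep the example short.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Average hydropathy of each run of `window` contiguous residues."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Residues 241-300 of Table 1.
FRAGMENT_START = 241
FRAGMENT = (
    "TGASQSLLDRKEVLGGVIAG" "GLVGLIFAVCLVAFMLYRMK" "KKDEGSYSLEEPKQANGGAY"
)

if __name__ == "__main__":
    profile = hydropathy_profile(FRAGMENT)
    for offset, value in enumerate(profile):
        first = FRAGMENT_START + offset
        label = "hydrophobic" if value > 0 else "hydrophilic"
        print(f"{first}-{first + 6}: {value:+.2f} {label}")
```

In this fragment the windows spanning the stretch around residues 262 to 268 come out positive, while the windows that begin at the basic residues following it come out negative.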

The invention has specifically contemplated each and every possible variation of peptide or nucleotide that could be made by selecting combinations based on the possible amino acid and codon choices listed in Table 1 and Table 2, and all such variations are to be considered as being specifically disclosed.

In a preferred embodiment of the invention, genetic information encoded as mRNA is obtained from cultured epithelial cells, preferably from mammalian sources, and used in the construction of a DNA gene, which is in turn used to produce a peptide of the invention. An initial crude cell suspension is sonicated or otherwise treated to disrupt cell membranes so that a crude cell extract is obtained. Known techniques of biochemistry (e.g., preferential precipitation of proteins) can be used for initial purification if desired. The crude cell extract, or a partially purified RNA portion therefrom, is then treated to further separate the RNA. For example, crude cell extract can be layered on top of a 5 ml cushion of 5.7 M CsCl, 10 mM Tris-HCl, pH 7.5, 1 mM EDTA in a 1 in. x 3 1/2 in. nitrocellulose tube and centrifuged in an SW27 rotor (Beckman Instruments Corp., Fullerton, Calif.) at 27,000 rpm for 16 hrs at 15°C. After centrifugation, the tube contents are decanted, the tube is drained, and the bottom 1/2 cm containing the clear RNA pellet is cut off with a razor blade. The pellets are transferred to a flask and dissolved in 20 ml 10 mM Tris-HCl, pH 7.5, 1 mM EDTA, 5% sarcosyl and 5% phenol. The solution is then made 0.1 M in NaCl and shaken with 40 ml of a 1:1 phenol:chloroform mixture. RNA is precipitated from the aqueous phase with ethanol in the presence of 0.2 M Na-acetate pH 5.5 and collected by centrifugation. Any other method of isolating RNA from a cellular source may be used instead of this method.

Various forms of RNA may be employed such as polyadenylated, crude or partially purified messenger RNA, which may be heterogeneous in sequence and in molecular size. The selectivity of the RNA isolation procedure is enhanced by any method which results in an enrichment of the desired mRNA in the heterodisperse population of mRNA isolated. Any such prepurification method may be employed in preparing a gene of the present invention, provided that the method does not introduce endonucleolytic cleavage of the mRNA.

Prepurification to enrich for desired mRNA sequences may also be carried out using conventional methods for fractionating RNA, after its isolation from the cell. Any technique which does not result in degradation of the RNA may be employed. The techniques of preparative sedimentation in a sucrose gradient and gel electrophoresis are especially suitable.

The mRNA must be isolated from the source cells under conditions which preclude degradation of the mRNA. The action of RNase enzymes is particularly to be avoided because these enzymes are capable of hydrolytic cleavage of the RNA nucleotide sequence. A suitable method for inhibiting RNase during extraction from cells involves the use of 4 M guanidinium thiocyanate and 1 M mercaptoethanol during the cell disruption step. In addition, a low temperature and a pH near 5.0 are helpful in further reducing RNase degradation of the isolated RNA.

Generally, mRNA is prepared essentially free of contaminating protein, DNA, polysaccharides and lipids. Standard methods are well known in the art for accomplishing such purification. RNA thus isolated contains non-messenger as well as messenger RNA. A convenient method for separating the mRNA of eukaryotes is chromatography on columns of oligo-dT cellulose, or other oligonucleotide-substituted column material such as poly-U or poly-T Sepharose, taking advantage of the hydrogen bonding specificity conferred by the presence of polyadenylic acid on the 3' end of eukaryotic mRNA. Hybridization with oligonucleotide probes prepared from DNA sequences set forth in this specification can then be used to isolate the particularly desired mRNA.

The next step in most methods is the formation of DNA complementary to the isolated heterogeneous sequences of mRNA. The enzyme of choice for this reaction is reverse transcriptase, although in principle any enzyme capable of forming a faithful complementary DNA copy of the mRNA template could be used. The reaction may be carried out under conditions described in the prior art, using mRNA as a template and a mixture of the four deoxynucleoside triphosphates, dATP, dGTP, dCTP, and dTTP, as precursors for the DNA strand. It is convenient to provide that one of the deoxynucleoside triphosphates be labeled with a radioisotope, for example ³²P in the alpha position, in order to monitor the course of the reaction, to provide a tag for recovering the product after separation procedures such as chromatography and electrophoresis, and for the purpose of making quantitative estimates of recovery. The cDNA transcripts produced by the reverse transcriptase reaction are somewhat heterogeneous with respect to sequences at the 5' end and the 3' end due to variations in the initiation and termination points of individual transcripts, relative to the mRNA template. The variability at the 5' end is thought to be due to the fact that the oligo-dT primer used to initiate synthesis is capable of binding at a variety of loci along the polyadenylated region of the mRNA. Synthesis of the cDNA transcript begins at an indeterminate point in the poly-A region, and a variable length of the poly-A region is transcribed depending on the initial binding site of the oligo-dT primer. It is possible to avoid this indeterminacy by the use of a primer containing, in addition to an oligo-dT tract, one or two nucleotides of the RNA sequence itself, thereby producing a primer which will have a preferred and defined binding site for initiating the transcription reaction.

The indeterminacy at the 3'-end of the cDNA transcript is due to a variety of factors affecting the reverse transcriptase reaction, and to the possibility of partial degradation of the RNA template. The isolation of specific cDNA transcripts of maximal length is greatly facilitated if conditions for the reverse transcriptase reaction are chosen which not only favor full length synthesis but also repress the synthesis of small DNA chains. Preferred reaction conditions for avian myeloblastosis virus reverse transcriptase are given in the examples section of U.S. Patent 4,363,877 and are herein incorporated by reference. The specific parameters which may be varied to provide maximal production of long-chain DNA transcripts of high fidelity are reaction temperature, salt concentration, amount of enzyme, concentration of primer relative to template, and reaction time. The conditions of temperature and salt concentration are chosen so as to optimize specific base-pairing between the oligo-dT primer and the polyadenylated portion of the RNA template. Under properly chosen conditions, the primer will be able to bind at the polyadenylated region of the RNA template, but non-specific initiation due to primer binding at other locations on the template, such as short, A-rich sequences, will be substantially prevented. The effects of temperature and salt are interdependent. Higher temperatures and low salt concentrations decrease the stability of specific base-pairing interactions. The reaction time is kept as short as possible, in order to prevent non-specific initiations and to minimize the opportunity for degradation. Reaction times are interrelated with temperature, lower temperatures requiring longer reaction times. At 42°C, reactions ranging from 1 to 10 minutes are suitable. The primer should be present in 50- to 500-fold molar excess over the RNA template and the enzyme should be present in similar molar excess over the RNA template. The use of excess enzyme and primer enhances initiation and cDNA chain growth so that long-chain cDNA transcripts are produced efficiently within the confines of the short incubation times.

In many cases, it will be possible to further purify the cDNA using single-stranded cDNA sequences transcribed from mRNA. However, as discussed below, there may be instances in which the desired restriction enzyme is one which acts only on double-stranded DNA. In these cases, the cDNA prepared as described above may be used as a template for the synthesis of double-stranded DNA, using a DNA polymerase such as reverse transcriptase and a nuclease capable of hydrolyzing single-stranded DNA. Methods for preparing double-stranded DNA in this manner have been described in the prior art. See, for example, Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W.J. and Goodman, H.M., Science (1977) 196:1313. If desired, the cDNA can be purified further by the process of U.S. Patent 4,363,877, although this is not essential. In this method, heterogeneous cDNA, prepared by transcription of heterogeneous mRNA sequences, is treated with one or two restriction endonucleases. The choice of endonuclease to be used depends in the first instance upon a prior determination that recognition sites for the enzyme exist in the sequence of the cDNA to be isolated. The method depends upon the existence of two such sites. If the sites are identical, a single enzyme will be sufficient. The desired sequence will be cleaved at both sites, eliminating size heterogeneity as far as the desired cDNA sequence is concerned, and creating a population of molecules, termed fragments, containing the desired sequence and homogeneous in length. If the restriction sites are different, two enzymes will be required in order to produce the desired homogeneous length fragments.

The choice of restriction enzyme(s) capable of producing an optimal length nucleotide sequence fragment coding for all or part of the desired protein must be made empirically. If the amino acid sequence of the desired protein is known, it is possible to compare the nucleotide sequence of uniform length nucleotide fragments produced by restriction endonuclease cleavage with the amino acid sequence for which it codes, using the known relationship of the genetic code common to all forms of life. A complete amino acid sequence for the desired protein is not necessary, however, since a reasonably accurate identification may be made on the basis of a partial sequence. Where the amino acid sequence of the desired protein is not known, the uniform length polynucleotides produced by restriction endonuclease cleavage may be used as probes capable of identifying the synthesis of the desired protein in an appropriate in vitro protein synthesizing system. Alternatively, the mRNA may be purified by affinity chromatography. Other techniques which may be suggested to those skilled in the art will be appropriate for this purpose.
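The comparison of a fragment's nucleotide sequence with a known partial amino acid sequence can be illustrated with a minimal sketch. It assumes the standard genetic code and a DNA-alphabet (T rather than U) sequence; the function names `translate` and `frames_matching`, and the short sequences in the usage note, are hypothetical and are not taken from this specification.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in the given reading frame (0, 1 or 2).

    Stop codons are written as '*'; codons with unknown bases as 'X'.
    """
    dna = dna.upper()
    residues = []
    for i in range(frame, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


def frames_matching(dna, partial_peptide):
    """Return the forward reading frames whose translation contains the peptide."""
    return [f for f in range(3) if partial_peptide in translate(dna, f)]
```

For example, `frames_matching("CATGGAAGATTCTGGTGAAT", "DSGE")` identifies the reading frame in which an illustrative fragment encodes the illustrative partial peptide Asp-Ser-Gly-Glu. A fuller comparison would also examine the complementary strand.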

The number of restriction enzymes suitable for use depends upon whether single-stranded or double-stranded cDNA is used. The preferred enzymes are those capable of acting on single-stranded DNA, which is the immediate reaction product of mRNA reverse transcription. The number of restriction enzymes now known to be capable of acting on single-stranded DNA is limited. The enzymes HaeIII, HhaI and Hin(f)I are presently known to be suitable. In addition, the enzyme MboII may act on single-stranded DNA. Where further study reveals that other restriction enzymes can act on single-stranded DNA, such other enzymes may appropriately be included in the list of preferred enzymes. Additional suitable enzymes include those specified for double-stranded cDNA. Such enzymes are not preferred since additional reactions are required in order to produce double-stranded cDNA, providing increased opportunities for the loss of longer sequences and for other losses due to incomplete recovery. The use of double-stranded cDNA presents the additional technical disadvantage that subsequent sequence analysis is more complex and laborious. For these reasons, single-stranded cDNA is preferred, but the use of double-stranded DNA is feasible. In fact, the present invention was initially reduced to practice using double-stranded cDNA.

The cDNA prepared for restriction endonuclease treatment may be radioactively labeled so that it may be detected after subsequent separation steps. A preferred technique is to incorporate a radioactive label such as ³²P in the alpha position of one of the four deoxynucleoside triphosphate precursors. Highest activity is obtained when the concentration of radioactive precursor is high relative to the concentration of the non-radioactive form. However, the total concentration of any deoxynucleoside triphosphate should be greater than 30 µM, in order to maximize the length of cDNA obtained in the reverse transcriptase reaction. See Efstratiadis, A., Maniatis, T., Kafatos, F.C., Jeffrey, A., and Vournakis, J.N., Cell (1975) 4:367. For the purpose of determining the nucleotide sequence of cDNA, the 5' ends may be conveniently labeled with ³²P in a reaction catalyzed by the enzyme polynucleotide kinase. See Maxam, A.M. and Gilbert, W., Proc. Natl. Acad. Sci. USA (1977) 74:560.

Fragments which have been produced by the action of a restriction enzyme or combination of two restriction enzymes may be separated from each other and from heterodisperse sequences lacking recognition sites by any appropriate technique capable of separating polynucleotides on the basis of differences in length. Such methods include a variety of electrophoretic techniques and sedimentation techniques using an ultracentrifuge. Gel electrophoresis is preferred because it provides the best resolution on the basis of polynucleotide length. In addition, the method readily permits quantitative recovery of separated materials. Convenient gel electrophoresis methods have been described by Dingman, C.W., and Peacock, A.C., Biochemistry (1968) 7:659, and by Maniatis, T., Jeffrey, A. and van de Sande, H., Biochemistry (1975) 14:3787.

Prior to restriction endonuclease treatment, cDNA transcripts obtained from most sources will be found to be heterodisperse in length. By the action of a properly chosen restriction endonuclease, or pair of endonucleases, polynucleotide chains containing the desired sequence will be cleaved at the respective restriction sites to yield polynucleotide fragments of uniform length. Upon gel electrophoresis, these will be observed to form a distinct band. Depending on the presence or absence of restriction sites on other sequences, other discrete bands may be formed as well, which will most likely be of different length than that of the desired sequence. Therefore, as a consequence of restriction endonuclease action, the gel electrophoresis pattern will reveal the appearance of one or more discrete bands, while the remainder of the cDNA will continue to be heterodisperse. In the case where the desired cDNA sequence comprises the major polynucleotide species present, the electrophoresis pattern will reveal that most of the cDNA is present in the discrete band.
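The way two restriction sites convert a heterodisperse population into fragments of uniform length can be sketched as follows. This is a simplified illustration under stated assumptions: the cut is placed at the start of each recognition sequence (real enzymes cut at a defined position within or near the site), the sequences are hypothetical, and `internal_fragment` is an illustrative helper, not a procedure from this specification. GGCC is used because it is the recognition sequence of HaeIII.

```python
def internal_fragment(seq, site):
    """Return the segment between the first two occurrences of `site`.

    Simplification: each cut is placed at the start of the recognition
    sequence. Returns None if the sequence has fewer than two sites.
    """
    first = seq.find(site)
    if first == -1:
        return None
    second = seq.find(site, first + len(site))
    if second == -1:
        return None
    return seq[first:second]


# Hypothetical cDNA transcripts: the same core sequence with two sites,
# flanked by ends of different lengths (a heterodisperse population).
SITE = "GGCC"
CORE = "GGCC" + "ATGACTTTAGCA" + "GGCC"
transcripts = ["TTT" + CORE + "AAAAA", "T" + CORE + "A", CORE + "AAAAAAAAAA"]

transcript_lengths = [len(t) for t in transcripts]
fragments = [internal_fragment(t, SITE) for t in transcripts]
fragment_lengths = [len(f) for f in fragments]
```

Although the three transcripts differ in length, the fragment excised between the two sites has the same length in every case, which is why the desired sequence appears as a discrete band on the gel.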

Although it is unlikely that two different sequences will be cleaved by restriction enzymes to yield fragments of essentially similar length, a method for determining the purity of the defined length fragments is desirable. Sequence analysis of the electrophoresis band may be used to detect impurities representing 10% or more of the material in the band. A method for detecting lower levels of impurities has been developed founded upon the same general principles applied in the initial isolation method. The method requires that the desired nucleotide sequence fragment contain a recognition site for a restriction endonuclease not employed in the initial isolation. Treatment of polynucleotide material, eluted from a gel electrophoresis band, with a restriction endonuclease capable of acting internally upon the desired sequence will result in cleavage of the desired sequence into two sub-fragments, most probably of unequal length. These sub-fragments upon electrophoresis will form two discrete bands at positions corresponding to their respective lengths, the sum of which will equal the length of the polynucleotide prior to cleavage. Contaminants in the original band that are not susceptible to the restriction enzyme may be expected to migrate to the original position. Contaminants containing one or more recognition sites for the enzyme may be expected to yield two or more sub-fragments. Since the distribution of recognition sites is believed to be essentially random, the probability that a contaminant will also yield sub-fragments of the same size as those of the fragment of desired sequence is extremely low. The amount of material present in any band of radioactively labeled polynucleotide can be determined by quantitative measurement of the amount of radioactivity present in each band, or by any other appropriate method. 
A quantitative measure of the purity of the fragments of desired sequence can be obtained by comparing the relative amounts of material present in those bands representing sub-fragments of the desired sequence with the total amount of material. Following the foregoing separation or any other technique that isolates the desired gene, the sequence may be reconstituted. The enzyme DNA ligase, which catalyzes the end-to-end joining of DNA fragments, may be employed for this purpose. The gel electrophoresis bands representing the sub-fragments of the desired sequence may be separately eluted and combined in the presence of DNA ligase, under the appropriate conditions. See Sgaramella, V., Van de Sande, J.H., and Khorana, H.G., Proc. Natl. Acad. Sci. USA (1970) 67:1468. Where the sequences to be joined are not blunt-ended, the ligase obtained from E. coli may be used; Modrich, P., and Lehman, I.R., J. Biol. Chem. (1970) 245:3626. The efficiency of reconstituting the original sequence from sub-fragments produced by restriction endonuclease treatment will be greatly enhanced by the use of a method for preventing reconstitution in improper sequence. This unwanted result is prevented by treatment of the homogeneous length cDNA fragment of desired sequence with an agent capable of removing the 5'-terminal phosphate groups on the cDNA prior to cleavage of the homogeneous cDNA with a restriction endonuclease. The enzyme alkaline phosphatase is preferred. The 5'-terminal phosphate groups are a structural prerequisite for the subsequent joining action of DNA ligase used for reconstituting the cleaved sub-fragments. Therefore, ends which lack a 5'-terminal phosphate cannot be covalently joined. The DNA sub-fragments can only be joined at the ends containing a 5'-phosphate generated by the restriction endonuclease cleavage performed on the isolated DNA fragment.
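The purity estimate described earlier in this passage, in which the radioactivity in the sub-fragment bands is compared with the total, reduces to a simple ratio. The sketch below assumes radioactivity counts have been measured for each band after the second digestion; the band names and count values are hypothetical.

```python
def purity_from_band_counts(band_counts, desired_bands):
    """Fraction of total radioactivity found in the desired sub-fragment bands."""
    total = sum(band_counts.values())
    if total <= 0:
        raise ValueError("total radioactivity must be positive")
    desired = sum(band_counts[name] for name in desired_bands)
    return desired / total


def lengths_consistent(parent_length, subfragment_lengths):
    """Check that the sub-fragment lengths add up to the uncut fragment length."""
    return sum(subfragment_lengths) == parent_length


# Hypothetical counts per band after cleavage with the second enzyme.
counts = {"sub_a": 6000, "sub_b": 3000, "uncut": 600, "other": 400}
purity = purity_from_band_counts(counts, ["sub_a", "sub_b"])
```

With these illustrative numbers, 9,000 of 10,000 counts fall in the two sub-fragment bands, giving an estimated purity of 0.9.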

The majority of cDNA transcripts, under the conditions described above, are derived from the mRNA region containing the 5'-end of the mRNA template by specifically priming on the same template with a fragment obtained by restriction endonuclease cleavage. In this way, the above-described method may be used to obtain not only fragments of specific nucleotide sequence related to a desired protein, but also the entire nucleotide sequence coding for the protein of interest. Double-stranded, chemically synthesized oligonucleotide linkers, containing the recognition sequence for a restriction endonuclease, may be attached to the ends of the isolated cDNA, to facilitate subsequent enzymatic removal of the gene portion from the vector DNA. See Scheller et al., Science (1977) 196:177. The vector DNA is converted from a continuous loop to a linear form by treatment with an appropriate restriction endonuclease. The ends thereby formed are treated with alkaline phosphatase to remove 5'-phosphate end groups so that the vector DNA may not reform a continuous loop in a DNA ligase reaction without first incorporating a segment of the syndecan DNA. The cDNA, with attached linker oligonucleotides, and the treated vector DNA are mixed together with a DNA ligase enzyme, to join the cDNA to the vector DNA, forming a continuous loop of recombinant vector DNA, having the cDNA incorporated therein. Where a plasmid vector is used, usually the closed loop will be the only form able to transform a bacterium. Transformation, as is understood in the art and used herein, is the term used to denote the process whereby a microorganism incorporates extracellular DNA and reproduces it stably from generation to generation. Plasmid DNA in the form of a closed loop may be so incorporated under appropriate environmental conditions. The incorporated closed loop plasmid undergoes replication in the transformed cell, and the replicated copies are distributed to progeny cells when cell division occurs.
As a result, a new cell line is established, containing the plasmid and carrying the genetic determinants thereof. Transformation by a plasmid in this manner, where the plasmid genes are maintained in the cell line by plasmid replication, occurs at high frequency when the transforming plasmid DNA is in closed loop form, and does not or rarely occurs if linear plasmid DNA is used. Once a recombinant vector has been made, transformation of a suitable microorganism is a straightforward process, and novel microorganism strains containing the syndecan gene or a related gene may readily be isolated, using appropriate selection techniques as is understood in the art.

Using these general techniques specifically as set forth in the following examples, we have isolated cDNA clones encoding the syndecan polypeptide from a normal mouse mammary gland epithelial cell line as well as mouse liver tissue. The cDNA derived protein sequence of syndecan is unique; comparisons with the National Biomedical Research Foundation and the translated NIH GenBank databases detected no statistically significant similarities. The nascent polypeptide sequence is 311 amino acids and has a molecular mass of 32,868 daltons. Treatment of syndecan with heparitinase and chondroitinase ABC generates a protein with relative mobility of ca. 69k daltons versus globular molecular weight markers on a gradient SDS-PAGE system. Treatment of the ectodomain with anhydrous HF for 1.5 hrs at 0°C, Mort, A.J. and Lamport, D.T.A., Anal. Biochem. (1977) 82: 289-309, yields a protein that migrates as a broad band at ca. 46k daltons, Weitzhandler, M., Streeter, H.B., Henzel, W.J., and Bernfield, M., J. Biol. Chem. (1988) 263: 6949-6952. These core protein sizes as measured by SDS-PAGE are larger than would be predicted based on the cDNA and any incompletely removed carbohydrate. This anomaly appears to be a charge effect and has been seen in other proteins rich in proline, alanine, and highly charged amino acids. Syndecan is not a disulfide cross-linked dimer. Its migration on SDS-PAGE is unchanged following DTT treatment; its CNBr-cleavage product produces a single signal during amino acid sequencing; and its single cysteine in the predicted mature protein is located in the putative transmembrane domain. It also does not appear to be cross-linked by lysyl oxidase- or transglutaminase-mediated reactions because β-aminopropionitrile and monodansylcadaverine treatments of NMuMG cells do not change its mobility on SDS-PAGE.
Proteins with regions rich in proline, alanine and highly charged amino acids have highly extended conformations and anomalously slow mobilities in SDS-PAGE, Guest, J.R., Lewis, H.M., Graham, L.D., Packman, L.C., and Perham, R.N., J. Mol. Biol. (1985) 185: 743-754. These amino acids are abundant in syndecan, and a Chou and Fasman secondary structure prediction is consistent with large regions of extended conformation. In vitro translation of synthetic mRNA corresponding to the coding region of syndecan (SacI-HindIII fragment of clone 4-19b) produces a nascent polypeptide of ca. 45k daltons. Therefore, while we have not excluded the possibility of other post-translational modifications, the bulk of the size difference probably reflects anomalous gel migration on SDS-PAGE. The amino acid sequence derived from the syndecan cDNA shows three functional domains: an extracellular domain and, by inference, transmembrane and cytoplasmic domains.

The transmembrane domain was inferred from the physical properties of syndecan. The derived C-terminal sequence of syndecan contains both a characteristic transmembrane domain (amino acids 253 to 277 in Table 1) and a 34 amino acid putative cytoplasmic domain. The cytoplasmic domain was inferred from properties already known for purified syndecan indicating that syndecan associates with the actin cytoskeleton. An immune serum generated against a synthetic peptide from the C-terminus of the derived protein sequence reacts with native syndecan extracted from NMuMG cells but not with the ectodomain, providing direct evidence for the cytoplasmic domain. The ectodomain of syndecan is released from NMuMG cell surfaces during cell culture, rapidly in response to cell rounding, or by mild trypsin treatment. The putative extracellular domain of syndecan contains a single dibasic site near the plasma membrane at which cleavage of syndecan from the cell surface undoubtedly occurs. Because the endogenously shed ectodomain of syndecan is indistinguishable from the trypsin-released form, a cell surface trypsin-like protease has been proposed. Shedding during cell culture is from the apical surface. However, when these cells are released from the substratum, destroying their polarity, the ectodomain is rapidly shed. These previously known results suggest that a cell surface protease is involved, but the structure of the site was not known. Identification of the putative cleavage site by the present invention will now allow more detailed investigation of this activity and will allow production of modified proteoglycans and other proteins that can be readily cleaved to release their extracellular regions for ready purification.

Syndecan isolated from several sources is a hybrid proteoglycan, containing both chondroitin sulfate and heparan sulfate. These chains are known to be linked via a xyloside to serine residues in proteins, Roden, L., The Biochemistry of Glycoproteins and Proteoglycans (1980) 267-371 and Dorfman, A., Cell Biology of Extracellular Matrix (1981) 115-138. Regulating the elaboration of both chondroitin sulfate and heparan sulfate chains on the same core protein is a significant problem because the initial four saccharides are identical. The synthesis of both types of chains is initiated by a xylosyltransferase that resides in either the endoplasmic reticulum or the Golgi, see Farquhar, M.G., Ann. Rev. Cell Biol. (1985) 1: 447-488, and by three Golgi-localized glycosyltransferases, Geetha-Habib, M., Campbell, S.C., Schwartz, N.B., J. Biol. Chem. (1984) 259: 7300-7310. Specific chain elongation subsequently involves the sequential action of an N-acetylgalactosaminyltransferase and a glucuronosyltransferase for chondroitin sulfate, and an N-acetylglucosaminyltransferase and a glucuronosyltransferase for heparan sulfate. This specific chain elongation must involve recognition of unique structural features of the core protein, indicating that distinct peptide sequences might exist at chondroitin sulfate versus heparan sulfate attachment sites. The presence of both chondroitin sulfate and heparan sulfate on syndecan provides the opportunity to assess the relationship between these attachment sites. Based on the core protein sequence of three chondroitin sulfate proteoglycans, PG-19, PG-40, and invariant chain, and the reactivity of a xylosyltransferase with synthetic peptides, Bourdon, M.A., Krusius, T., Campbell, S., Schwartz, N.B., and Ruoslahti, E., Proc. Natl. Acad. Sci. USA (1987) 84: 3194-3198, proposed that the xylose acceptor sequence for chondroitin sulfate in these proteoglycans is acidic-acidic-Xaa-Ser-Gly-Xaa-Gly. Syndecan contains five Ser-Gly sequences; the two in its single Ser-Gly-Ser-Gly repeat closely match this previously proposed acceptor sequence (Figure 3A). Interestingly, although this consensus acceptor sequence is located near the N-terminus of syndecan and near the C-terminus of invariant chain, it is distant from the plasma membrane on both proteins.

Syndecan contains three potential Ser-Gly glycosaminoglycan attachment sites that contain some features of this consensus acceptor sequence but also contain unique features (Figure 3B). Though each of these three sequences retains an acidic amino acid two residues N-terminal to the acceptor Ser-Gly, they lack the consensus glycine that is two residues C-terminal to the Ser-Gly. This omission does not preclude this sequence from serving as a xylosyltransferase acceptor because it is also omitted from the Gly-Ser site of type IX collagen, Huber, S., Winterhalter, K.H., and Vaughan, L., J. Biol. Chem. (1988) 263: 752-756. The unique feature of these three sequences is the consistent finding of an acidic amino acid C-terminal to the Ser-Gly (Figure 3B). In contrast, the analogous amino acids in the chondroitin sulfate proteoglycans PG-19, PG-40, and invariant chain are either uncharged or hydrophobic. These three Xac-Xaa-Ser-Gly-Xac sites in syndecan appear to represent unique recognition sequences for the elongation of glycosaminoglycan chains, especially heparan sulfate chains. An artificial peptide containing a heparan sulfate elongation site of the formula Xac-Xaa-Ser-Gly-Xac, where Xac is an acidic amino acid (aspartate or glutamate) and Xaa is any amino acid, can be prepared and used to produce heparan sulfate in eukaryotic cells as described herein. The artificial peptide need not contain any of the remaining structure of the molecules described herein as long as it provides the indicated sequence at a location in the peptide that is available for glycosylation. Such locations can be predicted, such as by using the algorithms developed by Chou and Fasman, or by empirically inserting a DNA sequence encoding this amino acid sequence into a gene and determining that the product functions as a recognition sequence for the elongation of heparan sulfate chains.
A simple artificial peptide, for example, might contain multiple copies of the recognition sequence either located directly adjacent to each other or being joined by from one to ten, preferably one to five, amino acids. Another preferred embodiment involves producing a known polypeptide by genetic engineering that has been engineered to contain the attachment site of the invention at a location known to reside on an external surface of the polypeptide. On the other hand, although sequences from the natural syndecan amino acid sequences adjacent the Xac-Xaa-Ser-Gly-Xac sequences are not required, they may be retained if desired in order to produce a protein that more closely resembles syndecan. Accordingly, artificial peptides containing from 1 to 10, 20, 30, or even more naturally adjacent amino acids as shown in Table 1, located either C terminal or N terminal or both to the Xac-Xaa-Ser-Gly-Xac sequence, represent other viable embodiments of the invention. Proteins containing such longer sequences can be prepared in the same manner discussed above using corresponding longer DNA sequences encoding the desired region.
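Candidate Xac-Xaa-Ser-Gly-Xac sites in a one-letter amino acid sequence can be located with a short pattern search. The sketch below assumes that Xac means aspartate (D) or glutamate (E) and that Xaa is any residue, as defined above; the function name and the test sequence are hypothetical and are not the syndecan sequence of Table 1.

```python
import re

# Xac-Xaa-Ser-Gly-Xac in one-letter code: [D or E], any residue, S, G, [D or E].
# The lookahead lets overlapping candidate sites be reported as well.
SITE_PATTERN = re.compile(r"(?=([DE][A-Z]SG[DE]))")


def find_attachment_sites(protein):
    """Return (1-based position of the acceptor serine, matched motif) pairs."""
    protein = protein.upper()
    return [(m.start() + 3, m.group(1)) for m in SITE_PATTERN.finditer(protein)]


# Hypothetical sequence used only to exercise the search.
example = "MAEGSGDLLDDSGEAAFSGR"
sites = find_attachment_sites(example)
```

In the illustrative sequence, the Ser-Gly pairs flanked by acidic residues are reported, while the Ser-Gly preceded by phenylalanine and followed by arginine is not. A match identifies only a candidate site; whether it is glycosylated must still be determined experimentally.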

The number of chondroitin sulfate chains on syndecan apparently differs in cells of distinct cellular organization and changes in response to TGF-β, implying that not every potential glycosaminoglycan attachment site is always utilized. A possible novel regulatory mechanism for this variation is suggested by the location in syndecan of its single potential N-linked glycosylation site, Asn-Phe-Ser, at residues 43-45. This site is located within the putative chondroitin sulfate attachment sequence, and the attachment of an N-linked sugar at this site would likely prevent subsequent recognition by the xylosyltransferase.
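Potential N-linked glycosylation sites such as the Asn-Phe-Ser site noted above are conventionally recognized by the sequon Asn-Xaa-Ser/Thr, where Xaa is any residue except proline. The following sketch applies that general rule, which is not stated in this specification, to a hypothetical sequence; the function name is illustrative.

```python
import re

# Classical N-glycosylation sequon: N, any residue except P, then S or T.
SEQUON_PATTERN = re.compile(r"(?=(N[^P][ST]))")


def find_n_glycosylation_sequons(protein):
    """Return (1-based position of the asparagine, matched sequon) pairs."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON_PATTERN.finditer(protein)]


# Hypothetical sequence: NFS and NGT match, NPS is excluded by the proline.
example = "AANFSGEPNPSQNGT"
sequons = find_n_glycosylation_sequons(example)
```

Comparing the positions reported here with those of nearby Ser-Gly acceptor sites gives a quick way to flag cases where an N-linked sugar could interfere with xylosyltransferase recognition.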

Of a wide variety of mature tissues examined with antibody 281-2, syndecan is expressed mainly in epithelia. Northern blot analysis of mRNA revealed two mRNA species at 2.6 and 3.4 kb (constant ratio 3:1 respectively) in NMuMG cells as well as skin, liver, and midpregnant mammary gland, all containing immunoreactive syndecan. In contrast, these two mRNAs were undetectable in cardiac and skeletal muscle, tissues of mesenchymal origin that do not stain with 281-2. However, primitive and embryonic mesenchymal cells also show the 2.6 and 3.4 kb mRNA species. A 4.5 kb mRNA was detected in adult cerebrum, which does not react in fixed tissue sections with the antibody, Hayashi, K., Hayashi, M., Jalkanen, M., Firestone, J.H., Trelstad, R.L., and Bernfield, M., J. Histochem. Cytochem. (1987) 35: 1079-1088. The cDNA sequence reported here corresponds to the smaller (2.6 kb) and more abundant of the two mRNAs. Though the relationship between the 2.6 and 3.4 kb mRNAs is unknown, they are likely generated by usage of alternative polyadenylation sites. Probes from both 5' and 3' regions of the syndecan cDNA hybridized identically to these two mRNAs in Northern blot analysis. Moreover, the primer-extended library contained clones identical to the 5' end of clone 4-19B. The relationship of the 4.5 kb mRNA identified in cerebrum to the others is unknown because clones have not yet been characterized from cerebral cDNA libraries.

Sequence alignments demonstrate similarity at the nucleotide level between the mouse syndecan and human insulin receptor cDNA sequences. The insulin receptor sequence is set forth in Ebina, Y., Ellis, L., Jarnagin, K., Edery, M., Graf, L., Clauser, E., Ou, J., Masiarz, F., Kan, Y.W., Goldfine, I.D., Roth, R.A., and Rutter, W.J., Cell (1985) 40: 747-758. Alignment of these sequences (University of Wisconsin GCG Bestfit program) places the putative start ATGs near the middle of a region of similarity; a 99 bp region of syndecan which spans its 5'-untranslated and initial coding sequences is 67% identical, with four small gaps, to the analogous region of the human insulin receptor (Figure 5A). The location of this similarity and the large size of the 5'-untranslated regions suggest that these sequences are shared translational control elements, as has been described for the 5'-untranslated region of the mRNAs for ferritin, Aziz, N., and Munro, H.N., Proc. Natl. Acad. Sci. USA (1987) 84: 8478-8482 and Casey, J.L., Hentze, M.W., Koeller, D.M., Caughman, S.W., Rouault, T.A., Klausner, R.D., and Harford, J.B., Science (1988) 240: 924-928, and the B polypeptide of platelet-derived growth factor, Ratner, L., Theilan, B., and Collins, T., Nucleic Acids Res. (1987) 15: 6017-6036. There is also a second region of similarity between these cDNAs in their 3'-untranslated regions; a 35 bp T-rich sequence of syndecan is 80% identical (no gaps) with a sequence of the human insulin receptor (Figure 5B). These similar sequences in both 5'- and 3'-untranslated regions of the mouse syndecan and human insulin receptor mRNAs suggest that post-transcriptional controls are shared by these two molecules.
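Percent identity figures such as those quoted above depend on how gap positions are counted. The sketch below computes identity from two pre-aligned strings in which gaps are written as "-", and reports the value both over ungapped columns and over all columns. It is not the Bestfit program, and the denominator that program used is not stated here; the aligned strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identity over ungapped columns, identity over all columns) in percent."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    ungapped = [(a, b) for a, b in pairs if a != gap and b != gap]
    if not ungapped:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(1 for a, b in ungapped if a == b)
    return 100.0 * matches / len(ungapped), 100.0 * matches / len(pairs)


# Hypothetical 10-column alignment with two gaps and one mismatch.
over_ungapped, over_all = percent_identity("ACGT-ACGTA", "ACCTTAC-TA")
```

For an ungapped region the two measures coincide; for example, 80% identity over 35 bp corresponds to 28 matching positions.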

A number of fine-structure aspects of syndecan can be seen by reference to the DNA and amino acid sequences. Starting at the indicated AUG (Figure 1), the syndecan cDNA codes for a protein of 311 amino acids containing two hydrophobic stretches. The derived sequence suggests several domains and structural features; their presumed arrangement is summarized in Figure 4.

The first hydrophobic stretch consists of 12 amino acids beginning shortly after the presumptive start methionine. Because syndecan is oriented with its N-terminus outside of the plasma membrane, this appears to be a signal sequence. The N-terminus of mature syndecan is blocked, and, therefore, it has not been possible to determine the N-terminus directly. A likely site for signal peptidase cleavage is following amino acid residue 17 (Figure 1) in the predicted sequence. Cleavage at this site would generate an N-terminal glutamine which could readily cyclize, forming a pyrrolidone carboxylyl residue and thus a blocked N-terminus, as exists in a number of eukaryotic proteins.

The second hydrophobic stretch is a sequence near the C-terminus which has characteristics of a transmembrane domain (thick underline, Figure 1). This sequence is a highly hydrophobic stretch of 25 residues followed immediately by a series of highly charged residues, consistent with the stop transfer signals found following most membrane spanning domains. This domain also contains the only cysteine and one of the four tyrosines in the apparent mature protein sequence. The putative transmembrane domain defines two hydrophilic domains of the syndecan core protein, a putative extracellular domain consisting of approximately 235 amino acids, and a smaller putative cytoplasmic domain consisting of 34 amino acids. This orientation with respect to the plasma membrane is confirmed by the reactivity of immune serum directed either against a peptide containing the C-terminal seven amino acids or against the ectodomain of syndecan. The anti-C-terminus immune serum recognizes the hydrophobic native form of syndecan, but is unreactive with the non-hydrophobic ectodomain. In contrast, the anti-ectodomain immune serum recognizes both forms of the molecule.
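The location of hydrophobic stretches such as those described above can be estimated by a sliding-window hydropathy scan. The following is a minimal sketch only: the use of the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold, and the toy sequence are assumptions of this example, since the specification does not state how the hydrophobic stretches were identified.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def window_means(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[residue] for residue in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean) for windows at or above the threshold."""
    return [
        (i + 1, round(mean, 2))
        for i, mean in enumerate(window_means(protein, window))
        if mean >= threshold
    ]


# Hypothetical sequence: a 20-residue leucine stretch flanked by charged
# residues. It is not the syndecan sequence of Figure 1.
toy = "DEKR" + "L" * 20 + "RKRK" + "DEDE"
print(hydrophobic_windows(toy))
```

A candidate membrane-spanning region appears as a run of consecutive windows above the threshold; the charged residues that follow it lower the window mean sharply.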

The putative cytoplasmic domain contains three tyrosine residues, but the sequences adjacent to these tyrosines are not similar to the presently identified consensus sequences for tyrosine phosphorylation, Hunter, T., and Cooper, J.A., Ann. Rev. Biochem. (1985) 54: 879-930. This domain presumably has protein binding activity because the intact proteoglycan but not the ectodomain co-sediments with F-actin, Rapraeger, A., and Bernfield, M., Extracellular Matrix (1982) 265-269, and because syndecan associates with the actin-containing cytoskeleton when cross-linked at the cell surface, Rapraeger, A., Jalkanen, M., and Bernfield, M., J. Cell Biol. (1986) 103: 2683-2696.

The putative extracellular domain has several sequence characteristics that correspond with the known properties of this proteoglycan. The ectodomain of syndecan is shed by cleavage from its membrane anchor, Jalkanen, M., Rapraeger, A., Saunders, S., and Bernfield, M., J. Cell Biol. (1987) 105: 3087-3096, and an indistinguishable molecule is released from the cell surface by mild trypsin treatment, Jalkanen, M., Rapraeger, A., Saunders, S., and Bernfield, M., J. Cell Biol. (1987) 105: 3087-3096. The only dibasic sequence (Arg-Lys) in this extracellular domain is located adjacent to the putative transmembrane domain at residues 250-251 (identified in Figure 1 by arrows). This location places the cleavage site adjacent to the plasma membrane. The putative extracellular domain lacks cysteine, thus eliminating disulfide bridges as a means of generating secondary structure in this molecule. The ectodomain contains both heparan sulfate and chondroitin sulfate chains, Rapraeger, A., Jalkanen, M., Endo, E., Koda, J., and Bernfield, M., J. Cell Biol. (1985b) 260: 11046-11052. The serine hydroxyl groups of Ser-Gly sequences are the attachment sites for these glycosaminoglycan chains, Roden, L., The Biochemistry of Glycoproteins and Proteoglycans, 267-371 and Dorfman, A., Cell Biology of Extracellular Matrix, 115-138. Syndecan possesses five such potential glycosaminoglycan attachment sites, all within the putative extracellular domain; three such serines are clustered near the N-terminus at residues 37, 45, and 47, and the remaining two are clustered near the membrane at residues 207 and 217 (open circles, Figure 1). The ectodomain from NMuMG cells is insensitive to digestion by N-glycosidase F, as assessed by PAGE, Weitzhandler, M., Streeter, H.B., Henzel, W.J., and Bernfield, M., J. Biol. Chem. (1988) 263: 6949-6952. The putative extracellular domain contains a single canonical sequence for the attachment of N-linked oligosaccharide (solid circle, Figure 1). The serine in this Asn-Xaa-Ser sequence is a putative glycosaminoglycan attachment site.
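For purposes of illustration, candidate glycosaminoglycan attachment serines can be located in a one-letter protein sequence by a simple pattern search, as sketched below. The function names and the toy sequence are hypothetical; the pattern for the longer motif follows the Xac-Xaa-Ser-Gly-Xac form recited in the abstract, with Asp or Glu taken as the acidic residue.

```python
import re


def ser_gly_serines(protein):
    """1-based positions of every serine that is followed by glycine."""
    return [m.start() + 1 for m in re.finditer(r"S(?=G)", protein.upper())]


def consensus_serines(protein):
    """1-based positions of serines within acidic-X-Ser-Gly-acidic motifs.

    A lookahead is used so that overlapping motifs are all reported.
    The match starts at the acidic residue; the serine lies two
    residues later.
    """
    pattern = r"(?=[DE].SG[DE])"
    return [m.start() + 3 for m in re.finditer(pattern, protein.upper())]


# Hypothetical sequence, not the syndecan sequence of Figure 1.
toy = "MAEGSGDQLDASGEKTSGA"
print(ser_gly_serines(toy))    # all Ser-Gly serines
print(consensus_serines(toy))  # subset flanked by acidic residues
```

In this toy sequence the third Ser-Gly pair lacks the flanking acidic residues and is therefore reported only by the first function.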

In all cases, syndecan or a molecule related to syndecan will be expressed when the DNA sequence encoding it is functionally inserted into a vector that is expressed in a eukaryotic cell containing an enzyme system capable of producing glycosaminoglycan chains. By "functionally inserted" is meant in proper reading frame and orientation, as is well understood by those skilled in the art. Expression of syndecan can be enhanced by including multiple copies of the syndecan gene in a transformed or transfected host, by selecting a vector known to reproduce in the host, thereby producing large quantities of protein from exogenous inserted DNA, or by any other known means of enhancing peptide expression.

In addition to the above general procedures which can be used for preparing recombinant DNA molecules and transformed unicellular organisms in accordance with the practices of this invention, other known techniques and modifications thereof can be used in carrying out the practice of the invention. In particular, techniques relating to genetic engineering have recently undergone explosive growth and development. Many recent U.S. patents disclose plasmids, genetically engineered microorganisms, and methods of conducting genetic engineering which can be used in the practice of the present invention. For example, U.S. Patent 4,273,875 discloses a plasmid and a process of isolating the same. U.S. Patent 4,304,863 discloses a process for producing bacteria by genetic engineering in which a hybrid plasmid is constructed and used to transform a bacterial host. U.S. Patent 4,419,450 discloses a plasmid useful as a cloning vehicle in recombinant DNA work. U.S. Patent 4,362,867 discloses recombinant cDNA construction methods and hybrid nucleotides produced thereby which are useful in cloning processes. U.S. Patent 4,403,036 discloses genetic reagents for generating plasmids containing multiple copies of DNA segments. U.S. Patent 4,363,877 discloses recombinant DNA transfer vectors. U.S. Patent 4,356,270 discloses a recombinant DNA cloning vehicle and is a particularly useful disclosure for those with limited experience in the area of genetic engineering since it defines many of the terms used in genetic engineering and the basic processes used therein. U.S. Patent 4,336,336 discloses a fused gene and a method of making the same. U.S. Patent 4,349,629 discloses plasmid vectors and the production and use thereof. U.S. Patent 4,332,901 discloses a cloning vector useful in recombinant DNA.
Although some of these patents are directed to the production of a particular gene product that is not within the scope of the present invention, the procedures described therein can easily be modified to the practice of the invention described in this specification by those skilled in the art of genetic engineering.

All of these patents as well as all other patents and other publications cited in this disclosure are herein individually incorporated by reference.

Manipulation of the expression vectors will in some cases produce constructs which improve the expression of the polypeptide in eukaryotic cells or express syndecan in other hosts. Furthermore, by using the syndecan cDNA or a fragment thereof as a hybridization probe, structurally related genes found in other organisms can be easily cloned. These genes include those that code for related core proteins of proteoglycans from other species, especially mammals such as humans and other primates.

Particularly contemplated is the isolation of related genes from these and other organisms that express proteoglycans on their surfaces by using oligonucleotide probes based on the principal and variant nucleotide sequences disclosed herein. Such probes can be considerably shorter than the entire sequence but should be at least 14, preferably at least 20, nucleotides in length. Longer oligonucleotides are also useful, up to 30, 40, 50, 75, or 100 nucleotides and further up to the full length of the gene. Both RNA and DNA probes can be used. Such probes can also be used in diagnostic tests that detect the presence of genetic material of a predetermined sequence in samples, e.g., as in a polymerase chain reaction (PCR). In use, the probes are typically labelled in a detectable manner (e.g., with ³²P, ³H, biotin, or avidin) and are incubated with single-stranded DNA or RNA from the organism in which a gene is being sought. Hybridization is detected by means of the label after single-stranded and double-stranded (hybridized) DNA (or DNA/RNA) have been separated (typically using nitrocellulose paper). Hybridization techniques suitable for use with oligonucleotides are well known.

Although probes are normally used with a detectable label that allows easy identification, unlabeled oligonucleotides are also useful, both as precursors of labeled probes and for use in methods that provide for direct detection of double-stranded DNA (or DNA/RNA). Accordingly, the term "oligonucleotide" refers to both labeled and unlabeled forms and not just to labeled probes.

Particularly preferred are oligonucleotides corresponding to the segments of the gene that code for glycosaminoglycan attachment sites. As discussed in the examples that follow, an oligonucleotide with high probability of success in the identification of other gene products is the 64-fold degenerate oligonucleotide of the form GANGGNTCTGGNGA, where N represents presence of all four nucleotides in degenerate sequences. The complementary oligonucleotide having the degenerate sequence TCNCCAGANCCNTC is also particularly useful and has the added advantage of ability to identify messenger RNA of these gene products in Northern analysis.

The invention allows the production in large amounts of highly pure heparan sulfate proteoglycans that contain heparan sulfate chains that are characteristic of specific cell types. For example, the surface of endothelial cells is non-thrombogenic because of the anti-coagulant properties of the heparan sulfate chains in a proteoglycan on their surfaces. Preparation of this highly anti-coagulant heparan sulfate proteoglycan in soluble form is now possible by transfection of cultured endothelial cells with a DNA construct defined by this invention. Expression of the construct would produce syndecan containing endothelial cell-derived heparan sulfate chains. Syndecan contains a unique protease-susceptible site adjacent to the plasma membrane, allowing the harvesting of this modified syndecan as a soluble product in high yield and purity. This approach would produce an anti-coagulant proteoglycan with very high potency, potentially several thousand times more potent than commercially available heparin.

The soluble proteins or peptides containing cell-type-specific heparan sulfate chains, made possible by this invention, can be used in the prevention and therapy of certain viral diseases. Dextran sulfate and heparin have been shown to reduce infection and replication of certain retroviruses, including human immunodeficiency virus (HIV). However, these molecules are highly heterogeneous and are probably non-specific. A more specific inhibitor would be a soluble heparan sulfate peptide or proteoglycan derived from a cell type that interacts with the virus. Peptides derived from this invention can also be used as highly specific competitive inhibitors of heparan sulfate (or chondroitin sulfate) chain initiation. Because mutant transformed cells with reduced cell-surface heparan sulfate are substantially less tumorigenic, this invention has the potential of producing anti-tumor drugs that are non-cytotoxic.

Production of the heparan sulfate proteoglycan defined by this invention will allow the manufacture of molecules that bind growth factors, especially those involved in angiogenesis. These proteoglycans are of significant therapeutic value in those instances where local growth factor effects would be useful. A DNA construct derived from this invention can be used in fibroblasts that contain surface proteoglycans that bind various growth factors, including acidic fibroblast growth factor (FGF) and basic FGF. This binding potentiates the action and prevents the proteolytic degradation of these growth factors. Platelet-derived growth factor (PDGF) binds to heparin in vitro, and the syndecan DNA construct could be used to prepare large amounts of soluble PDGF binding proteoglycan.

The peptide sequences involved in heparan sulfate chain attachment identified by the present invention will allow production of large amounts of cell-type-specific heparan sulfate proteoglycans and enable this attachment site to be placed into other biological macromolecules that do not normally contain it, thereby providing products that are not otherwise available. These products will represent a singular molecular species, whereas the heparins and all other heparan sulfate proteoglycans heretofore described represent many molecular species. The greater uniformity afforded by the present invention leads to greater potency and potentially to greater specificity of the materials being purified, thereby enhancing their therapeutic applications. Accordingly, existing materials such as heparin from pig intestine or beef lung or dextran sulfate, a synthetic product, that are polydisperse, of low potency, and of little specificity, can be replaced by genetically engineered products of the present invention.

Cell lines containing the genetic material necessary for the practice of the present invention can be obtained from a number of public sources, some of which are specifically identified in the following examples. For example, normal mouse mammary epithelial cells can be prepared from normal mouse tissue using the procedure described in the examples below. The same procedure can be used to obtain genetic material from other species.

The invention now being generally described, it will be more readily understood by reference to the following examples which are included for purposes of illustration only and are not intended to limit the invention unless so stated.

EXAMPLE 1

cDNA Libraries

NMuMG mouse mammary epithelial cells (passages 13-22) were maintained in bicarbonate-buffered Dulbecco's modified Eagle medium (Gibco) as described previously, David, G., and Bernfield, M., Proc. Natl. Acad. Sci. USA (1979) 76: 786-790. For preparation of poly(A) RNA, cells were plated on 245 x 245 mm tissue culture plates (Nunc) at approximately one-fifth confluent density and grown to 80-90 percent confluency (3-4 days). Following brief washing with ice-cold PBS the cells were solubilized in RNA extraction buffer (4 M guanidine isothiocyanate in 5 mM sodium citrate pH 7.0, 0.1 M β-mercaptoethanol and 0.5% N-lauryl sarcosine) and total RNA prepared by CsCl density centrifugation, Chirgwin, J.M., Przybyla, A.E., MacDonald, R.J., and Rutter, W.J., Biochemistry (1979) 18: 5194-5299. Poly(A) RNA was purified by chromatography on oligo(dT)-cellulose (type 3; Collaborative Research) and utilized in the commercial synthesis (Stratagene) of cDNA by the S1 method, Huynh, T.V., Young, R.A., and Davis, R.W., DNA Cloning: A Practical Approach (1985) 49-78. Following addition of EcoRI linkers, those cDNA greater than 1 kb in length were isolated by gel filtration chromatography, inserted into the EcoRI sites of λ gt-10 and the expression vector λ gt-11 and packaged. A portion of the gt-11 library was amplified for later study, while the remainder was screened immediately without expansion.

A primer extension cDNA library was prepared using the RNase H method, Gubler, U., and Hoffman, B.J., Gene (1983) 25: 263-269. First strand cDNA was synthesized from 10 µg of an 18-bp oligonucleotide containing sequence derived from near the 5' end of PM-4 (see Example 2). The second strand was synthesized using RNase H (BRL) and DNA polymerase Klenow fragment (Boehringer-Mannheim). The cDNA was methylated with EcoRI methylase and then ligated with synthetic EcoRI linkers (New England Biolabs). Excess linkers were removed by EcoRI digestion and the cDNA was purified on agarose gel electrophoresis and recovered by electroelution. The resulting cDNA was inserted into λ gt-10 (Promega) and packaged using Gigapack Gold (Stratagene).

EXAMPLE 2

Isolation of Syndecan cDNA Clones

The preparation of a rabbit serum antibody to the ectodomain of NMuMG syndecan has been described elsewhere, Jalkanen, M., Rapraeger, A., and Bernfield, M., J. Cell Biol. (1988) 106: 953-962. For screening clones in λ gt-11, the immunoserum was first absorbed against E. coli proteins to reduce background. Briefly, a 500 ml culture of E. coli strain Y1090 was grown to saturation in the presence of 50 µg/ml ampicillin. Following centrifugation, the cells were resuspended in 50 ml TBST (Tris buffered saline triton: 10 mM Tris pH 7, NaCl 150 mM, Triton X-100 0.3%), sonicated, and following addition of 100 µl immunoserum (1:500 dilution), incubated overnight at 4°C. This mixture was centrifuged for 10 min at 4000 rpm and used to screen expressed λ gt-11 cDNA clones, Young, R.A., and Davis, R.W., Science (1983) 222: 778-782, by detection with alkaline phosphatase-conjugated goat-anti-rabbit IgG (Promega). Four antibody reactive clones were identified from 7.5 x 10⁵ recombinants and were plaque-purified. Northern and Southern hybridization experiments allowed grouping of these clones into three distinct sets of related clones. Two of these sets produced fusion proteins that reacted with immunoserum affinity-purified against the ectodomain of syndecan. A 2.1-kb clone from one of these sets, PM-4, was found to contain a sequence that exactly matched the partial amino acid sequence of a cyanogen bromide-cleaved fragment of the ectodomain of syndecan. Additionally, syndecan purified from NMuMG cells reacted with an immunoserum prepared against a synthetic peptide containing the C-terminal 7 amino acids (Lys-Gln-Gln-Glu-Glu-Phe-Tyr-Ala) of the PM-4 derived protein sequence. This immunoserum failed to react with the ectodomain, which lacks the putative cytoplasmic domain. Furthermore, this serum does not cross react with any other cellular proteins as assessed by Western blotting of total cell extracts.

Additional screening of the NMuMG λ gt-10 libraries was performed using radiolabeled fragments from the 5' end of PM-4 (250 bp EcoRI-HincII fragment). cDNA fragments isolated from SeaPlaque agarose (FMC BioProducts) were labeled with ³²P by random oligonucleotide priming, Feinberg, A.P., and Vogelstein, B., Addendum. Anal. Biochem. (1984) 137: 266-267, and used as described by Maniatis, T., Fritsch, E.F., and Sambrook, J., Molecular Cloning: A Laboratory Manual (1982). This screening yielded two clones, 4-19B and 4-15 (Figure 2). A primer-extended λ gt-10 cDNA library, prepared with liver poly(A) RNA and a synthetic oligonucleotide complementary to a site near the 5' end of PM-4 (positions 848-865 in Table 1), was also screened with the same 250 bp probe. Several independent clones were characterized from this library; each contained a 5' sequence identical with that of clone 4-19B.

EXAMPLE 3

Subcloning and DNA Sequencing

Purified lambda DNA was prepared from positively selected clones by Lambdasorb immunoprecipitation (Promega). Fragments released by restriction endonuclease digestions were isolated by electrophoresis followed by excision from SeaPlaque agarose (FMC BioProducts). These isolated fragments were subcloned directly, in the presence of agarose, Struhl, K., BioTechniques (1985) 3: 452-453, to either pGEM 3 and 4 for in vitro transcription, or M13 mp18 and mp19, Messing, J., Methods Enzymol. (1983) 101: 20-78, for sequence analysis.

DNA sequencing was performed by the dideoxy chain termination method, Sanger, F., Nicklen, S., and Coulson, A.R., Proc. Natl. Acad. Sci. USA (1977) 74: 5463-5467, using a modified T7 DNA polymerase (Sequenase™, U.S. Biochemical). The strategy is summarized in Figure 2. Sequence was generated from both ends of subcloned restriction fragments using universal M13 sequencing primers. The internal sequence of large fragments as well as the complementary strands of all fragments were determined using oligonucleotide primers synthesized in accordance with preceding sequences. Sequencing artifacts generated as the result of G-C compression were avoided by determining all sequences using both dGTP and the nucleotide analogue dITP.

The cDNA (Figure 1) has the following features: The first AUG is at position 240. This putative initiation codon is preceded by two in-frame termination codons (TAA and TGA at positions 39 and 72 respectively) and followed by a 930 base open reading frame that ends at position 1173 with a TGA termination codon. Following the putative coding region are 1,243 bases of 3'-untranslated sequence that ends with the poly(A) stretch. Because each of the primer extended clones has the same 5' end as the largest cDNA clone from the NMuMG library, M-4-19B, this sequence appears to include the complete 5'-untranslated region of syndecan. Other features have been previously discussed.
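Features of the kind listed above (first ATG, in-frame upstream termination codons, and the terminating codon of the open reading frame) can be located programmatically. The following sketch is offered for illustration only; the function name and the toy sequence are hypothetical, positions are reported 1-based, and it is not represented to be the procedure by which the positions in Figure 1 were determined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(cdna):
    """Describe the open reading frame that begins at the first ATG.

    Returns (start, stop, upstream_stops), all 1-based:
      start          - position of the A of the first ATG
      stop           - position of the first base of the in-frame stop codon
      upstream_stops - positions of in-frame stop codons 5' of the ATG
    Returns None if there is no ATG or no in-frame stop codon.
    """
    seq = cdna.upper()
    atg = seq.find("ATG")
    if atg < 0:
        return None
    stop = None
    for j in range(atg + 3, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            stop = j
            break
    if stop is None:
        return None
    # Walk the same reading frame from the 5' end up to the ATG.
    upstream = [
        k + 1 for k in range(atg % 3, atg, 3) if seq[k:k + 3] in STOP_CODONS
    ]
    return atg + 1, stop + 1, upstream


# Hypothetical sequence, not the syndecan cDNA of Figure 1.
print(first_orf("CCTAAGGGATGGCTTGCTGAAAA"))
```

The number of coding bases follows as stop minus start, which for the toy sequence is nine bases (three codons).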

EXAMPLE 4

Northern Blots

RNA for Northern analysis was prepared from the following: NMuMG cells, adult liver, newborn skin, mid-pregnant mammary gland, adult cerebrum, skeletal and cardiac muscle. Excised tissues were ground to a fine powder in the presence of liquid nitrogen and transferred directly to RNA extraction buffer (see above); the NMuMG cells were extracted after washing with PBS as described above. The samples were vigorously vortexed, an equal volume of 10 mM Tris pH 8.0, 1 mM EDTA, and 1% SDS added, and subsequently extracted exhaustively with 24:24:1 Tris-saturated phenol:chloroform:isoamyl alcohol followed by a single extraction with 24:1 chloroform:isoamyl alcohol. Following precipitation with an equal volume of 2-propanol, and resuspension in 10 mM Tris pH 7.5, 1 mM EDTA, RNA was precipitated by addition of 1/3 volume of 10 M LiCl. Poly(A) RNA was prepared by oligo(dT) chromatography as described above.

For Northern analysis, 2 µg of each poly(A) RNA sample was separated by electrophoresis in 1.2% agarose-formaldehyde gels in the presence of MOPS (Sigma)-acetate buffer pH 7.0, Maniatis, T., Fritsch, E.F., and Sambrook, J., Molecular Cloning: A Laboratory Manual (1982). Following alkali treatment, Danielsen, M., Northrop, J.P., and Ringold, G.M., EMBO J. (1986) 5: 2513-2522, and neutralization in transfer buffer (0.025 M sodium phosphate pH 6.5), the gel was blotted to Gene Screen and the RNA immobilized by UV cross-linking, Church, G.M., and Gilbert, W., Proc. Natl. Acad. Sci. USA (1984) 81: 1991-1995. Hybridization probes were prepared by in vitro transcription of the 5' EcoRI-SacI fragment of PM-4 subcloned into pGEM3, Melton, D.A., Krieg, P.A., Rebagliati, M.R., Maniatis, T., Zinn, K., and Green, M.R., Nucl. Acids Res. (1984) 12: 7035-7056. Blots were prehybridized at 61°C in 50% formamide, 1% SDS, 5X SSPE, 0.1% ficoll, 0.1% polyvinylpyrrolidone and 100 µg/ml denatured salmon sperm DNA. Hybridization was for 16 hrs at 61°C in the same buffer containing 5 x 10⁶ cpm/ml of RNA probe. Filters were washed 2 x 15 min at room temperature in 5% SDS/1X SSPE and 6 x 30 min at 67°C in 1% SDS/0.1X SSPE. Molecular sizes were determined relative to ethidium bromide stained molecular weight markers (BRL) and 18S and 28S ribosomal RNA.

Northern blot analysis of the poly(A) RNA preparations reveals two mRNA bands in NMuMG cells as well as in skin, liver and mammary gland tissues; one band is at 2.6 kb and the other at 3.4 kb. The apparent lower level of expression found in midpregnant mammary gland, as compared with skin and liver, is consistent with the relative paucity of epithelial cells in the mammary gland. Longer exposures of the Northern blot discussed above, as well as others containing larger quantities of poly(A) RNA, verify that the mammary gland expresses both the 2.6 and the 3.4 kb messages (data not shown). Scanning densitometry shows that these two messages are present at a nearly constant relative abundance of 3:1 (2.6 kb:3.4 kb) in NMuMG cells and in skin, liver, and mammary gland tissues (data not shown). As expected from the immunohistology, neither of these mRNAs was present in detectable amounts in cerebrum and striated muscle tissues (skeletal and cardiac). However, Northern analysis consistently detected a distinct 4.5 kb mRNA in the cerebrum. The relationship of this message to that of syndecan is currently not known.

EXAMPLE 5

Preparation and Use of Antibodies to Synthetic Peptides

A seven amino acid (¹⁴C-labeled) synthetic peptide, corresponding to the predicted C-terminus of syndecan (Figure 1), was prepared by direct synthesis. The N-terminal lysine of this peptide was cross-linked by glutaraldehyde to keyhole limpet hemocyanin (KLH, Calbiochem) for immunization and bovine serum albumin (BSA, Fraction V, Sigma) for screening as described by Doolittle, R.F., Of URFS and ORFS: A Primer on How to Analyze Derived Amino Acid Sequences (1986) 85. Briefly, 10 mg carrier protein was dissolved in 0.5 ml of 0.4 M phosphate, pH 7.5, mixed with 7.5 µmoles of peptide in 1.5 ml water and 1.0 ml of 20 mM glutaraldehyde was added dropwise with stirring over the course of 5 min. After continuous stirring at room temperature for 30 min, 0.25 ml of 1 M glycine was added to block unreacted glutaraldehyde and the stirring resumed for an additional 30 min. The product was dialyzed exhaustively against phosphate-buffered saline and incorporation determined by TCA precipitation and liquid scintillation counting. This procedure resulted in the attachment of 17 moles of synthetic peptide per mole of carrier protein. For immunization, 1.25 mg of synthetic peptide-KLH conjugate in 0.5 ml PBS pH 7.5 was mixed with 0.5 ml complete Freund's adjuvant. The emulsion was delivered by intramuscular injections, 0.1 ml in each of ten sites, into a 3-month-old New Zealand white rabbit. After 2 weeks, the immunization was repeated with an identical quantity of immunogen. Ten days later, the rabbit was injected with Innovar 0.125 ml/kg subcutaneously and was bled from the central auricular artery. Innovar was reversed with Nalline 0.2 ml/kg, and serum was prepared from the collected blood.

The native lipophilic form of syndecan and the nonlipophilic medium ectodomain form, Jalkanen, M., Rapraeger, A., Saunders, S., and Bernfield, M., J. Cell Biol. (1987) 105: 3087-3096, were isolated and purified as described elsewhere and assessed for their reactivity to the immune sera. A cationic nylon membrane, Gene-Trans (Plasco Inc., Woburn, MA), was placed into an immunodot apparatus (V&P Scientific, San Diego, CA) and samples of intact syndecan and the ectodomain (0.5, 5, 50 and 500 ng) were loaded on the membrane using mild vacuum. After loading, remaining binding sites on the membrane were blocked by 1 hr incubation in a solution containing 0.5% BSA, 3% Carnation instant nonfat dry milk, 10 nM Tris (Sigma) pH 8.0, 0.15 M NaCl and 0.3% Tween-20. Incubation with immune serum was performed at dilutions of 1:200 for the anti-cytoplasmic domain, and 1:500 for the anti-ectodomain in 10 mM Tris pH 7.4, 0.15 M NaCl, and 0.3% Tween-20 (TBST) for 30 min at room temperature. The membrane was washed for 60 min at room temperature with ten changes of TBST and then incubated for 30 min with 1:7500 dilution of alkaline phosphatase goat-anti-rabbit IgG (Promega, Madison WI). Following washing for 60 min with ten changes of TBST, the immobilized alkaline phosphatase was visualized with nitro blue tetrazolium (NBT) 330 µg/ml and 5-bromo-4-chloro-3-indolyl phosphate (BCIP) 165 µg/ml in 100 mM Tris pH 9.5, 100 mM NaCl, and 5 mM MgCl₂.

EXAMPLE 6

DNA construct for the expression of syndecan core protein in mammalian cells

Syndecan can be expressed within mammalian cells by transfection of a DNA construct containing the syndecan core protein cDNA linked to a eukaryotic promoter that has the properties of both high-level expression and activity in a wide range of cell types. For example, the expression vector pHβ APr-1-neo has been described (Gunning et al., PNAS 84:4831-4835) which utilizes the human β-actin promoter and fulfills both of the above requirements. This vector also contains the neomycin-resistance gene which allows selection of transfected cells with the antibiotic G-418. A SacI-HindIII fragment of the syndecan cDNA (nucleotides 214-1379 of the sequence shown in Figure 1) which encompasses all of the coding region was inserted directionally between the SalI-BamHI sites of the pHβ APr-1-neo vector and thus named pβ-SSyn-neo. In order to generate the necessary restriction sites on the 5' and 3' ends of the syndecan cDNA fragment for insertion into this vector, this fragment was passed sequentially through pGEM 3Z (Promega), pGEM 7Zf (Promega), and Bluescript (Stratagene). Thus the resulting configuration of restriction sites at the point of insertion in pHβ APr-1-neo is as follows: SalI-ClaI-HindIII-EcoRV-EcoRI-SacI-syndecan cDNA fragment-HindIII-BamHI.

This DNA construct was transformed into the bacterial strain TG-1 and prepared in large scale using routine plasmid preparation techniques including CsCl density centrifugation. The purified circularized plasmid DNA was transfected into Chinese Hamster Ovary (CHO) cells by standard calcium phosphate precipitation technique, and transfected clones were selected with G418. Although the parental CHO (hamster) cells express mRNA which is cross-reactive with the murine syndecan cDNA, neither whole cells nor proteoglycan purified from these cells is reactive with the monoclonal antibody 281-2, a rat monoclonal antibody generated against murine syndecan. Therefore it has been possible to assess the function of the transfected murine syndecan gene using this antibody. By both quantitative radioimmunoassay and Western blotting, we have confirmed that clones of the transfected CHO cells express murine syndecan at levels about 1/3 that expressed endogenously by NMuMG mouse mammary epithelial cells, the murine cell line which to date has demonstrated the highest natural levels of expression. Furthermore, a quantitatively higher level of murine syndecan is actually accumulated in the culture media of these CHO cells versus the NMuMG cells, suggesting that the absolute rate of synthesis from the transfected gene is probably in excess of even the highest natural levels in murine cells.

EXAMPLE 7

DNA construct for blocking expression of syndecan core protein in mammalian cells

We have constructed anti-sense cDNA vectors analogous to the sense constructs described above for the purposes of blocking syndecan expression in mammalian cells. Anti-sense RNA produced from vectors of this type, if expressed in sufficiently high levels, is capable of binding to endogenous message intracellularly and blocking its subsequent translation.

To construct this vector, the same coding region SacI-HindIII fragment of syndecan described above was inserted into the BamHI-HindIII site of the pHβ APr-1-neo vector to produce the vector pβ-ASyn-neo. In this application, however, the cDNA was inserted into the vector in the opposite orientation so as to produce mRNA from the transfected gene that is complementary to endogenous syndecan mRNA. To generate the appropriate restriction sites on the 5' and 3' ends of the syndecan cDNA for insertion into this site, this fragment was sequentially passed through pGEM 3Z (Promega) and Bluescript (Stratagene). Thus, the resulting configuration of restriction sites at the point of insertion in the pHβ APr-1-neo vector is as follows: HindIII-syndecan cDNA fragment-ScaI-EcoRI-PstI-SmaI-BamHI.

Upon transfection of this construct into NMuMG cells by calcium phosphate precipitation and selection with G418, we have observed two distinct morphological changes in these cells which appear to correlate with a reduction in the level of syndecan expression. These changes are a shift from the normal cobblestone appearance of the epithelial monolayer to a fibroblastic morphology, and a shift to a neoplastic morphology and cell behavior.

EXAMPLE 8

Identification of related molecules with degenerate oligonucleotides

While in principle any degenerate oligonucleotide corresponding to the murine syndecan gene product is potentially useful in the identification of related biological molecules, some oligonucleotide sequences have higher value. In studying the three putative glycosaminoglycan attachment sites in syndecan of the consensus sequence D/E-X-S-G-D/E, we have observed that two of these sites have a conserved G in the X position, and that furthermore all five glycosaminoglycan attachment sites in syndecan utilize a single codon, TCT, of the six possible codons for the serine residue. Therefore, we expect that the 64-fold degenerate oligonucleotide of the form GAN GGN TCT GGN GA (where N is any of the four nucleotides) should statistically have the highest probability of success in the identification of other gene products which contain this putative signal for glycosaminoglycan attachment. Similarly, the complementary oligonucleotide of the form TCN CCA GAN CCN TC should have similar utility, with the added advantage of its ability to identify the messenger RNA of these gene products in Northern analysis.
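The arithmetic behind the probe design can be checked with a short sketch. It assumes only that N stands for any of A, C, G or T. The helper names are hypothetical, and the target fragment is a short stretch copied from the syndecan cDNA sequence recited in the claims, used here purely for illustration.

```python
from itertools import product

# Bases allowed at each probe symbol; N matches any of the four nucleotides.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def degeneracy(probe: str) -> int:
    """Number of distinct sequences represented by a degenerate probe."""
    count = 1
    for symbol in probe:
        count *= len(ALLOWED[symbol])
    return count


def expand(probe: str) -> list:
    """Every concrete oligonucleotide contained in the degenerate pool."""
    return ["".join(bases) for bases in product(*(ALLOWED[s] for s in probe))]


def reverse_complement(probe: str) -> str:
    """Complementary probe, written 5' to 3'."""
    return "".join(COMPLEMENT[s] for s in reversed(probe))


def find_sites(probe: str, target: str) -> list:
    """0-based start positions where the probe pattern matches the target."""
    k = len(probe)
    return [
        i
        for i in range(len(target) - k + 1)
        if all(t in ALLOWED[p] for p, t in zip(probe, target[i:i + k]))
    ]


PROBE = "GANGGNTCTGGNGA"  # GAN GGN TCT GGN GA
# Illustrative fragment spanning one D-G-S-G-D site of the claimed cDNA.
FRAGMENT = "GAAGATCAGGATGGCTCTGGGGATGACTCTGAC"

print("degeneracy:", degeneracy(PROBE))
print("pool size:", len(expand(PROBE)))
print("complementary probe:", reverse_complement(PROBE))
print("match positions:", find_sites(PROBE, FRAGMENT))
```

Fixing the serine codon as TCT keeps the pool small: only the three N positions vary, so the probe remains a mixture of 4 x 4 x 4 sequences rather than one that also spans all six serine codons.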

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.

Claims

WHAT IS CLAIMED IS:

1. A purified mammalian peptide having a molecular weight of from about 31 kD to about 35 kD and comprising an amino terminus hydrophilic extracellular region, a carboxy terminus hydrophilic cytoplasmic region, and a transmembrane hydrophobic region between said cytoplasmic and extracellular regions, a dibasic sequence extracellularly adjacent the transmembrane region of the peptide, and at least one glycosylation site in the extracellular region including an Xac-Xaa-Ser-Gly-Xac sequence, wherein Xac is an acidic amino acid and Xaa is any amino acid.

2. The peptide of Claim 1, wherein said proteoglycan is obtainable from a mammal selected from the group consisting of humans, mice, rats, and hamsters.

3. The peptide of Claim 2, wherein said mammal is a mouse.

4. The peptide of Claim 3, wherein said proteoglycan is syndecan.

5. The peptide of Claim 1, wherein said peptide is glycosylated at said glycosylation site.

6. The peptide of Claim 1, wherein said peptide is selected from

(1) compounds of

(a) a first formula: M-R-R-A-A-L-W-L-W-L-C-A-L-A-L-R-L-Q-P-A- L-P-Q-I-V-A-V-N-V-P-P-E-D-Q-D-G-S-G-D-D- S-D-N-F-S-G-S-G-T-G-A-L-P-D-T-L-S-R-Q-T- P-S-T-W-K-D-V-W-L-L-T-A-T-P-T-A-P-E-P-T- S-S-N-T-E-T-A-F-T-S-V-L-P-A-G-E-K-P-E-E- G-E-P-V-L-H-V-E-A-E-P-G-F-T-A-R-D-K-E-K- E-V-T-T-R-P-R-E-T-V-Q-L-P-I-T-Q-R-A-S-T- V-R-V-T-T-A-Q-A-A-V-T-S-H-P-H-G-G-M-Q-P- G-L-H-E-T-S-A-P-T-A-P-G-Q-P-D-H-Q-P-P-R- V-E-G-G-G-T-S-V-I-K-E-V-V-E-D-G-T-A-N-Q- L-P-A-G-E-G-S-G-E-Q-D-F-T-F-E-T-S-G-E-N- T-A-V-A-A-V-E-P-G-L-R-N-Q-P-P-V-D-E-G-A- T-G-A-S-Q-S-L-L-D-R-K-E-V-L-G-G-V-I-A-G- G-L-V-G-L-I-F-A-V-C-L-V-A-F-M-L-Y-R-M-K- K-K-D-E-G-S-Y-S-L-E-E-P-K-Q-A-N-G-G-A-Y- Q-K-P-T-K-Q-E-E-F-Y-A

wherein A is alanine, C is cysteine, D is aspartate, E is glutamate, F is phenylalanine, G is glycine, H is histidine, I is isoleucine, K is lysine, L is leucine, M is methionine, N is asparagine, P is proline, Q is glutamine, R is arginine, S is serine, T is threonine, V is valine, W is tryptophan, and Y is tyrosine,

(b) a second formula in which at least one amino acid in said first formula is replaced by a different amino acid, with the proviso that no more than 10 replacements take place,

(c) a third formula in which from 1 to 15 amino acids are absent from either the amino terminal, the carboxy terminal, or both terminals of said first formula or said second formula, or

(d) a fourth formula in which from 1 to 10 additional amino acids are attached sequentially to the amino terminal, carboxy terminal, or both terminals of said first formula or said second formula, and

(2) salts of compounds having said formulas.

7. A purified DNA or RNA molecule, which comprises a nucleotide sequence coding for a peptide of Claim 1.

8. The molecule of Claim 7, wherein said sequence comprises a segment at least 14 nucleotides in length that is homologous to a segment of approximately said length in a DNA sequence

9. The molecule of Claim 7, wherein said sequence is followed by a termination codon.

10. The molecule of Claim 7, wherein said sequence is preceded by a promoter.

11. A recombinant DNA vector, wherein said vector is capable of replicating in a microorganism or being expressed in a eukaryotic cell and said vector comprises a nucleotide sequence coding for a peptide of Claim 1.

12. The vector of Claim 11, wherein said sequence comprises a segment at least 14 nucleotides in length that is homologous to a segment of approximately said length in a DNA sequence

ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC CTCCCGCAAATTGTGGCTGTAAATGTTCCTCCTGAAGATCAGGATGGCTCTGGGGATGAC TCTGACAACTTCTCTGGCTCTGGCACAGGTGCTTTGCCAGATACTTTGTCACGGCAGACA CCTTCCACTTGGAAGGACGTGTGGCTGTTGACAGCCACGCCCACAGCTCCAGAGCCCACC AGCAGCAACACCGAGACTGCTTTTACCTCTGTCCTGCCAGCCGGAGAGAAGCCCGAGGAG GGAGAGCCTGTGCTCCATGTAGAAGCAGAGCCTGGCTTCACTGCTCGGGACAAGGAAAAG GAGGTCACCACCAGGCCCAGGGAGACCGTGCAGCTCCCCATCACCCAACGGGCCTCAACA GTCAGAGTCACCACAGCCCAGGCAGCTGTCACATCTCATCCGCACGGGGGCATGCAACCT GGCCTCCATGAGACCTCGGCTCCCACAGCACCTGGTCAACCTGACCATCAGCCTCCACGT GTGGAGGGTGGCGGCACTTCTGTCATCAAAGAGGTTGTCGAGGATGGAACTGCCAATCAG CTTCCCGCAGGAGAGGGCTCTGGAGAACAAGACTTCACCTTTGAAACATCTGGGGAGAAC ACAGCTGTGGCTGCCGTAGAGCCCGGCCTGCGGAATCAGCCCCCGGTGGACGAAGGAGCC ACAGGTGCTTCTCAGAGCCTTTTGGACAGGAAGGAAGTGCTGGGAGGTGTCATTGCCGGA GGCCTAGTGGGCCTCATCTTTGCTGTGTGCCTGGTGGCTTTCATGCTGTACCGGATGAAG AAGAAGGACGAAGGCAGCTACTCCTTGGAGGAGCCCAAACAAGCCAATGGCGGTGCCTAC CAGAAACCCACCAAGCAGGAGGAGTTCTACGCC.

13. A genetically engineered cell, wherein said cell comprises a microorganism or eukaryotic cell containing exogenous genetic information encoding a peptide of Claim 1.

14. An isolated oligonucleotide, comprising at least 14 sequential nucleotides selected from nucleotide sequences that code for an amino acid sequence M-R-R-A-A-L-W-L-W-L-C-A-L-A-L-R-L-Q-P-A- L-P-Q-I-V-A-V-N-V-P-P-E-D-Q-D-G-S-G-D-D- S-D-N-F-S-G-S-G-T-G-A-L-P-D-T-L-S-R-Q-T- P-S-T-W-K-D-V-W-L-L-T-A-T-P-T-A-P-E-P-T- S-S-N-T-E-T-A-F-T-S-V-L-P-A-G-E-K-P-E-E- G-E-P-V-L-H-V-E-A-E-P-G-F-T-A-R-D-K-E-K- E-V-T-T-R-P-R-E-T-V-Q-L-P-I-T-Q-R-A-S-T- V-R-V-T-T-A-Q-A-A-V-T-S-H-P-H-G-G-M-Q-P- G-L-H-E-T-S-A-P-T-A-P-G-Q-P-D-H-Q-P-P-R- V-E-G-G-G-T-S-V-I-K-E-V-V-E-D-G-T-A-N-Q- L-P-A-G-E-G-S-G-E-Q-D-F-T-F-E-T-S-G-E-N- T-A-V-A-A-V-E-P-G-L-R-N-Q-P-P-V-D-E-G-A- T-G-A-S-Q-S-L-L-D-R-K-E-V-L-G-G-V-I-A-G- G-L-V-G-L-I-F-A-V-C-L-V-A-F-M-L-Y-R-M-K- K-K-D-E-G-S-Y-S-L-E-E-P-K-Q-A-N-G-G-A-Y- Q-K-P-T-K-Q-E-E-F-Y-A

wherein A is alanine, C is cysteine, D is aspartate, E is glutamate, F is phenylalanine, G is glycine, H is histidine, I is isoleucine, K is lysine, L is leucine, M is methionine, N is asparagine, P is proline, Q is glutamine, R is arginine, S is serine, T is threonine, V is valine, W is tryptophan, and Y is tyrosine.

15. The oligonucleotide of Claim 14, wherein said oligonucleotide is DNA.

16. The oligonucleotide of Claim 14, wherein said oligonucleotide is RNA.

17. The oligonucleotide of Claim 14, wherein said oligonucleotide is radioactively labeled.

18. The oligonucleotide of Claim 14, wherein said oligonucleotide comprises at least 20 sequential nucleotides.

19. The oligonucleotide of Claim 14, wherein said sequential nucleotides include a sequence GAXGGXTCTGGXGA or TCXCCAGAXCCXTC, where X is any nucleotide.

20. The oligonucleotide of Claim 14, wherein said nucleotide sequences comprise a first DNA sequence of formula

ATGAGACGCGCGGCGCTCTGGCTCTGGCTCTGCGCGCTGGCGCTGCGCCTGCAGCCTGCC CTCCCGCAAATTGTGGCTGTAAATGTTCCTCCTGAAGATCAGGATGGCTCTGGGGATGAC TCTGACAACTTCTCTGGCTCTGGCACAGGTGCTTTGCCAGATACTTTGTCACGGCAGACA CCTTCCACTTGGAAGGACGTGTGGCTGTTGACAGCCACGCCCACAGCTCCAGAGCCCACC AGCAGCAACACCGAGACTGCTTTTACCTCTGTCCTGCCAGCCGGAGAGAAGCCCGAGGAG GGAGAGCCTGTGCTCCATGTAGAAGCAGAGCCTGGCTTCACTGCTCGGGACAAGGAAAAG GAGGTCACCACCAGGCCCAGGGAGACCGTGCAGCTCCCCATCACCCAACGGGCCTCAACA GTCAGAGTCACCACAGCCCAGGCAGCTGTCACATCTCATCCGCACGGGGGCATGCAACCT GGCCTCCATGAGACCTCGGCTCCCACAGCACCTGGTCAACCTGACCATCAGCCTCCACGT GTGGAGGGTGGCGGCACTTCTGTCATCAAAGAGGTTGTCGAGGATGGAACTGCCAATCAG CTTCCCGCAGGAGAGGGCTCTGGAGAACAAGACTTCACCTTTGAAACATCTGGGGAGAAC ACAGCTGTGGCTGCCGTAGAGCCCGGCCTGCGGAATCAGCCCCCGGTGGACGAAGGAGCC ACAGGTGCTTCTCAGAGCCTTTTGGACAGGAAGGAAGTGCTGGGAGGTGTCATTGCCGGA GGCCTAGTGGGCCTCATCTTTGCTGTGTGCCTGGTGGCTTTCATGCTGTACCGGATGAAG AAGAAGGACGAAGGCAGCTACTCCTTGGAGGAGCCCAAACAAGCCAATGGCGGTGCCTAC CAGAAACCCACCAAGCAGGAGGAGTTCTACGCC

a second DNA sequence complementary to said first DNA sequence, or an RNA sequence corresponding to said first or second DNA sequence.