CA1157406A

CA1157406A - Purification of nucleotide sequences suitable for expression in bacteria

Info

Publication number: CA1157406A
Application number: CA000303860A
Authority: CA
Inventors: Howard M. Goodman; John Shine; Peter H. Seeburg
Original assignee: University of California
Current assignee: University of California
Priority date: 1977-09-23
Filing date: 1978-05-23
Publication date: 1983-11-22
Also published as: US4363877B1; US4363877A

Abstract

ABSTRACT OF THE DISCLOSURE
A purification procedure for cDNA of desired nucleotide sequence complementary to an individual mRNA
species is disclosed wherein restriction endonuclease cleavage of cDNA transcribed from a complex mixture of mRNA is utilized. In this way, extensive purification of RNA is not required but rather the method makes use of transcription of RNA into cDNA, the sequence specific fragmentation of this cDNA with one or two restriction endonucleases and fractionation of the cDNA restriction fragments on the basis of their length.

Description


Proteins and peptides are synthesized in almost endless variety by living organisms. Many have proven to have medical, agricultural or industrial utility. Some proteins are enzymes, useful as specific catalysts for complex chemical reactions. Others function as hormones, which act to affect the growth or development of an organism or to affect the function of specific tissues in medically significant ways. Specific binding proteins may have commercial significance for the isolation and purification of trace substances and for the removal of contaminating substances. Both proteins and peptides are composed of linear chains of amino acids, the latter term being applied to short, single-chain sequences, the former referring to long chain and multi-chain substances. The principles of the present invention apply equally to both proteins and peptides.
Proteins and peptides are generally high molecular weight substances, each having a specific sequence of amino acids. Except for the smaller peptides, chemical synthesis of peptides and proteins is frequently impractical, costly and time consuming, if not impossible. In the majority of instances, in order to make practical use of a desired protein, it must first be isolated from the organism which makes it. Frequently, the desired protein is present only in minuscule amounts. Often, the source organism cannot be obtained in quantities sufficient to provide an adequate amount of the desired protein. Consequently, many potential agricultural, industrial and medical applications for specific proteins are known, but remain undeveloped simply because an adequate supply of the desired protein or peptide does not exist.
Recently developed techniques have made it possible to employ microorganisms, capable of rapid and abundant growth, for the synthesis of commercially useful proteins and peptides, regardless of their source in nature. These techniques make it possible to genetically endow a suitable microorganism with the ability to synthesize a protein or peptide normally made by another organism. The technique makes use of a fundamental relationship which exists in all living organisms between the genetic material, usually DNA, and the proteins synthesized by the organism. This relationship is such that the amino acid sequence of the protein is reflected in the nucleotide sequence of the DNA. There are one or more trinucleotide sequence groups specifically related to each of the twenty amino acids most commonly occurring in proteins. The specific relationship between each given trinucleotide sequence and its corresponding amino acid constitutes the genetic code. The genetic code is believed to be the same or similar for all living organisms. As a consequence, the amino acid sequence of every protein or peptide is reflected by a corresponding nucleotide sequence, according to a well understood relationship. Furthermore, this sequence of nucleotides can, in principle, be translated by any living organism.
Table 1: Genetic Code

Phenylalanine (Phe)   TTK     Histidine (His)       CAK
Leucine (Leu)         XTY     Glutamine (Gln)       CAJ
Isoleucine (Ile)      ATM     Asparagine (Asn)      AAK
Methionine (Met)      ATG     Lysine (Lys)          AAJ
Valine (Val)          GTL     Aspartic acid (Asp)   GAK
Serine (Ser)          QRS     Glutamic acid (Glu)   GAJ
Proline (Pro)         CCL     Cysteine (Cys)        TGK
Threonine (Thr)       ACL     Tryptophan (Try)      TGG
Alanine (Ala)         GCL     Arginine (Arg)        WGZ
Tyrosine (Tyr)        TAK     Glycine (Gly)         GGL
Termination signal    TAJ
Termination signal    TGA

Key: Each 3 letter triplet represents a trinucleotide of mRNA, having a 5' end on the left and a 3' end on the right. The letters stand for the purine or pyrimidine bases forming the nucleotide sequence.

A = adenine      J = A or G
G = guanine      K = T or C
C = cytosine     L = A, T, C or G
T = thymine      M = A, C or T
X = T or C if Y is A or G
X = C if Y is C or T
Y = A, G, C or T if X is C
Y = A or G if X is T
W = C or A if Z is A or G
W = C if Z is C or T
Z = A, G, C or T if W is C
Z = A or G if W is A
QR = TC if S is A, G, C or T
QR = AG if S is T or C
S = A, G, C or T if QR is TC
S = T or C if QR is AG
The trinucleotides of Table 1, termed codons, are presented as DNA trinucleotides, as they exist in the genetic material of a living organism. Expression of these codons in protein synthesis requires the intermediate formation of messenger RNA (mRNA), as described more fully, infra. The mRNA codons have the same sequences as the DNA codons of Table 1, except that uracil is found in place of thymine. Complementary trinucleotide DNA sequences having opposite strand polarity are functionally equivalent to the codons of Table 1, as is understood in the art. An important and well known feature of the genetic code is its redundancy, whereby, for most of the amino acids used to make proteins, more than one coding nucleotide triplet may be employed. Therefore, a number of different nucleotide sequences may code for a given amino acid sequence. Such nucleotide sequences are considered functionally equivalent since they can result in the production of the same amino acid sequence in all organisms, although certain strains may translate some sequences more efficiently than they do others. Occasionally, a methylated variant of a purine or pyrimidine may be found in a given nucleotide sequence. Such methylations do not affect the coding relationship in any way.
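The redundancy of the genetic code can be made concrete with a short Python sketch that expands the degenerate symbols of the genetic code table into explicit DNA codons and counts them. The names used here (AMBIGUITY, PATTERNS, expand, codon_table, to_mrna) are hypothetical and are not part of the disclosure. The coupled symbols XTY, WGZ and QRS are each written out by hand as two simpler patterns, which is one reading of the key rather than a quotation of it.

```python
from itertools import product

# Single-position letters from the key to the genetic code table.
AMBIGUITY = {
    "A": "A", "G": "G", "C": "C", "T": "T",
    "J": "AG",    # J = A or G
    "K": "TC",    # K = T or C
    "L": "ATCG",  # L = A, T, C or G
    "M": "ACT",   # M = A, C or T
}

# One or more degenerate patterns per amino acid. The coupled entries
# XTY (Leu), WGZ (Arg) and QRS (Ser) are split into two patterns each
# (an interpretation of the key, stated as an assumption).
PATTERNS = {
    "Phe": ["TTK"], "Leu": ["TTJ", "CTL"], "Ile": ["ATM"], "Met": ["ATG"],
    "Val": ["GTL"], "Ser": ["TCL", "AGK"], "Pro": ["CCL"], "Thr": ["ACL"],
    "Ala": ["GCL"], "Tyr": ["TAK"], "His": ["CAK"], "Gln": ["CAJ"],
    "Asn": ["AAK"], "Lys": ["AAJ"], "Asp": ["GAK"], "Glu": ["GAJ"],
    "Cys": ["TGK"], "Try": ["TGG"], "Arg": ["CGL", "AGJ"], "Gly": ["GGL"],
    "Stop": ["TAJ", "TGA"],
}


def expand(pattern):
    """Expand one degenerate triplet into all explicit DNA codons."""
    choices = (AMBIGUITY[letter] for letter in pattern)
    return ["".join(bases) for bases in product(*choices)]


def codon_table():
    """Map each amino acid (or 'Stop') to its list of explicit DNA codons."""
    return {
        name: [codon for pattern in patterns for codon in expand(pattern)]
        for name, patterns in PATTERNS.items()
    }


def to_mrna(codon):
    """Write a DNA codon as the corresponding mRNA codon (T becomes U)."""
    return codon.replace("T", "U")


if __name__ == "__main__":
    for name, codons in codon_table().items():
        print(name, len(codons), " ".join(to_mrna(c) for c in codons))
```

Run as a script, this prints each amino acid with the number of codons that specify it, which shows directly how many distinct nucleotide sequences can encode the same amino acid sequence.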
In its basic outline, a method of endowing a microorganism with the ability to synthesize a new protein involves three general steps: (1) isolation and purification of the specific gene or nucleotide sequence containing the genetically coded information for the amino acid sequence of the desired protein, (2) recombination of the isolated nucleotide sequence with an appropriate transfer vector, typically the DNA of a bacteriophage or plasmid, and (3) transfer of the vector to the appropriate microorganism and selection of a strain of the recipient microorganism containing the desired genetic information.
A fundamental difficulty encountered in attempts to exploit commercially the above-described general process lies in the first step, the isolation and purification of the desired specific genetic information. DNA exists in all living cells in the form of extremely high molecular weight chains of nucleotides. A cell may contain more than 10,000 structural genes, coding for the amino acid sequences of over 10,000 specific proteins, each gene having a sequence many hundreds of nucleotides in length. For the most part, four different nucleotide bases make up all the existing sequences. These are adenine (A), guanine (G), cytosine (C), and thymine (T). The long sequences comprising the structural genes of specific proteins are consequently very similar in overall chemical composition and physical properties. The separation of one such sequence from the plethora of other sequences present in isolated DNA cannot ordinarily be accomplished by conventional physical and chemical preparative methods.
Two general methods have been used in the prior art to accomplish step (1) in the above-described general procedure. The first method is sometimes referred to as the shotgun technique. The DNA of an organism is fragmented into segments generally longer than the desired nucleotide sequence. Step (1) of the above-described process is essentially by-passed. The DNA fragments are immediately recombined with the desired vector, without prior purification of specific sequences. Optionally, a crude fractionation step may be interposed. The selection techniques of microbial genetics are relied upon to select, from among all the possibilities, a strain of microorganism containing the desired genetic information. The shotgun procedure suffers from two major disadvantages. Most importantly, the procedure can result in the transfer of hundreds of unknown genes into recipient microorganisms, so that during the experiment, new strains are created, having unknown genetic capabilities. Therefore, the use of such a procedure could create a hazard for laboratory workers and for the environment. A second disadvantage of the shotgun method is that it is extremely inefficient for the production of the desired strain, and is dependent upon the use of a selection technique having sufficient resolution to compensate for the lack of fractionation in the first step.
The second general method takes advantage of the fact that the total genetic information in a cell is seldom, if ever, expressed at any given time. In particular, the differentiated tissues of higher organisms may be synthesizing only a minor proportion of the proteins which the organism is capable of making. In extreme cases, such cells may be synthesizing predominantly one protein. In such extreme cases, it has been possible to isolate the nucleotide sequence coding for the protein in question by isolating the corresponding messenger RNA from the appropriate cells.

Messenger RNA functions in the process of converting the nucleotide sequence information of DNA into the amino acid sequence structure of a protein. In the first step of this process, termed transcription, a local segment of DNA having a nucleotide sequence which specifies a protein to be made, is first copied into RNA. RNA is a polynucleotide similar to DNA except that ribose is substituted for deoxyribose and uracil is used in place of thymine. The nucleotide bases in RNA are capable of entering into the same kind of base pairing relationships that are known to exist between the complementary strands of DNA. A and U (T) are complementary, and G and C are complementary. The RNA transcript of a DNA nucleotide sequence will be complementary to the copied sequence. Such RNA is termed messenger RNA (mRNA) because of its status as intermediary between the genetic apparatus of the cell and its protein synthesizing apparatus. Generally, the only mRNA sequences present in the cell at any given time are those which correspond to proteins being actively synthesized at that time. Therefore, a differentiated cell whose function is devoted primarily to the synthesis of a single protein will contain primarily the RNA species corresponding to that protein. In those instances where it is feasible, the isolation and purification of the appropriate nucleotide sequence coding for a given protein can be accomplished by taking advantage of the specialized synthesis of such protein in differentiated cells.
A major disadvantage of the foregoing procedure is that it is applicable only in the relatively rare instances where cells can be found engaged in synthesizing primarily a single protein. The majority of proteins of commercial interest are not synthesized in such a specialized way. The desired proteins may be one of a hundred or so different proteins being produced by the cells of a tissue or organism at a given time. Nevertheless, the mRNA isolation technique is potentially useful since the set of RNA species present in the cell usually represents only a fraction of the total sequences existing in the DNA, and thus provides an initial purification. In order to take advantage of such purification, however, a method is needed whereby sequences present in low frequencies, such as a few percent, can be isolated in high purity.
The present invention provides a process whereby nucleotide sequences can be isolated and purified even when present at a frequency as low as 2% of a heterogeneous population of mRNA sequences. Furthermore, the method may be combined with known methods of fractionating mRNA to isolate and purify sequences present in even lower frequency in the total RNA population as initially isolated. The method is generally applicable to mRNA species extracted from virtually any organism and is therefore expected to provide a powerful basic tool for the ultimate production of proteins of commercial and research interest in useful quantities.
Human growth hormone has medical utility in the treatment of defective pituitary function. Animal growth hormones have commercial utility in veterinary medicine and in agriculture, particularly in the case of animals used as food sources, where large size and rapid maturation are desirable attributes. Human chorionic somatomammotropin is of medical significance because of its role in the fetal maturation process.


The process of the present invention takes advantage of certain structural features of RNA and DNA, and makes use of certain enzyme catalyzed reactions. The nature of these reactions and structural details as they are understood in the prior art are described herewith.
The symbols and abbreviations used herein are set forth in the following table:

DNA  - deoxyribonucleic acid
RNA  - ribonucleic acid
cDNA - complementary DNA (enzymatically synthesized from an mRNA sequence)
mRNA - messenger RNA
dATP - deoxyadenosine triphosphate
dGTP - deoxyguanosine triphosphate
dCTP - deoxycytidine triphosphate
dTTP - thymidine triphosphate
ATP  - adenosine triphosphate
A    - Adenine
T    - Thymine
G    - Guanine
C    - Cytosine
U    - Uracil
Tris - 2-Amino-2-(hydroxymethyl)-1,3-propanediol
EDTA - ethylenediamine tetraacetic acid
TCA  - Trichloroacetic acid
HCS  - Human Chorionic Somatomammotropin
RGH  - Rat Growth Hormone
HGH  - Human Growth Hormone

In its native configuration, DNA exists in the form of paired linear polynucleotide strands. The complementary base pairing relationships described above exist between the paired strands such that each nucleotide base of one strand exists opposite its complement on the other strand.
The entire sequence of one strand is mirrored by a complementary sequence on the other strand. If the strands are separated, it is possible to synthesize a new partner strand, starting from the appropriate precursor monomers. The sequence of addition of the monomers starting from one end is determined by, and complementary to, the sequence of the original intact polynucleotide strand, which thus serves as a template for the synthesis of its complementary partner. The synthesis of mRNA corresponding to a specific nucleotide sequence of DNA is understood to follow the same basic principle. Therefore a specific mRNA molecule will have a sequence complementary to one strand of DNA and identical to the sequence of the opposite DNA strand, in the region transcribed. Enzymic mechanisms exist within living cells which permit the selective transcription of a particular DNA segment containing the nucleotide sequence for a particular protein. Consequently, isolating the mRNA which contains the nucleotide sequence coding for the amino acid sequence of a particular protein is equivalent to the isolation of the same sequence, or gene, from the DNA itself. If the mRNA is retranscribed to form DNA complementary thereto (cDNA), the exact DNA sequence is thereby reconstituted and can, by appropriate techniques, be inserted into the genetic material of another organism. The two complementary versions of a given sequence are therefore inter-convertible, and functionally equivalent to each other.
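The template relationship between a DNA strand, its mRNA transcript and the cDNA copied back from that mRNA can be sketched in a few lines of Python. The function names and the twelve-nucleotide example sequence are hypothetical and chosen only for illustration; all sequences are written 5' to 3'.

```python
# Watson-Crick pairing tables. Sequences are always written 5' -> 3'.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def reverse_complement(dna):
    """Return the complementary DNA strand, also written 5' -> 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def transcribe(template_strand):
    """mRNA copied from a DNA template strand: complementary, with U for T."""
    return reverse_complement(template_strand).replace("T", "U")


def reverse_transcribe(mrna):
    """cDNA complementary to an mRNA, as a reverse transcriptase would copy it."""
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


# Hypothetical coding strand: Met-Ala-Lys followed by a termination signal.
coding = "ATGGCTAAGTGA"
template = reverse_complement(coding)
mrna = transcribe(template)
cdna = reverse_transcribe(mrna)

if __name__ == "__main__":
    print("coding strand  ", coding)
    print("template strand", template)
    print("mRNA           ", mrna)
    print("cDNA           ", cdna)
```

In this sketch the mRNA reads the same as the coding strand apart from U in place of T, and the cDNA is identical to the template strand, which is the sense in which the two complementary versions of a sequence are inter-convertible.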
The nucleotide subunits of DNA and RNA are linked together by phosphodiester bonds between the 5' position of one nucleotide sugar and the 3' position of its next neighbor. Reiteration of such linkages produces a linear polynucleotide which has polarity in the sense that one end can be distinguished from the other. The 3' end may have a free 3'-hydroxyl, or the hydroxyl may be substituted with a phosphate or a more complex structure. The same is true of the 5' end. In eucaryotic organisms, i.e., those having a defined nucleus and mitotic apparatus, the synthesis of functional mRNA usually includes the addition of polyadenylic acid to the 3' end of the mRNA. Messenger RNA can therefore be separated from other classes of RNA isolated from a eucaryotic organism by column chromatography on cellulose to which is attached polythymidylic acid. See Aviv, H., and Leder, P., Proc. Natl. Acad. Sci. USA 69, 1408 (1972). Other chromatographic methods, exploiting the base-pairing affinity of poly A for chromatographic packing materials containing oligo dT, poly U, or combinations of poly T and poly U, for example, poly U-Sepharose*, are likewise suitable.
Reverse transcriptase catalyzes the synthesis of DNA complementary to an RNA template strand in the presence of the RNA template, a primer which may be any complementary oligo or polynucleotide having a 3'-hydroxyl, and the four deoxynucleoside triphosphates, dATP, dGTP, dCTP, and dTTP. The reaction is initiated by the non-covalent association of the oligodeoxynucleotide primer near the 3' end of mRNA followed by stepwise addition of the appropriate deoxynucleotides, as determined by base-pairing relationships with the mRNA nucleotide sequence, to the 3' end of the growing chain. The product molecule may be described as a hairpin structure in which the original RNA is paired by hydrogen bonding with a complementary strand of DNA partly folded back upon itself at one end. The DNA and RNA strands are not covalently joined to each other. Reverse transcriptase is also capable of catalyzing a similar reaction using a single-stranded DNA template, in which case the resulting product is a double-stranded DNA hairpin having a loop of single-stranded DNA joining one set of ends. See Aviv, H., and Leder, P., Proc. Natl. Acad. Sci. USA 69, 1408 (1972) and Efstratiadis, A., Kafatos, F.C., Maxam, A.M., and Maniatis, T., Cell 7, 279 (1976).
* Trademark

Restriction endonucleases are enzymes capable of hydrolyzing phosphodiester bonds in DNA, thereby creating a break in the continuity of the DNA strand. If the DNA is in the form of a closed loop, the loop is converted to a linear structure. The principal feature of a restriction enzyme is that its hydrolytic action is exerted only at a point where a specific nucleotide sequence occurs. Such a sequence is termed the restriction site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their restriction sites. When acting on double-stranded DNA, some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single-stranded regions at each end of the cleaved molecule. Such single-stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterogeneous sequences of DNA which have been treated with restriction endonuclease to other sequences similarly treated. See Roberts, R.J., Crit. Rev. Biochem. 4, 123 (1976).
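The distinction between blunt and cohesive ends can be illustrated with a minimal Python sketch, assuming a palindromic recognition site that is cut at mirror-image positions on the two strands. The function names are hypothetical. The two enzymes used as examples, HaeIII (GG^CC) and EcoRI (G^AATTC), are chosen only because their sites are widely documented; they are an assumption of this sketch and are not taken from the passage above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_ends(site, top_cut):
    """Classify the ends left by cutting a palindromic recognition site.

    top_cut is the number of bases to the left of the cut on the top
    strand. For a palindromic site the bottom strand is cut at the
    mirror-image position, len(site) - top_cut, in top-strand coordinates.
    Returns (end type, single-stranded overhang sequence).
    """
    if reverse_complement(site) != site:
        raise ValueError("this sketch handles palindromic sites only")
    bottom_cut = len(site) - top_cut
    low, high = sorted((top_cut, bottom_cut))
    overhang = site[low:high]
    return ("cohesive" if overhang else "blunt", overhang)


if __name__ == "__main__":
    for name, site, top_cut in (("HaeIII", "GGCC", 2), ("EcoRI", "GAATTC", 1)):
        kind, overhang = cut_ends(site, top_cut)
        print(name, site, kind, overhang or "-")
```

Because the overhang of a palindromic site is its own reverse complement, any two molecules cut by the same enzyme carry ends that can pair with each other, which is why fragments from different sources can be joined.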
It has been observed that restriction sites for a given enzyme are relatively rare and are nonuniformly distributed. Whether a specific restriction site exists within a given segment is a matter which must be empirically determined. However, there is a large and growing number of restriction endonucleases, isolated from a variety of sources with varied site specificity, so that there is a reasonable probability that a given segment of a thousand nucleotides will contain one or more restriction sites.
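A rough feel for how often restriction sites occur can be had from a simple random-sequence model, sketched below in Python. The model assumes equal base frequencies and independent positions, which real DNA does not obey (the distribution is nonuniform, and the presence of a site must be determined empirically), so the figures are an order-of-magnitude guide only. The function names are hypothetical.

```python
import math


def expected_sites(segment_length, site_length):
    """Expected number of occurrences of one specific site of the given
    length in a random segment with equal base frequencies."""
    positions = max(segment_length - site_length + 1, 0)
    return positions / 4 ** site_length


def prob_at_least_one(segment_length, site_lengths):
    """Poisson approximation to the chance that at least one of several
    distinct sites (one per enzyme) occurs somewhere in the segment."""
    total = sum(expected_sites(segment_length, k) for k in site_lengths)
    return 1.0 - math.exp(-total)


if __name__ == "__main__":
    n = 1000
    print("one 6-base site: ", round(prob_at_least_one(n, [6]), 3))
    print("one 4-base site: ", round(prob_at_least_one(n, [4]), 3))
    print("ten 6-base sites:", round(prob_at_least_one(n, [6] * 10), 3))
```

Under these assumptions a single six-base site is more likely absent than present in a thousand nucleotides, while a panel of enzymes with different specificities makes at least one site quite probable, consistent with the qualitative argument in the text.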
For general background see Watson, J.D., The Molecular Biology of the Gene, 3d Ed., Benjamin, Menlo Park, California (1976); Davidson, J.N., The Biochemistry of the Nucleic Acids, 8th Ed., Revised by Adams, R.L.P., Burdon, R.H., Campbell, A.M. and Smellie, R.M.S., Academic Press, New York (1976); and Hayes, W., "The Genetics of Bacteria and Their Viruses", Studies in Basic Genetics and Molecular Biology, 2d Ed., Blackwell Scientific Pub., Oxford (1968).

SUMMARY OF INVENTION

A novel purification procedure for cDNA of desired nucleotide sequence complementary to an individual mRNA species is disclosed. The method employs restriction endonuclease cleavage of cDNA transcribed from a complex mixture of mRNA. The method does not require any extensive purification of RNA but instead makes use of transcription of RNA into cDNA, the sequence specific fragmentation of this cDNA with one or two restriction endonucleases, and the fractionation of the cDNA restriction fragments on the basis of their length. The use of restriction endonucleases eliminates size heterogeneity and produces homogeneous length DNA fragments from any cDNA species which contains at least two restriction sites.
From the initially heterogeneous population of cDNA transcripts, uniform size fragments of desired sequence are produced. The fragments may be several hundred nucleotides in length and may in some instances include the entire structural gene for the desired protein. The length of the fragments depends on the number of nucleotides separating the restriction sites and will usually be different for different regions of DNA. Fractionation by length enables purification of a homogeneous population of fragments having the desired sequence. The fragments will be homogeneous in size and highly pure in terms of nucleotide sequence. Current separation and analysis methods enable the isolation of such fragments from a corresponding mRNA species representing at least 2% of the mass of the RNA transcribed. The use of prior art RNA fractionation methods to prepurify the mRNA before transcription will result in lowering the actual lower limit of detection to less than 2% of the total mRNA isolated from the organism.
Specific sequences purified by the procedure outlined above may be further purified by a second specific cleavage with a restriction endonuclease capable of cleaving the desired sequence at an internal site. This cleavage results in formation of two sub-fragments of the desired sequence, separable on the basis of their length. The sub-fragments are separated from uncleaved and specifically cleaved contaminating sequences having substantially the same original size. The method is founded upon the rarity and randomness of placement of restriction endonuclease recognition sites, which results in an extremely low probability that a contaminant having the same original length will be cleaved by the same enzyme to yield fragments having the same length as those yielded by the desired sequence. After separation from the contaminants, the sub-fragments of the desired sequence may be rejoined using techniques known in the art to reconstitute the original sequence. The two sub-fragments must be prevented from joining together in the reverse order of their original sequence. A method is disclosed whereby the sub-fragments can only join to each other in the proper order.
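The logic of the second cleavage can be sketched in Python under simplifying assumptions: three fragments of identical length co-migrate after the first fractionation, and a second enzyme cuts the desired fragment at an internal site that the contaminants either lack or carry at a different position. The sequences, the site GCGC and the cut position are hypothetical values chosen for illustration, and digest is a hypothetical helper that models the top strand only.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Returns the list of non-empty fragments in 5' -> 3' order. Only the
    top strand is modelled, which is sufficient for comparing lengths.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [f for f in fragments if f]


# Three hypothetical fragments of the same length (50 nucleotides).
SITE, CUT = "GCGC", 3
desired = "A" * 30 + SITE + "T" * 16        # internal site near the 3' end
contaminant_cut = "A" * 10 + SITE + "T" * 36  # same site, different position
contaminant_uncut = "AT" * 25                 # no site at all

lengths = {
    name: [len(f) for f in digest(seq, SITE, CUT)]
    for name, seq in (
        ("desired", desired),
        ("contaminant_cut", contaminant_cut),
        ("contaminant_uncut", contaminant_uncut),
    )
}

if __name__ == "__main__":
    for name, sizes in lengths.items():
        print(name, sizes)
```

Although all three species have the same starting length, only the desired sequence yields the particular pair of sub-fragment lengths, so a second fractionation by length separates it from both kinds of contaminant.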
Variations of the above-recited methods may be used in combination with appropriate labelling techniques to obtain accurate, quantitative measurements of the purity of the isolated sequences. The combined techniques have been applied to produce a known nucleotide sequence with greater than 99% purity.
The cDNA isolated and purified by the described methods may be recombined with a suitable transfer vector and transferred to a suitable host microorganism. Novel plasmids have been produced, containing the nucleotide sequences coding for rat growth hormone and the major portions of human chorionic somatomammotropin and human growth hormone, respectively. Novel microorganisms have been produced having as part of their genetic makeup the genes coding for RGH, the major portion of HCS and the major portion of HGH, respectively. The disclosed techniques may be used for the isolation and purification of growth hormones from other animal species and for the construction of novel transfer vectors and microorganisms containing these genes.
DETAILED DESCRIPTION OF INVENTION
The present invention employs as starting material polyadenylated, crude or partially purified messenger RNA, which may be heterogeneous in sequence and in molecular size. The selectivity of the RNA isolation procedure is enhanced by any method which results in an enrichment of the desired mRNA in the heterodisperse population of mRNA isolated. Any such prepurification method may be employed in conjunction with the method of the present invention, provided the method does not introduce endonucleolytic cleavage of the mRNA. An important initial consideration is the selection of an appropriate source tissue for the desired mRNA. Often, this choice will be dictated by the fact that the protein ultimately to be produced is only made by a certain specialized tissue of a differentiated organism. Such is the case, for example, with the peptide hormones, such as growth hormone or HCS.
In other cases, it will be found that a variety of cell types or microbial species can serve as a source of the desired mRNA. In those cases, some preliminary experimentation will be necessary in order to determine the optimal source. Frequently, it will be found that the proportion of desired mRNA can be increased by taking advantage of cellular responses to environmental stimuli. For example, treatment with a hormone may cause increased production of the desired mRNA. Other techniques include growth at a particular temperature and exposure to a specific nutrient or other chemical substance.
Prepurification to enrich for desired mRNA sequences may also be carried out using conventional methods for fractionating RNA, after its isolation from the cell. Any technique which does not result in degradation of the RNA may be employed. The techniques of preparative sedimentation in a sucrose gradient and gel electrophoresis are especially suitable.
The mRNA must be isolated from the source cells under conditions which preclude degradation of the mRNA. The action of RNase enzymes is particularly to be avoided because these enzymes are capable of hydrolytic cleavage of the RNA nucleotide sequence. The hydrolysis of one bond in the sequence results in disruption of that sequence and loss of the RNA fragment containing the original 5' end of the sequence. A suitable method for inhibiting RNase during extraction from cells is disclosed in Canadian patent application Serial No. 303,930 filed May 23, 1978, assigned to the same assignee as the instant application. The method involves the use of 4 M guanidinium thiocyanate and 1 M mercaptoethanol during the cell disruption step. In addition, a low temperature and a pH near 5.0 are helpful in further reducing RNase degradation of the isolated RNA.
Prior to application of the method of the present invention, mRNA must be prepared essentially free of contaminating protein, DNA, polysaccharides and lipids. Standard methods are well known in the art for accomplishing such purification. RNA thus isolated contains non-messenger as well as messenger RNA. A convenient method for separating the mRNA of eucaryotes is chromatography on columns of oligo-dT cellulose, or other oligonucleotide-substituted column material such as poly U-Sepharose, taking advantage of the hydrogen bonding specificity conferred by the presence of polyadenylic acid on the 3' end of eucaryotic mRNA.
The initial step in the process of the present invention is the formation of DNA complementary to the isolated heterogeneous sequences of mRNA. The enzyme of choice for this reaction is reverse transcriptase, although in principle any enzyme capable of forming a faithful complementary DNA copy of the mRNA template could be used. The reaction may be carried out under conditions described in the prior art, using mRNA as a template and a mixture of the four deoxynucleoside triphosphates, dATP, dGTP, dCTP and dTTP, as precursors for the DNA strand. It is convenient to provide that one of the deoxynucleoside triphosphates be labeled with a radioisotope, for example 32P in the alpha position, in order to monitor the course of the reaction, to provide a tag for recovering the product after separation procedures such as chromatography and electrophoresis, and for the purpose of making quantitative estimates of recovery. See Efstratiadis, A., et al., supra.
The cDNA transcripts produced by the reverse transcriptase reaction are somewhat heterogeneous with respect to sequences at the 5' end and the 3' end due to variations in the initiation and termination points of individual transcripts, relative to the mRNA template. The variability at the 5' end is thought to be due to the fact that the oligo-dT primer used to initiate synthesis is capable of binding at a variety of loci along the polyadenylated region of the mRNA. Synthesis of the cDNA transcript begins at an indeterminate point in the poly-A region, and a variable length of poly-A region is transcribed depending on the initial binding site of the oligo-dT primer. It is possible to avoid this indeterminacy by the use of a primer containing, in addition to an oligo-dT tract, one or two nucleotides of the RNA sequence itself, thereby producing a primer which will have a preferred and defined binding site for initiating the transcription reaction.
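The difference between a plain oligo-dT primer and one extended into the message can be sketched in Python by counting the positions at which each primer can pair perfectly with a model mRNA. The mRNA sequence, the primer lengths and the function name binding_sites are hypothetical. The extended primer is read here as an oligo-dT tract followed by two bases complementary to the mRNA immediately 5' of the poly-A region, which is an interpretive assumption, and perfect-match pairing is a simplification of real hybridization.

```python
# DNA primer base -> the RNA base it pairs with.
PRIMER_TO_RNA = str.maketrans("ACGT", "UGCA")


def binding_sites(mrna, primer):
    """Return every 0-based position on the mRNA (5' -> 3') where the DNA
    primer (5' -> 3') can pair perfectly along its whole length."""
    target = primer.translate(PRIMER_TO_RNA)[::-1]
    return [
        i for i in range(len(mrna) - len(target) + 1)
        if mrna.startswith(target, i)
    ]


# Hypothetical message: a short body ending in ...GC, then a 20-residue poly-A tail.
mrna = "AUGGCUCUGC" + "A" * 20

plain_primer = "T" * 12            # oligo-dT only
anchored_primer = "T" * 12 + "GC"  # oligo-dT plus two message-specific bases

plain_sites = binding_sites(mrna, plain_primer)
anchored_sites = binding_sites(mrna, anchored_primer)

if __name__ == "__main__":
    print("plain oligo-dT binding positions:", plain_sites)
    print("anchored primer binding positions:", anchored_sites)
```

In this toy case the plain primer can sit at several positions along the poly-A tract, giving cDNA 5' ends of different lengths, whereas the extended primer has a single defined binding site at the junction between message and tail.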
The indeterminacy at the 3' end of the cDNA transcript is due to a variety of factors affecting the reverse transcriptase reaction, and to the possibility of partial degradation of the RNA template. The isolation of specific cDNA transcripts of maximal length is greatly facilitated if conditions for the reverse transcriptase reaction are chosen which not only favor full length synthesis but also repress the synthesis of small DNA chains. Preferred reaction conditions for avian myeloblastosis virus reverse transcriptase are given in the examples section. The specific parameters which may be varied to provide maximal production of long-chain DNA transcripts of high fidelity are reaction temperature, salt concentration, amount of enzyme, concentration of primer relative to template, and reaction time.
The conditions of temperature and salt concentration are chosen so as to optimiæe specific base-pairing between the oligo-dT primer and the polyadenylated portion of the ~A template. Under properly chosen conditions, the primer will be able to bind at the polyadenylated 1 region of the RNA template, but non-specific initiation due to primer binding at other locations-on the template, such as short, A-rich sequences, will be substantially prevented. The effects of temperature and salt are interdependent. Higher temperatures and lower salt con-, centrations decrease the stability of specific base-pairing interactions.
The reaction time is kept as short as possible, in order to prevent non-specific initiations and to minimize the opportunity for degradation. Reaction times are interrelated with temperature, lower temperatures requiring longer reaction times. At 42°C, reactions ranging from 1 min. to 10 minutes are suitable. The primer should be present in 50 to 500-fold molar excess over the RNA template and the enzyme should be present in similar molar excess over the RNA template. The use of excess enzyme and primer enhances initiation and cDNA chain growth so that long-chain cDNA transcripts are produced efficiently within the confines of the short incubation times.


In many cases, it will be possible to carry out the remainder of the purification process of the present invention using single-stranded cDNA sequences transcribed from mRNA. However, as discussed below, there may be instances in which the desired restriction enzyme is one which acts only on double-stranded DNA. In these cases, the cDNA prepared as described above may be used as a template for the synthesis of double-stranded DNA, using a DNA polymerase such as reverse transcriptase and a nuclease capable of hydrolyzing single-stranded DNA. Methods for preparing double-stranded DNA in this manner have been described in the prior art. See, for example, Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W.J. and Goodman, H.M., Science 196, 1313 (1977).
Heterogeneous cDNA, prepared by transcription of heterogeneous mRNA sequences, is then treated with one or two restriction endonucleases. The choice of endonuclease to be used depends in the first instance upon a prior determination that recognition sites for the enzyme exist in the sequence of the cDNA to be isolated. The method depends upon the existence of two such sites. If the sites are identical, a single enzyme will be sufficient. The desired sequence will be cleaved at both sites, eliminating size heterogeneity as far as the desired cDNA sequence is concerned, and creating a population of molecules, termed fragments, containing the desired sequence and homogeneous in length. If the restriction sites are different, two enzymes will be required in order to produce the desired homogeneous length fragments.
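The effect of cleaving at two sites can be illustrated with a small sketch. It is a toy model on single-strand text only: the transcript sequences, flank lengths and helper names are hypothetical, and the only outside fact assumed is that HaeIII recognizes GGCC and cuts between GG and CC.

```python
def digest(seq, site, cut_offset):
    """Cut seq at every occurrence of site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (2 for a GG^CC cut).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def internal_fragments(seq, site, cut_offset):
    """Fragments bounded by a cut on both sides; the two end pieces are ragged."""
    return digest(seq, site, cut_offset)[1:-1]


# Hypothetical transcripts: the same core with two sites, but ragged ends
# of different lengths, mimicking variable initiation and termination.
core = "GGCC" + "AT" * 20 + "GGCC"
transcripts = ["AT" * left + core + "TA" * right
               for left, right in [(3, 10), (8, 2), (15, 7)]]

lengths_before = sorted(len(t) for t in transcripts)
bands = {len(f) for t in transcripts for f in internal_fragments(t, "GGCC", 2)}
```

In this sketch the transcripts differ in length before digestion, while every internal fragment has the same length, which is the property the length-based fractionation relies on.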

The choice of restriction enzyme(s) capable of producing an optimal length nucleotide sequence fragment coding for all or part of the desired protein must be made empirically. If the amino acid sequence of the desired protein is known, it is possible to compare the nucleotide sequence of uniform length nucleotide fragments produced by restriction endonuclease cleavage with the amino acid sequence for which it codes, using the known relationship of the genetic code common to all forms of life. A complete amino acid sequence for the desired protein is not necessary, however, since a reasonably accurate identification may be made on the basis of a partial sequence. Where the amino acid sequence of the desired protein is not known, the uniform length polynucleotides produced by restriction endonuclease cleavage may be used as probes capable of identifying the synthesis of the desired protein in an appropriate in vitro protein synthesizing system. Alternatively, the mRNA may be purified by affinity chromatography. Other techniques which may be suggested to those skilled in the art will be appropriate for this purpose.
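The comparison of a fragment's nucleotide sequence with a known partial amino acid sequence can be sketched as a translation in all three reading frames. The sequence and peptide below are hypothetical, the function names are illustrative, and the codon table is the standard genetic code written in the usual T, C, A, G order.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, one-letter amino acid symbols, '*' for stop.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate an upper-case DNA string (A, C, G, T only) in one frame."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def matching_frames(seq, peptide):
    """Reading frames (0, 1, 2) whose translation contains the peptide."""
    return [f for f in range(3) if peptide in translate(seq, f)]


# Hypothetical fragment and partial amino acid sequence (Met-Ala-Thr).
fragment = "GG" + "ATGGCTACC" + "A"
frames = matching_frames(fragment, "MAT")
```

A match in exactly one frame, over a long enough partial sequence, is the kind of evidence the text describes as a reasonably accurate identification.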
The number of restriction enzymes suitable for use depends upon whether single-stranded or double-stranded cDNA is used. The preferred enzymes are those capable of acting on single-stranded DNA, which is the immediate reaction product of mRNA reverse transcription. The number of restriction enzymes now known to be capable of acting on single-stranded DNA is limited. The enzymes HaeIII, HhaI and Hin(f)I are presently known to be suitable. In addition, the enzyme MboII may act on single-stranded DNA. Where further study reveals that other restriction enzymes can act on single-stranded DNA, such other enzymes may appropriately be included in the list of preferred enzymes. Additional suitable enzymes include those specified for double-stranded cDNA. Such enzymes are not preferred since additional reactions are required in order to produce double-stranded cDNA, providing increased opportunities for the loss of longer sequences and for other losses due to incomplete recovery. The use of double-stranded cDNA presents the additional technical disadvantage that subsequent sequence analysis is more complex and laborious. For these reasons, single-stranded cDNA is preferred, but the use of double-stranded DNA is feasible.
The cDNA prepared for restriction endonuclease treatment may be radioactively labeled so that it may be detected after subsequent separation steps. A preferred technique is to incorporate a radioactive label such as 32P in the alpha position of one of the four deoxynucleoside triphosphate precursors. Highest activity is obtained when the concentration of radioactive precursor is high relative to the concentration of the non-radioactive form. However, the total concentration of any deoxynucleoside triphosphate should be greater than 30 µM, in order to maximize the length of cDNA obtained in the reverse transcriptase reaction. See Efstratiadis, A., Maniatis, T., Kafatos, F.C., Jeffrey, A., and Vournakis, J.N., Cell 4, 367 (1975).
For the purpose of determining the nucleotide sequence of cDNA, the 5' ends may be conveniently labeled with 32P in a reaction catalyzed by the enzyme polynucleotide kinase. See Maxam, A.M. and Gilbert, W., Proc.Natl.Acad.Sci. USA 74, 560 (1977).
Fragments which have been produced by the action of a restriction enzyme or combination of two restriction enzymes may be separated from each other and from heterodisperse sequences lacking recognition sites by any appropriate technique capable of separating polynucleotides on the basis of differences in length. Such methods include a variety of electrophoretic techniques and sedimentation techniques using an ultracentrifuge. Gel electrophoresis is preferred because it provides the best resolution on the basis of polynucleotide length. In addition, the method readily permits quantitative recovery of separated materials. Convenient gel electrophoresis methods have been described by Dingman, C.W., and Peacock, A.C., Biochemistry 7, 659 (1968), and by Maniatis, T., Jeffrey, A. and van de Sande, H., Biochemistry 14, 3787 (1975).
Prior to restriction endonuclease treatment, cDNA transcripts obtained from most sources will be found to be heterodisperse in length. By the action of a properly chosen restriction endonuclease, or pair of endonucleases, polynucleotide chains containing the desired sequence will be cleaved at the respective restriction sites to yield polynucleotide fragments of uniform length. Upon gel electrophoresis, these will be observed to form a distinct band. Depending on the presence or absence of restriction sites on other sequences, other discrete bands may be formed as well, which will most likely be of different length than that of the desired sequence. Therefore, as a consequence of restriction endonuclease action, the gel electrophoresis pattern will reveal the appearance of one or more discrete bands, while the remainder of the cDNA will continue to be heterodisperse. In the case where the desired cDNA sequence comprises the major polynucleotide species present, the electrophoresis pattern will reveal that most of the cDNA is present in the discrete band.
Although it is unlikely that two different sequences will be cleaved by restriction enzymes to yield fragments of essentially similar length, a method for determining the purity of the defined length fragments is desirable. Sequence analysis of the electrophoresis band may be used to detect impurities representing 10% or more of the material in the band. A method for detecting lower levels of impurities has been developed, as part of the present invention, founded upon the same general principles applied in the initial isolation method. The method requires that the desired nucleotide sequence fragment contain a recognition site for a restriction endonuclease not employed in the initial isolation. Treatment of polynucleotide material, eluted from a gel electrophoresis band, with a restriction endonuclease capable of acting internally upon the desired sequence will result in cleavage of the desired sequence into two sub-fragments, most probably of unequal length. These sub-fragments upon electrophoresis will form two discrete bands at positions corresponding to their respective lengths, the sum of which will equal the length of the polynucleotide prior to cleavage. Contaminants in the original band that are not susceptible to the restriction enzyme may be expected to migrate to the original position. Contaminants containing one or more recognition sites for the enzyme may be expected to yield two or more sub-fragments. Since the distribution of recognition sites is believed to be essentially random, the probability that a contaminant will also yield sub-fragments of the same size as those of the fragment of desired sequence is extremely low. The amount of material present in any band of radioactively labeled polynucleotide can be determined by quantitative measurement of the amount of radioactivity present in each band, or by any other appropriate method. A quantitative measure of the purity of the fragments of desired sequence can be obtained by comparing the relative amounts of material present in those bands representing sub-fragments of the desired sequence with the total amount of material.
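The purity measure described above reduces to a ratio of radioactivity. The following is a minimal sketch under stated assumptions: the band lengths and counts are hypothetical illustration values, the function name is invented, and radioactivity is taken to be proportional to the amount of material in each band.

```python
def purity_from_bands(band_cpm, subfragment_lengths):
    """Fraction of recovered label found in the expected sub-fragment bands.

    band_cpm maps band length (nucleotides) to measured radioactivity (cpm).
    subfragment_lengths holds the lengths expected from the desired sequence.
    """
    total = sum(band_cpm.values())
    desired = sum(cpm for length, cpm in band_cpm.items()
                  if length in subfragment_lengths)
    return desired / total


# Hypothetical measurement: a 550-nucleotide band re-cut by a second enzyme.
parent_length = 550
expected = {460, 90}  # the two sub-fragment lengths must add up to the parent
bands = {460: 8300, 90: 1620, 550: 80}  # cpm; uncut material stays at 550

lengths_add_up = sum(expected) == parent_length
purity = purity_from_bands(bands, expected)
```

Material that remains at the original position, or that appears at unexpected lengths, counts against the purity figure, so the ratio is a conservative estimate when some of the desired fragment is simply left uncut.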
Following the foregoing separation, the desired sequence may be reconstituted. The enzyme DNA ligase, which catalyzes the end-to-end joining of DNA fragments, may be employed for this purpose. The gel electrophoresis bands representing the sub-fragments of the desired sequence may be separately eluted and combined in the presence of DNA ligase, under the appropriate conditions. See Sgaramella, V., Van de Sande, J.H., and Khorana, H.G., Proc.Natl.Acad.Sci. USA 67, 1468 (1970). Where the sequences to be joined are not blunt-ended, the ligase obtained from E. coli may be used, Modrich, P., and Lehman, I.R., J.Biol.Chem. 245, 3626 (1970).

The efficiency of reconstituting the original sequence from sub-fragments produced by restriction endonuclease treatment will be greatly enhanced by the use of a method for preventing reconstitution in improper sequence. This unwanted result is prevented by treatment of the homogeneous length cDNA fragment of desired sequence with an agent capable of removing the 5'-terminal phosphate groups on the cDNA prior to cleavage of the homogeneous cDNA with a restriction endonuclease. The enzyme, alkaline phosphatase, is preferred. The 5'-terminal phosphate groups are a structural prerequisite for the subsequent joining action of DNA ligase used to reconstitute the cleaved sub-fragments. Therefore, ends which lack a 5'-terminal phosphate cannot be covalently joined. The DNA sub-fragments can only be joined at the ends containing a 5' phosphate generated by the restriction endonuclease cleavage performed on the isolated DNA fragments. The method is essentially that described in detail in Canadian application Serial no. 303,972 filed May 23, 1978.
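The logic of the phosphatase step can be sketched as a bookkeeping exercise over fragment ends. This is a deliberately simplified model, not a description of ligase chemistry: each end is reduced to a single flag, a junction is allowed only when both ends carry a 5'-phosphate, and the end names are hypothetical.

```python
from itertools import combinations


def joinable_pairs(ends):
    """Pairs of ends that may be joined in this toy model.

    ends maps an end name to True if that end carries a 5'-phosphate.
    Simplifying assumption: both ends of a junction must be phosphorylated.
    """
    return [(a, b) for a, b in combinations(sorted(ends), 2)
            if ends[a] and ends[b]]


# Sub-fragments A and B produced by cutting one isolated fragment.
# "outer" ends are the original termini; "cut" ends come from the new cleavage.
untreated = {"A_outer": True, "A_cut": True, "B_cut": True, "B_outer": True}

# Phosphatase applied before cleavage strips only the original termini.
treated = {"A_outer": False, "A_cut": True, "B_cut": True, "B_outer": False}

pairs_untreated = joinable_pairs(untreated)
pairs_treated = joinable_pairs(treated)
```

Without the pretreatment every pairing of ends is possible in the model; with it, only the junction between the two newly cut ends remains, which corresponds to reconstitution in the original order.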
The majority of cDNA transcripts, under the conditions employed, are derived from the mRNA region containing the 5'-end of the mRNA template by specifically priming on the same template with a fragment obtained by restriction endonuclease cleavage. In this way, the above-described method may be used to obtain not only fragments of specific nucleotide sequence related to a desired protein, but also the entire nucleotide sequence coding for the protein of interest.
The purification process is of a special significance in the cloning of human genes, which, under Federal regulations, can only be put into recombinant DNA and then into bacteria after the genes have been very carefully purified, or if the experiments are carried out in special high-risk (P4) facilities. See Federal Register, Vol. 41, No. 131, July 7, 1976, pp 27902-27943. The present method has enabled the production of sufficiently pure human genes, comprising the majority of the structure of HCS and HGH. Human genetic material, isolated and purified as described above, may be incorporated into recombinant plasmids or other transfer vectors. Double-stranded chemically synthesized oligonucleotide linkers, containing the recognition sequence for a restriction endonuclease, may be attached to the ends of the isolated cDNA, to facilitate subsequent enzymatic removal of the human gene portion from the transfer vector DNA. See Scheller, R.H., et al., Science 196, 177 (1977). The transfer vector DNA is converted from a continuous loop to a linear form by treatment with an appropriate restriction endonuclease. The ends thereby formed are treated with alkaline phosphatase to remove 5'-phosphate end groups so that the transfer vector DNA may not reform a continuous loop in a DNA ligase reaction without first incorporating a segment of the human DNA. The cDNA, with attached linker oligonucleotides, and the treated transfer vector DNA are mixed together with a DNA ligase enzyme, to join the cDNA to the vector DNA, forming a continuous loop of recombinant vector DNA having the cDNA incorporated therein. Where a plasmid transfer vector is used, usually the closed loop will be the only form able to transform a bacterium. Transformation, as is understood in the art and used herein, is the term used to denote the process whereby a microorganism incorporates extracellular DNA into its own genetic constitution. Plasmid DNA in the form of a closed loop may be so incorporated under appropriate environmental conditions. The incorporated closed loop plasmid undergoes replication in the transformed cell, and the replicated copies are distributed to progeny cells when cell division occurs. As a result, a new cell line is established, containing the plasmid and carrying the genetic determinants thereof. Transformation by a plasmid in this manner, where the plasmid genes are maintained in the cell line by plasmid replication, occurs at high frequency when the transforming plasmid DNA is in closed loop form, and does not or rarely occurs if linear plasmid DNA is used. Once a recombinant transfer vector has been made, transformation of a suitable microorganism is a straightforward process, and novel microorganism strains containing the human gene may readily be isolated, using appropriate selection techniques, as understood in the art.
The construction of novel transfer vectors and microorganisms containing the rat growth hormone gene can be carried out in similar fashion, except that a simplified process is permitted by lower purity requirements. Following isolation of the initial cDNA transcripts of rat pituitary mRNA and electrophoresis to fractionate the cDNA transcripts by length, a band of material migrating at the expected position for full-length RGH-cDNA may be used as the starting material for the cloning process. This method is advantageous over the method employed for the human genes in that it permits the isolation of DNA containing the entire structural gene nucleotide sequence. The growth hormones of vertebrate species are similar in length and in amino acid sequence. Therefore the foregoing procedure could be applied to the cloning of any growth hormone from an animal source and would be applicable to the isolation of the full sequence of human growth hormone given suitable (P4) laboratory facilities or a relaxation of the current Federal purity requirements.
Although it is preferred to isolate cDNA appearing as an observable band after gel electrophoresis, it would be feasible to isolate cDNA at the expected position in the absence of a discrete band, provided the approximate length of the desired sequence were known.

Using the above described methods for purification and analysis, a desired nucleotide sequence containing most of the structural gene for HCS has been isolated and shown to be greater than 99% pure. The structural gene for HGH has been isolated to a comparable degree of purity. Novel plasmids containing the isolated HCS or HGH sequences have been synthesized. Novel microorganisms containing the isolated HCS and HGH sequences as part of their genetic material have been produced. A nucleotide sequence containing the entire structural gene for RGH has been isolated, and novel recombinant plasmids constructed therewith. Novel microorganisms containing the structural gene for RGH as part of their genetic makeup have been produced.
The nucleotide sequence for human chorionic somatomammotropin, isolated by the presently disclosed methods, comprises: 5 G GCL24 25 26 27 28 29 30 31 J32GAJ33AcL34TAK3sATM36ccL37AAJ38GAK39c~J
AAJ TAK42QR43s43TTK44x45Ty45cAK46GAK47Q 48 48 49 50 QR5ls5lTTK52TGK53TTI~54QR55s55GAK56QR57s57AT 58 59 60 61Q 6~ 62AAK63ATGGAJ65GAJ66AcL67cAJ68cAJ ~J QR
S7lAAK72x73Ty73G~J74x75 75 76 76 77 77 78 79 79 80 80 81 8lx82Ty82ATM83GAJ84QRsssssTGGxg7Tyg7GAJggccL
90 91 91 92x93Ty93w94Gz94QR95s95ATGTTK97GcL98AA~c AAKloo~lolTylolGTLlo2TAKlo3GAKlo~AcLlo5QRlo6slo6G 107Q 108 S108GAKlogC~AKlloTAKlllCAK11 2X113T~113~114TY114AAJ115GAK116 X T~117GAJ118GAJ119GGL120ATM121CAJ122A 123 124 124 126 127 127xl28Tyl28GAJl29GAKl3oGGLl3lQR S W
Gzl33wl34Gzl34AcLl35GGLl36cAJl37ATMl38xl39Tyl39 1~0 141 ACL TAKl43QRl44sl44AAJl45TTKl46GAKl47 148 149 150 20Sl5ocAKl5lAAKl52cAKl53GAKl54GcLl55~l56Tyl56 157 157 158 AAKl59TAKl6oGGLl6lxl62Tyl62~l63Tyl63T 164 165 166 167 Gzl67AAJl68GAK~69ATGGAKl7lAAJl72GTLl73GAJl74AcLl75T 176 Xl77Tyl77wl78Gzl78ATGGTLl8ocAJlglTGKl82~l83Gzl83QRl84 184 185GAJl86GGLl87QRl88sl88TGKl89GGLl9oTTKl9lTAGGTGcccGAGTAG
CATCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCC - 3' wherein A is deoxyadenyl, G is deoxyguanyl, C is deoxycytosyl, T is thymidyl,
J is A or G;
K is T or C;
L is A, T, C or G;
M is A, C or T;
Xn is T or C, if Yn is A or G, and C if Yn is C or T;
Yn is A, G, C or T, if Xn is C, and A or G if Xn is T;
Wn is C or A, if Zn is G or A, and C if Zn is C or T;
Zn is A, G, C or T, if Wn is C, and A or G if Wn is A;
QRn is TC, if Sn is A, G, C or T, and AG if Sn is T or C;
Sn is A, G, C or T, if QRn is TC, and T or C if QRn is AG;
and subscript numerals, n, refer to the amino acid position in human chorionic somatomammotropin, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.
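The degenerate notation defined above can be checked mechanically: each three-symbol pattern should expand only to codons for a single amino acid. The sketch below assumes the standard genetic code and treats the paired symbols X/Y, W/Z and QR/S as whole-codon patterns (XTY, WGZ, QRS); the function and variable names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Single-position symbols, as defined in the text.
SIMPLE = {"A": "A", "C": "C", "G": "G", "T": "T",
          "J": "AG", "K": "TC", "L": "ATCG", "M": "ACT"}


def expand(pattern):
    """All concrete codons allowed by one degenerate codon pattern."""
    if pattern == "XTY":   # X = C with any Y, or X = T with Y = A or G
        return {"CT" + y for y in "ACGT"} | {"TTA", "TTG"}
    if pattern == "WGZ":   # W = C with any Z, or W = A with Z = A or G
        return {"CG" + z for z in "ACGT"} | {"AGA", "AGG"}
    if pattern == "QRS":   # QR = TC with any S, or QR = AG with S = T or C
        return {"TC" + s for s in "ACGT"} | {"AGT", "AGC"}
    return {"".join(c) for c in product(*(SIMPLE[s] for s in pattern))}


def encoded(pattern):
    """Set of amino acids (one-letter symbols) a pattern can encode."""
    return {CODON_TABLE[codon] for codon in expand(pattern)}


summary = {p: encoded(p) for p in ["GCL", "AAJ", "TAK", "ATM", "XTY", "WGZ", "QRS"]}
```

Under these assumptions the six-codon amino acids leucine, arginine and serine are exactly the ones that need the paired symbols, which is why the conditional definitions of Xn/Yn, Wn/Zn and QRn/Sn appear in the text.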
The nucleotide sequence for human growth hormone, isolated by the presently disclosed methods, comprises:

2 4 R 2 5GAK 2 6ACL 2 7TAK 2 ~CAJ 2 gGAJ 3 oTTX 3 lGAJ 3 GAJ
AcL34TAK35ATM36ccL37AAJ38GAJ39cAJ4oAAJ4lT 42Q 43 43 44 X45Ty45cAJ46AAK47ccL48cAJ49AcL5oQR5ls5lx52Ty52TG 53 54 QR s55GAJ56QR57ss7ATMsgccL5gAcL6occL6lQR62s62 63 64 Gz64GAJ65GAJ66AcL67cAJ68cAJ69AAJ7oQ~7ls7lAAK72x73 73 74 X TY X j Ty76w77Gz77ATM78QR79s79x8oTy8ox8lTy8lx82 82 83 84Q 85s85TGGx87Ty87GAJggccL89GTL9ocAJ9lTTK92x93Ty93w GZ94QR95S 95GTL96TTX97GCLggAAKggAAK100XlOlTYlOl(~TL10 2TAK103 ~104 lOS 106 1o6GAKlo7QRlogslogAAKlogGTLlloTAKlllGAK
113 113 114 1l4AAJll5GAKll6xll7T~ll7GAJll8GAJll9GGL

ATMl2lcAJl22AcLl23xl24T~l24ATGGGLl26wl27Gzl27xl28 128 129 GAK GGLl3lQRl32sl32ccLl33wl34Gzl34AcLl35 136 137 138 139 140 14lAcLl42T~Kl43QRl44sl44AAJl45TTKl'l6GAK
148 1d~9QR150S150CAK151AAKls2CAK153GA~15 GCL X

- 22a -~' ~ ~7~

Tyl56xl57Tyl57A~Jl58~AKl59TAKl6oGGLl6lxl62 162 163 163 164 165TTKl66wl67Gzl67AAJl6gGAKl69ATGGAKl7lAAJl72GTLl 3 174 175 176 177TY177W178GZ17~3ATM179GTL180CAJ181TGK
W GZ 83QR184S184GTL185GAJlg6GGL187QR188S188 189 190 TTKlgl TAGCTGCCCGGGTGGCATCCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCC -3' wherein A is deoxyadenyl, G is deoxyguanyl, C is deoxycytosyl, T is thymidyl, J is A or G;
K is T or C;
L is A, T, C or G;
M is A, C or T;
Xn is T or C, if Yn is A or G, and C if Yn is C or T;
Yn is A, G, C or T, if Xn is C, and A or G if Xn is T;
Wn is C or A, if Zn is G or A, and C if Zn is C or T;
Zn is A, G, C or T, if Wn is C, and A or G if Wn is A;
QRn is TC, if Sn is A, G, C or T, and AG if Sn is T or C;
Sn is A, G, C or T, if QRn is TC, and T or C if QRn is AG;
and subscript numerals, n, refer to the amino acid position in human growth hormone, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.
The nucleotide sequences for human growth hormone may comprise, in addition t - 22b -1 ~57~rJ$

1 2 3 4ccL5x6Ty6QR7s7w~Gz8x9Ty9TTKloGAKl AAK
13 15 15W16GZ16GCL17CAK18~19GZlgX20TY2oCAK2 CAJ
X23TY23 -3' and wherein Y23 iS followed in sequence by GCL24 in the sequence of claim 17.
The accompanying figures and drawings are provided to demonstrate the results obtained in the specific examples illustrating the invention.
Figure 1 is an autoradiogram of a series of gel electrophoresis runs of 32P-labelled cDNA, as described in detail in Example 1.
Figure 2 is a schematic representation of the nucleotide sequence coding for HCS, showing the relative locations of various restriction sites, as described in detail in Example 1.
Figure 3 is an autoradiogram of gel electrophoresis results using 32P-labelled cDNA, as described in detail in Example 2.
Figures 4 and 5 are autoradiograms of gel electrophoresis results using 32P-labelled cDNA, as described in detail in Example 3.
EXAMPLE 1
The general procedure for isolating a specific cDNA sequence has been demonstrated by isolating a sequence comprising a portion of the coding region for HCS, extracted from placental tissue.
mRNA Extraction From Placenta.
Human term placentas obtained from cesarean section were quick-frozen in liquid nitrogen and stored at -60°C. For extraction of total RNA, 40 g of the frozen placental tissue was broken into small pieces and dissolved with the aid of a blender in 140 ml of freshly prepared 7 M guanidinium-HCl (Cox, R.A., Methods in Enzymology 12, 120 (1968)), 20 mM Tris-HCl, pH 7.5, 1 mM EDTA, 1% sarcosyl* at 0°C. After adding 0.5 g CsCl to each ml, the dark brown solution was heated at 65°C for 5 min., quick-cooled in ice, layered on top of a 5 ml cushion of 5.7 M CsCl, 10 mM Tris-HCl, pH 7.5, 1 mM EDTA in 1 in. x 3½ in. nitrocellulose tubes and centrifuged in an SW27 rotor (Beckman Instruments Corp., Fullerton, California) at 27,000 rpm for 16 hr at 15°C (Glisin, V., Crkvenjakov, R., and Byus, C., Biochem. 13, 2633 (1974)). After centrifugation, the tube contents were decanted, the tubes were drained, and the bottom 1/2 cm containing the clear RNA pellet was cut off with a razor blade. Pellets were transferred into a sterile erlenmeyer flask and dissolved in 20 ml 10 mM Tris-HCl, pH 7.5, 1 mM EDTA, 5% sarcosyl and 5% phenol. The solution was then made 0.1 M in NaCl and vigorously shaken with 40 ml of a 50% phenol-50% chloroform mixture. RNA was precipitated from the aqueous phase with ethanol in the presence of 0.2 M Na-acetate pH 5.5. RNA pellets were washed with 95% ethanol, dried, and dissolved in sterile water. Usually 40 g of placental tissue yielded about 30 mg of RNA from which approximately 300 µg of polyadenylated RNA was obtained after twice chromatographing on oligo-dT cellulose. See Aviv, and Leder, supra.
* Trademark, Ciba-Geigy Corp., Greensboro, N.C.
Synthesis of cDNA.
Analytical reactions were performed in 5 µl containing 50 mM Tris-HCl, pH 8.3; 0.1 mM EDTA; 7 mM MgCl2; 20 mM KCl; 10 mM β-mercaptoethanol; 40 µM dCTP (50,000 cpm 32P per pmole); 500 µM each dGTP, dATP, and dTTP; 100 µg/ml of polyadenylated RNA; 20 µg/ml oligo-dT12-18 obtained from Collaborative Research, Waltham, Mass.; and 100 units/ml reverse transcriptase from avian myeloblastosis virus. The enzyme is available from Dr. D.J. Beard, Life Science Incorporated, St. Petersburg, Florida, who produces the enzyme under contract with the National Institutes of Health, by the procedure of Kacian, D.L. and Spiegelman, S., in Methods in Enzymology 29, L. Grossman, and K. Moldave, eds., Academic Press, N.Y. (1974), p. 150. Reactions were started by the addition of enzyme at 0°C and synthesis was for 6 min at 42°C. Under these conditions approximately 10^6 cpm 32P were incorporated into TCA-precipitable material and each µg of RNA yielded about 50 ng of cDNA. To obtain enough cDNA for sequence analysis, the reaction volumes were increased to 100 µl and the dCTP concentration was raised to 250 µM (specific activity of 500 cpm 32P per pmole). Under these conditions about 200,000 cpm of 32P-labelled dCMP were incorporated into cDNA.
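The incorporation figure for the analytical reaction can be cross-checked from the stated yield and specific activity. The sketch below is only an order-of-magnitude estimate: the average nucleotide mass of 330 g/mol and the assumption that one quarter of incorporated residues are dCMP are not from the text, and the function name is illustrative.

```python
AVG_NT_MASS = 330.0  # g/mol per incorporated nucleotide; assumed average


def expected_cpm(rna_ug, cdna_ng_per_ug_rna, cpm_per_pmol, labeled_fraction=0.25):
    """Rough estimate of label incorporated into cDNA.

    labeled_fraction is the assumed share of residues that are dCMP.
    """
    cdna_ng = rna_ug * cdna_ng_per_ug_rna
    nucleotide_pmol = cdna_ng * 1000.0 / AVG_NT_MASS  # ng -> pg, pg/(g/mol) = pmol
    return nucleotide_pmol * labeled_fraction * cpm_per_pmol


# Analytical reaction: 5 microlitres at 100 micrograms RNA per ml = 0.5 microgram.
rna_ug = 100.0 * 0.005
estimate = expected_cpm(rna_ug, cdna_ng_per_ug_rna=50.0, cpm_per_pmol=50_000)
```

With these assumptions the estimate comes out just under one million cpm, consistent in magnitude with the incorporation reported for the analytical reaction.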
Restriction Endonuclease Treatment.
For restriction endonuclease digestions the analytical reactions were stopped by the addition of 20 µl of ice-cold water, boiled for 2 min, quick-cooled on ice, and made 7 mM in MgCl2. Aliquots (5 µl, about 2 x 10^5 cpm) were digested using an excess amount of restriction endonuclease(s) HaeIII or HhaI or both, for 1 hr at 37°C. HaeIII was prepared according to the method of Middleton, J.H., Edgell, M.H., and Hutchison, C.A. III, J. Virol. 10, 42 (1972). HhaI and [illegible] were obtained from New England Bio-Labs, Beverly, Mass. HaeIII is also available from the latter source. The amount of enzyme used was empirically determined to be in excess of the amount needed to completely digest an equivalent amount of restriction-sensitive DNA under identical reaction conditions. Reactions were stopped with 5 µl of 20 mM EDTA, 20% sucrose, 0.05% bromophenol blue, heated to 100°C for 1 min and then analyzed by polyacrylamide gel electrophoresis. The products were separated on a composite 4.5% - 10% polyacrylamide slab gel for 2.5 hr at 150 V in Tris-Borate-EDTA buffer (Dingman, C.W. and Peacock, A.C., supra) and visualized by autoradiography of the dry gel.
Figure 1 shows the results of gel electrophoresis and autoradiography of 32P-labeled cDNA, prepared as described above. The samples were initially spotted at the origin and migrated electrophoretically through 4.5% acrylamide and then through 10% acrylamide. A bar is placed on the left-hand side of the figure to indicate the position of the boundary between the two gel regions. Lane A represents the electrophoretic migration of the total cDNA transcript. Lane B shows the migration of HhaI treated cDNA. Lane C shows the migration of HaeIII treated cDNA. Lane D shows the electrophoretic migration of total cDNA treated with both HhaI and HaeIII. Lane E demonstrates the electrophoretic migration of the material isolated from the prominent band in Lane C. Lane F shows electrophoretic migration of isolated material from the prominent band of Lane C after treatment with HhaI. Lane G shows the electrophoretic migration of HaeIII cleaved 5'-32P end-labeled single-stranded phage M13 DNA used as a size standard, according to Horiuchi, K., and Zinder, N.D., Proc.Nat.Acad.Sci. USA 72, 2555 (1975). The approximate lengths in nucleotides of these DNA fragments are indicated by the numbers on the right.

The result in Lane A demonstrates that the cDNA transcript from term placental mRNA is heterodisperse. Treatment with HhaI, Lane B, or HaeIII, Lane C, results in the accumulation of polynucleotides of discrete length. The production of such discrete bands indicates the presence, in a heterogeneous population of cDNA transcripts, of at least one sequence present in multiple copies and having two restriction sites for HhaI and HaeIII, respectively. Cleavage with HhaI produces a fragment of about 470 nucleotides, and HaeIII digestion produces a fragment of approximately 550 nucleotides in length. Digestion by both enzymes yields three fragments designated A, 90 nucleotides long, B, 460 nucleotides long, and C, approximately 10 nucleotides long. Due to its small size, fragment C migrated off the gel under the conditions used in figure 1. The band of material appearing at the interface between 10% and 4.5% gel represents heterogeneous material which was too large to enter the 10% gel and therefore accumulated at the interface. As judged from the simple band pattern of Lane D, fragments A and B seem to originate from the same cDNA molecule. This conclusion was confirmed by elution of the larger HaeIII fragment from the gel, migrating as shown in Lane E, followed by redigestion with HhaI. Such treatment produced two fragments comigrating with the bands released by combined HaeIII and HhaI digestion of the total cDNA, as seen by comparing Lanes D and F. In the total cDNA digest, Lane D, the autoradiographic density, which is a measure of the total radioactivity present in the band, is greater for fragment A than fragment B, although the reverse may be expected on the basis of size differences. This observation suggests that fragment A is transcribed from a region closer to the 3'-end of the mRNA than is fragment B.
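The fragment sizes reported above can be checked for internal consistency: the HaeIII fragment should equal A plus B, and the HhaI fragment should equal B plus C. The sketch below uses the lengths quoted in the text; the 20-nucleotide tolerance is an assumed allowance for the approximate sizing of bands on the gel.

```python
def consistent(single_digest_length, part_lengths, tolerance=20):
    """True if the parts add up to the single-digest fragment within tolerance."""
    return abs(single_digest_length - sum(part_lengths)) <= tolerance


# Lengths in nucleotides, as quoted for figure 1.
A, B, C = 90, 460, 10
HAEIII_FRAGMENT = 550
HHAI_FRAGMENT = 470

haeiii_ok = consistent(HAEIII_FRAGMENT, [A, B])  # HaeIII piece re-cut by HhaI gives A and B
hhai_ok = consistent(HHAI_FRAGMENT, [B, C])      # HhaI piece spans B and the short fragment C
```

Both relations hold for the quoted values, which is consistent with an arrangement in which fragment B lies between A and C on the same cDNA molecule.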
Figure 2 is a schematic representation of the cDNA molecule showing the relative locations of the HaeIII and HhaI restriction sites. DNA fragments A and B, derived from the same cDNA molecule, were ordered on the basis of their relative intensity on the autoradiogram shown in figure 1, Lane D. The existence of DNA fragment C was inferred from the difference in the electrophoretic mobility of the band appearing in Lane B and Lane D of figure 1. The size of DNA fragment A is known exactly from a determination of its nucleotide sequence by the method of Maxam, A., and Gilbert, W., supra. The size of DNA fragment B was determined by comparison with the M13 DNA size markers shown in figure 1, Lane G.
The nucleotide sequences of DNA fragment A and a portion of the 5'-end of fragment B were determined by the procedure of Maxam, A. and Gilbert, W., supra. Since the amino acid sequence of HCS is known, the nucleotide sequence of the two fragments could be compared with the amino acid sequence, using the known relationships of the genetic code. On the basis of these relationships it was demonstrated that the specific sequences did in fact code for portions of the HCS molecule, and further confirmed the ordering of these fragments shown in figure 2.

EXAMPLE 2
The ability of the process of the present invention to purify a desired nucleotide sequence that is a minority proportion of the total population of nucleotide sequences is demonstrated in the following reconstruction experiment. Defined RNA mixtures containing purified rabbit globin RNA and human polyadenylated placental RNA were used as template for reverse transcriptase in the presence of alpha-32P dCTP, final specific activity, 105 cpm per pmole. The cDNA products were cleaved with endonuclease HaeIII and the cleavage products were separated on 4.5% - 10% composite polyacrylamide slab gel. The cDNA fragments were visualized by autoradiography of the dried gel.
Figure 3 shows the results of the experiments. The gels were run essentially as described in example 1. Size markers prepared by endonuclease HaeIII cleavage of phage M13 DNA and 5'-32P end labelling of the fragments thereby produced, were run in lanes A and H. The approximate lengths in nucleotides of these DNA fragments are indicated by the numbers on the left. Lanes B-G show the electrophoresis patterns produced by initiating the foregoing sequence of reactions with mixtures of globin RNA and placental RNA in varying proportions, as shown in the following table.

Lane    Globin RNA (nanograms)    Placental RNA (nanograms)
F       7.5                       292.5

It can be seen that a 320 nucleotide long HaeIII fragment is derived from globin cDNA. The globin cDNA transcript can still be detected if globin RNA represents as little as 2-5% of the total RNA. If an RNA species is present as isolated in too low a copy number to be amenable to this mode of analysis, it can be first partially purified by any one of the known RNA purification schemes until it represents about 2-5% of the remaining species mixture.
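The proportion of globin RNA in a mixture follows directly from the amounts used. The short sketch below works through the lane F values from the table; the function name is illustrative.

```python
def percent_of_total(species_ng, other_ng):
    """Share of one RNA species in a two-component mixture, in percent."""
    return 100.0 * species_ng / (species_ng + other_ng)


# Lane F: 7.5 ng globin RNA mixed with 292.5 ng placental RNA.
lane_f_percent = percent_of_total(7.5, 292.5)
within_detection_range = 2.0 <= lane_f_percent <= 5.0
```

For lane F the globin RNA is 2.5% of the total, which falls inside the 2-5% range the text gives as the lower limit of detection.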

EXAMPLE 3
The purification of a nucleotide sequence fragment approximately 550 base-pairs in length comprising a portion of the coding region for HCS is described, together with a method of measuring the purity of the isolated sequence. The purified fragment is demonstrated to be greater than 99% pure.
Purification of HCS cDNA.

Polyadenylated placental RNA isolated as described in example 1 was enriched for HCS mRNA by sedimentation in a 5% to 20% (w/v) sucrose gradient at 4°C in the SW 27 rotor of a Beckman Instruments ultracentrifuge at 25,000 rpm for 16 hours. The 11S-14S region of the gradient was pooled and 100 µg of this RNA used for the synthesis of double-stranded cDNA as described by Ullrich, A., et al., supra. Synthesis of the second strand was stopped by extraction of the reaction mixture with one volume of ethanol at -70°C.
Digestion of the cDNA with HaeIII endonuclease was carried out in 50 µl of 6 mM Tris-HCl, pH 7.5, 6 mM MgCl2, 6 mM β-mercaptoethanol with 2 units of HaeIII enzyme at 37°C for two hours, following which 0.1 units of bacterial alkaline phosphatase (type BAPF, Worthington Biochemical Corp., Freehold, N.J., units as defined by manufacturer) were added and digestion continued at 60°C for ten minutes. Following extraction with one volume of phenol-chloroform, the DNA was precipitated with two volumes of ethanol at -70°C, dissolved in 20 µl of 10 mM Tris-HCl, pH 8, 1 mM EDTA, and subjected to electrophoresis on a 6% (w/v) polyacrylamide gel. Figure 4 shows the electrophoresis pattern of the foregoing reaction mixture, which reveals a prominent band corresponding to a nucleotide sequence approximately 550 base-pairs in length. The 550 base-pair fragment was excised from the gel and eluted electrophoretically, with the result shown in figure 4(E).
The remaining material corresponding to the 550 base-pair fragment shown in figure 4(E) was digested with 4 units of HhaI endonuclease in 50 µl of the same buffer used for digestion with HaeIII endonuclease, at 37°C for 2 hours. Following phenol-chloroform extraction and ethanol precipitation, the digestion products were separated by electrophoresis on a 6% (w/v) polyacrylamide gel. The result is shown in figure 4.
The two fragments were eluted electrophoretically, combined and rejoined by incubation in 20 µl of 66 mM Tris-HCl, pH 7.6, 6 mM MgCl2, 15 mM dithiothreitol, 1 mM ATP containing 20 µg/ml of T4 DNA ligase at 15°C for two hours. The reaction mixture was then diluted to 200 µl with 0.1 M NaCl, extracted with 1 volume of phenol-chloroform and the DNA precipitated with 2 volumes of ethanol. After resuspension in 20 µl of 10 mM Tris-HCl, pH 8, 1 mM EDTA, the ligation products were separated by electrophoresis in the 6% (w/v) polyacrylamide gel. The result is shown in figure 4(C). It can be seen from the electrophoresis pattern of figure 4(C) that the 550 nucleotide fragment was reconstituted by the ligation treatment. The prior treatment with alkaline phosphatase
insured that the two HhaI fragments were rejoined in the original sequence relative to each other to reconstitute the 550 nucleotide segment. The additional bands seen in figure 4(C) were the result of dimer formation between the HhaI fragments, since dimer formation is not prevented by the alkaline phosphatase treatment.
The reconstituted 550 nucleotide fragment was excised from the gel and eluted electrophoretically. The electrophoresis pattern of the eluted material is shown in figure 4(B). Figure 4 also shows the electrophoresis pattern of a 32P-labeled HaeIII digest of double-stranded M13 DNA used as a size marker. The electrophoretic analyses were conducted in a 6% (w/v) polyacrylamide gel in 50 mM Tris-borate, pH 8, 1 mM EDTA
at 100 volts for two hours. Following electrophoresis, the gel was dried and exposed to Kodak NS2T x-ray film to produce the autoradiograms.
Purity of Reconstituted 550 Nucleotide Fragment of HCS cDNA.
The isolated reconstituted HCS cDNA HaeIII fragment was labeled with 32P at its 5' ends using the enzyme polynucleotide kinase obtained from bacteriophage T4-infected E. Coli by the method of Panet, A., et al., Biochemistry 12, 5045 (1973). Polynucleotide kinase is also commercially available from P-L Biochemical, Milwaukee, Wisconsin. The fragment was then digested with either HhaI or HpaII in 50 µl of 6 mM
Tris-HCl, pH 7.6, 6 mM MgCl2, 6 mM β-mercaptoethanol at 37°C for two hours. Following extraction with an equal volume of phenol-chloroform, the DNA was precipitated with two volumes of ethanol at -70°C, resuspended in 20 µl of 10 mM Tris-HCl, pH 8, 1 mM EDTA and subjected to electrophoresis. The gel was exposed to x-ray film to visualize the labeled fragments, as described previously.
Results are shown in figure 5. Figures 5(B) and 5(E) represent duplicate runs of the 550 nucleotide fragment prior to restriction enzyme digestion. Figure 5(C) represents the pattern resulting from HhaI cleavage and figure 5(D) represents the pattern resulting from HpaII cleavage.
The purity of the 550 nucleotide fragment was measured by scanning the autoradiogram of the restriction enzyme cleavage products and by quantitation of the distribution of radioactivity in each of the two restriction endonuclease digests. Such measurements reveal that the purified human HCS cDNA reconstituted HaeIII fragment was greater than 99% homogeneous.
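
One way to turn a radioactivity distribution into a homogeneity figure is sketched below. It rests on assumptions not stated in the text: that homogeneity is taken as the share of total counts found in the bands expected for the desired fragment, and that the band names and counts shown are invented purely for illustration (they are not measurements from this example).

```python
def homogeneity(band_counts: dict, expected_bands: set) -> float:
    """Share of total radioactivity found in the bands expected for the fragment."""
    total = sum(band_counts.values())
    if total <= 0:
        raise ValueError("no radioactivity recorded")
    in_expected = sum(c for name, c in band_counts.items() if name in expected_bands)
    return in_expected / total


# Hypothetical counts (cpm) for one digest of a 5'-end-labeled fragment.
digest = {"end fragment 1": 5200.0, "end fragment 2": 4750.0, "other": 50.0}
purity = homogeneity(digest, {"end fragment 1", "end fragment 2"})
print(f"{purity:.1%}")  # 99.5%
```

Repeating the calculation for a second enzyme, as the text does with HhaI and HpaII, guards against a contaminant that happens to co-migrate in one digest.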

EXAMPLE 4
Synthesis of a plasmid containing a nucleotide sequence of 550 base-pairs comprising the majority of the coding region for HCS is described.
A 550 nucleotide fragment of HCS cDNA of greater than 99% purity was prepared as described in example 3. Terminal 5' phosphate end groups were restored in a reaction mixture containing 50 mM Tris-HCl, pH 8.5, 10 mM MgCl2, 0.1 mM spermidine, 5 mM β-mercaptoethanol, 5% (w/v) glycerol, 333 pmole ATP, 5 units of T4 polynucleotide kinase incubated in a final volume of 40 µl at 37°C for two hours. DNA was separated from the reaction mixture by phenol extraction followed by ethanol precipitation. Synthetic decanucleotide linkers having restriction site specificity for EcoRI and having the sequence 5'-CCGAATTCGG-3', prepared according to Scheller, et al., supra, were then ligated to the HCS DNA in a molar ratio of approximately 50:1 in 50 µl of 66 mM Tris-HCl, pH 7.6, 9 mM MgCl2, 15 mM dithiothreitol, 1 mM ATP and 20 µg/ml T4 DNA ligase. Linkers are commercially available from Collaborative Research, Waltham, Massachusetts. After incubation at 4°C for 18 hours, the reaction was stopped by extraction with phenol-chloroform. The ligation products were precipitated with ethanol, redissolved in 50 µl 100 mM
NaCl, 50 mM Tris-HCl, pH 7.6, 7 mM MgCl2, and digested with 50 units EcoRI endonuclease at 37°C for 2 hours. Digestion with the endonuclease resulted in cleavage at the EcoRI site of the decamers giving rise to HCS cDNA with EcoRI cohesive ends as well as cleaved unreacted decanucleotides and self-ligated decanucleotides. As the cleaved decamers also contained EcoRI termini and would compete with the HCS cDNA for recombination with the similarly cleaved plasmid, the HCS cDNA was isolated by gel electrophoresis before reaction with the transfer vector.
The use of the foregoing decanucleotide linker has the advantage that the HCS cDNA fragment may be reisolated from the plasmid in a form identical to that of the original fragment.
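
Two properties of the decanucleotide linker can be checked with a short sketch: that it is self-complementary, so that two copies anneal into a blunt-ended duplex, and that it carries the EcoRI recognition sequence. The helper names are hypothetical; the EcoRI site GAATTC and its cut position after the first G are general knowledge rather than statements taken from this text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def cut_top_strand(seq: str, site: str, offset: int):
    """Split the top strand at `offset` bases into the first occurrence of `site`."""
    start = seq.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    return seq[:start + offset], seq[start + offset:]


ECORI_LINKER = "CCGAATTCGG"  # decamer quoted in the text
ECORI_SITE = "GAATTC"        # EcoRI cuts between G and AATTC
left, right = cut_top_strand(ECORI_LINKER, ECORI_SITE, offset=1)
```

After cleavage the strand beginning `AATTCGG` stays attached to the cDNA, giving the four-base AATT cohesive end that pairs with the EcoRI-cut plasmid.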

The transfer vector employed was the bacterial plasmid pMB-9, a 3.5 x 10^6 molecular weight molecule containing a single EcoRI site, prepared as described by Rodriguez, R.L., Bolivar, F., Goodman, H.M., Boyer, H.W. and Betlach, M. in ICN-UCLA Symposium On Molecular and Genetic Biology, D.P. Nierlich, W.J. Rutter, and C.F. Fox, Eds. (Academic Press, New York, 1976), pp 471-477. The plasmids pMB-9 and pBR-322 (Example 5) are commercially available from Bethesda Research Labs, Rockville, Maryland. Infection of E. Coli with pMB-9 confers resistance to tetracycline. Incorporation of DNA into the EcoRI site of pMB-9 does not affect the tetracycline resistance or any other known property of the plasmid.
Consequently, there are no phenotypic differences between recombinant and normal plasmids. Therefore the EcoRI cut pMB-9 was first treated with alkaline phosphatase, according to a method described in detail in Canadian application Serial no. 303,972. See also, Ullrich, et al., supra. Alkaline phosphatase treatment removes the 5' phosphates from the EcoRI-generated ends of the plasmid and prevents self-ligation of the plasmid DNA, insuring that circle formation and hence transformation is dependent on the insertion of a DNA fragment containing 5' phosphorylated termini.
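
The selection logic can be expressed as a small model. It is a deliberate simplification and an assumption of this sketch: each junction in a prospective circle is treated as joinable when at least one of the two ends meeting there carries a 5' phosphate, and `can_circularize` is a hypothetical name.

```python
def can_circularize(fragments_have_5p: list) -> bool:
    """True if every junction around the circle has a 5' phosphate on one side.

    Each list entry describes one fragment in order around the circle and
    states whether both of its ends carry 5' phosphates.
    """
    n = len(fragments_have_5p)
    if n == 0:
        raise ValueError("at least one fragment is required")
    return all(
        fragments_have_5p[i] or fragments_have_5p[(i + 1) % n]
        for i in range(n)
    )


untreated_vector_alone = can_circularize([True])             # self-ligates
treated_vector_alone = can_circularize([False])              # cannot close
treated_vector_plus_insert = can_circularize([False, True])  # insert supplies 5' phosphates
```

Under this model only the phosphatase-treated vector joined to a phosphorylated insert closes into a circle, which is why transformants are expected to carry an insert.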
The alkaline phosphatase treatment was carried out in a reaction mixture at the level of 1.0 enzyme units/mg of plasmid DNA in 25 mM Tris-HCl, pH 8, for 30 minutes at 65°C, followed by phenol extraction to remove the phosphatase, and ethanol precipitation of the DNA. Ligation of HCS cDNA to pMB-9 treated as described was carried out in 50 µl reactions containing 60 mM Tris-HCl, pH 8, 10 mM β-mercaptoethanol, 8 mM MgCl2, between 10 and 50 ng of the purified HCS cDNA
and approximately 500 ng of EcoRI-cleaved 5' dephosphorylated plasmid DNA. Reactions were begun by addition of T4 DNA ligase to 5 µg/ml, allowed to proceed at 15°C for 1 hour and the mixture diluted to 0.25 ml with 120 mM NaCl, 1 mM EDTA. The diluted reaction mixture was used directly for transformation of E. Coli X-1776.
E. Coli X-1776 is a host strain especially developed for recombinant DNA work, certified by NIH as an EK-2 host under the Federal guidelines. The strain is available from Dr. Roy Curtiss III, University of Alabama, Department of Microbiology, Birmingham, Alabama. The bacteria were grown in 150 ml of nutrient broth supplemented with 100 µg/ml diaminopimelic acid (DAP) and 40 µg/ml thymine to a cell density of approximately 2 x 10^8 cells/ml. The cells were harvested by centrifugation and washed in 60 ml of 10 mM NaCl, recentrifuged and resuspended in 60 ml of transformation buffer containing 10 mM Tris-HCl, pH 8, 140 mM NaCl, 75 mM
CaCl2. The cell suspension was kept on ice for 15 minutes, the cells collected by centrifugation and resuspended in 1.5 ml of the same transformation buffer. The cell suspension, 0.5 ml, was added to 0.25 ml of diluted ligation reaction mixture and incubated on ice for 15 minutes, then transferred to 25°C for 4 minutes, then on ice again for 30 minutes. The cell suspension, 0.2 ml, was plated directly onto nutrient agar plates supplemented with 100 µg/ml DAP
and 40 µg/ml thymine and 20 µg/ml tetracycline. Four transformants were obtained, all of which contained a 550 base-pair insertion which was released from the plasmid DNA
by either EcoRI or HaeIII endonuclease digestion.
A transformant clone designated pHCS-1 was selected for sequence analysis. E. Coli X-1776--pHCS-1 was grown in suitable nutrient medium, plasmid DNA was isolated therefrom and cleaved with EcoRI endonuclease. The 550 base-pair insertion was isolated from linear pMB-9 by electrophoresis in a 6% polyacrylamide gel and subjected to a DNA sequence analysis using the procedure of Maxam and Gilbert, supra.
Sub-fragments of the HCS DNA were prepared by incubation with ~II restriction endonuclease and the 5' termini were labeled using γ-32P-ATP and polynucleotide kinase. Following the sequence analysis procedure of Maxam and Gilbert, the nucleotide sequence of cloned HCS-DNA was determined.
By comparison with the known amino acid sequence of HCS, the 557 nucleotide sequence represented that portion of the coding region of HCS mRNA from amino acids 24 to 191, plus 50 nucleotides of the 3'-untranslated region. See Niall, H.D., Hogan, M.L., Sauer, R., Rosenblum, I.Y. and Greenwood, F.C., Proc.Nat.Acad.Sci. USA 68, 866 (1971). The primary structure of HCS mRNA as determined from the DNA sequence of cloned fragment pHCS-1 is shown in Table 3, together with the amino acid sequence predicted therefrom on the basis of the known genetic code. The amino acid sequence determined from the nucleotide sequence is identical with the previously published amino acid sequence determined by chemical means. This demonstrates that the initially isolated HCS mRNA has been copied in vitro with high fidelity and that the cloned HCS DNA fragment was replicated with high fidelity in the transformed bacteria.
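
The stated length of 557 nucleotides can be reconciled with the stated content by simple arithmetic. The sketch below assumes, as an interpretation not spelled out in the text, that the three nucleotides beyond the codons and the untranslated region are a termination codon; `expected_length` is a hypothetical helper.

```python
def expected_length(first_aa: int, last_aa: int, utr_nt: int,
                    include_stop: bool = True) -> int:
    """Nucleotides needed for codons first_aa..last_aa plus an untranslated tail."""
    codons = last_aa - first_aa + 1
    stop_nt = 3 if include_stop else 0  # assumed termination codon
    return 3 * codons + stop_nt + utr_nt


with_stop = expected_length(24, 191, utr_nt=50)                          # 557
without_stop = expected_length(24, 191, utr_nt=50, include_stop=False)   # 554
```

Amino acids 24 to 191 account for 168 codons, or 504 nucleotides; adding one termination codon and the 50 untranslated nucleotides gives 557.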

Table 3 Nucleotide sequence of one strand of HCS DNA from cloned pHCS-1. The numbers refer to the amino acid sequence beginning at the amino terminus. The DNA sequence shown corresponds to the mRNA sequence for HCS, except that U replaces T in the mRNA. The amino acid sequence from positions 1 through 23 is also shown.

[Table 3 sequence listing not legible in the source scan.]

EXAMPLE 5
The purification of DNA whose nucleotide sequence comprises most of the coding region for HGH is described, together with the synthesis of a plasmid transfer vector containing the purified DNA and the construction of a microorganism strain having the DNA as part of its genetic makeup.
HGH was purified essentially as described for HCS in Example 3, except as noted below.
Five benign human pituitary tumors, quick-frozen in liquid nitrogen after surgical removal, weighing 0.4 g to 1.5 g each, were thawed and homogenized in 4 M guanidinium thiocyanate containing 1 M mercaptoethanol buffered to pH 5.0 at 4°C. The homogenate was layered over 1.2 ml 5.7 M CsCl containing 100 mM EDTA and centrifuged for 18 hours at 37,000 rpm in the SW 50.1 rotor of a Beckman ultracentrifuge at 15°C (Beckman Instrument Company, Fullerton, California). RNA travelled to the bottom of the tube. Further purification, using an oligo-dT column and sucrose gradient sedimentation, was as described previously in Examples 1 and 3.
About 10% of the RNA thus isolated coded for growth hormone, as judged by incorporation of a radioactive amino acid precursor into anti-growth hormone precipitable material in a cell-free translation system derived from wheat germ. See Roberts, B.E. and Patterson, B.M., Proc.Nat.Acad.Sci. USA 70, 2330 (1973). Single-stranded cDNA and double-stranded cDNA were synthesized as described in Example 3. HGH cDNA was then treated with restriction endonuclease HaeIII and alkaline phosphatase as described in Example 3, then fractionated by gel electrophoresis. A distinct band in a position corresponding to about 550 nucleotides in length was observed, and isolated for further purification.
For further purification, the previously described technique of dividing the DNA into sub-fragments and separately purifying and recombining the sub-fragments was carried out as previously described, except that for HGH, the restriction endonuclease PvuII was used to produce two sub-fragments of approximately 490 and approximately 60 nucleotides length, respectively. All restriction enzymes used herein are commercially available from New England BioLabs, Beverly, Massachusetts. The religated product, about 550 base-pairs in length, was greater than 99% pure as judged by sub-fractionation in four separate restriction endonuclease systems.
Synthesis of a recombinant transfer vector containing HGH DNA was carried out essentially as described in Example 4 except that the decanucleotide linkers and plasmid employed were different. A decanucleotide linker having Hind III specificity was employed, sequence 5'-CCAAGCTTGG-3'. Treatment with HsuI yielded HGH cDNA with cohesive ends. HsuI and Hind III have the same site specificity and may be used interchangeably. The plasmid pBR-322 was used as the transfer vector.
This plasmid confers host resistance to the antibiotics ampicillin and tetracycline. DNA insertions into the Hind III site have been found to reduce or abolish tetracycline resistance. Recombinants were therefore selected by growth on nutrient plates containing ampicillin, and by their inability to grow on 20 µg/ml of tetracycline. HGH-cDNA was
recombined with HsuI-cleaved alkaline phosphatase-treated pBR-322, under conditions essentially as described in Example 4.
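
The two-antibiotic screen amounts to a simple filter over colony phenotypes. The following sketch uses invented colony names and growth results solely to show the logic; `Colony` and `candidate_recombinants` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colony:
    name: str
    grows_on_ampicillin: bool
    grows_on_tetracycline: bool


def candidate_recombinants(colonies):
    """Keep colonies that carry the plasmid (ampicillin-resistant) but have
    lost tetracycline resistance, as expected after insertion at Hind III."""
    return [
        c for c in colonies
        if c.grows_on_ampicillin and not c.grows_on_tetracycline
    ]


# Hypothetical plate results, for illustration only.
plate = [
    Colony("colony-1", True, False),   # candidate recombinant
    Colony("colony-2", True, True),    # plasmid without insert
    Colony("colony-3", False, False),  # no plasmid
]
selected = [c.name for c in candidate_recombinants(plate)]
```

The screen identifies candidates only; as the text goes on to show, the presence of the insert is still confirmed by releasing it with the restriction enzyme.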
Products of the ligase reaction were used to transform E. Coli X-1776 under conditions as described in Example 4. Seven colonies were isolated based upon their ability to grow in the presence of ampicillin and their inability to grow in the presence of tetracycline. Five of the seven colonies carried the recombinant plasmid containing the approximately 550 base-pair portion of HGH DNA. One of the bacterial strains, pHGH-1, carrying HGH DNA as part of its genetic makeup, was grown in quantity to provide a source of plasmid DNA from which the HGH DNA could be reisolated by treatment with Hind III or HsuI. This isolated HGH
DNA, having undergone many replications, was subjected to sequence analysis as described in Example 4. The results are shown in Table 4.

Table 4 Nucleotide sequence of one strand of HGH-DNA of cloned pHGH-1. The numbers refer to the amino acid sequence of HGH beginning at the amino terminus. The DNA sequence shown corresponds to the mRNA sequence for HGH, except that U replaces T in the mRNA.

[Table 4 sequence listing not legible in the source scan.]

EXAMPLE 6
The isolation and purification of DNA having the entire structural gene sequence for RGH is described, together with the synthesis of a transfer vector containing the entire structural gene for RGH and the construction of a microorganism strain containing the gene for RGH as part of its genetic makeup.
Where genes of non-human origin are involved, the Federal safety restrictions do not require the isolation of cDNA in such a high degree of purity as that required for human cDNAs. Therefore, it was possible to isolate the cDNA containing the entire RGH structural gene by isolating electrophoretically separated DNA of the expected length, about 800 base-pairs, as determined from the known amino acid length of RGH.
Cultured rat pituitary cells, a sub-clone of the cell line GH-1, available from American Type Culture Collection, were used as a source of RGH mRNA. See Tashjian, A.H., et al., Endocrinology 82, 342 (1968). In such cells, when grown in normal conditions, growth hormone mRNA represents only a small percentage (1-3%) of the total poly-A containing RNA. However, growth hormone mRNA levels were raised above that of other cellular mRNA species by the synergistic action of thyroid hormones and glucocorticoids.
RNA was obtained from 5 x 10^8 cells grown in suspension culture and induced for growth hormone production by including 1 ~I dexamethasone and 10 nM L-triiodothyronine in the medium for 4 days before cell collection.
Polyadenylated RNA was isolated from the cytoplasmic membrane fraction of the cultured cells, as described elsewhere. See Martial, J.A., Baxter, J.D., Goodman, H.M. and Seeburg, P.H., Proc.Nat.Acad.Sci. USA 74, 1816 (1977), and Bancroft, F.C., Wu, G. and Zubay, G., Proc.Nat.Acad.Sci. USA 70, 3646 (1973). The mRNA was further purified and transcribed into double-stranded cDNA essentially as described in examples 1 and 3, supra. Upon fractionation by gel electrophoresis, a faint but distinct band corresponding to a DNA of about 800 base-pairs length was observed.
Treatment of total cDNA transcribed from the cultured pituitary cell mRNA with HhaI endonuclease yielded two major DNA fragments upon electrophoretic separation corresponding to approximately 320 nucleotides (fragment A) and ~0 nucleotides (fragment B). Nucleotide sequence analysis of fragments A and B as described in example 4 revealed that these fragments were in fact portions of the coding region for RGH, based on published RGH amino acid sequence data and by comparison with other known growth hormone sequences. See Wallis, M. and Davies, R.V.N., Growth Hormone And Related Peptides (Eds., Pecile, A. and Muller, E.E.), pp 1-14 (Elsevier, New York, 1976), and Dayhoff, M.O., Atlas of Protein Sequence and Structure, 5, suppl. 2, pp 120-121 (National Biomedical Research Foundation, Washington, D.C., 1976). When the 800 base-pair double-stranded cDNA isolated electrophoretically as described, supra, was similarly subjected to HhaI endonuclease treatment, two fragments corresponding in length to fragments A and B were found among the major cleavage products.
Since the approximately 800 base-pair RGH-cDNA was not purified by resort to restriction endonuclease treatment, it was necessary to treat the DNA in order to remove any unpaired single-strand ends. In practice, treatment to remove such unpaired ends was carried out prior to electrophoretic separation in 25 µl of 60 mM Tris-HCl, pH 7.5, 8 mM MgCl2, 10 mM β-mercaptoethanol, 1 mM ATP and 200 µM each of dATP, dTTP, dGTP
and dCTP. The mixture was incubated with 1 unit of E. Coli DNA polymerase I at 10°C for 10 minutes to exonucleolytically remove any 3' protruding ends and to fill any 5' protruding ends. DNA polymerase I
is commercially available from Boehringer-Mannheim Biochemicals, Indianapolis, Indiana.
The approximately 800 base-pair RGH-cDNA was treated by the addition of chemically synthesized Hind III linkers, as described in Example 4.
The plasmid pBR-322, pretreated with Hind III endonuclease and alkaline phosphatase, as described in Example 5, was combined with the 800 base-pair RGH-cDNA in a DNA ligase reaction mixture as described in Example 4.
The ligase reaction mixture was used to transform a suspension of E. Coli X-1776 cells, treated as previously described in Example 4. Recombinant colonies were selected as described in Example 5. Ten such colonies were obtained all of which carried plasmid with an insert of approximately 800 base-pairs that was released by Hind III cleavage.

The 800 base-pair RGH-DNA was isolated in preparative amounts from recombinant clone pRGH-1 and its nucleotide sequence determined as described in Example 4. In this instance, the nucleotide sequence included portions of the 5' untranslated region of RGH, as well as a 26 amino acid sequence found in the growth hormone precursor protein prior to secretion. The mRNA sequence deduced from the gene sequence is shown in Table 5. The predicted amino acid sequence is in good agreement, except in positions 1 and 8, with the partial amino acid sequence of rat growth hormone as described by Wallis and Davies, supra, which comprises residues 1-43, 65-69, 108-113, 133-143 and 150-190.
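
The relation between amino acid length and the expected cDNA length of about 800 base-pairs can be illustrated with a rough calculation. The sketch assumes a mature hormone of 190 residues (the highest residue number cited above), the 26-residue precursor segment, and a single termination codon; these are working assumptions of the sketch, and `min_coding_nt` is a hypothetical helper.

```python
def min_coding_nt(mature_aa: int, pre_aa: int = 0, stop_codons: int = 1) -> int:
    """Nucleotides required to encode the precursor plus termination codon(s)."""
    return 3 * (mature_aa + pre_aa + stop_codons)


coding = min_coding_nt(mature_aa=190, pre_aa=26)  # 651
observed_length = 800                             # approximate size of the cDNA band
non_coding = observed_length - coding             # roughly 149 nt left for untranslated regions
```

On these assumptions about 650 of the 800 base-pairs are coding, leaving on the order of 150 for the 5' and 3' untranslated regions.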

EXAMPLE 7
The isolation and purification of the entire gene sequence coding for HGH is described, together with the synthesis of a recombinant plasmid containing the entire structural gene for HGH, and the production of a microorganism having the entire structural gene for HGH as part of its genetic makeup.
The isolation of HGH mRNA is carried out essentially as described in Example 6, except that the biological source material is human pituitary tumor tissue, essentially as described in Example 5. Preparation of HGH-cDNA is carried out essentially as described in Example 6. The HGH-cDNA is fractionated by gel electrophoresis and material migrating to a position corresponding to about 800 nucleotides in length is selected for cloning. The selected fraction is treated with DNA polymerase I as described in Example 6, then treated by the end addition of Hind III
linkers. The cDNA is then recombined with alkaline phosphatase-treated plasmid pBR-322 using DNA ligase. E. Coli X-1776 is transformed with the recombinant DNA and a strain containing HGH DNA is selected. The HGH-DNA containing strain is grown in preparative amounts, the HGH-DNA
isolated therefrom and the nucleotide sequence thereof determined. The cloned HGH DNA is found to comprise nucleotides coding for the entire amino acid sequence of HGH. The first twenty-three amino acids of HGH
are H2N-Phe-Pro-Thr-Ile-Pro-Leu-Ser-Arg-Leu-Phe-Asp-Asn-Ala-Met-Leu-Arg-Ala-His-Arg-Leu-His-Gln-Leu-. The remainder of the sequence is shown in Table 4.

Table 5 DNA nucleotide sequence of one strand, containing entire sequence coding for RGH. Corresponding amino acids are shown, together with their position number relative to the amino terminus. Negatively numbered amino acids represent the pre-growth hormone sequence. The corresponding mRNA sequence is the same, except that U replaces T in the mRNA.

[Table 5 sequence listing not legible in the source scan.]

GENERAL CONCLUDING REMARKS
The process of the present invention provides for the first time a method of general applicability for purifying desired specific nucleotide sequences. These sequences may be correlated with the production of a specific protein of commercial or medical significance. The disclosed process results in the purification of nucleotide sequences which may be fragments of a larger sequence coding for the desired protein.
The present method may be used in combination with known ancillary procedures to produce the entire nucleotide sequence coding for a specific protein.
In addition, a method has been disclosed whereby a nucleotide sequence of specific length, however derived, may be highly purified.
A method for measuring the degree of purity of such fragments is also disclosed. By these means, a nucleotide sequence coding for a portion of human HCS has been isolated, purified and shown to be at least 99%
pure.
Transfer vectors containing most of the nucleotide sequence coding for HCS, most of the sequence coding for HGH and all of the sequence coding for RGH, respectively, have been synthesized. Novel microorganism strains containing the foregoing genes and portions of genes have been produced. The foregoing nucleotide sequences have been reisolated after many cycles of replication in the host microorganism and found to contain essentially the identical nucleotide sequence to that existing in the source organism. The techniques disclosed herein for isolation, purification and identification of a desired specific nucleotide sequence make it possible to synthesize transfer vectors, and develop microorganism strains, containing the structural gene for the growth hormone of any animal species including man.

On the basis of the genetic code, there exists a finite set of nucleotide sequences which can genetically code for a given amino acid sequence. All such equivalent nucleotide sequences are operable variants of the disclosed sequences, since all give rise to the same protein hormone, having the same amino acid sequence, during the course of in vivo transcription and translation.
Consequently, all such variants are included in the scope of the present invention.
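
The size of this finite set can be illustrated by counting synonymous codons. The sketch below uses the standard genetic code and, as an arbitrary example, the first eight residues of HGH given in Example 7; `count_encodings` is a hypothetical helper.

```python
from collections import Counter
from math import prod

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of codons specifying each amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(peptide: str) -> int:
    """Number of distinct nucleotide sequences encoding the peptide (one-letter code)."""
    for aa in peptide:
        if aa not in DEGENERACY or aa == "*":
            raise ValueError(f"not an amino acid code: {aa!r}")
    return prod(DEGENERACY[aa] for aa in peptide)


# Phe-Pro-Thr-Ile-Pro-Leu-Ser-Arg: 2 * 4 * 4 * 3 * 4 * 6 * 6 * 6 variants.
variants = count_encodings("FPTIPLSR")
print(variants)  # 82944
```

Even for eight residues the set of equivalent sequences is large, though always finite, which is the point made in the paragraph above.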
While the invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications and this application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth, and as follows in the scope of the appended claims.


Claims

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

1. A recombinant DNA transfer vector comprising codons for human chorionic somatomammotropin comprising the nucleotide sequence:

wherein A is deoxyadenyl, G is deoxyguanyl, C is deoxycytosyl, T is thymidyl, J is A or G;
K is T or C;
L is A, T, C or G;
M is A, C or T;

Xn is T or C, if Yn is A or G, and C if Yn is C or T;
Yn is A, G, C or T, if Xn is C, and A or G if Xn is T;
Wn is C or A, if Zn is G or A, and C if Zn is C or T;
Zn is A, G, C or T, if Wn is C, and A or G if Wn is A;
QRn is TC, if Sn is A, G, C or T, and AG if Sn is T or C;
Sn is A, G, C or T, if QRn is TC, and T or C if QRn is AG; and subscript numerals, n, refer to the amino acid position in human chorionic somatomammotropin, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.
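
The paired definitions above (Xn with Yn, Wn with Zn, QRn with Sn) each state one constraint twice, once from each side. The following is a minimal illustrative sketch, not part of the claim, that enumerates the combinations each rule allows and checks that the two statements of each rule agree. The function names are hypothetical. That each pair yields six combinations is consistent with the six codons of leucine, arginine and serine in the standard genetic code, although the assignment of these symbols to particular amino acids is an assumption here, since the sequence itself is not reproduced above.

```python
NUCLEOTIDES = ("A", "G", "C", "T")


def xy_pairs():
    """Allowed (X, Y) pairs, built from each of the two stated rules."""
    # "Xn is T or C, if Yn is A or G, and C if Yn is C or T"
    from_x = {(x, y) for y in NUCLEOTIDES
              for x in (("T", "C") if y in ("A", "G") else ("C",))}
    # "Yn is A, G, C or T, if Xn is C, and A or G if Xn is T"
    from_y = {(x, y) for x in ("C", "T")
              for y in (NUCLEOTIDES if x == "C" else ("A", "G"))}
    assert from_x == from_y  # the two statements describe the same set
    return from_x


def wz_pairs():
    """Allowed (W, Z) pairs, built from each of the two stated rules."""
    # "Wn is C or A, if Zn is G or A, and C if Zn is C or T"
    from_w = {(w, z) for z in NUCLEOTIDES
              for w in (("C", "A") if z in ("G", "A") else ("C",))}
    # "Zn is A, G, C or T, if Wn is C, and A or G if Wn is A"
    from_z = {(w, z) for w in ("C", "A")
              for z in (NUCLEOTIDES if w == "C" else ("A", "G"))}
    assert from_w == from_z
    return from_w


def qrs_triples():
    """Allowed (QR, S) combinations, built from each of the two stated rules."""
    # "QRn is TC, if Sn is A, G, C or T, and AG if Sn is T or C"
    from_qr = {("TC", s) for s in NUCLEOTIDES} | {("AG", s) for s in ("T", "C")}
    # "Sn is A, G, C or T, if QRn is TC, and T or C if QRn is AG"
    from_s = {(qr, s) for qr in ("TC", "AG")
              for s in (NUCLEOTIDES if qr == "TC" else ("T", "C"))}
    assert from_qr == from_s
    return from_qr


if __name__ == "__main__":
    for name, combos in (("X/Y", xy_pairs()), ("W/Z", wz_pairs()),
                         ("QR/S", qrs_triples())):
        print(name, len(combos), sorted(combos))
```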

2. The recombinant DNA transfer vector of claim 1 wherein J is A in amino acid positions: 32, 33, 66, 68, 70, 119, 122 and 129;
J is G in amino acid positions: 29, 30, 38, 40, 41, 49, 65, 69, 74, 84, 88, 115, 118, 137, 140, 141, 145, 158, 168, 172, 174, 181 and 186;
K is T in amino acid positions: 31, 35, 42, 46, 72, 103, 109, 111, 146, 153 and 189;
K is C in amino acid positions: 26, 28, 39, 44, 47, 52, 53, 54, 56, 63, 92, 97, 99, 100, 104, 107, 110, 112, 116, 130, 143, 147, 149, 151, 152, 154, 159, 160, 164, 165, 166, 169, 171, 176, 182 and 191;
L is A in amino acid positions: 37, 60, 148, 155 and 175;
L is T in amino acid position: 135;
L is G in amino acid positions: 59, 67, 90, 102, 123, 126, 136, 161, 180 and 185;
L is C in amino acid positions: 24, 27, 34, 50, 61, 89, 98, 105, 120, 131, 142, 173, 187 and 190;
M is T in amino acid positions: 25 and 58;
M is C in amino acid positions: 36, 78, 83, 121 and 138;

X is C;
Y is A in amino acid positions: 73, 114 and 117;
Y is G in amino acid positions: 45, 75, 80, 81, 87, 101, 124, 128, 156, 162 and 177;
Y is C in amino acid positions: 76, 82, 93, 113, 139, 157, and 163;
W is A in amino acid positions: 94, 127 and 167;
W is C in amino acid positions: 77, 91, 133, 134, 178 and 183;
Z is G in amino acid positions: 91, 94, 127, 134 and 167;
Z is C in amino acid positions: 77, 133, 178 and 183;
QR is AG in amino acid positions: 95, 108, 132, 144 and 188;
QR is TC in amino acid positions: 43, 48, 51, 55, 57, 62, 71, 79, 85, 106, 150 and 184;
S is A in amino acid position: 55;
S is T in amino acid positions: 57, 95 and 184;
S is G in amino acid positions: 43, 85, 106 and 150;
and S is C in amino acid positions: 48, 51, 62, 71, 79, 108, 132, 144 and 188.

3. A recombinant DNA transfer vector comprising codons for human growth hormone, comprising the nucleotide sequence:

wherein A is deoxyadenyl, G is deoxyguanyl, C is deoxycytosyl, T is thymidyl, J is A or G;
K is T or C;
L is A, T, C or G;
M is A, C or T;
Xn is T or C, if Yn is A or G, and C if Yn is C or T;
Yn is A, G, C or T, if Xn is C, and A or G if Xn is T;
Wn is C or A, if Zn is G or A, and C if Zn is C or T;
Zn is A, G, C or T, if Wn is C, and A or G if Wn is A;
QRn is TC, if Sn is A, G, C or T, and AG if Sn is T or C;
Sn is A, G, C or T, if QRn is TC, and T or C if QRn is AG; and subscript numerals, n, refer to the amino acid position in human growth hormone, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.

4. The recombinant DNA transfer vector of claim 3 wherein J is A in amino acid positions: 32, 33, 39, 66, 68, 70, 119, 122 and 129;
J is G in amino acid positions: 29, 30, 38, 40, 41, 46, 49, 56, 65, 69, 74, 84, 88, 91, 115, 118, 137, 140, 141, 145, 158, 168, 172, 174, 181 and 186;
K is T in amino acid positions: 25, 31, 35, 42, 53, 111, 153 and 189;
K is C in amino acid positions: 26, 28, 44, 47, 54, 63, 72, 92, 97, 99, 100, 103, 107, 109, 112, 116, 130, 139, 143, 146, 147, 149, 151, 152, 154, 159, 160, 164, 165, 166, 169, 171, 176, 182 and 191;
L is A in amino acid positions: 37, 60, 67, 148, 155 and 175;
L is T in amino acid position: 135;
L is G in amino acid positions: 59, 90, 102, 123, 126, 136, 161, 180 and 185;
L is C in amino acid positions: 24, 27, 34, 48, 50, 61, 89, 96, 98, 104, 105, 110, 120, 131, 133, 142, 173, 187 and 190;
M is T in amino acid position: 58;
M is C in amino acid positions: 36, 78, 83, 121, 138 and 179;

X is C;
Y is A in amino acid positions: 73, 114, 117 and 156;
Y is G in amino acid positions: 45, 75, 80, 81, 87, 101, 124, 128, 162 and 177;
Y is C in amino acid positions: 52, 76, 82, 91, 113, 157 and 163;
W is A in amino acid positions: 64, 94, 127 and 167;
W is C in amino acid positions: 77, 134, 178 and 183;
Z is G in amino acid positions: 64, 94, 127, 134 and 167;
Z is C in amino acid positions: 77, 178 and 183;
QR is AG in amino acid positions: 95, 108, 132, 144 and 188;
QR is TC in amino acid positions: 43, 51, 55, 57, 62, 71, 79, 85, 106, 150 and 184;
S is A in amino acid positions: 43, 55 and 150;
S is T in amino acid positions: 57, 95, 106 and 184;
S is G in amino acid position: 85;
and S is C in amino acid positions: 51, 62, 71, 79, 108, 132, 144 and 188.

5. A transfer vector according to claim 3 comprising in addition the nucleotide sequence, and wherein Y23 is followed in sequence by GCL24 in the sequence of claim 3.

6. A microorganism containing and replicating the transfer vector of claim 3.

7. A microorganism containing and replicating the transfer vector of claim 1.

8. A recombinant DNA transfer vector comprising the nucleotide sequence coding for the human growth hormone and capable of transforming a microorganism, synthesized by a process comprising:
isolating polyadenylated RNA from human pituitary cells,
preparing double-stranded cDNA transcripts of the isolated RNA,
fractionating the cDNA according to its molecular length, in order to produce a fraction enriched for cDNA coding for the human growth hormone,
joining the cDNA coding for human growth hormone covalently with a vector to produce a recombinant DNA transfer plasmid capable of transforming a microorganism.

9. A microorganism containing and replicating the transfer vector of claim 8.