Chapter 11

 

              The Molecular Machinery of Life

 

“…I went to Cambridge and saw the model and met Francis and Jim. It was the most exciting day of my life. The double helix was a revelatory experience; for me, everything fell into place and my future scientific life was decided there and then. When the paper appeared a few weeks later, it was not well received by the establishment, composed largely of professional biochemists. They could not see, at the time, how profoundly it would change their subject by offering us a framework for studying the chemistry of biological information.”


Sydney Brenner. "A Structure for Deoxyribose Nucleic Acid", J. D. Watson and F. H. C. Crick. Nature 1953, 171:737-738. Appears in "Outstanding Papers in Biology," selected and introduced by Sydney Brenner. 1953.

 

 

Figure 1 Francis Crick and James Watson, The Discovery of DNA, Nature 171: 737-738 & 964-967 (1953).

In the previous chapter, we pointed out that the ability to store information is the essential ability that makes life, and the evolution of life by natural selection, possible. In this chapter we will discuss how that information is stored. All living organisms on Earth do it the same way, which is strong evidence that all life arose from a common ancestor. In addition, we will discuss the processes that living organisms use to process that information, which enable an organism to carry out the functions necessary to maintain its life. Again, the same mechanism operates in all living things. First, though, we need to know what is meant by the term “information” and how it is stored in general. As a prelude to how life does it, we first discuss the mechanism used by a far simpler entity, the digital computer, whose operation mimics many of the operations carried out by living organisms. Indeed, there are those who maintain that a robot with a computer brain could ultimately be assembled and programmed to such a high degree of sophistication that a human being communicating with it, but unable to see it, would be hard-pressed to tell that the robot was not alive. More about this in a later chapter!

 

11.1   The Decimal Code

 

What does the number 3298 mean? Your first thought is probably that this is a ridiculous question! You know what it means. It means three thousand two hundred and ninety-eight. Humans use the decimal code so frequently (possibly because we have ten fingers) and so unconsciously that most of us have forgotten that it is a place-based system built on powers of ten, the decimal base. We owe this numbering system to Hindu scholars who invented it certainly by the 6th century CE and possibly earlier. Its invention hinged on the invention of the digit “0”, whose first uncontested appearance is in an inscription at Gwalior whose Vikrama-calendar date corresponds to 876 CE. We might wonder how western science would have emerged if the place-based system had never been invented and we were still doing math with roman numerals. Ugh, I hate to imagine!

 

Now, let’s examine exactly what the number 3298 means. First, we’ll write it as 3298₁₀ to emphasize that it is a number expressed in the base 10, or decimal, system. Usually, the subscript is omitted since this system is used world-wide (except by a large number of computer geeks). The chart below illustrates what all the digits in this number represent based on their place of occurrence in the number.

 

10⁵=100000   10⁴=10000   10³=1000   10²=100   10¹=10   10⁰=1
    0            0           3         2         9        8

 

The digit’s position tells you how many of the corresponding powers of ten contribute to the overall value of the number. (We’ve shown more powers of ten than necessary primarily to indicate what successively higher places mean.) For example, the value of the number is obtained by adding the value of each digit according to its place as indicated in the table below…

 

0 x 10⁵ =         0
0 x 10⁴ =   +     0
3 x 10³ =   +  3000
2 x 10² =   +   200
9 x 10¹ =   +    90
8 x 10⁰ =   +     8
               ----
               3298

 

Goodness—we’ve simply arrived at our original number! However, hopefully you are now more aware of exactly what any digit in a place-based system means. It means—multiply each digit by the power of ten represented by the digit’s place and add up all the resulting values to get the final count.
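The place-value rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `place_values` is made up for this example, and the `base` parameter is an added assumption so that the same sketch works for bases other than ten.

```python
def place_values(number, base=10):
    """Return (digit, power, contribution) triples, highest place first."""
    digits = []                      # digits collected from the lowest place upward
    n = number
    while n > 0:
        n, digit = divmod(n, base)   # peel off the lowest-place digit
        digits.append(digit)
    if not digits:                   # the number 0 has a single digit, 0
        digits = [0]
    terms = []
    for power in range(len(digits) - 1, -1, -1):
        digit = digits[power]
        terms.append((digit, power, digit * base ** power))
    return terms


terms = place_values(3298)
for digit, power, value in terms:
    print(f"{digit} x 10^{power} = {value}")
print("sum =", sum(value for _, _, value in terms))
```

Adding up the last entry of each triple reproduces the original number, which is exactly the multiply-and-add procedure carried out in the table.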

 

The significance of the development of the positional number system is probably best described by the French mathematician Pierre Simon Laplace (1749–1827) who wrote:

"It is India that gave us the ingenious method of expressing all numbers by the means of ten symbols, each symbol receiving a value of position, as well as an absolute value; a profound and important idea which appears so simple to us now that we ignore its true merit, but its very simplicity, the great ease which it has lent to all computations, puts our arithmetic in the first rank of useful inventions, and we shall appreciate the grandeur of this achievement when we remember that it escaped the genius of Archimedes and Apollonius, two of the greatest minds produced by antiquity."

 

11.2   The Binary Code

Figure 2 Knife switch.

The decimal code uses 10 digits: 0–9. How about other bases? Numbers stored in computers use the base two, or binary, system, which requires only 2 digits: 0 and 1. Why? Modern-day computers are electronic gadgets. They are loaded with tiny electronic switches, which either allow the passage of electrical current or block its passage. Think of a knife switch that passes an electric current, the type that you perhaps saw in old Frankenstein films. More common examples can be found all over your house: when you wish to turn on a light, you close a “toggle switch.” This closure allows current to pass from the power company through your light bulb. The “knife switch” shown in Figure 2 serves the same function. When it is closed, current can pass from a circuit attached to one side of the switch to a circuit attached to the other side. When the switch is open, as shown in the figure, no current can pass through the switch. Thus, the action of the switch can be symbolically represented by two digits: “0” symbolizes an open switch and “1” symbolizes a closed switch. We could ‘invert’ this representation and use the state of the knife switch (open or closed, ‘off’ or ‘on’) to represent either a 0 or a 1.

 

Suppose we have a number of switches at our disposal, say twelve of them. We can position them in a row and use their ‘state’ (closed or open) to represent a place-based number expressed in the binary system. For example, let’s reconsider the number 3298. Can it be represented as a binary number, which in turn is represented by the physical state of twelve knife switches? Let’s see. Consider the following pattern of ‘button’ type switch states, in which the button glows green when on or red when off:

 

Place value:   2¹¹=2048  2¹⁰=1024  2⁹=512  2⁸=256  2⁷=128  2⁶=64  2⁵=32  2⁴=16  2³=8  2²=4  2¹=2  2⁰=1
Switch state:  on        on        off     off     on      on     on     off    off   off   on    off
Bit:           1         1         0       0       1       1      1      0      0     0     1     0

 

The binary number this pattern of switch states represents is 110011100010₂. By the way, each binary digit in the number is called a bit (‘b’ from binary and ‘it’ from digit). Moving from right to left, each successive bit represents, by virtue of its place, a value twice as large as that of the place before it. I’ve included the subscript 2 in the number to indicate that it is expressed in the base two system (given that only 2 digits appear, one would guess that it’s a binary number). Now, as we did with the decimal number, let’s calculate the resultant value of this number. Since we are humans with 10 fingers, we’ll convert the number to its decimal equivalent as well.

 

Binary Digit (Bit)   Place Value   Binary Number   Decimal Number
1                    x 2¹¹ =       100000000000    2048
1                    x 2¹⁰ =        10000000000    1024
0                    x 2⁹ =          0000000000       0
0                    x 2⁸ =           000000000       0
1                    x 2⁷ =            10000000     128
1                    x 2⁶ =             1000000      64
1                    x 2⁵ =              100000      32
0                    x 2⁴ =               00000       0
0                    x 2³ =                0000       0
0                    x 2² =                 000       0
1                    x 2¹ =                  10       2
0                    x 2⁰ =                   0       0
                                   ------------    ----
                     Sum:          110011100010    3298

 

Interesting, huh? The binary number 110011100010 is equivalent to the decimal number 3298! So, I guess the answer to the question above is yes: 3298 can be represented as a binary number whose only digits are 0 and 1. This whole discussion started with a number that was chosen arbitrarily. There was absolutely no reason why it was chosen. We could have started with any other whole number chosen at random, and the same procedure would have worked. This means that every whole number written in decimal has a binary equivalent. Indeed, whole numbers in any base can be expressed in the base 2 system.
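For readers who want to try other numbers, here is a minimal Python sketch of the conversion in both directions. The function names `to_binary` and `from_binary` are invented for this illustration. The first uses repeated division by 2, a standard method that the text does not spell out; the second adds up place values as in the table above.

```python
def to_binary(number):
    """Convert a non-negative whole number to a string of bits."""
    if number == 0:
        return "0"
    bits = []
    while number > 0:
        number, remainder = divmod(number, 2)  # remainder is the next bit, lowest place first
        bits.append(str(remainder))
    return "".join(reversed(bits))             # highest place goes on the left


def from_binary(bits):
    """Convert a string of bits back to a whole number by summing place values."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)           # shift one place left, then add the new bit
    return total


print(to_binary(3298))               # 110011100010
print(from_binary("110011100010"))   # 3298
```

Because the procedure never depends on which number it is given, running it on any other whole number makes the point of the paragraph above concrete.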

 

Back to electronic switches. Computers in essence are nothing more than an array of electronic switches—albeit an enormous number of them. The switches themselves are mostly very tiny transistors although other devices can be used—as long as they are simple 2-state systems—either off or on—‘0’ or ‘1’. Thus, computers can store any numbers one desires to store—as long as enough electronic switches are available. Storing numbers is all well and good, but what about ‘information’? Isn’t information a lot more than just numbers? Well—yes—and no. We’ll take this up in the next section.

 

11.3   Can Patterns of Bits Code Any Kind of Information?

 

Is it possible that binary numbers can be used to store information other than just numbers? In order to address that question, let’s describe briefly how an early generation rudimentary computer works. The basic behavior of modern computers is really not much different. Each successive generation of computers has access to many more electronic switches than did its predecessors; the switches are a lot smaller than ever before, and they can switch from off to on and vice-versa a lot faster. Also, the sophistication of their operating ‘software’ has increased dramatically over the years, a characteristic made possible mostly by the increased number of switches and their interconnections.

 

We’ll start with a typical conversation that occurs in the classroom—triggered by asking students to give a figure of merit that might represent the power of their computer—

 

One of them will say—“It’s got 160 gigabytes.”

 

“What does that mean,” I ask, which elicits a blank stare. “That number represents the storage capacity of your computer’s peripheral storage device—namely, the hard drive attached to the computer, but rigorously speaking, it’s not part of the computer per se. By the way, what’s a gigabyte?”

 

More blank stares, so I ask again, “I’ll tell you what that means in a minute, but for now can anyone else give me a figure of merit related directly to your computer?”

 

Someone pipes up, “Mine runs at 3 gigahertz.”

 

“Yep, that’s good” I respond, “—that’s a measure of its clock speed or how rapidly it can change its internal electronic switches from off to on—anything else?”

 

Another student says, “Mine’s got 1 gigabyte of memory.”

 

Another says, “Mine’s got an Intel Core i5 processor.”

 

“Hot dog!” I say, “Now we’re getting somewhere. All these characteristics relate to your computer’s power per se. Let’s see what all this stuff means and the best way to do that is to describe how a ‘computer for dummies’ adds two numbers together”—

 

Figure 3 A computer for 'dummies.'

Computers run programs. A program is a sequence of instructions and data, which has been designed to carry out some algorithm, which is a sequence of steps required to perform some kind of calculation or answer some question. Some human wrote the program and ‘put it inside the computer’ so that it could execute the programmed instructions using the data supplied and arrive at an answer—presumably much faster than could be accomplished by its human programmer. How does this work?

 

A basic computer consists of two main parts: (i) memory and (ii) a CPU, or central processing unit. We’ll ignore all the external attachments, like peripheral disk storage units, video display, mouse, keyboard, blah, blah, blah. They just complicate the issue. We simply want to examine the guts of the computer to see how it works. The program that the computer ‘runs’ is stored in its memory in the form of a sequence of coded instructions and data. The CPU retrieves this information from memory in a step by step sequential fashion and does what the current instruction tells it to do.

 

(i)                 Memory is where all information is stored that the computer needs to perform a calculation. It consists of a ‘matrix’ of electronic switches arranged as a sequence of cells whose rows represent computer ‘words.’ Each word is represented by a specified number of switches that can be either off or on. The state of each switch is equivalent to one bit: 0 or 1. In our simple computer, one word consists of 8 bits in a row. (A group of 8 bits is called a ‘byte.’ Word length varies from one computer to another; if a computer operates with 16 bit words, each word consists of two bytes.) Words are arranged in sequential rows and the location of the row in which a word resides is designated by a number, which is the address of the word. This arrangement is depicted in the table below. You can see that the first 5 rows in this computer’s memory contain words consisting of 5 non-zero binary numbers located at memory addresses 0–4. The sixth row and those that follow contain null words: all bits in the words are 0’s. Our computer has 256 words. Their addresses range from 0 to 255. If your home computer has 1 gigabyte of memory, then it contains about 1 billion bytes, each one consisting of 8 bits. In other words, your computer memory contains about 8 billion electronic switches!

 

              Memory
Address       Word
0             0 0 0 0 0 0 0 1
1             0 0 0 0 0 0 1 0
2             0 0 0 0 0 1 1 1
3             0 0 0 0 0 0 1 0
4             0 0 0 0 0 1 0 1
5             0 0 0 0 0 0 0 0
etc           0 0 0 0 0 0 0 0

(ii)               The CPU is ‘driven’ by an electronic oscillator that functions as a clock. The CPU consists of a lot of electronics that turns switches on or off as needed, at a rate determined by the frequency of the clock. If that frequency is, say, 3 gigahertz (GHz), the clock ticks at a rate of 3 billion cycles per second. The exact details of the operation of the CPU need not concern us here. It has two parts that do concern us, though: two electronic registers, one called the accumulator and one called the program counter. The accumulator does what its name implies: it accumulates, or stores, the intermediate results of some calculation as that calculation is carried out. The program counter keeps track of where you are in the program, that is, which step in the program is currently being executed.

 

Program Counter (PC)    0 0 0 0 0 0 1 1

Accumulator (AC)        0 0 0 0 0 1 1 1

 

The two tables above show the program counter and accumulator registers. Registers, like memory, are just an array of electronic switches whose number is identical to the number of switches that make up a single word of memory. Also, the electronic switches in the register are usually made of electronics that differ from the electronics that make up memory. They are capable of switching much more rapidly than the switches in memory cells. At the moment, the binary number shown in the program counter is equivalent to decimal 3. This means that whatever ‘information’ is stored at memory address 3 is about to be retrieved, or fetched, from that memory location and ‘operated with or upon’ by the CPU. Upon completion of the operation, whatever result is obtained will get stored in the accumulator. In the case illustrated above, the binary number 111, or decimal 7, is currently in the accumulator and that value will be overwritten with the result of the CPU’s operation. Upon completion of that operation, the CPU will automatically increment the contents of the PC by 1 so that it will next contain the binary number 100, or decimal 4. The CPU will now retrieve whatever ‘information’ is stored at memory address location 4 and operate upon it, storing the result obtained in the AC. The process continues until the CPU is ‘told’ to halt, which is the last piece of information stored in memory. The result in the accumulator is thus the final result of the calculation.

 

Now let’s dig into this a little more deeply by examining the sequence of operations that the computer carries out in order to add the numbers 7 and 5 together. Every computer program is written in binary code. A given binary number represents one of the many instructions that the computer has been constructed to ‘recognize.’ The program illustrated below uses only three instructions: clear the accumulator, add, and halt. Ever more powerful computers, constructed of increasingly larger numbers of electronic switches, have ever larger ‘vocabularies.’ This makes possible increasingly richer instruction sets. The numbers displayed in the memory table above represent a program, coded in binary, concocted to carry out the operation of adding 7 and 5. We reproduce the table below and add a column listing the particular instruction (or data) that those binary numbers represent and that has been ‘loaded’ into each memory address in sequential fashion.

 

              Memory
Address       Word               Instruction
0             0 0 0 0 0 0 0 1    Clear AC
1             0 0 0 0 0 0 1 0    Add #7, AC
2             0 0 0 0 0 1 1 1    7 (data)
3             0 0 0 0 0 0 1 0    Add #5, AC
4             0 0 0 0 0 1 0 1    5 (data)
5             0 0 0 0 0 0 0 0    Halt
etc           0 0 0 0 0 0 0 0

We start the calculation by telling the computer to ‘run the program’, maybe by pushing a ‘run’ button on the front of the computer. The PC is automatically ‘zeroed’ and the CPU ‘fetches’ the binary number located at memory address 0. The CPU has some more internal electronics (that we haven’t described) that decodes this number. Its value is 1 and the CPU knows what that means: it is the binary code that represents the instruction ‘clear the accumulator’, i.e., open all its switches, which sets them all to 0. The CPU then automatically increments the PC to 1, goes to memory location 1, fetches the number from that location, decodes it, and realizes that the binary number 10, or decimal 2, means ‘add #7 to the AC.’ Furthermore, the CPU knows that the number to be added to the accumulator (which in this case is 7) is located at the next sequential memory address, immediately following the add instruction. The 7 listed in the Instruction column of the table indicates that this binary number is data and not an instruction, so it doesn’t have to be decoded. The CPU knows that this is the situation that applies to every add # instruction: the data always follows immediately after the instruction (the # symbol stands for immediate). At this point, the CPU again increments the PC from 1 to 2, goes to memory location 2, fetches the data and, with no need to decode, simply adds that data to the accumulator. Thus, 111, or decimal 7, is now stored in the accumulator. The process continues until the CPU fetches all 0’s at memory location 5 and decodes the instruction, which means halt, and so the computer halts. I bet that you can now see that the value stored in the AC is 1100, or decimal 12, which is the correct answer.
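The fetch, decode and execute cycle just described can be imitated in a few lines of Python. This is a toy sketch, not a model of any real machine: the names `HALT`, `CLEAR_AC`, `ADD_IMMEDIATE` and `run` are invented here, and the sketch assumes (as the text implies but does not state) that the accumulator holds only 8 bits, so larger sums would wrap around.

```python
# Instruction codes taken from the memory table in the text.
HALT = 0b00000000
CLEAR_AC = 0b00000001
ADD_IMMEDIATE = 0b00000010


def run(memory):
    """Run the program stored in memory and return the final accumulator value."""
    pc = 0                                 # program counter: address of the next word to fetch
    ac = 0                                 # accumulator
    while True:
        instruction = memory[pc]           # fetch
        pc += 1                            # the PC is incremented after every fetch
        if instruction == HALT:            # decode and execute
            return ac
        elif instruction == CLEAR_AC:
            ac = 0
        elif instruction == ADD_IMMEDIATE:
            data = memory[pc]              # the data word immediately follows the instruction
            pc += 1
            ac = (ac + data) & 0xFF        # assumption: keep only 8 bits, like an 8-switch register
        else:
            raise ValueError(f"unknown instruction {instruction:08b}")


# The six words from the table, followed by null words up to address 255.
program = [0b00000001, 0b00000010, 0b00000111, 0b00000010, 0b00000101, 0b00000000]
memory = program + [0] * (256 - len(program))

result = run(memory)
print(format(result, "08b"), "=", result)  # 00001100 = 12
```

Notice that the same word, 00000010, appears at addresses 1 and 3 as an instruction, while 00000111 and 00000101 are treated as data purely because of where they sit in memory. Nothing in the bits themselves distinguishes the two.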

 

Suppose a computer is capable of executing 256 instructions (or any other number of them). Obviously, each one of those instructions could be represented uniquely by 1 of 256 binary numbers. Sophisticated electronics could be set up in the CPU designed to activate circuits in 1 of 256 different ways in order to execute the appropriate instruction. Thus, binary numbers code all types of information—not only numbers but also instructions to do something or even conditional instructions that do one of several different things depending upon the current intermediate result of some operation. Obviously, a lot of stuff has been left out of our discussion of computers. For example, how does the information get into the computer—how do you get at the result? Good questions all, but irrelevant for the sake of the essential point—all types of information can be stored in a computer using some type of binary code.

 

The process we have just described is quite analogous to the process of life: information is retrieved from storage and acted upon to produce a result. The information that life uses is not stored in an electronic memory in the form of a binary code, nor is its CPU a bunch of electronic circuits. The storage facility and CPU in living organisms are all molecular, produced by chemical reactions that take place in biological systems. These molecules are mostly DNA, RNA and proteins, and the energy that’s necessary to process the information does not come from the electric power company; it’s extracted from other molecules, which ultimately got their store of energy from the Sun.

 

11.4   DNA and RNA

 

The race to uncover the secret of the structure of the DNA molecule is one of the most fascinating episodes in the history of science and its discovery one of its greatest achievements. 1869 was a landmark year in genetic research: the Swiss physiological chemist Friedrich Miescher discovered a substance that he called nuclein inside the nuclei of human white blood cells. It was DNA! Astonishingly, more than 50 years passed before scientists realized the significance of Miescher’s discovery. When they did, several of them successfully isolated the substance and identified its three structural components. Years later, Francis Crick and James Watson mapped out its 3-dimensional structure, the famous double helix.[1] They received the Nobel Prize in 1962 for their work, the gist of which is presented below. (Also, see Focus Box 1)

 

            DNA

 

·         The DNA molecule resembles the structure of a ladder twisted into the shape of a double helix. A ladder consists of two main components: two supporting spines held together by connecting rungs. The spine and half of a rung of the DNA “ladder” are composed of a linked chain of single nucleotides (Chapter 10, Section 6), with the sugar-phosphate group of one nucleotide bound to the sugar-phosphate group of another, forming the structure of a spine. The nitrogenous base of each nucleotide “dangles” out on the side of each nucleotide, forming half of a rung. If the half-rungs of two such anti-parallel chains of nucleotides are linked together, forming whole rungs, a complete DNA molecule results (Figure 4). The way in which rungs can link together is restricted. The linkage follows a base-pairing rule, A–T and C–G; that is, adenine can connect only with thymine and cytosine can connect only with guanine. Thus, there are only four possible ladder rungs that can exist in DNA: (i) A–T, (ii) T–A, (iii) C–G or (iv) G–C. It is this base-pairing rule that makes it possible for DNA to store a living organism’s information in the form of the genetic code, and all life on Earth is based on the same code!

·         Most DNA double helices are right-handed; that is, if you were to point the thumb of your right hand along the axis of the helix, the spines would wind in the direction in which your fingers curl as they advance toward the tip of your thumb (there is one type of DNA, called Z-DNA, that is left-handed).

·         The DNA double helix is anti-parallel, which means that its two strands run in opposite directions: the 5' end of one strand lies opposite the 3' end of its complementary strand, and vice versa. The 5' end of a strand has a leading phosphate group linked to the 5th carbon atom of its sugar, and the 3' end has an OH group linked to the 3rd carbon atom of its sugar. Everywhere else along the strand, that OH group is removed and replaced by a link to the O of the following phosphate group; thus, nucleotides are linked to each other by their phosphate groups, which bind the 3' carbon of one sugar to the 5' carbon of the next sugar. (Examine Figure 4 to decipher this description of the connections. Carbon atoms in the sugar molecule are counted 1 thru 5 clockwise starting from oxygen at the apex of the molecule; see also Figure 31, Chapter 10.)

·         The DNA base pairs are connected via hydrogen bonds. Note, however, that exposed hydrogen atoms can be found on the outer edges of the nitrogen-containing bases, which are therefore available for potential hydrogen bonding as well. These potential hydrogen bonds provide easy access to DNA by other molecules, including the proteins that play vital roles in the replication and expression of DNA. What the term replication means is obvious and the term ‘expression of DNA’ refers to the reading of the genetic information stored in DNA followed by the synthesis of some genetic product. These two processes will be discussed in following sections.
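The base-pairing rule and the anti-parallel arrangement described in the list above can be sketched in Python. The names `PAIRS` and `complementary_strand` and the six-base example sequence are made up for this illustration; the sketch simply treats a strand as a string of letters written from its 5' end to its 3' end.

```python
# Base-pairing rule: A pairs only with T, and C pairs only with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand):
    """Given one strand written 5' to 3', return its partner, also written 5' to 3'.

    Because the two strands are anti-parallel, the partner's 5' end sits
    opposite the given strand's 3' end, so we read the strand backwards
    while swapping each base for the one it pairs with.
    """
    return "".join(PAIRS[base] for base in reversed(strand))


strand = "ATGGCA"                          # hypothetical example sequence
partner = complementary_strand(strand)
print(strand, "pairs with", partner)       # ATGGCA pairs with TGCCAT
print(complementary_strand(partner))       # ATGGCA
```

Taking the complement twice returns the original strand, which is another way of saying that each strand carries all the information needed to rebuild the other.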

 

Figure 4 Structure of DNA illustrating (i) sugar-phosphate spine, (ii) nitrogenous base rungs linked together by hydrogen bonds according to the base-pairing rule A–T, C–G, (iii) antiparallel, right-handed double helix shape.

RNA

 

RNA structure is similar to that of DNA, but differs in three important ways:

·         RNA is half a ladder—it is a single strand of linked nucleotides.

·         The sugar in a nucleotide of RNA is ribose, not deoxyribose: ribose has an OH group attached to one of its carbon atoms where deoxyribose has only a single H (Figure 31, Chapter 10).

·         Uracil (U) replaces thymine (T) as one of its four possible nitrogenous bases. The shape of Uracil, like that of DNA’s thymine, allows it to bond only to adenine. This ability allows RNA to play an important role in regulating chemical reactions in a cell. There are several different types of RNA in a cell but all have the same basic structure.

 

11.5   Replication of DNA

 

Figure 5 An enzyme splits and unwinds the DNA double helix and each complementary strand serves as a template for the formation of two new DNA molecules according to the base-pairing rule.

Cell division must take place if an organism is to grow, and this requires the replication of DNA molecules. The process of DNA replication is more complicated than we will attempt to describe here, but roughly speaking the process is carried out by a number of enzymes (remember, enzymes are made of proteins in part and the geometry of the enzyme is what enables it to facilitate chemical reactions). First, the enzymes helicase and topoisomerase unwind the double helix and break the hydrogen bonds that link the rungs of the ladder, effectively splitting the molecule into two complementary strands (Figure 5). Notice that where the complementary strands are split, bases are left unpaired. For example, you can see at the beginning of the split in Figure 5 that the A–T and G–C base pairs have been broken. However, many free nucleotides are floating around in the fluid surrounding the DNA molecule, and an enzyme called DNA polymerase will find the proper ones and attach them to the unpaired bases according to the base-pairing rule, thus initiating the formation of two additional complementary strands connected to the ones that have been split apart. This formation process is represented by the two green strands starting to form in Figure 5. Eventually, the formation process will be completed and the result will be two DNA molecules identical in every way to the original parent. Thus, the simple chemistry of base-pairing provides a mechanism for passing hereditary information from parent to offspring.
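The logic of replication, stripped of all its enzymes, can be sketched as follows. This is a deliberately crude illustration: `pair_with` and `replicate` are hypothetical names, and a DNA molecule is represented simply as a pair of strings written so that the letters at the same position form one rung of the ladder.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pair_with(strand):
    """Build the partner strand rung by rung, using the base-pairing rule."""
    return "".join(PAIRS[base] for base in strand)


def replicate(molecule):
    """Split a double-stranded molecule and rebuild a partner for each half."""
    top, bottom = molecule                    # 'unzip' the ladder into two strands
    daughter_1 = (top, pair_with(top))        # old top strand plus a newly built partner
    daughter_2 = (pair_with(bottom), bottom)  # newly built partner plus old bottom strand
    return daughter_1, daughter_2


parent = ("ATGGCA", "TACCGT")                 # hypothetical six-rung molecule
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)                             # ('ATGGCA', 'TACCGT')
print(daughter_2)                             # ('ATGGCA', 'TACCGT')
```

Each daughter keeps one old strand and gains one new one, yet both end up identical to the parent because the pairing rule leaves no choice about which base goes where.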

 

11.6   From DNA to Proteins

 

The DNA molecule is like the Encyclopedia Britannica (a lot of them) or, in this day and age, perhaps more like Google.[2] The question arises, “In what language is the information written and exactly how is the information read?” The language is the genetic code made possible by the base-pairing rule, and the reading operation is performed by a combination of RNA and proteins.

 

The DNA molecule contains all the information in the form of the genetic code that is necessary to run the chemistry of life that is carried out within each individual cell in all living organisms. The sum of all this information is called the organism’s genome. It is stored in the sequence of nucleotides that make up the DNA molecule. The chemical reactions that take place within a cell are regulated by protein enzymes so the question of how cellular chemistry works is basically the question of how proteins are manufactured within a cell. This manufacturing process is carried out according to instructions stored in DNA.

 

11.7   Transcription of DNA

 

Figure 6 Transcription of DNA involves the assembly of a strand of messenger RNA (mRNA), which is essentially a copy of the DNA coding strand shown in the figure. RNA polymerase moves along the template strand from start to finish, assembling the mRNA out of nucleotides (NT’s) it picks up in the surrounding medium.

DNA is a huge molecule. Human DNA contains about 3 billion nucleotide pairs that are arranged in a specific way to store the coded information needed to manufacture the many proteins used by the living organism. It is found inside the nucleus of eukaryotic cells (or in the cell at large in prokaryotic cells). The nucleus of a cell is surrounded by a double membrane that is quite effective at protecting the DNA information storehouse. So the question arises, “How is this information contained within DNA, which itself is contained within a protective wall, transferred out into the cell and in a form where it can be used?” The answer is that the transfer is carried out in a process called transcription.

 

Transcription of DNA is catalyzed by the RNA polymerase enzyme (RNAP). The information that controls the construction of a specific protein lies along a segment of the DNA molecule called a gene. This information consists of essentially three parts:

   (i).            A sequence of nucleotides that make up a promoter region and a leader sequence that guides the beginning of the transcription—followed by—

 (ii).            A region that contains the nucleotides that code for the specific amino acid sequence that makes up the protein to be constructed, followed by

(iii).            A series of nucleotides that make up a terminal sequence that stops the transcription.

 

To begin the process, RNAP attaches to the promoter region with the aid of certain other proteins called transcription factors. This complex of RNAP and transcription factors breaks the hydrogen bonds of the DNA molecule, “unzipping” it into two strands and giving RNAP access to one of them. The RNAP uses that strand as a template for the transcription process, which it carries out by moving along the template from the 3’ end to the 5’ end and assembling a single strand of messenger RNA (mRNA) according to the base-pairing rule, but substituting uracil for thymine. The RNAP gathers up the nucleotides that it needs for the assembly of mRNA from the surrounding medium. The nucleotides incorporated into mRNA are built on the sugar ribose, not the deoxyribose that makes up DNA. The mRNA that is produced is therefore a copy of the complementary “un-transcribed” DNA strand (called the coding strand), except for the substitution of uracil for thymine and ribose for deoxyribose. The mRNA copy is assembled from its 5’ end to its 3’ end. RNAP moves along the DNA template strand until it reaches a terminator sequence. At that point, RNAP releases the mRNA polymer and detaches from the DNA. The transcription process is pictured in Figure 6, and it’s worth examining this figure closely!
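For readers who find a computer analogy helpful, the base-pairing step of transcription can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name transcribe and the lookup table DNA_TO_RNA are made up for this example, and promoters, terminators and transcription factors are ignored. The template strand is the one from the example gene shown later in this chapter.

```python
# Base-pairing rule used during transcription: each base on the DNA template
# is matched with its complementary RNA base (uracil takes the place of thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # hypothetical helper table


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (written 5'->3') from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACTGCCTCGAAGCCTCGATC"  # template strand, read 3'->5'
coding = "ATGACGGAGCTTCGGAGCTAG"    # coding strand, written 5'->3'

mrna = transcribe(template)
print(mrna)  # AUGACGGAGCUUCGGAGCUAG

# The mRNA matches the coding strand with U substituted for T.
print(mrna == coding.replace("T", "U"))  # True
```

The final comparison is the point of the sketch: copying the template by complementary pairing reproduces the coding strand, letter for letter, in the RNA alphabet.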

 

The mRNA is a tiny molecule compared with DNA—it contains only one gene’s worth of information—the amount needed to construct a single protein. Therefore, it can easily pass through the larger pores in the nuclear membrane out into the cell at large. Thus, it transports information contained in a segment of DNA (a gene) out into the cell where it is needed—hence its name, messenger RNA.

 

11.8   The Genetic Code

 

Figure 7 A triplet of bases chosen in sequence from the 4 possibilities A, U, C and G represents the code for translation into an amino acid. There are 64 combinations, many of which are redundant. One combination, AUG, signals start and three possible combinations signal stop for the assembly of a protein from amino acids.

Combinations of no more than 20 different amino acids make up all proteins. The code for a protein stored in DNA and transcribed into mRNA is expressed as a sequence of codons, each codon consisting of a triplet of 3 consecutive nitrogenous bases (the information storage unit of nucleotides). Each amino acid in the protein chain is therefore represented by a single codon, or a ‘word’ consisting of 3 bases. The number of possible words contained in the genetic dictionary is easy to calculate: each word consists of only 3 ‘letters’ and only 4 letters make up the genetic alphabet, namely A, T, C and G for DNA or A, U, C and G for RNA. By convention, the genetic code is based on the RNA alphabet. Thus, the number of different words that can be written down is 4 x 4 x 4 = 64. In principle, then, DNA (or equivalently RNA) could store enough information to represent 64 different amino acids, each coded for by one of the 64 unique words. However, since only 20 amino acids are used to make all life’s proteins, many words code for the same amino acid. For example, consider the two words UUU and UUC, that is, 3 uracils in an mRNA codon versus 2 uracils followed by a cytosine. Each word codes for the amino acid phenylalanine. This is shown in the chart of Figure 7. Thus, the genetic code is a degenerate code. This doesn’t mean that it has the same quality as a ‘dirty old man’ and should be isolated from society; degeneracy is a technical term that in this case means that several words code for the same amino acid. This property makes the genetic code very robust, i.e., if DNA has suffered damage, it stands a reasonable chance of being repaired in a way that leads to no change in the assembly of the desired protein.
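The counting argument above can be checked by brute force. The short Python sketch below lists every three-letter word over the RNA alphabet and pairs it with its amino acid using the standard genetic code written in one-letter abbreviations (F for phenylalanine, M for methionine, * for stop). The names BASES, AMINO_ACIDS and CODON_TABLE are choices made for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with the bases taken in the
# order U, C, A, G at each of the three positions; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

print(len(codons))                                # 64 words in the dictionary
print(CODON_TABLE["UUU"], CODON_TABLE["UUC"])     # F F (both phenylalanine)

# Degeneracy: 61 words are shared among only 20 amino acids; 3 words mean stop.
distinct_amino_acids = set(CODON_TABLE.values()) - {"*"}
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(distinct_amino_acids), stop_codons)     # 20 ['UAA', 'UAG', 'UGA']
```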

 

11.9   Translation of mRNA into a Protein

 

Figure 8 Codon sequence in mRNA strand.

There is 1 codon that represents a start-signal for manufacturing a protein. It is AUG, the code for the amino acid methionine. The AUG codon marks the start of the protein-coding region of an mRNA molecule. There are 3 codons that represent a stop-signal: UAG, UGA and UAA. Stop-codons are also called ‘termination’ or ‘nonsense’ codons because there is no amino acid that they represent. A constructor molecule, called a ribosome, reads the information contained in the nucleotide sequence between the start-codon and the stop-codon of the mRNA molecule and uses it to manufacture a protein. This process is known as translation, i.e., the ribosome ‘translates’ the genetic information from RNA and uses it to assemble proteins out of the specified amino acids. Guess what: ribosomes are enzymes and their geometry is such that they readily bind to an mRNA molecule. They use the mRNA as a template to assemble the correct sequence of amino acids that make up a particular protein. An example of an mRNA molecule and the code it contains is shown in Figure 8. The DNA gene, the mRNA molecule that was produced by transcribing this gene and the resulting protein constructed by the ribosome are indicated in the sequences that follow:


DNA    ATGACGGAGCTTCGGAGCTAG
            TACTGCCTCGAAGCCTCGATC


mRNA AUGACGGAGCUUCGGAGCUAG

Protein   Start - Thr-Glu-Leu-Arg-Ser - Stop


The amino acids that the ribosome needs to construct the protein are floating around in the cytoplasm of a cell. These amino acids are attracted and attached to another type of RNA molecule called transfer RNA (tRNA). There is one particular tRNA molecule for each of the possible words that code for the 20 different amino acids that could be used to make a protein. Because the genetic code is degenerate, there are different tRNAs that carry the same amino acid. Ribosomes are enzymes made from complexes of RNAs and proteins. They are divided into two subunits, one larger than the other. The smaller subunit is the one that binds to the mRNA, while the larger subunit binds to the tRNA and the amino acid it carries. As a ribosome ‘reads’ a particular codon, it attracts the appropriate tRNA molecule to it. The tRNA molecule enters one part of the ribosome and binds to the mRNA codon being read (the binding of the tRNA molecule takes place because it contains the anti-codon corresponding to the mRNA codon). The attached amino acids are then joined together by another part of the ribosome. The ribosome moves along the mRNA, ‘reading’ its sequence and producing a chain of amino acids. The process terminates when the ribosome reads a stop-codon. Its two subunits open up and it releases the completed polypeptide chain, the desired protein. This ‘manufacturing’ process is schematized in Figure 9.

Figure 9 Manufacture of a protein by a ribosome. The ribosome reads mRNA codons and attracts corresponding tRNA molecules to it, which carry the ‘coded for’ amino acids. The amino acids are linked together into a growing chain and the empty tRNA is released. The process halts when a ‘stop’ codon is read.
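The ribosome's reading procedure is, in essence, a table lookup repeated three letters at a time, and it can be mimicked in Python. The sketch below is a deliberate simplification: the function translate and the tables are names invented for this illustration, the tRNA and anticodon step is collapsed into a dictionary lookup, and reading simply begins at the first AUG found. It is applied to the example mRNA given above.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter form (bases ordered U, C, A, G); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon; return the amino acids."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start signal, so nothing is made
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the finished chain
            break
        chain.append(THREE_LETTER[amino_acid])
    return chain


protein = translate("AUGACGGAGCUUCGGAGCUAG")
print("-".join(protein))  # Met-Thr-Glu-Leu-Arg-Ser
```

The leading Met is the amino acid carried by the start codon AUG, which is why the protein line above lists it as "Start".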

 

11.10 Central Dogma of Molecular Biology

 

The net result of this rather complex molecular machinery has been the production of a particular protein enzyme, based upon information encoded in a particular stretch of the DNA molecule. This segment of DNA is a gene and the protein enzyme produced on the basis of that gene will be used to drive a certain chemical reaction in a cell. That reaction might determine eye color, skin color or any other of the characteristics of a living organism.

 

The flow of information from DNA to protein is a one-way street, which is the basis of the central dogma of molecular biology, first expressed by Francis Crick in 1958 and subsequently in a Nature publication in 1970.[3] Crick’s central dogma states that information in a protein cannot be transferred to another protein or back to the nucleic acids whence it came. Pictorially, the central dogma is shown in Figure 10. In essence, it points out that (i) information contained in DNA can be transferred to another DNA (replication), (ii) information contained in DNA can be copied into mRNA (transcription) and (iii) proteins can be synthesized using the information in mRNA as a template (translation).

 

Figure 10 Central dogma of molecular biology.

Figure 11 Schematic of the HIV virus.

However, there are special cases in which information can flow ‘backwards.’ For example, a retrovirus like HIV can transfer information from its RNA (it doesn’t have any DNA) back into its host’s DNA. A basic schematic of the HIV virus is shown in Figure 11. The virus is encapsulated by a phospholipid layer from which protrudes a complex of 70 proteins whose geometry is designed to mate with certain protein receptors in the host cell (uh-oh)! Upon entry to the target cell, an enzyme contained within the virus called reverse transcriptase converts the virus’s RNA genome into a small, double-stranded DNA genome. Another of the virus’s enzymes, called integrase, integrates, or inserts, the viral DNA genome into the host cell DNA. At this point, the host cell is infected, and the cell’s replication machinery will now replicate the virus! In other words, the HIV virus has co-opted the cellular machinery of its host to make copies of its own RNA; thus information has flowed from viral RNA to viral DNA to host DNA and back to viral RNA.

 

As another example of an unusual flow of information, direct translation from DNA to protein, skipping over the mRNA intermediary, has been demonstrated in a cell-free system (i.e., in a laboratory test tube). Given that the primary reason for the transcription process is to produce the much smaller mRNA molecule in order to transport the genetic information through the nuclear membrane, the laboratory scientist’s concoction of such a process might not be too surprising. The process involves extracts from E. coli bacteria that contained ribosomes, but not intact cells. These cell fragments could express proteins from foreign DNA templates. Be that as it may, no known living system exists in which information from proteins is transferred back to RNA or DNA.

 

Every living organism on Earth operates via the same basic molecular machinery that we have just described. Each living organism has different information encoded in its genes. The nature of a cell is determined by the chemical reactions that take place within it and these chemical reactions are driven by the wide variety of its protein enzyme ‘inhabitants.’ The information required to manufacture all of these proteins from amino acids within the cell is stored within the cell’s DNA. Every cell within a single, complex, multi-celled organism such as a human contains the same DNA. Perhaps what is most remarkable is the unequivocal fact that every living organism stores the information it needs in its DNA using the same genetic code shown in Figure 7. It is unmistakably the greatest unifying principle in the biology of living systems. DNA is truly ‘the molecule of life.’

 

11.11 Darwinian Evolution Revisited

 

Uncovering the nature of DNA and how its genes are expressed has provided striking confirmation of Darwin’s theory of evolution driven by natural selection. Darwin knew nothing of DNA. He based his theory entirely on his observations of the variations he saw in various populations of species (see Focus Box 1, Chapter 10). We now know how such variations occur: they are caused by gradual changes in DNA that occur over time. Those genes that endow their ‘owners’ with a greater probability of survival than do the genes of other members of the same species are the ones that ultimately spread throughout a population. This is natural selection.

 

If each succeeding generation of living organisms had exactly the same DNA as its predecessors, they would be identical copies and organisms would not evolve. Natural selection couldn’t work. However, their DNA is different and each member of a species has slightly different DNA than other members. In those organisms that reproduce sexually, the DNA makeup of an offspring is a composite of the DNA of its preceding parents. Also, in sexually reproducing organisms, as well as those that reproduce asexually, changes occur in the makeup of DNA that are caused by ‘copying error’ or environmental interactions. Any change introduced into the structure of DNA by such causes results in a mutation.

 

There are a number of ways mutations can occur. For example, numerous chemicals found in the environment, as well as exposure to nuclear radiation, X-rays or ultraviolet light, can all ‘damage’ DNA. DNA can also change if it is unfaithfully copied during the replication process. We now know that DNA is damaged at a far higher rate than previously supposed. Changes to DNA in humans occur at the rate of roughly 10,000 codons per cell per day. Fortunately, most (but not all) of this damage is repaired as rapidly as it occurs. If an unrepaired alteration occurs in a sperm or egg cell, the alteration will be faithfully passed on to offspring. If it occurs in other cells, it might result in the production of the wrong protein, which could have disastrous effects on a living organism.

 

As a heuristic example, consider the sentence …THEFATCATSAT… There is no punctuation, but each 3 consecutive letters are analogous to a codon in DNA, and the properly transcribed and translated sentence would read

… THE FAT CAT SAT …

Now, suppose that absorption of an X-ray knocks out the 1st T so that the codon sequence becomes …HEFATCATSAT… The decoded sentence, based on reading each triplet codon in sequence, would then translate to:

…HEF ATC ATS AT. …

Figure 12 Shape of sickle cell compared to normal red blood cells.

In other words, the resultant protein would be quite different from the one intended.
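The effect of losing a single letter is easy to reproduce. In the small Python sketch below, read_codons is a name invented for this illustration; it simply chops a string into consecutive triplets, the way the decoding machinery reads codons with no punctuation to guide it.

```python
def read_codons(sequence: str) -> list[str]:
    """Split a sequence into consecutive, non-overlapping three-letter words."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


original = "THEFATCATSAT"
mutated = original[1:]  # the first T has been knocked out

print(" ".join(read_codons(original)))  # THE FAT CAT SAT
print(" ".join(read_codons(mutated)))   # HEF ATC ATS AT
```

Every word after the deletion is altered, not just the one that lost a letter, because the reading frame itself has shifted.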

 

As an example in the real world, consider sickle-cell anemia, which kills 100,000 people per year. It is caused by the change of an A to a T in just one codon of the gene for the manufacture of the hemoglobin molecule—changing the 6th codon from GAG to GTG, which codes for the hydrophobic amino acid valine instead of the hydrophilic glutamic acid. This single mutation results in a disastrous change in the shape of the hemoglobin molecule (Figure 12). The sickle cells are stiff and sticky, causing them to clump and ‘jam up’ in blood vessels creating blockages. This is clearly an unfavorable mutation.
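The sickle-cell change can be traced through the same machinery in a few lines of Python. This is a minimal sketch: the helper point_mutation and the two-entry lookup table are assumptions made for this illustration, and only the two codons mentioned in the text are included.

```python
# Only the two codons discussed in the text, in the RNA alphabet.
CODON_TO_AMINO_ACID = {"GAG": "glutamic acid", "GUG": "valine"}


def point_mutation(codon: str, position: int, new_base: str) -> str:
    """Return a copy of the codon with one base replaced (position counts from 0)."""
    return codon[:position] + new_base + codon[position + 1:]


normal_dna = "GAG"
sickle_dna = point_mutation(normal_dna, 1, "T")  # the middle A becomes T

for dna in (normal_dna, sickle_dna):
    mrna = dna.replace("T", "U")  # coding strand -> mRNA
    print(dna, "->", mrna, "->", CODON_TO_AMINO_ACID[mrna])
# GAG -> GAG -> glutamic acid
# GTG -> GUG -> valine
```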

 

Figure 13 Control genes represent instructions that turn other genes "on" or "off."

While many mutations do indeed have negative effects, such as the defect in a single codon that leads to sickle cell anemia, another sort of mutation can have major (and sometimes positive) effects.

Some regions of DNA contain control genes that determine when and where other genes are turned “on.” Mutations in these control genes can substantially change the way different parts of an organism are put together. The difference between a mutation to a control gene and a mutation to a less powerful gene is a bit like the difference between distracting one of the trumpet players in an orchestra versus distracting the orchestra's conductor (Figure 13). The impact of changing the conductor's behavior is much more profound than changing a few notes played by an individual orchestra member. Similarly, a mutation in a control gene can cause a cascade of effects in the behavior of genes under its control.

 

Figure 14 A normal and mutant fly. A change in the Hox gene produced the mutant fly with a leg growing out of its head instead of an antenna.

For example, Hox genes, found in many animals (including flies and humans), determine where the head goes and which regions of the body grow appendages. Such master control genes help direct the building of body parts, such as segments, limbs, eyes, etc. So producing an organism with a major change in its basic body layout may not be so unlikely; a simple mutation of a Hox gene can do it. Most times the mutation is unfavorable and produces a new organism that is not long for this world (Figure 14), and natural selection will see to it that progeny carrying this particular gene will not last long either. Eventually, the Hox gene carrying this particular mutation will disappear from the ‘gene pool’ since it is being carried by organisms unfit to survive very long.

 

However, the genetic makeup of the resulting offspring might endow it with characteristics that make it ‘more robust’, i.e., better able to survive its environment. If it is more robust, its survivability is enhanced and it is more likely to pass its genes on to its offspring. For example, mutations in a Hox gene could lead to an entirely new phenotype (body type) that would be better adapted to its environment than its predecessors. A mutation in a Hox gene could cause eyes to be positioned more towards the front of an organism and away from the sides. Ultimately such mutant members of an ocean-going species might slowly migrate to the land. Such genes would tend to proliferate, while those that endow their ‘carriers’ with a lesser survivability out of water would tend to die away.

 

All living organisms are essentially ‘bags of information’: they are vessels that carry molecular genes that have endowed them with high survival probability. Their genes proliferate in the gene pool. It is the gene that is the basic unit of information in DNA that drives the process of natural selection, and understanding how genes control the characteristics of all living organisms completes and extends the explanation of evolution given by Charles Darwin long before the basic mechanisms of genetics were understood. Evolution, or ‘survival of the fittest,’ is a genetically driven process; the fittest simply carry the ‘right genetic stuff’!

 

Focus Box 1   Discovery of the Structure of DNA[i]

 

On the last day of February in 1953, according to James Watson, Francis Crick announced to the patrons of the Eagle Pub in Cambridge,

“We have discovered the secret of life.”[ii]

Brian Hayes, the author of “The Invention of the Genetic Code” states,

If life ever had a secret, the double helix of DNA was surely it.[iii]

Francis Crick and James Watson gained fame for ‘unraveling the double helix.’ Maurice Wilkins played a role as well and together they all received the Nobel Prize in 1962. Yet there was one other person whose contribution to this discovery was truly essential and yet she could not be recognized by the Nobel Committee—she died in 1958 at the age of 37 from ovarian cancer and Nobel prizes are not awarded posthumously. That person was Rosalind Franklin. Crick and Watson stood on her shoulders as well as those of the many scientists whose hard work preceded them.

 

As previously mentioned, back in 1868, the Swiss physician Fritz Miescher discovered quite by accident a substance in the nuclei of cells that he called nuclein while attempting to isolate the protein components of white blood cells. Toward that end he had contacted a local surgical clinic and had them send him some pus-coated patient bandages, from which he washed and filtered out the white blood cells. He then extracted and identified the various proteins within the cells, but he came across a substance inside the cell nuclei that had chemical properties unlike any protein. It contained a much higher phosphorus content, was acidic and was resistant to digestion by protein-digesting enzymes. Miescher realized that he had discovered a new substance.

"It seems probable to me that a whole family of such slightly varying phosphorous-containing substances will appear, as a group of nucleins, equivalent to proteins."

These nucleins, because they were acidic and had been discovered within the nuclei of cells came to be known generally as nucleic acids. More than 50 years passed before the significance of Miescher's discovery of nucleic acids was widely appreciated by the scientific community. That was a shame because the nucleic acid that Miescher had discovered was in fact deoxyribonucleic acid, or DNA!

 

In the twentieth century, a number of other scientists began investigating the chemical nature of nuclein. The Russian biochemist Phoebus Levene, based upon years of work using hydrolysis to break down and analyze nucleic acids in yeast, was the first to discover that they were formed of a series of units that he called nucleotides. Each nucleotide was composed of (i) one of four nitrogen-containing bases, (ii) a sugar molecule, and (iii) a phosphate group. He also discovered that (iv) the sugar component of RNA was ribose, (v) the sugar component of DNA was deoxyribose and that (vi) the three major components of a nucleotide were linked together in the order phosphate-sugar-base. He was correct about all this, but the “tetranucleotide” model he proposed for the structure of DNA, formulated around 1910, was incorrect. Levene hypothesized that DNA was made up of repeating strings of identical tetranucleotides, each tetranucleotide consisting of only 4 individual nucleotides, namely the ones containing the four possible nitrogenous bases, guanine (G), adenine (A), cytosine (C), and thymine (T). Thus, the DNA nucleotide sequence was always G, A, C, T, G, A, C, T … and so he declared that DNA could not store the genetic code because it was chemically far too simple. Indeed, most scientists following Levene up through the 1940s thought that proteins, which were far more complex than a linked chain of identical units, were the basis of heredity. Consequently, most research on the nature of the gene focused on proteins, particularly enzymes and viruses.[iv]

 

It was not until 1943 that the first direct evidence emerged in support of DNA as the bearer of genetic information. Oswald Avery, Colin MacLeod, and Maclyn McCarty of Rockefeller University in New York City discovered that DNA taken from a virulent (think nasty) strain of the bacterium Streptococcus pneumoniae permanently transformed a non-virulent (think nice) form of the organism into a virulent form. Avery and his colleagues concluded that it was the DNA from the virulent strain which carried the genetic message for virulence and that it became permanently incorporated into the DNA of the recipient non-virulent cells. Although the scientific community was slow to adopt the idea that DNA was the carrier of genetic information (they were still hung up on proteins), a subsequent experiment provided evidence that this was indeed the case. In 1952, Alfred Hershey and Martha Chase found that when the bacterial virus, bacteriophage T2, infects an E. coli host cell, it is the DNA of the virus, and not its protein coat, which enters the host cell and provides the genetic information for replication of the virus.[v]

 

Erwin Chargaff expanded on Levene's work by uncovering additional details of the structure of DNA, thus further paving the way for Watson and Crick. Chargaff, an Austrian biochemist, had read the famous 1944 paper by Oswald Avery and his colleagues, which demonstrated that hereditary units, or genes, are composed of DNA.[vi] This paper inspired him to launch a research program on the chemistry of nucleic acids. Of Avery's work, Chargaff (1971) wrote the following:

“This discovery, almost abruptly, appeared to foreshadow a chemistry of heredity and, moreover, made probable the nucleic acid character of the gene... Avery gave us the first text of a new language, or rather he showed us where to look for it. I resolved to search for this text.”

 

Chargaff discovered that the composition of nucleotides that make up DNA varies among species, in particular in the relative amounts of A, G, T, and C bases. This evidence of molecular diversity made DNA a far more credible candidate for the carrier of genetic information than would be the case if it were merely a repetitive string of identical units as proposed by Levene.

 

Chargaff's rule: In DNA, the total abundance of purines is equal to the total abundance of pyrimidines.

Chargaff also discovered that DNA, no matter what organism or tissue type it comes from, maintains certain properties, even as its composition varies. In particular, the amount of adenine (A) is approximately the same as that of thymine (T) and the amount of guanine (G) approximates that of cytosine (C). In other words, the total amount of purines (A + G) is nearly the same as the total amount of pyrimidines (C + T) in all DNA, regardless of their source. For example, Chargaff determined that the four bases present in human DNA occur in the percentages: A=30.9% and T=29.4%; G=19.9% and C=19.8%. Thus, he concluded that [A + G] = [T + C], which became known as Chargaff’s rule. This strongly hinted towards the base pairing (A–T and C–G) in the DNA molecule, although Chargaff did not explicitly state this connection himself.
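Chargaff's numbers can be checked with a few lines of Python, and the same check shows why base pairing makes the rule unavoidable. The first part uses the human percentages quoted above. The second part counts bases in a short double-stranded example (the 21-letter gene used earlier in this chapter and its complementary strand); that choice of test sequence is an assumption made purely for illustration.

```python
from collections import Counter

# Part 1: Chargaff's measured percentages for human DNA (from the text).
human_percent = {"A": 30.9, "T": 29.4, "G": 19.9, "C": 19.8}
purines = human_percent["A"] + human_percent["G"]
pyrimidines = human_percent["C"] + human_percent["T"]
print(f"purines {purines:.1f}%  pyrimidines {pyrimidines:.1f}%")
# purines 50.8%  pyrimidines 49.2%

# Part 2: in a perfectly paired double strand the equalities are exact,
# because every A sits opposite a T and every G sits opposite a C.
top = "ATGACGGAGCTTCGGAGCTAG"
bottom = "TACTGCCTCGAAGCCTCGATC"
counts = Counter(top + bottom)
print(counts["A"], counts["T"], counts["G"], counts["C"])  # 9 9 12 12
```

The small mismatch in the measured percentages reflects experimental uncertainty; in the idealized duplex the counts agree exactly.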

 

Most researchers had previously assumed that deviations in DNA from the equal base ratios G = A = C = T of Levene’s tetranucleotide model were due to experimental error, but Chargaff proved unequivocally that the variation was real. Chargaff met Francis Crick and James Watson at Cambridge in 1952 and, despite not getting on well with them, explained his findings to them. Chargaff's research would later help Watson and Crick deduce the double helix model of DNA.

"So far as I could make out, they wanted, unencumbered by any knowledge of the chemistry involved, to fit DNA into a helix. The main reason seemed to be Pauling's alpha-helix model of a protein....I told them all I knew. If they had heard before about the pairing rules, they concealed it. But as they did not seem to know much about anything, I was not unduly surprised. I mentioned our early attempts to explain the complementarity relationships by the assumption that, in the nucleic acid chain, adenylic was always next to thymidylic acid and cytidylic next to guanylic acid....I believe that the double-stranded model of DNA came about as a consequence of our conversation; but such things are only susceptible of a later judgment...."

Erwin Chargaff, American Philosophical Society oral history interview (1972)[vii]

 

Watson and Crick independently shared a conviction that DNA, not proteins, was the critical factor in passing on genetic information from generation to generation. Watson moved from the United States of America to England in order to work with Crick at the Cavendish Laboratory in Cambridge. Scientists at Cavendish had done no prior research on DNA at that time, so Watson and Crick used information gathered from others, as well as their own observations, in order to establish the structure. Watson and Crick essentially relied upon three sources of knowledge in order to develop their model: (i) the helical structure found after numerous tests of X-ray scattering carried out by Rosalind Franklin, (ii) base composition restrictions determined by Chargaff and (iii) the knowledge available about the molecular structure of the four nucleotide base groups A, T, C and G.

 

Rosalind Franklin; X-ray diffraction photograph 51

Rosalind Franklin was analyzing DNA using the technique of x-ray crystallography, in which a crystal is exposed to x-rays in order to produce a diffraction pattern that can be photographed. If the crystal is pure enough and the diffraction pattern is acquired very carefully, it is possible to reconstruct the positions of the atoms in the molecules that comprise the basic unit cell of the crystal. Scientists at Cavendish were figuring out how to do this for biological structures like DNA by the early 1950s. Based on a series of x-ray photographs that Franklin obtained, she concluded that the phosphate group lay on the outside of the DNA molecule, not the inside as previously thought, i.e., that it was part of its backbone. She also obtained the now famous ‘photograph 51’, which was direct evidence that the structure of DNA was a double-stranded helix (the fuzzy X in the middle of the picture is an x-ray diffraction signature of helical structure), not a triple helix as proposed by Linus Pauling, the Nobel Prize-winning chemist from the California Institute of Technology who was hot on the heels of Watson and Crick in the effort to find the structure of DNA! Although photograph 51 implied a double-stranded structure for DNA, its exact details and the way in which the strands were bonded together were hidden from the prying eyes of the x-ray crystallography study. Maurice Wilkins, a peer of Franklin’s who had also performed x-ray studies of DNA, obtained photograph 51 from Franklin’s student, Raymond Gosling, and showed it (without Franklin’s permission) to Watson, who immediately recognized the significance of the "X" in photo 51: it meant that DNA was indeed a helix with 10 units per turn (count the spots in the photo) and 3.4 nanometers per turn. Watson sketched a copy of the photo on a newspaper and showed it to Crick. It all came together for them.
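The two numbers read off photograph 51 invite some quick arithmetic, sketched here in Python. The calculation combines them with the figure of about 3 billion nucleotide pairs for human DNA quoted earlier in this chapter. It treats the DNA as one ideal, perfectly straight helix, which is an assumption made only to get a sense of scale; the variable names are likewise made up for this example.

```python
units_per_turn = 10          # nucleotide pairs per helical turn (from photograph 51)
nm_per_turn = 3.4            # length of one turn, in nanometers
human_base_pairs = 3_000_000_000  # approximate figure quoted earlier in the chapter

rise_per_pair_nm = nm_per_turn / units_per_turn
turns = human_base_pairs / units_per_turn
length_m = human_base_pairs * rise_per_pair_nm * 1e-9  # nanometers -> meters

print(f"{rise_per_pair_nm:.2f} nm per pair")   # 0.34 nm per pair
print(f"{turns:.0e} turns")                    # 3e+08 turns
print(f"about {length_m:.2f} m end to end")    # about 1.02 m end to end
```

Under these assumptions, a molecule far too thin to see would stretch to roughly a meter if laid out straight.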

 

At this point, it is worth listing exactly what Crick and Watson knew—

1.      DNA is made up of subunits which scientists called nucleotides.

2.      Each nucleotide is made up of a sugar, a phosphate and a base.

3.      There are 4 different bases in a DNA molecule:

adenine (a purine)

cytosine (a pyrimidine)

guanine (a purine)

thymine (a pyrimidine)

4.      The number of purine bases equals the number of pyrimidine bases

5.      The number of adenine bases equals the number of thymine bases

6.      The number of guanine bases equals the number of cytosine bases

7.      The basic structure of the DNA molecule is helical, with the bases being stacked on top of each other

 

Putting it all together they produced a model of the structure of DNA—it consisted of a 2 chain helical ‘spine,’ with antiparallel properties and the bases facing inward paired to hold the molecule together. They were the first to recognize the most critical fact about DNA that made such a structure possible—they realized how the bases A and T and C and G adhered to one another: A—T, T—A, C—G, and G—C were the only permissible bonds! These particular bonds were the only possible pairings that were of the same length—an absolute necessity if all the ‘rungs’ of the DNA ladder were to easily fit into the frame of the double helix.

 

This can be understood by noting that within the molecular structure of each base there is a small amount of negative electrical charge on those nitrogen and oxygen atoms not attached to a hydrogen atom, while a positive charge exists on those atoms that are attached to hydrogen. Examining the base structure of one “rung” of the DNA “ladder” shows that adenine has a surplus negative charge, while thymine has a surplus positive charge. This makes it possible for hydrogen bonds to form between the two bases and hold them together. Furthermore, one can see that three hydrogen bonds hold guanine and cytosine together, while adenine and thymine are held together by two. Therefore, the guanine–cytosine coupling is stronger than the adenine–thymine coupling and these pairings are the only ones possible. In addition, adenine and guanine are purines and are larger than the pyrimidines, thymine and cytosine. Thus, a purine must bond with a pyrimidine in order to produce rungs of equal width. This complex pattern of bonding meant that the DNA polynucleotide chain could be formed of almost any arrangement of base pairs that adhered to these pairing rules and therefore that their pattern could be almost as intricate in base composition and arrangement as the pattern of amino acids that made up the polypeptide chain of proteins.
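The pairing rules just described are simple enough to write down as lookup tables. The Python sketch below is illustrative only: the names PAIR, H_BONDS, PURINES, partner_strand and hydrogen_bonds are inventions for this example, and the test sequence is the 21-letter gene used earlier in this chapter. It builds the partner strand, confirms that every rung joins one purine to one pyrimidine, and counts the hydrogen bonds that hold the short duplex together.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}   # the only permissible rungs
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}        # bonds in the rung containing each base
PURINES = {"A", "G"}                              # the larger, two-ring bases


def partner_strand(strand: str) -> str:
    """Return the strand that pairs with the given one, position by position."""
    return "".join(PAIR[base] for base in strand)


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between a strand and its partner."""
    return sum(H_BONDS[base] for base in strand)


strand = "ATGACGGAGCTTCGGAGCTAG"
print(partner_strand(strand))  # TACTGCCTCGAAGCCTCGATC

# Every rung joins exactly one purine with one pyrimidine.
print(all((base in PURINES) != (PAIR[base] in PURINES) for base in PAIR))  # True

# 9 A-T rungs with 2 bonds each plus 12 G-C rungs with 3 bonds each.
print(hydrogen_bonds(strand))  # 54
```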

 

This discovery opened the door to the realization that DNA was indeed capable of enough structural variety to serve as the molecule of heredity. In a note to their seminal publication in Nature in 1953, Watson and Crick listed several other implications of their model—

“The phosphate-sugar backbone of our model is completely regular but any sequence of the pairs of bases can fit into the structure . . . it therefore seems likely that the precise sequence of the bases is the code which carries genetic information.”

 

In addition, they noted that their model offers a particularly elegant solution to the problem of gene replication. Replication can be carried out by ‘unzipping’ the two antiparallel strands of DNA and using each old strand as a template for the precise synthesis of a new partner strand. Furthermore, it helps solve other problems, as stated by Watson and Crick in a second article, “Genetical Implications of the Structure of Deoxyribonucleic Acid”:

“Our model suggests possible explanations for a number of other phenomena . . . spontaneous mutations may be due to a base occasionally occurring in one of its less likely tautomeric forms…” 
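The template idea of replication described above can also be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: a double helix is represented as a pair of strings, strand direction is ignored, copying is error-free, and the names PAIR, build_partner, and replicate are hypothetical.

```python
# Pairing rules: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def build_partner(template: str) -> str:
    """Synthesize a new strand by pairing a base against each template base."""
    return "".join(PAIR[base] for base in template)


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """'Unzip' a two-strand molecule and rebuild a partner for each old strand.

    Each daughter molecule keeps one old strand and gains one new strand.
    """
    old_first, old_second = duplex
    daughter_one = (old_first, build_partner(old_first))
    daughter_two = (build_partner(old_second), old_second)
    return daughter_one, daughter_two


if __name__ == "__main__":
    parent = ("ATGC", "TACG")
    first, second = replicate(parent)
    print(first == parent and second == parent)
```

Both daughter molecules carry the same base sequence as the parent, which is why the pairing rules alone are enough to copy the information. A rare mispairing during build_partner would be the analogue of the spontaneous mutation mentioned in the quotation.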

 

Discoveries pertaining to the structure of DNA led to the advent of molecular biology. It had long been argued where the genetic material lay within the cell, but because Watson and Crick were able to piece together the available information and labored to find what was missing, the model they proposed made it apparent that the secret of genetic inheritance resided in the base arrangements of the DNA molecule. This revelation marked the beginning of what may be called the second era of molecular genetics. Like all new discoveries, however, it also opened another ‘Pandora’s box’: it offers unprecedented possibilities for manipulating the living cell and purposefully modifying heredity! This will no doubt, for better or worse, exert an influence on humankind perhaps even more profound than that made possible by the discoveries of nuclear physics.

 

James Watson and Francis Crick did none of the experimental work, and not all of the theoretical work, that led to the discovery of the structure and function of deoxyribonucleic acid; however, it was their intellectual creativity and ability to understand how all the available information fit together that allowed it to happen. One must realize that without the X-ray crystallography of Rosalind Franklin few would have assumed that DNA has a helical structure, and without Chargaff’s rule on the proportions of purines and pyrimidines the base pairing rule might still lie hidden away. Lots of good science is done by people who make sense of the work done by others. Watson and Crick… have been criticized because they used so much experimental evidence laboriously gathered by those that came before... Their genius was that they could make sense of a huge amount of information, some of which appeared to conflict with the rest, and yet put together a view of the structure of DNA which was consistent with all the evidence. It is now certain that DNA is the carrier of genetic information in all living cells. Their model, the double helix, with its biological implications, ranks as the greatest contribution to biology since the work of Darwin and Mendel, something that is obvious enough from the fact that the acronym DNA and the image of the double helix rank close to the top among all the icons of late twentieth-century culture.



[1] Watson, J. D., & Crick, F. H. C. A structure for deoxyribose nucleic acid. Nature 171, 737–738 (1953). For a fascinating account of the discovery, read Watson, J. D. (1968). Gunther S. Stent. ed. The Double Helix: A Personal Account of the Discovery of the Structure of DNA. W. W. Norton & Company, ISBN 0-393-95075-1,  (Norton Critical Editions, 1981).

[2] Definition of Google from wordnetweb.princeton.edu/perl/webwn          

Noun: a widely used search engine that uses text-matching techniques to find web pages that are important and relevant to a user's search. Verb: search the internet (for information) using the Google search engine; "He googled the woman he had met at the party."

[3] Crick, F.H.C., On Protein Synthesis,  Symp. Soc. Exp. Biol. XII, 139-163 (1958).

Crick, F., Central dogma of molecular biology, Nature 227 (5258): 561–3 (August 1970). http://www.nature.com/nature/focus/crick/pdf/crick227.pdf.



[i] Much material in Focus Box Discovery of the Structure of DNA based on lecture found at http://campus.udayton.edu/~hume/DNA/DNA.htm

[ii] Watson, James D. The Double Helix,  New York: Norton Critical Editions in the History of Ideas (1980).

[iii] Hayes, Brian. The Invention of the Genetic Code, [Online] Available: http://www.amsci.org/amsci/issues/Comsci98/compsci9801.html, 25 March 2001.

[iv] Pray, L. Discovery of DNA structure and function: Watson and Crick. Nature Education 1(1) (2008).

[v] For information about Avery and Hershey’s work see for example http://www.accessexcellence.org/RC/AB/BC/Search_for_DNA.php

[vi] Avery, Oswald T.; Colin M. MacLeod, Maclyn McCarty (1944-02-01). "Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III". Journal of Experimental Medicine 79 (2): 137–158. doi:10.1084/jem.79.2.137

[vii] More quotes from Chargaff can be found at http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/people/chargaff.html