Molecular Mechanism of Coding and Autonomous Decision-Making in Biological Systems

  • Tara Karimi
Chapter

Abstract

Biological systems are distinguishable from inanimate materials by their capacity for cognition and computation. Cells are the main subunits of a biological system and function as highly advanced computers, executing thousands of operations per second for different biological purposes in order to adapt dynamically to the environment. Unlike current electronic computers, biological systems utilize a molecular-based coding system in which information is stored in molecules. Information storage in molecules gives cells a massive operation capacity. A deep understanding of the mechanisms of coding and data processing in cells could have several technological applications and trigger an industrial revolution. However, this level of progress requires the establishment of a different scientific viewpoint for the life sciences: a paradigm that places them much closer to the other experimental branches of natural science, including chemistry, physics, and mathematics.

In this chapter, we first provide a detailed description of different aspects of molecular coding and data operation in biological systems, applying the new concepts of cognitive chemistry and the relativity of code, energy, and mass. We discuss how information is stored in the patterns of molecular interactions and how real-time interactions between molecules and atoms generate a dynamic coding and operation capacity in biological systems. In the second part, we discuss how knowledge of cognitive chemistry can be leveraged to design synthetic systems with autonomous properties similar to those of biological systems. In the third part, we discuss how the basic principles of cognitive chemistry can be applied to mimic the extensive computation capacity of biomolecules in solving complex decision-making problems.

Keywords

Cognitive chemistry coding system; Nondeterministic polynomial time problems; Molecular computing; Artificial Intelligence (AI); DNA computing; Multilayer coding; Conserved and dynamic coding; Stem cells and decision-making; Eternal cognition

2.1 Molecular Mechanism of Autonomous Coding and Data Processing Based on Physical and Chemical Foundation of Life

Considering the physical and chemical foundations of life, it can be concluded that living organisms are physicochemical systems and are not exempt from nature’s laws in physics and chemistry. Scientifically, we know that biological systems are composed of the ordinary elements of nature. However, the regulatory mechanisms that first converted these elements from randomness into a highly organized system (defined as life) have remained a mystery.

“Science cannot solve the ultimate mystery of nature. And that’s because, in the last analysis, we ourselves are part of nature and therefore part of the mystery that we are trying to solve.” (Max Planck)

The precise regulatory mechanisms behind “the origin of life”, when the first self-generating reaction of life occurred roughly four billion years ago, remain one of nature’s mysteries (owing to our limited experience of time). However, the silver lining is that we can learn from the currently existing rules of life, which are already accessible to human experience.

In the early twentieth century, right after the establishment of modern physics and the revolutionary effect of quantum physics on different technologies, Erwin Schrodinger pointed to the need for a revised definition of life. He also attempted to revise the classification of the natural sciences by filling the gaps between the life sciences and the other branches of science. In his well-known book “What Is Life,” he referred to the missing parts in this classification that separate living systems from human-made physicochemical systems. He attributed this gap to our limited understanding of the structure and construction of living systems. In other words, the construction of living systems is different from anything we have yet tested in human-made physical laboratories.

“What I wish to make clear in this last chapter is, in short, that from all we have learned about the structure of living matter, we must be prepared to find it working in a manner that cannot be reduced to the ordinary laws of physics. That not on the ground that there is any new ‘force’ or what not, directing the behavior of the single atoms within a living organism, but because the construction is different from anything we have yet tested in the physical laboratory” [25].

During the early twentieth century, along with the great progress in physics and chemistry and later in mathematics (data science), a revolutionary shift occurred in the conventional classification of the natural sciences (Fig. 2.1a, b). Different branches of natural science, including physics, chemistry, and mathematics, moved closer to each other, and several new fields were subsequently established (Fig. 2.1b). For example, modern (quantum) physics was established by a new interpretation of integrated physics and chemistry. Later, computer science was established by the integration of electronic physics and data science (mathematics) (Fig. 2.1c). In 1936, Alan Turing described the universal computing machine (now called the Turing machine), the theoretical basis of the digital computer. He worked on algebra and number theory, and also built a cipher machine based on electromagnetic relays to multiply binary numbers.
Fig. 2.1

(a–g) Schematic image illustrating the trend of changes in the classification of natural sciences after the twentieth century as a result of the cross-pollination or integration of basic natural science. TC, Turing Computing; QC, Quantum Computing; O.Ph, Ordered Physics

In the early twentieth century, the basic principles of modern physics were established by Max Planck. Modern physics later became highly practical and was applied in different technologies through the research of Albert Einstein and Erwin Schrodinger on quantum mechanics.

In 1900, Max Planck sought to explain blackbody (BB) radiation from warm objects. The BB spectrum was explained by assuming that the energy in an electromagnetic wave is quantized according to Planck’s equation, $E_{\text{EM wave}} = nhf$, where $n$ is an integer, $h$ is Planck’s constant, and $f$ is the frequency of the wave.

In 1905, Albert Einstein explained the concept of a photon as a packet of electromagnetic radiation that behaves as energy (wave) and particle (mass) at the same time. He also described the equation for the relativity of energy and mass, $E = mc^2$ [3].

Later, in 1926, Erwin Schrodinger set out to define a wave equation that could precisely describe the energy of a particle in an atom. He succeeded in presenting an equation that precisely predicts the energy levels of the hydrogen atom, which led to the establishment of the basic principles of quantum physics. Schrodinger’s equation was probably the most important equation of the twentieth century because of its huge effect on technological progress. The major significance of Schrodinger’s work in quantum physics comes from its broad coverage of both physics and chemistry.

The physicist Paul Dirac famously asserted: “The Schrodinger equation accounts for much of physics and all of chemistry”.

In addition, in the mid-twentieth century, Schrodinger attempted to revise the classical definition of the life sciences by applying the basic principles of physics and chemistry to biological systems. The full manuscript of his lecture series on the physicochemical foundation of life was later published as a book titled “What Is Life” [25].

Schrodinger attempted to provide several pieces of evidence that the principles of quantum physics hold in biological systems. He believed that the differences between biological systems and other physical and chemical systems are simply related to their structures: living systems are still made of natural elements and obey all of nature’s laws in physics and chemistry.

He explained that differences in the construction of two systems may cause a big difference in their function, even when the basic elements used in both systems are the same. A difference in function should therefore not be mistaken for a difference in the basic elements or nature of a system; the difference in construction is enough to produce an entirely different way of functioning. Schrodinger made this concept clear by comparing a heat engine with an electric motor, both of which may use the same basic elements (e.g., copper and iron). He pointed out that the natural properties of copper remain the same even though it is used differently in the construction of a heat engine than in an electric motor.

“To put it crudely, an engineer, familiar with heat engines only, will after inspecting the construction of an electronic motor, be prepared to find it working along principles which he does not yet understand. He finds the copper familiar to him in kettles used here in the form of long, wires wound in coil; the iron familiar to him in levers and bars and steam cylinders here filling the interior of those coils of copper wire. He will be convinced that it is the same copper and the same iron, subject to the same laws of nature, and he is right in that. The difference in construction is enough to prepare him for an entirely different way of functioning [25].”

The consistency of nature’s laws had been noted earlier by Max Planck and Einstein. However, Schrodinger attempted to provide a systematic description of the consistency of nature’s laws in living systems, aligned with quantum mechanics.

Schrodinger referred to “orderliness” as the major difference between a living system and anything that exists in inanimate matter.

“The unfolding of events in the life cycle of an organism exhibits an admirable regularity and orderliness, unrivaled by anything we meet with in inanimate matter. We find it controlled by a supremely well-ordered group of atoms, which represents only a very small fraction of the sum total in every cell.”

As examples of ordered structures in nature, Fig. 2.2 illustrates the interconnections of living systems with the inanimate elements of nature, and Fig. 2.3 illustrates the spontaneous formation of a highly organized structure in a snail shell from disordered elements of nature, which Erwin Schrodinger defined as ordered physics.
Fig. 2.2

(a–c) Examples of the spontaneous formation of highly organized structures in nature among non-living and living systems. (a) Formation of an ice crystal (non-living). (b) Geometrical structure of a starfish. (c) A jellyfish. Living systems are made of the natural elements and obey all of nature’s laws in physics and chemistry. What differentiates the chemistry of a living system from the non-living elements of nature is its capacity to perform autonomous functions (e.g., self-regeneration, self-regulation, and self-fueling) using its molecular units

Fig. 2.3

(a, b) Autonomous formation of highly organized geometrical structures in a snail shell. The formation of a highly organized structure from disordered elements of nature is called ordered physics by Erwin Schrodinger

In addition, based on the behavior of atoms, Schrodinger classified all physical systems in nature into two main categories: statistical physics and order-from-order (ordered) physics (Fig. 2.1d).

“I remember an interesting little paper by Max Planck on the topic ‘The Dynamical and the Statistical Type of Law’ (‘Dynamische und Statistische Gesetzmässigkeit’). The distinction is precisely the one we have here labeled as ‘order from order’ and ‘order from disorder’ [25]. The object of that paper was to show how the interesting statistical type of law, controlling large-scale events, is constituted from the dynamical laws supposed to govern the small-scale events, the interaction of the single atoms and molecules. The latter type is illustrated by large-scale mechanical phenomena, as the motion of the planets or of a clock, etc. Thus it would appear that the new principle, the order-from-order principle, to which we have pointed with great solemnity as being the real clue to the understanding of life …”

“For the new principle that is a genuinely physical one, it is, in my opinion, nothing else than the principle of quantum theory over and over again. But, we cannot expect the ‘laws of physics’ derived from it to suffice straightaway to explain the behavior of living matter whose striking features are visibly based to large extent on the ‘order-from-order’ principle. You would not expect two entirely different mechanisms to bring about the same type of law; you would not expect your latch-key to open your neighbor’s door as well. We must therefore not be discouraged by the difficulty of interpreting life by the ordinary laws of physics. For that is just what is to be expected from the knowledge we have gained of the structure of living matter. We must be prepared to find a new type of physical laws prevailing in it.” [25]

Now the big question is: what new type of physical laws governs the highly ordered structure, function, and behavior of living systems?

In this book, we attempt to explain the mechanisms that cause the ordered behavior of molecules and atoms in biological systems. We discuss how order-from-order physics can be explained by the unique capacity of biological systems to generate molecular codes. As an additional factor in nature’s laws, we describe the coding capacity of molecules in biological systems, which should be considered alongside the properties of molecules for energy and mass production. We therefore define code as the third dimension of nature’s laws, as illustrated by the autonomous properties of biological systems.

By considering code as the third dimension of nature’s laws in physics and chemistry, we define a new branch of chemistry, called cognitive chemistry, that mimics the ordered physics of biological systems. Cognitive chemistry integrates nature’s laws in chemistry, physics, and mathematics for coding, data storage, and operation by molecules (Fig. 2.1g). In physics terminology, cognitive chemistry can be interpreted as the relativity of code, energy, and mass.

2.2 Basic Principles of Coding and Data Processing in Biological Systems

2.2.1 Information Storage in Molecules and Materials

Unlike human-made computers, which operate on physical and electrical coding, biological systems use a unique chemical-based coding system. In this system, biological information is embedded in materials, that is, in the chemical interactions between molecules and atoms. For simplicity, in this book, code-embedded materials are called “coded materials”. Deoxyribonucleic acid (DNA) is the best-known coded material that nature uses for the storage of biological information. DNA, also known as genetic material, contains the entire information of an organism to be copied into the next generation of the species [17]. DNA provides a huge storage capacity in part because it encodes data with four subunits (A, G, C, and T), so each position can carry two bits, whereas current computers use a binary (0, 1) coding system with one bit per position. Neurotransmitters are another example of coded materials; they play a major role in information storage and transduction in the neural system.
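The difference between a four-letter and a two-letter alphabet can be made concrete with a small sketch. The mapping of bit pairs to nucleotides below is a hypothetical choice made only for illustration (any one-to-one assignment would do); it is not a scheme described in this chapter, and it ignores the biochemical constraints of real DNA data storage.

```python
# Hypothetical assignment of bit pairs to nucleotides (an assumption for illustration).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes_as_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides (two bits per nucleotide)."""
    bit_string = "".join(f"{byte:08b}" for byte in data)
    return "".join(
        BITS_TO_BASE[bit_string[i:i + 2]] for i in range(0, len(bit_string), 2)
    )


def decode_dna_to_bytes(sequence: str) -> bytes:
    """Invert encode_bytes_as_dna: four nucleotides back to one byte."""
    bit_string = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bit_string[i:i + 8], 2) for i in range(0, len(bit_string), 8))


if __name__ == "__main__":
    message = b"life"
    strand = encode_bytes_as_dna(message)
    # 4 bytes = 32 bits, stored in 16 nucleotides instead of 32 binary symbols.
    print(strand, len(strand))
    assert decode_dna_to_bytes(strand) == message
```

In this sketch the nucleotide string is half as long as the equivalent bit string, which is the sense in which a four-symbol alphabet is denser per position.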

Figure 2.4 compares different features of information storage and data processing in the molecular-based coding system of a seed (a) versus the binary coding system of an electronic flash memory (b). In a molecular coding system, the software (coding unit) and hardware (structural unit) are integrated. For instance, in the DNA coding system, each coding unit (A, G, C, or T) possesses a physical mass, whereas in an electronic coding system, each coding unit (0 or 1) lacks a molecular mass and is generated solely by electronic currents. In addition, self-organization and the autonomous formation of structural patterns are direct results of data operation in a molecular coding system. For example, all the information for the synthesis of the highly organized geometrical structures of a sunflower is stored in the physical mass of a seed.
Fig. 2.4

(a, b) Examples of chemical-based coding and information storage in plant seeds versus electronic-based coding and information storage in an electronic memory that uses a binary coding system

Figures 2.5, 2.6, 2.7, and 2.8 provide examples of chemical-based coding in the neural system. The entire brain mass is generated from coded material. In fact, the brain is a chemical-based computer which works based on fundamental principles of cognitive chemistry rather than statistical chemistry (or molecular randomness and statistical possibility in Schrodinger’s definition for a non-living system).
Fig. 2.5

(a, b) The macroscopic and microscopic structure of the brain. The physical mass of the brain, which is made entirely of coded molecules, provides a highly efficient matrix for information storage and memory formation. (b) Microscopic imaging of mouse brain using the Brainbow method and mosaic expression of fluorescent proteins in neural cells. Microsections of mouse brain were taken from the dentate gyrus of the hippocampus (Image in part b is adapted from Weissman et al. [36])

Fig. 2.6

(a, b) Coexpression of fluorescent proteins (FPs) in specific areas of the brain by the Brainbow method. (a) Hippocampus neurons. (b) Cortex neurons. Using the Brainbow method, neural cells were labeled with FPs that are specifically coexpressed with molecular markers for different parts of the brain. Images obtained at higher magnification reveal that the whole neural networks of the brain are wired by information-enriched molecules (more than electrical neural networks). (Images are adapted from [36])

Fig. 2.7

(a, b) Microscopic images of neural cells, illustrating the direct role of neurotransmitter molecules in both data transportation and memory formation in neural synapses and consequently neural networks. (a) A cerebellar folium from the Brainbow mouse line H was imaged using confocal microscopy. The three-dimensional volume indicated in the box was segmented using semiautomated methods and reconstructed digitally. (b) Digital reconstruction of axons and granule neurons from the volume marked in (a) (Images are adapted from [34])

Fig. 2.8

(a, b) Microscopic images of labeled neural cells at high magnification by dark-field microscopy and electron microscopy. (a) Microscopic imaging of neurons by expression of different FPs with various neurotransmitters. (b) A colorized scanning electron microscope image of a nerve ending that has been broken open to illustrate the synaptic vesicles containing information-enriched molecules of neurotransmitters (orange and blue) beneath the cell membrane. Unlike the electronic coding system in silicon-based computers, the major mass of neurons is generated from information-enriched molecules. A unique property of the brain’s coding system compared to an electronic coding system is the integration of hardware and software in the whole physical mass of the brain. (Source of images: a – [36]. b – http://www.cellimagelibrary.org/images/214)

Figure 2.5a illustrates the physical mass of the brain, which is the initial origin of memory formation. The brain is the only computing system that has the capacity to generate expandable memory volume within its limited physical mass. The main difference between the brain and current electronic computers is the brain’s molecular-based coding system. This molecular coding system is the origin of the electrical activities of neural cells, and memory formation likewise originates from it. While electrical coding in silicon-based computers is based on the transport of electrons through uniform solid media (e.g., metal wires and transistors), the brain is made of soft material.

Figure 2.5b illustrates immunohistochemistry (IHC) imaging of brain microsections. The tissue microsections are stained against specific molecular markers of neural cells at different layers of the brain. Figure 2.6a–e illustrates IHC staining of the brain; the microsections were stained against molecular markers of various types of neural cells at different layers of the brain.

Figure 2.7 illustrates the direct role of neurotransmitter molecules in both data transportation and memory formation, through their role in the formation of neural synapses and consequently neural networks. Neurotransmitters are a class of coded chemical molecules that play a major role in the formation of memory through neural networks. Neurotransmitters function as chemical (molecular)-based logic gates and switches. The initiation of the action potential and the electrical activity of neurons are regulated by the logic gate function of neurotransmitters at neural synapses. Consequently, neural networks form through the coupling of neural synapses.
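The logic gate behavior described above can be illustrated with a common textbook abstraction, the threshold unit, in which excitatory transmitter release adds to a neuron's drive, inhibitory release subtracts from it, and the neuron fires once the total reaches a threshold. This is a deliberately simplified sketch and not the chapter's own model; the weights, thresholds, and function names are hypothetical.

```python
from typing import Sequence


def neuron_fires(inputs: Sequence[int], weights: Sequence[float], threshold: float) -> bool:
    """Return True when the weighted synaptic input reaches the firing threshold.

    inputs:  1 if the presynaptic cell released transmitter, else 0.
    weights: positive for excitatory synapses, negative for inhibitory ones
             (hypothetical values chosen for illustration).
    """
    drive = sum(x * w for x, w in zip(inputs, weights))
    return drive >= threshold


def and_gate(a: int, b: int) -> bool:
    # Both excitatory inputs are needed to reach the threshold.
    return neuron_fires([a, b], [1.0, 1.0], threshold=2.0)


def or_gate(a: int, b: int) -> bool:
    # Either excitatory input alone is enough.
    return neuron_fires([a, b], [1.0, 1.0], threshold=1.0)


def inhibited_gate(excite: int, inhibit: int) -> bool:
    # Fires on excitation only when the inhibitory synapse is silent.
    return neuron_fires([excite, inhibit], [1.0, -1.0], threshold=1.0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_gate(a, b), or_gate(a, b), inhibited_gate(a, b))
```

Changing only the weights and the threshold turns the same unit into a different gate, which loosely mirrors how the type and amount of transmitter at a synapse could change the switching behavior.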

Figure 2.8a illustrates microscopic imaging of IHC staining of brain sections against different types of neurotransmitters in neural cells. Microscopic imaging of neural cells at high magnification illustrates the release of neurotransmitters from neural cells into synaptic spaces. Unlike binary coding in silicon-based computers, which is a purely electronic coding system, data operation in neural cells involves both molecular and electrical coding systems. In fact, even the electrical activity of neural cells originates from molecular coding by neurotransmitters, which possess a specific molecular mass.

Figure 2.8b illustrates electron microscopy imaging of a neural network. Figure 2.8c illustrates electron microscopy imaging of vesicles of neurotransmitters in a neuron. Unlike the coding units of an electronic coding system in silicon-based computers, which lack physical mass, the major mass of neurons, the coding subunits of a neural network, is generated from coded materials (e.g., vesicles of neurotransmitters and signaling proteins).

The basic difference between the brain’s operating system and that of electronic computers is the existence of a molecular-based coding system (through different types of coded materials). It can therefore be concluded that the unique capacities of the brain (e.g., expandable memory and learning) can be explained, and even simulated in synthetic form, through innovative molecular-based coding systems for next-generation cognitive systems.

2.2.2 Multilayer Coding

Unlike current silicon-based computing systems, which use a single layer of linear binary coding, biological systems apply multiple layers of coding. In addition, each layer of coding in a biological system involves different types of coding subunits. For example, multilayer coding in biological systems involves the coding languages of DNA, mRNA, amino acids, peptides, protein-protein interactions, signaling pathways, systemic signaling pathways of endocrine hormones, as well as neurotransmitters and neural networks. Figure 2.9a illustrates a schematic representation of the DNA, amino acid, and protein-coding layers in a cell versus linear binary coding in an electronic computer. Figure 2.9c illustrates a schematic representation of the standard DNA-amino acid codon in biological systems.
Fig. 2.9

(a–d) Schematic representation of multilayer coding in biological systems. (a) Formation of the genetic coding layer by specific sequences of nucleotides in the DNA molecule. (b) Translation of the DNA genetic code from the nucleotide coding language to the amino acid coding system and formation of a protein, which becomes a coding subunit of a cell signaling network. (c) Interactions of signaling protein units with each other and formation of a protein-based cell signal transduction network. (d) Dynamic data operation in a cell by protein-based cell signaling networks
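The first two coding layers (nucleotides, then amino acids) can be sketched as two successive re-encodings of the same message. The example below uses only a small, deliberately partial excerpt of the standard genetic code, and the input sequence is a made-up illustration rather than a real gene; the function names are likewise hypothetical.

```python
# Partial excerpt of the standard genetic code (mRNA codon -> one-letter amino acid).
# Only the codons needed for the toy sequence are listed; "*" marks a stop codon.
PARTIAL_CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAG": "K",  # lysine
    "GAC": "D",  # aspartate
    "UAA": "*",
    "UAG": "*",
    "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein: read non-overlapping codons until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = PARTIAL_CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCAAGGACTAA"       # toy sequence, not a real gene
    mrna = transcribe(dna)        # layer 1 -> layer 2 (4-letter alphabets)
    protein = translate(mrna)     # layer 2 -> layer 3 (20-letter alphabet)
    print(dna, mrna, protein)
```

Each layer uses its own alphabet, and a codon outside the partial table raises a `KeyError`, a reminder that a complete table of 64 codons would be needed for real sequences.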

Despite several attempts to create biologically inspired computing algorithms, most current algorithms of this kind are based on only one layer of coding. Genetic algorithms, ant colony optimization, and DNA computing are the most popular biologically inspired computing approaches [5, 6, 10, 21, 28, 33]. The exquisite accuracy and efficiency of data operation in living cells, through highly complex and interconnected DNA-protein NP networks, motivated us to devise a computing algorithm that leverages the wide computational potential of both nucleic acids and proteins for solving non-biological NP problems (discussed in the following parts of this chapter). Figure 2.10 illustrates a schematic representation of the multilayer property of the biological coding system. Codes are defined by alphabetic symbols. The nucleic acid coding layer is made of four coding subunits, and the protein-coding layer is made of 20 amino acid coding subunits.
Fig. 2.10

Schematic representation of the alphabetic symbols of nucleotides and amino acids in the DNA, RNA, and protein-coding layers, respectively. The nucleic acid coding layer is made of four coding subunits: A, U, G, and C. The protein-coding layer is made of 20 amino acid coding subunits

2.2.3 Molecular Coding and Algorithmic Chemistry

Unlike a silicon-based computer, which is operated by an electronic coding system, all biological functions in living cells are operated by chemical-based coding systems. Figure 2.10 illustrates the alphabetic symbols of the universal nucleotide/amino acid coding system in living systems.

Although molecular codes in a cognitive chemistry system are typically presented as sequences of molecular subunits (e.g., sequences of 4 DNA nucleotides or 20 amino acids), the real codes in this molecular coding system are hidden in the electrochemical attractive forces among molecules and atoms. Figure 2.11a illustrates a schematic representation of the electrochemical attractive forces among molecules, which can be applied in designing different types of coding molecules. Figure 2.11b shows a schematic representation of the electrochemical interactions between two amino acids (aspartate and lysine). Dynamic electrochemical interactions between amino acids within a protein or among different proteins provide a highly efficient coding and signal transduction system in living cells. Figure 2.11c shows the electrochemical interactions between the nucleotide pairs A-T and G-C.
Fig. 2.11

(a–c) Schematic representation of the electrochemical interactions between the four coding subunits of DNA nucleotides as well as the side chains of two exemplary amino acids. (a) Electrochemical attractive forces among molecules, which can be applied in designing different types of coding molecules. (b) Electrochemical interactions between two amino acids (aspartate and lysine). (c) Electrochemical interactions between the nucleotide pairs A-T and G-C

Electrochemical attractive forces among coding molecules can be ordered in the form of various molecular algorithms for different computational purposes. For example, electrochemical forces among coding molecules can be translated into values and integers based on the strength of attraction between the interacting chemical groups.
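As one minimal illustration of turning attraction strength into integers, the sketch below scores a DNA duplex by counting Watson-Crick hydrogen bonds (two for an A-T pair, three for a G-C pair). Using the hydrogen-bond count as the score is an assumption made here for simplicity; it is only a crude proxy for real binding affinity, which also depends on base stacking, salt, and temperature, and it is not the affinity index defined by the chapter.

```python
# Hydrogen bonds per Watson-Crick base pair; mismatched pairs score 0 in this sketch.
HYDROGEN_BONDS = {frozenset("AT"): 2, frozenset("GC"): 3}


def pairing_score(strand: str, partner: str) -> int:
    """Integer score for two equal-length strands paired antiparallel.

    The partner is read in reverse so that position i of `strand`
    faces position (n - 1 - i) of `partner`.
    """
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    return sum(
        HYDROGEN_BONDS.get(frozenset((a, b)), 0)
        for a, b in zip(strand, reversed(partner))
    )


if __name__ == "__main__":
    print(pairing_score("ATGC", "GCAT"))  # fully complementary partner
    print(pairing_score("ATGC", "ATGC"))  # non-complementary partner
```

With such a table, a set of candidate partners can be ranked by an integer, which is the kind of value a molecular algorithm could then operate on.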

The significance and capacities of cognitive chemistry coding and data operation system include:
  • The massive information storage and data compaction capacity through multiple layers of coding.

  • The capacity of massive parallel operation applying numerous molecular operators.

  • The autonomous data operation and environmental recognition capacity through the electrochemical interactions of coding molecules with each other, as well as with physicochemical environmental factors. The autonomous data operation and environmental recognition capacity of coding molecules can be explained by the quantum mechanical effects of atoms on each other (discussed in Chap. 3).

  • The spontaneous and combinatorial nature of molecular interactions, at each coding and operation layer, leads to the real-time and massive operation capacity of a cognitive chemistry coding system (e.g., for solving complex mathematical problems).

  • Electrochemical attractive forces among different types of coding molecules can be quantified by specific affinity indexes. Affinity indexes among coding molecules can be translated to the values and integers and be applied in designing of novel chemical-based operating systems for solving complex mathematical problems such as nondeterministic polynomial time (NP) problems. Biological systems provide highly efficient models for solving complex problems. In the following part of this chapter, we will discuss the application of biomimetic approaches for solving NP problems.

Figure 2.12 illustrates a schematic representation of a protein folding-related NP problem. Figure 2.12.I shows the sequence of amino acids in an exemplary protein before folding. Figure 2.12.II shows the electrochemical attractive forces among amino acids during protein folding. Amino acids are represented as nodes and the electrochemical interactions between them as edges. The schematic illustrates the different possibilities of protein folding arising from the combinations of electrochemical attractive forces among amino acids. As can be seen in the picture, the combinatorial nature of the molecular interactions creates an NP network for the protein folding problem. Therefore, when the 3D structure of a protein is predicted by exhaustive search on current silicon-based computers, the operation time increases exponentially with the number of amino acids. However, protein folding is a highly precise process under biological conditions. Despite the existence of all these folding possibilities, only one folding condition occurs in real time (Fig. 2.12.III, IV).
Fig. 2.12

Schematic images representative of a protein folding-related NP problem and the visualization of various possible scenarios of folding through electrochemical attractive forces among amino acids. (I) Schematic image representative of the sequence of amino acids in a protein before folding. (II) Schematic image representative of visualization of electrochemical interactions among amino acid sequences in an exemplary protein. Amino acids are determined as nodes and electrochemical interactions among amino acids are determined by edges. The schematic image illustrates different possibilities of protein folding through the electrochemical interaction of amino acids on each other. (III) Despite the existence of all different protein folding possibilities through the electrochemical interactions of amino acids’ side chains on each other, only one folding condition will happen in real time. (IV) The optimal solution in a complex NP problem of protein folding is the most electrochemically stable conformation of protein molecule and can be detected by the crystallography of protein. The optimal solution of the problem is highly accurate and repeatable because it provides the most stable molecular structure through the Gibbs energy level for the molecule
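The growth of the search space can be made tangible with a widely used toy model, the two-dimensional HP lattice model, which is swapped in here purely for illustration and is not the method of this chapter. Each residue is either hydrophobic (H) or polar (P), the chain is placed on a square grid without overlaps, and the energy is minus the number of H-H contacts between residues that are neighbors on the grid but not along the chain. The sketch enumerates every conformation by brute force.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
MOVES: List[Point] = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def enumerate_folds(sequence: str) -> Tuple[int, int]:
    """Return (number of self-avoiding conformations, lowest energy found).

    Conformations related by rotation or reflection are counted separately.
    """
    count = 0
    best = 0

    def energy(path: List[Point]) -> int:
        index_at = {pos: i for i, pos in enumerate(path)}
        contacts = 0
        for i, (x, y) in enumerate(path):
            if sequence[i] != "H":
                continue
            for dx, dy in MOVES:
                j = index_at.get((x + dx, y + dy))
                # j > i + 1: count each non-bonded H-H pair exactly once.
                if j is not None and j > i + 1 and sequence[j] == "H":
                    contacts += 1
        return -contacts

    def extend(path: List[Point], occupied: Set[Point]) -> None:
        nonlocal count, best
        if len(path) == len(sequence):
            count += 1
            best = min(best, energy(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return count, best


if __name__ == "__main__":
    for seq in ("HPPH", "HPPHP", "HPPHPH", "HPPHPHHP"):
        n_folds, lowest = enumerate_folds(seq)
        print(len(seq), n_folds, lowest)
```

Even in this drastically simplified model, each added residue multiplies the number of conformations by a factor of roughly two to three, so exhaustive enumeration quickly becomes impractical, whereas the physical molecule settles into its stable conformation without enumerating anything.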

Proteins can be considered information-enriched polymers. The instruction for the 3-D structure of a protein is embedded in the sequence of amino acids and the electrochemical attractive forces among them. Within a given physiological condition, despite the existence of many different possibilities for protein folding, in reality only one folding condition occurs, which is the most electrochemically stable conformation of the protein molecule (Fig. 2.12). The optimal solution of protein folding occurs almost immediately after exposure to aqueous media under physiological conditions. In theory, the high accuracy of protein folding, viewed in terms of possibilities and electrochemical forces between amino acids, is similar to Erwin Schrodinger’s cat thought experiment, in which all conditions are possible at the same time (Fig. 2.13). However, under each condition only one molecular fold forms, and it can be detected immediately as a stable 3D structure.
Fig. 2.13

Schematic image representative of the interpretation of quantum mechanics, initially explained by Erwin Schrödinger [26]. A cat, a flask of poison, and a radioactive source are placed in a sealed box. If an internal monitor (e.g., a Geiger counter) detects radioactivity, e.g., a single atom decaying, the flask is shattered, releasing the poison, which kills the cat. Therefore, although both the alive and the dead condition are possible, only one condition is observed after the box is opened

The optimal solution of a protein folding problem is highly accurate and repeatable because it provides the most stable spatial conformation through the Gibbs free energy level for the molecule. Protein folding mechanism can be simulated for solving nondeterministic polynomial time (NP) problems.
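The idea that the folded state is the lowest-energy conformation among many candidates can be illustrated with a toy model. The following is a minimal sketch, assuming the simplified hydrophobic-polar (HP) lattice model in place of real electrochemical forces: each residue is either H or P, the chain is placed on a 2-D square lattice, and every contact between two H residues that are not chain neighbours lowers the energy by one unit. The function names and the example sequence are illustrative assumptions, not part of the chapter.

```python
# Toy HP lattice model (an assumption for illustration, not the chapter's method).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def conformations(n):
    """Yield every self-avoiding walk of n residues on a 2-D square lattice."""
    def extend(path):
        if len(path) == n:
            yield tuple(path)          # copy, because path is mutated below
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:        # a lattice site holds at most one residue
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])

def energy(sequence, path):
    """Return -1 for each pair of H residues in lattice contact but not adjacent in the chain."""
    total = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = path[i], path[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    total -= 1
    return total

def fold(sequence):
    """Exhaustively search all conformations and return one with minimal energy."""
    return min(conformations(len(sequence)), key=lambda p: energy(sequence, p))

best = fold("HPPH")
print(best, energy("HPPH", best))
```

Exhaustive enumeration is only feasible for very short chains, because the number of conformations grows rapidly with chain length; this is the combinatorial aspect that the text relates to NP problems.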

The electrochemical properties of amino acids can be applied as the coding and operational criteria for several computable functions, for example, designing of soft logic circuit networks applying the logic gate function and signal-transducing activities of proteins.

Furthermore, electrochemical attractive forces among amino acids’ side chains are strongly affected by other physicochemical environmental factors. This causes spatiotemporal flexibility in the conformation of proteins and, subsequently, in their interactive affinity with other molecules. Therefore, unlike the static structure of crystals (in nonliving systems), the 3-D structures of proteins are highly flexible and dynamic. This highly dynamic conformation makes proteins highly efficient logic gates for various biological purposes (see Sect. 2.2.5, Conditional Coding).

For future research, we suggest the careful design of DNA and protein-coding sequences for the generation of autonomous systems. In addition, hybrid DNA-protein algorithms can be applied to generate algorithmic self-assembly through the guided design of electrochemical attractive forces among coding molecules. Algorithmic self-assembly and autonomy are described in detail in Chap. 3.

2.2.4 Inherent and Conserved Coding

A unique property of biological coding is the capacity for data inheritance, which means information can be transferred from one generation to the next. DNA provides a highly stable matrix for data inheritance among different generations in a biological system. For example, DNA residues of ancient oil-producing bacteria can be detected in oil samples (Fig. 2.14). In addition, DNA provides a highly stable medium for the conservation of biological data and the maintenance of species over millions of years.
Fig. 2.14

An oil sample (right) is grown in an enriched substrate to isolate microbes; pure culture (left) of a strain of biosurfactant-producing microbes isolated from the oil sample

2.2.5 Conditional Coding

A special property of coded molecules (e.g., DNA or proteins) is their capacity for direct connection with, and reactivity to, environmental factors. Even though biological data are stored in a highly conserved and stable coded material such as DNA, the coding system is still highly dynamic and able to adapt to environmental conditions. The DNA molecule is closely connected to environmental factors, and its expression is always affected by environmental conditions.

This capacity of biological coding to adapt to environmental conditions is here called conditional coding. Figure 2.15 provides an example of conditional coding in a growing seed.
Fig. 2.15

Conditional coding in a plant’s seeds. Initiation of growth in plants’ seeds is conditional on the existence of specific levels of light, temperature, and humidity

The DNA molecule in a seed carries the information for all biological operations needed to generate a complete plant, but the growth process will not start until specific environmental conditions are met, including light, temperature, and a specific level of humidity. This mechanism is tightly regulated through intermediate molecular switches. Figure 2.16 provides a schematic illustration of conditional coding through logic gate switches for activation of growth-signaling pathways.
Fig. 2.16

(a–c) Schematic image representative of conditional coding in a plant’s seed. (a) The DNA molecule in a seed carries the information for all biological operations needed to generate a complete plant, but the growth process will not start until specific environmental conditions exist, including light, temperature, and a specific level of humidity. (b) Expression of amylase is critical for initiation of growth-related pathways in a seed. In fact, expression of amylase is essential for initiation of metabolic and energy-producing pathways. (c) Schematic representation of the function of logic gate switches for activation of growth-signaling pathways

Expression of amylase, an enzyme, is critical for initiation of growth-related pathways in a seed such as metabolic and energy-producing pathways. Enzymatic activity of amylase on starch (which is the main energy storage resource for the seed) causes the release of glucose. Consequently, glucose molecules are accessible for metabolic pathways and provide the energy requirements for early stages of growth before the generation of photosynthesis machinery. Expression of the amylase gene is conditional on the existence of light, temperature, and humidity.
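The conditional behaviour described above can be read as a logical AND over environmental inputs. The following is a minimal sketch of that reading; the `Environment` fields, the numeric thresholds, and the function names are hypothetical values chosen only for illustration and are not taken from the chapter.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    light: bool            # whether a light signal is present
    temperature_c: float   # temperature in degrees Celsius
    humidity: float        # relative humidity between 0.0 and 1.0

# Hypothetical thresholds, assumed for illustration only.
MIN_TEMPERATURE_C = 10.0
MIN_HUMIDITY = 0.6

def amylase_expressed(env: Environment) -> bool:
    """AND gate: the amylase gene is switched on only if every condition holds."""
    return (env.light
            and env.temperature_c >= MIN_TEMPERATURE_C
            and env.humidity >= MIN_HUMIDITY)

def glucose_released(env: Environment, starch_units: int) -> int:
    """Starch is converted to glucose only while amylase is expressed."""
    return starch_units if amylase_expressed(env) else 0

print(glucose_released(Environment(True, 20.0, 0.8), 5))   # all conditions met
print(glucose_released(Environment(True, 20.0, 0.2), 5))   # too dry, gate stays closed
```

The stored information (the starch reserve and the gene) is the same in both calls; only the environmental input decides whether the operation runs.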

Conditional coding in cells is a useful model for generating synthetic systems that can adapt quickly to environmental factors. Unlike the code in current computers, DNA includes special sequences that do not carry any information for the synthesis of biomolecules. These are called noncoding sequences. Some noncoding sequences are specialized for direct connection with environmental factors and are also called gene regulatory sequences. Gene regulatory sequences are able to recognize environmental factors and provide an appropriate response through a set of intermediate molecules that function as logic gate switches.

The high level of structural flexibility of proteins and their interactions’ capability with each other and other biochemical components make them highly efficient switching elements for data processing networks in biological systems. For example, in a protein-protein signal transduction and data processing network, proteins can function as switching elements or signal transducers [15, 17].

Switching capability of proteins is due to a conformational change induced by an input signal [17]. Signal transduction occurs by a switching protein (e.g., due to the phosphorylation or interactions with cAMP). In addition, the intermolecular allosteric interaction between a regulatory domain (receiving the input signal) and a functional domain (transmitting the output signal) makes the logic gate activities of proteins. Signal-transducing proteins are components of logical gates in biological data processing systems. Switches can be used to carry out logical operations of the type NOT, AND, OR, and NOR according to the rules of Boolean algebra. These operations are sufficient to process any kind of logical information [14].
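The claim that a small set of switch types suffices for any logical operation can be checked directly. The sketch below, which models each switch as a plain Boolean function (an abstraction assumed here, not a model of any specific protein), builds NOT, OR, and AND from NOR alone and prints the resulting truth table.

```python
from itertools import product

def nor(a: bool, b: bool) -> bool:
    """Output is on only when both inputs are off."""
    return not (a or b)

def not_(a: bool) -> bool:
    return nor(a, a)

def or_(a: bool, b: bool) -> bool:
    return not_(nor(a, b))

def and_(a: bool, b: bool) -> bool:
    return nor(not_(a), not_(b))

# Print the truth table of every derived gate.
for a, b in product([False, True], repeat=2):
    print(a, b, "NOT a:", not_(a), "OR:", or_(a, b), "AND:", and_(a, b))
```

Because every other Boolean function can be composed from these three, a network of switches that implements NOR is, in principle, enough to express any logical rule.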

2.2.6 Dynamic and De Novo Coding

One of the special properties of biological systems is their capacity for de novo coding in response to dynamic environmental conditions. The benefits of understanding these mechanisms are not limited to biomedical science, as they can be applied in machine learning and the engineering of internally intelligent systems. For instance, a remarkable property of the immune system is its capacity for real-time de novo coding for the synthesis of antibodies just after an antigen is presented to the body. Applying this mechanism, the immune system (even with a limited number of immunoglobulin G genes) is able to respond to antigens for which there has never been a single specific gene in the genome. Mechanisms of dynamic coding, such as alternative splicing and rearrangement of the coding subunits of immunoglobulins, cause the production of highly specific antibodies against antigens that are presented to the body for the first time (Fig. 2.17).
Fig. 2.17

A schematic image representative of the mechanism of mRNA alternative splicing. Alternative splicing of mRNA molecules causes the production of different types of proteins from one gene, based on the dynamic microenvironmental signals
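How one gene can give rise to many products can be made concrete by counting. The following is a minimal sketch, assuming a simplified gene in which the first and last exons are always retained and each middle ("cassette") exon is independently kept or skipped; the exon names are hypothetical, and real splicing is regulated rather than exhaustive.

```python
from itertools import product

def isoforms(first_exon, cassette_exons, last_exon):
    """Enumerate mRNA isoforms when each cassette exon is independently kept or skipped."""
    result = []
    for keep in product([True, False], repeat=len(cassette_exons)):
        middle = [exon for exon, kept in zip(cassette_exons, keep) if kept]
        result.append([first_exon, *middle, last_exon])
    return result

# Hypothetical five-exon gene with three cassette exons.
variants = isoforms("E1", ["E2", "E3", "E4"], "E5")
for v in variants:
    print("-".join(v))
print(len(variants), "possible isoforms")
```

With k cassette exons the number of possible isoforms is 2^k, so a modest number of coding subunits can already encode a large repertoire of products.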

2.2.7 Integration of Software and Hardware

Another property of biological coding is the unity of the data storage and operating systems. All coding and operating units in biological systems are derived from multifunctional materials that perform different functions at different stages, depending on environmental conditions (Figs. 2.18 and 2.19). The unity of software and hardware is discussed in more detail in Chap. 3.
Fig. 2.18

(a–d) Illustration of integration of software and hardware in a multilayer molecular-based coding system. (a) A schematic image representative of gene regulatory networks encoding the information for generating a plant. (b, c) A combination of multilayer coding and molecular coding causes the unity of software and hardware, unlike electron-based binary coding, where software needs hardware (e.g., a 3-D printer) to generate a physical object (d)

Fig. 2.19

(a–f) Schematic image representative of integration of software and hardware in biological systems. (a) DNA coding layer which encodes the information for the synthesis of a membrane signal transducer. (b, c) Schematic image illustrating that the output of the first layer of coding is an input for the next layer. For example, the output of the first layer of coding functions as a part of a protein signaling pathway as well. (d) Schematic image representative of a protein-protein interaction signaling pathway. (e, f) IHC staining of a fibroblast against a cytoplasmic signaling protein. (e) IHC staining against a cytoskeletal protein (red color) and a gap junction (green color), which play bifunctional roles in cell signal transduction (software function) as well as in the formation of the physical structure of the cell (hardware) at the same time

2.3 Cognitive Chemistry Algorithm: A Biomimetic Multilayer Molecular Coding Algorithm for Solving NP Problems

In computational complexity theory, nondeterministic polynomial time (NP) problems are a class of decision problems that are solvable in polynomial time by a theoretical nondeterministic Turing machine. Equivalently, a proposed solution to an NP problem can be verified in time that is a polynomial function of the size of the input data. For the hardest problems in this class, which include many combinatorial optimization problems, no polynomial-time algorithm is known for a deterministic machine, and the operation time of known exact methods increases exponentially with the size of the input data. Whether every NP problem can also be solved in polynomial time (the P versus NP question) is one of the most important open questions in mathematics. The main importance of NP problems is related to their vast application in the design of decision-making algorithms in artificial intelligence (AI) [9].
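One standard way to read the definition of NP is that a proposed answer can be checked quickly even when finding it is hard. The following minimal sketch illustrates this with the decision form of the traveling salesman problem introduced later in this section ("is there a closed route of total length at most a given budget?"). The distance matrix, the budget, and the function names are hypothetical examples.

```python
def tour_length(tour, dist):
    """Total length of the closed route that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def verify_certificate(tour, dist, budget):
    """Check in polynomial time that `tour` visits every city exactly once
    and that its closed length does not exceed `budget`."""
    n = len(dist)
    if sorted(tour) != list(range(n)):
        return False
    return tour_length(tour, dist) <= budget

# Hypothetical symmetric distances between four cities.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

print(verify_certificate([0, 1, 3, 2], DIST, 18))  # valid route of length 18
print(verify_certificate([0, 1, 2, 3], DIST, 18))  # valid route, but length 21
```

The check needs one pass over the route, whereas no comparably fast general method is known for producing a route that passes it.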

There have been significant efforts in conceiving computational approaches for solving NP problems applying current silicon-based and quantum computers. Living cells apply highly efficient computational methods to solve similar NP problems for different biological benefits. For example, during the early stages of embryonic development, thousands of NP problems can be operated per second in parallel to each other through the developmental signaling pathways in differentiating stem cells (Fig. 2.20). Stem cells can make thousands of decisions per second by finding the optimal combination of gene regulatory factors toward different destinations (e.g., differentiation toward various tissues and organs).
Fig. 2.20

Schematic representation of the special capacity of stem cells for real-time operation of NP problems during early stages of embryonic development. (a) A schematic illustrating the decision-making capacity of stem cells for differentiation to different tissues and organs. (b) Stem cells can make thousands of decisions per second by finding the optimal combinations of gene regulatory factors in an NP problem with a network as large as their entire genome

As an example of NP problems, we refer to traveling salesman problem (TSP). TSP is a combinatorial optimization problem, important in both operations research as well as theoretical computer science. Briefly, the TSP asks the following question: Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city once and only once and returns to the origin city? [2, 8].
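For a handful of cities the question can be answered exactly by trying every ordering. The sketch below does this for a hypothetical four-city distance matrix; the matrix and function names are assumptions for illustration, and the approach is exhaustive search, not an efficient algorithm.

```python
from itertools import permutations

def tour_length(tour, dist):
    """Total length of the closed route that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def solve_tsp_exact(dist):
    """Return the shortest closed tour that starts at city 0, found by checking every ordering."""
    n = len(dist)
    candidates = ((0, *rest) for rest in permutations(range(1, n)))
    best = min(candidates, key=lambda t: tour_length(t, dist))
    return list(best), tour_length(best, dist)

# Hypothetical symmetric distances between four cities.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

tour, length = solve_tsp_exact(DIST)
print(tour, length)
```

Fixing city 0 as the origin avoids counting rotations of the same route more than once; the search still examines (n - 1)! orderings, which is why it does not scale.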

Figure 2.21 illustrates a TSP for an airway optimization problem. Figure 2.22 illustrates a schematic representation of a TSP with 6 cities and 15 roads. Cities are demonstrated by nodes, and roads between cities are shown as edges. Numerical values are representative of distances between each pair of cities. As it can be observed in Figs. 2.21 and 2.22, due to the combinatorial nature of the TSP problem, as the number of cities increases, the complexity of the problem increases exponentially.
Fig. 2.21

Schematic representative of a TSP problem in airway optimization. In a given number of cities, the problem asks for the shortest possible pathway that visits each city exactly once and comes back to the original city

Fig. 2.22

A simple schematic representation of a TSP. Cities are shown by nodes and roads are shown by edges between each pair of nodes. Values are representative of various distances between each pair of cities
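The growth in problem size can be quantified. The short sketch below counts the roads and the distinct closed tours in a fully connected, symmetric TSP with n cities (n at least 3), using the standard formulas n(n - 1)/2 and (n - 1)!/2; the function names are illustrative.

```python
from math import comb, factorial

def road_count(n_cities):
    """Number of roads when every pair of cities is directly connected."""
    return comb(n_cities, 2)

def distinct_tours(n_cities):
    """Number of distinct closed tours (n >= 3), ignoring start city and travel direction."""
    return factorial(n_cities - 1) // 2

for n in (6, 10, 15, 20):
    print(n, "cities:", road_count(n), "roads,", distinct_tours(n), "tours")
```

For the 6-city example this gives 15 roads and 60 distinct tours. The number of tours grows factorially, which is even faster than exponential growth, so exhaustive comparison quickly becomes impractical.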

The main limitation of current computers in solving NP problems is related to their sequential mode of operation. The theory of computer science is based on universal Turing machines (UTMs), which were initially described by Alan Turing [4, 33]. According to Turing’s theory, a task is computable if it can be specified by a sequence of instructions whose execution by the machine results in completion of the task. Modern digital computers are physical embodiments of classical UTMs. They operate at enormous speeds and are capable of executing more than 10^15 instructions per second (the current fastest computer has a Linpack performance of 93 petaFLOPS). However, their sequential approach to data processing is the main limitation for solving combinatorial problems such as the TSP.

Current approaches for solving NP problems mainly focus on finding near-optimal solutions by applying local search and semi-sequential operation algorithms (Fig. 2.23) [37]. The TSP therefore remains in the category of NP problems for which no efficient exact solution method is known.
Fig. 2.23

Schematic image representative of the visualization of a local search method for solving an NP problem
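A local search of the kind referred to above can be sketched with the classic 2-opt heuristic, used here as a stand-in because the text does not specify a particular method. Starting from any tour, it keeps reversing a segment whenever that shortens the route and stops when no reversal helps. The random city coordinates are hypothetical test data.

```python
import math
import random

def tour_length(tour, dist):
    """Total length of the closed route that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt(tour, dist):
    """Reverse tour segments while doing so strictly shortens the tour (2-opt local search)."""
    best = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if tour_length(candidate, dist) < tour_length(best, dist):
                    best, improved = candidate, True
    return best

# Hypothetical instance: 12 cities at random positions in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(12)]
dist = [[math.dist(p, q) for q in points] for p in points]

start = list(range(12))
result = two_opt(start, dist)
print(round(tour_length(start, dist), 3), "->", round(tour_length(result, dist), 3))
```

The result is a local optimum: no single segment reversal improves it, but nothing guarantees that it is the shortest tour overall. That gap between a near-optimal and an exact solution is the point taken up in the following paragraphs.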

However, high accuracy of the computational method (e.g., finding the exact optimal solution of an NP problem) is critical for biological purposes. In fact, comparison of gene expression patterns in genetic diseases with those of normal samples indicates that a near-optimal solution is not sufficient for biological purposes. In some cases, a change in a single amino acid codon, out of the entire human genome, can cause a genetic disorder. For example, sickle cell anemia is a genetic disorder caused by a single-nucleotide mutation in the β-globin gene, which results in a glutamic acid being substituted by valine. Figure 2.24 shows a schematic image of the change in red blood cell morphology in a sickle cell anemia patient.
Fig. 2.24

A schematic image illustrating the morphology of red blood cells in a sickle cell anemia patient (left) compared to the normal condition (right). Sickle cell anemia is a genetic disorder caused by mutation of a single nucleotide in the β-globin gene. This mutation results in a glutamic acid being substituted by valine
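The effect of a single-nucleotide change on the encoded amino acid can be shown with a few codons of the standard genetic code. In the sketch below, the nine-base fragment is an illustrative stand-in and not the actual gene sequence; the point mutation changes the codon GAG (glutamic acid) to GTG (valine), the substitution described for sickle cell anemia.

```python
# Subset of the standard genetic code, limited to the codons used in this example.
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}

def translate(dna):
    """Translate a DNA coding fragment, read in steps of three bases, into amino acids."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]

def point_mutate(dna, position, base):
    """Return a copy of `dna` with the base at `position` (0-indexed) replaced."""
    return dna[:position] + base + dna[position + 1:]

normal = "CCTGAGGAG"                     # illustrative fragment: Pro-Glu-Glu
mutant = point_mutate(normal, 4, "T")    # A -> T turns GAG into GTG

print(normal, translate(normal))
print(mutant, translate(mutant))
```

One changed base out of nine alters one amino acid out of three in this fragment; in the full protein the same single substitution is enough to change the behaviour of the molecule.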

Furthermore, genetic analysis indicates 98% homology between the human and chimpanzee genomes. It can be concluded that even a 2% difference, acting across all developmental stages of a human and a chimpanzee embryo, can produce great structural and functional variation in biological systems (Fig. 2.25).
Fig. 2.25

(a, b) The high accuracy of computation method (e.g., in finding the exact optimal solution in NP problems) is critical in biological systems. (a) Results of genetic analysis of human and chimpanzee indicate more than 98% homology between a human and chimpanzee genome. (b) Less than 2% difference in the entire developmental stages of human and chimpanzee makes a great variation at structural and functional level (Image is adapted from [4])

Comparative analysis of genetic data from different species reveals that a near-optimal solution to NP problems is not sufficient for biological purposes. Biological systems apply special data operation algorithms to find the exact optimal solutions to NP problems in real time.

There is still no model that provides a clear explanation for this high level of accuracy and efficiency in computational methods of biological systems.

On the other hand, recent studies on complexity problems across different industries indicated that real-life NP problems are geometrical problems with multidimensional complexity in their networks [4, 23].

In fact, current computing methods which are working based on linear operation are not sufficient for solving NP problems with geometrical complexity in their networks. Solving NP problems requires innovative approaches in designing algorithms with the nonlinear and parallel operation capacities.

Biological coding systems are promising models for designing such algorithms that satisfy the requirements of multidimensional and parallel operation to overcome the complexity of NP problems. Figure 2.26 shows a schematic image representative of a TSP problem in a complex gene regulatory network. Spatiotemporal patterns of gene expression cause the geometrical complexity of NP problems in biological systems.
Fig. 2.26

A schematic image representative of visualization of data analysis in a biological gene regulatory network applying CIRCOS genome visualization tools (Image courtesy of [16]). The schematic also indicates the combinatorial nature of genomic data processing in biological systems. Genes which are representative of nodes in a TSP problem are determined by different colored barcodes. Combinatorial effect of genes on each other is determined by connecting edges among genes.

Biologically inspired algorithms such as neural network [31] and ant colony [21, 38, 39] have been applied extensively for various computational purposes. At the molecular level, DNA computing has been studied initially by Leonard Adleman [1]. Since then, DNA has been the major focus of several research studies in designing biomimetic computing algorithms [1, 5, 13, 14, 18, 24, 27, 32, 39].

Other biomolecules, including RNA and proteins, have also been applied for computing purposes [8, 11, 18, 20, 34]. Faulhammer and coworkers reported an RNA-based computational solution to a chess problem [7]. Unger and Moult designed a molecular computing system in which proteins were used as NAND logic gates [34]. Nicolau et al. (2016) reported a parallel computational method with molecular-motor-propelled agents in nanofabricated protein-based networks [18].

However, there is still very limited information about biologically inspired algorithms that could simulate the multilayer coding and operation capacities of living cells. For example, gene regulatory networks and protein-protein signal transduction networks have yet to be applied as models for the generation of biomimetic computing algorithms.

Simulation of mechanisms of multilayer coding and data processing in cell signaling pathways can provide powerful tools in designing innovative algorithms for solving NP problems.

To summarize, the significance and capacities of cognitive chemistry coding and data operation system include:
  • The massive information storage capacity through multiple levels of coding.

  • The capacity of massively parallel operation applying a large number of molecular operators (e.g., DNA, RNA, and proteins).

  • The capacity of autonomous data operation through the electrochemical interactions of molecules and atoms with each other, which can be described through the quantum mechanical effects of atoms on each other (e.g., the autonomous formation of the three-dimensional conformation of proteins through amino acid-amino acid interactions during protein folding).

  • The capacity of multilayer operation applying different types of coding molecules and materials at different levels of complexity. Figure 2.27 illustrates schematic images representative of massive operation capacity of cognitive chemistry coding system for solving NP problems at different layers of complexity including gene regulatory networks, protein-protein interaction networks, as well as amino acids interaction networks during protein folding.

Fig. 2.27

A schematic representation of the multilayer capacity of cognitive chemistry system for solving NP problems by simulation of mechanisms of coding and data processing at different layers of DNA, protein, and amino acid coding systems

The autonomous and combinatorial nature of molecular interactions at each coding and operation layer leads to the real-time and massive operation capacity of the system in solving NP problems. Though not discussed here, the exquisite accuracy and efficiency of data operation in living cells through highly complex and interconnected DNA-protein NP networks motivated us to develop a biomimetic computing algorithm. This algorithm leverages the wide computational potential of both nucleic acids and proteins for solving non-biological NP problems.

References

  1. Adleman L (1994) Molecular computation of solutions to combinatorial problems. Science 266(5187):1021–1024
  2. Afaq H, Saini S (2011) On the solutions to the traveling salesman problem using nature-inspired computing techniques. IJCSI 8:326–334
  3. Allard A, Serrano MA, Garcia-Perez G, Boguna M (2017) The geometric nature of weights in real complex networks. Nat Commun 216:1–8
  4. Bradbury J (2005) Molecular insight in to human brain evolution. PLoS Biol 3:0367–0370
  5. Chen YJ, Dalchau N, Srinivas N, Phillips A, Cardelli L, Soloveichik D, Seelig G (2013) Programmable chemical controllers made from DNA. Nat Nanotechnol 8:755–762
  6. De Castro LN (2007) Fundamentals of natural computing: an overview. Phys Life Rev 4:1–36
  7. Faulhammer D, et al. (2000) Molecular computation: RNA solution to chess problems. PNAS, USA 57(4):1385–1389
  8. Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP completeness. W. H. Freeman & Company, New York, U.S.A
  9. Feng C et al (2013) Codon usage patterns in Chinese bayberry (Myrica rubra) based on RNA sequencing data. BMC Genomics 14:732
  10. Hug H, Schuler R (2001) Strategies for the development of a peptide computer. Bioinformatics 17:364–368
  11.
  12. Kim J et al (2008) An extended transcriptional net-work for pluripotency of embryonic stem cells. Cell 132:1049–1061
  13. Lee JY, Shin SY, Park TH, Zhang BT (2004) Solving traveling salesman problems with DNA molecules encoding numerical values. Biosystems 78:39–47
  14. Liu Q et al (2000) DNA computing on surface. Nature 403:175–179
  15. Mark F, Klingmuller U, Decker K (2009) Cellular signal processing, an information to the molecular mechanism of signal transduction. USA, Gaelan Science, Tylor and Francis Group
  16. Naquin D, Aubenton-Carafa Y, Thernes C, Silvain M (2014) Circus: a package for circus display of structural genome variation for paired-end and mate-pair sequencing data. BMC Bioinformatics 14:198
  17. Nelson DL, Cox M (2017) Lehninger principles of biochemistry, 7th edn. W.H. Freeman & Company, New York
  18. Nicolau D Jr et al (2016) Parallel computation with molecular-motor-propelled agents in nanofabricated networks. PNAS 13:2591–2596
  19. Qian L, Winfree E (2011) Scaling up digital circuit computation with DNA strand displacement cascades. Science 332:1196–2011
  20. Redriguez RA, Yu L, Chen LY (2015) Computing protein-protein association affinity with hybrid steered molecular dynamics. J Chem Theory Comput 11:4427–4438
  21. Roy S (2013) Bioinspired ant algorithms, a review. J Modern Education Comput Sci 4:25–35
  22. Rubens J, Selvaggio G, Lu TK (2016) Synthetic mixed signal computation in living cells. Nat Commun 2016(7):11658
  23. Rune J et al (2015) Identifying causal gateways and mediators in complex spatiotemporal systems. Nat Commun 6:8502
  24. Schatz MC, Langmead B, Sazberg S (2010) Cloud computing and DNA data race. Nat Biotechnol 28:691–693
  25. Schrodinger E (1944) What is life. Cambridge University Press, UK
  26. Schrodinger E (1935) Die gegenwartige Situation in der Quantenmechanik. Die Naturwissenschaften 23(48):807–812
  27. Shapiro E, Ran T (2013) DNA computing: molecules reach consensus. Nat Nanotechnol 8:703–705
  28. Singh S, Lodhi EA (2013) Study of variation in TSP using genetic algorithm and its operator comparison. IJSCE 3:2231–2307
  29. Siuti P, Yazbek J, Lu TK (2013) Synthetic circuits integrating logic and memory in living cells. Nat Biotechnol 2013(31):448–452
  30. Tarkov MS (2015) Solving the traveling salesman problem using a recurrent neural network. Am Anal Appl 8:275–283
  31. Tulpan D (2014) Thermodynamic post processing versus GC-content pre-processing for DNA codes satisfying hamming distance and reverse-complement constraints. JEEEA ACM Trans Comput Biol Bioinform 11(2):441–452
  32. Turing A (1936) On computable numbers with an application to Entcheidung problem. Proc Lond Math Soc II Ser 42:230–265
  33. Unger R, Moult J (2006) Towards computing with proteins. Proteins 63(53–64):9
  34. Wang Z, Dongmei H, Meng H, Tang C (2013) A new fast algorithm for solving minimum spanning tree problem based on DNA molecules computation. Biosystems 1114:1–7
  35. Weissman JA, Pan YA (2013) New resource and emerging biological application for multicolor genetic labeling analysis. Genetics 199(2):293–306
  36. Weissman JA et al (2011) Generating and imaging multicolor Brainbow mice, Cold Spring Harbor Laboratory Protoc. https://doi.org/10.1101/pdb.top114
  37. Wong L, Low MYH, Chong CS (2010) Bee colony optimization with local search for traveling salesman problem. Int J Artif Intell Tools 19(3):305–334
  38. Yang J, Dung R, Zhang Y, Cong M, Wang F, Tang G (2015) An improved ant colony optimization (I-ACO) method for the quasi traveling salesman problem (Quasi-TSP). Int J Geogr Inf Sci 29:1534–1551
  39. Zhang M, Cheng M, Tarn JA (2006) Mathematical formulation of DNA computation. IEEE Trans Nanobioscience 5(1):32–40

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Tara Karimi (1)
  1. Tulane Medical Center, Tulane University, New Orleans, USA
