1 Introduction

Precise function is an important feature of proteins in the crowded cellular environment. In most cases, this function depends on a protein not sticking to copies of itself or to other micro/macromolecules to form aggregates or structured assemblies. Both aggregation and crystallization of proteins inside the body of an organism (in vivo), whether inside (in cellulo) or outside cells (ex cellulo), may have a detrimental effect on the cells or the organism.13 Accordingly, evolution imposes negative selection pressure on proteins to avoid these two phenomena, as highlighted by the fact that many soluble proteins remain difficult to crystallize in vitro. For most proteins, the native, functional form inside the cell is therefore the solution state.

In vivo protein crystallization was considered anomalous behavior until the late twentieth century. Because of their small size, these crystals were not explored by X-ray diffraction until recently. In vivo-grown protein crystals have been observed in a diverse group of organisms.12, 13, 22, 24 For these proteins, positive natural selection favors in vivo crystallization because of its functional significance. In most cases, high protein concentration drives crystallization inside the living organism. Biological functions of these in vivo protein crystals include food storage, as observed in plant seeds,13 cockroach milk proteins17 and eggs;22, 25, 34 pathogenicity in Bacillus thuringiensis;37 defense, as observed in Paramecium38 and Tetrahymena;36 storage of infectious viruses;7,8,9 and protection from proteolytic cleavage, as in insulin.11 In vivo crystals have also been identified in some pathological conditions in humans, such as histiocytosis,12 hemoglobin C disease13 and cataract.24

Recent developments in X-ray crystallography have shifted the focus from macrocrystals to microcrystals. The term microcrystals refers to small crystals grown either in vivo or by recombinant methods. Microcrystallography refers to the specific set of experimental approaches for handling these crystals for structure determination,4 together with the computational methods used to process the resulting data. In vivo crystallography offers the advantage of studying proteins in their functional niche.2, 21 It also offers the opportunity to study functionally important post-translational modifications, which are often only partially ordered.

In this review, we first list some examples of naturally occurring in vivo crystals and the crystallographic studies carried out on some of them. This is followed by an overview of the technologies developed for microcrystallography and examples of a few proteins whose structures were determined by these emerging methods.

2 Naturally Occurring In Vivo Protein Crystals

2.1 Plant Seed Proteins

Seed germination during the plant life cycle requires nourishment and energy. The protein content of plant seeds ranges from ~10% (in cereals) up to ~40% (in certain legumes).33 These protein pools provide metabolic, structural and nutritional support during germination and seedling development. Most dicotyledonous plant seeds contain 7S and/or 11S globulins and albumins as the major storage proteins in the embryo or cotyledons.33 These proteins are deposited in crystalline form in protein storage vacuoles (PSVs), which are highly specialized compartments. X-ray diffraction studies carried out on dry and wet slices of pumpkin and rock melon seeds showed powder-like diffraction,6 although the in situ structure of the protein could not be resolved. It may be hypothesized that evolution has exerted positive natural selection on plant seed storage proteins for in vivo crystallization for two reasons: (1) to store concentrated forms of the proteins in their functional state even during the dormant stage of the life cycle and (2) to escape the lytic environment of the PSVs. Although these crystals were discovered long ago, no structures have yet been determined from the in vivo crystals themselves; in most near-native structural studies, the proteins were first isolated from the seeds and re-crystallized.6 The lack of in vivo structures may also stem from the fact that the large crystals observed are aggregates of small crystals, so diffraction attempts on them often yield only powder patterns. The field of in vivo crystallography of seed storage proteins therefore remains largely unexplored.

2.2 Trichocysts in Paramecium

Paramecium is known to release crystalline secretory granules as a defense mechanism.13 In response to various external stimuli, such as changes in its chemical environment or the presence of a predator, Paramecium swims in the opposite direction, leaving behind a trail of crystalline, needle-shaped 'trichocysts'. These trichocysts are secretory products whose release is stimulus-dependent. The intracellular crystal is a mixture of small, acidic polypeptides with molecular masses in the range 15–20 kDa. This heterogeneity results from extensive proteolytic processing of the polypeptides. Structural studies on both undischarged and discharged trichocysts have been carried out using electron microscopy.3 With the latest technologies, in vivo structural studies on trichocysts from Paramecium and related species can readily be envisaged. Understanding the processes that result in the controlled release of these polypeptides will shed light on how nature uses crystalline material to carry out function.

2.3 Pro-toxin Proteins in Bacillus thuringiensis

Bacillus thuringiensis (Bt) is a Gram-positive bacterium widely used in agriculture as a biopesticide.32 During the stationary phase of its growth cycle, it forms in vivo parasporal crystals of pro-toxins with insecticidal properties. These proteins (Cry proteins) form a single large crystal that fills the entire mother cell compartment. Figure 1 shows parasporal crystals of Cry3A toxin.31 Different Cry proteins form crystals of different shapes, ranging from cuboidal to rectangular to rhomboidal.32 After ingestion by insect larvae, these crystals dissolve in the alkaline environment of the gut, releasing the toxins and thus facilitating their entry into the insect.13 The ability to crystallize in vivo may have evolved in B. thuringiensis to store a highly concentrated protein in a limited space.31 In this case, crystallization also reduces the susceptibility of the pro-toxins to proteolytic cleavage.32 Notably, Sawaya et al. determined the structure of Cry3A toxin at 2.9 Å resolution using an X-ray free-electron laser (XFEL) source, collecting diffraction directly from Bt cells containing naturally crystallized Cry3A.31 The authors suggested that in vivo diffraction studies can reliably provide atomic-level structural information.

Figure 1:

Adapted from Sawaya et al.31 with permission.

Cry3A toxin crystals used for XFEL diffraction studies. a Phase contrast light micrograph of sporulating rod-shaped Bt cells containing the dark rectangular toxin crystals. b Scanning electron micrograph of isolated Cry3A crystals. c Transmission electron micrograph of thin-sectioned Bt cells showing rectangular crystals so large that the cells take on the shape of the crystals.

2.4 Milk Proteins in Diploptera punctata

Diploptera punctata is the only known viviparous cockroach, giving birth to live young. After fertilization, the ootheca containing the fertilized eggs is deposited in the brood sac of the pregnant female.29 The mother nourishes the developing embryos by secreting milk proteins (Lili-Mips, for lipocalin-like milk proteins) from the brood sac. As the concentration of these proteins increases inside the gut of the embryos, the surplus ingested milk is stored in crystalline form. Figure 2a shows in vivo crystals of Lili-Mips grown inside the gut of the developing embryos. Like the trichocysts of Paramecium, Lili-Mips are a heterogeneous mixture of polypeptides with different amino acid sequences. In addition, they are heterogeneous with respect to the extent of glycosylation and the bound fatty acids.2 The milk protein crystals serve as a complete food for the embryos, supplying proteins, sugars and fats. The energy provided by Lili-Mips is 3–4 times that of most mammalian milks. Because the embryonic gut offers a much larger volume than a cell, relatively large protein crystals (up to 10 × 10 × 30 μm³) can form.2 Despite their heterogeneity, these crystals diffract to atomic resolution. Owing to their large size, the structure of this milk protein was determined by conventional X-ray crystallography, using the anomalous signal from sulfur for phase determination. Figure 2b shows the 1.2 Å X-ray crystal structure of Lili-Mip. This was the first atomic-resolution structure obtained from a naturally occurring, chemically unaltered, heterogeneous protein crystal grown in vivo.

Figure 2:

Reproduced with permission of the International Union of Crystallography.

Lili-Mip crystals from D. punctata embryos. a Polarized microscopic image of protein crystals enclosed inside the embryo midgut and an enlarged view of the extracted crystals (inset). b X-ray crystal structure of Lili-Mip consisting of one C-terminal α-helix (light blue) and nine β-strands (magenta) forming a barrel to coordinate the lipid. The N-glycans (yellow) at the four glycosylation sites are modeled in 2Fo–Fc electron density (white).

2.5 Alcohol Oxidase in Hansenula polymorpha

The peroxisome is an important eukaryotic organelle involved in compartmentalized lipid metabolism and in scavenging reactive oxygen species.40 In vivo crystallization has been observed for peroxisomal enzymes in many organisms. Examples include rat hepatocyte uricase16 and plant catalase,15 and crystals of alcohol oxidase in yeast peroxisomes10, 39 are among the most common. Alcohol oxidase (AO) converts methanol and oxygen to formaldehyde and hydrogen peroxide. Crystalline AO inclusions are found in methanol-utilizing yeasts such as H. polymorpha (Hp) when grown on methanol as the carbon source.18 Figure 3 shows in vivo crystals of HpAO grown inside the peroxisomes of the yeast.

Figure 3:

Reproduced with permission of the International Union of Crystallography.

Crystalline alcohol oxidase (AO) in Hansenula polymorpha. Electron micrograph of Hp cells showing crystals of AO in peroxisomes (P) next to mitochondria (M) and a vacuole (V). The right-hand section of the image shows the crystal at high magnification. Note the single membrane outlining the organelle and enclosing the crystal.

Jakobi et al.18 pursued the in cellulo structure determination of HpAO. They used femtosecond pulses from an X-ray free-electron laser to collect diffraction data directly from yeast cells containing peroxisomal AO crystals. In these serial femtosecond crystallography (SFX) experiments, diffraction to 6 Å resolution was observed from single micrometer-sized AO crystals. The authors thereby established the concept of in cellulo serial crystallography of protein targets imported into yeast peroxisomes, without the need for protein purification or separate crystallization.

2.6 Crystalline Yolk Platelets in Non-mammalian Vertebrates

Yolk platelets are a principal component of the oocyte cytoplasm in non-mammalian vertebrates. They serve as inert nutrient reservoirs used during embryonic and larval development.20 In non-mammalian vertebrates, yolk proteins are derived from the cleavage of vitellogenin (VTG), a lipoglycophosphoprotein produced in the liver in response to estrogen stimulation. During evolution, the mammalian liver lost the ability to make VTG.30 After synthesis, VTG is transported to the oocyte, where it forms yolk globules that are then converted into yolk platelets. In the platelets, VTG is enzymatically cleaved into the two main yolk proteins, lipovitellins and phosvitins.27

Yolk platelets exhibit two distinct structural organizations. In many amphibians, ancient bony fishes and some teleosts, the platelets are crystalline, with an orthorhombic lattice. In other teleosts, reptiles and birds, the yolk platelets are homogeneous, non-crystalline structures. Electron diffraction patterns have yielded unit-cell dimensions for the crystals, which are highly similar across species.

The biological significance of separate crystalline and non-crystalline organizations is not known.27 It has been speculated that the highly conserved crystals in bony fishes and amphibians are a consequence of storing nutritional material in the limited volume of the ovum. Furthermore, most animals with crystalline yolk platelets live in freshwater habitats. It is possible that the platelet crystals provide some essential nutrients that are not available to the embryo in fresh water.22 Given the progress in microcrystallography, three-dimensional structural studies of these crystals can now be undertaken.

2.7 Protein Crystallization in Human Diseases

In cellulo crystals in plasma cells and lymphocytes are known to occur in many pathological conditions in humans, such as plasmacytoma, myeloma and lymphocytic leukemia. The crystalline bodies (CBs) harbored by these cells have diverse shapes, ranging from rods to rhombohedra, cubes, ovals and spheres. Intracellular crystallization of immunoglobulins (Igs) occurs in endocytic and lysosomal compartments, where phagocytosed Igs are trafficked for recycling or degradation.14 Crystal-storing histiocytosis (CSH), a rare condition in which crystalline material accumulates in the cytoplasm of histiocytes, is typically associated with disorders that express monoclonal immunoglobulins, such as multiple myeloma (MM), lymphoplasmacytic lymphoma (LPL) and monoclonal gammopathy of undetermined significance.12

Human γD crystallin is a member of a highly homologous family of mammalian lens proteins called the γ crystallins. Together with the α and β crystallins, these proteins are essential for maintaining lens transparency. Owing to their intrinsic properties, γ crystallins are particularly susceptible to aggregation and phase separation, which lead to lens opacity and, eventually, cataract. Certain mutations in the γD crystallin gene make the protein less soluble than the wild type, resulting in crystalline deposits.24

3 Recent Advances in Microcrystallography

Protein structure determination from in vivo-grown crystals has always been challenging. The cellular volume and the protein concentration within cells limit the size of these crystals. Isolating these micrometer-sized crystals and collecting data from them with conventional X-ray sources has been difficult. The ability to undertake experiments that allow structure determination of these in vivo and in cellulo crystals stems from significant developments in microcrystallography. The development of microfocus beamlines at third-generation synchrotrons and of several X-ray free-electron laser (XFEL) beamlines has enabled crystallographers to determine structures from intrinsically small in vivo-grown protein crystals.

X-ray free-electron laser sources produce femtosecond X-ray pulses with wavelengths in the range 0.1–10 nm. In serial femtosecond crystallography (SFX), thousands of small, hydrated, randomly oriented protein crystals are streamed across the beam, and data are collected using a 'one crystal, one shot' approach.19 The use of XFELs in serial crystallography has enabled structural biologists to probe nano- and micrometer-sized crystals. Figure 4 schematically depicts an SFX experiment using an XFEL source. SFX experiments generate large datasets of snapshots, each capturing the Bragg diffraction of a single, randomly oriented crystal just before its destruction.23 New data management strategies are continuously being developed to handle such large amounts of data.19
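XFEL wavelengths are often quoted as photon energies, and the resolution reachable in a diffraction experiment follows from Bragg's law, nλ = 2d sin θ. The short Python sketch below converts the 0.1–10 nm range into photon energies via E = hc/λ and computes the first-order resolution limit d_min = λ/(2 sin θ) for a given maximum scattering angle 2θ recorded on the detector. The function names and the example geometry are illustrative assumptions, not parameters of any particular beamline.

```python
import math

# CODATA 2018 exact values of the defining constants.
PLANCK_H = 6.62607015e-34            # Planck constant, J*s
SPEED_OF_LIGHT = 299_792_458.0       # speed of light, m/s
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per electronvolt


def photon_energy_kev(wavelength_nm: float) -> float:
    """Photon energy E = h*c/lambda, converted from joules to keV."""
    wavelength_m = wavelength_nm * 1e-9
    energy_joule = PLANCK_H * SPEED_OF_LIGHT / wavelength_m
    return energy_joule / ELEMENTARY_CHARGE / 1e3


def bragg_resolution_angstrom(wavelength_angstrom: float, two_theta_deg: float) -> float:
    """First-order (n = 1) resolution limit d_min = lambda / (2 sin(theta)).

    two_theta_deg is the full scattering angle 2*theta at the detector edge.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return wavelength_angstrom / (2.0 * math.sin(theta))


if __name__ == "__main__":
    for wl_nm in (0.1, 1.0, 10.0):
        print(f"{wl_nm:5.1f} nm -> {photon_energy_kev(wl_nm):7.3f} keV")
    # Hypothetical geometry: 1.0 A (0.1 nm) X-rays, spots recorded out to 2*theta = 20 degrees.
    print(f"d_min = {bragg_resolution_angstrom(1.0, 20.0):.2f} A")
```

Since sin θ cannot exceed 1, d_min can never fall below λ/2, so only the hard X-ray end of the range (about 0.1 nm, roughly 12.4 keV) can resolve atomic detail; at that wavelength, a resolution of about 2.9 Å corresponds to recording reflections out to 2θ ≈ 20°.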

Figure 4:

Schematic representation of an SFX experiment. Femtosecond X-ray pulses are directed at a suspension of flowing crystals. When a pulse hits a crystal, the resulting diffraction pattern is recorded on the detector. Low hit rates, large datasets and a small number of indexable frames are characteristic of SFX, so many thousands of crystals are required for complete data collection.
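Because most pulses miss a crystal and not every hit can be indexed, the number of pulses and crystals an SFX experiment consumes grows quickly. As a back-of-the-envelope sketch, assuming hit rate and indexing rate act as independent fractions, each hit destroys one crystal, so the hits needed equal the target number of indexed patterns divided by the indexing rate, and the pulses needed equal the hits divided by the hit rate. The function names and rates below are hypothetical placeholders, not values from any specific experiment.

```python
import math


def hits_required(target_indexed: int, indexing_rate: float) -> int:
    """Crystal hits needed for `target_indexed` indexed patterns.

    In a 'one crystal, one shot' experiment each hit destroys one crystal,
    so this is also the number of crystals consumed.
    """
    if not 0.0 < indexing_rate <= 1.0:
        raise ValueError("indexing_rate must lie in the interval (0, 1]")
    return math.ceil(target_indexed / indexing_rate)


def pulses_required(target_indexed: int, hit_rate: float, indexing_rate: float) -> int:
    """Approximate number of X-ray pulses needed, given the fraction of pulses that hit a crystal."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in the interval (0, 1]")
    return math.ceil(hits_required(target_indexed, indexing_rate) / hit_rate)


if __name__ == "__main__":
    # Hypothetical values: 10,000 indexed patterns, 10% hit rate, 50% of hits indexable.
    print("crystals hit:", hits_required(10_000, indexing_rate=0.50))
    print("pulses needed:", pulses_required(10_000, hit_rate=0.10, indexing_rate=0.50))
```

With these placeholder values, about 2 × 10^4 crystals and 2 × 10^5 pulses would be needed, which illustrates why SFX datasets are so large.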

Beamlines with a beam cross-section in the range of 1–20 μm are known as microfocus beamlines. The increasing availability of dedicated microfocus beamlines has greatly expanded the use of microcrystals for structure determination.4 This relies on improving the signal-to-noise ratio, which can be achieved in two ways. The first is to increase the flux at the sample position, which is critical in determining whether a beamline is suitable for the smallest crystals.4 The second is to reduce the background noise, for example by replacing the air between the crystal and the detector with an inert gas at low pressure. Combining increased flux with reduced background noise can dramatically increase the signal-to-noise ratio.
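The two routes to a better signal-to-noise ratio can be compared with a simple counting-statistics model. Assuming Poisson statistics, a Bragg spot containing S signal photons on top of B background photons has S/N ≈ S/√(S + B). In this simplified sketch, raising the flux scales S and B together, whereas removing air scatter lowers B only; the photon counts are hypothetical and the model ignores detector noise and absorption.

```python
import math


def spot_snr(signal_photons: float, background_photons: float) -> float:
    """Signal-to-noise ratio of a Bragg spot under Poisson counting statistics.

    The noise is taken as the square root of all counted photons (signal + background).
    """
    return signal_photons / math.sqrt(signal_photons + background_photons)


if __name__ == "__main__":
    # Hypothetical photon counts for a weak spot from a small crystal.
    scenarios = [
        ("baseline", spot_snr(100, 400)),
        ("2x flux", spot_snr(200, 800)),           # signal and background both double
        ("lower background", spot_snr(100, 100)),  # same flux, background cut to a quarter
        ("combined", spot_snr(200, 200)),          # 2x flux with the reduced background
    ]
    for label, snr in scenarios:
        print(f"{label:>16}: S/N = {snr:.2f}")
```

In this model, doubling the flux alone improves S/N only by a factor of √2, because the background grows in proportion; lowering the background helps most when B dominates S, and combining both measures gives the largest gain.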

The combined use of microfocus synchrotron beamlines and serial crystallography approaches allows protein structure determination from a relatively small number of micrometer-sized protein crystals. The technologies behind these methods are nascent and under intensive development.28, 35 Crystallographers have pushed the boundaries of in vivo crystallography by inducing crystallization through heterologous expression, either in the cytoplasm or in specific subcellular compartments (Chavas et al., unpublished data). Examples of induced systems include Trypanosoma cathepsin B,21, 26 and cypovirus polyhedrin.1 Induced systems could make it possible to obtain crystals of proteins for which classical in vitro crystallization has been unsuccessful. Figure 5 compares the workflows, from sample preparation to data collection, for (i) in vitro crystals with in vitro diffraction, (ii) purified in vivo crystals with in vitro diffraction and (iii) in vivo crystals with in cellulo diffraction.5 Once properly understood, in vivo crystallography could minimize the effort invested in optimizing sample purification and in vitro crystallization.

Figure 5:

Reproduced with permission of the International Union of Crystallography.

Comparison of macro- and microcrystallography methods. Top row: the in vitro crystallization pipeline involves protein expression, purification, crystallization optimization and crystal cryoprotection. Middle row: in the in vivo crystallization approach, crystals grow in the cells that express the protein, bypassing protein purification and crystallization; the cells are lysed, and the crystals are purified and cryoprotected for data collection. Bottom row: in the in cellulo crystallization approach, crystals are produced as in the in vivo approach, but the host cells are not lysed; crystal-containing cells are sorted by flow cytometry, stained with trypan blue and mounted on a support. Green, red and gray indicate the least demanding, most demanding and equally demanding tasks, respectively.

4 Conclusion

Although crystalline material in vivo and inside cells (in cellulo) has been known for several decades, structure determination of these proteins has not been easy and, as described above, very few structures have been determined. The development of serial microcrystallography has allowed these crystals to be explored and their structures to be determined at atomic resolution by X-ray crystallography. The number of known structures remains limited, and we hope that continued efforts, by us and by others, to determine more structures of in vivo crystals will reveal the principles (thermodynamic, kinetic and structural) of protein crystallization in vivo. This understanding should ultimately allow us to engineer cell lines that drive the crystallization of proteins that have not been amenable to crystallization in vitro.