Abstract
Diatoms are central to the global carbon cycle. At the heart of diatom carbon fixation is an overlooked organelle called the pyrenoid, where concentrated CO2 is delivered to densely packed Rubisco. Diatom pyrenoids fix approximately one-fifth of global CO2, but virtually nothing is known about this organelle in diatoms. Using large-scale fluorescence protein tagging and affinity purification-mass spectrometry, we generate a high-confidence, spatially defined protein-protein interaction network for the diatom pyrenoid. Within our pyrenoid interactome are 10 proteins with no known function. Six form a static shell encapsulating the Rubisco matrix of the pyrenoid, with the shell critical for pyrenoid structural integrity and potentially acting as a CO2 diffusion barrier. Although there is no conservation at the sequence level, the diatom pyrenoid shares some structural similarities with prokaryotic carboxysomes. Collectively, our results support the convergent evolution of pyrenoids across the two main plastid lineages and uncover a major structural and functional component of global CO2 fixation.
Introduction
Approximately one-third of global carbon fixation takes place in pyrenoids.1 Pyrenoids are biomolecular condensates of the principal CO2-fixing enzyme Rubisco found in the chloroplasts of algae.2 There are two major chloroplast lineages, the green and red plastids, with algae within these lineages proposed to have convergently evolved pyrenoids.3,4 Nearly all of our knowledge of pyrenoid function and composition comes from the model terrestrial green plastid alga Chlamydomonas reinhardtii.2,5 C. reinhardtii is a powerful model organism,6 but global carbon fixation is primarily driven by oceanic red plastid-containing algae, such as diatoms, where our knowledge is still in its infancy.7–9 Diatoms are responsible for up to 20% of global net primary production in the modern ocean, are estimated to fix ∼70 gigatons CO2 yr-1, and are fundamental for the long-term storage of carbon by driving the flux of organic material from the ocean surface to sediments.10,11
Pyrenoids are found at the heart of algal CO2 concentrating mechanisms (CCMs).2,7 CCMs overcome the slow diffusion of CO2 in water and the catalytic limitations of Rubisco by actively pumping inorganic carbon from the external environment into the cell and releasing it as CO2 in the pyrenoid, where it can be fixed by tightly packaged Rubisco. In C. reinhardtii a disordered linker protein, EPYC1, condenses Rubisco to form the liquid-liquid phase separated matrix of the pyrenoid.1,12,13 A shared Rubisco binding motif found in EPYC1 and numerous other pyrenoid components enables targeting to and structural organization of the pyrenoid, with specialized proteins containing this motif proposed to link the matrix to specialized traversing thylakoids called pyrenoid tubules and to the surrounding starch sheath.14 Inorganic carbon in the form of HCO3- is shuttled into the pyrenoid tubules by bestrophin-like proteins,15 where a carbonic anhydrase converts it to membrane permeable CO2 that can leak out and be fixed by Rubisco in the pyrenoid matrix.5,15,16 The surrounding starch acts as a diffusion barrier to minimize CO2 leakage out of the pyrenoid.17,18 Fluorescent protein tagging,14,19 affinity purification followed by mass spectrometry (APMS)19 and proximity labeling20 have enabled a high-confidence pyrenoid proteome to be determined, with multiple components now functionally characterized. This has enabled C. reinhardtii proto-pyrenoid engineering in plants21 and a parts-list of components that should theoretically enable the engineering of a functional pyrenoid-based CCM to enhance plant photosynthesis.17,22
Diatom pyrenoids have some shared features with the C. reinhardtii pyrenoid, including condensed Rubisco and traversing thylakoids. It is also proposed that they may function analogously, with some, although limited, conservation of inorganic carbon delivery proteins.8 However, the level of conservation of structural proteins is still unclear.7,8 The only structural component identified so far for the diatom pyrenoid is PYCO1, a Rubisco linker protein found in the pyrenoid of the pennate diatom Phaeodactylum tricornutum and suggested to be responsible for phase separating Rubisco to form the pyrenoid matrix.9 However, PYCO1 is not widely conserved, being absent in centric diatoms, and its functional importance is yet to be determined. Outside of PYCO1, most previous diatom CCM research has focused on inorganic carbon uptake. In P. tricornutum several candidates belonging to the SLC4 family of transporters have been proposed for HCO3- uptake at the plasma and chloroplast membranes.23,24 Bestrophin-like proteins are also implicated in the shuttling of HCO3- into the thylakoid lumen, where a θ-type carbonic anhydrase has been identified that is thought to function by releasing CO2 from the thylakoid membranes that traverse the pyrenoid matrix.25,26 Less is known about inorganic carbon uptake in the centric diatom Thalassiosira pseudonana. SLC4 candidates have been implicated in inorganic carbon uptake across both the plasma and thylakoid membranes.8,27 Recently two bestrophin-like proteins were localized to the T. pseudonana pyrenoid, likely in the pyrenoid penetrating thylakoid (PPT),28,29 and θ-type carbonic anhydrase 2 was confirmed to be located in the PPT.8 However, supporting functional data are missing for these proteins. Absent from diatom pyrenoids is a starch sheath encapsulating the pyrenoid. In C. reinhardtii, flux balance modeling of pyrenoid function17 and analysis of starch mutants18 indicate that a diffusion barrier is essential for efficient pyrenoid function to minimize CO2 leakage. How the diatom pyrenoid minimizes CO2 leakage is a substantial outstanding question.
Here we rapidly advance our knowledge of the diatom pyrenoid by developing an iterative fluorescent protein tagging followed by APMS approach in T. pseudonana that belongs to a global biogeochemically important genus. These data enabled us to build a high-confidence diatom pyrenoid interactome, identifying multiple new pyrenoid proteins, many with no previously known functional domains. A family of these proteins form a static shell that encapsulates the Rubisco matrix and are critical for pyrenoid structural integrity and potentially act as a CO2 diffusion barrier. Our findings give novel insight into a global biogeochemically important organelle and provide additional molecular parts for engineering a CCM into crop plants to improve productivity.
Results and discussion
Rubisco co-immunoprecipitation mass spectrometry to identify diatom pyrenoid components
Although diatoms play a central role in global biogeochemical cycles, very little is known about the diatom pyrenoid. T. pseudonana cells have two chloroplasts, each surrounded by four membranes and containing a single centrally positioned lenticular-shaped pyrenoid (Fig. 1A). As a starting point to understand T. pseudonana pyrenoid composition, we performed co-immunoprecipitation coupled with mass spectrometry (coIPMS) with the main pyrenoid component, Rubisco, as a bait protein (Fig. 1B). To capture Rubisco, we used an antibody raised to a conserved 12 amino acid surface-exposed peptide on the Rubisco large subunit (rbcL) (Fig. S1). By comparing two independent coIPMS experiments, consisting of 2 and 3 technical replicates respectively, against non-antibody control experiments, we identified 36 putative pyrenoid components out of a total of 1167 detected proteins identified with two or more spectral counts (Table S1). For these exploratory experiments, we applied a relaxed cut-off based on the fold-change enrichment of bestrophin-like protein 2 (BST2), which we had previously localized to the pyrenoid28 and would expect to show only weak enrichment because it is predicted to be membrane bound (Fig. 1B). Top hits from our rbcL coIPMS experiments were then fed into an iterative tagging, localization, and APMS framework to rapidly build a spatially defined pyrenoid proteome (Fig. 1C).
Development of a high-throughput tagging pipeline in T. pseudonana identifies multiple novel pyrenoid components
To enable rapid cycling through our iterative pipeline, we set out to establish high-throughput fluorescent protein tagging and screening in diatoms. We initially adapted our Golden Gate Modular Cloning-based episomal assembly framework28 to be 96-well compatible and combined it with multi-well diatom transformation via bacterial conjugation. We coupled this with 48-well plate strain maintenance and 96-well plate flow cytometry screening for clonal fluorophore-fusion expressing lines (Fig. 2A). As coIPMS data on a small-scale is inherently noisy with both false positives and false negatives30 we applied our tagging pipeline to confirm if our coIPMS hits were bona fide pyrenoid proteins. Nourseothricin-positive transformants were picked and screened for positive fluorescence using flow cytometry. Due to typical mosaic colony presence31 either multiple rounds of screening or screening of several independent colonies was completed. We found that screening 8–12 colonies would typically yield a stable cell population with >90% of cells mEGFP positive (Fig. S2). Positive lines were subsequently imaged by confocal microscopy (Fig. 2B and Fig. S3). From the initially identified Rubisco-interacting proteins, we saw multiple uncharacterized proteins localizing to distinct sub-regions of the pyrenoid (Fig. 2C; see below for further discussion). However, several candidates were localized to organelles adjacent to the pyrenoid, such as the chloroplast stroma, mitochondria, and endoplasmic reticulum (Fig. S3) indicating that our coIPMS data contains false positives.
To accurately determine sub-pyrenoid localization of components we developed a dual-tagging approach using two spectrally compatible fluorophores. We first developed a pyrenoid matrix marker for co-localization. In green algae, nuclear-encoded rbcS-fluorescent protein fusions have been powerful for understanding sub-pyrenoid spatial organization 19 and for determining the liquid-like properties of the pyrenoid.12 As the rbcS of diatoms is chloroplast encoded and no T. pseudonana chloroplast transformation protocol is available, we wondered if we could target an episomal expressed rbcS-mEGFP to the pyrenoid. Using the N-terminal signal and transit peptide sequences from the nuclear encoded chloroplast localized BST2 protein,32 we successfully targeted rbcS to the pyrenoid, allowing us to clearly define the pyrenoid matrix (Fig. 2C). Second, we tested assembling two target genes with different fluorophores on the same episome. We decided to initially validate the BST2 localization by making an episome with BST2-mEGFP and rbcS-mScarlet-I, a fluorophore we had previously validated using our episome system.28 This gave a clear pyrenoid localization with the rbcS signal extending outside of the BST2 signal supporting a PPT localization of BST2 (Fig. 2D).
Establishing large-scale APMS to build a pyrenoid interactome
To rapidly expand the pyrenoid proteome we developed and optimized an APMS pipeline in T. pseudonana and used our GFP-tagged pyrenoid proteins as bait proteins in this pipeline. Lines expressing GFP-tagged proteins were typically grown at ambient-CO2, where the CCM is fully active.33 In triplicate, GFP-trap nanobodies were used to enrich for target proteins from cell lysate and their interactors determined via LC-MS/MS (Fig. 3A, Table S2). Protein-protein interactions were stringently determined by comparing both CompPASS19,34 and SAINT analysis35 scores (Fig. 3A, Table S3) that use different weighting criteria to identify true interactors from non-specific background using label-free proteomic quantitation data. We set interaction confidence thresholds based on the known interaction of rbcS with rbcL, which resulted in proteins in the top 2.2% for CompPASS and top 1% for SAINT being designated as high confidence interactors.
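The dual-score filtering described above can be illustrated with a minimal sketch. It assumes that an interaction is called high confidence only when it falls in the top 2.2% of CompPASS scores and also in the top 1% of SAINT scores; how the two scores were actually combined is not specified here, so the "both must pass" rule, the function names and the record layout are all hypothetical.

```python
def top_fraction_cutoff(scores, fraction):
    """Return the lowest score that still lies within the top `fraction` of scores."""
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[n_keep - 1]


def high_confidence(interactions, comppass_top=0.022, saint_top=0.01):
    """Keep interactions passing both percentile cut-offs (assumed combination rule).

    `interactions` is a list of dicts with hypothetical keys
    'comppass' and 'saint' holding the two scores for a bait-prey pair.
    """
    cp_cut = top_fraction_cutoff([i["comppass"] for i in interactions], comppass_top)
    st_cut = top_fraction_cutoff([i["saint"] for i in interactions], saint_top)
    return [
        i for i in interactions
        if i["comppass"] >= cp_cut and i["saint"] >= st_cut
    ]
```

In practice the cut-offs would be anchored on a known interaction (here rbcS with rbcL), so the percentile values are outputs of that calibration rather than free parameters.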
From our initial Rubisco coIPMS we GFP tagged 22 proteins and 13 localized to the pyrenoid. These included the Rubisco small subunit rbcS; BST1 and BST2 that are bestrophin-like proteins proposed to be involved in HCO3- uptake into the PPTs;28 θCA2 most likely involved in CO2 release from HCO3- within the PPTs;25,26 cbbX a nuclear encoded red-type Rubisco activase36 that until now has not been localized in algae with red plastids;37 and 8 uncharacterized proteins with no clear function. These newly identified GFP-tagged pyrenoid components were called Diatom Pyrenoid Components 1 and 2 (DPC1 and DPC2) and Shell 1-6. Whilst DPC1 and DPC2 were predominantly in the pyrenoid matrix the initial tagging of Shell1 and Shell4 showed that they may encapsulate the pyrenoid (Fig. 1C; see below). A subset of these pyrenoid localized components were utilized for APMS using our iterative pipeline (Fig. 1C and 3). Subsequently, two additional components, DPC3 and DPC4, that had strong interactions with Shell4 and rbcS but again with no sequence predictable function were localized to the pyrenoid and also fed into our pipeline. Combining the data and using our stringent scoring approach enabled us to expand and build a high-confidence pyrenoid interactome for T. pseudonana built from 11 baits and containing 46 additional protein nodes linked by 57 interaction edges (Fig. 3B). In the network interaction confidence can be further interpreted by CompPASS and SAINT score magnitude (line thickness and color respectively in Fig. 3B) as well as the number of connecting edges with baits.
RbcS and DPC2 appear to be key hub proteins, each linking four nodes that further link to confirmed pyrenoid components. Although shell proteins were identified in our initial rbcL coIPMS data, and detected in our rbcS APMS they did not fall above the stringent thresholds set to be defined as high-confidence interactors with either rbcS or rbcL. This indicates that shell proteins may interact with the Rubisco matrix via an intermediary protein. The shell-like pattern displayed by DPC3 (Fig. 3C) suggests that this protein could be acting as a Rubisco matrix-shell adaptor or potentially an additional shell component. How Rubisco is packaged into the pyrenoid is unknown in T. pseudonana. With the absence of an EPYC1 or PYCO1 homolog to phase separate Rubisco, it was hypothesized that an alternative repeat protein could be fulfilling this role. Unexpectedly none of the pyrenoid proteins identified in our study contain a repeated sequence with the expected frequency of ∼60 amino acids,1 potentially indicating that pyrenoid assembly in T. pseudonana could be based on different biophysical principles to both green algal and pennate diatom pyrenoids.
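As an illustration of how hub proteins can be read from a bait-prey network, the sketch below builds an undirected adjacency map from an edge list and reports nodes at or above a chosen number of connections. Apart from the rbcS-rbcL pair, the edges are made-up placeholders (`prey_A`, `bait_X` and so on) and do not come from the interactome; the function names are also hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (bait, prey) pairs."""
    adjacency = defaultdict(set)
    for bait, prey in edges:
        adjacency[bait].add(prey)
        adjacency[prey].add(bait)
    return adjacency


def hubs(adjacency, min_degree):
    """Return the sorted names of nodes with at least `min_degree` neighbours."""
    return sorted(
        node for node, neighbours in adjacency.items()
        if len(neighbours) >= min_degree
    )


# Placeholder edges for illustration only; not data from the study.
toy_edges = [
    ("rbcS", "rbcL"),
    ("rbcS", "prey_A"),
    ("rbcS", "prey_B"),
    ("rbcS", "prey_C"),
    ("bait_X", "prey_A"),
    ("bait_X", "prey_D"),
]

network = build_network(toy_edges)
print(hubs(network, min_degree=4))
```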
Six proteins encapsulate the pyrenoid matrix
In our initial tagging we were intrigued to see that two proteins with no annotated functional domains appeared to encapsulate the pyrenoid matrix (Fig. 2C: Shell1, Shell4). BLAST analysis identified four additional homologs in the T. pseudonana genome that all contain a predicted beta-sheet domain (Fig. 4A and Fig. S4). We explored the distribution of these proteins across different evolutionary lineages. BLAST analysis against the NCBI database identified homologs in the stramenopiles (including other diatoms), pelagophytes and haptophytes; all of which are photosynthetic algae that contain secondary endosymbiotic red plastids (Fig. 4B). However, Shell proteins are absent in the rhodophytes, the primary endosymbiotic red plastid lineage, suggesting that Shell proteins were not present in the engulfed red alga but may have been present in the genome of the heterotrophic host prior to endosymbiosis or evolved after engulfment. Further supporting a role in pyrenoid function of shell proteins, TEM data from available literature indicates that algae found within the Shell protein containing clades all contain pyrenoids.2
To elucidate the precise sub-pyrenoid localization of the six Shell proteins we co-expressed rbcS-mScarlet-I with GFP-tagged Shell proteins (Fig. 4C and Fig. S5). Analysis of fluorescence intensity along perpendicular transects of the pyrenoid (Fig. 4D and Fig. S5) and max intensity z-stack projections (Fig. 4E and Fig. S5) indicates that all six proteins encapsulate the Rubisco matrix of the pyrenoid. Intriguingly, they can be grouped into two localization patterns. Generalizing the pyrenoid shape to be an elliptic cylinder with curved ends (Fig. S6), Shell1, 2, 3 and 6 encapsulate the pyrenoid matrix around the high-curvature edges, whereas Shell4 and 5 radially encapsulate the pyrenoid matrix (Fig. 4C,D,E and Fig. S5 and S6). Further supporting different pyrenoid surface localizations, AlphaFold2 model comparisons of Shell1 with each of the other Shell proteins indicate that Shell2, 3 and 6 are structurally closer to Shell1 than Shell4 and 5 are (Fig. S7), with small differences in rotational angles potentially enabling different curvature.
In the well-characterized green algal pyrenoid, chloroplast synthesized starch forms a sheath that encapsulates the Rubisco matrix and acts as a CO2 leakage barrier to enhance CCM efficiency.17,18 In diatoms the main carbohydrate storage molecule is chrysolaminaran that is stored in cytosolic vacuoles38 with no clear carbohydrate barrier surrounding the diatom pyrenoid. We hypothesize that the shell proteins, instead of starch, could be forming a diffusion barrier in diatoms to minimize CO2 leakage. This has potential structural analogies to the cyanobacterial carboxysomes where a protein shell encapsulates Rubisco and is proposed to act as a CO2 barrier to minimize leakage.39,40 In addition, carboxysome shell proteins are essential for carboxysome biogenesis and shape.41,42 We set out to further understand the importance of the shell proteins for pyrenoid function and more broadly understand diatom pyrenoid functionality.
The pyrenoid matrix and shell proteins have minimal mobility in the pyrenoid
To understand the dynamics of pyrenoid components, we leveraged our generated tagged lines. If the shell proteins act as a CO2 barrier we would expect them to form a static assembly around the Rubisco matrix. We investigated the mobility of Shell1 and Shell4 by fluorescence recovery after photobleaching (FRAP). Supporting the role of the Shell proteins forming a structural barrier around the pyrenoid matrix both Shell1 and Shell4 showed minimal mixing after photobleaching (Fig. 5A, B).
In C. reinhardtii the pyrenoid matrix has liquid-like properties with Rubisco, EPYC1 and Rubisco activase (RCA1) all showing rapid recovery in FRAP experiments on the timescale of ∼30 s after photobleaching.12 Surprisingly, FRAP experiments on both T. pseudonana rbcS and cbbX showed minimal mixing (Fig. 5C, D). This aligns with recent in vitro and in vivo data where in vitro phase separated Rubisco from the pennate diatom P. tricornutum has slow mixing within droplets with full recovery not seen after 30 minutes and PYCO1 is immobile in vivo.9 This opens a considerable question on how cbbX can sufficiently access inhibited Rubisco to reactivate it. Collectively, this indicates that the diatom pyrenoid has different mesoscale properties to the Chlamydomonas pyrenoid and that once Rubisco is assembled into the pyrenoid there is minimal mixing. This aligns with the cyanobacterial carboxysome where upon Rubisco packing it forms ordered arrays.43,44
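A common way to summarize FRAP traces such as those discussed above is to normalize the post-bleach signal between the first post-bleach value (0) and the pre-bleach mean (1), then read the plateau as an apparent mobile fraction. The sketch below is a generic illustration under those assumptions and is not the analysis used in this study; the function names and `n_plateau` parameter are hypothetical, and it does not correct for photobleaching during acquisition.

```python
def frap_normalize(intensities, n_prebleach):
    """Scale post-bleach intensities so 0 = first post-bleach value, 1 = pre-bleach mean."""
    pre = sum(intensities[:n_prebleach]) / n_prebleach
    post0 = intensities[n_prebleach]
    if pre == post0:
        raise ValueError("no bleach depth: pre-bleach and post-bleach values are equal")
    return [(value - post0) / (pre - post0) for value in intensities[n_prebleach:]]


def mobile_fraction(intensities, n_prebleach, n_plateau=3):
    """Estimate the apparent mobile fraction as the mean of the last normalized points."""
    recovery = frap_normalize(intensities, n_prebleach)
    tail = recovery[-n_plateau:]
    return sum(tail) / len(tail)
```

A static assembly gives a value near 0, whereas a liquid-like condensate that recovers within the imaging window gives a value approaching 1.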
Shell1 and Shell2 are essential for pyrenoid structure
In addition to the Shell proteins potentially providing a barrier to minimize CO2 leakage from the pyrenoid matrix, we hypothesized that they may also be required for pyrenoid structural integrity. To test this, we used our MoClo Golden Gate system to simultaneously GFP tag rbcS and delete Shell1/2. Because Shell1 and Shell2 are abundant shell components and share 93% DNA sequence similarity, sgRNAs were designed that targeted both genes (Fig. S8). Edited lines were grown under high CO2 conditions and selected by GFP fluorescence, and gene editing was confirmed by Sanger sequencing (Fig. S8). Microscopy showed that a Shell1/2 knock-out line with biallelic editing of both Shell1 and Shell2 failed to form a lenticular pyrenoid and instead typically possessed a single spherical pyrenoid per chloroplast (Fig. 5E). This suggests that Shell1/2 are required for the lenticular shape of the pyrenoid and that in their absence the pyrenoid assembles into a sphere. Although speculative, spherical condensation of Rubisco suggests that liquid-liquid phase separation may have a role in pyrenoid matrix assembly. The shell1/2 mutant had significantly reduced growth at atmospheric CO2 that could be fully rescued by supplying elevated CO2 (Fig. 5F). Interestingly, the mutant grew faster than wild-type at high CO2, suggesting that the Shell may limit CO2 diffusion to Rubisco in the pyrenoid when there is an excess of CO2 in the surrounding environment. These data support that the diatom Shell is critical for pyrenoid structural integrity and function and may act as a CO2 diffusion barrier.
A model of the diatom pyrenoid-based CO2 concentrating mechanism
Combining our data with previous experimental data, we propose a structural and functional model for the pyrenoid (Fig. 6). The SLC4 family proteins contribute to sodium dependent HCO3- transport at the plasma membrane.7,23,24 How Ci is transported across the four chloroplast membranes is still unknown; SLC4 transporters are also proposed to play a role here,27 along with the carbonic anhydrase LCIP6345 and the V-type ATPase.46 Ci delivery to the pyrenoid is potentially analogous to C. reinhardtii, via channeling of HCO3- into the thylakoid lumen by BST1 and BST2.28 HCO3- is then dehydrated to CO2 by θCA2,25,26 and the CO2 can diffuse to Rubisco packaged within the pyrenoid. A doublet of thylakoids enters the pyrenoid, where the thylakoids often branch within the pyrenoid matrix, potentially to increase surface area for CO2 delivery. CO2 leakage out of the pyrenoid is then minimized by the proteinaceous shell that encapsulates the pyrenoid, with the shell also critical for maintaining the lenticular shape of the pyrenoid.
Perspective
The development of a high-throughput tagging and APMS pipeline in a model diatom has enabled us to generate a spatial interactome of the diatom pyrenoid, providing novel molecular insight into how diatoms drive the global carbon cycle. We have identified and confirmed via localization 13 new pyrenoid components, of which a large number have no conserved functional domains. Six of these new components constitute a protein Shell that encapsulates the pyrenoid and is found across diverse red plastid secondary endosymbionts. Simultaneous knock-out of Shell1 and Shell2 resulted in large pyrenoid structural changes and poor growth at atmospheric levels of CO2. Moving forward, it will be important to understand the roles of the additional four Shell proteins, especially Shell4 and Shell5, which localize to different surface regions of the pyrenoid than Shell1 and Shell2. Four additional pyrenoid components, DPC1-4, have no clear sequence predictable function. How Rubisco condenses to form the pyrenoid matrix is still unknown, with no EPYC1 or PYCO1 homolog or functional analog identified in our study. This suggests that pyrenoid matrix formation may be different in centric diatoms. The matrix localization and interaction partners of DPC2 and DPC4 suggest that they may have a central role in pyrenoid matrix assembly/function. DPC3 showed a more Shell-like localization pattern and may mediate Shell-Rubisco matrix interactions. DPC2, DPC3 and DPC4 are prime targets for future characterization.
Close to 50% of global carbon fixation is performed by biomolecular condensates of Rubisco. This includes prokaryotic cyanobacterial carboxysomes and eukaryotic algal pyrenoids. Nearly all of our data on pyrenoid structure and function come from the green plastid lineage alga C. reinhardtii, with pyrenoids both between plastid lineages and within plastid lineages proposed to have convergently evolved.2,4 Insights from our data suggest that diatom pyrenoids have similarities to both green plastid pyrenoids and prokaryotic carboxysomes. Similarities between the T. pseudonana and C. reinhardtii pyrenoids include dense Rubisco packaging around specialized thylakoid membranes (PPTs) for CO2 delivery, with inorganic carbon delivery to the PPTs via bestrophin family channels and CO2 release within the acidic lumen driven by constrained localization of a carbonic anhydrase within the PPTs. However, the dynamics and structural aspects of the diatom pyrenoid have more similarities to carboxysomes. The encapsulation by a protein shell layer composed of homologs, some with different sub-shell localization patterns, is analogous to carboxysome shell proteins.39 Additionally, the static nature of Rubisco, cbbX and shell proteins contrasts with the dynamic properties of the C. reinhardtii pyrenoid and aligns more with carboxysomes. Outside of Rubisco, there is no homology or structural similarity between carboxysome and T. pseudonana pyrenoid components. This supports the convergent evolution of pyrenoids and suggests that a broad range of biophysical and structural properties, some previously associated with carboxysomes, can be expected as more pyrenoids are characterized across diverse algae.
A core structural component of pyrenoids is a CO2 leakage barrier, with the starch sheath in C. reinhardtii shown both experimentally and theoretically to be required for efficient CCM function.17,18 As engineering of a pyrenoid in plants progresses, a major future challenge will be CO2 diffusion barrier engineering.17,22 This is thought to require multiple starch synthesis related steps correctly localized to the pyrenoid periphery. The diatom Shell proteins could provide an alternative biotechnology solution to this challenge.
Materials and methods
Strains and Culturing
The background Thalassiosira pseudonana strain for all experiments was wild-type (WT) CCAP1085/12 (Scottish Culture Collection of Algae and Protozoa, equivalent to CCMP1335). WT cells were axenically maintained in artificial sea water (ASW) (32 g L-1, Instant Ocean SS15-10) supplemented with half-strength (F/2) Guillard F solution47,48 at 20°C under continuous illumination of ∼50 μmol photons m-2 s-1. All strains were grown at ambient CO2 except shell1/2 knock-out and Shell1-mEGFP lines, which were maintained under 1% CO2.
Episome Assemblies using Golden Gate Cloning
Level 0, 1, and 2 (L0, L1, and L2) plasmids were assembled by Golden Gate (GG) cloning49 using the custom parts from the diatom MoClo framework.28 The target genes without stop codons were synthesized by Twist Bioscience (Table S4).
T. pseudonana Transformation via Bacterial Conjugation
Episomes were delivered to T. pseudonana via bacterial conjugation as previously described50 with minor modifications. Episome plasmids were transformed into E. coli (TransforMax EPI300) harboring the pTA_Mob51 mobility plasmid (gift from R. Lale) via electroporation (Bio-Rad). Transformed cells were spread onto LB agar plates containing both gentamycin (10 μg mL-1) and kanamycin (25 μg mL-1) for selection overnight at 37°C. Colonies were inoculated for subsequent conjugation. Cultures (150 mL) grown at 37°C to OD600 of 0.3-0.4 were harvested by centrifugation (3,000 ×g, 5 min) and resuspended in 800 μL of SOC media. Liquid grown T. pseudonana WT culture was harvested by centrifugation (3,000 ×g, 5 min) and resuspended at a concentration of 2×10^8 cells mL-1 in ½ASW-F/2. Equal volumes (200 μL) of E. coli and T. pseudonana WT cells were gently mixed by pipetting. Next, the mixture of cells was plated on ½ASW-F/2, 5% LB, 1% agar plates and incubated in the dark for 90 min at 30°C. The plates were transferred to 20°C with continuous illumination (∼50 μmol photons m-2 s-1) and grown overnight. The next day, 500 μL of ½ASW-F/2 medium was added to the plate for scraping and resuspending the cells. Up to 200 μL of resuspended cells were spread onto 1% (w/v) ½ASW-F/2 agar plates with 100 μg mL-1 nourseothricin for selection. Colonies appeared after 6-14 days.
Fluorescence Screening by Flow Cytometry
mEGFP and mScarlet-I expression was analyzed by flow cytometry using either CytoFLEX LX355 or 375 (Beckman Coulter) analyzers. Forward scattered (FSC) and side scattered photons by the 488 nm laser were used to distinguish diatoms from cell culture debris. FSC-height versus FSC-area signal was used to separate single events from sample aggregates. Chlorophyll autofluorescence excited by 561 nm laser and emitted photons detected with 675/25 filter was used to ensure all the diatom cells were fully intact. mEGFP fluorescence excited by the 488 nm laser was detected by an avalanche photodiode detector with 525/40 bandpass filter. All the data analysis was done using CytExpert software (Beckman Coulter).
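The gating logic described above (singlets from FSC-height versus FSC-area, intact cells from chlorophyll autofluorescence, then mEGFP signal) can be sketched as follows. This is an illustrative approximation rather than the CytExpert workflow: the event keys, the height/area ratio window and both thresholds are hypothetical and would in practice be set from wild-type controls on the instrument used.

```python
def fraction_gfp_positive(events, gfp_threshold, chl_threshold,
                          singlet_ratio=(0.8, 1.2)):
    """Return the fraction of gated events whose mEGFP signal exceeds a threshold.

    `events` is a list of dicts with hypothetical keys 'fsc_h', 'fsc_a',
    'chl' and 'gfp'. Events are kept if their FSC-height/FSC-area ratio lies
    inside `singlet_ratio` (singlet gate) and their chlorophyll signal is at
    least `chl_threshold` (intact-cell gate).
    """
    low, high = singlet_ratio
    gated = [
        e for e in events
        if e["fsc_a"] > 0
        and low <= e["fsc_h"] / e["fsc_a"] <= high
        and e["chl"] >= chl_threshold
    ]
    if not gated:
        return 0.0
    positive = sum(1 for e in gated if e["gfp"] > gfp_threshold)
    return positive / len(gated)
```

A line would then be taken forward when the returned fraction exceeds the chosen cut-off, for example 0.9 for a population with more than 90% of cells mEGFP positive.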
Microscopy
All super-resolution imaging was performed using a Zeiss LSM880 confocal microscope in Airyscan mode with a 63x, 1.4 numerical aperture (NA) Plan-Apo oil-immersion objective (Carl Zeiss). The rbcS-mEGFP line was imaged in confocal mode. For imaging, 20 μL of cell suspension was pipetted onto 8 well μ-Slide chambered coverslips (ibidi) and overlaid with 180 μL of 1.5% F/2-low-melting point agarose (Invitrogen). Excitation lasers and emission filters were as follows: mEGFP excitation 488 nm, emission 481-541 nm; mScarlet-I excitation 561 nm, emission 561-633 nm; and chlorophyll excitation 633 nm, emission 642-712 nm. All microscopy images were processed using Fiji.52
Co-immunoprecipitation and affinity purification
For rbcL coIP, 50 mL of WT T. pseudonana cells grown in log phase (2-3×10^6 cells mL-1) were harvested by centrifugation (3,000 ×g, 10 min). The pellets were resuspended in a coIP buffer (20 mM Tris-HCl pH 8.0, 50 mM NaCl, 0.1 mM EDTA, 12.5% glycerol) containing 5 mM DTT and protease inhibitor cocktail tablets (PIs, cOmplete EDTA-free, Roche). Cells were lysed by sonication for 3 min (ON 3 sec, OFF 12 sec). The lysates were centrifuged for 20 min (full speed, 4°C) to separate the supernatant (soluble lysate) from the pellet. 200 μL of Protein A beads (Dynabeads Protein A, Invitrogen) were washed twice in coIP buffer containing PIs. 32 μg of anti-rbcL antibody in 500 μL coIP buffer containing PIs was added to the washed beads and incubated at 4°C for 2 hours. After incubation, beads were washed twice in coIP buffer containing PIs. For blocking, 500 μL of BSA (2 mg mL-1) was added and incubated at 4°C for 1 hour. After incubation, beads were washed twice in coIP buffer containing PIs. Subsequently, the soluble lysates were added to the Protein A beads primed with antibody and incubated at 4°C for 3 hours. After incubation, beads were washed three times with coIP buffer containing PIs and 0.1% digitonin (SigmaAldrich). For elution, 200 μL of 1x SDS loading dye was added and the beads were boiled at 95°C for 5 min. The supernatant was collected without any beads and run on an SDS-PAGE gel for ∼1.5 cm. Gels were sliced for further in-gel digestion for LC-MS/MS (see below).
For AP of mEGFP tagged lines, 50 mL of GFP tagged T. pseudonana lines grown in exponential phase (2-3×10^6 cells mL-1) were harvested by centrifugation (3,000 ×g, 10 min). The pellets were resuspended in an immunoprecipitation (IP) buffer (200 mM D-sorbitol, 50 mM HEPES, 50 mM KOAc, 2 mM Mg(OAc)2, 1 mM CaCl2) containing protease inhibitor cocktail tablets (cOmplete EDTA-free, Roche), 2% digitonin (SigmaAldrich), 1 mM PMSF, 0.5 mM NaF and 0.15 mM Na3VO4. Cells were lysed by sonication for 30 sec (ON 3 sec, OFF 15 sec) twice. The lysates were centrifuged for 20 min (full speed, 4°C) and the supernatant was incubated with mEGFP-Trap Agarose beads (ChromoTek) for 1 hour according to the manufacturer’s instructions. Subsequently, beads were washed twice with IP buffer containing 0.1% digitonin, followed by a final wash without digitonin. All steps were performed at 4°C.
Mass Spectrometry
For rbcL coIP-MS, samples were in-gel digested with 0.2 μg sequencing-grade, modified porcine trypsin (Promega), following reduction with 1.5 mg mL-1 dithioerythritol and alkylation with 9.5 mg mL-1 iodoacetamide. Digests were incubated overnight at 37°C. Peptides were extracted by washing three times with aqueous 50% (v:v) acetonitrile containing 0.1% (v:v) trifluoroacetic acid, before drying in a vacuum concentrator and reconstituting in aqueous 0.1% (v:v) trifluoroacetic acid. Peptides were loaded onto an mClass nanoflow UPLC system (Waters) equipped with a nanoEaze M/Z Symmetry 100 Å C18, 5 µm trap column (180 µm x 20 mm, Waters) and a PepMap, 2 µm, 100 Å, C18 EasyNano nanocapillary column (75 µm x 500 mm, Thermo). The trap wash solvent was aqueous 0.05% (v:v) trifluoroacetic acid and the trapping flow rate was 15 µL min-1. The trap was washed for 5 min before switching flow to the capillary column. Separation used gradient elution with two solvents: solvent A, aqueous 0.1% (v:v) formic acid; solvent B, acetonitrile containing 0.1% (v:v) formic acid. The flow rate for the capillary column was 330 nL min-1 and the column temperature was 40°C. The linear multi-step gradient profile was: 3-10% B over 5 min, 10-35% B over 85 min and 35-99% B over 10 min, followed by a wash with 99% solvent B for 5 min. The column was returned to initial conditions and re-equilibrated for 15 min before subsequent injections. The nanoLC system was interfaced with an Orbitrap Fusion Tribrid mass spectrometer (Thermo) with an EasyNano ionisation source (Thermo). Positive ESI-MS and MS2 spectra were acquired using Xcalibur software (version 4.0, Thermo). Instrument source settings were: ion spray voltage, 1900-2100 V; sweep gas, 0 Arb; ion transfer tube temperature, 275°C. MS1 spectra were acquired in the Orbitrap with: resolution, 120,000; scan range, m/z 375-1,500; AGC target, 4e5; max fill time, 100 ms.
Data-dependent acquisition was performed in top speed mode using a 1 s cycle, selecting the most intense precursors with charge states >1. Easy-IC was used for internal calibration. Dynamic exclusion was applied for 50 s post precursor selection and the minimum threshold for fragmentation was set at 5e3. MS2 spectra were acquired in the linear ion trap with: scan rate, turbo; quadrupole isolation, 1.6 m/z; activation type, HCD; activation energy, 32%; AGC target, 5e3; first mass, 110 m/z; max fill time, 100 ms. Acquisitions were arranged by Xcalibur to inject ions for all available parallelizable time. Tandem mass spectra peak lists were extracted from Thermo .raw files to .mgf format using MSConvert (ProteoWizard 3.0). Mascot Daemon (version 2.6.0, Matrix Science) was used to submit searches to a locally running copy of the Mascot program (Matrix Science Ltd., version 2.7.0). Peak lists were searched against the Thalassiosira pseudonana subsets of UniProt and NCBI with common proteomic contaminants appended. Search criteria specified: Enzyme, trypsin; Max missed cleavages, 2; Fixed modifications, Carbamidomethyl (C); Variable modifications, Oxidation (M); Peptide tolerance, 3 ppm; MS/MS tolerance, 0.5 Da; Instrument, ESI-TRAP. Peptide identifications were collated and filtered using Scaffold (5.2.0, Proteome Software Inc). Peptide identifications were accepted if they could be established at greater than 51.0% probability to achieve an FDR less than 1.0% by the Percolator posterior error probability calculation. Protein identifications were accepted if they could be established at greater than 5.0% probability to achieve an FDR less than 1.0% and contained at least 2 identified peptides.
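The multi-step gradient can be written as a piecewise-linear function of time, which is convenient when checking the solvent composition at a given retention time. The following is a minimal sketch, assuming that the gradient steps run back to back and that time zero is the moment flow is switched to the capillary column; the names GRADIENT and percent_b are illustrative and not part of any instrument software.

```python
# Breakpoints (time in min, % solvent B) taken from the gradient description:
# 3-10% B over 5 min, 10-35% B over 85 min, 35-99% B over 10 min, 99% B for 5 min.
# Assumption: steps are contiguous and t = 0 is the start of the gradient.
GRADIENT = [(0.0, 3.0), (5.0, 10.0), (90.0, 35.0), (100.0, 99.0), (105.0, 99.0)]


def percent_b(t_min, profile=GRADIENT):
    """Return % solvent B at time t_min by linear interpolation between breakpoints."""
    if not profile[0][0] <= t_min <= profile[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(profile, profile[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 2.5, 47.5, 95, 105):
        print(f"{t:6.1f} min: {percent_b(t):5.1f}% B")
```

The re-equilibration period is left out because the source gives its duration but not the shape of the return to initial conditions.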
For AP-MS of mEGFP-tagged lines, samples were on-bead digested using ChromoTek’s recommended procedure for NanoTraps: protein was digested overnight at 37°C with 25 μL 50 mM Tris-HCl pH 7.5, 2 M urea, 1 mM DTT, 5 µg mL-1 Sequencing Grade Modified Trypsin (Promega). Peptides were eluted with 50 mM Tris-HCl pH 7.5, 2 M urea, 5 mM iodoacetamide before loading onto EvoTip Pure tips, which served for desalting and as a disposable trap column for nanoUPLC using an EvoSep One system. A pre-set EvoSep 60 SPD gradient was used with an 8 cm EvoSep C18 Performance column (8 cm x 150 μm x 1.5 μm). The nanoUPLC system was interfaced to a timsTOF HT mass spectrometer (Bruker) with a CaptiveSpray ionisation source (Bruker). Positive PASEF-DDA, ESI-MS and MS2 spectra were acquired using Compass HyStar software (version 6.2, Bruker). Instrument source settings were: capillary voltage, 1,500 V; dry gas, 3 L min-1; dry temperature, 180°C. Spectra were acquired between m/z 100-1,700. The following TIMS settings were applied: 1/K0, 0.6-1.60 V.s cm-2; ramp time, 100 ms; ramp rate, 9.42 Hz. Data-dependent acquisition was performed with 10 PASEF ramps and a total cycle time of 1.17 s. An intensity threshold of 2,500 and a target intensity of 20,000 were set, with active exclusion applied for 0.4 min post precursor selection. Collision energy was interpolated between 20 eV at 0.5 V.s cm-2 and 59 eV at 1.6 V.s cm-2. Peak picking, database searching, significance thresholding and peak area integration were performed using FragPipe (version 19.1). Data were searched against UniProt reference proteome UP000001449, appended with common contaminants and concatenated with reversed sequences for false discovery calculation. Search criteria specified: Enzyme, trypsin; Max missed cleavages, 2; Fixed modifications, Carbamidomethyl (C); Variable modifications, Oxidation (M), Acetylation (Protein N-term); Peptide tolerance, 10 ppm; MS/MS tolerance, 10 ppm; Instrument, IM-MS.
Peptide identifications were filtered using Percolator and ProteinProphet to 1% PSM FDR, protein probabilities >99%, best peptide probability >99% and a minimum of two unique peptides. Peak area quantification was extracted using IonQuant with match-between-runs applied. Feature detection tolerances were set to: MS1 mass <10 ppm; RT <0.4 min; and IM (1/K0) <0.05.
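The role of the reversed (decoy) sequences in false discovery calculation can be illustrated with a basic target-decoy estimate. The sketch below is a simplified illustration only: it is not the Percolator or ProteinProphet algorithm used here, and the function name accept_at_fdr and the (score, is_decoy) input layout are hypothetical. It ranks peptide-spectrum matches (PSMs) by score, estimates the FDR at each rank as decoys divided by targets, and keeps the largest set of target PSMs whose estimated FDR stays at or below the chosen limit.

```python
def accept_at_fdr(psms, max_fdr=0.01):
    """Return target PSMs accepted at an estimated FDR <= max_fdr.

    psms: iterable of (score, is_decoy) tuples, higher score = better match.
    FDR at each rank is estimated as (#decoys) / (#targets) above that rank.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_n = 0  # deepest rank at which the FDR estimate is still acceptable
    for n, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_n = n
    return [p for p in ranked[:best_n] if not p[1]]


if __name__ == "__main__":
    # Toy data: 100 confident targets, then decoys and targets interleaved.
    toy = [(200 - i, False) for i in range(100)]
    toy += [(100, True), (99, False), (98, True)]
    print(len(accept_at_fdr(toy, max_fdr=0.01)), "target PSMs accepted at 1% FDR")
```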
Interactome analysis
Protein abundances, quantified as MS2 spectral counts across all sample triplicates, were analyzed with a CompPASS package in RStudio (https://github.com/dnusinow/cRomppass/blob/master/R/cRomppass.R) and with a control-IP-inclusive variant of SAINT analysis run in Ubuntu using standard parameters.35 The resulting WD and AvgP scores, respectively, were used as measures of interaction strength between bait and prey proteins. Only interactions that fell within both the top 2.2% of WD scores and the top 1% of AvgP scores were retained as high-confidence interactors. Prior to analysis, bait spectral counts were set to zero to minimize data skewing caused by their typically high values and by the inability to distinguish between mEGFP-tagged bait and untagged native protein.
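The two filtering steps described above, zeroing the bait counts and keeping only interactions in the top fraction of both scores, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the scripts used in this study: the dictionary keys (bait, prey, count, wd, avgp) and the function names are hypothetical, the cut-offs are assumed to be computed over all bait-prey pairs, and ties at the cut-off are kept.

```python
def zero_bait_counts(rows):
    """Set the spectral count to zero wherever the prey is the bait itself."""
    return [
        {**r, "count": 0} if r["prey"] == r["bait"] else dict(r)
        for r in rows
    ]


def top_fraction_cutoff(scores, fraction):
    """Lowest score that still falls within the top `fraction` of all scores."""
    ranked = sorted(scores, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[n_keep - 1]


def high_confidence(interactions, wd_fraction=0.022, avgp_fraction=0.01):
    """Keep interactions in both the top 2.2% by WD and the top 1% by AvgP."""
    if not interactions:
        return []
    wd_cut = top_fraction_cutoff([i["wd"] for i in interactions], wd_fraction)
    avgp_cut = top_fraction_cutoff([i["avgp"] for i in interactions], avgp_fraction)
    return [i for i in interactions if i["wd"] >= wd_cut and i["avgp"] >= avgp_cut]


if __name__ == "__main__":
    # Toy scores only; real WD and AvgP values come from CompPASS and SAINT.
    toy = [{"bait": "B", "prey": f"P{k}", "wd": k, "avgp": k / 1000} for k in range(1000)]
    print(len(high_confidence(toy)), "high-confidence interactions")
```

Requiring both thresholds at once means an interaction that scores highly in only one of the two methods is excluded.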
In vivo fluorescence recovery after photobleaching
FRAP experiments were performed using a Zeiss LSM980 confocal microscope with a 63x, 1.4 numerical aperture (NA) Plan-Apo oil-immersion objective (Carl Zeiss). Samples were prepared as for confocal microscopy and overlaid with 200 μL of ibidi anti-evaporation oil. Twenty pre-bleach images were acquired before bleaching (60% 488 nm intensity, 1 cycle). All images were processed in Fiji.52 The output of the Image Stabilizer plugin (4 pyramid levels, 0.99 template update coefficient) applied to the brightfield images was used to stabilize the fluorescence images. The mean gray values were measured for the bleached, unbleached and background ROIs. Background values were subtracted from the bleached and unbleached values, and the bleached signal was then corrected for acquisition photobleaching using the unbleached reference. The average pre-bleach intensity was used for full-scale normalization.
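The normalization steps can be summarized in a short sketch. It assumes that photobleach correction means dividing the background-subtracted bleached signal by the background-subtracted unbleached reference, and that full-scale normalization means scaling the curve so that the mean of the pre-bleach frames equals 1; the function name normalize_frap and its arguments are illustrative.

```python
from statistics import mean


def normalize_frap(bleached, unbleached, background, n_prebleach=20):
    """Return a FRAP curve scaled so that the pre-bleach mean equals 1.

    All three inputs are per-frame mean gray values of the respective ROI.
    """
    if not (len(bleached) == len(unbleached) == len(background)):
        raise ValueError("all ROI traces must have the same number of frames")
    if not 0 < n_prebleach <= len(bleached):
        raise ValueError("n_prebleach must be between 1 and the number of frames")
    # Step 1: subtract background from the bleached and reference ROIs.
    signal = [b - g for b, g in zip(bleached, background)]
    reference = [u - g for u, g in zip(unbleached, background)]
    # Step 2: correct for acquisition photobleaching using the unbleached reference.
    corrected = [s / r for s, r in zip(signal, reference)]
    # Step 3: scale by the average pre-bleach value.
    pre = mean(corrected[:n_prebleach])
    return [c / pre for c in corrected]


if __name__ == "__main__":
    # Toy trace: two pre-bleach frames, then bleach and partial recovery.
    curve = normalize_frap(
        bleached=[110, 110, 60, 85],
        unbleached=[210, 210, 190, 170],
        background=[10, 10, 10, 10],
        n_prebleach=2,
    )
    print([round(v, 3) for v in curve])
```

With this scaling, a fully static structure stays near its post-bleach value over time, whereas a value returning towards 1 indicates recovery.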
Data Visualization
Network visualization of the interactome was done using Cytoscape (version 3.10.0) (https://cytoscape.org/). The phylogenetic tree was built in Geneious Prime (2023.2.1). Adobe Illustrator (2023) was used to generate the figures.
AlphaFold Structure Prediction
The AlphaFold structure of the Shell1 protein, without its signal peptide and chloroplast transit peptide, was predicted using ColabFold v1.5.2-patch: AlphaFold2 using MMseqs2 (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb).
bottom of the tube. The cells were secondary fixed with osmium tetroxide (1%, in buffer pH 7.2, 0.1 M) for 1 hour. After rinsing twice with buffer, again removing the liquid above the cells whilst leaving the cells undisturbed, the cells were dehydrated through an alcohol series (30%, 50%, 70%, 90%) until they were in 100% ethanol. Each rinse lasted 15 min. To ensure thorough dehydration, the 100% ethanol step was repeated 3 times. The ethanol was then replaced with Agar low viscosity resin by placing the cells in increasing concentrations of resin (30% resin:70% ethanol, 50:50, 70:30; each change was left for at least 12 hours) until they were in 100% resin. Again, to ensure complete resin infiltration, the 100% resin step was repeated 3 times, leaving the cells overnight between changes. The Eppendorf tubes were then placed in an embedding oven and the resin was polymerized at 60°C overnight. The resulting blocks were sectioned with a Leica Ultracut E ultramicrotome using a Diatome diamond knife. The sections were then stained using a saturated solution of uranyl acetate (15 min) and Reynolds’ lead citrate (15 min). The sections were examined using a JEOL 1400 TEM.
Blast search
The full amino acid sequences of Shell1-6 were used for Blastp searches with default settings: ‘Standard databases’, ‘Non-redundant protein sequences (nr)’, ‘blastp (protein-protein BLAST)’, ‘Max target sequences = 100’, ‘Word size = 5’, ‘Max matches in a query range = 0’, ‘Matrix = BLOSUM62’, ‘Gap Costs = Existence: 11 Extension: 1’, ‘Compositional adjustments = Conditional compositional score matrix adjustment’ and ‘Expect threshold = 1’. Hits were sorted from highest to lowest alignment length, and a cut-off length of 100 amino acids was employed.
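The post-search filtering of hits can be expressed in a few lines. This is a minimal sketch, assuming that the cut-off retains hits with an alignment length of at least 100 amino acids and that each hit is available as a record with an alignment_length field; the field names and the function filter_hits are hypothetical and do not correspond to a specific BLAST output format.

```python
def filter_hits(hits, min_alignment_length=100):
    """Keep hits at or above the length cut-off, longest alignment first."""
    kept = [h for h in hits if h["alignment_length"] >= min_alignment_length]
    return sorted(kept, key=lambda h: h["alignment_length"], reverse=True)


if __name__ == "__main__":
    # Toy records; accession names are placeholders, not real hits.
    toy_hits = [
        {"accession": "hit_A", "alignment_length": 95},
        {"accession": "hit_B", "alignment_length": 240},
        {"accession": "hit_C", "alignment_length": 100},
    ]
    for hit in filter_hits(toy_hits):
        print(hit["accession"], hit["alignment_length"])
```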
Phylogeny
The phylogenetic tree was built using Geneious Prime (2023.2.1). The 136 selected proteins were aligned with MAFFT using default settings (Algorithm = ‘Auto’, Scoring matrix = ‘BLOSUM62’, Gap open penalty = ‘1.53’, Offset value = ‘0.123’). From the MAFFT alignment, a phylogenetic tree was generated with RAxML: Protein Model = ‘GAMMA BLOSUM62’, Algorithm = ‘Rapid Bootstrapping’, Number of starting trees or bootstrap replicates = ‘1000’, Parsimony random seed = ‘1’. A consensus tree was then generated with Consensus Tree Builder (Create Consensus Tree, Support Threshold % = 0, Topology Threshold % = 0, Burn-in % = 0, Save tree(s) separately). C. reinhardtii BST1-3 sequences (Cre16.g662600 (BST1), Cre16.g663400 (BST2), Cre16.g663450 (BST3)) were added to root the tree.
Data availability
All mass spectrometry and proteomic identification data are referenced in ProteomeXchange (PXD045418) and can be accessed via MassIVE (MSV000092867).
Author contributions
O.N. and L.C.M.M. designed and supervised the study. O.N. carried out the experiments. C.M. and O.N. analyzed the phylogenetic tree. A.D., M.D. and J.B. provided bioinformatics and data analysis support. A.D. oversaw the mass spectrometry and peptide mapping. O.N., C.M., and L.C.M.M. analyzed and interpreted the data. O.N. created the figures. O.N. and L.C.M.M. wrote the manuscript with input from all authors.
Acknowledgements
The authors would like to thank Oliver Mueller-Cajar for kindly sharing the anti-rbcL antibody and members of the Mackinder Lab for fruitful discussions. We thank the Technology Facility in the Department of Biology for access to and support with microscopy and flow cytometry, and Glenn Harper for optimizing the TEM. The York Centre of Excellence in Mass Spectrometry was created thanks to a major capital investment through Science City York, supported by Yorkshire Forward with funds from the Northern Way Initiative, and subsequent support from EPSRC (EP/K039660/1; EP/M028127/1). L.C.M.M. was supported by a UKRI Future Leader Fellowship (MR/T020679/1), EPSRC funding (EP/W024063/1) as part of the York Physics of Pyrenoids Project (YP3), and BBSRC funding (BB/S015337/1 and BB/X004953/1).