
MBW:Singular value decomposition of stoichiometric matricesFrom MathBioA summary and review of the paper "Systemic metabolic reactions are obtained by singular value decomposition of genomescale stoichiometric matrices" (2003) by Iman Famili and Bernhard O. Palsson. [1] ContentsProject Categorization
Models are created based on reactions occurring in the metabolism of several microorganisms: Escherichia coli, Helicobactor pylori, Staphylococcus aureus, and Methansoarcina barkeri.
A kinetics based model is used to write the set of reactions and metabolites as a stoichiometric matrix, where the columns represent reactions and the rows represent metabolites. In the S matrix, negative values correspond to metabolites that are consumed, while positive values indicate that the metabolite is produced. Flux Balance Analysis (FBA) is a metabolic modeling technique, where the stoichiometric matrix (S) multiplied by a vector of fluxes (v) gives the change in concentration of each metabolite over time. FBA is often performed with a steady state assumption (that the concentration of metabolites is constant), such that S*v = 0. Because there are typically more reactions than metabolites, linear programming techniques are applied to solve for a flux vector that satisfies a specific objective function.
In this project, the FBA problem is rearranged to find the SVD of the S matrix. The linear algebra technique of singular value decomposition is utilized to provide insight into the dominant modes (or reactions) of the system. This information is useful for metabolic engineers, who can modify the genetic of the organism according to the model output and maximize a desired phenotype. Executive SummaryThis paper uses singular value decomposition of a stoichiometric reaction matrix to find the underlying biochemical reaction modes of the overall network. The stoichiometric matrices are created with the purpose of modelling the metabolic processes of various cells and cellular compartments. Through the use of flux balance analysis, the various steady state fluxes through each of the reactions being modelled can be found. The focus of this paper is to analyse the network of biochemical pathways being modelled and find the dominant metabolites and their corresponding reactions that make up the major modes of the network. Context/Biological PhenomenonMetabolic reaction networks within cells can be reconstructed, modelled, and analyzed using genomic, biochemical, and physiological data. First, genomic sequencing data is used to find all genes within the cell of consideration.These genes are then matched within a database to find their corresponding proteins and therefore functions. Then a network representing all of the reactions in the cell are created by using data from literature as well as sequence databases such as KEGG. A Stoichiometric Matrix is then formed, with each row corresponding to a substrate and each column corresponding to a reaction. For example if we had a very simple metabolic network with two reactions and 3 substrates and in Reaction 1 we had and in Reaction 2 we had , the resulting Stoichiometric matrix would be the following:
(1) Where is the flux rates of the enzymatic reactions and is the metabolic concentrations. This relation is very commonly used to determine the steady state flux vector by setting the change in concentration of the the metabolites with respect to time as zero and then maximizing some objective function over the defined solution space. In the case of the paper being reviewed, the interest lies instead with analyzing the stoichiometric matrix with singular value decomposition. ModelThis paper can be thought of as utilizing two models, with one being a subset of the other. The first one is the overall model of the cell metabolism that has been preconstructed as described in the background. This model is in the form of a stoichiometric matrix describing all reactions within the cell. The second model, is really a subset of the first, and is created by the author. It is the model of the so called "eigenreactions" of the cell set up through singular value decompistions of the stoichiometric matrices. In the paper being reviewed, the Genome metabolic networks being analyzed are the networks for E. coli, H. influenzae, and H. pylori. All these networks and Stoichiometric matrices have been previously constructed (Edwards and Palsson, 1999, 2000; Schilling 2000), and the Stoichiometric matrices are 739x442, 461x367, and 381x332 respectfully. In our own analysis, we used newer models availalbe through the University of California at San Diego's BIGG database. We chose to analyze the organism models available for H.pylori, M.barkeri, S. aureus, and E. coli. As the author did, we took out all biomass reactions as these are artificially created to use as part of the objective function for maximization. We found that this was a very important step as just one biomass reaction accidentally left in change the singular values by a magnitude of 10^4. To demonstrate the how the singular value decompisiton (SVD) of the stoichiometric matrices can be used to find the dominant 'modes' or 'eigenreactions' through the network the author goes through the following mathematical reorginization. Starting with the original equality, (1) And the fact that the SVD of the matrix S is, (2) Equation (2) can then be substituted into equation (1). The orthogonality of matrix U is used to find the inverse as the transpose and move it to the left side to get the following. (4) The constant can then be moved inside to get the final form the author uses to describe the physical meaning of the SVD. (5) (for k = 1, ...,r) Referencing the above equation, the author states " there is a linear combination of metabolites..."
"Being moved by a linear combination of fluxes.."
Where "the extent of this motion is given by the singular values." The "eigenreactions" then take on the following form. AnalysisAs previously discussed, the author in this paper used three stoichiometric matrices that modeled E. coli, H. influenzae, and H. pylori. Since these matrices have since been updated and the H. influenzae model is not currently available, we chose to use four of the most up to date matrices that model H.pylori, M. barkeri, E. coli, and S. aureus, and see if we could come up with similar data. The only edditing we did on the matrices was to manually take out any reactions tagged as relating to biomass for reasons discussed above. To analyze these matrices, they first had to be loaded into Matlab using software provided by UCSD. The software that was used is called the COBRA toolbox ^{[2]} and it is just a set of functions that allow for the loading and manipulation of the models in the format that they are created in by UCSD. To use it, SBMLToolbox^{[3]} and libSBML must also be installed. Once the matrices were loaded into Matlab, only one simple command to preform and SVD was needed on each matrix to determine its corresponding U, V, and S matrices. Each column in U then represented the coefficients of the ith metabolite in each "eigenreaction" and each column in V represented the ith reaction flux participation. The diagonals of the S matrix represented the corresponding singular values. The larger the singular value, the more dominant its corresponding eigenreaction is said to be, where the largest singular value represent the most dominant mode or eigenreaction in the network. The columns of U and V can then be analyzed for each singular value to find the largest metabolite and reaction participants. The largest value in the ith column of U would then correspond to the largest metabolite participant and likewise the largest value in the ith column of V would correspond to the largest reaction participant in that particular eigenreaction. To preform the analysis we found singular values for each of the four matrices corresponding to the four different organisms, and then found the first 5 columns of U and V corresponding to the five largest singular values. The first five singular values together with their corresponding U and V vectors then describes totally the five most dominant modes or eigenreactions of the network. Following the procedure of the authors, we also made a control matrix by taking the E. coli matrix and randomly permuting all of its elements to give us a matrix of integers with the same structure and sparsity. To preform all of this analysis in Matlab we wrote the following mfile. clear all close all clc hpylori=readCbModel('hpylori.xml'); mbarkeri=readCbModel('Mbarkeri.xml'); ecoli=readCbModel('ecoli.xml'); saureus=readCbModel('Saureus.xml'); S_hpylori=full(hpylori.S); S_mbarkeri=full(mbarkeri.S); S_ecoli=full(ecoli.S); S1_saureus=full(saureus.S); S_hpylori(:,76)=[]; S_ecoli(:,150)=[]; S_saureus=S1_saureus(:,1:727); S_saureus(:,626)=[]; S_R=randswap(S_ecoli,'full'); [u_hpylori,s_hpylori,v_hpylori]=svd(S_hpylori); [u_mbarkeri,s_mbarkeri,v_mbarkeri]=svd(S_mbarkeri); [u_saureus,s_saureus,v_saureus]=svd(S_saureus); [u_ecoli,s_ecoli,v_ecoli]=svd(S_ecoli); [u_r,s_r,v_r]=svd(S_R); svalues_r=diag(s_r); svalues_hpylori=diag(s_hpylori); svalues_mbarkeri=diag(s_mbarkeri); svalues_saureus=diag(s_saureus); svalues_ecoli=diag(s_ecoli); n=(1:1000)'; u1_hpylori=u_hpylori(:,1); u2_hpylori=u_hpylori(:,2); u3_hpylori=u_hpylori(:,3); u4_hpylori=u_hpylori(:,4); u1_mbarkeri=u_mbarkeri(:,1); u2_mbarkeri=u_mbarkeri(:,2); u3_mbarkeri=u_mbarkeri(:,3); u4_mbarkeri=u_mbarkeri(:,4); u1_saureus=u_saureus(:,1); u2_saureus=u_saureus(:,2); u3_saureus=u_saureus(:,3); u4_saureus=u_saureus(:,4); u1_ecoli=u_ecoli(:,1); u2_ecoli=u_ecoli(:,2); u3_ecoli=u_ecoli(:,3); u4_ecoli=u_ecoli(:,4); v1_hpylori=v_hpylori(:,1); v2_hpylori=v_hpylori(:,2); v3_hpylori=v_hpylori(:,3); v4_hpylori=v_hpylori(:,4); v1_mbarkeri=v_mbarkeri(:,1); v2_mbarkeri=v_mbarkeri(:,2); v3_mbarkeri=v_mbarkeri(:,3); v4_mbarkeri=v_mbarkeri(:,4); v1_saureus=v_saureus(:,1); v2_saureus=v_saureus(:,2); v3_saureus=v_saureus(:,3); v4_saureus=v_saureus(:,4); v1_ecoli=v_ecoli(:,1); v2_ecoli=v_ecoli(:,2); v3_ecoli=v_ecoli(:,3); v4_ecoli=v_ecoli(:,4); [a,b]=size(u1_hpylori); nu_hpylori=(1:a)'; [a,b]=size(u1_mbarkeri); nu_mbarkeri=(1:a)'; [a,b]=size(u1_saureus); nu_saureus=(1:a)'; [a,b]=size(u1_ecoli); nu_ecoli=(1:a)'; [a,b]=size(v1_hpylori); nv_hpylori=(1:a)'; [a,b]=size(v1_mbarkeri); nv_mbarkeri=(1:a)'; [a,b]=size(v1_saureus); nv_saureus=(1:a)'; [a,b]=size(v1_ecoli); nv_ecoli=(1:a)'; mets_hpylori=hpylori.mets; mets_mbarkeri=mbarkeri.mets; mets_saureus=saureus.mets; mets_ecoli=ecoli.mets; rxns_hpylori=hpylori.rxns; rxns_mbarkeri=mbarkeri.rxns; rxns_saureus=saureus.rxns; rxns_ecoli=ecoli.rxns; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% u1matrix_hpylori=sortrows([u1_hpylori,nu_hpylori],1); for i=1:5 n=0; n=u1matrix_hpylori(i,2); u1_5mets_hpylori(i)=mets_hpylori(n); end u2matrix_hpylori=sortrows([u2_hpylori,nu_hpylori],1); for i=1:5 n=0; n=u2matrix_hpylori(i,2); u2_5mets_hpylori(i)=mets_hpylori(n); end u3matrix_hpylori=sortrows([u3_hpylori,nu_hpylori],1); for i=1:5 n=0; n=u3matrix_hpylori(i,2); u3_5mets_hpylori(i)=mets_hpylori(n); end u4matrix_hpylori=sortrows([u4_hpylori,nu_hpylori],1); for i=1:5 n=0; n=u4matrix_hpylori(i,2); u4_5mets_hpylori(i)=mets_hpylori(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% u1matrix_mbarkeri=sortrows([u1_mbarkeri,nu_mbarkeri],1); for i=1:5 n=0; n=u1matrix_mbarkeri(i,2); u1_5mets_mbarkeri(i)=mets_mbarkeri(n); end u2matrix_mbarkeri=sortrows([u2_mbarkeri,nu_mbarkeri],1); for i=1:5 n=0; n=u2matrix_mbarkeri(i,2); u2_5mets_mbarkeri(i)=mets_mbarkeri(n); end u3matrix_mbarkeri=sortrows([u3_mbarkeri,nu_mbarkeri],1); for i=1:5 n=0; n=u3matrix_mbarkeri(i,2); u3_5mets_mbarkeri(i)=mets_mbarkeri(n); end u4matrix_mbarkeri=sortrows([u4_mbarkeri,nu_mbarkeri],1); for i=1:5 n=0; n=u4matrix_mbarkeri(i,2); u4_5mets_mbarkeri(i)=mets_mbarkeri(n); end %%%%%%%%%%%%%%%% u1matrix_saureus=sortrows([u1_saureus,nu_saureus],1); for i=1:5 n=0; n=u1matrix_saureus(i,2); u1_5mets_saureus(i)=mets_saureus(n); end u2matrix_saureus=sortrows([u2_saureus,nu_saureus],1); for i=1:5 n=0; n=u2matrix_saureus(i,2); u2_5mets_saureus(i)=mets_saureus(n); end u3matrix_saureus=sortrows([u3_saureus,nu_saureus],1); for i=1:5 n=0; n=u3matrix_saureus(i,2); u3_5mets_saureus(i)=mets_saureus(n); end u4matrix_saureus=sortrows([u4_saureus,nu_saureus],1); for i=1:5 n=0; n=u4matrix_saureus(i,2); u4_5mets_saureus(i)=mets_saureus(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% u1matrix_ecoli=sortrows([u1_ecoli,nu_ecoli],1); for i=1:5 n=0; n=u1matrix_ecoli(i,2); u1_5mets_ecoli(i)=mets_ecoli(n); end u2matrix_ecoli=sortrows([u2_ecoli,nu_ecoli],1); for i=1:5 n=0; n=u2matrix_ecoli(i,2); u2_5mets_ecoli(i)=mets_ecoli(n); end u3matrix_ecoli=sortrows([u3_ecoli,nu_ecoli],1); for i=1:5 n=0; n=u3matrix_ecoli(i,2); u3_5mets_ecoli(i)=mets_ecoli(n); end u4matrix_ecoli=sortrows([u4_ecoli,nu_ecoli],1); for i=1:5 n=0; n=u4matrix_ecoli(i,2); u4_5mets_ecoli(i)=mets_ecoli(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% v1matrix_hpylori=sortrows([v1_hpylori,nv_hpylori],1); for i=1:5 n=0; n=v1matrix_hpylori(i,2); v1_5rxns_hpylori(i)=rxns_hpylori(n); end v2matrix_hpylori=sortrows([v2_hpylori,nv_hpylori],1); for i=1:5 n=0; n=v2matrix_hpylori(i,2); v2_5rxns_hpylori(i)=rxns_hpylori(n); end v3matrix_hpylori=sortrows([v3_hpylori,nv_hpylori],1); for i=1:5 n=0; n=v3matrix_hpylori(i,2); v3_5rxns_hpylori(i)=rxns_hpylori(n); end v4matrix_hpylori=sortrows([v4_hpylori,nv_hpylori],1); for i=1:5 n=0; n=v4matrix_hpylori(i,2); v4_5rxns_hpylori(i)=rxns_hpylori(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% v1matrix_mbarkeri=sortrows([v1_mbarkeri,nv_mbarkeri],1); for i=1:5 n=0; n=v1matrix_mbarkeri(i,2); v1_5rxns_mbarkeri(i)=rxns_mbarkeri(n); end v2matrix_mbarkeri=sortrows([v2_mbarkeri,nv_mbarkeri],1); for i=1:5 n=0; n=v2matrix_mbarkeri(i,2); v2_5rxns_mbarkeri(i)=rxns_mbarkeri(n); end v3matrix_mbarkeri=sortrows([v3_mbarkeri,nv_mbarkeri],1); for i=1:5 n=0; n=v3matrix_mbarkeri(i,2); v3_5rxns_mbarkeri(i)=rxns_mbarkeri(n); end v4matrix_mbarkeri=sortrows([v4_mbarkeri,nv_mbarkeri],1); for i=1:5 n=0; n=v4matrix_mbarkeri(i,2); v4_5rxns_mbarkeri(i)=rxns_mbarkeri(n); end %%%%%%%%%%%%%%%% v1matrix_saureus=sortrows([v1_saureus,nv_saureus],1); for i=1:5 n=0; n=v1matrix_saureus(i,2); v1_5rxns_saureus(i)=rxns_saureus(n); end v2matrix_saureus=sortrows([v2_saureus,nv_saureus],1); for i=1:5 n=0; n=v2matrix_saureus(i,2); v2_5rxns_saureus(i)=rxns_saureus(n); end v3matrix_saureus=sortrows([v3_saureus,nv_saureus],1); for i=1:5 n=0; n=v3matrix_saureus(i,2); v3_5rxns_saureus(i)=rxns_saureus(n); end v4matrix_saureus=sortrows([v4_saureus,nv_saureus],1); for i=1:5 n=0; n=v4matrix_saureus(i,2); v4_5rxns_saureus(i)=rxns_saureus(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%% v1matrix_ecoli=sortrows([v1_ecoli,nv_ecoli],1); for i=1:5 n=0; n=v1matrix_ecoli(i,2); v1_5rxns_ecoli(i)=rxns_ecoli(n); end v2matrix_ecoli=sortrows([v2_ecoli,nv_ecoli],1); for i=1:5 n=0; n=v2matrix_ecoli(i,2); v2_5rxns_ecoli(i)=rxns_ecoli(n); end v3matrix_ecoli=sortrows([v3_ecoli,nv_ecoli],1); for i=1:5 n=0; n=v3matrix_ecoli(i,2); v3_5rxns_ecoli(i)=rxns_ecoli(n); end v4matrix_ecoli=sortrows([v4_ecoli,nv_ecoli],1); for i=1:5 n=0; n=v4matrix_ecoli(i,2); v4_5rxns_ecoli(i)=rxns_ecoli(n); end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% svaluesper_saureus=size(svalues_saureus); [a,b]=size(svalues_saureus); for i=1:a total=sum(svalues_saureus); svaluesper_saureus(i)=(svalues_saureus(i)/total); end svaluesper_mbarkeri=size(svalues_mbarkeri); [a,b]=size(svalues_mbarkeri); for i=1:a total=sum(svalues_mbarkeri); svaluesper_mbarkeri(i)=(svalues_mbarkeri(i)/total); end svaluesper_hpylori=size(svalues_hpylori); [a,b]=size(svalues_hpylori); for i=1:a total=sum(svalues_hpylori); svaluesper_hpylori(i)=(svalues_hpylori(i)/total); end svaluesper_ecoli=size(svalues_ecoli); [a,b]=size(svalues_ecoli); for i=1:a total=sum(svalues_ecoli); svaluesper_ecoli(i)=(svalues_ecoli(i)/total); end svaluesper_r=size(svalues_r); [a,b]=size(svalues_r); for i=1:a total=sum(svalues_r); svaluesper_r(i)=(svalues_r(i)/total); end ResultsAfter running the code above we were able to manually generate some figures to help interpret the data. The first two figures we made shows the distribution of the singular values in descending order for each of the matrices analyzed. The first figure on the left is an overall view and the second figure on the right is an up close view showing the behaviour of the first set of singular values. It can be seen in the blown up figure that all of the organism have a similar distribution in this are except for the random matrix. This confirms the results found in the paper that show the stoichiometric networks describing the organisms have a definite structure to them, with the first couple of singular values being distributed with a higher value than a random matrix would give. This figure also shows that after these first few singular values the random matrix then has higher values that the rest or the matrices, again showing there is some definite structure. It is important to note that although we came up with the same behavior in regards to the relation between the organism matrices and the random matrices, we did not come up with a distribution of the same shape. Reasons for this most likely lie in the use of newer and different matrices. The next figure we then generated was a plot of the singular values as a percentage of the total. This allowed to show that the first group of singular values make up a larger percentage of the total than a random matrix. As before they then also cross a point where they make up a less percentage of the total than a random matrix. The figure also shows how all of the organism matrices follow a similar pattern. We then used the data generated from our code to show the five most dominant metabolites that made up each of the four most dominant modes in each organism. What the figures show intuitively make sense. It would be expected that metabolites like ATP, NADH, and H would be dominant players in the metabolic activities of the cell. Our results were very similar but again not exactly the same probably due to differences in the matrices used. The length of each bar in each graph corresponds to the stoichiometric coefficient of a particular metabolite in a particular eigenreaction. Underneath each bar, the particular metabolite the bar corresponds to is listed. To the left of each bar graph, its corresponding eigenreaction (1,...4) and organism is noted.
Discussion/ConclusionsThe results of the author as well as our group show that this approach is both reproducible and meaningful. The fact that the approach should intuitively give the dominant metabolites as corresponding to the dominant reactions and does, shows the idea behind the analysis is sound. A next step to make this analysis more practical for biologist would be to see if this approach correctly predicts metabolites of "medium" importance. Every biologist and biochemist already knows metabolites like ATP and NADH are critically important to the function of the cell. The real benefit of this analysis would be if it could correctly predict a molecule that is critically important and linked with many pathways, but yet undiscovered or unknown. This could have applications to things like cancer research where one gene deletion removing one metabolite from the cell is known to be the difference between a normal cell and a malignant cell. Related ArticleKarnaukhov et al. cite Famili and Palsson in their 2007 Biophysical Journal article “Numerical Matrices Method for Nonlinear System Identification and Description of Dynamics of Biochemical Reaction Networks”^{[4]}. In this article, the authors present a set of matrixbased tools, Numerical Matrix Methods (NMM) for understanding and defining biochemical networks. The technique uses a least squares solution to calculate rate constants and reaction stoichiometry. Species concentration vs. time data obtained experimentally is compared to a linearly quadratic kinetic expression that mathematically predicts the production or consumption rate of each metabolite. Within the NMM, three sets of dynamic equations are presented, each recommended for a different level of knowledge about a biologic system.
The authors demonstrate the validity of the method by calculating the rate constants associated with the assembly of a hypothetical ribonucleoprotein (RNAprotein) complex. External Links
