Quantification of parameter uncertainty
- 1 Protocol for defining informative priors for ensemble modelling in systems biology
- 2 Overview of the protocol
- 3 Procedure
- 3.1 Step 1: Find and document parameter values
- 3.2 Step 2: Judge plausibility of values and assign weights
- 3.3 Step 3: Calculate the Mode and Spread of the probability distribution
- 3.4 Step 4: Calculate the location and scale parameters ( and ) of the log-normal distribution
- 3.5 Step 5: Account for thermodynamic consistency
- 4 References
Protocol for defining informative priors for ensemble modelling in systems biology
The quantification of parameter uncertainty is achieved by following a protocol  which aims to translate biological knowledge into informative probability distributions for every parameter, without the need for excessive mathematical and computational effort from the user. It assumes that a functioning kinetic model of the relevant biological system is already available and focuses on the procedures specific to the application of an ensemble modelling approach.
Overview of the protocol
The major protocol steps are the following:
- Collection and documentation of parameter information from the literature, databases and experiments.
- Plausibility assessment of each value collected from the literature using a standardized evidence weighting system. This includes assigning weights with regards to the quality, reliability and relevance of each experimental data point.
- Calculation of the Mode (most plausible value) and Spread (multiplicative standard deviation) using the weighted parameter data sets. These values are found using a function which calculates the weighted median and weighted standard deviation of the log transformed and normalised parameter values.
- Calculation of the location () and scale () parameters of the log-normal distribution based on the previously defined Mode and Spread.
- If required, ensure the thermodynamic consistency of the model by creating multivariate distribution for interdependent parameters (e.g., the rates of forward and reverse reactions are connected via the equilibrium constant of the reaction, and must not be sampled independently).
Step 1: Find and document parameter values
- Collect and document all available parameter values from literature, databases (e.g., BRENDA and BioNumbers) and experiments (hypothetical example displayed in Table 1).
- No values: If no values can be retrieved for a reaction occurring in a particular organism, broaden the search to include data from phylogenetically related organisms (the uncertainty should then be adjusted accordingly). When no values can be retrieved for a certain parameter, the broaden the search constraints; e.g., if the Michaelis–Menten constant, , for a particular substrate is unknown, it might be reasonable to retrieve all values from a database like BRENDA in order to determine the range of generally plausible values for this parameter. In cases where no parameter values are available, an indirect estimate is often possible, based on back-of-the-envelope calculations and fundamental biophysical constraints, which will have a larger uncertainty than a direct experimental measurement but will still be informative – situations where absolutely no information about a plausible parameter range is available will be extremely rare.
- One value: If only a single experimental value will be available for a particular parameter, use this value as the Mode (most plausible value) of the probability distribution. Sometimes, an experimental error (e.g., a standard deviation) is reported for the value and can be used to estimate the Spread of plausible values; if this is not available an arbitrary multiplicative error can be assigned, based on a general knowledge of the reliability of a particular type of experimental technology. This information, of course, can be complemented using the previous strategy of retrieving related values from databases to optimise the estimate of the Spread of plausible values and avoid over-confident reliance on a single observation. Proceed to Step 4.
- More than one value: Whenever possible, it is important to collect as many experimental data with some implications for the range of plausible values, including those from a diverse range of methodologies, experimental conditions and biological sources. These will be pruned and weighted in the next step. Proceed to step 2.
- Document the reported experimental uncertainty (if available), in addition to the actual parameter value (Table 2).
- Uncertainty is reported in literature: Uncertainty is often reported as an (additive) standard deviation (SD) or a standard error of the mean (SEM). As the functions employed in the subsequent steps require standard deviations as inputs, reported standard errors should be converted to standard deviations for consistency. This can be achieved by multiplying by the square root of the sample size (n), as per the equation: . In cases where the sample size is not reported, a conservative value of n=3 can be assumed.
- Uncertainty is not reported in literature:
- If the modeller is absolutely certain about a parameter value, the standard deviation can be set to zero. Although it is extremely rare that there is no uncertainty about a parameter at all, this option can be useful when designing the backbone topology of a model, prior to performing ensemble modelling, or when parts of the model are to be kept constant in all simulations.
- If the modeller has no information about the uncertainty of a reported parameter value, an arbitrary standard deviation can be assigned (e.g., the script provided for the next step assumes a default multiplicative standard deviation of 10% in such cases).
Note: Although uncertainty is usually reported in an additive form (x ± SD/SEM), theoretical biophysical considerations indicate that multiplicative errors often provide a better description about our actual uncertainty about experimental data in biology. For example, additive errors can imply that the confidence interval includes negative values. Thus, whenever possible, multiplicative standard deviations (Value */÷ SD) should be recorded. The script provided for the calculation of the probability distributions in Step 2 can handle both additive and multiplicative standard deviations. In order for the script to be able to distinguish between additive and multiplicative uncertainty, an extra column is added in the input file, where the uncertainty type is reported. This can be either 0 (additive standard deviation) or 1 (multiplicative standard deviation).
|Parameter Name||Reported values||Uncertainty||Sources|
|(μM)||0.5||N/A||J. Poultry 2(3),2002|
|4.4||±0.2 (S.D.)||Quack Res. 4(6),1998|
|10||N/A||Antarctica Biol. 1(2),2004|
|20||±1.5 (S.E.M.)(n=10)||Croc. Int. 10(3),2015|
|Parameter Name||Reported values||Standard Deviation||Multiplicative/ Additive|
Step 2: Judge plausibility of values and assign weights
When multiple parameter values are reported in the literature, assign appropriate weights according to their source. For the construction of larger models it can be helpful to use a standardized weighting scheme, to increase the internal consistency and reproducibility of the process. The weights will still be arbitrary, but they will consistently capture the modeller’s experience regarding the relative reliability of individual types of data. For example, standard weights can be assigned by answering a standard set of questions on the origin and properties of each parameter value:
- Is the value derived from an in vitro or in vivo experiment, and is this the same environment as in the experiment that will be simulated?
- Is the value derived from the modelled organism, from a phylogenetically related species, or from a completely unrelated system?
- Does the value concern the same enzyme (or protein / gene / receptor etc. depending on the case), a related class/superclass of enzyme, or a generic enzyme completely unrelated to the one used in the model?
- Are the experimental conditions (pH, temperature) the same, closely similar, or completely different from the ones simulated in the model?
According to the answer to each question, a standard partial weight can be assigned, and these weights can be multiplied to yield a total evidence weight for each experimental value (see Tables 3 and 4).
| Parameter values
| Experimental Conditions
(In vivo, chicken, hexokinase, pH=7-7.5, 41.8±0.8oC)
|In vivo/ in vitro||Organism||Enzyme||pH & Temperature|
|0.5||In vitro||Chicken||Hexokinase||pH=8, 25oC|
|4.4||In vivo||Duck||Glucokinase||pH=7, 25oC|
|10||In vitro||Penguin||Hexokinase||pH=6.8, 40oC|
|20||In vitro||Crocodile||Glucokinase||pH=7, 41oC|
| Parameter values
|Partial Weights||Total Weight|
|In vivo/ in vitro same as model?||Organism (same/ related/ unrelated)||Enzyme||Conditions (pH & temperature)|
|0.5||1 (In vitro)||4 (Same)||4 (Same)||4 (Same)||64|
|4.4||2 (In vivo)||2 (Related)||2 (Related)||1 (Unrelated)||8|
|10||1 (In vitro)||2 (Related)||4 (Same)||2 (Related)||16|
|20||1 (In vitro)||1 (Unrelated)||2 (Related)||4 (Same)||8|
Step 3: Calculate the Mode and Spread of the probability distribution
- Create a final input table which will include the parameter values, errors and weights that have been documented in Steps 1 and 2 (as shown in Table 5).
- Calculate the Mode and Spread of the probability distribution best describing our knowledge of the parameter values (Box 1) using the function CalcModeSpread.
Note: When no uncertainty estimate (e.g., standard deviation or standard error) is provided for an experimental parameter value (i.e., input for standard deviation is “NaN”), a multiplicative standard deviation of 10% (multiplicative factor 1.1) is automatically assigned to the parameter by the CalcModeSpread function, unless the uncertainty is explicitly set to zero. The standard assignment of the 10% multiplicative standard deviation is based on an extensive survey of reported error values in biology, but it remains arbitrary and can and should be modified by the user as appropriate.
|Parameter Name||Reported values||Standard Deviation||Total Weight||Multiplicative/ Additive|
Step 4: Calculate the location and scale parameters ( and ) of the log-normal distribution
The next part of the protocol is the calculation of the location () and scale () parameters of the log-normal distribution describing the parameter uncertainty (Box 2). The function CreateLogNormDist performs this surprisingly non-trivial transformation of the intuitive values of the Mode and Spread into the mathematically relevant properties of the log-normal distribution.
Step 5: Account for thermodynamic consistency
This step concerns parameters that cannot be independently sampled, because they are subject to thermodynamic constraints. In order to address the issue of thermodynamic consistency, joint probability distributions (multivariate distributions) can be employed for the interconnected parameters.
- Reversible Mass Action Reaction: Input the and the of all three initial (log-normal) distributions along with the number of parameter samples that should be generated in the function Multivariate3param. The function then generates the requested number of parameter triplets, by using the strategy detailed in Box 3.
- Enzymatic Reactions: For the 5-parameter Michaelis−Menten kinetics, input the μ and the σ of all five initial distributions as derived in Steps 1–4 of the protocol, along with the number of parameter samples that should be generated the Multivariate5param function. The function then generates the requested number of parameter sets by using the principles described above, by using the strategy described in Box 4. This strategy can be expanded to even more complex cases which would require more than five parameters (e.g., Bi–Bi, Uni–Bi, Ter–Ter mechanisms etc.), by using the internal mass action kinetic reactions to define the correlations of the external Michaelis−Menten parameters.
- Futile Cycles: Finally, it is possible that substrate (futile) cycles may be a part of a biological model. For each closed loop in the metabolic pathways, the total Gibbs free energy change () has to be zero as an additional thermodynamic constraint. This means that the equilibrium constants of the reactions involved in the cycle will need to additionally comply with the constraint . In order to address this aspect of the system, e.g. for a three reaction loop, correlate the parameters for two of the reactions independently as described above, and calculate the of the third reaction from the Gibbs energy equation (). Finally, pass the values calculated for on the internal system of equations (Michaelis-Menten, mass action etc.) of the third reaction and thus create the additional correlations.