Monte Carlo Simulation¶
API¶
- class opgee.mcs.distro.DistroGen(distName, func)¶
Stores information required to generate a Distro instance from an argDict
- classmethod genDistros()¶
Generate a basic set of distributions
- makeRV(argDict)¶
Call the generator function with an argDict to create a frozen RV
- classmethod signature(distName, keywords)¶
Makes a unique signature for a distribution type out of its name and a collection of argument names.
- class opgee.mcs.distro.Empirical(values)¶
Create an empirical distribution and ppf from an array of observations.
- class opgee.mcs.distro.GridRV(min, max, count)¶
Return an object that behaves like an RV in that it returns N values when requested via the ppf (percent point function), though the N values are merely a shuffled sequence of a “gridded” range repeated to produce N values. No other methods of the standard RV class are implemented. This is intended for use in CoreMCS and derivatives only.
- ppf(q)¶
Return ‘n’ values from this object’s list of values, repeating those values as many times as necessary to produce ‘n’ values, where ‘n’ is the length of the percentile list given by ‘q’. The percentile values in ‘q’ are themselves ignored; only the length of ‘q’ is used.
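The gridded behavior described above can be illustrated with a small stand-alone sketch. The class name `GridSketch`, the `seed` argument, the inclusive end points, and the repeat-then-shuffle order are all assumptions made for illustration; this is not the opgee implementation.

```python
import random


class GridSketch:
    """Hypothetical stand-in for a gridded RV (not opgee's GridRV)."""

    def __init__(self, lo, hi, count, seed=None):
        # Assumption: 'count' evenly spaced values from lo to hi, inclusive.
        step = (hi - lo) / (count - 1) if count > 1 else 0.0
        self.values = [lo + i * step for i in range(count)]
        self._rng = random.Random(seed)

    def ppf(self, q):
        # Only len(q) matters; the percentile values themselves are ignored.
        n = len(q)
        reps = -(-n // len(self.values))   # ceiling division
        out = (self.values * reps)[:n]     # repeat the grid to reach n values
        self._rng.shuffle(out)
        return out


grid = GridSketch(0.0, 1.0, 5, seed=1)
sample = grid.ppf([0.1] * 12)   # 12 values drawn from a 5-point grid
```

Because the values come from a fixed grid rather than from the percentiles, such an object is suited to sweeping a parameter range, not to representing a probability distribution.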
- class opgee.mcs.distro.constant(value)¶
Return an object that produces an array holding the given constant value. Useful for forcing a parameter to a given value.
- opgee.mcs.distro.logfactor(factor)¶
Define a lognormal distribution assuming the 2.5% and 97.5% values are 1/factor and factor, respectively.
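As a rough illustration of the arithmetic this description implies: if ln X is normal with mean mu and standard deviation sigma, then placing the 2.5% and 97.5% quantiles at 1/factor and factor gives mu = 0 and sigma = ln(factor) / z, where z is the 97.5% quantile of the standard normal (about 1.96). The helper names below are hypothetical and use only the Python standard library; how opgee constructs the RV internally is not shown here.

```python
import math
from statistics import NormalDist


def logfactor_params(factor):
    """Return (mu, sigma) of the underlying normal (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    return 0.0, math.log(factor) / z


def lognormal_ppf(q, mu, sigma):
    """Quantile of a lognormal whose log is N(mu, sigma)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(q))


mu, sigma = logfactor_params(3.0)
lo = lognormal_ppf(0.025, mu, sigma)   # close to 1/3
hi = lognormal_ppf(0.975, mu, sigma)   # close to 3
```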
- opgee.mcs.distro.lognormalRv(logMean, logStd)¶
Define a lognormal RV by its own mean and stdev
- opgee.mcs.distro.lognormalRvFor95th(lo, hi)¶
Define a lognormal RV by its 95% CI.
- opgee.mcs.distro.lognormalRvForIQR(q1, q3)¶
Define a lognormal RV by its Q1 and Q3 values
- opgee.mcs.distro.lognormalRvForNormal(mu, sigma)¶
Define a lognormal RV by the mean and stdev of the underlying Normal distribution
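The lognormal constructors above differ only in how the parameters of the underlying normal distribution are derived. The following sketch shows the standard conversions for a central interval (95% CI or IQR) and for the lognormal's own mean and standard deviation. The function names are hypothetical, and the code is a stand-alone illustration of the formulas, not a copy of the opgee functions.

```python
import math
from statistics import NormalDist


def params_from_interval(lo, hi, coverage):
    """(mu, sigma) so that the central 'coverage' interval is [lo, hi]."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mu = (math.log(lo) + math.log(hi)) / 2
    sigma = (math.log(hi) - math.log(lo)) / (2 * z)
    return mu, sigma


def params_from_moments(mean, std):
    """(mu, sigma) for a lognormal with the given mean and stdev."""
    sigma2 = math.log(1 + (std / mean) ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)


mu_95, s_95 = params_from_interval(0.5, 8.0, 0.95)    # 95% CI
mu_iqr, s_iqr = params_from_interval(2.0, 4.0, 0.50)  # Q1 and Q3
mu_m, s_m = params_from_moments(10.0, 5.0)            # own mean and stdev
```

In each case the resulting pair describes ln X, so a quantile of X is the exponential of the corresponding normal quantile.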
- opgee.mcs.distro.makeDistroKey(name, dimensions, dropZeros=False)¶
Generate a dictionary key for the variable and a list of dimension indices. This is a normal function because it is used by both the MatrixRV and ParameterSet classes. Inverse of parseDistroKey.
- opgee.mcs.distro.parseDistroKey(key)¶
Gets the name and list of dimensions from a distro key. Inverse of makeDistroKey
- class opgee.mcs.distro.sequence(values)¶
Return an object that produces an array holding the given sequence of constant values. Useful for forcing parameters to given values.
- opgee.mcs.LHS.genRankValues(params, trials, corrMat)¶
Generate a data set of ‘trials’ ranks for ‘params’ parameters that obey the given correlation matrix.
params: integer denoting number of parameters.
trials: integer denoting number of trials.
corrMat: rank correlation matrix for parameters. corrMat[i,j] denotes the rank correlation between parameter i and j.
Output is a matrix with ‘trials’ rows and ‘params’ columns. The i’th column represents the ranks for the i’th parameter.
So an input with params=3 and trials=6 might output:
[[1,4,6], [2,3,5], [4,1,3], [6,5,2], [5,2,1], [3,6,4]]
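The shape of this output can be reproduced with a short sketch for the simplest case, in which the columns are independent. The function name `rank_matrix` and its `seed` argument are hypothetical. Imposing a target rank correlation matrix requires an additional reordering step that is not shown here.

```python
import random


def rank_matrix(params, trials, seed=None):
    """Rows are trials, columns are parameters.

    Each column is an independent random permutation of 1..trials,
    so no correlation structure is imposed in this sketch.
    """
    rng = random.Random(seed)
    columns = [rng.sample(range(1, trials + 1), trials) for _ in range(params)]
    # Transpose so that row t holds the ranks of every parameter for trial t.
    return [list(row) for row in zip(*columns)]


ranks = rank_matrix(params=3, trials=6, seed=42)
```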
- opgee.mcs.LHS.getPercentiles(trials=100)¶
Generate a list of ‘trials’ values, one from each of ‘trials’ equal-size segments from a uniform distribution. These are used with an RV’s ppf (percent point function = inverse cumulative function) to retrieve the values for that RV at the corresponding percentiles.
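A minimal sketch of this kind of stratified draw, assuming one uniform draw within each equal-width segment of [0, 1]. The function name and the `seed` argument are hypothetical, and the opgee function may differ in details such as the random source or return type.

```python
import random


def stratified_percentiles(trials=100, seed=None):
    """One percentile from each of 'trials' equal-width segments of [0, 1)."""
    rng = random.Random(seed)
    return [(i + rng.random()) / trials for i in range(trials)]


pct = stratified_percentiles(10, seed=7)
```

The i-th value always falls in the i-th segment, which guarantees that the whole range of the distribution is covered even with few trials.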
- opgee.mcs.LHS.lhs(paramList, trials, corrMat=None, columns=None, skip=None)¶
Produce an ndarray or DataFrame of ‘trials’ rows of values for the given parameter list, respecting the correlation matrix ‘corrMat’ if one is specified, using Latin Hypercube (stratified) sampling.
The values in the i’th column are drawn from the ppf function of the i’th parameter from paramList, and columns i and j are rank correlated according to corrMat[i,j].
- Parameters:
paramList – (list of rv-like objects representing parameters) Only requirement on parameter objects is that they must implement the ppf function.
trials – (int) number of trials to generate for each parameter.
corrMat – a numpy matrix representing the correlation between the parameters. corrMat[i,j] should give the correlation between the i’th and j’th entries of paramList.
columns – (None or list(str)) Column names to use to return a DataFrame.
skip – (list of params) Parameters to process later because they are dependent on other parameter values (e.g., they’re “linked”). These cannot be correlated.
- Returns:
ndarray or DataFrame with trials rows of values for the paramList.
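To make the sampling scheme concrete, here is a stand-alone sketch of Latin Hypercube sampling for uncorrelated parameters. It relies only on the stated requirement that each parameter object implements ppf. `NormalRV`, `lhs_sketch`, and the `seed` argument are hypothetical names; the sketch returns a list of rows rather than an ndarray or DataFrame, and it omits the corrMat, columns, and skip handling.

```python
import random
from statistics import NormalDist


class NormalRV:
    """Minimal rv-like object: only ppf is implemented (hypothetical)."""

    def __init__(self, mu, sigma):
        self._dist = NormalDist(mu, sigma)

    def ppf(self, q):
        return [self._dist.inv_cdf(p) for p in q]


def lhs_sketch(param_list, trials, seed=None):
    """Return 'trials' rows with one column per parameter (no correlation)."""
    rng = random.Random(seed)
    eps = 1e-12
    columns = []
    for rv in param_list:
        # One percentile per equal-width stratum of (0, 1).
        pct = [(i + rng.random()) / trials for i in range(trials)]
        # Keep strictly inside (0, 1) so that ppf stays finite.
        pct = [min(max(p, eps), 1 - eps) for p in pct]
        # Shuffle so that strata are paired at random across parameters.
        rng.shuffle(pct)
        columns.append(rv.ppf(pct))
    return [list(row) for row in zip(*columns)]


rows = lhs_sketch([NormalRV(0, 1), NormalRV(10, 2)], trials=8, seed=0)
```

Each column contains exactly one value from every stratum of its distribution, which is what distinguishes stratified sampling from plain random sampling.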
- opgee.mcs.LHS.lhsAmend(df, rvList, trials, shuffle=True)¶
Amend the DataFrame with LHS data by adding columns for the given parameters. This allows “linked” parameters to refer to values of other parameters.
- Parameters:
df – (DataFrame) Generated by prior call to LHS or something similar.
trials – (int) the number of trials to generate for each parameter
shuffle – (bool) if True, shuffle the values. Set this to False for linked params.
- Returns:
none
- opgee.mcs.LHS.rankCorrCoef(m)¶
Take a 2-D array of values and produce an array of rank correlation coefficients representing the rank correlation among the columns.
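As an illustration, and assuming that "rank correlation" here means Spearman's coefficient (the Pearson correlation of the column ranks), the computation can be sketched in pure Python as follows. All function names are hypothetical, and nested lists stand in for arrays.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_corr(matrix):
    """Matrix of rank correlations among the columns of 'matrix'."""
    cols = [ranks(list(c)) for c in zip(*matrix)]
    return [[pearson(a, b) for b in cols] for a in cols]


data = [[1.0, 10.0, 3.0], [2.0, 40.0, 2.0], [3.0, 90.0, 1.0]]
corr = rank_corr(data)
```

In this example the first two columns increase together and the third decreases, so the off-diagonal entries are 1 and -1 even though the second column is not linear in the first.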
- class opgee.mcs.simulation.Simulation(sim_dir, analysis_name=None, trials=0, field_names=None, save_to_path=None, meta_data_only=False)¶
Simulation represents the file and directory structure of a Monte Carlo simulation. Each simulation has an associated top-level directory which contains:
metadata.json: currently, only the analysis name is stored here, but more may be added later.
{field_name}/trial_data.csv: values drawn from parameter distributions, with each row representing a single trial, and each column representing the vector of values drawn for a single parameter. This file is created by the “gensim” sub-command.
analysis_XXX.csv: results for the analysis named XXX. Each column represents the results of a single output variable. Each row represents the value of all output variables for one trial of a single field. The field name is thus included in each row, allowing results for all fields in a single analysis to be stored in one file.
trials: a directory holding subdirectories for each trial, allowing each to be run independently (e.g., on a multi-core or cluster computer). The directory structure under trials comprises two levels of 3-digit values which, when concatenated, form the trial number. That is, trial 1,423 would be found in trials/001/423. This allows up to 1 million trials while ensuring that no directory contains more than 1000 items. Limiting directory size improves performance.
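The two-level trial directory layout described above amounts to splitting a zero-padded six-digit trial number into two 3-digit parts. A minimal sketch follows; the function name `trial_dir` and the `root` argument are hypothetical and not part of the opgee API.

```python
def trial_dir(trial_num, root="trials"):
    """Map a trial number to its two-level directory, e.g. trials/001/423."""
    if not 0 <= trial_num < 1_000_000:
        raise ValueError("trial number must be in the range 0..999999")
    upper, lower = divmod(trial_num, 1000)
    return "/".join([root, f"{upper:03d}", f"{lower:03d}"])


path = trial_dir(1423)   # "trials/001/423"
```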
- field_trial_data(field)¶
Read the trial data CSV from the top-level directory and return the DataFrame. The data is cached in the Simulation instance for re-use.
- Parameters:
field – (opgee.Field or str) a field instance or name to read data for
- Returns:
(pd.DataFrame) the values drawn for each field, parameter, and trial.
- generate(corr_mat=None)¶
Generate simulation data for the given Analysis.
- Parameters:
corr_mat – a numpy matrix representing the correlation between each pair of parameters. corr_mat[i,j] gives the desired correlation between the i’th and j’th entries of the parameter list.
- Returns:
none
- load_model(save_to_path=None)¶
Loads the model (reading just the field being run by this Simulation) from XML to avoid carrying state between trials.
- Returns:
none
- classmethod new(sim_dir, model_files, analysis_name, trials, field_names=None, overwrite=False, use_default_model=True)¶
Create the simulation directory and the sandboxes sub-directory.
- Parameters:
sim_dir – (str) the top-level simulation directory
model_files – (list of XML filenames) the XML files to load, in order to be merged
analysis_name – (str) the name of the analysis for which to generate the MCS
trials – (int) the number of trials to generate
field_names – (list of str or None) Field names to limit the Simulation to use. (None => use all Fields defined in the Analysis.)
overwrite – (bool) if True, overwrite directory if it already exists, otherwise refuse to do so.
use_default_model – (bool) whether to use the default model in etc/opgee.xml as the baseline model to merge with.
- Returns:
a new Simulation instance
- classmethod read_metadata(sim_dir)¶
Used by runsim to get the field names without loading the whole simulation
- run(trial_nums, field_names=None)¶
Run the given Monte Carlo trials for analysis. If field_names is None, all fields are run; otherwise, only the indicated fields are run.
- Parameters:
trial_nums – (list of int) trials to run. None implies all trials.
field_names – (list of str) names of fields to run
- Returns:
none
- run_field(field, trial_nums, packet_num=None)¶
Run the Monte Carlo trials trial_nums for field, serially. Save the (full or partial) results for this field to a CSV file in the simulation directory.
- Parameters:
field – (opgee.Field) the Field to evaluate in MCS
trial_nums – (iterator of ints) the trial numbers to run, or None to run all trials.
packet_num – (int) the sequence number of the current packet within field. If not None, used for naming files containing partial results.
- Returns:
(int) the number of successfully run trials
- save_trial_results(field, df, packet_num, failures)¶
Save the results of an MCS “trial packet” (which may be all trials for field or just a subset of trials) to a CSV file in the simulation directory.
- Parameters:
field – (opgee.Field) the Field to evaluate in MCS
df – (pandas.DataFrame) the results to save
packet_num – (int) The sequential number for this packet in field. If not None, this is used to name the result files.
failures – (list of tuples) tuples of form (trial_num, message) for each failed trial.
- Returns:
nothing
- trial_data(field, trial_num)¶
Return the values for all parameters for trial trial_num.
- Parameters:
trial_num – (int) trial number
- Returns:
(pd.Series) the values for all parameters for the given trial.
- opgee.mcs.simulation.combine_results(sim_dir, field_names, delete=False)¶
Combine CSV files containing partial results/failures from an MCS into two files, results.csv and failures.csv.
- Parameters:
sim_dir – (str) the simulation directory
field_names – (list of str) names of fields to combine results for
delete – (bool) whether to delete partial files after combining them
- Returns:
nothing
- opgee.mcs.simulation.read_distributions(pathname=None)¶
Read distributions from the designated CSV file. These are combined with those defined using the @Distribution.register() decorator, used to define distributions with dependencies.
- Parameters:
pathname – (str) the pathname of the CSV file describing parameter distributions
- Returns:
(none)