Monte Carlo Simulation

API

class opgee.mcs.distro.DistroGen(distName, func)

Stores information required to generate a Distro instance from an argDict

classmethod genDistros(): Generate a basic set of distributions.

makeRV(argDict): Call the generator function with an argDict to create a frozen RV.

classmethod signature(distName, keywords): Makes a unique signature for a distribution type out of its name and a collection of argument names.

class opgee.mcs.distro.Empirical(values): Create an empirical distribution and ppf from an array of observations.

class opgee.mcs.distro.GridRV(min, max, count)

Return an object that behaves like an RV in that it returns N values when requested via the ppf (percent point function), though the N values are merely a shuffled sequence of a “gridded” range repeated to produce N values. No other methods of the standard RV class are implemented. This is intended for use in CoreMCS and derivatives only.

ppf(q): Return ‘n’ values from this object’s list of values, repeating those values as many times as necessary to produce ‘n’ values, where ‘n’ is the length of the percentile list given by ‘q’. (The percentile values themselves are ignored; only the length of ‘q’ is used.)
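The behaviour described for GridRV.ppf can be illustrated with a small stand-alone sketch. This is not the opgee implementation: the class name GridSketch, the seed argument, the choice of an evenly spaced grid that includes both endpoints, and shuffling after repetition are all assumptions made for illustration.

```python
import random


class GridSketch:
    """Hypothetical stand-in for a grid-based RV (not opgee's GridRV)."""

    def __init__(self, lo, hi, count, seed=None):
        # Assumption: 'count' evenly spaced values from lo to hi, inclusive.
        step = (hi - lo) / (count - 1) if count > 1 else 0.0
        self.values = [lo + i * step for i in range(count)]
        self.rng = random.Random(seed)

    def ppf(self, q):
        # Only the length of q is used; the percentiles themselves are ignored.
        n = len(q)
        reps = -(-n // len(self.values))  # ceiling division
        out = (self.values * reps)[:n]    # repeat the grid, then trim to n
        self.rng.shuffle(out)             # assumption: shuffle after repeating
        return out


grid = GridSketch(0, 10, 5, seed=1)
sample = grid.ppf([0.5] * 12)  # 12 values drawn from the 5 grid points
```

Because the percentiles are ignored, an object like this can be passed anywhere a ppf-only RV is accepted, but it cannot honour a requested rank correlation.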

class opgee.mcs.distro.constant(value): Return an object that produces an array holding the given constant value. Useful for forcing a parameter to a given value.

opgee.mcs.distro.logfactor(factor): Define a lognormal distribution assuming the 2.5% and 97.5% values are 1/factor and factor, respectively.
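The parameters implied by the logfactor description can be worked out by hand. The sketch below uses only the standard library and is not opgee's code; the helper names logfactor_params and lognormal_ppf are hypothetical. Since the 2.5% and 97.5% points are 1/factor and factor, the underlying normal is centred on ln(1) = 0 and its standard deviation is ln(factor) divided by the 97.5% standard-normal quantile (about 1.96).

```python
import math
from statistics import NormalDist


def logfactor_params(factor):
    """Return (mu, sigma) of the underlying normal; requires factor > 1."""
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    mu = 0.0                          # midpoint of ln(1/factor) and ln(factor)
    sigma = math.log(factor) / z
    return mu, sigma


def lognormal_ppf(p, mu, sigma):
    """Percent point function of a lognormal with underlying N(mu, sigma)."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))


mu, sigma = logfactor_params(3.0)
low = lognormal_ppf(0.025, mu, sigma)    # close to 1/3
high = lognormal_ppf(0.975, mu, sigma)   # close to 3
```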

opgee.mcs.distro.lognormalRv(logMean, logStd): Define a lognormal RV by its own mean and stdev.

opgee.mcs.distro.lognormalRvFor95th(lo, hi): Define a lognormal RV by its 95% CI.

opgee.mcs.distro.lognormalRvForIQR(q1, q3): Define a lognormal RV by its Q1 and Q3 values.
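Defining a lognormal by a 95% CI or by its quartiles is the same calculation with a different coverage level. The following sketch assumes that the 95% CI means the 2.5th and 97.5th percentiles and that Q1 and Q3 are the 25th and 75th percentiles. The helper lognormal_from_quantiles is hypothetical and is not part of opgee.

```python
import math
from statistics import NormalDist


def lognormal_from_quantiles(lo, hi, coverage):
    """Return (mu, sigma) of the underlying normal so that lo and hi
    bracket the central 'coverage' fraction of the lognormal."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mu = (math.log(lo) + math.log(hi)) / 2          # midpoint in log space
    sigma = (math.log(hi) - math.log(lo)) / (2 * z)
    return mu, sigma


mu95, sigma95 = lognormal_from_quantiles(2.0, 8.0, 0.95)    # 95% CI case
mu_iqr, sigma_iqr = lognormal_from_quantiles(1.0, 4.0, 0.5)  # IQR case
```

Both bounds must be positive and lo must be less than hi; otherwise the logarithm or the resulting sigma is invalid.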

opgee.mcs.distro.lognormalRvForNormal(mu, sigma): Define a lognormal RV by the mean and stdev of the underlying Normal distribution.

opgee.mcs.distro.makeDistroKey(name, dimensions, dropZeros=False): Generate a dictionary key for the variable and a list of dimension indices. This is a plain module-level function rather than a method because it is used by both the MatrixRV and ParameterSet classes. Inverse of parseDistroKey.

opgee.mcs.distro.parseDistroKey(key): Gets the name and list of dimensions from a distro key. Inverse of makeDistroKey.

class opgee.mcs.distro.sequence(values): Return an object that produces an array holding the given sequence of constant values. Useful for forcing parameters to given values.

opgee.mcs.LHS.genRankValues(params, trials, corrMat)

Generate a data set of ‘trials’ ranks for ‘params’ parameters that obey the given correlation matrix.

params: integer denoting number of parameters.

trials: integer denoting number of trials.

corrMat: rank correlation matrix for parameters. corrMat[i,j] denotes the rank correlation between parameter i and j.

Output is a matrix with ‘trials’ rows and ‘params’ columns. The i’th column represents the ranks for the i’th parameter.

So an input with params=3 and trials=6 might output:

[[1,4,6], [2,3,5], [4,1,3], [6,5,2], [5,2,1], [3,6,4]]
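The key property of such an output is that every column is a permutation of the ranks 1 through ‘trials’. The short check below is a sketch for illustration only: the helper is_rank_matrix is hypothetical, and it does not verify that the columns obey a given correlation matrix.

```python
# The example output from the text: 6 trials (rows) by 3 parameters (columns).
example = [
    [1, 4, 6],
    [2, 3, 5],
    [4, 1, 3],
    [6, 5, 2],
    [5, 2, 1],
    [3, 6, 4],
]


def is_rank_matrix(rows):
    """True if every column is a permutation of 1..len(rows)."""
    trials = len(rows)
    expected = list(range(1, trials + 1))
    return all(sorted(col) == expected for col in zip(*rows))


valid = is_rank_matrix(example)
```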

opgee.mcs.LHS.getPercentiles(trials=100): Generate a list of ‘trials’ values, one from each of ‘trials’ equal-size segments from a uniform distribution. These are used with an RV’s ppf (percent point function = inverse cumulative function) to retrieve the values for that RV at the corresponding percentiles.
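A minimal sketch of this kind of stratified draw, using only the standard library, is shown here. The function name stratified_percentiles and the seed argument are hypothetical. Whether opgee returns the values in segment order or shuffled is not stated above, so the shuffle is an assumption.

```python
import random


def stratified_percentiles(trials, seed=None):
    """Draw one uniform value from each of 'trials' equal-width
    segments of the unit interval."""
    rng = random.Random(seed)
    # Segment i covers [i/trials, (i+1)/trials).
    points = [(i + rng.random()) / trials for i in range(trials)]
    rng.shuffle(points)  # assumption: order is randomized
    return points


percentiles = stratified_percentiles(10, seed=0)
```

Passing these percentiles to an RV's ppf yields exactly one value from each equal-probability slice of that RV's distribution.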

opgee.mcs.LHS.lhs(paramList, trials, corrMat=None, columns=None, skip=None)

Produce an ndarray or DataFrame of ‘trials’ rows of values for the given parameter list, respecting the correlation matrix ‘corrMat’ if one is specified, using Latin Hypercube (stratified) sampling.

The values in the i’th column are drawn from the ppf function of the i’th parameter from paramList, and columns i and j are rank-correlated according to corrMat[i,j].

Parameters:

paramList – (list of rv-like objects representing parameters) The only requirement on parameter objects is that they implement the ppf function.
trials – (int) number of trials to generate for each parameter.
corrMat – a numpy matrix representing the correlation between the parameters. corrMat[i,j] should give the correlation between the i’th and j’th entries of paramList.
columns – (None or list(str)) Column names to use to return a DataFrame.
skip – (list of params) Parameters to process later because they are dependent on other parameter values (e.g., they’re “linked”). These cannot be correlated.

Returns:

ndarray or DataFrame with trials rows of values for the paramList.
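The uncorrelated case can be sketched in a few lines of standard-library Python. This is an illustration of Latin Hypercube sampling in general, not opgee's lhs: the names NormalRV and lhs_sketch are hypothetical, the result is a list of rows rather than an ndarray or DataFrame, and the corrMat, columns, and skip arguments are not modelled.

```python
import random
from statistics import NormalDist


class NormalRV:
    """Hypothetical rv-like object; the only requirement is a ppf method."""

    def __init__(self, mu, sigma):
        self.dist = NormalDist(mu, sigma)

    def ppf(self, q):
        return [self.dist.inv_cdf(p) for p in q]


def lhs_sketch(param_list, trials, seed=None):
    """Return 'trials' rows with one column per parameter (no correlation)."""
    rng = random.Random(seed)
    columns = []
    for rv in param_list:
        # One percentile per equal-probability segment, in random order.
        pts = [(i + rng.random()) / trials for i in range(trials)]
        rng.shuffle(pts)
        columns.append(rv.ppf(pts))
    # Transpose the columns into rows: one row per trial.
    return [list(row) for row in zip(*columns)]


rows = lhs_sketch([NormalRV(0.0, 1.0), NormalRV(10.0, 2.0)], trials=8, seed=1)
```

Shuffling each column independently is what leaves the parameters uncorrelated; imposing a rank correlation would instead reorder each column to follow a target rank matrix.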

opgee.mcs.LHS.lhsAmend(df, rvList, trials, shuffle=True)

Amend the DataFrame with LHS data by adding columns for the given parameters. This allows “linked” parameters to refer to values of other parameters.

Parameters:

df – (DataFrame) Generated by prior call to LHS or something similar.
trials – (int) the number of trials to generate for each parameter
shuffle – (bool) if True, shuffle the values. Set this to False for linked params.

Returns:

none

opgee.mcs.LHS.rankCorrCoef(m): Take a 2-D array of values and produce an array of rank correlation coefficients representing the rank correlation among the columns.
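Rank correlation among columns can be computed by ranking each column and then taking the ordinary Pearson correlation of the ranks (Spearman's coefficient). The sketch below is a plain-Python illustration, not opgee's implementation. The helper names are hypothetical, and tied values receive arbitrary distinct ranks rather than averaged ranks.

```python
def ranks(values):
    """Ordinal ranks (1 = smallest); ties are not averaged."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def rank_corr_sketch(rows):
    """Square matrix of rank correlations among the columns of 'rows'."""
    cols = [ranks(list(col)) for col in zip(*rows)]
    return [[pearson(a, b) for b in cols] for a in cols]


data = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 4.0],
    [3.0, 35.0, 1.0],
    [4.0, 50.0, 0.5],
]
corr = rank_corr_sketch(data)
```

In this data the second column rises with the first and the third falls, so the first row of the result is approximately [1, 1, -1].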

class opgee.mcs.simulation.Simulation(sim_dir, analysis_name=None, trials=0, field_names=None, save_to_path=None, meta_data_only=False)

Simulation represents the file and directory structure of a Monte Carlo simulation. Each simulation has an associated top-level directory which contains:

metadata.json: currently, only the analysis name is stored here; more fields may be added later.
{field_name}/trial_data.csv: values drawn from parameter distributions, with each row representing a single trial, and each column representing the vector of values drawn for a single parameter. This file is created by the “gensim” sub-command.
analysis_XXX.csv: results for the analysis named XXX. Each column represents the results of a single output variable. Each row represents the value of all output variables for one trial of a single field. The field name is thus included in each row, allowing results for all fields in a single analysis to be stored in one file.
trials: a directory holding subdirectories for each trial, allowing each to be run independently (e.g., on a multi-core or cluster computer). The directory structure under trials comprises two levels of 3-digit values which, when concatenated, form the trial number. That is, trial 1,423 would be found in trials/001/423. This allows up to 1 million trials while ensuring that no directory contains more than 1000 items. Limiting directory size improves performance.
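The two-level trial directory scheme can be sketched as a small path helper. The function name trial_dir_sketch and its root argument are hypothetical. Only the layout (two zero-padded 3-digit levels under trials) comes from the description above.

```python
from pathlib import PurePosixPath


def trial_dir_sketch(trial_num, root="trials"):
    """Map a trial number to its two-level directory, e.g. 1423 -> 001/423."""
    if not 0 <= trial_num < 1_000_000:
        raise ValueError("trial number must be in the range 0..999999")
    upper, lower = divmod(trial_num, 1000)  # thousands part, remainder
    return PurePosixPath(root) / f"{upper:03d}" / f"{lower:03d}"


path = trial_dir_sketch(1423)  # trials/001/423
```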

field_trial_data(field)

Read the trial data CSV from the top-level directory and return the DataFrame. The data is cached in the Simulation instance for re-use.

Parameters: field – (opgee.Field or str) a field instance or name to read data for
Returns: (pd.DataFrame) the values drawn for each field, parameter, and trial.

generate(corr_mat=None)

Generate simulation data for the given Analysis.

Parameters: corr_mat – a numpy matrix representing the correlation between each pair of parameters. corr_mat[i,j] gives the desired correlation between the i’th and j’th entries of the parameter list.
Returns: none

load_model(save_to_path=None)

Loads the model (reading just the field being run by this Simulation) from XML to avoid carrying state between trials.

Returns: none

classmethod new(sim_dir, model_files, analysis_name, trials, field_names=None, overwrite=False, use_default_model=True)

Create the simulation directory and the sandboxes sub-directory.

Parameters:

sim_dir – (str) the top-level simulation directory
model_files – (list of XML filenames) the XML files to load, in order to be merged
analysis_name – (str) the name of the analysis for which to generate the MCS
trials – (int) the number of trials to generate
field_names – (list of str or None) Field names to limit the Simulation to use. (None => use all Fields defined in the Analysis.)
overwrite – (bool) if True, overwrite directory if it already exists, otherwise refuse to do so.
use_default_model – (bool) whether to use the default model in etc/opgee.xml as the baseline model to merge with.

Returns:

a new Simulation instance

classmethod read_metadata(sim_dir): Used by runsim to get the field names without loading the whole simulation.

run(trial_nums, field_names=None)

Run the given Monte Carlo trials for analysis. If field_names is None, all fields are run; otherwise, only the indicated fields are run.

Parameters:

trial_nums – (list of int) trials to run. None implies all trials.
field_names – (list of str) names of fields to run

Returns:

none

run_field(field, trial_nums, packet_num=None)

Run the Monte Carlo trials trial_nums for field, serially. Save the (full or partial) results for this field to a CSV file in the simulation directory.

Parameters:

field – (opgee.Field) the Field to evaluate in MCS
trial_nums – (iterator of ints) the trial numbers to run, or None to run all trials.
packet_num – (int) the sequence number of the current packet within field. If not None, used for naming files containing partial results.

Returns:

(int) the number of successfully run trials

save_trial_results(field, df, packet_num, failures)

Save the results of an MCS “trial packet” (which may be all trials for field or just a subset of trials) to a CSV file in the simulation directory.

Parameters:

field – (opgee.Field) the Field to evaluate in MCS
df – (pandas.DataFrame) the results to save
packet_num – (int) The sequential number for this packet in field. If not None, this is used to name the result files.
failures – (list of tuples) tuples of form (trial_num, message) for each failed trial.

Returns:

nothing

trial_data(field, trial_num)

Return the values for all parameters for trial trial_num.

Parameters: trial_num – (int) trial number
Returns: (pd.Series) the values for all parameters for the given trial.

opgee.mcs.simulation.combine_results(sim_dir, field_names, delete=False)

Combine CSV files containing partial results/failures from an MCS into two files, results.csv and failures.csv.

Parameters:

sim_dir – (str) the simulation directory
field_names – (list of str) names of fields to combine results for
delete – (bool) whether to delete partial files after combining them

Returns:

nothing

opgee.mcs.simulation.read_distributions(pathname=None)

Read distributions from the designated CSV file. These are combined with those defined using the @Distribution.register() decorator, used to define distributions with dependencies.

Parameters: pathname – (str) the pathname of the CSV file describing parameter distributions
Returns: none