Statistics

A statistics utility module was written for this project to provide random sampling and weighted-choice functions. It is located in the utilities/statistics folder.

Weighted Sampler

Two weighted samplers are implemented: weightedSamplerReplacement and weightedSamplerNoReplacement. As the names suggest, weightedSamplerReplacement samples with replacement, so an item may be selected more than once, while weightedSamplerNoReplacement samples without replacement, so an item that has been selected is no longer available for later draws.

The weighted samplers take:

  • dec array of weights

  • int sampleSize

The weights must be aligned by index with the list being sampled: the weight at index i belongs to the item at index i. Rather than returning items, the weighted samplers return a list of the indexes of the sampled items. Because of this, they work with an array of any type.
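The following is a minimal sketch of the two weighted samplers, written in Python because the module's own language is not stated here. The snake_case names and the optional `rng` parameter are hypothetical, not the module's actual API. With-replacement sampling uses the standard library's `random.choices`. Without-replacement sampling is assumed to be done by repeated single weighted draws, removing each chosen index before the next draw.

```python
import random
from typing import List, Optional, Sequence


def weighted_sampler_replacement(weights: Sequence[float], sample_size: int,
                                 rng: Optional[random.Random] = None) -> List[int]:
    """Return sample_size indexes drawn with replacement, P(i) proportional to weights[i]."""
    rng = rng if rng is not None else random.Random()
    # random.choices draws k items independently, so an index may repeat.
    return rng.choices(range(len(weights)), weights=weights, k=sample_size)


def weighted_sampler_no_replacement(weights: Sequence[float], sample_size: int,
                                    rng: Optional[random.Random] = None) -> List[int]:
    """Return sample_size distinct indexes, drawing one at a time by remaining weight."""
    if sample_size > len(weights):
        raise ValueError("sample_size cannot exceed the number of weights")
    rng = rng if rng is not None else random.Random()
    remaining = list(range(len(weights)))   # indexes still available
    remaining_weights = list(weights)       # their weights, kept in the same order
    sampled: List[int] = []
    for _ in range(sample_size):
        # One weighted draw over the items that have not been picked yet.
        pos = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        sampled.append(remaining.pop(pos))
        remaining_weights.pop(pos)          # remove the chosen item from later draws
    return sampled


# Usage: sample indexes, then map them back onto any list.
items = ["red", "green", "blue"]
picks = [items[i] for i in weighted_sampler_replacement([0.2, 0.3, 0.5], 4, random.Random(42))]
```

Returning indexes keeps the sampler independent of the element type, as the text describes. Note that `random.choices` raises a `ValueError` when the remaining weights sum to zero, which can happen in the no-replacement case if only zero-weight items are left.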

Uniform Sampler

Two uniform samplers are implemented: uniformSamplerReplacement and uniformSamplerNoReplacement. As the names suggest, uniformSamplerReplacement samples with replacement, so an index may be selected more than once, while uniformSamplerNoReplacement samples without replacement, so an index that has been selected is no longer available for later draws.

The uniform samplers take:

  • int length

  • int sampleSize

The samplers select indexes uniformly at random from the valid indexes of an array of the given length and return the list of sampled indexes. Because only the length is needed, they work with an array of any type.
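Here is a minimal Python sketch of the two uniform samplers. The function names and the optional `rng` parameter are hypothetical. It uses the standard library's `random.randrange` for draws with replacement and `random.sample` for draws without replacement, and it assumes 0-based indexes.

```python
import random
from typing import List, Optional


def uniform_sampler_replacement(length: int, sample_size: int,
                                rng: Optional[random.Random] = None) -> List[int]:
    """Return sample_size indexes in [0, length), each drawn independently (repeats allowed)."""
    rng = rng if rng is not None else random.Random()
    return [rng.randrange(length) for _ in range(sample_size)]


def uniform_sampler_no_replacement(length: int, sample_size: int,
                                   rng: Optional[random.Random] = None) -> List[int]:
    """Return sample_size distinct indexes in [0, length)."""
    rng = rng if rng is not None else random.Random()
    # random.sample raises ValueError if sample_size > length.
    return rng.sample(range(length), sample_size)


# Usage: pick two distinct elements from a list of any type.
names = ["ann", "bob", "cy", "dee"]
chosen = [names[i] for i in uniform_sampler_no_replacement(len(names), 2)]
```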

Sum

A simple function that takes a list of decimals and returns their sum.

Distributions

The statistics module also defines the Distribution data type. A Distribution holds a list of values (of any data type), a list of decimal proportions (one per value), and an int length giving the number of unique values in the distribution.

A Distribution is built with the makeDistribution function, which takes a hashTable mapping each value to its count and returns the corresponding Distribution.
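A minimal Python sketch of the Distribution type and makeDistribution follows. It assumes that each proportion is that value's count divided by the total count. The text does not say this explicitly. The snake_case names are hypothetical, a `dict` stands in for the hashTable, and `float` stands in for the decimal type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, List


@dataclass
class Distribution:
    values: List[Any]          # the unique values
    proportions: List[float]   # proportions[i] is the share of values[i]
    length: int                # number of unique values


def make_distribution(counts: Dict[Hashable, int]) -> Distribution:
    """Build a Distribution from a mapping of value -> count (assumed: proportion = count / total)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    values = list(counts.keys())  # dicts preserve insertion order
    proportions = [counts[v] / total for v in values]
    return Distribution(values=values, proportions=proportions, length=len(values))


dist = make_distribution({"heads": 1, "tails": 3})
```

Because `proportions` is aligned by index with `values`, it could be passed as the weight list to a weighted sampler like the ones described above, and the returned indexes could then be mapped back onto `values`.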