SKULL - A STATISTICAL CALCULATOR

Naming conventions

R inbuilt function  : mean() 
My version          : my_mean() 
Its input           : my_mean_input() 
Its output          : my_mean_output()

Input Data variables -

my_mean_input_dataOne
my_mean_input_dataTwo

Code hierarchy

Individual files for Modules

- module_one.R 
- module_two.R

module_one.R contains

- the vector list of its utilities - mean, median, mode, etc.
- Proper documentation of its utilities
- my_mean_input()
- my_mean_output()

module_one_core.R

- my_mean()
- Retains the core functionality

MODULES

1. Descriptive Analysis

Mean | Mode | Median

my_mean(data) 
my_mode(data)
my_median(data)

where data : A vector array of raw data
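A minimal sketch of the three measures without the R inbuilt functions. The `sketch_*` names are hypothetical and chosen for illustration; they are not SKULL's own code, and returning every tied value for the mode is an assumption.

```r
# Hypothetical helpers for illustration only (not the SKULL implementation).
sketch_mean <- function(data) {
  sum(data) / length(data)
}

sketch_median <- function(data) {
  sorted <- sort(data)
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]                        # odd count: middle value
  } else {
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2    # even count: average the two middle values
  }
}

sketch_mode <- function(data) {
  counts <- table(data)
  # Assumption: return every value tied for the highest frequency.
  as.numeric(names(counts)[counts == max(counts)])
}

data <- c(2, 4, 4, 5, 7, 9)
sketch_mean(data)
sketch_median(data)
sketch_mode(data)
```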

Variance

my_sample_variance(data)
my_population_variance(data)

where data : A vector array of raw data
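The two variants differ only in the divisor: n for the population variance and n - 1 for the sample variance. A short sketch with hypothetical `sketch_*` names, assuming the textbook definitions:

```r
# Hypothetical helpers for illustration only.
sketch_population_variance <- function(data) {
  sum((data - mean(data))^2) / length(data)         # divide by n
}

sketch_sample_variance <- function(data) {
  sum((data - mean(data))^2) / (length(data) - 1)   # divide by n - 1
}

data <- c(2, 4, 4, 4, 5, 5, 7, 9)
sketch_population_variance(data)
sketch_sample_variance(data)
```

The R inbuilt var() uses the n - 1 divisor, so it corresponds to the sample version.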

Standard Deviation

my_sample_SD(data)
my_population_SD(data)

where data : A vector array of raw data

Mean Absolute Deviation

my_mad(data)

where data : A vector array of raw data
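A sketch of the mean absolute deviation, assuming deviations are measured from the mean (measuring from the median is another common convention). The function name is hypothetical.

```r
# Hypothetical helper: average absolute distance from the mean.
sketch_mean_abs_dev <- function(data) {
  mean(abs(data - mean(data)))
}

data <- c(2, 4, 4, 4, 5, 5, 7, 9)
sketch_mean_abs_dev(data)
```

Note that the R inbuilt mad() computes a scaled median absolute deviation, which is a different statistic.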

Range | Max | Min

my_range(data)
my_min(data)
my_max(data)

where data : A vector array of raw data

Quartiles | IQR

my_quantile(data)
my_IQR(data)

where data : A vector array of raw data

Moments

my_central_moments(data)
my_raw_moments(data)

where data : A vector array of raw data

Skewness | Kurtosis

my_skewness(data)
my_kurtosis(data)

where data : A vector array of raw data
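Both measures can be built from central moments. The sketch below assumes the plain moment ratios (m3 / m2^1.5 and m4 / m2^2); sample-adjusted or excess-kurtosis conventions would give different numbers. All names are hypothetical.

```r
# Hypothetical helpers using population (divide-by-n) central moments.
central_moment <- function(data, k) {
  mean((data - mean(data))^k)
}

sketch_skewness <- function(data) {
  central_moment(data, 3) / central_moment(data, 2)^1.5
}

sketch_kurtosis <- function(data) {
  central_moment(data, 4) / central_moment(data, 2)^2   # not excess kurtosis
}

symmetric <- c(1, 2, 3, 4, 5)
sketch_skewness(symmetric)
sketch_kurtosis(symmetric)
```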

2. Predictive Analysis

Correlation with significance test

my_cor(datasetA,datasetB)

where,
datasetA : A vector array of raw data
datasetB : A vector array of raw data

my_cor_significance_test(r,n,alpha)

where,
r : The correlation value to be tested
n : Size of dataset on which the correlation is found
alpha : level of significance
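One common way to test a correlation is a t statistic with n - 2 degrees of freedom. The sketch assumes a two-tailed test of the null hypothesis that the true correlation is zero; the function name and the returned fields are hypothetical.

```r
# Hypothetical sketch: t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2.
sketch_cor_significance <- function(r, n, alpha) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  t_crit <- qt(1 - alpha / 2, df = n - 2)   # two-tailed critical value
  list(t_stat = t_stat, t_crit = t_crit, significant = abs(t_stat) > t_crit)
}

x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
res <- sketch_cor_significance(cor(x, y), length(x), 0.05)
res$significant
```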

Multiple Linear Regression

my_multi_linear_regression(x1,x2,y)

where,
x1, x2 : Variables used to form matrix X
y : Variable to form matrix Y
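A sketch of how the matrices X and Y might be combined, using the normal equations (X'X) b = X'Y. Including an intercept column of ones is an assumption, and the function name is hypothetical.

```r
# Hypothetical sketch: least-squares coefficients via the normal equations.
sketch_multi_linear_regression <- function(x1, x2, y) {
  X <- cbind(1, x1, x2)            # assumption: intercept column of ones
  Y <- matrix(y, ncol = 1)
  beta <- solve(t(X) %*% X, t(X) %*% Y)
  setNames(as.vector(beta), c("intercept", "x1", "x2"))
}

x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- 1 + 2 * x1 + 3 * x2          # exact linear relation, so the fit recovers 1, 2, 3
coefs <- sketch_multi_linear_regression(x1, x2, y)
coefs
```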

3. Probability Analysis

Factorial

my_factorial(number)

where
number : The number whose factorial needs to be found.

Permutations | Combinations

my_permutation(n,r)
my_combination(n,r)

where
n : Total number of objects.
r : Objects taken at a time.
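The counts follow from the factorial: nPr = n! / (n - r)! and nCr = nPr / r!. A sketch with hypothetical `sketch_*` names:

```r
# Hypothetical helpers for illustration only.
sketch_factorial <- function(number) {
  if (number < 0 || number != floor(number)) {
    stop("number must be a non-negative integer")
  }
  prod(seq_len(number))            # prod of an empty sequence is 1, so 0! = 1
}

sketch_permutation <- function(n, r) {
  sketch_factorial(n) / sketch_factorial(n - r)
}

sketch_combination <- function(n, r) {
  sketch_permutation(n, r) / sketch_factorial(r)
}

sketch_permutation(5, 2)
sketch_combination(5, 2)
```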

Basic Probability

my_basic_prob(favorable,total)

where
favorable : Favourable outcomes
total : Total outcomes

Conditional Probability

my_cond_prob(A,B,S)

where,
A : fav outcomes for Desired Event (A)
B : fav outcomes for Given Event (B)
S : Sample Space (S)

Bayes Theorem

my_bayes_theorm(An,Bn_given_An,k)

where,
The sample space is divided into n events A1 ... An
An : A list of P(Ai) for each event i in 1..n
( the probabilities P(Ai) must sum to 1 )
Bn_given_An : A list of P(B|Ai) for each event i in 1..n
k : Index of the event Ak for which P(Ak|B) is to be found
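Bayes' theorem gives P(Ak|B) = P(Ak) P(B|Ak) / sum over i of P(Ai) P(B|Ai). A sketch using the same argument names; the function name and the example numbers are hypothetical.

```r
# Hypothetical sketch of the posterior probability of event k.
sketch_bayes <- function(An, Bn_given_An, k) {
  stopifnot(length(An) == length(Bn_given_An),
            isTRUE(all.equal(sum(An), 1)))   # the priors must sum to 1
  An[k] * Bn_given_An[k] / sum(An * Bn_given_An)
}

An          <- c(0.5, 0.3, 0.2)     # P(Ai)
Bn_given_An <- c(0.02, 0.03, 0.05)  # P(B|Ai)
sketch_bayes(An, Bn_given_An, 1)
```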

4. Discrete Distribution Functions

Uniform

my_discrete_uniform(n,k)

where
n : Total Number of discrete groups
k : Find up to which group

Bernoulli

my_bernoulli(p,x)

where
p : probability of success in event
x : can have value either 0 or 1

Binomial

my_binomial(r,n,p)

where
n : Total number of events
r : No of events to be succeeded
p : probability of success in one event
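A sketch of the binomial probability mass, assuming the function returns the point probability P(X = r) = C(n, r) p^r (1 - p)^(n - r) rather than a cumulative value. The function name is hypothetical.

```r
# Hypothetical sketch: probability of exactly r successes in n events.
sketch_binomial <- function(r, n, p) {
  choose(n, r) * p^r * (1 - p)^(n - r)
}

sketch_binomial(3, 10, 0.5)
```

Note the argument order: the R inbuilt dbinom(x, size, prob) takes the same three values as (r, n, p).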

Geometric

my_geometric(x,p)

where
x : Trial on which the user succeeded (x)
p : probability of success in one trial

Hyper-geometric

my_hyper_geometric(N,n,M,x)

where
N : population size (N)
n : number of draws (n)
M : Number of fav outcomes in population (M)
x : Required fav outcomes in draws (x)
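With these parameters the point probability is C(M, x) C(N - M, n - x) / C(N, n). The sketch assumes a point probability is wanted; the function name is hypothetical.

```r
# Hypothetical sketch: exactly x favourable outcomes in n draws without replacement.
sketch_hyper_geometric <- function(N, n, M, x) {
  choose(M, x) * choose(N - M, n - x) / choose(N, n)
}

sketch_hyper_geometric(N = 20, n = 5, M = 8, x = 2)
```

The R inbuilt dhyper() is parameterised differently: the equivalent call is dhyper(x, M, N - M, n).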

Negative Binomial

my_negative_binomial(r,n,p)

where
n : Total number of events
r : No of events to be succeeded
p : probability of success in one event

Poisson

my_poisson(x,lambda)

where,
lambda : The mean value of the number of successes that are occurring in the region specified
x : The actual number of the successes that are occurring in the region specified. (x)

Multinomial

my_multinomial(d,n,prob)

where
n : Total number of events occurred
Events can be divided into s subgroups
prob : List of Probability of winning for each subgroup Si
d : List of Actual Number of events won by each subgroup (d)
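The multinomial probability is n! / (d1! ... ds!) * p1^d1 ... ps^ds. A sketch with the same argument names; the function name and the input checks are assumptions.

```r
# Hypothetical sketch of the multinomial point probability.
sketch_multinomial <- function(d, n, prob) {
  stopifnot(length(d) == length(prob),
            sum(d) == n,                       # the counts must add up to n
            isTRUE(all.equal(sum(prob), 1)))   # the probabilities must sum to 1
  factorial(n) / prod(factorial(d)) * prod(prob^d)
}

d    <- c(1, 2, 1)
prob <- c(0.2, 0.5, 0.3)
sketch_multinomial(d, n = 4, prob)
```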

Multivariate Hypergeometric

my_multivariate_hyper_geometric(M,D,n,x)

where
M : Total favourable outcomes in the dataset
Dataset can be divided into s subgroups
D : List of Fav outcomes available in each sub group Si
n : Lot size for draw from population
x : List of Number the fav outcomes wanted from each subgroup Si

5. Continuous Distribution Functions

Uniform

my_cont_uniform(alpha,beta,low,high)

The continuous uniform distribution is the probability distribution
of random number selection from the continuous interval between alpha and beta
where,
alpha : lower limit of interval
beta : upper limit of interval
low : lower limit for Integration
high : Upper limit for Integration

Normal

my_cont_normal(mean,psd,low,high)

where,
mean : Mean
psd : Population Standard Deviation
low : lower limit for Integration
high : Upper limit for Integration
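The low and high arguments suggest that the probability is obtained by integrating the density over [low, high]. A sketch under that assumption, using R's integrate(); the function name is hypothetical.

```r
# Hypothetical sketch: P(low <= X <= high) by numerical integration of the normal density.
sketch_cont_normal <- function(mean, psd, low, high) {
  density <- function(x) {
    exp(-((x - mean)^2) / (2 * psd^2)) / (psd * sqrt(2 * pi))
  }
  integrate(density, lower = low, upper = high)$value
}

prob <- sketch_cont_normal(mean = 0, psd = 1, low = -1.96, high = 1.96)
prob
```

The same pattern applies to the other continuous distributions by swapping the density function.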

Gamma

my_gamma_test(alpha,beta,low,high)

A random variable X is gamma-distributed with shape alpha and rate beta
where,
alpha : decides shape of distribution
beta : decides rate of distribution
low : lower limit for Integration
high : Upper limit for Integration

Exponential

my_exp_dist(lambda,low,high)

The exponential distribution is a special case of the gamma distribution
where,
lambda : 1/beta
low : lower limit for Integration
high : Upper limit for Integration

6. Sample Distribution Test Statistic

Z-test

my_z_test(avg,mu,pSD,n,alpha,flag)

where
n : Sample size
avg : Sample Average (x bar)
mu : Mean for hypothesis testing
pSD : Standard Deviation (sigma)
NULL Hypothesis : X bar is equal to Mu
alpha : level of significance (alpha)
flag : To choose between two tail or one tail.
flag==0 is '=' case
flag==1 is '<' case
flag==2 is '>' case
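A sketch of the z-test decision rule. It assumes flag==1 means a left-tailed alternative (mean < mu) and flag==2 a right-tailed alternative (mean > mu); the source does not spell this out, so treat that mapping, the function name, and the returned fields as hypothetical.

```r
# Hypothetical sketch: z = (avg - mu) / (pSD / sqrt(n)).
sketch_z_test <- function(avg, mu, pSD, n, alpha, flag) {
  z <- (avg - mu) / (pSD / sqrt(n))
  reject <- switch(as.character(flag),
    "0" = abs(z) > qnorm(1 - alpha / 2),   # two-tailed
    "1" = z < qnorm(alpha),                # assumption: left-tailed
    "2" = z > qnorm(1 - alpha),            # assumption: right-tailed
    stop("flag must be 0, 1, or 2"))
  list(z = z, reject_null = reject)
}

res <- sketch_z_test(avg = 52, mu = 50, pSD = 6, n = 36, alpha = 0.05, flag = 0)
res$z
res$reject_null
```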

Student t-test

my_t_test(data,mu,alpha,flag)

where,
data : Dataset on which the test is to be performed.
mu : Mean for hypothesis testing
NULL Hypothesis : X bar of dataset is equal to Mu
alpha : level of significance (alpha)
flag : To choose between two tail or one tail.
flag==0 is '=' case
flag==1 is '<' case
flag==2 is '>' case

F-test

my_f_test(data1,data2,alpha)

where
data1 : Dataset 1
data2 : Dataset 2
NULL Hypothesis : Both Data sets have approximately equal variances
alpha : level of significance (alpha)

Chi-Square

my_chi_square_test(data,pVar,alpha,flag)

where
data : Dataset on which the test is to be performed.
pVar : Variance for hypothesis testing
NULL Hypothesis : Variance of dataset is equal to pVar
alpha : level of significance (alpha)
flag : To choose between two tail or one tail.
flag==0 is '=' case
flag==1 is '<' case
flag==2 is '>' case

Shapiro Wilk test - Inbuilt

my_shapiro_test(data,alpha)

where,
data : Dataset on which the test is to be performed.
NULL Hypothesis : Dataset is Normally distributed.
alpha : level of significance (alpha)

7. Interval Estimation

Estimation of Means | Variance Known

my_est_mean_var_known(avg,pSD,n,alpha)

where,
n : Sample size (n)
avg : average (x bar)
pSD : Standard Deviation (sigma)
alpha : level of significance (alpha)
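With a known sigma the interval is x bar ± z(alpha/2) * sigma / sqrt(n). A sketch with the same argument names; the function name and the return format are hypothetical.

```r
# Hypothetical sketch: (1 - alpha) confidence interval for the mean, sigma known.
sketch_est_mean_var_known <- function(avg, pSD, n, alpha) {
  margin <- qnorm(1 - alpha / 2) * pSD / sqrt(n)
  c(lower = avg - margin, upper = avg + margin)
}

ci <- sketch_est_mean_var_known(avg = 50, pSD = 6, n = 36, alpha = 0.05)
ci
```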

Estimation of Means | Variance Unknown

my_est_mean_var_unknown(data,alpha)

where,
data : Dataset on which the estimate is to be done
alpha : level of significance (alpha)
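When sigma is unknown, the sample standard deviation and the t distribution with n - 1 degrees of freedom take its place. A sketch under that standard assumption; the function name and the example data are hypothetical.

```r
# Hypothetical sketch: (1 - alpha) confidence interval for the mean, sigma unknown.
sketch_est_mean_var_unknown <- function(data, alpha) {
  n <- length(data)
  margin <- qt(1 - alpha / 2, df = n - 1) * sd(data) / sqrt(n)
  c(lower = mean(data) - margin, upper = mean(data) + margin)
}

data <- c(9.8, 10.2, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1)
ci <- sketch_est_mean_var_unknown(data, alpha = 0.05)
ci
```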

Estimation of Differences in Means | Variance Known

my_est_diff_mean_var_known(avg1,avg2,pVar1,pVar2,n1,n2,alpha)

where,
n1 : Sample size of dataset 1 (n1)
avg1 : average of dataset 1 (x bar1)
pVar1 : population variance of dataset 1 (sigma sq1)
n2 : Sample size of dataset 2 (n2)
avg2 : average of dataset 2 (x bar2)
pVar2 : population variance of dataset 2 (sigma sq2)
alpha : level of significance (alpha)

Estimation of Differences in Means | Variance Unknown

my_est_diff_mean_var_unknown(data1,data2,alpha)

where,
data1 : Dataset 1
data2 : Dataset 2
Estimate will be performed on these two data sets
alpha : level of significance (alpha)

Estimation of Proportions

my_est_prop(fav,n,alpha)

where,
fav : number of Favourable outcomes
n : number of Total Outcomes
alpha : level of significance (alpha)

Estimation of Differences in Proportions

my_est_diff_prop(fav1,n1,fav2,n2,alpha)

where,
fav1 : number of Favourable outcomes of dataset 1
n1 : number of Total Outcomes of dataset 1
fav2 : number of Favourable outcomes of dataset 2
n2 : number of Total Outcomes of dataset 2
alpha : level of significance (alpha)

Estimation of Variances

my_est_var(data,pVar,alpha)

where,
data : Dataset on which the estimate is to be done
pVar : Population variance
alpha : level of significance (alpha)

Estimation of Ratio of Two Variances

my_est_ratio_var(data1,data2,pVar1,pVar2,alpha)

where,
data1 : Dataset 1 on which the estimate is to be done
pVar1 : Population variance for dataset 1
data2 : Dataset 2 on which the estimate is to be done
pVar2 : Population variance for dataset 2
alpha : level of significance (alpha)

8. Non-Parametric Analysis

Sign Test | Wilcoxon Signed-Rank test

my_sign_test(data,mean,alpha,flag)
my_signed_rank_test(data,mean,alpha,flag)

where
data : Raw Data on which the test is to be performed.
mean : Mean for hypothesis testing (Mu)
alpha : level of significance (alpha)
flag : To choose between two tail or one tail.
flag==0 is '=' case
flag==1 is '<' case
flag==2 is '>' case
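A sketch of the sign test for the two-tailed case only (flag==0). It counts the observations above the hypothesised value, drops ties, and compares the count with a Binomial(n, 0.5) distribution. The function name, the tie handling, and the returned fields are assumptions.

```r
# Hypothetical sketch: two-tailed sign test against a hypothesised centre value.
sketch_sign_test <- function(data, mean, alpha) {
  diffs <- data - mean
  diffs <- diffs[diffs != 0]        # assumption: observations equal to `mean` are dropped
  n <- length(diffs)
  plus <- sum(diffs > 0)            # number of positive signs
  smaller_tail <- min(pbinom(plus, n, 0.5),
                      pbinom(plus - 1, n, 0.5, lower.tail = FALSE))
  p_value <- min(1, 2 * smaller_tail)
  list(plus = plus, n = n, p_value = p_value, reject_null = p_value < alpha)
}

data <- c(1.5, 2.2, 0.9, 1.3, 2.0, 1.6, 1.8, 1.5, 2.0, 1.2, 1.7)
res <- sketch_sign_test(data, mean = 1.8, alpha = 0.05)
res$p_value
```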

Mann-Whitney Test

my_mann_whitney_test(data1,data2,alpha,flag)

where,
data1 : Data set 1
data2 : Data set 2
Null hypothesis : Both Data Sets have approximately equal means.
alpha : level of significance (alpha)
flag : To choose between two tail or one tail.
flag==0 is '=' case
flag==1 is '<' case
flag==2 is '>' case

Kruskal-Wallis Test

my_kruskal_wallis(data1,data2,data3,alpha)

where,
data1 : Data set 1
data2 : Data set 2
data3 : Data set 3
Null hypothesis : All Data Sets have approximately equal means.
alpha : level of significance (alpha)
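A sketch of the Kruskal-Wallis statistic H = 12 / (N (N + 1)) * sum(Ri^2 / ni) - 3 (N + 1), compared against a chi-square critical value with 2 degrees of freedom. It assumes there are no tied observations (no tie correction is applied); the function name and the example data are hypothetical.

```r
# Hypothetical sketch: Kruskal-Wallis H for three groups, no tie correction.
sketch_kruskal_wallis <- function(data1, data2, data3, alpha) {
  groups <- list(data1, data2, data3)
  sizes <- lengths(groups)
  N <- sum(sizes)
  ranks <- rank(unlist(groups))                      # rank all observations together
  group_id <- rep(seq_along(groups), times = sizes)
  rank_sums <- tapply(ranks, group_id, sum)          # Ri for each group
  H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / sizes) - 3 * (N + 1)
  critical <- qchisq(1 - alpha, df = length(groups) - 1)
  list(H = H, critical = critical, reject_null = H > critical)
}

d1 <- c(27, 2, 4, 18, 7, 9)
d2 <- c(20, 8, 14, 36, 21, 22)
d3 <- c(34, 31, 3, 23, 30, 6)
res <- sketch_kruskal_wallis(d1, d2, d3, alpha = 0.05)
res$H
```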

9. Visualizations

Histograms | Bar Graph | Pie Chart | Stem-leaf plot | Pareto Chart

my_hist(data)
my_barplot(data)
my_pie(data)
my_stem(data)
my_pareto_char(data)

where
data : is the raw data which needs to be plotted

Scatter plot | Box-plot | q-q plot

my_scatter(datax, datay)
my_boxplot(datax, datay)
my_qplot(datax, datay)

where
datax : is the raw data which is plotted across x axis
datay : is the raw data which is plotted across y axis

Line Graph

my_linegraph(data1, data2, data3)

where
data1 : for Plotting line 1
data2 : for Plotting line 2
data3 : for Plotting line 3
