CRAN Package Check Results for Package DBERlibR

Last updated on 2023-09-26 14:04:54 CEST.

Flavor Version Tinstall Tcheck Ttotal Status Flags
r-devel-linux-x86_64-debian-clang 0.1.3 9.24 136.68 145.92 ERROR
r-devel-linux-x86_64-debian-gcc 0.1.3 7.74 103.06 110.80 ERROR
r-devel-linux-x86_64-fedora-clang 0.1.3 183.71 ERROR
r-devel-linux-x86_64-fedora-gcc 0.1.3 199.64 ERROR
r-devel-windows-x86_64 0.1.3 13.00 115.00 128.00 ERROR
r-patched-linux-x86_64 0.1.3 6.20 147.45 153.65 OK
r-release-linux-x86_64 0.1.3 7.88 150.84 158.72 OK
r-release-macos-arm64 0.1.3 78.00 OK
r-release-macos-x86_64 0.1.3 114.00 OK
r-release-windows-x86_64 0.1.3 16.00 154.00 170.00 OK
r-oldrel-macos-arm64 0.1.3 61.00 OK
r-oldrel-macos-x86_64 0.1.3 83.00 OK
r-oldrel-windows-x86_64 0.1.3 17.00 163.00 180.00 OK

Check Details

Version: 0.1.3
Check: examples
Result: ERROR
    Running examples in ‘DBERlibR-Ex.R’ failed
    The error most likely occurred in:
    
    > base::assign(".ptime", proc.time(), pos = "CheckExEnv")
    > ### Name: independent_samples
    > ### Title: Independent Samples Data Analysis
    > ### Aliases: independent_samples
    >
    > ### ** Examples
    >
    > # Run the following code directly in the console panel. The plots
    > # generated through the link above may be displaced depending on the screen
    > # resolution.
    > independent_samples(treat_csv_data =
    + system.file("extdata", "data_treat_post.csv", package = "DBERlibR"),
    + ctrl_csv_data =
    + system.file("extdata", "data_ctrl_post.csv", package = "DBERlibR"),
    + m_cutoff = 0.15)
    
    
    
    
    Error in t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup, :
     cannot use 'paired' in formula method
    Calls: independent_samples -> t.test -> t.test.formula
    Execution halted
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-debian-gcc
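
The error comes from passing `paired` to the formula method of `t.test()`: r-devel (at the time of this check) rejects the argument there even when its value is `FALSE`, while release builds still tolerate it, which matches the OK/ERROR split in the table above. A minimal sketch of the fix, assuming a data frame `d` with a numeric `avg_score_post` column and a two-level `datagroup` factor (names mirror the call in the error output; the package's actual internals may differ):

```r
# Failing pattern on r-devel: 'paired' reaches t.test.formula()
# t.test(avg_score_post ~ datagroup, data = d,
#        var.equal = TRUE, paired = FALSE)
# Error: cannot use 'paired' in formula method

# Fix 1: drop the redundant 'paired = FALSE' from the formula call
t.test(avg_score_post ~ datagroup, data = d, var.equal = TRUE)

# Fix 2: split the groups and call the default method,
# where 'paired' is a legitimate argument
scores <- split(d$avg_score_post, d$datagroup)
t.test(scores[[1]], scores[[2]], var.equal = TRUE, paired = FALSE)
```

Since `paired = FALSE` is already the default of an independent-samples t-test, simply removing the argument changes nothing about the computed result.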

Version: 0.1.3
Check: tests
Result: ERROR
     Running ‘testthat.R’ [17s/27s]
    Running the tests in ‘tests/testthat.R’ failed.
    Complete output:
     > # This file is part of the standard setup for testthat.
     > # It is recommended that you do not modify it.
     > #
     > # Where should you do additional test configuration?
     > # Learn more about the roles of various files in:
     > # * https://r-pkgs.org/tests.html
     > # * https://testthat.r-lib.org/reference/test_package.html#special-files
     >
     > library(testthat)
     > library(DBERlibR)
     >
     > test_check("DBERlibR")
     ==============================
     The number of students deleted
     n
     treat_pre_data 0
     treat_post_data 1
     ctrl_pre_data 1
     ctrl_post_data 0
     --> 0 student(s) from the treatment group pre-test data, 1 student(s) from the treatment group post-test data, 1 student(s) from the control group pre-test data, and 0 student(s) from the control group post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     Pre-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.567 0.101 0.318 0.727
     2 Treatment 0.572 0.0982 0.364 0.773
     -------------------------
     Post-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.591 0.108 0.364 0.773
     2 Treatment 0.656 0.112 0.364 0.864
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ===============================
     Results of Checking Assumptions
    
     # Linearity:
     Refer to the scatter plot in the 'Plots' panel to check linearity. If the two lines look almost parallel, the assumption of linearity can be interpreted as met.
     ## Interpretation: if you see a linear relationship between the covariate (i.e., pre-test scores for this analysis) and the dependent variable (i.e., post-test scores for this analysis) for both the treatment and control groups in the plot, then you can say this assumption has been met, or that the data have not violated the assumption of linearity. If the relationships are not linear, this assumption has been violated and an ANCOVA is not a suitable analysis. However, you might be able to coax your data into a linear relationship by transforming the covariate and still be able to run an ANCOVA.
     -------------------------
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: norm.all.aov$residuals
     W = 0.98453, p-value = 0.3273
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
     ---------------------------
     # Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 1 0.0084 0.9273
     93
     ## Interpretation: the assumption of equality of error variances has been met (p>0.05).
     ----------------------------------------
     # Homogeneity of Regression Slopes:
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 91 8.071 0.006 * 0.081
     2 avg_score_pre 1 91 1.597 0.210 0.017
     3 datagroup:avg_score_pre 1 91 0.549 0.461 0.006
     ## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).
     ----------
     # Outliers: No outlier has been found.
    
    
     ==================================
     Results of the main One-way ANCOVA
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 92 8.111 0.005 * 0.081
     2 avg_score_pre 1 92 1.605 0.208 0.017
     --------------------------
     # Estimated Marginal Means
     avg_score_pre datagroup emmean se df conf.low conf.high
     1 0.5693158 Control 0.5912314 0.01535562 92 0.5607338 0.6217290
     2 0.5693158 Treatment 0.6555045 0.01653250 92 0.6226695 0.6883395
     method
     1 Emmeans test
     2 Emmeans test
     --> A sample summary of the outputs/results above: The difference in post-test scores between the treatment and control groups turned out to be significant with pre-test scores controlled: F(1,92)=8.111, p=0.005 (effect size=0.081). The adjusted marginal mean of post-test scores for the treatment group (0.66, SE=0.02) was significantly different from that of the control group (0.59, SE=0.02).
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     # A tibble: 4 x 5
     group variable n mean sd
     <chr> <fct> <dbl> <dbl> <dbl>
     1 Freshman average_score 11 0.562 0.091
     2 Junior average_score 14 0.594 0.103
     3 Senior average_score 10 0.532 0.113
     4 Sophomore average_score 15 0.573 0.087
     Refer to the boxplot in the 'Plots' panel.
    
    
     ==============================
     Results of Testing Assumptions
    
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: resid(one_way_anova)
     W = 0.96075, p-value = 0.09553
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
    
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
    
     -------------------------
     Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 3 0.371 0.7743
     46
     ## Interpretation: the assumption of equality of variances has been met (p>0.05).
    
     ===================================================================================
     Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)
     Df Sum Sq Mean Sq F value Pr(>F)
     factor(group) 3 0.0233 0.007775 0.805 0.497
     Residuals 46 0.4442 0.009657
     ----------------------------------------------
     Pairwise Comparisons (Equal variances assumed)
     Tukey multiple comparisons of means
     95% family-wise confidence level
    
     Fit: aov(formula = average_score ~ factor(group), data = data_original)
    
     $`factor(group)`
     diff lwr upr p adj
     Junior-Freshman 0.03223377 -0.07330199 0.13776952 0.8474823
     Senior-Freshman -0.03000909 -0.14445579 0.08443761 0.8969661
     Sophomore-Freshman 0.01069091 -0.09328546 0.11466728 0.9927074
     Senior-Junior -0.06224286 -0.17069336 0.04620765 0.4284339
     Sophomore-Junior -0.02154286 -0.11888016 0.07579445 0.9346245
     Sophomore-Senior 0.04070000 -0.06623364 0.14763364 0.7418160
    
    
    
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ==================================
     Item Analysis Results - Difficulty
     q.number difficulty_index
     1 Q1 0.86
     2 Q2 0.44
     3 Q3 0.70
     4 Q4 0.36
     5 Q5 0.46
     6 Q6 0.62
     7 Q7 0.60
     8 Q8 0.48
     9 Q9 0.64
     10 Q10 0.58
     11 Q11 0.70
     12 Q12 0.60
     13 Q13 0.36
     14 Q14 0.54
     15 Q15 0.78
     16 Q16 0.72
     17 Q17 0.64
     18 Q18 0.38
     19 Q19 0.58
     20 Q20 0.56
     21 Q21 0.38
     22 Q22 0.52
     23 avg_score 0.57
     Refer to 'Difficulty Plot' in the 'Plots' panel.
    
     As seen in the difficulty plot, none of the difficulty indices was found to be lower than 0.2.
    
     ======================================
     Item Analysis Results - Discrimination
     qnumber discrimination_index
     1 Q1 -0.12
     2 Q2 -0.25
     3 Q3 0.15
     4 Q4 0.08
     5 Q5 -0.16
     6 Q6 0.09
     7 Q7 0.08
     8 Q8 0.19
     9 Q9 0.33
     10 Q10 0.14
     11 Q11 0.30
     12 Q12 0.29
     13 Q13 0.53
     14 Q14 0.16
     15 Q15 0.17
     16 Q16 0.48
     17 Q17 -0.06
     18 Q18 0.57
     19 Q19 0.31
     20 Q20 0.55
     21 Q21 0.30
     22 Q22 0.28
     23 avg_score 1.00
     Refer to 'Discrimination Plot' in the 'Plots' panel.
    
     As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:
     [1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q10" "Q14" "Q15" "Q17"
    
    
     ======================
     --> 0 student(s) from the pre-test data and 1 student(s) from the post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     =================================================
     Pre-test/Post-test Scores' Descriptive Statistics
     # A tibble: 2 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 Post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     Refer to the boxplot in the 'Plots' panel to visually examine the descriptive statistics.
    
     ===================================
     Testing the Assumption of Normality
     # A tibble: 1 x 3
     variable statistic p
     <chr> <dbl> <dbl>
     1 avg_diff 0.948 0.0465
     ## Interpretation: the assumption of normality by group has NOT been met (p<0.05)
     Refer to the histogram and normal q-q plot to check the normality visually
    
     =====================
     Paired T-Test Results
    
     Paired t-test
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     t = -4.1859, df = 43, p-value = 0.0001378
     alternative hypothesis: true mean difference is not equal to 0
     95 percent confidence interval:
     -0.12396457 -0.04335361
     sample estimates:
     mean difference
     -0.08365909
    
     ## Sample Interpretation of the outputs above:
     --> The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p<0.001).
    
    
     _______________________________________
     ## The Shapiro-Wilk test result shows a violation of normality assumption (p=0.047). Although the t-test is known to be robust to a violation of the normality assumption, you may want to refer to the Wilcoxon signed rank test results below to be safe.
     =====================================
     Wilcoxon Signed Rank Sum Test Results
    
     Wilcoxon signed rank test with continuity correction
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     V = 142.5, p-value = 0.0003248
     alternative hypothesis: true location shift is not equal to 0
    
     ## A sample interpretation of the Wilcoxon signed rank test results above:
     --> The Wilcoxon signed rank test results above show that the pre-post difference was statistically significant (p<0.001).
    
    
     ==============================
     The number of students deleted
     n
     pre_data 0
     post_data 1
     post2_data 0
     --> 0 student(s) from the pre-test data, 1 student(s) from the post-test data, and 0 student(s) from the post2-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================================
     Descriptive Statistics: Average Scores
     # A tibble: 3 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     3 Post2 Score 44 0.409 0.909 0.682 0.137 0.691 0.121 0.018 0.037
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ==============================
     Results of Testing Assumptions
    
     -----------
     # Outliers:
     # A tibble: 2 x 5
     Time id Score is.outlier is.extreme
     <fct> <fct> <dbl> <lgl> <lgl>
     1 post 44 0.364 TRUE FALSE
     2 Post2 44 0.409 TRUE FALSE
     ## Interpretation: No extreme outlier was identified in your data.
    
     ---------------------------
     # Normality:
    
     Shapiro-Wilk normality test
    
     data: resid(res.aov)
     W = 0.98561, p-value = 0.1801
    
     --> Interpretation: the average test score was normally distributed at each time point, as assessed by Shapiro-Wilk test (p>0.05).
    
     If the sample size is greater than 50, it would be better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes (>50 in this case).
    
     --> Interpretation: if all the points fall in the plots above approximately along the reference line, you can assume normality.
    
     -------------
     # Sphericity:
     --> The assumption of sphericity has been checked during the computation of the ANOVA test (Mauchly's test has been run internally to assess the sphericity assumption). The Greenhouse-Geisser sphericity correction has then been automatically applied to factors violating the assumption of sphericity.
    
     ===============================================================
     Result of the main One-way Repeated Measures ANOVA (Parametric)
     ANOVA Table (type III tests)
    
     Effect DFn DFd F p p<.05 ges
     1 Time 1.18 50.6 23.302 4.54e-06 * 0.171
     --> Interpretation: The average test scores at the different time points of the intervention are statistically different: F(1.18, 50.6)=23.302, p<0.001, eta2(g)=0.171.
    
    
     --------------------
     Pairwise Comparisons
     # A tibble: 3 x 10
     .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
     * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
     1 Score Pre post 44 44 -4.19 43 1.38e-4 4.14e-4 ***
     2 Score Pre Post2 44 44 -5.31 43 3.66e-6 1.10e-5 ****
     3 Score post Post2 44 44 -4.58 43 3.92e-5 1.18e-4 ***
     --> Interpretation for 1: The average pre-test score (0.572) and the average post-test score (0.656) are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 2: The average pre-test score (0.572) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 3: The average post-test score (0.656) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001).
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
    
     ══ Failed tests ════════════════════════════════════════════════════════════════
     ── Error ('test-independent_samples.R:6:3'): the function of independent_samples() works ──
     Error in `t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup,
     mu = 0, alt = "two.sided", conf = 0.95, var.eq = T, paired = F)`: cannot use 'paired' in formula method
     Backtrace:
     ▆
     1. └─DBERlibR::independent_samples(...) at test-independent_samples.R:6:2
     2. ├─stats::t.test(...)
     3. └─stats:::t.test.formula(...)
    
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
     Error: Test failures
     Execution halted
Flavor: r-devel-linux-x86_64-debian-clang

Version: 0.1.3
Check: re-building of vignette outputs
Result: ERROR
    Error(s) in re-building vignettes:
     ...
    --- re-building ‘dberlibr-vignette.Rmd’ using rmarkdown
    
    Quitting from lines 321-323 [unnamed-chunk-17] (dberlibr-vignette.Rmd)
    Error: processing vignette 'dberlibr-vignette.Rmd' failed with diagnostics:
    cannot use 'paired' in formula method
    --- failed re-building ‘dberlibr-vignette.Rmd’
    
    SUMMARY: processing the following file failed:
     ‘dberlibr-vignette.Rmd’
    
    Error: Vignette re-building failed.
    Execution halted
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-debian-gcc

Version: 0.1.3
Check: tests
Result: ERROR
     Running ‘testthat.R’ [14s/29s]
    Running the tests in ‘tests/testthat.R’ failed.
    Complete output:
     > # This file is part of the standard setup for testthat.
     > # It is recommended that you do not modify it.
     > #
     > # Where should you do additional test configuration?
     > # Learn more about the roles of various files in:
     > # * https://r-pkgs.org/tests.html
     > # * https://testthat.r-lib.org/reference/test_package.html#special-files
     >
     > library(testthat)
     > library(DBERlibR)
     >
     > test_check("DBERlibR")
     ==============================
     The number of students deleted
     n
     treat_pre_data 0
     treat_post_data 1
     ctrl_pre_data 1
     ctrl_post_data 0
     --> 0 student(s) from the treatment group pre-test data, 1 student(s) from the treatment group post-test data, 1 student(s) from the control group pre-test data, and 0 student(s) from the control group post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     Pre-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.567 0.101 0.318 0.727
     2 Treatment 0.572 0.0982 0.364 0.773
     -------------------------
     Post-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.591 0.108 0.364 0.773
     2 Treatment 0.656 0.112 0.364 0.864
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics
    
    
     ===============================
     Results of Checking Assumptions
    
     # Linearity:
     Refer to the scatter plot to check linearity in the 'Plots' panel. If two lines look almost paralleled, they can be interpreted as meeting the assumption of linearity.
     ## Interpretation: if you are seeing a liner relationship between the covariate (i.e., pre-test scores for this analysis) and dependent variable (i.e., post-test scores for this analysis) for both treatment and control group in the plot, then you can say this assumption has been met or the data has not violated this assumption of linearity. If your relationships are not linear, you have violated this assumption, and an ANCOVA is not a suitable analysis. However, you might be able to coax your data to have a linear relationship by transforming the covariate, and still be able to run an ANCOVA.
     -------------------------
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: norm.all.aov$residuals
     W = 0.98453, p-value = 0.3273
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
     ---------------------------
     # Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 1 0.0084 0.9273
     93
     ## Interpretation: the assumption of equality of error variances has been met (p>0.05).
     ----------------------------------------
     # Homogeneity of Regression Slopes:
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 91 8.071 0.006 * 0.081
     2 avg_score_pre 1 91 1.597 0.210 0.017
     3 datagroup:avg_score_pre 1 91 0.549 0.461 0.006
     ## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).
     ----------
     # Outliers: No outlier has been found.
    
    
     ==================================
     Results of the main One-way ANCOVA
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 92 8.111 0.005 * 0.081
     2 avg_score_pre 1 92 1.605 0.208 0.017
     --------------------------
     # Estimated Marginal Means
     avg_score_pre datagroup emmean se df conf.low conf.high
     1 0.5693158 Control 0.5912314 0.01535562 92 0.5607338 0.6217290
     2 0.5693158 Treatment 0.6555045 0.01653250 92 0.6226695 0.6883395
     method
     1 Emmeans test
     2 Emmeans test
     --> A sample summary of the outputs/results above: The difference of post-test scores between the treatment and control groups turned out to be significant with pre-test scores being controlled: F(1,92)=8.111, p=0.005 (effect size=0.081). The adjusted marginal mean of post-test scores of the treatment group (0.66, SE=,0.02) was significantly different from that of the control group (0.59, SE=,0.02).
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     # A tibble: 4 x 5
     group variable n mean sd
     <chr> <fct> <dbl> <dbl> <dbl>
     1 Freshman average_score 11 0.562 0.091
     2 Junior average_score 14 0.594 0.103
     3 Senior average_score 10 0.532 0.113
     4 Sophomore average_score 15 0.573 0.087
     Refer to the boxplot in the 'Plots' panel.
    
    
     ==============================
     Results of Testing Assumptions
    
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: resid(one_way_anova)
     W = 0.96075, p-value = 0.09553
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
    
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
    
     -------------------------
     Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 3 0.371 0.7743
     46
     ## Interpretation: the assumption of equality of variances has been met (p>0.05).
    
     ===================================================================================
     Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)
     Df Sum Sq Mean Sq F value Pr(>F)
     factor(group) 3 0.0233 0.007775 0.805 0.497
     Residuals 46 0.4442 0.009657
     ----------------------------------------------
     Pairwide Comparisons (Equal variances assumed)
     Tukey multiple comparisons of means
     95% family-wise confidence level
    
     Fit: aov(formula = average_score ~ factor(group), data = data_original)
    
     $`factor(group)`
     diff lwr upr p adj
     Junior-Freshman 0.03223377 -0.07330199 0.13776952 0.8474823
     Senior-Freshman -0.03000909 -0.14445579 0.08443761 0.8969661
     Sophomore-Freshman 0.01069091 -0.09328546 0.11466728 0.9927074
     Senior-Junior -0.06224286 -0.17069336 0.04620765 0.4284339
     Sophomore-Junior -0.02154286 -0.11888016 0.07579445 0.9346245
     Sophomore-Senior 0.04070000 -0.06623364 0.14763364 0.7418160
    
    
    
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ==================================
     Item Analysis Results - Difficulty
     q.number difficulty_index
     1 Q1 0.86
     2 Q2 0.44
     3 Q3 0.70
     4 Q4 0.36
     5 Q5 0.46
     6 Q6 0.62
     7 Q7 0.60
     8 Q8 0.48
     9 Q9 0.64
     10 Q10 0.58
     11 Q11 0.70
     12 Q12 0.60
     13 Q13 0.36
     14 Q14 0.54
     15 Q15 0.78
     16 Q16 0.72
     17 Q17 0.64
     18 Q18 0.38
     19 Q19 0.58
     20 Q20 0.56
     21 Q21 0.38
     22 Q22 0.52
     23 avg_score 0.57
     Refer to 'Difficulty Plot' in the 'Plots' panel.
    
     As seen in the difficulty plot, none of the difficulty indixes was found to be lower than 0.2.
    
     ======================================
     Item Analysis Results - Discrimination
     qnumber discrimination_index
     1 Q1 -0.12
     2 Q2 -0.25
     3 Q3 0.15
     4 Q4 0.08
     5 Q5 -0.16
     6 Q6 0.09
     7 Q7 0.08
     8 Q8 0.19
     9 Q9 0.33
     10 Q10 0.14
     11 Q11 0.30
     12 Q12 0.29
     13 Q13 0.53
     14 Q14 0.16
     15 Q15 0.17
     16 Q16 0.48
     17 Q17 -0.06
     18 Q18 0.57
     19 Q19 0.31
     20 Q20 0.55
     21 Q21 0.30
     22 Q22 0.28
     23 avg_score 1.00
     Refer to 'Discrimination Plot' in the 'Plots' panel
    
     As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:
     [1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q10" "Q14" "Q15" "Q17"
    
    
     ======================
     --> 0 student(s) from the pre-test data and 1 student(s) from the post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     =================================================
     Pre-test/Post-test Scores' Descriptive Statistics
     # A tibble: 2 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 Post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     Refer to the boxplot in the 'Plots' panel to visually examine the descriptive statistics.
    
     ===================================
     Testing the Assumption of Normality
     # A tibble: 1 x 3
     variable statistic p
     <chr> <dbl> <dbl>
     1 avg_diff 0.948 0.0465
     ## Interpretation: the assumption of normality by group has NOT been met (p<0.05)
     Refer to the histogram and normal q-q plot to check the normality visually
    
     =====================
     Paired T-Test Results
    
     Paired t-test
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     t = -4.1859, df = 43, p-value = 0.0001378
     alternative hypothesis: true mean difference is not equal to 0
     95 percent confidence interval:
     -0.12396457 -0.04335361
     sample estimates:
     mean difference
     -0.08365909
    
     ## Sample Interpretation of the outputs above:
     --> The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p=0).
    
    
     _______________________________________
     ## The Shapiro-Wilk test result shows a violation of normality assumption (p=0.047). Although the t-test is known to be robust to a violation of the normality assumption, you may want to refer to the Wilcoxon signed rank test results below to be safe.
     =====================================
     Wilcoxon Signed Rank Sum Test Results
    
     Wilcoxon signed rank test with continuity correction
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     V = 142.5, p-value = 0.0003248
     alternative hypothesis: true location shift is not equal to 0
    
     ## A sample interpretation of the Wilcoxon signed rank test results above:
     --> The Wilcoxon signed rank test results above show that the pre-post difference was statistically significant (p=0).
    
    
     ==============================
     The number of students deleted
     n
     pre_data 0
     post_data 1
     post2_data 0
     --> 0 student(s) from the pre-test data, 1 student(s) from the post-test data, and 0 student(s) from the post2-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================================
     Descriptive Statistics: Average Scores
     # A tibble: 3 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     3 Post2 Score 44 0.409 0.909 0.682 0.137 0.691 0.121 0.018 0.037
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ==============================
     Results of Testing Assumptions
    
     -----------
     # Outliers:
     # A tibble: 2 x 5
     Time id Score is.outlier is.extreme
     <fct> <fct> <dbl> <lgl> <lgl>
     1 post 44 0.364 TRUE FALSE
     2 Post2 44 0.409 TRUE FALSE
     ## Interpretation: No extreme outlier was identified in your data.
    
     ---------------------------
     # Normality:
    
     Shapiro-Wilk normality test
    
     data: resid(res.aov)
     W = 0.98561, p-value = 0.1801
    
     --> Interpretation: the average test score was normally distributed at each time point, as assessed by Shapiro-Wilk test (p>0.05).
    
     If the sample size is greater than 50, it would be better refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect the normality. This is because the Shapiro-Wilk test becomes very sensitive to a minor deviation from normality at a larger sample size (>50 in this case).
    
     --> Interpretation: if all the points fall in the plots above approximately along the reference line, you can assume normality.
    
     -------------
     # Sphericity:
     --> The assumption of sphericity has been checked during the computation of the ANOVA test (the Mauchly's test has been internally run to assess the sphericity assumption). Then, the Greenhouse-Geisser sphericity correction has been automatically applied to factors violating the sphericity of assumption.
    
     ===============================================================
     Result of the main One-way Repeated Measures ANOVA (Parametric)
     ANOVA Table (type III tests)
    
     Effect DFn DFd F p p<.05 ges
     1 Time 1.18 50.6 23.302 4.54e-06 * 0.171
     --> Interpretation: The average test scores at the different time points of the intervention are statistically different: F(1.18, 50.6)=23.302, p<0.001, eta2(g)=0.171.
    
    
     --------------------
     Pairwise Comparisons
     # A tibble: 3 x 10
     .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
     * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
     1 Score Pre post 44 44 -4.19 43 1.38e-4 4.14e-4 ***
     2 Score Pre Post2 44 44 -5.31 43 3.66e-6 1.10e-5 ****
     3 Score post Post2 44 44 -4.58 43 3.92e-5 1.18e-4 ***
     --> Interpretation for 1: The average pre-test score (0.572) and the average post-test score (0.656) are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 2: The average pre-test score (0.572) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 3: The average post-test score (0.656) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001).
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
    
     ══ Failed tests ════════════════════════════════════════════════════════════════
     ── Error ('test-independent_samples.R:6:3'): the function of independent_samples() works ──
     Error in `t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup,
     mu = 0, alt = "two.sided", conf = 0.95, var.eq = T, paired = F)`: cannot use 'paired' in formula method
     Backtrace:
     ▆
     1. └─DBERlibR::independent_samples(...) at test-independent_samples.R:6:2
     2. ├─stats::t.test(...)
     3. └─stats:::t.test.formula(...)
    
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
     Error: Test failures
     Execution halted
Flavor: r-devel-linux-x86_64-debian-gcc
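The root cause of the failure above is that the formula method of `t.test()` in R-devel rejects the `paired` argument outright. Since `paired = FALSE` is already the default for a two-sample test, the fix is simply to stop passing it to the formula method (or to call the default method on the two group vectors, where `paired` is still accepted). A minimal sketch, using hypothetical data shaped like the `group_data_binded` object in the backtrace:

```r
# Hypothetical stand-in for the 'group_data_binded' object in the backtrace
set.seed(42)
group_data_binded <- data.frame(
  avg_score_post = c(rnorm(48, 0.59, 0.11), rnorm(47, 0.66, 0.11)),
  datagroup = factor(rep(c("Control", "Treatment"), c(48, 47)))
)

# Fixed formula call: omit 'paired' (FALSE is the default for two samples)
t.test(avg_score_post ~ datagroup, data = group_data_binded,
       mu = 0, alternative = "two.sided", conf.level = 0.95,
       var.equal = TRUE)

# Equivalent default-method call, where 'paired' is still accepted
with(group_data_binded,
     t.test(avg_score_post[datagroup == "Control"],
            avg_score_post[datagroup == "Treatment"],
            var.equal = TRUE, paired = FALSE))
```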

Version: 0.1.3
Check: examples
Result: ERROR
    Running examples in ‘DBERlibR-Ex.R’ failed
    The error most likely occurred in:
    
    > ### Name: independent_samples
    > ### Title: Independent Samples Data Analysis
    > ### Aliases: independent_samples
    >
    > ### ** Examples
    >
    > # Run the following codes directly in the console panel. The plots
    > # generated through the link above may be displaced depending on the screen
    > # resolution.
    > independent_samples(treat_csv_data =
    + system.file("extdata", "data_treat_post.csv", package = "DBERlibR"),
    + ctrl_csv_data =
    + system.file("extdata", "data_ctrl_post.csv", package = "DBERlibR"),
    + m_cutoff = 0.15)
    
    
    
    
    Error in t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup, :
     cannot use 'paired' in formula method
    Calls: independent_samples -> t.test -> t.test.formula
    Execution halted
Flavors: r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-devel-windows-x86_64

Version: 0.1.3
Check: tests
Result: ERROR
     Running ‘testthat.R’ [20s/86s]
    Running the tests in ‘tests/testthat.R’ failed.
    Complete output:
     > # This file is part of the standard setup for testthat.
     > # It is recommended that you do not modify it.
     > #
     > # Where should you do additional test configuration?
     > # Learn more about the roles of various files in:
     > # * https://r-pkgs.org/tests.html
     > # * https://testthat.r-lib.org/reference/test_package.html#special-files
     >
     > library(testthat)
     > library(DBERlibR)
     >
     > test_check("DBERlibR")
     ==============================
     The number of students deleted
     n
     treat_pre_data 0
     treat_post_data 1
     ctrl_pre_data 1
     ctrl_post_data 0
     --> 0 student(s) from the treatment group pre-test data, 1 student(s) from the treatment group post-test data, 1 student(s) from the control group pre-test data, and 0 student(s) from the control group post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     Pre-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.567 0.101 0.318 0.727
     2 Treatment 0.572 0.0982 0.364 0.773
     -------------------------
     Post-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.591 0.108 0.364 0.773
     2 Treatment 0.656 0.112 0.364 0.864
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ===============================
     Results of Checking Assumptions
    
     # Linearity:
     Refer to the scatter plot in the 'Plots' panel to check linearity. If the two lines look almost parallel, the assumption of linearity can be considered met.
     ## Interpretation: if you see a linear relationship between the covariate (i.e., pre-test scores for this analysis) and the dependent variable (i.e., post-test scores for this analysis) for both the treatment and control groups in the plot, the assumption of linearity has been met. If the relationships are not linear, the assumption has been violated and an ANCOVA is not a suitable analysis; however, you might be able to coax your data into a linear relationship by transforming the covariate and still be able to run an ANCOVA.
     -------------------------
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: norm.all.aov$residuals
     W = 0.98453, p-value = 0.3273
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
     ---------------------------
     # Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 1 0.0084 0.9273
     93
     ## Interpretation: the assumption of equality of error variances has been met (p>0.05).
     ----------------------------------------
     # Homogeneity of Regression Slopes:
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 91 8.071 0.006 * 0.081
     2 avg_score_pre 1 91 1.597 0.210 0.017
     3 datagroup:avg_score_pre 1 91 0.549 0.461 0.006
     ## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).
     ----------
     # Outliers: No outlier has been found.
    
    
     ==================================
     Results of the main One-way ANCOVA
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 92 8.111 0.005 * 0.081
     2 avg_score_pre 1 92 1.605 0.208 0.017
     --------------------------
     # Estimated Marginal Means
     avg_score_pre datagroup emmean se df conf.low conf.high
     1 0.5693158 Control 0.5912314 0.01535562 92 0.5607338 0.6217290
     2 0.5693158 Treatment 0.6555045 0.01653250 92 0.6226695 0.6883395
     method
     1 Emmeans test
     2 Emmeans test
     --> A sample summary of the outputs/results above: The difference in post-test scores between the treatment and control groups turned out to be significant with pre-test scores being controlled: F(1,92)=8.111, p=0.005 (effect size=0.081). The adjusted marginal mean of post-test scores of the treatment group (0.66, SE=0.02) was significantly different from that of the control group (0.59, SE=0.02).
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     # A tibble: 4 x 5
     group variable n mean sd
     <chr> <fct> <dbl> <dbl> <dbl>
     1 Freshman average_score 11 0.562 0.091
     2 Junior average_score 14 0.594 0.103
     3 Senior average_score 10 0.532 0.113
     4 Sophomore average_score 15 0.573 0.087
     Refer to the boxplot in the 'Plots' panel.
    
    
     ==============================
     Results of Testing Assumptions
    
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: resid(one_way_anova)
     W = 0.96075, p-value = 0.09553
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
    
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
    
     -------------------------
     Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 3 0.371 0.7743
     46
     ## Interpretation: the assumption of equality of variances has been met (p>0.05).
    
     ===================================================================================
     Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)
     Df Sum Sq Mean Sq F value Pr(>F)
     factor(group) 3 0.0233 0.007775 0.805 0.497
     Residuals 46 0.4442 0.009657
     ----------------------------------------------
     Pairwise Comparisons (Equal variances assumed)
     Tukey multiple comparisons of means
     95% family-wise confidence level
    
     Fit: aov(formula = average_score ~ factor(group), data = data_original)
    
     $`factor(group)`
     diff lwr upr p adj
     Junior-Freshman 0.03223377 -0.07330199 0.13776952 0.8474823
     Senior-Freshman -0.03000909 -0.14445579 0.08443761 0.8969661
     Sophomore-Freshman 0.01069091 -0.09328546 0.11466728 0.9927074
     Senior-Junior -0.06224286 -0.17069336 0.04620765 0.4284339
     Sophomore-Junior -0.02154286 -0.11888016 0.07579445 0.9346245
     Sophomore-Senior 0.04070000 -0.06623364 0.14763364 0.7418160
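The one-way ANOVA and Tukey comparisons above correspond to a standard `aov()` + `TukeyHSD()` pipeline. A minimal sketch with made-up data (the column names `average_score` and `group` are taken from the `Fit:` line; the data are illustrative):

```r
# Made-up data shaped like the 'data_original' object in the Fit: line
set.seed(1)
data_original <- data.frame(
  average_score = runif(50, 0.3, 0.8),
  group = sample(c("Freshman", "Sophomore", "Junior", "Senior"),
                 50, replace = TRUE)
)

one_way_anova <- aov(average_score ~ factor(group), data = data_original)
summary(one_way_anova)       # omnibus F test (Df, Sum Sq, F value, Pr(>F))
TukeyHSD(one_way_anova)      # pairwise comparisons at a 95% family-wise level
```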
    
    
    
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ==================================
     Item Analysis Results - Difficulty
     q.number difficulty_index
     1 Q1 0.86
     2 Q2 0.44
     3 Q3 0.70
     4 Q4 0.36
     5 Q5 0.46
     6 Q6 0.62
     7 Q7 0.60
     8 Q8 0.48
     9 Q9 0.64
     10 Q10 0.58
     11 Q11 0.70
     12 Q12 0.60
     13 Q13 0.36
     14 Q14 0.54
     15 Q15 0.78
     16 Q16 0.72
     17 Q17 0.64
     18 Q18 0.38
     19 Q19 0.58
     20 Q20 0.56
     21 Q21 0.38
     22 Q22 0.52
     23 avg_score 0.57
     Refer to 'Difficulty Plot' in the 'Plots' panel.
    
     As seen in the difficulty plot, none of the difficulty indices was found to be lower than 0.2.
    
     ======================================
     Item Analysis Results - Discrimination
     qnumber discrimination_index
     1 Q1 -0.12
     2 Q2 -0.25
     3 Q3 0.15
     4 Q4 0.08
     5 Q5 -0.16
     6 Q6 0.09
     7 Q7 0.08
     8 Q8 0.19
     9 Q9 0.33
     10 Q10 0.14
     11 Q11 0.30
     12 Q12 0.29
     13 Q13 0.53
     14 Q14 0.16
     15 Q15 0.17
     16 Q16 0.48
     17 Q17 -0.06
     18 Q18 0.57
     19 Q19 0.31
     20 Q20 0.55
     21 Q21 0.30
     22 Q22 0.28
     23 avg_score 1.00
     Refer to 'Discrimination Plot' in the 'Plots' panel.
    
     As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:
     [1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q10" "Q14" "Q15" "Q17"
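The two indices in the item analysis above follow the usual classical-test-theory definitions: difficulty is the proportion of students answering an item correctly, and discrimination can be computed as the item's correlation with the average score (which is why `avg_score` itself shows a discrimination of 1.00). A minimal sketch under those assumptions (the package's exact implementation may differ):

```r
# Toy binary response matrix: rows are students, columns are items (1 = correct)
resp <- data.frame(
  Q1 = c(1, 1, 1, 0, 1, 1, 1, 0),
  Q2 = c(0, 1, 0, 0, 1, 0, 1, 0),
  Q3 = c(1, 0, 1, 0, 1, 1, 0, 1)
)

# Difficulty index: proportion correct per item (low values = hard items)
difficulty_index <- colMeans(resp)

# Discrimination index: correlation of each item with the average score
avg_score <- rowMeans(resp)
discrimination_index <- sapply(resp, function(item) cor(item, avg_score))

round(difficulty_index, 2)
round(discrimination_index, 2)
```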
    
    
     ======================
     --> 0 student(s) from the pre-test data and 1 student(s) from the post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     =================================================
     Pre-test/Post-test Scores' Descriptive Statistics
     # A tibble: 2 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 Post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     Refer to the boxplot in the 'Plots' panel to visually examine the descriptive statistics.
    
     ===================================
     Testing the Assumption of Normality
     # A tibble: 1 x 3
     variable statistic p
     <chr> <dbl> <dbl>
     1 avg_diff 0.948 0.0465
     ## Interpretation: the assumption of normality has NOT been met (p<0.05).
     Refer to the histogram and the normal Q-Q plot to check normality visually.
    
     =====================
     Paired T-Test Results
    
     Paired t-test
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     t = -4.1859, df = 43, p-value = 0.0001378
     alternative hypothesis: true mean difference is not equal to 0
     95 percent confidence interval:
     -0.12396457 -0.04335361
     sample estimates:
     mean difference
     -0.08365909
    
     ## Sample Interpretation of the outputs above:
     --> The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p<0.001).
    
    
     _______________________________________
     ## The Shapiro-Wilk test result shows a violation of normality assumption (p=0.047). Although the t-test is known to be robust to a violation of the normality assumption, you may want to refer to the Wilcoxon signed rank test results below to be safe.
     =====================================
     Wilcoxon Signed Rank Sum Test Results
    
     Wilcoxon signed rank test with continuity correction
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     V = 142.5, p-value = 0.0003248
     alternative hypothesis: true location shift is not equal to 0
    
     ## A sample interpretation of the Wilcoxon signed rank test results above:
     --> The Wilcoxon signed rank test results above show that the pre-post difference was statistically significant (p<0.001).
    
    
     ==============================
     The number of students deleted
     n
     pre_data 0
     post_data 1
     post2_data 0
     --> 0 student(s) from the pre-test data, 1 student(s) from the post-test data, and 0 student(s) from the post2-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================================
     Descriptive Statistics: Average Scores
     # A tibble: 3 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     3 Post2 Score 44 0.409 0.909 0.682 0.137 0.691 0.121 0.018 0.037
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ==============================
     Results of Testing Assumptions
    
     -----------
     # Outliers:
     # A tibble: 2 x 5
     Time id Score is.outlier is.extreme
     <fct> <fct> <dbl> <lgl> <lgl>
     1 post 44 0.364 TRUE FALSE
     2 Post2 44 0.409 TRUE FALSE
     ## Interpretation: No extreme outlier was identified in your data.
    
     ---------------------------
     # Normality:
    
     Shapiro-Wilk normality test
    
     data: resid(res.aov)
     W = 0.98561, p-value = 0.1801
    
     --> Interpretation: the average test score was normally distributed at each time point, as assessed by the Shapiro-Wilk test (p>0.05).
    
     If the sample size is greater than 50, it is better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes (>50 in this case).
    
     --> Interpretation: if all the points fall in the plots above approximately along the reference line, you can assume normality.
    
     -------------
     # Sphericity:
     --> The assumption of sphericity was checked during the computation of the ANOVA (Mauchly's test was run internally to assess it), and the Greenhouse-Geisser correction was automatically applied to any factors violating the assumption of sphericity.
    
     ===============================================================
     Result of the main One-way Repeated Measures ANOVA (Parametric)
     ANOVA Table (type III tests)
    
     Effect DFn DFd F p p<.05 ges
     1 Time 1.18 50.6 23.302 4.54e-06 * 0.171
     --> Interpretation: The average test scores at the different time points of the intervention are statistically different: F(1.18, 50.6)=23.302, p<0.001, eta2(g)=0.171.
    
    
     --------------------
     Pairwise Comparisons
     # A tibble: 3 x 10
     .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
     * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
     1 Score Pre post 44 44 -4.19 43 1.38e-4 4.14e-4 ***
     2 Score Pre Post2 44 44 -5.31 43 3.66e-6 1.10e-5 ****
     3 Score post Post2 44 44 -4.58 43 3.92e-5 1.18e-4 ***
     --> Interpretation for 1: The average pre-test score (0.572) and the average post-test score (0.656) are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 2: The average pre-test score (0.572) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 3: The average post-test score (0.656) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001).
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
    
     ══ Failed tests ════════════════════════════════════════════════════════════════
     ── Error ('test-independent_samples.R:6:3'): the function of independent_samples() works ──
     Error in `t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup,
     mu = 0, alt = "two.sided", conf = 0.95, var.eq = T, paired = F)`: cannot use 'paired' in formula method
     Backtrace:
     ▆
     1. └─DBERlibR::independent_samples(...) at test-independent_samples.R:6:2
     2. ├─stats::t.test(...)
     3. └─stats:::t.test.formula(...)
    
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
     Error: Test failures
     Execution halted
Flavor: r-devel-linux-x86_64-fedora-clang
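The paired-samples analysis captured in the test output above reports a Wilcoxon signed-rank test alongside the paired t-test because the Shapiro-Wilk test on the paired differences was significant. That decision rule can be sketched as follows (the vectors and the 0.05 threshold are illustrative, not the package's internals):

```r
# Illustrative pre/post score vectors for the same 44 students
set.seed(7)
pre  <- runif(44, 0.35, 0.78)
post <- pmin(pre + rnorm(44, 0.08, 0.10), 1)

# Normality of the paired differences drives the choice of test
sw <- shapiro.test(post - pre)
if (sw$p.value > 0.05) {
  print(t.test(pre, post, paired = TRUE))       # parametric paired t-test
} else {
  print(wilcox.test(pre, post, paired = TRUE))  # non-parametric fallback
}
```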

Version: 0.1.3
Check: re-building of vignette outputs
Result: ERROR
    Error(s) in re-building vignettes:
    --- re-building ‘dberlibr-vignette.Rmd’ using rmarkdown
    
    Quitting from lines 321-323 [unnamed-chunk-17] (dberlibr-vignette.Rmd)
    Error: processing vignette 'dberlibr-vignette.Rmd' failed with diagnostics:
    cannot use 'paired' in formula method
    --- failed re-building ‘dberlibr-vignette.Rmd’
    
    SUMMARY: processing the following file failed:
     ‘dberlibr-vignette.Rmd’
    
    Error: Vignette re-building failed.
    Execution halted
Flavors: r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-devel-windows-x86_64
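The vignette fails on the same call as the examples and tests: `paired` passed to the formula method of `t.test()`. For code that genuinely needs a paired test through the formula interface, recent versions of R provide the `Pair()` construct instead of a `paired` argument. A hedged sketch with made-up data:

```r
# Made-up pre/post scores for the same students
set.seed(3)
scores <- data.frame(pre = runif(30, 0.3, 0.8))
scores$post <- pmin(scores$pre + rnorm(30, 0.05, 0.08), 1)

# Paired test through the formula interface: use Pair() on the left-hand
# side instead of a 'paired' argument (available in recent R versions)
t.test(Pair(pre, post) ~ 1, data = scores)
```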

Version: 0.1.3
Check: tests
Result: ERROR
     Running ‘testthat.R’ [23s/62s]
    Running the tests in ‘tests/testthat.R’ failed.
    Complete output:
     > # This file is part of the standard setup for testthat.
     > # It is recommended that you do not modify it.
     > #
     > # Where should you do additional test configuration?
     > # Learn more about the roles of various files in:
     > # * https://r-pkgs.org/tests.html
     > # * https://testthat.r-lib.org/reference/test_package.html#special-files
     >
     > library(testthat)
     > library(DBERlibR)
     >
     > test_check("DBERlibR")
     ==============================
     The number of students deleted
     n
     treat_pre_data 0
     treat_post_data 1
     ctrl_pre_data 1
     ctrl_post_data 0
     --> 0 student(s) from the treatment group pre-test data, 1 student(s) from the treatment group post-test data, 1 student(s) from the control group pre-test data, and 0 student(s) from the control group post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     Pre-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.567 0.101 0.318 0.727
     2 Treatment 0.572 0.0982 0.364 0.773
     -------------------------
     Post-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.591 0.108 0.364 0.773
     2 Treatment 0.656 0.112 0.364 0.864
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ===============================
     Results of Checking Assumptions
    
     # Linearity:
     Refer to the scatter plot in the 'Plots' panel to check linearity. If the two lines look almost parallel, the assumption of linearity can be considered met.
     ## Interpretation: if you see a linear relationship between the covariate (i.e., pre-test scores for this analysis) and the dependent variable (i.e., post-test scores for this analysis) for both the treatment and control groups in the plot, the assumption of linearity has been met. If the relationships are not linear, the assumption has been violated and an ANCOVA is not a suitable analysis; however, you might be able to coax your data into a linear relationship by transforming the covariate and still be able to run an ANCOVA.
     -------------------------
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: norm.all.aov$residuals
     W = 0.98453, p-value = 0.3273
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
     ---------------------------
     # Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 1 0.0084 0.9273
     93
     ## Interpretation: the assumption of equality of error variances has been met (p>0.05).
     ----------------------------------------
     # Homogeneity of Regression Slopes:
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 91 8.071 0.006 * 0.081
     2 avg_score_pre 1 91 1.597 0.210 0.017
     3 datagroup:avg_score_pre 1 91 0.549 0.461 0.006
     ## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).
     ----------
     # Outliers: No outlier has been found.
    
    
     ==================================
     Results of the main One-way ANCOVA
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 92 8.111 0.005 * 0.081
     2 avg_score_pre 1 92 1.605 0.208 0.017
     --------------------------
     # Estimated Marginal Means
     avg_score_pre datagroup emmean se df conf.low conf.high
     1 0.5693158 Control 0.5912314 0.01535562 92 0.5607338 0.6217290
     2 0.5693158 Treatment 0.6555045 0.01653250 92 0.6226695 0.6883395
     method
     1 Emmeans test
     2 Emmeans test
     --> A sample summary of the outputs/results above: The difference in post-test scores between the treatment and control groups turned out to be significant with pre-test scores being controlled: F(1,92)=8.111, p=0.005 (effect size=0.081). The adjusted marginal mean of post-test scores of the treatment group (0.66, SE=0.02) was significantly different from that of the control group (0.59, SE=0.02).
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ======================
     Descriptive Statistics
     # A tibble: 4 x 5
     group variable n mean sd
     <chr> <fct> <dbl> <dbl> <dbl>
     1 Freshman average_score 11 0.562 0.091
     2 Junior average_score 14 0.594 0.103
     3 Senior average_score 10 0.532 0.113
     4 Sophomore average_score 15 0.573 0.087
     Refer to the boxplot in the 'Plots' panel.
    
    
     ==============================
     Results of Testing Assumptions
    
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: resid(one_way_anova)
     W = 0.96075, p-value = 0.09553
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
    
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
    
     -------------------------
     Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 3 0.371 0.7743
     46
     ## Interpretation: the assumption of equality of variances has been met (p>0.05).
    
     ===================================================================================
     Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)
     Df Sum Sq Mean Sq F value Pr(>F)
     factor(group) 3 0.0233 0.007775 0.805 0.497
     Residuals 46 0.4442 0.009657
     ----------------------------------------------
     Pairwise Comparisons (Equal variances assumed)
     Tukey multiple comparisons of means
     95% family-wise confidence level
    
     Fit: aov(formula = average_score ~ factor(group), data = data_original)
    
     $`factor(group)`
     diff lwr upr p adj
     Junior-Freshman 0.03223377 -0.07330199 0.13776952 0.8474823
     Senior-Freshman -0.03000909 -0.14445579 0.08443761 0.8969661
     Sophomore-Freshman 0.01069091 -0.09328546 0.11466728 0.9927074
     Senior-Junior -0.06224286 -0.17069336 0.04620765 0.4284339
     Sophomore-Junior -0.02154286 -0.11888016 0.07579445 0.9346245
     Sophomore-Senior 0.04070000 -0.06623364 0.14763364 0.7418160
    
    
    
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data since they have more than 15% of skipped answers.
    
    
     ==================================
     Item Analysis Results - Difficulty
     q.number difficulty_index
     1 Q1 0.86
     2 Q2 0.44
     3 Q3 0.70
     4 Q4 0.36
     5 Q5 0.46
     6 Q6 0.62
     7 Q7 0.60
     8 Q8 0.48
     9 Q9 0.64
     10 Q10 0.58
     11 Q11 0.70
     12 Q12 0.60
     13 Q13 0.36
     14 Q14 0.54
     15 Q15 0.78
     16 Q16 0.72
     17 Q17 0.64
     18 Q18 0.38
     19 Q19 0.58
     20 Q20 0.56
     21 Q21 0.38
     22 Q22 0.52
     23 avg_score 0.57
     Refer to 'Difficulty Plot' in the 'Plots' panel.
    
     As seen in the difficulty plot, none of the difficulty indices was found to be lower than 0.2.
    
     ======================================
     Item Analysis Results - Discrimination
     qnumber discrimination_index
     1 Q1 -0.12
     2 Q2 -0.25
     3 Q3 0.15
     4 Q4 0.08
     5 Q5 -0.16
     6 Q6 0.09
     7 Q7 0.08
     8 Q8 0.19
     9 Q9 0.33
     10 Q10 0.14
     11 Q11 0.30
     12 Q12 0.29
     13 Q13 0.53
     14 Q14 0.16
     15 Q15 0.17
     16 Q16 0.48
     17 Q17 -0.06
     18 Q18 0.57
     19 Q19 0.31
     20 Q20 0.55
     21 Q21 0.30
     22 Q22 0.28
     23 avg_score 1.00
     Refer to 'Discrimination Plot' in the 'Plots' panel.
    
     As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:
     [1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q10" "Q14" "Q15" "Q17"
    
    
     ======================
     --> 0 student(s) from the pre-test data and 1 student(s) from the post-test data have been deleted since they have more than 15% of skipped answers.
    
    
     =================================================
     Pre-test/Post-test Scores' Descriptive Statistics
     # A tibble: 2 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 Post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     Refer to the boxplot in the 'Plots' panel to visually examine the descriptive statistics.
    
     ===================================
     Testing the Assumption of Normality
     # A tibble: 1 x 3
     variable statistic p
     <chr> <dbl> <dbl>
     1 avg_diff 0.948 0.0465
     ## Interpretation: the assumption of normality has NOT been met (p<0.05).
     Refer to the histogram and the normal Q-Q plot to check normality visually.
    
     =====================
     Paired T-Test Results
    
     Paired t-test
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     t = -4.1859, df = 43, p-value = 0.0001378
     alternative hypothesis: true mean difference is not equal to 0
     95 percent confidence interval:
     -0.12396457 -0.04335361
     sample estimates:
     mean difference
     -0.08365909
    
     ## Sample Interpretation of the outputs above:
     --> The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p<0.001).
    
    
     _______________________________________
     ## The Shapiro-Wilk test result shows a violation of normality assumption (p=0.047). Although the t-test is known to be robust to a violation of the normality assumption, you may want to refer to the Wilcoxon signed rank test results below to be safe.
     =====================================
     Wilcoxon Signed Rank Sum Test Results
    
     Wilcoxon signed rank test with continuity correction
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     V = 142.5, p-value = 0.0003248
     alternative hypothesis: true location shift is not equal to 0
    
     ## A sample interpretation of the Wilcoxon signed rank test results above:
     --> The Wilcoxon signed rank test results above show that the pre-post difference was statistically significant (p<0.001).
    
    
     ==============================
     The number of students deleted
     n
     pre_data 0
     post_data 1
     post2_data 0
     --> 0 student(s) from the pre-test data, 1 student(s) from the post-test data, and 0 student(s) from the post2-test data have been deleted since they have more than 15% of skipped answers.
    
    
     ======================================
     Descriptive Statistics: Average Scores
     # A tibble: 3 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     3 Post2 Score 44 0.409 0.909 0.682 0.137 0.691 0.121 0.018 0.037
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ==============================
     Results of Testing Assumptions
    
     -----------
     # Outliers:
     # A tibble: 2 x 5
     Time id Score is.outlier is.extreme
     <fct> <fct> <dbl> <lgl> <lgl>
     1 post 44 0.364 TRUE FALSE
     2 Post2 44 0.409 TRUE FALSE
     ## Interpretation: No extreme outlier was identified in your data.
    
     ---------------------------
     # Normality:
    
     Shapiro-Wilk normality test
    
     data: resid(res.aov)
     W = 0.98561, p-value = 0.1801
    
     --> Interpretation: the average test score was normally distributed at each time point, as assessed by the Shapiro-Wilk test (p>0.05).
    
     If the sample size is greater than 50, it would be better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes (>50 in this case).
    
     --> Interpretation: if all the points in the plots above fall approximately along the reference line, you can assume normality.
    
     -------------
     # Sphericity:
     --> The assumption of sphericity has been checked during the computation of the ANOVA test (Mauchly's test has been run internally to assess the sphericity assumption). The Greenhouse-Geisser sphericity correction has been automatically applied to factors violating the assumption of sphericity.
    
     ===============================================================
     Result of the main One-way Repeated Measures ANOVA (Parametric)
     ANOVA Table (type III tests)
    
     Effect DFn DFd F p p<.05 ges
     1 Time 1.18 50.6 23.302 4.54e-06 * 0.171
     --> Interpretation: The average test scores at the different time points of the intervention are statistically different: F(1.18, 50.6)=23.302, p<0.001, eta2(g)=0.171.
    
    
     --------------------
     Pairwise Comparisons
     # A tibble: 3 x 10
     .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
     * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
     1 Score Pre post 44 44 -4.19 43 1.38e-4 4.14e-4 ***
     2 Score Pre Post2 44 44 -5.31 43 3.66e-6 1.10e-5 ****
     3 Score post Post2 44 44 -4.58 43 3.92e-5 1.18e-4 ***
     --> Interpretation for 1: The average pre-test score (0.572) and the average post-test score (0.656) are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 2: The average pre-test score (0.572) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 3: The average post-test score (0.656) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001).
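
     The pairwise table above (paired comparisons with adjusted p-values) can be reproduced in spirit with base R's `pairwise.t.test()`. The long-format data below are simulated, and the Bonferroni adjustment is an assumption -- the package may apply a different correction:

     ```r
     # Sketch: adjusted pairwise paired comparisons across three time points.
     set.seed(1)
     n <- 44
     long <- data.frame(
       Score = c(rnorm(n, 0.57, 0.10), rnorm(n, 0.66, 0.11), rnorm(n, 0.69, 0.12)),
       Time  = factor(rep(c("Pre", "Post", "Post2"), each = n),
                      levels = c("Pre", "Post", "Post2"))
     )

     # paired = TRUE pairs observations by position within each Time level
     pw <- pairwise.t.test(long$Score, long$Time,
                           paired = TRUE, p.adjust.method = "bonferroni")
     print(pw$p.value)
     ```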
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
    
     ══ Failed tests ════════════════════════════════════════════════════════════════
     ── Error ('test-independent_samples.R:6:3'): the function of independent_samples() works ──
     Error in `t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup,
     mu = 0, alt = "two.sided", conf = 0.95, var.eq = T, paired = F)`: cannot use 'paired' in formula method
     Backtrace:
     ▆
     1. └─DBERlibR::independent_samples(...) at test-independent_samples.R:6:2
     2. ├─stats::t.test(...)
     3. └─stats:::t.test.formula(...)
    
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
     Error: Test failures
     Execution halted
Flavor: r-devel-linux-x86_64-fedora-gcc
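
The failure in these flavors comes from passing `paired` to the formula method of `t.test()`, which r-devel now rejects. A minimal sketch of two straightforward fixes, with invented data whose column names mirror the log:

```r
# Toy data mimicking the call in the backtrace (values are invented).
set.seed(7)
d <- data.frame(
  avg_score_post = c(rnorm(20, 0.59, 0.10), rnorm(20, 0.66, 0.11)),
  datagroup = factor(rep(c("Control", "Treatment"), each = 20))
)

# Fails on r-devel:
#   t.test(avg_score_post ~ datagroup, data = d, paired = FALSE)

# Fix 1: drop 'paired' from the formula call (FALSE is already the default):
res1 <- t.test(avg_score_post ~ datagroup, data = d,
               var.equal = TRUE, conf.level = 0.95)

# Fix 2: call the default method on two vectors, where 'paired' is allowed:
res2 <- t.test(d$avg_score_post[d$datagroup == "Control"],
               d$avg_score_post[d$datagroup == "Treatment"],
               var.equal = TRUE, paired = FALSE)

stopifnot(isTRUE(all.equal(res1$p.value, res2$p.value)))
```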

Version: 0.1.3
Check: tests
Result: ERROR
     Running 'testthat.R' [14s]
    Running the tests in 'tests/testthat.R' failed.
    Complete output:
     > # This file is part of the standard setup for testthat.
     > # It is recommended that you do not modify it.
     > #
     > # Where should you do additional test configuration?
     > # Learn more about the roles of various files in:
     > # * https://r-pkgs.org/tests.html
     > # * https://testthat.r-lib.org/reference/test_package.html#special-files
     >
     > library(testthat)
     > library(DBERlibR)
     >
     > test_check("DBERlibR")
     ==============================
     The number of students deleted
     n
     treat_pre_data 0
     treat_post_data 1
     ctrl_pre_data 1
     ctrl_post_data 0
     --> 0 student(s) from the treatment group pre-test data, 1 student(s) from the treatment group post-test data, 1 student(s) from the control group pre-test data, and 0 student(s) from the control group post-test data have been deleted because more than 15% of their answers were skipped.
    
    
     ======================
     Descriptive Statistics
     Pre-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.567 0.101 0.318 0.727
     2 Treatment 0.572 0.0982 0.364 0.773
     -------------------------
     Post-test scores by group
     # A tibble: 2 x 5
     datagroup mean sd min max
     <fct> <dbl> <dbl> <dbl> <dbl>
     1 Control 0.591 0.108 0.364 0.773
     2 Treatment 0.656 0.112 0.364 0.864
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ===============================
     Results of Checking Assumptions
    
     # Linearity:
     Refer to the scatter plot in the 'Plots' panel to check linearity. If the two lines look almost parallel, they can be interpreted as meeting the assumption of linearity.
     ## Interpretation: if you are seeing a linear relationship between the covariate (i.e., pre-test scores for this analysis) and the dependent variable (i.e., post-test scores for this analysis) for both the treatment and control groups in the plot, then you can say this assumption has been met, or that the data have not violated the assumption of linearity. If the relationships are not linear, you have violated this assumption, and an ANCOVA is not a suitable analysis. However, you might be able to coax your data into a linear relationship by transforming the covariate and still be able to run an ANCOVA.
     -------------------------
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: norm.all.aov$residuals
     W = 0.98453, p-value = 0.3273
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
     ---------------------------
     # Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 1 0.0084 0.9273
     93
     ## Interpretation: the assumption of equality of error variances has been met (p>0.05).
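
     The Levene output above is consistent with `car::leveneTest(center = "median")`; the same Brown-Forsythe-style statistic can be sketched in base R as a one-way ANOVA on absolute deviations from the group medians (toy data below):

     ```r
     # Sketch: median-centered Levene (Brown-Forsythe) test without 'car'.
     set.seed(3)
     y <- c(rnorm(48, 0.59, 0.11), rnorm(47, 0.66, 0.11))
     g <- factor(rep(c("Control", "Treatment"), c(48, 47)))

     z <- abs(y - ave(y, g, FUN = median))  # deviations from group medians
     lev <- anova(lm(z ~ g))                # F test on the deviations
     print(lev)
     ```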
     ----------------------------------------
     # Homogeneity of Regression Slopes:
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 91 8.071 0.006 * 0.081
     2 avg_score_pre 1 91 1.597 0.210 0.017
     3 datagroup:avg_score_pre 1 91 0.549 0.461 0.006
     ## Interpretation: there was homogeneity of regression slopes as the interaction term (i.e., datagroup:avg_score_pre) was not statistically significant (p>0.05).
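
     The slope-homogeneity check above fits a group-by-covariate interaction. A base-R sketch with simulated data (note that base `anova()` reports sequential Type I sums of squares, whereas the table above uses Type II):

     ```r
     # Sketch: test homogeneity of regression slopes via the interaction term.
     set.seed(5)
     n <- 95
     d <- data.frame(
       avg_score_pre = runif(n, 0.32, 0.78),
       datagroup = factor(rep_len(c("Control", "Treatment"), n))
     )
     d$avg_score_post <- 0.3 + 0.5 * d$avg_score_pre +
       0.06 * (d$datagroup == "Treatment") + rnorm(n, 0, 0.08)

     m <- lm(avg_score_post ~ datagroup * avg_score_pre, data = d)
     # The 'datagroup:avg_score_pre' row tests equal slopes;
     # p > 0.05 supports homogeneity of regression slopes.
     anova(m)
     ```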
     ----------
     # Outliers: No outlier has been found.
    
    
     ==================================
     Results of the main One-way ANCOVA
     ANOVA Table (type II tests)
    
     Effect DFn DFd F p p<.05 ges
     1 datagroup 1 92 8.111 0.005 * 0.081
     2 avg_score_pre 1 92 1.605 0.208 0.017
     --------------------------
     # Estimated Marginal Means
     avg_score_pre datagroup emmean se df conf.low conf.high
     1 0.5693158 Control 0.5912314 0.01535562 92 0.5607338 0.6217290
     2 0.5693158 Treatment 0.6555045 0.01653250 92 0.6226695 0.6883395
     method
     1 Emmeans test
     2 Emmeans test
     --> A sample summary of the outputs/results above: The difference in post-test scores between the treatment and control groups turned out to be significant with pre-test scores being controlled: F(1,92)=8.111, p=0.005 (effect size=0.081). The adjusted marginal mean of post-test scores of the treatment group (0.66, SE=0.02) was significantly different from that of the control group (0.59, SE=0.02).
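
     The estimated marginal means above evaluate each group's regression line at the grand mean of the covariate; under that assumption they can be sketched in base R without the emmeans machinery (simulated data):

     ```r
     # Sketch: covariate-adjusted group means from an ANCOVA model.
     set.seed(9)
     n <- 95
     d <- data.frame(
       avg_score_pre = runif(n, 0.32, 0.78),
       datagroup = factor(rep_len(c("Control", "Treatment"), n))
     )
     d$avg_score_post <- 0.3 + 0.5 * d$avg_score_pre +
       0.06 * (d$datagroup == "Treatment") + rnorm(n, 0, 0.08)

     m <- lm(avg_score_post ~ avg_score_pre + datagroup, data = d)

     # Predict each group's mean at the grand mean of the covariate.
     nd <- data.frame(avg_score_pre = mean(d$avg_score_pre),
                      datagroup = levels(d$datagroup))
     cbind(nd, emmean = predict(m, newdata = nd))
     ```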
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data because more than 15% of their answers were skipped.
    
    
     ======================
     Descriptive Statistics
     # A tibble: 4 x 5
     group variable n mean sd
     <chr> <fct> <dbl> <dbl> <dbl>
     1 Freshman average_score 11 0.562 0.091
     2 Junior average_score 14 0.594 0.103
     3 Senior average_score 10 0.532 0.113
     4 Sophomore average_score 15 0.573 0.087
     Refer to the boxplot in the 'Plots' panel.
    
    
     ==============================
     Results of Testing Assumptions
    
     # Normality of Residuals:
    
     Shapiro-Wilk normality test
    
     data: resid(one_way_anova)
     W = 0.96075, p-value = 0.09553
    
     ## Interpretation: the assumption of normality by group has been met (p>0.05).
    
     Refer to the histogram and the normal Q-Q plot in the 'Plots' panel to visually inspect the normality of residuals.
    
     -------------------------
     Homogeneity of Variances:
     Levene's Test for Homogeneity of Variance (center = median)
     Df F value Pr(>F)
     group 3 0.371 0.7743
     46
     ## Interpretation: the assumption of equality of variances has been met (p>0.05).
    
     ===================================================================================
     Results of One-way ANOVA: Group Difference(s) (Parametric: Equal variances assumed)
     Df Sum Sq Mean Sq F value Pr(>F)
     factor(group) 3 0.0233 0.007775 0.805 0.497
     Residuals 46 0.4442 0.009657
     ----------------------------------------------
     Pairwise Comparisons (Equal variances assumed)
     Tukey multiple comparisons of means
     95% family-wise confidence level
    
     Fit: aov(formula = average_score ~ factor(group), data = data_original)
    
     $`factor(group)`
     diff lwr upr p adj
     Junior-Freshman 0.03223377 -0.07330199 0.13776952 0.8474823
     Senior-Freshman -0.03000909 -0.14445579 0.08443761 0.8969661
     Sophomore-Freshman 0.01069091 -0.09328546 0.11466728 0.9927074
     Senior-Junior -0.06224286 -0.17069336 0.04620765 0.4284339
     Sophomore-Junior -0.02154286 -0.11888016 0.07579445 0.9346245
     Sophomore-Senior 0.04070000 -0.06623364 0.14763364 0.7418160
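
     The omnibus ANOVA and the Tukey table above correspond to base R's `aov()` plus `TukeyHSD()`. A sketch on simulated class-year data (group labels mirror the log; scores are invented):

     ```r
     # Sketch: one-way ANOVA with Tukey HSD pairwise comparisons.
     set.seed(11)
     grp <- factor(rep(c("Freshman", "Sophomore", "Junior", "Senior"),
                       times = c(11, 15, 14, 10)))
     score <- round(rnorm(length(grp), mean = 0.57, sd = 0.10), 3)
     df <- data.frame(group = grp, average_score = score)

     fit <- aov(average_score ~ group, data = df)
     summary(fit)                       # omnibus F test
     TukeyHSD(fit, conf.level = 0.95)   # all pairwise mean differences
     ```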
    
    
    
    
    
     ==============================
     The number of students deleted: 0 student(s) has(have) been deleted from the data because more than 15% of their answers were skipped.
    
    
     ==================================
     Item Analysis Results - Difficulty
     q.number difficulty_index
     1 Q1 0.86
     2 Q2 0.44
     3 Q3 0.70
     4 Q4 0.36
     5 Q5 0.46
     6 Q6 0.62
     7 Q7 0.60
     8 Q8 0.48
     9 Q9 0.64
     10 Q10 0.58
     11 Q11 0.70
     12 Q12 0.60
     13 Q13 0.36
     14 Q14 0.54
     15 Q15 0.78
     16 Q16 0.72
     17 Q17 0.64
     18 Q18 0.38
     19 Q19 0.58
     20 Q20 0.56
     21 Q21 0.38
     22 Q22 0.52
     23 avg_score 0.57
     Refer to 'Difficulty Plot' in the 'Plots' panel.
    
     As seen in the difficulty plot, none of the difficulty indexes was found to be lower than 0.2.
    
     ======================================
     Item Analysis Results - Discrimination
     qnumber discrimination_index
     1 Q1 -0.12
     2 Q2 -0.25
     3 Q3 0.15
     4 Q4 0.08
     5 Q5 -0.16
     6 Q6 0.09
     7 Q7 0.08
     8 Q8 0.19
     9 Q9 0.33
     10 Q10 0.14
     11 Q11 0.30
     12 Q12 0.29
     13 Q13 0.53
     14 Q14 0.16
     15 Q15 0.17
     16 Q16 0.48
     17 Q17 -0.06
     18 Q18 0.57
     19 Q19 0.31
     20 Q20 0.55
     21 Q21 0.30
     22 Q22 0.28
     23 avg_score 1.00
     Refer to 'Discrimination Plot' in the 'Plots' panel.
    
     As seen in the discrimination plot, the following question items present a discrimination index lower than 0.2:
     [1] "Q1" "Q2" "Q3" "Q4" "Q5" "Q6" "Q7" "Q8" "Q10" "Q14" "Q15" "Q17"
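
     One common way to compute the two indices: difficulty as the proportion of correct answers per item, and discrimination as the corrected item-total correlation. The package's exact discrimination formula may differ (e.g., it may use an upper/lower group split instead). A sketch on simulated binary responses:

     ```r
     # Sketch: item difficulty and discrimination from 0/1 response data.
     set.seed(13)
     items <- matrix(rbinom(50 * 5, 1, prob = c(0.86, 0.44, 0.70, 0.36, 0.46)),
                     nrow = 50, ncol = 5, byrow = TRUE)
     colnames(items) <- paste0("Q", 1:5)

     difficulty <- colMeans(items)  # share of correct answers per item
     total <- rowSums(items)
     # corrected item-total correlation: item vs. total excluding that item
     discrimination <- apply(items, 2, function(q) cor(q, total - q))
     round(rbind(difficulty, discrimination), 2)
     ```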
    
    
     ======================
     --> 0 student(s) from the pre-test data and 1 student(s) from the post-test data have been deleted because more than 15% of their answers were skipped.
    
    
     =================================================
     Pre-test/Post-test Scores' Descriptive Statistics
     # A tibble: 2 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 Post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     Refer to the boxplot in the 'Plots' panel to visually examine the descriptive statistics.
    
     ===================================
     Testing the Assumption of Normality
     # A tibble: 1 x 3
     variable statistic p
     <chr> <dbl> <dbl>
     1 avg_diff 0.948 0.0465
     ## Interpretation: the assumption of normality by group has NOT been met (p<0.05).
     Refer to the histogram and the normal Q-Q plot to check normality visually.
    
     =====================
     Paired T-Test Results
    
     Paired t-test
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     t = -4.1859, df = 43, p-value = 0.0001378
     alternative hypothesis: true mean difference is not equal to 0
     95 percent confidence interval:
     -0.12396457 -0.04335361
     sample estimates:
     mean difference
     -0.08365909
    
     ## Sample Interpretation of the outputs above:
     --> The average pre-test score was 0.57 and the average post-test score was 0.66. The Paired Samples T-Test showed that the pre-post difference was statistically significant (p<0.001).
    
    
     _______________________________________
     ## The Shapiro-Wilk test result shows a violation of the normality assumption (p=0.047). Although the t-test is known to be robust to violations of the normality assumption, you may want to refer to the Wilcoxon signed rank test results below to be safe.
     =====================================
     Wilcoxon Signed Rank Test Results
    
     Wilcoxon signed rank test with continuity correction
    
     data: treat_data_merged$avg_score_pre and treat_data_merged$avg_score_post
     V = 142.5, p-value = 0.0003248
     alternative hypothesis: true location shift is not equal to 0
    
     ## A sample interpretation of the Wilcoxon signed rank test results above:
     --> The Wilcoxon signed rank test results above show that the pre-post difference was statistically significant (p<0.001).
    
    
     ==============================
     The number of students deleted
     n
     pre_data 0
     post_data 1
     post2_data 0
     --> 0 student(s) from the pre-test data, 1 student(s) from the post-test data, and 0 student(s) from the post2-test data have been deleted because more than 15% of their answers were skipped.
    
    
     ======================================
     Descriptive Statistics: Average Scores
     # A tibble: 3 x 11
     Time variable n min max median iqr mean sd se ci
     <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 Pre Score 44 0.364 0.773 0.568 0.136 0.572 0.098 0.015 0.03
     2 post Score 44 0.364 0.864 0.636 0.136 0.656 0.112 0.017 0.034
     3 Post2 Score 44 0.409 0.909 0.682 0.137 0.691 0.121 0.018 0.037
     Refer to the boxplots in the 'Plots' panel to visually inspect the descriptive statistics.
    
    
     ==============================
     Results of Testing Assumptions
    
     -----------
     # Outliers:
     # A tibble: 2 x 5
     Time id Score is.outlier is.extreme
     <fct> <fct> <dbl> <lgl> <lgl>
     1 post 44 0.364 TRUE FALSE
     2 Post2 44 0.409 TRUE FALSE
     ## Interpretation: No extreme outlier was identified in your data.
    
     ---------------------------
     # Normality:
    
     Shapiro-Wilk normality test
    
     data: resid(res.aov)
     W = 0.98561, p-value = 0.1801
    
     --> Interpretation: the average test score was normally distributed at each time point, as assessed by the Shapiro-Wilk test (p>0.05).
    
     If the sample size is greater than 50, it would be better to refer to the normal Q-Q plot displayed in the 'Plots' panel to visually inspect normality, because the Shapiro-Wilk test becomes very sensitive to minor deviations from normality at larger sample sizes (>50 in this case).
    
     --> Interpretation: if all the points in the plots above fall approximately along the reference line, you can assume normality.
    
     -------------
     # Sphericity:
     --> The assumption of sphericity has been checked during the computation of the ANOVA test (Mauchly's test has been run internally to assess the sphericity assumption). The Greenhouse-Geisser sphericity correction has been automatically applied to factors violating the assumption of sphericity.
    
     ===============================================================
     Result of the main One-way Repeated Measures ANOVA (Parametric)
     ANOVA Table (type III tests)
    
     Effect DFn DFd F p p<.05 ges
     1 Time 1.18 50.6 23.302 4.54e-06 * 0.171
     --> Interpretation: The average test scores at the different time points of the intervention are statistically different: F(1.18, 50.6)=23.302, p<0.001, eta2(g)=0.171.
    
    
     --------------------
     Pairwise Comparisons
     # A tibble: 3 x 10
     .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
     * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
     1 Score Pre post 44 44 -4.19 43 1.38e-4 4.14e-4 ***
     2 Score Pre Post2 44 44 -5.31 43 3.66e-6 1.10e-5 ****
     3 Score post Post2 44 44 -4.58 43 3.92e-5 1.18e-4 ***
     --> Interpretation for 1: The average pre-test score (0.572) and the average post-test score (0.656) are significantly different. The average post-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 2: The average pre-test score (0.572) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average pre-test score (p.adj<0.001).
     --> Interpretation for 3: The average post-test score (0.656) and the average post2-test score (0.691) are significantly different. The average post2-test score is significantly greater than the average post-test score (p.adj<0.001).
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
    
     ══ Failed tests ════════════════════════════════════════════════════════════════
     ── Error ('test-independent_samples.R:6:3'): the function of independent_samples() works ──
     Error in `t.test.formula(group_data_binded$avg_score_post ~ group_data_binded$datagroup,
     mu = 0, alt = "two.sided", conf = 0.95, var.eq = T, paired = F)`: cannot use 'paired' in formula method
     Backtrace:
     ▆
     1. └─DBERlibR::independent_samples(...) at test-independent_samples.R:6:2
     2. ├─stats::t.test(...)
     3. └─stats:::t.test.formula(...)
    
     [ FAIL 1 | WARN 1 | SKIP 0 | PASS 41 ]
     Error: Test failures
     Execution halted
Flavor: r-devel-windows-x86_64