A clean, suitably structured, and well-documented data set is critical for efficient and accurate statistical analysis. Most commonly, data is imported into statistical analysis programs as a comma-delimited text file. For data to import easily and accurately into statistical software, it must adhere to a regular structure with consistent entries.
While it is not required, using REDCap (Research Electronic Data Capture) can greatly simplify data collection and minimize costly and time-consuming data clean-up activities. REDCap is a secure web-based application for building and managing online databases for research and is supported by the CTSC Biomedical Informatics team.
Regardless of the software used to record the data, the data must follow a consistent format so that it can be imported into statistical software.
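As an illustration of what a consistent format means in practice, here is a minimal Python sketch, using only the standard library `csv` module, that flags a few common structural problems in a comma-delimited file: blank or duplicate column names, rows with the wrong number of fields, and cells padded with stray spaces. The function name `check_csv_structure` and the particular checks are hypothetical and chosen for illustration; they are not taken from the guidance documents listed below, and a real data set may need additional checks.

```python
import csv
import io


def check_csv_structure(text):
    """Return a list of structural problems found in comma-delimited text.

    Illustrative checks only (hypothetical, not exhaustive):
      * the header has no blank and no duplicate column names
      * every data row has the same number of fields as the header
      * no cell has leading or trailing spaces
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(name.strip() == "" for name in header):
        problems.append("header has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")

    # Line numbers count the header as line 1, as a spreadsheet would.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
        for name, cell in zip(header, row):
            if cell != cell.strip():
                problems.append(
                    f"line {line_no}: column '{name}' has leading/trailing spaces"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical example file: row 3 is missing a field, row 4 has a padded value.
    sample = "id,age,sex\n1,34,F\n2,41\n3, 29 ,M\n"
    for problem in check_csv_structure(sample):
        print(problem)
```

Run on the example text, the sketch reports the short row on line 3 and the padded age value on line 4; a file that passes these checks is far more likely to import cleanly.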
Guidance on how to format data is provided in the following resources:
- Guidance for Database Developers for Efficient Import to Statistical Software (PDF)
- Data Organization in Spreadsheets
- Biostats4You Data: Data Collection and Management
In addition, every data set must include a data dictionary that describes each variable and identifies acceptable values. Additional information on data dictionaries is available on the UC Davis REDCap website.
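Because a data dictionary records the acceptable values for each variable, it can also be used to check the data automatically. The Python sketch below assumes a hypothetical dictionary format (a plain `dict` mapping each variable name to its type and allowed values; this is not the REDCap data dictionary format) and checks one row, such as a row produced by the standard library's `csv.DictReader`, against it. The variable names, ranges, and codes are invented for illustration.

```python
# Hypothetical data dictionary: variable name -> rule describing acceptable values.
DATA_DICTIONARY = {
    "age": {"type": "integer", "min": 18, "max": 100},
    "sex": {"type": "category", "values": {"F", "M"}},
    "smoker": {"type": "category", "values": {"0", "1"}},  # 0 = no, 1 = yes
}


def validate_row(row, dictionary):
    """Return messages for values in `row` (a dict of strings) that `dictionary` does not allow."""
    errors = []
    for name, rule in dictionary.items():
        if name not in row:
            errors.append(f"{name}: variable missing")
            continue
        value = row[name]
        if rule["type"] == "integer":
            try:
                number = int(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not an integer")
                continue
            if not rule["min"] <= number <= rule["max"]:
                errors.append(f"{name}: {number} outside {rule['min']}-{rule['max']}")
        elif rule["type"] == "category":
            if value not in rule["values"]:
                errors.append(f"{name}: {value!r} is not an allowed code")

    # Variables present in the data but undocumented in the dictionary.
    for name in row:
        if name not in dictionary:
            errors.append(f"{name}: not defined in data dictionary")
    return errors


if __name__ == "__main__":
    print(validate_row({"age": "17", "sex": "female", "smoker": "1"}, DATA_DICTIONARY))
```

For the example row, the sketch flags the out-of-range age and the uncoded sex value ("female" instead of "F"), the kind of inconsistency that otherwise surfaces only during analysis.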
- Real Statistics Using Excel
This site provides detailed guidance on performing common statistical tests and procedures in Excel, including t-tests, ANOVA, repeated-measures ANOVA, correlation, simple and multiple linear regression, descriptive statistics, and confidence intervals. A free resource pack and example workbooks are available to download.
- JASP
JASP is a free, stand-alone statistical software package with a point-and-click graphical user interface built on the R statistical computing environment. It supports both frequentist and Bayesian statistical analysis. Supported procedures range from common methods such as summary statistics, correlations, two-sample tests, and linear and logistic regression to more advanced methods including mixed-effects models, structural equation models, meta-analysis, principal components analysis, and factor analysis.
- Interactive Statistical Calculation Pages
This resource is a comprehensive directory of websites for many statistical analyses, including power and sample size calculations. It has separate pages listing websites for interactive online analyses (“Interactive Stats”) and free software packages that can be downloaded and run on your local computer (“Free Software”), as well as links to many technical resources on statistics, including general introductory material.
Note: SPSS, SAS, and JMP can be obtained at a reasonable cost through UC Davis Information and Educational Technology.
A sufficient sample size is necessary for every study to ensure adequate power for detecting clinically meaningful differences. Excellent resources explaining the concepts of power and sample size and what they depend on are available at Biostats4You: Power and Sample Size Concepts.
These guides provide illustrated examples of how to estimate sample size requirements for the indicated statistical procedure using freely available software.
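To make the dependence of sample size on effect size, variability, significance level, and power concrete, the Python sketch below uses the standard normal-approximation formula for comparing two means with equal group sizes and a common standard deviation sigma, where delta is the difference to detect: n per group = 2 (z(1 - alpha/2) + z(1 - beta))^2 sigma^2 / delta^2, with power = 1 - beta. The function name `n_per_group_means` is hypothetical, and the sketch is an illustration rather than a replacement for the calculators listed below; tools such as G*Power base the calculation on the t distribution and may return a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation, equal group sizes, common standard deviation `sd`.
    """
    if delta == 0 or sd <= 0:
        raise ValueError("delta must be nonzero and sd must be positive")
    z = NormalDist().inv_cdf               # standard normal quantile function
    z_alpha = z(1 - alpha / 2)             # two-sided significance level
    z_beta = z(power)                      # power = 1 - beta
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)                    # always round up to whole participants


if __name__ == "__main__":
    # Detect a 5-unit difference when the SD is 10, alpha = 0.05, 80% power.
    print(n_per_group_means(delta=5, sd=10))  # 63
```

With these inputs the normal approximation gives 63 participants per group; an exact t-based calculation for the same standardized effect (0.5) typically gives 64. Halving delta roughly quadruples n, which is why the choice of a clinically meaningful difference matters so much.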
Sample Size Calculators
- Southwest Oncology Group Statistical Center Power and Sample Size Calculators
This resource provides online sample size/power calculators for one- and two-sample tests of means and proportions, as well as for simple survival analyses.
- G*Power: Statistical Power Analyses for Windows and Mac
This site provides freely downloadable software that is easy to use and comes with a detailed, helpful user manual. A wide range of statistical procedures is supported, including common tests of means and proportions as well as multiple linear regression, logistic regression, and Poisson regression.
- UCSF Sample Size Calculators for Designing Clinical Trials
This site provides sample size calculators for the following settings: one-group, two-independent-group, and paired designs; tests of means, proportions, and correlations; clustered data; confidence intervals; survival analysis; likelihood ratios (diagnostic test accuracy); posterior probability of disease; and pediatric growth.
- Sealed Envelope
This website provides online tools for estimating the sample size needed for superiority, equivalence, and non-inferiority clinical trials with binary or continuous outcomes.
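Several of the calculators above handle binary outcomes. As a rough illustration of what they compute, the Python sketch below uses a simple normal-approximation formula for comparing two proportions p1 and p2 with equal group sizes: n per group = (z(1 - alpha/2) + z(1 - beta))^2 [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. The function name `n_per_group_proportions` is hypothetical, and this unpooled-variance version without continuity correction is only one of several textbook variants; the listed calculators may use pooled variances, continuity corrections, or exact methods and can therefore return somewhat different numbers.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two proportions.

    Normal approximation with unpooled variances and no continuity correction.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct proportions strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical trial: response rate 50% vs 30%, alpha = 0.05, 80% power.
    print(n_per_group_proportions(0.5, 0.3))  # 91
```

For a 50% versus 30% response rate, this sketch gives 91 per group; comparing that figure with the output of one of the calculators above is a quick way to see how much the choice of formula matters.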
- Biostats4You: Statistical resources for non-statisticians
Biostats4You caters specifically to medical and public health researchers and professionals who wish to learn more about biostatistics. The site contains carefully selected and reviewed educational materials especially suited for a non-statistician audience.
- UCLA’s Institute for Digital Research and Education
This site provides a wealth of information on conducting statistical analyses using SAS, R, SPSS, Stata, and Mplus. Each example introduces a motivating data set, provides code to analyze the data in one of the statistical packages, and walks through interpreting the output.
- The Little Handbook of Statistical Practice
This online handbook offers overviews of common statistical analyses, with applied examples and discussions of various topics relevant to applied data analysis.
- Columbia University Biostatistics for Clinical Researchers
This site hosts an online video seminar series from Columbia University Irving Institute for Clinical and Translational Research covering a broad range of statistical topics.
- Medical College of Wisconsin
This YouTube seminar series covers a variety of topics, including longitudinal analysis, survival analysis, propensity scores, Bayesian statistics, linear regression, sample size calculations, ANOVA, multiple comparisons, and logistic regression.