Data Preparation
A clean, suitably structured, and well-documented data set is critical for efficient and accurate statistical analysis. Most commonly, data are imported into statistical analysis programs as a comma-delimited (CSV) text file or an Excel file. For easy and accurate import into statistical software, the data must follow a regular structure (for example, one header row naming the variables, then one row per record with the same number of fields) and use consistent entries.
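As an illustration of what "a regular structure" means for a comma-delimited file, the Python sketch below uses only the standard `csv` module to flag common structural problems before import: blank or duplicate column names and records whose field count differs from the header. The function name `check_csv_structure` and the specific checks are illustrative assumptions, not a required procedure; statistical packages apply their own import rules.

```python
import csv
import io


def check_csv_structure(csv_text):
    """Return a list of structural problems found in comma-delimited text.

    Illustrative checks only: a header row exists, column names are
    non-blank and unique, and every record has as many fields as the header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(name.strip() == "" for name in header):
        problems.append("header contains a blank column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")

    # Records are numbered from 1, not counting the header row.
    for record_no, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems


good = "id,age,sex\n1,34,F\n2,51,M\n"
bad = "id,age,sex\n1,34\n2,51,M,extra\n"
print(check_csv_structure(good))  # []
print(check_csv_structure(bad))
# ['record 1: expected 3 fields, found 2', 'record 2: expected 3 fields, found 4']
```

A check like this catches the most frequent import failures (shifted columns, stray commas, merged header cells exported as blanks) before they silently misalign variables in the analysis software.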
The University of California developed a free online tool that helps researchers create a comprehensive, easy-to-understand data management plan.
REDCap (Research Electronic Data Capture) is a secure, web-based application for building and managing online research databases, and it is supported by the CTSC Biomedical Informatics team. Using REDCap can greatly simplify data collection and minimize costly, time-consuming data clean-up.
Regardless of the software used to record data, developing and adhering to a data management plan will make it easier to import the data into statistical software. Every data set must include a data dictionary (also called a codebook) that describes each variable and identifies its acceptable values. An example codebook is also available, and additional information on data dictionaries can be found on the data management plan tool website.
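A data dictionary is most useful when the data are actually checked against it. The Python sketch below assumes a hypothetical dictionary format (a Python `dict` giving each variable's description, expected type, and acceptable values as a set or a numeric range) and reports undocumented or missing columns, blank values, wrong types, and out-of-range entries. The variable names `id`, `age`, and `sex`, their limits, and the rule that blank cells are errors are all illustrative assumptions; a real dictionary may define coded missing values instead.

```python
import csv
import io

# Hypothetical data dictionary: description, expected type, acceptable values.
DATA_DICTIONARY = {
    "id": {"description": "Participant identifier", "type": int},
    "age": {"description": "Age at enrollment, years", "type": int, "min": 18, "max": 100},
    "sex": {"description": "Sex (F = female, M = male)", "type": str, "allowed": {"F", "M"}},
}


def validate_against_dictionary(csv_text, dictionary):
    """Return a list of problems found when checking CSV text against a data dictionary."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []

    # Every documented variable should appear in the data, and vice versa.
    for name in dictionary:
        if name not in columns:
            errors.append(f"column {name!r} is in the data dictionary but not in the data")
    for name in columns:
        if name not in dictionary:
            errors.append(f"column {name!r} is not described in the data dictionary")

    present = [name for name in dictionary if name in columns]
    for record_no, row in enumerate(reader, start=1):
        for name in present:
            spec = dictionary[name]
            raw = row[name]  # None if the record is shorter than the header
            if raw is None or raw == "":
                errors.append(f"record {record_no}, {name}: missing value")
                continue
            try:
                value = spec["type"](raw)
            except ValueError:
                errors.append(
                    f"record {record_no}, {name}: {raw!r} is not a valid {spec['type'].__name__}"
                )
                continue
            if "allowed" in spec and value not in spec["allowed"]:
                errors.append(f"record {record_no}, {name}: {value!r} is not an acceptable value")
            if "min" in spec and value < spec["min"]:
                errors.append(f"record {record_no}, {name}: {value} is below the minimum of {spec['min']}")
            if "max" in spec and value > spec["max"]:
                errors.append(f"record {record_no}, {name}: {value} is above the maximum of {spec['max']}")
    return errors


sample = "id,age,sex\n1,34,F\n2,17,M\n3,abc,X\n"
for problem in validate_against_dictionary(sample, DATA_DICTIONARY):
    print(problem)
# record 2, age: 17 is below the minimum of 18
# record 3, age: 'abc' is not a valid int
# record 3, sex: 'X' is not an acceptable value
```

Keeping the dictionary in a machine-readable form like this means the same document that describes each variable can also drive automated checks, so the description and the data are less likely to drift apart.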
Additional tips for data management are available in the PDF document, “Guidance for Database Developers for Efficient Import to Statistical Software.”
Recommendations for organizing spreadsheet data to reduce errors and facilitate statistical analyses are available in the PDF documents “Data Organization in Spreadsheets” and “Biostatistics Center Guidelines for Excel and Access.”