Data Management in Clinical Research: An Overview
Clinical Data Management (CDM) is a critical stage of clinical research that generates high-quality, reliable, and statistically sound data from clinical trials. It helps reduce the time from drug development to marketing. Throughout a trial, the quality of the various CDM procedures, such as Case Report Form (CRF) design, CRF annotation, database design, data entry, data validation, discrepancy management, medical coding, data extraction, and database locking, is assessed at regular intervals.
Today, there is a greater need to improve CDM standards to meet regulatory requirements and stay ahead of the competition through faster product commercialization. This article gives the reader an overview of the processes and tools used in clinical data management in clinical research.
What is Clinical Data Management in Clinical Research?
CDM is the process of collecting, cleaning, and managing subject data in compliance with regulatory standards. The main goal of CDM processes is to provide high-quality data by keeping errors and missing data to a minimum while gathering as much data as possible for analysis. Best practices are implemented using software applications that maintain an audit trail and make data discrepancies easy to identify and resolve. These practices allow CDM teams to handle large trials and ensure data quality even in complex trials.
How do we define ‘high-quality’ data?
High-quality data should be accurate and suitable for statistical analysis. It must meet the parameters and requirements specified in the protocol. If a deviation occurs, such as a failure to meet protocol specifications, excluding that patient’s data from the final database may be considered.
High-quality data should also have minimal or no missing values. Most importantly, it should show only an arbitrarily defined ‘acceptable level of variation’ that does not affect the conclusions of the study’s statistical analysis. The data should also meet the applicable regulatory requirements for data quality.
Tools for Clinical Data Management in Clinical Research
Many software tools for clinical trial data management are available; they are called Clinical Data Management Systems (CDMS). In multicentric trials, a CDMS has become essential for handling the large volume of data. Most CDMS used in pharmaceutical companies are commercial. ORACLE CLINICAL, CLINTRIAL, MACRO, RAVE, and eClinical Suite are common CDM tools.
These tools are broadly similar in functionality, and no single system has a clear advantage over the others. They are costly and need sophisticated information technology infrastructure to function. In addition, some multinational pharmaceutical companies use custom-made CDMS tools to suit their operational requirements and procedures. The best-known open source tools are OpenClinica, openCDMS, TrialDB, and PhOSCo. These packages are free to download from their respective websites and offer functionality comparable to that of their commercial counterparts.
Process: Clinical Data Management in Clinical Research
Like a clinical trial, the CDM process is planned with the end result in mind: the whole process is designed around the final deliverable. To achieve this goal, the CDM process starts early, even before the study protocol is finalized.
Review and finalization of study documents | clinical data management in clinical research
During this review, the protocol is examined from the database design perspective for consistency and clarity, and CDM personnel identify the data items to be collected and how often they will be collected. As the first step, the CDM team designs a Case Report Form (CRF), translating protocol-specific activities into data to be captured. The data fields should be clearly defined and consistent throughout, and the CRF should make evident what kind of data needs to be entered in each field.
Database designing | clinical data management in clinical research
Databases are clinical software applications built to make it easier for the CDM team to run multiple studies. In general, these tools are easy to use and have built-in compliance with regulatory requirements. To ensure data security, “system validation” is performed, in which system specifications, user requirements, and regulatory compliance are evaluated before implementation. Study details such as objectives, intervals, visits, investigators, sites, and patients are defined in the database, and CRF layouts are designed for data entry. These entry screens are tested with dummy data before real data capture begins.
Data collection | clinical data management in clinical research
The CRF, which may be paper or electronic, is used to collect data. The traditional method uses paper CRFs, which the investigator completes according to the completion guidelines; the responses are then transcribed into the database through in-house data entry. In e-CRF-based CDM (also called remote data entry), the investigator or a designee logs into the CDM system and enters data directly at the site. The e-CRF method reduces the chance of errors and speeds up the resolution of discrepancies. Many pharmaceutical companies are opting for e-CRFs to reduce the time required for drug development.
CRF tracking | clinical data management in clinical research
The Clinical Research Associate (CRA) checks the CRF entries for completeness before retrieving the CRFs and handing them over to the CDM team. The CDM team tracks the retrieved CRFs and keeps a record of them. CRFs are manually checked for missing pages and illegible data to ensure that no data is lost. If data is missing or illegible, the investigator is asked for clarification, and the problem is resolved.
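The missing-page check described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_missing_pages`, the subject identifiers, and the assumption that every subject has the same numbered set of CRF pages are all hypothetical and not taken from any particular CDMS.

```python
from typing import Dict, Iterable, List


def find_missing_pages(
    expected_pages: Iterable[int],
    received: Dict[str, Iterable[int]],
) -> Dict[str, List[int]]:
    """Return, per subject, the expected CRF pages that were not received.

    Assumption for this sketch: every subject has the same set of pages.
    """
    expected = set(expected_pages)
    missing: Dict[str, List[int]] = {}
    for subject_id, pages in received.items():
        gap = sorted(expected - set(pages))
        if gap:  # only report subjects with at least one missing page
            missing[subject_id] = gap
    return missing


# Hypothetical tracking log: subject 002 is missing pages 3 and 5.
tracking_log = {"001": [1, 2, 3, 4, 5], "002": [1, 2, 4]}
missing_pages = find_missing_pages(range(1, 6), tracking_log)
print(missing_pages)  # {'002': [3, 5]}
```

Each entry in the result would prompt a follow-up with the site or the investigator.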
Data entry | clinical data management in clinical research
Data entry is carried out according to guidelines prepared along with the Data Management Plan (DMP). It applies only to paper CRFs retrieved from the sites. Double data entry is typically used, in which two operators enter the data independently. The second-pass entry, made by a different person, supports verification and reconciliation by identifying transcription errors and discrepancies caused by illegible data. Double data entry also yields a cleaner database than single data entry: studies have reported a lower error rate, indicating better consistency with the paper CRF.
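As a rough illustration of how the two passes might be reconciled, the Python sketch below compares two entries of the same CRF page field by field. The field names and the `compare_entries` helper are assumptions made for this example; real CDMS tools provide their own comparison and reconciliation screens.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def compare_entries(first_pass: Record, second_pass: Record) -> List[Tuple[str, str, str]]:
    """Return (field, first_value, second_value) for every field that differs."""
    mismatches = []
    # Use the union of fields so a field skipped in one pass is flagged too.
    for field_name in sorted(set(first_pass) | set(second_pass)):
        first_value = first_pass.get(field_name, "").strip()
        second_value = second_pass.get(field_name, "").strip()
        if first_value != second_value:
            mismatches.append((field_name, first_value, second_value))
    return mismatches


# Hypothetical entries of the same paper CRF page by two operators.
first = {"subject_id": "001", "weight_kg": "72.5", "visit_date": "2023-03-14"}
second = {"subject_id": "001", "weight_kg": "75.2", "visit_date": "2023-03-14"}
mismatches = compare_entries(first, second)
print(mismatches)  # [('weight_kg', '72.5', '75.2')]
```

Each mismatch would then be checked against the paper CRF to decide which value, if either, is correct.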
Data validation | clinical data management in clinical research
In data validation, data is tested for validity against protocol specifications. Edit check programs are written to identify discrepancies in the entered data, following the logic conditions specified in the Data Validation Plan (DVP). These programs are first tested with dummy data that contains known errors. A discrepancy is a data point that fails a validation check. Discrepancies can arise from inconsistent data, missing data, out-of-range values, and protocol deviations.
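To make the idea of an edit check concrete, here is a minimal Python sketch. The check identifiers, the 18 to 65 age range, and the field names are invented for illustration; in practice the logic conditions come from the study's own validation plan and are implemented in the CDMS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


@dataclass
class EditCheck:
    check_id: str
    message: str
    passes: Callable[[Record], bool]  # returns True when the record is acceptable


# Hypothetical checks covering missing data, a range check, and consistency.
EDIT_CHECKS = [
    EditCheck("MISS_AGE", "Age is missing",
              lambda r: r.get("age") is not None),
    EditCheck("RANGE_AGE", "Age outside the assumed range of 18 to 65",
              lambda r: r.get("age") is None or 18 <= r["age"] <= 65),
    EditCheck("BP_CONSIST", "Systolic BP must exceed diastolic BP",
              lambda r: r.get("sbp") is None or r.get("dbp") is None
              or r["sbp"] > r["dbp"]),
]


def run_edit_checks(record: Record) -> List[str]:
    """Return the IDs of all edit checks that the record fails."""
    return [check.check_id for check in EDIT_CHECKS if not check.passes(record)]


# Dummy records with deliberate errors, used to test the checks themselves.
print(run_edit_checks({"age": 72, "sbp": 80, "dbp": 120}))    # ['RANGE_AGE', 'BP_CONSIST']
print(run_edit_checks({"age": None, "sbp": 120, "dbp": 80}))  # ['MISS_AGE']
print(run_edit_checks({"age": 40, "sbp": 120, "dbp": 80}))    # []
```

Every failed check corresponds to one discrepancy that would be raised for review.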
Discrepancy management | clinical data management in clinical research
Also called query resolution, discrepancy management includes reviewing discrepancies, investigating their cause, and either resolving them with documentary evidence or declaring them irresolvable. It supports data cleaning by gathering sufficient evidence for deviations observed in the data. Almost all CDMS have a discrepancy database where all discrepancies are recorded and stored along with an audit trail.
Discrepancy management is the most important activity in the CDM process. Because handling discrepancies is critical to data cleaning, it demands the utmost attention.
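A discrepancy record with its audit trail could be modeled along the following lines. This Python sketch is an assumption-laden simplification: the `Discrepancy` class, its status values, and the user name are hypothetical, and a real CDMS stores this information in its own discrepancy database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Discrepancy:
    subject_id: str
    field_name: str
    description: str
    status: str = "open"
    # Each audit entry: (UTC timestamp, user, action, note).
    audit_trail: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def close(self, user: str, status: str, note: str) -> None:
        """Close the discrepancy as 'resolved' or 'irresolvable' with a justification."""
        if status not in ("resolved", "irresolvable"):
            raise ValueError("status must be 'resolved' or 'irresolvable'")
        if self.status != "open":
            raise ValueError("discrepancy is already closed")
        if not note.strip():
            raise ValueError("a justification note is required")
        self.status = status
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((timestamp, user, status, note))


query = Discrepancy("001", "weight_kg", "Value differs between entry passes")
query.close("dm01", "resolved", "Site confirmed 72.5 against source document")
print(query.status, len(query.audit_trail))  # resolved 1
```

Requiring a note on every closure mirrors the point that discrepancies are resolved with documentary evidence or explicitly declared irresolvable.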
Medical coding | clinical data management in clinical research
Medical coding helps identify and classify medical terminologies related to clinical trials. This activity requires knowledge of medical terminology, disease entities, and the drugs used, along with a basic understanding of the pathological processes involved. It also requires familiarity with the structure of electronic medical dictionaries and the hierarchy of classifications available in them.
Medical coding maps the medical terms reported on the CRF to standard dictionary terms, which keeps the data consistent and prevents unnecessary duplication. For example, investigators may use different terms for the same adverse event, but all of them must be coded to a single standard code to maintain consistency throughout the process.
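The mapping of different reported terms to one standard code can be illustrated with a toy lookup in Python. The codes and terms below are invented for this sketch and do not come from any real medical dictionary; actual coding relies on licensed dictionaries and review by trained coders.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical mini-dictionary: normalized reported term -> (code, standard term).
CODING_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "headache": ("T0001", "Headache"),
    "head pain": ("T0001", "Headache"),
    "cephalalgia": ("T0001", "Headache"),
    "nausea": ("T0002", "Nausea"),
    "feeling sick": ("T0002", "Nausea"),
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", term.strip().lower())


def code_term(verbatim: str) -> Optional[Tuple[str, str]]:
    """Return (code, standard term), or None if the term needs manual coding."""
    return CODING_DICTIONARY.get(normalize(verbatim))


print(code_term("  Head   Pain "))  # ('T0001', 'Headache')
print(code_term("Cephalalgia"))     # ('T0001', 'Headache')
print(code_term("dizzy spell"))     # None
```

Terms that return `None` are the ones a coder would need to review and code manually.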
Database locking | clinical data management in clinical research
The final data validation is done after a thorough quality check and quality assurance. If no discrepancies are found, the SAS datasets are finalized in consultation with the statistician. All data management activities should be completed before database lock, and a pre-lock checklist is used to confirm this. This matters because the database generally cannot be changed once it has been locked. Once all stakeholders have approved the lock, the database is locked, and clean data is extracted for statistical analysis.
However, in case of a critical issue or for other important operational reasons, privileged users can modify data even after the database has been locked. Any such update requires proper documentation, an audit trail, and adequate justification. After locking, data is extracted from the final database and then archived.
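The locking rules described in this section can be summarized in a short Python sketch. The `StudyDatabase` class, the checklist items, and the user names are hypothetical; the sketch only shows the logic of a pre-lock checklist, a lock, and a justified post-lock update, not how any specific CDMS implements it.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class LockedDatabaseError(Exception):
    """Raised when a non-privileged user tries to change a locked database."""


class StudyDatabase:
    def __init__(self) -> None:
        self.records: Dict[str, float] = {}
        self.locked = False
        self.audit_trail: List[Tuple[str, str, str, str]] = []

    def _log(self, user: str, action: str, reason: str) -> None:
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((timestamp, user, action, reason))

    def lock(self, user: str, checklist: Dict[str, bool]) -> None:
        """Lock only when every pre-lock checklist item is complete."""
        pending = [item for item, done in checklist.items() if not done]
        if pending:
            raise ValueError("cannot lock; pending items: " + ", ".join(pending))
        self.locked = True
        self._log(user, "lock", "pre-lock checklist complete")

    def update(self, user: str, key: str, value: float,
               privileged: bool = False, reason: str = "") -> None:
        """Update a value; after lock this needs privilege and a justification."""
        if self.locked:
            if not privileged:
                raise LockedDatabaseError("database is locked")
            if not reason.strip():
                raise ValueError("a justification is required after lock")
        self.records[key] = value
        self._log(user, "update " + key, reason or "routine entry")


db = StudyDatabase()
db.update("dm01", "001.weight_kg", 72.5)
db.lock("lead01", {"queries closed": True, "coding complete": True})
db.update("admin01", "001.weight_kg", 75.2, privileged=True,
          reason="Critical correction confirmed by site")
print(db.locked, db.records["001.weight_kg"], len(db.audit_trail))  # True 75.2 3
```

The design choice here is that a post-lock change never happens silently: it fails without privilege, fails without a stated reason, and is always written to the audit trail.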