While ETL helps with data integration, it can also be a potential source of data quality issues when human mistakes are made in the ETL code. Most data transformation also occurs in multiple stages and can span multiple ETL code files written by a variety of developers and teams. Depending on the ETL process involved, ETL data errors typically affect all source system data or some consistent part of it, e.g., when the birth dates of the mothers of newborns are incorrectly loaded into the newborns' records, or when a multisite data set has some subset of patients assigned to an incorrect location. A special type of ETL data error is a mapping error, which results from incorrect transformation of data from the source terminology (e.g., the Korean national drug terminology) into the target data model's standard terminology for a given domain (e.g., RxNorm ingredient terms or Anatomic Therapeutic Class terms). Finally, a third type of error is a source data error, which occurs when the error is already present in the source data due to various causes, such as a human typo made during data entry or an incorrect default value assignment (e.g., a birth year of 1900 assigned to patients with missing birth year data). Some source data errors may be typos, and those typically do not follow a consistent pattern. In other cases, source data errors may affect a large number of patients (e.g., missing coding or loss of data), 4 and it can be difficult or impossible to distinguish ETL errors from source data errors 5, 6.

In recent years, the biomedical informatics community has increasingly adopted common data models (CDMs) shared across many organizations, 7 because they allow the same analytical code to be executed on multiple distributed data sets. In some cases, adherence to a CDM is a prerequisite for participating in a grant (or research network). Wider adoption of CDMs 8, 9 also facilitates the development of data quality tools that can be easily applied across multiple data sets.
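To make the idea of a portable, CDM-based check concrete, the sketch below illustrates two simple checks of the kind discussed above: flagging a suspicious default birth year and measuring how many records were left unmapped to the standard terminology. It is a minimal illustration rather than a method from the text; the table and column names (person, year_of_birth, drug_exposure, drug_concept_id) and the use of concept ID 0 to mean "unmapped" follow OMOP-style conventions and are assumptions for this example.

```python
# Minimal sketch of a data-quality check written against CDM-style column names.
# Table/column names are illustrative OMOP-style assumptions, not from the article.
import pandas as pd


def default_birth_year_rate(person: pd.DataFrame, suspect_year: int = 1900) -> float:
    """Fraction of patients sharing a suspicious default birth year."""
    return (person["year_of_birth"] == suspect_year).mean()


def unmapped_drug_code_rate(drug_exposure: pd.DataFrame) -> float:
    """Fraction of drug records whose source code was not mapped to a
    standard concept (concept ID 0 is assumed to denote 'unmapped')."""
    return (drug_exposure["drug_concept_id"] == 0).mean()


if __name__ == "__main__":
    # Because only model-level column names are referenced, the same script
    # can run unchanged on any data set that conforms to the CDM.
    person = pd.read_csv("person.csv")
    drug_exposure = pd.read_csv("drug_exposure.csv")
    print(f"Patients with default birth year: {default_birth_year_rate(person):.1%}")
    print(f"Drug records with unmapped codes: {unmapped_drug_code_rate(drug_exposure):.1%}")
```

Because such a script references only model-level names rather than any site-specific schema, it can be shared and rerun at every participating site, which is part of what makes CDM-based quality tooling attractive.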
Data quality has been a subject of several past studies. One study 10 proposed a set of metadata for documenting data in a Primary Care Data Quality initiative focused on patients with kidney disease. Another 11 defined several data quality concepts and emphasized the ability to identify the origin of any data cell within the final analysis data set. Data quality is often addressed within established research networks, but the full methodology and the actual data quality evaluation scripts may be available only to researchers participating in the network. For example, the Health Care Systems Research Network defined quality checks for data in their Virtual Data Warehouse 12. Similarly, the Mini-Sentinel network 7 has defined a series of quality checks.

Recently, the Data Quality Collaborative (DQC) published a 20-item list of data quality recommendations 13 that cover the areas of (1) data capture documentation, (2) data processing and provenance documentation, (3) data elements profiling, and (4) analysis-specific data quality documentation. See Table 1 for area descriptions and example recommendations. The DQC also advocates for publishing data quality metrics together with any observational data analysis, but points out that doing so can have unintended consequences, such as withdrawal of consortium data partners due to exposure of traditionally internal-only data quality indicators. Most recently, the DQC proposed a data quality CDM 14 that builds on common elements of several prior data-quality frameworks.

DATA QUALITY DOCUMENTATION RECOMMENDATIONS

Information on how data was observed, collected, and recorded