Data Curation
The Data Curation Team (DCT) focuses on large-scale aggregation and semantic curation of both raw and computed data fields, and on the derivation of data assets including patient registries, complex clinical metrics, and outcome measures. The DCT comprises domain-specific operational units (DSUs), each consisting of one or more SQL developers focused exclusively on data asset development in a single clinical or operational domain (e.g., Pulmonary and Critical Care, Cancer Center, Nursing), partnered with a lead clinical or operational subject matter expert (SME) and a domain-specific project manager (PM).
Under the guidance of the lead SME, each DSU within the DCT leverages a committed group of use case-specific SMEs who provide the requisite expertise to guide DSAs in data asset generation. The DCT aggregates and curates a range of data types, including structured and unstructured electronic health record data, genomic data, patient-reported data, and external data from sources such as the UC Health Data Warehouse, the California Cancer Registry, and the California Electronic Death Registry. Both the deliverables and work prioritization of each DSU are determined by DSU SMEs in collaboration with leadership from the sponsoring domain, whereas the methods used to develop, validate, and document each asset are guided by best practices and standard operating procedures set by IT Health Informatics leadership. This partnership ensures that the right work gets done the right way, enabling data assets to be used reliably across UC Davis Health for research, quality and safety, and operational applications. Each DSU works collaboratively with other DSUs to reuse data assets, and members of the DCT work closely with the DAT to enable reusability of data more broadly across UC Davis Health.
Roles and responsibilities of core DSU members are as follows:
- The lead SME is responsible for project scoping and prioritization (in collaboration with leadership from the sponsoring domain), providing guidance to DSAs in data asset generation and evaluation, identifying project-specific SMEs, and liaising with intra-domain data requestors. The lead SME is also responsible for serving on the DPC Oversight Committee, which will help to prioritize data requests originating from outside an established DSU.
- The DSA is responsible for the curation of raw and derived data assets, including algorithm development and validation of accuracy. The DSA works with SMEs to define project deliverables (data assets), specify data/concept definitions, and document definitions and performance characteristics. The DSA operates using a data curation playbook based on best practices from business and clinical informatics, developed and managed by DCT leadership.
- The PM ensures timely work prioritization, completion, and evaluation. The PM works closely with domain SMEs and DSAs to define project scope and specific deliverables, promote use of the data curation playbook, develop and enforce timelines, and ensure the adequacy of documentation. The PM works with DCT leadership to ensure efficient access to other DPC resources, to minimize duplication of effort across DSUs, and to promote the reusability of data assets.