Version 1.5 -- September 28, 2007
Overview of Plan
This plan is currently divided into two phases. Phase 1 of
the NARCCAP Data Management Plan lays out a lean, critical
path toward releasing RCM data to the broad community, which is our
priority. It will allow us to work through all stages of data
processing, refine the procedures, and deliver scientific value to the
broad community in the shortest possible time. Phase 1 will rely
upon existing, installed, and operational computational, storage, and
Earth System Grid (ESG) systems. Initial services will include
registration, browse, search, file-based access, fast multi-file
download, aggregation and subsetting, and an archive.
Phase 2 will begin with the installation and configuration
of ESG distributed software components at LLNL/PCMDI. We intend
to carry out this phase in parallel with Phase 1. It will deal
primarily with data transport systems, storage management
systems, and software and security infrastructure and policies. Once
these new capabilities have been integrated and tested with PCMDI
systems and storage, we will be positioned to begin publishing NARCCAP
datasets at PCMDI. In parallel with this work, we will begin to extend
the system from file-based access to "virtual" datasets, where users
will be able to request distributed data products as spatial, temporal,
and variable subsets.
General Roles and Responsibilities
IOWA
- Specification and refinement of NARCCAP data format and
metadata requirements
- Initial data quality control (QC) inspection
- Approval to move the publishing process forward
NCAR
- Software development to adapt ESG for NARCCAP (CISL)
- Full dataset QC inspection (ISSE)
- Archiving of QC'd datasets on the NCAR Mass Storage System (MSS)
(CISL)
- Publication of QC'd datasets into ESG, including transfer
of datasets to NCAR and PCMDI disk storage resources
- Notification of availability to community
LLNL/PCMDI
- Liaison for exchanging data via shippable disk arrays, and
upload of submitted datasets
- Access to PCMDI storage systems for NCAR quality control
staff, so that they can verify data before archiving
- Installation of all software the NCAR quality control staff
need to verify the data at PCMDI
- Archiving of all data on the NERSC HPSS
- Installation, configuration, and support of ESG
distributed components
- Provisioning of ESG-connected online storage
LLNL
- Provisioning of CMOR code for NARCCAP RCM applications and CAM3
timeslice modeling activities
Description of NARCCAP Data Process
The process in a nutshell:
- Modeling centers will send sample data to Iowa State after the
data are processed with CMOR or an equivalent tool.
- Data will undergo an initial quality-control check at
Iowa State.
- Modeling groups will relay their datasets to PCMDI via
shippable storage arrays.
- PCMDI will upload datasets from shippable disk arrays to
local staging space on PCMDI rotating storage.
- PCMDI will make a copy of all incoming datasets on the
NERSC HPSS for purposes of disaster recovery.
- Datasets will be QC'd by NCAR staff at PCMDI.
- QC'd datasets will be archived to the NCAR Mass Storage
System (MSS).
- QC'd datasets will be published to the Earth System Grid (ESG),
making them transparently available through the www.earthsystemgrid.org
interface. Early datasets will reside on NCAR disk, later
datasets on PCMDI disk.
We have established a "NARCCAP Data" mailing list, which will be
used by modeling groups to initiate and follow through on the
submission process, and by NARCCAP data team members to post
notifications about each step and its progress, as
outlined below. The goal is to maintain good communication
throughout the process, so that everyone is aware of progress and of
any issues that arise. This is important: at each stage of the
process, collaborators must post to the NARCCAP Data mailing
list:
narccap-data@mesonet.agron.iastate.edu.
Step 1: Modeling groups submit sample data to Iowa State
If a modeling group is preparing output from runs subsequent to the
NCEP-driven runs, skip directly to Step 3.
Modeling groups will prepare output for the variables specified at
http://narccap.ucar.edu/data/output_archive.html.
Using CMOR or an equivalent process, modeling groups will produce
datasets for publication according to the NARCCAP requirements
specified at http://narccap.ucar.edu/data/output_requirements.html.
If preparing output from the NCEP-driven runs, send Iowa State,
via ftp, one file in standard NARCCAP format for each of the
NARCCAP archive tables, containing the variable listed for that table:
Table 1 - Daily maximum temperature (tasmax)
Table 2 - Precipitation (pr)
Table 3 - 500 hPa geopotential height (zg500)
Table 4 - Surface altitude (orog)
Table 5 - Temperature (ta)
(Note: these tables are not the same as the CMOR tables)
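Before sending the Step 1 sample, a group may want to confirm that it has exactly one file per archive table. The sketch below is a hypothetical checklist, not a NARCCAP tool: it assumes the submission has already been summarized as a mapping from table number to the variable in that table's file. The table-to-variable pairs come from the list above; the function name and the input format are assumptions made for illustration.

```python
# Required sample variable for each NARCCAP archive table (from the Step 1 list).
# These are NARCCAP archive tables, not CMOR tables.
REQUIRED_SAMPLES = {
    1: "tasmax",  # Daily maximum temperature
    2: "pr",      # Precipitation
    3: "zg500",   # 500 hPa geopotential height
    4: "orog",    # Surface altitude
    5: "ta",      # Temperature
}


def check_sample_submission(submitted: dict[int, str]) -> list[str]:
    """Return a list of problems with a sample submission (empty if complete).

    `submitted` maps table number -> variable name of the file prepared for
    that table. This input format is a hypothetical convenience for the sketch.
    """
    problems = []
    for table, expected in sorted(REQUIRED_SAMPLES.items()):
        got = submitted.get(table)
        if got is None:
            problems.append(f"Table {table}: missing sample file (expected {expected})")
        elif got != expected:
            problems.append(f"Table {table}: expected {expected}, got {got}")
    # Flag table numbers that are not part of the NARCCAP archive tables.
    for table in sorted(set(submitted) - set(REQUIRED_SAMPLES)):
        problems.append(f"Table {table}: not a NARCCAP archive table")
    return problems


if __name__ == "__main__":
    sample = {1: "tasmax", 2: "pr", 3: "zg500", 5: "ta"}
    for problem in check_sample_submission(sample):
        print(problem)
```

An empty result only means the file set is complete; the actual format and metadata review still happens at Iowa State in Step 2.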
Step 2: Iowa State reviews submitted data
Following a process similar to that used for AR4, Iowa State will
review the model output and work with each modeling group as needed
to arrive at correct datasets that meet NARCCAP standards. The review
will include evaluating some diagnostics and checking the
metadata. All communications in this activity will be
posted to the mailing list, so that all parties are aware of any
problems or workflow issues that arise.
At the end of the review, Iowa State will either approve
the submission or iterate the process further with the modeling
group. Once the submission has been approved, the data can move on to
Step 3.
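The routing rule at the start of Step 1 and the approval gate at the end of Step 2 can be summarized as a small lookup. This is an illustrative sketch only: the step titles are abbreviated from this plan's headings, and the function names and boolean arguments are hypothetical.

```python
# Step titles abbreviated from this plan's headings.
STEPS = {
    1: "Modeling group submits sample data to Iowa State",
    2: "Iowa State reviews submitted data",
    3: "Modeling group announces data ready, PCMDI ships disk",
    4: "Modeling group ships data back to LLNL/PCMDI",
    5: "PCMDI uploads submitted data",
    6: "NCAR performs QC on submission",
    7: "NCAR archives datasets, readies them for publishing",
    8: "NCAR publishes datasets to ESG",
}


def steps_for_run(ncep_driven: bool) -> list[int]:
    """Return the ordered steps a run goes through.

    NCEP-driven runs start with the Step 1 sample and the Step 2 review;
    output from subsequent runs skips directly to Step 3.
    """
    first = 1 if ncep_driven else 3
    return [n for n in sorted(STEPS) if n >= first]


def may_start_step3(ncep_driven: bool, sample_approved: bool) -> bool:
    """NCEP-driven output needs Iowa State's approval before Step 3;
    output from subsequent runs goes to Step 3 per the Step 1 routing rule."""
    return (not ncep_driven) or sample_approved


if __name__ == "__main__":
    for n in steps_for_run(ncep_driven=False):
        print(f"Step {n}: {STEPS[n]}")
```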
Step 3: Modeling group announces data ready, PCMDI ships disk
When a modeling group has data that is ready for submission, they
will announce it on the narccap-data mailing list. At this point, Tony
Hoang will ship a disk array to the modeling group for them to load
with their data. Note that Dean Williams will help coordinate
activities at PCMDI.
Step 4: Modeling groups ship data back to LLNL/PCMDI
The modeling group will receive the disk array, load their
correctly formatted and structured output data upon it, and ship it
back to Tony Hoang at PCMDI.
Step 5: PCMDI uploads submitted data
Upon receipt of the shippable disk array, PCMDI will upload
the datasets onto rotating storage at PCMDI (disk on the machine
climate.llnl.gov) and archive a copy of each dataset, as
submitted by the modeling group, to the NERSC HPSS for
disaster/failure recovery. The current plan is to treat these copies
as dark (unpublished) archives. Once the contents of the disk
array have been uploaded successfully, PCMDI will notify the mailing
list that the dataset is available and provide a pointer to its
location.
Step 6: NCAR performs QC on submission
QC will be conducted in situ at PCMDI, on
climate.llnl.gov. This provides the most direct access to
the original datasets in the event that file corruption or other problems
are detected. Seth McGinnis and Larry McDaniel will perform the final
QC of the datasets, resulting in a product that is ready to be
published into the ESG system.
How problems found in a submission are handled depends
on their nature. The QC team will fix problems that are simple,
isolated, and easy to fix. For more complicated or
pervasive problems, the modeling team will be responsible for the
fixes. The QC team will communicate the problem directly to the
modeling team, and the modeling team will coordinate with Dean
Williams to arrange for the retransfer of the corrected datasets to
PCMDI via ftp or shipping of another disk, as appropriate. The QC
team will also summarize the problem on the mailing list to help other
modeling groups avoid the same problem.
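The triage rule above can be sketched as a small decision function. This is a hypothetical illustration: encoding "simple, isolated, and easy to fix" as three booleans, the `QCProblem` type, and the `triage` name are assumptions made for the sketch, not criteria defined by the QC team.

```python
from dataclasses import dataclass


@dataclass
class QCProblem:
    description: str
    simple: bool       # hypothetical flag: judged simple by the QC team
    isolated: bool     # hypothetical flag: confined to a few files or fields
    easy_to_fix: bool  # hypothetical flag: fixable by the QC team directly


def triage(problem: QCProblem) -> list[str]:
    """Return the follow-up actions for a problem found during Step 6 QC."""
    if problem.simple and problem.isolated and problem.easy_to_fix:
        return ["QC team fixes the problem"]
    # Complicated or pervasive problems go back to the modeling team.
    return [
        "QC team reports the problem directly to the modeling team",
        "modeling team fixes the problem",
        "modeling team arranges retransfer with Dean Williams (ftp or new disk)",
        "QC team summarizes the problem on the mailing list",
    ]


if __name__ == "__main__":
    example = QCProblem("example problem", simple=True, isolated=False, easy_to_fix=True)
    for action in triage(example):
        print(action)
```

Requiring all three conditions for a QC-side fix mirrors the plan's wording; any problem that fails one of them is routed back to the modeling team.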
Step 7: NCAR archives datasets, readies them for publishing
Once a dataset has passed final QC, an archival copy will
be stored on the NCAR MSS for at least five years, as required by the
grant contract. The QC team will announce on the mailing list that the
dataset is ready. Chi-Fan Shih will then transfer the QC'd dataset over
the network, using DataMover, from PCMDI to the NARCCAP storage
staging space on the NCAR SAN and make the archival copy in the MSS. If
the data is part of approximately the first 10 TB of output, it will
reside on disk at NCAR, on datazone.ucar.edu. In this
case, Chi-Fan will also copy the data into the appropriate location on
datazone. The remainder of the data will reside on disk
at PCMDI, on climate, in which case the QC team will
simply copy it from scratch space into the appropriate location.
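The placement rule above depends only on how much QC'd output has been placed so far. A minimal sketch, assuming the running total is tracked in terabytes across all locations and treating the approximate 10 TB figure as an exact cutoff; the function name, its inputs, and the handling of a dataset that straddles the cutoff are assumptions, not part of the plan.

```python
# Approximate threshold from the plan ("the first 10 TB (approximately)"),
# treated as an exact cutoff for this sketch.
NCAR_DISK_LIMIT_TB = 10.0


def placement(output_so_far_tb: float, dataset_tb: float) -> tuple[str, str]:
    """Return (host, who copies it into place) for a QC'd dataset.

    output_so_far_tb: total QC'd output already placed anywhere, in TB
                      (hypothetical input; it only ever grows).
    dataset_tb:       size of the dataset being placed, in TB.
    A dataset that would push the total past the limit goes to PCMDI
    (an assumption; the plan does not say how a straddling dataset is handled).
    """
    if output_so_far_tb + dataset_tb <= NCAR_DISK_LIMIT_TB:
        return ("datazone.ucar.edu", "Chi-Fan Shih")
    return ("climate.llnl.gov", "QC team")


if __name__ == "__main__":
    total = 0.0
    for size in [3.0, 4.0, 2.5, 1.0, 0.2]:
        host, copier = placement(total, size)
        total += size  # count every dataset, wherever it resides
        print(f"{size:4.1f} TB -> {host} (copied by {copier})")
```

Counting all output, rather than only what sits on NCAR disk, keeps the rule monotonic: once the first 10 TB is used up, every later dataset stays at PCMDI. In either case the MSS archival copy is made as well.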
Step 8: NCAR publishes datasets to ESG
Luca Cinquini will set up the initial re-engineering of the ESG
publishing infrastructure for NARCCAP. Once a dataset is in its
final location, Seth and Larry will
publish it into the ESG system, making it available for the
NARCCAP community to download.
Data will reside in a directory structure organized as
[regional-model]/[driver]/[present|future] (e.g.,
MM5/CCSM/future or
RegCM3/NCEP/present). These directories will be presented to
end users as a table of RCM/GCM combinations that link to the
appropriate data catalogs.
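The directory layout above can be built and parsed mechanically, and grouping the paths by model and driver yields the RCM/GCM combination table shown to users. A minimal sketch under stated assumptions: it handles relative paths only (the plan does not give the root directory), and the function names are hypothetical.

```python
from pathlib import PurePosixPath

PERIODS = ("present", "future")


def dataset_dir(regional_model: str, driver: str, period: str) -> str:
    """Build a [regional-model]/[driver]/[present|future] relative path."""
    if period not in PERIODS:
        raise ValueError(f"period must be one of {PERIODS}, got {period!r}")
    return str(PurePosixPath(regional_model, driver, period))


def parse_dataset_dir(path: str) -> tuple[str, str, str]:
    """Split a relative path such as 'MM5/CCSM/future' into its three parts."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3 or parts[2] not in PERIODS:
        raise ValueError(f"not a [model]/[driver]/[present|future] path: {path!r}")
    return parts[0], parts[1], parts[2]


def combination_table(paths: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group paths into (RCM, driver) -> periods, one entry per table cell."""
    table: dict[tuple[str, str], list[str]] = {}
    for p in paths:
        rcm, driver, period = parse_dataset_dir(p)
        table.setdefault((rcm, driver), []).append(period)
    return table


if __name__ == "__main__":
    paths = [dataset_dir("MM5", "CCSM", "future"), "RegCM3/NCEP/present"]
    print(combination_table(paths))
```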
Note: Initially, test datasets and early results will be
published at NCAR as part of Phase 1 activities. The goal is to work
through the entire process as quickly as possible and to get data
products out to the community in a streamlined fashion. Once the
Phase 2 integration of PCMDI systems is complete, published data will
begin to flow to PCMDI as well.
Timeslice Data
The GFDL timeslice data will be served directly by GFDL. It is
currently available at http://www.gfdl.noaa.gov/~bw/narccap/.
Only data for the NARCCAP region is available. (QC for this dataset is
still under discussion.)
The CAM3 timeslice data from Phil Duffy will reside on climate
alongside the other datasets. The entire global dataset will be made
available. Phil will work with the QC team to perform an initial check
of the data before postprocessing. Otherwise, the timeslice data will
be treated like other model datasets with regard to QC and
publishing. Phil, Dean, and Chi-Fan will coordinate any necessary
transfers of the timeslice data.