Explanation of how to use the aggregate.ncl script
(For Bill, to generate T1; also send to Ping Yang)

I wrote an NCL script to perform aggregation of NARCCAP data. This example shows how to use aggregate.ncl to condense 3-hourly data into daily data. This is the process I used to generate tasmin & tasmax from tas for the RCM3 runs when we discovered that the values generated by RCM3 itself were no good. There are some fiddly details relating to getting the aggregation period to match up with the 0600-0600 GMT "day" specified for NARCCAP Table 1 data, but the usage for generating monthly & seasonal averages and climatologies is pretty similar.

1) Concatenate input files together using the NCO command 'ncrcat'.

This is necessary because the file boundaries on the 5-year files may not coincide exactly with the boundaries of the periods you're aggregating over. If you aren't careful with the boundaries, you can end up with a period at the edge of the range whose value is based on just one or two timesteps, so you definitely want to get this right.

For going from 3-hourly to daily, we also throw out the very first timestep, which is at 0300 on the first day. If it's not excluded, it results either in an extra day at the beginning or in a day with 9 timesteps contributing instead of 8, and either way it messes things up.

> ncrcat -d time,1, [input files] [output file]

2) Aggregate data using the NCL script.

> ncl -Q -n aggregate.ncl infile=\"tas.nc\" outfile=\"tasmax.nc\" interval=\"day\" varname=\"tas\" method=\"max\" check=True offset=-0.25 taint=True outtime=\"start\"

We pass command-line arguments to the NCL script using variable definition statements on the command line. For string-valued variables, NCL needs the quote marks, which means you have to escape them with backslashes so the shell passes them through to NCL instead of interpreting them itself. You could hardwire these values in the script if you needed to.

In addition to the required arguments defining the input file, the output file, the variable name, and the period of aggregation, there are a number of options you can give aggregate.ncl to control its behavior. The options used here:

method: allowed values are "mean", "min", or "max". Determines what function is used to aggregate over the period. Switch to "min" to generate tasmin.

check: if True, prints a bunch of debugging information at the end so you can double-check that the output really is what you think it is and came from where it was supposed to come from. It's good practice to use this and look at the output afterwards. (I typically redirect it to a file in a subdirectory named "check".)

offset: a shift applied to the time coordinate. Used to adjust when the day starts when doing daily aggregations. Using -0.25 makes the day run from 0600 GMT to 0600 GMT.

taint: if True, any missing_value timesteps in the input cause the whole output period they fall in to be missing as well.

outtime: determines which point in the input interval should be used as the time coordinate for the output.

There are also options for computing climatological averages across years and for printing progress indicators on large datasets that take a long time to process.
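For what it's worth, this argument-passing mechanism is plain NCL: a variable defined on the command line simply exists when the script starts running, so a script can test for optional arguments with isvar() and fall back to defaults. A minimal sketch of the idea follows; the default values shown are illustrative assumptions, not necessarily what aggregate.ncl actually uses.

  ; Sketch of defaulting optional command-line arguments in NCL.
  ; The defaults below are assumptions for illustration only.
  if (.not. isvar("method")) then
    method = "mean"        ; aggregation function
  end if
  if (.not. isvar("offset")) then
    offset = 0.0           ; no shift to the time coordinate
  end if
  if (.not. isvar("check")) then
    check = False          ; no debugging printout
  end if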
3) Rename variables to reflect new contents.

If we were averaging the variable, we'd probably want to leave it with the same name, but since we're generating a maximum-temperature variable from an average-temperature variable, we need to rename the data variable accordingly.

> ncrename -v tas,tasmax tasmax.nc

4) Update metadata.

The tas variable is in Table 2, while tasmax is in Table 1, so we need to change the global attribute named "table_id". We also need to update the long_name attribute of tasmax to reflect the new variable, and, for a minimum or maximum value, we need to add an appropriate cell_methods attribute. All of these updates can be done with a single use of ncatted.

Note that we use the -h flag to prevent ncatted from adding a history entry for this operation: the results of the action are plainly obvious in the metadata, and the very long entries typical of metadata edits really clutter up the history and make it hard to read.

> ncatted -h -a table_id,global,m,c,"Table 1" -a long_name,tasmax,m,c,"Maximum Daily Surface Air Temperature" -a cell_methods,tasmax,m,c,"time: maximum(interval: 1 days)" tasmax.nc
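A quick, optional sanity check at this point is to dump just the file header and confirm that the three attributes changed and that no new history entry appeared. The ncdump/grep combination here is a generic convenience, not part of the NARCCAP workflow per se:

> ncdump -h tasmax.nc | egrep 'table_id|long_name|cell_methods|history'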
5) Split files back into 5-year chunks using ncks.

For NARCCAP publication, we have everything split into 5-year chunks to keep the file sizes below 2 GB. If NCO has been installed with udunits support, we can subset the data along the time dimension using dates, which is a big plus for understanding what happened to the data later on.

There's no good programmatic way to split the files according to the NARCCAP spec, so we just specify all the start and end dates by hand. For Table 1 data, we can leave the time of day unspecified. This sets it to 00:00 hours, and since the coordinates for daily values are at 06:00 hours, the bounds as specified below will split things properly. (The situation would be more complicated for splitting 3-hourly data.) Happily, going from Jan-01 to Jan-01 also lets you ignore differences in the calendar. The little shell loop does this for both tasmax and tasmin, and propagates whatever other filename components may be in place.

NCEP data:

foreach f (tasm*.nc)
  set g = `basename $f .nc`
  ncks -O -d time,"1979-01-01","1981-01-01" $f ${g}_1979010106.nc
  ncks -O -d time,"1981-01-01","1986-01-01" $f ${g}_1981010106.nc
  ncks -O -d time,"1986-01-01","1991-01-01" $f ${g}_1986010106.nc
  ncks -O -d time,"1991-01-01","1996-01-01" $f ${g}_1991010106.nc
  ncks -O -d time,"1996-01-01","2001-01-01" $f ${g}_1996010106.nc
  ncks -O -d time,"2001-01-01", $f ${g}_2001010106.nc
end

Current-period data:

foreach f (tasm*.nc)
  set g = `basename $f .nc`
  ncks -O -d time,"1968-01-01","1971-01-01" $f ${g}_1968010106.nc
  ncks -O -d time,"1971-01-01","1976-01-01" $f ${g}_1971010106.nc
  ncks -O -d time,"1976-01-01","1981-01-01" $f ${g}_1976010106.nc
  ncks -O -d time,"1981-01-01","1986-01-01" $f ${g}_1981010106.nc
  ncks -O -d time,"1986-01-01","1991-01-01" $f ${g}_1986010106.nc
  ncks -O -d time,"1991-01-01","1996-01-01" $f ${g}_1991010106.nc
  ncks -O -d time,"1996-01-01", $f ${g}_1996010106.nc
end

Future-period data:

foreach f (tasm*.nc)
  set g = `basename $f .nc`
  ncks -O -d time,"2038-01-01","2041-01-01" $f ${g}_2038010106.nc
  ncks -O -d time,"2041-01-01","2046-01-01" $f ${g}_2041010106.nc
  ncks -O -d time,"2046-01-01","2051-01-01" $f ${g}_2046010106.nc
  ncks -O -d time,"2051-01-01","2056-01-01" $f ${g}_2051010106.nc
  ncks -O -d time,"2056-01-01","2061-01-01" $f ${g}_2056010106.nc
  ncks -O -d time,"2061-01-01","2066-01-01" $f ${g}_2061010106.nc
  ncks -O -d time,"2066-01-01", $f ${g}_2066010106.nc
end

6) Double-check results.

Always check that the end result makes sense. I wrote a little script in NCL that uses the cd_calendar() function to print the date and time of the first and last timestep in a file. That, together with the number of timesteps in each file and the debugging output from the aggregate script, should indicate whether everything did what it was supposed to.
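For reference, the core of such a checker is only a few lines of NCL. This is a minimal sketch, not the actual script: it assumes the coordinate variable is named "time" and carries CF-style units and calendar attributes, and the file name is just one of the examples from step 5.

begin
  ; Print the timestep count and the first and last timesteps
  ; as yyyymmddhh integers (cd_calendar option -3).
  f   = addfile("tasmax_1979010106.nc", "r")
  t   = f->time
  ymd = cd_calendar(t, -3)
  nt  = dimsizes(t)
  print("timesteps: " + nt)
  print("first: " + ymd(0))
  print("last:  " + ymd(nt-1))
end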