cprnc README
------------

cprnc is a generic tool for analyzing a netcdf file or comparing
two netcdf files.


Quick Start Guide:
------------------

> setenv INC_NETCDF $local_netcdf_include_path
> setenv LIB_NETCDF $local_netcdf_lib_path
> gmake
> ./cprnc file1        # for analyze mode
> ./cprnc file1 file2  # for compare mode

Usage: cprnc [-m] [-v] [-d dimname:start[:count]] file1 [file2]
  -m: Compare each time sample. Default is false, i.e. match "time"
      coordinate values before comparing
  -v: Verbose output
  -d dimname:start[:count]
      Print variable values for the specified dimname index subrange.


Users Guide:
------------

cprnc is a Fortran-90 application. It relies on netcdf version 3 or
later and uses the f90 netcdf interfaces. It requires a netcdf include
file and a netcdf library.

The Makefile assumes those are in /usr/local/include and /usr/local/lib
respectively. Users can override these defaults by setting the
environment variables INC_NETCDF and LIB_NETCDF to the local netcdf
paths.

The Makefile has entries for IRIX64, AIX, Darwin, Cray X1E, OSF1, SUN,
and Linux. On Linux, entries exist for the pgf90 and lf95 compilers.
Entries for new architectures can be added to the Makefile. gmake
should be used to build cprnc. The Makefile relies on a hard-wired
Depends file.

cprnc writes its ascii output to standard out. It first summarizes
some characteristics of the input file[s]. Compare output is generally
132 characters wide and analyze output is less than 80 characters wide.

In analyze mode, the output for a field looks like

                                      (  lon,  lat, time, -----)
                               259200 (  587,  134,    1) (  269,   59,    1)
 FX1                            96369  8.273160400390625E+02  0.000000000000000E+00
 avg abs field values:                 9.052845920820910E+01

and a guide to this information is printed at the top of the file

                                      (  dim1,  dim2,  dim3,  dim4)
                              ARRSIZ1 ( indx1, indx2, indx3)  file 1
 FIELD                         NVALID    MAX    MIN

The first 10 characters of the field name are identified in the first
dozen columns of the third line.
The first line summarizes the names of the dimensions of the field.
The second line summarizes the indices of the maximum and minimum
values of the field for the first three dimensions. If the fourth
dimension exists, it's always assumed to be time. Time is handled
separately. The third line summarizes the number of valid values in
the array and the maximum and minimum values over those valid values.
Invalid values are values that are identified as the "fill" value.
The last line summarizes some overall statistics including the average
absolute value of the valid values of the field.

In comparison mode, the output (132 chars wide) for a field looks like

           96369 (  lon,  lat, time)
          259200 (  422,  198,    1) (  203,  186,    1)         (   47,  169,    1)         (  224,  171,    1)
 FIRA      96369  1.466549530029297E+02 -3.922052764892578E+01 1.4E+02 -3.037954139709473E+01 1.0E+00 -3.979958057403564E+00
           96369  1.321966247558594E+02 -1.603044700622559E+01          1.084177169799805E+02          3.982142448425293E+00
          259200 (  156,   31,    1) (  573,  178,    1) (
          avg abs field values:    6.778244097051392E+01    rms diff: 1.4E+01   avg rel diff(npos):  4.6E-02
                                   5.960437961084186E+01                        avg decimal digits(ndif):  1.2 worst:  0.0

and a guide to this information is printed at the top of the file

          NDIFFS ( dim1, dim2, dim3, dim4, ... )
         ARRSIZ1 ( indx1, indx2, indx3, ... )  file 1
 FIELD   NVALID1   MAX1   MIN1   DIFFMAX  VALUES   RDIFMAX  VALUES
         NVALID2   MAX2   MIN2
         ARRSIZ2 ( indx1, indx2, indx3, ...)  file 2

The information content is identical to the information in analyze
mode with the following additions. Two additional lines are added in
the main body. Lines 4 and 5 are identical to lines 3 and 2
respectively but are associated with file 2 instead of file 1.
In addition, the right hand side of lines 2, 3, and 4 contains
information about the maximum difference, the location and values of
the maximum difference, the relative difference, and the location and
values of the maximum relative difference.
The last two lines summarize some overall statistics including the
average absolute values of the field on the two files, the rms
difference, the average relative difference, the average number of
digits that match, and the worst case for the number of digits that
match.

"avg rel diff" gives the average relative difference (sum of relative
differences normalized by the number of indices where both variables
have valid values). The denominator for each relative difference is
the MAX of the two values.

"avg decimal digits" is determined as follows: for each diff,
determine the number of digits that match (as -log10(rdiff(i))); add
this to a running sum; then normalize by the number of diffs (ignoring
places where the two variables are the same). For example, if there
are 10 values, 8 of which match, one has a relative difference of 1e-3
and one has a relative difference of 1e-5, then the avg decimal digits
will be 4.

"worst decimal digits" is simply log10(1/rdmax), where rdmax is the
max relative difference (in the above example, this would give 3).

At the end of the output file, a summary is presented that looks like

SUMMARY of cprnc:
 A total number of    119 fields were compared
          of which     83 had non-zero differences
               and     17 had differences in fill patterns
 A total number of     10 fields could not be analyzed
 A total number of      0 fields on file 1 were not found on file2
  diff_test: the two files seem to be DIFFERENT

This summarizes:
- the number of fields that were compared
- the number of fields that differed (not counting fields that differed
  only in the fill pattern)
- the number of fields with differences in fill patterns
- the number of fields that could not be analyzed
- the number of fields that could not be found on the second file
- whether the files are IDENTICAL or DIFFERENT

Files are considered DIFFERENT if there are differences either in the
values or in the fill patterns of any variable.


Developers Guide:
-----------------

The tool works as follows.

Fields can be analyzed if they are int, float or double and have
between 0 and n dimensions.

In general, fields that appear on both files are compared. If they
are different sizes, no difference statistics are computed and only a
summary of the fields on the files is presented. If fields appear on
only one file, those fields are analyzed.

The unlimited dimension is treated uniquely. In general, for files
that have a dimension named "time", the time axes are compared and
matching time values on the two files are compared one timestep at a
time. Time values that don't match are skipped. To override the
matching behaviour, use cprnc -m. In this mode, time samples are
compared by index rather than by time value. In analyze mode, the
fields are analyzed one timestamp at a time. In general, if there is
a "time" axis, it will be the outer-most loop in the output analysis.
In compare mode, fields with a time axis and a timestamp that are not
common between the two files are ignored.
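
For developers who want to check the difference statistics defined in
the Users Guide ("avg rel diff", "avg decimal digits" and "worst
decimal digits"), the definitions can be sketched in Python. This is
an illustration only and is not the Fortran code in cprnc. The function
name diff_stats, its optional fill argument, and the use of absolute
values in the denominator of the relative difference are assumptions
made for this sketch.

```python
import math


def diff_stats(vals1, vals2, fill=None):
    """Relative-difference statistics for two equally sized value lists.

    Hypothetical helper for illustration; not part of cprnc.
    """
    npos = 0         # points where both values are valid
    ndif = 0         # points where the two values differ
    rdsum = 0.0      # running sum of relative differences
    digit_sum = 0.0  # running sum of matching decimal digits
    rdmax = 0.0      # largest relative difference seen so far
    for a, b in zip(vals1, vals2):
        if fill is not None and (a == fill or b == fill):
            continue  # skip points that are invalid on either file
        npos += 1
        if a == b:
            continue  # identical points add nothing to the sums
        # Assumption: "MAX of the two values" is taken on absolute values.
        rdiff = abs(a - b) / max(abs(a), abs(b))
        ndif += 1
        rdsum += rdiff
        digit_sum += -math.log10(rdiff)
        rdmax = max(rdmax, rdiff)
    return {
        "avg_rel_diff": rdsum / npos if npos else 0.0,
        "avg_digits": digit_sum / ndif if ndif else None,
        "worst_digits": math.log10(1.0 / rdmax) if ndif else None,
    }


# The worked example from the Users Guide: 10 values, 8 of which match,
# one with a relative difference of 1e-3 and one with 1e-5.
file1 = [1.0] * 10
file2 = [1.0] * 10
file2[0] = 1.0 - 1.0e-3
file2[1] = 1.0 - 1.0e-5
stats = diff_stats(file1, file2)
print(stats)
```

Up to floating-point rounding, this reproduces the values quoted in the
Users Guide for that example: about 4 for the average decimal digits
and about 3 for the worst case.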