Known Bugs in CLM4.0.54 in CESM1.1.0 Nov/08/2012 ==================================================================================== Bug Number: 1543 large-file format does NOT work in latest clm CLM has the NetCDF large-file format hardwired to TRUE. You can NOT use the namelist variable outnc_large_files to change them to Classic format. ==================================================================================== Bug Number: 1398 clm and mksurfdata_map needs to check map file If the map files sent to CLM are inconsistent with the datafiles, CLM does NOT nicely abort, but aborts with the following types of problem.... ==================================================================================== Bug Number: 1397 c2l_scale_type not specified for many history fields Bill Sacks reports the following problem... Many history fields do not have a c2l_scale_type parameter (in histFldsMod), but it seems they should. For example, there is a set of water flux variables, starting with QFLX_RAIN_GRND and ending with QFLX_DEW_SNOW, most of which do not have a c2l_scale_type. From talking with Keith Oleson, it seems that at least some and maybe all of these should have c2l_scale_type='urbanf', by analogy with similar fluxes that do have a c2l_scale_type specified. From talking with Keith Oleson: it sounds like most fluxes should have c2l_scale_type='urbanf', but this isn't necessarily always true. So this will require more investigation to determine the appropriate scale type for each history field. Most (all?) of the fields that do not have a c2l_scale_type are ones that were added after the urban model came in - for example, fields that were added when the CN code came in. So my guess is that whoever added these fields didn't realize that a c2l_scale_type was required. After these fields are fixed, perhaps scale_type_c2l should be made a required argument to hist_addfld1d and hist_addfld2d to prevent this problem from arising again in the future. ==================================================================================== Bug Number: 1377 CLM1PT mode is in extend mode by default, so fails going beyond one data cycle PLM1PT mode puts atm forcing data in extend mode so if you go beyond one data cycle it gives screwy results. I ran the spinup series with PTCLM and since it was running over multiple years after it ran one cycle it used the last time-step for the future years. Here's one way to fix this issue after a case is built... sed "s/'extend','cycle'/'cycle','cycle'/g" Buildconf/datm.buildnml.csh > datm.bld mv datm.bld Buildconf/datm.buildnml.csh See bug 1368 below for a case where this causes a major problem with a simulation. ==================================================================================== Bug Number: 1368 PTCLM for US-UMB spins up with zero GPP I ran a spinup with PTCLM (to test the procedure and to get initial conditions for US-UMB, and be able to work the transient issue). For the final_spinup phase GPP was identically zero. This was fixed by setting taxmode='cycle'. See bug 1377 above. ==================================================================================== Bug Number: 1360 Can't do a ncdump on the US-UMB single point data for PTCLM I get the following error doing an ncdump on the US-UMB data... ncdump -h 1999-01.nc ncdump: name begins with space or control-character: 1 I can view the file with ncks, or ncl, or ncview. It turns out the problem is that the filename -- doesn't start with an alphabetical letter it starts with a number. So there isn't really a way around this other than renaming the files and changing the streams. To do a ncdump, you can simply rename the file to something else and then do the ncdump on it. ==================================================================================== Bug Number: 1348 Restart test with crop on shows differences in landmask field The following test is failing on edinburgh with the lahey compiler with clm4_0_32. 006 erTZ4 TER.sh 21p_cncrpsc_ds clm_stdIgnYr^nl_crop 20020401:3600 10x15 USGS -3+-7 cold ........FAIL! rc= 13 The difference is in landmask field where some values are set to 1.95379e+09 looks like over ocean. ==================================================================================== Bug Number: 1345 Irrigation dataset is upside down! The irrigation dataset used to create surface datasets by mksurfdata is upside down (latitude goes from 90 to -90 rather) relative to other files used to create surface datasets. The filename is: $CSMDATA/lnd/clm2/rawdata/mksrf_irrig_2160x4320_simyr2000.c090320.nc This is the same problem as in the VOC dataset in bug number 1044 below. Even though the file looks incorrect the datasets it creates are fine. ==================================================================================== Bug Number: 1339 Limit on number of files when running with 155 years of MOAR data In order to run with 155 years of MOAR data the file limit in shr_stream needs to increase from 1000 to 2000 (technically 1860 would be sufficient, but might as well bump it to 2000). Index: shr_stream_mod.F90 =================================================================== --- shr_stream_mod.F90 (revision 28396) +++ shr_stream_mod.F90 (working copy) @@ -101,7 +101,7 @@ end type shr_stream_fileType !--- hard-coded array dims ~ could allocate these at run time --- - integer(SHR_KIND_IN),parameter :: nFileMax = 1000 ! max number of files + integer(SHR_KIND_IN),parameter :: nFileMax = 2000 ! max number of files type shr_stream_streamType !private ! no public access to internal components We didn't put this change in as it causes a compiler error on bluefire for AIX when compiling the dlnd component... /ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/lnd/obj/Depends mpxlf90_r -c -I. -I/usr/include -I/usr/local/include -I/usr/local/include -I. -I/ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/SourceMods/src.dlnd -I/glade/proj3/cseg/people/mver tens/src/cesm/cesm1_0_rel06/models/lnd/dlnd -I/glade/proj3/cseg/people/mvertens/src/cesm/cesm1_0_rel06/models/lnd/dlnd/cpl_mct -I/ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/li b/include -WF,-DMCT_INTERFACE -WF,-DHAVE_MPI -WF,-DAIX -WF,-DSEQ_ -WF,-DFORTRAN_SAME -q64 -g -qfullpath -qmaxmem=-1 -qarch=auto -qsigtrap=xl__trcedump -qsclk=micro -O2 -qstri ct -Q -qsuffix=f=f90:cpp=F90 /glade/proj3/cseg/people/mvertens/src/cesm/cesm1_0_rel06/models/lnd/dlnd/dlnd_comp_mod.F90 touch /ptmp/mvertens/SMS.f45_g37.A.bluefire.rel06_d/lnd/obj/Filepath 1517-009: (U) Error in compiler runtime system; compilation ended. xlf90_r: 1501-230 (S) Internal compiler error; please contact your Service Representative. For more information visit: http://www.ibm.com/support/docview.wss?uid=swg21110810 1501-511 Compilation failed for file dlnd_comp_mod.F90. gmake: *** [dlnd_comp_mod.o] Error 40 ==================================================================================== Bug Number: 1326 Running with both crop AND irrigation fail with a balance check error Running tests that have both crop AND irrigation fail with a balance check error. This is for starting up with arbitrary initial conditions and running with either CN and/or CNDV. ==================================================================================== Bug Number: 1325 Writing out GDDHARV cause abort when written out to history file The variables: GDDHARV cause the model to abort when adding it to the history file and DEBUG mode is on. It aborts in one of the pft averaging functions in subgridAveMod with a multiply by a NaN. This is on bluefire. The variables are initialized to spval, so I'm not sure why this happens. Here's what the abort looks like... (seq_mct_drv) : Model initialization complete Signal received: SIGTRAP - Trace trap Signal generated for floating-point exception: FP invalid operation Instruction that generated the exception: fmul fr02,fr01,fr02 Source Operand values: fr01 = NaNS fr02 = 1.00000000000000e+00 Traceback: Offset 0x00002104 in procedure __subgridavemod_NMOD_p2g_1d, near line 796 in file /fis/cgd/home/erik/clm_cropbr/models/lnd/clm/src/main/subgridAveMod.F90 Offset 0x00000540 in procedure *__subgridavemod_NMOD_p2g_1d_stub_in___histfilemod_NMOD_hist_update_hbuf_field_1d Offset 0x00000670 in procedure __histfilemod_NMOD_hist_update_hbuf_field_1d, near line 1172 in file /fis/cgd/home/erik/clm_cropbr/models/lnd/clm/src/main/histFileMod.F90 Offset 0x000000fc in procedure __histfilemod_NMOD_hist_update_hbuf@OL@1 Location 0x09000000015f2d4c Location 0x09000000015eb758 Offset 0x000000dc in procedure _pthread_body --- End of call chain --- ==================================================================================== Bug Number: 1310 Some indices are different for differing number of threads Some of the 1d indices are different on the history files when differing number of threads is used. 034 erL83 TER.sh _nrsc_do clm_std^nl_urb 20020115:3600 5x5_amazon navy -5+-5 arb_ic .............FAIL! rc= 13 Everything's bit-for-bit up to... CLM_compare.sh: comparing clmrun.clm2.h1.2002-01-20-00000.nc with /ptmp/erik/test-driver.888958/TSM._nrsc_do.clm_std^nl_urb.20020115:3600.5x5_amazon.navy.-10.arb_ic/clmrun.clm2.h1.2002 -01-20-00000.nc CLM_compare.sh: files are NOT b4b RMS land1d_g 7 RMS cols1d_g 7 RMS cols1d_l 8 RMS pfts1d_g 7 RMS pfts1d_l 8 RMS pfts1d_c 12 We got around this by removing these fields from the history files. ==================================================================================== Bug Number: 1289 Problem reading in single-point CO2 stream file on franklin We verified this is a problem on franklin, but NOT other machines such as bluefire for example. Zack Subin Reports on the following problem on Franklin. The problem is a subscript out of range in a MCT subroutine being used by PIO. Hence why I've added Jim E. and Rob J. to the list of people. He's the running the following case documented in the CLM Users Guide. http://www.cesm.ucar.edu/models/cesm1.0/clm/models/lnd/clm/doc/UsersGuide/x2920.html#AEN2948 I think the thing that's unique here is that the CO2 file only has one datapoint. There might be an assumption that you are reading more datapoints and something isn't dimensioned right in MCT, PIO or in datm? Not sure which... I ran the same case on bluefire and it runs both with DEBUG on and off (as I say below). But, possibly bluefire is more forgiving on this subscript overflow than Franklin. Since Franklin is a pretty standard machine it would probably show up on other platforms as well. Here's Zack's message, with my previous message to him to give me some data to file the bug report with. I'm running I_1850-2000_CN, 1.9x2.5 deg, on Franklin with clm4_0_24. It is out of the box with the instructions for passing CO2 except for the location of the forcing files and the initial conditions file, and the number of tasks in drv_in:ccsm_pes: *_ntasks. The end of the standard output log reads: 0: Subscript out of range for array compbuf (rearrange.F90: 300) subscript=1, lower bound=1, upper bound=0, dimension=1 The end of the cpl.log reads: (seq_mct_drv) : Initialize each component: atm, lnd, ocn, and ice (seq_mct_drv) : Initialize atm component The end of the datm.log reads: (shr_dmodel_readLBUB) reading file: /global/homes/z/zmsubin/Scratch/clmdata/fco2_datm_1765-2007_c100309.nc 85 There is no lnd.log. It does not produce a core file. When I run with DEBUG off it runs normally, and the PCO2 is identical to what I get from a run in clm4_0_16 except that the point at (1, 1) has a nonzero PCO2 whereas it is 0 in clm4_0_16. When I follow your instructions for setting the number of atm pio tasks to 1, it still has the same error. --Zack I was able to replicate this problem on lynx: /glade/proj2/fis/cgd/home/erik/clm4_0_24/scripts/DATM_CO2_TSERIES cat /ptmp/erik/DATM_CO2_TSERIES/run/ccsm.log.110225-170538 ock size conversion in bytes is 4086.02 8 MB memory alloc in MB is 8.00 8 MB memory dealloc in MB is 0.00 Memory block size conversion in bytes is 4086.02 8 MB memory alloc in MB is 8.00 8 MB memory dealloc in MB is 0.00 8 MB memory dealloc in MB is 0.00 Memory block size conversion in bytes is 4086.02 Memory block size conversion in bytes is 4086.02 . . . . 0: Subscript out of range for array compbuf (rearrange.F90: 300) subscript=1, lower bound=1, upper bound=0, dimension=1 0: Subscript out of range for array compbuf (rearrange.F90: 300) subscript=1, lower bound=1, upper bound=0, dimension=1 [NID 00073] 2011-02-25 17:06:52 Apid 136614: initiated application termination Application 136614 exit codes: 127 Application 136614 exit signals: Killed Application 136614 resources: utime 0, stime 0 atm.log file ends with... (shr_dmodel_readLBUB) reading file: /glade/proj2/fis/cgd/cseg/csm/inputdata/atm/datm7/CO2/fco2_datm_1765-2007_c100614.nc 85 ==================================================================================== Bug Number: 1282 Trouble running datm8 to the last time-step for datasets with missing data The urban single-point sites all have only a portion of a complete year of atm forcing data. Hence, all of them abort with a dtlimit error when you try to run until the last time-step. This is because it reads in the data for the next time-step (thinking it needs to do a time-interpolation) and finds the difference in time-step is large (since it's over the part of the year with missing data). This is for the 1x1_mexicocityMEX, 1x1_vancouverCAN, and 1x1_urbanc_alpha sites, but would be the case for other datasets with missing time-periods. The fix is to change the datm namelist to add settings for tintalgo and dtlimit in the datm namelist as follows... &shr_strdata_nml . . . tintalgo = 'nearest','linear' dtlimit = 25000.,1.5 / Thus it will use the nearest point in time, and won't die with a dtlimit error, as we are setting the dtlimit to a very high value. ==================================================================================== Bug Number: 1164 Restart trouble for CNC13 with INTEL, PGI and LAHEY compilers 017 erR53 TER.sh 17p_cnc13sc_do clm_std^nl_urb 20020115:NONE:1800 10x15 USGS@1850 10+38 cold ....FAIL! rc= 13 Answers differ and gradually diverge in time. This could be a restart issue or a multi-processing or threading issue. ==================================================================================== Bug Number: 1163 CN finidat files have a bunch of fields with NaN's on it. For example on: $CSMDATA/ccsm4_init/I2000CN_f09_g16_c100503/0001-01-01/ \ I2000CN_f09_g16_c100503.clm2.r.0001-01-01-00000.nc the fields: mlaidiff, flx_absdv, flx_absdn, flx_absiv, flx_absin, and qflx_snofrz_lyr all have NaN's, with mlaidiff being completely full of NaN's (since mlaidiff is only defined for CLMSP or if drydep is on). ==================================================================================== Bug Number: 1127 interpinic not tested for CNDV, yet; expected not to work Interpinic has not worked for the old dgvm since probably before clm3.5. Interpinic has not been tested, yet, for CNDV. Therefore, we assume that it does not work. ==================================================================================== Bug Number: 1124 Reported energy for grid-cell is not quite right for pftdyn The amount of water is conserved correctly in pftdyn mode, but the energy isn't reported quite accurately. ==================================================================================== Bug Number: 1101 suplnitro=ALL mode is over-productive suplnitro=ALL mode is over-productive. This is because it provides unlimited Nitrogen. Fixing it requires using fnitr from the pft-physiology file, a different pft-physiology file with fnitr scaled appropriately and some code modifications to get this all to work. ==================================================================================== Bug Number: 1063 Problems restarting for CESM spinup data mode Exact restarts for the 1850 CN spinup compset fail on bluefire... ERS.f09_g16.I1850SPINUPCN.bluefire also the ERB test fails, and the ERB_D test fails with optimization set to zero. (note ERS for the I1850CN compset passes, it's just the SPINUP case that fails) In the coupler log file there's a single field that is different... The good thing is that it's a single field from the land model that's causing trouble... Comparing initial log file with second log file Error: /ptmp/erik/ERS.f09_g16.I1850SPINUPCN.bluefire.124426/run/cpl.log.091029-130401 and /ptmp/erik/ERS.f09_g16.I1850SPINUPCN.bluefire.124426/run/cpl.log.091029-130648 are different. >comm_diag xxx sorr 1 4.5555498624818352000E+16 recv lnd Sl_t 9999. Having dates of Y10K or more is sometimes useful for paleo simulations. For clm to get past the Y10K barrier -- it needs the subroutines set_hist_filename restFile_filename set_dgvm_filename changed to allow 5 or 6 digit years rather than just 4-digit ones. scripts, drv, and csm_share also have problems with Y10K as well. ====================================================================================