Compiling WRF and WPS
Note #1: These instructions were last updated 16 Nov 2016, and the system architecture may have changed since then. Before following them, verify that the libraries referenced here (e.g., /opt/intel-soft/netcdf3) still exist, and contact the system administrators if you need help locating system directories.
Note #2: These instructions are for WRFv3.7.1 but shouldn't differ greatly for newer versions. The information provided was adapted from the ARW Online Tutorial for use on our system. Before you begin, please review the first several chapters of the user's manual for the new Bright Computing system that has been installed on Chaos, or at least Chapter 2 on the new environment and usage of modules.
Step 1. Log on to chaos and set up your environment.
The following modules are necessary. Add them, as listed below, to the .bashrc file in your home directory so you don't have to load them every time you log on.
module load shared
module load gcc/5.1.0
module load intel/compiler/64/16.0.3/2016.3.210
module load intel/mkl/64/11.1/2013_sp1.3.174
module load intel/mpi/64/5.1.3/2016.3.210
module load torque/5.1.0
Additionally, you will have to source ifortvars.sh so that executables can see the shared Intel Fortran library. On the command line, type
source /cm/shared/apps/intel/compilers_and_libraries/2016.3.210/bin/ifortvars.sh intel64
Next, make sure you have the path to NETCDF correct. This can be added to your .bashrc file.
export NETCDF=/opt/intel-soft/netcdf3
In fact, go ahead and add the above and also the following lines to your .bashrc file in your home directory (e.g. /home/sbuwrf/.bashrc):
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export JASPERLIB=/opt-soft/lib
export JASPERINC=/opt-soft/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel-soft/lib
An example .bashrc file with these changes is found at /D0/sbuwrf/REALTIME/WRFv3.7.1/test_run/example_bashrc_file.txt.
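Put together, the relevant section of your .bashrc might look like the following sketch (every line here is taken from the module loads and exports above):

# Modules needed for compiling/running WRF and WPS on chaos
module load shared
module load gcc/5.1.0
module load intel/compiler/64/16.0.3/2016.3.210
module load intel/mkl/64/11.1/2013_sp1.3.174
module load intel/mpi/64/5.1.3/2016.3.210
module load torque/5.1.0

# Library paths for WRF and WPS
export NETCDF=/opt/intel-soft/netcdf3
export WRFIO_NCD_LARGE_FILE_SUPPORT=1
export JASPERLIB=/opt-soft/lib
export JASPERINC=/opt-soft/include
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel-soft/lib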
You'll still have to source ifortvars.sh each time you log on, though, even if you added everything else to your .bashrc file. Alright, now that your environment is good to go, and you have unpacked WRF and WPS with the following commands,
gunzip WRFV3.TAR.gz (if the file is gzipped, otherwise just go to the next command)
tar -xf WRFV3.TAR
gunzip WPS.TAR.gz (if the file is gzipped, otherwise just go to the next command)
tar -xf WPS.TAR
then we should be ready to move on. Note: we keep communal geographical data to save space on chaos. It is located at /D0/sbuwrf/REALTIME/WRFv3.7.1/geog/, and where to specify that path in the namelist.wps file is mentioned later on.
Step 2. Configure and compile WRF.
Within your WRFV3/ directory (e.g. /D0/sbuwrf/REALTIME/WRFv3.7.1/WRFV3) configure the WRF by typing
./configure
Several options are available. The one we want is #15, INTEL (ifort/icc) dmpar (the distributed-memory option with MPI). Then select "1" for basic nesting. This will create a configure.wrf file, which we shouldn't need to edit (stay tuned for WPS compilation... yahoo!).
Before compiling, we have to set one more environmental variable.
export WRF_EM_CORE=1
Okay, we are ready to compile. We want to direct the compile output to a log which will help with troubleshooting if we had any errors.
./compile em_real >& compile.log
If you add an ampersand (&) after compile.log (i.e., ./compile em_real >& compile.log &), the compile will run in the background so you can tail the log file from the same terminal; otherwise you'll have to open another window, log on to chaos, and run the following:
tail -f compile.log
This will keep you apprised of all the Fortran and C intricacies of the WRF as they stream by. It takes about 45 minutes to complete, so go read a journal article or something to pass the time! When finished, you should see a printout that includes ----> Executables successfully built <---- followed by a list of the executables. To check that they were built, go to ./run or ./main and look for ndown.exe, real.exe, tc.exe, and, most importantly, wrf.exe.
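If you'd rather check from the command line, a quick look (assuming you run these from the WRFV3 directory) might be:

grep "Executables successfully built" compile.log
ls -l main/*.exe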
If something went wrong, see the ARW Online Tutorial, Google the issue, then clean up and repeat the steps above:
./clean -a
./configure
./compile em_real >& compile.log2
If your executables were built, then congrats and proceed to compiling WPS!
Step 3. Configure and Compile WPS.
Move into the WPS directory (e.g., /D0/sbuwrf/REALTIME/WRFv3.7.1/WPS/). We start by configuring.
./configure
The option we want here is #19, Linux x86_64, Intel compiler, dmpar (distributed memory with MPI), which allows GRIB2. That generates a configure.wps file that we do need to edit. Specifically, the JASPER, PNG, and zlib paths need to be set correctly within the configure.wps script.
Change the following lines in configure.wps to the values below. Make the edits in the section that follows "# Settings for Linux x86_64, Intel compiler (dmpar)", not in the "Architecture specific settings" section, whose entries are intentionally left blank.
COMPRESSION_LIBS = -L/opt-soft/lib -ljasper -L/opt/intel-soft/lib -lpng -L/opt/intel-soft/lib -lz
COMPRESSION_INC = -I/opt-soft/include
DM_FC = mpif90 -f90=ifort
DM_CC = mpicc -cc=icc -DMPI2_SUPPORT
Everything else should be fine at its default setting.
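To double-check that your edits took, a quick grep of configure.wps should echo back the four lines you changed:

grep -E "COMPRESSION_LIBS|COMPRESSION_INC|DM_FC|DM_CC" configure.wps

Now we are ready to compile.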
./compile >& compile_wps.log
Again, adding an ampersand will allow that to run in the background so you can do a tail on it.
tail -f compile_wps.log
Compiling WPS doesn't take as long as compiling the WRF (< 10 minutes), but it doesn't end with a pretty message letting you know it's done. Once the compilation has completed, the following executables should have been built right in the same directory: geogrid.exe, metgrid.exe, and ungrib.exe. The most common issue is that ungrib.exe was not successfully built (i.e., it's missing). This traces back to the compression libraries (JASPER, PNG, zlib) whose paths we edited within the configure.wps file. Make sure those edits are correct (N.B. in COMPRESSION_LIBS the "-l" flags prefacing "jasper", "png", and "z" use a lower-case L as in llama, while the COMPRESSION_INC flag uses a capital I as in island). If you are missing one or more executables, also confirm that you sourced ifortvars.sh (see Step 1), then clean up and redo the process:
./clean -a
./configure
Then edit the configure.wps file as indicated above.
./compile >& compile_wps.log2
If your executables were built, then congrats -- you are ready to run the WRF!
Running WPS and WRF Manually
Before you begin, please review the first several chapters of the user's manual for the new Bright Computing system that has been installed.
Step 1. Log on to chaos and set up your environment.
The following modules are necessary. Add them, as listed below, to the .bashrc file in your home directory so you don't have to load them every time you log on.
module load shared
module load gcc/5.1.0
module load intel/compiler/64/16.0.3/2016.3.210
module load intel/mkl/64/11.1/2013_sp1.3.174
module load intel/mpi/64/5.1.3/2016.3.210
module load torque/5.1.0
Additionally, you will have to source ifortvars.sh so that executables can see the shared Intel Fortran library.
source /cm/shared/apps/intel/compilers_and_libraries/2016.3.210/bin/ifortvars.sh intel64
Finally, make sure the path to NETCDF is correct and that LD_LIBRARY_PATH includes the extra directory below. Again, these can be added to your .bashrc file in your home directory (e.g., /home/sbuwrf/.bashrc).
export NETCDF=/opt/intel-soft/netcdf3
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel-soft/lib
You'll still have to source ifortvars.sh each time you log on, though, even if you added everything else to your .bashrc file.
An example .bashrc file with these changes is found at /D0/sbuwrf/REALTIME/WRFv3.7.1/test_run/example_bashrc_file.txt. Again, your actual .bashrc will be located in your own directory within home/, e.g. /home/sbuwrf/.bashrc.
Step 2. Create your namelists and run geogrid, ungrib, and metgrid.
Instructions on how to complete each step of the WRF Pre-processing System (WPS) manually are provided here with more general information found on the ARW Online Tutorial. In your WPS directory, you should see a set of executables (geogrid.exe, ungrib.exe, metgrid.exe). If not, you'll have to re-compile. If you have those, you should be good to proceed!
Start by creating your namelist.wps within your WPS directory -- you'll need to edit the following line:
geog_data_path = '/D0/sbuwrf/REALTIME/WRFv3.7.1/geog/'.
Within namelist.wps, make sure your input dates are correct, your domain settings are good, and entries like "opt_geogrid_tbl_path = '/YourPathTo/WRFv3.7.1/WPS/geogrid'" and "opt_metgrid_tbl_path = '/YourPathTo/WRFv3.7.1/WPS/metgrid'" are correct for your simulation.
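For orientation, here is a minimal illustrative excerpt of the relevant namelist.wps entries (the dates, max_dom, and interval are placeholders for your own simulation; only geog_data_path and the table paths come from this guide):

&share
 wrf_core = 'ARW',
 max_dom = 1,
 start_date = '2016-05-19_12:00:00',
 end_date = '2016-05-20_00:00:00',
 interval_seconds = 10800,
/
&geogrid
 geog_data_path = '/D0/sbuwrf/REALTIME/WRFv3.7.1/geog/',
 opt_geogrid_tbl_path = '/YourPathTo/WRFv3.7.1/WPS/geogrid',
/
&metgrid
 opt_metgrid_tbl_path = '/YourPathTo/WRFv3.7.1/WPS/metgrid',
/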
Run geogrid.exe on the command line once your namelist.wps is all set. An example namelist.wps is found at /D0/sbuwrf/REALTIME/WRFv3.7.1/WPS/namelist.wps.
./geogrid.exe
The process should conclude with the output message "! Successful completion of geogrid. !" and geo_em.d0#.nc files (where # = domain number) should be present.
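For a quick sanity check on those files, you can dump the netCDF header (assuming the netCDF utilities, i.e., ncdump, are on your PATH):

ncdump -h geo_em.d01.nc | head -20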
Next, download your data, put it in a directory, and link to it from within your WPS/ directory with the following command:
./link_grib.csh path_to_data (e.g., ./link_grib.csh /D0/sbuwrf/REALTIME/NAM/*.grb2)
No output should appear, but if you list the files in the directory you'll see that the initial/boundary-condition files you plan to use are now linked under names like GRIBFILE.AAA, GRIBFILE.AAB, etc.
Next you'll have to link the variable table that you want to use which corresponds to what data you are using as your initial/lateral boundary conditions. For example, if I'm running with NARR data I'll want to use the Vtable.NARR found in /WPS/ungrib/Variable_Tables/.
ln ./ungrib/Variable_Tables/Vtable.??? Vtable (where ??? is the one you want-- e.g., ln ./ungrib/Variable_Tables/Vtable.NARR Vtable)
Once you see GRIBFILE.AAA, GRIBFILE.AAB, etc. in your WPS directory (generated by the linking in the previous step) and have linked your Vtable, run ungrib.exe on the command line.
./ungrib.exe
Ungrib has worked when you see the message "! Successful completion of ungrib. !" printed out, along with files named like "FILE:YYYY-MM-DD_HH", one for each IC/BC time.
Next you are ready for metgrid. Run metgrid.exe on the command line with
./metgrid.exe
Don't worry about any NetCDF errors printed out; they aren't detrimental and just mean metgrid couldn't find certain attributes it was looking for.
After metgrid has completed, you should see the message "! Successful completion of metgrid. !" and files named met_em.d0#.YYYY-MM-DD_HH:MM:SS.nc, where # is the domain number and the timestamp corresponds to the date of the file used. Once you have those met_em* files in your WPS directory, copy or move them over to your WRFV3/run/ directory.
cp met_em.d0*.nc ../WRFV3/run/
You have completed all of the WPS steps!
Now change into the WRFV3/run directory where you should see all of your WRF executables (real.exe, wrf.exe). You'll need to edit your namelist.input file with all of the correct dates and domain information, i.e. the same that was used in your namelist.wps file. An example namelist.input is found at /D0/sbuwrf/REALTIME/WRFv3.7.1/WRFV3/run/namelist.input.
The WRF steps require a different approach than what we used on the "old chaos" as we'll discuss in the next step.
Step 3. Create a job script for real.exe.
This is the part that is very different than the "old" chaos. Remember how we had to specify a node when we used mpirun and we had to keep tabs on who is using what nodes so we didn't accidentally run on top of each other? I hope you won't miss that because now we are going to use a batch submission system where you can just tell it how many nodes and processors to use and how long or short your run should be and the system will allocate your job in a fair and automatic way. It's like when a soccer team goes out for ice cream after a game-- everyone can rush the counter at once, but instead of no one getting ice cream because the staff is overloaded, there is a system that will instill order and allocate the job to available resources so everyone gets their cone in a timely fashion. Now I want ice cream.
The job submission script --> things to note:
- It's just a bash script so it must start with the whole shebang thing -- #!/bin/bash
- #PBS followed by some options will set required PBS variables (PBS = portable batch system)
- ##PBS will do nothing, as 2 hashtags/pound signs/octothorps will indicate a comment
- The whole point is that this is where you set all the variables, specify where you want the output to go, and what commands you want to run on the nodes.
- Note: We use Torque as our PBS workload management software (Chapter 7 of the User's Guide)
Now that you understand the components of a job script, let's go through an example. Say I want to run real.exe on one node using 64 processors. Within the working directory that holds real.exe, my edited namelist.input, and all of my met_em files, I will create a job script called real.pbs:
#!/bin/bash
#PBS -l nodes=1:ppn=64
#PBS -N run_real
#PBS -q shortq
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
/cm/shared/apps/intel/compilers_and_libraries/2016.3.210/mpi/intel64/bin/mpirun $PBS_O_WORKDIR/real.exe
So let's break down the few things I specified in the job submission script.
#PBS -l nodes=1:ppn=64 --> -l (as in llama) tells the script I want to run on one node while using 64 processors.
#PBS -N run_real --> -N allows me to name the process and will be the prefix for the error and output files that are created when the job is submitted.
#PBS -q shortq --> tells which queue to submit the job to. Our cluster has two queues -- shortq and longq. If you have a quick application (like real.exe), feel free to submit to shortq, which has less "walltime" (available processing time). Don't submit a longer job, like wrf.exe for a 30-hour simulation, to shortq... you'll make many enemies. Save that one for longq, please and thank you!
echo Working directory is $PBS_O_WORKDIR --> this is completely optional. When you submit a job, the default is for output to be written to your home/ directory, and we all know the home directories are not where you want output written (not enough storage). Luckily there's a PBS variable, $PBS_O_WORKDIR, which is automatically set to the directory from which you launched the job (i.e., your working directory). Assuming that's where you want the output written, all you have to do is include a line to change into that directory: cd $PBS_O_WORKDIR.
/cm/shared/apps/intel/compilers_and_libraries/2016.3.210/mpi/intel64/bin/mpirun $PBS_O_WORKDIR/real.exe --> this is the whole point of the script, right? We want to run real.exe using mpirun. On the old chaos system you had to specify options like "mpirun -np 64"; now those are set by the #PBS lines at the top, so the line is just of the form "full_path_to/mpirun command". Make sure you are referencing the correct mpirun path; see Step 5 for troubleshooting.
Step 4. Check that the nodes are up and running.
To list the status ("state") of each node, plus a ton of other information about them, issue the following command:
pbsnodes -a
Nodes "states" are "free" and their "power_state"'s are "Running"? Cool beans! We have a working cluster at your disposal.
If you are curious about what other processes are going on currently on the nodes, the following command will list out all ongoing jobs.
qstat -a
If you are super-duper curious about the details of those jobs, then you can use the following command, you snoop.
qstat -f
Relax if nothing happens when you type those commands; it just means no jobs are being run at the moment.
Step 5: Submit a job with your job script for real.exe.
Using Torque, once you have a job script ready to go (e.g. real.pbs), the command to submit it to the batch processing system is as follows:
qsub real.pbs (or qsub your_script.whatever)
The job ID should be printed to the terminal and will look like ##.chaos.cm.cluster.
To check that it's running, look for your job ID in the "Job ID" column after issuing the following command (in the status ("S") column, R = running, E = exiting, C = cancelled/completed):
qstat -a
If something happened and you need to delete/cancel the job then type:
qdel job_id (e.g. qdel 42.chaos.cm.cluster)
If you encountered issues (the process took too long or never stopped), and you specified more than one processor within a node (e.g., 64) but only got one rsl.error.0000/rsl.out.0000 instead of 64 of those suckers, then the correct MPI module wasn't loaded or was overridden by another MPI. The output of "which mpirun" should match the path to mpirun in your real.pbs script; for example, if "which mpirun" returns "/cm/shared/apps/openmpi/intel/64/1.8.5/bin/mpirun" while your script references the Intel MPI path, there's a mismatch. The MPI also has to match the one that was the default when your WRF build was compiled. Relax if you don't remember! There are really only two it could be, so we'll see which works. If openmpi is loaded, you may need to reload the correct module. Remove the openmpi module first
module rm openmpi/intel/64/1.8.5
and then check "which mpirun" again and edit the path in your real.pbs to that mpirun. You should be good!
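Putting that together, the diagnostic sequence might look like this (the module name assumes the openmpi module mentioned above is the culprit):

which mpirun                       # should match the mpirun path in real.pbs
module list                        # see which MPI modules are loaded
module rm openmpi/intel/64/1.8.5   # remove the conflicting openmpi module, if loaded
which mpirun                       # confirm it now matches the path in your script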
Submit your job again (e.g., qsub real.pbs). This time you should see as many rsl.error/out.00## files as processors you requested and hopefully your job will complete.
Once your job has completed, your output should be in your working directory (wrfbdy_d01 and wrfinput_d0# files), along with error and output files named job_name.e## and job_name.o##, where ## corresponds to your job ID number (e.g., run_real.e42 and run_real.o42). Do a quick more on those (especially the error file) when troubleshooting if something went wrong. If those two files aren't there, do a qstat -a and see if your job is still running. Sometimes things get hung up, and then it'll be qdel to the rescue!
Step 6: Create and submit a job script for wrf.exe.
After real.exe has completed and the wrfbdy_d01 and wrfinput_d0# files were created, create a job script for wrf.exe. It is incredibly similar to the job script for real.exe, except that you'll want to submit your job to longq, not shortq. Here's an example wrf.pbs script:
#!/bin/bash
#PBS -l nodes=1:ppn=64
#PBS -N run_wrf
#PBS -q longq
#echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
/cm/shared/apps/intel/compilers_and_libraries/2016.3.210/mpi/intel64/bin/mpirun $PBS_O_WORKDIR/wrf.exe
Once the job script is created and you are all set to go, submit the job to run the WRF.
qsub wrf.pbs
You can check that it is running with the qstat command, then wait and see if wrfout files are being generated in your run directory.
qstat -a
And now you wait. For reference, a 6-hour simulation with 12-, 4-, and 1.33-km domains took 1 hour and 2 minutes to complete using 64 processors on one node. Not too shabby! If you run into any issues and the simulation doesn't complete, qdel job_id will save you. Once your run has completed, there should be error and output files in your $PBS_O_WORKDIR (e.g., run_wrf.e44, run_wrf.o44). A quick more on those should alert you if there were any issues. Otherwise, good job running the WRF, and have fun analyzing your simulation!
Post-processing WRF Data
RIP - Read/Interpolate/Plot
This is the software that allows for quick-and-easy plotting of meteorological fields. The SBU-WRF has used RIP for its graphics mainly because of speed; written in Fortran, it's an incredibly efficient package. The steps below for compiling RIP on chaos borrow from those outlined in the ARW Online Tutorial.
Step 1. Download RIP4 and set up your environment.
The source code can be downloaded here and should be unpacked within a WRF directory alongside WPS and WRFV3 (e.g., /D0/sbuwrf/WRFv3.7.1/RIP4).
To unpack the RIP4.tar.gz file issue the following commands:
gunzip RIP4.tar.gz
tar xf RIP4.tar
Next we want to make sure your environment is set up correctly and add a few more environmental variables.
The following modules are necessary. Add them, as listed below, to the .bashrc file in your home directory so you don't have to load them every time you log on.
module load shared
module load gcc/5.1.0
module load intel/compiler/64/16.0.3/2016.3.210
module load intel/mkl/64/11.1/2013_sp1.3.174
module load intel/mpi/64/5.1.3/2016.3.210
module load torque/5.1.0
Additionally, you will have to source ifortvars.sh so that executables can see the shared Intel Fortran library.
source /cm/shared/apps/intel/compilers_and_libraries/2016.3.210/bin/ifortvars.sh intel64
Finally, make sure the path to NETCDF is correct and that LD_LIBRARY_PATH contains the required directory below. Again, these can be added to your .bashrc file in your home directory (e.g., /home/sbuwrf/.bashrc).
export NETCDF=/opt/intel-soft/netcdf3
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel-soft/lib
You'll still have to source ifortvars.sh each time you log on, though, even if you added everything else to your .bashrc file.
An example .bashrc file with these changes is found at /D0/sbuwrf/REALTIME/WRFv3.7.1/test_run/example_bashrc_file.txt. Again, your actual .bashrc will be located in your own directory within home/, e.g. /home/sbuwrf/.bashrc.
The additional environmental variables that will need to be set up and can be added to your .bashrc file are as follows:
export RIP_ROOT=/D0/sbuwrf/WRFv3.7.1/RIP4
Next, we have to set the $CC environmental variable from a GNU version to an Intel version with a simple command:
export CC=icc
Next, check that your $NCARG_ROOT is set to /opt/ncl with the following command:
echo $NCARG_ROOT
In order for RIP to compile, temporarily (i.e., don't edit this within your .bashrc file) set a new $NCARG_ROOT with the following command:
export NCARG_ROOT=/opt/cloud.opt/opt/ncl-intel
The rest we'll have to set after we issue the configure command.
Step 2. Configure and compile RIP.
Now we are ready to move on to configuring and compiling RIP! To configure, type
./configure
and choose "4" for the Intel compiler.
Next we have to edit a few things in our configure.rip file in order for RIP to compile correctly for our environment.
The line for "LOCAL_LIBS" has to be changed from what was there to the following:
LOCAL_LIBS = -L/usr/lib64 -lX11 -lgfortran -lcairo
where the "-lX11 -lgfortran -lcairo" are dash lower-case l as in "llama" to link to the /usr/lib64 directory that has the main library link ("-L").
Now we are ready to compile which is done by typing the following:
./compile >& compile.log &
The "&" at the end allows it to compile in the background so you can issue "tail -f compile.log" to see its progress within the same terminal.
If RIP successfully compiled, you should see the following among all of the executables: rip.exe, ripdp_wrfarw.exe. These are the two main ones that we use. Congrats on compiling RIP!
Step 3. Prepare the WRF data for RIP with RIPDP.
Unlike other software such as NCL, RIP cannot work with raw WRF output files. The WRF data has to undergo data preparation ("DP"), i.e., be run through RIPDP.
If you have all of your wrfout files in a directory, then from within the $RIP_ROOT directory (e.g., /D0/sbuwrf/REALTIME/WRFv3.7.1/RIP4/) issue the following command to create RIP-compatible files that process all of the wrfout variables:
./ripdp_wrfarw /path_put/RIPDP_files/wrfout all /path_of_wrfout/wrfout_d0#*
The above command breaks down as follows:
- ./ripdp_wrfarw is the command to issue on your wrfarw output data (you must be in your RIP4 directory to issue this command).
- /path_put/RIPDP_files/wrfout is where you want the RIPDP files to go; the directory must already exist, and the files will be named with the prefix "wrfout" (e.g., /D0/sbuwrf/REALTIME/ripdp/wrfout).
- all --> You can choose either "all" or "basic" to control how many extra RIPDP files you get. Each processed variable is its own file, so they add up; if space is an issue, you might be okay with specifying "basic".
- /path_of_wrfout/wrfout_d0#* is the path to the wrfout files you want to process (e.g., /D0/sbuwrf/REALTIME/NAM/2016051912/wrfout_d03*). ripdp_wrfarw appears to work on only one domain at a time, so you'll have to repeat the command for each domain you want; see the loop sketch after this list.
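A minimal sketch of such a loop, using the placeholder paths above (the per-domain output prefix is my own addition so each domain's files stay separate):

for d in 1 2 3; do
    ./ripdp_wrfarw /path_put/RIPDP_files/wrfout_d0${d} all /path_of_wrfout/wrfout_d0${d}*
done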
Once that step has completed, there should be files with the prefix "wrfout" and a bunch of variable names as well as wrfout.minfo and wrfout.xtimes in the ripdp output directory.
Step 4. Create a RIP namelist.
Namelists are the heart and soul of RIP. They are the text files that let RIP know what to plot and how. They follow a very specific format which, in the interest of time, I will not get into here. Details can be found on the ARW Online Tutorial and within the RIP User's Guide (the latter is the RIP bible that you should definitely bookmark!).
Example namelists are found at /D0/sbuwrf/REALTIME/infiles/; the ones with "operational_d0#" names are actually a ton of namelists in one single file (for advanced users only).
Step 5. Run RIP.
Now you are ready to run RIP. Within your $RIP_ROOT directory (where rip.exe lives) enter:
rip -f /path_put/RIPDP_files/wrfout example_rip.in
Step 6. Post-process RIP output.
RIP outputs .cgm files, which are interactive but a little awkward to deal with. To view them, use the "idt" command.
idt example_rip.cgm
You can have it output pdf or ps images directly by setting ncarg_type = pdf or ps in the top portion of your namelist. If you leave the output as .cgm (NCAR Graphics Metafile), you can convert it with the ctrans command that ships with NCL.
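For example, a sketch of converting a .cgm file to PostScript with ctrans (ps.color is a standard NCAR Graphics device name):

ctrans -d ps.color example_rip.cgm > example_rip.ps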