RELEASE NOTES FOR SLURM VERSION 2.3
28 July 2011


IMPORTANT NOTE:
If using the slurmdbd (SLURM DataBase Daemon) you must update this first.
The 2.3 slurmdbd will work with SLURM daemons of version 2.1.3 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it.  No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do.  Also, at least the first time you run the slurmdbd, make sure
your my.cnf file sets innodb_buffer_pool_size to at least 64M.
You can accomplish this by adding the line

innodb_buffer_pool_size=64M

under the [mysqld] section of the my.cnf file and restarting mysqld.
This is needed when converting large tables over to the new database schema.

SLURM can be upgraded from version 2.2 to version 2.3 without loss of jobs or
other state information.


HIGHLIGHTS
==========
* Support has been added for Cray XT and XE computers.
* Support has been added for BlueGene/Q computers.
* For architectures where the slurmd daemon executes on front end nodes (Cray
  and BlueGene systems) more than one slurmd daemon may be executed using more
  than one front end node for improved fault-tolerance and performance.
  NOTE: The slurmctld daemon will report the lack of a front_end_state file
  as an error when first started in this configuration.
* The ability to expand running jobs was added.
* The number of leaf switches allocated to a job and the maximum delay to get
  that leaf switch count can now be controlled.

CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
=============================================================
* In order to support more than one front end node, new parameters have been
  added to support a new data structure: FrontendName, FrontendAddr, Port,
  State and Reason.
* Added DebugFlags option of Frontend.
* Added new configuration parameter MaxJobId. Use with FirstJobId to limit
  range of job ID values.
* Added new configuration parameter MaxStepCount to limit the effect of
  bad batch scripts. The default value is 40,000 steps per job.
* Changed node configuration parameter from "Procs" to "CPUs". Both parameters
  will be supported for now.
* Added GraceTime to Partition and QOS data structures. Preempted jobs will be
  given this time interval before termination.
* Added AccountingStoreJobComment to control storing job's comment field in
  the accounting database.
* More than one TaskPlugin can be configured in a comma separated list.
* DefMemPerCPU, DefMemPerNode, MaxMemPerCPU and MaxMemPerNode configuration
  options added on a per-partition basis.
* SchedulerParameters can now control the maximum delay that a job can set in
  order to be allocated some desired leaf switch count by specifying a value
  for max_switch_wait.
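
The parameters listed above could be combined roughly as in the following
slurm.conf fragment. This is an illustrative sketch only: all host names,
partition names and numeric values are hypothetical, and the lines would not
necessarily all appear in the same configuration (front end definitions only
apply to Cray and BlueGene style systems). See "man slurm.conf" for the
authoritative syntax.

```conf
# Two front end nodes (hypothetical names and addresses)
FrontendName=frontend[0-1] FrontendAddr=efrontend[0-1] State=UNKNOWN
DebugFlags=Frontend

# Job IDs are limited to the range set by FirstJobId and MaxJobId
FirstJobId=1000
MaxJobId=999999
MaxStepCount=40000

AccountingStoreJobComment=YES
TaskPlugin=task/affinity,task/cgroup
SchedulerParameters=max_switch_wait=300

# "CPUs" is the new name for "Procs" in node definitions
NodeName=tux[0-15] CPUs=8 State=UNKNOWN
# GraceTime and the memory limits are now accepted per partition
PartitionName=debug Nodes=tux[0-15] GraceTime=120 DefMemPerCPU=1024 MaxMemPerCPU=2048
```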

COMMAND CHANGES (see man pages for details)
===========================================
* Added scontrol ability to get and set front end node state.
* Added scontrol ability to set slurmctld's DebugFlags.
* Added scontrol ability to increment or decrement a job or step time limit.
* Added new scontrol option of "show aliases" to report every NodeName that is
  associated with a given NodeHostName when running multiple slurmd daemons
  per compute node (typically used for testing purposes).
* Added new squeue option of -R/--reservation as a job filter.
* A reservation flag of "License_Only" has been added for use by the sview and
  scontrol commands. If set, then jobs using the reservation may use the
  licenses associated with it plus any compute nodes. Otherwise the job is
  limited to the compute nodes associated with the reservation.
* The dependency option of "expand" has been added. This option identifies a
  job whose resource allocation is intended to be used to expand the allocation
  of another job. See http://www.schedmd.com/slurmdocs//faq.html#job_size
  for a description of its use.
* Added --switches option to salloc, sbatch and srun commands to control the
  desired number of switches allocated to a job and the maximum delay before
  starting the job with more leaf switches.
* Added scontrol ability to modify a job's desired switch count or delay.
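
As a rough illustration of how a --switches request interacts with the
max_switch_wait scheduler parameter, the Python sketch below parses a value of
the form count[@max-time] and limits the requested delay to the configured
maximum. The function names are hypothetical, max-time is assumed to be given
in whole minutes, max_switch_wait is assumed to be in seconds, and the rule
that a job without a max-time waits up to max_switch_wait is also an
assumption. It is not SLURM's actual implementation.

```python
def parse_switches(spec):
    """Split 'count[@minutes]' into (count, wait_seconds or None)."""
    count_text, _, wait_text = spec.partition("@")
    count = int(count_text)
    if count < 1:
        raise ValueError("switch count must be positive")
    # Assumption: max-time is expressed in whole minutes only.
    wait = int(wait_text) * 60 if wait_text else None
    return count, wait


def effective_wait(requested, max_switch_wait):
    """Limit a job's requested delay (seconds) to the configured maximum."""
    if requested is None:
        # Assumption: with no max-time, the configured maximum applies.
        return max_switch_wait
    return min(requested, max_switch_wait)


count, wait = parse_switches("2@10")
print(count, effective_wait(wait, max_switch_wait=300))
```

Here a job asking for two leaf switches with a ten minute delay would, under
these assumptions, wait no longer than the 300 second limit.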

BLUEGENE SPECIFIC CHANGES
=========================
* Bluegene/Q support added.
* The select/bluegene plugin has been substantially re-written.

OTHER CHANGES
=============
* Improved accuracy of estimated job start time for pending jobs. This should
  substantially improve scheduling of jobs eligible to execute on more than one
  cluster.
* Job dependency information will only show the currently active dependencies
  rather than the original dependencies.
* Added a reservation flag of "License_Only". If set, then jobs using the
  reservation may use the licenses associated with it plus any compute nodes.
* Added proctrack/cgroup and task/cgroup plugins to support Linux cgroups.

API CHANGES
===========

Changed members of the following structs
========================================
block_info_t
	Added	     job_list
	Added        used_mp_inx
	Added        used_mp_str
	bp_inx    -> mp_inx
	conn_type -> conn_type[DIMENSIONS]
	ionodes   -> ionode_str
	nodes     -> mp_str
	node_cnt  -> cnode_cnt

job_desc_msg_t
	conn_type -> conn_type[DIMENSIONS]

job_step_info_t
	Added	    select_jobinfo

partition_info_t
	Added       def_mem_per_cpu and max_mem_per_cpu

Added the following struct definitions
======================================
block_job_info_t		entirely new structure

front_end_info_msg_t		entirely new structure

front_end_info_t		entirely new structure
	
job_info_t
	batch_host		name of the host running the batch script
	batch_script		contents of batch script
	preempt_time		time that a job became preempted
	req_switches		maximum number of leaf switches
	wait4switches		maximum delay to get desired leaf switch count

job_step_create_response_msg_t
	select_jobinfo		data needed from the select plugin for a step

job_step_info_t
	select_jobinfo		data needed from the select plugin for a step

node_info_t
	node_addr		communication name (optional)
	node_hostname		node's hostname (optional)

partition_info_t
	grace_time		preempted job's grace time in seconds

slurm_ctl_conf
	acctng_store_job_comment  if set, store job's comment field in
				accounting database
	max_job_id		maximum supported job id before starting over
				with first_job_id
	max_step_count		maximum number of job steps permitted per job

slurm_step_layout
	front_end		name of front end host running the step

slurmdb_qos_rec_t
	grace_time		preempted job's grace time in seconds

update_front_end_msg_t		entirely new structure
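
The max_job_id description above ("maximum supported job id before starting
over with first_job_id") can be pictured with the small Python sketch below.
The function name is hypothetical, and whether max_job_id itself is ever
assigned before wrapping is an assumption made here for illustration; check
the slurmctld behavior on your system before relying on the exact boundary.

```python
def next_job_id(current, first_job_id, max_job_id):
    """Return the job ID following `current`, wrapping to first_job_id."""
    candidate = current + 1
    # Assumption: max_job_id is the last usable value before wrapping.
    if candidate > max_job_id:
        return first_job_id
    return candidate


ids = [1000]
for _ in range(3):
    ids.append(next_job_id(ids[-1], first_job_id=1000, max_job_id=1002))
print(ids)  # [1000, 1001, 1002, 1000]
```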


Changed the following enums and #defines
========================================
job_state_reason
	FAIL_BANK_ACCOUNT -> FAIL_ACCOUNT
	FAIL_QOS        	/* invalid QOS */
	WAIT_QOS_THRES        	/* required QOS threshold has been breached */

select_jobdata_type (Size of many data structures increased)
	SELECT_JOBDATA_BLOCK_NODE_CNT /* data-> uint32_t block_cnode_cnt */
	SELECT_JOBDATA_BLOCK_PTR /* data-> bg_record_t *bg_record */
	SELECT_JOBDATA_DIM_CNT   /* data-> uint16_t dim_cnt */
	SELECT_JOBDATA_NODE_CNT  /* data-> uint32_t cnode_cnt */
	SELECT_JOBDATA_PAGG_ID   /* data-> uint64_t job container ID */
	SELECT_JOBDATA_PTR	 /* data-> select_jobinfo_t *jobinfo */
	SELECT_JOBDATA_START_LOC /* data-> uint16_t
				  * start_loc[SYSTEM_DIMENSIONS] */
select_jobdata_type (Added)
	SELECT_PRINT_START_LOC   /* Print just the start location */
select_jobdata_type (Names changed)
	SELECT_GET_BP_CPU_CNT --> SELECT_GET_MP_CPU_CNT
	SELECT_SET_BP_CNT     --> SELECT_SET_MP_CNT

select_nodedata_type
	SELECT_NODEDATA_PTR     /* data-> select_nodeinfo_t *nodeinfo */

select_print_mode
	SELECT_PRINT_START_LOC	/* Print just the start location */

select_type_plugin_info no longer exists. Its contents are now mostly #defines.

DEBUG_FLAG_FRONT_END		added DebugFlags of Frontend

JOB_PREEMPTED			added new job termination state to indicate
				job termination was due to preemption

RESERVE_FLAG_LIC_ONLY		reserve licenses only, use any nodes

TRIGGER_RES_TYPE_FRONT_END	added trigger for frontend state changes


Added the following APIs
========================
slurm_free_front_end_info_msg	free front end state information
slurm_init_update_front_end_msg	initialize data structure for front end update
slurm_load_front_end		load front end state information
slurm_print_front_end_info_msg	print all front end state information
slurm_print_front_end_table	print state information for one front end node
slurm_set_debugflags		set new DebugFlags in slurmctld daemon
slurm_sprint_front_end_table	output state information for one front end node
slurm_update_front_end		update state of front end node


Changed the following APIs
==========================
