IBM z/OS MVS Spooling : a Brief Introduction

This is a brief introduction to IBM z/OS MVS mainframe spooling.

Spooling means that a holding area on disk is used for input jobs waiting to run and for output waiting to print.

The holding area is called spool space.  The imagery of spooling was probably taken from processes that wind some material such as thread, string, fabric or gift wrapping paper onto a spindle or spool.  In effect, spooling is a near-synonym for queueing.  Those paper towels on that roll in the kitchen are queued up waiting for you to use each one in turn, just like a report waiting to be printed or a set of JCL statements waiting for its turn to run.

On very early mainframe computers, the system read input jobs from a punched card reader, one line at a time, one line from each punched card.  It wrote printed output to a line printer, one line at a time.  Compared to disks – even the slower disks used decades ago – the card readers and line printers were super slow.  Bottlenecks, it might be said.  The system paused its other processing and waited while the next card was read, or while the next line was printed.  So that methodology was pretty well doomed.  It was okay as a first pass at getting a system to run — way better than an abacus — but that mega bottleneck had to go.  Hence came spooling.

HASP, the Houston Automatic Spooling Priority program (variously called a program, a system, or a subsystem), was early spooling software used with OS/360 (the ancestral precursor of z/OS MVS).  (See the HASP origin story, if interested.)  HASP was the basis for the development of JES2, which today is the most widely used spooling subsystem for z/OS MVS systems.  Another fairly widely used current spooling subsystem is JES3, based on an alternate early system called ASP.  We will focus on JES2 in this article because it is more widely used.  JES stands for Job Entry Subsystem.  In fact JES subsystems oversee both job entry (input) and the processing of sysout (SYStem OUTput).

Besides simply queueing the input and output, the spooling subsystem schedules it.  The details of the scheduling form the main point of interest for most of us. Preliminary to that, we might want to know a little about the basic pieces involved.

The basic pieces

There are input classes, also called job classes, that control scheduling and resource limits

There are output classes, also called sysout classes, that control output print

There are real physical devices (few card readers, but many variations of printers and vaguely printer-like devices)

There are virtual devices. One virtual device is the “internal reader” used for software-submitted jobs, such as those sent in using the TSO submit command or FTP.  Virtual output devices include “external writers”.  An external writer is a program that reads and processes sysout files, and such a program can route the output to any available destination.  Many sysout files are never really printed, but are viewed (and further processed) directly from the spool space under TSO using a software product like SDSF.

There is spool sharing.  A JES2 spool space on disk (shared disk, called shared DASD) can be shared between two or more z/OS MVS systems with JES2 (with a current limit of 32 systems connected this way).  Each such system has a copy of JES2 running. Together they form a multi-access spool configuration (MAS).  Each JES2 subsystem sharing the same spool space can start jobs  from the waiting input queues on the shared spool, and can also select and process output from the shared spool.

There is checkpointing. This is obviously especially necessary when spool sharing is in use.

There is routing.  Again, useful with spool sharing, to enable you to route your job to run on a particular system, but also useful just to route your job’s output print files to print on a particular printer.

There are separate JES2 operator commands that the system operator can use to control the spooling subsystem, for example to change what classes of sysout can be sent to a specific printer, or what job classes are to be processed.  (These are the operator commands that start with a dollar sign $, or some alternative currency symbol depending on where your system is located.)

There is a set of very JCL-like control statements you can use to specify your requirements to the spooling subsystem.  (Sometimes called JECL, for Job Entry Control Language, as distinct from plain JCL, Job Control Language.)  For JES3, these statements begin with //* just like an ordinary JCL comment, so a job that has been running on a JES3 system can be copied to a system without JES3 and the JES3-specific JECL statements will simply be ignored as comments.  For JES2, on which we will focus here, the statements generally begin with /* in columns 1 and 2.  Common examples you may have seen are /*ROUTE and /*OUTPUT but notice that the newer OUTPUT statement in JCL is an upgrade from /*OUTPUT and the new OUTPUT statement offers more (and, well, newer) options.  Though the OUTPUT statement is newish, it is over a decade old, so you probably do have it on your system.

There are actual JCL parameters and statements that interact with JES2, such as the OUTPUT parameter on the DD statement, and the just-mentioned OUTPUT statement itself, which is pointed to by the parameter on the DD. 

Another example is the CLASS parameter on the JOB statement, which designates the job class for job scheduling and execution.  The meanings of the individual job classes are made up by each site.  A small development company might have just one job class for everything.  Big companies typically create complicated sets of job classes, each class defined with its own limits for resources such as execution time, region size, even the time of day when jobs in that class are allowed to run.  Your site can define how many jobs of the same class are allowed to run concurrently, and the scheduling selection priority of each class relative to each other class.

Sometimes sites set up informal rules that are enforced not by the software but by local working agreements, so that everyone there is presumed to know, for example, that CLASS=E is reserved for emergency jobs.  (That's one I happened to see someplace.)  If you want to know what job CLASS to specify for your various work, your best bet is to ask your co-workers, the people responsible for setting up the job classes, or some other knowledgeable source at your company.  Remember that you can be held accountable for following rules that nobody told you about and that no software configuration enforces, so don't try to figure it out on your own; ask colleagues and other appropriate people what is permissible and expected.  Not joking.

The JES2 Initialization and Tuning books (the Guide and the Reference) define how JES2 job classes can be configured, if you're curious to get a general idea of what the parameters are.  The JES2 proc in proclib usually contains a HASPPARM DD statement pointing to where the JES2 configuration parameters are kept on any particular system.
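
For illustration, a JOB statement specifying a job class might look like this (the job name, accounting field, and the class letter are all hypothetical; your site decides what each class letter means):

//PAYROLL1 JOB  (ACCT123),'EMERGENCY FIX',CLASS=E,MSGCLASS=X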

In some cases similar considerations can apply for the use of SYSOUT print classes and the routing of such output to go to particular printers or to be printed at particular times.  The SYSOUT classes, like JOB classes, are entirely arbitrary and chosen by the responsible personnel at each site.  

MSGCLASS on the JOB statement controls where the job log goes — the JCL and messages portion of your listing.  The values you can specify for MSGCLASS are exactly the same as those for SYSOUT (whatever way that may be set up at your site).  If you want all your SYSOUT to go to the same place, along with your JCL and messages, specify that class as the value for MSGCLASS= on your job statement, and then specify SYSOUT=* on all of the DD statements for printed output files.  (That is, specify an asterisk as the value for SYSOUT= on the DD statements.)  
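
A quick sketch (the program name is hypothetical, and the class letters depend entirely on your site):

//MYJOB    JOB  1,CLASS=A,MSGCLASS=X
//STEP1    EXEC PGM=MYPROG
//REPORT   DD   SYSOUT=*

Here the JCL and messages go to class X, and because REPORT says SYSOUT=*, the report output goes to class X right along with them.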

In many places, SYSOUT class A indicates real physical printed output on any printer, class X indicates routing to a held queue where it can be viewed from SDSF, and class Z specifies that the output immediately vanishes (Yup, that's an option).  However, there is no way to know for sure the details of how classes are set up at your particular site unless you ask about it.

Sometimes places maintain “secret” classes for specifying higher priority print jobs, or jobs that go to particular special reserved printers, and the secrets don’t stay secret of course.  Just because you see someone else using some print class, don’t assume it means it’s okay for you to use it for any particular job.  Ask around about the local rules and expectations.

So, for MSGCLASS (aka SYSOUT classes), as for JOB classes, the best thing is to ask whoever sets up the classes at your site; or, if that isn't practical, ask people working in the same area as you are, or just whoever you think is probably knowledgeable about the local setup.  Classes are set up by your site, for your site. 

An example of a JES2-related JCL statement that you have probably not yet seen is introduced with z/OS 2.2 — the JOBGROUP statement, plus an entire set of associated statements (ENDGROUP, SCHEDULE, BEFORE, AFTER, CONCURRENT; there are about ten of them) – but that would be a topic for a follow-on post.  You probably don’t have z/OS 2.2 yet anyway, but it can be fun to know what’s coming.  JOBGROUP is coming.

That’s probably enough for a basic introductory overview.

The idea for this post came from a suggestion by Ian Watson.

 

References and Further Reading

z/OS concepts: JES2 compared to JES3
https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.zconcepts/zconc_jes2vsjes3.htm

 
z/OS JES2 Initialization and Tuning Guide, SA32-0991-00
How to initialize JES2 in a multi-access SPOOL configuration
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.hasa300/himas.htm
 
z/OS MVS JCL Reference (z/OS 2.2)
JES2 Execution Control Statements (This is where you can see the new JOBGROUP)
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieab600/jes2zone.htm
 
z/OS MVS JCL Reference, SA23-1385-00  (z/OS 2.1)
JES2 control statements
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieab600/j2st.htm
 
OUTPUT JCL statement
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieab600/outst.htm

 
z/OS JES2 Initialization and Tuning Reference, SA32-0992-00
Parameter description for JOBCLASS(class…|STC|TSU)
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.hasa400/has2u600106.htm

 
z/OS JES2 Initialization and Tuning Guide, SA32-0991-00
Defining the data set for JES2 initialization parameters
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.hasa300/defiset.htm


CCHHR and EAV

Addresses on disk.

An address on disk is an odd sort of thing.  Not a number like a memory pointer.  More like a set of co-ordinates in three-dimensional space.

Ordinary computer memory is mapped out in the simplest way, starting at zero, with each additional memory location having an address that is the plus-one of the location just before it, until the allowable maximum address is reached.  The maximum address is limited by the number of digits available to express the address.

Disks — external storage devices — are addressed differently.

Since a computer system can have multiple disks accessible, each disk unit has its own unit address relative to the system.  Each unit address is required to be unique.  This is sort of like disks attached to a PC being assigned unique letters like C, D, E, F, and so on; except the mainframe can have a lot more disks attached, and it uses multi-character addresses expressed as hex numbers rather than using letters of the alphabet.  That hex number is called the unit address of the disk.

Addresses on the disk volume itself are mapped in three-dimensional space.  The position of each record on any disk is identified by Cylinder, Head, and Record number, similar to X, Y, and Z co-ordinates, except that they're called CC, HH, and R instead of X, Y, and Z. A track on disk is a circle.  A cylinder is a set of 15 tracks that are positioned as if stacked on top of each other.  You can see how 15 circles stacked up would form a cylinder, right?  Hence the name cylinder. 

Head, in this context, equates to Track.  The physical mechanism that reads and writes data is called a read/write head, and there are 15 read/write heads for each disk, one head for each possible track within a cylinder.  All fifteen heads move together, rather like the tines of a 15-pronged fork being moved back and forth. To access tracks in a different cylinder, the heads move in or out to position to that other cylinder.  So just 15 read/write heads can read and write data on all the cylinders just by moving back and forth.  

That's the model, anyway.  And that's how the original disks were actually constructed.  Now the hardware implementation varies, and any given disk might not look at all like the model.  A disk today could be a bunch of PC flash drives rigged up to emulate the model of a traditional disk.  But regardless of what any actual disk might look like physically now, the original disk model was the basis of the design for the method of addressing data records on disk.  In the model, a disk is composed of a large number of concentric cylinders, with each cylinder being composed of 15 individual tracks, and each track containing some number of records.

Record here means physical record, what we normally call a block of data (as in block size).  A physical record — a block — is usually composed of multiple logical records (logical records are what we normally think of as records conceptually and in everyday speech).  But a logical record is not a real physical thing, it is just an imaginary construct implemented in software.  If you have a physical record — a block — of 800 bytes of data, your program can treat that as if it consists of ten 80-byte records, but you can just as easily treat it as five 160-byte records if you prefer, or one 800-byte record; the logical record has no real physical existence.  All reading and writing is done with blocks of data, aka physical records.  The position of any given block of data is identified by its CCHHR, that is, its cylinder, head, and record number (where head means track, and record means physical record).  

The smallest size a data set can be is one track.  A track is never shared between multiple data sets.

The CCHHR represents 5 bytes, not 5 hex digits.  You have two bytes (a halfword) for the cylinder number, two bytes for the head (track) number, and one byte for the record number within the track.

A "word", on the IBM mainframe, is 4 bytes, or 4 character positions.  Each byte has 8 bits, in terms of zeroes and ones, but it is usually viewed in terms of hexadecimal; In hexadecimal a byte is expressed as two hex digits.  A halfword is obviously 2 bytes, which is the size of a "small integer".  (4 bytes being a long integer, the kind of number most often used; but halfword arithmetic is also very commonly used, and runs a little faster.)  A two-byte small integer can express a number up to 32767 if signed or 65535 if unsigned.  CC and HH are both halfwords.  

Interestingly, a halfword is also used for BLKSIZE (this is a digression), but the largest block size for an IBM data set traditionally is 32,760, not 32,767, simply because the MVS operating system, like MVT before it, was written using 32,760 as the maximum BLKSIZE.  Lately there are cases where values up to 65,535 are allowed, using LBI (the large block interface) and what-not, but mostly the limit is still 32,760.  But watch this space; 65,535 is on its way in.  Obviously the number need not allow for negative values, that is, it need not be signed.  End of digression on BLKSIZE.

There can be any number of concentric cylinders, but using the traditional CCHHR method you can only address a cylinder number that fits in a two-byte unsigned integer.  That would be 65,535, but in fact the highest cylinder address (now) on ordinary IBM disks is 65,520.  That is the CC co-ordinate, the basis of the CC in CCHHR.

But wait, you say, you've got an entire 2 bytes — a halfword integer — to express the track number within the cylinder, yet there are always 15 tracks in a cylinder; one byte would be enough.  In fact, even half a byte could be used to count to fifteen, which is hex F.  Right.  You got it.  What do we guess must eventually happen here?

People want bigger disks so they can have bigger data sets, and more of them.  Big data.  You know how many customers the Bank of China has these days?  No, I don't either, but it's a lot, and that means they need big data sets.  And they aren't the only ones who want that.  I really don't want to think about guessing how much data the FBI must store.  What we do know is that there is a big – and growing – demand for gigantic data sets.

So inevitably the unused extra byte in HH must be poached and turned into an adjunct C.  Thus is born the addressing scheme for the extended area on EAV disks (EAV = Extended Address Volume).  So, three bytes for C, one byte for H?  Well, no, IBM decided to leave only HALF of a byte — four bits — for H.  (As you noticed earlier, one hex digit — half of a byte — is enough to count to fifteen, which is hex F.)  So IBM took 12 bits away from HH for extending the cylinder number.  Big data.  Big.

And you yourself would not care overly about EAV, frankly, except that (a) you (probably) need to change your JCL to use it, (b) there are restrictions on it, (c) those restrictions keep changing, and besides that (d) people are saying your company intends to convert entirely to EAV disks eventually.

Okay, so what is this EAV thing, and what do you do about it?

EAV means Extended Address Volume, which means bigger disks than were previously possible, with more cylinders.  The first part of an EAV disk is laid out just like any ordinary disk, using the traditional CCHHR addressing.  So that can be used with no change to your programs or JCL.

In the extended area, cylinders 65,520 and above, the CCHH is no longer a plain CCHH.

The first two bytes (sixteen bits) contain the lower part of the cylinder number, which can go as high as 65535.  The next twelve bits — one and a half bytes taken from what was previously part of HH — contain the rest of the cylinder number, so to read the whole thing as a number you would have to take those twelve bits and put them to the left of the first two bytes. The remaining four bits — the remaining half of a byte out of what was once HH — contains the track number within the cylinder, which can go as high as fifteen.

Says IBM (in z/OS DFSMS Using Data Sets):

     A track address is a 32-bit number that identifies each track
     within a volume. The address is in the format hexadecimal CCCCcccH.

        CCCC is the low order 16-bits of the cylinder number.

        ccc is the high order 12-bits of the cylinder number.

        H is the four-bit track number.

End of quote from IBM manual.
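
A quick worked example (the numbers are made up for illustration):  Suppose a record lives on cylinder 70,000, track 3.  In hex, 70,000 is X'11170'.  The low-order 16 bits of that are X'1170' (the CCCC part), and the high-order 12 bits are X'001' (the ccc part), with the track number 3 as the final H digit.  Putting the pieces together in CCCCcccH order gives the 32-bit track address X'11700013'.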

The portion of the disk that requires the new format of CCHH is called extended addressing space (EAS), and also called cylinder-managed space.  Cylinder-managed space starts at cylinder 65520.

Of course, for any cylinder number below 65,536, those extra 12 bits are always zero, so you can view the layout of the CCHH the old way or the new way there; it makes no difference.

Within the extended addressing area, the EAS, the cylinder-managed space, you cannot allocate individual tracks.  Space in that area is always assigned in cylinders, or rather in chunks of 21 cylinders at a time.  The smallest data set in that area is 21 cylinders.  The 21-cylinder chunk is called the "multicylinder unit".

If you code a SPACE request that is not a multiple of 21 cylinders (for a data set that is to reside in the extended area), the system will automatically round the number up to the next multiple of 21 cylinders.
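
For example, going by the rounding rule just described, a request of SPACE=(CYL,(100,50)) for a data set in the extended area would effectively become SPACE=(CYL,(105,63)), since 105 and 63 are the next multiples of 21 up from 100 and 50.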

As of this writing, most types of data sets are allowed within cylinder-managed space, including PDS and PDSE libraries, most VSAM, sequential data sets including DSNLARGE, BDAM, and zFS.  This also depends on the level of your z/OS system, with more data set types being supported in newer releases.

However the VTOC cannot be in the extended area, and neither can system page data sets, HFS files, or VSAM files that have IMBED or KEYRANGE specified.  Also, VSAM files must have a Control Area (CA) size or Minimum Allocation Unit (MAU) that is compatible with the restriction that space is allocated in chunks of 21 cylinders at a time.  Minor limitations.

Specify EATTR=OPT in your JCL when creating a new data set that can reside in the extended area.   EATTR stands for Extended ATTRibutes.  OPT means optional.  The only other valid value for EATTR is NO, and NO is the default if you don't specify EATTR at all.

The other EAV-related JCL you can specify on a DD statement is either EXTPREF or EXTREQ as part of the DSNTYPE.  When you specify EXTPREF it means you prefer that the data set go into the extended area; EXTREQ means you require it to go there.

Example

Allocate a new data set in the extended addressing area

//MYJOB  JOB  1,CLASS=A,MSGCLASS=X
//BR14 EXEC PGM=IEFBR14
//DD1 DD DISP=(,CATLG),SPACE=(CYL,(2100,2100)),
//   EATTR=OPT,
//   DSNTYPE=EXTREQ,
//   UNIT=3390,VOL=SER=EAVVOL,
//   DSN=&SYSUID..BIG.DATASET,
//   DCB=(LRECL=X,DSORG=PS,RECFM=VBS)

 

Addendum 1 Feb 2017: BLKSIZE in Cylinder-managed Space

This was mentioned in a previous post on BLKSIZE, but it is relevant to EAV and bears repeating here.  If you are going to take advantage of the extended address area, the EAS, on an EAV disk, you should use system-determined BLKSIZE, that is, either specify no BLKSIZE at all for the data set or specify BLKSIZE=0, signifying that you want the system to figure out the best value of BLKSIZE for the data set.

Why?  Because in the cylinder-managed area of the disk the system needs an extra 32 bytes for each block, which it uses for control information.  Hence the optimal BLKSIZE for your data set will be slightly smaller when the data set resides in the extended area.  The 32-byte chunk of control information does not appear within your data.  You do not see it.  But it takes up space on disk, as a 32-byte suffix after each block.

You could end up using twice as much disk space if you choose a poor BLKSIZE, with about half the disk space being wasted.  That is true because a track must contain an integral number of blocks, for example one or two blocks.  If you think you can fit exactly two blocks on each track, but the system grabs 32 bytes for control information for each block, then there will be not quite enough room on the track for a second block.  Hence the rest of the track will be wasted, and this will be repeated for every track, approximately doubling the size of your data set.  

On the other hand, if you just let the system decide what BLKSIZE to use, it generally calculates a number that allows two blocks per track. 
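
To put rough numbers on that (these are the classic 3390 figures; your own devices may differ): a 3390 track holds 56,664 bytes, and half-track blocking gives the familiar system-determined BLKSIZE of 27,998.  In the cylinder-managed area, allowing for the 32-byte suffix on each block, the best half-track value drops slightly, to 27,966.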

And when you use system-determined BLKSIZE — when you just leave it to the system to decide the BLKSIZE — you get a bonus: if the system migrates your data set, and the data set happens to land on the lower part of a disk, outside the extended area, the system will automatically recalculate the best BLKSIZE when the data set is moved.  If the data set is later moved back into the cylinder-managed EAS area, the BLKSIZE will again be automatically recalculated and the data reblocked.

If in the future IBM releases some new sort of disk with a different track length, and your company acquires a lot of the new disks and adds them to the same disk storage pool you're using now, the same consideration applies: If system-determined BLKSIZE is in effect, the best BLKSIZE will be calculated automatically and the data will be reblocked automatically when the system moves the data set to the different device type.

Yes, it is possible for a data set to reside partly in track-managed space (the lower part of the disk) and partly in cylinder-managed space (the EAS, extended address, high part of the disk), per the IBM document.  

You should generally use system-determined BLKSIZE anyway.  But if you’re using EAV disks, it becomes more important to do so because of the invisible 32-byte suffix the system adds when your data set resides in the extended area.

[End of Addendum on BLKSIZE]

References, further reading

IBM on EAV

z/OS DFSMS Using Data Sets
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idad400/eav.htm

JCL for EAV
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieab600/xddeattr.htm
http://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieab600/iea3b6_Subparameter_definition18.htm

Disk types and sizes
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.idad500/tcd.htm

a SHARE presentation on EAV
https://share.confex.com/share/124/webprogram/Handout/Session17109/SHARE_Seattle_Session%2017109_How%20to%20on%20EAV%20Planning%20and%20Best%20Practices.pdf

EAV reference – IBM manual
z/OS 2.1.0 =>
z/OS DFSMS =>
z/OS DFSMS Using Data Sets =>
All Data Sets => Allocating Space on Direct Access Volumes => Extended Address Volumes
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idad400/eav.htm

IBM manual on storage administration (for systems programmers)
z/OS DFSMSdfp Storage Administration
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idas200/toc.htm

The 32-byte per block overhead in the extended area of EAV disk (IBM manual):
https://www.ibm.com/support/knowledgecenter/SSLTBW_1.13.0/com.ibm.zos.r13.idad400/blksz.htm

z/OS DFSMS Using Data Sets ==>
Non-VSAM Access to Data Sets and UNIX Files ==>
Specifying and Initializing Data Control Blocks ==>
Selecting Data Set Options ==>
Block Size (BLKSIZE)
“Extended-format data sets: In an extended-format data set, the system adds a 32-byte suffix to each block, which your program does not see. This suffix does not appear in your buffers. Do not include the length of this suffix in the BLKSIZE or BUFL values.”

DSNTYPE and DATACLAS

DSNTYPE

DSNTYPE is a bit like DSORG.  For example, if you want to create a sequential data set (a flat file), specifying DSNTYPE=BASIC is about the same as saying DSORG=PS.  There are additional new values, which allow your data sets to take advantage of new features.  You specify DSNTYPE=LIBRARY to get a PDSE, as opposed to specifying DSORG=PO to get the older format of partitioned data set.  You can specify DSNTYPE=PDS to get the old format.  On newer releases of z/OS you have DSNTYPE=(LIBRARY,1) and DSNTYPE=(LIBRARY,2) to designate the original version of the PDSE structure and a newer version.
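
As a sketch (the dataset name and space amounts are hypothetical), allocating a PDSE might look like this:

//NEWLIB   DD  DSN=MY.NEW.PDSE,DISP=(NEW,CATLG),
//   DSNTYPE=LIBRARY,
//   SPACE=(CYL,(5,5)),UNIT=SYSDA,
//   LRECL=80,RECFM=FB

Note that BLKSIZE is deliberately left off so the system can calculate it.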

Other values for DSNTYPE include DSNTYPE=LARGE, which is a sequential file with benefits.  As the word implies, the LARGE data set can be bigger than an ordinary sequential data set (you may sometimes see it called DSNLARGE).   This means that some software products might not be able to handle the LARGE format – old programs generally need to be modified to allow the larger size.  Of course DSNTYPE=LARGE only lets you have bigger data sets, it doesn’t allow bigger individual records.  For that you’d need LRECL=X (but discussing LRECL now would be a digression).

DSNTYPE=EXTREQ allows you to specify that your data set must reside in the extended area of an Extended Address Volume (EAV).  DSNTYPE=EXTPREF means the data set can reside in the extended area, but it isn’t really a requirement.  So, yeah, EXTREQ means extended area required, and EXTPREF means extended area preferred.  EAV is a newer, much larger disk type, divided into the ordinary area that is laid out like any other similar disk, and the extended area that is laid out differently.  As you might guess, some software is not able to handle the extended area – some programs need to be modified to handle the modified format of data set labels and in fact the revision of the CCHHR structure (which we will not talk about in this post).  On newer releases of z/OS you have DSNTYPE=(EXTREQ,1) and DSNTYPE=(EXTREQ,2) to designate the original version of the extended area disk label and a newer version.  Yes, of course you also have DSNTYPE=(EXTPREF,1) and DSNTYPE=(EXTPREF,2) for the same reason.

There are some restrictions on the types of data sets that can reside in the extended area.  This will also depend on the level of your z/OS system, with more data set types being supported in newer releases.

Important note: If you want your data set to go in the extended area, you should also specify EATTR=OPT on the same DD statement with DSNTYPE=EXTREQ or EXTPREF (as of the last time I checked; it seems a superfluous requirement, and IBM might eventually change things so that the presence of EXTREQ or EXTPREF implies EATTR).  EATTR stands for Extended ATTRibutes.

Other values of DSNTYPE include HFS and PIPE. Additional values are certain to be introduced over time.

Why didn’t IBM just replace DSORG with DSNTYPE?  Ah come on, when did you ever see IBM replace something and immediately retire the original? 

DATACLAS

Specifying the DATACLAS parameter in JCL is similar to specifying the LIKE parameter except that with the LIKE parameter you point to a data set to serve as a model, whereas with DATACLAS you specify the name of a class, which is a predefined set of parameter values. 

DATACLAS seems to be part of a push by IBM towards making JCL knowledge less of a requirement for ordinary developers and users.  By combining a number of parameters into one thing, DATACLAS, the job of writing up the parameters can be done by a systems programmer, a systems administrator, or anyone who can specialize in that area, thus leaving users of the system with less JCL syntax they need to know.   It also improves efficiency, accuracy, standardization, and even data security for most data set allocations.  And it insulates ordinary users from the need to learn every new JCL parameter or value that IBM needs to introduce.

So, the person responsible for this at your site sets up data classes, assigns them names, and specifies information that might otherwise have needed to be specified in JCL.  Better than the LIKE parameter as a simplification.  All you need to do is find out the name of the data class they set up for the kind of data set you want to create, and then in your JCL you specify DATACLAS=thatname.

For example you might specify DATACLAS=BIGDISK if they used the name BIGDISK to include such parameters as DSNTYPE=EXTPREF – though they probably didn’t pick a name that easy to remember.  They can include other parameters like UNIT and SPACE in the class definition, even LRECL and KEYLEN, a bunch of VSAM parameters, and parameters you probably never heard of and don’t care about (listed at length in the JCL Reference manual).

That leaves you with not much that you’re required to specify in JCL, if the person setting up the data classes has been very thorough.  They’re not required to include all the parameters, though, or they can leave them the same as the z/OS system defaults, and that leaves you specifying anything you need that they left out.  Anyway, with exceptions (usually SPACE), most of the parameter values they include in the data class are treated not as limits but as defaults, meaning you can override them in your JCL.

For example, suppose they have set up a data class for you to use for some accounting data files, and they have named the data class ACNTDATA, and all your files in that area have identical characteristics so they have specified everything correctly for you.  (Hey, it could happen.)  One day you need a data set with a longer than usual record length – maybe the data sets are generally full of records 100 characters in length, but today you got a file from a new customer where they keep extra data security fields at the end of the records, and you need to allow for 200 characters in every record rather than your standard 100.  When you create the new data set for the 200-byte records, put the LRECL override in your JCL

DATACLAS=ACNTDATA,LRECL=200

and you will get your 200-byte record size allocated for the data set.
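
In context, the full DD statement might look something like this (the dataset name is hypothetical; ACNTDATA is the data class from the example above):

//NEWCUST  DD  DSN=ACCT.NEWCUST.DATA,DISP=(NEW,CATLG),
//   DATACLAS=ACNTDATA,LRECL=200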

Similarly, if you specify both the DATACLAS and the LIKE parameters on the DD for the same new data set, any attributes obtained from processing the LIKE override any parameters contained in the data class definition for the DATACLAS, unless the data class was set up to prohibit such an override.

Far-fetched Warning Note: SMS must be up and running to make DATACLAS work in your JCL, but it is NOT necessary for your data set to be SMS-managed.  Note that SMS is almost always up and running at almost every z/OS site.  But if the DATACLAS parameter in your JCL is ignored, that’s what to check – though you won’t be the only one experiencing problems if that happens; the whole system would probably go wonky.

Of course DATACLAS only applies to new data sets; you cannot change the data class for an existing data set by putting a different DATACLAS in your JCL.  The DATACLAS you specified would be ignored.  Well, the syntax-checking part of the JCL processing would still check for valid syntax, like a security guard in a lobby entrance; but assuming you have it spelled correctly and so on, and the parameters are within the valid range of card columns (starting at or before column 16 if on a continuation card, and extending no further than column 71 in any case), there’s no further processing of it.  You can still change some individual data set attributes in some cases – either deliberately or accidentally — but not the DATACLAS itself.

So, taken on the whole, that should give you some insight into IBM’s apparent strategy of continuing to introduce major upgrades such as allowing bigger data sets and bigger disks, with the JCL parameter changes necessary to support the new features, while at the same time shielding the user somewhat from the disruption of needing to learn a lot of new JCL specifications.

 

References, further reading.

https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idad400/d4425.htm

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.idad400/d4112.htm

DSNTYPE parameter in JCL:

http://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieab600/xdddsnty.htm

https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.2.0/com.ibm.zos.v2r2.ieab600/iea3b6_Subparameter_definition18.htm

DATACLAS parameter in JCL:

https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.ieab600/xdddatac.htm

How to Set up DATACLAS definitions:

z/OS DFSMSdfp Storage Administration

https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idas200/defdc.htm

also

z/OS DFSMS Using Data Sets 


IEFBR14

Mysteries of IEFBR14 Revealed

People ask what IEFBR14 does.  If you laughed at that, move along to some other reading material.

There, now we’re alone to investigate the mysteries of IEFBR14.  You’re new to the mainframe perhaps. 

If you have read the previous introductory JCL post, you know that the only thing you can ask the mainframe to do for you is run a program (as specified by EXEC in JCL), and when that happens the system does various setup before running the program, plus various cleanup after the program.  Most of that setup and cleanup is orchestrated by what you specify on the DD statements (DD means Data Definition).   So when you run a program, any program, you have quite a bit of control over the actions the system will take on your behalf,  processing your DD statements.

When the program IEFBR14 runs, IEFBR14 itself does nothing, but when you tell the system to run a program (any program), that acts as a trigger to get the system to process any DD statements you include following the EXEC statement.  So you can use that fact to create and delete datasets just with JCL.  

For example, you might specify DISP=(NEW,CATLG) on a DD statement for a dataset if you want the system to create the dataset for your program just before the program runs (hence NEW), and you want the system to save the new dataset when the program ends, and you also want the system to create a catalog entry so you can access the dataset again just by its name, triggering the system to look the name up in the catalog (hence CATLG).

So all you need to do to create a dataset is to put in a DD statement after the EXEC for IEFBR14.  On the DD statement you specify DSN= whatever name you want the new dataset to have, and you specify DISP=(NEW,CATLG) as just mentioned.  Rather than specifying a lot of other information, you can pick an existing dataset you like and just model the parameters for the new one on that, saying LIKE=selected.model.dataset.  The DDname on the DD statement can be anything syntactically valid: not more than 8 characters long, starting with a letter or acceptable symbol, and containing only letters, numbers, and the aforementioned acceptable symbols, which are usually #, @, and a currency symbol such as $.

Example:
 
//MYJOB  JOB  1,SAMPLE,MSGCLASS=X,CLASS=A
//BR14  EXEC  PGM=IEFBR14
//NEWDATA  DD  DSN=MY.NEW.DATA,DISP=(NEW,CATLG),
//    LIKE=MY.MODEL.DATASET

So the system creates the dataset.  Your program does not create it.  Your program might put data into it (or not), and the system doesn't care about the data.  The system manages the dataset itself based on what you specify in the DD statement in the JCL – or if a program really needs to create a dataset then the program builds the equivalent of a DD statement internally and invokes “dynamic allocation” to get the system to process the specifications the same as if there had been a DD statement in JCL.

In such a case the system processes that “dynamic allocation” information exactly the same way it would have processed it if you had supplied the information on a DD statement present in the JCL.

To delete an existing dataset you no longer want, you can specify DISP=(OLD,DELETE) on the DD statement, and the system will delete the dataset.  This is similar to the way it would delete a dataset if you issued the DELETE command under TSO, or using IDCAMS, but there are a couple of important nuances you need to know about deleting datasets.

One is that it is a big mistake to try to delete a member of a dataset using JCL.  The DISP you specify applies to the entire dataset, even if you put a member name in parentheses.  Never say DELETE for a member in JCL; you will lose the entire library.
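
To illustrate the trap (the dataset and member names are hypothetical):

//OOPS  DD  DSN=MY.JCL.LIBRARY(MEMBER1),DISP=(OLD,DELETE)

That statement does not delete just MEMBER1; it deletes the entire MY.JCL.LIBRARY dataset.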

The second thing you need to know about deleting a dataset is that, for ordinary cataloged data sets, saying DELETE causes the deletion of both the dataset and the catalog entry that points to it.  That's fine and works for most cases, but sometimes you might have just a dataset with no catalog entry, and other times you might have a catalog entry pointing to a dataset that isn't really there anymore.

If you have a dataset that is not cataloged, then you need to tell the system where it is.  You do that by specifying both the UNIT and VOL parameters.   UNIT identifies the type of device that holds the dataset, something like disk or tape, which you might be able to specify just by saying UNIT=3390 (for disk) or UNIT=TAPE.   VOL is short for VOLUME, and identifies which specific disk or tape contains your dataset.  So UNIT is a class of things, and VOL, or VOLUME, is a specific item within that class.  

It turns out that coding UNIT isn't usually as simple as saying UNIT=DISK.  The people responsible for setting up your system can name the various units anything they want.  UNIT=TEST and UNIT=PROD are common choices.  The system, as it comes from IBM, has UNIT=SYSDA and UNIT=SYSALLDA as default disk unit names, but some places change those defaults or restrict their use.  If you have access to any JCL that created or used the dataset, it would likely contain the correct UNIT name — because if there is no catalog entry for an existing dataset, then every reference to the dataset has to specify UNIT.

When you first create a dataset, you are required to supply a UNIT type, but you are not required to specify a VOLUME — the system will select an available VOLUME from within the UNIT class you specified.  

If you are dealing with a dataset that was created with just the UNIT specified, and DISP=(NEW,KEEP), then you need to find the output from the job that created the dataset.  The JCL part of the listing will show what volume the system selected for the dataset.

To code the VOL parameter in your JCL, typically you say VOL=SER=xxxxxx, where xxxxxx is the name of the volume.  There are various alternatives to this way of coding it.  SER is short for Serial Number.  The names of tapes used to be 6-digit numbers in most places, for whatever reason — possibly to make it easy to avoid duplication.  Besides Serial Number, the volume parameter has other possible subparameters too, but you don't care right now.

If you don't know what UNIT name to use, but you do know the volume where the dataset is, then go into something like TSO/ISPF 3.4 and do a listing of the volume.  Select any other dataset on the volume and request the equivalent of ISPF 3.2 dataset information.  Whatever UNIT it says, that should work for every dataset on the volume.

Note that it is possible for the same volume to be a member of more than one UNIT class.  It might belong to 3390, SYSDA, and TEST for example.  In that case it doesn't matter which UNIT name you specify for the purpose of finding (and deleting) the dataset.  The only point of putting UNIT into your JCL for an existing dataset is to help the system find it.

Note that it is possible to have multiple datasets with exactly the same name on different volumes and in different UNIT classes.  An easy example to visualize is having a disk dataset that you copy to a tape,  giving it the same DSN, and then later you copy it again to a different tape.  

Another, more perverse, example occurs when someone creates a new dataset they intend to keep, but mistakenly specifies DISP=(NEW,KEEP) rather than DISP=(NEW,CATLG).  Later they can't find the dataset, because no catalog entry was created.  Rather than figure out what happened, they run the same job again.  If the system puts the second copy of the dataset onto a different disk volume, they now have two uncataloged copies of it.  If they keep doing that, at some point the system will select a volume that already has a copy of the dataset, and then the job will fail with an error message saying a duplicate name exists on the volume.  To clean up something like that, you need to find every uncataloged copy of the dataset and delete it, specifying VOL and UNIT along with DISP=(OLD,DELETE) — you can use IEFBR14 for that.
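
A sketch of such a cleanup step (the dataset name and volume serial are hypothetical):

//MYJOB   JOB  1,CLEANUP,MSGCLASS=X
//BR14    EXEC PGM=IEFBR14
//GONE1   DD   DSN=MY.UNCAT.DATA,DISP=(OLD,DELETE),
//    UNIT=3390,VOL=SER=WORK01

Repeat the DD statement, with a different ddname and volume serial, for each volume holding an uncataloged copy.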

On many systems, they have it set up so that a disk housekeeping program runs every night (or on some other schedule), and deletes all uncataloged disk datasets.  So if you find yourself in possession of a set of identically named uncataloged disk datasets, and you don't want to look for all the volume names, you might get lucky if you wait for a possible overnight housekeeping utility to run automatically and delete them for you.

One other point on those uncataloged datasets.  You also have the option of creating a catalog entry for a dataset, rather than deleting it, if you want to keep it around.  To do that, you specify UNIT and VOL, as just discussed, and DISP=(OLD,CATLG) — but you can only have one cataloged copy of any dataset name.

So, back to the point we touched on earlier, about how a program can create the equivalent of a DD statement internally.

It is also possible for part of the information to be specified on a DD statement in JCL, and part of the information to be specified in the program.  In that case the program needs to specify its DD modifications before the OPEN for the file.  The system then merges what the program specified with what the JCL specified, and if there’s a difference then the program takes precedence.  So the program can change what the JCL said, and the program wins any disagreements – but the program has to have its say BEFORE the OPEN for the file.

Note that IEFBR14 does not OPEN any files.  The system does the setup and cleanup involved in allocation processing, and only that. Various JCL specifications can be used to indicate processes that occur only at OPEN or CLOSE of a file.  Releasing unused disk space is an example of that.  If you code the RLSE subparameter in your SPACE parameter on a DD statement for IEFBR14,  that subparameter is ignored; no space is released.

This general discussion of DD parameters that can be specified within a program is pretty much irrelevant to IEFBR14, except to note that IEFBR14 does no file handling of any kind, so it will never override anything in your JCL (as some other program might).  So if you use IEFBR14 to set up a dataset, and the dataset does not come out the way you wanted, that is not due to anything IEFBR14 did. Because IEFBR14 does nothing.

If some item of information about a dataset is not specified in JCL, and the program does not specify it either, then if it is a dataset that already exists, the system looks to see if the missing item of information might be specified in some existing saved information relevant to the dataset, such as the Catalog entry or the Dataset Label.  Things like record length, record format, and allowable space allocation are generally kept in the Dataset Label for ordinary datasets.  For VSAM datasets most of the information is kept in the Catalog.  The system looks for the information, and if the information is found, it merges it together with the information obtained from the program and the JCL to form a composite picture of the dataset.

What if the system needs to know something about a dataset – record length, for example (LRECL), or record format (RECFM) or one of those other parameters – and after looking in all three places just named, the system has not found the information anyplace – what happens?  Default values apply.  Ha ha, because you don’t usually like the defaults, with the single glowing exception of BLKSIZE, where the default is always the best value possible.  The system can calculate BLKSIZE really well.  Other defaults – you don’t want to know.  So specify everything except BLKSIZE.

You don’t need to specify everything explicitly though; you can use the LIKE parameter to specify a model.  Then the system will go look at the model dataset you’ve specified and copy all the attributes it can from there, rather than using the very lame system defaults.  So you specify whatever parameters you want to specify, and you also say LIKE=My.Favorite.Existing.Dataset to tell the system to copy everything it can from that dataset’s settings before applying the <<shudder>> system defaults.  Note: The system will not copy BLKSIZE from the model you specify in LIKE.  No, the system knows where its strengths and weaknesses lie.  It recalculates the BLKSIZE unless you explicitly specify some definite numeric value you want for that.

Also note that any particular system, such as the one you're using, can be set up with Data Classes and other system-specific definitions of things that affect defaults for new datasets.  Something could be set up, for example, stating that by default a new dataset with a certain pattern of DSN would have certain attributes.  If so, that would generally take precedence over any general IBM-supplied system defaults, but usually stuff you specify explicitly will override any site-specific definitions of attributes — unless, of course, the ones on your system were purposely set up designating they couldn't be overridden.

Okay, so then, does IEFBR14 do some magical internal specifications?  Nope.  IEFBR14 does absolutely nothing.  You do the work yourself by specifying everything you want on the DD statements.  IEFBR14 never opens the files, modifies nothing, does nothing.  You can code your own program equivalent to IEFBR14 by writing no executable program statements except RETURN. Or, for that matter, no statements at all, since most compilers will supply a missing RETURN at the end for you.  Yes, that is IEFBR14.  Laziest program there is: Lets the system do all the work.

You can use any ddnames you want when you run IEFBR14.  The files are never opened.  The system does the setup prior to running the program, including creating any new files you’ve requested.  The program runs, ignoring everything, doing nothing.  When it ends, the system does the cleanup, including deleting any datasets where you specified DELETE as the second subparameter of DISP.

So that’s how it works.  Often people think IEFBR14 does some magic, but it doesn’t.  It relies on the system to go through the normal setup and cleanup. 

You can add extra DD statements into any other program you might happen to be running, and the system will do the same setup and cleanup.  Of course you’d need to be sure the program you pick doesn’t happen to use the ddnames you pick – you wouldn’t want to use ddnames like SYSUT1 or SYSUT2  with most IBM-supplied programs, for example. 

Ddnames SALLY, JOE, and TOMMY should work just fine though.  The IEFBR14 program doesn’t look at them.  The system doesn’t know and doesn’t care whether the program uses the datasets. 

People use IEFBR14 for convenience because they know for sure that the program will not tamper with any of the file specifications. 

How did IEFBR14 get its name?  Well, the IEF prefix is a common prefix IBM uses for system-level programs it supplies.  BR14 is based on the Assembler Language instruction BR 14, which means Branch (BR for Branch) to the address contained in register 14 — the register that by convention holds the return address.  So: Branch to the Return Address, that is, Return to Caller.

Correction 23 November:  Sorry, it's parsed BR 14 rather than B R14.  The notation BR in Assembler Language means Branch to the address contained in the specified Register, as distinct from B, which branches to an address specified some other way, for example as a label within the program.
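
For the curious, a minimal IEFBR14 equivalent in assembler would look something like this sketch (not IBM's actual source; famously, the original one-instruction version was later given a second instruction to clear register 15 so the return code would be zero):

         SR    15,15          set register 15, the return code, to zero
         BR    14             branch to the return address in register 14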

That’s it, secrets of IEFBR14 revealed.

Or, JCL Basic Concepts, Part II

JCL Basic Concepts — Introduction to JCL

This is an introduction to IBM z/OS MVS JCL basic concepts.  Yes, there are people who want that.  If that’s not you, then there’s nothing to see here, move along.  But if you ARE somebody who just wants a simple intro to basic JCL, here it is.

JCL is easier than it might look.  Obvious once you know it, like so many things.

A Short History of JCL

Sort of like a short bio of a person.  If you’re in a hurry, skip ahead to “Basic JCL statements”.

Notice how a line length of 80 characters is used in MVS a lot?

Originally computers did not have screens.  Later there was one screen for the main operation of the computer only.  Screens caught on, and we take them for granted now, but before that people mostly used punch cards to enter their programs and data into the computer.  Each card was 80 columns wide.  Why 80? Probably because that was a standard line length on letter-size typewriter paper.

Yes, to write a letter or a paper of any kind in those days, a person would put a sheet of paper, like copier paper, into a mechanism that fed the sheet through the device on a roller, advancing one line at a time.  The keyboard was used to type characters directly onto the paper.  Those were called typewriters, and they pre-dated computers.  At the end of each line you would hit the “carriage return” to go down to the start of the next line.  That was the model for the idea of the ENTER key.  It was also the model for CRLF: Carriage Return + Line Feed.

Early computer input devices, called keypunch machines, simply substituted an 80-column piece of rectangular lightweight cardboard instead of paper: Each such card was the equivalent of one typewritten line on paper.  Whenever you typed a character on the keyboard it caused some holes to be punched in the next available column on the card, according to a code that mapped various patterns of holes to the various letters, numbers and other symbols that together constituted the character set available on the keyboard.  Hence a small stack of cards (called  a card deck) was equivalent to a sheet of paper.  There could be any number of cards in a deck, making it conceptually more like a parchment scroll than a sheet of paper.

The visual representation of the punched holes appeared on the top line of the card, looking like a printed line a person could read.  Well, as long as the “interpreter” function was active on the machine, the top line was printed.

Notice how even today 80-column files often have columns 73 through 80 reserved for line numbers?

Yes, people quickly noticed that if you dropped a deck of cards it wasn’t always easy to be sure you put them back in the proper order.

IBM therefore made it an option on the keypunch machines that they could be rigged up to put line numbers automatically into the last 8 columns of each card.  Most of the time these columns at the far right were otherwise unused, so this seemed a natural choice, to put the numbers off to the side out of the way.

You could also rig it up to skip to a particular column, based on the idea of tabbing on a typewriter.  Yes, this ability led naturally to the now quaint-seeming custom of expecting particular types of input to be in particular columns.  At the time, that seemed like a simplification.

Column 72, the last column just before the default line number field in columns 73-80, came to be commonly used to indicate continuation of a logical line (like, say, a sentence in English) onto another card.  The set of cards thus linked was called a statement. Since most statements were usually less than one line long, the word “card” was often used informally as a shorter synonym for the word “statement” (as in: “the Job card” or “a DD card”).  Strictly speaking they are not synonyms – but now you know why people refer to “a Job card”.

Nomenclature: One character (a single column) is also called a byte (pronounced bite), at least in ordinary “EBCDIC” (ebb-suh-dick) encoding.

Eventually they used up all of the available combinations that would fit into one column – that is, 256 possible internal combinations on an IBM machine – and they needed to add more characters for things like the Japanese katakana character set.

For that they came up with another system called a “double byte character set” (dbcs), and that is used for representing the extended set of characters that can’t be represented in one byte.

For now, as a simplification, we’re just going to proceed as if one byte still represented one character, which in JCL is true.  Also note that characters used in JCL are REQUIRED TO BE UPPERCASE unless they are within a quoted string.  By quotes we mean single quotes, also called apostrophes – JCL doesn’t use the double-quote character.  Okay, if you want to embed a single apostrophe within a quoted string, you do that by putting two consecutive apostrophes in that place in the string, and the system will compress the two into a single apostrophe when it processes the string.  (For example, the value O'HARA is coded as 'O''HARA'.)  But the actual double-quote character is not used in JCL.

There were other machines besides keypunch machines:  Some to read files from the computer (or from a deck of punched cards) and copy the lines onto output punched cards and/or paper, and those were called card punch machines and printers; some to handle cards in other ways, for example sorting them.  All of these were collectively known as Unit Record machines.  They fell in with accounting machines (EAM = Electric Accounting Machine) and were considered office equipment.

(Think about it, IBM = International Business Machines. Computer-related equipment is, or at least was, office equipment.)

The next big step was for IBM to rig up typewriters – IBM Selectric typewriters – to accept long rolls of paper fan-folded into perforated sheets, and to rig those typewriters to input your typing directly into a computer connection.  Hence was the first commonly used modern computer terminal born, the IBM model 2741.  It had a switch on the side so you could convert it back and forth between a plain typewriter and a terminal; also it could be a printer.  The lines you typed on the terminal were patterned on the design that had been developed for punched cards – so columns 73 through 80 were still designated as line number columns and 72 was still the continuation column (a convention that continues to this day).

So let’s say you’re living back then and you want to type some data and put it into a computer file.  You might want the file to be on hard disk, or you might want it to be on a reel of magnetic tape that you could carry away with you.  How do you tell the computer to do that?

JCL: Job Control Language.  Also, “utility programs”, aka “utilities”.  People often conflate these two things, but they aren’t the same really.  JCL is the language that the computer reads directly from an input device (such as a card reader or a terminal).  The utility programs are apps provided by IBM to do commonplace functions like copying a file.  You use JCL to tell the system to run (execute) a utility program, just as you would use JCL to run any other program.

JCL needs a minimum of two statements.

Basic JCL statements

First is a JOB statement, which is like a header record.  It is used to direct any results or system-generated error messages back to you, to convey accounting information if applicable for charging, to specify limits such as a time limit, and in general to communicate to the system any information, restrictions, routing, and so on that applies to the entire piece of work you want the system to do for you.  That piece of work is called a job.

Each job executes (runs) one or more programs.  Each program to be executed requires an EXEC statement.  The EXEC statements follow after the JOB statement.

If a program requires input data or produces output data, then the EXEC statement for that program is followed by a DD (Data Definition) statement for each data file the program will use: One DD statement for each data file.
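Putting those three statement types together: here is a minimal hedged sketch of a complete job, with the job name, program name, ddname, and dataset name all invented for illustration (the DISP and DSN operands on the DD statement are explained further on):

//MYJOB1 JOB (account),NAME,CLASS=A,MSGCLASS=X
//STEP1  EXEC  PGM=MYPGM
//INFILE DD  DISP=SHR,DSN=some.input.data

One JOB statement for the whole job, one EXEC statement for the one program to be executed, and one DD statement for the one file that program uses.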

Because it usually contains unintuitive, non-obvious information like account numbers, people traditionally have not much liked creating JOB statements, and doing so tends to be error-prone besides.

So, when punched cards were in use, the people in charge of a system often handed out pre-printed color-coded job cards tailored to each user of the system.

If you signed up for an account you might be handed a small deck of pre-printed job “cards” – one card per job statement – which were identical except for the Job Name.  The Job Name was typically assigned in ascending sequence.  Say your user name was SALLY1 (letters were required to be uppercase only, and mostly in JCL they still are); you might be given a deck of cards with the job name field preprinted as SALLY101 through SALLY199.  If you used up the cards you could get another set.

Most places printed the job cards on a special color of card stock, such as pink, so they stood out from the beige card stock that was used for most purposes.  Sometimes if there were different classes of jobs – say short jobs and long-running jobs – they might provide two different colors of job cards, one for each job class.

Now you know why they are so often called “job cards” rather than “job statements”.

To this day people usually just copy an old job statement when they create a new job, because that’s easier than trying to remember accounting information.

Fields on a JCL Statement

Columns 1 and 2 of a JCL card are called the ID field or the Delimiter field. 

The system uses this to recognize that the card represents a JCL statement.  This was important originally, when all the input was being read from punched cards, and the system had to separate the Job Control Language instructions from input program source statements and other input data.

Usually the ID field is just two slashes.  These have to be in columns one and two exactly.  If you have a statement with just the // in 1 and 2, and nothing else on the card, that is called a NULL statement and its main purpose is to terminate the job.

People used to be very careful to include a NULL statement at the end of each job to avoid any chance of pulling in somebody else’s JCL as part of their job — in case the next person who turned in a deck of JCL for processing had failed to include a job statement at the start.  If your job ended with a NULL statement, any other cards that might be read in right after it would be flushed until another job card was found.  If you did not have a NULL statement, then somebody else could piggyback their work onto yours using your same accounting information – meaning that if charges were applicable, you or your department would be charged, and if security authorization was required, your authorization would be used for their work.

Those days are pretty much gone, and if you’re submitting a job online then the system generally takes care of terminating the JCL after the last line you send.  But if you have a NULL statement, the system will terminate your job right then and there.

The Name field starts in column 3, and your basic name field can be up to 8 columns wide.  A complex statement name consists of two or more basic names each separated from the preceding one by a dot (aka full stop, period, decimal point or point).  A job name, though, is always a simple name, limited to a maximum of 8 characters.

Eight is a special number in JCL, and sometimes in other IBM things as well.  Why eight?  I have never heard a satisfactory explanation for the choice.  You’ll run into the length limitation of 8 over and over again, though, so get used to it.  Many people credit the choice to the fact that many of the early designers were engineers and as such were not required to have extensive vocabularies in basic English, and they thought 8 letters was quite long enough for a word to be.  That seems perhaps a bit unkind, but I do not have a better explanation to offer either.

After the JOB statement, you have to provide an EXEC statement, which tells the system what you want it to do for you.  If you want it to do several things, you can have several EXEC statements within the same job.  Almost the only thing the computer is able to do for you is to execute a program, so you tell it what program to execute, and then the program does all the work – or rather, the program, when it executes, directs the computer what to do, with very explicit step-by-step instructions.

Besides executing the program(s) you tell it to run by way of the EXEC statements you supply, the one other thing the computer system does for you is to run invisible internal programs that are part of the system itself, and these do setup and cleanup tasks on behalf of your program(s).  However, the system only does those setup and cleanup tasks as a byproduct of processing for some program that you specify on an EXEC statement (except that it also does some overall job initiation and job termination for the whole job).

The name field of an EXEC statement is called the “step name”.  The running of each specified program is called a job step.  If you have only one EXEC statement, your job has only one step.  If you have a bunch of EXEC statements then your job has a bunch of steps.  If you have zero EXEC statements then the job fails and you get the message “job has no steps”.

Unsurprisingly, the name field of a JOB statement is called the Job name, and the name field of a DD statement is called the ddname.

An EXEC statement looks somewhat like a JOB statement, but is usually simpler.  It has all the same FIELDS.  The ID or delimiter field comes first as always, composed of a double slash // in columns one and two.  The name field still starts in column 3 as always.  However, unlike the Job Name on the JOB card, the step name is optional on the EXEC statement.  You can leave it blank as long as you never need to refer to it anyplace else.  In any case, the name field ends with the first blank space, wherever the first blank occurs, whether or not the name field is used.

You can have several blanks if you wish, but only one is required.  You will see a lot of JCL that is lined up so that the next field after the name field starts in column 12, as it would if the name were 8 bytes long.  That isn’t required.  It is done to make it easier for the human eye to read the JCL.

Operation Field

After the blank that terminates the name field, you have the OPERATION field.  The operation field defines the type of JCL statement.  It says JOB for a JOB statement, EXEC for an EXEC statement, DD for a DD statement and so on.  Other examples are SET, INCLUDE, PROC, PEND, IF/THEN, ELSE, ENDIF, OUTPUT, JOBLIB, JCLLIB – but we’re only going to cover JOB, EXEC, and DD here because this is just an introduction.  The IBM z/OS MVS JCL Reference Manual is the master reference, and you can read it online for free to get more information.

The operation field is (and must be) delimited by one or more blanks both before and after.  This field is always required EXCEPT on a continuation card.

Operand Field (aka Parameter Field)

After the operation field is terminated with an ending blank or blanks, you have the OPERAND field, also known as the PARAMETER field.

Most of the information you want to convey is in the operands.  There are lots of possible operands, but right now we are interested only in a few basic ones.  The first operand on the most basic EXEC statement identifies the name of the program you want executed.  It is possible to specify, instead of a program name, the name of a JCL procedure – that is, a set of other JCL statements – that will be brought in from someplace else at that exact point in your “deck”.  Ultimately the JCL composed by this method must include at least one EXEC statement specifying a program to be run.

If your EXEC statement calls directly for a program to be executed, it will say PGM=ANYNAME for the first (and perhaps only) operand, but of course instead of ANYNAME you put in the name of the program you want: If you want to run IEBGENER, you say PGM=IEBGENER for example.

The other option instead of PGM= is PROC=, and that form is used to specify a JCL procedure that is to be brought in.  But since PROC= happens to be the default, you can just say the name of the JCL proc and leave off the PROC= part.  Hence

//STEP1  EXEC  OTHERJCL

Is the same as saying

//STEP1  EXEC  PROC=OTHERJCL

Usually JCL procedures are stored in a procedure library on the system and if someone else has set them up then you don’t need to think much about what they contain.

However, you can put a JCL procedure within your job (instream, as it’s called) as long as you put it prior to the EXEC statement that will invoke it.  Your instream procedure would start with the PROC type of statement and end with the PEND statement (PEND stands for Procedure End).  We’re not going to discuss procs in depth in this introductory article, except to point out one common pitfall: if you leave out the PEND statement on an instream PROC then your job will fail and you will get the “job has no steps” message, because everything following the PROC statement will be considered part of the unterminated procedure.  Hence no actual request to EXEC the procedure will be found; hence there is no embedded request to execute an actual program. Hence, Job has no steps.
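Here is a hedged sketch of a job with an instream procedure, using invented names throughout (OTHERJCL for the proc, MYPGM for the program).  Note the PROC statement at the start of the procedure, the PEND statement closing it, and the EXEC statement that then invokes it:

//MYJOB1 JOB (account),NAME,CLASS=A,MSGCLASS=X
//OTHERJCL PROC
//PSTEP  EXEC  PGM=MYPGM
//INFILE DD  DISP=SHR,DSN=some.input.data
//       PEND
//STEP1  EXEC  OTHERJCL

Remove the PEND line from that sketch and the whole remainder of the deck is swallowed into the unterminated procedure: Job has no steps.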

The PROC or PGM operand is called a positional parameter and it has to come first.

Usually a positional parameter does not have a name or an equal sign. The first parameter on the EXEC statement is an exception, used to specify either a PGM or a PROC.

There are two basic types of operands (throughout all types of JCL), keyword and positional.  If it says something equals something else, like an equation, that is a keyword parameter, and the thing on the left of the equals sign is the keyword.  Otherwise, if there is no equals sign, then the parameter is positional.  There are a few exceptions, and the main exception is that one you just saw, the first parameter on the EXEC statement, which is just an unusual hybrid of the two concepts.

When one keyword parameter has to convey more than one piece of information, then you put a set of parentheses on the right of the equal sign; Within the parentheses you list the set of subparameters you need to convey.

The subparameters can be either keyword or positional, and in some cases there is a mix.  When both keyword and positional subparameters are listed, the positional ones always come first.  We are going to get to examples of this soon.  The parameter TIME has positional subparameters of minutes and seconds.

So let’s talk about the other operands you might want to put on the EXEC statement.

The rest of the parameters on the EXEC are keyword and optional.

TIME is an optional keyword parameter.  You can say TIME=5 to request 5 minutes of CPU processing time for the program, which is quite a long time if you aren’t doing some enormous task.  TIME has subparameters available, and those are positional, with minutes being the first.  If you want to ask for only 30 seconds, then you can say TIME=(0,30) indicating zero minutes and thirty seconds, or you can say TIME=(,30) so that the comma functions as a place-holder for the unspecified minutes, and from that the system can know that the 30 is intended to be the second positional subparameter, seconds.

REGION is another optional keyword parameter.  Region size is loosely equivalent to the maximum amount of memory the program is allowed to use at any one time.   At least that is what it meant originally.  Now there is a convoluted way of interpreting the value; Read the separate post on What TSO Region SIZE Really Means to understand how it is interpreted now.  REGION in JCL is nearly the same thing as SIZE in the TSO Logon.  You can say REGION=0K or REGION=NOLIMIT to request the maximum possible region size available for your program to use.  You can leave the parameter off to accept the default the system assigns.

If you want to specify some other amount, you should look in the JCL reference manual to see how to do that, or else at least read the aforementioned post on TSO Region size.  It isn’t a straightforward matter of just specifying how much memory you want, because the system has defaults, limits, and algorithms it uses to determine the region you are actually allowed – it takes what you specify, combines that with its strange rules, and then sets your limit at some amount it calculates.  Not only that, but there are different types of memory, such as 24-bit addressable memory and 31-bit addressable memory, and the system calculates amounts allowed for each type.  Some old and/or oversimplified books and write-ups will tell you that REGION specifies the amount of memory your program will get, but that is simply not true, and has not been true for decades.  In general, 32 megabytes is usually the smallest amount of 31-bit addressable memory your job will be allowed, even if you ask for it to be limited to less.  Note that the region size allowed for your program is an upper limit, not a straight-up allocation.  Memory is actually assigned to the program in chunks as the memory is required, until the assigned limit is reached.

PARM is also an optional keyword parameter on the EXEC statement, used to pass a parameter string to the program that will be executed.  Not all programs look at the PARM string.  If a program does accept a PARM string as input, the format it accepts is entirely determined by that program, and differs from one program to another.  Essentially the PARM string is data you supply to a program.
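For instance, a sketch with an invented program name and an invented option string (a real program defines its own PARM format):

//STEP1  EXEC  PGM=MYPGM,PARM='LIST,NOMAP'

MYPGM receives the ten-character string LIST,NOMAP and can interpret it however it pleases.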

There are other optional parameters available on the EXEC statement too but we aren’t going to talk about any more of them here today.

Okay, quick review:

  • We have an ID or delimiter field at the start of a JCL statement, usually // in columns one and two.
  • We have a name field, sometimes optional, starting in column 3 and ending at the first blank space.
  • The operation field is after that, it identifies the type of JCL statement (such as JOB, EXEC, DD, PROC, PEND, SET, INCLUDE, IF, etc), and the operation field is terminated by another blank space.
  • Next beyond that is the operand field.

~~~ End Quick Review ~~~

How do you suppose the operand field is terminated?

You take a wild guess: Could it be by another blank? Yes.

So, then, what is a very common source of hidden, hard-to-find errors in operands?

Yes, Blanks embedded within the operands field.

As soon as the system sees a blank after an operand, it terminates the operands field and ignores the rest of the card EXCEPT for column 72 which is still interpreted as a continuation column.  Columns 73-80 are ignored, assumed to be reserved for line numbers.

After the operands field, up through column 71, you have the comments field.  You may put comments there, provided you precede the start of the comments with at least one blank to separate the comments from the operands.

If you have a blank between two operands – between what you intended to be two operands, at any rate, perhaps because you left out the comma or typed a stray blank after it – the operand that follows the blank is turned into a comment.  Notice this is not an error, and no error message is issued for it.  What you thought was another operand is simply ignored, and it’s up to you to figure out the result of that.  Typically you just get peculiar results, like garbled records or your silently commented-out DISP being ignored.
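A hypothetical illustration, with invented names: in the first statement below, the blank after TIME=(1,30) terminates the operand field, so REGION=0M silently becomes a comment; the second statement is what was presumably intended.

//STEP1  EXEC  PGM=MYPGM,TIME=(1,30) REGION=0M
//STEP2  EXEC  PGM=MYPGM,TIME=(1,30),REGION=0M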

Are you required to put a character into column 72 if you want to continue a statement onto more than one line?  Not at all.  In the very distant past it was required.  Then they realized that if a line ended in a comma, they could interpret that as implying the next card would be a continuation of the same statement, and the continuation column was redundant, superfluous, surplus to requirements.

So you can end one line after a comma and put more operands on the next line.

The next line has to start with // in columns one and two, followed by at least one blank, and the continued operands must start by column 16 at the latest.

So, // in cols 1 and 2 followed by one to 13 blanks, then more parameters.

The operands must end before column 72 on each card, that is, the last usable column for operands is 71.

Leave a blank prior to the line number field in 73-80.  That is, leave column 72 blank.  (Unless you actually intend to continue onto another line, in which case you can put something into 72 or not, as long as it’s not part of your parameters.)

If a parameter has a lot of subparameters that don’t fit on one card, split and continue the subparameters in exactly the same way as you would parameters.
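A sketch of a continued statement, names invented: each line ends after a comma, and each continuation line starts with // followed by blanks, with the operands resuming before column 17.

//STEP1  EXEC  PGM=MYPGM,REGION=0M,
//   TIME=(1,30),
//   PARM='HELLO'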

You can go on like that for several continuation lines, as long as you have operands to add.  But, as you know, if any of the lines has an extra blank after one of the commas, all operands on that line after the blank are thereby transformed magically into comments and instantly disappear from consideration as JCL operands – and you are not informed that they have been ignored.

If your comments run on into the line number field, that is okay as long as none of the characters fall into column 72.  If the line happens to be continued onto another line, then it doesn’t matter about column 72, because the worst it can do is indicate that the line is going to be continued, and that is already true.  If, however, your comments fall into column 72 on a line that is not being continued any further, then the following line after that one becomes a JCL error (“expected continuation not received”).  Okay, that’s continuations and comments.

But wait! You might ask – How do I continue a quoted string onto multiple lines?  The PARM field can be up to 100 characters long, right?  Not on an 80-byte line it can’t!  So come on, what happens with the quoted string?  Hnnh??

Okay.  Continuing a quoted string.

By quotes we mean single quotes, also called apostrophes.

On the first line of the quoted string, continue the string right up through column 71.  In this case, the character in column 71 is going to be used as part of the string.  On the next line, put // in columns 1 and 2 as usual, and then start the rest of the parameter string exactly in column 16.  If it is still too long, go right on through column 71, using both columns 71 and 16 as part of the data string.  Close off the string as usual on the final continuation line.  Since you asked.
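Schematically, with filler data, it looks like the sketch below; the exact columns are the whole point.  The ruler line on top is only there to show column positions – it is not part of the JCL.  The quoted string runs through column 71 of the first line and resumes exactly in column 16 of the next:

----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
//STEP1  EXEC  PGM=MYPGM,PARM='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCD
//             EFGH'

The program receives the unbroken 44-character string ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGH.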

What you now know:

~ You now know the joys and the pitfalls of comments and continuations.

~ You know about parameters and subparameters, and about keyword vs positional parameters and subparameters, and required vs optional parameters and subparameters.

~ You know the fields of a JCL statement:

  • ID or delimiter field,
  • name field,
  • operation field,
  • Operands field or parameter field,
  • Comments,
  • Continuation column,
  • Line number field

~ Plus, you know what a JOB statement and an EXEC statement are.

Sounds like you know a thing or two about basic JCL concepts now.

There are several other types of JCL statements also, but this is an introductory article so we’re just going to introduce DD statements and then you’re good to go, as far as having a basic understanding of JCL.  For advanced material, Remember that IBM z/OS MVS JCL Reference Manual – you can read the newest version online and it always has the definitive answers.  Well, to the extent that it has the answers, whatever answers it gives are definitive.

Onward to our last type of JCL statement for today:

DD statements – Data Definition

Computer programs on the IBM don’t usually contain embedded names of actual disk files, as programs do on some platforms.

On the mainframe these are called not disk files but disk datasets, and their names are not called file names but rather dataset names (dsnames, or DSNs).

Typically a mainframe program refers to a file internally by a file name, or ddname, of 8 characters or less.  When the program runs, that internal file name is mapped to the name of an actual file (a dataset) through the use of a DD statement.

The Name field of the DD statement in the JCL must be the same as the internal file name the program uses to reference the file.  That is what connects the actual dataset to the program file name.

The operands field on the DD statement specifies the dataset name with the DSN keyword parameter, that is, as DSN=the.actual.disk.dataset.name

If you do not specify a DSN on the DD statement, the system makes up a name based on the date, time of day, job name and step name.  Typically this dataset is then discarded when the step ends, unless you have made other provisions for it in your JCL.  Rarely do you really want a nameless temporary dataset, so you want to specify an actual DSN for your program to use.

DISP (disposition) is the other keyword parameter you usually want to specify.

When you specify an input dataset, you want to put DISP=SHR usually, unless you want to prevent other work from using the dataset while your program is processing it.  In that latter case you specify DISP=OLD.  OLD and SHR are equivalent except that OLD requests an exclusive lock on the dataset name.

What if you want to create a new dataset as output from the program?

Well, DISP has positional subparameters.  The default first subparameter is NEW, but the default second subparameter that NEW implies is DELETE.  So that means the system will create the dataset and then when the program finishes the dataset will be deleted.  Probably not what you want.

You might think you want KEEP as the second subparameter, but don’t be fooled.  What you want is probably CATLG.  If you just say KEEP, the system will keep the dataset when the job ends, but you might have trouble finding it again unless you save the output from the job so you can see where the system says it put the dataset.  If you specify CATLG, the system will not only keep the dataset it creates for you, it will also create a catalog entry – kind of like a phone book listing – so the dataset can be found just by its name.  After that happens, you can get to it again by specifying something like this:

//filename  DD  DISP=SHR,DSN=whatever.it.was.named

In place of “filename” in the above example DD statement, you put the actual ddname  (file name) that the program expects to read, for example the ddname you use might be SYSUT1 if the dataset is going to be used for input to IEBGENER.  In place of “whatever.it.was.named” you put the actual dataset name that you assigned to the dataset when you created it.

Syntax of JCL examples

It is common practice in JCL examples to use lower-case characters to denote something where you must supply a value, and upper-case characters to denote anything that should be entered exactly as shown.  Many books, manuals, online HELPs, and tutorials use this convention.  However, you should be aware of another, more awkward convention that is sometimes used, which I avoid both because it is awkward and because it is sometimes ambiguous: instead of (or even in addition to) lower-case characters, framing apostrophes (single quotes) are sometimes used to denote a place where you are expected to substitute a value in place of the quoted string.

I mention that quoted-string practice just so you know what it means when you run into it.  I think it is a holdover from a time when early devices could handle letters in upper-case but did not yet have lower-case.  At that time using a quoted string seemed a reasonable convention given the few choices they had.  Today we have lower-case characters available nearly everyplace, so I avoid using the older quoted-string style.

Back to DD statement parameters – – –

This is a good example of using positional subparameters, just for review.  You can say DISP=(NEW,CATLG) or you can say DISP=(,CATLG) and it means the same thing because the default value for the first subparameter of DISP is NEW.  (Obviously “default” used this way means the value the system uses for the parameter or subparameter if you don’t specify a value yourself.)

If you say DISP=SHR or DISP=OLD you don’t have to specify the second subparameter and hence you don’t need the parentheses either.

The defaults for DISP – and for any part of DISP you don’t specify yourself explicitly:

  • The default for the first subparameter is NEW.
  • The default for the second subparameter depends on the first subparameter.
  • If the first is NEW, then the second defaults to DELETE.
  • If the first is OLD or SHR, then the second defaults to KEEP.
  • The pattern is this: If the dataset already existed, then by default it continues to exist.  If the dataset did not already exist, then by default it returns to nonexistence.

There is a third subparameter, which is the conditional disposition for what happens to the dataset if the program bombs (abends, blows up, crashes).  (Note that a non-zero condition code – a bad return code – does not count as bombing/crashing.  The third subparameter applies only in case of an actual abend.  Abend means Abnormal Termination, that is, crash, bomb, blow up, fall over. System code 0C4, aka S0C4, is an example of an abend code, as are B37, D37, E37, 80A, 913, 806, and User abend code 4095 aka U4095.  However, the message “Last Condition code = 0012” does NOT signify an abend.)  So: Third subparameter takes effect if (and only if) the program abends.

If you are creating a new dataset and you want it to be kept and cataloged regardless of whether the job step that creates it runs brilliantly or fails abysmally, you specify CATLG as both the second and third subparameter:

DISP=(NEW,CATLG,CATLG)

Or

DISP=(,CATLG,CATLG) 

The second form shown there means the same as the first because NEW is the default.

So you’re okay with positional subparameters now, in principle, right? Right. Good.

Other oddball ID fields are possible besides // in columns 1 and 2:

//* in columns 1 through 3 means the line is a comment.

If you have the JES3 subsystem then //* is also used to identify control statements for JES3, but we aren’t going to cover that here – just wanted you to know that possibility exists, in case you happen to see a JCL deck with lines that look like weird comments.

/* in the first two columns signifies the return to JCL statements after the end of an instream input file.  It is not required in most cases.  Mostly you see it as a holdover from times past – people just put it in even though they no longer need to do so.  However, the /* is also used to signify the termination of an input file that is itself composed of JCL statements, when the input file is an instream file defined using DD DATA.  You can see an example a little further below.

In addition, if you have the JES2 subsystem, then /* can also be used to identify control statements for JES2, but we aren’t going to go into that in detail either.  As with JES3 //* control statements, just be aware that such things exist.  Look them up in the IBM manual(s) if you need to understand them.

So, on to that DD DATA example.

When you specify a DD statement to designate an input file, it does not need to use an existing disk or tape dataset.  You can have input data right in with the JCL deck – what is called instream data.

Typically you precede such an instream file with a DD statement that includes an asterisk as the first positional parameter in the operand field (one of the few positional DD parameters – DATA and DUMMY are others).  You see it like this mostly: //SYSIN DD *

If a /* statement occurs anyplace following the DD * statement, the /* terminates the input file.  However the /* is not required.  The input file is automatically terminated as soon as the system sees more JCL statements: any line with // in columns 1 and 2.

The system has become even smarter than that in fact: If it is processing JCL and it sees some non-JCL thrown in, the system will assume that you mean for those non-JCL lines to be used as an instream input file, and it will further assume the ddname SYSIN for the instream data.  You’ll get a message saying //SYSIN  DD  * generated statement.

However, if you want to include an instream input file containing JCL statements – unusual, but it happens – then you precede your input data with a DD statement that says the word DATA rather than the asterisk.  If the instream JCL data contains an embedded /* then you also need to add the keyword parameter DLM (for delimiter) onto the DD DATA statement to assign some different delimiter than /* — you specify a DLM value for the purpose of preventing the embedded /* from terminating your input file prematurely.

Example 1:
//MYJOB1 JOB (account),NAME,CLASS=A,MSGCLASS=X
//STEP1  EXEC  PGM=MYPGM
//SYSIN  DD  *
  Control statements

Example 2:
//MYJOB2 JOB (account),NAME,CLASS=A,MSGCLASS=X
//STEP1  EXEC  PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSUT2   DD  SYSOUT=(A,INTRDR)
//SYSUT1   DD  DATA,DLM=QQ
//MYJOB1 JOB (account),NAME,CLASS=A,MSGCLASS=X
//STEP1  EXEC  PGM=MYPGM
//SYSIN  DD  *
  Control statements
QQ

In the second example, everything between the SYSUT1 DD statement and the QQ at the end is read into the program and used for the file that the program calls SYSUT1.

Notice the data in the SYSUT1 file in the second example is exactly the same as the entire set of JCL that constitutes the first example.

We mentioned utility programs earlier.  IBM supplies a number of such programs for the purpose of doing commonplace functions.  IEBGENER is the name of an IBM-supplied utility program that reads a flat file* from the DD statement designated as SYSUT1, and copies the specified file to whatever location is designated by SYSUT2.  This is perhaps the most frequently used IBM Utility program ever, so you may as well know it.

*Definition: A flat file, also called a sequential file, is the simplest kind of file, similar to a text file on a PC; any file organized like a card deck, that is to say, no particular special organization, just one record after another.  The records have no keys, nothing fancy is going on, the dataset is not a library containing members.  You CAN treat one single member of a library as a flat file in many cases, including this example, by specifying the member name enclosed in parentheses at the end of the dataset name in the JCL, for example: //SYSUT1  DD  DISP=SHR,DSN=my.library.name(mymember)

Those ddnames are pronounced thus: SYS is like the first syllable of SYSTEM, and UT is like the first syllable of UTILITY; sometimes a gratuitous additional “TEE” sound is added, so it sounds more like SISS YOU TEE.  The digit is simply pronounced as itself.  So, SYSTEM UTILITY ddname ONE is SISS YOUT WUHN or SISS YOU TEE WUHN, whichever you find easier to say.  Other IBM-supplied programs commonly use this naming convention, often continuing on through SYSUT3 and SYSUT4.

In the case of IEBGENER (pronounce the first three letters I.E.B. and then say GENER as if it is the first part of GENERATION) the ddnames function as follows:

SYSUT1: Input file — the file which is to be copied

SYSUT2: Output file, copied from the input file

SYSIN:   Control statements, not usually used, but they can specify minor editing

SYSPRINT: System Print file, that is, information and error messages

Okay, so where does the job shown in Example 2 copy the input file?  To SYSUT2 – You can remember this with the silliness that the number TWO is pronounced the same as the word TO, so the data is copied TO sysut-TWO.

So, where is that, exactly?  INTRDR?  Yeah, that is the “internal reader”, a software implementation of a card reader.  This piece of JCL takes the job JCL from SYSUT1 and copies it into the internal reader, thus submitting the job to run, pretty much the same as if you had submitted it from a TSO session or read it into the system from real cards.

Sounds like it could be useful someday, having a job that submits another job – but what if you wanted to copy the input file to a disk dataset instead (which is something you’d probably want to do more often)?

If you want to copy the data into an existing disk dataset, and you want to overwrite any existing data that might already be there, replacing the existing data with the new data, then all the operands you need to specify are the DSN plus DISP=OLD.

If instead you want to APPEND onto the end of any existing data that might already be in the dataset, then you say DISP=MOD (which is short for modify) instead of DISP=OLD.
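Hedged sketches of those two cases (the dataset name is a placeholder) – first overwrite, then append:

//SYSUT2  DD  DISP=OLD,DSN=my.existing.data

//SYSUT2  DD  DISP=MOD,DSN=my.existing.data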

What if you want to create a brand new dataset? Do you really have to put in all that DCB and SPACE and all that stuff?  No, you do not (usually).

As long as you have some other dataset that is already configured in the same way as you want this one to be – same approximate size, same record length, basically similar – then you can use that dataset as a model by specifying the LIKE parameter (usually).  In addition to DSN=the.new.dataset, and DISP=(NEW,CATLG), you specify LIKE=the.name.of.the.model.dataset and that is all you need to put on the DD statement for SYSUT2 with IEBGENER (or pretty much any other place you want to create a new dataset):

//SYSUT2  DD  DISP=(NEW,CATLG),DSN=the.new.dataset,
//   LIKE=the.name.of.the.model.dataset

What else is there to say about JCL that would be considered basic?

The character set.  Dataset names, ddnames, step names, job names, and most JCL names of things can use the following characters:  All the uppercase-only letters, all the ten digits, plus the three symbols # @ and $ except in countries where the $ has been replaced by something else, such as the British pound sterling symbol.

Hence the $ is called the national character or national symbol, and the three special symbols together are sometimes called national symbols or national characters.  The very first character of a name generally cannot be a number, but must be one of the letters or acceptable symbols.

Lengths of things: Most things are limited to 8 characters in length, though a few are limited to 4 or 6, and with unprecedented largesse a PARM string has been allowed to be 100 characters in length.  A dataset name can be up to 44 characters long (for most datasets), but if it is longer than 8 characters, as most are, the name must be divided into segments of no more than 8 characters each, separated by dots (decimal, full stop, point).  Each such segment must begin with a letter or one of the three allowable symbols.  HFS files are an exception and follow their own naming conventions, which resemble UNIX file names.  The name of a disk or tape volume is limited to a length of 6, BUT the compensation for that restriction is that the name can also start with a number.  Names of tape volumes used to be commonly composed of six digits.  Other limitations, caveats, etc. are listed in the IBM z/OS MVS JCL Reference Manual, which you can read online for free.

That’s it.  JCL, meet the reader.  Reader, meet JCL.  There, you’ve been introduced to JCL.  I think you’re going to get along.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Further reading for JCL topic(s):

IBM z/OS MVS V2R2 JCL Reference Manual (downloadable PDF)

http://publibz.boulder.ibm.com/epubs/pdf/iea3b611.pdf

IBM z/OS MVS V2R2 JCL User’s Guide (downloadable PDF)

http://publibz.boulder.ibm.com/epubs/pdf/iea3b510.pdf

SC23-6864-01, z/OS DFSMSdfp Utilities (online reading, not a PDF)

http://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idau100/toc.htm

IEBGENER in above book (online reading, not a PDF):

http://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.idau100/iebgenr.htm

DFSMS Using Datasets (online reading, not a PDF)

http://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.idad400/toc.htm

Index(es) of IBM pdfs for V2R2:

http://www-03.ibm.com/systems/z/os/zos/library/bkserv/v2r2pdf/

http://www-03.ibm.com/systems/z/os/zos/library/bkserv/v2r2pdf/#IEA

BLKSIZE, the Misunderstood and Abused JCL Parameter

BLKSIZE, block size — What, you ask, is a block (in this sense)? Glad you asked. A block is a bunch of records, like a handful or a scoop. A record, as we ordinarily think of it, is, in mainframe-speak, a Logical Record.  A bunch of such records, taken together, is a Physical Record, or block.

BLKSIZE (Block Size) is the reason for the B in FB. And in VB, FBA, VBA, FBM, FBSA, and any other letter combinations for RECFM (RECFM = Record Format) that include the letter B. The B means Blocked. The records are blocked together into groups, and each such group of records constitutes a block.  So when you read a record, the system actually grabs a big handful of records at once, then feeds them to your program one at a time. The system does one big READ instead of a bunch of little ones. Much more efficient. Runs a lot faster. It works the same way when writing records: the system sets aside an area (called a buffer) equal in length to your block size, and gradually fills it up with the records your program writes. When the buffer is full, the system writes it out and starts refilling the area again. You never see any of this happening.

Actually, the system sets aside at least two buffers for each file, so it can start using the second one as soon as the first one is full, without waiting for the WRITE (or READ) to complete.  But we were talking about the size of the buffer(s) — BLKSIZE — not the number of buffers, so back to that . . .

It used to be that you had to let the system know how many records to put into a block. You had to specify the size of the scoop. Yes, the size of the scoop, or block, not the number of records it would hold. (For example, 100 records of 80 characters each would amount to a block 8,000 characters (8000 bytes) in length, BLKSIZE=8000). YOU DO NOT HAVE TO DO THIS ANYMORE except in a few special cases. Almost always you can leave off the parameter BLKSIZE entirely, or, at worst, put BLKSIZE=0 to indicate that you didn’t just forget. The system will then figure it out for you. That’s the best approach. Don’t ever specify a BLKSIZE other than zero unless you have a good reason.
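In JCL, letting the system pick the block size for a new dataset might look like the following sketch; the dataset name, record format, length, space, and unit are all placeholders (on most systems you could omit BLKSIZE entirely instead of coding BLKSIZE=0):

//NEWFILE DD  DSN=my.new.data,DISP=(NEW,CATLG),
//   RECFM=FB,LRECL=80,BLKSIZE=0,
//   SPACE=(TRK,(500,50)),UNIT=SYSDA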

If you are writing a program and you find that when you specify a file definition within your program it seems as if you're being required to specify how many records are in a block, you can generally (and generally should) say zero.  BLOCK CONTAINS ZERO RECORDS, for example, in COBOL, or possibly BLOCK CONTAINS 0 RECORDS, with a numeric digit zero rather than the word ZERO.  It is important that you NOT specify some particular number of records other than zero, because when your program is compiled that file definition will be used to build a DCB for the file, and whatever is specified in that DCB will override anything specified in the JCL.  So don't do it.  Let the system figure out the BLKSIZE;  Let there be zero in the BLKSIZE in your compiler-constructed DCB.

The other important thing to know about BLKSIZE is that I/O operations are slow. (I/O = Input and Output, that is, READ and WRITE). For most types of Data Set, using the common access methods, one physical operation is required each time you read or write a block.

Hence, if you read 100 records from a Data Set with BLKSIZE=80, that causes a hundred physical READ occurrences compared to only one READ occurrence if BLKSIZE=8000, and performing 100 separate physical events takes 100 times as long — A hundred times as much elapsed time! — as doing it just once. (System-determined BLKSIZE for an FB 80 data set will usually be 27920, enough for 349 eighty-byte records per block. I just use 100 in the example because it’s easier to do the math in your head, hence it feels easier to relate to 100.)

Experiment:

Because people tend to have a bit of intuitive difficulty comprehending how very slow I/O is compared to almost anything else on a computer, I generally suggest to any doubters that they do the following experiment.

Go into ISPF 3.2 and allocate two Data Sets, both with Data Set type BASIC, which means ordinary flat files. Specify the SPACE in terms of tracks, allowing about 500 tracks for each Data Set. Both should have RECFM (Record format) FB, and LRECL=80. One should have BLKSIZE=80. The other should have BLKSIZE=0, which will give you a system-determined BLKSIZE that will usually be 27920, enough for 349 eighty-byte records per block.

That done, Go into ISPF Edit on the latter empty Data Set, the one with BLKSIZE=27920. Overtype the left-hand line number field of the first line with R34899, indicating you want the line to be replicated 34899 times. Put in any data at all in the data portion of the line, and press enter. That should give you 34,900 identical records (100 full blocks). Save and end.

Next Edit your other empty Data Set, the one with BLKSIZE=80. On the command line, put in COPY 'XXX', but instead of XXX you say the name of the Data Set where you just put the 34,900 identical records. Press Enter. The same data should now appear here also. Save and end.

You should now have two Data Sets that are identical except for the BLKSIZE.

You might have already noticed that "Save and end" took a good bit longer on the BLKSIZE=80 Data Set. That was not a random quirk. Take turns going into Edit again on each Data Set repeatedly. It will take decidedly longer to get into edit on the BLKSIZE=80 file as compared to the other. If you SAVE the Data Sets again, that too will exhibit the same difference in response time. Amazing, right?

If you doubt this at all, don't take my word for it, try the experiment. Really. In that way you will get a feeling for the true meaning of BLKSIZE.

As a bonus, go to ISPF 3.4, put in a Data Set name pattern that will match your two new Data Sets, select "Initial View" as 2 (space), and press enter. When the two Data Set names show up, type F in the "Command" column to the left of each and press enter, hence freeing the unused portion of each Data Set's allocated space. Look at the space each one is using, under the "Tracks" column. Yes, the BLKSIZE=80 Data Set uses much more space than the other.

End of Experiment

The system does not choose 27920 with a random number generator. It picks that size for efficient use of disk space. Two such blocks will fit on one track of disk (for most disk space).

Space on disk, as you know, is generally measured in units of tracks. On all modern disks, fifteen tracks are equivalent to one cylinder.

EAV

The newest disks, called EAV (Extended Addressing Volumes), are a model of 3390 disk that consists mostly of an “extended area”, or "extended addressing area", on which space is allocated only in units of cylinders, and in most cases the allocation is rounded upward to the nearest 21 cylinders. (21 cylinders is the current value of the "multi-cylinder unit" that IBM uses.) The tracks are still there, but they’re considered to be like old halfpenny coins, not worth mentioning.

Before long most places will probably have the EAV disks.

If your Data Set is allocated in the extended area, and you use system-determined BLKSIZE, the system will subtract 32 bytes per block when calculating the optimal BLKSIZE for the data set.  The extra 32 bytes on disk are used by the system for control information (in the form of an invisible-to-you suffix following each block).  The block size calculation will usually have to subtract more than 32 for any specific case, though, because the BLKSIZE of a fixed-blocked dataset must be a whole multiple of the logical record length (LRECL).  Hence your Data Set will have a slightly smaller optimal BLKSIZE if it resides in the extended addressing area.

The extended address area is also called EAS (Extended Addressing Space).  It can also be called cylinder-managed space, because single tracks are not assigned to data sets there; the smallest amount of space allocated to a data set within EAS is the multi-cylinder unit, currently 21 cylinders.  If you ask for one track, the system will round the request upward to 21 cylinders.  Whenever the data set grows and gets more space allocated, that secondary space is also assigned in multiples of 21 cylinders.

What happens if your Data Set is moved from someplace else and it goes into the extended area? If you are using system-determined BLKSIZE, that is, BLKSIZE=0 or unspecified, then the system will automatically recalculate the BLKSIZE when the Data Set is moved.  It will also reblock the data. No problem.

Where Does BLKSIZE come from?

Alas, you cannot always use system determined block size.

The system can get the BLKSIZE from your JCL, true. It can also get the BLKSIZE by looking at the existing BLKSIZE of a Data Set.

Your JCL, incidentally, takes precedence over what is already specified on an existing Data Set. So if you have an existing Data Set with BLKSIZE=800, you can change that whenever your program writes to the Data Set. Whatever you specify in the JCL will override what is already there.

That last point can lead to a problem and also to its solution. Occasionally it happens that someone has some old JCL that specifies some nonsense like BLKSIZE=3120, and they use this JCL to write into a member of a library (or PDS). A common example is old Compile-and-Link JCL. This causes the existing BLKSIZE (saved as a number in the label of the Data Set) to be changed to 3120. If it used to be bigger than that, you have a problem. Many existing members of the Data Set actually have blocks bigger than 3120, and when you try to read them thereafter you get an I/O error message. Oops.

Fortunately the solution is equally simple. You get similar JCL and change the 3120 to the correct number, that is, whatever it used to be. Or if it’s a PDS you can just run COMPRESS JCL with the BLKSIZE specified as the correct number. When the system writes into the Data Set, it will change the BLKSIZE back to whatever you specified. If you don’t want to write into the existing Data Set, you can make a copy of it by specifying the old, larger BLKSIZE on the input DD statement. The system will use the BLKSIZE you specify, and will not even look at the 3120 specified in the label.
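A hedged sketch of that repair-by-copying approach for an ordinary flat file (a PDS would want IEBCOPY instead), with invented names, and with 27920 standing in for whatever the BLKSIZE is supposed to be.  The BLKSIZE coded on the input DD statement overrides the bad number in the dataset label while IEBGENER reads:

//COPY     EXEC  PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  DUMMY
//SYSUT1   DD  DISP=SHR,DSN=damaged.old.data,BLKSIZE=27920
//SYSUT2   DD  DSN=repaired.new.data,DISP=(NEW,CATLG),
//   RECFM=FB,LRECL=80,BLKSIZE=0,
//   SPACE=(TRK,(500,50)),UNIT=SYSDA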

But that was a digression.

There is a third place where the system can obtain the BLKSIZE. Sadly, this third choice takes precedence over what you specify in the JCL. The program you are running can have BLKSIZE specified – hard-coded – within the program.

If a program has BLKSIZE specified internally, there’s not much you can do about it.

Why does anyone specify BLKSIZE inside the program? Mostly because they are copying old program code that had BLKSIZE specified, written back in the day when BLKSIZE was required, and/or they thought they could simplify either the JCL or their program logic by hard-coding a number in the program. Oh well.

If you leave BLKSIZE off your JCL, or you specify BLKSIZE=0, then if the Data Set that is created has some weird BLKSIZE like 800 or 3120, it is because the program you were running specified BLKSIZE within the program.

BLKSIZE specified within a program overrides everything else. Sorry.

Other BLKSIZE considerations

The only time I ever specify a small BLKSIZE is when some job step uses a lot of memory and has a large number of DD statements for a lot of files that will be open at the same time, AND that job is getting 80A ABENDs, 0C4s, and similar problems. Smaller BLKSIZE then is a trade-off. Each open Data Set will have at least one buffer allocated in memory, and the size of that buffer is about the same as the size of one physical block. Smaller BLKSIZE means smaller buffers.

On rare occasions it is not possible to use system determined BLKSIZE because your system is not using SMS (System Managed Storage), or because the particular disk you are using is defined to the system as being excluded from the control of SMS. The first case is super rare. The second case occurs sometimes for disks that are shared between two separate z/OS systems, when the administrators (aka System Programmers) want to be sure that one of the systems does not move any of the Data Sets off the shared disks onto some other disk accessible to only one of the systems.

Oh, yeah, by the way, SMS not only determines block sizes for you, it does lots of other useful things too, such as migrating unused Data Sets and bringing them back again as needed. As must be with such things, occasionally SMS does something you didn’t want it to do. On balance, though, it is – well, not the best thing since sliced bread, but probably the best thing since HASP.

HASP? you might ask.  Decades ago some IBM customers in Texas sped up processing by creating a spooling add-on for the mainframe operating system.  IBM bought out HASP and made it part of the mainframe operating system(s), where its current incarnations are now called JES, as in JES2 or JES3. The S in HASP stood for Spooling: Houston Automatic Spooling Priority subsystem. Prior to spooling, each record in a printed output file was written directly to a printer; that is, the printer was treated the same as a disk or any other attached device. Early printers were generally very slow.  You can only imagine how much that slowed down a job's total elapsed run time, right? Good for you.  JES stands for Job Entry Subsystem.  A job that enters the z/OS system, as from a "card reader" or an "internal reader", or via the SUBMIT command in TSO, is spooled while it awaits execution, just as output print is spooled – so the spooling system is sort of "two mints in one", spooling both input and output.  Spooling means the spooled material is put into a big holding space on disk, and that holding place is called the Spool.  But enough about spooling.

Also note that the system-determined BLKSIZE will give you the best use of disk space. A badly chosen BLKSIZE can cause your data set to take up several times as much disk space as a properly allocated equivalent Data Set.

That’s it for today. Best thing you can do with BLKSIZE is to specify BLKSIZE=0 or omit BLKSIZE entirely whenever you can.  Let the system figure out what BLKSIZE to use.  It's one of the things the system does best.   

BLKSIZE  vs  LRECL

(addendum 10 May 2016)

What is the difference between BLKSIZE and LRECL?  you may ask. (Really, some people have asked that.  Or used it for a search term and gotten to this article thereby.)  So. Here goes:

A record, as you normally think of it, is called a Logical Record.   When you look at a dataset (file) in Edit or Browse, you normally see one record per line – one Logical Record.  When your computer program refers to reading or writing a data record, again that ordinarily means a logical record.

The size, or Length, of a Logical Record is LRECL  (Logical RECord  Length).

A Physical Record (a block) contains one or more Logical Records. Hence BLKSIZE (the size, or maximum size, for a block) is generally larger than the LRECL (the length, or maximum length, of a Logical Record).

BLKSIZE (Block Size) is related to the B in FB  — and in VB, FBA, VBA, FBM, FBSA, and all other RECFM combinations. The B means Blocked. The (logical) records are blocked into groups to form physical records.

What did that mean, you ask:  that LRECL is the “. .. maximum length, of a Logical Record”?

For Fixed-Length records, the logical record length is the data length – it’s the length of what you see when you look at one line of the dataset in Browse or Edit.  The length of the line is the record length, it is the same for every record in the dataset, and it is the value you use for LRECL for fixed-length records.  Fixed-length records are any records where the RECFM is F, FB, FBA, FBM, FBS, FBSA – any RECFM with an F in it – the F stands for Fixed.

Different is the situation for Varying-length records.  When you specify LRECL (in JCL or in ISPF 3.2, etc.), the length you need to specify is the maximum length of any logical record that can be in the dataset, plus 4.  Yes, for varying-length records you have to add 4 to the maximum length of the longest line you can have.  This is the actual length of the logical record as it is stored on disk: every varying-length data record is prefixed with a four-byte “Record Descriptor Word” (RDW) that contains the length of that particular data record on disk (inclusive of said RDW).  For example, if the longest line of data is 80 characters, you specify LRECL=84, and a record containing 50 bytes of data occupies 54 bytes on disk.  Each block in these datasets also contains an additional 4 bytes similar to the RDW, but here it is called a BDW (Block Descriptor Word).  Varying-length records are any records where the RECFM is V, VA, VB, VBA, VBS, VBSA – anything with a V in it – the V stands for Varying.

. . . and why did I say a physical block is “generally larger than a Logical Record”?  What “generally”?  Like, not always?  Right.  There are two exceptions: short blocks and spanned records.

The S in VBS is for Spanned records.  Spanned in this sense means that an individual record can overlap blocks.  In the simplest case a spanned record resides partly at the end of one block, with the rest of the data deposited at the beginning of the next block.  So if a record is, say, 4,000 bytes long, but it is being written at the end of a buffer that has only 3,999 unused bytes left, then, rather than waste space, the system will write as much as it can into the last part of that block, and then put the rest of the data record at the beginning of the next buffer.  (When you open a dataset for output, the system sets aside at least one buffer for it in memory; the size of the buffer is equal to the BLKSIZE; when a buffer is full it is written to disk as a physical block.)  In the not-as-simple case, a logical record can overlap several physical blocks.  In z/OS there is an actual limit on BLKSIZE of 32760, imposed by the design of the system, but for spanned records the logical record length can exceed the BLKSIZE.  Hence nowadays it is possible to have extremely long logical records if you use RECFM=VBS together with LRECL=X, which was invented to allow records longer than would otherwise fit.  How does the system stitch these spanned records back together when you read them back?  For a dataset with spanned records – that is, with a RECFM containing both the letters V and S – the RDW contains more complex information beyond just the record length.

The second case is a simple case: the actual size of a block can equal the LRECL.  Or similarly, a block might contain a few records, but not enough to fill up the block.  Such a block is referred to as a “short block”, if you like to pick up jargon.

For Fixed-Length records (anything with an F in the RECFM value) or Undefined-format records (anything with a U in the RECFM value), if a block contains only one record – maybe that’s all the data you have in the dataset – then for that one record the actual physical block on disk will be the size of that one record, regardless of what you have for BLKSIZE. (Or if the block contains two records, the size of the block will be equal to the length of those two records together, and so on for anything less than the maximum that would fit inside a full block.)  This situation usually occurs only for the last block written to the dataset.  (For example, with LRECL=80 and BLKSIZE=27920, writing 500 records produces one full block of 349 records – 27,920 bytes – followed by a short block holding the remaining 151 records, 12,080 bytes.)

The LRECL also equals the BLKSIZE for RECFM=F, that is, Fixed unblocked records, one logical record per physical block.  Unless your record length is quite large, please don’t do that without a good reason.  It tends to make reading and writing the data quite slow.

This brings us to the peculiar RECFM called FBS.  This seems to be the only case where a letter in the RECFM value can have more than one meaning.  That letter of course is S.  For varying-length records, the S means Spanned, as just discussed.  For fixed-length records, the S means Standard.  FBS is Fixed Block Standard.  What does standard mean in this sense?  All the blocks in the dataset have to be the same length (a standard length) except for the last block, which can be a “short block”.  The last block is exempted from the requirement for the obvious reason: depending on the number of logical records written into the dataset, there might not be enough records left over to fill the final block to the specified size.

Hence the RECFM=FBS format lends itself to a peculiar problem: you cannot append data onto the end of such a dataset by writing additional blocks.  Well, you can, for example if you trick the system by specifying RECFM=FB in your JCL or inside your program.  However, if another program then tries to read back the extended dataset in RECFM=FBS mode, haha, it will crash when it comes to the short block that used to be at the end but is now someplace in the middle.

So, hopefully you now understand the difference between LRECL and BLKSIZE, if that was puzzling you earlier.


Again, that's it for the article on BLKSIZE for today.

Reminder, bottom line: as a general rule, use BLKSIZE=0 or leave BLKSIZE unspecified and let the system assign it, unless you are specifying it for a specific (albeit rare) reason.

(end of May 10th addendum)