Addresses on disk.

An address on disk is an odd sort of thing.

Not a number like a memory pointer.

More like a set of co-ordinates in three dimensional space.

Ordinary computer memory is mapped out in the simplest way, starting at zero, with each additional memory location having an address one higher than the location just before it, until the allowable maximum address is reached.  The maximum pointer address is limited by the number of digits available to express the address.

Disks — external storage devices — are addressed differently.

Since a computer system can have multiple disks accessible, each disk unit has its own unit address relative to the system.  Each unit address is required to be unique.  This is sort of like disks attached to a PC being assigned unique letters like C, D, E, F, and so on; except the mainframe can have a lot more disks attached, and it uses multi-character addresses expressed as hex numbers rather than using letters of the alphabet.  That hex number is called the unit address of the disk.

Addresses on the disk volume itself are mapped in three-dimensional space.  The position of each record on any disk is identified by Cylinder, Head, and Record number, similar to X, Y, and Z co-ordinates, except that they're called CC, HH, and R instead of X, Y, and Z. A track on disk is a circle.  A cylinder is a set of 15 tracks that are positioned as if stacked on top of each other.  You can see how 15 circles stacked up would form a cylinder, right?  Hence the name cylinder. 

Head, in this context, equates to Track.  The physical mechanism that reads and writes data is called a read/write head, and there are 15 read/write heads for each disk, one head for each possible track within a cylinder.  All fifteen heads move together, rather like the tines of a 15-pronged fork being moved back and forth. To access tracks in a different cylinder, the heads move in or out to position to that other cylinder.  So just 15 read/write heads can read and write data on all the cylinders just by moving back and forth.  

That's the model, anyway.  And that's how the original disks were actually constructed.  Now the hardware implementation varies, and any given disk might not look at all like the model.  A disk today could be a bunch of PC flash drives rigged up to emulate the model of a traditional disk.  But regardless of what any actual disk might look like physically now, the original disk model was the basis of the design for the method of addressing data records on disk.  In the model, a disk is composed of a large number of concentric cylinders, with each cylinder being composed of 15 individual tracks, and each track containing some number of records.
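To make the geometry concrete, here is a small sketch of how a (cylinder, head) pair in the model maps to a single running track count and back.  The function names are made up for illustration, and the sketch assumes cylinders and heads are both numbered from zero, so heads run 0 through 14.

```python
TRACKS_PER_CYLINDER = 15  # from the disk model described above

def relative_track(cc: int, hh: int) -> int:
    """Zero-based track number counted from the start of the volume."""
    if cc < 0 or not 0 <= hh < TRACKS_PER_CYLINDER:
        raise ValueError("expected cc >= 0 and hh in 0..14")
    return cc * TRACKS_PER_CYLINDER + hh

def cchh_from_relative_track(track: int) -> tuple:
    """Inverse mapping: running track number back to (cc, hh)."""
    if track < 0:
        raise ValueError("track number must not be negative")
    return divmod(track, TRACKS_PER_CYLINDER)

print(relative_track(2, 3))          # 33
print(cchh_from_relative_track(33))  # (2, 3)
```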

Record here means physical record, what we normally call a block of data (as in block size).  A physical record — a block — is usually composed of multiple logical records (logical records are what we normally think of as records conceptually and in everyday speech).  But a logical record is not a real physical thing, it is just an imaginary construct implemented in software.  If you have a physical record — a block — of 800 bytes of data, your program can treat that as if it consists of ten 80-byte records, but you can just as easily treat it as five 160-byte records if you prefer, or one 800-byte record; the logical record has no real physical existence.  All reading and writing is done with blocks of data, aka physical records.  The position of any given block of data is identified by its CCHHR, that is, its cylinder, head, and record number (where head means track, and record means physical record).  
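The 800-byte example can be sketched in a few lines.  This is only an illustration of the idea for fixed-length records; the function name deblock is invented here, and real access methods do considerably more.

```python
def deblock(block: bytes, lrecl: int) -> list:
    """Split one physical record (block) into fixed-length logical records."""
    if lrecl <= 0 or len(block) % lrecl != 0:
        raise ValueError("block length must be a multiple of the record length")
    return [block[i:i + lrecl] for i in range(0, len(block), lrecl)]

block = bytes(800)               # one 800-byte physical record
print(len(deblock(block, 80)))   # 10
print(len(deblock(block, 160)))  # 5
print(len(deblock(block, 800)))  # 1
```

The same 800 bytes come back as ten, five, or one record; only the program's view changes, not what is on disk.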

The smallest size a data set can be is one track.  A track is never shared between multiple data sets.

The CCHHR represents 5 bytes, not 5 hex digits.  You have two bytes (a halfword) for the cylinder number and two bytes for the head (track) number.  
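As a rough sketch of that 5-byte layout, the snippet below packs and unpacks a CCHHR using Python's struct module.  The helper names are hypothetical; the big-endian byte order reflects how the mainframe stores halfwords.

```python
import struct

# Big-endian: unsigned halfword CC, unsigned halfword HH, one unsigned byte R
CCHHR = struct.Struct(">HHB")

def pack_cchhr(cc: int, hh: int, r: int) -> bytes:
    return CCHHR.pack(cc, hh, r)

def unpack_cchhr(raw: bytes) -> tuple:
    return CCHHR.unpack(raw)

raw = pack_cchhr(100, 14, 3)
print(len(raw))           # 5
print(raw.hex())          # 0064000e03
print(unpack_cchhr(raw))  # (100, 14, 3)
```

Five bytes come out as ten hex digits when displayed, which is where the confusion between bytes and hex digits tends to creep in.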

A "word", on the IBM mainframe, is 4 bytes, or 4 character positions.  Each byte has 8 bits, in terms of zeroes and ones, but it is usually viewed in terms of hexadecimal; in hexadecimal a byte is expressed as two hex digits.  A halfword is 2 bytes, which is the size of a "small integer".  (4 bytes being a long integer, the kind of number most often used; but halfword arithmetic is also very commonly used, and runs a little faster.)  A two-byte small integer can express a number up to 32767 if signed or 65535 if unsigned.  CC and HH are both halfwords.
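A quick way to see the signed and unsigned limits is to interpret the same two bytes both ways.  This is just an illustrative sketch; halfword_range is a made-up helper name.

```python
def halfword_range(signed: bool) -> tuple:
    """Smallest and largest values a 16-bit halfword can hold."""
    bits = 16
    if signed:
        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (0, 2 ** bits - 1)

print(halfword_range(signed=True))   # (-32768, 32767)
print(halfword_range(signed=False))  # (0, 65535)

raw = bytes.fromhex("FFF8")          # one byte is two hex digits
print(int.from_bytes(raw, "big", signed=False))  # 65528
print(int.from_bytes(raw, "big", signed=True))   # -8
```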

Interestingly, a halfword is also used for BLKSIZE (this is a digression), but the largest block size for an IBM data set traditionally is 32760, not 32767, simply because the MVS operating system, like MVT before it, was written using 32760 as the maximum BLKSIZE.  Lately there are cases where values up to 65535 are allowed, using LBI (large block interface) and what-not, but mostly the limit is still 32760.  But watch the space; 65535 is on its way in; obviously the number need not allow for negative values, that is, it need not be signed.  End of digression on BLKSIZE.

There can be any number of concentric cylinders, but using the traditional CCHHR method you can only address a number that can be represented in a two-byte hex unsigned integer.  That would be 65,535, but in fact the highest cylinder address (now) on ordinary IBM disks is 65,520.  That is the CC-coordinate, the basis of the CC in CCHHR.

But wait, you say, you've got an entire 2 bytes — a halfword integer — to express the track number within the cylinder, yet there are always 15 tracks in a cylinder; one byte would be enough.  In fact, even  half a byte could be used to count to fifteen, which is hex F.  Right.  You got it.  What do we guess must eventually happen here? 

People want bigger disks so they can have bigger data sets, and more of them.  Big data.  You know how many customers the Bank of China has these days?  No, I don't either, but it's a lot, and that means they need big data sets.  And they aren't the only ones who want that.  I really don't want to think about guessing how much data the FBI must store.  What we do know is that there is a big – and growing – demand for gigantic data sets.

So inevitably the unused extra byte in HH must be poached and turned into an adjunct C.  Thus is born the addressing scheme for the extended area on EAV disks (EAV = Extended Address Volumes).  So, three bytes for C, one byte for H?  Well, no, IBM decided to leave only HALF of a byte, four bits, for H.  (As you noticed earlier, one hex digit, half of a byte, is enough to count to fifteen, which is hex F.)  So IBM took 12 bits away from HH for extending the cylinder number.  Big data.  Big.

And you yourself would not care overly about EAV, frankly, except that (a) you (probably) need to change your JCL to use it, and (b) there are restrictions on it, plus (c) those restrictions keep changing, and besides that (d) people are saying your company intends converting entirely to EAV disks eventually.

Okay, so what is this EAV thing, and what do you do about it?

EAV means Extended Address Volume, which means bigger disks than were previously possible, with more cylinders.  The first part of an EAV disk is laid out just like any ordinary disk, using the traditional CCHHR addressing.  So that can be used with no change to your programs or JCL.

In the extended area, cylinders 65,520 and above, the CCHH is no longer CCHH.

The first two bytes (sixteen bits) contain the lower part of the cylinder number, which can go as high as 65535.  The next twelve bits (one and a half bytes taken from what was previously part of HH) contain the rest of the cylinder number, so to read the whole thing as a number you would have to take those twelve bits and put them to the left of the first two bytes.  The remaining four bits (the remaining half of a byte out of what was once HH) contain the track number within the cylinder, which can go as high as fifteen.

Says IBM (in z/OS DFSMS Using Data Sets):

     A track address is a 32-bit number that identifies each track
     within a volume. The address is in the format hexadecimal CCCCcccH.

        CCCC is the low order 16-bits of the cylinder number.

        ccc is the high order 12-bits of the cylinder number.

        H is the four-bit track number.

End of quote from IBM manual.
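The bit shuffling in the CCCCcccH format is easier to follow in code.  Here is a minimal sketch that follows the layout quoted above; the function names are invented for illustration and are not any IBM interface.

```python
def decode_track_address(addr: int) -> tuple:
    """Split a 32-bit CCCCcccH track address into (cylinder, head)."""
    low16 = (addr >> 16) & 0xFFFF   # CCCC: low-order 16 bits of the cylinder
    high12 = (addr >> 4) & 0xFFF    # ccc:  high-order 12 bits of the cylinder
    head = addr & 0xF               # H:    four-bit track number
    return (high12 << 16) | low16, head

def encode_track_address(cylinder: int, head: int) -> int:
    """Build a 32-bit CCCCcccH track address from a 28-bit cylinder and a head."""
    if not 0 <= cylinder < 2 ** 28:
        raise ValueError("cylinder number must fit in 28 bits")
    if not 0 <= head <= 0xF:
        raise ValueError("head number must fit in 4 bits")
    return ((cylinder & 0xFFFF) << 16) | ((cylinder >> 16) << 4) | head

addr = encode_track_address(70000, 5)
print(format(addr, "08X"))                                # 11700015
print(decode_track_address(addr))                         # (70000, 5)
print(format(encode_track_address(100, 14), "08X"))       # 0064000E
```

Cylinder 70000 is hex 11170, so the low four digits 1170 land in CCCC and the leading 1 lands in ccc.  For cylinder 100 the ccc bits are zero and the result looks exactly like an old-style CCHH.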

The portion of the disk that requires the new format of CCHH is called extended addressing space (EAS), and also called cylinder-managed space.  Cylinder-managed space starts at cylinder 65520.

Of course, for any space with a cylinder address up to 65535, those extra 12 bits are always zero, so you can view the layout of the CCHH there the old way or the new way; it makes no difference.

Within the extended addressing area, the EAS, the cylinder-managed space, you cannot allocate individual tracks.  Space in that area is always assigned in Cylinders, or rather in chunks of 21 cylinders at a time.  The smallest data set in that area is 21 cylinders.  The 21-cylinder chunk is called the "multicylinder unit".

If you code a SPACE request that is not a multiple of 21 cylinders (for a data set that is to reside in the extended area), the system will automatically round the number up to the next multiple of 21 cylinders.
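The rounding rule is simple arithmetic.  A minimal sketch, with a made-up function name, assuming the request is already expressed in cylinders:

```python
MULTICYLINDER_UNIT = 21  # cylinders per allocation chunk in cylinder-managed space

def round_up_to_mcu(cylinders: int) -> int:
    """Round a cylinder request up to the next multiple of 21."""
    if cylinders <= 0:
        raise ValueError("request must be at least one cylinder")
    # Ceiling division done with integers only
    units = -(-cylinders // MULTICYLINDER_UNIT)
    return units * MULTICYLINDER_UNIT

print(round_up_to_mcu(1))     # 21
print(round_up_to_mcu(100))   # 105
print(round_up_to_mcu(2100))  # 2100
```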

As of this writing, most types of data sets are allowed within cylinder-managed space, including PDS and PDSE libraries, most VSAM, sequential data sets including large format (DSNTYPE=LARGE), BDAM, and zFS.  This also depends on the level of your z/OS system, with more data set types being supported in newer releases.

However the VTOC cannot be in the extended area, and neither can system page data sets, HFS files, or VSAM files that have imbed or keyrange specified.  Also VSAM files must have Control Area size (CA) or Minimum Allocation Units (MAU) such as to be compatible with the restriction that space is going to be allocated in chunks of 21 cylinders at a time.  Minor limitations.

Specify EATTR=OPT in your JCL when creating a new data set that can reside in the extended area.   EATTR stands for Extended ATTRibutes.  OPT means optional.  The only other valid value for EATTR is NO, and NO is the default if you don't specify EATTR at all.

The other EAV-related JCL you can specify on a DD statement is either EXTPREF or EXTREQ as part of the DSNTYPE.  When you specify  EXTPREF it means you prefer that the data set go into the extended area; EXTREQ means you require it to go there.


Allocate a new data set in the extended addressing area

//DD1 DD DISP=(,CATLG),SPACE=(CYL,(2100,2100)),


Addendum 1 Feb 2017: BLKSIZE in Cylinder-managed Space

This was mentioned in a previous post on BLKSIZE, but it is relevant to EAV and bears repeating here.  If you are going to take advantage of the extended address area, the EAS, on an EAV disk, you should use system-determined BLKSIZE, that is, either specify no BLKSIZE at all for the data set or specify BLKSIZE=0, signifying that you want the system to figure out the best value of BLKSIZE for the data set.

Why? Because in the cylinder-managed area of the disk the system needs an extra 32 bytes for each block, which it uses for control information.  Hence the optimal BLKSIZE for your data set will be slightly smaller when the data set resides in the extended area.  The 32-byte chunk of control information does not appear within your data.  You do not see it.  But it takes up space on disk, as a 32-byte suffix after each block.

You could end up using twice as much disk space if you choose a poor BLKSIZE, with about half the disk space being wasted.  That is true because a track must contain an integral number of blocks, for example one or two blocks.  If you think you can fit exactly two blocks on each track, but the system grabs 32 bytes for control information for each block, then there will be not quite enough room on the track for a second block.  Hence the rest of the track will be wasted, and this will be repeated for every track, approximately doubling the size of your data set.  
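Here is a deliberately simplified sketch of that effect.  The track size of 56,000 bytes is a made-up round number, and the model ignores the gaps and other per-block overhead that a real device has, so treat it as an illustration of the arithmetic only, not as a way to compute a real BLKSIZE.

```python
SUFFIX_BYTES = 32  # per-block control information described above

def blocks_per_track(track_bytes: int, blksize: int, suffix: int = 0) -> int:
    """Whole blocks that fit on one track in this toy model."""
    return track_bytes // (blksize + suffix)

TRACK = 56_000     # hypothetical usable bytes per track, NOT a real device figure
blksize = 28_000   # chosen to fit exactly two blocks per track with no suffix

print(blocks_per_track(TRACK, blksize))                               # 2
print(blocks_per_track(TRACK, blksize, SUFFIX_BYTES))                 # 1
print(blocks_per_track(TRACK, blksize - SUFFIX_BYTES, SUFFIX_BYTES))  # 2
```

With the suffix added, the second block misses by 64 bytes and nearly half of every track goes unused.  Shrinking the block by 32 bytes gets the second block back.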

On the other hand, if you just let the system decide what BLKSIZE to use, it generally calculates a number that allows two blocks per track. 

And when you use system-determined BLKSIZE (BLKSIZE=0 or unspecified), you get a bonus: if the system migrates your data set, and the data set happens to land on the lower part of a disk, outside the extended area, the system will automatically recalculate the best BLKSIZE when the data set is moved.  If the data set is later moved back into the cylinder-managed EAS area, the BLKSIZE will again be automatically recalculated and the data reblocked.

If in the future IBM releases some new sort of disk with a different track length, and your company acquires a lot of the new disks and adds them to the same disk storage pool you're using now, the same consideration applies: If system-determined BLKSIZE is in effect, the best BLKSIZE will be calculated automatically and the data will be reblocked automatically when the system moves the data set to the different device type.

Yes, it is possible for a data set to reside partly in track-managed space (the lower part of the disk) and partly in cylinder-managed space (the EAS, extended address, high part of the disk), per the IBM document.  

You should generally use system-determined BLKSIZE anyway.  But if you’re using EAV disks, it becomes more important to do so because of the invisible 32-byte suffix the system adds when your data set resides in the extended area.

[End of Addendum on BLKSIZE]

References, further reading


z/OS DFSMS Using Data Sets


Disk types and sizes

a SHARE presentation on EAV

EAV reference – IBM manual
z/OS 2.1.0 =>
z/OS DFSMS Using Data Sets =>
All Data Sets => Allocating Space on Direct Access Volumes => Extended Address Volumes

IBM manual on storage administration (for systems programmers)
z/OS DFSMSdfp Storage Administration

The 32-byte per block overhead in the extended area of EAV disk (IBM manual):

z/OS DFSMS Using Data Sets ==>
Non-VSAM Access to Data Sets and UNIX Files ==>
Specifying and Initializing Data Control Blocks ==>
Selecting Data Set Options ==>
Block Size (BLKSIZE)
“Extended-format data sets: In an extended-format data set, the system adds a 32-byte suffix to each block, which your program does not see. This suffix does not appear in your buffers. Do not include the length of this suffix in the BLKSIZE or BUFL values.”
