swap space??

Fri Aug 23 09:37:01 2002

Good discussion here. I'd like to throw in a couple more points.

On Monday 19 August 2002 03:28, Rick Thomas wrote:
> As others have said, how much space to allocate to swap is very
> dependent on your configuration and workload.  So it tends to be a
> religious issue.  Here are some facts based on 25 years of UNIX system
> administration experience.

Well, I don't agree that it's a religious issue; it's a technical issue, 
driven by OS behavior and your workload.

One needs to differentiate between swapping and paging (page-out) behavior. 
"Swapping", in the traditional sense, has meant copying all of a program's 
data pages to the swap/page device at once. (You don't need to copy out shared 
code pages, as they are read-only and/or may be in use by another process.)

Paging, on the other hand, is the copying out of selected pages (usually 
chosen by a least-recently-used algorithm). Conceptually the two are similar, 
but swapping is the more extreme behavior.
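
To make the distinction concrete, here is a minimal Python sketch. Process, 
swap_out and page_out are hypothetical names for illustration, not any 
kernel's API, and the model simplifies by treating every data page as needing 
a write (a real system only writes dirty pages). Swapping writes the whole 
data image, paging writes only the n least-recently-used pages, and code pages 
are never written in either case.

```python
from collections import OrderedDict


class Process:
    """Hypothetical process: read-only code pages plus LRU-ordered data pages."""

    def __init__(self, code_pages, data_pages):
        self.code_pages = set(code_pages)  # shared/read-only: never written out
        # Data pages ordered least recently used first.
        self.data = OrderedDict((p, None) for p in data_pages)

    def touch(self, page):
        """Record a reference by moving the page to the most-recently-used end."""
        if page in self.data:
            self.data.move_to_end(page)


def swap_out(proc):
    """Traditional swapping: write ALL of the process's data pages to disk."""
    written = list(proc.data)
    proc.data.clear()
    return written


def page_out(proc, n):
    """Paging: write out only the n least-recently-used data pages."""
    written = []
    for _ in range(min(n, len(proc.data))):
        page, _ = proc.data.popitem(last=False)  # oldest page first
        written.append(page)
    return written
```

For example, after touching d0 in a process with data pages d0..d3, 
page_out(p, 2) writes only d1 and d2, while swap_out writes all four data 
pages.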

Not being a Linux kernel guru, I can only speak from my experience with 
operating systems (UNIX and others). Most modern systems, when there's a 
memory shortfall, will attempt to "steal" least-recently-used pages from 
other processes, because that is less I/O than forcing out an entire process 
image. If this doesn't free enough memory, a process is selected for 
swap-out. A little paging is OK; it just means that memory is being optimally 
used. Continual memory shortfalls, which force processes to be swapped in and 
out as they need to run, are what cause the poor performance ("thrashing").
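
Here's a hypothetical illustration, at page rather than whole-process 
granularity, of why continual shortfalls hurt so much. It is a plain LRU 
page-replacement simulation (lru_faults and cyclic are made-up names) run on 
a loop that keeps touching the same working set of pages.

```python
from collections import OrderedDict


def lru_faults(references, frames):
    """Count page faults for a reference string under LRU replacement."""
    resident = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # hit: now the most recently used
            continue
        faults += 1  # miss: the page has to come in from disk
        if len(resident) >= frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults


def cyclic(working_set, rounds):
    """A loop that touches `working_set` distinct pages, `rounds` times over."""
    return [p for _ in range(rounds) for p in range(working_set)]
```

With 4 frames, lru_faults(cyclic(4, 100), 4) takes only the 4 cold misses, 
but lru_faults(cyclic(5, 100), 4) faults on every one of the 500 references: 
one page more than fits in memory, and every access goes to disk.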

The question then becomes how to allocate space on the swap devices for 
non-resident pages. As Rick has explained, you can:

1) Do it up front as the operating system expands the process's address space. 
In UNIX this happens predominantly when an sbrk or brk call is made (initial 
stack/data allocation, or when malloc finds no free blocks left and has to 
grow the heap). The OS then reserves a matching chunk of swap space, so if 
you need to swap out, you're guaranteed that the space is there. If you 
request more memory and the OS can't reserve it in the swap area, an 
out-of-memory indication is returned to the requesting program (for brk/sbrk, 
an ENOMEM error).

2) Allocate swap at page-out or swap-out time. I've heard this called "lazy 
swap allocation". When the brk/sbrk call is made, the OS simply gives the 
process a memory area, but no swap is allocated. What happens when a page 
needs to be paged in and the OS has to free up memory to make room? The first 
step is to select a victim process, then write its page out to disk. If no 
swap space has been allocated for that page yet, the OS tries to get some at 
that moment. If this allocation fails, some process will get an out-of-swap 
error. Some systems can't guarantee which process will get the error: the 
process whose page-in request triggered the shortfall, or the one that was 
selected for page-out. Not very nice if the process that's selected is an 
important service.
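
The two policies can be compared in a toy model. This is a minimal sketch 
that assumes swap is counted in whole-page slots; SwapArea, grow (standing in 
for brk/sbrk) and OutOfSwap are hypothetical names. With up-front allocation 
the failure shows up as a refused grow, in the process that asked for memory; 
with lazy allocation it shows up later, in page_out, for whichever process 
happened to be chosen as the victim.

```python
class OutOfSwap(Exception):
    """Raised when a page must be written out but no swap slot is free."""


class SwapArea:
    """Toy swap area measured in page-sized slots (hypothetical model)."""

    def __init__(self, slots, lazy):
        self.free = slots    # slots neither allocated nor reserved
        self.lazy = lazy     # True: lazy allocation; False: up-front
        self.reserved = {}   # pid -> slots reserved at grow() time, not yet used

    def grow(self, pid, pages):
        """Stand-in for brk/sbrk: add `pages` pages to process `pid`.

        Returns False (think ENOMEM) when an up-front reservation fails.
        """
        if self.lazy:
            return True      # lazy: grant the memory, allocate no swap
        if pages > self.free:
            return False     # up-front: the requester sees the failure now
        self.free -= pages
        self.reserved[pid] = self.reserved.get(pid, 0) + pages
        return True

    def page_out(self, pid):
        """Write one page of `pid` to swap, allocating a slot if needed."""
        if self.reserved.get(pid, 0) > 0:
            self.reserved[pid] -= 1  # slot was guaranteed at grow() time
            return
        if self.free == 0:
            # Out of swap at page-out time: the victim, not the requester,
            # is the one that sees the error.
            raise OutOfSwap(f"no swap slot for a page of pid {pid}")
        self.free -= 1               # lazy: allocate the slot only now
```

For example, SwapArea(4, lazy=False) accepts grow(1, 3) but refuses 
grow(2, 3) immediately. SwapArea(4, lazy=True) accepts both, and the fifth 
page_out raises OutOfSwap.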

So the safe mechanism is up-front swap allocation. But it's wasteful of disk 
space: every allocation is backed by swap, even memory that is never paged 
out.

For most general-use workstations, lazy allocation works well: you only need 
to provide enough swap to cover your estimated shortfall between memory demand 
and physical RAM. But for servers, I'd want up-front allocation to get more 
deterministic behavior.
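
A back-of-the-envelope sizing sketch, assuming the simple model described 
above: under up-front allocation every process's data allocation must fit in 
swap, while under lazy allocation only the part of total demand that exceeds 
RAM does. swap_needed is a hypothetical helper, and it ignores kernel memory 
and code pages.

```python
def swap_needed(allocations_mb, ram_mb, lazy):
    """Rough swap size (MB) under the two policies (hypothetical model).

    allocations_mb: peak data/stack allocation of each concurrently running
    process, in MB.
    """
    demand = sum(allocations_mb)
    if lazy:
        return max(0, demand - ram_mb)  # only the shortfall must fit on disk
    return demand  # up-front: every allocation is reserved in swap
```

With three processes using 200, 300 and 150 MB on a 512 MB machine, lazy 
allocation needs about 138 MB of swap, while up-front allocation needs 650 MB.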

Some OSes let you select which behavior you want.

> 2) Today, all those factors have changed.  So modern UNIX systems don't
> allocate swap space until they need it -- on the expectation that they
> will never need it.  Cheap RAM means you don't want to swap at all if
> you can avoid it.  Cheap, fast, disk means that even if you do swap
> occasionally, it won't be a big deal.
>
> Frequent swapping is still going to be a problem and should be addressed
> by system tuning (such as buying more RAM, or spreading the swap space
> out onto multiple disks) or tuning the workload.  (There is usually much
> more to be gained by tuning the workload than by throwing resources at
> an algorithm that scales poorly!)

Reminds me of a saying: nothing helps virtual like real. If you're concerned 
about performance, you do NOT page. Period. Spreading the swap space onto 
multiple disks only masks the problem: your memory problem has just become an 
I/O problem, and disk I/O is orders of magnitude slower than memory access.
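
The standard effective-access-time formula shows why. If a fraction p of 
memory references fault to disk, the average access costs 
(1 - p) * memory_time + p * fault_time. The numbers used below (100 ns per 
memory access, 10 ms to service a fault, one fault per 1,000 references) are 
illustrative assumptions, not measurements.

```python
def effective_access_ns(mem_ns, fault_ns, fault_rate):
    """Average access time when a fraction of references page-fault to disk."""
    return (1 - fault_rate) * mem_ns + fault_rate * fault_ns


# Assumed figures: 100 ns RAM access, 10 ms fault service, 1 fault in 1,000.
one_disk = effective_access_ns(100, 10_000_000, 0.001)
# Optimistic assumption: two swap disks halve the fault service time.
two_disks = effective_access_ns(100, 5_000_000, 0.001)
```

That works out to roughly 10,100 ns per access, about 100 times slower than 
RAM. Even if spreading swap over two disks halved the fault time, which is 
optimistic, the result is still roughly 5,100 ns, around 50 times slower.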

- Paul

---------
Paul E. Rockwell
paulrockwell@mac.com