**NOTE**I stopped updating these pages in 2011. Please visit my new home for more recent information.- Next document: Penguin - a secure system for distributed scripting
- Up one level: Documents;
- Go all the way back to my home page.

Humberto Ortiz Zuazaga

- Introduction
- Methods
- Results
- Discussion
- Complete source code
- Data structures
- Heap nodes
- Heap
`CreateHeap()``DestroyHeap ()`

- Heap manipulation
`Enqueue()``sift95down()``Dequeue()`

- Testing
`ShowHeap()``VerifyHeap()`- Test driver

- Makefile

- Data structures

The implicit binary heap is a simple and natural representation on which to base a priority queue. Extremely efficient implementations of the abstract operations`Enqueue()`

and`Dequeue()`

should be possible, and most textbooks have several tips for implementing queues in this way. I present an efficient implementation of priority queues based on this data structure, and compare my performance with previous work. My queues are significantly faster than previously reported data. Since my code is freely distributable, I hope it can be used as a benchmark in future priority queue studies.

In the handout for the Term Project for CS 5633, Dr. Das discusses
implicit binary heaps and rejects them as an alternative thusly:
``... but we still cannot take the CLR solution, because [of] the
requirement that the heap must be maintained as a complete binary tree
so that it can be mapped to an array. This precludes us from using
unknown heap sizes that is unknown at compile time.''
I had already developed a dynamically allocated priority queue for a
research project at Cincinnati **[cite SAT]**, and decided to develop an
implementation that overcame Dr. Das's objection.

Cormen *et al.* show in their textbook **[cite CLR]**, show how to
insert into and delete elements from dynamically allocated tables
(*i.e.* arrays) where no more than a constant fraction of the
space is unused, with an amortized cost of *O(1)*. These dynamic
tables form the basis of my implicit heap.

A more serious objection to implicit heap implementations of priority
queues is that the worst-case performance is *O(n)* for both the
`Enqueue()`

and `Dequeue()`

operations. It can be claimed that
the expected time for an `Enqueue()`

is *O(1)* when the distribution
of priorities in the queues does not have a very low bias
(*c.f.* **[cite RA]**). Nevertheless, the `Dequeue()`

is
inescapably *O(n)*.

It is my belief that simple data structures and algorithms may be implemented efficiently with less effort than that needed for optimizing implementations of queues with more desirable big-Oh access times. In the case of implicit binary heaps, very efficient machine level code can be used to implement the equivalent of ``tree rebalancing'' operations which have no direct machine-level code.

The code was developed and tested on `aleph.uthscsa.edu`, a
PCI-bus i486DX/2-66 computer with 16Mb of RAM. This machine runs
linux 1.2.13. All the programs were compiled with gcc v2.7.0, using
the flags
`-O6 -fomit-frame-pointer -DNDEBUG`

.
Some tests were also performed on `wayside.cs.utsa.edu`, a 4
processor Sparc running Solaris 2.4. Programs on the sparc were
compiled with gcc v2.6.4.

I measured the time it took several implementations of queues to
perform a fixed number (usually 1,000,000) of *holds*, where I
dequeue an element, modify it's priority according to a certain
function, and then enqueue the same element again.

The test driver is included in the source code in the appendix. It
uses a dummy timing loop to find the overhead of random number
generation and bookkeeping, and subtracts that time from the time to
perform the holds. The code is portable to any UNIX system that
supports the `times()`

interface (*e.g.* SVID, AT&T, POSIX,
X/OPEN, BSD 4.3).

My first implementation was derived from the pseudocode routines
in **[cite CLR]**. Below is the actual code fragment for the
`Heapify()`

routine. The heap implementation described here
is based on a structure that stores the size of the
allocated space, the index of the last node in the heap, and a pointer
to a dynamically allocated array of HeapNodes.

<Heap structure>= typedef struct { size_t size; size_t last; struct heapnode *heap_ptr; } Heap;

Heap nodes could be arbitrarily complex, but for our simulations we only used a structure with two fields: the priority, and a bogus data field.

<Heap node structure>= typedef struct heapnode { double priority; long int data; } HeapNode;

Given a heap `heap`

and an offset `i`

into the heap, and a
condition that the subtrees rooted at `Left(i)`

and `Right(i)`

are
heaps, the `Heapify()`

routine lets the node `heap[i]`

trickle down to
the correct position in the heap by exchanging it with the larger of
it's two children if necessary, and then calling itself on the subtree
it just altered.

<Heapify sample code>= void Heapify (Heap *heap, int i) { int l, r, largest; if (0 >= heap->last) /* empty heap! */ return; l = Left (i); r = Right (i); if ((l <= heap->last) && (heap->heap_ptr[l].priority > heap->heap_ptr[i].priority)) largest = l; else largest = i; if ((r <= heap->last) && (heap->heap_ptr[r].priority > heap->heap_ptr[largest].priority)) largest = r; if (largest != i) /* node is smaller than it's child */ { swap (heap->heap_ptr + i, heap->heap_ptr + largest); Heapify (heap, largest); } }

I ran some preliminary tests on these queues, and found that 26.13%of the time is spent in `Heapify()`

. This suggested the first
improvement to the algorithms. This implementation of `Heapify()`

,
although correct, is recursive. One of the exercises in the text is
to rewrite this function iteratively. The textbook by Brassard and
Bratley **[cite B+B]** has such an algorithm, and I implemented it also.

<sift down>= void sift_down (Heap *heap, int i) { int k, j, n; k = i; n = heap->last; do { j = k; if ((Left(j) <= n) && (heap->heap_ptr[Left(j)].priority > heap->heap_ptr[k].priority)) { k = Left(j); } if ((Right(j) <= n) && (heap->heap_ptr[Right(j)].priority > heap->heap_ptr[k].priority)) { k = Right(j); } swap (heap->heap_ptr + j, heap->heap_ptr + k); } while (j != k); }

Both the `Heapify()`

and `sift_down()`

routines repeatedly call
`swap()`

to exchange the positions of two heap nodes. This copying
can consume large amounts of time. By adding an extra level of
indirection to the heap, the amount of copying can be reduced to a
single pointer versus an entire record. In applications where each
heap node is large, this can be a significant savings. On the other
hand, the extra level of indirection can slow down the heap
operations.

In this implementation, the declaration of a heap becomes:

<Heap of pointers structure>= typedef struct { size_t size; size_t last; struct heapnode **heap_ptr; } Heap;

The actual changes to the code are minor, as an example, here's a
line from `sift_down()`

:

<Sample heap of pointers code>= if ((Right(j) <= n) && (heap->heap_ptr[Right(j)]->priority > heap->heap_ptr[k]->priority)) {

(*i.e.* replace the structure references ```.`

'' with pointer
references ```->`

''.)

My results are summarized in Figure [->]. Even for queues
with 100,000 elements, I can perform a hold in approximately 55
microseconds on my home computer. The same operation can be done in
approximately 17 microseconds on `wayside`. Converting the
recursive calls to `Heapify()`

to calls to the iterative
`sift_down()`

function resulted in a 26%speedup on queues with
100,000 elements (curve not shown).

Encapsulated PostScript for Figure 1

Average hold times over 1,000,000 holds for queues of various sizes. Each point represents the average of 10 trials run for 1,000,000 iterations each.

Combining

`sift_down()`

with converting to a heap of pointers
results a 40%improvement over the naive CLR heaps for hold times on
queues with 100,000 elements. I call this implementation the Modified
Heap. With very small queue sizes, the overhead of managing the
memory for the nodes and dereferencing the pointers to them makes this
implementation slightly slower.
I could not perform experiments on queues with more than 200,000 elements or so, because the queues consume all free memory on my home machine. The hold times increase precipitously when swapping occurs while traversing the heap.

I also have some limited results on timing the individual
`Enqueue()`

and `Dequeue()`

operations. Profiling studies using
gprof indicate that `Enqueue()`

takes approximately 10 microseconds
(0.01 msec) in both the CLR and Modified implementations. The
`Dequeue()`

operation takes roughly 40 microseconds in the CLR heap,
and 20 microseconds in the Modified heap. Unfortunately, the
resolution of the timer used by gprof on `aleph` seems to be a
hundredth of a millisecond. It does, however confirm the times measured
across the loops.

I have demonstrated that efficient, dynamically resizable implicit
binary heaps can be used as a foundation for priority queues. The
average time per hold on stock microcomputer hardware is much better
than that reported in **[cite RA]**. My computer is in fact faster than
the Sequent Symmetry S81 used by Ronngren and Ayani to perform their
experiments, but they claim nearly 400 microseconds are required per
hold on a queue with only 1,000 elements. My implementation can
perform a hold in 20 microseconds on a queue of that size, a 20-fold
difference.

The current implementation of priority queues could be improved upon. Memory management for the individual heap nodes is unnecessarily complex. The code for dynamic tables could be borrowed and used to allocate contiguous arrays of heap nodes, freeing the user from the bookkeeping details.

In addition, since the heap now consists entirely of word sized
pointers, the routines to swap two elements could be coded as machine
level `xor`

instructions instead of generalized assignments. This
could lead to a further increase in speed.

The complete source code for the queues is available via the World Wide Web from:

`http://www.neurobio.upr.clu.edu/~hortiz/software/EPQ/`

It is my hope that this implementation can serve as a benchmark for other queue implementations, and that it can be eventually extended to support more abstract queue operations.

**[2]** G. Brassard and P. Bratley. *Algorithms: Theory
&Practice.* Prentice Hall, Englewood Cliffs, New Jersey, 1988.

**[3]** T. H. Cormen, C. E. Leiserson, and R. L. Rivest.
*Introduction to Algorithms.* MIT Press, Cambridge,
Massachusetts, 1990.

**[4]** R. Rönngren and R. Ayani. A Comparative Study of
Parallel and Sequential List Algorithms. In *Performance
Issues in Discrete Event Simulation*.

<Copyright notice>=(U-> U->)Copyright (C) 1995 by Humberto Ortiz Zuazaga. Copyright (C) 1993, 1994 by Fred Annexstein, John Franco, Humberto Ortiz-Zuazaga, Manian Swaminathan This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Since this is a derived work, it also falls under the GPL.

This file was prepared with `noweb`

, a literate programming system
available from:

`http://www.eecs.harvard.edu/~nr/noweb/`

The .tex, .c, and .h files are all extracted from a single source document. Here are the top down specifications for the .c and .h files.

<*>= /*<Copyright notice>*/ #include <assert.h> #include <stdio.h> #include <stdlib.h> #include "heap.h"<local definitions><local function prototypes><functions>

The header chunk exports externally visible data structures and functions. Any objects with global linkage should be collected and published here.

<header>= /*<Copyright notice>*/ #ifndef HEAP_H #define HEAP_H typedef short boolean; #define FALSE 0 #define TRUE !FALSE<Data structures><external function prototypes>#endif /* HEAP_H */

<Data structures>=(<-U)<Heap node><Heap>

<Heap node>=(<-U)typedef struct heapnode { double priority; long int data; } HeapNode;

Defines`HeapNode`

(links are to index).

<Heap>=(<-U)typedef struct heap { size_t size; size_t last; struct heapnode **heap_ptr; } Heap;

Defines`Heap`

(links are to index).

`CreateHeap()`

`size`

, in fact pointers to `size + 1`

nodes are allocated but `heap_ptr[0]`

is unused. The heap is empty
after creation, the user must fill in the weights and heapify the
array by hand, or use `Enqueue()`

.

<external function prototypes>=(<-U)[D->]Heap *CreateHeap (size_t size);

<functions>=(<-U)[D->]Heap *CreateHeap (size_t size) { Heap *heap; assert (0 < size); heap = (Heap *) malloc (sizeof (Heap)); if (NULL != heap) { heap->heap_ptr = (HeapNode **) calloc (size + 1, sizeof (HeapNode *)); if (NULL == heap->heap_ptr) { free (heap); heap = NULL; } else { heap->size = size; heap->last = 0; } } return heap; }

Defines`CreateHeap`

(links are to index).

`DestroyHeap ()`

<external function prototypes>+=(<-U)[<-D->]void DestroyHeap (Heap * heap);

<functions>+=(<-U)[<-D->]void DestroyHeap (Heap *heap) { if (NULL != heap) { if (NULL != heap->heap_ptr) { while (0 < heap->last) { if (NULL != heap->heap_ptr[heap->last]) { free(heap->heap_ptr[heap->last]); } heap->last--; } free (heap->heap_ptr); } free (heap); heap = NULL; } }

Defines`DestroyHeap`

(links are to index).

<local definitions>=(<-U)#define Left(i) ((i) << 1) #define Right(i) (((i) << 1) + 1) #define Parent(i) ((i) >> 1)

Defines`Left`

,`Parent`

,`Right`

(links are to index).

`Enqueue()`

<external function prototypes>+=(<-U)[<-D->]void Enqueue (Heap *heap, HeapNode *newnode);

<functions>+=(<-U)[<-D->]void Enqueue (Heap *heap, HeapNode *newnode) { int i; assert (NULL != heap); assert (NULL != newnode); heap->last++; /* Is the heap full? */ if (heap->last > heap->size) { /* resize heap */ heap->heap_ptr = realloc (heap->heap_ptr, sizeof (HeapNode *) * (heap->size * 2 + 1)); if (NULL == heap->heap_ptr) { fprintf (stderr, "heap: error resizing heap, cannot insert.\n"); exit (1); } heap->size = heap->size * 2; } i = heap->last; while ((i > 1) && (heap->heap_ptr[Parent(i)]->priority < newnode->priority)) { heap->heap_ptr[i] = heap->heap_ptr[Parent(i)]; i = Parent(i); } heap->heap_ptr[i] = newnode; assert (VerifyHeap(heap, 1)); }

Defines`Enqueue`

(links are to index).

`sift_down()`

<local function prototypes>=(<-U)void sift_down (Heap *heap, int i);

<functions>+=(<-U)[<-D->]void sift_down (Heap *heap, int i) { int k, j, n; assert (NULL != heap); assert (1 <= i); k = i; n = heap->last; do { j = k; if ((Left(j) <= n) && (heap->heap_ptr[Left(j)]->priority > heap->heap_ptr[k]->priority)) { k = Left(j); } if ((Right(j) <= n) && (heap->heap_ptr[Right(j)]->priority > heap->heap_ptr[k]->priority)) { k = Right(j); } heap->heap_ptr[0] = heap->heap_ptr[j]; heap->heap_ptr[j] = heap->heap_ptr[k]; heap->heap_ptr[k] = heap->heap_ptr[0]; } while (j != k); assert (VerifyHeap (heap, i)); }

Defines`sift_down`

(links are to index).

`Dequeue()`

<external function prototypes>+=(<-U)[<-D->]HeapNode * Dequeue (Heap *heap);

<functions>+=(<-U)[<-D->]HeapNode * Dequeue (Heap *A) { HeapNode *max; assert (NULL != A); if (0 >= A->last) { fprintf (stderr, "Extracting from empty heap!\n"); return NULL; /* empty heap! */ } max = A->heap_ptr[1]; A->heap_ptr[1] = A->heap_ptr[A->last]; A->last--; sift_down (A, 1); /* B+B */ /* Check if the heap is less than a quarter full. */ if ((A->last * 4) < A->size) { /* resize heap to a half */ A->heap_ptr = realloc (A->heap_ptr, (1 + A->size / 2) * sizeof(HeapNode *)); if (A->heap_ptr == NULL) { fprintf (stderr, "error resizing heap.\n"); return NULL; } } return max; }

Defines`Dequeue`

(links are to index).

`ShowHeap()`

<external function prototypes>+=(<-U)[<-D->]void ShowHeap (Heap *heap);

<functions>+=(<-U)[<-D->]void ShowHeap (Heap *heap) { int i; assert (NULL != heap); puts("Heap status."); for (i=1; i<= heap->last; i++) { printf ("Heap [%d] = pri %f, val = %ld\n", i, heap->heap_ptr[i]->priority, heap->heap_ptr[i]->data); } }

Defines`ShowHeap`

(links are to index).

`VerifyHeap()`

<external function prototypes>+=(<-U)[<-D]int VerifyHeap (Heap *heap, int i);

<functions>+=(<-U)[<-D]int VerifyHeap (Heap *heap, int i) { boolean l_stat, r_stat; assert (NULL != heap); assert (0 < i); if (0 >= heap->last) return TRUE; /* empty heap trivially correct */ if (heap->last >= Left(i)) l_stat = (heap->heap_ptr[i]->priority >= heap->heap_ptr[Left(i)]->priority) && VerifyHeap (heap, Left(i)); else l_stat = TRUE; /* no left children */ if (heap->last >= Right(i)) r_stat = (heap->heap_ptr[i]->priority >= heap->heap_ptr[Right(i)]->priority) && VerifyHeap (heap, Right(i)); else r_stat = TRUE; /* no right children */ return (l_stat && r_stat); }

Defines`VerifyHeap`

(links are to index).

<test>= #include <assert.h> #include <math.h> #include <stdio.h> #include <stdlib.h> #include <sys/times.h> #include "heap.h" #include "r250.h" int main (int argc, char *argv[]) { int n, m, i; Heap *heap_ptr; HeapNode *newnode; struct tms start_tms, end_tms; clock_t start0, start1, end0, end1; double inc; if (3 != argc) { fprintf (stderr, "Usage: %s n m.\n", argv[0]); fprintf (stderr, "Perform hold test m times on queue of size n.\n"); exit (1); } n = atoi (argv[1]); /* size of queue */ if (0 >= n) { fprintf (stderr, "queue size is negative!"); exit(1); } if (1000000 <= n) { fprintf (stderr, "%d is too large for the queue.", n); exit(1); } m = atoi (argv[2]); /* number of iterations */ if (0 >= m) { fprintf (stderr, "too few heap operations"); exit(1); } if (10000000 <= m) { fprintf (stderr, "too many heap operations"); exit(1); } /* Bogus loop, for startup timings */ newnode = (HeapNode*) malloc (sizeof (HeapNode)); start0 = times (&start_tms); heap_ptr = CreateHeap (1); for (i = 1; i <= n; i++) { newnode->priority = dr250 (); newnode->data = i; } end0 = times (&end_tms); DestroyHeap (heap_ptr); /* Build the heap, should do some dequeues too. */ start1 = times (&start_tms); heap_ptr = CreateHeap (1); for (i = 1; i <= n; i++) { newnode = (HeapNode*) malloc (sizeof(HeapNode)); if (NULL == newnode) { fprintf (stderr, "no more memory for heap nodes"); exit(1); } newnode->priority = dr250 (); newnode->data = i; Enqueue (heap_ptr, newnode); } assert (VerifyHeap (heap_ptr, 1)); end1 = times (&end_tms); #ifndef NDEBUG printf ("Start time = %ld, end time = %ld.\n", start1, end1); printf ("Bogus start = %ld, Bogus end = %ld.\n", start0, end0); #endif printf ("Build heap elapsed time = %ld clock ticks at %d ticks/sec.\n", (end1 - start1) - (end0 - start0), CLOCKS_PER_SEC); #ifndef NDEBUG ShowHeap (heap_ptr); #endif /* bogus timing loop */ start0 = times (&start_tms); for (i = 1; i <= m; i++) { /* Dequeue */ do { inc = dr250 (); } while (inc < 0.001); log (inc); /* Enqueue */ } end0 = times (&end_tms); start1 = times (&start_tms); for (i = 1; i <= m; i++) { newnode = Dequeue (heap_ptr); assert (NULL != newnode); assert (VerifyHeap (heap_ptr, 1)); #ifndef NDEBUG printf ("node.priority = %f, node.data = %ld.\n", newnode->priority, newnode->data); #endif do { inc = dr250 (); } while (inc < 0.001); newnode->priority -= log (inc); Enqueue (heap_ptr, newnode); assert (VerifyHeap (heap_ptr, 1)); } end1 = times (&end_tms); #ifndef NDEBUG ShowHeap (heap_ptr); #endif printf ("HOLD elapsed time = %ld clock ticks at %d ticks/sec.\n", (end1 - start1) - (end0 - start0), CLOCKS_PER_SEC); for (i = 1; i <= n; i++) { newnode = Dequeue (heap_ptr); free (newnode); } DestroyHeap(heap_ptr); return 0; }

Here's a sample makefile for testing the queue. It assumes you have noweb installed in your path.

<Makefile>= # Makefile for noweb. # Humberto Ortiz Zuazaga 1995/3/1. NOWEAVE = noweave NOTANGLE = notangle CC = gcc CFLAGS = -O6 -fomit-frame-pointer -DNDEBUG .SUFFIXES: .c .nw .tex .dvi .html .ps .nw.html: ; $(NOWEAVE) -filter l2h -index -html $*.nw > $*.html .nw.tex: ; $(NOWEAVE) -delay $*.nw > $*.tex .nw.c: ; $(NOTANGLE) -L $*.nw > $*.c .nw.h: ; $(NOTANGLE) -L $*.nw -Rheader > $*.h .tex.dvi: latex '\scrollmode \input '"$*"; while grep -s 'Rerun to get cross-references right' $*.log; do latex '\scrollmode \input '"$*"; done .dvi.ps: ; dvips $*.dvi -o $*.ps all: driver report.ps: report.dvi report.dvi: report.tex report.tex: report.nw driver: driver.o heap.o r250.o $(CC) $(CFLAGS) -o $@ driver.o heap.o r250.o -lm heap.o: heap.c heap.h driver.c: report.nw $(NOTANGLE) -L $< -Rtest > $@ heap.c: report.nw $(NOTANGLE) -L $< -R\* > $@ heap.h: report.nw $(NOTANGLE) -L $< -Rheader > $@ clean: rm -f *.o *~ core

*<*>*: D1*<Copyright notice>*: D1, U2, U3*<Data structures>*: U1, D2*<external function prototypes>*: U1, D2, D3, D4, D5, D6, D7*<functions>*: U1, D2, D3, D4, D5, D6, D7, D8*<header>*: D1*<Heap>*: U1, D2*<Heap node>*: U1, D2*<Heap node structure>*: D1*<Heap of pointers structure>*: D1*<Heap structure>*: D1*<Heapify sample code>*: D1*<local definitions>*: U1, D2*<local function prototypes>*: U1, D2*<Makefile>*: D1*<Sample heap of pointers code>*: D1*<sift down>*: D1*<test>*: D1

- CreateHeap: U1, D2, U3
- Dequeue: U1, D2, U3
- DestroyHeap: U1, D2, U3
- Enqueue: U1, D2, U3
- Heap: U1, U2, U3, U4, D5, U6, U7, U8, U9, U10, U11, U12, U13, U14, U15, U16, U17, U18, U19, U20
- HeapNode: U1, D2, U3, U4, U5, U6, U7, U8
- Left: U1, U2, D3, U4, U5
- Parent: D1, U2
- Right: U1, U2, U3, D4, U5, U6
- ShowHeap: U1, D2, U3
- sift_down: U1, U2, D3, U4
- VerifyHeap: U1, U2, U3, D4, U5

Humberto Ortiz Zuazaga

Most recent change: 2007/9/3 at 22:04

Generated with GTML