Mandelbrot Explorer v1 for the Xilinx Spartan-3E

By Mark Bowers, Andrew Ullmann and Bryant Jones
December 18, 2009

Mandelbrot Set

For our final projects in ECE378 (Digital Logic and Microprocessor Design) and ECE495 (Advanced Embedded Systems) at Oakland University, our group designed an FPGA-based Mandelbrot generator. The three-person team consisted of myself, Bryant Jones, and Andrew Ullmann.

To begin, it's necessary to introduce what a Mandelbrot is. A Mandelbrot is a fractal (named for its discoverer, Benoit Mandelbrot) produced by performing the calculation Z n+1= Zn2 + c recursively upon coordinates in the complex plane. If, after a number of iterations, the absolute value of Zn2+ c is found to be greater than 2, it escapes the Mandelbrot set. If it never reaches this value, then it belongs to the Mandelbrot set. The boundary between the points that belong to and escape from the Mandelbrot set forms an infinitely detailed and self-similar structure known as a fractal. One can zoom infinitely along this border, limited only by (1) number of iterations taken for each point and (2) the precision of the numbers used in the calculation.

I first became introduced to the world of Mandelbrots by my grandpa, who was a math professor (and later, vice-president) at OCC. He gave presentations on Mandelbrots to illustrate the beauty of mathematics, showing how this very simple equation gives rise to such intricate and awe-inspiring detail. He would prepare slides ahead of time for presentations because they'd take days to render on the computers available at the time! The same calculations fly by in seconds on today's computers (or faster, as you'll see on our custom hardware!)

To start, we researched how one would go about plotting the Mandelbrot set by hand, and after familiarizing ourselves with that, wrote the following pseudocode:

```xC = x;  -- current x coordinate
yC = y;  -- current y coordinate
xT = x;  -- x βtempβ
yT = y;  -- y βtempβ

while(done == 0)
{
if (xT^2 + yT^2 > 4) or (iterations == 255)
{
done = 1;
}
else
{
xNew = (xT^2 - yT^2) + xC;
yNew = (2*xT*yT) + yC;
xT = xNew;
yT = yNew;
iterations++;
}
}
```

From this, we built a small program in Visual C++ to verify our algorithm. The above process was performed on each pixel on a 640-by-480 plane, with the maximum iterations per pixel set to 255. The number of iterations taken determines the color of that particular pixel. Our program worked as expected, producing the familiar shape:

Mandelbrot Set (C++ test application)

Mandelbrot Set Zoomed (C++ test application)

After we knew the algorithm worked properly in software, our focus then shifted to doing the same thing in hardware. For this, we used an FPGA, or field-programmable gate array.

An FPGA is made up of LEs (logic elements) that contain, among other things, LUTs (look-up tables) that have an input (usually 3 lines) and a programmable output table for every possible input (2^3 = 16 possible outputs). By filling this table with the appropriate values, you can encode for any possible logic combination. The LEs (which also contain programmable registers, carry in/out lines, etc) are then connected together with a configurable interconnecting bus. The result is that by configuring all the various components within the chip, you can build your own digital hardware! One of the FPGAs we used for this project (the Spartan IIIe on the Nexys2 board) can implement designs with an equivalent of 500,000 logic gates, which is pretty awesome.

The FPGA is programmed with a hardware description language. (In our case, we used VHDL.) Hardware description initially looks a lot like conventional programming, but it's really very different. Instead of writing instructions for a processor to execute sequentially, you're describing hardware to be implemented at a digital level. You have to carefully think of things in both a structural sense (how all the logic of your circuit behaves combinationally) and in a temporal sense (how the various components are timed and behave sequentially). Once you get the hang of it, it's really a lot of fun!

We then looked for a way to express the very small numbers necessary for Mandelbrot calculations in VHDL. We found a library consisting of IEEE's proposed additions to VHDL with capabilities manipulating fixed and floating point numbers, and used this to build a component we referred to as a "Mandelbrot core", which performs a single iteration of the core calculation per clock cycle on a given point. We played with the data width and settled on 32-bit fixed point numbers as the most that our chip's area could handle. Two 32-bit fixed cores along with all of the coordinate-conversion and plotting components consumed around 88% of the available logic on the chip. For comparison, to use 64-bit fixed would have consumed 117% of the logic for one core, and 32-bit floats would have consumed 160% of the logic for one core! Thus, area ended up being a pretty significant limitation.

Mandelbrot Processor Diagram

A "Mandelbrot processor" component was then built, which consisted of two of our 32-bit fixed "Mandelbrot cores" and logic to convert memory coordinates (320 by 480) into Mandelbrot coordinates (32-bit fixed numbers ranging from -2 to 2 in either direction).

The external cellular RAM has memory configured in 16-bit words, and our display component alternatively reads (through a tristate buffer) the left and right bytes of these words to display even and odd pixels. For example: memory address (160, 0) contains a 16-bit word, which has the data for 8-bit color pixels at screen coordinates (320,0) and (321,0). Our Mandelbrot processor performs the calculations for two pixels concurrently, and when both are done, writes that word to memory. Our project therefore makes use of parallel processing to cut our render time roughly in half!

Then, using an example from our textbook (Learning by Example using VHDL by Richard E. Haskell and Darrin M. Hanna, my professors for EGR240 and ECE378, respectively), as a starting point, we built a VHDL process to go through the coordinates and plot the Mandelbrot. The state diagram is shown below.

Plotting State Machine

Another VHDL process was built to allow the user to zoom with the use of a mouse input. Registers outside of the Mandelbrot processor component store the current center location as well as the current zoom radius. These are fed into the Mandelbrot processor's coordinate converter to determine where to calculate within the coordinate plane. The input for coordinate converter is multiplexed between the plotting process (inputting memory coordinates for the purpose of plotting) and the zooming process (inputting mouse coordinates for the purpose of zooming). When a click is detected, the coordinate mux is switched to the zoom process, the current mouse coordinates are fed into the processor, which then outputs Mandelbrot coordinates for that point, storing them in the center point register. If this was a left click, then the radius is halved (zooming in by 2x), and if this was a right click, then the radius is doubled (zooming out by 2x). After the center point and radius have been modified, then we tell the plot process to start drawing again based on these new parameters.

Zooming State Machine

If you want to try out our hardware, please check out the bit file below! It is suitable for programming to a Nexys2-500 board. A PS/2 mouse is used to zoom in and out with a left and right click, respectively, and the first switch turns on palette cycling. Video output uses the VGA port at 640x480 resolution, at 60Hz.

Running on a Nexys 2 FPGA board

Mandelbrot home screen, as rendered on an FPGA

Sample Zoom Sequence

Below is a video demonstrating our FPGA-based Mandelbrot generator in action.
Be sure to check out the trippy color cycling feature at 1:22!

Next, for ECE495, we implemented the same thing using a soft microprocessor on the FPGA. In this case, we used the Nios II running on the Cyclone II FPGA on an Altera DE2 board.

Rather than having to design our own processor and graphics hardware, we're able to let the design software take care of all of that. Altera's SOPC Builder program allows us to select the features we need on our processor, as well as peripheral hardware that the processor needs to interface with the outside world. All the hardware is generated by the design program according to our requirements and downloaded to the FPGA. Then, it's simply a matter of writing code in the Nios IDE and downloading it to the board as if it were a microcontroller!

Unfortunately, the educational-licensed SOPC components were somewhat limited, so graphics could only be displayed with a very limited pallete. We were free to design our own SOPC component to handle drawing (which would have been relatively simple to do), but due to time constraints, we settled for the Altera provided one. Note that it still goes through all of the iterations (255) and outputs to the same resolution (640x480).

Home screen rendered on the Nios II version

Zoomed on the Nios II version

Initially when we configured and ran our Mandelbrot generator for the Nios II, we were shocked to find that it took almost an hour to plot the home screen! We later found a hardware floating-point accelerator for the Nios II from Altera that we could import into the SOPC Builder to remedy this speed problem. After adding this and recompiling, we got a much nicer render time of 23.8 seconds!

Here's how all the various Mandelbrot versions stack up against each other, performance-wise. "Render time" here indicates the time to draw the "home screen" which is exactly the same across all versions, as is the max iterations count (255) and resolution (640x480). The Visual C++ test program was run on a 2GHz single-core Pentium M.

Clock Speed Render Time Cycles Used
VHDL-based generator 25MHz 0.2 seconds 5,000,000
Nios II (hardware floats) 100MHz 23.8 seconds 2,380,000,000
Visual C++ test program 2GHz 3.3 seconds 6,600,000,000
Nios II (software floats) 100MHz 54 minutes, 30 seconds 327,000,000,000

The purely hardware version is the fastest by far, and makes the most efficient use of clock cycles. The next fastest was the Visual C++ test program, but it took about 3 times as many clock cycles as the Nios II with floating-point hardware. The Nios II with software-emulated floating-point was over 130 times slower than the Nios II with the floating-point hardware added! The moral of the story is: the more you can afford to do with custom/specialized hardware, the better!

Zoom scale of our Mandelbrot generator
(relative to a 19" monitor)

One final interesting note: For our VHDL-based Mandelbrot generator, we're able to zoom in 23 times before we start to lose precision with our 32-bit fixed point number representation. Each zoom doubles the area of the total Mandelbrot: 1...2...4...8...etc, to a total zoom of 2^23 = 8,388,608 times! Suppose you have a 19" monitor. At this magnification, you'd need a monitor over 2500 miles across to see the full Mandelbrot, which is the size of the continental United States! (And this is only with 32-bit fixed numbers. Floating point implementations are capable of zooms of galactic proportions!)

All in all, this was an incredible project to work on. I had a great team to work with, and we had a lot of fun designing both hardware and software to explore the wonderful world of Mandelbrots!

UPDATE 2009/10/10:
Check out version 2, targeting an Altera Cyclone II FPGA!

UPDATE 2016/3/10:
Check out version 3, targeting a Xilinx Spartan-3A FPGA on the Mercury development board!