Mandelbrot Explorer v3 for the Mercury FPGA board
March 9, 2016
When I was a kid, my grandpa introduced me to the Mandelbrot set, an infinitely beautiful fractal image, all based on a short equation: $Z_{n+1} = Z_n^2 + C$. I had no idea what that equation meant at the time, but those striking images on my grandpa's Mandelbrot slides stuck with me.
Years later, I went on to study Computer Engineering at Oakland University, and the Mandelbrot set popped into my mind again. In our digital logic course in Fall 2009, my group and I were trying to decide what to do for our final VHDL project. The majority of the other groups were making simple games for their projects, but we decided to build an FPGA-based Mandelbrot explorer! Our design was simple: at its heart were two parallel "Mandelbrot cores" that each calculated one iteration per clock cycle, for a single coordinate. It used a 32-bit fixed point number system, and was capable of running at 25MHz. It wasn't the most efficient design, using up most of the Xilinx Spartan-3E 500k gate FPGA, but it worked well.
The next year, in 2010, I decided to port the project over to a much larger Altera Cyclone II 35K LE FPGA. Version 2 ended up being an almost complete rewrite, with a new architecture based around a pipelined multiplication unit. The larger FPGA also meant I could widen the fixed point number system to 72 bits (this width was selected to consume an optimal number of 18x18 hard multipliers on the Altera FPGA). The new pipelined architecture meant that it could be clocked at 100MHz, and it ended up yielding a throughput 1.5x greater than that of the previous design, with a 2.25x wider number system, too.
This year, I decided to revisit my FPGA Mandelbrot explorer and improve it even further!
Mandelbrot explorer version 3 targets the Xilinx Spartan-3A 200K gate FPGA on the Mercury FPGA development board, built by my company, MicroNova LLC. This new version 3 has a number of performance improvements and new features:
- Calculation pipeline
Version 3 uses an 8-stage calculation pipeline, operating at 35-bit fixed point resolution (width selected to maximize the hard multiplier utilization on the Spartan-3A). Each flow through the 8-stage pipeline corresponds to a single iteration of the Mandelbrot calculation. When a calculation reaches the end of the pipeline, it either leaves the pipeline (freeing up a new calculation slot) or takes another trip through the pipeline if it needs another iteration. This is a significant improvement over version 2, which only had a partially-pipelined architecture. In version 2, the multiplier (the most logic-intensive portion of the Mandelbrot calculation) was pipelined, but the design still had a state machine in charge of time-multiplexing the inputs to the pipelined multiplier, and gathering the resulting outputs at the right time, with the add/sub/comp operations done on the side. Version 2 was faster than the simple single-cycle architecture of version 1, but there was still definitely room for improvement.
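As a rough software model of what one trip through the pipeline computes, here is a minimal Python sketch of a single fixed-point iteration and the "go around again" loop. The split of the 35-bit word into 4 integer bits and 31 fractional bits is an assumption (only the total width is stated above), and the names `to_fixed`, `fixed_mul`, `iterate_once` and `escape_count` are made up for illustration. Python integers never overflow, so this does not model the hardware's word width or its stage-by-stage timing.

```python
FRAC_BITS = 31            # assumed: 35-bit word = 4 integer bits (with sign) + 31 fraction bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(value):
    """Convert a float to the assumed fixed-point format."""
    return int(round(value * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point numbers and drop the extra fraction bits."""
    return (a * b) >> FRAC_BITS


def iterate_once(zx, zy, cx, cy):
    """One pass of Z <- Z^2 + C. Returns (new_zx, new_zy, escaped).

    Uses three multiplies (x*x, y*y, x*y); the escape test is applied to
    the Z value that entered this pass.
    """
    xx = fixed_mul(zx, zx)
    yy = fixed_mul(zy, zy)
    xy = fixed_mul(zx, zy)
    escaped = xx + yy > 4 * ONE          # |Z|^2 > 4, i.e. |Z| > 2
    return xx - yy + cx, 2 * xy + cy, escaped


def escape_count(cx, cy, max_iter=256):
    """Keep recirculating until the point escapes or max_iter is reached."""
    zx = zy = 0
    for n in range(max_iter):
        zx, zy, escaped = iterate_once(zx, zy, cx, cy)
        if escaped:
            return n
    return max_iter
```

For example, `escape_count(to_fixed(-1.0), 0)` never escapes and returns `max_iter`, while a point far outside the set such as `(2.0, 2.0)` leaves after a couple of passes.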
- Calculation dispatcher
Version 3 has a simple, efficient dispatcher that keeps the calculation pipeline fed with new coordinates at all times. The previous two versions had a very naive form of parallelism. They worked on pixels in groups; version 1 worked on pixels in groups of two, while version 2 worked on pixels in groups of eight. However, due to the way the state machine was built, it could not proceed to the next group until all pixels in the current group were completed! This meant there was a good chunk of under-utilized processing power.
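To get a feel for why the group-at-a-time approach wastes cycles, here is a small Python sketch that compares the two scheduling styles on a list of per-pixel iteration counts. It is a simplified model, not the actual hardware: a "round" stands for one iteration step across all slots, the function names are hypothetical, and every pixel is assumed to need at least one iteration.

```python
from collections import deque


def grouped_rounds(iters, group_size):
    """Old style: a group is only finished when its slowest pixel is done."""
    total = 0
    for i in range(0, len(iters), group_size):
        total += max(iters[i:i + group_size])
    return total


def dispatched_rounds(iters, slots):
    """Dispatcher style: a freed slot is refilled with the next pixel at once."""
    queue = deque(iters)      # pixels still waiting to be started
    in_flight = []            # remaining iterations for each busy slot
    rounds = 0
    while queue or in_flight:
        # Fill every free slot before the next round starts
        while queue and len(in_flight) < slots:
            in_flight.append(queue.popleft())
        rounds += 1
        # Each busy slot does one iteration; finished pixels free their slot
        in_flight = [r - 1 for r in in_flight if r > 1]
    return rounds


# One slow pixel among fast ones, with two slots (like version 1's groups of two)
work = [1, 8, 1, 1, 1, 1, 1, 1]
print(grouped_rounds(work, 2), dispatched_rounds(work, 2))
```

In this toy case the slow pixel holds up its whole group in the grouped scheme, while the dispatcher keeps the other slot busy with the remaining pixels in the meantime.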
- Coordinate system
Version 3 has more efficient coordinate plane hardware. With version 3, each time a frame is rendered, it first calculates the Mandelbrot coordinate plane and keeps it in two block RAMs on the FPGA (one for X and one for Y). These block RAMs are then used to translate screen coordinates (800x600 pixels) to Mandelbrot coordinates (-2.0 to 2.0 in 35-bit fixed point). These block RAMs are used during rendering of the Mandelbrot set, and when a user clicks (to determine the Mandelbrot coordinates at the cursor). In contrast, version 2 computed Mandelbrot coordinates on-the-fly while rendering, but did not store them, so it had to re-calculate to find the Mandelbrot coordinates at the cursor. Version 1 used a simplistic coordinate converter component that mapped screen coordinates to Mandelbrot coordinates combinationally, which used a lot of logic!
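The two-table idea can be sketched in Python as follows. This is only an illustration under assumptions: the 31-bit fraction, the per-pixel `step`, and the names `build_tables` and `screen_to_plane` are invented here, and the sketch does not say anything about how the real design fills its block RAMs or which way it orients the Y axis.

```python
FRAC_BITS = 31                 # assumed fraction width within the 35-bit word
ONE = 1 << FRAC_BITS
WIDTH, HEIGHT = 800, 600       # screen size from the text


def build_tables(center_x, center_y, step):
    """Build one table per axis for the current view (all values fixed point).

    Only WIDTH + HEIGHT entries are needed, not WIDTH * HEIGHT, because X
    depends only on the column and Y only on the row.
    """
    x0 = center_x - (WIDTH // 2) * step
    y0 = center_y - (HEIGHT // 2) * step
    x_table = [x0 + col * step for col in range(WIDTH)]
    y_table = [y0 + row * step for row in range(HEIGHT)]
    return x_table, y_table


def screen_to_plane(x_table, y_table, px, py):
    """Translate a screen pixel (or the cursor position) with two lookups."""
    return x_table[px], y_table[py]


# Example view centred on the origin, with a step of 1/256 per pixel
x_table, y_table = build_tables(0, 0, ONE >> 8)
print(screen_to_plane(x_table, y_table, 400, 300))
```

The same lookup serves both the renderer and the mouse click, which is the point of storing the tables rather than recomputing the coordinates.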
- Julia set
Version 3 supports the Julia set too! The Mandelbrot set has a different C term for each pixel in the frame. The Julia set, meanwhile, is almost the same, but instead keeps the C term constant across the entire frame. The user can now click the middle mouse button to select which point to use for the C term, and jump to the Julia set for that point. Pressing the middle mouse button again brings you back to the Mandelbrot set.
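The difference between the two modes is small enough to show in a few lines. The sketch below uses Python's built-in complex numbers instead of fixed point to keep it short; the function names are hypothetical, and it follows the usual convention that the Mandelbrot iteration starts from Z = 0 while the Julia iteration starts from the pixel's own coordinate.

```python
def escape_count(z, c, max_iter=256):
    """Shared iteration: count steps until |Z| exceeds 2."""
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_pixel(point, max_iter=256):
    # C changes with every pixel; Z always starts at zero
    return escape_count(0j, point, max_iter)


def julia_pixel(point, c_selected, max_iter=256):
    # C is fixed for the whole frame; the pixel supplies the starting Z
    return escape_count(point, c_selected, max_iter)


# The point chosen with the middle mouse button becomes C for the Julia frame
c_selected = complex(-0.75, 0.1)
print(mandelbrot_pixel(c_selected), julia_pixel(0j, c_selected))
```

The two printed values match, since the Julia iteration for a given C started at Z = 0 is exactly the Mandelbrot iteration for that C. Only the source of the initial Z and of C differs between the modes, so the same calculation can serve both.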
- Color palettes
Version 3 has a wide variety of palettes to choose from. The user can select a palette using two buttons to cycle up/down through the palette list. The color shifting feature has also been improved: the color shifting rate can be varied using the potentiometer (using Mercury's on-board SPI ADC), and a button can be used to toggle the direction that the color shifts.
Those are all of the improvements, but there are still a few things left on my to-do list for version 3.1... In particular, I would like to add a DCM component to my project and see if I can bump the calculation logic up from 50MHz to 100MHz (it appears from the compilation report that it should be able to run at 100MHz). This isn't simply a matter of plopping down a DCM; the VGA clock cannot run at 100MHz, so my design will have multiple clock domains and I need to take care that any transitions are made appropriately. Having a 100MHz clock will also let me build a more intelligent VRAM arbiter, to interleave VRAM reads and writes seamlessly (currently, writing into external VRAM interrupts the display of the VRAM contents to VGA, leading to some slight salt-and-pepper noise during the few seconds while a frame is rendering). This will require some extra thought for clock domain transitions (as the VGA clock needs to remain at 50MHz).
I plan to update this page with new versions and more documentation, as time allows.