ASIC VGA Controller
By Mark Bowers and Michael Prechel
April 24, 2013
For our final project in Dr. Abdel-Aty-Zodhy's ECE585 VLSIC Design of Digital Chips at Oakland University, Michael Prechel and I designed a VGA graphics controller written in Verilog and implemented as an application-specific integrated circuit.
I had previously implemented VGA on 68HCS12 and PSoC 1 microcontrollers using minimal external components, by essentially emulating the VGA standard in software. This was an interesting proof-of-concept, but it takes a considerable amount of CPU resources (nearly 95% continuous utilization!) to meet the strict timing and bandwidth requirements of the VGA standard. Needless to say, if you want to properly drive a VGA monitor, it behooves you to use hardware specifically designed for this task. This was the motivation behind our design.
Our VGA ASIC continuously displays the contents of an SRAM display buffer to a monitor adhering to the VGA 640x480 standard. It is also provides a SPI interface through which the contents of the SRAM buffer can be modified by a microcontroller. Thus, the microcontroller is no longer responsible for producing the timing-sensitive VGA signal. Nor is it bogged down by merely displaying a static image; it is only busy when modifying the video buffer over the SPI interface.
VGA is essentially a massive multiplexed display. For a 640x480 display, the pixel is changed at a rate of 25.175MHz. Each line consists of 480 pixels, but also contains some “blanking” time (historically this was time needed for the CRT deflection coil to sweep back to the beginning for the next line.) There is also a blanking time necessary to return the coil to the top of the frame after reaching the bottom.
The following block diagram illustrates the architecture of the VGA controller:
The VGA controller continuously displays the contents of the VRAM. This is achieved with the use of a horizontal counter (controlling the timing of each line), and a vertical counter (controlling the timing of the frame.) The counter values are combined to yield the VRAM read address. The counter values also pass through comparators to produce “sync pulses” which are part of the VGA standard (used by the monitor to lock-on to the VGA signal.)
Meanwhile, the microcontroller can shift-in a 24-bit command via SPI, consisting of an X coordinate, Y coordinate and the color to be written to the display buffer. Upon completion of the SPI cycle, the memory controller interleaves in the pending VRAM write. It performs the VRAM write quickly between successive VRAM reads. VRAM reads pass through a register, so that writes (which happen asynchronously, relative to the display routine) will not cause any blips or artifacts on the monitor.
The SPI state machine stays in the "idle" state until an incoming SPI word has been shifted in. The SPI cycle begins when CS (chip select) is brought low. The master (microcontroller) then shifts in data using MOSI and SCLK. When CS is brought high, the cycle ends. At this time, the SPI controller then sends a VRAM write command, and waits for this write to be serviced. The write command is 24-bits, consisting of three bytes: color, X coordinate and Y coordinate. Once the write is completed, the system is again idle and can service another write. The “wait for write to be serviced” state is necessary so that the VRAM may be written to safely, without interfering with the display routine. The VRAM is being continuously juggled between (1) reading out to the monitor, and (2) possibly servicing a pending write.
The memory “juggling” behavior is achieved in the memory controller state machine. The current state is derived from the two least significant bits of HCOUNT.
In state 0, the DOUT driver is disabled (i.e. the RAM data bus is being read from), and the read address output is output. In state 1, the memory controller waits a cycle for the VRAM to fetch the data at that address. In state 2, the memory controller registers the VRAM read, and the color becomes visible on the VGA port (by registering the VRAM read, this ensures that the monitor will not see the write transaction.) Also in this state, if an SPI write is pending, the DOUT driver is enabled (i.e. the RAM data bus is being written to), the write address is output, and WEN is asserted. In state 3, WEN is deasserted. This state is necessary to meet the write hold timing, and to ensure that we do not inadvertently write at a read address.
Several logic simulations were performed to verify that the VGA timing and SPI interface worked as intended:
The Verilog design was then compiled for a Mercury FPGA development board, which contains a Xilinx Spartan-3A 200K FPGA, to emulate the behavior of the VGA ASIC. (The design only consumed 3% of the FPGA logic resources!) The Mercury board also contains a Cypress 512K x 8bit SRAM, which was used as the video buffer. It was connected to a Dragon12 HCS12 development board, which acted as the SPI master. An external 25.175MHz oscillator was used for the clock input.
The Verilog design was then compiled into standard cells using Leonardo:
The timing properties of the design were then analyzed. The critical path is 6.49 nanoseconds, yielding a maximum operating frequency of 154 MHz.
Below are some images of the logic schematic as generated from the Verilog code:
The standard cells were then placed using the auto-place feature:
Next, the standard cells were auto-routed:
Next, some checks were performed on the auto-routed design. No shorts were found! However, four nets "overflowed" by straying outside the specified area. This is tolerable because the specified area was conservative and there is still ample space for routing to the pads.
Next, Design Rule Checking (DRC) was performed. After running DRC, the auto-routed design showed 36 errors. Most of these errors were metal spacing errors, or vias placed too close together. Correcting these errors was a bit time consuming, but straightforward using skills we learned earlier in lab tutorials.
After correcting all the DRC errors, the design was measured to be approximately 2205 λ x 2625 λ. This fits easily in the 3000 λ x 3000 λ limit for our process.