Video Frame Buffer
Introduction
For our project, we chose to implement an LCD frame buffer using the AVR. The chip was to store the current state, display the stored state on the LCD as necessary, and accept external commands to modify the displayed state.
High Level Design
The basic design of our system was strongly driven by what we could achieve at lower levels. The display architecture, synchronization, and data transfer were all largely determined by the available resources and the need for a reasonably fast display.
We chose to have the entire display routine performed by a single microprocessor in the main-line loop. By doing this, we were able to refresh the display at close to the maximum possible rate, which is important in reducing the inherent flicker on the LCD. Unfortunately, this resulted in many inconvenient design issues. First of all, we needed an external SRAM in order to store the entire frame in memory. In order to maintain a decent frame rate, we needed to use the built-in memory addressing. As a result, from the outset we were short two full ports and two bits from PORTD.
The necessity of using part of PORTD for the memory access meant that we had to use PORTB to output the full 8 bits required by the LCD. But by doing this, we could no longer use the SPI as originally intended to transfer instructions. Instead, we needed to multiplex the output lines on PORTB to transfer data. Ensuring that there were no drive fights on the buses required the use of a simple (in theory) two-wire asynchronous handshaking protocol between the display engine and the command-issuing processor.
Program/Hardware Design
Design Issues
In the implementation of the test pattern display on the LCD, there were three major issues that had to be dealt with in order to get a functional configuration.
The first major issue was the lack of any meaningful timing information. The vendor neglected to include in the documentation a proper timing diagram for the LCD. This required a rather lengthy online search through the SHARP web site until documentation for a similar product was uncovered.
The second critical issue was timing: to maintain a fast frame rate, we needed to maximize the refresh rate of the system. To do this, we built the system externally and used a separate crystal oscillator at 8-14 MHz to drive the microcontroller.
Even with the faster oscillator, the flicker was still pretty bad. To reduce it further, we unrolled the inner display loop 10 times. This removed most of the loop overhead and increased the final display rate by about 50%.
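The effect of the unrolling can be sketched in C. This is only an illustration, not the actual routine (which is AVR assembly; see Appendix A). The names `lcd_write`, `lcd_log`, `refresh_rolled`, and `refresh_unrolled`, and the frame size, are hypothetical, and the LCD data port is replaced by an array so the two versions can be compared.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_BYTES 40u  /* hypothetical size; must be a multiple of 10 */

/* Stand-in for the LCD data port: records every byte written. */
static uint8_t lcd_log[FRAME_BYTES];
static size_t  lcd_count;

static void lcd_reset(void) { lcd_count = 0; }

static void lcd_write(uint8_t b)
{
    if (lcd_count < FRAME_BYTES)
        lcd_log[lcd_count++] = b;
}

/* Rolled loop: one compare-and-branch for every byte sent. */
void refresh_rolled(const uint8_t *frame)
{
    for (size_t i = 0; i < FRAME_BYTES; i++)
        lcd_write(frame[i]);
}

/* Unrolled by 10: one compare-and-branch for every 10 bytes sent. */
void refresh_unrolled(const uint8_t *frame)
{
    for (size_t i = 0; i < FRAME_BYTES; i += 10) {
        lcd_write(frame[i + 0]);
        lcd_write(frame[i + 1]);
        lcd_write(frame[i + 2]);
        lcd_write(frame[i + 3]);
        lcd_write(frame[i + 4]);
        lcd_write(frame[i + 5]);
        lcd_write(frame[i + 6]);
        lcd_write(frame[i + 7]);
        lcd_write(frame[i + 8]);
        lcd_write(frame[i + 9]);
    }
}
```

Both functions send the same bytes in the same order. The unrolled version pays the loop bookkeeping once per ten bytes instead of once per byte, at the cost of a larger loop body.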
The final major issue was the lack of port pins on the AVR. Accessing the external memory requires the use of PORTA, PORTC, and the high two bits of PORTD. This left only PORTB suitable to transfer a full byte of data on each clock cycle. Unfortunately, this also meant that the SPI was not available for inter-processor communication, which made the later inter-processor handshaking much more complex.
Initial Implementation
The initial implementation proceeded in steps. The first step was to display a single value that was hard-wired into the program. This allowed us to verify that the LCD was working correctly. From here, a rather simple extension was made that stored a hard-coded value into a single memory location. Finally, we loaded the actual pattern into memory and walked through the memory in the display routine. At this point, we could reliably display the memory pattern on the LCD.
The second step was an attempt to implement the inter-processor communication for the instruction set. Initially, we had intended to put the display routine in an ISR and perform instruction decoding in the main-line loop. While this scheme had many advantages, it had one big disadvantage: coupling became a significant problem in our design. Attempting to drive a breadboarded system at 2-4 MHz leads to some very interesting coupling problems. Basically, if you walked too close, the system would not function properly. Obviously, this was not acceptable as a solution.
To combat this problem, we moved the display routine into the main program. By avoiding the need to distribute high-frequency signals that are crucial to the functioning of the design, we significantly improved the system reliability. However, this required a different scheme for processing instructions.
Instruction transfer is accomplished through use of the external interrupt together with a two-wire handshaking protocol.
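One way such a handshake could work is sketched below as a small C simulation. The report does not spell out the exact signalling, so the four-phase request/acknowledge sequence, the line names `req` and `ack`, and all function names here are assumptions for illustration only. The point is that the command processor drives the shared port only between raising the request and seeing the acknowledge, which is what prevents a drive fight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated wires shared by the two processors (hypothetical model). */
typedef struct {
    uint8_t data;           /* value on the multiplexed 8-bit port      */
    bool    sender_drives;  /* true while the command processor drives  */
    bool    req;            /* request line, from the command processor */
    bool    ack;            /* acknowledge line, from the display engine */
} link_t;

/* Sender, step 1: drive the port, then raise REQ.
   Refuses to start while a previous transfer is still in flight. */
bool sender_begin(link_t *l, uint8_t byte)
{
    if (l->req || l->ack)
        return false;
    l->data = byte;
    l->sender_drives = true;
    l->req = true;
    return true;
}

/* Receiver: the work an external-interrupt handler might do.
   Returns true when a byte was latched into *out. */
bool receiver_service(link_t *l, uint8_t *out)
{
    if (l->req && !l->ack) {   /* new byte is valid: latch it, then ACK */
        *out = l->data;
        l->ack = true;
        return true;
    }
    if (!l->req && l->ack)     /* sender has let go: port is free again */
        l->ack = false;
    return false;
}

/* Sender, step 2: once ACK is seen, release the port and drop REQ. */
bool sender_finish(link_t *l)
{
    if (!(l->req && l->ack))
        return false;
    l->sender_drives = false;
    l->req = false;
    return true;
}
```

A multi-byte instruction would repeat this sequence once per byte, with the display engine free to drive the port for the LCD whenever both lines are low.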
Revised Implementation
Now that we had an interface into the actual buffer and refresh processor, we attempted to add a simple instruction set that would allow us to perform all the functions we initially planned for. Our initial attempt at this involved taking in 8 bits in parallel over a multiplexed port. Our instructions ranged in length from 1 byte to 4 bytes. The 1-byte instructions were used for general functions such as flipFrame and clearFrame. The 4-byte instructions were needed to specify an address in the display memory to be written. We decided to only implement access by memory address because we did not want to burden the screen-refreshing processor with the calculations needed for an xy-coordinate reference. The memory access functions would take in 1 byte of opcode, 2 bytes of address, and 1 byte for a value. Depending on the opcode, the 1-byte value would either be explicitly written into the location or be used as a mask for an and/or/not/xor function on the referenced location in display memory.
The instruction set implemented on the buffer/refresh processor was never meant for use by the end-user programmer. We would implement a much nicer interface for them by using a second processor to compute the necessary translation from xy-coordinate space to display memory location and then translate that into the buffer/refresh processor instruction set. For the sake of testability, we decided to use the UART as the user-end interface to the secondary processor. Although the UART would not be able to transfer enough data to refresh the LCD at a reasonable animation quality, we decided that it would be a good intermediate goal.
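The instruction format described in this section can be sketched in C as follows. The opcode values, the high-byte-first address order, the memory size, and the function names are all hypothetical; the report only fixes the layout of 1 byte of opcode, 2 bytes of address, and 1 byte of value. The "not" operation is left out because the text does not say how the mask would apply to it.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 256u  /* hypothetical; kept small for illustration */

enum {                 /* hypothetical opcode values */
    OP_FLIP_FRAME  = 0x01,
    OP_CLEAR_FRAME = 0x02,
    OP_WRITE       = 0x10,
    OP_AND         = 0x11,
    OP_OR          = 0x12,
    OP_XOR         = 0x13
};

/* Length in bytes of the instruction starting with this opcode,
   or 0 if the opcode is not recognized. */
size_t instr_length(uint8_t opcode)
{
    switch (opcode) {
    case OP_FLIP_FRAME:
    case OP_CLEAR_FRAME:
        return 1;
    case OP_WRITE:
    case OP_AND:
    case OP_OR:
    case OP_XOR:
        return 4;
    default:
        return 0;
    }
}

/* Execute one 4-byte memory-access instruction:
   instr[0] = opcode, instr[1..2] = address (high byte first, assumed),
   instr[3] = value or mask. Returns 0 on success, -1 on error. */
int exec_mem_instr(uint8_t *mem, const uint8_t instr[4])
{
    uint16_t addr = (uint16_t)(((unsigned)instr[1] << 8) | instr[2]);
    uint8_t  v    = instr[3];

    if (addr >= MEM_SIZE)
        return -1;

    switch (instr[0]) {
    case OP_WRITE: mem[addr] = v;  break;  /* explicit write   */
    case OP_AND:   mem[addr] &= v; break;  /* clear masked-out bits */
    case OP_OR:    mem[addr] |= v; break;  /* set masked bits  */
    case OP_XOR:   mem[addr] ^= v; break;  /* toggle masked bits */
    default:       return -1;
    }
    return 0;
}
```

With this layout the receiver can decide how many more bytes to wait for as soon as the opcode byte arrives, which keeps the decode step on the refresh processor short.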
Upon implementation, we ran into multiple issues with our proprietary parallel interface. Since we were running out of time fast, we decided to try to implement a smaller and simpler instruction set directly on the buffer/refresh processor.
What We Would Do Differently
Although this project was not very code intensive, we believe that it is just barely out of the AVR's performance range. Mainly, we ran into almost all the physical limitations of the AVR. Our design would have been much simpler if we just had more:
- RAM: if we had more on-chip RAM, it would have eliminated our need for external RAM, which caused a lot of problems because the external RAM required the use of approximately 2.25 ports (see next item).
- Ports/pins/external interfaces: if there were more ports, we could have implemented our interface a lot more cleanly than we could with only 6 bits or with multiplexed ports. Also, if we could have used the SPI, it might have made our lives easier.
- Speed (say 20 MHz): timing was fairly critical in this project. We had a lot of problems with flicker in the LCD. This could be eliminated by just refreshing the screen more often. In our extremely optimized refresh loop, we finally eliminated the flicker at about 16 MHz. If we could have gotten 20 MHz chips, we could have eliminated flicker and had cycles to spare for some processing.
- Specialized instructions: There were a couple of functions that we wish we had access to (or at least had optimized versions of). It would have been nice, though unreasonable to demand, to have some specialized signal processing instructions. More realistically, functions such as multi-bit shifts or more word-based instructions would have made our jobs a lot simpler.
Appendix A: ASM Listings
- FINAL1.ASM : our initial code for displaying a simple test pattern on the LCD
- FINAL2.ASM : buffer/refresher code with our proprietary parallel interface (to interface with SYNCHM.ASM)
- FINAL3.ASM : the single monolithic chip with both the buffer/refresher and the UART external interface
- SYNCHM.ASM : our first pass at the secondary interface/processor chip
Appendix B: Diagrams
Initial Configuration
Initially, we just wanted to test the interface to the LCD. We were able to write a simple test pattern into the external RAM upon startup, then read it back to refresh the LCD. Using this configuration, we determined that we needed at least a 16 MHz clock to refresh the LCD without noticeable flicker. Clock frequencies of 10 MHz are still acceptable for testing, but in certain lighting conditions the flicker is noticeable. At 8 MHz, the flicker is extremely noticeable.
The LCD interface is essentially accessed in halves. Data is clocked in one byte at a time, but one nibble goes to writing the upper half of the display while the other nibble goes to writing the lower half.
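Given this layout, translating an xy-coordinate into a display memory address and a bit mask might look like the C sketch below. Nearly everything concrete here is an assumption made for illustration: the 320x240 resolution, one bit per pixel, four horizontally adjacent pixels per nibble, the upper half living in the high nibble, and the leftmost pixel being the most significant bit of its nibble. The names `pixel_ref_t` and `pixel_ref` are hypothetical as well.

```c
#include <stdint.h>

#define LCD_WIDTH      320u                  /* hypothetical resolution */
#define LCD_HEIGHT     240u                  /* hypothetical resolution */
#define HALF_ROWS      (LCD_HEIGHT / 2u)     /* rows in each display half */
#define BYTES_PER_ROW  (LCD_WIDTH / 4u)      /* 4 pixels per nibble */

typedef struct {
    uint16_t addr;  /* byte offset into the frame in display memory */
    uint8_t  mask;  /* single set bit selecting the pixel */
} pixel_ref_t;

/* Map (x, y) to the byte and bit that hold the pixel.
   The caller must keep x < LCD_WIDTH and y < LCD_HEIGHT. */
pixel_ref_t pixel_ref(uint16_t x, uint16_t y)
{
    pixel_ref_t r;
    uint16_t row = (uint16_t)(y % HALF_ROWS);   /* row within its half */
    uint8_t  bit = (uint8_t)(3u - (x % 4u));    /* leftmost pixel = bit 3 */

    if (y < HALF_ROWS)
        bit = (uint8_t)(bit + 4u);              /* upper half: high nibble */

    r.addr = (uint16_t)(row * BYTES_PER_ROW + x / 4u);
    r.mask = (uint8_t)(1u << bit);
    return r;
}
```

Under these assumptions a pixel in the upper half and the pixel 120 rows below it share one byte, so a full frame takes 9600 bytes and setting a pixel is a read-modify-write with the returned mask.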
The LCD also has special power requirements. We had to construct our own power supply to operate it.
Proposed Configuration
This is our original intended configuration. The two 4414s with the attached RAM had enough RAM capacity for 2 frames at 2-bit grayscale. The grayscale would be accomplished by alternating frames. Both the buffer chips would only have a very simple interface to the video memory. The separate 4414/8515 would be used to synchronize the two buffer chips and to translate more complex user instructions into the simple instructions the buffer chips can recognize.
With the resources available on the third 4414/8515, we thought about the possibility of specializing it for sprites or hard-wiring a character set into it. We also thought about implementing specific graphics instructions such as line drawing or triangle drawing. And at one time (when we were feeling especially masochistic) we even threw around the idea of implementing a simple rendering engine. However, since we scarcely had enough resources on the 4414, the feasibility of these enhancements is questionable.
The thing that is not specified in the above diagram is the organization of memory in relation to the display. There are several configurations, each with its own advantages and disadvantages.
- Each Buffer only drives half the display: That is, there is one MCU that drives the upper half of the display at all times, and one that drives the lower half at all times. This configuration contains a fairly simple decode function, but both processors are tied up for each frame.
- One Primary, One Secondary: This is where one MCU controls the primary frame and the other the secondary frame. This also has an easy decode function, and has the advantage that only one processor is tied up at any one time. However, this configuration requires a fairly complex synchronization protocol. Also, it's fairly hard to read from the active frame to construct the secondary frame.
- Interleaved Frames: This is where each MCU has one of the 2 frames required for 2-bit grayscale. This configuration requires that the frames refreshed alternate between the two buffer processors. The decode is easy, the load is on only one processor at any given time, and the data for the relevant active and secondary frames are local on the MCU. The main problem with this configuration is the synchronization.
The instruction set we proposed for this configuration:
- set_pixel_in_primary: sets one pixel value in the primary (active) frame
- set_pixel_in_secondary: sets one pixel value in the secondary frame
- stream_pixels_in_primary: stream in a number of pixel values starting at the specified address of the primary frame
- stream_pixels_in_secondary: stream in a number of pixel values starting at the specified address of the secondary frame
- clr_primary: clears the primary frame
- clr_secondary: clears the secondary frame
- read_primary: reads a pixel value from the primary frame
- read_secondary: reads a pixel value from secondary frame
- read_stream_primary: read a stream of pixel values starting at a given pixel from the primary frame
- read_stream_secondary: read a stream of pixel values starting at a given pixel from the secondary frame
- copy_primary_2_secondary: copy the primary frame into the secondary frame
- copy_secondary_2_primary: copy the secondary frame into the primary frame
- invert_primary: inverts all the pixels in the primary frame
- invert_secondary: inverts all the pixels in the secondary frame
- flip_frame: turns the secondary frame into the primary frame and vice versa
- reset: Resets the display
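A few of the whole-frame instructions in this list can be modelled in C as below. This is a sketch under simplifying assumptions, not the proposed implementation: both frames sit in one address space (in the two-MCU configurations they would be split across chips), the frame size is a tiny hypothetical value, and `video_mem_t`, `primary_of`, and `secondary_of` are made-up names. It mainly shows that flip_frame can swap roles without copying any pixel data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 16u  /* hypothetical; tiny for illustration */

typedef struct {
    uint8_t frame[2][FRAME_BYTES];
    uint8_t primary;     /* index (0 or 1) of the frame being displayed */
} video_mem_t;

static uint8_t *primary_of(video_mem_t *v)
{
    return v->frame[v->primary];
}

static uint8_t *secondary_of(video_mem_t *v)
{
    return v->frame[v->primary ^ 1u];
}

/* clr_secondary: clear every pixel in the secondary frame. */
void clr_secondary(video_mem_t *v)
{
    memset(secondary_of(v), 0, FRAME_BYTES);
}

/* invert_secondary: invert every pixel in the secondary frame. */
void invert_secondary(video_mem_t *v)
{
    uint8_t *f = secondary_of(v);
    for (size_t i = 0; i < FRAME_BYTES; i++)
        f[i] = (uint8_t)~f[i];
}

/* copy_primary_2_secondary: duplicate the displayed frame. */
void copy_primary_2_secondary(video_mem_t *v)
{
    memcpy(secondary_of(v), primary_of(v), FRAME_BYTES);
}

/* flip_frame: the secondary frame becomes the primary and vice versa.
   Only the index changes; no pixel data moves. */
void flip_frame(video_mem_t *v)
{
    v->primary ^= 1u;
}
```

A typical use would be to draw into the secondary frame while the primary is being displayed, then call `flip_frame` once the new image is complete.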
Revised Configuration
The main difficulty of this design is the multiplexing of a port. We developed a handshaking routine to manage this, but we are not sure whether it worked. We managed to include the limited instruction set in the buffer chip and the interface processor, but for some reason we could not get this to work properly. We believe there may be a problem with reading in multiple bytes over the proprietary parallel interface.
See FINAL2.ASM and SYNCHM.ASM for source code.