Week 10
Today is the last day of our internship, and here are a few concluding thoughts:
- We had a very good time and learned a lot about hardware and processor design, Unix tools and kernel programming, and also gained experience with a new hardware description language (Bluespec SystemVerilog).
- The SoCKit board is a very good platform with great potential for both teaching and research.
This is what we have been doing in this concluding phase of our work:
Lawrence:
- Modified the device drivers to handle multiple devices;
- FreeBSD driver now takes all the necessary information from the FDT (flattened device tree);
- Linux driver now takes its information from a config file, which is parsed when the module is inserted (the file can be specified as a command-line argument).
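Neither driver's config format is described here, so purely as an illustration the sketch below assumes a hypothetical plain-text format with one device per line (`name base_address irq`, with `#` starting a comment) and parses it into a device table in user-space C. The names `struct dev_cfg` and `parse_config` are made up for this example; a kernel module would do the equivalent after reading the file passed in at insertion time.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEVICES 8
#define NAME_LEN 32

/* One entry per device (hypothetical layout, for illustration only). */
struct dev_cfg {
    char name[NAME_LEN];
    unsigned long base;   /* physical base address of the device's registers */
    int irq;              /* interrupt line */
};

/* Parse "name base irq" lines; blank lines and lines starting with '#' are skipped.
 * Returns the number of devices parsed, or -1 on a malformed or overlong line. */
int parse_config(const char *text, struct dev_cfg *devs, int max)
{
    int count = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        char buf[128];

        if (len >= sizeof buf)
            return -1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        char first;
        /* " %c" skips leading whitespace and returns EOF for an empty line. */
        if (sscanf(buf, " %c", &first) == 1 && first != '#') {
            if (count == max)
                return -1;
            struct dev_cfg *d = &devs[count];
            /* %31s keeps the name within NAME_LEN - 1 chars; %lx accepts a 0x prefix. */
            if (sscanf(buf, "%31s %lx %d", d->name, &d->base, &d->irq) != 3)
                return -1;
            count++;
        }
        line = end ? end + 1 : line + len;
    }
    return count;
}
```

Returning -1 on the first malformed line, rather than skipping it, means a typo in the config file makes insertion fail loudly instead of silently dropping a device.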
Petar:
- Completed the implementation of the testing scripts for the Bluespec TTCs, and created another version of the exercise which would encourage the students to implement caching;
- This involved creating a simple Bluespec component that introduces higher latency for memory accesses, and then implementing a simple direct-mapped cache on top of it to recover performance (the cached version came within 1% of the version with fast memory access).
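As a rough illustration of the caching exercise, here is a C model of a direct-mapped cache placed in front of slow memory. It is not the Bluespec component itself: the geometry (64 one-word lines, word addresses) and all names are hypothetical, and only reads are modelled. The low address bits pick the line and the remaining bits form the tag, so each address can live in exactly one place.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 64     /* hypothetical cache size: 64 one-word lines */
#define INDEX_BITS 6     /* log2(NUM_LINES) */

struct cache_line {
    bool valid;
    uint32_t tag;
    uint32_t data;
};

struct cache {
    struct cache_line lines[NUM_LINES];
    unsigned hits, misses;
};

/* Read one word through a direct-mapped cache in front of `mem`.
 * `addr` is a word address: the low INDEX_BITS select the line,
 * the remaining high bits form the tag. */
uint32_t cache_read(struct cache *c, const uint32_t *mem, uint32_t addr)
{
    uint32_t index = addr & (NUM_LINES - 1);
    uint32_t tag = addr >> INDEX_BITS;
    struct cache_line *line = &c->lines[index];

    if (line->valid && line->tag == tag) {
        c->hits++;               /* fast path: no access to slow memory */
    } else {
        c->misses++;             /* slow path: fetch and replace the line */
        line->valid = true;
        line->tag = tag;
        line->data = mem[addr];
    }
    return line->data;
}
```

Addresses that share an index evict each other, which is the main weakness of a direct-mapped design; a version that handles stores would also need a write policy (e.g. write-through).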
In addition, this week we have completed an initial version of our final report (in the form of a slide deck) which will be presented to Altera in November. This may be further adjusted in the following months.
Week 9
The penultimate week of our internship has just finished - here are our thoughts on the progress:
Lawrence:
- Ported the Ring Buffer device driver to FreeBSD.
- Made slight modifications to the tunnel code (which now also works on FreeBSD).
- Successfully set up an SSH session between the ARM and the BERI.
- Started work on improving how the driver handles multiple devices (to avoid having to recompile for every new device).
Petar:
- Successfully demoed my tracing library to a colleague, and completely reorganised my side of the summer projects' repository;
- Finished pipelining the Bluespec TTC, producing three versions: a basic pipelined version and two versions that use different methods of branch prediction and jump detection. In addition, several bugs in the original design were fixed, and four new test benches were written to test the correctness of the system.
- Started work on a testing script (with accompanying utility C programs) for measuring the performance of a Bluespec TTC implementation against several test benches. This could then form the basis of a full weekly exercise for the Masters students taking the Advanced Computer Design course. Most of the script is complete, but several aspects will likely need to be modified after review by a colleague.
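The post does not say which branch prediction methods the two TTC variants use. For context, one common scheme is a table of 2-bit saturating counters indexed by low bits of the PC; the C sketch below models it with a hypothetical table size and word-indexed PCs, and should not be read as the design that was actually implemented.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_SIZE 16   /* hypothetical number of predictor entries (power of two) */

/* Counter states 0-1 predict not-taken, 2-3 predict taken. */
struct predictor {
    uint8_t counters[BHT_SIZE];
};

static unsigned bht_index(uint32_t pc)
{
    return pc & (BHT_SIZE - 1);
}

bool predict_taken(const struct predictor *p, uint32_t pc)
{
    return p->counters[bht_index(pc)] >= 2;
}

/* Once the branch resolves, nudge the counter towards the actual outcome,
 * saturating at 0 and 3 so a single unusual outcome does not flip a
 * strongly held prediction. */
void predictor_update(struct predictor *p, uint32_t pc, bool taken)
{
    uint8_t *c = &p->counters[bht_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}
```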
Week 8
Some of our impressions from the eighth week can be found below:
Lawrence:
- Created a tap device (a layer 2 virtual Ethernet device) to bridge Ethernet between the ARM and the BERI. The device makes use of the Ring Buffer to transfer data.
- Mostly finished porting the Ring Buffer kernel driver to CheriBSD; successfully built but not tested yet. The same tap device as used on Linux should work on top of the ported driver.
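A tap device hands the tunnel whole Ethernet frames, while a ring buffer just moves data, so frame boundaries have to be preserved somehow. The sketch below assumes a hypothetical framing scheme (a 2-byte big-endian length before each frame); the tunnel's real encoding is not described in this post, and `frame_encode`/`frame_decode` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: each frame travels through the ring buffer as a
 * 2-byte big-endian length followed by the frame bytes. */

/* Encode one frame into `out`; returns bytes written, or 0 if it does not fit. */
size_t frame_encode(const uint8_t *frame, size_t len, uint8_t *out, size_t cap)
{
    if (len > 0xFFFF || len + 2 > cap)
        return 0;
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + 2, frame, len);
    return len + 2;
}

/* Decode one frame from the start of `in`, which holds `avail` bytes.
 * Returns bytes consumed, 0 if the frame has not fully arrived yet,
 * or -1 if the frame is larger than the caller's buffer. */
long frame_decode(const uint8_t *in, size_t avail,
                  uint8_t *frame, size_t cap, size_t *len)
{
    if (avail < 2)
        return 0;                      /* length header incomplete */
    size_t n = ((size_t)in[0] << 8) | in[1];
    if (n > cap)
        return -1;
    if (avail < n + 2)
        return 0;                      /* payload incomplete */
    memcpy(frame, in + 2, n);
    *len = n;
    return (long)(n + 2);
}
```

Returning 0 for an incomplete frame lets the reader simply wait for more bytes from the ring buffer and retry.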
Petar:
- Following further inspection of the tracing library, the previously found problems with data corruption have been rectified. Some additional testing with tracing larger components may be done next week.
- Massive refactoring and cleanup of last week's pipelined SystemVerilog TTC. It is now in a stable and presentable state, and no major modifications are expected from this point.
- Following this achievement I started work on the Bluespec version of the TTC. Initially, most of the effort went into tidying up the sequential version (moving large bits of combinational logic into functions, fixing minor bugs, etc.). I am now pipelining it; it is somewhat difficult to make some aspects work while keeping the code presentable, so this is likely to take up some time next week.
Week 7
Here's a more detailed summary of the work we undertook in the seventh week:
Lawrence:
- Further work on the ring buffer; it now offers more interfaces to Qsys;
- Modified BERI's console & debug channels to work with the ring buffer (FreeBSD is booting successfully on the BERI again);
- Further additions to berictl, more merging of changes & neatening of code;
- Started work on a layer 2 network tunnel, to give Ethernet access to the BERI.
Petar:
- For this week, one of the goals was to produce a pipelined version of the TTC 3 processor in SystemVerilog, to be used for teaching purposes (IB Computer Design course and ECAD practicals). I achieved this, adding forwarding paths to deal with data hazards and very simple branch prediction (recognising unconditional jumps) to deal with control hazards. The resulting design draws the Mandelbrot set (with the ARM as the driver) ~1.8s faster than the non-pipelined version.
- The other half of the week was spent continuing my work on the tracing library. It turned out that my conclusions from last week were incorrect: the data was getting corrupted much sooner, but the way I was handling the buffer made the output seem correct when examining arbitrary parts of the waveform. I have therefore decided to extend the buffer size to 64K and focus on snapshots instead of continuous acquisition. Initially I also developed software-based triggering, which worked but could miss the trigger if it occurred while the buffer was being prepared to catch the next snapshot. Hence I started work on a hardware-based programmable trigger in Bluespec. This component seems to be functional; however, I sometimes get corrupt data when running the trace reader more than once. This will be examined further next week.
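The forwarding paths mentioned above boil down to a priority check on register numbers. The C function below models the operand-select logic of a generic five-stage pipeline; the stage names (EX/MEM, MEM/WB) and types are assumptions for illustration, since the TTC 3 pipeline's exact stages are not described here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where the ALU should take an operand from. */
enum fwd_src { FWD_REGFILE, FWD_EX_MEM, FWD_MEM_WB };

/* Destination-register info carried by a later pipeline stage. */
struct stage_dest {
    bool writes_reg;   /* does the instruction in this stage write a register? */
    uint8_t rd;        /* its destination register */
};

/* Select the source for source register `rs` of the instruction in EX.
 * The younger result (EX/MEM) takes priority over the older one (MEM/WB),
 * so the most recent write to `rs` is the value that gets forwarded. */
enum fwd_src forward_select(uint8_t rs, struct stage_dest ex_mem,
                            struct stage_dest mem_wb)
{
    if (ex_mem.writes_reg && ex_mem.rd == rs)
        return FWD_EX_MEM;
    if (mem_wb.writes_reg && mem_wb.rd == rs)
        return FWD_MEM_WB;
    return FWD_REGFILE;
}
```

Forwarding alone does not cover every data hazard: a value loaded from memory is typically not ready in time for the immediately following instruction, which then needs a stall.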
Week 6
This was a week filled with kernel programming. Here's what was done:
Lawrence:
- Built a more generic Bluespec component implementing a ring buffer & the accompanying Linux kernel module as a device driver.
- I plan to use this to replace the debug & console channels, and to build a network tunnel (giving the BERI internet access).
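The ring buffer itself is a Bluespec component, but its logic is easy to show in C. This minimal single-producer, single-consumer sketch uses free-running head and tail counters; the capacity and names are hypothetical. When producer and consumer sit on different sides of a bus (hardware on one end, a driver on the other), the index updates would additionally need proper ordering (memory barriers or uncached access), which this single-threaded model leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8   /* hypothetical capacity; must be a power of two */

/* `head` and `tail` are free-running counters; their difference is the
 * number of stored items, so all RB_SIZE slots are usable without a
 * separate "full" flag. Unsigned wraparound keeps the difference correct. */
struct ring {
    uint32_t data[RB_SIZE];
    uint32_t head;   /* next slot to write (advanced only by the producer) */
    uint32_t tail;   /* next slot to read (advanced only by the consumer) */
};

bool ring_push(struct ring *r, uint32_t v)
{
    if (r->head - r->tail == RB_SIZE)
        return false;                        /* full */
    r->data[r->head & (RB_SIZE - 1)] = v;
    r->head++;
    return true;
}

bool ring_pop(struct ring *r, uint32_t *v)
{
    if (r->head == r->tail)
        return false;                        /* empty */
    *v = r->data[r->tail & (RB_SIZE - 1)];
    r->tail++;
    return true;
}
```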
Petar:
- I started off by moving the entirety of my communication channel's operations to kernel space, presenting a file for the user to read from/write into. Several tests were conducted and the overall round-trip bandwidth achieved from this was as high as 16 MB/s (under certain conditions).
- My next goal has been to utilise an existing Bluespec tracing library to create a C application that would allow the ARM side to extract tracing data and convert it into a format readable by GTKWave (similar functionality to SignalTap, but completely under Bluespec control). Currently the component being debugged is dumping trace data into Lawrence's ring buffer component every clock cycle, which is then continuously read off by the application for a specified number of time steps, and converted into VCD.
- Initial tests using a traced 8-bit counter revealed that data would start to get corrupted (i.e. the component would overflow the buffer faster than the ARM could read from it) after ~82μs (about 4100 clock cycles). I then moved the actual reading code into a separate thread from the processing code and connected the two with a named pipe. This has helped immensely: I haven't yet been able to get the data corrupted at any of the time step counts I tried (the maximum being ~200ms so far). I'll try even longer times next week, and then try to add other features such as triggering.
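The conversion to VCD can be sketched as follows for the traced 8-bit counter: one sample per clock cycle, with a value written only when it changes. The ~82μs / ~4100 cycle figures above work out to roughly 20 ns per cycle (50 MHz), which is the period used in the test; the signal name `count`, the scope `top` and the function name are hypothetical, and the tracing library's real data format is not shown here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert n samples of an 8-bit signal (one per clock cycle) into VCD text
 * in `out`. A value change is emitted only when the sample differs from the
 * previous one. Returns 0 on success, -1 if `out` is too small. */
int vcd_from_samples(const uint8_t *samples, size_t n, unsigned period_ns,
                     char *out, size_t cap)
{
    int w = snprintf(out, cap,
                     "$timescale 1ns $end\n"
                     "$scope module top $end\n"
                     "$var wire 8 ! count $end\n"
                     "$upscope $end\n"
                     "$enddefinitions $end\n");
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t pos = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && samples[i] == samples[i - 1])
            continue;                        /* unchanged: nothing to record */
        char bits[9];
        for (int b = 0; b < 8; b++)          /* most significant bit first */
            bits[b] = ((samples[i] >> (7 - b)) & 1) ? '1' : '0';
        bits[8] = '\0';
        w = snprintf(out + pos, cap - pos, "#%lu\nb%s !\n",
                     (unsigned long)(i * period_ns), bits);
        if (w < 0 || (size_t)w >= cap - pos)
            return -1;
        pos += (size_t)w;
    }
    return 0;
}
```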
Week 5
We're now halfway through our internship - here are some of our impressions from the fifth week.
Lawrence:
- Successfully bridged BERI's debug unit over to the ARM and the Ethernet.
- Built a component that can be driven by the existing JTAG UART driver on BERI to allow serial communication to the ARM; used this to make a console for BERI.
- Merged all of my changes into a colleague's branch, waiting for push.
Petar:
- Over the weekend I discovered Xillybus, a custom component that takes care of establishing a communication channel between the ARM and the FPGA, immediately presenting files on the ARM side and FIFOs on the FPGA side for easy usage. This will likely not be very useful because of its restrictive licensing; however, I conducted a few tests with it to see how high a bandwidth it can achieve, and the results were in the range of 60-80 MB/s.
- Afterwards I started work on developing my own communication channel - to do this I utilised BRAMs on the FPGA side where data could be written from one side and read from the other. The sides notify each other of their status by using interrupts.
- I eventually developed and tested a communication channel library in C, with most of its actions done in userspace (only the interrupts are handled in kernel modules). Timing tests (using a NIOS processor on the FPGA side) revealed that the potential bandwidth (with data prepared prior to sending) is ~10.5 MB/s in either direction.
- This is already likely to be useful, however it is possible to speed things up and/or make them simpler by writing more elaborate kernel modules that would present files that could simply be written into or read from in userspace. I have hence started to get more involved with kernel programming, and will likely use a significant part of next week trying to move as many actions of my library as possible away from userspace.
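A minimal model of the BRAM-based channel, assuming a hypothetical layout in which word 0 of the shared BRAM holds the payload length (0 meaning empty) and the payload follows. The interrupt is represented by a callback, `count_irq` merely counts notifications for user-space testing, and all names are illustrative. Writing the length last is what stops the receiver from seeing a half-written message; on real hardware a memory barrier between the payload and length writes may also be needed.

```c
#include <stdint.h>

#define MBOX_WORDS 256   /* hypothetical BRAM size in 32-bit words */

/* Word 0: payload length in words (0 = empty); words 1.. hold the payload. */
struct mailbox {
    volatile uint32_t words[MBOX_WORDS];
};

typedef void (*notify_fn)(void *ctx);   /* stands in for raising an interrupt */

/* Stand-in "interrupt" for user-space testing: counts notifications. */
static void count_irq(void *ctx)
{
    ++*(int *)ctx;
}

/* Sender: copy the payload, then publish its length, then notify.
 * Returns -1 if the receiver has not drained the buffer or n is out of range. */
int mbox_send(struct mailbox *m, const uint32_t *data, uint32_t n,
              notify_fn notify, void *ctx)
{
    if (m->words[0] != 0 || n == 0 || n > MBOX_WORDS - 1)
        return -1;
    for (uint32_t i = 0; i < n; i++)
        m->words[1 + i] = data[i];
    m->words[0] = n;                 /* written last: marks the message complete */
    if (notify)
        notify(ctx);
    return 0;
}

/* Receiver (e.g. on interrupt): copy the payload out and mark the buffer
 * empty so the sender may reuse it. Returns the number of words received. */
uint32_t mbox_receive(struct mailbox *m, uint32_t *out, uint32_t cap)
{
    uint32_t n = m->words[0];
    if (n == 0 || n > cap)
        return 0;
    for (uint32_t i = 0; i < n; i++)
        out[i] = m->words[1 + i];
    m->words[0] = 0;
    return n;
}
```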
Week 4
Here's a summary of the work we undertook in this busy week.
Lawrence:
- Last week I synthesised BERI on the SoCKit board. This week, FreeBSD was successfully booted on BERI (after obtaining an appropriate kernel build).
- I successfully connected the debug channels for BERI with the ARM.
- The debug stream is not as efficient as hoped - I started working on improving its performance by modifying the C code controlling it.
Petar:
- Initially, the goal was to interface to PixelStream -- this was successfully performed and combined with the exercise ported last week, giving us useful output of the Mandelbrot set (with colours) over VGA. Tried a variety of configurations and resolutions - the highest achieved stable resolution is 1024 x 768.
- Afterwards I explored the possibilities of running an X server on the ARM. This might be useful to enable e.g. the CHERI processor to launch X applications over SSH, exploiting the ARM for graphical processing power. This involved several subproblems:
  - Successfully running an LXDE desktop on the ARM side; for this, an SD card image provided by Rocketboards was used.
  - The image didn't contain sshd, so it was necessary to connect the ARM to the University network to download it. This required assigning the device a MAC address; as a base for that, I used Altera's Chip ID megafunction to extract the chip's unique identifier.
  - Fixing all the required network configuration files to enable the board to access the internet.
Ultimately, we hosted an X server on the ARM and accessed it from several machines in the Computer Laboratory.
- For next week, the idea is to work on establishing a communication channel between the ARM and a processor on the FPGA; we will use the NIOS initially, as it's easier to synthesise than the BERI.
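The post doesn't say how the chip's identifier became a MAC address. One reasonable approach, sketched below under that assumption, takes the low 48 bits of the 64-bit ID, then clears the multicast bit and sets the locally-administered bit in the first octet. Using only the low bits is a hypothetical choice; hashing the full ID would make collisions between boards less likely.

```c
#include <stdint.h>
#include <stdio.h>

/* Derive a MAC address from a 64-bit chip ID (hypothetical scheme: the low
 * 48 bits of the ID, most significant byte first). Bit 0 of the first octet
 * is cleared (unicast) and bit 1 is set (locally administered), so the result
 * cannot equal a vendor-assigned, OUI-based address. */
void mac_from_chip_id(uint64_t chip_id, uint8_t mac[6])
{
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)(chip_id >> (8 * (5 - i)));
    mac[0] = (uint8_t)((mac[0] & ~0x01u) | 0x02u);
}

/* Format as "aa:bb:cc:dd:ee:ff" (17 characters plus the terminator). */
void mac_to_string(const uint8_t mac[6], char buf[18])
{
    snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

The resulting string could then be applied with `ip link set dev eth0 address <mac>` (the interface name is an assumption).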
Week 3
Lawrence and I obtained additional SoCKits at the end of week 2, so we were finally able to pursue separate ideas. As such, the blog posts will from now on be split between us.
Lawrence:
- Conducted a few tests involving getting a Bluespec component to use the DRAMs on the FPGA.
- Spent most of the week trying to synthesise BlueVecII (a vector coprocessor) onto the FPGA. It initially couldn't fit, so had to be trimmed down; however, in the end the synthesis still failed due to the way it was configured. This idea might be briefly revisited.
- Successfully placed the BERI processor on the FPGA and managed to partially boot FreeBSD on it (waiting on a more suitable build of the kernel for this board).
Petar:
- The initial goal was to fix the kernel version problem from last week that prevented us from compiling a working interrupt-handling kernel module. After many reformats, I think we finally have a good custom SD card image, containing a manually compiled kernel and a choice of useful tools. A working example of an interrupt-processing kernel module has since been built and successfully loaded; our own kernel module from last week currently fails with some undecipherable errors upon initialisation, so we may try to rewrite it later.
- The main goal for the week was porting an exercise from the Cambridge ECAD practicals to the SoCKit, using the ARM as a substitute for the NIOS and (for now) the on-board LCD instead of the tPad's touchscreen. This involved reading through and understanding Altera's FIFO utilities and the libraries for handling the ring processor for the TTCs (both originally written in C for the NIOS) and rewriting them to suit the ARM. In addition, I familiarised myself with the LCD's API. The exercise was successfully ported.
- For next week, the main plan is to interface with PixelStream, a Bluespec component that could be used to communicate with VGA devices.
Week 2
During the second week of our internship, we have:
- Successfully achieved communication FPGA -> ARM, through directly writing to the ARM's SDRAM.
- Extended this to support a logical memory model for the FPGA; more precisely:
  - On the ARM, we memory-mapped page-sized chunks and linked them together in a doubly-linked list (the start of each chunk holds information such as its size and pointers to the previous and next chunks).
  - On the FPGA side, we designed a custom Bluespec component to handle memory address translation. It consists of:
    - An Avalon slave connected to the hps2fpga bridge, receiving the address of the first page in the list from the ARM.
    - Another Avalon slave, which can be linked to any component's master, receiving a request to write at a particular logical address (word-indexed from 0); this request is then translated into the actual address(es) on the ARM and sent to
    - An Avalon master connected to the fpga2hps bridge, writing directly into the relevant page(s) in the linked list structure on the ARM's SDRAM.
- This is currently all implemented using polling; we are now trying to get interrupts to work. We wrote a kernel module for the ARM that receives interrupts, and additional C code that processes them (by reading from a "file" provided by the module). However, we haven't got it working yet, probably because the Linux kernel version on the board differs from the one used to compile the module.
- We also obtained additional SoCKit boards, which should help speed up development and give us a wider range of things to look at.
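The translation performed by the Bluespec component can be modelled in C by walking the chunk list and subtracting each chunk's size until the logical word index falls inside a chunk. The header layout and the tiny `CHUNK_WORDS` below are hypothetical, and C pointers stand in for the physical addresses that the FPGA actually follows over the fpga2hps bridge.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_WORDS 4   /* tiny for illustration; a real chunk would fill a page */

/* Hypothetical chunk layout: header fields followed by the data words. */
struct chunk {
    uint32_t size;            /* number of valid data words in this chunk */
    struct chunk *prev;
    struct chunk *next;
    uint32_t data[CHUNK_WORDS];
};

/* Translate a word-indexed logical address into a pointer to the backing
 * word, walking the list from `head`. Returns NULL past the end of the list. */
uint32_t *translate(struct chunk *head, uint32_t logical)
{
    for (struct chunk *c = head; c != NULL; c = c->next) {
        if (logical < c->size)
            return &c->data[logical];
        logical -= c->size;   /* skip over this chunk's words */
    }
    return NULL;
}
```

Walking from the head on every access costs time proportional to the number of chunks; a hardware version would more likely remember the current chunk and only move to the next (or previous) one when an address crosses a chunk boundary.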
Week 1
During the first week of our internship, we have:
- Attended Altera's training session in High Wycombe;
- Learnt the basics of Bluespec SystemVerilog and used it to program several example modules;
- Got all of Quartus' tools to work by locating missing libraries, fixing rules for USB device permissions, etc;
- Booted the ARM processor on the SoCKit using an SD card containing Linux;
- Interfaced to the ARM in two ways: initially using Minicom over a USB connection, now also via Ethernet+SSH to ease file transfer.
- Successfully programmed the FPGA using the ARM (on-boot) as well as JTAG.
- Established basic communication ARM -> FPGA, in the following manner:
  - Wrote a simple BSV module implementing an Avalon slave that takes a number and returns it tripled;
  - Connected the module to an 'HPS' component in Qsys and programmed the full design to the FPGA (using the ARM);
  - Wrote a C program that writes integers to a memory-mapped hps2fpga bridge;
  - Successfully ran the C program on the ARM chip, getting tripled values back from the FPGA.
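A sketch of what such an ARM-side program might look like, assuming the tripling slave sits at offset 0 of the HPS-to-FPGA bridge and answers a read of the same register with the tripled value (the real offset is whatever Qsys assigned). On Cyclone V the HPS-to-FPGA bridge starts at physical address 0xC0000000 (the lightweight bridge is at 0xFF200000); check both against the design's memory map. Mapping `/dev/mem` requires root, and the function names are illustrative.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t so 0xC0000000 fits on 32-bit ARM */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define H2F_BRIDGE_BASE 0xC0000000UL  /* Cyclone V HPS-to-FPGA bridge */
#define MAP_SPAN        0x1000UL      /* one page: enough for a single slave */

/* Map the start of the bridge into this process via /dev/mem.
 * Returns NULL on failure (e.g. when not running as root). */
volatile uint32_t *map_bridge(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, MAP_SPAN, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   (off_t)H2F_BRIDGE_BASE);
    close(fd);                        /* the mapping remains valid after close */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Write x to the slave's register and read back its response.
 * `volatile` ensures the compiler really performs both bus accesses. */
uint32_t bridge_roundtrip(volatile uint32_t *reg, uint32_t x)
{
    *reg = x;
    return *reg;
}
```

On the board, after checking that `map_bridge()` did not return NULL, passing its result and 14 to `bridge_roundtrip` should yield 42 if the slave is mapped as assumed; against ordinary memory the same call simply echoes the value back.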
For the following week, we hope to:
- Achieve communication FPGA -> ARM (hopefully directly accessing SDRAM, maybe peripherals?);
- Obtain a second SoC board.
- Decide on a project.
Lawrence and Petar