Written by Noda. (Updated 2017/10/19)

A memo for people who want to do high-level synthesis with OpenCL.
With this, you can at least get an addition working.

This is a simple guide to implementing designs on the Arria 10 SoC using the high-level synthesis environment "Intel FPGA SDK for OpenCL". After reading this memo, you should be able to implement a simple addition circuit on the Arria 10 with this environment. If you want to optimize your code, please read the following references.

Introduction

The Arria 10 SoC is a system on chip from Intel that combines an ARM CPU and an FPGA.
A strength of the Arria 10 is that the FPGA embeds hard macros (DSP blocks) for single-precision floating-point operations. Please note that the DSP blocks do not support double-precision operations.

How to develop

Preparation for implementation

The sample addition code is placed in the following directory.

    /home/asap2/noda/arria_test

It contains two directories, "add" and "common". The addition code is in "add", and it will not work without the "common" directory. Copy the "arria_test" directory and move into its "add" directory. From here on, we assume that your current directory is "add".

We will ssh and scp to the Arria 10 later. Before that, your public key (".ssh/id_rsa.pub") must be added to authorized_keys on the Arria 10. If you email your public key to asap@am.ics.keio.ac.jp, we will add it, and you will then be able to connect to the Arria 10.

Emulation

Before implementing the circuit on the FPGA, we debug it on a CPU, on the machine neutrino.
First, ssh to neutrino, then copy "~noda/.bash_profile" and source it. Then run emu_go:

    bash-4.1$ ./emu_go
    aoc: Environment checks are completed successfully.
    You are now compiling the full flow!!
    aoc: Selected target board a10soc_2ddr
    aoc: Running OpenCL parser....
    aoc: OpenCL parser completed successfully.
    aoc: Compiling for Emulation ....
    aoc: Emulator Compilation completed successfully.
    Emulator flow is successful.
    To execute emulated kernel, invoke host with
        env CL_CONTEXT_EMULATOR_DEVICE_ALTERA=1 <host_program>
    For multi device emulations replace the 1 with the number of devices you which to emulate
    Initializing OpenCL
    Platform: Altera SDK for OpenCL
    Using 1 device(s)
      EmulatorDevice : Emulated Device
    Using AOCX: add.aocx
    Arria 10 SoC
    Turn_around_Time: 0.712237 ms
    Kernel time (device 0)(getStartEndTime): 0.619050 ms
    Output: 93.649620
    Reference: 93.649620
    Verification: PASS

This lets you check the flow of the calculation on the CPU. Keep debugging the host and kernel code until they work properly.

Next, check the estimated resource usage:

    bash-4.1$ cd device/
    bash-4.1$ ./emu_resource
    aoc: Environment checks are completed successfully.
    aoc: Selected target board a10soc_2ddr
    aoc: Running OpenCL parser....
    aoc: OpenCL parser completed successfully.
    aoc: Compiling....
    aoc: Linking with IP library ...
    +--------------------------------------------------------------------+
    ; Estimated Resource Usage Summary                                   ;
    +----------------------------------------+---------------------------+
    ; Resource                               + Usage                     ;
    +----------------------------------------+---------------------------+
    ; Logic utilization                      ;    2%                     ;
    ; ALUTs                                  ;    1%                     ;
    ; Dedicated logic registers              ;    1%                     ;
    ; Memory blocks                          ;    3%                     ;
    ; DSP blocks                             ;    0%                     ;
    +----------------------------------------+---------------------------;
    aoc: First stage compilation completed successfully.
    aoc: To compile this project, run "aoc add.aoco"

Floating-point operations are mapped to the DSP blocks automatically. In the table above the DSP usage is shown as 0%, but this is presumably because the circuit is too small for the usage to register. The DSP blocks do appear to be used properly.
Compile kernel code

First, run the shell script "aocl_shell" to launch the Altera Embedded Command Shell. Then run aocx_go:

    bash-4.1$ ./aocx_go
    aoc: Environment checks are completed successfully.
    You are now compiling the full flow!!
    aoc: Selected target board a10soc_2ddr
    aoc: Running OpenCL parser....
    aoc: OpenCL parser completed successfully.
    aoc: Compiling....
    aoc: Linking with IP library ...
    +--------------------------------------------------------------------+
    ; Estimated Resource Usage Summary                                   ;
    +----------------------------------------+---------------------------+
    ; Resource                               + Usage                     ;
    +----------------------------------------+---------------------------+
    ; Logic utilization                      ;    2%                     ;
    ; ALUTs                                  ;    1%                     ;
    ; Dedicated logic registers              ;    1%                     ;
    ; Memory blocks                          ;    3%                     ;
    ; DSP blocks                             ;    0%                     ;
    +----------------------------------------+---------------------------;
    aoc: First stage compilation completed successfully.
    aoc: Hardware generation completed successfully.

When compilation starts, the directory "to_a10soc" specified in the shell script is created. It contains an intermediate file "add.aoco" and a directory "add" holding various data. When compilation finishes, the binary file "add.aocx" is generated in "to_a10soc".

Transfer aocx file and host code to Arria 10

After compiling the kernel code, transfer the generated aocx file and the (uncompiled) host code to the Arria 10 with scp. Here we use the shell script "go_scp" in the directory "to_a10soc". Please change the transfer destination to your own.

    ./to_a10soc/go_scp

Compile the host code on Arria 10

Ssh to the Arria 10.

    ssh root@131.113.69.239

Currently everyone logs in as superuser, so be careful about what you do. Then source the initialization script:

    source ~/init_opencl.sh

After that, move to the transfer destination directory on the Arria 10. In this example, "~/test/" contains the aocx file, "main.cpp", and a previously prepared "Makefile".
    root@Arria10_linaro:~/test/test_add# make clean
    root@Arria10_linaro:~/test/test_add# make all
    ../common/src/AOCLUtils/opencl.cpp: In function ‘void* aocl_utils::alignedMalloc(size_t)’:
    ../common/src/AOCLUtils/opencl.cpp:55:49: warning: ignoring return value of ‘int posix_memalign(void**, size_t, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
       posix_memalign (&result, AOCL_ALIGNMENT, size);
                                                     ^
    ../common/src/AOCLUtils/opencl.cpp: In function ‘bool aocl_utils::setCwdToExeDir()’:
    ../common/src/AOCLUtils/opencl.cpp:278:14: warning: ignoring return value of ‘int chdir(const char*)’, declared with attribute warn_unused_result [-Wunused-result]
       chdir(path);
                  ^

This creates a directory "bin". Inside it is "host", the compiled host code.

    root@Arria10_linaro:~/test/test_add# ./bin/host
    Initializing OpenCL
    Platform: Altera SDK for OpenCL
    Using 1 device(s)
      a10soc_2ddrArria 10 SoC Development Kit
    Using AOCX: add.aocx
    Reprogramming device with handle 1
    Arria 10 SoC
    Turn_around_Time: 1.022762 ms
    Kernel time (device 0)(getStartEndTime): 0.107940 ms
    Output: 93.649620
    Reference: 93.649620
    Verification: PASS

Congrats! Now we are the kings of addition!!!

Others (GUI Profiler)

If you compile the kernel code with the "--profile" option and then run it on the FPGA, "profile.mon" is generated in the "bin" directory. Transfer the mon file back to neutrino (using go_mon) and run the "aocl report" command with the mon file and the aocx file (an aoco file also works). The GUI profiler then launches. (Do not forget to enable X forwarding.)

    bash-4.1$ ./go_mon
    Enter passphrase for key '/home/hlab/hoge/.ssh/id_rsa':
    profile.mon                    100%   97     0.1KB/s   00:00
    bash-4.1$ aocl report profile.mon add.aocx &

I have run out of energy for now, so the rest will come another time. Please add your knowledge to this wiki!!!