Accurate, efficient and scalable graph embedding
Throughput-optimized frequency domain CNN with fixed-point quantization on FPGA
A fast and efficient parallel algorithm for pruned landmark labeling
An FPGA framework for edge-centric graph processing
A framework for generating high throughput CNN implementations on FPGAs
Quickly finding a truss in a haystack
Design and implementation of parallel PageRank on multicore platforms