Moving brute-force search to FPGAfpga internal metastabilityVHDL Fpga debouncingPattern Recogniser on FPGATransmitting HDMI/DVI over an FPGA with no support for TMDSFPGA Test Equipmentfpga clock muxingswapping char VHDL FPGAFPGA input synchronisationReverse Search NTEBrute-force convolution reverb in FPGA

Why does a simple loop result in ASYNC_NETWORK_IO waits?

Moving brute-force search to FPGA

Does the Linux kernel need a file system to run?

How to fade a semiplane defined by line?

Can I visit Japan without a visa?

How to hide some fields of struct in C?

The IT department bottlenecks progress. How should I handle this?

Why can Carol Danvers change her suit colours in the first place?

Limits and Infinite Integration by Parts

What are the advantages of simplicial model categories over non-simplicial ones?

Strong empirical falsification of quantum mechanics based on vacuum energy density

Pre-mixing cryogenic fuels and using only one fuel tank

Extract more than nine arguments that occur periodically in a sentence to use in macros in order to typset

What should you do when eye contact makes your subordinate uncomfortable?

Redundant comparison & "if" before assignment

Why is the "ls" command showing permissions of files in a FAT32 partition?

Plot of a tornado-shaped surface

How should I respond when I lied about my education and the company finds out through background check?

How to cover method return statement in Apex Class?

Lowest total scrabble score

How do apertures which seem too large to physically fit work?

Temporarily disable WLAN internet access for children, but allow it for adults

Why does AES have exactly 10 rounds for a 128-bit key, 12 for 192 bits and 14 for a 256-bit key size?

Can the US President recognize Israel’s sovereignty over the Golan Heights for the USA or does that need an act of Congress?

Moving brute-force search to FPGA

fpga internal metastabilityVHDL Fpga debouncingPattern Recogniser on FPGATransmitting HDMI/DVI over an FPGA with no support for TMDSFPGA Test Equipmentfpga clock muxingswapping char VHDL FPGAFPGA input synchronisationReverse Search NTEBrute-force convolution reverb in FPGA

I am currently working on a scientific hobby project about computing the error detection capabilities of CRCs. Unfortunately the C++ code used for such computations has up to years of run time on normal x64 CPUs, even on multi core systems. Also the power consumption of such systems is a pain.

It came to my mind that the common way of x64 brute-force-searching isn't the best. I would like to move the algorithm to an FPGA. Alas I have worked very little with FPGAs and I lost the minimal knowledge after working in C/C++ software engineering for decades. So I need a little help about the feasibility of my idea before burying myself into the technology.

The algorithm I want to run in hardware is a specialized ~1000 line C++ code that could easily be ported to C. No floating point operations. No standard libraries required. High frequent loops. Lots of basic 64 bit integer arithmetic. Even more binary operations (shift, or, xor, bit-counting, etc.) and some array operations. A few kB of RAM and ROM should be sufficient. No peripherals required. Very few memory allocations are used that could be removed by adapting the code. The computation results can be easily filtered internally so a serial interface should be enough to pass the results to a PC.

I would like to compile the C++ or C code into VHDL code and let it run on a FPGA as fast as possible. Also, since this is a hobby project, the FPGA (including software and a developer board) should be affordable.

My questions:

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

$begingroup$
Ultimately running an algorithm in an FPGA instead of software can be faster but it really depends on the algorithm details. Essentially you will gain speed if you can parallelize or pipeline the data flow. If 64 simple operations need to be applied to a single point before it can be fully processed, the FPGA can pipeline them so that a new result comes out every clock cycle. But I don't know if your algorithm is like that.
$endgroup$
– mkeith
6 hours ago

$begingroup$
Almost the entire algorithm can be highly parallelized. The algorithm processes a single dataword. For a 32 bit CRC there are ~2^30 datawords á 32 bit that can easily be processed independently (except final comparing/filtering of the results that needs to be serialized).
$endgroup$
– Silicomancer
6 hours ago

$begingroup$
Have you considered using GPU acceleration for this? Implementation will be a lot simpler, as well as less expensive.
$endgroup$
– duskwuff
5 hours ago

add a comment |

My questions:

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

$begingroup$
Ultimately running an algorithm in an FPGA instead of software can be faster but it really depends on the algorithm details. Essentially you will gain speed if you can parallelize or pipeline the data flow. If 64 simple operations need to be applied to a single point before it can be fully processed, the FPGA can pipeline them so that a new result comes out every clock cycle. But I don't know if your algorithm is like that.
$endgroup$
– mkeith
6 hours ago

$begingroup$
Almost the entire algorithm can be highly parallelized. The algorithm processes a single dataword. For a 32 bit CRC there are ~2^30 datawords á 32 bit that can easily be processed independently (except final comparing/filtering of the results that needs to be serialized).
$endgroup$
– Silicomancer
6 hours ago

$begingroup$
Have you considered using GPU acceleration for this? Implementation will be a lot simpler, as well as less expensive.
$endgroup$
– duskwuff
5 hours ago

add a comment |

My questions:

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

My questions:

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

fpga vhdl component-selection compiler

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

edited 6 hours ago

asked 6 hours ago

Silicomancer

1114

New contributor

asked 6 hours ago

Silicomancer

1114

asked 6 hours ago

Silicomancer

1114

New contributor

Silicomancer is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

$begingroup$
Ultimately running an algorithm in an FPGA instead of software can be faster but it really depends on the algorithm details. Essentially you will gain speed if you can parallelize or pipeline the data flow. If 64 simple operations need to be applied to a single point before it can be fully processed, the FPGA can pipeline them so that a new result comes out every clock cycle. But I don't know if your algorithm is like that.
$endgroup$
– mkeith
6 hours ago

$begingroup$
Almost the entire algorithm can be highly parallelized. The algorithm processes a single dataword. For a 32 bit CRC there are ~2^30 datawords á 32 bit that can easily be processed independently (except final comparing/filtering of the results that needs to be serialized).
$endgroup$
– Silicomancer
6 hours ago

$begingroup$
Have you considered using GPU acceleration for this? Implementation will be a lot simpler, as well as less expensive.
$endgroup$
– duskwuff
5 hours ago

add a comment |

$begingroup$
Ultimately running an algorithm in an FPGA instead of software can be faster but it really depends on the algorithm details. Essentially you will gain speed if you can parallelize or pipeline the data flow. If 64 simple operations need to be applied to a single point before it can be fully processed, the FPGA can pipeline them so that a new result comes out every clock cycle. But I don't know if your algorithm is like that.
$endgroup$
– mkeith
6 hours ago

$begingroup$
Almost the entire algorithm can be highly parallelized. The algorithm processes a single dataword. For a 32 bit CRC there are ~2^30 datawords á 32 bit that can easily be processed independently (except final comparing/filtering of the results that needs to be serialized).
$endgroup$
– Silicomancer
6 hours ago

$begingroup$
Have you considered using GPU acceleration for this? Implementation will be a lot simpler, as well as less expensive.
$endgroup$
– duskwuff
5 hours ago

Ultimately running an algorithm in an FPGA instead of software can be faster but it really depends on the algorithm details. Essentially you will gain speed if you can parallelize or pipeline the data flow. If 64 simple operations need to be applied to a single point before it can be fully processed, the FPGA can pipeline them so that a new result comes out every clock cycle. But I don't know if your algorithm is like that.

– mkeith
6 hours ago

Almost the entire algorithm can be highly parallelized. The algorithm processes a single dataword. For a 32 bit CRC there are ~2^30 datawords á 32 bit that can easily be processed independently (except final comparing/filtering of the results that needs to be serialized).

– Silicomancer
6 hours ago

Have you considered using GPU acceleration for this? Implementation will be a lot simpler, as well as less expensive.

– duskwuff
5 hours ago

add a comment |

1 Answer
1

active

oldest

votes

Can I expect a significant speedup? By which order of magnitude?

Sure, by quite a lot. CRCs can be computed on data a byte a a time using a straightforward table lookup. A moderate-sized FPGA (say, a Xilinx XC6SLX75) will have a hundred or more blocks of internal dual-port RAM that allow 200 data streams to be processed in parallel at a rate of one byte per clock cycle, where the clock could be 200 MHz or more. That's a throughput of at least 40 GB/s. How fast is your "x64" CPU?

Is there a C/C++ compiler suited for the purpose?

Not really. If you want to get the most out of your FPGA, you'll want to use an HDL to define the hardware datapath directly. Implementations derived from programming languages are possible, but the performance ranges from lousy to useless.

Which FPGAs are suitable?

That's bordering on a product recommendation, which would be off-topic for this site, but look at the midrange offerings from Xilinx (such as the Spartan-6 series) or Intel (formerly Altera, such as their Cyclone IV series). Inexpensive development boards for these families are readily available from places like Digilent.

answered 5 hours ago

Dave Tweed♦

121k9152263

$begingroup$
A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.
$endgroup$
– alex.forencich
1 hour ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
return StackExchange.using("schematics", function ()
StackExchange.schematics.init();
);
, "cicuitlab");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "135"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

Silicomancer is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');

var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);

;
$window.on('scroll', onScroll);

);

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2felectronics.stackexchange.com%2fquestions%2f428626%2fmoving-brute-force-search-to-fpga%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

answered 5 hours ago

Dave Tweed♦

121k9152263

$begingroup$
A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.
$endgroup$
– alex.forencich
1 hour ago

add a comment |

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

answered 5 hours ago

Dave Tweed♦

121k9152263

$begingroup$
A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.
$endgroup$
– alex.forencich
1 hour ago

add a comment |

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

answered 5 hours ago

Dave Tweed♦

121k9152263

Can I expect a significant speedup? By which order of magnitude?

Is there a C/C++ compiler suited for the purpose?

Which FPGAs are suitable?

answered 5 hours ago

Dave Tweed♦

121k9152263

answered 5 hours ago

Dave Tweed♦

121k9152263

answered 5 hours ago

Dave Tweed♦

121k9152263

answered 5 hours ago

Dave Tweed♦

121k9152263

$begingroup$
A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.
$endgroup$
– alex.forencich
1 hour ago

add a comment |

$begingroup$
A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.
$endgroup$
– alex.forencich
1 hour ago

A table lookup is really not the best way to do it on an FPGA because it only works for small inputs. Since CRC is just a bunch of XOR gates, what you can do is run a wide data bus (and get a really high data rate) and then do an unrolled parallel CRC. You can relatively easily do a CRC 32 over data 64 bits at a time at 400 MHz, which gives you 25 Gbps, then drop a bunch of instances on the FPGA to run in parallel.

– alex.forencich
1 hour ago

add a comment |

Silicomancer is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Silicomancer is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Electrical Engineering Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

Post as a guest

Name

Required, but never shown

Sign up or log in

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Ttykuu

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

格濟夫卡參考資料导航菜单51°3′40″N 34°2′21″E / 51.06111°N 34.03917°E / 51.06111; 34.03917ГезівкаПогода в селі 编辑或修订编

聖斯德望教堂 (塞克什白堡) 參考資料导航菜单

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

格濟夫卡 參考資料 导航菜单51°3′40″N 34°2′21″E﻿ / ﻿51.06111°N 34.03917°E﻿ / 51.06111; 34.03917ГезівкаПогода в селі 编辑或修订编

聖斯德望教堂 (塞克什白堡) 參考資料 导航菜单

1 Answer
1

1 Answer
1

1 Answer
1

格濟夫卡參考資料导航菜单51°3′40″N 34°2′21″E / 51.06111°N 34.03917°E / 51.06111; 34.03917ГезівкаПогода в селі 编辑或修订编

聖斯德望教堂 (塞克什白堡) 參考資料导航菜单