Most developers have a very distinct programming style.
Mine tends to be:
Functions that do not exceed 4 parameters (because the 5th parameter cannot be passed in a register, so you lose performance when that 5th parameter has to be stored in RAM).
Methods that are short, since I write code that normally fits on one screen without any need for scrolling.
Methods whose code can be used in multithreading with almost no locking requirements.
Speed-performant code, where the compiled code will always reach the end of the method without a conditional jump for the most used functionality. This avoids unnecessary jumps in the assembler output, so the CPU does not get a cache miss and have to reload the instructions from RAM.
My code tends to fail gracefully, and I avoid throwing any exceptions.
My code barely relies on libraries.
When deleting from a list, I always use a reverse loop to prevent accidentally deleting the next item.
Non-repetitive assembly code, since I do not copy and paste code.
Reuse of code that I already developed on other projects.
Other clues will be in the usage of "strings" data.
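To illustrate the reverse-loop habit from the list above, here is a minimal Python sketch. The function names and sample data are made up for illustration and are not taken from the post. It shows one way a forward index loop can go wrong when deleting (the item that slides into the freed slot gets skipped), and why walking the list backwards avoids that.

```python
def delete_forward(items, should_delete):
    """Naive forward index loop, shown only to illustrate the bug."""
    items = list(items)  # work on a copy
    i = 0
    while i < len(items):
        if should_delete(items[i]):
            del items[i]  # the next item slides into slot i ...
        i += 1            # ... and is skipped when the index advances
    return items


def delete_reverse(items, should_delete):
    """Reverse loop: a deletion only shifts items that were already visited."""
    items = list(items)  # work on a copy
    for i in range(len(items) - 1, -1, -1):
        if should_delete(items[i]):
            del items[i]
    return items


def is_even(n):
    return n % 2 == 0


data = [2, 4, 5, 6, 8]
forward_result = delete_forward(data, is_even)  # [4, 5, 8]: 4 and 8 were skipped
reverse_result = delete_reverse(data, is_even)  # [5]
print(forward_result, reverse_result)
```

A habit like this is exactly the kind of recurring pattern that could show up across everything one person writes.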
FYI: AI tools can unmask anonymous coders from their binary executables • The Register
10 comments
0 u/roznak [OP] 16 Mar 2018 22:27
AI tool, you don't need AI for this.
0 u/Gargilius 16 Mar 2018 22:56
AI tools with VR and block chains ! :-) (that’s how you sell an idea these days, didn’t you get the memo?)
0 u/Nini 21 Mar 2018 15:08
They need those clickbait words; the accuracy of that 'tool' is absurdly low. It is no better than dividing programmers into 8 groups of the same size with similar coding styles, and blaming an entire group of them just because their coding style matches one of the malware samples. They could just as well have one piece of malware matching every coding-style cluster and use it to imprison any programmer who ever published code. This is a direct attack against open source.
0 u/the_sharpest_knife 16 Mar 2018 22:58
What do you mean? There's another way to identify a digital fingerprint?
0 u/roznak [OP] 17 Mar 2018 00:07
Of course, it is called "statistics".
Most developers have a very distinct programming style. Mine tends to be:
Other clues will be in the usage of "strings" data.
0 u/the_sharpest_knife 17 Mar 2018 03:00
I think you may be misunderstanding what they're talking about in this article. They're using a "digital fingerprint" of sorts mined from several repos and executables all over the internet. Similar to the way the NSA identified Satoshi Nakamoto.
https://en.wikipedia.org/wiki/Stylometry#Data_and_methods
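For a rough feel of what a stylometric "fingerprint" comparison can look like, here is a toy Python sketch. It is not the method from the article (which works on decompiled binaries with machine-learning models); it only counts identifier and punctuation tokens in source text and compares the counts with cosine similarity. The feature choice, the author labels and the code samples are all hypothetical.

```python
import math
import re
from collections import Counter


def style_vector(source):
    """Count identifier and punctuation tokens (a crude, assumed feature set)."""
    tokens = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", source)
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical samples from two known authors and one unknown snippet.
known = {
    "author_a": "for (int i = 0; i < n; i++) { total += values[i]; }",
    "author_b": "values.forEach(v => { sum = sum + v })",
}
unknown = "for (int j = 0; j < n; j++) { count += flags[j]; }"

unknown_vec = style_vector(unknown)
scores = {
    name: cosine_similarity(style_vector(code), unknown_vec)
    for name, code in known.items()
}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

With samples this small the result says very little; a real attribution would need far more code per author and much richer features.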
0 u/derram 16 Mar 2018 22:41
https://archive.fo/HNRzR :
'Essentially, the machine code can be decompiled back into a C-like language, and then traced back to known programmers using machine-learning algorithms. '
'They still managed 88 per cent accuracy in identifying authors. '
'The technique can help identify virus makers as well as unmask the creators of anti-censorship tools and other outlawed programs. '
'Programmers can be potentially identified from the low-level machine-code instructions in their software executables by AI-powered tools. '
' "Randomized pseudonyms with randomized code and malware (unless a malware author is planning to share only one code sample with the entire world), as well as the above methods is a starting step." ®'
This has been an automated message.
0 u/Omnicis 16 Mar 2018 23:41
"Stylometry" has its limits, especially if you rely heavily on Stack Overflow code.
0 u/redog 20 Mar 2018 16:16
Cool, so can we use this to figure out who the real Satoshi Nakamoto is?
0 u/ceopsm 28 Mar 2018 08:15
That's great