GitHub is under automated attack by millions of cloned repositories filled with malicious code
GitHub has become a vital resource for programmers the world over, and an extensive knowledge base and repository for open-source coding projects, data storage and code management. However, the site is currently undergoing an automated attack involving the cloning and creation of huge numbers of malicious code repositories, and while the developers have been working to remove the affected repos, a significant amount are said to survive, with more uploaded on a regular basis.
An unknown attacker has managed to create and deploy an automated process that forks and clones existing repositories, adding its own malicious code which is concealed under seven layers of obfuscation (via Ars Technica). These rogue repositories are difficult to tell from their legitimate counterparts, and some users unaware of the malicious nature of the code are forking the affected repos themselves, unintentionally adding to the scale of the attack.
Once a developer makes use of an affected repo, a hidden payload begins unpacking seven layers worth of obfuscation, including malicious Python code and a binary executable. The code then sets to work collecting confidential data and login details before uploading it to a control server.
Research and data teams at security provider Apiiro have been monitoring a resurgence of the attack since its relatively minor beginnings back in May of last year. And while the company says that GitHub has been quickly removing the affected repositories, its automation detection system is still missing many of them, and manually uploaded versions are still slipping the net.
Given the current scale of the attack, said by the researchers to be in the millions of uploaded or forked repositories, even a 1% miss-rate still means potentially thousands of compromised repos still on the site.
While the attack was initially somewhat small-scale when it was first documented, with several packages detected on the site with early versions of the malicious code, it has gradually developed in size and sophistication. The researchers have identified several potential reasons for the success of the operation thus far, including the overall size of GitHub’s user base and the developing complexity of the technique.
What’s really intriguing here is the combination of sophisticated automated attack methods and simple human nature. While the methods of obfuscation have become increasingly complex, the attackers have relied heavily on social engineering to confuse developers into picking the malicious code over the real one and unintentionally spreading it onwards, compounding the attack and making it much harder to detect.
As things stand this method seems to have worked remarkably well, and while GitHub has yet to comment on the attack directly, it did issue a general statement reassuring its users that “We have teams dedicated to detecting, analyzing, and removing content and accounts that violate our Acceptable Use Policies. We employ manual reviews and at-scale detection that use machine learning and constantly evolve and adapt to adversarial attacks”.
The perils of becoming popular, it seems, have manifested themselves here. While GitHub remains a vital resource for developers worldwide, its open-source nature and huge user base appears to have left it somewhat vulnerable, although given the effectiveness of the method, it comes as no surprise that solving the issue entirely seems to be an uphill battle that GitHub has yet to overcome.