This article is more than 1 year old

Even big data devs make big data security gaffes

Is that a useful tool or a compromised .exe? Only one way to find out, apparently

Apache Big Data Europe Big data application programmers routinely download and execute unverified code, opening the door to information-stealing hackers, a security researcher has claimed.

Olaf Flebbe, chief software architect at European software integrator Science+Computing, is upset that software engineers have got into the habit of insecurely reusing components. This puts the organization a developer works for, as well as its clients and partners, at risk of compromise, he said.

We should note that this isn't really a problem limited to big data apps: securing package managers to fend off malicious updates is necessary for all programming languages, operating systems and areas of software development. Flebbe was speaking at the Apache Big Data conference in Seville, Spain, however, hence the big data connection.

Flebbe's research is based on testing a Big Data integration project compiling over 11 Mio Lines of Java Code: The Apache Bigtop project.

Flebbe said miscreants can set up shop on expired web domains used by abandoned projects to push out tampered builds of code to unsuspecting coders. Another trouble area, we're told, is Maven – the Apache build manager for Java projects. The risk on this front arises from configuring insecure repositories for Maven. For instance using http://repo.maven.org rather https://repo.maven.org in the "pom.xml" file.

During his presentation at the Apache conference last week, Flebbe demonstrated an exploit involving Apache Flink – an open-source stream and batch-processing tool that can be installed via Maven. He showed how it was possible to fool Maven into downloading and running an arbitrary executable, a type of attack that creates a means to hack into targeted Windows machines. The point was to show that it is possible to trick Maven users into unknowingly bringing malicious software onto their computers.

The demonstrated attack could only be pulled off from a privileged network position – for instance, in the same subnet as the victim. The hacker has to set up a web server on the local network that offers booby-trapped packages, and use DNS forgery to point a mark towards the hacker’s box, thus inducing Maven into downloading a dodgy package.

Flebbe reported it to the Apache Flink Community, which fixed the weakness in Apache Flink Version 1.1.3.

Developers should migrate at least to Maven 3.3.x, a version of the software that uses and validates secure TLS connections to thwart the aforementioned man-in-the-middle interference.

"Download mirrors, for instance the official Apache Distribution Sites, often do not offer TLS, nor would this help to provide trust: Software engineers should check keys or checksum," Flebbe explained. "Do not trust ... verify."

"These kind of errors are frequently seen in mainstream development too," he added.

Downloading software over unencrypted links opens the door to man-in-the-middle attacks, a weakness that’s well understood among security pros. Despite this, only a few download mirrors for Tomcat, an open-source Java servlet container, support TLS.

During his research, Flebbe discovered that Zookeeper – a centralized service for maintaining configuration information – was using abandoned repositories and non-TLS-validated resources. He reported these issues, which were resolved with Zookeeper 3.4.10, 3.5.3 and 3.6.0.

Security issues can be a bug bear for developers working on big data projects, in a similar way to mainstream developers.

A techie at a big data integrator gave us a a rundown of security-related topics in big data. Security internally in a cluster, related to transport security mostly, needs to be addressed alongside data encryption and access security issues. Projects like Apache Ranger or Apache Knox can help developers in delivering a secure project, he added. ®

More about

TIP US OFF

Send us news


Other stories you might like