The Queen's Guard: A Secure Enforcement of Fine-grained Access Control In Distributed Data Analytics Platforms
Distributed data analytics platforms (e.g., Apache Spark, Hadoop) provide high-level APIs for writing analytics tasks that run in a distributed fashion across multiple computing nodes. The design of these frameworks was motivated primarily by performance and usability, so security took a back seat. Consequently, they neither inherently support fine-grained access control nor offer a plugin mechanism to enable it, which makes them \textit{risky} to use in multi-tier organizational settings.
There have been attempts to build ``add-on'' solutions that enable fine-grained access control for distributed data analytics platforms. In this paper, we first show that straightforward enforcement of ``add-on'' access control is insecure under \textit{adversarial} code execution. \textit{Specifically, we show that an attacker can abuse platform-provided APIs to evade access controls without leaving any traces.} Second, we design a new fine-grained access control framework with enhanced policy definitions. Finally, we design a defense system named {\bfseries \scshape SecureDL} that securely enforces the access control framework under adversarial code execution. Our defense system has two layers, \textit{proactive} and \textit{reactive}, to protect against API abuse. When a user submits code, the proactive security layer statically screens it for potential attack signatures before it executes. The reactive security layer employs instrumentation-based runtime checks and sandboxed execution to thwart exploits at runtime.
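The proactive screening step can be illustrated with a minimal sketch. It is not the SecureDL implementation: it assumes user code is submitted as Python (PySpark) source, and the function name \texttt{screen} and both denylists are hypothetical choices made for this example, not attack signatures taken from this work.

```python
import ast
from typing import List

# Hypothetical denylists, chosen only for illustration.
# The private PySpark attributes below expose the underlying JVM gateway.
DENIED_ATTRIBUTES = {"_jvm", "_jsc", "_gateway"}
DENIED_MODULES = {"ctypes", "subprocess"}


def screen(source: str) -> List[str]:
    """Return a list of findings for suspicious constructs in submitted code."""
    findings = []
    # Parse without executing; a SyntaxError propagates to the caller.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in DENIED_ATTRIBUTES:
            findings.append(f"line {node.lineno}: access to '{node.attr}'")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in DENIED_MODULES:
                    findings.append(f"line {node.lineno}: import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in DENIED_MODULES:
                findings.append(f"line {node.lineno}: import from '{node.module}'")
    return findings


if __name__ == "__main__":
    submitted = "import subprocess\nx = spark.sparkContext._jvm.java.lang.System\n"
    for finding in screen(submitted):
        print(finding)
```

A purely syntactic check of this kind misses names that are assembled at runtime, for example through \texttt{getattr} with a computed string, which is one reason a static layer is paired with runtime checks and sandboxing.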
To the best of our knowledge, this is the first fine-grained attribute-based access control framework for distributed data analytics platforms that is secure against platform API abuse attacks. Our performance evaluation shows that the overhead introduced by the added security is low.