T5. Big Data Analytics Over Encrypted Data
Tuesday, 6 December 2016
13:30 - 17:00
Salon 5
With the increasing growth of cloud services, machine learning services can be run on cloud providers' infrastructure, essentially offering Machine Learning as a Service (MLaaS). However, machine learning solutions require access to the raw data, which creates potential security and privacy risks. Therefore, we need solutions that run machine learning algorithms on encrypted data, allowing the parties to provide or receive the service without revealing their sensitive data to one another. In this tutorial, we present state-of-the-art privacy-preserving machine learning, with a focus on how to design and implement machine learning algorithms, both classic and deep learning, over encrypted data. The tutorial begins with an overview of cryptographic mechanisms such as homomorphic encryption and secure multiparty computation. Then, we explain in detail how classic machine learning algorithms are implemented over encrypted data. Next, we briefly review deep learning, present its challenges in the encrypted domain, and discuss how to address those challenges. Finally, we discuss some real-world application scenarios.
Prerequisites.
· No specific prerequisites are required; a basic understanding of cryptography and machine learning algorithms is sufficient.
Outline:
1. Cryptographic Schemes
We will first provide an overview of Homomorphic Encryption (HE). The concept of Fully Homomorphic Encryption (FHE) will be introduced, and the issues of noise, bootstrapping, encryption parameters, etc. will be discussed. Then, the concepts of somewhat homomorphic and leveled homomorphic encryption will be presented, along with practical considerations for their implementation. Two practical homomorphic encryption libraries, HElib and SEAL, will be introduced and their limitations discussed. HElib is a software library that implements homomorphic encryption. It currently provides an implementation of the Brakerski-Gentry-Vaikuntanathan (BGV) scheme, along with many optimizations that make homomorphic evaluation run faster, focusing mostly on effective use of the Smart-Vercauteren ciphertext packing techniques and the Gentry-Halevi-Smart optimizations. The Simple Encrypted Arithmetic Library (SEAL) is an easy-to-use homomorphic encryption library developed by Microsoft and available to the public.
We will also briefly overview other cryptographic schemes such as Secure Multiparty Computation (SMC), in which multiple parties (two or more) wish to jointly compute a function on their private data without revealing that data to one another. Garbled circuits, oblivious transfer, and secret sharing are examples of SMC techniques.
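To make the homomorphic property concrete, the following toy sketch implements the Paillier cryptosystem, an additively homomorphic scheme. This is a teaching illustration only, not one of the tutorial's schemes: HElib and SEAL implement lattice-based (somewhat/fully homomorphic) encryption, and the tiny primes below would be insecure in practice. The point is the homomorphism itself: multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

```python
# Toy Paillier cryptosystem (additively homomorphic) -- illustrative only.
import math
import random

def L(x, n):
    # Paillier's L function: L(x) = (x - 1) / n
    return (x - 1) // n

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we fix g = n + 1
    return n, lam, mu

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    # c = g^m * r^n mod n^2, with g = n + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, lam, mu, c):
    return (L(pow(c, lam, n * n), n) * mu) % n

n, lam, mu = keygen(293, 433)       # toy primes, far too small for security
c1, c2 = encrypt(n, 20), encrypt(n, 22)
c_sum = (c1 * c2) % (n * n)          # multiplying ciphertexts adds plaintexts
print(decrypt(n, lam, mu, c_sum))    # prints 42
```

Note the limitation this exposes: Paillier supports only additions (and multiplication by plaintext constants) on encrypted data, whereas the lattice-based schemes covered in the tutorial also support ciphertext multiplication, at the cost of noise growth and the bootstrapping/parameter issues discussed above.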
2. Classic Machine Learning Algorithms Over Encrypted Data
We will present privacy-preserving machine learning, more specifically how to implement machine learning algorithms over encrypted data. The focus is on classic machine learning algorithms in both the training and classification phases. Since most traditional ML models cannot simply be translated to encrypted versions without modification, we will discuss what modifications are needed and how they are performed. Performance measures and scalability issues will also be discussed.
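One such modification, sketched here as a hedged illustration: homomorphic schemes operate on integers (or integer polynomials), so real-valued model weights and features must first be mapped to integers via fixed-point encoding before encryption. The scale factor, weights, and features below are arbitrary choices for the sketch, not values from the tutorial.

```python
# Fixed-point encoding sketch: real values are scaled to integers before
# encryption, since HE schemes compute over integers.  SCALE is an
# illustrative choice, not a recommended parameter.
SCALE = 1 << 8

def encode(x, scale=SCALE):
    return round(x * scale)

def decode(v, scale=SCALE):
    return v / scale

w = [0.5, -1.25, 2.0]   # plaintext model weights (hypothetical)
x = [1.0, 0.8, 0.1]     # feature vector that would be encrypted

w_int = [encode(v) for v in w]
x_int = [encode(v) for v in x]

# The integer dot product is what the encrypted evaluation would compute;
# each product carries a factor of SCALE**2, undone when decoding.
score = sum(a * b for a, b in zip(w_int, x_int))
print(decode(score, SCALE * SCALE))  # ≈ -0.3, the plaintext w·x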
3. Deep Learning Over Encrypted Data
Deep learning, which has attracted attention as a breakthrough in the advance of artificial intelligence and is the mainstream of current AI research, has achieved extremely high recognition accuracy on images and speech: it has already surpassed human-level recognition of traffic signs and performs at least as well as humans on 1000-class object recognition tasks. As with classic ML algorithms, analyzing data with deep learning raises privacy concerns in many settings, and we need solutions that analyze data using deep learning while preserving privacy. However, deep learning algorithms have a totally different structure from classic ML algorithms, so we need approaches other than those discussed in Section 2 to implement deep learning algorithms in the encrypted domain.
In order to have efficient and practical solutions, we typically need to use somewhat homomorphic schemes instead of fully homomorphic encryption. However, a solution built on these encryption schemes must be restricted to computing low-degree polynomials in order to be practical. Approximating a function with low-degree polynomials is therefore an important issue for computation over encrypted data with homomorphic encryption. Some existing work uses polynomial approximation, but these are specific solutions that work around particular problems; there is no generic solution to this problem yet. We present a framework capable of handling general cases. Given a real-valued elementary function f, we are interested in finding polynomials of the lowest possible degree that approximate f within a certain error range. By finding such polynomials, we aim to answer the following question: is it possible to use homomorphic encryption to evaluate an "encrypted" f or not? In addition, we keep an eye on the computation time and computational complexity of the procedures that will be introduced. The next step is to use these polynomial approximations to develop efficient and scalable deep learning algorithms in the encrypted domain.
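As a minimal sketch of this approximation step (the interval, degree, and sample count are illustrative choices, not the tutorial's framework), one can fit a low-degree polynomial to a nonlinear activation such as the sigmoid by ordinary least squares:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Least-squares fit of a degree-3 polynomial to sigmoid on [-4, 4].
# A low degree keeps the number of encrypted multiplications small,
# which is what makes the approximation somewhat-HE-friendly.
xs = np.linspace(-4.0, 4.0, 401)
coeffs = np.polyfit(xs, sigmoid(xs), deg=3)
poly = np.poly1d(coeffs)

max_err = float(np.max(np.abs(poly(xs) - sigmoid(xs))))
print(f"max abs error on [-4, 4]: {max_err:.4f}")
```

Evaluating the resulting cubic on a ciphertext needs only a couple of multiplicative levels, but the error bound holds only on the fitted interval; inputs outside it diverge quickly, which is exactly the kind of trade-off between degree, interval, and error range that the framework above addresses.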
4. Applications
Finally, we will discuss a number of application scenarios, including but not limited to bioinformatics, genetics, healthcare, signal processing, and privacy-preserving data analytics. There are many different types of analyses that researchers or healthcare professionals may wish to perform on sensitive genomic or medical data. In the big data era, and with the enormous growth of the Internet of Things (IoT), different types of data (voice, video, text, health, etc.) are collected and analyzed for applications in different domains. Many of these domains deal with sensitive data, and the privacy of users' data is of great concern when analyzing it.
About the Instructor:
Dr. Hassan Takabi is an Assistant Professor in the Department of Computer Science and Engineering at the University of North Texas, where he directs the INformation Security and Privacy: Interdisciplinary Research and Education (INSPIRE) Lab. He received his PhD from the University of Pittsburgh in 2013. His research interests span a wide range of topics in cybersecurity and privacy, including privacy-preserving machine learning, privacy and security of cloud computing, security and privacy of online social networks, mobile and location privacy, advanced access control models, insider threats, and usable security and privacy. He has published three book chapters and more than 50 papers in renowned conferences and journals, and is the recipient of the best paper award at ACM CODASPY 2011. Dr. Takabi serves on the organizing/program committees of several top security conferences, including ACM CCS, IEEE Security and Privacy, ACM CODASPY, and ACSAC. He is a member of IAPP, ACM, and IEEE.