The 2nd workshop on Personal Data Systems, Sommarøy, Norway (PDS 2017)

Hosted by the Department of Computer Science, UiT – The Arctic University of Norway, Schedule and slides available on:

http://www.corporesano.no/eventspersonal-data-systems-workshop-2017pds-2017-program-preliminary/

Sommarøy

Keynote: “Security, Privacy, and the Free Rider Problem: The Dark Side of the Internet of Things”, Stephen B. Wicker, School of Electrical and Computer Engineering, Cornell University

Stephen Wicker discussed the broad side of privacy and economics of the IoT space. Internet connectivity coming to everyday objects, app,s and apparatus (e.g., in health-related products) has seen potentials for botnets (e.g., Dyn attacks), attacks, and privacy threats at fine level and granularity. The lack of security from manufacturers (some hard-coded into the devices) leads to large-scale DNS attacks in this space. Stephan discussed the economics of the Internet, and treating it as a “public good”, and the economic impacts of this philosophy. Though the “common good” does not necessarily mean open access, one needs cultural and societal governance norms and rules to avoid free-rider problems and tragedies. Solutions such as policy-based mechanisms (certification, regulation), and technology-based mechanisms (cheaper security solutions, automated vulnerability analysis tools, etc) could improve this situation.

“Biometric Key Generation from Body Impedance Data“, Kasper Bonne Rasmussen, Department of Computer Science, University of Oxford

Kasper presented a method for generating crypto keys from biometric data and the privacy implications of this approach. The permanent nature of biometric data makes them difficult to use as a key on their own in case of security flaws and leaks. Hence acquisition of biometrics, extracting biometric samples, then features from samples, and generating keys from them makes the process of key-generation repeatable. Using the Siamese networks on a neural nets for comparing the feature vectors of the individuals it’s possible to converge on quantised feature vectors from the same individual. These are then used to feed into a tokenhash and a keyhash functions.This leads to an equivalence of 59-bit keys to guarantee that a person was involved in a transaction.

“WiFi Scanning, Crowd Monitoring, Privacy: An Experience Report“, Maarten van Steen,

University of Twente, The Netherlands

WiFI scanners are mostly designed for MAC address collection and processing for data mining purposes (security, crowd analysis, visitor detection, configured network SSID list, location classification etc). Though there are major privacy challenges here: small MAC address space, identification by opt-out, etc. However research in this space faces issues such as : faulty scanners, irregular/dynamic transmission ranges, signal and timing issues, multiple addresses per device (or vice-versa!), lost data, etc. However, using feature vector extraction methods it is possible to develop fingerprints from by just looking at visiting patterns for locations analysis. Others have already discovered that MAC randomization is a nuisance but not secure, since the implementations are sloppy, as packet information uniquely identifies a device. An observation Maarten made was that apart from ad agencies, most others are interested in aggregate statistics, hence designing systems based on questions asked might help with building useful and privacy-aware systems. Using client-side encryption and bloom-filter-based hashing functions can be a solution in this space, though scalability is an issue.

“SGX Enforcement of Use-Based Privacy”, Eleanor Birrell, Department of Computer Science, Cornell University

Eleanor discussed the privacy vs. utility conflicts of using personal data. Use-based privacy and contextual awareness is key to utilizing personal data. This mandates the presence of an expressive policy language, efficient policy associations, and pervasive enforcement. Examples include user-defined preferences for data sharing to researchers and legal data-use restrictions. The proposed Avenance Policies language enables the data processors to cope with these changes and expressions. They have evaluated the language on privacy preferences of Facebook and other use cases such as HIPPA. The challenges here include enforcement and policy checks, hence a few prototypes has been developed for understanding enforcement scenarios by the provider or delegated monitoring. They have then extended the Ohmage mobile health app API system to provide policy enforcement in few kLoCs, using SGX for program attestation.

“Privacy in the Cloud, Hard-won Lessons from Shipping Information Retrieval and Discovery Experiences at Scale to Microsoft Office 365 Users“, Bjørn Olstad and Troels Walsted Hansen, Microsoft Development Center Norway

Microsoft has seen an increasing success with its cloud-based model. The next step is to use Data+machine learning to reason about data while keeping the users’ trust (e.g., enterprise search). Understanding the types of data and interactions and aggregation of data for analytics is important for organising a product like Office around its users. A social-network-based search for the enterprise (e.g., MS Delve) is an attempt in this space for making the search relevant for an individual. The information can also include graphs from other sources such as LinkedIn. Addressing privacy perceptions and comfort levels are important here.

Some lessons learned: perceived privacy is equal to privacy for most consumers, hence it is important to communicate privacy in an acceptable yet intuitive and simple way to the consumer. There is also a shift away towards simple, user-centric permission models. It’s important to have self-explanatory products and communicate with the user at the time of actions. Signals and their perception can be different from the users’ point of view, for example “editing” a document might be a public signal, while “viewing” it might be a private signal. Presenting the social network of collaborators to the user also eases these choices.

“Building and Measuring Privacy-Preserving Mobility Analytics”, Emiliano De Cristofaro, Department of Computer Science, University College London

Location analytics are important for assessing user behaviours in spaces, urban transport, or crowd management. Two main modes of trusted aggregator, or centralised ways, are present today. Additively homomorphic encryption can be utilised to enable users to take part in such schemes. Yet these might not be scalable on large user spaces/items. Aggregate analytics are useful here for the statistics were individuals should not be identified. Data from 1 month of TFL users, and 1 month of SF cab commutes is used to assess these. Aggregate stats can be used to forecast the traffic very well. Methods such as differential privacy also does not work for such time series data due to utility loss. Prior Knowledge about individuals can improve aggregate estimates and potentially identify individuals probabilistically.

“User-Centric Personal Data Analytics on the Edge”, Hamed Haddadi, School of Electronic Engineering and Computer Science, Queen Mary University of London

Abstract: In this talk, I discuss the ways in which we can utilize edge-computing to improve the scalability and privacy of user-centered analytics in the context of Databox project. I present a hybrid framework where edge devices and resources centered around the user, collectively referred to as fog, can complement the cloud for providing privacy-aware, yet accurate and efficient analytics. I present the evaluations of the proposed framework on a number of exemplar applications, and discuss the broader implications of such approaches for future systems.

“Efficient Machine Learning for Disease Detection in the Human Digestive System”, Michael Riegler, Simula Research Laboratory, Oslo

Michael presented the results of using machine learning for medical image analysis, and challenges with quality of images and complexity in analysis. Using tensorflow and CNNs they get 80% precision and 96% recall, while the global feature approach performs better (94%p, 98%r). They also release new datasets and tools for the vision community to improve detection and localisation of diseases. They found collaborations between the medical doctors and computer scientists challenging! http://datasets.simula.no

“META-pipe: marine metagenomics data analysis service”, Lars Ailo Bongo, Department of Computer Science, UiT – The Arctic University of Norway

Lars presented the methods and data acquisition and training for systems and the marine metagenomics data analysis pipelines developed in the ELIXIR Excelerate project. He presented the systems, security, and implementation challenges of the infrastructure, and possibilities for including human data.

“Safeguarding Analytics on Privacy Sensitive Data”, Anders Tungeland Gjerdrum, Department of Computer Science, UiT – The Arctic University of Norway

Pervasive storing of personal data for monetisation and analytics raises a number of privacy challenges. Trusted computing can be useful in running secure pieces of code and data in trusted environments. Intel SGX and ARM TrustZone provide such mechanisms today by reducing some of the functionality and inherent risks of interrupts, system calls, and illegal instructions using Virtual Enclave Memory. Anders presented a setup for evaluating latency and memory overheads of enclave memory. He provided recommendations for use of enclaves (data leaks versus provision costs) for larger applications and users. Diggi Analytics lets the users implement isolation schemes, and performs security analytics for the configurations. Diggi provides asynchronous and synchronous communications schemes. Future work will focus on storage, caching, and fault tolerance.

“Do You See What I See? — Mutable Data for Localized Data Sharing”, Jörg Ott, Faculty of Informatics, Technische Universität München.

Jorg presented Liberouter for DIY networking to provide distributed networking for disconnected areas or cloudless operations. This can also enable localised content sharing. Challenges include distributed data storage management, replications and caching, and distributed access control. Mutable data, arising from the DTN & ICN community, can be utilised with such data sharing (e.g., opportunistic wikipedia!). This approach has interesting problems with mergers or adoption (git-like operations) but it can be useful for transient data sharing. Combined adoption and merging systems perform pretty well for the data. Write access control is more complex (think GoogleDoc) for a distributed system, Using middlebox-style local hubs these permissions can be managed and supported, though challenges remain here in ongoing work.

Sommarøy Databox

Written on August 17, 2017