Completed
Thesis' Author:
Christophe da Silva Ferreira
Course description: MSc in Computer Science
Supervisor(s):
Co-supervisor(s):
Abstract:
<p style="text-align: justify;"><span class="fontstyle0">The migration of societal processes to the Internet, the massification of digital services and, more recently, </span><span class="fontstyle2">IoT </span><span class="fontstyle0">devices in the form of personal sensors have completely changed the way personal information is collected, stored and used. The exponential growth in the amount of personal data thus collected opens new possibilities for its use in scientific research or for more mundane commercial purposes. However, special care must be taken, because personal privacy is a basic human right that is strongly protected by law. There is, therefore, a high demand for privacy-aware solutions that allow for the safe and lawful re-use of </span><span class="fontstyle2">datasets </span><span class="fontstyle0">based on personal information. One can argue that one way to comply with the law lies in the appropriate application of </span><span class="fontstyle2">de-identification </span><span class="fontstyle0">techniques, as a way of guaranteeing privacy, by deriving </span><span class="fontstyle2">de-identified datasets </span><span class="fontstyle0">that still retain enough information to be useful.<br />The goal of this dissertation is to describe the development of a </span><span class="fontstyle2">web anonymization </span><span class="fontstyle0">application that simplifies the </span><span class="fontstyle2">de-identification </span><span class="fontstyle0">of </span><span class="fontstyle2">datasets </span><span class="fontstyle0">containing personal information. First, well-known desktop solutions were analyzed in order to choose the most adequate and complete one to serve as a strong base for a web </span><span class="fontstyle2">de-identification </span><span class="fontstyle0">platform. 
We found that </span><span class="fontstyle2">ARX </span><span class="fontstyle0">is a desktop </span><span class="fontstyle2">de-identification </span><span class="fontstyle0">platform that fulfils our requirements. The </span><span class="fontstyle2">de-identified datasets </span><span class="fontstyle0">produced by </span><span class="fontstyle2">ARX </span><span class="fontstyle0">were then tested for resistance to well-known </span><span class="fontstyle2">re-identification </span><span class="fontstyle0">attacks, and the results were deemed satisfactory. </span><span class="fontstyle2">ARX </span><span class="fontstyle0">also provides an API, which was integrated into a REST-based API for our </span><span class="fontstyle2">Web Anonymizer </span><span class="fontstyle0">platform to support a responsive web interface that mimics the interface of the original </span><span class="fontstyle2">ARX </span><span class="fontstyle0">desktop application.<br />Finally, we performed a series of tests to verify whether the results produced by the web application were similar to those of its desktop counterpart and whether the execution times were acceptable when compared to the original desktop application. We concluded that the </span><span class="fontstyle2">Web Anonymizer </span><span class="fontstyle0">fulfils its initial objectives. However, as expected, the execution times for the platform created were longer than the desktop </span><span class="fontstyle2">ARX </span><span class="fontstyle0">times. 
This is due solely to the delays induced by the network and the REST API, because the library supporting the core </span><span class="fontstyle2">de-identification </span><span class="fontstyle0">algorithms remained the same.</span></p><p style="text-align: justify;"><span class="fontstyle0"><span class="fontstyle0">However, this </span></span><span style="text-align: center;">increase does not compromise practicality, because execution times remain well within reasonable </span><span class="fontstyle0" style="text-align: center;">end-user usability constraints. Some </span><span class="fontstyle2" style="text-align: center;">de-identification </span><span class="fontstyle0" style="text-align: center;">configurations available on </span><span class="fontstyle2" style="text-align: center;">ARX </span><span class="fontstyle0" style="text-align: center;">were not </span><span class="fontstyle0" style="text-align: center;">implemented in this version of the </span><span class="fontstyle2" style="text-align: center;">Web Anonymizer</span><span class="fontstyle0" style="text-align: center;">. This caused a slight decrease in the </span><span class="fontstyle2" style="text-align: center;">datasets' </span><span class="fontstyle2" style="text-align: center;">re-identification </span><span class="fontstyle0" style="text-align: center;">resistance when compared to that of the datasets produced by the desktop </span><span class="fontstyle2" style="text-align: center;">ARX</span><span style="text-align: center;">.</span></p>
