Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing

Author: Du, Xiaoyu

Publication Date: September 2020

Publication Name: PhD Thesis, School of Computer Science, University College Dublin

Abstract:

The ever-increasing volume of data in digital forensic investigations is one of the most discussed challenges in the field. Severe, case-hindering digital evidence backlogs have become commonplace in law enforcement agencies throughout the world. The objective of the research outlined in this thesis is to help alleviate the backlog through automated digital evidence processing. This is achieved by reducing or eliminating redundant digital evidence data handling through data deduplication and automated analysis techniques, avoiding the repeated re-acquisition, re-storage, and re-analysis of common evidence during investigations. This thesis describes a deduplicated evidence processing framework designed with a Digital Forensics as a Service (DFaaS) paradigm in mind. In the proposed system, prior to acquisition, artefacts are hashed and compared with a centralised database of previously analysed files to identify common files. This process also enables known pertinent artefacts to be detected at the earliest possible stage of the investigation, i.e., during acquisition. The proposed methodology includes a novel, forensically sound technique for reconstructing an entire disk image from a deduplicated evidence acquisition system: the reconstructed disk image's hash matches that of the source device without all artefacts having to be acquired directly from it. This enables remote disk acquisition faster than the network throughput would otherwise allow. Known, i.e., previously encountered, pertinent artefacts identified during the acquisition stage are then used to train machine learning models that produce a relevancy score for the unknown, i.e., previously unencountered, file artefacts. The proposed technique generates a relevancy score for file similarity using each artefact's file system metadata and associated timeline events. The file artefacts are then ordered by these relevancy scores, focusing the investigator on the artefacts most likely to be relevant to the case first.
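
The hash-and-compare step at the heart of the deduplicated acquisition can be pictured as a simple triage pass: hash every artefact on the source, look each digest up in the central database, and only acquire what has not been seen before. The Python sketch below is illustrative only; the names (KNOWN_HASHES, triage) and the choice of SHA-256 are assumptions, not details taken from the thesis.

import hashlib
from pathlib import Path

# Stand-in for the centralised database of previously analysed files:
# maps digest -> label assigned in earlier cases ("benign", "pertinent", ...).
KNOWN_HASHES: dict[str, str] = {}

def sha256_of(path: Path) -> str:
    """Hash a file in fixed-size chunks so large evidence files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def triage(source_dir: Path):
    """Split artefacts into already-known files and files needing acquisition."""
    known, unknown = [], []
    for path in source_dir.rglob("*"):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if digest in KNOWN_HASHES:
            # Already in the central store: no re-acquisition, re-storage,
            # or re-analysis; known-pertinent files are flagged immediately.
            known.append((path, digest, KNOWN_HASHES[digest]))
        else:
            unknown.append((path, digest))
    return known, unknown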
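
The forensic soundness of the disk reconstruction hinges on rebuilding the image byte-for-byte from two sources, locally stored known artefacts and newly acquired unknown data, and then confirming that the result hashes to the same digest as the source device. A minimal sketch, assuming a hypothetical run-based layout description; none of these names come from the thesis.

import hashlib

def reconstruct(layout, central_store, remote_chunks, out_path):
    """Rebuild a disk image and return its SHA-256 digest.

    layout: list of (offset, length, digest_or_None) runs in disk order.
    A digest means the bytes already sit in central_store; None means the
    run was pulled from the remote device into remote_chunks[offset].
    """
    h = hashlib.sha256()
    with open(out_path, "wb") as out:
        for offset, length, digest in layout:
            data = central_store[digest] if digest else remote_chunks[offset]
            assert len(data) == length
            out.write(data)
            h.update(data)
    # Forensic soundness check: this digest must equal the hash computed
    # over the source device during acquisition.
    return h.hexdigest()

Because only the runs without a known digest ever cross the network, a disk whose contents are mostly already in the central store can be acquired in less time than transferring its full contents would take, which is how reconstruction can outpace the raw network throughput.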
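
The relevancy-scoring stage can be framed as a standard supervised ranking problem: train a model on file system metadata of artefacts labelled pertinent or not in earlier cases, score the unknown artefacts, and present them best-first. The sketch below uses scikit-learn's RandomForestClassifier purely as a stand-in; the thesis's actual feature set (which also draws on timeline events) and model choices may differ.

from sklearn.ensemble import RandomForestClassifier

def rank_unknown(known_features, known_labels, unknown_features, unknown_paths):
    """Return unknown artefacts sorted most-likely-pertinent first.

    known_labels: 1 = pertinent, 0 = not pertinent, from prior cases.
    Feature rows are numeric metadata (sizes, timestamp deltas, etc.).
    """
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(known_features, known_labels)
    scores = model.predict_proba(unknown_features)[:, 1]  # P(pertinent)
    return sorted(zip(scores, unknown_paths), reverse=True)

The investigator then works down this ranked list, so the artefacts most likely to matter are examined first, which is the prioritisation the abstract describes.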

Download:

Download Paper as PDF

BibTeX Entry:

@phdthesis{du2020PhDAutomatedEvidenceProcessing,
title="{Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing}",
author={Du, Xiaoyu},
school={School of Computer Science, University College Dublin},
month=sep,
year=2020,
address={Dublin, Ireland},
abstract={The ever-increasing volume of data in digital forensic investigations is one of the most discussed challenges in the field. Severe, case-hindering digital evidence backlogs have become commonplace in law enforcement agencies throughout the world. The objective of the research outlined in this thesis is to help alleviate the backlog through automated digital evidence processing. This is achieved by reducing or eliminating redundant digital evidence data handling through data deduplication and automated analysis techniques, avoiding the repeated re-acquisition, re-storage, and re-analysis of common evidence during investigations. This thesis describes a deduplicated evidence processing framework designed with a Digital Forensics as a Service (DFaaS) paradigm in mind. In the proposed system, prior to acquisition, artefacts are hashed and compared with a centralised database of previously analysed files to identify common files. This process also enables known pertinent artefacts to be detected at the earliest possible stage of the investigation, i.e., during acquisition. The proposed methodology includes a novel, forensically sound technique for reconstructing an entire disk image from a deduplicated evidence acquisition system: the reconstructed disk image's hash matches that of the source device without all artefacts having to be acquired directly from it. This enables remote disk acquisition faster than the network throughput would otherwise allow. Known, i.e., previously encountered, pertinent artefacts identified during the acquisition stage are then used to train machine learning models that produce a relevancy score for the unknown, i.e., previously unencountered, file artefacts. The proposed technique generates a relevancy score for file similarity using each artefact's file system metadata and associated timeline events. The file artefacts are then ordered by these relevancy scores, focusing the investigator on the artefacts most likely to be relevant to the case first.}
}