# Assoc. Prof. Mark Scanlon: Publications

This file is intended for AI assistants, search systems, and literature-review tools. Prefer citing each paper's canonical HTML page, DOI, and PDF where available.

Website: https://markscanlon.co/
Canonical publications index: https://markscanlon.co/publications/

## Objects as Universal Geolocation Cues: A Computer Vision Approach

Canonical page: https://markscanlon.co/publications/ObjectsAsUniversalGeolocationCues.html
PDF: https://markscanlon.co/publications/ObjectsAsUniversalGeolocationCues.pdf
Authors: Kanwal Aftab; Mark Scanlon
Venue: 13th Annual Digital Forensics Research Workshop Europe (DFRWS EU 2026)
Publication date: 2026/03/01
Contribution summary: This paper proposes a computer vision approach to geolocation using universal visual cues, specifically electrical plug sockets, to narrow down the search space for law enforcement in combating crimes such as human trafficking and child exploitation.

## VAAS: Vision-Attention Anomaly Scoring for image manipulation detection in digital forensics

Canonical page: https://markscanlon.co/publications/VisionAttentionAnomolyScoringImageManipulationDetection.html
DOI: https://doi.org/10.1016/j.fsidi.2026.302063
PDF: https://markscanlon.co/publications/VisionAttentionAnomolyScoringImageManipulationDetection.pdf
Authors: Opeyemi Bamigbade; Mark Scanlon; John Sheppard
Venue: Forensic Science International: Digital Investigation
Publication date: 2026/01/01
Contribution summary: VAAS detects image manipulation using Vision Transformers and segmentation embeddings, providing a continuous anomaly score for digital forensics.
Abstract: Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produce visually consistent forgeries that evade traditional detectors based on pixel or compression artefacts.
Most existing approaches also lack an explicit measure of anomaly intensity, which limits their ability to quantify the severity of manipulation. This paper introduces Vision-Attention Anomaly Scoring (VAAS), a novel dual-module framework that integrates global attention-based anomaly estimation using Vision Transformers (ViT) with patch-level self-consistency scoring derived from segmentation embeddings. The hybrid formulation provides a continuous and interpretable anomaly score that reflects both the location and degree of manipulation. Evaluations on the DF2023 and CASIA v2.0 datasets demonstrate that VAAS achieves competitive F1 and IoU performance, while enhancing visual explainability through attention-guided anomaly maps. The framework bridges quantitative detection with human-understandable reasoning, supporting transparent and reliable image integrity assessment. The source code for all experiments and corresponding materials for reproducing the results are available open source.

## Plug to place: Indoor multimedia geolocation from electrical sockets for digital investigation

Canonical page: https://markscanlon.co/publications/PlugToPlace-IndoorMultimediaGeolocation.html
DOI: https://doi.org/10.1016/j.fsidi.2026.302056
PDF: https://markscanlon.co/publications/PlugToPlace-IndoorMultimediaGeolocation.pdf
Authors: Kanwal Aftab; Graham Adams; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2026/01/01
Contribution summary: This paper introduces a pipeline for indoor multimedia geolocation using electrical sockets as consistent markers, aiding law enforcement in human trafficking investigations.
Abstract: Computer vision is a rapidly evolving field, giving rise to powerful new tools and techniques in digital forensic investigation, and shows great promise for novel digital forensic applications.
One such application, indoor multimedia geolocation, has the potential to become a crucial aid for law enforcement in the fight against human trafficking, child exploitation, and other serious crimes. While outdoor multimedia geolocation has been widely explored, its indoor counterpart remains underdeveloped due to challenges such as similar room layouts, frequent renovations, visual ambiguity, indoor lighting variability, unreliable GPS signals, and limited datasets in sensitive domains. This paper introduces a pipeline that uses electrical sockets as consistent indoor markers for geolocation, since plug socket types are standardised by country or region. The three-stage deep learning pipeline detects plug sockets (YOLOv11, mAP@0.5 = 0.843), classifies them into one of 12 plug socket types (Xception, accuracy = 0.912), and maps the detected socket types to countries (accuracy = 0.96 at >90% threshold confidence). To address data scarcity, two dedicated datasets were created: a socket detection dataset of 2328 annotated images expanded to 4074 through augmentation, and a classification dataset of 3187 images across 12 plug socket classes. The pipeline was evaluated on the Hotels-50K dataset, focusing on the TraffickCam subset of crowd-sourced hotel images, which capture real-world conditions such as poor lighting and amateur angles. This dataset provides a more realistic evaluation than using professional, well-lit, often wide-angle images from travel websites. This framework demonstrates a practical step toward real-world digital forensic applications. The code, trained models, and the data for this paper are available open source.
## Investigation of large language models, GenAI, and proprietary AI systems: Digital forensic evidence, readiness and regulation

Canonical page: https://markscanlon.co/publications/Editorial-InvestigationofLargeLanguageModelsGenAIandProprietaryAISystems.html
DOI: https://doi.org/10.1016/j.fsidi.2026.302135
PDF: https://markscanlon.co/publications/Editorial-InvestigationofLargeLanguageModelsGenAIandProprietaryAISystems.pdf
Authors: Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2026/01/01
Contribution summary: This paper investigates digital forensic evidence and regulation of large language models and proprietary AI systems, highlighting the need for AI forensic readiness and examinability.

## AutoDFBench 1.0: A benchmarking framework for digital forensic tool testing and generated code evaluation

Canonical page: https://markscanlon.co/publications/AutoDFBench1.0DigitalForensicToolTesting.html
DOI: https://doi.org/10.1016/j.fsidi.2026.302055
PDF: https://markscanlon.co/publications/AutoDFBench1.0DigitalForensicToolTesting.pdf
Authors: Akila Wickramasekara; Tharusha Mihiranga; Aruna Withanage; Buddhima Weerasinghe; Frank Breitinger; John Sheppard; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2026/01/01
Contribution summary: AutoDFBench 1.0 is a benchmarking framework for digital forensic tool testing, evaluating conventional and AI-generated tools across five areas: string search, deleted file recovery, file carving, Windows registry recovery, and SQLite data recovery.
Abstract: The National Institute of Standards and Technology (NIST) Computer Forensic Tool Testing (CFTT) programme has become the de facto standard for providing digital forensic tool testing and validation. However, to date, no comprehensive framework exists to automate benchmarking across the diverse forensic tasks included in the programme.
This gap results in inconsistent validation, challenges in comparing tools, and limited validation reproducibility. This paper introduces AutoDFBench 1.0, a modular benchmarking framework that supports the evaluation of both conventional DF tools and scripts, as well as AI-generated code and agentic approaches. The framework integrates five areas defined by the CFTT programme: string search, deleted file recovery, file carving, Windows registry recovery, and SQLite data recovery. AutoDFBench 1.0 includes ground truth data comprising 63 test cases and 10,968 unique test scenarios, and executes evaluations through a RESTful API that produces structured JSON outputs with standardised metrics, including precision, recall, and F1 score for each test case. The average of these F1 scores becomes the AutoDFBench Score. The benchmarking framework is validated against CFTT's datasets. The framework enables fair and reproducible comparison across tools and forensic scripts, establishing the first unified, automated, and extensible benchmarking framework for digital forensic tool testing and validation. AutoDFBench 1.0 supports tool vendors, researchers, practitioners, and standardisation bodies by facilitating transparent, reproducible, and comparable assessments of DF technologies.
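As a rough illustration of the scoring described in the AutoDFBench 1.0 abstract above (per-test-case precision, recall, and F1, with the mean F1 taken as the overall score), the following Python sketch may help. It is not the framework's code: the `CaseResult` type, the function names, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-test-case counts; not the framework's actual schema."""
    true_positives: int
    false_positives: int
    false_negatives: int


def precision(r: CaseResult) -> float:
    denom = r.true_positives + r.false_positives
    # Assumption: an empty denominator is scored as 0.0.
    return r.true_positives / denom if denom else 0.0


def recall(r: CaseResult) -> float:
    denom = r.true_positives + r.false_negatives
    return r.true_positives / denom if denom else 0.0


def f1(r: CaseResult) -> float:
    p, rc = precision(r), recall(r)
    return 2 * p * rc / (p + rc) if (p + rc) else 0.0


def mean_f1_score(results: list[CaseResult]) -> float:
    """Average the per-case F1 values, as the abstract describes."""
    if not results:
        raise ValueError("at least one test case is required")
    return sum(f1(r) for r in results) / len(results)


# Toy data for illustration only.
cases = [CaseResult(8, 2, 2), CaseResult(5, 0, 5)]
score = mean_f1_score(cases)
```

Averaging per-case F1 values weights every test case equally, regardless of how many items each case contains.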
## Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis

Canonical page: https://markscanlon.co/publications/LLM-based-Digital-Forensic-Timeline-Analysis.html
DOI: https://doi.org/10.1016/j.fsidi.2025.301982
PDF: https://markscanlon.co/publications/LLM-based-Digital-Forensic-Timeline-Analysis.pdf
Authors: Hudan Studiawan; Frank Breitinger; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2025/10/01
Contribution summary: This paper proposes a standardized methodology for evaluating the performance of Large Language Models (LLMs) in digital forensic timeline analysis tasks, such as event summarization. The methodology includes a dataset, timeline generation, and ground truth development, and recommends the use of BLEU and ROUGE metrics for quantitative evaluation.
Abstract: Large language models (LLMs) have widespread adoption in many domains, including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited, i.e., a standardized approach for precise performance evaluations is lacking. Inspired by the NIST Computer Forensic Tool Testing Program, this paper proposes a standardized methodology to quantitatively evaluate the application of LLMs for digital forensic tasks, specifically in timeline analysis. The paper describes the components of the methodology, including the dataset, timeline generation, and ground truth development. In addition, the paper recommends the use of BLEU and ROUGE metrics for the quantitative evaluation of LLMs through case studies or tasks involving timeline analysis. Experimental results using ChatGPT demonstrate that the proposed methodology can effectively evaluate LLM-based forensic timeline analysis. Finally, we discuss the limitations of applying LLMs to forensic timeline analysis.
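To give a feel for the kind of overlap metric recommended in the abstract above, here is a minimal Python sketch of ROUGE-1 (unigram overlap) between a model-written summary and a ground-truth summary. It assumes lowercasing and whitespace tokenisation, which is a simplification; the paper's exact metric configuration is not stated in this file, and real evaluations would normally rely on an established implementation. The function name and example strings are hypothetical.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for unigram overlap.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # Counter intersection clips each word's count to the smaller side.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    p = overlap / len(cand_tokens)
    r = overlap / len(ref_tokens)
    return p, r, 2 * p * r / (p + r)


# Toy strings for illustration only.
p, r, f = rouge1("user logged in at noon", "the user logged in")
```

Recall rewards covering the ground truth, precision penalises extra words, and F1 balances the two.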
## An AI-Based Network Forensic Readiness Framework for Resource-Constrained Environments

Canonical page: https://markscanlon.co/publications/NetworkForensicReadinessResourceContrainedEnvironments.html
DOI: https://doi.org/10.1007/978-3-032-00635-6_6
PDF: https://markscanlon.co/publications/NetworkForensicReadinessResourceContrainedEnvironments.pdf
Authors: Syed Rizvi; Mark Scanlon; Jimmy McGibney; John Sheppard
Venue: Proceedings of the 18th International Workshop on Digital Forensics, part of the 20th International Conference on Availability, Reliability and Security
Publication date: 2025/08/01
Contribution summary: This paper presents an AI-based network forensic readiness framework for resource-constrained environments. The framework integrates optimised artificial intelligence models to detect attacks in real-time, capturing and preserving critical forensic artefacts. It aligns with ISO/IEC 27043:2015 Digital Forensic Readiness principles, reducing time and human effort.
Abstract: In recent years, the adoption of Internet of Things (IoT) devices has transformed industries and daily life. However, the integration of real-time services and internet connectivity increases the risk of attackers exploiting network vulnerabilities. Investigating such vulnerabilities in Resource-Constrained Environments (RCEs) poses challenges due to limited computational capacity, power constraints, and the heterogeneity of IoT-generated data and traffic. To address these issues, this study proposes a framework integrating optimised artificial intelligence models trained on the CICIoT2023 and CSE-CIC-IDS2018 datasets. A Docker-based simulation replicates constrained environments and captures network traffic in real time. The framework continuously monitors resources and dynamically selects the most suitable AI model for attack detection.
Once an attack is detected, the system captures and preserves digitally signed critical forensic artefacts, categorised into system metadata, event/resource logs, network data, and processes. The AI-based framework aligns with ISO/IEC 27043:2015 Digital Forensic Readiness principles, automating many manual procedures and reducing both time and human effort. The quantitative evaluation demonstrates the effectiveness of the proposed network forensic readiness framework to address the specific challenges of RCEs.

## Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations

Canonical page: https://markscanlon.co/publications/Fine-Tuning-Large-Language-Models-for-Digital-Forensics.html
DOI: https://doi.org/10.1145/3748264
PDF: https://markscanlon.co/publications/Fine-Tuning-Large-Language-Models-for-Digital-Forensics.pdf
Authors: Gaëtan Michelet; Hans Henseler; Harm van Beek; Mark Scanlon; Frank Breitinger
Venue: ACM Digital Threats: Research and Practice
Publication date: 2025/07/01
Contribution summary: This paper proposes recommendations for fine-tuning large language models (LLMs) for digital forensics tasks, addressing the gap in existing research. A case study on chat summarization showcases the applicability of the recommendations, evaluating multiple fine-tuned models to assess their performance. The study shares lessons learned from the case study, providing insights into the fine-tuning process, computational power issues, data challenges, and evaluation methods.
Abstract: Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored.
This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, where we evaluate multiple fine-tuned models to assess their performance. The study concludes with sharing the lessons learned from the case study.

## Low-overhead and Non-invasive Electromagnetic Side-Channel Monitoring for Forensic-ready Industrial Control Systems

Canonical page: https://markscanlon.co/publications/EM-SCAForensicReadinessICS.html
DOI: https://doi.org/10.1145/3712716.3712722
PDF: https://markscanlon.co/publications/EM-SCAForensicReadinessICS.pdf
Authors: Buddhima Weerasinghe; Asanka Sayakkara; Kasun De Zoysa; Mark Scanlon
Venue: Digital Forensics Doctoral Symposium
Publication date: 2025/04/01
Contribution summary: This paper proposes a low-overhead and non-invasive electromagnetic side-channel monitoring approach for forensic-ready industrial control systems. It uses unintentional electromagnetic radiation emitted by Ethernet network cables to detect denial of service attacks with considerable accuracy, introducing an architecture for ICS infrastructure to be forensic-ready with minimal computational resources.
Abstract: Industrial control systems (ICS) are the backbone of modern manufacturing facilities. Due to the distributed nature of ICS hardware in their deployment environment, they are often networked through Ethernet, opening up a window for network-based attacks. Preventive security measures, such as constant packet capture and inspection, are impractical due to the computational overhead required. Therefore, computationally feasible trigger mechanisms are needed that can activate security, as well as on-demand forensic readiness features, in the infrastructure.
This work proposes an approach to monitor ICS network infrastructure using unintentional electromagnetic (EM) radiation emitted by Ethernet network cables during their regular operation. An empirical evaluation highlights that it is possible to detect various types of denial of service (DoS) attacks through EM emission patterns of Ethernet cables with considerable accuracy (HTTP Flood = 99.70%, TCP Flood = 73.22%, UDP Flood = 69.95%). Based on the experimental findings, this work introduces an architecture for the ICS infrastructure to be forensic-ready with minimal computational resources while being independent and non-invasive to the infrastructure itself.

## Improving Image Embeddings with Colour Features in Indoor Scene Geolocation

Canonical page: https://markscanlon.co/publications/ImageEmbeddingsColourFeaturesIndoorGeolocation.html
DOI: https://doi.org/10.1109/ACCESS.2025.3564496
PDF: https://markscanlon.co/publications/ImageEmbeddingsColourFeaturesIndoorGeolocation.pdf
Authors: Opeyemi Bamigbade; Mark Scanlon; John Sheppard
Venue: IEEE Access
Publication date: 2025/04/01
Contribution summary: This paper proposes a model architecture that integrates image N-dominant colours and colour histogram vectors with image embedding from deep metric learning and classification perspectives to improve image geolocation in indoor scenes.
Abstract: Embeddings remain the best way to represent image features, but do not always capture all latent information. This is still a problem in representation learning, and computer vision descriptors struggle with precision and accuracy. Improving image embedding with other features is necessary for tasks like image geolocation, especially for indoor scenes where descriptive cues can have less distinctive characteristics.
This work proposes a model architecture that integrates image N-dominant colours and colour histogram vectors in different colour spaces with image embedding from deep metric learning and classification perspectives. The results indicate that the integration of colour features improves image embedding, surpassing the performance of using embedding alone. In addition, the classification approach yields higher accuracy compared to deep metric learning methods. Interestingly, different saturation points were observed for image colour-improved embedding features in models and colour spaces. These findings have implications for the design of more robust image geolocation systems, particularly in indoor environments.

## AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation

Canonical page: https://markscanlon.co/publications/AutoDFBenchDigitalForensicCodeTesting.html
DOI: https://doi.org/10.1145/3712716.3712718
PDF: https://markscanlon.co/publications/AutoDFBenchDigitalForensicCodeTesting.pdf
Authors: Akila Wickramasekara; Alanna Densmore; Frank Breitinger; Hudan Studiawan; Mark Scanlon
Venue: Digital Forensics Doctoral Symposium
Publication date: 2025/04/01
Contribution summary: AutoDFBench is an automated framework for testing and evaluating AI-generated digital forensic code and tools. It validates AI-generated code against NIST's Computer Forensics Tool Testing Program (CFTT) procedures and calculates a benchmarking score. The framework operates in four phases: data preparation, API handling, code execution, and result recording with score calculation.
Abstract: Generative AI and Large Language Models (LLMs) show potential across various domains, including digital forensics (DF). A notable use case is automatic code generation, which is expected to extend to DF soon. As with any DF tool, these systems must undergo thorough testing and validation.
However, manually evaluating outputs, including generated DF code, remains challenging. AutoDFBench is an automated framework designed to address this by validating AI-generated code and tools against NIST's Computer Forensics Tool Testing Program (CFTT) procedures, subsequently calculating an AutoDFBench benchmarking score. The framework operates in four phases: data preparation, API handling, code execution, and result recording with score calculation. It benchmarks generative AI systems, such as LLMs and automated code generation agents, for DF applications. This benchmark can support iterative development or serve as a comparison metric between DF AI systems. As a proof of concept, NIST's forensic string search tests were used, involving over 24,200 tests with five top-performing code generation LLMs. These tests validated outputs of 121 cases, considering two user expertise levels, two programming languages, and ten iterations per case with varying prompts. The results highlight significant limitations of DF-specific solutions generated by generic LLMs.

## Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency

Canonical page: https://markscanlon.co/publications/Survey-Large-Language-Models-Digital-Forensics.html
DOI: https://doi.org/10.1016/j.fsidi.2024.301859
PDF: https://markscanlon.co/publications/Survey-Large-Language-Models-Digital-Forensics.pdf
Authors: Akila Wickramasekara; Frank Breitinger; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2025/03/01
Contribution summary: This study explores the potential of Large Language Models (LLMs) in improving digital forensic investigation efficiency, addressing challenges such as bias, explainability, censorship, and resource-intensive infrastructure.
A comprehensive literature review highlights the current challenges in digital forensics and the possibilities of incorporating LLMs, with a focus on established models, methods, and key challenges.
Abstract: The ever-increasing workload of digital forensic labs raises concerns about law enforcement's ability to conduct both cyber-related and non-cyber-related investigations promptly. Consequently, this article explores the potential and usefulness of integrating Large Language Models (LLMs) into digital forensic investigations to address challenges such as bias, explainability, censorship, resource-intensive infrastructure, and ethical and legal considerations. A comprehensive literature review is carried out, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the use of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and the possibilities of incorporating LLMs. In conclusion, the study states that the adoption of LLMs in digital forensics, with appropriate constraints, has the potential to improve investigation efficiency, improve traceability, and alleviate the technical and judicial barriers faced by law enforcement entities.
## Pushing Network Forensic Readiness to the Edge: A Resource Constrained Artificial Intelligence Based Methodology

Canonical page: https://markscanlon.co/publications/PushingNetworkForensicReadinessToTheEdge.html
DOI: https://doi.org/10.1109/Cyber-RCI60769.2024.10939120
PDF: https://markscanlon.co/publications/PushingNetworkForensicReadinessToTheEdge.pdf
Authors: Syed Rizvi; Mark Scanlon; Jimmy McGibney; John Sheppard
Venue: 2024 Cyber Research Conference - Ireland (Cyber-RCI)
Publication date: 2024/11/01
Contribution summary: This paper introduces the Network Forensic Readiness for Edge Devices (NetFoREdge) framework, which deploys lightweight AI models in resource-constrained environments for attack detection, evidence collection, and preservation. The framework is evaluated on two datasets, achieving accuracy rates exceeding 99.60% and 99.98% for multiclassification.
Abstract: Rapid developments in recent years with the Internet of Things (IoT) have supported significant growth in edge computing. The growing number and diversity of IoT/edge devices increase the risk of security incidents. As many IoT/edge devices can be considered lightweight, with limited data processing capacity and significant heterogeneity, traditional digital forensic investigation techniques may not always work with them. Network forensic readiness on IoT/edge devices is a proactive approach to collecting evidence to assist with forensic examinations. This paper introduces the Network Forensic Readiness for Edge Devices (NetFoREdge) framework, focussing on deploying lightweight AI models in resource-constrained environments for attack detection, evidence collection, and preservation. The proposed lightweight AI-driven solution performed effectively on resource-constrained physical devices, namely a Raspberry Pi 3B and a Raspberry Pi Zero 2 W.
To evaluate the effectiveness of this approach, experiments have been conducted using two datasets: the recently released IoT network attack dataset, CICIoT2023, and the IoT-23 dataset. The experimental results are very encouraging - achieving an accuracy rate exceeding 99.60% and 99.98% for multiclassification on CICIoT2023 and IoT-23 datasets, respectively, and demonstrating the feasibility of network forensic readiness on IoT/edge devices with limited memory, storage, CPU usage, and power consumption.

## Perceptual Colour-based Geolocation of Human Trafficking Images for Digital Forensic Investigation

Canonical page: https://markscanlon.co/publications/PerceptualColour-BasedImageGeolocation.html
DOI: https://doi.org/10.1109/Cyber-RCI60769.2024.10941203
PDF: https://markscanlon.co/publications/PerceptualColour-BasedImageGeolocation.pdf
Authors: Jessica Herrmann; Opeyemi Bamigbade; John Sheppard; Mark Scanlon
Venue: 2024 Cyber Research Conference - Ireland (Cyber-RCI)
Publication date: 2024/11/01
Contribution summary: This study investigates the effectiveness of colour-based descriptors in Content-Based Image Retrieval (CBIR) for human trafficking image analysis. The research evaluates the impact of various parameters on image matching accuracy, achieving a Top-50 accuracy of over 95% on the Hotels-50K dataset. The approach demonstrates potential in advancing image analysis tools for human trafficking investigations and other contexts.
Abstract: This paper investigates the effectiveness of colour-based descriptors in Content-Based Image Retrieval (CBIR) and examines the impact of various parameters on image matching accuracy. The aim is to improve image retrieval methods to support digital forensic investigators in human trafficking cases. Colour values are used as key components to describe specific image characteristics and the technique is evaluated on the Hotels-50K dataset.
The method achieved a Top-50 accuracy of over 95%, enabling efficient data triage and significantly reducing the volume of images to be examined. Using 2 and 10 colour descriptors is found to optimise the balance between information richness and dimensionality reduction. Performance is further improved by optimised image selection, reducing false-positive rates, and increasing robustness. The approach demonstrates potential in advancing image analysis tools in human trafficking investigations and other contexts, opening new avenues for using colour values in crime detection and image data analysis. Future research may refine the Euclidean distance method used in the image similarities measure by introducing weighted distance measurements to reduce the impact of common colour values, and investigate lighting and saturation effects.

## Context Based Password Cracking Dictionary Expansion Using Generative Pre-trained Transformers

Canonical page: https://markscanlon.co/publications/ContextPasswordCrackingUsingGPTs.html
DOI: https://doi.org/10.1109/Cyber-RCI60769.2024.10939663
PDF: https://markscanlon.co/publications/ContextPasswordCrackingUsingGPTs.pdf
Authors: Greta Imhof; Aikaterini Kanta; Mark Scanlon
Venue: 2024 Cyber Research Conference - Ireland (Cyber-RCI)
Publication date: 2024/11/01
Contribution summary: This paper explores the effectiveness of combining a strategic contextual approach with large language models in password cracking. The authors create context-based password dictionaries through training PassGPT models with contextual information, demonstrating improved password cracking efficiency and accuracy.
Abstract: With the rise of online criminal activity leading to the increasing importance of digital forensics, efficient and effective password-cracking tools are necessary to collect evidence in a timely manner, leading to solved crimes.
Recent advances in machine learning and artificial intelligence have led to the development of context-based and large language model approaches, significantly improving the accuracy and efficiency of password cracking. This work focusses on these more modern techniques, specifically creating context-based password dictionaries through training a series of PassGPTs, a large language model capable of creating password candidates from leaked password dictionary lists. This paper explores possible improvements in password cracking techniques to help law enforcement agencies in digital forensic investigations by combining PassGPT with a contextual approach.

## A Comprehensive Evaluation on the Benefits of Context Based Password Cracking for Digital Forensics

Canonical page: https://markscanlon.co/publications/BenefitsOfContextBasedPasswordCracking.html
DOI: https://doi.org/10.1016/j.jisa.2024.103809
PDF: https://markscanlon.co/publications/BenefitsOfContextBasedPasswordCracking.pdf
Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon
Venue: Journal of Information Security and Applications
Publication date: 2024/06/01
Contribution summary: This paper evaluates the benefits of context-based password cracking for digital forensics, demonstrating that targeted approaches can increase the likelihood of success when contextual information is available. The study presents an experimental methodology and results section analyzing the approach's performance across ten datasets, proving the impact of context in password cracking.
Abstract: Password-based authentication systems have many weaknesses, yet they remain overwhelmingly used and their announced disappearance is still undated. The system admin overcomes the imperfection by skilfully enforcing a strong password policy and sane password management on the server side. But in the end, the user behind the password is still responsible for the password's strength.
A poor choice can have dramatic consequences for the user or even for the service behind, especially considering critical infrastructure. On the other hand, law enforcement can benefit from a suspect's weak decisions to recover digital content stored in an encrypted format. Generic password cracking procedures can support law enforcement in this matter - however, these approaches quickly demonstrate their limitations. This article proves that more targeted approaches can be used in combination with traditional strategies to increase the likelihood of success when contextual information is available and can be exploited.

## Revealing IoT Cryptographic Settings through Electromagnetic Side-Channel Analysis

Canonical page: https://markscanlon.co/publications/IoTCryptoEM-SCA.html
DOI: https://doi.org/10.3390/electronics13081579
PDF: https://markscanlon.co/publications/IoTCryptoEM-SCA.pdf
Authors: Muhammad Rusyaidi Zunaidi; Asanka Sayakkara; Mark Scanlon
Venue: Electronics
Publication date: 2024/04/01
Contribution summary: This study explores the application of Electromagnetic Side-Channel Analysis (EM-SCA) for non-invasively detecting cryptographic settings in IoT devices. The researchers used a machine learning-based approach to identify key lengths and algorithms employed in IoT devices, demonstrating a notable accuracy of 94.55% on combined AES and ECC data. This method has significant implications for digital forensic investigations, offering a novel approach for uncovering encrypted data's cryptographic settings.
Abstract: The advancement of cryptographic systems presents both opportunities and challenges in the realm of digital forensics. In an era where the security of digital information is crucial, the ability to non-invasively detect and analyse cryptographic configurations becomes significant. As cryptographic algorithms become more robust with longer key lengths, they provide higher levels of security.
However, non-invasive side channels, specifically through electromagnetic (EM) emanations, can expose confidential cryptographic details, thus presenting a novel solution to the pressing forensic challenge. This research delves into the capabilities of EM Side-Channel Analysis (EM-SCA) specifically focused on detecting both cryptographic key lengths and the algorithms employed, utilising a machine learning-based approach, which can be instrumental for digital forensic experts during their investigations. Data collection was carried out on an Arduino Nano board, which executed the Advanced Encryption Standard (AES) and Elliptic Curve Cryptography (ECC) algorithms. Specifically, the board was tested with key lengths of 128, 192, and 256 for AES and 160, 192, and 256 for ECC. A HackRF One software-defined radio (SDR) facilitated the capture of EM emissions. A pipeline was implemented to process raw EM data, extract frequency-domain features, and bucket this information for dimensionality reduction, enhancing its applicability for Machine Learning (ML). ML models, such as Logistic Regression, Random Forest, XGBoost, LightGBM and Support Vector Machine (SVM), were trained on this processed dataset to differentiate between key lengths. Training multiple ML models on this specific dataset yielded varying degrees of accuracy in differentiating between key lengths. In a combined data examination of AES and ECC, the SVM model emerged with an accuracy of 94.55%. When individually assessed on AES and ECC data, Logistic Regression performed best, with accuracies of 98.47% and 98.76%, respectively. SVM once again demonstrated its ability in binary classification tasks between AES and ECC, obtaining an accuracy of 95.97%. This study contributes significantly to enhancing digital forensic capabilities in encrypted data investigation, offering a methodological advancement for non-invasively uncovering cryptographic settings in IoT devices.
## A Framework for Integrated Digital Forensic Investigation Employing AutoGen AI Agents Canonical page: https://markscanlon.co/publications/DigitalForensicsAutoGenAI.html DOI: https://doi.org/10.1109/ISDFS60797.2024.10527235 PDF: https://markscanlon.co/publications/DigitalForensicsAutoGenAI.pdf Authors: Akila Wickramasekara; Mark Scanlon Venue: Proceedings of the 12th International Symposium on Digital Forensics and Security Publication date: 2024/04/01 Contribution summary: This paper proposes an integrated framework for digital forensic investigations employing AutoGen AI agents and Large Language Models (LLMs) to alleviate investigative workload and shorten the learning curve for investigators. The framework utilizes AI agents and LLMs to perform tasks articulated in natural language by a human agent, addressing the challenges of evolving requirements and information accuracy. Abstract: The increasing frequency and rapidity of criminal activities require faster digital forensic (DF) investigations. Currently, most DF phases involve manual procedures, requiring significant human effort and time, often facing evolving requirements. This paper proposes an integrated framework employing AutoGen Artificial Intelligence (AI) agents and Large Language Models (LLMs) such as LLAMA and StarCoder. The suggested framework utilizes AI agents and LLMs to perform tasks articulated in natural language by a human agent. The proposed architecture presents a significant advantage by alleviating the investigative workload and shortening the learning curve for investigators. However, it still carries risks such as information accuracy, hallucination impact, and legal barriers. Nevertheless, this research contributes to the ongoing discourse on optimizing DF processes in response to the evolving landscape of criminal activities and the corresponding demands placed on investigative resources.
## A Digital Forensic Methodology for Encryption Key Recovery from Black-Box IoT Devices Canonical page: https://markscanlon.co/publications/BlackBoxIoTEncryptionKeyRecovery.html DOI: https://doi.org/10.1109/ISDFS60797.2024.10527284 PDF: https://markscanlon.co/publications/BlackBoxIoTEncryptionKeyRecovery.pdf Authors: Muhammad Rusyaidi Zunaidi; Asanka Sayakkara; Mark Scanlon Venue: Proceedings of the 12th International Symposium on Digital Forensics and Security Publication date: 2024/04/01 Contribution summary: This paper presents a novel digital forensic methodology for recovering encryption keys from black-box IoT devices using electromagnetic side-channel analysis (EM-SCA). The approach leverages machine learning techniques to enhance the digital forensic process, reducing key space and mitigating investigative roadblocks. This automated, adaptable system preserves forensic evidence integrity and ensures wide applicability in the evolving IoT landscape. Abstract: In an era where digital data security is becoming all-pervasive, and data encryption is baked in by default on many consumer-level and commercial-level devices, the encryption of Internet of Things (IoT) devices presents a significant obstacle for lawful digital forensic investigation. Towards addressing this issue, this paper introduces a novel digital forensic methodology that leverages electromagnetic side-channel analysis (EM-SCA) for the non-invasive recovery of encryption keys from black-box IoT devices, i.e., where little/nothing is known about the device's encryption in advance. By reducing the key space necessary for brute-force decryption and employing machine-learning techniques, the proposed approach enhances the digital forensic process - helping to mitigate investigative roadblocks and case backlogs. This automated, adaptable system not only preserves the integrity of forensic evidence, but also ensures wide applicability within the evolving IoT landscape.
This practical methodology could prove invaluable for investigators facing the complexities of encrypted device analysis encountered during their cases. ## Ensuring Cross-Device Portability of Electromagnetic Side-Channel Analysis for Digital Forensics Canonical page: https://markscanlon.co/publications/CrossDevicePortabilityEMSCA.html DOI: https://doi.org/10.1016/j.fsidi.2023.301684 PDF: https://markscanlon.co/publications/CrossDevicePortabilityEMSCA.pdf Authors: Lojenaa Navanesan; Nhien-An Le-Khac; Mark Scanlon; Kasun De Zoysa; Asanka P. Sayakkara Venue: Forensic Science International: Digital Investigation Publication date: 2024/03/01 Contribution summary: This study investigates the cross-device portability of Electromagnetic Side-Channel Analysis (EM-SCA) for digital forensics, exploring its applicability to various smart devices. The authors experiment with different devices, including iPhones and Nordic Semiconductor nRF52-DK, and demonstrate the effectiveness of transfer learning techniques in achieving high accuracy. Abstract: Investigation on smart devices has become an essential subdomain in digital forensics. The inherent diversity and complexity of smart devices pose a challenge to the extraction of evidence without physically tampering with it, which is often a strict requirement in law enforcement and legal proceedings. Recently, this has led to the application of non-intrusive Electromagnetic Side-Channel Analysis (EM-SCA) as an emerging approach to extract forensic insights from smart devices. EM-SCA for digital forensics is still in its infancy, and has only been tested on a small number of devices so far. Most importantly, the question still remains whether Machine Learning (ML) models in EM-SCA are portable across multiple devices to be useful in digital forensics, i.e., cross-device portability. This study experimentally explores this aspect of EM-SCA using a wide set of smart devices. 
The experiments using various iPhones and Nordic Semiconductor nRF52-DK devices indicate that the direct application of pre-trained ML models across multiple identical devices does not yield optimal outcomes (under 20% accuracy in most cases). Subsequent experiments included collecting distinct samples of EM traces from all the devices to train new ML models with mixed device data; this also fell short of expectations (still below 20% accuracy). This prompted the adoption of transfer learning techniques, which showed promise for cross-model implementations. In particular, for the iPhone 13 and nRF52-DK devices, applying transfer learning techniques resulted in achieving the highest accuracy, with accuracy scores of 98% and 96%, respectively. This result makes a significant advancement in the application of EM-SCA to digital forensics by enabling the use of pre-trained models across identical or similar devices. ## DFRWS EU 10-Year Review and Future Directions in Digital Forensic Research Canonical page: https://markscanlon.co/publications/10YearReviewAndFutureDirectionsDigitalForensic.html DOI: https://doi.org/10.1016/j.fsidi.2023.301685 PDF: https://markscanlon.co/publications/10YearReviewAndFutureDirectionsDigitalForensic.pdf Authors: Frank Breitinger; Jan-Niclas Hilgert; Christopher Hargreaves; John Sheppard; Rebekah Overdorf; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2024/03/01 Contribution summary: This study surveys 135 peer-reviewed articles published at the Digital Forensics Research Conference Europe (DFRWS EU) from 2014 to 2023, analyzing co-authorships, geographical spread, and citation metrics to inform future research directions in digital forensic research. 
Abstract: Conducting a systematic literature review and comprehensive analysis, this paper surveys all 135 peer-reviewed articles published at the Digital Forensics Research Conference Europe (DFRWS EU) spanning the decade since its inaugural running (2014-2023). This comprehensive study of DFRWS EU articles encompasses sub-disciplines such as digital forensic science, device forensics, techniques and fundamentals, artefact forensics, multimedia forensics, memory forensics, and network forensics. Quantitative analysis of the articles' co-authorships, geographical spread and citation metrics is outlined. The analysis presented offers insights into the evolution of digital forensic research efforts over these ten years and informs some identified future research directions. ## DFPulse: The 2024 digital forensic practitioner survey Canonical page: https://markscanlon.co/publications/DFPulse2024DigitalForensicPractitionerSurvey.html DOI: https://doi.org/10.1016/j.fsidi.2024.301844 PDF: https://markscanlon.co/publications/DFPulse2024DigitalForensicPractitionerSurvey.pdf Authors: Christopher Hargreaves; Frank Breitinger; Liz Dowthwaite; Helena Webb; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2024/01/01 Contribution summary: This paper presents the results of the largest digital forensic practitioner survey to date, DFPulse, conducted in 2024. The survey collected data from 122 practitioners worldwide, providing insights into their operating environments, technologies used, challenges faced, and future research directions. The study aims to improve collaboration between academia and practitioners, addressing the gap between research and practice in digital forensics. Abstract: This paper reports on the largest survey of digital forensic practitioners to date (DFPulse) conducted from March to May 2024 resulting in 122 responses.
The survey collected information about practitioners' operating environments, the technologies they encounter, investigative techniques they use, the challenges they face, the degree to which academic research is accessed and useful to the practitioner community, and their suggested future research directions. The paper includes quantitative and qualitative results from the survey and a discussion of the implications for academia, the improvements that can be made, and future research directions. ## An Evaluation of AI-Based Network Intrusion Detection in Resource-Constrained Environments Canonical page: https://markscanlon.co/publications/AIIntrusionDetectionResourceConstrained.html DOI: https://doi.org/10.1109/UEMCON59035.2023.10315971 PDF: https://markscanlon.co/publications/AIIntrusionDetectionResourceConstrained.pdf Authors: Syed Rizvi; Mark Scanlon; Jimmy McGibney; John Sheppard Venue: 14th Annual IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference (IEEE UEMCON) Publication date: 2023/10/01 Contribution summary: This paper evaluates AI-based network intrusion detection in resource-constrained environments, proposing a novel approach that trains and deploys AI models on resource-constrained devices. The approach achieves high classification accuracy, identifying and recording potential malicious attacks in real-time with minimal overhead. Abstract: Internet of Things (IoT) and edge computing devices have become integral to corporate and industrial systems. These devices are prime targets for attackers due to their constant availability and potential access to sensitive data. Handling substantial data quantities, these devices pose challenges in identifying relevant forensic evidence and investigating abnormal activities. Thus, accurate network intrusion detection is crucial in these resource-constrained environments. In addition, robust IoT forensic readiness strategies are vital for effective investigation. 
Unlike traditional computer forensic readiness, these strategies adapt to heterogeneous architectures. This paper evaluates an approach that directly trains and deploys AI models on resource-constrained devices, securing networks and categorizing significant traffic for later investigation. The approach identifies and records potential malicious attacks in real-time with minimal overhead, suitable for constrained environments. The experimentation employed the IoT-23 dataset. The outcome of the evaluation revealed that each of the included algorithms achieved a classification accuracy of over 99% on a representative resource-constrained device. ## Context-Based Password Cracking for Digital Investigation Canonical page: https://markscanlon.co/publications/PhDThesis-ContextBasedPasswordCrackingForDigitalInvestigation.html PDF: https://markscanlon.co/publications/PhDThesis-ContextBasedPasswordCrackingForDigitalInvestigation.pdf Authors: Aikaterini Kanta Venue: School of Computer Science, University College Dublin Publication date: 2023/06/01 Contribution summary: This thesis presents a context-based password cracking approach for digital investigation, introducing a methodology and framework for creating and assessing custom dictionary wordlists for dictionary-based password cracking attacks. The approach leverages contextual information to generate bespoke password candidate lists, achieving significant improvements over traditional approaches, with over 50% improvement in some instances. Abstract: Passwords have been the prevailing method of authentication since their inception more than 50 years ago, a trend which has no signs of slowing down in the foreseeable future. Despite alternative authentication methods being developed later, it is reasonable to assume that this prevailing authentication method will not fall out of popularity anytime soon. 
Passwords are an integral part of the security of digital persons, systems and critical data, and yet, they often remain the weakest entry point to a digital system. The conundrum has driven both the efforts of system administrators to nudge users to choose stronger, safer passwords and elevated the sophistication of the password cracking methods chosen by their adversaries. The system administrator often overcomes the imperfection by skilfully enforcing strong password policies and dutiful password management on the side of the server. But at the end, the user behind the password is still responsible for the password’s strength. A poor choice can have dramatic consequences for the user or even for the service behind, especially considering critical infrastructure. A password itself is indeed an extension of its creator and therefore can be exploited by malicious actors leveraging available contextual information about a target password creator. On the other hand, law enforcement can benefit from a suspect’s weak decisions to recover digital content stored in an encrypted format. Generic password cracking procedures can support law enforcement in this matter - however, these approaches quickly demonstrate their limitations. Recent research has hinted at the influence that context can have on a user during his/her password selection. This information could be of significant added value when digital investigators need to target a specific user or group of users during a criminal investigation. The connection between the password and its creator has given rise to advanced techniques aimed at exploiting user habits for password cracking. Such techniques are often generic approaches that leverage large datasets of human-created passwords. This thesis aims to investigate the hypothesis that bespoke password candidate lists, generated based on available contextual information, can positively impact the password cracking process. 
For this, a methodology and framework for creating and assessing custom dictionary wordlists for dictionary-based password cracking attacks are introduced, with a specific focus on leveraging contextual information. Furthermore, a detailed explanation of the framework’s implementation is provided, and the benefits of the approach are demonstrated with the use of test cases. This work also introduces techniques for optimising the generation of the bespoke dictionaries, ranking the password candidates in order to maximise the chance of early success. The aim of the proposed approach is to support digital forensic investigators in their criminal investigation - especially when time is of the essence. This approach achieved very promising improvements over existing, traditional approaches in isolation - more than 50 per cent improvement in some instances. This result proves that more targeted approaches can be used in combination with the traditional strategies to increase the likelihood of success when contextual information is available and can be exploited. ## Harder, Better, Faster, Stronger: Optimising the Performance of Context-Based Password Cracking Dictionaries Canonical page: https://markscanlon.co/publications/OptimisingPasswordCrackingDictionaries.html DOI: https://doi.org/10.1016/j.fsidi.2023.301507 PDF: https://markscanlon.co/publications/OptimisingPasswordCrackingDictionaries.pdf Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2023/03/01 Contribution summary: This paper presents a methodology for optimising and ranking contextual wordlists for password cracking, tailored to the suspect in a digital forensic investigation. The approach is evaluated with data leaks from compromised online communities, demonstrating its effectiveness in finding passwords not recovered by traditional methods. 
Abstract: Passwords have been the prevailing method of authentication since their inception more than 50 years ago, a trend which has no signs of slowing down in the foreseeable future. They are an integral part of the security of digital persons, systems and critical data, and yet, they often remain the weakest entry point to a digital system. A password itself is indeed an extension of its creator and therefore can be exploited by malicious actors leveraging available contextual information about a target password creator. Recent research has shown that bespoke password candidate lists, generated based on available contextual information, can positively impact the password cracking processes. This paper introduces an innovative methodology for composing a contextual wordlist and ranking the password candidates in order to maximise the chance of early success. The aim of the proposed approach is to support digital forensic investigators in their criminal investigation - especially when time is of the essence. This paper describes the implementation of this methodology and provides an overview of several experimental results demonstrating the advantages of this approach. These results demonstrate that by going through a harder, more rigorous password candidate selection process, better dictionaries can be generated that, in a faster timeframe, can crack stronger passwords. ## Digital forensic investigation in the age of ChatGPT Canonical page: https://markscanlon.co/publications/ChatGPT.html DOI: https://doi.org/10.1016/j.fsidi.2023.301543 PDF: https://markscanlon.co/publications/ChatGPT.pdf Authors: Mark Scanlon; Bruce Nikkel; Zeno Geradts Venue: Forensic Science International: Digital Investigation Publication date: 2023/03/01 Contribution summary: This editorial discusses the implications of ChatGPT on digital forensic investigation, highlighting both beneficial use cases and potential risks. 
It explores the use of Large Language Models (LLMs) in generating scripts, question answering, multilingual analysis, and automated sentiment analysis, while also addressing concerns about bias, errors, and overreliance on these systems. Abstract: Large Language Models (LLMs), e.g., BERT, GPT-3, GPT-4, LLaMA, etc., have gained public notoriety in recent months with the advent of OpenAI's ChatGPT. Since its public launch in November 2022, professionals across a broad range of disciplines have evaluated its potential implications and disruptions to their respective fields. Schools and universities the world over are discussing the implications of ChatGPT for the trustworthiness of student assignment and exam submissions, and many have already opted to return to traditional pen and paper examinations. This editorial outlines a number of the beneficial use cases and associated risks for the technology in digital forensic investigation. ## ChatGPT for digital forensic investigation: The good, the bad, and the unknown Canonical page: https://markscanlon.co/publications/ChatGPTforDigitalForensics.html DOI: https://doi.org/10.1016/j.fsidi.2023.301609 PDF: https://markscanlon.co/publications/ChatGPTforDigitalForensics.pdf Authors: Mark Scanlon; Frank Breitinger; Christopher Hargreaves; Jan-Niclas Hilgert; John Sheppard Venue: Forensic Science International: Digital Investigation Publication date: 2023/01/01 Contribution summary: This paper assesses the impact of ChatGPT on digital forensics, evaluating its capabilities and risks in various use cases, including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. The study highlights both the potential benefits and limitations of using ChatGPT in digital forensic investigations, concluding that it can be a useful supporting tool for knowledgeable users but requires careful consideration of its strengths and weaknesses.
Abstract: The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances. ## Deep Learning Based Network Intrusion Detection System for Resource-Constrained Environments Canonical page: https://markscanlon.co/publications/DLNIDS.html DOI: https://doi.org/10.1007/978-3-031-36574-4_21 PDF: https://markscanlon.co/publications/DLNIDS.pdf Authors: Syed Rizvi; Mark Scanlon; Jimmy McGibney; John Sheppard Venue: The 13th EAI International Conference on Digital Forensics and Cyber Crime Publication date: 2022/11/01 Contribution summary: This paper presents a deep learning-based network intrusion detection system (IDS) for resource-constrained environments. 
The proposed 1D-Dilated Causal Neural Network (1D-DCNN) model achieves high accuracy in detecting malicious attacks, outperforming existing deep learning approaches. The model's efficiency and effectiveness make it suitable for resource-constrained environments. Abstract: Network intrusion detection systems (IDS) examine network packets and alert system administrators and investigators to low-level security violations. In large networks, these reports become unmanageable. To create a flexible and effective intrusion detection system for unpredictable attacks, there are several challenges to overcome. Much work has been done on the use of deep learning techniques in IDS; however, substantial computational resources and processing time are often required. In this paper, a 1D-Dilated Causal Neural Network (1D-DCNN) based IDS for binary classification is employed. The dilated convolution with a dilation rate of 2 is introduced to compensate for the max pooling layer, preventing the information loss imposed by pooling and downsampling. The dilated convolution can also expand its receptive field to gather additional contextual data. To assess the efficacy of the suggested solution, experiments were conducted on two popular publicly available datasets, namely CIC-IDS2017 and CSE-CIC-IDS2018. Simulation outcomes show that the 1D-DCNN based method outperforms some existing deep learning approaches in terms of accuracy. The proposed model attained a high precision with malicious attack detection rate accuracy of 99.7% for CIC-IDS2017 and 99.98% for CSE-CIC-IDS2018. ## Data Exfiltration through Electromagnetic Covert Channel of Wired Industrial Control Systems Canonical page: https://markscanlon.co/publications/DataExfiltrationEM-SCA.html DOI: https://doi.org/10.3390/app13052928 PDF: https://markscanlon.co/publications/DataExfiltrationEM-SCA.pdf Authors: Shakthi Sachintha; Nhien-An Le-Khac; Mark Scanlon; Asanka P.
Sayakkara Venue: Applied Sciences Publication date: 2022/10/01 Contribution summary: This study demonstrates a novel attack vector on industrial control systems (ICS) that leverages electromagnetic (EM) radiation from wired Ethernet connections to exfiltrate sensitive information. The attack exploits compromised firmware to encode data into packet transmission patterns, which are then captured and demodulated by an attacker's software-defined radio. This covert channel facilitates data exfiltration from up to two metres away with a 10 bps data rate. Abstract: Industrial control systems (ICS) often contain sensitive information related to the corresponding equipment being controlled and their configurations. Protecting such information is important to both the manufacturers and users of such ICSs. This work demonstrates an attack vector on industrial control systems where information can be exfiltrated through an electromagnetic (EM) radiation covert channel from the wired Ethernet connections commonly used by these devices. The attack leverages compromised firmware for the controller, capable of encoding sensitive/critical information into the wired network as packet transmission patterns. The EM radiation from the wired network’s communication is captured without direct physical interaction using a portable software-defined radio, and subsequently demodulated on the attacker’s computer. This covert channel facilitates the exfiltration of data from a distance of up to two metres with a data rate of 10 bps without any significant data loss. The nature of this covert channel demonstrates that having strong firewalls and network security.
## Application of Artificial Intelligence to Network Forensics: Survey, Challenges and Future Directions Canonical page: https://markscanlon.co/publications/AIforNetworkForensics.html DOI: https://doi.org/10.1109/ACCESS.2022.3214506 PDF: https://markscanlon.co/publications/AIforNetworkForensics.pdf Authors: Syed Rizvi; Mark Scanlon; Jimmy McGibney; John Sheppard Venue: IEEE Access Publication date: 2022/10/01 Contribution summary: This paper provides a comprehensive survey of the application of artificial intelligence (AI) in network forensics, including expert systems, machine learning, deep learning, and ensemble/hybrid approaches. It discusses the current challenges and future directions in network forensics, covering various application areas such as network traffic analysis, intrusion detection systems, and Internet-of-Things devices. Abstract: Network forensics focuses on the identification and investigation of internal and external network attacks, the reverse engineering of network protocols, and the uninstrumented investigation of networked devices. It lies at the intersection of digital forensics, incident response and network security. Network attacks exploit software and hardware vulnerabilities and communication protocols. The scope of a network forensic investigation can range from Internet-wide down to a single device’s network traffic. Network analysis tools (NATs) aid security professionals and law enforcement in the capturing, identification and analysis of network traffic. However, in most instances, the sheer volume of data to be analyzed is enormous and, despite some built-in NAT automation, the investigation of network traffic is often an arduous process. Furthermore, significant expert time remains wasted in the investigation of a high frequency of false positive alerting from automated systems. 
To address this globally impacting problem, artificial intelligence based approaches are becoming increasingly employed to automatically detect attacks and increase network traffic classification accuracy. This paper provides a comprehensive survey of the state-of-the-art in network forensics and the application of expert systems, machine learning, deep learning, and ensemble/hybrid approaches to a range of application areas in the field. These include network traffic analysis, intrusion detection systems, Internet-of-Things devices, cloud forensics, DNS tunneling, smart grid forensics, and vehicle forensics. In addition, the current challenges and future research directions for each of the aforementioned application areas are discussed. ## Security, Ethics and Privacy Issues in Remote Extended Reality for Education Canonical page: https://markscanlon.co/publications/SecurityEthicsXREducation.html DOI: https://doi.org/10.1007/978-981-99-4958-8_16 Authors: Muhammad Zahid Iqbal; Xuanhui Xu; Vivek Nallur; Mark Scanlon; Abraham G. Campbell Venue: Mixed Reality for Education Publication date: 2022/06/01 Contribution summary: This chapter explores security, ethics, and privacy concerns in remote extended reality learning environments, highlighting the need for a comprehensive approach to address these issues in immersive education. Abstract: This chapter examines security, ethics, and privacy considerations for remote extended reality learning environments.
## A Novel Dictionary Generation Methodology for Contextual-Based Password Cracking Canonical page: https://markscanlon.co/publications/MethodologyContextual-BasedPasswordCracking.html DOI: https://doi.org/10.1109/ACCESS.2022.3179701 PDF: https://markscanlon.co/publications/MethodologyContextual-BasedPasswordCracking.pdf Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon Venue: IEEE Access Publication date: 2022/06/01 Contribution summary: This paper introduces a novel dictionary generation methodology for contextual-based password cracking, enabling the creation of custom dictionary word lists for dictionary-based password cracking attacks. The approach leverages contextual information encountered during an investigation, such as user habits and personal information, to generate targeted password candidates. This methodology has the potential to expedite password cracking processes in law enforcement investigations. Abstract: It has been more than 50 years since the concept of passwords was introduced and adopted in our society as a digital authentication method. Despite alternative authentication methods being developed later, it is reasonable to assume that this prevailing authentication method will not fall out of popularity anytime soon. Naturally, each password is closely connected to its creator. This connection has given rise to advanced techniques aimed at exploiting user habits for password cracking. Such techniques are often generic approaches that leverage large datasets of human-created passwords. Recent research has underlined the influence that context can have during password selection for a user. This information could be of significant added value when digital investigators need to target a specific user or group of users during a criminal investigation. There are no automated approaches that can extract and utilize contextual information during the password cracking processes. 
In this paper, a methodology and framework for creating custom dictionary word lists for dictionary-based password cracking attacks are introduced, with a specific focus on leveraging contextual information encountered during an investigation. Furthermore, a detailed explanation of the framework’s implementation is provided, and the benefits of the approach are demonstrated with the use of test cases.

## PCWQ: A Framework for Evaluating Password Cracking Wordlist Quality

Canonical page: https://markscanlon.co/publications/PasswordCrackingWordlistQuality.html
DOI: https://doi.org/10.1007/978-3-031-06365-7_10
PDF: https://markscanlon.co/publications/PasswordCrackingWordlistQuality.pdf
Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon
Venue: The 12th EAI International Conference on Digital Forensics and Cyber Crime
Publication date: 2021/12/01
Contribution summary: This paper presents PCWQ, a novel framework for evaluating the quality of password cracking wordlists. The framework assesses wordlists based on several interconnecting metrics, including final percentage of passwords cracked, number of guesses until target, progress over time, size of wordlist, and better performance with stronger passwords. The authors conduct a preliminary analysis to demonstrate the framework's evaluation process.
Abstract: The persistence of the single password as a method of authentication has driven both the efforts of system administrators to nudge users to choose stronger, safer passwords and elevated the sophistication of the password cracking methods chosen by their adversaries. In this constantly moving landscape, the use of wordlists to create smarter password cracking candidates begs the question of whether there is a way to assess which is better. In this paper, we present a novel modular framework to measure the quality of input wordlists according to several interconnecting metrics.
Furthermore, we have conducted a preliminary analysis where we assess different input wordlists to showcase the framework's evaluation process.

## Identifying Internet of Things Software Activities using Deep Learning-based Electromagnetic Side-Channel Analysis

Canonical page: https://markscanlon.co/publications/IoT-DL-EMSCA.html
DOI: https://doi.org/10.1016/j.fsidi.2021.301308
PDF: https://markscanlon.co/publications/IoT-DL-EMSCA.pdf
Authors: Quan Le; Luis Miralles-Pechuán; Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2021/12/01
Contribution summary: This study explores the application of machine learning techniques to identify complex activities on IoT devices using electromagnetic side-channel analysis. The researchers created a dataset by running ten sorting algorithms on an Arduino device and used it to train various classification models, including deep learning models. The results show that convolutional neural networks can accurately predict the activity being executed with a high level of accuracy (99.6%).
Abstract: Internet of Things (IoT) is becoming the new frontier in digital forensics due to the abundance of IoT devices appearing in day-to-day life. The diversity and complexity of IoT ecosystems pose a considerable challenge to digital investigators that demand novel approaches. Electromagnetic side-channel analysis (EM-SCA) has been proposed as a promising window to gather forensically useful information from IoT devices. Machine Learning (ML) techniques are instrumental when performing EM-SCA on IoT devices. Our work aims to investigate how machine learning can be applied to accurately identify complex activities on IoT devices from their generated electromagnetic noises. To this end, a range of classification models were created, including deep learning models, to predict the activity from the electromagnetic noise emitted while the device performed the activities.
A dataset was generated by using ten different well-known sorting algorithms with diverse computational time complexities and running them on an Arduino Leonardo device to represent a low-powered IoT device. The algorithms were continually sorting arrays of 100 elements randomly generated in ascending order. Experiments were conducted to identify which ML methods performed better with the generated data sets. Furthermore, more experiments were conducted to identify how the methods perform depending on the window size of raw samples and the number of examples against which they are trained. From the experimental results, it is possible to predict which activity is being executed with a high level of accuracy (99.6%) with a convolutional neural network (CNN). It was also found that Random Forests (RF) and Deep Learning (DL) are suitable ML models for making predictions with EM-SCA.

## How Viable is Password Cracking in Digital Forensic Investigation? Analyzing the Guessability of over 3.9 Billion Real-World Accounts

Canonical page: https://markscanlon.co/publications/PasswordCracking3BillionAccounts.html
DOI: https://doi.org/10.1016/j.fsidi.2021.301186
PDF: https://markscanlon.co/publications/PasswordCracking3BillionAccounts.pdf
Authors: Aikaterini Kanta; Sein Coray; Iwen Coisel; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2021/07/01
Contribution summary: This study analyzed over 3.9 billion real-world passwords to assess their guessability and identify patterns in password construction. The analysis reveals that certain semantic classes are more common than others, indicating the importance of user context in password selection. The study also evaluates the effectiveness of password cracking tools and techniques, providing insights for digital investigators.
Abstract: Passwords have been and still remain the most common method of authentication in computer systems.
These systems are therefore privileged targets of attackers, and the number of data breaches in the last few years attests to that. A detailed analysis of such data can provide insight on password trends and patterns users follow when they create a password. To this end, this paper presents the largest and most comprehensive analysis of real-world passwords to date - associated with over 3.9 billion accounts from Have I Been Pwned. This analysis includes statistics on use and most common patterns found in passwords and innovates with a breakdown of the constituent fragments that make each password. Furthermore, a classification of these fragments according to their semantic meaning provides insight on the role of context in password selection. Finally, we provide an in-depth analysis on the guessability of these real-world passwords.

## Digital Forensics: Leveraging Deep Learning Techniques in Facial Images to Assist Cybercrime Investigations

Canonical page: https://markscanlon.co/publications/PhDThesis-DeepLearningFacialImageCybercrime.html
PDF: https://markscanlon.co/publications/PhDThesis-DeepLearningFacialImageCybercrime.pdf
Authors: Felix Anda
Venue: School of Computer Science, University College Dublin
Publication date: 2021/05/01
Contribution summary: This PhD thesis presents a novel approach to facial age estimation using deep learning techniques to assist cybercrime investigations. The research addresses the digital forensic backlog by proposing age estimation models that surpass the state-of-the-art facial age detectors for subjects under 25. The study evaluates the performance of various image pre-processing techniques, neural network architectures, and hyper-parameter optimisation strategies.
Abstract: We are living in a digital era where most transactions are contact-less, social media platforms are commonplace and a part of our daily life is recorded either in a permissive or surreptitious manner.
Whether we are present in an online meeting, daily social media feed, a peer-connected calendar, a live gaming or video stream, hundreds of bytes of our information are sent through a network to a server. The exponential growth of storage is also enabling thousands of multimedia content to be stored locally on digital devices but at the same time challenging digital investigations that are hampered by the accumulation of such devices that were stored in a forensic laboratory awaiting to be processed by an expert in a timely manner. The size and amount of information that requires analysis is increasing, leading to an ungovernable digital forensic backlog. Smartphone users are able to produce original content such as audio, images and videos, and thanks to the internet, are able to broadcast data worldwide in a matter of seconds. Digital forensic practitioners have become overwhelmed by the amount of data that they encounter and are requiring the implementation of artificial intelligence as tools and techniques to aid investigations, to discover, gather and analyse records swiftly. To address the digital forensic backlog, the creation of age estimation models to assist digital forensic investigations has been proposed. Although some models perform well for the entire age range, in certain age ranges such as the underage group, the models perform wholly inadequately. Influencing factors on underage age estimation have been evaluated and it has been determined that certain elements have strong, mild or weak correlations with the machine-predicted performance. These considerations are key on the curation of datasets and will yield better results on future trained models. The largest underage dataset with age and gender labels has been collected and several models have been experimented with different image pre-processing techniques, neural network architectures, etc. Hyper-parameter optimisation was introduced and the best score for facial age estimation was obtained.
The scores were evaluated with a chosen test dataset that contains faces that can be spotted by well-known face detectors, such as Viola Jones. A novel facial embedding approach was proposed and a distribution evaluation metric was introduced instead of a single value. The performance achieved surpasses the state-of-the-art facial age detectors for subjects under the age of 25.

## Vec2UAge: Enhancing Underage Age Estimation Performance through Facial Embeddings

Canonical page: https://markscanlon.co/publications/Vec2UAge.html
DOI: https://doi.org/10.1016/j.fsidi.2021.301119
PDF: https://markscanlon.co/publications/Vec2UAge.pdf
Authors: Felix Anda; Edward Dixon; Elias Bou-Harb; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2021/03/01
Contribution summary: This paper presents Vec2UAge, a novel regression-based model for estimating the age of underage individuals from facial embeddings. The model is trained on the VisAGe and Selfie-FV datasets and achieves a mean absolute error rate of 2.36 years. The authors evaluate the impact of random initializations, optimizers, and learning rates on the model's performance.
Abstract: Automated facial age estimation has drawn increasing attention in recent years. Several applications relevant to digital forensic investigations include the identification of victims, suspects and missing children, and the decrease of investigators' exposure to psychologically impacting material. Nevertheless, due to the lack of accurately labelled age datasets, particularly for the underage age range, sufficient performance accuracy remains a major challenge in the field of age estimation. To address the problem, a novel regression-based model was created, Vec2UAge. FaceNet embeddings were extracted and used as feature vectors to train the model from the VisAGe and Selfie-FV datasets. A balanced, unbiased dataset was created for testing and validation.
Data augmentation techniques were evaluated to further be used to expand the training dataset. The learning rate (lr) is one of the most important hyper-parameters for deep neural networks; a cyclic learning rate approach was used to find the optimal initial value for lr and the performance was evaluated. The distribution of model performance was presented per optimiser and one of the winning models with a Stochastic Weight Averaging (SWA) optimised training run reached a mean absolute error rate as low as 2.36 years. Additionally, the time of convergence using SWA was significantly faster than other optimisers evaluated, i.e., ADAGRAD, ADAM and Stochastic Gradient Descent. The evaluation model metric is presented in a form of a distribution rather than a single value, giving more insights into the effects of the random initialisations, optimisers and the learning rate on the outcome.

## TraceGen: User Activity Emulation for Digital Forensic Test Image Generation

Canonical page: https://markscanlon.co/publications/TraceGen.html
DOI: https://doi.org/10.1016/j.fsidi.2021.301133
PDF: https://markscanlon.co/publications/TraceGen.pdf
Authors: Xiaoyu Du; Christopher Hargreaves; John Sheppard; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2021/03/01
Contribution summary: This paper presents TraceGen, an automated system for generating realistic digital forensic test images through user activity emulation. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. TraceGen aims to address the issue of emulating user activities and behaviours, ensuring forensically realistic traces are created in the resulting test images.
Abstract: Digital forensic test images are commonly used across a variety of digital forensic use cases including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and in some cases legal considerations of working with individuals' personal data. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a 'story' to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed in executing the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or 'background noise' on the suspect device, meaning the resulting disk images are inherently limited and to a certain extent simplistic. This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user's behaviour on a Windows system to generate realistic and comprehensive artefacts. These actions can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, evaluation is also performed in terms of the ability to produce background artefacts at scale, and also the realism of the artefacts compared with their human-generated counterparts.
## A Comparative Study of Support Vector Machine and Neural Networks for File Type Identification Using n-gram Analysis

Canonical page: https://markscanlon.co/publications/FileIdentification.html
DOI: https://doi.org/10.1016/j.fsidi.2021.301121
PDF: https://markscanlon.co/publications/FileIdentification.pdf
Authors: Joachim Sester; Darren Hayes; Nhien-An Le-Khac; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2021/03/01
Contribution summary: This study compares the performance of Support Vector Machines (SVMs) and Neural Networks (NNs) for file type identification using n-gram analysis. The authors investigate the influence of input parameters, such as learning rate and n-gram values, on the results and compare the scalability of SVMs and NNs. The study finds that SVM-based approaches perform better than NNs, but their scalability is still a challenge.
Abstract: File type identification (FTI) has become a major discipline for anti-virus developers, firewall designers and for forensic cybercrime investigators. Over the past few years, research has seen the introduction of several classifiers and features. One of these advances is the so-called n-grams analysis, which is an interpretation of statistical counting in fragments classified. Recently, n-grams based approaches were already successfully combined with computational intelligence classifiers. However, the academic body of literature is scant when it comes to a comprehensive explanation of machine learning based approaches such as neural networks (NN) or support vector machines (SVM). For example, how the input parameters, including learning rate, different values of n for n-grams, etc. influence the results. In addition, very few studies have compared the scalability of NN vs. SVM approaches. Therefore, a systematic research in comparing different approaches is needed to address these questions.
Hence, this paper investigates this type of comparison, by focusing on the n-gram analysis as a feature for the two different classifiers: SVMs and NNs. This paper details our experiments with two NNs and four SVMs, using linear kernels and RBF kernels on RealDC datasets. In general, we found that SVM-based approaches performed better than the NN, but their scalability is still a challenge.

## On Offloading Network Forensic Analytics to Programmable Data Plane Switches

Canonical page: https://markscanlon.co/publications/NetworkForensicAnalytics.html
PDF: https://markscanlon.co/publications/NetworkForensicAnalytics.pdf
Authors: Kurt Friday; Elias Bou-Harb; Jorge Crichigno; Mark Scanlon; Nicole Beebe
Venue: Book Series: World Scientific Series in Digital Forensics and Cybersecurity
Publication date: 2021/01/01
Contribution summary: This paper proposes a novel approach to network forensic analytics by leveraging programmable data plane switches to detect and mitigate Distributed Denial of Service (DDoS) attacks and Internet of Things (IoT) device misuse. The authors implement two switch-based use cases to conduct network forensics at line rate, reducing latency and improving incident response.
Abstract: The extent to which cyber crimes are now being executed has reached a frequency that has never been observed before. To detect these events and extract relevant network artifacts for investigations, network forensics has long been the de-facto approach. However, the time and data storage necessary to perform traditional forensic procedures has put investigators at odds, often resulting in substantial artifact extraction latency and poor incident response. To mitigate what have now become inherent pitfalls for the forensics community, we propose a novel means of transforming network forensics to a procedure that functions at line rate, while the event of interest is taking place, by harnessing the new-found programmable switch technology.
Amid the prevailing cybercrime themes dominating today’s headlines are Distributed Denial of Service (DDoS) activities and the misuse of Internet of Things (IoT) devices. To this end, we implement two switch-based use cases for conducting the relevant network forensics associated with each of these classes of misdemeanors, respectively. In particular, the first use case employs dynamic thresholds generated from real-time artifact statistics extracted by the switch to infer contemporary DDoS attacks. The empirical results confirm that the proposed approach mitigates UDP amplification at line rate and SYN flooding attacks within a fraction of a second. Moreover, the complete remediation time of slow DDoS is reduced from near 10 seconds down to 2 seconds. The second use case instruments the switch with a rule-based Projective Adaptive Resonance Theory (PART) algorithm to accurately fingerprint the origin IoT device of network traffic from a single TCP packet at line rate. We also provide a methodology for automating the translation of such rule-based Machine Learning (ML) output to P4 programs, thereby enabling its deployment without the need for additional background expertise. The proposed fingerprinting engine was evaluated against a dataset consisting of devices of both IoT and non-IoT in nature. The results indicate that such devices can be fingerprinted with 99% accuracy. It is our hope that the research undertaken herein not only aids in the conducting of efficient and effective network forensic procedures associated with DDoS attacks and IoT devices but also in promoting the utilization of programmable switches in future forensic research endeavors. Furthermore, we expect that the proposed approach’s automated translation of rule-based classifiers into P4 code will provoke the subsequent harnessing of ML’s pattern recognition abilities for enhancing a number of other network forensic tasks on the switch.
## A Survey Exploring Open Source Intelligence for Smarter Password Cracking

Canonical page: https://markscanlon.co/publications/SurveyOSINTPasswordCracking.html
DOI: https://doi.org/10.1016/j.fsidi.2020.301075
PDF: https://markscanlon.co/publications/SurveyOSINTPasswordCracking.pdf
Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon
Venue: Forensic Science International: Digital Investigation
Publication date: 2020/12/01
Contribution summary: This paper explores the potential of Open Source Intelligence (OSINT) for more efficient password cracking in digital investigations. A comprehensive survey of password strength, cracking, and OSINT is presented, along with an analysis of password structure and demographic factors influencing password selection. The authors discuss the challenges of password cracking and the potential impact of OSINT on law enforcement.
Abstract: From the end of the last century to date, consumers are increasingly living their lives online. In today’s world, the average person spends a significant proportion of their time connecting with people online through multiple platforms. This online activity results in people freely sharing an increasing amount of personal information, as well as having to manage how they share that information. For law enforcement, this corresponds to a slew of new sources of digital evidence valuable for digital forensic investigation. A combination of consumer level encryption becoming default on personal computing and mobile devices and the need to access information stored with third parties has resulted in a need for robust password cracking techniques to progress lawful investigation. However, current password cracking techniques are expensive, time-consuming processes that are not guaranteed to be successful in the time-frames common for investigations. In this paper, the potential for Open Source Intelligence (OSINT) being leveraged for more efficient password cracking is explored.
A comprehensive survey of the literature on password strength, password cracking, and OSINT is outlined, and the law enforcement challenges surrounding these topics are discussed. Additionally, an analysis on password structure as well as demographic factors influencing password selection is presented. Finally, the potential impact of OSINT to password cracking by law enforcement is discussed.

## Retracing the Flow of the Stream: Investigating Kodi Streaming Services

Canonical page: https://markscanlon.co/publications/Kodi-XBMC-Forensics.html
DOI: https://doi.org/10.1007/978-3-030-68734-2_13
PDF: https://markscanlon.co/publications/Kodi-XBMC-Forensics.pdf
Authors: Samuel Todd Bromley; John Sheppard; Mark Scanlon; Nhien-An Le-Khac
Venue: The 11th EAI International Conference on Digital Forensics and Cyber Crime
Publication date: 2020/09/01
Contribution summary: This paper presents a new method for quickly locating Kodi artifacts and gathering information for successful prosecution of digital piracy and streaming of illegal content. The approach is evaluated on Windows, Android, and Linux platforms, demonstrating the location of file artifacts, databases, and viewed content history.
Abstract: Kodi is one of the world's largest open-source streaming platforms for viewing video content. Easily installed Kodi add-ons facilitate access to online pirated videos and streaming content by facilitating the user to search and view copyrighted videos with a basic level of technical knowledge. In some countries, there have been paid child sexual abuse organizations publishing/streaming child abuse material to an international paying clientele. Open source software used for viewing videos from the Internet, such as Kodi, is being exploited by criminals to conduct their activities. In this paper, we describe a new method to quickly locate Kodi artifacts and gather information for a successful prosecution.
We also evaluate our approach on different platforms: Windows, Android and Linux. Our experiments show the file location, artifacts and a history of viewed content including their locations from the Internet. Our approach will serve as a resource to forensic investigators to examine Kodi or similar streaming platforms.

## Electromagnetic Side-Channel Analysis Methods for Digital Forensics on Internet of Things

Canonical page: https://markscanlon.co/publications/PhDThesis-ElectromagneticSideChannelAnalysisIoT.html
PDF: https://markscanlon.co/publications/PhDThesis-ElectromagneticSideChannelAnalysisIoT.pdf
Authors: Asanka Sayakkara
Venue: School of Computer Science, University College Dublin
Publication date: 2020/09/01
Contribution summary: This thesis explores the potential of leveraging Electromagnetic Side-Channel Analysis (EM-SCA) as a forensic evidence acquisition method for Internet of Things (IoT) devices. A model for IoT forensics using EM-SCA methods is formulated, enabling investigators to perform complex forensic insight-gathering procedures without expertise in EM-SCA. A proof-of-concept, EMvidence, is implemented as an open-source software framework, utilizing a modular architecture to extract specific forensic insights from IoT devices. The thesis presents methods for acquiring forensic insights, including detecting cryptography-related events, firmware version, and malicious modifications to the firmware. Machine Learning algorithms are used to automatically identify known patterns of EM radiation with over 90% accuracy.
Abstract: Modern legal and corporate investigations heavily rely on the field of digital forensics to uncover vital evidence. The dawn of the IoT devices has expanded this horizon by providing new kinds of evidence sources that were not available in traditional digital forensics.
However, unlike desktop and laptop computers, the bespoke hardware and software employed on most IoT devices obstructs the use of classical digital forensic evidence acquisition methods. This situation demands alternative approaches to forensically inspect IoT devices. EMSCA is a branch in information security that exploits EM radiation of computers to eavesdrop and exfiltrate sensitive information. A multitude of EMSCA methods have been demonstrated to be effective in attacking computing systems under various circumstances. The objective of this thesis is to explore the potential of leveraging EMSCA as a forensic evidence acquisition method for IoT devices. Towards this objective, this thesis formulates a model for IoT forensics that uses EMSCA methods. The design of the proposed model enables the investigators to perform complex forensic insight-gathering procedures without having expertise in the field of EMSCA. In order to demonstrate the function of the proposed model, a proof-of-concept was implemented as an open-source software framework called EMvidence. This framework utilises a modular architecture following a Unix philosophy, where each module is kept minimalist and focused on extracting a specific forensic insight from a specific IoT device. By doing so, the burden of dealing with the diversity of the IoT ecosystem is distributed from a central point into individual modules. Under the proposed model, this thesis presents the design, the implementation, and the evaluation of a collection of methods that can be used to acquire forensic insights from IoT devices using their EM radiation patterns. These forensic insights include detecting cryptography-related events, firmware version, malicious modifications to the firmware, and internal forensic state of the IoT devices. The designed methods utilise supervised ML algorithms at their core to automatically identify known patterns of EM radiation with over 90% accuracy.
In practice, the forensic inspection of IoT devices using EMSCA methods may often be conducted during triage examination phase using moderately-resourced computers, such as laptops carried by the investigators. However, the scale of the EM data generation with fast sample rates and the dimensionality of EM data due to large bandwidths necessitate rich computational resources to process EM datasets. This thesis explores two approaches to reduce such overheads. Firstly, a careful reduction of the sample rate is found to reduce the generated EM data by up to 80%. Secondly, an intelligent channel selection method is presented that drastically reduces the dimensionality of EM data by selecting 500 dimensions out of 20,000. The findings of this thesis pave the way to the noninvasive forensic insight acquisition from IoT devices. With IoT systems increasingly blending into the day-to-day life, the proposed methodology has the potential to become the lifeline of future digital forensic investigations. A multitude of research directions are outlined, which can strengthen this novel approach in the future.

## Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing

Canonical page: https://markscanlon.co/publications/PhDThesis-MethodologyAutomatedDigitalEvidenceProcessing.html
PDF: https://markscanlon.co/publications/PhDThesis-MethodologyAutomatedDigitalEvidenceProcessing.pdf
Authors: Xiaoyu Du
Venue: School of Computer Science, University College Dublin
Publication date: 2020/09/01
Contribution summary: This PhD thesis proposes a methodology for alleviating the digital forensic backlog through automated digital evidence processing. The research leverages data deduplication and automated analysis techniques to reduce redundant digital evidence data handling, enabling faster and more efficient investigations.
Abstract: The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Severe, case-hindering digital evidence backlogs have become commonplace in law enforcement agencies throughout the world. The objective of the research outlined as part of this thesis is to help alleviate the backlog through automated digital evidence processing. This is achieved by reducing or eliminating redundant digital evidence data handling through leveraging data deduplication and automated analysis techniques. This helps avoid the repeated re-acquisition, re-storage, and re-analysis of common evidence during investigations. This thesis describes a deduplicated evidence processing framework designed with a Digital Forensic as a Service Framework (DFaaS) paradigm in mind. In the proposed system, prior to the acquisition, artefacts are hashed and compared with a centralised database of previously analysed files to identify common files. Moreover, this process facilitates known pertinent artefacts to be detected at the earliest stage possible in the investigation, i.e., during the acquisition step. The proposed methodology includes a novel, forensically-sound entire disk image reconstruction technique from a deduplicated evidence acquisition system. That is to say, reconstructed disk hashes match the source device without having to acquire all artefacts directly from it. This enables remote disk acquisitions to be possible faster than the network throughput. Known, i.e., previously encountered, pertinent artefacts identified during the acquisition stage are then used for training machine learning models to create a relevancy score for the unknown, i.e., previously unencountered, file artefacts. The proposed technique generates a relevancy score for file similarity using each artefact’s file system metadata and associated timeline events.
The file artefacts are subsequently ordered by these relevancy scores to focus the investigator towards the analysis of artefacts most likely to be relevant to the case first. ## SoK: Exploring the State of the Art and the Future Potential of Artificial Intelligence in Digital Forensic Investigation Canonical page: https://markscanlon.co/publications/SoK-AI-Forensics.html DOI: https://doi.org/10.1145/3407023.3407068 PDF: https://markscanlon.co/publications/SoK-AI-Forensics.pdf Authors: Xiaoyu Du; Chris Hargreaves; John Sheppard; Felix Anda; Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: The 13th International Workshop on Digital Forensics (WSDF), held at the 15th International Conference on Availability, Reliability and Security (ARES) Publication date: 2020/08/01 Contribution summary: This systematic overview of artificial intelligence (AI) in digital forensic investigation explores the current state of the art and future potential of AI in expediting digital forensic analysis and increasing case processing capacities. The authors discuss AI applications in data discovery, device triage, and other areas, highlighting current challenges and future directions. Abstract: Multi-year digital forensic backlogs have become commonplace in law enforcement agencies throughout the globe. Digital forensic investigators are overloaded with the volume of cases requiring their expertise compounded by the volume of data to be processed. Artificial intelligence is often seen as the solution to many big data problems. This paper summarises existing artificial intelligence based tools and approaches in digital forensics. Automated evidence processing leveraging artificial intelligence based techniques shows great promise in expediting the digital forensic analysis process while increasing case processing capacities. For each application of artificial intelligence highlighted, a number of current challenges and future potential impact is discussed. 
## Facilitating Electromagnetic Side-Channel Analysis for IoT Investigation: Evaluating the EMvidence Framework Canonical page: https://markscanlon.co/publications/EvaluatingEMvidence.html DOI: https://doi.org/10.1016/j.fsidi.2020.301003 PDF: https://markscanlon.co/publications/EvaluatingEMvidence.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2020/07/01 Contribution summary: This paper presents the EMvidence framework, a software tool that facilitates electromagnetic side-channel analysis for IoT investigation. The framework automates and simplifies the process of acquiring and analyzing electromagnetic signals from IoT devices, making it accessible to digital forensic investigators without specialized equipment or expertise. Abstract: The Internet of Things (IoT) has opened up new opportunities for digital forensics by providing new sources of evidence. However, acquiring data from IoT is not a straightforward task for multiple reasons including the diversity of manufacturers, the lack of standard interfaces, the use of light-weight data encryption, e.g. elliptic curve cryptography (ECC), etc. Electromagnetic side-channel analysis (EM-SCA) has been proposed as a new approach to acquire forensically useful data from IoT devices. However, performing successful EM-SCA attacks on IoT devices requires domain knowledge and specialised equipment that are not available to most digital forensic investigators. This work presents the methodology behind and an evaluation of a framework, EMvidence, that enables forensic investigators to acquire evidence from IoT devices through EM-SCA. This framework helps to automate and perform electromagnetic side-channel evidence collection for forensic purposes. An evaluation of the framework is performed by applying it to multiple realistic digital investigation scenarios. 
In the case of attacking ECC cryptographic operations, the evaluation demonstrates that the volume of EM data that needs to be stored and processed can be significantly reduced using the framework's machine learning based approach. ## Smarter Password Guessing Techniques Leveraging Contextual Information and OSINT Canonical page: https://markscanlon.co/publications/SmarterPasswordGuessing.html DOI: https://doi.org/10.1109/CyberSecurity49315.2020.9138870 PDF: https://markscanlon.co/publications/SmarterPasswordGuessing.pdf Authors: Aikaterini Kanta; Iwen Coisel; Mark Scanlon Venue: 6th IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security) Publication date: 2020/06/01 Contribution summary: This paper proposes smarter password guessing techniques that leverage contextual information and Open Source Intelligence (OSINT) to improve password recovery rates. The authors explore the use of OSINT to gather information about a suspect's online and offline life, which can be used to make educated guesses about their password. The research aims to create a bespoke, personalized dictionary list to feed into password cracking tools. Abstract: In recent decades, criminals have increasingly used the web to research, assist and perpetrate criminal behaviour. One of the most important ways in which law enforcement can battle this growing trend is through accessing pertinent information about suspects in a timely manner. A significant hindrance to this is the difficulty of accessing any system a suspect uses that requires authentication via password. Password guessing techniques generally consider common user behaviour while generating their passwords, as well as the password policy in place. Such techniques can offer a modest success rate considering a large/average population. 
However, they tend to fail when focusing on a single target - especially when the latter is an educated user taking precautions as a savvy criminal would be expected to do. Open Source Intelligence is being increasingly leveraged by Law Enforcement in order to gain useful information about a suspect, but very little is currently being done to integrate this knowledge in an automated way within password cracking. The purpose of this research is to delve into the techniques that enable the gathering of the necessary context about a suspect and find ways to leverage this information within password guessing techniques. ## Automated Artefact Relevancy Determination from Artefact Metadata and Associated Timeline Events Canonical page: https://markscanlon.co/publications/ArtefactRelevancy.html DOI: https://doi.org/10.1109/CyberSecurity49315.2020.9138874 PDF: https://markscanlon.co/publications/ArtefactRelevancy.pdf Authors: Xiaoyu Du; Quan Le; Mark Scanlon Venue: The 6th IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security) Publication date: 2020/06/01 Contribution summary: This paper presents an approach for automated artefact relevancy determination from artefact metadata and associated timeline events. The method uses a relevancy score to rank file artefacts by likely relevance, based on data reduction techniques and machine learning models. The approach is validated through experimentation with three emulated investigation scenarios, demonstrating its potential to aid investigators in the discovery and prioritisation of evidence. Abstract: Case-hindering, multi-year digital forensic evidence backlogs have become commonplace in law enforcement agencies throughout the world. This is due to an ever-growing number of cases requiring digital forensic investigation coupled with the growing volume of data to be processed per case. 
Leveraging previously processed digital forensic cases and their component artefact relevancy classifications facilitates the opportunity for training automated artificial intelligence based evidence processing systems to aid investigators in the discovery and prioritisation of evidence. This paper presents one approach for file artefact relevancy determination based on the growing move towards a centralised, Digital Forensics as a Service (DFaaS) paradigm. This approach enables the use of previously encountered illegal files to detect pertinent files in an investigation. Trained models can aid in the detection of these files during the acquisition stage, i.e., during their upload to a DFaaS system. The technique used is based on a relevancy score determined from file similarity using each artefact's filesystem metadata and associated timeline events. The approach presented is validated against three experimental usage scenarios. ## Assessing the Influencing Factors on the Accuracy of Underage Facial Age Estimation Canonical page: https://markscanlon.co/publications/AssessingInfluencingFactorsAgeEstimation.html DOI: https://doi.org/10.1109/CyberSecurity49315.2020.9138851 PDF: https://markscanlon.co/publications/AssessingInfluencingFactorsAgeEstimation.pdf Authors: Felix Anda; Brett Becker; David Lillis; Nhien-An Le-Khac; Mark Scanlon Venue: The 6th IEEE International Conference on Cyber Security and Protection of Digital Services (Cyber Security) Publication date: 2020/06/01 Contribution summary: This study evaluates the influencing factors on the accuracy of underage facial age estimation using two cloud services, Microsoft Azure's Face API and Amazon Web Service's Rekognition service. The analysis of the VisAGe dataset reveals correlations between facial attributes and age estimation errors, identifying the most significant factors to be addressed in future age estimation modeling. 
Abstract: Swift response to the detection of endangered minors is an ongoing concern for law enforcement with the rapid growth of disk capacities and data being stored in the cloud. Automated tools are needed to aid in CSEM investigation - both to expedite the evidence discovery process, while lessening the investigator's exposure to traumatic material. In these investigations, age estimation techniques show great promise in helping decrease the overflowing backlog of evidence obtained from the vast array of devices and online services. A lack of sufficient training data combined with natural human variance has been hindering accurate automated age estimation, especially for underage subjects. A comprehensive evaluation of the performance achieved on over 21,800 underage subjects with two cloud age estimation services is presented, namely Amazon Web Service's Rekognition service and Microsoft Azure's Face API. The objective of this work is to evaluate the influence that certain human biometric factors, facial expressions, and image quality, i.e., blur, noise, exposure and resolution, have on the outcome of automated age estimation services. The thorough evaluation of the correlation and effects of such factors aids the understanding of the performance and allows us to identify the most influencing factors to be overcome in future age estimation modelling. 
## EMvidence: A Framework for Digital Evidence Acquisition from IoT Devices through Electromagnetic Side-Channel Analysis Canonical page: https://markscanlon.co/publications/EMvidence.html DOI: https://doi.org/10.1016/j.fsidi.2020.300907 PDF: https://markscanlon.co/publications/EMvidence.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2020/04/01 Contribution summary: This paper presents EMvidence, a software framework for digital forensic investigators to acquire evidence from IoT devices through electromagnetic side-channel analysis. The framework automates and performs electromagnetic side-channel evidence collection, making it a practical reality for digital forensic investigators. Abstract: The Internet of Things (IoT) has opened up new opportunities for digital forensics by providing new evidence sources that were not available previously. However, acquiring data from IoT is not a straightforward task due to multiple reasons such as the diversity of manufacturers, lack of standard interfaces, and the use of light-weight data encryption, such as elliptic curve cryptography (ECC). Electromagnetic side-channel analysis (EM-SCA) has been proposed as a new approach to acquire forensically useful data in IoT devices. However, performing successful EM-SCA attacks on IoT devices requires domain knowledge and specialised equipment that are not available to most digital forensic investigators. This work presents a methodology that enables forensic investigators to acquire evidence from IoT devices through EM-SCA. Implementing the methodology, a software framework is introduced called EMvidence that helps to automate and perform electromagnetic side-channel evidence collection. Evaluation of the framework is performed by applying it to multiple real-world digital investigation scenarios. 
In the case of attacking ECC cryptographic operations, the evaluation shows that the amount of EM data that needs to be stored and processed can be significantly reduced with the assistance of machine learning. ## DeepUAge: Improving Underage Age Estimation Accuracy to Aid CSEM Investigation Canonical page: https://markscanlon.co/publications/UnderageFacialAgeEstimation.html DOI: https://doi.org/10.1016/j.fsidi.2020.300921 PDF: https://markscanlon.co/publications/UnderageFacialAgeEstimation.pdf Authors: Felix Anda; Nhien-An Le-Khac; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2020/04/01 Contribution summary: DeepUAge improves underage age estimation accuracy to aid Child Sexual Exploitation Material (CSEM) investigation. The model, trained on the VisAGe dataset, achieves state-of-the-art performance for age estimation of minors, with a mean absolute error (MAE) rate of 2.73 years. This work tackles the challenges of collecting and annotating underage facial age data, and its application can expedite digital investigations. Abstract: Age is a soft biometric trait that can aid law enforcement in the identification of victims of Child Sexual Exploitation Material (CSEM) creation/distribution. Accurate age estimation of subjects can classify explicit content possession as illegal during an investigation. Automation of this age classification has the potential to expedite content discovery and focus the investigation of digital evidence through the prioritisation of evidence containing CSEM. In recent years, artificial intelligence based approaches for automated age estimation have been created, and many public cloud service providers offer this service on their platforms. The accuracy of these algorithms has been improving over recent years. These existing approaches perform satisfactorily for adult subjects, but perform wholly inadequately for underage subjects. 
To this end, the largest underage facial age dataset, VisAGe, has been used in this work to train a ResNet50 based deep learning model, DeepUAge, that achieved state-of-the-art beating performance for age estimation of minors. This paper describes the design and implementation of this model. An evaluation, validation and comparison of the proposed model is performed against existing facial age classifiers resulting in the best overall performance for underage subjects. ## Cutting through the Emissions: Feature Selection from Electromagnetic Side-Channel Data for Activity Detection Canonical page: https://markscanlon.co/publications/EMSideChannelFeatureSelection.html DOI: https://doi.org/10.1016/j.fsidi.2020.300927 PDF: https://markscanlon.co/publications/EMSideChannelFeatureSelection.pdf Authors: Asanka Sayakkara; Luis Miralles; Nhien-An Le-Khac; Mark Scanlon Venue: Forensic Science International: Digital Investigation Publication date: 2020/04/01 Contribution summary: This paper presents a systematic methodology to identify information leaking frequency channels from high dimensional EM data using multiple filtering techniques and machine learning. The approach is evaluated on a dataset of EM signals from an IoT device, demonstrating its effectiveness in reducing the number of channels from 20,000 to less than 100, improving real-time analysis efficiency. Abstract: Electromagnetic side-channel analysis (EM-SCA) has been used as a window to eavesdrop on computing devices for information security purposes. It has recently been proposed for use as a digital evidence acquisition method in forensic investigation scenarios as well. The massive amount of data produced by EM signal acquisition devices makes it difficult to process them in real-time, making on-site EM-SCA nearly impossible. The uncertainty of the exact information leaking frequency channel requires the investigator to acquire signals over a wide bandwidth. 
As a consequence, the investigators are left with a large number of potential frequency channels in EM data to be inspected, many of which may not contain any information leakage at all. Under these circumstances, the identification of a small subset of frequency channels that leak a sufficient amount of information can significantly boost the performance of real-time analysis of EM side-channel data. This work presents a systematic methodology to identify information leaking frequency channels from high dimensional EM data with the help of multiple filtering techniques and machine learning. The evaluations show that it is possible to narrow down the number of frequency channels from over 20,000 to fewer than a few hundred, which makes real-time EM data processing highly efficient. ## Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts Canonical page: https://markscanlon.co/publications/AutomatedClassificationArtefacts.html DOI: https://doi.org/10.1145/3339252.3340517 PDF: https://markscanlon.co/publications/AutomatedClassificationArtefacts.pdf Authors: Xiaoyu Du; Mark Scanlon Venue: The 12th International Workshop on Digital Forensics (WSDF), held at the 14th International Conference on Availability, Reliability and Security (ARES) Publication date: 2019/08/01 Contribution summary: This paper proposes a methodology for automatically prioritizing suspicious file artefacts in digital forensic investigations, leveraging a supervised machine learning approach and a toolkit for data extraction from disk images. The methodology aims to reduce manual analysis effort and improve the efficiency of the investigative process. Abstract: The ever increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on the seized device are not relevant to the investigation. 
Manually retrieving suspicious files relevant to the investigation is like finding a needle in a haystack. In this paper, a methodology for automatically prioritising suspicious file artefacts (i.e., file artefacts that are relevant to the investigation) is proposed to reduce the manual work to be conducted. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The processes of feature extraction, dataset generation, training and evaluation are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion. ## Improving Borderline Adulthood Facial Age Estimation through Ensemble Learning Canonical page: https://markscanlon.co/publications/BorderlineAdulthoodAgeEstimation.html DOI: https://doi.org/10.1145/3339252.3341491 PDF: https://markscanlon.co/publications/BorderlineAdulthoodAgeEstimation.pdf Authors: Felix Anda; David Lillis; Aikaterini Kanta; Brett Becker; Elias Bou-Harb; Nhien-An Le-Khac; Mark Scanlon Venue: The 8th International Workshop on Cyber Crime (IWCC), held at the 14th International Conference on Availability, Reliability and Security (ARES) Publication date: 2019/08/01 Contribution summary: This paper presents an ensemble learning approach to improve facial age estimation for borderline adulthood cases. The authors develop a deep learning model (DS13K) and fine-tune it on the Deep Expectation (DEX) model to achieve an accuracy of 68% for the age group 16-17 years old, outperforming DEX by 4 times. The study also evaluates existing cloud-based facial age prediction services. 
Abstract: Achieving high performance for facial age estimation with subjects in the borderline between adulthood and non-adulthood has always been a challenge. Several studies have used different approaches from the age of a baby to an elder adult and different datasets to measure the mean absolute error (MAE) that has been oscillating between 1.47 to 8 years. The weakness of the algorithms specifically in the borderline has been a motivation for this paper. In our approach, we have developed an ensemble technique that improves the accuracy of underage estimation in conjunction with our deep learning model (DS13K) that has been fine-tuned on the Deep Expectation (DEX) model. We have achieved an accuracy of 68% for the age group 16 to 17 years old, which is 4 times better than the DEX accuracy for such age range. And we also present an evaluation of existing cloud-based and offline facial age prediction services such as Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net and DEX. ## Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT Devices Canonical page: https://markscanlon.co/publications/LeveragingEMIoT.html DOI: https://doi.org/10.1016/j.diin.2019.04.012 PDF: https://markscanlon.co/publications/LeveragingEMIoT.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: Digital Investigation Publication date: 2019/07/01 Contribution summary: This paper presents a novel methodology to inspect the internal software activities of IoT devices through their electromagnetic radiation emissions during live device investigation. The approach uses electromagnetic side-channel analysis (EM-SCA) to detect software activities, including cryptographic algorithms and malicious modifications, with high accuracy. Abstract: Internet of Things (IoT) devices have expanded the horizon of digital forensic investigations by providing a rich set of new evidence sources. 
IoT devices include health implants, sports wearables, smart burglary alarms, smart thermostats, smart electrical appliances, and many more. Digital evidence from these IoT devices is often extracted from third party sources, e.g., paired smartphone applications or the devices' back-end cloud services. However, vital digital evidence can still reside solely on the IoT device itself. The specifics of the IoT device's hardware are a black-box in many cases due to the lack of proven, established techniques to inspect IoT devices. This paper presents a novel methodology to inspect the internal software activities of IoT devices through their electromagnetic radiation emissions during live device investigation. When a running IoT device is identified at a crime scene, forensically important software activities can be revealed through an electromagnetic side-channel analysis (EM-SCA) attack. By using two representative IoT hardware platforms, this work demonstrates that cryptographic algorithms running on high-end IoT devices can be detected with over 82% accuracy, while minor software code differences in low-end IoT devices could be detected with over 90% accuracy using a neural network-based classifier. Furthermore, it was experimentally demonstrated that malicious modification of the stock firmware of an IoT device can be detected through machine learning-assisted EM-SCA techniques. These techniques provide a new investigative vector for digital forensic investigators to inspect IoT devices. 
## A Survey of Electromagnetic Side-Channel Attacks and Discussion on their Case-Progressing Potential for Digital Forensics Canonical page: https://markscanlon.co/publications/SurveyEMSideChannelsForensics.html DOI: https://doi.org/10.1016/j.diin.2019.03.002 PDF: https://markscanlon.co/publications/SurveyEMSideChannelsForensics.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: Digital Investigation Publication date: 2019/07/01 Contribution summary: This paper surveys electromagnetic side-channel attacks and their potential for digital forensics on IoT devices. It discusses the challenges of analyzing encrypted data from IoT devices and explores the use of electromagnetic side-channel analysis to recover cryptographic keys and other sensitive information. Abstract: The increasing prevalence of Internet of Things (IoT) devices has made it inevitable that their pertinence to digital forensic investigations will increase into the foreseeable future. These devices produced by various vendors often possess limited standard interfaces for communication, such as USB ports or WiFi/Bluetooth wireless interfaces. Meanwhile, with an increasing mainstream focus on the security and privacy of user data, built-in encryption is becoming commonplace in consumer-level computing devices, and IoT devices are no exception. Under these circumstances, a significant challenge is presented to digital forensic investigations where data from IoT devices needs to be analysed. This work explores the electromagnetic (EM) side-channel analysis literature for the purpose of assisting digital forensic investigations on IoT devices. EM side-channel analysis is a technique where unintentional electromagnetic emissions are used for eavesdropping on the operations and data handling of computing devices. 
The non-intrusive nature of EM side-channel approaches makes it a viable option to assist digital forensic investigations as these attacks require, and must result in, no modification to the target device. The literature on various EM side-channel analysis attack techniques are discussed - selected on the basis of their applicability in IoT device investigation scenarios. The insight gained from the background study is used to identify promising future applications of the technique for digital forensic analysis on IoT devices - potentially progressing a wide variety of currently hindered digital investigations. ## Shining a light on Spotlight: Leveraging Apple's desktop search utility to recover deleted file metadata on macOS Canonical page: https://markscanlon.co/publications/SpotlightMacForensics.html DOI: https://doi.org/10.1016/j.diin.2019.01.019 PDF: https://markscanlon.co/publications/SpotlightMacForensics.pdf Authors: Tajvinder Singh Atwal; Mark Scanlon; Nhien-An Le-Khac Venue: Digital Investigation Publication date: 2019/04/01 Contribution summary: This study examines Apple's desktop search technology, Spotlight, to recover deleted file metadata on macOS. Researchers developed a novel approach to extract persistent records of deleted files directly from the Spotlight database and recover records from deleted database pages in unused space. The study provides a proof-of-concept implementation and discusses the forensic opportunities offered by recovering records for deleted files within the database and unused space on the filesystem. Abstract: Spotlight is a proprietary desktop search technology released by Apple in 2004 for its Macintosh operating system Mac OS X 10.4 (Tiger) and remains as a feature in current releases of macOS. Spotlight allows users to search for files or information by querying databases populated with filesystem attributes, metadata, and indexed textual content. 
Existing forensic research into Spotlight has provided an understanding of the metadata attributes stored within the metadata store database. Current approaches in the literature have also enabled the extraction of metadata records for extant files, but not for deleted files. The objective of this paper is to research the persistence of records for deleted files within Spotlight's metadata store, identify if deleted database pages are recoverable from unallocated space on the volume, and to present a strategy for the processing of discovered records. In this paper, the structure of the metadata store database is outlined, and experimentation reveals that records persist for a period of time within the database but once deleted, are no longer recoverable. The experimentation also demonstrates that deleted pages from the database (containing metadata records) are recoverable from unused space on the filesystem. ## Improving the Accuracy of Automated Facial Age Estimation to Aid CSEM Investigations Canonical page: https://markscanlon.co/publications/FacialAgeEstimationPoster.html DOI: https://doi.org/10.1016/j.diin.2019.01.024 PDF: https://markscanlon.co/publications/FacialAgeEstimationPoster.pdf Authors: Felix Anda; David Lillis; Aikaterini Kanta; Brett A. Becker; Elias Bou-Harb; Nhien-An Le-Khac; M. Scanlon Venue: Digital Investigation Publication date: 2019/04/01 Contribution summary: This study evaluates existing age prediction services and introduces a deep learning model, DS13K, to improve the accuracy of underage facial age estimation in child sexual exploitation material (CSEM) investigations. The model outperforms existing services, particularly in the borderline adulthood age range (16-17 years old), with an accuracy rate of 68%. Abstract: The investigation of violent crimes against individuals, such as the investigation of child sexual exploitation material (CSEM), is one of the more commonly encountered criminal investigation types throughout the world. 
While hash lists of known CSEM content are commonly used to identify previously encountered material on suspects’ devices, previously unencountered material requires expert, manual analysis and categorisation. The discovery, analysis, and categorisation of these digital images and videos has the potential to be significantly expedited with the use of automated artificial intelligence (AI) based techniques. Intelligent, automated evidence processing and prioritisation has the potential to aid investigators in alleviating some of the digital evidence backlogs that have become commonplace worldwide. In order for AI-aided CSEM investigations to be beneficial, the fundamental question when analysing multimedia content becomes: how old is each subject encountered? Our work presents the evaluation of existing cloud-based and offline age estimation services, introduces our deep learning model, DS13K, which was created with a VGG-16 Deep Convolutional Neural Network (CNN) architecture, and develops an ensemble technique that improves the accuracy of underage facial age estimation. In addition to our model, a number of existing services including Amazon Rekognition, Microsoft Azure Cognitive Services, How-Old.net, and Deep Expectation (DEX) were used to create an ensemble learning technique. It was found that for the borderline adulthood age range (i.e., 16-17 years old), our DS13K model substantially outperformed existing services, achieving a performance accuracy of 68%. A comparative examination of the obtained results allowed us to identify performance trends and issues inherent to each service/tool and develop ensemble techniques to improve the accuracy of automated adulthood determination. ## Solid State Drive Forensics: Where Do We Stand? 
Canonical page: https://markscanlon.co/publications/SSDForensics.html DOI: https://doi.org/10.1007/978-3-030-05487-8_8 PDF: https://markscanlon.co/publications/SSDForensics.pdf Authors: John Vieyra; Mark Scanlon; Nhien-An Le-Khac Venue: Digital Forensics and Cyber Crime Publication date: 2019/01/01 Contribution summary: This paper examines the current state of solid-state drive (SSD) forensics, addressing the challenges posed by SSDs' background data movement and garbage collection processes. The authors investigate the impact of TRIM, data volume, and powered-on time on data recovery and provide guidance on extracting artefacts from SSDs under various conditions. Abstract: With Solid State Drives (SSDs) becoming more and more prevalent in personal computers, some have suggested that the playing field has changed when it comes to a forensic analysis. Inside the SSD, data movement events occur without any user input. Recent research has suggested that SSDs can no longer be managed in the same manner when performing digital forensic examinations. In performing forensics analysis of SSDs, the events that take place in the background need to be understood and documented by the forensic investigator. These behind the scene processes cannot be stopped with traditional disk write blockers and have now become an acceptable consequence when performing forensic analysis. In this paper, we aim to provide some clear guidance as to what precisely is happening in the background of SSDs during their operation and investigation and also study forensic methods to extract artefacts from SSD under different conditions in terms of volume of data, powered effect, etc. In addition, we evaluate our approach with several experiments across various use-case scenarios. 
## Enabling the Non-Expert Analysis of Large Volumes of Intercepted Network Traffic Canonical page: https://markscanlon.co/publications/NetworkIntell.html DOI: https://doi.org/10.1007/978-3-319-99277-8_11 PDF: https://markscanlon.co/publications/NetworkIntell.pdf Authors: Erwin van de Weil; Mark Scanlon; Nhien-An Le-Khac Venue: Advances in Digital Forensics XIV Publication date: 2018/08/01 Contribution summary: This paper proposes a novel approach to analyze large volumes of intercepted network traffic based on network metadata, reducing analysis duration and providing insights for non-technical investigators. The approach is tested with a large sample of network traffic data and can be used to identify devices and usage behind an internet connection. Abstract: In criminal investigations, telecommunication wiretaps have become a common technique used by law enforcement. While phone-based wiretapping is well documented and the procedures for its execution are well known, the same cannot be said for Internet taps. Lawfully intercepted network traffic often contains a lot of encrypted traffic, making it increasingly difficult to find useful information inside the traffic captured. The advent of the Internet-of-Things further complicates the process for non-technical investigators. The current level of complexity of intercepted network traffic is close to a point where data cannot be analysed without the supervision of a digital investigator with advanced network knowledge. Current investigations focus on analysing all traffic in a chronological manner and are predominantly conducted on the data contents of the intercepted traffic. This approach often becomes overly arduous when the amount of data to be analysed becomes very large. In this paper, we propose a novel approach to analyse large amounts of intercepted network traffic based on network metadata.
Our approach significantly reduces the duration of the analysis and also produces an insightful view of the analysis results for the non-technical investigator. We also test our approach with a large sample of network traffic data. ## Deduplicated Disk Image Evidence Acquisition and Forensically-Sound Reconstruction Canonical page: https://markscanlon.co/publications/ForensicallySoundReconstruction.html DOI: https://doi.org/10.1109/TrustCom/BigDataSE.2018.00249 PDF: https://markscanlon.co/publications/ForensicallySoundReconstruction.pdf Authors: Xiaoyu Du; Paul Ledwith; Mark Scanlon Venue: Proceedings of the 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications (TrustCom-18) Publication date: 2018/08/01 Contribution summary: This paper presents a system for deduplicated disk image evidence acquisition and forensically-sound reconstruction, addressing the growing digital evidence backlog in law enforcement. The system enables automated, centralized acquisition and analysis, reducing storage and bandwidth requirements, and facilitating non-expert evidence processing. Abstract: The ever-growing backlog of digital evidence waiting for analysis has become a significant issue for law enforcement agencies throughout the world. This is due to an increase in the number of cases requiring digital forensic analysis coupled with the increasing volume of data to process per case. This has created a demand for a paradigm shift in the way that evidence is acquired, stored, and analyzed. The ultimate goal of the research presented in this paper is to revolutionize the current digital forensic process by leveraging a centralized, deduplicated acquisition and processing approach. Focusing on the first step in digital evidence processing, acquisition, a system is presented enabling deduplicated evidence acquisition with the capability of automated, forensically-sound complete disk image reconstruction.
As the number of cases acquired by the proposed system increases, more duplicate artifacts will be encountered, and the more efficient the processing of each new case will become. This results in a time saving for digital investigators, and provides a platform to enable non-expert evidence processing, alongside the benefits of reduced storage and bandwidth requirements. ## Cloud Investigations of Illegal IPTV Networks Canonical page: https://markscanlon.co/publications/IllegalIPTVNetworks.html DOI: https://doi.org/10.1109/TrustCom/BigDataSE.2018.00295 PDF: https://markscanlon.co/publications/IllegalIPTVNetworks.pdf Authors: John Sheppard Venue: Proceedings of the 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications (TrustCom-18) Publication date: 2018/08/01 Contribution summary: This paper examines the Kodi software ecosystem, focusing on its role in illegal IPTV networks. It identifies key roles in the Kodi community, including users, addon authors, and distributors, and explores the relationships between them. The study uses cloud evidence to connect devices to addon distributors and investigates networks among authors and distributors using GraphQL in the GitHub cloud. Abstract: Kodi software has gained much attention in recent years due to its powerful capabilities for streaming legal and illegal media sources. This has led to numerous court cases and media reports around piracy and copyright infringement. This paper examines some of the most popular Kodi video addons on a Raspberry Pi 3 running Open Source Media Center (OSMC). There are a variety of roles involved in the Kodi community, such as normal users, addon authors, and distributors. This paper identifies and defines these key roles. It looks at the relationships between addons and their authors and distributors. It shows how cloud evidence can be used to connect devices to the addon distributors.
It further investigates the networks found among these authors and distributors using GraphQL in the GitHub cloud. ## Accuracy Enhancement of Electromagnetic Side-channel Attacks on Computer Monitors Canonical page: https://markscanlon.co/publications/EMAttacksComputerMonitors.html DOI: https://doi.org/10.1145/3230833.3234690 PDF: https://markscanlon.co/publications/EMAttacksComputerMonitors.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: The Second International Workshop on Criminal Use of Information Hiding (CUING), part of the 13th International Conference on Availability, Reliability and Security (ARES) Publication date: 2018/08/01 Contribution summary: This paper investigates the accuracy of electromagnetic side-channel attacks on computer monitors, focusing on factors beyond sampling rate and bandwidth. The authors evaluate noise removal, image blending, and image quality adjustments to improve image reconstruction accuracy, exploring avenues for future improvements in EM side-channel attacks. Abstract: Electromagnetic noise emitted from running computer displays modulates information about the picture frames being displayed on screen. Attacks have been demonstrated on eavesdropping computer displays by utilising these emissions as a side-channel vector. The accuracy of reconstructing a screen image depends on the emission sampling rate and bandwidth of the attacker's signal acquisition hardware. The cost of radio frequency acquisition hardware increases with increased supported frequency range and bandwidth. A number of enthusiast-level, affordable software defined radio equipment solutions are currently available, facilitating a number of radio-focused attacks at a more reasonable price point.
This work investigates three accuracy-influencing factors, other than the sample rate and bandwidth, namely noise removal, image blending, and image quality adjustments, that affect the accuracy of monitor image reconstruction through electromagnetic side-channel attacks. ## Electromagnetic Side-Channel Attacks: Potential for Progressing Hindered Digital Forensic Analysis Canonical page: https://markscanlon.co/publications/EMSideChannelsForForensics.html DOI: https://doi.org/10.1145/3236454.3236512 PDF: https://markscanlon.co/publications/EMSideChannelsForForensics.pdf Authors: Asanka Sayakkara; Nhien-An Le-Khac; Mark Scanlon Venue: Proceedings of the International Workshop on Speculative Side Channel Analysis (WoSSCA 2018) Publication date: 2018/07/01 Contribution summary: This paper explores the potential of electromagnetic side-channel analysis in progressing hindered digital forensic investigations. The authors argue that EM side-channel attacks can provide a hands-off approach to accessing internal device information, overcoming encryption and limited standardization of IoT devices. Abstract: Digital forensics is a fast-growing field involving the discovery and analysis of digital evidence acquired from electronic devices to assist investigations for law enforcement. Traditional digital forensic investigative approaches are often hampered by the data contained on these devices being encrypted. Furthermore, the increasing use of IoT devices with limited standardisation makes it difficult to analyse them with traditional techniques. This paper argues that electromagnetic side-channel analysis has significant potential to progress investigations obstructed by data encryption. Several potential avenues towards this goal are discussed.
## Digital Forensic Investigation of Two-Way Radio Communication Equipment and Services Canonical page: https://markscanlon.co/publications/TwoWayRadioForensics.html DOI: https://doi.org/10.1016/j.diin.2018.04.007 PDF: https://markscanlon.co/publications/TwoWayRadioForensics.pdf Authors: Arie Kouwen; Mark Scanlon; Kim-Kwang Raymond Choo; Nhien-An Le-Khac Venue: Digital Investigation Publication date: 2018/07/01 Contribution summary: This paper addresses the digital forensic investigation of two-way radio communication equipment and services, including the acquisition and analysis of digital traces in modern radio communication devices. The authors propose a workflow for radio device investigation and evaluate the possibility of using popular forensic tools to acquire artefacts from radio communication equipment. Abstract: Historically, radio equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment by numerous organisations and businesses is increasing. The functionality of these traditionally short-range devices has expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, delivering Push-To-Talk services that make it possible to set up connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users only using smartphones. To date, there is little research on the digital traces in modern radio communication equipment. In fact, increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation. In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation.
Subsequent to the seizure of this radio communication equipment, we explore the traces that may be of forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services. ## Digital Forensic Investigation of Two-Way Radio Communication Equipment and Services Canonical page: https://markscanlon.co/publications/RadioTraces.html PDF: https://markscanlon.co/publications/RadioTraces.pdf Authors: Arie Kouwen; Mark Scanlon; Kim-Kwang Raymond Choo; Nhien-An Le-Khac Venue: Digital Investigation Publication date: 2018/07/01 Contribution summary: This study investigates the digital forensic traces in modern two-way radio communication equipment and services, which are crucial for law enforcement in crime investigations. The research proposes a novel workflow for investigators to follow when encountering such equipment, highlighting the need for knowledge about radio communication equipment, infrastructure, and associated services. Abstract: Historically, radio equipment has solely been used as a two-way analogue communication device. Today, the use of radio communication equipment by numerous organisations and businesses is increasing. The functionality of these traditionally short-range devices has expanded to include private call, address book, call-logs, text messages, lone worker, telemetry, data communication, and GPS. Many of these devices also integrate with smartphones, delivering Push-To-Talk services that make it possible to set up connections between users using a two-way radio and a smartphone. In fact, these devices can be used to connect users only using smartphones. To date, there is little research on the digital traces in modern radio communication equipment. In fact, increasing the knowledge base about these radio communication devices and services can be valuable to law enforcement in a police investigation.
In this paper, we investigate what kind of radio communication equipment and services law enforcement digital investigators can encounter at a crime scene or in an investigation. Subsequent to the seizure of this radio communication equipment, we explore the traces that may be of forensic interest and how these traces can be acquired. Finally, we test our approach on sample radio communication equipment and services. ## Deep Learning at the Shallow End: Malware Classification for Non-Domain Experts Canonical page: https://markscanlon.co/publications/DeepLearningMalware.html DOI: https://doi.org/10.1016/j.diin.2018.04.024 PDF: https://markscanlon.co/publications/DeepLearningMalware.pdf Authors: Quan Le; Oisín Boydell; Brian Mac Namee; Mark Scanlon Venue: Digital Investigation Publication date: 2018/07/01 Contribution summary: This paper presents a deep learning-based malware classification approach that requires no expert domain knowledge and is based on a purely data-driven approach for complex pattern and feature identification. The model achieves a high accuracy of 98.2% in classifying raw binary files into one of 9 classes of malware, with a processing time of 0.02 seconds per file. Abstract: Current malware detection and classification approaches generally rely on time-consuming and knowledge-intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data whilst ignoring their context in relation to each other and throughout the malware file as a whole. We present a Deep Learning based malware classification approach that requires no expert domain knowledge and is based on a purely data-driven approach for complex pattern and feature identification.
## Evaluating Automated Facial Age Estimation Techniques for Digital Forensics Canonical page: https://markscanlon.co/publications/EvaluatingFacialAgeEstimation.html DOI: https://doi.org/10.1109/SPW.2018.00028 PDF: https://markscanlon.co/publications/EvaluatingFacialAgeEstimation.pdf Authors: Felix Anda; David Lillis; Nhien-An Le-Khac; Mark Scanlon Venue: 12th International Workshop on Systematic Approaches to Digital Forensics Engineering (SADFE), IEEE Security & Privacy Workshops Publication date: 2018/05/01 Contribution summary: This paper evaluates existing automated facial age estimation techniques for digital forensics, highlighting their limitations and proposing a dataset generator to overcome the lack of sufficient sample images in specific age ranges. The study assesses the performance of offline and cloud-based models, releasing a tool to generate uniformly distributed random images by age and gender. Abstract: In today's world, closed circuit television, cellphone photographs and videos, open-source intelligence (i.e., social media and web data mining), and other sources of photographic evidence are commonly used by police forces to identify suspects and victims of both online and offline crimes. Human characteristics such as age, height, weight, gender, hair color, etc., are often used by police officers and witnesses in their description of unidentified suspects. In certain circumstances, the age of the victim can result in the determination of the crime's categorization, e.g., child abuse investigations. Various automated machine learning-based techniques have been implemented for the analysis of digital images to detect soft-biometric traits, such as age and gender, and thus aid detectives and investigators in progressing their cases. This paper documents an evaluation of existing cognitive age prediction services. 
The evaluative and comparative analysis of the various services was executed to identify trends and issues inherent to their performance. One significant contributing factor impeding the accurate development of the services investigated is the notable lack of sufficient sample images in specific age ranges, i.e., underage and elderly. To overcome this issue, a dataset generator was developed, which harnesses collections of several unbalanced datasets and forms a balanced, curated dataset of digital images annotated with their corresponding age and gender. ## Hierarchical Bloom Filter Trees for Approximate Matching Canonical page: https://markscanlon.co/publications/HierarchicalBloomFilterTrees.html DOI: https://doi.org/10.15394/jdfsl.2018.1489 PDF: https://markscanlon.co/publications/HierarchicalBloomFilterTrees.pdf Authors: David Lillis; Frank Breitinger; Mark Scanlon Venue: Journal of Digital Forensics, Security and Law Publication date: 2018/03/01 Contribution summary: This paper proposes the use of Hierarchical Bloom Filter Trees (HBFTs) to improve the runtime efficiency of approximate matching techniques in digital forensics. HBFTs reduce the number of pairwise comparisons required, achieving substantial speed gains while maintaining effectiveness. The authors evaluate the effectiveness of HBFTs using the MRSH-v2 algorithm and explore the effects of different configurations of HBFTs. Abstract: Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who are regularly faced with the problem of searching through a seized device for pertinent data. A common scenario is where an investigator is in possession of a collection of “known-illegal” files (e.g. a collection of child abuse material) and wishes to find whether copies of these are stored on the seized device. 
Approximate matching addresses shortcomings in traditional hashing, which can only find identical files, by also being able to deal with cases of merged files, embedded files, partial files, or if a file has been changed in any way. Most approximate matching algorithms work by comparing pairs of files, which is not a scalable approach when faced with large corpora. This paper demonstrates the effectiveness of using a “Hierarchical Bloom Filter Tree” (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required, and demonstrates substantial speed gains, while maintaining effectiveness. ## Private Web Browser Forensics: A Case Study on Epic Privacy Browser Canonical page: https://markscanlon.co/publications/PrivateWebBrowserForensics.html PDF: https://markscanlon.co/publications/PrivateWebBrowserForensics.pdf Authors: Alan Reed; Mark Scanlon; Nhien-An Le-Khac Venue: Journal of Information Warfare Publication date: 2018/01/01 Contribution summary: This study examines the Epic Privacy Browser, a private web browser designed to protect users' privacy, and its potential for forensic analysis. The researchers investigate the types of evidence left behind by the browser on Windows 10 and Windows 7 operating systems, including live and post-mortem analysis. The study aims to identify the tools and methods for effective analysis of the browser's artefacts. Abstract: Organized crime, as well as individual criminals, are benefiting from the protection of private browsers to carry out illegal activity, such as money laundering, drug trafficking, the online exchange of child abuse material, etc. Epic Privacy Browser is one common example. It is currently in use in approximately 180 countries worldwide. 
In this paper, we outline the location and type of evidence available through live and post-mortem state analysis of the Epic Privacy Browser. This analysis identifies how the browser functions during use, where evidence can be recovered after use, the tools required, and how the recovered material can be presented effectively. ## Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees Canonical page: https://markscanlon.co/publications/MRSHv2BloomFilterTrees.html DOI: https://doi.org/10.1007/978-3-319-73697-6_11 PDF: https://markscanlon.co/publications/MRSHv2BloomFilterTrees.pdf Authors: David Lillis; Frank Breitinger; Mark Scanlon Venue: Digital Forensics and Cyber Crime. ICDF2C 2017 Publication date: 2018/01/01 Contribution summary: This paper presents an improvement to the MRSH-v2 approximate matching algorithm using Hierarchical Bloom Filter Trees (HBFT) to expedite the search process for digital forensic investigators. Experiments demonstrate substantial speed gains over the original MRSH-v2 while maintaining effectiveness. Abstract: Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way. In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2.
A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness. ## Data Analytics for Digital Forensics and Cybersecurity Canonical page: https://markscanlon.co/publications/DataAnalyticsForDigitalForensicsAndCybersecurity.html PDF: https://markscanlon.co/publications/DataAnalyticsForDigitalForensicsAndCybersecurity.pdf Authors: Mark Scanlon Venue: Predict Conference; Europe's Leading Data Conference (Predict 2017) Publication date: 2017/10/01 Contribution summary: This paper addresses the problem of information overload in digital forensics and cybersecurity by proposing a data analytics approach for intelligent, real-time, automated data processing and event categorization. The solution aims to combat the increasing frequency and sophistication of cyberattacks by reducing false positive alerts in network intrusion detection systems. Abstract: Information overload is one of the biggest problems facing professionals working in the fields of Digital Forensics and Cybersecurity. The sheer volume of cases requiring digital forensic analysis in law enforcement agencies throughout the world is outstripping the capacities of digital forensic laboratories. This has resulted in huge digital evidence backlogs becoming commonplace and cases being ruled upon in court without the inclusion of potentially pertinent information, which is sitting idle in some evidence store. As is commonly relayed in the media, the frequency of cyberattacks being faced by governments, law enforcement agencies, and industry is increasing, alongside the sophistication of the techniques used. 
Current rules-based network intrusion detection systems are predominantly based on historic, known threat vectors and result in a very high number of false positive alerts being generated. Intelligent, real-time, automated data processing and event categorisation is one solution that shows great promise to combat this information overload. ## Privileged Data within Digital Evidence Canonical page: https://markscanlon.co/publications/PrivilegedDataWithinDigitalEvidence.html DOI: https://doi.org/10.1109/Trustcom/BigDataSE/ICESS.2017.307 PDF: https://markscanlon.co/publications/PrivilegedDataWithinDigitalEvidence.pdf Authors: Dominique Fleurbaaij; Mark Scanlon; Nhien-An Le-Khac Venue: Proceedings of the 16th IEEE International Conference On Trust, Security And Privacy In Computing And Communications (TrustCom-17) Publication date: 2017/08/01 Contribution summary: This paper presents a script for handling privileged data in digital forensic tools, specifically in Nuix, to minimize exposure to investigators and automate the filtering process. The script increases effectiveness by relating files based on content, addressing the limitations of traditional filtering methods. Abstract: In recent years, the use of digital communication has increased. This has also increased the chance of finding privileged data in digital evidence. Privileged data is protected by law from viewing by anyone other than the client. It is up to the digital investigator to handle this privileged data properly without being able to view the contents. Procedures on handling this information are available, but do not provide any practical information, nor is it known how effective filtering is. The objective of this paper is to describe the handling of privileged data in current digital forensic tools and the creation of a script within the digital forensic tool Nuix. The script automates the handling of privileged data to minimize the exposure of the contents to the digital investigator.
The script also utilizes technology within Nuix that extends the automated search of identical privileged documents to relate files based on their contents. A comparison of the 'traditional' ways of filtering within the digital forensic tools and the script written in Nuix showed that digital forensic tools are still limited when used on privileged data. The script manages to increase the effectiveness as a direct result of the use of relations based on file content. ## Integration of Ether Unpacker into Ragpicker for plugin-based Malware Analysis and Identification Canonical page: https://markscanlon.co/publications/EtherUnpacker.html PDF: https://markscanlon.co/publications/EtherUnpacker.pdf Authors: Erik Schaefer; Nhien-An Le-Khac; Mark Scanlon Venue: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS 2017) Publication date: 2017/06/01 Contribution summary: This paper presents a new approach to malware analysis by integrating Ether Unpacker into the plugin-based malware analysis tool, Ragpicker. The integration aims to improve the unpacking rate of malware samples, enabling the analysis of transferred and reused code. The authors evaluate their approach against real-world malware patterns, demonstrating its effectiveness in identifying malware variants and families. Abstract: Malware is a pervasive problem in both personal computing devices and distributed computing systems. Identification of malware variants and their families offers a great benefit in early detection, resulting in a reduction of the analysis time needed. In order to classify malware, most of the current approaches are based on the analysis of the unpacked and unencrypted binaries. However, most of the unpacking solutions in the literature have a low unpacking rate. This results in a low contribution towards the identification of transferred code and re-used code.
To develop a new malware analysis solution based on clusters of binary code sections, it is necessary to focus on increasing the unpacking rate of malware samples to extend the underlying code database. In this paper, we present a new approach to analysing malware by integrating ETHER Unpacker into the plugin-based malware analysis tool, Ragpicker. We also evaluate our approach against real-world malware patterns. ## Forensic Analysis of Epic Privacy Browser on Windows Operating Systems Canonical page: https://markscanlon.co/publications/EpicPrivacyBrowser.html PDF: https://markscanlon.co/publications/EpicPrivacyBrowser.pdf Authors: Alan Reed; Mark Scanlon; Nhien-An Le-Khac Venue: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS 2017) Publication date: 2017/06/01 Contribution summary: This paper presents a forensic analysis of Epic Privacy Browser on Windows operating systems, focusing on the identification and analysis of artefact evidence left on Windows 10 compared to Windows 7. The study aims to establish if the introduction of Windows 10 has had an adverse effect on the browser's claim of clearing all user activity traces upon closure. Abstract: Internet security can be compromised not only through the threat of malware, fraud, system intrusion or damage, but also via the tracking of internet activity. Criminals are using numerous methods to access data in the highly lucrative cybercrime business. Organized crime, as well as individual users, are benefiting from the protection of Virtual Private Networks (VPN) and private browsers, such as Tor and Epic Privacy, to carry out illegal activity such as money laundering, drug dealing and the trade of child pornography. News articles advising on internet privacy assisted in educating the public and a new era of private browsing arose. Although these measures were designed to protect legitimate browsing privacy, they also provided a means to conceal illegal activity.
One such tool released for private browsing was Epic Privacy Browser. It is currently used in approximately 180 countries worldwide. Epic Privacy Browser is promoted as a Chromium-powered browser, specifically engineered to protect users' privacy. It only operates in private browser mode and, upon close of the browsing session, deletes all browsing data. The Epic Privacy Browser claims that all traces of user activity will be cleared upon close of the application; this study aims to establish if the introduction of Windows 10 has had an adverse effect on this claim. However, there is no forensic acquisition and analysis of Epic Privacy Browser in the literature. In this paper, we aim to contribute towards the goal of assisting forensic examiners with the locations and types of evidence available through live and post-mortem state analysis of the Epic Privacy Browser on Windows 10 and Windows 7, identifying how the browser functions during use, where data can be recovered once the browser is closed, and the necessary tools that will assist in the forensic discovery and effective presentation of the material. ## Evaluation of Digital Forensic Process Models with Respect to Digital Forensics as a Service Canonical page: https://markscanlon.co/publications/ProcessModelsDFaaS.html PDF: https://markscanlon.co/publications/ProcessModelsDFaaS.pdf Authors: Xiaoyu Du; Nhien-An Le-Khac; Mark Scanlon Venue: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS 2017) Publication date: 2017/06/01 Contribution summary: This paper evaluates the applicability of existing digital forensic process models to a cloud-based evidence processing paradigm, specifically Digital Forensics as a Service (DFaaS). The authors analyze the characteristics of each current process model and review the benefits of DFaaS, aiming to expedite the investigative process and reduce costs.
Abstract: Digital forensic science is very much still in its infancy, but is becoming increasingly invaluable to investigators. A popular area for research is seeking a standard methodology to make the digital forensic process accurate, robust, and efficient. The first digital forensic process model proposed contains four steps: Acquisition, Identification, Evaluation and Admission. Since then, numerous process models have been proposed to explain the steps of identifying, acquiring, analysing, storing, and reporting on the evidence obtained from various digital devices. In recent years, an increasing number of more sophisticated process models have been proposed. These models attempt to speed up the entire investigative process or solve various problems commonly encountered in forensic investigations. In the last decade, cloud computing has emerged as a disruptive technological concept, and most leading enterprises such as IBM, Amazon, Google, and Microsoft have set up their own cloud-based services. In the field of digital forensic investigation, moving to a cloud-based evidence processing model would be extremely beneficial and preliminary attempts have been made in its implementation. Moving towards a Digital Forensics as a Service model would not only expedite the investigative process, but can also result in significant cost savings - freeing up digital forensic experts and law enforcement personnel to progress their caseload. This paper aims to evaluate the applicability of existing digital forensic process models and analyse how each of these might apply to a cloud-based evidence processing paradigm. 
## EviPlant: An Efficient Digital Forensic Challenge Creation, Manipulation, and Distribution Solution Canonical page: https://markscanlon.co/publications/EviPlant.html DOI: https://doi.org/10.1016/j.diin.2017.01.010 PDF: https://markscanlon.co/publications/EviPlant.pdf Authors: Mark Scanlon; Xiaoyu Du; David Lillis Venue: Digital Investigation Publication date: 2017/03/01 Contribution summary: EviPlant is a system designed to efficiently create, manipulate, store, and distribute digital forensic challenges for education and training. It allows educators to create evidence packages that can be integrated with base images, reducing the need for large, full-image files and making it easier to distribute challenges to students. Abstract: Education and training in digital forensics requires a variety of suitable challenge corpora containing realistic features including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, the creation of these challenges requires overly arduous effort on behalf of the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant resources and time and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this paper, we introduce a more capable methodology and system to current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely bare operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity and perform a diffing of resultant image and the base image. 
This diffing process extracts the modified artefacts and associated metadata and stores them in an evidence package. Evidence packages can be created for different personas, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated with the base images. A number of advantages and additional functionality over the current approaches are discussed that emerge as a result of using EviPlant. ## Behavioral Service Graphs: A Formal Data-Driven Approach for Prompt Investigation of Enterprise and Internet-wide Infections Canonical page: https://markscanlon.co/publications/BehavioralServiceGraphsFormal.html DOI: https://doi.org/10.1016/j.diin.2017.02.002 PDF: https://markscanlon.co/publications/BehavioralServiceGraphsFormal.pdf Authors: Elias Bou-Harb; Mark Scanlon Venue: Digital Investigation Publication date: 2017/03/01 Contribution summary: This paper proposes Behavioral Service Graphs, a formal data-driven approach for prompt investigation of enterprise and internet-wide infections. It leverages probing activities to rapidly infer infections and models infected machines as graphs to infer and correlate distributed groups of infected machines. Abstract: The task of generating network-based evidence to support network forensic investigation is becoming increasingly prominent. Undoubtedly, such evidence is significantly imperative as it not only can be used to diagnose and respond to various network-related issues (i.e., performance bottlenecks, routing issues, etc.) but more importantly, can be leveraged to infer and further investigate network security intrusions and infections. In this context, this paper proposes a proactive approach that aims at generating accurate and actionable network-based evidence related to groups of compromised network machines (i.e., campaigns). 
The approach is envisioned to guide investigators to promptly pinpoint such malicious groups for possible immediate mitigation as well as empowering network and digital forensic specialists to further examine those machines using auxiliary collected data or extracted digital artifacts. On one hand, the promptness of the approach is successfully achieved by monitoring and correlating perceived probing activities, which are typically the very first signs of an infection or misdemeanors. On the other hand, the generated evidence is accurate as it is based on an anomaly inference that fuses data behavioral analytics in conjunction with formal graph theoretic concepts. We evaluate the proposed approach in two deployment scenarios, namely, as an enterprise edge engine and as a global capability in a security operations center model. The empirical evaluation that employs 10 GB of real botnet traffic and 80 GB of real darknet traffic indeed demonstrates the accuracy, effectiveness and simplicity of the generated network-based evidence. ## Towards the Leveraging of Data Deduplication to Break the Disk Acquisition Speed Limit Canonical page: https://markscanlon.co/publications/TowardsDataDeduplication.html DOI: https://doi.org/10.1109/NTMS.2016.7792486 PDF: https://markscanlon.co/publications/TowardsDataDeduplication.pdf Authors: Hannah Wolahan; Claudio Chico Lorenzo; Elias Bou-Harb; Mark Scanlon Venue: Proceedings of the IFIP International Workshop on Cybercrime Investigation and Digital Forensics (CID) Publication date: 2016/11/01 Contribution summary: This paper proposes a data deduplication system to expedite digital forensic evidence acquisition and analysis. The system leverages a deduplicated forensic data storage system to eliminate unnecessary reacquisition and analysis of previously processed data, reducing acquisition time and improving the overall efficiency of the digital forensic process. 
Abstract: Digital forensic evidence acquisition speed is traditionally limited by two main factors: the read speed of the storage device being investigated (i.e., the read speed of the disk, memory, remote storage, mobile device, etc.), and the write speed of the system used for storing the acquired data. Digital forensic investigators can somewhat mitigate the latter issue through the use of high-speed storage options, such as networked RAID storage, in the controlled environment of the forensic laboratory. However, traditionally, little can be done to improve the acquisition speed past its physical read speed from the target device itself. The protracted time taken for data acquisition wastes digital forensic experts' time, contributes to digital forensic investigation backlogs worldwide, and delays pertinent information from potentially influencing the direction of an investigation. In a remote acquisition scenario, a third contributing factor can also become a detriment to the overall acquisition time - typically the Internet upload speed of the acquisition system. This paper explores an alternative to the traditional evidence acquisition model through the leveraging of a forensic data deduplication system. The advantages that a deduplicated approach can provide over the current digital forensic evidence acquisition process are outlined and some preliminary results of a prototype implementation are discussed. 
## Behavioral Service Graphs: A Big Data Approach for Prompt Investigation of Internet-wide Infections Canonical page: https://markscanlon.co/publications/BehavioralServiceGraphs.html DOI: https://doi.org/10.1109/NTMS.2016.7792437 PDF: https://markscanlon.co/publications/BehavioralServiceGraphs.pdf Authors: Elias Bou-Harb; Mark Scanlon; Claude Fachkha Venue: Proceedings of the IFIP International Workshop on Cybercrime Investigation and Digital Forensics (CID) Publication date: 2016/11/01 Contribution summary: This paper proposes Behavioral Service Graphs, a proactive approach to generating network-based evidence for network forensic investigation. It leverages big data behavioral analytics and graph theoretical concepts to infer and correlate groups of compromised network machines, providing actionable insights for prompt mitigation and further analysis. Abstract: The task of generating network-based evidence to support network forensic investigation is becoming increasingly prominent. Undoubtedly, such evidence is significantly imperative as it not only can be used to diagnose and respond to various network-related issues (i.e., performance bottlenecks, routing issues, etc.) but more importantly, can be leveraged to infer and further investigate network security intrusions and infections. In this context, this paper proposes a proactive approach that aims at generating accurate and actionable network-based evidence related to groups of compromised network machines. The approach is envisioned to guide investigators to promptly pinpoint such malicious groups for possible immediate mitigation as well as empowering network and digital forensic specialists to further examine those machines using auxiliary collected data or extracted digital artifacts. On one hand, the promptness of the approach is successfully achieved by monitoring and correlating perceived probing activities, which are typically the very first signs of an infection or misdemeanors. 
On the other hand, the generated evidence is accurate as it is based on an anomaly inference that fuses big data behavioral analytics in conjunction with formal graph theoretical concepts. We evaluate the proposed approach as a global capability in a security operations center. The empirical evaluations, which employ 80 GB of real darknet traffic, indeed demonstrate the accuracy, effectiveness and simplicity of the generated network-based evidence. ## IPv6 Security and Forensics Canonical page: https://markscanlon.co/publications/IPv6SecurityAndForensics.html DOI: https://doi.org/10.1109/INTECH.2016.7845143 PDF: https://markscanlon.co/publications/IPv6SecurityAndForensics.pdf Authors: Vincent Nicolls; Nhien-An Le-Khac; Lei Chen; Mark Scanlon Venue: 2nd International Workshop on Cloud Security and Forensics (WCSF 2016) Publication date: 2016/08/01 Contribution summary: This paper presents a new approach to investigate IPv6 network attacks with case studies, focusing on IPv6 security and forensics. It discusses different types of IPv6 attacks and provides a comprehensive overview of IPv6 network attack techniques, including reconnaissance, exploitation, and mitigation strategies. Abstract: IPv4 is the historical addressing protocol used for all devices connected worldwide. It has survived for over 30 years and has been an integral part of the Internet revolution. However, due to its limitations, IPv4 is being replaced by IPv6. Today, IPv6 is more and more widely used on the Internet. On the other hand, criminals are also well aware of the introduction of IPv6. They are continuously seeking new methods to make a profit, hide their activities, and infiltrate or exfiltrate important data from companies. The introduction of this new protocol may provide savvy cybercriminals with more opportunities to discover new system vulnerabilities and exploit them. To date, there is little research on IPv6 security and forensics in the literature. 
In this paper, we look at different types of IPv6 attacks and we present a new approach. ## Battling the Digital Forensic Backlog through Data Deduplication Canonical page: https://markscanlon.co/publications/BattlingTheBacklogDataDeduplication.html DOI: https://doi.org/10.1109/INTECH.2016.7845139 PDF: https://markscanlon.co/publications/BattlingTheBacklogDataDeduplication.pdf Authors: Mark Scanlon Venue: Proceedings of the 6th IEEE International Conference on Innovative Computing Technologies (INTECH 2016) Publication date: 2016/08/01 Contribution summary: This paper proposes a novel solution to combat the digital forensic backlog through data deduplication. The solution leverages a centralized storage system to store a single copy of each object, eliminating redundant storage and reanalysis of previously processed data. This approach can reduce storage requirements, expedite digital forensic processing, and facilitate collaborative examination and sharing of digital evidence. Abstract: In recent years, technology has become truly pervasive in everyday life. Technological advancement can be found in many facets of life, including personal computers, mobile devices, wearables, cloud services, video gaming, web-powered messaging, social media, Internet-connected devices, etc. This technological influence has resulted in these technologies being employed by criminals to conduct a range of crimes - both online and offline. Both the number of cases requiring digital forensic analysis and the sheer volume of information to be processed in each case has increased rapidly in recent years. As a result, the requirement for digital forensic investigation has ballooned, and law enforcement agencies throughout the world are scrambling to address this demand. While more and more members of law enforcement are being trained to perform the required investigations, the supply is not keeping up with the demand. 
Current digital forensic techniques are arduously time-consuming and require a significant amount of manpower to execute. This paper discusses a novel solution to combat the digital forensic backlog. This solution leverages a deduplication-based paradigm to eliminate the reacquisition, redundant storage, and reanalysis of previously processed data. ## Battling the Digital Forensic Backlog Canonical page: https://markscanlon.co/publications/BattlingTheDigitalForensicBacklog.html Authors: Mark Scanlon Venue: Proceedings of the 2nd International Workshop on Cloud Security and Forensics (WCSF 2016) Publication date: 2016/08/01 Contribution summary: This paper discusses the growing digital forensic backlog faced by law enforcement agencies due to the increasing number of digital devices involved in investigations. It highlights the challenges in identifying, acquiring, storing, and analyzing digital evidence from various sources, including cloud-based services and IoT devices. The author proposes future research directions to improve the efficiency of the digital forensic process. Abstract: Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. 
The variety of new digital evidence sources poses new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This talk explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process. ## An Analytical Approach to the Recovery of Data From 3rd Party Proprietary CCTV File Systems Canonical page: https://markscanlon.co/publications/AnalyticalApproachToTheRecoveryOfDataFromCCTVFileSystems.html DOI: https://doi.org/10.13140/RG.2.2.31446.65601 PDF: https://markscanlon.co/publications/AnalyticalApproachToTheRecoveryOfDataFromCCTVFileSystems.pdf Authors: Richard Gomm; Nhien-An Le-Khac; Mark Scanlon; M-Tahar Kechadi Venue: 15th European Conference on Cyber Warfare and Security (ECCWS 2016) Publication date: 2016/07/01 Contribution summary: This paper presents an analytical approach to recovering data from 3rd party proprietary CCTV file systems, focusing on a Ganz CCTV DVR model C-MPDVR-16. The authors reverse engineer the proprietary file system, enabling the retrieval of the oldest video footage possible. The method is evaluated using a case study, demonstrating the feasibility of recovering video footage from a DVR with no initial knowledge or documentation available. Abstract: According to recent predictions, the global video surveillance market is expected to reach $42.06 billion annually by 2020. The market is extremely fragmented, with only around 40% of the market being accounted for by the 15 top video surveillance equipment suppliers, according to an annual report issued by IMS Research. The remaining market share was split amongst the numerous other smaller companies who provide CCTV solutions, usually at lower prices than their brand name counterparts. This cost cutting generally results in a lower specification of components. 
Recently, an investigation was undertaken in relation to a serious criminal offence, for which significant video footage had been captured on a CCTV Digital Video Recorder (DVR). The unit was set up to save the last 31 days of footage to an internal hard drive. However, despite the referenced footage being within this timeframe, it could not be located. The DVR unit was submitted for forensic examination and data retrieval of specified video footage which, according to the proprietary video backup application, was not retrievable. In this paper, we present the process and method of the forensic retrieval of video footage from a DVR. The objective of this method is to retrieve the oldest video footage possible from a proprietary file storage system. We also evaluate our approach with a Ganz CCTV DVR system model C-MPDVR-16 to show that the file system of a DVR can be reverse engineered with no initial knowledge, application or documentation available. ## Current Challenges and Future Research Areas for Digital Forensic Investigation Canonical page: https://markscanlon.co/publications/CurrentChallengesAndFutureResearchAreas.html DOI: https://doi.org/10.13140/RG.2.2.34898.76489 PDF: https://markscanlon.co/publications/CurrentChallengesAndFutureResearchAreas.pdf Authors: David Lillis; Brett Becker; Tadhg O'Sullivan; Mark Scanlon Venue: The 11th ADFSL Conference on Digital Forensics, Security and Law (CDFSL 2016) Publication date: 2016/05/01 Contribution summary: This paper explores the current challenges in digital forensic investigations, including the digital evidence backlog, and outlines future research areas to improve the process. The authors discuss the increasing complexity, diversity, and volume of digital evidence, as well as the need for standardization and automation in digital forensic tools and processes. 
Abstract: Given the ever-increasing prevalence of technology in modern life, there is a corresponding increase in the likelihood of digital devices being pertinent to a criminal investigation or civil litigation. As a direct consequence, the number of investigations requiring digital forensic expertise is resulting in huge digital evidence backlogs being encountered by law enforcement agencies throughout the world. It can be anticipated that the number of cases requiring digital forensic analysis will greatly increase in the future. It is also likely that each case will require the analysis of an increasing number of devices including computers, smartphones, tablets, cloud-based services, Internet of Things devices, wearables, etc. The variety of new digital evidence sources poses new and challenging problems for the digital investigator from an identification, acquisition, storage and analysis perspective. This paper explores the current challenges contributing to the backlog in digital forensics from a technical standpoint and outlines a number of future research topics that could greatly contribute to a more efficient digital forensic process. ## On the Benefits of Information Retrieval and Information Extraction Techniques Applied to Digital Forensics Canonical page: https://markscanlon.co/publications/OnTheBenefitsOfInformationRetrievalToDigitalForensics.html DOI: https://doi.org/10.1007/978-981-10-1536-6_83 PDF: https://markscanlon.co/publications/OnTheBenefitsOfInformationRetrievalToDigitalForensics.pdf Authors: David Lillis; Mark Scanlon Venue: Advanced Multimedia and Ubiquitous Engineering: FutureTech & MUE Publication date: 2016/04/01 Contribution summary: This paper explores the application of Information Retrieval (IR) and Information Extraction (IE) techniques to digital forensics, highlighting their potential to improve the efficiency and effectiveness of investigations. 
The authors discuss the benefits of cloud-based digital forensic investigation platforms and the importance of precision and recall in different stages of an investigation. Abstract: Many jurisdictions suffer from lengthy backlogs in digital forensics investigations. This has negative consequences for the timely incorporation of digital evidence into criminal investigations, while also affecting the timelines required to bring a case to court. Modern technological advances, in particular the move towards cloud computing, have great potential in expediting the automated processing of digital evidence, thus reducing the manual workload for investigators. They also promise to provide a platform upon which more sophisticated automated techniques may be employed to improve the process further. This paper identifies some research strains from the areas of Information Retrieval and Information Extraction that have the potential to greatly help with the efficiency and effectiveness of digital forensics investigations. ## Increasing Digital Investigator Availability through Efficient Workflow Management and Automation Canonical page: https://markscanlon.co/publications/IncreasingDigitalInvestigatorAvailability.html DOI: https://doi.org/10.1109/ISDFS.2016.7473525 PDF: https://markscanlon.co/publications/IncreasingDigitalInvestigatorAvailability.pdf Authors: Ronald In de Braekt; Nhien-An Le-Khac; Jason Farina; Mark Scanlon; Mohand-Tahar Kechadi Venue: The 4th International Symposium on Digital Forensics and Security (ISDFS 2016) Publication date: 2016/04/01 Contribution summary: This paper proposes a workflow management automation framework to streamline digital investigation workflows, reducing time spent on acquisition and preparation steps, and increasing efficiency of forensic software and hardware use. The framework is evaluated in a real-world scenario, demonstrating its benefits and robustness. 
Abstract: The growth of digital storage capacities and the diversity of devices has had a significant time impact on digital forensic laboratories in law enforcement. Backlogs have become commonplace and increasingly more time is spent in the acquisition and preparation steps of an investigation as opposed to detailed evidence analysis and reporting. There is generally little room for increasing digital investigation capacity in law enforcement digital forensic units and the allocated budgets for these units are often decreasing. In the context of developing an efficient investigation process, one of the key challenges amounts to how to achieve more with less. This paper proposes a workflow management automation framework for handling common digital forensic tools. The objective is to streamline the digital investigation workflow enabling more efficient use of limited hardware and software. The proposed automation framework reduces the time digital forensic experts waste conducting time-consuming, though necessary, tasks. The evidence processing time is decreased through server-side automation resulting in 24/7 evidence preparation. The proposed framework increases efficiency of use of forensic software and hardware, reduces the infrastructure costs and license fees, and simplifies the preparation steps for the digital investigator. The proposed approach is evaluated in a real-world scenario to evaluate its robustness and highlight its benefits. 
## Tiered Forensic Methodology Model for Digital Field Triage by Non-Digital Evidence Specialists Canonical page: https://markscanlon.co/publications/TieredForensicMethodologyModelForDigitalFieldTriage.html DOI: https://doi.org/10.1016/j.diin.2016.01.010 PDF: https://markscanlon.co/publications/TieredForensicMethodologyModelForDigitalFieldTriage.pdf Authors: Ben Hitchcock; Nhien-An Le-Khac; Mark Scanlon Venue: Digital Investigation Publication date: 2016/03/01 Contribution summary: This paper presents a tiered forensic methodology model for digital field triage by non-digital evidence specialists. The model aims to increase investigation efficiency and reduce the backlog of digital evidence waiting for analysis by trained specialists. The authors propose a framework for training front-line investigators to conduct digital field triage, allowing them to provide actionable information quickly and maintain the integrity of digital evidence. Abstract: Due to budgetary constraints and the high level of training required, digital forensic analysts are in short supply in police forces the world over. This inevitably leads to a prolonged time taken between an investigator sending the digital evidence for analysis and receiving the analytical report back. In an attempt to expedite this procedure, various process models have been created to place the forensic analyst in the field conducting a triage of the digital evidence. By conducting triage in the field, an investigator is able to act upon pertinent information quicker, while waiting on the full report. The work presented as part of this paper focuses on the training of front-line personnel in the field triage process, without the need of a forensic analyst attending the scene. The premise has been successfully implemented within regular/non-digital forensics, i.e., crime scene investigation. In that field, front-line members have been trained in specific tasks to supplement the trained specialists. 
The concept of front-line members conducting triage of digital evidence in the field is achieved through the development of a new process model providing guidance to these members. To prove the model's viability, an implementation of this new process model is presented and evaluated. The results outlined demonstrate how a tiered response involving digital evidence specialists and non-specialists can better deal with the increasing number of investigations involving digital evidence. ## An Evaluation of Google Plus Communities as an Active Learning Journal Alternative to Improve Learning Efficacy Canonical page: https://markscanlon.co/publications/GooglePlusCommunities-ActiveLearningJournalAlternative.html PDF: https://markscanlon.co/publications/GooglePlusCommunities-ActiveLearningJournalAlternative.pdf Authors: Mark Scanlon; Brett Becker Venue: Proceedings of 8th International Conference on Engaging Pedagogy (ICEP 2015) Publication date: 2015/12/01 Contribution summary: This study evaluates Google Plus Communities as an active learning journal alternative to improve learning efficacy. The authors present guidelines for deploying G+ Communities in educational settings, highlighting their potential to foster collaborative learning, social interaction, and community engagement. ## Network Investigation Methodology for BitTorrent Sync: A Peer-to-Peer Based File Synchronisation Service Canonical page: https://markscanlon.co/publications/NetworkInvestigationMethodologyForBitTorrentSync.html DOI: https://doi.org/10.1016/j.cose.2015.05.003 PDF: https://markscanlon.co/publications/NetworkInvestigationMethodologyForBitTorrentSync.pdf Authors: Mark Scanlon; Jason Farina; M-Tahar Kechadi Venue: Computers & Security Publication date: 2015/10/01 Contribution summary: This paper proposes a network investigation methodology for BitTorrent Sync, a peer-to-peer file synchronization service, to aid in the control of data flow across security perimeters. 
The methodology includes recommendations for investigating various scenarios, including legitimate and illicit activities. Abstract: High availability is no longer just a business continuity concern. Users are increasingly dependent on devices that consume and produce data in ever increasing volumes. A popular solution is to have a central repository which each device accesses after centrally managed authentication. This model of use is facilitated by cloud based file synchronisation services such as Dropbox, OneDrive, Google Drive and Apple iCloud. Cloud architecture allows the provisioning of storage space with “always-on” access. Recent concerns over unauthorised access to third party systems and large scale exposure of private data have made an alternative solution desirable. These events have caused users to assess their own security practices and the level of trust placed in third party storage services. One option is BitTorrent Sync, a cloudless synchronisation utility that provides data availability and redundancy. This utility replicates files stored in shares to remote peers with access controlled by keys and permissions. While lacking the economies brought about by scale, complete control over data access has made this a popular solution. The ability to replicate data without oversight introduces risk of abuse by users as well as difficulties for forensic investigators. This paper suggests a methodology for investigation and analysis of the protocol to assist in the control of data flow across security perimeters. 
## Forensic Analysis and Remote Evidence Recovery from Syncthing: An Open Source Decentralised File Synchronisation Utility Canonical page: https://markscanlon.co/publications/ForensicAnalysisAndRemoteEvidenceRecoveryFromSyncthing.html DOI: https://doi.org/10.1007/978-3-319-25512-5_7 PDF: https://markscanlon.co/publications/ForensicAnalysisAndRemoteEvidenceRecoveryFromSyncthing.pdf Authors: Conor Quinn; Mark Scanlon; Jason Farina; M-Tahar Kechadi Venue: Digital Forensics and Cyber Crime Publication date: 2015/10/01 Contribution summary: This paper presents a forensic analysis and remote evidence recovery techniques for Syncthing, an open-source decentralized file synchronization utility. The authors outline the entry points for a Syncthing investigation, describe the network communication protocol, and develop a proof-of-concept tool for remote evidence recovery. The study addresses the need for digital forensics procedures to keep pace with decentralized services like Syncthing. Abstract: Commercial and home Internet users are becoming increasingly concerned with data protection and privacy. Questions have been raised regarding the privacy afforded by popular cloud-based file synchronisation services such as Dropbox, OneDrive and Google Drive. A number of these services have recently been reported as sharing information with governmental security agencies without the need for warrants to be granted. As a result, many users are opting for decentralised (cloudless) file synchronisation alternatives to the aforementioned cloud solutions. This paper outlines the forensic analysis and applies remote evidence recovery techniques for one such decentralised service, Syncthing. 
## Project Maelstrom: Forensic Analysis of the BitTorrent-Powered Browser Canonical page: https://markscanlon.co/publications/ProjectMaelstrom.html DOI: https://doi.org/10.15394/jdfsl.2015.1216 PDF: https://markscanlon.co/publications/ProjectMaelstrom.pdf Authors: Jason Farina; M-Tahar Kechadi; Mark Scanlon Venue: Journal of Digital Forensics, Security and Law: Proc. of 10th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE 2015) Publication date: 2015/09/01 Contribution summary: This paper presents a forensic analysis of Project Maelstrom, a decentralized web browser powered by BitTorrent. The authors explore the browser's functionality, forensic value, and the evidence it leaves behind, including installation and configuration files, user data, and torrent-related settings. Abstract: In April 2015, BitTorrent Inc. released their distributed peer-to-peer powered browser Project Maelstrom into public beta. The browser facilitates a new alternative website distribution paradigm to the traditional HTTP based, client-server model. This decentralised web is powered by each of the users accessing each Maelstrom hosted website. Each user shares their copy of the website with other new visitors to the website. As a result, a Maelstrom hosted website cannot be taken offline by law enforcement or any other parties. Due to this open distribution model, a number of interesting censorship, security and privacy considerations are raised. This paper explores the application, its protocol, sharing Maelstrom content and its new visitor powered “web-hosting” paradigm. 
## Towards the Forensic Identification and Investigation of Cloud Hosted Servers through Noninvasive Wiretaps Canonical page: https://markscanlon.co/publications/TowardsTheForensicIdentificationAndInvestigationOfCloudHostedServers.html DOI: https://doi.org/10.1109/ARES.2015.77 PDF: https://markscanlon.co/publications/TowardsTheForensicIdentificationAndInvestigationOfCloudHostedServers.pdf Authors: Hessel Schut; Mark Scanlon; Jason Farina; Nhien-An Le-Khac Venue: Proceedings of 10th International Conference on Availability, Reliability and Security (ARES 2015) Publication date: 2015/08/01 Contribution summary: This paper presents a new approach to rapidly and reliably identify cloud-hosted servers through non-invasive wiretaps. A handheld device composed of an embedded computer and a method of undetectable Ethernet-based communication interception is developed and tested. The device captures minimal information and only stores relevant data, with an audit log of operator actions for reporting. Abstract: When conducting modern cybercrime investigations, evidence often has to be gathered from computer systems located at cloud-based data centres of hosting providers. In cases where the investigation cannot rely on the cooperation of the hosting provider, or where documentation is not available, investigators can often find the identification of which distinct server among many is of interest difficult and extremely time consuming. To address the problem of identifying these servers, in this paper a new approach to rapidly and reliably identify these cloud hosting computer systems is presented. In the outlined approach, a handheld device composed of an embedded computer combined with a method of undetectable interception of Ethernet based communications is presented. This device is tested and evaluated, and a discussion is provided on its usefulness in identifying a server of interest to an investigation.
## Remote Evidence Acquisition Canonical page: https://markscanlon.co/publications/RemoteEvidenceAcquisition.html Authors: Mark Scanlon Venue: Proceedings of the International Workshop on Digital Forensics (WSDF 2015) Publication date: 2015/08/01 Contribution summary: This paper presents a novel approach to remote evidence acquisition in digital forensics. The authors propose a method for collecting and preserving digital evidence from remote locations, addressing the challenges of on-site collection. The contribution is a framework for secure and efficient evidence transfer, enhancing the integrity and admissibility of digital evidence in investigations. ## Overview of the Forensic Investigation of Cloud Services Canonical page: https://markscanlon.co/publications/OverviewOfTheForensicInvestigationOfCloudServices.html DOI: https://doi.org/10.1109/ARES.2015.81 PDF: https://markscanlon.co/publications/OverviewOfTheForensicInvestigationOfCloudServices.pdf Authors: Jason Farina; Mark Scanlon; Nhien-An Le-Khac; M-Tahar Kechadi Venue: 10th International Conference on Availability, Reliability and Security (ARES 2015) Publication date: 2015/08/01 Contribution summary: This paper provides an overview of the forensic investigation of cloud services, discussing the challenges and opportunities of cloud computing in digital forensics. It examines the state-of-the-art in cloud-focused digital forensic practices, including the collection and analysis of evidence, and the potential use of cloud technologies to provide Digital Forensics as a Service. Abstract: Cloud Computing is a commonly used, yet ambiguous term, which can be used to refer to a multitude of differing dynamically allocated services. From a law enforcement and forensic investigation perspective, cloud computing can be thought of as a double edged sword. While on one hand, the gathering of digital evidence from cloud sources can bring with it complicated technical and cross-jurisdictional legal challenges. 
On the other, the employment of cloud storage and processing capabilities can expedite the forensics process and focus the investigation onto pertinent data earlier in an investigation. This paper examines the state-of-the-art in cloud-focused, digital forensic practices for the collection and analysis of evidence and an overview of the potential use of cloud technologies to provide Digital Forensics as a Service. ## HTML5 Zero Configuration Covert Channels: Security Risks and Challenges Canonical page: https://markscanlon.co/publications/HTML5ZeroConfigurationCovertChannels.html PDF: https://markscanlon.co/publications/HTML5ZeroConfigurationCovertChannels.pdf Authors: Jason Farina; Mark Scanlon; Stephen Kohlmann; Nhien-An Le Khac; M-Tahar Kechadi Venue: The 10th ADFSL Conference on Digital Forensics, Security and Law (CDFSL 2015) Publication date: 2015/05/01 Contribution summary: This paper explores the security risks and challenges of HTML5 zero-configuration covert channels, including the potential for cybercriminals to use these services for illegal activities. The authors analyze the forensic consequences of these services and propose methods for retrieving evidence. Abstract: In recent months there has been an increase in the popularity and public awareness of secure, cloudless file transfer systems. The aim of these services is to facilitate the secure transfer of files in a peer-to-peer (P2P) fashion over the Internet without the need for centralised authentication or storage. These services can take the form of client installed applications or entirely web browser based interfaces. Due to their P2P nature, there is generally no limit to the file sizes involved or to the volume of data transmitted - and where these limitations do exist they will be purely reliant on the capacities of the systems at either end of the transfer. By default, many of these services provide seamless, end-to-end encryption to their users.
The cybersecurity and cyberforensic consequences of the potential criminal use of such services are significant. The ability to easily transfer encrypted data over the Internet opens up a range of opportunities for illegal use to cybercriminals requiring minimal technical know-how. This paper explores a number of these services and provides an analysis of the risks they pose to corporate and governmental security. A number of methods for the forensic investigation of such transfers are discussed. ## Leveraging Decentralisation to Extend the Digital Evidence Acquisition Window: Case Study on BitTorrent Sync Canonical page: https://markscanlon.co/publications/LeveragingDecentralisationToExtendTheDigitalEvidenceAcquisitionWindow.html DOI: https://doi.org/10.15394/jdfsl.2014.1173 PDF: https://markscanlon.co/publications/LeveragingDecentralisationToExtendTheDigitalEvidenceAcquisitionWindow.pdf Authors: Mark Scanlon; Jason Farina; Nhien-An Le Khac; M-Tahar Kechadi Venue: Journal of Digital Forensics, Security and Law: Proc. of Sixth International Conference on Digital Forensics & Cyber Crime (ICDF2C 2014) Publication date: 2014/09/01 Contribution summary: This paper presents a methodology for the remote recovery and verification of digital evidence from decentralized file synchronization services, specifically BitTorrent Sync. The authors outline a proof-of-concept implementation and discuss the challenges and opportunities of remote digital evidence retrieval in the context of mobile devices and cloud-based services. Abstract: File synchronization services such as Dropbox, Google Drive, Microsoft OneDrive, Apple iCloud, etc., are becoming increasingly popular in today's always-connected world. A popular alternative to the aforementioned services is BitTorrent Sync. 
This is a decentralized/cloudless file synchronization service and is gaining significant popularity among Internet users with privacy concerns over where their data is stored and who has the ability to access it. The focus of this paper is the remote recovery of digital evidence pertaining to files identified as being accessed or stored on a suspect's computer or mobile device. A methodology for the identification, investigation, recovery and verification of such remote digital evidence is outlined. Finally, a proof-of-concept remote evidence recovery from a BitTorrent Sync shared folder is presented, highlighting a number of potential scenarios for the recovery and verification of such evidence. ## BitTorrent Sync: Network Investigation Methodology Canonical page: https://markscanlon.co/publications/BitTorrentSyncNetworkInvestigationMethodology.html DOI: https://doi.org/10.1109/ARES.2014.11 PDF: https://markscanlon.co/publications/BitTorrentSyncNetworkInvestigationMethodology.pdf Authors: Mark Scanlon; Jason Farina; M-Tahar Kechadi Venue: Proceedings of 9th International Conference on Availability, Reliability and Security (ARES 2014) Publication date: 2014/09/01 Contribution summary: This paper presents a network investigation methodology for BitTorrent Sync, a decentralized file replication utility, to aid digital forensic investigations. The authors propose a framework for retrieving digital evidence from the network and provide results from a proof-of-concept investigation. Abstract: The volume of personal information and data most Internet users find themselves amassing is ever increasing and the fast pace of the modern world results in most requiring instant access to their files. Millions of these users turn to cloud based file synchronisation services, such as Dropbox, Microsoft Skydrive, Apple iCloud and Google Drive, to enable “always-on” access to their most up-to-date data from any computer or mobile device with an Internet connection.
The prevalence of recent articles covering various invasion of privacy issues and data protection breaches in the media has caused many to review their online security practices with their personal information. To provide an alternative to cloud based file backup and synchronisation, BitTorrent Inc. released an alternative cloudless file backup and synchronisation service, named BitTorrent Sync to alpha testers in April 2013. BitTorrent Sync's popularity rose dramatically throughout 2013, reaching over two million active users by the end of the year. This paper outlines a number of scenarios where the network investigation of the service may prove invaluable as part of a digital forensic investigation. An investigation methodology is proposed outlining the required steps involved in retrieving digital evidence from the network and the results from a proof of concept investigation are presented. ## An analysis of BitTorrent cross-swarm peer participation and geolocational distribution Canonical page: https://markscanlon.co/publications/AnAnalysisOfBitTorrentCrossSwarmPeerParticipation.html DOI: https://doi.org/10.1109/ICCCN.2014.6911846 PDF: https://markscanlon.co/publications/AnAnalysisOfBitTorrentCrossSwarmPeerParticipation.pdf Authors: Mark Scanlon; Huijie Shen Venue: 23rd International Conference on Computer Communication and Networks (ICCCN 2014) Publication date: 2014/09/01 Contribution summary: This paper analyzes BitTorrent cross-swarm peer participation and geolocational distribution. The authors collected 2 terabytes of data from 16 swarms of popular TV shows, identifying 6.3 million distinct IPs. The study found significant cross-swarm participation and geolocational distribution, with Australia, Europe, and North America playing a crucial role in influencing swarm size. The results can aid in network usage prediction, bandwidth provisioning, and future network design. 
Abstract: Peer-to-Peer (P2P) file-sharing has become increasingly popular in recent years. In 2012, it was reported that P2P traffic consumed over 5,374 petabytes per month, which accounted for approximately 20.5% of consumer internet traffic. TV is the popular content type on The Pirate Bay (the world’s largest BitTorrent indexing website). In this paper, an analysis of the swarms of the most popular pirated TV shows is conducted. The purpose of this data gathering exercise is to enumerate the peer distribution at different geolocational levels, to measure the temporal trend of the swarm and to discover the amount of cross-swarm peer participation. Snapshots containing peer related information involved in the unauthorised distribution of this content were collected at a high frequency resulting in a more accurate landscape of the total involvement. The volume of data collected throughout the monitoring of the network exceeded 2 terabytes. The analysis and results presented can aid in network usage prediction, bandwidth provisioning and future network design. ## Digital Evidence Bag Selection for P2P Network Investigation Canonical page: https://markscanlon.co/publications/DigitalEvidenceBagSelectionForP2PNetworkInvestigation.html DOI: https://doi.org/10.1007/978-3-642-40861-8_44 PDF: https://markscanlon.co/publications/DigitalEvidenceBagSelectionForP2PNetworkInvestigation.pdf Authors: Mark Scanlon; M-Tahar Kechadi Venue: Proceedings of the 7th International Symposium on Digital Forensics and Information Security (DFIS-2013), Future Information Technology, Application, and Service Publication date: 2014/07/01 Contribution summary: This paper proposes a new digital evidence bag format for P2P network investigations, addressing the limitations of existing formats in handling network traffic and metadata. The proposed format incorporates network byte streams and on-the-fly metadata generation to expedite identification and analysis.
Abstract: The collection and handling of court admissible evidence is a fundamental component of any digital forensic investigation. While the procedures for handling digital evidence take much of their influence from the established policies for the collection of physical evidence, due to the obvious differences in dealing with non-physical evidence, a number of extra policies and procedures are required. This paper compares and contrasts some of the existing digital evidence formats or “bags” and analyses them for their compatibility with evidence gathered from a network source. A new digital extended evidence bag is proposed to specifically deal with evidence gathered from P2P networks, incorporating the network byte stream and on-the-fly metadata generation to aid in expedited identification and analysis. ## The Case for a Collaborative Universal Peer-to-Peer Botnet Investigation Framework Canonical page: https://markscanlon.co/publications/TheCaseForACollaborativeUniversalP2PBotnetInvestigationFramework.html PDF: https://markscanlon.co/publications/TheCaseForACollaborativeUniversalP2PBotnetInvestigationFramework.pdf Authors: Mark Scanlon; M-Tahar Kechadi Venue: Proceedings of the 9th International Conference on Cyber Warfare and Security (ICCWS 2014) Publication date: 2014/03/01 Contribution summary: This paper proposes a collaborative universal peer-to-peer botnet investigation framework to fast-track the investigative process through collaboration between key stakeholders. The framework exploits common attributes of P2P networks, including intra-peer communication, self-propagation, and node maintenance, to identify and record communication patterns. This enables the elimination of duplicated work by forensic investigators and facilitates the investigation of any known botnet and adaptation to new networks.
Abstract: Peer-to-Peer (P2P) botnets are becoming widely used as a low-overhead, efficient, self-maintaining, distributed alternative to the traditional client/server model across a broad range of cyberattacks. These cyberattacks can take the form of distributed denial of service attacks, authentication cracking, spamming, cyberwarfare or malware distribution targeting financial systems. These attacks can also cross over into the physical world attacking critical infrastructure causing its disruption or destruction (power, communications, water, etc.). P2P technology lends itself well to being exploited for such malicious purposes due to the minimal setup, running and maintenance costs involved in executing a globally orchestrated attack, alongside the perceived additional layer of anonymity. In the ever-evolving space of botnet technology, reducing the time lag between discovering a newly developed or updated botnet system and gaining the ability to mitigate against it is paramount. Often, numerous investigative bodies duplicate their efforts in creating bespoke tools to combat particular threats. This paper outlines a framework capable of fast tracking the investigative process through collaboration between key stakeholders. ## BitTorrent Sync: First Impressions and Digital Forensic Implications Canonical page: https://markscanlon.co/publications/BitTorrentSyncFirstImpressionsAndDigitalForensicImplications.html DOI: https://doi.org/10.1016/j.diin.2014.03.010 PDF: https://markscanlon.co/publications/BitTorrentSyncFirstImpressionsAndDigitalForensicImplications.pdf Authors: Jason Farina; Mark Scanlon; M-Tahar Kechadi Venue: Digital Investigation Publication date: 2014/03/01 Contribution summary: This paper presents a forensic analysis of BitTorrent Sync, a decentralized file synchronization service, and its implications for digital investigations.
The authors examine the client application, network traffic, and artefacts created during installation and use, providing valuable insights for digital forensic investigators. Abstract: With professional and home Internet users becoming increasingly concerned with data protection and privacy, the privacy afforded by popular cloud file synchronisation services, such as Dropbox, OneDrive and Google Drive, is coming under scrutiny in the press. A number of these services have recently been reported as sharing information with governmental security agencies without warrants. BitTorrent Sync is seen as an alternative by many and has gathered over two million users by December 2013 (doubling since the previous month). The service is completely decentralised, offers much of the same synchronisation functionality of cloud powered services and utilises encryption for data transmission (and optionally for remote storage). The importance of understanding BitTorrent Sync and its resulting digital investigative implications for law enforcement and forensic investigators will be paramount to future investigations. This paper outlines the client application, its detected network traffic and identifies artefacts that may be of value as evidence for future digital investigations. ## Study of Peer-to-Peer Network Based Cybercrime Investigation: Application on Botnet Technologies Canonical page: https://markscanlon.co/publications/StudyOfPeer-to-PeerNetworkBasedCybercrimeInvestigation.html PDF: https://markscanlon.co/publications/StudyOfPeer-to-PeerNetworkBasedCybercrimeInvestigation.pdf Authors: Mark Scanlon Venue: PhD Thesis Publication date: 2013/10/01 Contribution summary: This PhD thesis explores the investigation of Peer-to-Peer (P2P) networks, which are vulnerable to cybercrimes such as botnet propagation and malware distribution. The Universal P2P Network Investigation Framework (UP2PNIF) is introduced to facilitate faster and more efficient investigations of P2P networks. 
Abstract: The scalable, low overhead attributes of Peer-to-Peer (P2P) Internet protocols and networks lend themselves well to being exploited by criminals to execute a large range of cybercrimes. The types of crimes aided by P2P technology include copyright infringement, sharing of illicit images of children, fraud, hacking/cracking, denial of service attacks and virus/malware propagation through the use of a variety of worms, botnets, malware, viruses and P2P file sharing. This project is focused on the study of active P2P nodes along with the analysis of the undocumented communication methods employed in many of these large unstructured networks. This is achieved through the design and implementation of an efficient P2P monitoring and crawling toolset. The requirement for investigating P2P based systems is not limited to the more obvious cybercrimes listed above, as many legitimate P2P based applications may also be pertinent to a digital forensic investigation, e.g., voice over IP, instant messaging, etc. Investigating these networks has become increasingly difficult due to the broad range of network topologies and the ever increasing and evolving range of P2P based applications. In this work we introduce the Universal P2P Network Investigation Framework (UP2PNIF), a framework which enables significantly faster and less labour intensive investigation of newly discovered P2P networks through the exploitation of the commonalities in P2P network functionality. In combination with a reference database of known network characteristics, it is envisioned that any known P2P network can be instantly investigated using the framework, which can intelligently determine the best investigation methodology and greatly expedite the evidence gathering process. A proof of concept tool was developed for conducting investigations on the BitTorrent network. A number of investigations conducted using this tool are outlined in Chapter 6.
## Universal Peer-to-Peer Network Investigation Framework Canonical page: https://markscanlon.co/publications/UniversalPeerToPeerNetworkInvestigationFramework.html DOI: https://doi.org/10.1109/ARES.2013.91 PDF: https://markscanlon.co/publications/UniversalPeerToPeerNetworkInvestigationFramework.pdf Authors: Mark Scanlon; M-Tahar Kechadi Venue: Availability, Reliability and Security (ARES), 2013 Eighth International Conference on Publication date: 2013/09/01 Contribution summary: This paper introduces the Universal Peer-to-Peer Network Investigation Framework (UP2PNIF), a tool for investigating P2P networks. The framework exploits common attributes of P2P networks to enable faster and less labor-intensive investigations. It can be used for various investigation types, including evidence collection, anatomy, wide-area measurement, and takeover. Abstract: Peer-to-Peer (P2P) networking has fast become a useful technological advancement for a vast range of cybercriminal activities. Cybercrimes from copyright infringement and spamming, to serious, high financial impact crimes, such as fraud, distributed denial of service attacks (DDoS) and phishing can all be aided by applications and systems based on the technology. The requirement for investigating P2P based systems is not limited to the more well known cybercrimes listed above, as many more legitimate P2P based applications may also be pertinent to a digital forensic investigation, e.g., VoIP and instant messaging communications, etc. Investigating these networks has become increasingly difficult due to the broad range of network topologies and the ever increasing and evolving range of P2P based applications. This paper introduces the Universal Peer-to-Peer Network Investigation Framework (UP2PNIF); a framework which enables significantly faster and less labour intensive investigation of newly discovered P2P networks through the exploitation of the commonalities in network functionality.
In combination with a reference database of known network protocols and characteristics, it is envisioned that any known P2P network can be instantly investigated using the framework. The framework can intelligently determine the best methodology dependent on the focus of the investigation resulting in a significantly expedited evidence gathering process. ## Investigating Cybercrimes That Occur on Documented P2P Networks Canonical page: https://markscanlon.co/publications/InvestigatingCybercrimesThatOccurOnDocumentedP2PNetworks2013.html DOI: https://doi.org/10.4018/978-1-4666-2041-4.ch010 PDF: https://markscanlon.co/publications/InvestigatingCybercrimesThatOccurOnDocumentedP2PNetworks2013.pdf Authors: Mark Scanlon; Alan Hannaway; Tahar Kechadi Venue: Pervasive and Ubiquitous Technology Innovations for Ambient Intelligence Environments Publication date: 2013/09/01 Contribution summary: This paper presents a methodology for investigating cybercrimes on documented P2P networks, specifically BitTorrent, by analyzing the top 100 most popular swarms over a one-week period. The investigation aims to quantify the scale of unauthorized distribution of copyrighted material and identify the geographical distribution of peers involved. Abstract: The popularity of Peer-to-Peer (P2P) Internet communication technologies being exploited to aid cybercrime is ever increasing. P2P systems can be used or exploited to aid in the execution of a large number of online criminal activities, e.g., copyright infringement, fraud, malware and virus distribution, botnet creation and control, etc. P2P technology is perhaps most famous for the unauthorised distribution of copyrighted materials since the late 1990s, with the popularity of file-sharing programs, such as Napster, etc. In 2004, P2P traffic accounted for 80% of all Internet traffic and in 2005, specifically BitTorrent traffic accounted for over 60% of the world’s P2P bandwidth usage.
This paper outlines a methodology for investigating a documented P2P network, BitTorrent, using a sample investigation for reference throughout. The sample investigation outlined was conducted on the top 100 most popular BitTorrent swarms over the course of a one week period. ## Peer-to-Peer Botnet Investigation: A Review Canonical page: https://markscanlon.co/publications/P2PBotnetInvestigationAReview.html DOI: https://doi.org/10.1007/978-94-007-5064-7_33 PDF: https://markscanlon.co/publications/P2PBotnetInvestigationAReview.pdf Authors: Mark Scanlon; M-Tahar Kechadi Venue: Proceedings of the 6th International Symposium on Digital Forensics and Information Security (DFIS-2012), Future Information Technology, Application, and Service Publication date: 2012/09/01 Contribution summary: This paper reviews the state-of-the-art in Peer-to-Peer (P2P) botnet investigation, highlighting the challenges and obstacles faced by investigators. It discusses the evolution of botnet design from traditional client/server to decentralized P2P networks, and the implications for investigation and takedown. The paper outlines three main approaches to P2P botnet investigation and presents case studies of the Nugache, Storm, and Waledac botnets. ## Investigating Cybercrimes That Occur on Documented P2P Networks Canonical page: https://markscanlon.co/publications/InvestigatingCybercrimesThatOccurOnDocumentedP2PNetworks.html DOI: https://doi.org/10.4018/jaci.2011040104 PDF: https://markscanlon.co/publications/InvestigatingCybercrimesThatOccurOnDocumentedP2PNetworks.pdf Authors: Mark Scanlon; Alan Hannaway; M-Tahar Kechadi Venue: International Journal of Ambient Computing and Intelligence Publication date: 2011/04/01 Contribution summary: This paper presents a methodology for investigating cybercrimes on documented P2P networks, specifically BitTorrent, by analyzing the top 100 most popular swarms over a one-week period.
The investigation aims to quantify the scale of unauthorized distribution of copyrighted material through BitTorrent. Abstract: The popularity of Peer-to-Peer (P2P) Internet communication technologies being exploited to aid cybercrime is ever increasing. P2P systems can be used or exploited to aid in the execution of a large number of online criminal activities, e.g., copyright infringement, fraud, malware and virus distribution, botnet creation and control, etc. P2P technology is perhaps most famous for the unauthorised distribution of copyrighted materials since the late 1990s, with the popularity of file-sharing programs, such as Napster, etc. In 2004, P2P traffic accounted for 80% of all Internet traffic and in 2005, specifically BitTorrent traffic accounted for over 60% of the world’s P2P bandwidth usage. This paper outlines a methodology for investigating a documented P2P network, BitTorrent, using a sample investigation for reference throughout. The sample investigation outlined was conducted on the top 100 most popular BitTorrent swarms over the course of a one week period. ## A week in the Life of the Most Popular BitTorrent Swarms Canonical page: https://markscanlon.co/publications/AWeekInTheLifeOfTheMostPopularBitTorrentSwarms.html PDF: https://markscanlon.co/publications/AWeekInTheLifeOfTheMostPopularBitTorrentSwarms.pdf Authors: Mark Scanlon; Alan Hannaway; M-Tahar Kechadi Venue: Proceedings of the 5th Annual Symposium on Information Assurance (ASIA 2010) Publication date: 2010/06/01 Contribution summary: This paper presents an analysis of the most popular BitTorrent swarms over a week, focusing on the scale of unauthorized distribution of copyrighted material. The investigation collected data on 8,489,287 unique IP addresses, with 50.6% of files split into smaller chunks for distribution. The results show a global distribution of peers, with the US, UK, India, and Canada being the top countries detected.
## Online Acquisition of Digital Forensic Evidence Canonical page: https://markscanlon.co/publications/OnlineAcquisitionOfDigitalForensicEvidence.html DOI: https://doi.org/10.1007/978-3-642-11534-9_12 PDF: https://markscanlon.co/publications/OnlineAcquisitionOfDigitalForensicEvidence.pdf Authors: Mark Scanlon; M-Tahar Kechadi Venue: Proceedings of International Conference on Digital Forensics and Cyber Crime (ICDF2C 2009) Publication date: 2009/09/01 Contribution summary: This paper introduces RAFT, a remote forensic hard drive imaging tool designed to reduce the time wasted by forensic investigators in collecting digital evidence. RAFT enables law enforcement officers to remotely transfer images of suspect computers to a forensic laboratory for analysis, ensuring court-admissible evidence through secure and verifiable client/server imaging architecture.
Abstract: Providing any law enforcement officer with the ability to remotely transfer an image from any suspect computer directly to a forensic laboratory for analysis can only help to greatly reduce the time wasted by forensic investigators in conducting on-site collection of computer equipment. RAFT (Remote Acquisition Forensic Tool) is a system designed to assist forensic investigators by remotely gathering digital evidence. This is achieved through the implementation of a secure, verifiable client/server imaging architecture. The RAFT system is designed to be relatively easy to use, requiring minimal technical knowledge on behalf of the user. One of the key focuses of RAFT is to ensure that the evidence it gathers remotely is court admissible. This is achieved by ensuring that the image taken using RAFT is verified to be identical to the original evidence on a suspect computer.

## Enabling the Remote Acquisition of Digital Forensic Evidence through Secure Data Transmission and Verification
Canonical page: https://markscanlon.co/publications/EnablingRemoteEvidenceAcquisition.html
PDF: https://markscanlon.co/publications/EnablingRemoteEvidenceAcquisition.pdf
Authors: Mark Scanlon
Venue: MSc Thesis
Publication date: 2009/09/01
Contribution summary: This thesis presents RAFT, a system for remote acquisition of digital forensic evidence through secure data transmission and verification. RAFT enables law enforcement officers to transfer images from suspect computers to forensic labs for analysis, reducing investigation time and ensuring court-admissible evidence.
Abstract: Providing any law enforcement officer with the ability to remotely transfer an image from any suspect computer directly to a forensic laboratory for analysis can only help to greatly reduce the time wasted by forensic investigators in conducting on-site collection of computer equipment.
RAFT (Remote Acquisition Forensic Tool) is a system designed to assist forensic investigators by remotely gathering digital evidence. This is achieved through the implementation of a secure, verifiable client/server imaging architecture. The RAFT system is designed to be relatively easy to use, requiring minimal technical knowledge on behalf of the user. One of the key focuses of RAFT is to ensure that the evidence it gathers remotely is court admissible. This is achieved by ensuring that the image taken using RAFT is verified to be identical to the original evidence on a suspect computer.