Education and training in digital forensics require a variety of suitable challenge corpora containing realistic features, including regular wear-and-tear, background noise, and the actual digital traces to be discovered during investigation. Typically, creating these challenges demands arduous effort from the educator to ensure their viability. Once created, the challenge image needs to be stored and distributed to a class for practical training. This storage and distribution step requires significant resources and time and may not even be possible in an online/distance learning scenario due to the data sizes involved. As part of this project, we introduce a methodology and system that is more capable than current approaches. EviPlant is a system designed for the efficient creation, manipulation, storage and distribution of challenges for digital forensics education and training. The system relies on the initial distribution of base disk images, i.e., images containing solely bare operating systems. In order to create challenges for students, educators can boot the base system, emulate the desired activity, and diff the resultant image against the base image. This diffing process extracts the modified artefacts and associated metadata and stores them in an evidence package. Evidence packages can be created for different personas, different wear-and-tear, different emulated crimes, etc., and multiple evidence packages can be distributed to students and integrated with the base images.
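The diffing step described above can be illustrated with a minimal sketch. It is not EviPlant's implementation: it assumes the base and modified images have already been mounted as ordinary directory trees, and the names `snapshot` and `build_evidence_package` as well as the package layout are hypothetical, chosen only for illustration.

```python
import hashlib
import os


def hash_file(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each relative file path under root to its content hash."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = hash_file(full)
    return files


def build_evidence_package(base_root, modified_root):
    """Hypothetical helper: collect what changed relative to the base tree.

    Assumption: both images are mounted as directories; a real tool would
    work on the disk images themselves and record richer metadata.
    """
    base = snapshot(base_root)
    modified = snapshot(modified_root)
    artefacts = []
    for rel_path, digest in sorted(modified.items()):
        if base.get(rel_path) == digest:
            continue  # unchanged since the base image, so not part of the package
        info = os.stat(os.path.join(modified_root, rel_path))
        artefacts.append({
            "path": rel_path,
            "sha256": digest,
            "size": info.st_size,
            "mtime": info.st_mtime,
            "change": "modified" if rel_path in base else "added",
        })
    # Files present in the base but missing afterwards are recorded by path only.
    deleted = sorted(set(base) - set(modified))
    return {"artefacts": artefacts, "deleted": deleted}
```

Because only changed files enter the package, its size depends on the emulated activity rather than on the size of the base image, which is the property that makes distribution to students practical.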
With an increasing emphasis on digital evidence in criminal prosecution, law enforcement is encountering a corresponding rise in the number of cases requiring expert digital forensic analysis. The sheer volume of data to be processed in each case has also significantly increased in the same time period. As a result, the requirement for more efficient digital forensic investigation has ballooned, and law enforcement agencies throughout the world are buckling under the overwhelming stress. While an increasing number of law enforcement personnel are being trained to perform the required investigations, the supply of highly skilled staff is not meeting the demand. Consequently, large, unprocessed digital forensic backlogs, reaching 12-24 months or more, are commonplace in Europe and throughout the world.
This project introduces a novel solution to replace much of the current, overly arduous digital forensic process and so combat this backlog. The project leverages a Digital Forensics-as-a-Service paradigm based on data deduplication. This aims to eliminate the reacquisition, redundant storage, and reanalysis of previously processed data. Data deduplication involves the deployment of a centralised storage system in which each artefact is stored only once. As a result, artefacts already held in the store can be skipped at the acquisition stage. Metadata describing each artefact’s file name, modification dates, and physical disk address is still recorded and can be used at any point to reconstruct a verifiable copy of the original data source, e.g., a hard drive or mobile device.
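The deduplicated storage model can be sketched as a content-addressed store. This is a simplified illustration under stated assumptions rather than the project's system: artefacts are keyed by SHA-256, each artefact is assumed to occupy one contiguous region of the disk, and the class `DedupStore` and its methods are hypothetical names.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory sketch of a centralised deduplicated store."""

    def __init__(self):
        self.blobs = {}         # sha256 -> bytes; each unique artefact stored once
        self.acquisitions = {}  # acquisition id -> list of metadata records

    def add(self, acquisition_id, name, mtime, disk_offset, data):
        """Record an artefact; return True only if its content was new."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Metadata is kept for every acquisition, even when the content is known.
        self.acquisitions.setdefault(acquisition_id, []).append(
            {"name": name, "mtime": mtime, "offset": disk_offset, "sha256": digest}
        )
        return is_new

    def reconstruct(self, acquisition_id, size):
        """Rebuild a zero-filled image of `size` bytes from stored metadata."""
        image = bytearray(size)
        for rec in self.acquisitions[acquisition_id]:
            data = self.blobs[rec["sha256"]]
            image[rec["offset"]:rec["offset"] + len(data)] = data
        return bytes(image)
```

In this sketch, a second acquisition containing a file already seen in an earlier case adds only a metadata record, and the reconstructed image can be checked against a hash of the original device to confirm that the copy is faithful.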
Moving to a deduplicated model provides significant advantages over the current approach, including reduced costs, higher throughput, and faster processing. In the development of such a system, a number of independently useful milestones emerge, including digital forensic challenge creation for education & training, forensic tool testing & validation, and automated investigation.