Revisiting REAPER: Automating digital forensic investigations


The Rapid Evidence Acquisition Project for Event Reconstruction (REAPER) [1] was one of the first projects that I worked on during my PhD. It started around 2008, when I got interested in trying to completely automate digital forensic investigations. Yes, it sounds impossible, but I wanted to see how far automated handling of digital evidence could go.

This was a little before digital forensic triage [2] and preliminary analysis gained popularity.

The idea was that once the process started, the investigator would not need to interact with the system. At the end of the automated investigation process, the "smoking gun" would be presented to the investigator in context.

Literally push-button forensics.

The Process
An investigator would insert a forensic live CD into the suspect's computer (post mortem). After starting the computer, the live CD (with an external disk attached) would show only an information panel with the current stage of the investigation process.

First, REAPER would check the suspect computer to see what disks it could access, and whether any of them held encrypted or hidden data. If hidden or encrypted data was detected, it would try to recover or access that data. With toy examples, this worked, but how it would work on real systems, especially now, I'm not sure. All detectable media would be hashed, and verbose logging was on by default (for every action).
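As a rough illustration of the "hash everything, log everything" step, here is a minimal Python sketch. It is not REAPER's actual code: the function name `hash_media`, the logger name, and the choice of SHA-256 are my assumptions for the example. The path could be an image file or, with enough privileges, a block device.

```python
import hashlib
import logging

# Verbose logging by default, in the spirit of "log every action".
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("acquisition")  # hypothetical logger name


def hash_media(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file or block device, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as media:
        # Read fixed-size chunks so large disks never need to fit in memory.
        for chunk in iter(lambda: media.read(chunk_size), b""):
            digest.update(chunk)
    result = digest.hexdigest()
    log.info("hashed %s with %s: %s", path, algorithm, result)
    return result
```

Reading in chunks matters here because the media being hashed is usually far larger than available RAM on the suspect machine.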

Next, all detectable media would be automatically imaged to the investigator's external disk. Once complete, the images would be verified. If verification failed, the disk would be re-imaged.
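The image, verify, and re-image loop can be sketched roughly as follows. Again, this is only an illustration under my own assumptions: the names `sha256_of` and `image_and_verify` are hypothetical, and the `max_attempts` limit is something I added so the sketch cannot loop forever.

```python
import hashlib
import shutil

CHUNK = 1024 * 1024  # 1 MiB copy and hash buffer


def sha256_of(path):
    """Hash a file or device in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_and_verify(source, destination, max_attempts=3):
    """Copy source to destination, re-imaging until the hashes match.

    Returns (source_hash, attempts_used). max_attempts is an assumption
    of this sketch, not something described in the original design.
    """
    source_hash = sha256_of(source)
    for attempt in range(1, max_attempts + 1):
        with open(source, "rb") as src, open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst, CHUNK)
        if sha256_of(destination) == source_hash:
            return source_hash, attempt
    raise RuntimeError(f"verification failed after {max_attempts} attempts")
```

A call such as `image_and_verify("/dev/sdb", "/mnt/external/sdb.dd")` would be the typical use, where both paths are placeholders.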
 
Next, REAPER would start standard carving, parsing and indexing. The Open Computer Forensics Architecture (OCFA) was used to extract as much data as possible. OCFA is an extremely powerful architecture, but the open source version is a bit difficult to use (especially from a live CD). I understand that the Netherlands Forensic Institute (NFI) has a commercial front-end that makes working with it much easier.

Once all data has been acquired, verified and processed, the actual investigation and analysis can begin.

Here is where things get tricky.

First, we have to know what the investigation question is, and we have to 'tell' the system what that question is. We currently do this by specifying the general type of investigation, for example "hacking" or "child exploitation". We then have a (manually) pre-set list of tasks related to each of those types of crime. Alternatively, we could search for 'all crimes'.
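One simple way to picture the pre-set task lists is a lookup table keyed by investigation type. The sketch below is only that, a picture: the task names and the `tasks_for` helper are hypothetical and do not come from REAPER itself.

```python
# Hypothetical, manually maintained task lists per investigation type.
TASKS_BY_INVESTIGATION = {
    "hacking": [
        "parse_system_logs",
        "extract_network_artifacts",
        "build_file_timeline",
    ],
    "child exploitation": [
        "carve_images",
        "match_known_file_hashes",
        "build_file_timeline",
    ],
}


def tasks_for(investigation_type):
    """Return the ordered task list for one type, or for 'all crimes'."""
    if investigation_type == "all crimes":
        # Union of every list, keeping first-seen order and dropping repeats.
        merged = {}
        for tasks in TASKS_BY_INVESTIGATION.values():
            for task in tasks:
                merged.setdefault(task, None)
        return list(merged)
    if investigation_type not in TASKS_BY_INVESTIGATION:
        raise ValueError(f"no task list defined for {investigation_type!r}")
    return list(TASKS_BY_INVESTIGATION[investigation_type])
```

The weakness is visible right in the table: someone has to decide in advance which tasks matter for which crime, which is exactly the manual step described above.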

Here, some basic analysis could take place. For example, we could automatically determine attack paths of intrusions based on processed data [3]. We could also test whether a certain statement could possibly be true given the current state of the system [4]. And by building up 'knowledge' (models) about systems before an investigation, we could accurately and automatically determine user actions from traces that are difficult for humans to analyze [5].
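To give a feel for what "testing whether a statement can be true" means, here is a toy Python sketch. It is not the formal method from [4]; it is a deliberately tiny state machine I made up for illustration, where the states, events, and the `consistent_with_state` function are all assumptions of the example. The idea is simply to replay a claimed sequence of events and check whether it can end in the state we actually observed.

```python
def consistent_with_state(transitions, start_states, claimed_events, observed_state):
    """Return True if the claimed events can lead to the observed state.

    transitions maps (state, event) to the next state. Any event that is
    not possible in a given state removes that path from consideration.
    """
    current = set(start_states)
    for event in claimed_events:
        current = {
            transitions[(state, event)]
            for state in current
            if (state, event) in transitions
        }
    return observed_state in current


# Toy model of a single file's life cycle (hypothetical).
FILE_MODEL = {
    ("absent", "create"): "present",
    ("present", "modify"): "present",
    ("present", "delete"): "deleted",
}

# The file is observed as deleted. A claim of "nothing ever happened"
# cannot explain that, while create-then-delete can.
claim_nothing = consistent_with_state(FILE_MODEL, {"absent"}, [], "deleted")
claim_create_delete = consistent_with_state(
    FILE_MODEL, {"absent"}, ["create", "delete"], "deleted"
)
```

Here `claim_nothing` is `False` and `claim_create_delete` is `True`. Real systems have vastly more states than this, which is a large part of why the approach is hard to scale.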

Where it falls apart
The problem is, we are still essentially in the processing phase of the investigation. We are condensing the available information into a usable form, but we are not yet saying what this information means in the context of the investigation. While we can gain more information about the data in an automated way, a human still needs to 'make sense' of the information.

Even though we are not there yet, automation has been shown to be useful for investigations [6], and can help reduce the time investigations take while improving their accuracy [7]. For more comments on automation in investigations, please see [8].


  1. James, J. I., Koopmans, M., & Gladyshev, P. (2011). Rapid Evidence Acquisition Project for Event Reconstruction. In The Sleuth Kit & Open Source Digital Forensics Conference. McLean, VA: Basis Technology. Retrieved from http://www.basistech.com/about-us/events/open-source-forensics-conference/2011/presentations/ 
  2. Koopmans, M. B., & James, J. I. (2013). Automated network triage. Digital Investigation, 1–9. http://doi.org/10.1016/j.diin.2013.03.002
  3. Shosha, A. F., James, J. I., & Gladyshev, P. (2012). A novel methodology for malware intrusion attack path reconstruction. In P. Gladyshev & M. K. Rogers (Eds.), Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (Vol. 88 LNICST, pp. 131–140). Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-642-35515-8_11
  4. James, J., Gladyshev, P., Abdullah, M. T., & Zhu, Y. (2010). Analysis of Evidence Using Formal Event Reconstruction. In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (pp. 85–98). Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-642-11534-9_9
  5. James, J. I., & Gladyshev, P. (2014). Automated inference of past action instances in digital investigations. International Journal of Information Security. http://doi.org/10.1007/s10207-014-0249-6
  6. James, J. I., & Gladyshev, P. (2013). A survey of digital forensic investigator decision processes and measurement of decisions based on enhanced preview. Digital Investigation, 10(2), 148–157. http://doi.org/10.1016/j.diin.2013.04.005
  7. James, J. I., Lopez-Fernandez, A., & Gladyshev, P. (2014). Measuring Accuracy of Automated Parsing and Categorization Tools and Processes in Digital Investigations. In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (pp. 147–169). Springer International Publishing. http://doi.org/10.1007/978-3-319-14289-0_11
  8. James, J. I., & Gladyshev, P. (2013). Challenges with Automation in Digital Forensic Investigations, 17. Computers and Society. Retrieved from http://arxiv.org/abs/1303.4498






