"comeforth" -- Scan, view, and assemble raw filesystem blocks.
Copyright (c) 2003-2004 Danamis Associates (http://danamis.com).

================================================================================
CONTENTS

LEGAL
DESCRIPTION
NOTES
USAGE
EXAMPLES

================================================================================
LEGAL

This library is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 2.1 of the License, or (at your option) any
later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along
with this library; if not, write to the Free Software Foundation, Inc., 59
Temple Place, Suite 330, Boston, MA 02111-1307 USA

================================================================================
DESCRIPTION

Parse raw filesystem blocks, or block image data produced by "dls", found in the
Sleuth Kit from www.atstake.com. This was inspired by lazarus
(www.porcupine.org/forensics) but provides a bit more flexibility for processing
very large data sets.

Blocks of certain file types or matching certain regular expressions are first
found and saved in a scan phase.

After scanning, the saved blocks can be viewed and, based on their contents,
files can be reassembled from them and from other blocks in the data. An
auto-assemble feature is provided which, on ext2/ext3 filesystems only, can
reassemble a complete file in many cases given only its first block.

================================================================================
NOTES

- You'll need Tcl 8.4 compiled with 64-bit support if you want to work on more
  than 2 GB of data.
- You need to be on a somewhat ANSI-compatible terminal (for reverse video).
- This is tested mainly on Linux; everything SHOULD work on Windows, but that
  requires Cygwin (www.cygwin.com), or at least a Unix-ish "file" command in
  your path.
- In this version, reverse video escapes are COMPLETELY DISABLED on Windows.
- At least a 132x40 terminal is necessary for 4K blocks.
- If you want to use a file containing multiple regexes, it's best to combine
  them into a single long regex, with a backslash at the end of each continued
  line. This is because each regex has to be compiled before it's used, so a
  single regex scans faster (see EXAMPLES).
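
The 2 GB and 132x40 figures above can be checked with a little arithmetic. The
following Python sketch is illustrative only (comeforth itself is a Tcl
program). It assumes that block numbers start at 1, as the default for -skip
suggests, that the 2 GB limit is the usual signed 32-bit file offset limit, and
that the viewer shows one character per byte. The function names are
hypothetical.

```python
import math

MAX_32BIT_OFFSET = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def block_offset(block, block_size=4096, first_block=1):
    """Byte offset of a block in the data file (assumes 1-based numbering)."""
    return (block - first_block) * block_size


def rows_needed(block_size=4096, width=132):
    """Terminal rows needed to show one block at one character per byte."""
    return math.ceil(block_size / width)


# With 4K blocks, block 524289 starts exactly at the 2 GB mark, which is one
# byte past what a signed 32-bit offset can address.
print(block_offset(524289) > MAX_32BIT_OFFSET)
print(rows_needed())
```

Under these assumptions a 4K block fills 32 rows of a 132-column terminal, which
leaves a few of the 40 rows for prompts.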

================================================================================
USAGE

comeforth [-noscan] [-width <width>] [-skip <block>]

Where:
    -noscan = don't perform scan of data, go directly into file assembly mode.
    <width> = width of terminal; default is 132.
    <block> = starting block, whether scanning or assembling; default is 1.

This is designed to be an interactive program. Simply follow the prompts as it
runs. Following are some comments about each prompt.

1. Data file
    - Default will be the first file it finds in the current dir matching the
      pattern "*.dls"; this would be a file created by "dls" from the Sleuth
      Kit at www.atstake.com.
    - You may also specify an unmounted partition device.
   
2. Data block size
    - Default is 4K, which works for most modern filesystems.
    
3. Recovery directory
    - This is the subdir in which matching block files and assembled files will
      be stored.
    - Default is "recov".
    
4. File type regex
    - Leave this blank to specify a "Custom data regex" (see next prompt).
    - This is used to match the output from the "file" command for each block
      being scanned. E.g. to find blocks that start Microsoft Office files, you
      could enter "Microsoft.+Document$".
    - Use "< file" to specify a file with 1 regex per line; long lines can be
      split with a "\" at the very end.
    - Tcl-style backslash substitution is performed on this input, whether from
      a file or entered directly. This allows you to specify binary or other
      special non-ASCII characters, but this also means YOU MUST DOUBLE ANY
      LITERAL "\" CHARACTERS.
      
    1. Block work dir
        - This is the work area where temporary files will be stored while
          running the system "file" command.
      
5. Custom data regex
    - To get this prompt, you must leave "File type regex" blank.
    - This is used to match the actual data within each block being scanned.
      E.g. to find blocks that start JPEG image files, you could enter
      "^......JFIF".
    - Tcl-style backslash substitution is performed on this input. This allows
      you to specify binary or other special non-ASCII characters, but this also
      means YOU MUST DOUBLE ANY LITERAL "\" CHARACTERS.
      
    1. Number of bytes to check
        - This is the number of bytes within each block against which the regex
          should be applied.
        - Use this if you know the size of the signature for the blocks you want
          to find.
          
    2. Offset of bytes to check
        - This is the offset of the bytes within each block against which the
          regex should be applied.
        - This setting and the "Number of bytes to check" allow you to focus on
          any particular section within a block to apply your regex to.
        - E.g. if you know the signature of the blocks you want starts 37 bytes
          into the block and is 8 bytes long, you can enter 8 for "Number of
          bytes to check" and 37 here.
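
How "Number of bytes to check" and "Offset of bytes to check" narrow the match
can be sketched as follows. This is a Python illustration, not the program's
own Tcl code; the function name and the sample data are hypothetical. Python
has no "***:(?i)" prefix, so case-insensitivity is passed as a flag instead.

```python
import re


def block_matches(block, pattern, num_bytes, offset=0, ignore_case=False):
    """Apply a regex to only block[offset : offset + num_bytes]."""
    window = block[offset:offset + num_bytes]
    # DOTALL lets "." match any byte, including a newline byte.
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.search(pattern, window, flags) is not None


# A hypothetical 4K block whose 8-byte signature starts 37 bytes in.
block = b"\x00" * 37 + b"SIGNATUR" + b"\x00" * (4096 - 45)

print(block_matches(block, rb"^SIGNATUR", num_bytes=8, offset=37))
print(block_matches(block, rb"^SIGNATUR", num_bytes=8, offset=0))
```

Because "^" anchors to the start of the window rather than the start of the
block, the first call matches and the second does not.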

6. Progress indicator block interval
    - While scanning, a progress indicator will display, giving you an estimate
      of how much longer the scan will take. This value determines how often the
      progress display updates.
    - A reasonable default is calculated automatically.
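
One common way to produce such an estimate is to extrapolate from the average
scan rate so far. The Python sketch below shows that arithmetic; it is an
assumption about how an estimate like this can be computed, not a description
of comeforth's actual calculation, and the function name is hypothetical.

```python
def eta_seconds(blocks_done, total_blocks, elapsed_seconds):
    """Estimate the remaining scan time from the average rate so far."""
    if blocks_done <= 0 or elapsed_seconds <= 0:
        return None  # not enough data yet to estimate
    rate = blocks_done / elapsed_seconds  # blocks per second
    return (total_blocks - blocks_done) / rate


# 250 of 1000 blocks scanned in 10 seconds.
print(eta_seconds(250, 1000, 10.0))
```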

7. Start at block
    - This value lets you start scanning where you want.
    - This is useful if you stopped a previous scan and want to continue.
    
8. Inspect and assemble files?
    - This appears after scanning is complete.
    
9. Inspect/Assemble Blocks Phase
    - This is an interactive phase that allows you to view blocks that were
      found during the scan phase.
    - There are various subcommands that help you navigate around blocks, and
      assemble blocks into files.
    - To assemble a file, blocks are written in sequence from the start block to
      the current block, excluding any blocks you have specified in between.
    - Following are details of each subcommand in this phase.
    
        1. n ... go to next block.
        2. p ... go to previous block.
        3. +n
           -n ... advance forward or back n blocks; "n" here can be an
                  expression, like "1037-848".
        4. =n ... go to exactly block n; "n" can be an expression.
        5. e ... exclude the current block from the possible blocks that will
                 make up the next file.
        6. e! ... clear all blocks in the current exclude list.
        7. r ... make current block the start block for the file to assemble.
        8. t ... write a test file, to allow you to check its validity but
                 keep the same parameters to try again if you need to; you may
                 overwrite an existing file here, and the last path you wrote to
                 is remembered.
        9. w ... write the final file, clearing the current parameters, marking
                 the current block done, and advancing to the next block found
                 in the scan phase; you cannot overwrite an existing file here.
        10. m ... mark this block done without writing anything.
        11. s ... skip this block, leaving it to process later.
        12. a ... auto-assemble; this will only work on ext2 or ext3
                  filesystems, and the current block must be the first block of
                  the file, and the blocks making up the file must be packed
                  close together (at least the first 13 blocks need to be
                  clustered together). Usually files that have been created all
                  at once on a near-empty filesystem and remain static will fit
                  these criteria.
        13. q ... quit; you may run this phase again if there are any scanned
                  blocks that have not been marked or written.
        14. ; ... separator between multiple commands entered on one line; the
                  output will appear just as if you had entered the commands
                  separately.
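
The navigation and assembly rules above can be sketched in a few lines. This is
a Python illustration, not the program's own Tcl code. All function names are
hypothetical, the set of operators accepted in an expression is an assumption,
and so is the choice to include the current block in the file being written.

```python
import ast
import operator

# Integer operators allowed in a block-number expression (an assumption).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}


def eval_int_expr(text):
    """Evaluate a small integer expression such as "1037-848" without eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression: %r" % text)
    return walk(ast.parse(text.strip(), mode="eval").body)


def split_commands(line):
    """Split "r; +3; e" into separate commands, dropping empty ones."""
    return [part.strip() for part in line.split(";") if part.strip()]


def target_block(command, current):
    """Resolve a "+n", "-n", or "=n" command to a block number."""
    sign, n = command[0], eval_int_expr(command[1:])
    if sign == "+":
        return current + n
    if sign == "-":
        return current - n
    if sign == "=":
        return n
    raise ValueError("not a navigation command: %r" % command)


def blocks_to_write(start, current, excluded=()):
    """Blocks from start to current in sequence, minus the exclude list."""
    skip = set(excluded)
    return [b for b in range(start, current + 1) if b not in skip]


print(split_commands("r; +1037-848; e; n"))
print(target_block("+1037-848", 100))
print(blocks_to_write(10, 15, excluded=[12, 13]))
```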
    
================================================================================
EXAMPLES

1. Sample custom regexes for raw data of particular file types.

    - Following are a few regexes you can use for "Custom data regex" values for
      particular types of files. Type only what is inside quotes (" ").
    
    JPEG ... "***:(?i)^......(jf|ex)if" (Num. bytes to check=10, Offset=0)
    GIF .... "^GIF8[97]" (Num. bytes to check=5, Offset=0)
    MP3 .... "^(ID3)|(\xff\xfb\x90.\0)" (Num. bytes to check=5, Offset=0)
             NOTE: If the file does NOT have an ID3v2 tag, then the raw data
             will be matched by this regex if and only if the MP3 file is
             128 kbit/s, 44.1 kHz, Joint-Stereo.
             
    - Following are a few regexes you can use for "File type regex" values for
      particular types of files. Type only what is inside quotes (" ").
      
    JPEG ... "***:(?i)jpeg"
    GIF .... "***:(?i)gif"
    MP3 .... "***:(?i)mp3"  NOTE: Beware of false positives here!
    
2. Sample regexes file for "File type regex" values.

    - To scan a large data set for JPEGs, GIFs, and MP3s, set up a file with
      the following lines, name it "myregexes", then enter
      "< /path/to/myregexes" at the "File type regex" prompt.
      
        ***:(?i)\
        (jpeg)|\
        (gif)|\
        (mp3)
    
    - Notice how there is really just one regex, continued with "\" at the end
      of each line. This will perform faster than dividing it into separate
      regexes.

3. Sample regexes file for "Custom data regex" values.

    - Here are the lines that would combine the custom regexes above into a
      single file that would find JPEGs, GIFs, and MP3s.
      
        # Case-insensitive regexes
        ***:(?i)^......(jf|ex)if
        
        # Normal case-sensitive regexes
        (^GIF8[97])|\
        (^(ID3)|(\xff\xfb\x90.\0))
    
    - First notice that this file also has comments and blank lines; they are
      ignored.
    - In this case, there are 2 regexes; the 1st is line 2, and the 2nd is lines
      5 and 6. Always try to keep the total number of regexes as small as
      possible, for speed.
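
The file rules described above (one regex per line, "\" at the very end to
continue a line, comments and blank lines ignored) can be sketched as follows.
This is a Python illustration, not the program's own Tcl code. The function
name is hypothetical, and stripping surrounding whitespace and treating "#" as
the comment marker are assumptions based on the sample above. Backslash
substitution is not performed here.

```python
def load_regexes(text):
    """Return the list of regexes in a regex file's text."""
    regexes = []
    pending = ""  # accumulated text of a regex continued with "\"
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments, but only between regexes.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1]  # drop the "\" and keep reading
            continue
        regexes.append(pending + line)
        pending = ""
    if pending:
        regexes.append(pending)  # file ended on a continued line
    return regexes


SAMPLE = r"""
# GIF signature
^GIF8[97]

# One regex continued over two lines
(jpeg)|\
(gif)
"""

for regex in load_regexes(SAMPLE):
    print(regex)
```

The sample yields 2 regexes, because the two continued lines are joined into
one before anything is compiled.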
