#
# April 30, 1998 Version 1.0
#
This is the NSWC Cooperative Intrusion Detection Package, of which I am the
author. I am an experienced UNIX person but, as you can tell from the
contents of the package, an inexperienced software developer. The software
works just fine here, but packaging and polishing it for public distribution
is proceeding slowly as I learn.

The system runs at NSWC on two computers. A sensor placed outside our
firewall runs the scripts and required binaries from the sensor subdirectory
of this package; it runs tcpdump and stores the gzipped raw files, and a
crontab stops and restarts tcpdump to generate hourly files. Our sensors are
currently either old Sun workstations running SunOS 4.1.3 or new
Pentium-class machines running RedHat Linux 5.0.
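For the curious, the hourly rotation boils down to a stop/compress/restart
cycle driven by cron. The sketch below is my own illustration, not the
actual sensor scripts: every path is invented, and "sleep" stands in for
the long-running tcpdump so the sketch can run anywhere.

```shell
#!/bin/sh
# Hypothetical hourly rotation (cron job); all names here are invented.
LOGDIR=/tmp/shadow_demo
mkdir -p "$LOGDIR"
HOUR=$(date +%Y%m%d%H)

# Stop last hour's capture, if one is recorded, and compress its raw file.
if [ -f "$LOGDIR/tcpdump.pid" ]; then
    kill "$(cat "$LOGDIR/tcpdump.pid")" 2>/dev/null
    gzip -f "$LOGDIR"/*.raw 2>/dev/null || true
fi

# Start the next hour's capture. A real sensor would run something like
#   tcpdump -i <interface> -w "$LOGDIR/tcpdump.$HOUR.raw" &
sleep 600 &
echo $! > "$LOGDIR/tcpdump.pid"
: > "$LOGDIR/tcpdump.$HOUR.raw"
```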

The first-level analyzer machine, currently a Pentium Pro running RH Linux
5.0, fetches each hourly gzipped tcpdump file from the sensor using SSH. It
then performs "first level" analysis on the hourly files using a set of
tcpdump filters from the subdirectory "filters." The results of this
first-level scan are posted to a web server page for examination by an
experienced network intrusion detection analyst. The raw file is translated
to another format and shipped to our secondary analysis experts. The scripts
in the logger directory take care of the fetching, netting, translating,
and cleaning of the tcpdump files on the first-level analyzer. The scripts
were written in Perl 5:

fetchem.pl - fetches the tcpdump file from the sensor, analyzes it, writes
             results to the web page, and calls translate.pl to ship it out
             for secondary analysis.
cleanup.pl - Ensures that the sensor does not run out of disk space by
             removing files over 2 {enter days here} old.
consolidate.pl - Combines the hourly gzipped tcpdump files into a whole
             day's worth, and then separates it into tcp, udp, icmp, and
             everything else.
dir-it.pl  - A Perl script to create the home page of the results.
one_day_pat.pl - A Perl script to search a day's tcpdump files for 
             a tcpdump specific pattern on the command line.
one_day_script.pl - A Perl script to search a day's tcpdump files using a 
             pre-defined tcpdump filter from the filters subdirectory.
pat_match.pl - A Perl script to search a particular hour's tcpdump raw file
             for a pattern specified on the command line.

I confess that I am new to Perl, and make some of the bonehead mistakes 
documented in the camel book. I know that the scripts can be cleaned up,
optimized, and generalized. I am working on them using suggestions from those
who have downloaded the package. They will improve with time.

Bill Ralph wralph@nswc.navy.mil

-------------------------------------------------------

July 1, 1998 - Version 1.2

Several changes have been made to the package. The script fetchem.pl has been
modified to change the look of the web page, including a new framed look with
hyperlinks to each hour and several cgi scripts on the top. The cgi scripts 
referenced in the fetchem.pl are included in the cgi-bin directory. New filters
courtesy of Vicki Irwin have been included in the filters directory. Small bug
fixes to the scripts have been included.

-------------------------------------------------------

July 21, 1998 - Version 1.3

More changes, mostly to the fetchem.pl script. Filters created by Vicki Irwin
have been added to the filters directory. Some of her filters got large
enough that fetchem has to run them in a loop, since tcpdump ran out of
parse memory. Also added is a script to sort findings by IP for the web
page. Finally, another script is called from fetchem.pl to examine an hour's
traffic and print a list of IP addresses that scan at least a built-in
number (currently set to 5) of destination addresses at your site.

The accessories directory contains the accessory applications needed to make
this a complete package: tcpdump, libpcap, tcpslice, apache, and ssh.

One change was made to the sensor scripts start_tcpdump_logger.sh and
stop_tcpdump_logger.sh to save the stderr output from tcpdump in the LOG
directory. That way, by examining the tcpdump.err file, you can see how
many packets are being missed by your sensor. If the number gets too high,
it probably means that your sensor needs more horsepower for your traffic.
 
A new directory, httpd, was created. It contains the cgi scripts, images,
and apache configuration files used for the initial CID site.

-------------------------------------------------------

July 28, 1998 - Version 1.3a

A very nice communication from Olav Kolbu from the University of Oslo 
pointed out some serious security problems with the cgi-bin scripts included
in the CID-1.3 package. This version corrects those problems by verifying
that the data input from cgi generated forms meets certain expected
characteristics, e.g. a domain name contains only those characters legal
for domain names, and that an unnecessary shell is not spawned to process
the input. Our sincere thanks to Olav.

If you plan to put this package on a publicly available web server, please
EXAMINE the cgi scripts CAREFULLY and seek information about cgi security
from:
      http://www.w3.org/Security/Faq/www-security-faq.html
      http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt

-------------------------------------------------------

September 8, 1998 - Version 1.4

The principal scripts in the logger directory have been changed to obtain
site-specific information, such as path names, machine names, etc., from a
Perl header file: ${SITE}.ph. Each script requires the "-l SITE" parameter
on the calling line. Upon execution, the script goes to the sites
subdirectory and includes the ${SITE}.ph file. I have included a GENERIC.ph
file in this distribution which contains all the site-specific defines
required by all the scripts. It's not as perfect as it could be, but I hope
this cleans up the lingering complaints/suggestions from reviewers of the
package:

Andy Kutner <akutner@lbl.gov>, Matt Crawford <crawdad@fnal.gov>, 
Adam Shostack <adam@mail.netect.com>, Pedro A M Vazquez 
<vazquez@IQM.Unicamp.BR>, Olav Kolbu <olav.kolbu@usit.uio.no>, Bennett Todd 
<bet@mordor.net>, "Mark H. Levine" <yba@polytronics.com>, and Brian Utterback
<Brian.Utterback@East.Sun.COM>.

Note that each script and each ${SITE}.ph header file contains:

$CID_PATH = "/usr/local/logger";

Relocating the base directory of the scripts requires modifying $CID_PATH
in both the XXXX.pl and XXXX.ph files.
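One hypothetical way to make that edit mechanically, shown here on a
throwaway stand-in file rather than a real installed tree (the new path is
invented):

```shell
# Make a throwaway stand-in for one of the XXXX.pl files.
mkdir -p /tmp/cid_demo
cat > /tmp/cid_demo/fetchem.pl <<'EOF'
$CID_PATH = "/usr/local/logger";
EOF

# Rewrite the $CID_PATH assignment to point at the new base directory.
NEW=/opt/shadow/logger
sed "s|^\$CID_PATH = .*|\$CID_PATH = \"$NEW\";|" \
    /tmp/cid_demo/fetchem.pl > /tmp/cid_demo/fetchem.pl.new
```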

This version contains a rework of several of the scripts
(resolve_hostnames.pl, sort_by_source_ip_then_time, and find_scan.pl) to
shorten and better optimize them. At our site, significant time and CPU
resources were consumed by inefficiencies in these scripts.

The filters directory has been split into subdirectories for each
"site." If one analysis station collects data from several site sensors, 
each site's filters are collected together in a separate subdirectory. The 
path to that subdirectory is defined in the header file ${SITE}.ph as: 
$FILTER_DIR="$CID_PATH/filters/$SITE";. Of course, it can be customized
to suit your needs. The fetchem.pl script expects the filter files to be
all files in $FILTER_DIR with a suffix of ".filter". The script
find_scan.pl scans through the raw hourly data file looking for external
machines that access a minimum number of machines in your purview
(set by $SCAN_THRESHHOLD = "5"; in ${SITE}.ph). The script expects a
filter named "$FILTER_DIR/filter.getall" which defines the set of
source IP addresses that you want to eliminate from consideration as
possible scanners.
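The heart of find_scan.pl can be pictured as "count distinct destinations
per source, compare against the threshold." The awk sketch below is only an
illustration of that idea, run on invented stand-in lines instead of real
tcpdump output, with the threshold lowered to 2 to suit the tiny sample:

```shell
# Count distinct (source, destination) pairs per source and report any
# source reaching the threshold. Input lines imitate "src > dst:" output.
awk -v thresh=2 '
    { pairs[$1 SUBSEP $3] = 1 }        # remember each src/dst pair once
    END {
        for (p in pairs) { split(p, f, SUBSEP); hits[f[1]]++ }
        for (s in hits) if (hits[s] >= thresh) print s, hits[s]
    }
' > /tmp/find_scan_demo <<'EOF'
10.1.1.9 > 192.168.0.1:
10.1.1.9 > 192.168.0.2:
10.1.1.9 > 192.168.0.3:
10.2.2.2 > 192.168.0.1:
EOF
```

Only 10.1.1.9 touches enough distinct destinations to be reported.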

-------------------------------------------------------

September 21, 1998 - Version 1.4a

Several bug fixes were made to the scripts fetchem.pl, cleanup.pl,
consolidate.pl, and Site.ph to correct typos and errors. A new script,
look4scans.pl, was added to find network scans. This script combs a day's
compressed tcpdump files looking for a host sending packets (e.g. TCP
packets with the RESET bit set) to multiple hosts within the same domain,
in an effort to scan a network stealthily and avoid detection. It works as
follows: (1) all the packets with RESET set, which are not coming from an
http (port 80) server, and which are aimed at one of the networks within
our domain, are pulled out of the raw tcpdump files; (2) the resulting
shortened tcpdump output (left in numeric IP format) is sorted by source
IP; (3) the sorted output is run through "uniq -c", which removes duplicate
lines and prepends a column to each line containing the number of
duplicates found; and (4) the resulting output is sorted again by the
number of duplicate lines. Scanners are those source IPs which RESET
multiple destination IPs, i.e. have a 1 in the number-of-duplicates column.
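Steps (2)-(4) amount to a standard sort/uniq pipeline. Here is a runnable
illustration on invented stand-in lines (real input would be the shortened
tcpdump output from step (1)):

```shell
# Sort by source, count duplicate lines, then sort by the count column.
# Each RESET to a distinct destination yields a distinct line, so a
# stealthy scanner shows up as many count-1 lines from one source IP.
sort <<'EOF' | uniq -c | sort -n > /tmp/scan_steps_demo
10.1.1.9 > 192.168.0.1: R
10.1.1.9 > 192.168.0.2: R
10.1.1.9 > 192.168.0.3: R
10.2.2.2 > 192.168.0.1: R
10.2.2.2 > 192.168.0.1: R
EOF
```

In the result, 10.1.1.9 produces three distinct count-1 lines (a scan
pattern), while the repeated 10.2.2.2 line collapses to a single count-2
line.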

The script look4scans.pl implements steps (1)-(4) above using a generic
reset filter assumed to be in the $CID_DIR/filters/generic directory. (The
filter is "tcp and (not src port 80) and (tcp[13] & 4 != 0)".) Additional
scripts for different types of scans will be released when written and
tested. The script is run as "look4scans.pl -l SITE -d 980921 -f reset" and
produces a text file in the temp directory named scans_XXXX, where XXXX is
the pid of the process running the script. We recognize this is not
terribly elegant; integrating the script into the web page was judged less
critical than getting the script released. Web page integration will occur
as time permits.

-------------------------------------------------------

October 13, 1998 - Version 1.4b

A slightly modified fetchem.pl is included which moves the sorting of the
output html lines, and the DNS lookup of the IP addresses in those lines,
into a single Perl script: sort_and_resolve.pl.

-------------------------------------------------------

January 5, 1999 Version 1.5

Several major modifications of the SHADOW package:

1. The primary SHADOW html page now generates a separate toolbar window
   instead of a separate frame. Each of the tools opens a separate window.
   To use it, click on "Tool Window." A small vertical window appears with
   selectors for site and date. A box lists the hours of the day. Six blue
   buttons appear at the bottom labelled: Directory, Search, Scan, NS Lookup,
   Whois, and Report. After selecting the date and site, clicking on an hour
   number fills the main screen with the SHADOW tcpdump output for that hour.
   Clicking on other hours takes you directly to that hour of the day, 
   (assuming the data has been collected).

   The buttons perform the following:
	
     Directory: Fills the data screen with the dynamically created directory
                view of the tcpdump_results subdirectory of your http tree.
                This was used in the previous version of SHADOW.
    
     Search:    Pops up a small search window that presents a form for 
                searching a day's tcpdump records for specific patterns. This
                can be CPU and IO intensive since it searches the whole day's
                records by gunzipping the raw data and piping it through 
                tcpdump with the form supplied pattern.

     Scan:      Not functional at the moment. We are attempting to get a good
                handle on the SCAN problem, i.e. when a single external 
                machine attempts some sort of contact with multiple internal
                machines.

    NS Lookup:  Pops up a small screen that performs a DNS lookup through
                a fill-in form.

    Whois:      Pops up a small screen that performs a WHOIS to query the
                Internic about the officially registered information about
                a domain or IP address.

    Report:     Pops up a screen to generate a SHADOW Incident Report. The
                screen contains a fill-in form for assembling a report 
                to a CIRT (Computer Incident Response Team) about a
                potential incident, compromise, attack, etc. that has been
                detected by your analyst examining SHADOW records. The form
                is generated by the /cgi-bin/compose_IR.cgi Perl script, and
                will most likely need modifications for your particular
                circumstances. It prepares an ASCII text letter and sends it
                to a set of addresses specified on the form.

2. A script strip.pl is included to strip comments from tcpdump filter files.
   In the logger/filters/Site1 directory are several files with .doc as their
   suffix. Documenting the filter files with comments in the /bin/sh fashion
   enhances human understanding; unfortunately, tcpdump doesn't handle
   comments. This script reads a documented filter file and strips everything
   to the right of the # character. strip.pl reads stdin and writes to
   stdout.
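   For illustration, a sed pipeline with roughly the same effect; this is
   not strip.pl itself, and the blank-line deletion is my own addition so
   that comment-only lines vanish entirely:

```shell
# Strip '#' comments, stdin to stdout, then drop lines left empty
# (the empty-line deletion is an assumption, not strip.pl's documented
# behavior).
strip_comments() {
    sed -e 's/#.*$//' -e '/^[[:space:]]*$/d'
}

# A documented filter reduced to what tcpdump will accept:
strip_comments <<'EOF' > /tmp/strip_demo.filter
# match inbound telnet traffic
tcp and dst port 23   # the interesting part
EOF
```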

3. The cgi-bin scripts have been significantly modified to support the separate
   toolbar window and the functionality provided by the push buttons. In
   addition, necessary changes were made to the primary scripts to support the
   tool windows and fix bugs as found.

4. The documented filters in the Site1 filter directory add explanations for
   each filter to clarify their purposes.

5. I no longer include (or use) the msntp package for synchronizing time
   between machines. For some reason, msntp sometimes hung awaiting terminal
   input, defeating its utility in a cron script. Using rdate against a time
   source that accepts it is satisfactory; keeping the sensors and analyzers
   synchronized within a few seconds is not worth a full-blown ntp
   installation.
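   By way of illustration, a root crontab entry of roughly this shape would
   do a nightly rdate sync (the time-server name is invented, and flags vary
   between rdate implementations, so check your system's manual first):

```
# min hr dom mon dow  command        (hypothetical sensor crontab entry)
5 0 * * * /usr/bin/rdate -s time.example.mil
```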

----------------------------------------------------------------------------


15 September 1999

Modifications to the SHADOW package:

1. A button in the SHADOW toolbar window has been replaced: the SCAN button
   is now the Search-2 button. This button allows a multiple-day pattern
   search of the raw tcpdump files, exactly like the current Search button
   except covering a range of dates. Use with caution: the searches are very
   CPU and I/O intensive on your analyzer, and it won't take many to bring
   the fastest analyzer system to its knees.

2. The sensor scripts have been changed significantly. The sensor_driver.pl
   cron script now calls start_logger.pl first and then stop_logger.pl. This
   eliminates the short time span in the old sequence when packets were not
   being collected between the stop_logger.pl and start_logger.pl scripts.
   All three scripts now require a calling parameter which specifies the
   name of a .ph file containing Perl parameters for the sensor scripts, and
   the name of a .filter file for use by the scripts. Using this parameter
   allows multiple SHADOW sensor scripts to run simultaneously; the pid of
   each is kept in a {param}.pid file. Different parameter files and
   corresponding filters can also be generated and maintained on a single
   sensor. Additionally, if the parameter "ALL" is passed to
   sensor_driver.pl or start_logger.pl, the scripts will start a SHADOW
   tcpdump process for each *.ph file in the sensor directory. If you use
   this feature, make sure you put your *.ph files in some other location
   if you don't want them all started each hour.

   The mechanism for stopping the currently running tcpdump task has been
   modified, at the risk of making it less portable. Finding the tcpdump
   last started by the start_logger.pl script was a tricky process, since
   searching the ps table for tcpdump also turned up the shell started when
   the process runs, as well as any other tcpdump processes. The new
   mechanism saves the group id in the pid file and, at stop time, kills
   only the pids with that group id. Note that this may make the scripts
   Linux-specific: I use the command "ps -ajx" to generate the list of
   running processes from which I find the tcpdump matching the saved group
   id, and ps is notoriously system dependent, so this may not work on
   non-Linux systems. I don't have the resources to test the scripts on
   other OSes, so if you get them to work, let me know and I'll acknowledge
   your efforts in the releases.
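   The group-kill idea can be sketched as follows. This is my own
   illustration, not the actual scripts: "sleep" stands in for tcpdump, the
   file name is invented, and it leans on setsid(1) plus the convention that
   a negative id passed to kill signals a whole process group. Like
   "ps -ajx", the "ps -o pgid=" call may not exist on every system.

```shell
# "start" side: run the capture in its own process group, save the group id.
setsid sleep 600 &
CHILD=$!
sleep 1                                     # let the new group settle
PGID=$(ps -o pgid= -p "$CHILD" | tr -d ' ')
echo "$PGID" > /tmp/shadow_demo.pgid

# "stop" side: signal only the saved group (a negative id addresses the
# whole process group, so unrelated tcpdumps are untouched).
kill -TERM -- "-$(cat /tmp/shadow_demo.pgid)"
```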

   The sensor scripts have also been Y2Kupped: the files are stored with the
   full 4-digit year in the name, and a link is created from a 2-digit-year
   file name, so that older analyzers can still work with newer sensors.
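   The 2-digit-year compatibility name works out to an ordinary symbolic
   link, illustrated here on invented names in /tmp:

```shell
# 4-digit-year file plus a 2-digit-year compatibility link (names invented).
mkdir -p /tmp/y2k_demo
: > /tmp/y2k_demo/tcpdump.1999091500.raw.gz
ln -sf tcpdump.1999091500.raw.gz /tmp/y2k_demo/tcpdump.99091500.raw.gz
```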

3. The compose_IR.cgi script in httpd/cgi-bin has been revised to 
   automatically generate an Incident Report (IR) number from a file. It 
   also implements a locking mechanism to prevent more than one user at
   a time from getting the same IR number. In addition, this script saves
   an archival copy of the IR in a file (/var/spool/SHADOW/Incident-Reports
   by default), for a logging record of the IR creation.

4. I have removed all the tarballs of the required and optional accessories
   from the accessories directory. I received complaints, whines,
   criticism, and general gripes about not always having the most current
   version in the tarball. So I included a README file in the accessories
   directory with a URL pointer to each of the accessories home locations.

5. I have removed the consolidate.pl script from the distribution. Recall
   that consolidate's purpose was to take the 24 hourly files, unzip them,
   use tcpslice to combine them into a single "daily" file, and then split
   that daily file into 4 protocol-specific subsets. At one point, I
   discovered that I was missing data after the consolidation had been run.
   It turns out that Linux, and probably most Unices, has an individual
   file size limit of 2 GBytes. The unzipped daily file reached that limit,
   so the consolidate process failed. Given that limitation, I decided to
   leave the 24 hourly zipped files as is. This also eliminates an annoying
   little time discrepancy that appeared after consolidation because the
   files were no longer strictly in chronological order. Consolidate.pl
   is gone.

6. The CGI script "pat_match_form.cgi" and its helper "one_day_pat.pl" have
   been modified. It appears that the browser->apache->CGI->script path has
   a problem. The cgi script collects parameters from the user and starts
   the helper script with them. However, that search script can take some
   time to unzip and run tcpdump on each of 24 files with a user-specified
   search pattern, so impatient users often hit the "STOP" button on their
   browser, which produces a "Transfer Interrupted" message. Unfortunately,
   this interrupt is not passed along to the CGI script, which merrily
   carries on with its search. It doesn't take many of these orphaned
   gunzip->tcpdump runs to seriously bog down a system, so I added an abort
   button to the output screen of the "Search" button which kills the
   helper scripts.

7. The opening "splash" page for SHADOW has been prettified and modified
   to automatically open the tools window.

8. A new button is present on the tools popup, labeled NMAP. Occasionally
   it is desirable to scan a system that seems to be attracting a good deal
   of interest from the outside world. The NMAP button activates the
   /cgi-bin/privileged/nmap.cgi script, which puts up a browser form to
   allow the user to run an NMAP scan against a set of targets.
   Unfortunately, for NMAP to have all its capabilities available, it must
   be run as root. The Apache daemon runs as "nobody", so I modified the
   script to call sudo to give root privileges for the single command.
   The README file in the accessories subdirectory will direct you to the
   sources for both NMAP and SUDO.
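   For illustration only, the sudoers entry would have roughly this shape
   (the user name and nmap path are guesses for your system; edit with
   visudo and read the sudoers documentation before copying anything like
   this):

```
# /etc/sudoers fragment: let the web-server user run exactly one command
# as root with no password.
nobody  ALL = NOPASSWD: /usr/local/bin/nmap
```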

9. The analyzer scripts have all been modified to use 4-digit years in the
   names of the data and html files. Each script attempts to use a little
   reconnaissance to detect how the files are stored by default and will
   create links appropriately. The script fetchem.pl can contact a sensor 
   running an earlier (2-digit year) version of SHADOW and still fetch
   the appropriate data files. In any case, the files stored and used by the
   current crop of scripts use 4-digit years by default. In summary, this
   version of SHADOW is hopefully backward compatible.
