 -=( ---------------------------------------------------------------------- )=-
 -=( Natural Selection Issue #1 --------------------- Introducing Crunchers )=-
 -=( ---------------------------------------------------------------------- )=-

 -=( 0 : Contents --------------------------------------------------------- )=-

 0 : Contents
 1 : Introduction
 2 : Proof of Life
 3 : General Operation
 4 : Defensive Strategies
 5 : Conclusion

 -=( 1 : Introduction ----------------------------------------------------- )=-

 A cruncher is an automation technology that takes a suspected executable and
 determines whether it is a virus or not.  It is the "bigger picture" system
 that combines suspicion, baiting, emulation, and extraction into one super
 virus-killing machine.

 They exist.

 They exist now.

 You have been targeted.

 -=( 2 : Proof of Life ---------------------------------------------------- )=-

 Extract 1 : "The Hot Zone" by Srikumar S Rao

 IBM  isn't  the only  firm  with new  defenses  against the  virus  spreaders.
 Symantec has a spider  that cruises the Internet,  looking at 500 known  virus
 transmission  sites  and also  randomly  downloading files.   These  files are
 checked for viruses, using various automated analytical engines.

 But then the bad guys are getting rather creative, too.  Computer vandals have
 created polymorphic  viruses that  mutate each  time they  infect a  computer,
 making immunization much more difficult.  They have taken to encrypting  viral
 code so it cannot be detected while inactive.

 The  good guys  have retaliated  by creating  safe  "virtual  computers" where
 viruses can  be tricked  to deliver  their payloads.   They are then detected,
 analysed and zapped.

 In  a  well-guarded laboratory  at  IBM's Hawthorne  office,  Jeffrey Kephart,
 manager of antivirus science and technology, demonstrates what the future will
 bring.   He infects  a PC  with a  simulated unknown  virus.   The  protection
 program detects it instantly and captures the viral code, sending it  securely
 to an analysis computer  sitting a  few yards away.  The  virus is analysed, a
 signature extracted and an antidote  developed and sent back.   Elapsed  time,
 less than five minutes.  Sometime next year IBM aims to install a system  like
 this over the Internet to its customers.

 So who's going to win this battle, the viruses or the virus hunters?  That's
 too hard to predict, but here's a pretty safe forecast: Corporations are going
 to have to spend more and more money on self-defence.

 Extract 2 : http://vx.netlux.org/lib/ajk01.html

 At  IBM,  we  are creating  what may  be thought  of as  an immune  system for
 cyberspace. Just as the vertebrate immune system creates immune cells  capable
 of fighting new  pathogens within a  few days of  exposure, a computer  immune
 system derives  prescriptions for  recognizing and  removing newly encountered
 computer  viruses within  minutes.  In  a current  prototype, PCs  running IBM
 AntiVirus  are connected  by a  network to  a central  computer that  analyses
 viruses. A monitoring program on each PC uses a variety of heuristics based on
 system  behavior, suspicious  changes to  programs, or  family signatures  to
 infer that  a virus  may be   present. The  monitoring program  makes a   copy
 of  any program thought  to be   infected and  sends it  over the   network to
 the virus-analysis machine.

 On receiving  a putatively  infected sample,  the machine  sends it to another
 computer that  acts as  a digital  petri dish.  Software on  this test machine
 lures  the  virus  into  infecting  specially  designed  "decoy"  programs  by
 executing,  writing  to, copying  and  otherwise manipulating  the  decoys. To
 replicate successfully, a virus must infect programs that are used often,  and
 so the decoy activity  brings the viral code  out of hiding.  Other behavioral
 characteristics of the virus can be inferred during this phase as well.

 Any decoys that have been infected can now be analysed by other components  of
 the  immune  system,   which  will  extract   viral  signatures  and   produce
 prescriptions for  verifying and  removing the  virus. Typically  it takes the
 virus analyser less  than five minutes  to produce such  prescriptions from an
 infected  sample.  The  analysis machine  sends this  information back  to the
 infected client PC, which incorporates  it into a permanent database  of cures
 for known viruses. The PC is then directed to locate and remove all  instances
 of the virus, and it is permanently protected from subsequent encounters.

 If the PC is connected to other machines on a local-area network, it is  quite
 possible that the virus  has invaded some of  them as well. In  our prototype,
 the new  prescription is  sent automatically  to neighbouring  machines on the
 network, and each machine checks itself immediately. Because computer  viruses
 can  exploit  the network  to  multiply quickly,  it  seems fitting  that  the
 antidote should use a similar strategy to spread to machines that need it.  By
 allowing  the  latest  prescriptions  to  be  propagated  to  subscribers   at
 uninfected sites, it is possible in principle to immunize the entire PC  world
 against an emerging virus very rapidly.
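 Handing a new prescription to neighbouring machines amounts to flooding the
 update across the network graph.  The sketch below is a minimal illustration,
 assuming the LAN is given as an adjacency list and each machine's database is
 a set of signatures.  `propagate` and the data layout are hypothetical, not
 IBM's implementation, and each machine's self-scan is reduced to a comment.

```python
from collections import deque


def propagate(network: dict[str, list[str]], origin: str, signature: bytes,
              databases: dict[str, set[bytes]]) -> list[str]:
    """Flood a new signature from `origin` to every reachable machine.

    Each machine adds the signature to its own database once and forwards it
    to its neighbours (breadth-first).  Returns the machines in the order in
    which they received the update.
    """
    order: list[str] = []
    pending = deque([origin])
    seen = {origin}
    while pending:
        host = pending.popleft()
        databases.setdefault(host, set()).add(signature)
        order.append(host)
        # A real client would now scan itself with the updated database.
        for neighbour in network.get(host, []):
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return order


if __name__ == "__main__":
    lan = {"pc1": ["pc2", "pc3"], "pc2": ["pc1", "pc4"], "pc3": [], "pc4": []}
    dbs: dict[str, set[bytes]] = {}
    print(propagate(lan, "pc1", b"new-signature", dbs))
    # ['pc1', 'pc2', 'pc3', 'pc4']
```

 The `seen` set is what stops the update bouncing back and forth between
 machines that list each other as neighbours.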

 Extract 3 : http://www.symantec.com/corporate/ibm/av_tech.html

 The virus research experts at SARC created what is known as the Seeker Project
 as a system of virus search, retrieval and analysis. The technology scours the
 Internet, gathers viruses lingering there and creates solutions for them before
 Symantec's customers come into contact with them. The Seeker Project is broken
 down into three separate modules: Seeker, Bloodhound and SARA.

 Seeker: Seeker is a Web spider designed to scour the Internet and gather files
 for analysis. It moves out  from Symantec across the world,  obtaining samples
 for analysis in the SARC lab.

 Bloodhound: Rather than using signatures, Bloodhound uses Symantec's  patented
 heuristic  technology to  detect viruses  by inspecting  files for  virus-like
 behavior.

 SARA: SARA (Symantec AntiVirus Research Automation) is the heart of the Seeker
 project. The SARA module takes a virus sample obtained using Seeker,  extracts
 the unique qualities  of the virus,  develops a Symantec  detection and repair
 scheme and tests that newly developed scheme in less than five minutes.

 Extract 4 : http://www.symantec.com/avcenter/venc/data/automat.html

 As part  of its  continuing effort  to detect  and eradicate computer viruses,
 Symantec developed Symantec  AntiVirus Research Automation  (SARA) technology.
 SARA  analyses submitted  files to  detect new  viruses and  create the  virus
 definitions used to remove them automatically.

 To categorize virus definitions created by SARA, the term Automat is  included
 in the virus  name to indicate  the identification method.  For example, virus
 names contain a prefix such as W97M that describes the virus type.  A
 SARA-detected virus might be named W97M.Automat.A.  The alphabetic character
 suffix is applied to make the virus name unique.
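 The naming rule can be sketched as picking the first free letter for a given
 prefix.  The helper name is hypothetical, and the extract does not say what
 SARA did once A to Z were used up, so this sketch simply raises an error.

```python
import string


def automat_name(prefix: str, existing: set[str]) -> str:
    """Return the first unused name of the form <prefix>.Automat.<letter>.

    Only single letters A-Z are tried; the extract does not describe
    what happens after Z.
    """
    for letter in string.ascii_uppercase:
        candidate = f"{prefix}.Automat.{letter}"
        if candidate not in existing:
            return candidate
    raise ValueError(f"no single-letter suffix left for {prefix}.Automat")


if __name__ == "__main__":
    known = {"W97M.Automat.A", "W97M.Automat.B"}
    print(automat_name("W97M", known))  # W97M.Automat.C
```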

 SARA is fully automated.  Virus analysis, definition development,  and quality
 assurance are performed without human intervention. Once a virus definition is
 created,  it  is  automatically added  to  Symantec's  Norton AntiVirus  virus
 definition updates. If SARA is  unable to produce a signature,  the submission
 is forwarded to Symantec engineers who perform a traditional manual analysis.

 Currently, SARA is used to control  the spread of Macro viruses. Because  SARA
 eliminates numerous mundane virus identification tasks, Symantec engineers can
 concentrate on more difficult virus threats.

 Norton AntiVirus products include a feature called Scan & Deliver that  allows
 users to quickly  and easily submit  a file with  a suspected or  unrepairable
 virus to the Symantec AntiVirus  Research Center (SARC). When a  submission is
 received by SARC, the file is analysed initially by SARA.

 SARA employs artificial intelligence to analyse the virus sample, replicate  a
 potential virus, and then write a memory detection and removal definition  for
 that  virus.  SARA next  performs  an unbiased  check  of its  work  against a
 rigorous standard set to either pass or fail the results.

 After the new definition is tested, it is automatically sent back to the  user
 who submitted the file. The new definition is also added to the regular  virus
 definition updates available to all users.

 Virus definitions generated by SARA are regularly reviewed by SARC  engineers.
 After a  manual review  and confirmation,  Automat is  removed from  the virus
 name.

 Extract 5 : http://www.symantec.co.kr/press/1999/n990914a.html

 Norton AntiVirus  Corporate Edition  7.0 also  includes a  new version  of the
 highly successful Scan  and Deliver feature  that provides customers  with the
 most complete and accurate anti-virus protection cycle available. The new Scan
 and Deliver is an automatic, global response process for submitting  suspected
 or infected files and receiving new virus repair definitions over e-mail. When
 a new  viral event  is discovered  at a  client, the  system can automatically
 package and forward the sample to  the Quarantine Server from where it  can be
 submitted directly to SARC. Once SARC receives the sample, it is automatically
 passed to  the new  Symantec AntiVirus  Research Automation  (SARA) technology
 which can, in turn, automatically create a cure and transmit the resulting fix
 back to the reporting  corporation. The cure is  also made available to  other
 service  subscribers,  which   significantly  minimizes  the   possibility  of
 widespread  infection.  This  automation technology  will  greatly  reduce the
 amount of time to create a cure for a new virus, providing much relief to  the
 IT administrator from today's fast moving threats. The new Scan and Deliver is
 based on technology co-developed with IBM.

 Extract 6 : http://www.symantec.com/press/1999/n991001.html

 Symantec  Corporation  (Nasdaq:  SYMC)  today  announced  Striker32,  the most
 advanced  virus  detection  and repair  technology  engineered  to combat  the
 growing threat of complex 32-bit Windows-based viruses. Striker32, included in
 all Norton  AntiVirus products,  works by  setting up  a virtual Pentium-based
 Windows "clean room" in which a suspect Windows program is allowed to run.  By
 analysing each program as it works, Striker32 is able to determine whether the
 program is infected. Uninfected  files are processed quickly,  which minimizes
 the impact of scanning on system performance.

 Once identified by Striker32, an infected file is safely isolated using Norton
 AntiVirus' Quarantine  feature. From  there, the  Scan and  Deliver feature of
 Norton AntiVirus  enables users  to send  the file  over the  Internet to  the
 Symantec AntiVirus Research  Center (SARC) for  analysis and repair.  Scan and
 Deliver includes  automated macro  virus analysis  and repair  technology that
 enables virus cures to be created and delivered faster than the malicious code
 can spread.

 "Striker32 makes it  possible for our  researchers to analyse  complex viruses
 such as the  W32.Bolzano virus and  produce cures in  minutes rather than  the
 days required by traditional anti-virus technology," said Enrique Salem,  vice
 president of Symantec's Security and Assistance Business Unit. "With Striker32
 and Scan and Deliver technologies working together, Norton AntiVirus continues
 to  be the  most advanced,  responsive and  sophisticated anti-virus  solution
 available."

 With  Striker32,  users  are  protected  against  today's  most  sophisticated
 viruses, including all  17 variants of  the W32.Bolzano virus.  W32.Bolzano is
 considered the largest family of Windows viruses.  The latest variants of
 W32.Bolzano have eluded detection by traditional anti-virus technology because
 the variants mutate and bury themselves deep within Windows executable  files,
 hiding all signs of infection. In contrast, most traditional computer  viruses
 attach their programming instructions to a few, well-known areas of executable
 files,  making  isolation  and  detection  easy.  Because  Striker32  has  the
 capability to detect viruses regardless  of where the virus inserts  itself or
 how it conceals its programming instructions, users are assured of having  the
 most advanced defense against this growing threat.

 Extract 7 : http://symantec.com/avcenter/reference/striker.pdf

 Like generic decryption, each time it scans a new program file, Striker  loads
 this file into a self-contained virtual computer created from RAM. The program
 executes in this virtual computer as if it were running on a real computer.

 However,  Striker does  not rely  on heuristic  guesses to  guide  decryption.
 Instead, it relies on virus profiles or rules that are specific to each virus,
 not a generic set of rules that differentiate nonvirus from virus behavior.

 When scanning a new file, Striker first attempts to exclude as many viruses as
 possible from  consideration, just  as a  doctor rules  out the possibility of
 chicken pox if an examination fails to detect scabs on a patient's body.

 For example, different viruses infect different executable file formats.  Some
 infect only  .COM files.  Others infect  only .EXE  files. Some viruses infect
 both. Very  few infect  .SYS files.  As a  result, as  it scans  an .EXE file,
 Striker ignores  polymorphics that  infect only  .COM and  .SYS files.  If all
 viruses are  eliminated from  consideration, then  the file  is deemed  clean.
 Striker closes it and advances to scan the next file.
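 The preliminary exclusion step is a set filter over the virus profiles.  In
 this sketch the profile names and their format sets are made up, and the file
 extension stands in for the real format check (a scanner would inspect the
 file header rather than trust the name).

```python
import os

# Hypothetical profiles: the executable formats each polymorphic infects.
FORMAT_PROFILES: dict[str, set[str]] = {
    "PolyA": {".COM"},
    "PolyB": {".EXE"},
    "PolyC": {".COM", ".EXE"},
    "PolyD": {".SYS"},
}


def remaining_candidates(filename: str,
                         profiles: dict[str, set[str]] = FORMAT_PROFILES) -> set[str]:
    """Drop every profile that cannot infect this file's format.

    An empty result means the file is deemed clean and no emulation is run.
    The extension is used as a stand-in for the real format check.
    """
    extension = os.path.splitext(filename)[1].upper()
    return {name for name, formats in profiles.items() if extension in formats}


if __name__ == "__main__":
    print(sorted(remaining_candidates("SETUP.EXE")))  # ['PolyB', 'PolyC']
    print(remaining_candidates("README.TXT"))          # set()
```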

 If this preliminary scan does not rule out infection, Striker continues to run
 the file inside the  virtual computer as long  as the behavior of  the suspect
 file is consistent with at least one known polymorphic or mutation engine.

 For example, one polymorphic virus  is known to perform math  computations and
 throw  away  the  results.  A  second  polymorphic  may  never  perform   such
 calculations.  Instead,  it  may  use  specific  random  instructions  in  its
 decryption routine. A third polymorphic may call on the operating system as it
 decrypts.

 Striker catalogs these  and nearly 500  other characteristics into  each virus
 profile, one for each polymorphic and mutation engine.

 Consider a set of generic heuristic rules  that identify A, B, C, D, and  E as
 potential virus behaviors. In contrast, a Striker profile calls for Virus 1 to
 execute behaviors A, B, and C.  As it decrypts, Virus 2 executes  behaviors A,
 B, and D, while Virus 3 executes behaviors B, D, and E.

 If Striker observes behavior A while running a suspect file inside the virtual
 computer,  this  is  consistent with  viruses  1  and 2.  However,  it  is not
 consistent with Virus 3. Striker eliminates Virus 3 from consideration.

 The  heuristic-based system  must continue  searching for  all three  viruses,
 however, because  it observes  behavior that  is consistent  with its  generic
 rules.

 If Striker next observes behavior B, this is consistent with viruses 1 and  2.
 Striker must continue scanning for these two viruses. However, the  heuristics
 again continue to search for  all three viruses. Finally, if  Striker observes
 behavior  C,  this eliminates  Virus  2 from  consideration,  and Striker  now
 pursues a single potential virus.

 The heuristic-based scanner continues to  search for all three viruses.  Under
 Striker, this  process continues  until the  behavior of  the program  running
 inside the  virtual computer  is inconsistent  with the  behavior of any known
 polymorphic or mutation  engine. At this  point, Striker excludes  all viruses
 from consideration.

 On the  other hand,  a heuristic-based  system scans  for all  viruses all the
 time. It must find some behavior inconsistent with all behaviors.
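 The narrowing process above maps directly onto repeated set filtering.  This
 sketch uses the three example profiles from the extract: a profile survives
 only while every behaviour observed so far appears in it, and the run can stop
 as soon as no profile survives.  Single-letter behaviours are purely
 illustrative; per the extract, real profiles drew on nearly 500
 characteristics.

```python
from collections.abc import Iterable

# The three example profiles from the extract.
BEHAVIOUR_PROFILES: dict[str, set[str]] = {
    "Virus 1": {"A", "B", "C"},
    "Virus 2": {"A", "B", "D"},
    "Virus 3": {"B", "D", "E"},
}


def narrow(observed: Iterable[str],
           profiles: dict[str, set[str]] = BEHAVIOUR_PROFILES) -> list[set[str]]:
    """Return the surviving candidates after each observed behaviour.

    A profile survives only while every behaviour seen so far is in it.
    Emulation stops early once nothing survives (the file is deemed clean).
    """
    candidates = set(profiles)
    history: list[set[str]] = []
    for behaviour in observed:
        candidates = {name for name in candidates if behaviour in profiles[name]}
        history.append(candidates)
        if not candidates:
            break
    return history


if __name__ == "__main__":
    for step in narrow(["A", "B", "C"]):
        print(sorted(step))
    # ['Virus 1', 'Virus 2']
    # ['Virus 1', 'Virus 2']
    # ['Virus 1']
```

 The generic heuristic scanner in the comparison has no equivalent of the
 shrinking `candidates` set, which is where Striker's speed advantage comes
 from.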

 Clearly  the first  advantage to  Striker's approach  is speed.  The  profiles
 enable Striker  to quickly  exclude some  polymorphic viruses  and home  in on
 others. In contrast, heuristics labor  on, scanning all program files  against
 all  available generic  rules of  how all  known  polymorphics  and all  known
 mutation engines might behave.

 The  profiles  also  enable  Striker  to  process  uninfected  files  quickly,
 minimizing impact on system performance. In contrast, heuristic-based scanning
 is more likely to decrease  system performance, because uninfected files  must
 also be scanned against all generic  rules for how all known polymorphics  and
 mutation engines might behave.

 Second,  anti-virus  researchers  are  no  longer  forced  to  rewrite complex
 heuristic rules to scan for each new virus, then exhaustively test and  retest
 to ensure they do not inadvertently miss a polymorphic the software previously
 detected.

 Third, with Striker,  a team of  anti-virus researchers may  work in parallel,
 building profiles  for many  new polymorphic  viruses, swiftly  adding each to
 Striker. Each profile is unique,  much like a virus signature,  independent of
 any other profile. The old profiles  still work, and the new profile  does not
 affect the  old. Exhaustive,  time-consuming regression  testing is  no longer
 necessary. It  becomes easy  to update  anti-virus software  by compiling  new
 virus profiles into the Norton  Antivirus database file that is  posted online
 monthly or obtained on floppy disk.

 Extract 8 : http://symantec.com/avcenter/reference/dis.tech.brief.pdf

 The  first step  in building  an automated  anti-virus  analysis  and response
 system is detecting new or unknown threats at the desktop, the server, and the
 gateway. Suspicious  files can  then be  forwarded for  automatic analysis and
 processing.

 1. New/unknown viruses quarantined
 2. Local quarantine forwards samples to corporate quarantine
 3. Corporate quarantine securely forwards samples to regional Symantec gateway
 4. Gateway forwards samples to back-end automation
 5. Back-end automation forwards new cure/fingerprints to gateway
 6. Gateways securely forward status and fingerprints to corporate quarantine
 7. Corporate quarantine forwards fingerprints to master management server
 8. Master servers automatically forward samples to primary servers
 9. Primary servers roll out definitions to clients

 The  Digital  Immune  System  has  two  separate  processing  queues:  one for
 corporate or government customers and one for consumers. To ensure scalability
 and availability, Symantec has  deployed separate computer hardware  to manage
 each of these queues, ensure that a glut of submissions on one queue will  not
 encumber the other, and protect against denial-of-service attacks.
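 The point of separate queues is that a flood on one side cannot starve the
 other.  A minimal sketch using Python's standard `queue` module; the class
 name, the bound of 1,000 items per queue and the reject-when-full policy are
 all assumptions, since the brief only says separate hardware serves each
 queue.

```python
import queue
from typing import Optional


class SubmissionQueues:
    """Independent, bounded queues for corporate and consumer submissions.

    Each queue would be drained by its own workers, so a glut in one never
    delays the other; the size bound caps how much a flood can pile up.
    """

    KINDS = ("corporate", "consumer")

    def __init__(self, maxsize: int = 1_000) -> None:
        self.queues = {kind: queue.Queue(maxsize=maxsize) for kind in self.KINDS}

    def submit(self, kind: str, sample: bytes) -> bool:
        """Enqueue a sample; returns False if that queue is full."""
        try:
            self.queues[kind].put_nowait(sample)
        except queue.Full:
            return False
        return True

    def next_sample(self, kind: str) -> Optional[bytes]:
        """Take the next sample from one queue, or None if it is empty."""
        try:
            return self.queues[kind].get_nowait()
        except queue.Empty:
            return None


if __name__ == "__main__":
    q = SubmissionQueues(maxsize=2)
    print([q.submit("consumer", b"x") for _ in range(3)])  # [True, True, False]
    print(q.submit("corporate", b"urgent"))                 # True
```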

 Before processing a  submission, the back-end  automation adds all  submission
 information to a tracking database. During the automated analysis process, the
 Digital  Immune  System inserts  subsequent  information into  a  database for
 reference and accounting purposes. If the submission is subsequently  deferred
 for  manual analysis,  all logs  from the  analysis  and  filtering steps  are
 available to the human analyst.
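 A tracking database of this kind needs little more than a submissions table
 and a per-step log.  This sketch uses SQLite from Python's standard library;
 the schema, status values and function names are hypothetical.

```python
import sqlite3


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) the submission and log tables."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS submissions (
            id       INTEGER PRIMARY KEY,
            sha256   TEXT NOT NULL,
            customer TEXT NOT NULL,
            status   TEXT NOT NULL DEFAULT 'received'
        );
        CREATE TABLE IF NOT EXISTS analysis_log (
            submission_id INTEGER NOT NULL REFERENCES submissions(id),
            step          TEXT NOT NULL,
            message       TEXT NOT NULL
        );
        """
    )
    return db


def record_submission(db: sqlite3.Connection, sha256: str, customer: str) -> int:
    """Register a submission before any processing; returns its id."""
    cursor = db.execute(
        "INSERT INTO submissions (sha256, customer) VALUES (?, ?)", (sha256, customer)
    )
    db.commit()
    return cursor.lastrowid


def log_step(db: sqlite3.Connection, submission_id: int, step: str, message: str) -> None:
    """Append one line of analysis output for later reference."""
    db.execute("INSERT INTO analysis_log VALUES (?, ?, ?)", (submission_id, step, message))
    db.commit()


def defer_to_analyst(db: sqlite3.Connection, submission_id: int) -> list[tuple[str, str]]:
    """Mark a submission for manual analysis and return its full log."""
    db.execute("UPDATE submissions SET status = 'manual' WHERE id = ?", (submission_id,))
    db.commit()
    return db.execute(
        "SELECT step, message FROM analysis_log WHERE submission_id = ? ORDER BY rowid",
        (submission_id,),
    ).fetchall()
```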

 THE DIGITAL IMMUNE SYSTEM  FILTERS AUTOMATICALLY RESOLVE APPROXIMATELY  87% OF
 ALL SUBMISSIONS, WITH  AN EXPECTED INCREASE  TO 95% AS  NEW AUTOMATED ANALYSIS
 MODULES ARE ADDED TO THE SYSTEM.

 CASE 1: CLEAN FILE FILTERS
 The Digital Immune System maintains a database of over 700,000 clean programs
 found on PCs running the Windows operating system.
 If a file in a submission matches a file in the database, the back-end  system
 records  the result  in the  database and  eliminates  the  file from  further
 consideration.

 CASE 2: FALSE POSITIVE FILTERS
 A false positive occurs when an anti-virus program incorrectly identifies a
 clean program as being infected with a virus.  While anti-virus companies
 attempt to minimize false positives, they are
 inevitable and commercial-grade  automation processes must  be able to  handle
 such a scenario.

 If an  anti-virus software  company distributed  a new  virus fingerprint that
 identified false positives on millions of computers around the world, it could
 cause a large subset of users to submit files to the Digital Immune System for
 analysis. Therefore, the back-end automation  system must have a mechanism  to
 identify and respond appropriately when a submission is incorrectly labeled as
 positive. The back-end automation system  leverages a database of known  false
 positive files to automatically identify false positives in submissions.  If a
 file  in  the  submission  matches  exactly  with  one  in  the false-positive
 database, the back-end system records the result in the database and  excludes
 the file from further consideration.

 CASE 3: KNOWN VIRUS FILTERING
 In the final stage of submission filtering, the back-end system scans all
 remaining files with Norton AntiVirus, using the very latest virus definition
 files.  The back-end attempts to repair all files detected as viruses, and if
 the repair is successful, it automatically records
 this result in the database.  This filtering step enables the  back-end system
 to  automatically  resolve  submissions that  contain  new  but only  recently
 identified  viruses.  Because  Symantec Security  Response  updates  its virus
 definition  files  several times  per  day, this  filtering  step can  quickly
 identify new viruses and ensure that customers quickly get the most up-to-date
 cures.
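 The three filters run in a fixed order, and only files that pass all of them
 go on to replication.  In this sketch the clean and false-positive databases
 are sets of SHA-256 digests (an assumption; the brief only says files must
 match exactly), and `scan_and_repair` is a hypothetical stand-in for a Norton
 AntiVirus scan with the latest definitions.

```python
import hashlib
from collections.abc import Callable
from typing import Optional


def digest_of(data: bytes) -> str:
    """Exact-match key for the clean and false-positive databases."""
    return hashlib.sha256(data).hexdigest()


def filter_submission(
    files: dict[str, bytes],
    clean_hashes: set[str],
    false_positive_hashes: set[str],
    scan_and_repair: Callable[[bytes], Optional[str]],
) -> tuple[dict[str, str], list[str]]:
    """Run the clean, false-positive and known-virus filters in that order.

    Returns (resolved, remaining): `resolved` maps file name to the reason it
    was resolved, `remaining` lists files that still need replication.
    `scan_and_repair` returns a virus name if it detects and repairs the file.
    """
    resolved: dict[str, str] = {}
    remaining: list[str] = []
    for name, data in files.items():
        digest = digest_of(data)
        if digest in clean_hashes:
            resolved[name] = "clean"
        elif digest in false_positive_hashes:
            resolved[name] = "false positive"
        elif (virus := scan_and_repair(data)) is not None:
            resolved[name] = f"known virus: {virus}"
        else:
            remaining.append(name)
    return resolved, remaining
```

 Putting the cheap hash lookups first means the relatively expensive scan only
 runs on files that neither database already explains.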

 ANALYSIS CENTER REPLICATION MODULES CAN REPLICATE AND COMPLETELY ANALYSE A NEW
 MACRO VIRUS IN APPROXIMATELY  30 MINUTES. AUTOMATIC VIRUS  REPLICATION ENABLES
 DIGITAL IMMUNE SYSTEM  COMPUTERS TO REPLICATE  NEW AND UNKNOWN  COMPUTER MACRO
 VIRUSES, CHARACTERIZE  THEIR BEHAVIOR,  AND AUTOMATICALLY  GENERATE A CURE-ALL
 WITHOUT HUMAN INTERVENTION. RAPID REPLICATION AND AUTO-ANALYSIS OF NEW THREATS
 IS KEY TO COMBAT THREATS LIKE MELISSA.  THIS IS THE FIRST SYSTEM IN THE  WORLD
 TO PROVIDE AUTOMATIC PROTECTION AGAINST COMPUTER VIRUSES.

 The Digital Immune System from Symantec deploys a back-end system analysis
 architecture, called the Analysis Center, created by scientists at IBM's T.J.
 Watson Research Center, which offers:

 1. Fast replication and auto-analysis of Word and Excel macro viruses, and  of
    DOS viruses.
 2. Multiple,  simultaneous,  replication  and  analysis  sessions  to  support
    multiple customer requests.
 3. Improved filtering of clean files and false positives.
 4. An extendable architecture.

 The Analysis Center automatically replicates and analyses DOS, Word, and Excel
 macro viruses. If any of the files in a submission appear to be Word documents
 or  Excel  spreadsheets, those  documents  are queued  for  processing by  the
 Analysis Center Macro  replication module. If  the files contain  a DOS virus,
 they are routed to the DOS replication module.

 Symantec and IBM Research engineers are  in the process of adding and  rolling
 out  additional  auto-analysis modules  for  32-bit Windows  viruses,  and for
 computer worms within virtual email networks such as Explore.Zip and Melissa.

 The  Analysis  Center  is  a  fully  contained  network-within-a-network.  Its
 replication system feeds each submitted sample (e.g., a Word document) into  a
 simulated Windows computer running on an enterprise-grade server, and attempts
 to coax any viruses or worms within the document or spreadsheet to infect  the
 virtual system.

 If the  document or  spreadsheet contains  no viruses,  this will  be apparent
 after the  replication session;  however, just  because no  viral activity was
 detected doesn't  mean the  file is  not infected.  Specifically, the  file in
 question might contain a "picky" virus that fails to spread itself during  the
 replication  session. Consequently,  the replication  system inserts  all  log
 files and other data into the database for reference purposes, and the file is
 manually examined by Symantec Security Response researchers.

 If the  file in  question does  contain a  computer virus,  and if  that virus
 replicates  itself  within  the simulated  environment,  then  the replication
 system  gathers  all potentially  infected  files and  analyses  them. If  all
 infected files show similar infection characteristics, the replication  system
 automatically generates a new virus fingerprint for the virus.
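 Extracting a fingerprint from "similar infection characteristics" can be
 sketched as finding a byte string shared by every infected sample but absent
 from known clean files.  This is a naive exact-match version under assumed
 parameters (a fixed 16-byte length, first qualifying candidate wins); a
 production system would likely be considerably more involved.

```python
from typing import Optional


def extract_fingerprint(infected: list[bytes], clean: list[bytes],
                        length: int = 16) -> Optional[bytes]:
    """Return a `length`-byte string present in every infected sample and
    absent from every known clean file, or None if there is no such string.

    Candidates are tried in order of position in the first infected sample.
    """
    if not infected:
        return None
    first = infected[0]
    for start in range(len(first) - length + 1):
        candidate = first[start:start + length]
        in_all_samples = all(candidate in sample for sample in infected[1:])
        in_clean_file = any(candidate in program for program in clean)
        if in_all_samples and not in_clean_file:
            return candidate
    return None  # no usable fingerprint: defer to manual analysis
```

 Checking candidates against clean programs is what keeps the host files' own
 bytes, which differ between samples anyway, from ending up in the signature.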

 Next,  the  replication  system  creates  a  test  virus  fingerprint database
 containing the new  fingerprint and launches  Norton AntiVirus, with  this new
 database, to  scan the  virus and  all of  its child  infections. If  the test
 fingerprint database correctly detects and repairs all of the infections,  the
 Digital Immune System provisionally certifies the new fingerprint.

 Finally, the back-end  system obtains a  copy of the  latest virus fingerprint
 database, without the new fingerprint, and scans all of the viral samples once
 more. If the most recent fingerprint database fails to detect the  infections,
 the back-end  inserts the  provisional fingerprint  into the  master database.
 This final check  is performed to  reduce the likelihood  that redundant virus
 fingerprints will  be inserted  into the  fingerprint database.  A new  set of
 definitions  is  then  built  including  the  newly  created  fingerprint, and
 automatically  returned  via  the Digital  Immune  System  gateway to  Central
 Quarantine at the customer site.
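 The two checks made before a fingerprint is published can be sketched as
 follows.  Detection is modelled as a plain substring match, and `repairs_ok`
 is a hypothetical hook for the repair test.  The brief does not say what
 happens when the current database detects only some of the samples; this
 sketch treats the fingerprint as redundant only if every sample is already
 detected.

```python
from collections.abc import Callable


def detects(database: set[bytes], sample: bytes) -> bool:
    """Toy scanner: a sample is detected if it contains any fingerprint."""
    return any(fingerprint in sample for fingerprint in database)


def certify_and_insert(fingerprint: bytes, samples: list[bytes],
                       master_db: set[bytes],
                       repairs_ok: Callable[[bytes], bool]) -> bool:
    """Insert `fingerprint` into `master_db` only if it passes both checks."""
    # Check 1: a test database holding only the new fingerprint must detect
    # and repair every replicated sample (provisional certification).
    test_db = {fingerprint}
    if not all(detects(test_db, s) and repairs_ok(s) for s in samples):
        return False
    # Check 2: if the current master database already detects every sample,
    # the new fingerprint would be redundant.
    if all(detects(master_db, s) for s in samples):
        return False
    master_db.add(fingerprint)
    return True


if __name__ == "__main__":
    master = {b"OLD-SIG"}
    print(certify_and_insert(b"NEW", [b"..NEW..", b"NEW.."], master, lambda s: True))  # True
```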

 Once the  back-end system  has replicated  and analysed  an infected file, and
 created a new fingerprint database,  it initiates a house-cleaning session  to
 check whether  any currently  open submissions  contain the  same virus; these
 submissions can be automatically re-filtered, eliminating the need for  manual
 processing.

 -=( 3 : General Operation ------------------------------------------------ )=-

 Cruncher strategy can be split into two components: the front-end, which
 selects and submits suspicious files for processing, and the back-end, which
 replicates samples of the virus from within the file and constructs a
 signature to detect those samples.

 The extracts above describe three front-end strategies, and from them we can
 extrapolate how each one probably works.

 1. Antivirus software on home PCs and workstations can submit files that it
    considers suspicious.
    a) Files are suspicious if checksums recorded at install time later change.
    b) Files are suspicious if their internal structure is inconsistent.
    c) Files are suspicious if, when run through an emulator, their code acts
       in rare ways that trigger heuristics, or decrypts itself far enough for
       heuristics to pick up routines within it.

 2. Antivirus companies can search Internet sources and submit the files they
    find for testing.
    d) Searches are conducted on newsgroups and websites (particularly hacking,
       warez and virus sites), and all files found are sent for testing.

 3. Users can manually submit suspicious files for processing.
    e) If files arrive by email, or existing programs start running noticeably
       slower, users are likely to send them for testing.
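 Strategy (a) is a classic integrity checker.  A minimal sketch, assuming
 SHA-256 as the checksum (any checksum would serve; this is simply a
 convenient modern choice) and hypothetical function names.  Files that no
 longer exist are ignored rather than flagged.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Checksum of a program's contents (SHA-256 here)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def record_baseline(paths: list[Path]) -> dict[Path, str]:
    """Checksum every protected program once, e.g. at install time."""
    return {path: file_digest(path) for path in paths}


def changed_since_install(baseline: dict[Path, str]) -> list[Path]:
    """Programs that still exist but no longer match their recorded checksum.

    These are the files a front-end would forward for analysis.
    """
    return [path for path, digest in baseline.items()
            if path.exists() and file_digest(path) != digest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        program = Path(tmp) / "EDIT.COM"
        program.write_bytes(b"\xc3")
        baseline = record_baseline([program])
        program.write_bytes(b"\xc3 plus appended bytes")
        print(changed_since_install(baseline) == [program])  # True
```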

 Once selected, the files are processed by the back-end using these strategies.

 1. Files are first checked against filters, to avoid wasting analysis time
    during an outbreak.
    a) A database of known clean files is kept and compared against submitted
       files before any other processing is done.  This is most likely the same
       software database used to exclude false positives when extracting virus
       signatures.
    b) A smaller database of previously known false positives is kept, and when
       any are found, newer signature files are sent to the client.
    c) A scan is done using the latest in-house signatures to exclude very
       recently detected viruses (i.e. those detected in the past few hours).

 2. Proper analysis begins.
    a) Files are moved into a virtual computer (and virtual network) where
       different modules force them to reproduce.  These modules most likely
       run macro programs through a set sequence of steps, run programs in set
       directories containing baits, or open and forward dummy emails.
    b) Once samples (most likely files copied from the false-positive
       database) have been modified from their original forms, an emulator is
       run to obtain an unencrypted form of the virus, and candidate signatures
       are tested against the samples to check that they are effective.

 Crunchers can implement bait technology in a few ways.  They might build a
 system that generates bait files with random filenames and varying sizes, then
 runs them through various tests to try to get them infected.  They might set
 up a full "normal" installation of a system and use scripts to run programs as
 a user would, hoping that some get infected.  Or they might simply fill a
 directory with real program executables from a database and watch what
 changes.
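 The first approach, random names and varying sizes, can be sketched as below.
 The 8-letter .COM names, the size range and the file contents (a single RET
 instruction, 0xC3, which simply exits under DOS, padded with NOPs) are
 assumptions for illustration.  Watching for changes is done by comparing each
 bait's bytes with what was originally written.

```python
import random
import string
import tempfile
from pathlib import Path
from typing import Optional


def make_baits(directory: Path, count: int, min_size: int = 1_000,
               max_size: int = 60_000,
               rng: Optional[random.Random] = None) -> dict[str, bytes]:
    """Write `count` bait .COM files with random 8-letter names and sizes.

    Each bait is a single RET (0xC3) padded with NOPs (0x90) to its size.
    Returns a mapping of file name to the bytes originally written.
    """
    if rng is None:
        rng = random.Random()
    originals: dict[str, bytes] = {}
    while len(originals) < count:
        name = "".join(rng.choices(string.ascii_uppercase, k=8)) + ".COM"
        if name in originals:
            continue  # never overwrite a bait we already created
        size = rng.randint(min_size, max_size)
        body = b"\xc3" + b"\x90" * (size - 1)
        (directory / name).write_bytes(body)
        originals[name] = body
    return originals


def changed_baits(directory: Path, originals: dict[str, bytes]) -> list[str]:
    """Baits that were modified or removed since they were written."""
    changed = []
    for name, body in originals.items():
        path = directory / name
        if not path.exists() or path.read_bytes() != body:
            changed.append(name)
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        baits = make_baits(Path(tmp), 5, rng=random.Random(1))
        print(changed_baits(Path(tmp), baits))  # [] until something touches them
```

 Keeping `max_size` under 64 KB keeps every bait within the .COM size limit.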

 -=( 4 : Defensive Strategies --------------------------------------------- )=-

 It is a well publicised fact that thousands of viruses have been detected on
 web  sites by  Seeker, and  had their  signatures extracted  long before  ever
 having a chance to replicate in  the wild.  Your  best defence against this is
 common  sense, don't publish malware  in ways that automated engines  can find
 the files and extract the contents.

 Depending on the method used to compare files against the known clean database
 (either various checksums or full file comparisons),  interchangeable checksum
 neutral code may fool the cruncher into bypassing some infections, creating an
 incomplete detection signature (or none at all).

 Furthermore, the internal structure of files should be made more consistent so
 that headers arouse less suspicion.  Anti-emulation code can be tested against
 antivirus engines by wrapping known viruses in polymorphic decryptors, so that
 signature extraction by the cruncher simply will not find any matches.

 Let it also be noted that code must not rely on condition tests of any sort as
 emulation systems often follow bad-condition paths just to see what happens (a
 good example of this is described in the Striker32 paper of which extracts are
 given above).

 Gauge system parameters like disk and keyboard activity, and watch for changes
 before doing infections.  Keep an internal clock and wait before infection, so
 that automated systems running on a short fuse can't get virus samples.

 Baits can be avoided by stepping outside of the same-directory mind frame, and
 focusing on directory structures, program methodology (is it in the  registry,
 does it import support files), and even viewing networks as a whole entity, by
 comparing installations across accessible shares  and doing web searches on  a
 filename to see if it returns matches.

 -=( 5 : Conclusion ------------------------------------------------------- )=-

 Cruncher technology is a closely guarded secret in antivirus firms, because of
 the substantial research investment and the possibility that exposing too much
 information could give a leg up to competitors or teach virus writers how to
 cheat the system.

 You'll notice that this article is mostly Symantec-based.  This is because
 Symantec is the only company that markets to the enterprise by publicising
 technical details of its internal processes.  But it should be clear that
 other antivirus companies will have similar, but secret, processes.

 While it would be interesting  to gain further information about  the specific
 vulnerabilities of the Cruncher systems in use around the world, it is  likely
 to be far less valuable than thinking in general terms of what is, and what is
 not, a good idea in your virus activities.

 In 1998 we were discussing internally: "If we were asked to create a Cruncher,
 how would we go about it?"  Funnily enough, everything we discussed, from
 ruling out submissions with a software database to automated bait generators
 with signature extraction, became reality in the Symantec project.

 So the most important lesson learned here is that virus writers can be just as
 intelligent as antivirus programmers, and that it is possible and fruitful  to
 predict (and counter) future antivirus technology before it is implemented.

 -=( ---------------------------------------------------------------------- )=-
 -=( Natural Selection Issue #1 --------------- (c) 2002 Feathered Serpents )=-
 -=( ---------------------------------------------------------------------- )=-
