What Is Software Forensics?

Software forensics is the branch of forensic science that examines software source code and binary code in cases involving patent infringement or theft. Its findings can serve as evidence in legal disputes over intellectual property, patents, and trademarks.

Digital forensics and computer forensics both focus on recovering computer files. Digital forensics tries to find files that are the same, whereas software forensics examiners focus on how code functions.

Software forensics is especially important in patent and trade secret cases. In these cases, someone might have copied another person's code but rewritten it to hide the theft. A digital forensic examiner may not have the tools or capabilities to prove that copying occurred.

The following common terms are important to understanding when software forensics is relevant to your legal situation.

  • Software forensics: analyzes computer programs for legal purposes
  • IP address: A number that labels every computer attached to the internet
  • Domain name system: Names for computers using the internet. The registration record for a domain includes the registrant's name and contact information.
  • Hashing: Hashing maps data of any size to a short, fixed-length value called a hash. Some experts use hashing to determine whether someone has copied a file, although it's not always an effective method to prove theft. Hashing helps to find exact matches of code. If a programmer changes the code by even one space, the hash changes, so hashing can't conclusively determine whether copyright infringement or theft exists.
  • Source Code: The human-readable text of a program, written by a programmer as high-level instructions. Source code is always written in a computer programming language.
  • Decompiled: Source code is compiled into an executable program. Decompiling that program reconstructs an approximation of the source code for interpretation and investigation. Often, some information from the original source is lost in the decompiling process.
  • Logging: Log files tend to be informative for researchers. The problem is that too much logging uses up disk space or slows down a computer, so administrators must balance logging information against saving space. Almost all operating systems record usage behaviors. Logging aids in determining who accessed a file and when the file was accessed. Log files can also tell investigators what users do when they log into computers. Investigators typically write their own log-analysis programs in Perl, a high-level programming language, because many of the current commercial products aren't adequate to cover the variety of log files that an investigator may encounter.
  • Network Surveillance: Networks refer to the way data travels between computers. To monitor networks, network administrators can use special software and hardware. Data gets split into packets when it travels over networks. People in the field call network monitoring packet sniffing, and the tools are called packet sniffers, network protocol analyzers, or network sniffers. Ethereal (renamed Wireshark in 2006), which runs on UNIX and Windows, is the most widely available free tool for packet sniffing.
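
The limitation described in the hashing entry above can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function and uses two invented one-line code samples; the whitespace normalization step is only an illustration of why exact-match hashing is fragile, not a method any particular forensic tool is known to use.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space (illustrative only)."""
    return " ".join(text.split())


# Two hypothetical code fragments that differ by a single extra space.
original = "int total = count + 1;"
modified = "int total = count  + 1;"

# Exact-match hashing treats the two fragments as unrelated.
exact_match = sha256_hex(original) == sha256_hex(modified)

# After normalizing whitespace, the same comparison finds a match.
normalized_match = (
    sha256_hex(normalize_whitespace(original))
    == sha256_hex(normalize_whitespace(modified))
)

print("exact match:", exact_match)
print("match after whitespace normalization:", normalized_match)
```

Even with normalization, renaming a single variable would defeat this comparison, which is why software forensics looks at function rather than exact matches.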

Reasons to Use Software Forensics

Unfortunately, people use computers to cause harm. Below are some ways people have caused problems using computers.

  • Viruses: People write computer codes that repeat themselves and cause damage.
  • Worms: A programmer writes computer code to repeat and spread over networks causing damage.
  • Logic Bombs: Logic bombs are timers that are usually attached to a virus or worm to activate the virus or worm at a certain time.
  • Plagiarism: This term refers to intentionally copying someone else's work.
  • Computer Fraud: This action involves using a computer for a crime.
  • Trojan Horses: In computer terms, a Trojan horse is a program that looks like a safe program. These programs appear safe, but they cause harm.

Famous Viruses

The Internet worm, written by Robert Morris and released in 1988, is one of the most damaging pieces of malicious code that have existed. Studying the Internet worm led researchers to conclude:

  • It wasn't well written and had many errors.
  • Either the author did sloppy work or the worm was released early.
  • The author didn't have classical training or advanced programming ability.
  • The author wrote it over a long period.

The WANK and OILZ Worms

These two worms were released in 1989 and attacked both NASA and the U.S. Department of Energy (DOE). Studying this code led to the following information:

  • The first author was academically trained and had a high understanding of coding.
  • The first author's intent was experimentation.
  • The second author had a hostile intent and included profanities in the code.
  • The second author showed a simpler programming style than the first author.
  • The third author used mixed cases when programming and joined the other two authors' code.

Understanding Programming and Source Code

When a programmer codes, each part of that language serves a specific purpose. Programmers can program in different ways or use a different style to meet their goals. Two people could program the same feature, and the source code could look completely different.

Programming style can help identify an author. Authorship analysis is relevant to software forensics and takes four forms.

  • Author Discrimination: Experts use author discrimination to decide how many authors wrote the code in question. Author discrimination doesn't mean that the author gets identified.
  • Author Identification: The aim is to name the author of a piece of code. It's necessary to use samples from that author to compare the code.
  • Author Characterization: Author characterization is similar to suspect profiling for criminal suspects. The investigator tries to determine the educational background of the programmer based on the code.
  • Author Intent Determination: The investigator attempts to show whether the author intended to cause harm through the code or program.

The source code offers much information useful in software forensics.

  1. Parts of the code offer further information about the background of the author. Code can also help show the skill level of the programmer. For example, analysis might show whether the programmers were trained in school or taught themselves how to program.
  2. The style and flow offer information about the pace and way the programmers worked. Code looks different if the material is written all at once instead of gradually.
  3. Comments in the code provide information about writing style. Programmers write comments in a human language such as English, and this language helps investigators understand more about that person.
  4. The layouts, borders, and indentations provide insight into the programmer's personality. These three elements are like a picture that experts can use to learn about the programmer's personality.
  • Executable Codes: Software forensics studies executable codes that attack computers. These codes include the following: logic bombs, Trojan horses, worms, and viruses.
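
The style cues listed above (comments, indentation, and naming habits) can be turned into simple measurements. The following Python sketch is a hypothetical illustration, not an established authorship method: the feature names, the `style_features` function, and both code samples are assumptions made for this example.

```python
import re

CAMEL_CASE = re.compile(r"[a-z]+(?:[A-Z][a-z0-9]*)+")


def style_features(source: str) -> dict:
    """Count a few layout and naming habits in a piece of source code."""
    nonblank = [line for line in source.splitlines() if line.strip()]
    comments = [line for line in nonblank if line.strip().startswith("#")]
    code_only = [line for line in nonblank if line not in comments]
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", "\n".join(code_only))
    return {
        "comment_ratio": len(comments) / len(nonblank) if nonblank else 0.0,
        "tab_indented": sum(1 for line in nonblank if line.startswith("\t")),
        "space_indented": sum(1 for line in nonblank if line.startswith(" ")),
        "snake_case": sum(1 for n in names if "_" in n and n == n.lower()),
        "camel_case": sum(1 for n in names if CAMEL_CASE.fullmatch(n)),
    }


# Two invented samples that implement the same feature in different styles.
sample_a = (
    "def add_items(item_list):\n"
    "    # sum the values\n"
    "    total_sum = 0\n"
    "    for item in item_list:\n"
    "        total_sum += item\n"
    "    return total_sum\n"
)
sample_b = (
    "def addItems(itemList):\n"
    "\ttotalSum = 0\n"
    "\tfor item in itemList:\n"
    "\t\ttotalSum += item\n"
    "\treturn totalSum\n"
)

features_a = style_features(sample_a)
features_b = style_features(sample_b)
print(features_a)
print(features_b)
```

The two samples compute the same result, yet their feature counts differ sharply. Real authorship analysis would need far more features and many known samples from each candidate author.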

Technical Experts: Why They Matter in Software Forensics

Technical experts are typically required when dealing with intellectual property litigation. Intellectual property litigation refers to lawsuits that involve an idea instead of a physical object. Juries can get confused by technical computer programming terms, whether or not a party intends that confusion. Two areas can help with understanding intellectual property litigation:

  1. A method to make software comparisons
  2. A standard for using the software comparisons in a court of law.

Combatting Concerns in Court

  • Certification: A few states require experts to hold engineering certifications before they can testify. Because so few states require it, certification is an inconsistent safeguard. Requiring certification by reputable organizations such as the Association for Computing Machinery or the Institute of Electrical and Electronics Engineers would help to prevent fraud or slanted testimony in court. Some people believe that experts who give false testimony should face consequences, such as exclusion or even prosecution for perjury.
  • Neutral Experts: For court cases, another option is to appoint neutral experts instead of each side hiring its own.
  • Tools and Techniques Concerns: Another concern in court is the legitimacy of the instruments and techniques used by experts. Currently, no governing body exists to make sure the tools and techniques of expert witnesses are valid. Testing these tools and techniques would help ensure that all expert witnesses provide reliable information to juries.

Validating Tools and Software

Forensic scientists need to prove every step of their processes, especially if they plan on testifying in court. According to the National Institute of Standards and Technology, the test results of any method in forensic science need to be repeatable and reproducible.

  • Daubert Standard: The Daubert Standard is a general standard in the legal community for determining whether any tested method is valid. Although not specific to digital forensics, the basics are still relevant. To decide whether a method is valid, the Daubert Standard asks the following five questions:
  1. Has the method been empirically tested?
  2. Was the method peer-reviewed?
  3. Is the method generally accepted in the scientific community?
  4. Are there any error rates associated with the method?
  5. Are there standards controlling the operation?

Internal Validation

  1. Developing a Plan: Developing a plan should include an in-depth survey of the goal and how to reach that goal. The plan should include how the test is completed, the tools that you will use, the testing manufacturer, and how often the tests get finalized. Generally, tests are rerun at least two to four times per year.
  2. Creating a Controlled Data Set: This step is usually the hardest and the longest. To create a controlled data set, the experimenters must start with a baseline and then change only one element at a time. This process helps to make sure the test is reliable.
  3. Conducting the Tests in a Controlled Setting: Tests need to be completed in a controlled setting. Otherwise, the results are less likely to be repeatable and reproducible. If you run the test again in a controlled setting, you should get the same results.
  4. Validating the Test Results: Validation means checking the test results against known results. Try to confirm the results at least three times. Use peer review to help confirm your results, for example through organizations such as the High Technology Crime Investigation Association and the International Association of Computer Investigative Specialists.
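
The validation steps above can be sketched in Python. This is a minimal illustration under stated assumptions: `tool_under_test` is a hypothetical stand-in for whatever forensic tool is being validated (here it simply computes an MD5 hash), and the controlled data set is two inputs whose MD5 values are widely published.

```python
import hashlib

# Controlled data set: inputs paired with independently known results.
# These are the widely published MD5 digests of an empty input and "abc".
KNOWN_RESULTS = {
    b"": "d41d8cd98f00b204e9800998ecf8427e",
    b"abc": "900150983cd24fb0d6963f7d28e17f72",
}


def tool_under_test(data: bytes) -> str:
    """Hypothetical stand-in for the forensic tool being validated."""
    return hashlib.md5(data).hexdigest()


def validate(tool, known_results, runs=3):
    """Run the tool several times per input and compare with known results."""
    report = []
    for data, expected in known_results.items():
        outputs = [tool(data) for _ in range(runs)]
        report.append({
            "input": data,
            # Repeatable: every run produced the same output.
            "repeatable": len(set(outputs)) == 1,
            # Valid: every output agrees with the known result.
            "matches_known": all(out == expected for out in outputs),
        })
    return report


report = validate(tool_under_test, KNOWN_RESULTS)
all_passed = all(r["repeatable"] and r["matches_known"] for r in report)
print("validation passed:", all_passed)
```

The default of three runs mirrors the suggestion to confirm results at least three times. Reproducibility would additionally require someone else repeating the test in a different environment.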

High-Profile Court Cases and Case Studies

Caldera Inc. vs. Microsoft

In 1996, Caldera Inc. filed suit against Microsoft for allegedly copying parts of the CP/M operating system in MS-DOS. The antitrust suit was settled out of court, with Microsoft paying a reported $275 million. MS-DOS eventually became the foundation for Windows 95, which helped establish Microsoft as the powerhouse of the computer industry.

A debate exists concerning whether the $275 million was enough considering Microsoft's success and the demise of CP/M. Experts had to spend hundreds of hours studying the code to reach a conclusion. Although Microsoft settled the case, experts couldn't say for sure that it copied the code.

CodeSuite

CodeSuite is an important tool created around the time of the MS-DOS and CP/M controversy.

  • CodeSuite supports 40 languages.
  • Codes get divided into sections to find plagiarized codes in different sections.
  • CodeSuite analyzes statements, comments, strings, identifiers, and instruction sequences.

CodeSuite flags anything that could be suspicious so that examiners can look at the code in more depth. This process also filters out false matches that arise from ordinary programming practice rather than copying.

Common Algorithms

Common algorithms might look like copying. However, basic algorithms are taught to computer scientists during their training, so many programmers implement them in similar ways.

Common Identifiers

Terms such as index, count, and matrix are all prevalent in different programs. These common terms and human-readable language may all be wrongly identified as copying.
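
The filtering idea behind common identifiers can be sketched in Python. This is a hypothetical illustration and not CodeSuite's actual algorithm: the `COMMON_IDENTIFIERS` set uses the three terms named above, while the `KEYWORDS` set, the function names, and both code samples are assumptions made for this example.

```python
import re

# Terms named in the text as too common to indicate copying.
COMMON_IDENTIFIERS = {"index", "count", "matrix"}
# A tiny, assumed list of language keywords to ignore as well.
KEYWORDS = {"int", "for", "if", "else", "while", "return", "void"}


def identifiers(source: str) -> set:
    """Return every distinct name-like token found in the source text."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def shared_distinctive_identifiers(source_a: str, source_b: str) -> set:
    """Names that appear in both sources after removing common terms."""
    shared = identifiers(source_a) & identifiers(source_b)
    return shared - COMMON_IDENTIFIERS - KEYWORDS


# Two invented code samples for illustration.
file_a = "int count = 0; int frobnicate_widget(int index) { return index + count; }"
file_b = "int count = 0; int frobnicate_widget(int matrix) { return matrix * count; }"

suspicious = shared_distinctive_identifiers(file_a, file_b)
print(suspicious)
```

Both samples share `count`, but only the unusual name `frobnicate_widget` survives the filter, so it is the one worth a closer look.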

Common Author

If a person leaves one company and goes to another, the coding style at both companies will appear similar. However, this situation doesn't mean that the code was copied.

Automatic Code Generation

Sometimes, people use automatically generated code to speed up their work. If two people use the same program to generate code, the results could appear plagiarized when they are not.

Third-Party Code

Open-source code, which is code that anyone can use, is often used for basic computer functions, and using it is not considered plagiarizing. Shared third-party code can make two programs appear copied even when both used that code legally. This area came into question during the Microsoft and CP/M controversy.

The analysis determined that Microsoft did not copy the source code for CP/M. Although many similarities existed, the finding remains that Microsoft did not plagiarize. As technology changes, however, the possibility for copying still exists.

Software Forensics on a Broader Scale

Software forensics also includes analyzing computer source or binary code for any investigation, analysis, or prevention purposes.

  • Malicious Code: Malicious code is a type of code that a person creates for unethical intentions. This code may seem harmless, but the writer of the code aims to harm. Software forensics often aims at detecting and finding the perpetrator of malicious code, although, sometimes, identifying and regular testing may be difficult.
  • Safety Incidents: Investigating safety incidents can help curb computer-related safety concerns and incidents.
  • Security Vulnerabilities: When dealing with computers, finding and correcting security vulnerabilities or weaknesses can also fall under the scope of software forensics.
  • Software Fault Analysis: Software forensics may be used to find issues with items such as monitoring devices.

Understanding Static Analysis

Work in the field of software forensics can be tedious without software that reduces the workload for forensic experts. Static analysis tools help cut some of that work.

Computer forensics involves using tools to investigate data, computers, or computer networks without changing the components of the system being investigated.

Investigators must not compromise the data collected during an investigation. Typically, forensic investigators make an exact copy, called an image, of all data collected from a computer system.

Imaging is the technical term for creating an exact copy of data and transferring it to a disk. Investigators typically work from the image to complete their investigation and prevent any compromise to the original data. Imaging also helps to find data that people may have deleted accidentally or purposefully.

Commercial Software for Software Forensics

Popular Types of Software Forensic Tools

  • Mobile device analysis tools
  • Database forensic tools
  • File viewers
  • Internet analysis
  • Network forensic analysis

Encase

Created by Guidance Software, Encase can be found in use by many law enforcement agencies to image, or make a replica of, CDs, USB drives, or older floppy disks. Encase can also image personal digital assistants. Software forensic investigators especially use Encase when investigations may lead to use by a court of law or the police. Encase calls the image that a forensic investigator creates an Evidence File.

Encase allows users to search for deleted files, look at photos, and search the image for keywords. Although Encase is one of the most expensive programs for forensic investigators, the company offers discounts for law enforcement agents. Even without the discount, many forensic investigators believe the cost is worthwhile because of the broad range of capabilities and functions that Encase provides. Encase also allows users to create their own mini programs, called EnScripts, which enhance the search experience and the capacity to filter the collected data.

Digital Forensics Framework

Both experts and beginners are able to use this open-source tool. This tool aids in recovering deleted files, searching metadata, and accessing remote devices.

Vogon Forensic Software

This advanced software offers commercial products for imaging, investigation, and processing. When combined, these products work to create an image of a file or hardware, which is then indexed by the processing software. The inquiry software helps investigators quickly search the image. Vogon is similar to Encase, but this tool makes the process easier.

Open Computer Forensics Architecture

The Dutch National Police Agency built this open-source program on the Linux platform to automate the digital forensics process.

X-Ways Forensics

This advanced platform, available on Windows, works efficiently for digital forensic investigators. Its features include:

  • Automated logging
  • Case management
  • Memory analysis
  • Data authenticity
  • Data recovery
  • Bulk calculation
  • Automated registry report

SafeBack

SafeBack is also used often by law enforcement agencies. SafeBack operates mostly with Intel-based computer systems, a particular type of computer hardware. This older DOS-based program operates from floppy disks, but it does not offer all the capabilities of Vogon or Encase. SafeBack provides only imaging technology.

Free Software

If you are not tech savvy, you might be surprised to learn that most forensic analysis stems from UNIX, not the typical Windows operating system. UNIX, created in the 1970s, includes many small programs instead of one large program. Software forensics has developed from these small programs. While these programs are still used in UNIX, they are now used with Windows operating systems as well.

  • Data Dumper: Data dumper, or dd, creates exact copies for forensic investigators on UNIX systems. Investigators must type in commands instead of clicking on icons, which means the investigators must be comfortable working on the command line. Generic versions are available, and modified versions are available specifically for forensic use.
  • Md5sum: MD5 is a hashing algorithm that can determine whether an image was created correctly. The algorithm processes all of the data to confirm that the data collected is, in fact, a replica of the original data. Md5sum is a free utility that helps a computer forensic specialist compare the image to the original.
  • Grep: Grep is a program that allows users to search for a character sequence. Grep is especially useful because it allows users to search with metacharacters in the document or file as well. Grep is included as part of Encase and comes with UNIX systems.
  • The Coroner's Toolkit: These tools help investigators with analysis for UNIX machines. Investigators use this software tool when an intruder breaks into a computer. The tools aid in tracking the times of break-in and restoring any files that the intruder deleted.
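
The imaging and verification workflow described in the dd and md5sum entries above can be sketched in Python. This is a minimal illustration, not a forensic-grade imager: the file names, the block size, and the helper functions `copy_in_blocks` and `md5_of_file` are assumptions, and a temporary file of random bytes stands in for a real evidence drive.

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # assumed block size for reading and copying


def md5_of_file(path: str) -> str:
    """Hash a file block by block so large images need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(BLOCK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_in_blocks(source_path: str, image_path: str) -> None:
    """Copy the source to an image file one block at a time, as dd does."""
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        for block in iter(lambda: src.read(BLOCK_SIZE), b""):
            dst.write(block)


with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "evidence.bin")
    image_path = os.path.join(workdir, "evidence.img")

    # Stand-in for the original evidence: 200,000 random bytes.
    with open(source_path, "wb") as handle:
        handle.write(os.urandom(200_000))

    copy_in_blocks(source_path, image_path)
    source_hash = md5_of_file(source_path)
    image_verified = md5_of_file(image_path) == source_hash

    # Simulate tampering by flipping the bits of the image's first byte.
    with open(image_path, "r+b") as handle:
        first = handle.read(1)
        handle.seek(0)
        handle.write(bytes([first[0] ^ 0xFF]))
    tampered_verified = md5_of_file(image_path) == source_hash

print("image matches original:", image_verified)
print("tampered image matches original:", tampered_verified)
```

A matching hash supports the claim that the image is a faithful replica, and a single changed byte is enough to make the comparison fail.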

Live Response Tools

Typically, forensic investigators analyze computers when they are off to prevent altering the data, but in some cases, they need to analyze a computer while it is running. The following tools aid in analyzing operating computers.

  • Netstat: This tool, built into Windows operating systems, is applicable when investigators suspect unauthorized access to a computer. This tool helps to gather data or evidence that can't be accessed if the computer is off.
  • Fport: This free software helps to reveal any software that might be communicating with another computer. Fport helps to find unauthorized programs on Windows NT4, 2000, and XP.
  • PlainSight: This tool helps forensic investigators with checking USB usage, data carving, and reviewing internet use history.
  • PsList: This program shows all running programs on a computer. If an unauthorized program is running on the computer, a forensic investigator can find it quickly and easily.
  • XRY: This tool is especially important when studying software forensics on mobile technology. XRY is available for both hardware and software of mobile devices. It is able to recover deleted data from smartphones such as images, text messages, and call records.

Proving Who Did It

When only one computer is involved and it is not on a network, it's easier to prove who is at fault. When a computer is part of a network, the process can become more complicated.

  • Whois: A lookup service for domain registration records, maintained by authorities throughout the world. Whois comes standard with UNIX, but it can also be downloaded for Windows systems and accessed through certain websites.
  • Search Engines and Newsgroups: Even a search on Google can produce many important results for investigators. A newsreader is a software program that performs similar searches across newsgroups.

Forensic Engineering

Forensic engineering focuses on determining why something has failed. It's often used when people become injured due to product failure. Forensic engineering typically involves testing the product that failed as well as a replica of the product that failed.

Typically, the exam puts stress on the product to the point that it fails. By replicating the failure, the researcher can determine what point or what factors caused the failure. Within forensic engineering, reverse engineering is crucial for patent cases and trade secret cases. Forensic engineering was part of the investigations into the two space shuttle disasters. The space shuttle Challenger exploded shortly after takeoff in 1986, killing the entire crew.

Investigators determined that rubber rings called O-rings failed. Investigations later showed that the manufacturer had warned NASA of a potential issue with the O-rings, but the company received pushback from NASA and dropped its objections. In 2003, seven crew members died during the re-entry of the space shuttle Columbia. A major operation formed to conduct a forensic examination of the tragedy. Investigators recovered the flight data recorder, which provided information on the accident.

Forensics has a broad scope and is typically used for discussing purposeful or criminal activity, but forensic studies are also important in software investigations including safety failures, product failures, and security breaches.

Similar to investigating crime, investigating software failure uses detailed practices to analyze and understand what went wrong. Examining the source code or binary code helps to decide or understand what might have gone wrong.
