Week 3: Understanding Digital Data

Ali Jaddoa
Ali.Jaddoa@roehampton.ac.uk

25/26

Quick Recap - Last Week

Data Acquisition & Duplication



Purpose: capture and preserve digital evidence without altering it.

  • Acquisition types:
    • Live - collect volatile data (RAM, active processes, network).
    • Dead - image powered-off media (disks, USBs, phones).
  • Imaging formats: RAW | E01 | AFF - each with trade-offs in size, speed, and metadata.
  • Golden rules:
    • Never work on the original evidence.
    • Always use write blockers.
    • Hash before and after to confirm integrity.
  • Outcome: a forensically sound duplicate ready for examination.
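
The "hash before and after" rule can be sketched in a few lines of Python. This is only an illustration: it assumes SHA-256 as the hash, uses in-memory byte strings as stand-ins for the original media and its image, and the helper name `sha256_of` is hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Stand-ins for the original media and its forensic image (assumed data).
original = b"evidence bytes from the source drive"
image = bytes(original)  # a bit-for-bit copy

hash_before = sha256_of(original)
hash_after = sha256_of(image)
print("Match:", hash_before == hash_after)

# Changing a single character produces a different digest.
tampered = b"Evidence bytes from the source drive"
print("Tampered match:", sha256_of(tampered) == hash_before)
```

In practice the digest would be computed over the whole device or image file, usually by the imaging tool itself.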


How does a computer store data?



What Is Digital Data?

Digital data is information represented using discrete values - typically 0s and 1s (binary).

  • Every file, photo, message, and signal inside a computer is stored as a sequence of bits.
  • Data can represent:
    • Text, Images, Sound, Video
  • Computers use numbering systems (binary, octal, decimal, hexadecimal) to represent and interpret this data.

In forensics, understanding how data is represented helps investigators identify, recover, and interpret evidence accurately.



Computer Data Bit by Bit: Representing 1s and 0s


The physical states of electronic components (for example, off or on) correspond to the binary digits 0 and 1, forming the foundation of all digital information.



Numbers

We will go through these briefly.



Decimal Number System - Base 10 (Example 2748)

  • Uses digits 0-9 and is based on powers of 10
  • Each position (column) has a place value that increases by a factor of 10
  • Most familiar system used in everyday counting and arithmetic

Power   Expansion        Column Value   Digit   Digit × Column Value
10⁵     10×10×10×10×10   100,000
10⁴     10×10×10×10      10,000
10³     10×10×10         1,000          2       2,000
10²     10×10            100            7       700
10¹     10               10             4       40
10⁰     1                1              8       8

(2748)₁₀ = 2×10³ + 7×10² + 4×10¹ + 8×10⁰ = 2,748
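
The place-value sum above can be written as a short function. This is a minimal sketch, not a library routine: the name `positional_value` is hypothetical, and it handles bases up to 16 only.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum each digit multiplied by base**position (rightmost position is 0)."""
    symbols = "0123456789ABCDEF"
    total = 0
    for position, char in enumerate(reversed(digits.upper())):
        digit = symbols.index(char)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"{char!r} is not a valid base-{base} digit")
        total += digit * base ** position
    return total

print(positional_value("2748", 10))  # 2*1000 + 7*100 + 4*10 + 8*1 = 2748
```

The same function works for any other base by changing the second argument.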



Octal Number System - Base 8

  • Uses digits 0-7 and is based on powers of 8
  • Each position represents a power of 8 increasing from right to left
  • Commonly used as a compact binary representation (groups of 3 bits)

Power   Expansion    Column Value   Digit   Digit × Column Value
8⁵      8×8×8×8×8    32,768
8⁴      8×8×8×8      4,096
8³      8×8×8        512            5       2,560
8²      8×8          64             2       128
8¹      8            8              7       56
8⁰      1            1              4       4

(5274)₈ = 5×8³ + 2×8² + 7×8¹ + 4×8⁰ = 2,748₁₀



Hexadecimal Number System - Base 16

  • Uses digits 0-9 and letters A-F
  • A = 10, B = 11, C = 12, D = 13, E = 14, F = 15
  • Based on powers of 16
  • Commonly used in computing and digital forensics to represent binary values more compactly (4 bits = 1 hex digit)

Power   Expansion         Column Value   Digit    Digit × Column Value
16⁵     16×16×16×16×16    1,048,576
16⁴     16×16×16×16       65,536
16³     16×16×16          4,096
16²     16×16             256            A (10)   2,560
16¹     16                16             B (11)   176
16⁰     1                 1              C (12)   12

(ABC)₁₆ = A×16² + B×16¹ + C×16⁰ = 10×256 + 11×16 + 12 = 2,748₁₀



Binary Number System - Base 2

  • Uses digits 0 and 1, and is based on powers of 2
  • Each bit (binary digit) represents a power of two, increasing from right to left
  • Foundation of all digital data, since every value in computing is expressed in binary form

Power   Expansion               Column Value   Bit   Bit × Column Value
2¹¹     2×2×2×2×2×2×2×2×2×2×2   2,048          1     2,048
2¹⁰     2×2×2×2×2×2×2×2×2×2     1,024          0     0
2⁹      2×2×2×2×2×2×2×2×2       512            1     512
2⁸      2×2×2×2×2×2×2×2         256            0     0
2⁷      2×2×2×2×2×2×2           128            1     128
2⁶      2×2×2×2×2×2             64             0     0
2⁵      2×2×2×2×2               32             1     32
2⁴      2×2×2×2                 16             1     16
2³      2×2×2                   8              1     8
2²      2×2                     4              1     4
2¹      2                       2              0     0
2⁰      1                       1              0     0

(101010111100)₂ = 2048 + 512 + 128 + 32 + 16 + 8 + 4 = 2,748₁₀


Decimal Binary Octal Hexadecimal
0 0000 0 0
1 0001 1 1
2 0010 2 2
3 0011 3 3
4 0100 4 4
5 0101 5 5
6 0110 6 6
7 0111 7 7
8 1000 10 8
9 1001 11 9
10 1010 12 A
11 1011 13 B
12 1100 14 C
13 1101 15 D
14 1110 16 E
15 1111 17 F

Each system expresses the same values in different bases:
Binary (Base 2) · Octal (Base 8) · Decimal (Base 10) · Hexadecimal (Base 16)
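
Python's built-in functions make it easy to check these conversions. The sketch below uses the value 2,748 from the earlier slides; it only relies on standard built-ins (`bin`, `oct`, `hex`, `int`) and format specifiers.

```python
value = 2748

print(bin(value))  # 0b101010111100
print(oct(value))  # 0o5274
print(hex(value))  # 0xabc

# int() with an explicit base converts the text form back to a number.
from_binary = int("101010111100", 2)
from_octal = int("5274", 8)
from_hex = int("ABC", 16)
print(from_binary == from_octal == from_hex == value)  # True

# Rebuild the 0-15 table: decimal, 4-bit binary, octal, hexadecimal.
rows = [f"{n} {n:04b} {n:o} {n:X}" for n in range(16)]
print("\n".join(rows))
```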



Storage Sizes

1 Bit Represents 0 or 1
1 Crumb 2 bits → (00, 01, 10, 11)
1 Nibble 4 bits → (0000 - 1111)
1 Byte 8 bits
1 Kilobyte (KB) 1024 bytes
1 Megabyte (MB) 1024 kilobytes
1 Gigabyte (GB) 1024 megabytes
1 Terabyte (TB) 1024 gigabytes
1 Petabyte (PB) 1024 terabytes
1 Exabyte (EB) 1024 petabytes

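Each unit from the kilobyte upward is 1,024 times larger than the one before it. (Storage manufacturers often use multiples of 1,000 instead, so an advertised capacity can look larger than what the operating system reports.)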
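
As a quick illustration of the 1,024 steps, the sketch below converts a byte count into the largest sensible unit. The function name `human_size` and the two-decimal formatting are choices made for this example, not a standard.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_size(num_bytes: int) -> str:
    """Divide by 1,024 until the value fits the current unit."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024

print(human_size(1024))         # 1.00 KB
print(human_size(5 * 1024**3))  # 5.00 GB
```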



Text



Question: How does the computer save text data?



1. Standard ASCII

  • A character encoding standard that represents each character as a number.
  • Standard (low) ASCII is a 7-bit code, giving 128 characters (0-127).
  • Stored in an 8-bit byte: 7 bits for data, with the eighth bit historically used for parity.
  • Low ASCII covers letters, digits, punctuation, and control codes.

ASCII (American Standard Code for Information Interchange)



2. Extended ASCII Table

The Extended ASCII set expands the standard 7-bit ASCII (0-127) to 8 bits (0-255).

  • Supports additional symbols, graphics, and accented characters.
  • Printable characters are 32-126 plus most of 128-255, depending on the code page.
  • There is no single extended set: code pages such as IBM PC code page 437, Windows-1252, and ISO 8859-1 each assign 128-255 differently.


ASCII Character Codes

Example: The letters c, a, t

Character     c          a          t
Decimal       99         97         116
Hexadecimal   63         61         74
Binary        01100011   01100001   01110100

Each letter has a unique ASCII value, which can be expressed in decimal, hexadecimal, or binary.
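
The three representations can be reproduced with Python's built-in `ord` and format specifiers. This is a small sketch for checking values by hand, nothing more.

```python
word = "cat"

for char in word:
    code = ord(char)  # numeric code of the character
    print(char, code, f"{code:02X}", f"{code:08b}")

# The same text as raw bytes, shown the way a hex editor would display it.
raw = word.encode("ascii")
print(raw.hex(" "))  # 63 61 74
```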



Can 256-character codes accommodate all the characters in other languages, e.g., Chinese, Japanese, Hindi, etc.?



3. Unicode

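Unicode greatly extends the number of available characters. It is the worldwide standard for processing, displaying, and interchanging text in all languages.

  • In the UTF-16 encoding (used internally by Windows), most characters take 2 bytes
  • Letter A
    • ASCII: 41h, Unicode (UTF-16, little-endian): 4100h

When searching for text data, always search in both Unicode and ASCII formats, because the same word is stored as a different byte sequence in each.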
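
To see why both searches are needed, the sketch below looks for one keyword in a byte buffer using both encodings. The helper `find_keyword` and the sample buffer are hypothetical, and UTF-16 little-endian is assumed as the Unicode form.

```python
def find_keyword(buffer: bytes, keyword: str) -> dict:
    """Return the first byte offset of keyword in each encoding (-1 if absent)."""
    return {
        "ascii": buffer.find(keyword.encode("ascii")),
        "utf-16-le": buffer.find(keyword.encode("utf-16-le")),
    }

print("A".encode("ascii").hex())      # 41
print("A".encode("utf-16-le").hex())  # 4100

# Hypothetical buffer: the same word stored once in each encoding.
buffer = b"\x00\x00" + "cat".encode("utf-16-le") + b"\xff" + b"cat"
hits = find_keyword(buffer, "cat")
print(hits)  # {'ascii': 9, 'utf-16-le': 2}
```

An ASCII-only search would miss the copy at offset 2, because a zero byte sits between each letter.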



Others: Base64 and ROT13

  • Base64: Encodes binary data into text for safe storage or transfer.

    • Alphabet: A–Z, a–z, 0–9, + and /, with = used as padding
    • Example: the cat sat on the mat → dGhlIGNhdCBzYXQgb24gdGhlIG1hdA==
    • Used to represent binary data in text form, e.g. emails, HTTP, APIs, embedded images.
  • ROT13: Simple substitution cipher that rotates letters by 13 places.

    • Alphabet: A–Z, a–z
    • Example: hello → uryyb
    • Used for simple obfuscation, puzzles, and teaching, not real security.
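
Both encodings are available in Python's standard library, which makes the examples easy to verify. The sketch below uses the `base64` module and the `rot13` codec from `codecs`.

```python
import base64
import codecs

text = "the cat sat on the mat"

# Base64: bytes in, text-safe characters out, and back again.
encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
decoded = base64.b64decode(encoded).decode("ascii")
print(encoded)  # dGhlIGNhdCBzYXQgb24gdGhlIG1hdA==
print(decoded)  # the cat sat on the mat

# ROT13: applying it twice returns the original text.
rotated = codecs.encode("hello", "rot13")
print(rotated)                          # uryyb
print(codecs.encode(rotated, "rot13"))  # hello
```

Neither is encryption: anyone can reverse them without a key.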


Endianness

Endianness refers to the order in which the bytes of a multi-byte value are stored in a computer’s memory.

  • Big-endian: the most significant byte is stored at the lowest storage address (i.e., first)
  • Little-endian: the least significant byte is stored at the lowest storage address (i.e., first)

  • Many mainframe computers, particularly IBM mainframes, use a big-endian architecture.

  • Most modern computers, including PCs, use the little-endian system.
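
Byte order is easiest to see by storing one number both ways. The sketch below uses an arbitrary example value, 0x12345678, with Python's `int.to_bytes` and the `struct` module.

```python
import struct
import sys

value = 0x12345678  # arbitrary 4-byte example value

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")
print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# struct uses ">" for big-endian and "<" for little-endian.
packed_big = struct.pack(">I", value)
unpacked_little = struct.unpack("<I", little)[0]
print(packed_big == big, unpacked_little == value)  # True True

# Byte order of the machine running this script.
print(sys.byteorder)
```

Reading little-endian bytes as if they were big-endian gives a completely different number, which matters when interpreting raw values in a hex editor.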



GRAPHICS AND VIDEO



Graphics and Video

  • Common image formats include:
    • Bitmap (.bmp)
    • GIF (.gif)
    • JPEG (.jpg)
    • PNG (.png)


Graphics

  • Digital cameras and scanners record the colour of each picture element, called a pixel.
  • Each small square in an image is a pixel, stored as one or more bytes.
  • A 4-megapixel camera captures about 4 million pixels to form one image.

The more pixels an image has, the higher its resolution.
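
A rough calculation shows how pixel count drives storage size. The sketch below assumes an uncompressed image with 3 bytes per pixel (24-bit colour) and a hypothetical 2,000 × 2,000 sensor; real files are usually smaller because formats such as JPEG compress the data.

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size: bytes_per_pixel bytes for every pixel."""
    return width * height * bytes_per_pixel

# Hypothetical 4-megapixel image.
width, height = 2000, 2000
pixels = width * height
size = raw_image_bytes(width, height)

print(f"{pixels:,} pixels")            # 4,000,000 pixels
print(f"{size:,} bytes uncompressed")  # 12,000,000 bytes uncompressed
print(f"{size / 1024**2:.2f} MB")      # 11.44 MB
```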



Video & Animation

  • Video and animation are sequences of single graphic images (frames) displayed over time.
  • A synchronised audio track may also be stored, aligning sound samples with each frame.
  • MPEG is one of the most common families of video formats:
    • Compresses frames using JPEG-like techniques and stores them in sequence with a synchronised audio track (e.g. MP3).

Motion = rapid display of still frames + synchronised sound.



FILE EXTENSIONS AND FILE HEADERS



File Types and Headers

  • A typical hard drive contains thousands of files.
  • These files come in many different types and formats.
  • File types are usually identified by their extensions, traditionally three letters (e.g., .jpg, .doc, .pdf).
  • Each file also includes a header - a small block of data that helps the system recognise its type and structure.

Extensions help users identify files, while headers help software and forensic tools verify them.



File Types and Extension Examples

Extension File Type / Description
.DOC Microsoft Word document
.XLS Microsoft Excel spreadsheet
.EXE Executable program
.BAT Batch command script
.JPG JPEG graphic image
.GIF GIF graphic image
.BMP Bitmap graphic image
.DLL Dynamic Link Library
.TXT Plain text file
.ZIP Compressed archive file

File extensions give users a quick way to identify format and function.



File Headers

  • The first bytes of a file form its signature or magic number.
  • These identify the file type and allow software to open it correctly.
  • If the header is changed or missing, the file may fail to open or be misidentified.
  • Headers often also store key metadata such as size and format.

Crucial for authenticating files in digital forensics.



What is Metadata?

Digital data often includes metadata: information about the characteristics, origin, and structure of the data.



File Headers / Signatures

Extension   Header (Hex Signature)       Associated Program / Format
.DOC        D0 CF 11 E0 A1 B1 1A E1 00   Microsoft Office Document
.EML        46 72 6F 6D                  Generic Email Message
.EXE        4D 5A                        Windows Executable
.GIF        47 49 46 38                  Graphic Interchange Format
.JPG        FF D8 FF E0                  JPEG Image
.MOV        73 6B 69 70                  QuickTime Movie
.PDF        25 50 44 46                  Adobe Acrobat File
.ZIP        50 4B 03 04                  PK Zip Archive (50 4B 07 08 for spanned archives)

The file header (magic number) reveals the true file type - even if the extension is changed.
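
A signature check can be sketched as a lookup on the first bytes of a file. The table below is a small illustrative subset, the function name `identify` is hypothetical, and the ZIP entry uses 50 4B 03 04, the signature that begins most ZIP files. Real forensic tools use far larger signature databases.

```python
# Illustrative subset of signatures (hex) and their file types.
SIGNATURES = {
    bytes.fromhex("FF D8 FF"): "JPEG image",
    bytes.fromhex("47 49 46 38"): "GIF image",
    bytes.fromhex("25 50 44 46"): "PDF document",
    bytes.fromhex("4D 5A"): "Windows executable",
    bytes.fromhex("50 4B 03 04"): "ZIP archive",
    bytes.fromhex("D0 CF 11 E0 A1 B1 1A E1"): "Microsoft Office (OLE) document",
}

def identify(header: bytes) -> str:
    """Return the file type whose signature matches the start of header."""
    for signature, description in SIGNATURES.items():
        if header.startswith(signature):
            return description
    return "unknown"

# A hypothetical file named holiday.txt whose content starts with a GIF header.
print(identify(b"GIF89a\x01\x00\x01\x00"))  # GIF image
print(identify(b"plain text content"))      # unknown
```

With a real file, the header would come from something like `open(path, "rb").read(16)`.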



Example: What file type is associated with this header?


Deleting or Modifying File Extensions

  • Adversaries can rename or remove extensions to hide files or evade detection.
  • Extension = user hint, not definitive proof of file type.
  • Use a hex editor or forensic tool to inspect the file header (magic number) to determine true content.
  • Always verify file identity before trusting extension-based evidence.


Activity

Identify the actual file type.
Decide whether the file extension matches the file signature.

File name    File signature (hex)   Actual file type (based on signature)   Extension match?
photo.jpg    FF D8 FF
report.txt   25 50 44 46
song.eni     52 49 46 46
program.m    4D 5A
notes.docx   50 4B 03 04


What is a Hex Editor?

A hex editor (or binary/byte editor) is a program that allows manipulation of the fundamental binary data that makes up a computer file.

  • The name “hex” comes from hexadecimal, the standard numerical format for representing binary data.
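
The view a hex editor presents can be approximated in a few lines. This is a simplified, read-only sketch: the function name `hexdump` and the layout (offset, hex bytes, printable characters) are choices made for this example, loosely following common hex editor conventions.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as lines of: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of hex pairs with spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as a dot, as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)

dump = hexdump(b"%PDF-1.7\nhello")
print(dump)
```

The first four bytes printed, 25 50 44 46, are the PDF signature.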


Finding a List of File Signatures and Related Resources

Useful resources for researching file headers, extensions, and magic numbers in forensic analysis.



Now that we understand data, how do we search for it?


Searching for Evidence (Keyword Searching)

  • A common technique in digital forensic investigations to quickly examine large volumes of data.
  • Typically carried out at the early stage of an investigation.
  • Focuses on identifying patterns, not exact values, such as:
    • email addresses, phone numbers, account IDs, tracking numbers.
  • Case-level keywords: Terms specific to the investigation.
  • Global keywords: Terms reused across multiple cases.

Use case-relevant keywords only (e.g. names, usernames, suspect terms).
Avoid wasting resources on unnecessary keywords.
Searches may be case-sensitive, depending on the tool.



GREP (Globally search for a Regular Expression and Print)

  • A powerful Linux search tool used in forensic investigations.
  • Searches input files, or standard input if no file is specified, for matching lines of data.
  • Supports regular expressions to identify patterns rather than exact text.

Key Features

  • Uses standard GREP symbols and special characters.
  • Allows custom and flexible searches.
  • Useful for quickly locating evidence across large datasets.
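
The core behaviour of grep, printing the lines that match a pattern, can be imitated in Python for experimentation. This is not grep itself: the function `grep` below is a hypothetical stand-in built on Python's `re` module, whose syntax differs slightly from grep's, and the sample lines are invented.

```python
import re

def grep(pattern: str, lines, ignore_case: bool = False):
    """Return (line number, line) pairs for lines containing a match."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [(n, line) for n, line in enumerate(lines, start=1)
            if regex.search(line)]

# Invented sample data.
log = [
    "login from alice@example.com",
    "no address on this line",
    "Contact: BOB@example.org",
]

pattern = r"[a-z0-9.]+@[a-z0-9.]+"
for number, line in grep(pattern, log, ignore_case=True):
    print(number, line)
```

Without `ignore_case=True` only line 1 matches, which shows how case sensitivity can cause a search to miss evidence.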


Common GREP / Regular Expression Characters

Symbol   Meaning
.        Wildcard: matches any single character. Escape it (\.) to match a literal full stop.
\d       Any digit, shorthand for [0-9]. (GNU grep supports \d only in Perl mode, grep -P.)
+        The preceding item must be present one or more times.
?        The preceding item may or may not be present (zero or one time).
*        The preceding item may be present zero, one, or multiple times.
[abc]    Matches one character from the set a, b, or c.
[ea]     Example: re[ea]d matches read and reed, but not red.
[^abc]   Matches one character that is not in the set. Example: re[^a]d matches reed but not read (and not red, which has no character between e and d).
[a-z]    Defines a range. Example: [0-9] matches any digit from 0 to 9.
{x,y}    Repeats the preceding item between x and y times.
A{2}     Matches AA (A repeated exactly two times).
a|b      OR operator: matches a or b.
\        Escape character, used to search for special characters (e.g. \. for a full stop).
( )      Grouping: treats the enclosed pattern as a unit. Example: (May|Jun) matches May or Jun.
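
These symbols can be tried out with Python's `re` module, which shares them with grep's extended syntax. The helper `matches` is hypothetical and uses `re.fullmatch`, so a word counts only if the whole word fits the pattern. Note that re[^a]d needs a character between e and d, so it matches reed rather than red.

```python
import re

def matches(pattern: str, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["red", "read", "reed", "reeed"]

print(matches(r"re[ea]d", words))  # ['read', 'reed']
print(matches(r"re[^a]d", words))  # ['reed']
print(matches(r"re+d", words))     # ['red', 'reed', 'reeed']
print(matches(r"rea?d", words))    # ['red', 'read']
print(matches(r"re{2}d", words))   # ['reed']
print(matches(r"May|Jun", ["May", "Jun", "Jul"]))  # ['May', 'Jun']

# \d{3} finds runs of exactly three digits, scanning left to right.
print(re.findall(r"\d{3}", "ext 123 and 4567"))  # ['123', '456']
```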


Remember: a repetition operator acts only on the preceding character (or group).



GREP in Autopsy


Activity: Regular Expression Practice

  1. [^a-z]?Liz[^a-z]?

    • Will this search find Elizabeth, Lizzy, or neither?
  2. What does this expression find? \d{4}[- ]{3}:\d{3}

  3. Write an expression to find Roehampton email addresses

    • e.g., First.Sure@Roehampton.ac.uk
    • e.g., First.sure@Roehampton.ac.uk


Lab



Computer Data

  • All computer data is fundamentally represented in binary.
  • Computers are electronic devices.
  • Electronic components represent data using two distinct states:
    • Off / On
    • Current flowing / Not flowing
    • Switch open / Closed