Probabilistic suffix models for API sequence analysis of Windows XP applications

Geoffrey Mazeroff, Jens Gregor, Michael Thomason, Richard Ford

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Given the pervasive nature of malicious mobile code (viruses, worms, etc.), developing statistical/structural models of code execution is of considerable importance. We investigate using probabilistic suffix trees (PSTs) and associated probabilistic suffix automata (PSAs) to build models of benign application behavior with the goal of subsequently being able to detect malicious applications as anything that deviates therefrom. We describe these probabilistic suffix models and present new generic analysis and manipulation algorithms. The models and the algorithms are then used in the context of API (i.e., system call) sequences realized by Windows XP applications. The analysis algorithms, when applied to traces (i.e., sequences of API calls) of benign and malicious applications, aid in choosing an appropriate modeling strategy in terms of distance metrics and consequently provide classification measures in terms of sequence-to-model matching. We give experimental results based on classification of unobserved traces of benign and malicious applications against a suffix model trained solely from traces generated by a small set of benign applications.
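The abstract describes training a suffix model on benign API-call traces and then classifying unseen traces by how well they match that model. The sketch below illustrates the general idea under our own assumptions and is not the authors' algorithm: it builds an unpruned, depth-bounded context tree (a simplified stand-in for a PST, with no pruning and no conversion to a PSA), predicts each call from the longest sufficiently frequent suffix of the preceding calls, and scores a trace by its average log-probability per call. The class name `SuffixModel`, the parameters `max_depth`, `min_count` and `alpha`, the scoring rule, and the API names in the example traces are all hypothetical.

```python
import math
from collections import defaultdict


class SuffixModel:
    """Hypothetical, simplified probabilistic suffix model (unpruned context tree)."""

    def __init__(self, max_depth=3, min_count=2, alpha=0.5):
        self.max_depth = max_depth  # longest context (suffix of history) considered
        self.min_count = min_count  # contexts observed fewer times are skipped
        self.alpha = alpha          # additive smoothing for unseen next calls
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, traces):
        """Count next-call frequencies for every context up to max_depth calls."""
        for trace in traces:
            self.alphabet.update(trace)
            for i, sym in enumerate(trace):
                # Record sym under each suffix (length 0..max_depth) ending before i.
                for d in range(min(self.max_depth, i) + 1):
                    self.counts[tuple(trace[i - d:i])][sym] += 1

    def prob(self, history, sym):
        """P(sym | history) using the longest sufficiently frequent suffix context."""
        nexts = None
        for d in range(min(self.max_depth, len(history)), -1, -1):
            nexts = self.counts.get(tuple(history[len(history) - d:]))
            if nexts is not None and sum(nexts.values()) >= self.min_count:
                break
        # After training, the empty context () always exists, so nexts is set here.
        total = sum(nexts.values())
        vocab = len(self.alphabet) + 1  # +1 reserves mass for never-seen calls
        return (nexts.get(sym, 0) + self.alpha) / (total + self.alpha * vocab)

    def score(self, trace):
        """Average log-probability per call; lower means less like the training data."""
        logp = sum(math.log(self.prob(trace[:i], s)) for i, s in enumerate(trace))
        return logp / max(len(trace), 1)


# Toy "benign" traces; the API names are illustrative only.
benign = [
    ["CreateFileW", "ReadFile", "CloseHandle"] * 5,
    ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"] * 5,
]
model = SuffixModel(max_depth=2)
model.train(benign)

benign_score = model.score(["CreateFileW", "ReadFile", "CloseHandle"] * 3)
odd_score = model.score(["VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"])
print(benign_score, odd_score)
```

A trace resembling the training data earns a higher (less negative) score than one built from calls never seen in training; where to place a rejection threshold, and which distance measure to use instead of average log-likelihood, is left open in this sketch.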

Original language: English (US)
Pages (from-to): 90-101
Number of pages: 12
Journal: Pattern Recognition
Volume: 41
Issue number: 1
DOI: 10.1016/j.patcog.2007.04.006
State: Published - Jan 1 2008

Fingerprint

Application programming interfaces (API)
Viruses
Statistical Models

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Probabilistic suffix models for API sequence analysis of Windows XP applications. / Mazeroff, Geoffrey; Gregor, Jens; Thomason, Michael; Ford, Richard.

In: Pattern Recognition, Vol. 41, No. 1, 01.01.2008, p. 90-101.


Mazeroff, Geoffrey ; Gregor, Jens ; Thomason, Michael ; Ford, Richard. / Probabilistic suffix models for API sequence analysis of Windows XP applications. In: Pattern Recognition. 2008 ; Vol. 41, No. 1. pp. 90-101.
@article{c95af57812d74b0e8fc2368d22e51c57,
title = "Probabilistic suffix models for API sequence analysis of Windows XP applications",
author = "Geoffrey Mazeroff and Jens Gregor and Michael Thomason and Richard Ford",
year = "2008",
month = "1",
day = "1",
doi = "10.1016/j.patcog.2007.04.006",
language = "English (US)",
volume = "41",
pages = "90--101",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",
number = "1",

}

TY - JOUR

T1 - Probabilistic suffix models for API sequence analysis of Windows XP applications

AU - Mazeroff, Geoffrey

AU - Gregor, Jens

AU - Thomason, Michael

AU - Ford, Richard

PY - 2008/1/1

Y1 - 2008/1/1

UR - http://www.scopus.com/inward/record.url?scp=34548040718&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548040718&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2007.04.006

DO - 10.1016/j.patcog.2007.04.006

M3 - Article

VL - 41

SP - 90

EP - 101

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

IS - 1

ER -