Overview

The primary role of the PREDETECT system is to perform accurate classifications at scale. Operating successfully under such conditions requires several components working together to achieve the desired level of performance.

Each component is described in detail below. At a high level, the PREDETECT system relies heavily on the following systems:

  • Node.js (API) – The entire user-facing API is based on Node.js.
  • Apache Kafka (Queueing) – After API requests to analyze message content are validated for authorization and schema, they are published to Kafka for later processing. In addition to data ingest, Kafka is also used for queueing messages between various processing systems.
  • Apache Storm (Processing) – Storm handles all processing of messages in the conversational analysis phase (tokenization, part-of-speech tagging, tree-building, rewriting, psycholinguistic analysis, etc.), the sequence analysis phase of grooming detection, and masquerading detection.
  • Redis (Storage) – All storage of user profiles and overall statistics is handled by Redis.

System Components

API

The entire REST-based API is implemented in Node.js.
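
As noted in the overview, requests to analyze message content are validated for authorization and schema before being published to Kafka. The sketch below illustrates that order of operations only. The request fields (token, userId, text), the token set, the topic name, the HTTP status codes, and the InMemoryPublisher class are all assumptions made for illustration; a real deployment would use an actual Kafka client and the system's own schema.

```javascript
// Hypothetical sketch: validate a request, then hand it to a publisher.
// Field names, tokens, topic name, and status codes are assumptions.

const VALID_TOKENS = new Set(["demo-token"]); // stand-in for a real authorization check

function validateRequest(req) {
  // Authorization is checked first, so unauthorized callers learn nothing about the schema.
  if (!VALID_TOKENS.has(req.token)) {
    return { ok: false, status: 401, error: "unauthorized" };
  }
  const body = req.body;
  const schemaOk =
    body !== null &&
    typeof body === "object" &&
    typeof body.userId === "string" && body.userId.length > 0 &&
    typeof body.text === "string" && body.text.length > 0;
  if (!schemaOk) {
    return { ok: false, status: 400, error: "invalid schema" };
  }
  return { ok: true, status: 202 };
}

// In-memory stand-in for a Kafka producer (not a real Kafka client).
class InMemoryPublisher {
  constructor() {
    this.messages = [];
  }
  publish(topic, payload) {
    this.messages.push({ topic, value: JSON.stringify(payload) });
  }
}

function handleAnalyzeRequest(req, publisher) {
  const result = validateRequest(req);
  if (result.ok) {
    // Only validated requests reach the queue for later processing.
    publisher.publish("incoming-messages", req.body);
  }
  return result;
}
```

Returning as soon as the message is queued, rather than waiting for analysis, is one way to keep the API responsive while processing happens downstream.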

Queueing

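All message queueing is handled by Apache Kafka, both for ingesting validated API requests and for passing messages between the various processing systems.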

Processing

Because the goal of PREDETECT is to provide start-to-finish processing of incoming messages in near real time (under 10 seconds), a stream-based processing system is required. Apache Storm was chosen for this critical task.
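
To make the stream-based model and the latency target concrete, the following sketch passes one message through a chain of stages and checks the elapsed time against the 10-second budget. It is plain JavaScript, not Storm code: in Storm the stages would be bolts in a topology. The stage functions, the message fields (text, receivedAt), and the names runPipeline and LATENCY_BUDGET_MS are hypothetical, and the whitespace tokenizer is only a placeholder for the real analysis steps.

```javascript
// Hypothetical sketch of per-message stream processing with a latency budget.
// This is not Storm code; stage names and message fields are assumptions.

const LATENCY_BUDGET_MS = 10000; // "under 10 seconds" target from the text

// Each stage takes a message object and returns a new one with added fields.
function tokenize(msg) {
  // Placeholder tokenizer: split on whitespace and drop empty strings.
  return { ...msg, tokens: msg.text.split(/\s+/).filter(Boolean) };
}

function countTokens(msg) {
  return { ...msg, tokenCount: msg.tokens.length };
}

// Runs the stages in order, then records how long the message took end to end.
// The clock is injectable so the budget check can be exercised deterministically.
function runPipeline(msg, stages, now = Date.now) {
  let current = msg;
  for (const stage of stages) {
    current = stage(current);
  }
  const elapsedMs = now() - msg.receivedAt;
  return { ...current, elapsedMs, withinBudget: elapsedMs < LATENCY_BUDGET_MS };
}
```

For example, calling runPipeline({ text: "hello there", receivedAt: Date.now() }, [tokenize, countTokens]) returns the enriched message along with its measured latency.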

Storage

Storage within the PREDETECT system is quite simple: Redis holds all user profiles and overall statistics.
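
As an illustration of how user profiles and overall statistics might be kept in a key-value store, the sketch below uses an in-memory class whose methods loosely mirror the Redis hash commands HSET, HGET, HINCRBY, and HGETALL. Everything here is an assumption: the key names (profile:<userId>, stats:global), the fields (lastSeen, messageCount), and the InMemoryHashStore class are hypothetical and do not describe PREDETECT's actual data layout.

```javascript
// Hypothetical sketch: an in-memory stand-in for Redis hashes.
// Key and field names are assumptions, not PREDETECT's real schema.

class InMemoryHashStore {
  constructor() {
    this.hashes = new Map(); // key -> Map(field -> string value)
  }
  hset(key, field, value) {
    if (!this.hashes.has(key)) {
      this.hashes.set(key, new Map());
    }
    this.hashes.get(key).set(field, String(value)); // values are stored as strings
  }
  hget(key, field) {
    const hash = this.hashes.get(key);
    return hash ? hash.get(field) : undefined;
  }
  hincrby(key, field, amount) {
    // A missing field is treated as 0, matching HINCRBY semantics.
    const next = Number(this.hget(key, field) ?? 0) + amount;
    this.hset(key, field, next);
    return next;
  }
  hgetall(key) {
    return Object.fromEntries(this.hashes.get(key) ?? []);
  }
}

// Updates one user's profile and the global counter; returns the global count.
function recordMessage(store, userId, timestampIso) {
  store.hset(`profile:${userId}`, "lastSeen", timestampIso);
  store.hincrby(`profile:${userId}`, "messageCount", 1);
  return store.hincrby("stats:global", "messageCount", 1);
}
```

Keeping each profile in its own hash lets a single field be updated or incremented without rewriting the whole record.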