Overview
The primary role of the PREDETECT system is to perform accurate classifications at scale. Reaching the desired level of performance under those conditions requires several components working together.
Each component is described in detail below. At a high level, the PREDETECT system relies heavily on the following systems:
- Node.js (API) – The entire user-facing API is based on Node.js.
- Apache Kafka (Queueing) – After API requests to analyze message content are validated for authorization and schema, they are published to Kafka for later processing. In addition to data ingest, Kafka is also used for queueing messages between various processing systems.
- Apache Storm (Processing) – Storm handles all processing of messages in the conversational analysis phase (tokenization, part-of-speech tagging, tree-building, rewriting, psycholinguistic analysis, etc.), the sequence analysis phase of grooming detection, and masquerading detection.
- Redis (Storage) – All storage of user profiles and overall statistics is handled by Redis.
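The ingest path described above (validate the request, then publish it for later processing) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the PREDETECT implementation: `queue.Queue` stands in for a Kafka topic, and the token set, request fields, status codes, and function names are all hypothetical.

```python
import queue

# Hypothetical stand-in for a Kafka topic; the real system publishes to Kafka.
ingest_topic = queue.Queue()

# Assumed authorization scheme, for illustration only.
VALID_TOKENS = {"demo-token"}


def is_authorized(request):
    """Return True if the request carries a known token."""
    return request.get("token") in VALID_TOKENS


def matches_schema(request):
    """Check a minimal, assumed schema: string user_id and string text."""
    return isinstance(request.get("user_id"), str) and isinstance(request.get("text"), str)


def handle_analyze_request(request):
    """Validate authorization and schema, then enqueue the message."""
    if not is_authorized(request):
        return 401
    if not matches_schema(request):
        return 400
    ingest_topic.put({"user_id": request["user_id"], "text": request["text"]})
    return 202  # accepted for later processing


accepted = handle_analyze_request({"token": "demo-token", "user_id": "u1", "text": "hello"})
unauthorized = handle_analyze_request({"token": "bad", "user_id": "u2", "text": "hello"})
malformed = handle_analyze_request({"token": "demo-token", "user_id": "u3"})
```

Only the first request reaches the queue. The other two are rejected before any processing cost is incurred, which is the point of validating ahead of the publish step.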
System Components
API
The entire REST-based API is implemented in Node.js.
Queueing
All message queueing is handled by Apache Kafka.
Processing
Because PREDETECT must process incoming messages from start to finish in near real time (under 10 seconds), a stream-based processing system is required. Apache Storm was chosen for this critical task.
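To make the latency requirement concrete, the sketch below runs a message through an ordered list of stages and checks the elapsed time against the 10-second target. It is plain Python rather than a Storm topology. The two stages are trivial placeholders for the real analysis phases, and `STAGES`, `run_pipeline`, and the message fields are hypothetical names. In Storm terms, each stage would roughly correspond to a bolt.

```python
import time

LATENCY_BUDGET_SECONDS = 10.0  # near-real-time target stated in the text


def tokenize(message):
    # Placeholder tokenizer: whitespace split only.
    message["tokens"] = message["text"].split()
    return message


def count_tokens(message):
    # Placeholder for the later analysis stages.
    message["token_count"] = len(message["tokens"])
    return message


STAGES = [tokenize, count_tokens]  # hypothetical stage list


def run_pipeline(message, received_at, clock=time.monotonic):
    """Run each stage in order and record whether the budget was met."""
    for stage in STAGES:
        message = stage(message)
    elapsed = clock() - received_at
    message["within_budget"] = elapsed < LATENCY_BUDGET_SECONDS
    return message


start = time.monotonic()
result = run_pipeline({"text": "are you home alone"}, received_at=start)

# A fake clock simulates a message that took 11 seconds end to end.
late = run_pipeline({"text": "x"}, received_at=0.0, clock=lambda: 11.0)
```

Measuring from the time the message was received, rather than from the start of processing, means time spent waiting in the queue counts against the budget as well.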
Storage
Storage within the PREDETECT system is quite simple: user profiles and overall statistics are kept in Redis.
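One way to picture this kind of storage is as a set of named hashes holding counters, one per user profile plus one for overall statistics. The sketch below uses an in-memory class instead of a Redis client. The key layout (`profile:<user_id>`, `stats:overall`), the field names, and `record_message` are assumptions made for illustration. The `hincrby` method imitates the semantics of the Redis HINCRBY command, where a missing field starts at 0.

```python
from collections import defaultdict


class InMemoryHashStore:
    """Tiny stand-in for a key-to-hash store; not a Redis client."""

    def __init__(self):
        self._hashes = defaultdict(dict)

    def hincrby(self, key, field, amount=1):
        # Missing fields start at 0, as with Redis HINCRBY.
        new_value = self._hashes[key].get(field, 0) + amount
        self._hashes[key][field] = new_value
        return new_value

    def hgetall(self, key):
        # Return a copy so callers cannot mutate stored state.
        return dict(self._hashes.get(key, {}))


def record_message(store, user_id, flagged):
    """Update one user's profile and the overall statistics."""
    store.hincrby(f"profile:{user_id}", "messages")  # hypothetical key layout
    store.hincrby("stats:overall", "messages")
    if flagged:
        store.hincrby(f"profile:{user_id}", "flagged")
        store.hincrby("stats:overall", "flagged")


store = InMemoryHashStore()
record_message(store, "u1", flagged=False)
record_message(store, "u1", flagged=True)
record_message(store, "u2", flagged=False)
```

After these three calls, the profile for `u1` holds two messages with one flagged, and the overall statistics hold three messages with one flagged.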