DEFENSE INTELLIGENCE AGENCY: PROVIDING MILITARY INTELLIGENCE TO WARFIGHTERS AND DEFENSE POLICYMAKERS

The Rapid Innovation Fund (RIF), administered by the Under Secretary of Defense for Research and Engineering (USD(R&E)) Small Business and Technology Partnerships (SBTP) office, provides a collaborative vehicle for small businesses to deliver innovative technologies to the department that can be rapidly inserted into acquisition programs meeting specific defense needs.

JANUS Research Group was tasked with reviewing and presenting the data gathered from the FAS2T-RIF prototype, ultimately to help determine whether the prototype should be a candidate for the National Media Exploitation Center (NMEC). By addressing technical risks and suggesting ways to improve the timeliness and thoroughness of test and evaluation outcomes, the JANUS team provided this data in support of a major defense acquisition program.

The Customer Challenge

The customer needed a partner to review the progress of the Fast Autonomous Sort, Search of Threats and Exploitation on Captured Media (FAS2T-RIF) prototype and to provide a high-level description of the results and the work required to achieve the FAS2T-RIF capabilities for captured media at the Defense Intelligence Agency (DIA).

The JANUS Approach

The JANUS Research Group team used artificial intelligence, machine learning, real-time modeling, and exploitation methodologies to close the gap between limited personnel and the time constraints inherent in large-scale media processing.

Platform Infrastructure

Cloud Agnostic/Stand Alone Environment

We found that the current prototype was deployed to Amazon Web Services (AWS) with full setup via Terraform scripts, and that a new deployable solution could be re-created from scratch on a new AWS instance in under 10 minutes.

By utilizing Nvidia DGX hardware for Graphics Processing Unit (GPU) computation, we were able to install and configure a standalone environment.

Microservices Approach to Orchestration

We developed a platform based on the principle of containerized solutions. Much like Lego bricks, each service, data processing task, or machine learning job is represented as a plug-and-play, swappable, packaged container.

- Docker: The leading container platform used to build, manage, secure, and deploy applications
- Kubernetes: The main container orchestration technology, providing elasticity and scalability from day one
- Helm charts: The Kubernetes package manager
- Apache Kafka: The message bus that handles communication between platform components

Each of these individual components specifies Central Processing Unit (CPU), memory, and GPU requirements that drive autoscaling whenever more resources must be provisioned for data processing.
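As a minimal sketch of how such requirements can be expressed (using the Python kubernetes client; the image, service, and namespace names are assumptions, not the deployed configuration), a processing service might declare CPU, memory, and GPU requests that the cluster autoscaler acts on:

```python
# Minimal sketch (assumed names): declaring resource requirements that
# drive Kubernetes autoscaling for a containerized processing service.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig

resources = client.V1ResourceRequirements(
    requests={"cpu": "2", "memory": "4Gi", "nvidia.com/gpu": "1"},
    limits={"cpu": "4", "memory": "8Gi", "nvidia.com/gpu": "1"},
)

container = client.V1Container(
    name="media-processor",  # hypothetical service name
    image="registry.example.com/media-processor:latest",  # hypothetical image
    resources=resources,
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="media-processor"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "media-processor"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "media-processor"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(
    namespace="data-ingestion", body=deployment
)
```

When the declared requests exceed what the current nodes can schedule, the cluster autoscaler provisions additional capacity, which is what gives the platform its elasticity.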

Data Ingestion Pipeline

Technology Stack

Our platform technology stack includes proven open-source technologies.

- Logstash Pipelines for data ingestion and transformation
- Elasticsearch for data enrichment
- Multiple input/output filters (S3/MinIO, Kafka, filesystem, Elasticsearch, etc.)
- Horizontal scalability of processing nodes
- Queue persistence for re-ingestion and disaster recovery
- ConfigMaps for configuring multiple pipelines within the Kubernetes data ingestion namespace
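As an illustrative sketch of this ingestion pattern (not the deployed Logstash configuration; the host, topic, and index names are assumptions), the flow of pulling records off the message bus, lightly transforming them, and indexing them for enrichment and search could look like this:

```python
# Illustrative sketch of the ingestion flow (assumed hosts, topic, and index):
# consume records from Kafka, transform them, and bulk-index into Elasticsearch.
import json

from elasticsearch import Elasticsearch, helpers
from kafka import KafkaConsumer

es = Elasticsearch("http://localhost:9200")
consumer = KafkaConsumer(
    "media-metadata",  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def to_action(record):
    """Map one raw record to an Elasticsearch bulk action."""
    return {
        "_index": "captured-media",  # hypothetical index name
        "_source": {
            "file_name": record.get("file_name"),
            "media_type": record.get("media_type"),
            "ingested": True,
        },
    }

batch = []
for message in consumer:
    batch.append(to_action(message.value))
    if len(batch) >= 500:  # flush in bulk for throughput
        helpers.bulk(es, batch)
        batch.clear()
```

In the platform itself this role is played by Logstash pipelines, which add queue persistence and horizontal scaling on top of the same consume-transform-index pattern.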

ML Kubernetes Cluster Architecture

Below is a diagram of the baseline Kubernetes architecture depicting the data ingestion namespace from an Interim Process Review (IPR). Kubernetes offers a loosely coupled mechanism for service discovery across a cluster. A Kubernetes cluster has one or more control planes and one or more compute nodes.

[Figure: Cluster Architecture]

Namespace

Below is a diagram of the current, more streamlined data ingestion namespace architecture that was implemented. Namespaces are an abstraction that enables Kubernetes to set boundaries for other resources. For example, by creating a namespace, we were able to assign Kubernetes resources such as services, pods, secrets, and ConfigMaps to it.

[Figure: Namespace]
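A minimal sketch of that pattern follows (using the Python kubernetes client; the namespace and ConfigMap names are assumptions): create the namespace, then scope resources to it.

```python
# Minimal sketch (assumed names): create a namespace and scope a ConfigMap to it.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Create the data-ingestion namespace used to bound ingestion resources.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="data-ingestion"))
)

# Assign a pipeline ConfigMap to that namespace.
core.create_namespaced_config_map(
    namespace="data-ingestion",
    body=client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name="ingest-pipeline-config"),  # hypothetical name
        data={"pipeline.workers": "4", "queue.type": "persisted"},
    ),
)
```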

Machine Learning (ML)/Deep Learning (DL) Framework

Approach

The JANUS team analyzed ML Pipelines, identifying the following goals:

- End-to-end orchestration: Enable and simplify the orchestration of machine learning pipelines.
- Easy experimentation: Make it easy to try out numerous ideas and techniques and manage various trials/experiments.
- Easy re-use: Ability to re-use components and pipelines to quickly assemble end-to-end solutions without having to rebuild each time.

Kubeflow Pipelines

Following our identification of goals, we created our Kubeflow pipelines. Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflows using containers: lightweight, standalone, executable packages of software that include everything needed to run an application, including code, runtime, system tools, system libraries, and settings. We established the following services necessary to perform analysis within a Kubeflow pipeline:

- AWS Identity and Access Management (IAM) Role
- AWS Simple Queue Service (SQS) Queues
- AWS Simple Notification Service (SNS) Topics
- AWS DynamoDB Table
- AWS Rekognition Collection
- AWS Lambda Functions
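For illustration only, and separate from the AWS services listed above, a skeleton Kubeflow pipeline built with the kfp Python SDK chains containerized steps as follows; the step names and logic are placeholder assumptions, not the delivered FAS2T-RIF pipeline.

```python
# Illustrative sketch using the Kubeflow Pipelines (kfp) SDK; step names and
# logic are placeholders, not the delivered FAS2T-RIF pipeline.
from kfp import compiler, dsl


@dsl.component
def list_media(bucket: str) -> str:
    """Placeholder step: enumerate captured media in an S3 bucket."""
    return f"s3://{bucket}/manifest.json"


@dsl.component
def analyze_media(manifest_uri: str) -> str:
    """Placeholder step: run image/video/document analysis on the manifest."""
    return f"analyzed:{manifest_uri}"


@dsl.pipeline(name="media-exploitation-sketch")
def media_pipeline(bucket: str = "captured-media"):
    manifest = list_media(bucket=bucket)
    analyze_media(manifest_uri=manifest.output)


if __name__ == "__main__":
    compiler.Compiler().compile(media_pipeline, "media_pipeline.yaml")
```

Each decorated component runs in its own container, which is what makes the steps re-usable and swappable across experiments.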

Document, Image and Video Analysis Algorithms

Our team worked with Amazon Rekognition's computer vision (CV) capabilities to extract information and insights from images and videos. Image analysis covered items such as object and scene detection, and facial analysis and recognition, producing a collection of faces stored in an AWS DynamoDB table and used for comparison during facial recognition; the collection was seeded with the faces of Saddam Hussein, Osama Bin Laden, Donald Trump, and George W. Bush. Video analysis used similar tools, as well as unsafe content detection, and resulted in 223 videos processed against "violence" criteria.
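As a hedged sketch of that flow (using boto3; the bucket, key, and collection names are assumptions), seeding a face collection and then searching new imagery against it looks roughly like this:

```python
# Sketch (assumed bucket/collection names): seed a Rekognition face collection,
# then search new images against it and run object/scene detection.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# One-time setup: a collection to hold the seeded reference faces.
rekognition.create_collection(CollectionId="seed-faces")

# Index a reference face from S3 into the collection.
rekognition.index_faces(
    CollectionId="seed-faces",
    Image={"S3Object": {"Bucket": "captured-media", "Name": "seed/reference.jpg"}},
    ExternalImageId="reference-subject",
)

# Search a newly ingested image for matches against the seeded faces.
matches = rekognition.search_faces_by_image(
    CollectionId="seed-faces",
    Image={"S3Object": {"Bucket": "captured-media", "Name": "images/sample.jpg"}},
    FaceMatchThreshold=90,
)

# Object and scene detection on the same image.
labels = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "captured-media", "Name": "images/sample.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

print([m["Face"]["ExternalImageId"] for m in matches["FaceMatches"]])
print([label["Name"] for label in labels["Labels"]])
```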

Our team also worked with Amazon Comprehend, a natural language processing (NLP) service that uses ML to uncover insights and relationships in text. Specifically, we worked with items such as key phrase extraction, language detection, and topic modeling to help process over 100,000 documents in the search for "President Bush".
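A similarly hedged sketch of the text path (the sample text and keyword check are placeholders, not the production logic):

```python
# Sketch of the text analysis path: language detection and key phrase
# extraction with Amazon Comprehend, plus a simple keyword check.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

document_text = "President Bush addressed the coalition forces ..."  # placeholder text

language = comprehend.detect_dominant_language(Text=document_text)
dominant = language["Languages"][0]["LanguageCode"]

phrases = comprehend.detect_key_phrases(Text=document_text, LanguageCode=dominant)
key_phrases = [p["Text"] for p in phrases["KeyPhrases"]]

# Flag documents that mention the search term of interest.
if any("president bush" in phrase.lower() for phrase in key_phrases):
    print("Document flagged:", key_phrases)
```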

JANUS Findings

The JANUS team demonstrated, via minimum viable products, analytic and sense-making capabilities for big-data, unclassified information environments. The platform processed a raw test dataset of 200 GB in less than 36 hours on an AWS General Purpose M5 instance with 8 Central Processing Units (CPUs). The team processed over 36,000 images, 220 videos, and 100,000 documents. This work resulted in five image and video features and six document features.

[Figure: JANUS Findings]

The test data was recovered in the raid on Usama Bin Laden's compound in Abbottabad, Pakistan, on May 2, 2011.

Author

Nosika Fisher
