
DEFENSE INTELLIGENCE AGENCY: PROVIDING MILITARY INTELLIGENCE TO WARFIGHTERS AND DEFENSE POLICYMAKERS
The Rapid Innovation Fund (RIF), administered by the Under Secretary of Defense for Research and Engineering (USD(R&E)) Small Business and Technology Partnerships (SBTP), gives small businesses a collaborative vehicle for delivering innovative technologies to the department that can be rapidly inserted into acquisition programs to meet specific defense needs.
JANUS Research Group was tasked with reviewing and presenting the data gathered from the FAS2T-RIF prototype to help decide whether the prototype should be a candidate for the National Media Exploitation Center (NMEC). By addressing technical risks and suggesting ways to improve the timeliness and thoroughness of test and evaluation outcomes, JANUS and its team provided this data in support of a major defense acquisition program.
The Customer Challenge
The customer needed a partner to review the progress of the Fast Autonomous Sort, Search of Threats and Exploitation on Captured Media (FAS2T-RIF) prototype and to provide a high-level description of the results, as well as the work required to achieve FAS2T-RIF's capabilities for captured media at the Defense Intelligence Agency (DIA).
The JANUS Approach
The JANUS Research Group team used artificial intelligence, machine learning, real-time modeling, and exploitation methodologies to close the gap between limited personnel and the time constraints of large-scale media processing.
Platform Infrastructure
Cloud Agnostic/Stand Alone Environment
We found that the current prototype was deployed to Amazon Web Services (AWS) and fully provisioned through Terraform scripts, so a new deployment could be re-created from scratch on a new AWS instance in under 10 minutes.
Using NVIDIA DGX hardware for Graphics Processing Unit (GPU) computation, we installed and configured a standalone environment.
Microservices Approach to Orchestration
We developed a platform based on the principle of containerized solutions. Much like Lego bricks, each service, data processing job, or machine learning job is packaged as a plug-and-play, swappable container.
- Docker: a leading container platform for building, managing, securing, and deploying applications
- Kubernetes: the main container orchestration technology, providing elasticity and scalability from day one
- Helm charts: packages for Helm, the Kubernetes package manager
- Apache Kafka: a message bus that handles communication between platform components
Each component declares its own Central Processing Unit (CPU), memory, and GPU requirements, which drive autoscaling when more resources must be provisioned for data processing.
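In Kubernetes, this kind of autoscaling is typically handled by the Horizontal Pod Autoscaler (HPA), whose documented rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change while the ratio stays within a tolerance band (0.1 by default). The sketch below reproduces only that rule for a single utilization metric; the function name, the min/max bounds, and the utilization figures are hypothetical and are not the prototype's actual configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the Kubernetes HPA scaling rule for one resource metric.

    Sketch only: the real controller also accounts for pod readiness,
    missing metrics, and scale-down stabilization windows.
    """
    ratio = current_utilization / target_utilization
    # Inside the tolerance band, the HPA leaves the replica count unchanged.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp the proposal to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, proposed))


# Hypothetical example: 3 workers averaging 90% CPU against a 60% target
# scale out to ceil(3 * 1.5) = 5 replicas.
print(desired_replicas(3, 90.0, 60.0))  # 5
```

Because the rule is proportional, a workload running at twice its target roughly doubles its replicas in one step, while small fluctuations inside the tolerance band do not cause churn.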
Data Ingestion Pipeline
Technology Stack
Our platform's technology stack is built on proven open-source technologies:
- Logstash Pipelines for data ingestion and transformation
- Elasticsearch for data enrichment
- Multiple input/output filters (S3/Minio, Kafka, FileSystem, Elasticsearch, etc.)
- Horizontal scalability of processing nodes
- Queue persistence for re-ingestion and disaster recovery
- ConfigMaps for configuring multiple pipelines within the Kubernetes architecture
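Logstash pipelines follow an input, filter, output model, and its persistent queue is designed to hold events until the pipeline has processed them, which is what makes re-ingestion after a failure possible. The following sketch models that flow in plain Python rather than Logstash itself; the `SketchPipeline` class, the in-memory deque standing in for the persistent queue, and the `enrich` lookup table are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]


class SketchPipeline:
    """Hypothetical input -> filter -> output pipeline with ack-on-success queueing."""

    def __init__(self, filters: List[Callable[[Event], Event]],
                 output: Callable[[Event], None]) -> None:
        self.queue: Deque[Event] = deque()  # stand-in for a persistent queue
        self.filters = filters
        self.output = output

    def ingest(self, event: Event) -> None:
        """Input stage: accept a raw event into the queue."""
        self.queue.append(event)

    def drain(self) -> int:
        """Run queued events through filters and output; return how many were acknowledged."""
        acked = 0
        while self.queue:
            event = self.queue[0]  # peek without removing
            for transform in self.filters:
                event = transform(event)  # this sketch's filters return new dicts
            try:
                self.output(event)
            except OSError:
                break  # output failed: leave the event queued for re-ingestion
            self.queue.popleft()  # output succeeded: acknowledge (remove) the event
            acked += 1
        return acked


MEDIA_TYPES = {"jpg": "image", "mp4": "video", "pdf": "document"}  # hypothetical lookup


def enrich(event: Event) -> Event:
    """Hypothetical enrichment filter: tag each file with a media type from its extension."""
    ext = event["path"].rsplit(".", 1)[-1].lower()
    return {**event, "media_type": MEDIA_TYPES.get(ext, "unknown")}


def unavailable_sink(event: Event) -> None:
    """Simulates a downstream outage."""
    raise OSError("output unavailable")


# While the output is down, every event stays queued...
pipeline = SketchPipeline([enrich], unavailable_sink)
for path in ["a/photo.JPG", "b/clip.mp4", "c/report.pdf"]:
    pipeline.ingest({"path": path})
print(pipeline.drain(), len(pipeline.queue))  # 0 3

# ...and once it recovers, the same queue is replayed.
sink: List[Event] = []
pipeline.output = sink.append
print(pipeline.drain(), [e["media_type"] for e in sink])  # 3 ['image', 'video', 'document']
```

Removing an event only after the output succeeds is the design choice that matters here: a crash or outage never loses data, at the cost of possibly delivering an event twice if the output succeeds but the acknowledgment does not.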
ML Kubernetes Cluster Architecture
Below is a diagram of the baseline Kubernetes architecture, depicting the data ingestion namespace, as presented at an Interim Process Review (IPR). Kubernetes offers a loosely coupled mechanism for service discovery across a cluster. A Kubernetes cluster has one or more control planes and one or more compute nodes.

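Inside a cluster, Kubernetes service discovery is typically done through DNS: a Service named `my-svc` in namespace `my-ns` is reachable at `my-svc.my-ns.svc.cluster.local` (assuming the default `cluster.local` cluster domain), and pods in the same namespace can use the short name alone. The sketch below only mimics that naming rule with a dictionary; the service names, namespaces, and IP addresses are hypothetical.

```python
from typing import Dict, Optional

CLUSTER_DOMAIN = "cluster.local"  # Kubernetes default; configurable per cluster


def service_fqdn(service: str, namespace: str, domain: str = CLUSTER_DOMAIN) -> str:
    """Return the DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{domain}"


# Hypothetical registry standing in for cluster DNS: FQDN -> ClusterIP.
registry: Dict[str, str] = {
    service_fqdn("kafka", "data-ingestion"): "10.96.0.20",
    service_fqdn("elasticsearch", "data-ingestion"): "10.96.0.21",
}


def resolve(name: str, caller_namespace: str) -> Optional[str]:
    """Resolve a short name relative to the caller's namespace; treat dotted names as FQDNs.

    Real resolvers also expand partial names (e.g. "kafka.data-ingestion")
    through DNS search domains; that step is omitted in this sketch.
    """
    if "." not in name:
        name = service_fqdn(name, caller_namespace)
    return registry.get(name)


print(resolve("kafka", "data-ingestion"))  # 10.96.0.20
print(resolve("kafka", "ml"))              # None: short names resolve only in the caller's namespace
```

Callers depend on a stable name rather than on pod addresses, which is what keeps the services loosely coupled as pods are rescheduled or scaled.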
Namespace
Below is a diagram of the updated, more streamlined data ingestion namespace architecture that was implemented. Namespaces are an abstraction that lets Kubernetes set boundaries around other resources. For example, creating a namespace allowed us to assign Kubernetes resources such as services, pods, secrets, and ConfigMaps to it.

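A namespace boundary is expressed in the manifests themselves: a `Namespace` object is created, and each namespaced resource names it in `metadata.namespace`. The sketch below generates two such manifests as Python dicts, including a ConfigMap that holds a Logstash `pipelines.yml` multi-pipeline definition; the namespace, ConfigMap, and pipeline names and the file paths are hypothetical, not the prototype's actual configuration.

```python
import json
from typing import Any, Dict, List


def namespace(name: str) -> Dict[str, Any]:
    """Build a minimal Namespace manifest."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def config_map(name: str, ns: str, data: Dict[str, str]) -> Dict[str, Any]:
    """Build a ConfigMap; metadata.namespace places it inside the namespace boundary."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": ns},
        "data": data,
    }


NS = "data-ingestion"  # hypothetical namespace name

# Hypothetical Logstash multi-pipeline file (pipelines.yml), one entry per pipeline.
PIPELINES_YML = (
    "- pipeline.id: images\n"
    '  path.config: "/usr/share/logstash/pipeline/images.conf"\n'
    "- pipeline.id: documents\n"
    '  path.config: "/usr/share/logstash/pipeline/documents.conf"\n'
)

manifests: List[Dict[str, Any]] = [
    namespace(NS),
    config_map("logstash-pipelines", NS, {"pipelines.yml": PIPELINES_YML}),
]

# kubectl accepts JSON manifests as well as YAML.
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```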
Machine Learning (ML)/Deep Learning (DL) Framework
Approach
The JANUS team analyzed ML pipelines, identifying the following goals:
- End-to-end orchestration: Enable and simplify the orchestration of machine learning pipelines.
- Easy experimentation: Make it easy to try out numerous ideas and techniques and manage various trials/experiments.
- Easy re-use: The ability to reuse components and pipelines to quickly assemble end-to-end solutions without rebuilding them each time.
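The three goals can be illustrated with a toy, framework-free sketch: small steps that share one signature are reused across pipelines, chained end to end, and re-run over a parameter grid as separate trials. This is not Kubeflow code; the step names, the `min_words` parameter, the toy corpus, and the "kept" metric are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Params = Dict[str, Any]
Step = Callable[[List[str], Params], List[str]]


def normalize(docs: List[str], params: Params) -> List[str]:
    """Reusable step: lowercase and trim each document."""
    return [d.strip().lower() for d in docs]


def drop_short(docs: List[str], params: Params) -> List[str]:
    """Reusable step: keep documents with at least params['min_words'] words."""
    return [d for d in docs if len(d.split()) >= params["min_words"]]


def run_pipeline(steps: List[Step], docs: List[str], params: Params) -> List[str]:
    """End-to-end orchestration: feed each step's output into the next."""
    for step in steps:
        docs = step(docs, params)
    return docs


def run_experiments(steps: List[Step], docs: List[str],
                    grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Easy experimentation: one trial per parameter combination, with a simple metric."""
    keys = sorted(grid)
    trials = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        kept = run_pipeline(steps, docs, params)
        trials.append({"params": params, "kept": len(kept)})
    return trials


corpus = ["Captured media report", "short", "Another longer document here"]  # toy data
trials = run_experiments([normalize, drop_short], corpus, {"min_words": [1, 3, 4]})
for trial in trials:
    print(trial)
```

Because every step takes and returns the same types, a new pipeline is just a different list of existing steps, which is the reuse property the goals describe.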
Kubeflow Pipelines
With these goals identified, we built our workflows on Kubeflow Pipelines, a platform for building and deploying portable, scalable ML workflows using containers. A container is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings. We established the following services needed to perform analysis within a Kubeflow pipeline:
- AWS Identity and Access Management (IAM) Role
- AWS Simple Queue Service (SQS) Queues
- AWS Simple Notification Service (SNS) Topics
- AWS DynamoDB Table
- AWS Rekognition Collection
- AWS Lambda Functions
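One common way these services fit together, assumed here for illustration, is an asynchronous fan-out: for stored-video analysis, Rekognition publishes a job-completion message to an SNS topic (using an IAM role that permits publishing), SNS copies the message to each subscribed SQS queue, and a Lambda function or worker reads a queue and records results in DynamoDB. The sketch below simulates that flow in memory; the `Topic` class, topic and queue names, and message fields are hypothetical stand-ins and not the AWS SDK.

```python
import json
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """In-memory stand-in for an SNS topic: each message is copied to every subscriber."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.subscribers: List[Deque[str]] = []

    def subscribe(self, queue: Deque[str]) -> None:
        self.subscribers.append(queue)

    def publish(self, message: Dict[str, str]) -> None:
        body = json.dumps(message)
        for queue in self.subscribers:
            queue.append(body)


def handle_queue(queue: Deque[str], table: Dict[str, str]) -> int:
    """Lambda-like consumer: record each job's status in a dict standing in for DynamoDB."""
    handled = 0
    while queue:
        message = json.loads(queue.popleft())
        table[message["JobId"]] = message["Status"]
        handled += 1
    return handled


video_jobs = Topic("video-analysis-complete")  # hypothetical topic name
results_queue: Deque[str] = deque()             # hypothetical queue feeding the consumer
audit_queue: Deque[str] = deque()               # hypothetical second subscriber
video_jobs.subscribe(results_queue)
video_jobs.subscribe(audit_queue)

# Hypothetical completion notifications (field names assumed for illustration).
video_jobs.publish({"JobId": "job-1", "Status": "SUCCEEDED"})
video_jobs.publish({"JobId": "job-2", "Status": "FAILED"})

status_table: Dict[str, str] = {}
print(handle_queue(results_queue, status_table), status_table)
# 2 {'job-1': 'SUCCEEDED', 'job-2': 'FAILED'}
print(len(audit_queue))  # 2: each subscriber received its own copy
```

Putting a queue between the topic and the consumer means results are not lost if the consumer is briefly unavailable; they simply wait to be read.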
Document, Image and Video Analysis Algorithms
Our team used Amazon Rekognition's computer vision (CV) capabilities to extract information and insights from images and videos. Image analysis covered object and scene detection as well as facial analysis and recognition. For facial recognition, we built a collection of faces, stored in an AWS DynamoDB Table and seeded with the faces of Saddam Hussein, Osama Bin Laden, Donald Trump, and George W. Bush, against which detected faces were compared. Video analysis used similar tools along with unsafe content detection, processing 223 videos against "violence" criteria.
Our team also worked with Amazon Comprehend, a natural language processing (NLP) service that uses ML to uncover insights and relationships in text. Specifically, we used key phrase extraction, language detection, and topic modeling to process over 100,000 documents and search them for "President Bush".
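With a Rekognition face collection, a lookup such as `SearchFacesByImage` returns `FaceMatches`, each carrying a `Similarity` score and a `Face` record with a `FaceId`; a common pattern, assumed here, is to keep a separate table mapping each `FaceId` to a person. The sketch below post-processes a hand-written response of that shape, with a plain dict standing in for the DynamoDB table. The service can apply a cutoff itself through its `FaceMatchThreshold` parameter; filtering client-side is shown only to make the logic visible. The IDs, labels, scores, and the 90% threshold are hypothetical, and no AWS call is made.

```python
from typing import Any, Dict, List

# Hypothetical DynamoDB stand-in: FaceId -> person label recorded at indexing time.
FACE_TABLE: Dict[str, str] = {
    "face-0001": "person_a",
    "face-0002": "person_b",
}

# Hand-written example shaped like a SearchFacesByImage response (not real output).
sample_response: Dict[str, Any] = {
    "FaceMatches": [
        {"Similarity": 81.2, "Face": {"FaceId": "face-0002"}},
        {"Similarity": 98.7, "Face": {"FaceId": "face-0001"}},
    ]
}


def identify(response: Dict[str, Any], threshold: float = 90.0) -> List[str]:
    """Return labels for matches at or above the similarity threshold, best first."""
    matches = sorted(response.get("FaceMatches", []),
                     key=lambda m: m["Similarity"], reverse=True)
    return [FACE_TABLE.get(m["Face"]["FaceId"], "unknown")
            for m in matches if m["Similarity"] >= threshold]


print(identify(sample_response))        # ['person_a']
print(identify(sample_response, 80.0))  # ['person_a', 'person_b']
```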
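Comprehend's `DetectKeyPhrases` returns a `KeyPhrases` list in which each entry has `Text`, `Score`, `BeginOffset`, and `EndOffset`. The sketch below shows how such results could be filtered to find documents mentioning a target phrase; the responses are hand-written examples of that shape (not real output), and the document IDs and the 0.9 score cutoff are hypothetical.

```python
from typing import Any, Dict, List

# Hand-written examples shaped like DetectKeyPhrases responses (not real output).
responses: Dict[str, Dict[str, Any]] = {
    "doc-001.txt": {"KeyPhrases": [
        {"Text": "President Bush", "Score": 0.99, "BeginOffset": 10, "EndOffset": 24},
        {"Text": "the speech", "Score": 0.95, "BeginOffset": 30, "EndOffset": 40},
    ]},
    "doc-002.txt": {"KeyPhrases": [
        {"Text": "the weather", "Score": 0.97, "BeginOffset": 0, "EndOffset": 11},
    ]},
    "doc-003.txt": {"KeyPhrases": [
        {"Text": "former president bush", "Score": 0.62, "BeginOffset": 5, "EndOffset": 26},
    ]},
}


def documents_mentioning(target: str, results: Dict[str, Dict[str, Any]],
                         min_score: float = 0.9) -> List[str]:
    """Return IDs of documents with a confident key phrase containing the target (case-insensitive)."""
    needle = target.casefold()
    hits = [doc_id for doc_id, resp in results.items()
            if any(needle in kp["Text"].casefold() and kp["Score"] >= min_score
                   for kp in resp["KeyPhrases"])]
    return sorted(hits)


print(documents_mentioning("President Bush", responses))       # ['doc-001.txt']
print(documents_mentioning("President Bush", responses, 0.5))  # ['doc-001.txt', 'doc-003.txt']
```

Lowering the score cutoff trades precision for recall, which matters when scanning a large collection for every possible mention.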
JANUS Findings
Through minimum viable products, the JANUS team demonstrated analytic and sense-making capabilities for big-data, unclassified information environments. The platform processes a 200GB raw test dataset in less than 36 hours on an AWS General Purpose M5 instance with 8 CPUs. The team processed over 36,000 images, 220 videos, and 100,000 documents. This work resulted in five image and video features and six document features.
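Because the run finished in under 36 hours, dividing the reported totals by 36 hours gives a floor on sustained throughput. The arithmetic below uses only the figures reported in this section and assumes decimal units (1 GB = 1,000 MB).

```python
DATASET_GB = 200  # reported raw test dataset size (decimal GB assumed)
MAX_HOURS = 36    # reported upper bound on processing time
ITEMS = {"images": 36_000, "videos": 220, "documents": 100_000}  # reported counts


def min_throughput(size_gb: float, hours: float) -> dict:
    """Lower-bound rates implied by processing size_gb in at most `hours`."""
    seconds = hours * 3600
    return {
        "gb_per_hour": size_gb / hours,
        "mb_per_second": size_gb * 1000 / seconds,
    }


rates = min_throughput(DATASET_GB, MAX_HOURS)
total_items = sum(ITEMS.values())
items_per_hour = total_items / MAX_HOURS

print(f"{rates['gb_per_hour']:.2f} GB/h, {rates['mb_per_second']:.2f} MB/s")  # 5.56 GB/h, 1.54 MB/s
print(f"{total_items:,} items, {items_per_hour:,.0f} items/h")               # 136,220 items, 3,784 items/h
```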

The test data was recovered in the raid on Usama Bin Laden's compound in Abbottabad, Pakistan, on May 2, 2011.
Author
Nosika Fisher