A new project examining the activities of gangs and drug cartels will give START a chance to work alongside two other Department of Homeland Security (DHS) Science and Technology Centers of Excellence (COE): the Criminal Investigations and Network Analysis Center (CINA) and the Center for Accelerating Operational Efficiency (CAOE).
The project, Tracking Cartels: Exploiting Open Sources to Identify Trends, will gather relevant data from open source materials and build prototypes of tools to streamline information gathering, dissemination and the visualization of vital data points in Mexico, Guatemala, Honduras and El Salvador.
“It’s exciting to be harnessing the power of multiple centers of excellence to combat drug cartels and violent transnational criminal organizations in Mexico and Central America,” START Project Lead Marcus Boyd said. Boyd heads up START’s Geospatial Research Unit and serves as the director of graduate studies.
This collaboration will offer valuable tools to agencies across the federal government that need to collect, analyze and integrate open-source information related to Mexican and Central American criminal organizations.
“We want to create tools and data that can be used by practitioners, not just write peer-reviewed articles for academic journals,” said START Research Director Amy Pate.
This new project grew out of a pilot study, Open-Source Geospatial Analysis of Criminal Activity, which was completed in 2018.
In the pilot project, the START team pulled news articles from a large dataset and used keywords to focus on the region of interest.
“About two years ago we started the pilot project, and we were looking at any type of activities that the cartels were involved with,” Boyd said. “At first we just wanted to see how much we could catalog using open source news articles, but then we wanted to see if we could do more, both with the gazetteer and with AI and machine learning.”
The innovative Gazetteer Search Tool the team is developing is a type of directory that can be used alongside a map, and contains information about the social features of a particular region.
“The purpose is to integrate data points that are not usually found on a map and link them to other data so we can see what we’re not aware of,” Boyd said.
As a proof of concept, the pilot study highlighted that weapons tended to flow from the United States to Mexico, while drugs tended to flow from Mexico to the United States.
The pilot study also brought to light some challenges the researchers would face, such as deciding which words to include in the catalog of names used to pinpoint articles of interest. Including the wrong words in the catalog would have caused the tool to select articles that were not useful to the researchers.
“The word ‘cartel,’ for example, can be used for legitimate organizations, such as oil cartels. Because of this potential for confusion, we have to tell the tool gathering the articles to rank an article as more relevant to our project if the article includes the names of specific illegal cartels,” Boyd said.
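The ranking idea Boyd describes can be sketched in a few lines of Python. This is only an illustration, not the project's actual tool: the group names are placeholders, and the weights (1 for the generic word, 5 for a named group) are assumptions chosen to make the effect visible.

```python
import re

# Hypothetical watch list: placeholder names, not the project's real catalog.
SPECIFIC_GROUPS = ["example cartel alpha", "example cartel beta"]

# The generic word alone is weak evidence, since it also covers oil cartels.
GENERIC_PATTERN = re.compile(r"\bcartels?\b")

# Assumed weights, for illustration only.
GENERIC_WEIGHT = 1
SPECIFIC_WEIGHT = 5


def relevance_score(text: str) -> int:
    """Score an article higher when it names a specific listed group."""
    lowered = text.lower()
    score = 0
    if GENERIC_PATTERN.search(lowered):
        score += GENERIC_WEIGHT
    for name in SPECIFIC_GROUPS:
        if name in lowered:
            score += SPECIFIC_WEIGHT
    return score


def rank_articles(articles: list) -> list:
    """Return articles ordered from most to least relevant."""
    return sorted(articles, key=relevance_score, reverse=True)
```

Under these assumptions, an article that mentions only an oil cartel still receives a small score, but it ranks below any article that names a group on the watch list.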
Another issue is regional slang and colloquialisms: local people may have a specific name for a particular place that does not appear on any map. The team also said that it is interested in both English- and Spanish-language sources, as well as local dialects and other languages, because cartels from all over the world are operating in the area.
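One way a gazetteer could handle local place names is to store them as aliases that resolve to the same entry as the official name. The sketch below is a hypothetical illustration, not a description of the Gazetteer Search Tool itself: the `GazetteerEntry` fields, the function names and the sample place are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class GazetteerEntry:
    """A place record; the fields here are assumed for illustration."""
    official_name: str
    latitude: float
    longitude: float
    aliases: Set[str] = field(default_factory=set)  # local or colloquial names


def build_alias_index(entries: List[GazetteerEntry]) -> Dict[str, GazetteerEntry]:
    """Map every official name and alias (case-insensitively) to its entry."""
    index: Dict[str, GazetteerEntry] = {}
    for entry in entries:
        for name in {entry.official_name, *entry.aliases}:
            index[name.casefold()] = entry
    return index


def resolve_place(name: str, index: Dict[str, GazetteerEntry]) -> Optional[GazetteerEntry]:
    """Look up a place by any known name; return None if it is unknown."""
    return index.get(name.strip().casefold())


# Placeholder data: an invented town with an invented local nickname.
entries = [GazetteerEntry("Example Town", 0.0, 0.0, {"El Ejemplo"})]
index = build_alias_index(entries)
match = resolve_place("el ejemplo", index)
```

Here `match` is the entry for the placeholder "Example Town", even though the lookup used the local nickname rather than the name printed on a map.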
“We are translating articles that come from a non-U.S. perspective, and we’re trying to use the most recent data – a lot of datasets from Mexico are from between 2006 to 2011, but more datasets have been released since 2017,” START Co-PI Samuel Henkin said.
The team will use only open-source information, including municipal government sources – such as the Mexican census and local victimization surveys – and news articles pulled from Google BigQuery.
The START team will collect the articles and build the gazetteer, while CAOE will evaluate existing tools, develop AI and machine learning models, design and test deep learning, and create exploratory models. CINA will be responsible for node detection and enhancing the quality of the gazetteer. All of this will result in the creation of visual analytics, which will allow practitioners to utilize the results.