Tuesday, October 9, 2012

Boeing’s Data-Analytics Tool Helps Agencies

In the search for tools that sift vast amounts of data, a multitude of U.S. agencies have turned to TAC, a Boeing software product that can swiftly and automatically apply thousands of sophisticated queries to documents, data streams, and structured and unstructured data sets.

First deployed in 2005 and updated every six months, TAC “functions completely differently from anything else out there,” said Charles Fleischman, chief technical officer of Boeing’s Intelligence Systems Group. “You don’t ask it a question like on Google.”

Instead, Fleischman said, analysts “design a matrix of questions” that drive searches for ASCII characters — essentially, letters and numbers embedded in data. He said such searches do not require data to be preprocessed — for example, tagged to mark location, time or keywords — or placed within ontologies, which are lists of entities or terms that can be grouped and related within a hierarchy. TAC also does not rely on entity extraction, a process that automatically flags people, places and organizations of interest from unstructured text documents, he said.

The process is sophisticated enough that TAC automatically “searches for all the various spellings of al-Qaida as well as any other known or suspected information such as phone numbers, addresses or other identifying data,” Fleischman said.
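The article does not describe TAC's query language, but the idea of catching several spellings of one name can be illustrated with an ordinary regular expression. The sketch below is an assumption-laden stand-in, not TAC's syntax: the pattern `AL_QAIDA` and the helper `find_mentions` are hypothetical names invented for this example.

```python
import re

# Hypothetical pattern (not TAC syntax): tolerates an optional space or
# hyphen after "al", an optional apostrophe, and either "i" or "e"
# in the transliteration, ignoring case.
AL_QAIDA = re.compile(r"\bal[\s-]?qa['’]?[ie]da\b", re.IGNORECASE)


def find_mentions(text):
    """Return every spelling variant found in the text, in order."""
    return [match.group(0) for match in AL_QAIDA.finditer(text)]


if __name__ == "__main__":
    sample = "Reports cite al-Qaida, Al Qaeda and al-Qa'ida."
    print(find_mentions(sample))
```

A single pattern like this covers only the variants someone thought to encode; the article suggests TAC's pattern language goes further, but gives no details.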

“One of the current systems is asking over 600,000 shared questions across hundreds of millions of documents every day, creating over seven billion answers that are delivered automatically to the users. Each user is able to select as many questions as they want to monitor on a real-time basis,” Fleischman said.

The questions can be very broad — anything about a certain country — or very detailed, such as events occurring at a specific time or location.

“I could write a very strategic question with broad terms to make sure I don’t miss anything and then write additional queries that keep filtering the data down to a manageable level,” he said. For example, question one: bombings; question two: bombings in Afghanistan; question three: bombings in Kabul; question four: bombings in the Green Zone.

“Each question is getting more specific, but I only have to write the bombing part of the question once and then can reuse it for each specific location,” he said.
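The reuse Fleischman describes, where the broad question is written once and then narrowed, can be sketched as simple composition. This is only an illustration under stated assumptions: the `Question` class, its `narrowed_by` method and the plain substring matching are all hypothetical and say nothing about how TAC encodes questions internally.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Question:
    """Hypothetical question: a name plus terms that must all appear."""
    name: str
    required_terms: Tuple[str, ...]

    def narrowed_by(self, name, *extra_terms):
        # Reuse this question's terms and add more specific ones.
        return Question(name, self.required_terms + tuple(extra_terms))

    def matches(self, text):
        lowered = text.lower()
        return all(term.lower() in lowered for term in self.required_terms)


# The "bombing" part is written once and reused for each location.
bombings = Question("bombings", ("bombing",))
in_afghanistan = bombings.narrowed_by("bombings in Afghanistan", "afghanistan")
in_kabul = in_afghanistan.narrowed_by("bombings in Kabul", "kabul")


def matching_questions(text, questions):
    """Names of all questions that the text answers."""
    return [q.name for q in questions if q.matches(text)]


if __name__ == "__main__":
    report = "A bombing was reported in Kabul, Afghanistan."
    print(matching_questions(report, [bombings, in_afghanistan, in_kabul]))
```

Because each narrower question carries the broader question's terms, a document that answers the most specific question also answers every question above it.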

The software includes forms and templates that make it relatively easy for analysts to write the encoded questions that are used in the search.

“We can teach an analyst to be fairly fluent in half a day,” Fleischman said.

As TAC searches through data to find intelligence that answers the questions that analysts have asked, it uses “powerful pattern matching language” to compensate for misspellings and alternate spellings, Fleischman said.

It also uses “item proximity” to determine the relevance of words or numbers in relation to one another in a document or data set. Wildcards can be set to search for unknown data, and Boolean logic, a math-based analysis, is used to find relationships between the questions asked and answers hidden in the data.
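A rough sketch of how proximity, wildcards and Boolean logic can combine in one query is shown here. It uses Python's standard shell-style wildcards rather than anything from TAC, and the helpers `tokenize`, `positions` and `near` are hypothetical names chosen for this example.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into letter/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, pattern):
    """Indexes of tokens matching a shell-style wildcard pattern."""
    return [i for i, tok in enumerate(tokens) if fnmatchcase(tok, pattern)]


def near(tokens, pattern_a, pattern_b, max_gap):
    """True if some match of each pattern lies within max_gap tokens."""
    pos_a = positions(tokens, pattern_a)
    pos_b = positions(tokens, pattern_b)
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


if __name__ == "__main__":
    tokens = tokenize(
        "Witnesses said a truck exploded near the market; "
        "the explosion damaged shops."
    )
    # Wildcard + proximity + Boolean NOT combined in one condition.
    hit = near(tokens, "explo*", "market", 3) and not positions(tokens, "drill")
    print(hit)
```

Here `explo*` stands in for a wildcard, the `max_gap` argument for item proximity, and the `and not` for Boolean logic; a production system would presumably weigh proximity rather than apply a hard cutoff.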

Although the system does not recognize images, it does read metadata tagged to images, such as longitude and latitude indicators and time stamps, Fleischman said. It can also read captions, including captions attached to video.

When search results appear onscreen, TAC highlights important words and numbers in documents in red so they are easily spotted by analysts. It highlights terms in blue that are linked to questions that other analysts have asked about the same data.

And the software permits analysts to attach information tags and annotations to the highlighted items to further explain their relevance.

The system “was built for collaboration,” Fleischman said. “Your questions can be shared by everybody on the system — within classification and need-to-know constraints.”

The advantage of such teamwork is that one analyst will ask questions that another didn’t think to ask, sometimes revealing relationships between seemingly unrelated data.

TAC includes tools that will compare current documents with archived intelligence and sort documents according to the dates on which they appear. Items of interest — names, locations, incidents — are displayed in small boxes on a computer screen.

Clicking on a particular item retrieves intelligence related to the item while graphs in the box depict the frequency with which the item has appeared in recent data.

The arrangement enables analysts to quickly compare current and historic intelligence.

Since real-time analysis of data is a key TAC capability, the software is designed to alert analysts immediately when fresh intelligence is detected.

Fleischman helped develop the first version of TAC for the U.S. government in 2005. (The name stands for “tripwire analytic capability,” although “tripwire” is not used because it conflicts with another trademarked product.) At the time, he was chief technology officer of Kestrel Enterprises, a small company in Annapolis Junction, Md. By 2008, Kestrel’s software looked promising enough that Boeing bought the company along with several small technology firms as part of a move into business areas that are expected to grow even as other military spending shrinks.

In 2011, Boeing opened a 32,000-square-foot Cyber Engagement Center in Annapolis Junction right across the highway from the National Security Agency. At a ribbon-cutting ceremony, Boeing officials said the center will “work collaboratively with our customers to help defend their critical infrastructure,” as well as to protect Boeing’s own vast cyber network.

“We provide software as a service,” Fleischman said. That means TAC software manages systems on the customers’ servers, he said.

Boeing also provides software development, upgrades and maintenance, and offers a help desk, user support and training. Some of those are considered “engineering services above and beyond TAC,” and come at an extra cost, he said.

In July, the Defense Threat Reduction Agency announced plans to award Boeing a no-bid contract for TAC support, which the agency said it needed to combat weapons of mass destruction. DTRA also said it is hiring Boeing to “provide interfaces to existing data and analytic services and possibly combine services/data feeds in order to create unique data fusion opportunities.”

The DTRA deal came a month after the Army’s National Ground Intelligence Center announced it is interested in obtaining tripwire analytic capability training and on-site system support for up to 400 intelligence analysts in Charlottesville, Va.

Other TAC clients include the Defense Intelligence Agency, the Air Force Intelligence Analysis Agency and the Joint Staff’s director of intelligence, who uses TAC for real-time searching of data on the Defense Department’s classified network, SIPRNET.

The Joint IED Defeat Organization — JIEDDO — may have been the biggest fan, having spent at least $24.2 million customizing TAC for use as “a web-based analytic system that enhances counter-IED decision making through quantitative analytics,” according to agency documents.

At JIEDDO, TAC searches data as it streams into the agency’s computer network. The software also compares new-found intelligence to archived data to hunt for connections. Something as obscure as the part number stamped on a piece of an improvised bomb, for example, might tie multiple bombs together, or link the bombs to parts manufacturers. Bomb locations and other common characteristics might begin to shed light on the individuals or groups who make and plant the bombs.
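The part-number example amounts to grouping reports by a shared identifier. The sketch below shows that idea only; the part-number format, the report texts and the function `link_by_part_number` are all invented for illustration and do not reflect JIEDDO data or TAC's implementation.

```python
import re
from collections import defaultdict

# Hypothetical part-number format: two capital letters, a hyphen, four digits.
PART_NUMBER = re.compile(r"\b[A-Z]{2}-\d{4}\b")


def link_by_part_number(reports):
    """Map each part number to the sorted IDs of reports that mention it,
    keeping only numbers that appear in more than one report."""
    seen = defaultdict(set)
    for report_id, text in reports.items():
        for part in PART_NUMBER.findall(text):
            seen[part].add(report_id)
    return {part: sorted(ids) for part, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # Invented sample reports, for illustration only.
    reports = {
        "r1": "Timer board stamped QX-1042 recovered.",
        "r2": "Fragment marked QX-1042 and ZT-7781.",
        "r3": "No markings found.",
    }
    print(link_by_part_number(reports))
```

Running the same grouping over both archived and newly arriving reports is one simple way to surface the kind of connection the article describes.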

It turns out that JIEDDO is a big fan of lots of analytic software. The Government Accountability Office reported in February that “JIEDDO has funded the development and support of approximately 70 electronic data collection and analysis tools.” Not surprisingly, there was “overlap to some degree,” the GAO said.
