June 21, 2021


Constructing Flow Graphs from Procedural Cybersecurity Texts. (arXiv:2105.14357v1 [cs.CL])

Following procedural texts written in natural language is challenging. We
must read the whole text to identify the relevant information or to trace the
instruction flow needed to complete a task, a process that is prone to failure.
If such texts are structured, we can readily visualize instruction flows, reason
about or infer a particular step, or even build automated systems to help novice
agents achieve a goal. However, recovering this structure is difficult because
of the diverse nature of such texts. This paper proposes to identify relevant information
from such texts and generate information flows between sentences. We built
CTFW, a large annotated dataset of procedural texts from the cybersecurity
domain (3154 documents). The dataset contains valuable instructions drawn from
software vulnerability analysis experience. We performed extensive experiments
on CTFW with variants of our LM-GNN model in multiple settings. To show that
both the task and our method generalize, we also experimented with procedural
texts from two other domains (Maintenance Manual and Cooking) that differ
substantially from cybersecurity. Our experiments show that a Graph Convolution
Network with BERT sentence embeddings outperforms BERT in all three