Tracking crypto wallets with NLP

A Text Analysis module, with its own service and semantic engine, has been designed and developed to extract the address IDs of certain types of crypto wallets. A subset of cryptocurrencies of interest has been identified for retrieval: Bitcoin, Bitcoin Cash, Litecoin, and Zcash. Each of them has its own logic and format for generating crypto wallet IDs. These formats have been studied in depth to understand what to extract, using both regular expressions and rule-based methods (the latter for disambiguation).


Bitcoin

Bitcoin has three address formats, called P2PKH, P2SH, and Bech32. They follow recognizable patterns, which have been combined into a single regular expression.

Table 1.  Bitcoin extraction  
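As an illustration of how the three Bitcoin formats can live in one regular expression, here is a minimal Python sketch. It is not the expression used in the module, which is not reproduced in this article. The length ranges, the group names, and the helper `find_bitcoin_addresses` are assumptions chosen for the example, and the pattern checks only the shape of an address, not its checksum.

```python
import re

# Character classes: Base58 omits 0, O, I and l; Bech32 data omits 1, b, i and o.
BASE58 = "1-9A-HJ-NP-Za-km-z"
BECH32 = "02-9ac-hj-np-z"

# One expression, three alternatives; the named group tells us which format matched.
# The length ranges are deliberately loose assumptions, not the module's real limits.
BITCOIN_RE = re.compile(
    rf"\b(?:(?P<p2pkh>1[{BASE58}]{{25,34}})"
    rf"|(?P<p2sh>3[{BASE58}]{{25,34}})"
    rf"|(?P<bech32>bc1[{BECH32}]{{11,71}}))\b"
)


def find_bitcoin_addresses(text):
    """Return (format, address) pairs for every shape-level match in text."""
    return [(m.lastgroup, m.group()) for m in BITCOIN_RE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Pay to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa "
        "or bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4."
    )
    for fmt, addr in find_bitcoin_addresses(sample):
        print(fmt, addr)
```

A shape-only match can still be a false positive, which is one reason a rule-based layer on top of the regex is useful.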

Bitcoin Cash 

Bitcoin Cash addresses were initially extremely similar to Bitcoin addresses, differing in only a few aspects. This soon became an issue because of the ambiguity it created between Bitcoin Cash and certain Bitcoin addresses. It was solved only with the creation of a new and safer format called CashAddr, based on the Bech32 format used for Bitcoin, while the old format was renamed Legacy. CashAddr includes three possible prefixes and some changes to the address string itself.
All the CashAddr pattern possibilities have been implemented in a single regular expression.

Table 2.  Bitcoin Cash extraction 
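A single CashAddr expression could look roughly like the Python sketch below. The article does not list the three prefixes, so the ones used here (`bitcoincash`, `bchtest`, `bchreg`, taken from the public CashAddr specification) are an assumption, as are the fixed 42-character payload (the common 160-bit hash case) and the helper name `find_cashaddr`. Only lowercase addresses are handled, and the checksum is not verified.

```python
import re

# Assumed prefixes (mainnet, testnet, regtest in the public CashAddr spec).
CASHADDR_PREFIXES = ("bitcoincash", "bchtest", "bchreg")
PREFIX_ALTERNATION = "|".join(CASHADDR_PREFIXES)

# CashAddr reuses the Bech32 data alphabet (no 1, b, i or o).
CASHADDR_CHARSET = "02-9ac-hj-np-z"

# Optional prefix, then a payload starting with q or p followed by 41 characters.
CASHADDR_RE = re.compile(
    rf"\b(?:(?P<prefix>{PREFIX_ALTERNATION}):)?"
    rf"(?P<payload>[qp][{CASHADDR_CHARSET}]{{41}})\b"
)


def find_cashaddr(text):
    """Return (prefix_or_None, payload) for each shape-level CashAddr match."""
    return [(m.group("prefix"), m.group("payload")) for m in CASHADDR_RE.finditer(text)]


if __name__ == "__main__":
    # Synthetic payload: right shape, but not a checksum-valid address.
    fake = "q" + "z" * 41
    print(find_cashaddr(f"Send BCH to bitcoincash:{fake} please"))
```

Because the prefix is optional in this sketch, a bare payload is also reported, with `None` as its prefix.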

For the Legacy format, the regex was paired with a Cogito rule that disambiguates Bitcoin Cash from Bitcoin based on context.
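The Cogito rule itself is not shown here, and the following is not Cogito syntax. It is only a Python sketch of the general idea of context-based disambiguation: look at the words around an ambiguous address and pick the currency they mention. The keyword lists, the 60-character window, the default of `"bitcoin"`, and the function name `disambiguate` are all assumptions made for the example.

```python
import re

# Hypothetical context cues per currency; a real rule set would be richer.
CONTEXT_HINTS = {
    "bitcoin_cash": re.compile(r"\b(?:bitcoin\s+cash|bch)\b", re.IGNORECASE),
    "litecoin": re.compile(r"\b(?:litecoin|ltc)\b", re.IGNORECASE),
    # "bitcoin" must not be the first half of "Bitcoin Cash".
    "bitcoin": re.compile(r"\b(?:bitcoin(?!\s+cash)|btc)\b", re.IGNORECASE),
}


def disambiguate(text, start, end, window=60, default="bitcoin"):
    """Guess the currency of the address at text[start:end] from nearby words."""
    context = text[max(0, start - window):start] + " " + text[end:end + window]
    scores = {coin: len(p.findall(context)) for coin, p in CONTEXT_HINTS.items()}
    # Ties resolve to the first entry in CONTEXT_HINTS.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    addr = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"  # same string, two contexts
    for sentence in (f"Donations in Bitcoin Cash (BCH): {addr}",
                     f"BTC wallet: {addr}"):
        s = sentence.index(addr)
        print(disambiguate(sentence, s, s + len(addr)))
```

The same address string is classified differently depending only on its surroundings, which is the behaviour the rule-based layer is meant to provide.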


Litecoin

Litecoin addresses are also based on Bitcoin formats, in this case P2PKH and P2SH. As with Bitcoin Cash Legacy, there can be ambiguity with Bitcoin, and here too Cogito rules disambiguate based on context.

Figure 1.  Litecoin disambiguation

Table 3.  Litecoin extraction 
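To make the Litecoin case concrete, the Python sketch below extracts candidate addresses and flags the ones that would need a contextual check. It rests on assumptions not stated in the article: that Litecoin P2PKH addresses start with `L`, that P2SH addresses start with `M` or, in the older style, with `3`, and that only the `3` form collides with Bitcoin. The length range and the name `find_litecoin_candidates` are also illustrative.

```python
import re

# Base58 alphabet as a character class (no 0, O, I or l).
BASE58 = "1-9A-HJ-NP-Za-km-z"

# Assumed leading characters: L (P2PKH), M or 3 (P2SH).
LITECOIN_RE = re.compile(rf"\b(?P<addr>[LM3][{BASE58}]{{26,33}})\b")


def find_litecoin_candidates(text):
    """Return candidate addresses; '3...' ones are marked as needing context."""
    results = []
    for m in LITECOIN_RE.finditer(text):
        addr = m.group("addr")
        results.append({
            "address": addr,
            "span": m.span(),
            # A leading 3 has the same shape as a Bitcoin P2SH address.
            "needs_context": addr.startswith("3"),
        })
    return results


if __name__ == "__main__":
    # Synthetic strings with the right shape, not checksum-valid addresses.
    sample = "LTC: " + "L" + "a" * 33 + " or " + "3" + "a" * 33
    for hit in find_litecoin_candidates(sample):
        print(hit)
```

Candidates with `needs_context` set would then be passed to a context rule before being reported as Litecoin.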


Zcash

Zcash has three different formats, called transparent, sprout, and sapling. Since they differ substantially from one another, three separate regular expressions have been developed, and Cogito associates each with the corresponding format:

Table 4.  Zcash Transparent extraction

Table 5.  Zcash Sprout extraction  

Table 6. Zcash Sapling extraction
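The one-expression-per-format approach for Zcash can be sketched in Python as follows. The patterns are assumptions based on the publicly documented address shapes (transparent: `t1` or `t3` plus 33 Base58 characters; sprout: `zc` plus 93 Base58 characters; sapling: `zs1` plus 75 Bech32 characters), not the expressions used in the module, and `classify_zcash` is a hypothetical helper name. Checksums are not verified.

```python
import re

BASE58 = "1-9A-HJ-NP-Za-km-z"   # no 0, O, I or l
BECH32 = "02-9ac-hj-np-z"       # no 1, b, i or o

# One pattern per format; the dictionary key is the label attached to a match.
ZCASH_PATTERNS = {
    "transparent": re.compile(rf"\bt[13][{BASE58}]{{33}}\b"),
    "sprout": re.compile(rf"\bzc[{BASE58}]{{93}}\b"),
    "sapling": re.compile(rf"\bzs1[{BECH32}]{{75}}\b"),
}


def classify_zcash(text):
    """Return (format, address) pairs, grouped by format in dictionary order."""
    hits = []
    for fmt, pattern in ZCASH_PATTERNS.items():
        hits.extend((fmt, m.group()) for m in pattern.finditer(text))
    return hits


if __name__ == "__main__":
    # Synthetic strings with the right shape, not real addresses.
    sample = " ".join(["t1" + "a" * 33, "zc" + "a" * 93, "zs1" + "q" * 75])
    for fmt, addr in classify_zcash(sample):
        print(fmt, len(addr))
```

Since the three shapes do not overlap, no contextual disambiguation is needed between them in this sketch.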

Online tool 

The tool has been presented in the context of NOTIONES Working Group 3 – Tools for tracing cryptocurrencies used in criminal finances. 

It has been published as a demo so that it can be tried out (a REST client is needed).

Try it and enjoy! 

CRYPTO Extraction Service

Configuration of the REST client:

Author(s): Ciro Caterino,