Text Based Custom Named Entity Tagger (TeBaC-NET)

I was recently exploring spaCy for some NLP work, and found that the default model was not sufficient for tagging entities in the domain I was exploring. The documentation was very helpful in explaining how I could train the statistical model of the named entity recognizer, but I needed training and evaluation data.

While I could tag them manually, I felt that I needed a method/tool to do it in an efficient manner – especially if I was going to be tagging hundreds of lines of data. My key considerations were:

  • Free/low-cost (it's a hobby project)
  • Simple (no need for complex features that are not needed)
  • Data is kept private/confidential

I first looked at existing solutions and found many, but none of them fit my criteria. Some examples are below: (Note: this is not an exhaustive list, nor an endorsement/recommendation; these are just some interesting ones I found during my research)

Eventually, I decided to make my own named entity tagger using Python. If you are new to my blog, I like to keep my solutions/tools simple and basic, avoiding dependencies on other packages so that they are less likely to “break” down the road when an external package, module, or service changes.

I also tend to avoid GUIs so that my tools can run on a wider range of platforms (e.g. cloud instances, lightweight VMs, Docker containers) that do not have a window manager or desktop interface. Such environments are occasionally cheaper and faster.

Thus, I present to you TeBaC-NET, which stands for “Text Based Custom Named Entity Tagger”. It is cross-platform, runs on the command line / terminal, and uses Python 3.6 with only the platform and os modules, which are part of the standard library (i.e. no need to pip install anything else).
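To give a feel for what a dependency-free, text-based tagger can look like, here is a minimal sketch. It is not the actual TeBaC-NET code (see the GitHub repository for that); the function names, the example labels, and the output shape are my own assumptions for illustration. It records each entity as a (start, end, label) character offset, which is one annotation format that spaCy training examples have used.

```python
import os
import platform


def clear_command():
    """Return the shell command that clears the terminal on this platform."""
    return "cls" if platform.system() == "Windows" else "clear"


def clear_screen():
    """Clear the terminal using only the standard library."""
    os.system(clear_command())


def tag_entity(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase, or None."""
    start = text.find(phrase)
    if start == -1:
        return None
    return (start, start + len(phrase), label)


def make_example(text, tags):
    """Build one (text, {"entities": [...]}) tuple from (phrase, label) pairs.

    Hypothetical helper: phrases that do not appear in the text are skipped.
    """
    entities = []
    for phrase, label in tags:
        span = tag_entity(text, phrase, label)
        if span is not None:
            entities.append(span)
    return (text, {"entities": entities})


if __name__ == "__main__":
    line = "TeBaC-NET runs on Python 3.6"
    # "TOOL" and "LANGUAGE" are made-up labels for this illustration.
    print(make_example(line, [("TeBaC-NET", "TOOL"), ("Python 3.6", "LANGUAGE")]))
```

In an interactive loop, you would call clear_screen() between lines of data, prompt for a phrase and a label with input(), and collect the resulting tuples into a list. Everything here comes with a default Python install.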

GitHub link: https://github.com/davidloke/tebac-net

Instructions on how to use it can be found in the README.

In my next TeBaC-NET post, I will talk more about how it is used, and some of the UX considerations I had in mind when developing it. If you are interested in using it, collaborating, or requesting a feature, please leave a comment or send me an email via the “contact” page.
