In my previous post on TeBaC-NET, I talked about the reason why I created it. In this post, I talk about why I created it the way it is.
#1 Cross Platform
One of the most important considerations is that the tool should be platform agnostic. A simple tool that runs on any operating system reduces the time and effort needed to get started with the actual work.
This means that it cannot be written specifically as a Windows batch file or a Linux shell script. An intermediary platform or container is required, such as a web browser or an interpreter.
#2 Tagging Efficiency
It is time consuming to manually tag hundreds of lines of data. Increasing the number of human-computer interactions (HCIs) only adds “latency” to the process. Imagine a user having to click on the word to be tagged, type the custom tag, and then click an “add” button on the UI. The time spent moving from one input device to another slowly adds up and lengthens the process.
Therefore, to ensure that the tool does not hamper the user’s efficiency, only one type of input device should be needed. In this case, the keyboard made the most sense, as custom tags have to be typed anyway (instead of being clicked letter by letter on an on-screen keyboard).
#3 Development Speed
TeBaC-NET is a tool used to achieve a very specific task in a much larger project. As such, I did not want to spend excessive amounts of time developing and debugging it.
With these factors in mind, I chose the interpreter approach, as it also removes the need for a desktop environment or window manager. As the program is rather simple, I used Python to quickly code, debug, and deploy a working version, since it is dynamically typed, unlike statically typed Java.
I took inspiration from Vim to create the TeBaC-NET commands used to manipulate the custom tags, entities, and data. For example, going to the next line of data (to be tagged) is :nn, and deleting a wrongly tagged word can be achieved with :dd. This is something intuitive to me, but it may not be the case for other users of the program.
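To give a feel for how Vim-style commands like these can be handled from the keyboard alone, here is a minimal sketch in Python. It is not TeBaC-NET's actual code: only the :nn and :dd command names come from the tool, while the Session class, the handle function, the ":dd <word>" argument form, and the "<word> <tag>" input format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical tagging state (not TeBaC-NET's real data model)."""
    lines: list[str]                     # data to be tagged, assumed non-empty
    current: int = 0                     # index of the line being tagged
    tags: list[tuple[int, str, str]] = field(default_factory=list)  # (line, word, tag)


def handle(session: Session, user_input: str) -> str:
    """Apply one line of keyboard input and return a short status message."""
    parts = user_input.split()
    if not parts:
        return "empty input"

    if parts == [":nn"]:
        # Go to the next line of data, stopping at the last one.
        if session.current < len(session.lines) - 1:
            session.current += 1
        return f"line {session.current + 1}: {session.lines[session.current]}"

    if parts[0] == ":dd" and len(parts) == 2:
        # Assumed syntax ":dd <word>": drop that word's tags on the current line.
        word = parts[1]
        before = len(session.tags)
        session.tags = [
            t for t in session.tags
            if not (t[0] == session.current and t[1] == word)
        ]
        return "deleted" if len(session.tags) < before else "nothing to delete"

    # Assumed format "<word> <tag>": tag a word that appears on the current line.
    words_on_line = session.lines[session.current].split()
    if len(parts) == 2 and not parts[0].startswith(":") and parts[0] in words_on_line:
        session.tags.append((session.current, parts[0], parts[1]))
        return f"tagged {parts[0]} as {parts[1]}"

    return "unrecognised input"
```

In a sketch like this, every action is a single typed line, so a simple loop around input() would be enough to drive it and the user's hands never have to leave the keyboard.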
Lastly, the small size of the program makes it easy to bring TeBaC-NET to the data, instead of extracting the data onto another host. This is particularly useful if one is building an NER model from scratch and has hundreds of lines of training and evaluation data to tag.
If you are interested, do check out TeBaC-NET on my GitHub: https://github.com/davidloke/tebac-net