TeBaC-NET Design Considerations

In my previous post on TeBaC-NET, I explained why I created it. In this post, I explain why I designed it the way I did.

Design Considerations

#1 Cross Platform

One of the most important considerations is that it should be platform agnostic. A simple tool that can run on any operating system reduces the time and effort needed to get started with the actual work.

This means that it cannot be written specifically as a Windows batch file or a Linux shell script. An intermediary platform or container is required, such as a web browser or an interpreter.

#2 Tagging Efficiency

It is time-consuming to manually tag hundreds of lines of data. Increasing the number of HCIs only adds “latency” to the process. Imagine a user having to click on the word to be tagged, type the custom tag, and then click an “add” button on the UI. The time spent moving from one input device to another would slowly add up and lengthen the process.

Therefore, to ensure that the tool does not hamper the user’s efficiency, only one type of input device should be needed. In this case, the keyboard made the most sense, as custom tags need to be typed (instead of being clicked letter by letter on an on-screen keyboard).

#3 Development Speed

TeBaC-NET is a tool used to achieve a very specific task in a much larger project. As such, I did not want to spend excessive amounts of time developing and debugging it.



With these factors in mind, I chose the interpreter approach, as it also removes the need for a desktop environment or window manager. As the program is rather simple, I used Python to quickly code, debug, and deploy a working version, since it is dynamically typed, unlike the statically typed Java.

I took inspiration from Vim to create the TeBaC-NET commands used to manipulate the custom tags, entities, and data. For example, going to the next line of data (to be tagged) is :nn, and deleting a wrongly tagged word is :dd. This is intuitive to me, but it may not be for other users of the program.
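To illustrate the idea of keyboard-only, Vim-style commands, here is a minimal sketch of how such input could be dispatched. It is not taken from the TeBaC-NET source: the names TaggingSession and handle_input, the "word TAG" input format, and the choice to have :dd remove the most recently added tag on the current line are all assumptions made for this example. Only the :nn and :dd command names come from the tool itself.

```python
from dataclasses import dataclass, field


@dataclass
class TaggingSession:
    """Hypothetical session state; not TeBaC-NET's actual data model."""
    lines: list[str]
    index: int = 0  # position of the line currently being tagged
    # Maps a line index to the (word, tag) pairs entered for that line.
    tags: dict[int, list[tuple[str, str]]] = field(default_factory=dict)


def next_line(session: TaggingSession) -> None:
    """:nn -> move to the next line, stopping at the last one."""
    last = max(len(session.lines) - 1, 0)
    session.index = min(session.index + 1, last)


def delete_last_tag(session: TaggingSession) -> None:
    """:dd -> drop the most recent tag on the current line (assumed behaviour)."""
    current = session.tags.get(session.index, [])
    if current:
        current.pop()


# Command table: adding a new ':' command only needs a new entry here.
COMMANDS = {":nn": next_line, ":dd": delete_last_tag}


def handle_input(session: TaggingSession, text: str) -> str:
    """Dispatch a ':' command, or treat '<word> <tag>' as a new tag."""
    text = text.strip()
    if text.startswith(":"):
        action = COMMANDS.get(text)
        if action is None:
            return f"unknown command: {text}"
        action(session)
        return "ok"
    parts = text.split()
    if len(parts) != 2:
        return "expected: <word> <tag>"
    word, tag = parts
    session.tags.setdefault(session.index, []).append((word, tag))
    return "ok"
```

In a real tool, handle_input would sit inside a loop that reads from the keyboard, so the user never has to reach for the mouse. Keeping the commands in a lookup table also fits the goal of fast development: each command is a small function that can be written and debugged on its own.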

Lastly, the small size of the program makes it easy to bring TeBaC-NET to the data, instead of extracting the data out onto another host. This is particularly useful if one is building a NER model from scratch, and has hundreds of lines of training and evaluation data to be tagged.

If you are interested, do check out TeBaC-NET on my GitHub: https://github.com/davidloke/tebac-net


