Terminology: Merging duplicate entries made easy

September 19, 2024
Barbara Inge Karsch

Independent terminology consultant and trainer

One of the most daunting tasks for companies that decide to introduce terminology management is populating the termbase with entries in the first place. I don't mean creating entries from scratch, but rather collecting and importing existing terminologies. In this post, I will look at the process of adding legacy terminology to Kalcium Quickterm, focusing mainly on merging duplicate entries, technically called doublettes. I will finish by outlining some performance data, and I hope to show you that what may initially seem like a miserable endeavor can actually be accomplished in a reasonable timeframe.

Although most companies don't formally manage their terminology, terminology work happens anyway: product managers coin new names for their products; content publishers check correct terminology usage; translators research concepts in order to find correct equivalents in their target languages, and so on. When the decision to manage the corporate language in a centralized terminology management system (TMS) is made, someone will have to collect and harmonize terms and concepts. The desired end result is to have one entry per concept including all designations and supporting terminological information (e.g., part of speech, definition, definition source).

So, the goal is clear. Ideally, a trained terminologist would be entrusted with this task. They would collect terminologies from as many resources as they can identify, such as glossaries from WorldServer, spreadsheets from product managers, or lists published in style guides. The data from all these sources would then be compiled in a termbase-compatible format and imported into Kalcium.

Some degree of cleaning is necessary before this information can be imported. For example, if a column in a spreadsheet combines two or more types of data, they must be separated.

The information in the "Forbidden English" column above needs to be split into separate lines of synonyms with their correct usage status. Here is one way to organize the above data:
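For larger spreadsheets, this kind of split can also be scripted before import. The following is a minimal sketch rather than a description of any Kalcium import feature. It assumes a hypothetical row layout with an "English" column for the preferred term and a "Forbidden English" cell whose synonyms are separated by semicolons; the sample terms and the status labels are invented for illustration.

```python
def split_row(row, separator=";", forbidden_status="deprecated"):
    """Return one record per term, each carrying its own usage status.

    Assumed (hypothetical) layout: row["English"] holds the preferred term,
    row["Forbidden English"] holds zero or more synonyms in a single cell.
    """
    records = [{"term": row["English"].strip(), "usage_status": "preferred"}]
    for synonym in row.get("Forbidden English", "").split(separator):
        synonym = synonym.strip()
        if synonym:  # skip empty fragments, e.g. after a trailing separator
            records.append({"term": synonym, "usage_status": forbidden_status})
    return records


# Invented sample row for illustration only.
sample = {"English": "USB flash drive", "Forbidden English": "stick; thumb drive;"}
for record in split_row(sample):
    print(record)
```

Each resulting record holds exactly one term and one status, which is the shape a termbase import generally expects.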

In this example, it is easy to import all the terms into one entry. But if these synonyms and their terminological data are documented in different spreadsheets, we may not immediately recognize that they all refer to the same concept. That is really the task of concept harmonization. Is it simple? No. But Kalcium Quickterm makes merging records very easy.
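Finding candidates for harmonization across several spreadsheets can be partly automated, although deciding whether two records really denote the same concept remains a terminologist's judgment. The sketch below is one possible pre-check, not part of Quickterm. It assumes the rows have already been loaded as dictionaries with the hypothetical keys `source` and `term`, and it only groups records whose terms match after trivial normalization.

```python
from collections import defaultdict


def normalize(term):
    """Fold case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.casefold().split())


def candidate_doublettes(records):
    """Group records by normalized term and keep groups with 2+ records."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record["term"])].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Invented sample data; "source" stands for the spreadsheet a row came from.
rows = [
    {"source": "product_sheet", "term": "USB  Flash Drive"},
    {"source": "style_guide", "term": "usb flash drive"},
    {"source": "style_guide", "term": "stick"},
]
for key, group in candidate_doublettes(rows).items():
    print(key, [record["source"] for record in group])
```

A check like this only catches records that share a term. Synonyms recorded under different terms, such as "stick" in the sample data, still have to be recognized by a person who knows the concept.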

In the following screenshot, we can see the button that allows us to merge two entries.

When we click on it, we get to a form where we select the additional entry to check against. The form below shows us data categories containing information from both entries.

We can delete what we don't need, improve existing data, and add additional information. We can also decide whether we want to delete both entries and create a completely new one, or choose an existing entry to retain.

In the above example, I deleted the second definition and modified the first. I added a definition source and a usage status of "deprecated" for the term "stick." I would do more if I didn't already know that there is another entry for the same concept with additional information. Therefore, I don't spend time completing all the necessary information for this concept, but simply repeat the merge process. This brings me to the following form.

I analyze and research as necessary and combine the information into one correct and complete entry.
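Conceptually, the merge amounts to pooling the data categories of two entries and then deciding what to keep. The sketch below illustrates that idea only; it is not how Quickterm implements its Merge feature, and the entry structure (a `definition` string plus a `terms` mapping from term to usage status) is an assumption made for the example.

```python
def merge_entries(kept, other):
    """Combine two entries for the same concept into one draft entry.

    Terms are pooled; where both entries list the same term, the status
    from the retained entry wins. Differing definitions are collected so
    that a person can choose or rewrite one.
    """
    merged = {"terms": dict(kept["terms"]), "definitions": []}
    for term, status in other["terms"].items():
        merged["terms"].setdefault(term, status)
    for entry in (kept, other):
        definition = entry.get("definition")
        if definition and definition not in merged["definitions"]:
            merged["definitions"].append(definition)
    return merged


# Invented sample entries for illustration only.
entry_a = {"definition": "Portable storage device with a USB connector.",
           "terms": {"USB flash drive": "preferred"}}
entry_b = {"definition": "Small removable memory device.",
           "terms": {"stick": "deprecated"}}
draft = merge_entries(entry_a, entry_b)
print(draft["terms"])
print(len(draft["definitions"]), "definition(s) left to review")
```

The draft deliberately keeps both definitions: choosing, editing, and sourcing the final definition is the research step that a script cannot do.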

Concept harmonization is not trivial, but the mechanics of removing doublettes in Kalcium could not be easier. In a recent project, I started out with the following:

  • Seven spreadsheets, each with different data categories
  • Close to 4000 rows
  • Up to 25 languages per spreadsheet
  • Violation of concept orientation, term autonomy, and data elementarity

After 80 hours, all the data was in the termbase and most concepts were harmonized. Not all entries had all the mandatory data categories filled in, but obvious doublettes that did not need additional research had been merged. The result was a little over 3500 entries with close to 30,000 terms and names in 31 languages. While some additional data validation work remains, the resulting centralized data is already much more useful than the spreadsheets had been before. In my 25 years of terminology work, this was probably one of my highest productivity rates for this task. Needless to say, I am a huge fan of the Quickterm Merge feature.

About Barbara Inge Karsch

Barbara Inge Karsch is the owner of BIK Terminology, a terminology consultancy and training company. As a consultant and trainer, Barbara works with companies and organizations on terminology training, terminology development, and the implementation of terminology management systems. She draws on her 14 years of experience as an in-house terminologist for J.D. Edwards and Microsoft. Since 2012, Barbara has been teaching in the Master's program at New York University, where she was recently promoted to Adjunct Associate Professor. As US delegate to ISO TC 37, Barbara led the revision of ISO 12616, Terminology work in support of multilingual communication – Part 1: Fundamentals of translation-oriented terminography.
