Data – the new gold of our times

Does the term GDPR ring a bell? It probably does, as it does for most people, but, like most people, you’re probably unsure what its purpose is. And what’s the deal with those annoying cookie banners that have made their way, rather unsuccessfully, into our daily lives in the seemingly vast galaxy of the online world – a world that no longer seems as enormous and wide as it did some 30 years ago?

Well, all our activities and movements on the World Wide Web, however brief and insignificant they might appear, leave behind traces – proverbial crumbs called cookies – that are worth their weight in gold to many companies. The value of this ‘gold’ is firmly anchored in data, which gives companies precise, and thus vital, information about our consumption or user behaviour, our interests and our social circumstances. These useful nuggets help companies generate profit, even if they come at the cost of manipulation. So it seems obvious that this valuable asset should be well protected.

So what’s all the fuss?

What one can do with the masses of information acquired is clearly visible in the latest results from the field of machine learning. Large language models (LLMs) are impressive indicators of how much value and use we can get out of such data if we know how to use it properly. The volume of this data, incidentally, is witnessing a tenfold increase year on year. It’ll be interesting to see what disruptive changes this development has in store for us in the near future.

Incidentally, these human-like systems reflect the conflicts and flaws in our society, starting with discrimination in all its forms. After all, we are all only human, and these talking bundles of data have learned all their information from … well, us!

Another interesting aside: the word ‘datum’, the singular form of ‘data’, survives largely as a matter of etymology and is rarely, if ever, used in spoken language.

Input vs. instinct – what do you do with all that data?

So where does the challenge lie? Much like tidying up a child’s room in order to find things quickly, it’s important to start by sorting, structuring and, ideally, categorising the deluge of information that data provides. Speaking of children, linguists have long debated what is responsible for the rapid learning of language in infancy: linguistic input (data) or so-called universal grammar (learning algorithms), which describes an instinctive linguistic disposition in humans.

As a society, we’re experiencing something similar: the mountainous volumes of information need to be understood and put to good use, because every company – knowingly or unknowingly – accumulates vast amounts of data. This isn’t news; it’s just that we are now more aware of its existence. In addition to protecting third-party data of customers or service providers, we also need to give more thought to how we exploit and utilise this information for the benefit of everyone involved.

Great examples here are the translation memories and terminology databases widely used in our business, which are treasure troves of knowledge and information as well as repositories that store the linguistic artwork of translators and editors. Making skilful use of these and refining them even further with the help of metadata is a task both exciting and awe-inspiring, which we tackle day after day using an analytical approach coupled with a sensible amount of pragmatism.

Data quality plays a crucial role in machine learning. A language model is only as good as its input. The recipe for success here lies in a key ingredient: skilful data annotation.

So let’s get to work! And don’t worry – if you know what you’re doing, it’s child’s play!

Mohamed Boudan

Mohamed joined Apostroph in 2021 as a language technology expert. As a professional linguist, he brings valuable insights to the Group thanks to his passion and expertise in language and emerging and innovative technologies. As a family man, he attaches great importance to social cooperation and togetherness – which is why he is convinced that technology should serve as a means to help people and not just exist as an end in itself.