
Quadrilingual corpora

This data bank of political and media discourses about Russia's invasion of Ukraine covers five countries: the two belligerents (Ukraine and Russia), the United States, the United Kingdom, and France. It includes war-related

  1. speeches of political leaders (Volodymyr Zelensky, Vladimir Putin, Joe Biden, Emmanuel Macron, and the UK Prime Ministers Boris Johnson, Liz Truss and Rishi Sunak),
  2. debates in national legislatures (Ukrainian Rada, Russian Duma, U.S. Congress, British Parliament and French Assemblée nationale),
  3. news items published in legacy media (ICTV, RBC Ukraina, Kommersant, Izvestia, First TV Channel, the New York Times, the Washington Post, USA Today, Fox, the Times, and Le Monde),
  4. news items published in digital media (Ukrainska Pravda, Liga, Strana, Gazeta.ru, Meduza), and
  5. posts in social media (VKontakte, Telegram).

The data bank covers the first two years of the all-out war (February 2022 to February 2024). It contains 215 million words in four languages: Ukrainian, Russian, English and French.


Dictionary of war

A quadrilingual "dictionary of war" has been created to mine and analyze large volumes of text about the invasion. It includes more than 500 categories, from "adversary" to "Zelensky," and each category contains several words and n-grams.
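To make dictionary-assisted counting concrete, here is a minimal Python sketch. The two categories and their entries (`nato`, `sanctions`) are hypothetical stand-ins, not entries from the actual dictionary, and the regex tokenizer and exact-match sliding window are simplifying assumptions; the project's own tooling may work differently.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: category -> words and n-grams.
# The real "dictionary of war" has 500+ categories in four languages;
# these entries are illustrative only.
DICTIONARY = {
    "nato": ["nato", "north atlantic alliance", "north atlantic treaty organization"],
    "sanctions": ["sanction", "sanctions", "embargo"],
}

def tokenize(text):
    """Lowercase and split into word tokens (\\w is Unicode-aware, so Cyrillic works too)."""
    return re.findall(r"\w+", text.lower())

def count_categories(text, dictionary=DICTIONARY):
    """Count how many times each category's words or n-grams occur in text."""
    tokens = tokenize(text)
    counts = Counter()
    for category, entries in dictionary.items():
        for entry in entries:
            pattern = tuple(tokenize(entry))
            n = len(pattern)
            # Slide a window of n tokens over the text and compare it with the entry.
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == pattern:
                    counts[category] += 1
    return counts
```

For example, `count_categories("NATO leaders discussed new sanctions.")` records one `nato` hit and one `sanctions` hit. Because each entry is matched independently, overlapping entries (say, "nato" and "nato summit") would both be counted; a full dictionary would need a rule for that, such as longest-match-first.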


Patterns in political and media discourses

Dictionary-assisted analysis makes it possible to identify and study changing patterns in war coverage. For instance, the figure below visualizes similarities between various sources of political and media discourses about Russia's invasion of Ukraine during the first two years of the war.
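One plausible way to quantify similarity between sources is to turn each source's category counts into a profile and compare the profiles pairwise. This sketch uses cosine similarity purely as an illustrative choice; it is not necessarily the measure behind the figure, and the source names and counts in the usage example are invented.

```python
import math

def profile(counts, categories):
    """Relative frequency of each category, in a fixed category order."""
    total = sum(counts.get(c, 0) for c in categories)
    if total == 0:
        return [0.0] * len(categories)
    return [counts.get(c, 0) / total for c in categories]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(source_counts):
    """Pairwise cosine similarity between sources.

    source_counts maps a source name to a dict of category -> raw count.
    Cosine is scale-invariant, so normalizing to relative frequencies does not
    change the scores; it only makes the profiles easier to inspect.
    """
    categories = sorted({c for counts in source_counts.values() for c in counts})
    profiles = {name: profile(counts, categories) for name, counts in source_counts.items()}
    names = sorted(profiles)
    return {(a, b): cosine(profiles[a], profiles[b]) for a in names for b in names}
```

With hypothetical counts such as `{"source_a": {"nato": 10, "sanctions": 5}, "source_b": {"nato": 20, "sanctions": 10}, "source_c": {"sanctions": 7}}`, `source_a` and `source_b` score (up to rounding) 1.0 because their proportions are identical, while `source_a` and `source_c` score about 0.45.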

Two further charts, showing how the frequency of mentions of NATO changed across various sources of political and media discourses over the same period, serve as additional illustrations. More examples, along with a description of the original methodology, can be found in recently published scholarly articles and a monograph.
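Frequency charts of this kind can be produced by aggregating matches per source and month. In this minimal sketch, `NATO_TERMS` is a hypothetical English-only stand-in for the dictionary's NATO category, and normalizing to mentions per 10,000 words is an assumption made so that sources of very different sizes can be compared.

```python
import re
from collections import defaultdict
from datetime import date

# Hypothetical, English-only stand-in for the dictionary's NATO category;
# each entry is a tuple of lowercase tokens (a unigram or an n-gram).
NATO_TERMS = [("nato",), ("north", "atlantic", "alliance")]

def count_terms(tokens, terms):
    """Count exact occurrences of each token tuple in a token list."""
    total = 0
    for term in terms:
        n = len(term)
        total += sum(1 for i in range(len(tokens) - n + 1)
                     if tuple(tokens[i:i + n]) == term)
    return total

def monthly_frequency(records, terms=NATO_TERMS, per=10_000):
    """Mentions per `per` words, keyed by (source, "YYYY-MM").

    records: iterable of (source, datetime.date, text) tuples.
    """
    hits = defaultdict(int)
    words = defaultdict(int)
    for source, day, text in records:
        tokens = re.findall(r"\w+", text.lower())
        key = (source, day.strftime("%Y-%m"))
        hits[key] += count_terms(tokens, terms)
        words[key] += len(tokens)
    return {key: per * hits[key] / words[key] for key in words if words[key]}

if __name__ == "__main__":
    # Invented example records, for illustration only.
    records = [
        ("source_a", date(2022, 3, 1), "NATO met today"),
        ("source_a", date(2022, 3, 15), "The North Atlantic Alliance and NATO"),
        ("source_b", date(2022, 4, 2), "No mention here"),
    ]
    for key, value in sorted(monthly_frequency(records).items()):
        print(key, round(value, 1))
```

In the invented example, `source_a` has 3 mentions in 9 words in March 2022 (about 3333.3 per 10,000 words), while `source_b` has none in April 2022. Plotting these values month by month for each source gives one line per source.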