Publications
List of all publications, including workshop and conference papers, journal articles, and a thesis.
2024
- ICMR: Identification of Speaker Roles and Situation Types in News Videos. Gullal S. Cheema, Judi Arafat, Chiao-I Tseng, and 3 more authors. 2024
The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive public dataset is available. In this paper, we propose novel approaches to classify speaker roles (e.g., "anchor," "reporter," "expert") and categorise scenes into news situations (e.g., "report," "interview") in news videos, to enhance the understanding of news content. To bridge the gap of missing datasets, we present a novel annotated dataset for various speaker roles and news situations from diverse (national) media outlets. Furthermore, we suggest a rich set of features and employ aggregation and post-processing techniques. In our experiments, we compare classifiers like Random Forest and XGBoost for identifying speaker roles and news situations in video segments. Our approach outperforms recent state-of-the-art methods, including an end-to-end multimodal deep network and unimodal transformer-based models. Through detailed feature combination analysis, generalisation and explainability insights, we underscore our models’ capabilities and set new directions for future research.
- ICMR: Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict. Sherzod Hakimov and Gullal S. Cheema. 2024
The ongoing Russo-Ukrainian conflict has been a subject of intense media coverage worldwide. Understanding the global narrative surrounding this topic is crucial for researchers that aim to gain insights into its multifaceted dimensions. In this paper, we present a novel multimedia dataset that focuses on this topic by collecting and processing tweets posted by news or media companies on social media across the globe. We collected tweets from February 2022 to May 2023 to acquire approximately 1.5 million tweets in 60 different languages along with their images. Each entry in the dataset is accompanied by processed tags, allowing for the identification of entities, stances, textual or visual concepts, and sentiment. The availability of this multimedia dataset serves as a valuable resource for researchers aiming to investigate the global narrative surrounding the ongoing conflict from various aspects such as who are the prominent entities involved, what stances are taken, where do these stances originate from, how are the different textual and visual concepts related to the event portrayed.
- SCSMI: The search for filmic narrative strategies in audiovisual news reporting: a progress report. Chiao-I Tseng, John A. Bateman, Leandra Thiele, and 5 more authors. 2024
Audiovisual news reporting is now documented to involve many filmic techniques that bring news reporting ever closer to audiovisual storytelling. At SCSMI-2022 we introduced our FakeNarratives project, which undertakes a contrastive cataloguing of filmic narrative strategies in both mainstream and alternative news media to support the location of potentially problematic messaging. We now discuss the progress that has been made towards automating the recognition of filmic structures using diverse computational techniques for audiovisual processing. Results are maintained in a searchable richly annotated graph structure, allowing us to define narrative patterns in terms of formal combinations of filmic features present in the graph. By these means, we increase the scale of data on which filmic narrative patterns can be derived, empirically validated, and productively visualised. As the analytic framework is oriented to audiovisual material in general, we also show how aspects of the account may contribute to film research more broadly.
2023
- CLEF: Overview of the CLEF-2023 CheckThat! Lab Task 1 on Check-Worthiness in Multimodal and Multigenre Content. Firoj Alam, Alberto Barrón-Cedeño, Gullal S. Cheema, and 8 more authors. 2023
We present an overview of CheckThat! Lab’s 2023 Task 1, which is part of CLEF-2023. Task 1 asks to determine whether a text item, or a text coupled with an image, is check-worthy. This task places a special emphasis on COVID-19, political debates and transcriptions, and it is conducted in three languages: Arabic, English, and Spanish. A total of 15 teams participated, and most submissions managed to achieve significant improvements over the baselines using Transformer-based models. Out of these, seven teams participated in the multimodal subtask (1A), and 12 teams participated in the multigenre subtask (1B), collectively submitting 155 official runs for both subtasks. Across both subtasks, approaches that targeted multiple languages, either individually or in conjunction, generally achieved the best performance. We provide a description of the dataset and the task setup, including the evaluation settings, and we briefly overview the participating systems. As is customary in the CheckThat! lab, we have released all datasets from the lab as well as the evaluation scripts to the research community. This will enable further research on finding relevant check-worthy content that can assist various stakeholders such as fact-checkers, journalists, and policymakers.
- CLEF: Overview of the CLEF-2023 CheckThat! Lab on Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority of News Articles and Their Source. Alberto Barrón-Cedeño, Firoj Alam, Andrea Galassi, and 13 more authors. 2023
We describe the sixth edition of the CheckThat! lab, part of the 2023 Conference and Labs of the Evaluation Forum (CLEF). The five previous editions of CheckThat! focused on the main tasks of the information verification pipeline: check-worthiness, verifying whether a claim was fact-checked before, supporting evidence retrieval, and claim verification. In this sixth edition, we zoom into some new problems and for the first time we offer five tasks in seven languages: Arabic, Dutch, English, German, Italian, Spanish, and Turkish. Task 1 asks to determine whether an item —text or text plus image— is check-worthy. Task 2 aims to predict whether a sentence from a news article is subjective or not. Task 3 asks to assess the political bias of the news at the article and at the media outlet level. Task 4 focuses on the factuality of reporting of news media. Finally, Task 5 looks at identifying authorities in Twitter that could help verify a given target claim. For a second year, CheckThat! was the most popular lab at CLEF-2023 in terms of team registrations: 127 teams. About one-third of them (a total of 37) actually participated.
- Frontiers: Understanding Image-Text Relations and News Values for Multimodal News Analysis. Gullal S. Cheema, Sherzod Hakimov, Eric Müller-Budack, and 3 more authors. Frontiers in Artificial Intelligence, 2023
The analysis of news dissemination is of utmost importance since the credibility of information and the identification of disinformation and misinformation affect society as a whole. Given the large amounts of news data published daily on the Web, the empirical analysis of news with regard to research questions and the detection of problematic news content on the Web require computational methods that work at scale. Today’s online news is typically disseminated in a multimodal form, including various presentation modalities such as text, image, audio, and video. Recent developments in multimodal machine learning now make it possible to capture basic “descriptive” relations between modalities, such as correspondences between words and phrases, on the one hand, and corresponding visual depictions of the verbally expressed information on the other. Although such advances have enabled tremendous progress in tasks like image captioning, text-to-image generation and visual question answering, in domains such as news dissemination, there is a need to go further. In this paper, we introduce a novel framework for the computational analysis of multimodal news. We motivate a set of more complex image-text relations as well as multimodal news values based on real examples of news reports and consider their realization by computational approaches. To this end, we provide (a) an overview of existing literature from semiotics where detailed proposals have been made for taxonomies covering diverse image-text relations generalisable to any domain; (b) an overview of computational work that derives models of image-text relations from data; and (c) an overview of a particular class of news-centric attributes developed in journalism studies called news values. The result is a novel framework for multimodal news analysis that closes existing gaps in previous work while maintaining and combining the strengths of those accounts.
We assess and discuss the elements of the framework with real-world examples and use cases, setting out research directions at the intersection of multimodal learning, multimodal analytics and computational social sciences that can benefit from our approach.
- ECIR: The CLEF-2023 CheckThat! Lab: Checkworthiness, Subjectivity, Political Bias, Factuality, and Authority. Alberto Barrón-Cedeño, Firoj Alam, Tommaso Caselli, and 8 more authors. 2023
The five editions of the CheckThat! lab so far have focused on the main tasks of the information verification pipeline: check-worthiness, evidence retrieval and pairing, and verification. The 2023 edition of the lab zooms into some of the problems and—for the first time—it offers five tasks in seven languages (Arabic, Dutch, English, German, Italian, Spanish, and Turkish): Task 1 asks to determine whether an item, text or a text plus an image, is check-worthy; Task 2 requires to assess whether a text snippet is subjective or not; Task 3 looks for estimating the political bias of a document or a news outlet; Task 4 requires to determine the level of factuality of a document or a news outlet; and Task 5 is about identifying authorities that should be trusted to verify a contended claim.
2022
- IJDH: Combining Sentiment Analysis classifiers to explore multilingual news articles covering London 2012 and Rio 2016 olympics. Caio Mello, Gullal S. Cheema, and Gaurish Thakkar. International Journal of Digital Humanities, 2022
This study presents an approach to the challenges of applying Sentiment Analysis (SA) to news articles in a multilingual corpus. It looks at the use and combination of multiple algorithms to explore news articles published in English and Portuguese. It presents a methodology that starts by evaluating and combining four SA algorithms (SenticNet, SentiStrength, Vader and BERT, with BERT trained on two datasets) to improve the quality of outputs. A thorough review of the algorithms’ limitations is conducted using SHAP, an explainable AI tool, resulting in a list of issues that researchers must consider before using SA to interpret texts. We propose a combination of the three best classifiers (Vader, Amazon BERT and Sent140 BERT) to identify contradictory results, improving the quality of the positive, neutral and negative labels assigned to the texts. Challenges with translation are addressed, indicating possible solutions for non-English corpora. As a case study, the method is applied to the study of the media coverage of London 2012 and Rio 2016 Olympic legacies. The combination of different classifiers has proved to be efficient, revealing the imbalance between the media coverage of London 2012, which was much more positive, and Rio 2016, which was more negative.
- NAACL: MM-Claims: A Dataset for Multimodal Claim Detection in Social Media. Gullal S. Cheema, Sherzod Hakimov, Abdul Sittar, and 3 more authors. Jul 2022
In recent years, the problem of misinformation on the web has become widespread across languages, countries, and various social media platforms. Although there has been much work on automated fake news detection, the role of images and their variety are not well explored. In this paper, we investigate the roles of image and text at an earlier stage of the fake news detection pipeline, called claim detection. For this purpose, we introduce a novel dataset, MM-Claims, which consists of tweets and corresponding images over three topics: COVID-19, Climate Change and broadly Technology. The dataset contains roughly 86000 tweets, out of which 3400 are labeled manually by multiple annotators for the training and evaluation of multimodal models. We describe the dataset in detail, evaluate strong unimodal and multimodal baselines, and analyze the potential and drawbacks of current models.
- SemEval @ NAACL: TIB-VA at SemEval-2022 Task 5: A Multimodal Architecture for the Detection and Classification of Misogynous Memes. Sherzod Hakimov, Gullal S. Cheema, and Ralph Ewerth. Jul 2022
The detection of offensive, hateful content on social media is a challenging problem that affects many online users on a daily basis. Hateful content is often used to target a group of people based on ethnicity, gender, religion and other factors. The hate or contempt toward women has been increasing on social platforms. Misogynous content detection is especially challenging when textual and visual modalities are combined to form a single context, e.g., an overlay text embedded on top of an image, also known as a meme. In this paper, we present a multimodal architecture that combines textual and visual features to detect misogynous memes. The proposed architecture is evaluated in the SemEval-2022 Task 5: MAMI - Multimedia Automatic Misogyny Identification challenge under the team name TIB-VA. We obtained the best result in Task-B, where the challenge is to classify whether a given document is misogynous and further identify the following sub-classes: shaming, stereotype, objectification, and violence.
2021
- CLEOPATRA @ WWW: On the Role of Images for Analyzing Claims in Social Media. Gullal S. Cheema, Sherzod Hakimov, Eric Müller-Budack, and 1 more author. Jul 2021
Fake news is a severe problem in social media. In this paper, we present an empirical study on visual, textual, and multimodal models for the tasks of claim, claim check-worthiness, and conspiracy detection, all of which are related to fake news detection. Recent work suggests that images are more influential than text and often appear alongside fake text. To this end, several multimodal models have been proposed in recent years that use images along with text to detect fake news on social media sites like Twitter. However, the role of images is not well understood for claim detection, specifically using transformer-based textual and multimodal models. We investigate state-of-the-art models for images, text (Transformer-based), and multimodal information for four different datasets across two languages to understand the role of images in the task of claim and conspiracy detection.
- MMPT @ ICMR: A fair and comprehensive comparison of multimodal tweet sentiment analysis methods. Gullal S. Cheema, Sherzod Hakimov, Eric Müller-Budack, and 1 more author. Jul 2021
Opinion and sentiment analysis is a vital task to characterize subjective information in social media posts. In this paper, we present a comprehensive experimental evaluation and comparison with six state-of-the-art methods, one of which we have re-implemented. In addition, we investigate different textual and visual feature embeddings that cover different aspects of the content, as well as the recently introduced multimodal CLIP embeddings. Experimental results are presented for two different publicly available benchmark datasets of tweets and corresponding images. In contrast to the evaluation methodology of previous work, we introduce a reproducible and fair evaluation scheme to make results comparable. Finally, we conduct an error analysis to outline the limitations of the methods and possibilities for future work.
- CLEOPATRA @ WWW: OEKG: The Open Event Knowledge Graph. Simon Gottschalk, Endri Kacupaj, Sara Abdollahi, and 8 more authors. Jul 2021
Accessing and understanding contemporary and historical events of global impact such as the US elections and the Olympic Games is a major prerequisite for cross-lingual event analytics that investigate event causes, perception and consequences across country borders. In this paper, we present the Open Event Knowledge Graph (OEKG), a multilingual, event-centric, temporal knowledge graph composed of seven different data sets from multiple application domains, including question answering, entity recommendation and named entity recognition. These data sets are all integrated through an easy-to-use and robust pipeline and by linking to the event-centric knowledge graph EventKG. We describe their common schema and demonstrate the use of the OEKG at the example of three use cases: type-specific image retrieval, hybrid question answering over knowledge graphs and news articles, as well as language-specific event recommendation. The OEKG and its query endpoint are publicly available.
2020
- CheckThat @ CLEF: Check square at CheckThat! 2020: Claim Detection in Social Media via Fusion of Transformer and Syntactic Features. Gullal S. Cheema, Sherzod Hakimov, and Ralph Ewerth. Jul 2020
In this digital age of news consumption, a news reader has the ability to react, express and share opinions with others in a highly interactive and fast manner. As a consequence, fake news has made its way into our daily life because of very limited capacity to verify news on the Internet by large companies as well as individuals. In this paper, we focus on solving two problems which are part of the fact-checking ecosystem that can help to automate fact-checking of claims in an ever increasing stream of content on social media. For the first problem, claim check-worthiness prediction, we explore the fusion of syntactic features and deep transformer Bidirectional Encoder Representations from Transformers (BERT) embeddings, to classify check-worthiness of a tweet, i.e. whether it includes a claim or not. We conduct a detailed feature analysis and present our best performing models for English and Arabic tweets. For the second problem, claim retrieval, we explore the pre-trained embeddings from a Siamese network transformer model (sentence-transformers) specifically trained for semantic textual similarity, and perform KD-search to retrieve verified claims with respect to a query tweet.
- MediaEval: TIB’s Visual Analytics Group at MediaEval’20: Detecting Fake News on Corona Virus and 5G Conspiracy. Gullal S. Cheema, Sherzod Hakimov, and Ralph Ewerth. Jul 2020
Fake news on social media has become a hot topic of research as it negatively impacts the discourse of real news in the public. Specifically, the ongoing COVID-19 pandemic has seen a rise of inaccurate and misleading information due to the surrounding controversies and unknown details at the beginning of the pandemic. The FakeNews task at MediaEval 2020 tackles this problem by creating a challenge to automatically detect tweets containing misinformation based on text and structure from the Twitter follower network. In this paper, we present a simple approach that uses BERT embeddings and a shallow neural network for classifying tweets using only text, and discuss our findings and limitations of the approach in text-based misinformation detection.
- BIGMM: Semi-supervised clustering with neural networks. Ankita Shukla, Gullal S. Cheema, and Saket Anand. Jul 2020
Clustering using neural networks has recently demonstrated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learning or by their dependence on a large set of labeled data samples. In this paper, we propose ClusterNet, which uses pairwise semantic constraints from very few labeled data samples (<5% of total data) and exploits the abundant unlabeled data to drive the clustering approach. We define a new loss function that uses pairwise semantic similarity between objects combined with constrained k-means clustering to efficiently utilize both labeled and unlabeled data in the same framework. The proposed network uses a convolutional autoencoder to learn a latent representation that groups data into k specified clusters, while also learning the cluster centers simultaneously. We evaluate ClusterNet on several datasets and compare its performance with state-of-the-art deep clustering approaches.
2019
- PRICAI: Primate face identification in the wild. Ankita Shukla, Gullal S. Cheema, Saket Anand, and 2 more authors. Jul 2019
Ecological imbalance owing to rapid urbanization and deforestation has adversely affected the population of several wild animals. This loss of habitat has skewed the population of several non-human primate species like chimpanzees and macaques and has constrained them to co-exist in close proximity of human settlements, often leading to human-wildlife conflicts while competing for resources. For effective wildlife conservation and conflict management, regular monitoring of population and of conflicted regions is necessary. However, existing approaches like field visits for data collection and manual analysis by experts are resource intensive, tedious and time consuming, thus necessitating an automated, non-invasive, more efficient alternative like image based facial recognition. The challenge in individual identification arises due to unrelated factors like pose, lighting variations and occlusions due to the uncontrolled environments, which is further exacerbated by limited training data. Inspired by human perception, we propose to learn representations that are robust to such nuisance factors and capture the notion of similarity over the individual identity sub-manifolds. The proposed approach, Primate Face Identification (PFID), achieves this by training the network to distinguish between positive and negative pairs of images. The PFID loss augments the standard cross entropy loss with a pairwise loss to learn more discriminative and generalizable features, thus making it appropriate for other related identification tasks like open-set, closed set and verification. We report state-of-the-art accuracy on facial recognition of two primate species, rhesus macaques and chimpanzees under the four protocols of classification, verification, closed-set identification and open-set recognition.
- CVWC @ ICCV: A Hybrid Approach to Tiger Re-Identification. Ankita Shukla, Connor Anderson, Gullal S. Cheema, and 5 more authors. Jul 2019
Visual data analytics is increasingly becoming an important part of wildlife monitoring and conservation strategies. In this work, we discuss our solution to the image-based Amur tiger re-identification (Re-ID) challenge hosted by the CVWC Workshop at ICCV 2019. Various factors like poor quality images, lighting and pose variations, and limited images per identity make tiger Re-ID a difficult task for deep learning models. Consequently, we propose to utilize both deep learning and traditional SIFT descriptor-based matching for tiger re-identification. The proposed deep network is based on a DenseNet model, fine-tuned by minimizing a classification cross-entropy loss regularized by a pairwise KL-divergence loss that promotes better semantically discriminative features. We also utilize several data transformations to improve the model’s robustness and generalization across views and image quality variations. We establish the efficacy of our approach on the ‘Plain Re-ID’ challenge task by reporting results on the pre-cropped tiger Re-ID dataset. To further test our Re-ID model’s robustness to detection quality, we also report results on the ‘Wild Re-ID’ task, which incorporates learning a tiger detection model. We show that our model is able to perform well on both the plain and wild Re-ID tasks. Code will be available at https://github.com/FGVC/DelPro.
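The loss described in this abstract, a classification cross-entropy term regularized by a pairwise KL-divergence term, can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the symmetric form of the KL term, the margin applied to pairs of different identities, and the names `pair_loss`, `weight`, and `margin` are all assumptions, and the inputs are toy softmax outputs with strictly positive entries.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def kl(p, q):
    """KL divergence KL(p || q); assumes every entry of q is strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetrised KL divergence (an assumed choice for the pairwise term)."""
    return kl(p, q) + kl(q, p)

def pairwise_kl_penalty(p, q, same_identity, margin=2.0):
    """Pull same-identity predictions together; push different identities
    at least `margin` apart (hypothetical contrastive form)."""
    d = symmetric_kl(p, q)
    return d if same_identity else max(0.0, margin - d)

def pair_loss(p, q, label_p, label_q, weight=0.5, margin=2.0):
    """Cross-entropy on both images plus the weighted pairwise penalty."""
    ce = cross_entropy(p, label_p) + cross_entropy(q, label_q)
    return ce + weight * pairwise_kl_penalty(p, q, label_p == label_q, margin)

# Toy predicted class distributions for two images of the same identity (class 0).
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(pair_loss(p, q, 0, 0))
```

In this sketch `weight` trades off classification accuracy against agreement between paired predictions; the value used in the paper is not stated in the abstract.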
2017
- ECML PKDD: Automatic detection and recognition of individuals in patterned species. Gullal S. Cheema and Saket Anand. Jul 2017
Visual animal biometrics is rapidly gaining popularity as it enables a non-invasive and cost-effective approach for wildlife monitoring applications. Widespread usage of camera traps has led to large volumes of collected images, making manual processing of visual content hard to manage. In this work, we develop a framework for automatic detection and recognition of individuals in different patterned species like tigers, zebras and jaguars. Most existing systems primarily rely on manual input for localizing the animal, which does not scale well to large datasets. In order to automate the detection process while retaining robustness to blur, partial occlusion, illumination and pose variations, we use the recently proposed Faster-RCNN object detection framework to efficiently detect animals in images. We further extract features from AlexNet of the animal’s flank and train a logistic regression (or Linear SVM) classifier to recognize the individuals. We primarily test and evaluate our framework on a camera trap tiger image dataset that contains images that vary in overall image quality, animal pose, scale and lighting. We also evaluate our recognition system on zebra and jaguar images to show generalization to other patterned species. Our framework gives perfect detection results in camera trapped tiger images and a similar or better individual recognition performance when compared with state-of-the-art recognition techniques.
2016
- Master’s Thesis: Anisotropic mean shift clustering using distance metric learning. Gullal S. Cheema and Saket Anand. Jul 2016
Mean shift is a non-parametric mode seeking procedure widely used in many computer vision problems. Mean shift clustering in particular is a well studied and established algorithm, which has many merits over the classic k-means clustering algorithm. These algorithms repeatedly calculate the distance between data points to compute the mean shift vector and the cluster mean, respectively, using some distance function. In most cases, the Euclidean distance function is used, which weighs every dimension equally in the input space and thus often fails to capture the semantics of the data. To alleviate this problem, a general form of distance metric based on the Mahalanobis distance is used that can be learned from the training data. Distance metric learning has received a lot of attention in recent years and has proven to be very successful in various problem domains. By learning a Mahalanobis distance metric, the input space is transformed such that similar points get closer to each other and dissimilar points move further apart. A lot of research has been done on learning a global metric and integrating it with the k-means algorithm, but there have been very few efforts to integrate metric learning with mean shift clustering. This work focuses on developing a unified framework for improving mean shift clustering by using global and local metric learning. We use a recently proposed Sparse Compositional Metric Learning (SCML) framework and integrate it with mean shift clustering to investigate the effect of using local metrics over a global metric. We also perform kernelization in the proposed framework so that it can handle datasets with non-linear decision boundaries. To establish the effectiveness of our approach, we performed experiments on 6 datasets of varying difficulty.
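To make the role of the distance function concrete, the following minimal sketch runs mean shift with a Gaussian kernel under a fixed Mahalanobis metric. It is not the thesis's SCML-based method: the metric matrix here is hand-picked rather than learned, and the names `mean_shift_mode`, `bandwidth`, and `learned_like` are hypothetical.

```python
import math

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))

def mean_shift_mode(start, points, M, bandwidth=1.0, tol=1e-6, max_iter=200):
    """Repeat Gaussian-kernel mean shift updates from `start` until the shift is below `tol`."""
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-mahalanobis_sq(x, p, M) / (2 * bandwidth ** 2))
                   for p in points]
        total = sum(weights)
        new_x = [sum(w * p[k] for w, p in zip(weights, points)) / total
                 for k in range(len(x))]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if shift < tol:
            break
    return x

# Two groups that are elongated along the second axis and separated along the first.
points = [(0.0, 0.0), (0.2, 5.0), (-0.1, -4.0),
          (10.0, 0.5), (10.1, 4.0), (9.9, -5.0)]
euclidean = [[1.0, 0.0], [0.0, 1.0]]
learned_like = [[1.0, 0.0], [0.0, 0.01]]  # hypothetical metric down-weighting axis 2

for name, M in (("euclidean", euclidean), ("learned_like", learned_like)):
    modes = [mean_shift_mode(p, points, M) for p in points[:3]]
    print(name, [[round(c, 2) for c in m] for m in modes])
```

With the Euclidean metric the three left-hand points are too far apart along the second axis to share a mode, whereas the metric that down-weights that axis lets them converge to a common mode. This is the kind of behaviour a learned metric is meant to provide.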