Total Facebook Posts collected, of which 2498 were analyzed by our system.
182584887 retrieved
Total Twitter Posts collected, of which 5335 were analyzed by our system.
58932 entities detected
Total number of entities available on our system.
Methods
Nowadays, Facebook and Twitter, two of the largest social networks in the world, have more than 1 billion users and more than 340 million posts daily (Zephoria, 2017). Consequently, the amount of available data grows every day. With easy access through mobile devices, users update their profiles almost in real time. As a result, many relevant events are first reported in social network posts and only later in journalistic media. It is therefore necessary to detect this kind of information at an early stage.
In the REMINDS project, we will develop systems that analyze public information transmitted through social networks in order to automatically filter and show the information that is potentially relevant to a general audience.
Team & Partners
Álvaro Figueira
Principal Investigator
Pedro Ribeiro
Researcher
Luís Torgo
Researcher
Ana Alves
Researcher
Hugo Oliveira
Researcher
Luciana Oliveira
Social Media Discourse
Fernando Zamith
Online Journalism
Alexandre Pinto
Fellow - Entity Detection
Nuno Guimarães
Fellow - Sentiment Analysis
Filipe Batista
Fellow - Entity Detection
Filipe Miranda
Fellow - Automatic Relevance Detection
Diogo Cunha
Project Research Assistant - Web Developer
André Brandão
Project Research Assistant - System Maintenance and Web Developer
Paula Fortuna
Fellow - Feature Extraction and Text Mining
Miguel Sandim
Fellow - Feature Extraction and Text Mining
Catarina Costa
Project Research Assistant - Database Architect
Rui Pereira
Project Research Assistant - Crawling Engine
Gonçalo Paredes
Univ. Porto
Project Research Assistant - Social Network Analysis
Rui Encarnação
Univ. Coimbra
Fellow - Document Clustering
Luis Francisco-Revilla
Texas Advanced Computing Center
Researcher - Information Visualization
Matthew Lease
Univ. of Texas at Austin
Researcher - Machine Learning
Jacek Gwizdka
Univ. of Texas at Austin
Researcher - Human-Computer Interaction
Elham Khabiri
IBM - New York
Researcher - Social Media Analysis
Project: UTAP-ICDT/EEI-CTP/0022/2014
Period: 27-04-2015 to 10-11-2017
About
In the REMINDS project, we will develop systems that analyze public information transmitted through social networks in order to automatically filter and show the information that is potentially relevant to a general audience. Although social networks are a source of a tremendous amount of information, much of it is private (yet publicly accessible in most cases), personal, unimportant, or simply irrelevant to the general audience. Nevertheless, in recent years we have witnessed many important news stories and mass opinions on relevant issues being conveyed through social networks, often spreading faster than the coverage of the same events by traditional media.
The main issue that REMINDS tackles is therefore the creation of a system capable of detecting “relevant information” in the sea of social network data, while filtering out private comments and personal information. As Saracevic (2007) says, “relevance is a, if not even the, key notion in information science in general and information retrieval in particular”. For the author, relevance can take different manifestations in information science, such as “system or algorithmic”, “topical or subject”, “cognitive relevance or pertinence”, “situational relevance or utility” and “affective relevance”.
There is related work in the areas of event detection, influence, and detection of controversial topics. For instance, Diakopoulos (2010) studied the polarity of the opinions given by Twitter users. Based on that polarity and on the identified events, the authors were able to characterize the general feeling of the opinions. They also used the “Pearson correlation” between positive and negative responses to measure the degree of controversy of the discussed topics, identifying strong oppositions in the opinions. Thelwall (2010) studied sentiment polarity in the MySpace social network and found that nearly 2/3 of its users express emotions. Gomez-Rodriguez (2012) defined an information cascade model and developed an algorithm capable of inferring networks of influence and diffusion for propagated topics. Earlier, Leskovec (2006) had studied information cascades, i.e. the propagation of actions or ideas due to the influence of others. More recently, Bakshy (2011) quantified the relative influence of users on Twitter, finding a correlation between the largest cascades and the most influential users, as well as between the number of followers and past local influence.
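The controversy measure attributed to Diakopoulos (2010) above relies on the Pearson correlation between positive and negative responses. A minimal sketch of that coefficient follows, assuming the responses have already been counted per time interval into two equal-length series; the function name `pearson` and the per-interval setup are illustrative assumptions, not the cited authors' code.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series.

    Assumed usage: xs = positive responses per interval, ys = negative responses
    per interval (a hypothetical setup for illustration).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treated here as "no correlation"
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson([3, 5, 8, 12], [2, 4, 9, 11])` returns roughly 0.97: positive and negative reactions rise together over the same intervals. Reading such a pattern as a hint that both camps react to the same moments (and hence as a controversy signal) is our assumption; the cited work may interpret the coefficient differently.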
It is interesting to note that, although the notion of relevance is about 80 years old and many attempts have been made to define its contributing factors, no conclusive results have been reached and the debate continues. As Xu and Chen (2006) wrote: “...there is no agreement on factors beyond topicality, neither in terms of what they should be nor of how important they are… Naturalistic inquiry with qualitative research methods has been advocated and adopted by many researchers… [yet] almost no study of relevance judgment had adopted a confirmatory approach.” The REMINDS team has experience in text mining, information retrieval and community detection.
A system from our previous project, “Breadcrumbs”, already allows us to automatically detect, in news fragments, the answers to three standard journalistic questions: Who? Where? When? Our team also has experience in sentiment analysis (important to understand whether a topic is polemic or controversial) and in ranking comments on the social web. Our partner company in this research, INTERRELATE, is a startup whose business is “mining, interrelating, sensing and analyzing online information”. We therefore propose the creation and analysis of an automatic relevance detection system.
Our methodology for detecting relevance will be based on two standard and realistic, yet technologically new, approaches, plus a third, “speculative” one. The first approach will test for irrelevance (and, if that test fails, conclude that the item has some degree of relevance); the second will use journalistic factors to assess relevance. Finally, the third will try to find correlation and causality between interaction patterns and relevance in social networks. These two “and a half” approaches will then be compared against a “gold-standard” model of relevance, both to validate the system and to test the relative importance of the factors used in these models.
The results will inform the weights and aggregation functions that combine the outputs of the system into a single ranking of relevant fragments of information. As a result, we will create a model of relevance that embodies a better understanding of how people make relevance decisions and that enables automatic relevance predictions at a large scale.
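The aggregation step described above can be illustrated with a small sketch. It assumes each fragment already carries one score per approach (irrelevance test, journalistic factors, interaction patterns) and combines them with a weighted sum; the `Fragment` fields, the linear aggregation, the default weights and the 0.9 cutoff are all hypothetical placeholders, since in the project the weights would be informed by the comparison with the gold-standard model.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Fragment:
    """A social network item with precomputed scores (all fields hypothetical)."""
    text: str
    irrelevance_prob: float    # approach 1: estimated probability the item is irrelevant
    journalistic_score: float  # approach 2: journalistic-factor score in [0, 1]
    interaction_score: float   # approach 3: interaction-pattern score in [0, 1]


def relevance_score(f: Fragment, weights: Tuple[float, float, float]) -> float:
    """Weighted linear combination of the three approaches (one possible aggregation)."""
    w_irr, w_jour, w_int = weights
    return (w_irr * (1.0 - f.irrelevance_prob)
            + w_jour * f.journalistic_score
            + w_int * f.interaction_score)


def rank(fragments: Sequence[Fragment],
         weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),
         irrelevance_cutoff: float = 0.9) -> List[Fragment]:
    """Drop items the irrelevance test is confident about, then sort the rest by score."""
    candidates = [f for f in fragments if f.irrelevance_prob < irrelevance_cutoff]
    return sorted(candidates, key=lambda f: relevance_score(f, weights), reverse=True)
```

A linear combination is only the simplest choice; the same `rank` structure would accept any other aggregation function once the gold-standard comparison suggests which factors matter most.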
Features
Post and Comment Sentiment - Our sentiment features are provided by 14 state-of-the-art systems capable of detecting positive, negative and neutral sentiment. Because they combine lexical rules with a large set of sentiment dictionaries, the extracted features are highly reliable and essential for the relevance detection system.
Entity Sentiment - The temporal sentiment of each entity is calculated using the same systems and two different sources.
News Entity Sentiment - By extracting all daily news items that refer to a given entity and classifying them with our ensemble sentiment system, we obtain a temporal news sentiment for each entity.
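The feature descriptions above mention an ensemble of sentiment systems and a per-entity temporal sentiment, without detailing how they are combined. The sketch below shows one plausible reading: every system labels a text, a simple majority vote yields the ensemble label, and labels are mapped to scores and averaged per entity per day. Majority voting, the tie-to-neutral rule, the -1/0/+1 score mapping and the function names are our assumptions; the 14 actual systems are not described here, so any callables returning "positive", "negative" or "neutral" stand in for them.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Label = str  # "positive", "negative" or "neutral"
SCORE = {"positive": 1, "neutral": 0, "negative": -1}  # assumed numeric mapping


def ensemble_label(text: str, classifiers: List[Callable[[str], Label]]) -> Label:
    """Majority vote over the individual sentiment systems; ties fall back to neutral."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = Counter(clf(text) for clf in classifiers)
    (top, top_n), *rest = votes.most_common()
    if rest and rest[0][1] == top_n:  # two or more labels share the highest count
        return "neutral"
    return top


def daily_entity_sentiment(news: Iterable[Tuple[date, str, List[str]]],
                           classifiers: List[Callable[[str], Label]]
                           ) -> Dict[str, Dict[date, float]]:
    """Average ensemble score per entity per day, from (day, text, entities) items."""
    buckets: Dict[str, Dict[date, List[int]]] = defaultdict(lambda: defaultdict(list))
    for day, text, entities in news:
        score = SCORE[ensemble_label(text, classifiers)]
        for entity in entities:
            buckets[entity][day].append(score)
    # Sort days so each entity's series reads chronologically.
    return {e: {d: mean(s) for d, s in sorted(days.items())}
            for e, days in buckets.items()}
```

With toy keyword classifiers, two news items about the same entity on the same day, one positive and one negative, average to 0 for that day, while a single positive item on the next day yields 1.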
Future Goals
The main contribution of the REMINDS project is to provide a system with a user-friendly interface and good filtering and analysis tools for both Facebook and Twitter. In parallel, we also implemented an intelligent server to provide fast, reliable communication between our system and its users. Several further directions can be explored, including the usability of our interface, the display of results, and performance. In future work, we will focus on testing our interface with a group of users; their feedback will guide improvements that make the system more intuitive and easier to use.