SIMBig Conference 2021

Wednesday December 1st

Time

Author(s)

Presentation

9h00 - 9h20

Welcome to SIMBig 2021

Data mining and Applications

9h20 - 9h40

Juan Ignacio Porta, Martin Ariel Dominguez and Francisco Tamarit

Automatic data imputation in time series processing using neural networks for industry and medical datasets

9h40 - 10h00

Carlos Gamboa-Venegas, Steffan Gómez-Campos and Esteban Meneses

Calibration of traffic simulations using simulated annealing and GPS navigation records

10h00 - 10h45

Keynote Speaker: Andrei Broder

Title: The Web Advertising Ecosystem

Abstract:

The World Wide Web is arguably an engineering artifact and social environment that defines our era. A large part of it is made possible by money generated via advertising. The goal of this talk is to give an introduction to the web advertising ecosystem and illuminate the complex relations between consumers, publishers, and advertisers.

10h45 - 11h05

Adrian Ulloa, Soledad Espezua, Julio Villavicencio, Oscar Miranda and Edwin Villanueva

Predicting daily trends in the Lima Stock Exchange General Index using economic indicators and financial news sentiments

11h05 - 11h25

Miguel Nunez-Del-Prado and Leibnitz Rojas-Bustamante

Government Public Services Presence Index based on Open Data

11h25 - 11h45

Edwin Alvarez Mamani, José Luis Soncco Álvarez and Harley Vera Olivera

Clustering Analysis for Traffic Jam Detection for Intelligent Transportation System

PAUSE

Machine Learning and Deep Learning

14h10 - 14h30

Eya Hammami and Rim Faiz

A Study of Dynamic Convolutional Neural Network Technique for SCOTUS legal opinions data classification

14h30 - 15h15

Keynote Speaker: Jiawei Han

Title: From Unstructured Text Data to Structured Knowledge: A Data-Driven Approach

Abstract:

Real-world big data are largely dynamic, interconnected, unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data. Such approaches, however, are not scalable. We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with pretrained language models and text embedding methods, it is promising to transform unstructured data into structured knowledge. In this talk, we introduce a set of methods developed recently in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach could be promising for transforming massive text data into structured knowledge.

15h15 - 15h35

Alonso Puente and Marks Calderon

Hydra: Funding state prediction for Kickstarter Technology projects using a Multimodal Deep Learning

15h35 - 15h55

Naomi Rohrbaugh and Edgar Ceh-Varela

Composite recommendations with heterogeneous graphs

15h55 - 16h15

Gianfranco Campos, Alessandro Morales, Arturo Flores and Jorge Gelso

Energy Efficiency Using IOTA Tangle for Greenhouse Agriculture

16h30 - 17h15

Keynote Speaker: Jian Pei

Title: Towards Trustworthy Data Science: Interpretability, Fairness and Marketplaces

Abstract:

We believe data science and AI will change the world. No matter how smart and powerful an AI model we can build, the ultimate testimony of the success of data science and AI is users’ trust. How can we build trustworthy data science? At the level of user-model interaction, how can we convince users that a data analytic result is trustworthy? At the level of group-wise collaboration for data science and AI, how can we ensure that the parties and their contributions are recognized fairly, and establish trust between the outcome (e.g., a model built) of the group collaboration and the external users? At the level of data science participant eco-systems, how can we effectively and efficiently connect many participants of various roles and facilitate the connection among supplies and demands of data and models?
In this talk, I will brainstorm possible directions to the above questions in the context of an end-to-end data science pipeline. To strengthen trustworthy interactions between models and users, I will advocate exact and consistent interpretation of machine learning models. Our recent results show that exact and consistent interpretations are not just theoretically feasible, but also practical even for API-based AI services. To build trust in collaboration among multiple participants in a coalition, I will review some progress in ensuring fairness in federated learning, including fair assessment of contributions and fairness enforcement in collaboration outcomes. Last, to address the need for trustworthy data science eco-systems, I will review some of the latest efforts in building data and model marketplaces and preserving fairness and privacy. Through reflection I will discuss some challenges and opportunities in building trustworthy data science for possible future work.

Thursday December 2nd

Time

Author(s)

Presentation

Data-Driven Software Engineering

9h20 - 9h40

Airton Huaman, Marco Huancahuari and Lenis Wong

Multi-phase model based on K-means and Ant Colony Optimization to solve the capacitated vehicle routing problem with time windows

9h40 - 10h00

Geraldine Puntillo, Alonso Salazar and Lenis Wong

Enterprise architecture based on TOGAF for the adaptation of educational institutions to e-learning using the DLPCA methodology and Google Classroom

10h00 - 10h45

Keynote Speaker: Jean Vanderdonckt

Title: Dimension reduction by model-based approaches: application to gesture recognition

Abstract:

Machine learning algorithms used for 2D/3D gesture recognition typically require a large training set of templates having many dimensions, depending on the sensor used. Instead of applying classical methods for reducing the dimensionality of these templates, we propose relying on a model-based approach where the problem is first mathematically described and then submitted to machine learning algorithms.

10h45 - 11h05

Pablo Del Aguila, Dante Roque and Lenis Wong

Mobile app quality model based on SQuaRE and AHP

Health, NLP, and Social Media

11h05 - 11h25

Tereza Yallico Arias and Junior Fabian

Automatic detection of levels of violence against women with Natural Language Processing using Machine Learning and Deep Learning techniques

11h25 - 11h45

Taghreed Tarmom, Eric Atwell and Mohammad Alsalka

Deep Learning vs Compression-Based vs Traditional Machine Learning Classifiers to Detect Hadith Authenticity

11h45 - 12h05

Randa Zarnoufi and Mounia Abik

Classical Machine Learning vs Deep Learning for Detecting Cyber-Violence in Social Media

PAUSE

14h10 - 14h30

Nuhu Ibrahim and Riza Batista-Navarro

Automatic Detection of Deaths from Social Networking Sites

14h30 - 15h15

Keynote Speaker: Marinka Zitnik

Title: Infusing Structure and Knowledge Into Biomedical AI

Abstract:

Grand challenges in biology and medicine often lack annotated examples and require generalization to entirely new scenarios not seen during training. However, standard supervised learning is incredibly limited in scenarios such as designing novel medicines, modeling emerging pathogens, and treating rare diseases. In this talk, I present our efforts to overcome these obstacles by infusing structure and knowledge into learning algorithms. First, I will present general-purpose and scalable algorithms for few-shot learning on graphs. At the core is the notion of local subgraphs that transfer knowledge from one task to another, even when only a handful of labeled examples are available. This principle is theoretically justified, as we show that the evidence for predictions can be found in subgraphs surrounding the targets. I will conclude with applications in drug development and precision medicine where the algorithmic predictions were validated in human cells and led to the discovery of a new class of drugs.

15h15 - 15h35

Camila Mantilla-Saavedra and Juan Gutiérrez-Cárdenas

Model comparison for the classification of comments containing suicidal traits from Reddit via NLP and Supervised Learning

15h35 - 15h55

Syed Mehtab Alam, Elena Arsevska, Mathieu Roche and Maguelonne Teisseire

A data-driven score model to assess online news articles in Event-based surveillance system

15h55 - 16h15

Tomonari Masada

AmLDA: A Non-VAE Neural Topic Model

16h15 - 17h00

Keynote Speaker: Francisco Pereira

Title: Revealing interpretable object representations from human behaviour

Abstract:

Objects can be characterized according to a vast number of possible criteria (e.g. animacy, shape, color, function), but some dimensions are more useful than others for making sense of the objects around us. In this talk, I will describe an ongoing effort by our collaborators to collect a behavioral dataset of millions of odd-one-out similarity judgements on thousands of objects, and a new approach to identify the "core dimensions" of object representations used in those judgements. Our approach models each object as a sparse, non-negative embedding, and judgements as a function of the similarity of those embeddings. The resulting model predicts subject behaviour on test data, as well as the fine-grained structure of object similarity. The dimensions of the embedding space are coherently interpretable by test subjects, and reflect degrees of taxonomic membership, functionality, and perceptual or structural attributes, among other characteristics. Further, naive subjects can accurately rate objects along these dimensions, without training. Collectively, these results demonstrate that human similarity judgements can be captured by a fairly low-dimensional, interpretable embedding that generalizes to external behaviour.

17h00 - 17h20

Asma Aldrees, Cherie Poland and Syeda Arzoo Irshad

Auditing Algorithms: Determining Ethical Parameters of Algorithmic Decision-Making Systems in Healthcare

17h20 - 17h35

Moises Meza, Willian Araujo, and Jesus Alvarado

Bibliometric analysis using Spark and HPC techniques to search of potential inhibitors targeting SARS-CoV-2 Main Protease

Friday December 3rd

Time

Author(s)

Presentation

Image Processing

9h05 - 9h20

Ibrahim Shehzad, Adeel Zafar, Zahir Shah, and Zilli Huma

Breast Cancer CT-Scan Image Classification Using Transfer Learning

9h20 - 9h40

Alejandra Valeria Lucero Burbano, Sherald Damian Noboa Chavez and Manuel Eugenio Morocho Cayamcela

Plant Disease Classification and Severity Estimation: A Comparative Study of Multitask Convolutional Neural Networks and First Order Optimizers

9h40 - 10h00

Diego Hernán Suntaxi Domínguez, Oscar Vicente Guarnizo Cabezas, Jonnathan Fabricio Crespo Yaguana, Samantha Carolina Quintanchala Sandoval, Israel Gustavo Pineda Arias and Manuel Eugenio Morocho Cayamcela

Deep Learning and Computer Vision in Smart Agriculture: Datasets, Models, and Applications

10h00 - 10h20

Carla Rucoba, Efrain Ramos and Juan Gutierrez-Cardenas

Crack detection in oil paintings using morphological filters and K-SVD algorithm

10h20 - 10h40

Filomen Incahuanaco Quispe, Edward Hinojosa Cardenas, Denis Pilares Figueroa and Cesar Beltrán Castañón

CoffeeSE: Interpretable transfer learning method for estimating the severity of coffee rust

10h40 - 11h00

Joel Cabrera and Edwin Villanueva

Investigating generative neural-network models for building pest insect detectors in sticky trap images for the Peruvian horticulture

Semantic and Machine Learning

11h00 - 11h45

Keynote Speaker: Vipin Kumar

Title: Big data in water: Opportunities and challenges for machine learning

Abstract:

Water resources worldwide are coming under stress due to increasing demand from a growing population, increasing pollution, and depleting or uncertain supplies due to a changing climate in which droughts and floods have both become more frequent. As domains associated with water continue to experience tremendous data growth from models, sensors, and satellites, there is an unprecedented opportunity for machine learning to help address urgent water challenges facing humanity. This talk will examine the role that big data and machine learning can play in advancing water science, the challenges faced by traditional machine learning methods in addressing the domain of water, and some early successes.

11h45 - 12h00

David Gatta, Kilian Hinteregger and Anna Fensel

Making Licensing of Content and Data Explicit with Semantics and Blockchain

12h00 - 12h45

Keynote Speaker: Natasha Noy

Title: Google Dataset Search: Building an open ecosystem for dataset discovery

Abstract:

There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others’ work, and providing data journalists easier access to information and its provenance. This talk will discuss Dataset Search by Google, which provides search capabilities over potentially all dataset repositories on the Web. We will talk about the open ecosystem for describing datasets that we hope to encourage.

12h45 - 13h05

Chetraj Pandey, Rafal Angryk and Berkay Aydin

Deep Neural Networks based Solar Flare Prediction using Compressed Full-disk Line-of-sight Magnetograms

13h05 - 13h25

David Gatta, Kilian Hinteregger and Anna Fensel

Prediction Of Soil Saturated Electrical Conductivity By Statistical Learning

13h25 - 13h35

Closing SIMBig 2021

Ph.D. in Computer Science

Universidad de Ingeniería y Tecnología - UTEC

Lima, PERU

Program

US Eastern time (New York = Peru time): UTC−05:00

Contacts

Juan Antonio Lossio-Ventura

Hugo Alatrista-Salas