SIMBIG
Time | Author(s) | Presentation |
9h00 - 9h20 | Welcome to SIMBig 2021 | |
Data mining and Applications |
||
9h20 - 9h40 | Juan Ignacio Porta, Martin Ariel Dominguez and Francisco Tamarit | Automatic data imputation in time series processing using neural networks for industry and medical datasets |
9h40 - 10h00 | Carlos Gamboa-Venegas, Steffan Gómez-Campos and Esteban Meneses | Calibration of traffic simulations using simulated annealing and GPS navigation records |
10h00 - 10h45 | Keynote Speaker: Andrei Broder | Title: The Web Advertising Ecosystem. Abstract: The World Wide Web is arguably an engineering artifact and social environment that defines our era. A large part of it is made possible by money generated via advertising. The goal of this talk is to give an introduction to the web advertising ecosystem and illuminate the complex relations between consumers, publishers, and advertisers. |
10h45 - 11h05 | Adrian Ulloa, Soledad Espezua, Julio Villavicencio, Oscar Miranda and Edwin Villanueva | Predicting daily trends in the Lima Stock Exchange General Index using economic indicators and financial news sentiments |
11h05 - 11h25 | Miguel Nunez-Del-Prado and Leibnitz Rojas-Bustamante | Government Public Services Presence Index based on Open Data |
11h25 - 11h45 | Edwin Alvarez Mamani, José Luis Soncco Álvarez and Harley Vera Olivera | Clustering Analysis for Traffic Jam Detection for Intelligent Transportation System |
PAUSE | ||
Machine Learning and Deep Learning |
||
14h10 - 14h30 | Eya Hammami and Rim Faiz | A Study of Dynamic Convolutional Neural Network Technique for SCOTUS legal opinions data classification |
14h30 - 15h15 | Keynote Speaker: Jiawei Han | Title: From Unstructured Text Data to Structured Knowledge: A Data-Driven Approach. Abstract: Real-world big data is largely dynamic, interconnected, and unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data. Such approaches, however, are not scalable. We envision that massive text data itself may disclose a large body of hidden structures and knowledge. Equipped with pretrained language models and text embedding methods, it is promising to transform unstructured data into structured knowledge. In this talk, we introduce a set of methods developed recently in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach could be promising for transforming massive text data into structured knowledge. |
15h15 - 15h35 | Alonso Puente and Marks Calderon | Hydra: Funding state prediction for Kickstarter Technology projects using a Multimodal Deep Learning |
15h35 - 15h55 | Naomi Rohrbaugh and Edgar Ceh-Varela | Composite recommendations with heterogeneous graphs |
15h55 - 16h15 | Gianfranco Campos, Alessandro Morales, Arturo Flores and Jorge Gelso | Energy Efficiency Using IOTA Tangle for Greenhouse Agriculture |
16h30 - 17h15 | Keynote Speaker: Jian Pei | Title: Towards Trustworthy Data Science: Interpretability, Fairness and Marketplaces. Abstract: We believe data science and AI will change the world. No matter how smart and powerful an AI model we can build, the ultimate testimony of the success of data science and AI is users’ trust. How can we build trustworthy data science? At the level of user-model interaction, how can we convince users that a data analytic result is trustworthy? At the level of group-wise collaboration for data science and AI, how can we ensure that the parties and their contributions are recognized fairly, and establish trust between the outcome (e.g., a model built) of the group collaboration and the external users? At the level of data science participant eco-systems, how can we effectively and efficiently connect many participants of various roles and facilitate the connection among supplies and demands of data and models? |
Time | Author(s) | Presentation |
Data-Driven Software Engineering |
||
9h20 - 9h40 | Airton Huaman, Marco Huancahuari and Lenis Wong | Multi-phase model based on K-means and Ant Colony Optimization to solve the capacitated vehicle routing problem with time windows |
9h40 - 10h00 | Geraldine Puntillo, Alonso Salazar and Lenis Wong | Enterprise architecture based on TOGAF for the adaptation of educational institutions to e-learning using the DLPCA methodology and Google Classroom |
10h00 - 10h45 | Keynote Speaker: Jean Vanderdonckt | Title: Dimension reduction by model-based approaches: application to gesture recognition. Abstract: Machine learning algorithms used for 2D/3D gesture recognition typically require a large training set of templates having many dimensions, depending on the sensor used. Instead of applying classical methods for reducing the dimensionality of these templates, we propose relying on a model-based approach where the problem is first mathematically described and then submitted to machine learning algorithms. |
10h45 - 11h05 | Pablo Del Aguila, Dante Roque and Lenis Wong | Mobile app quality model based on SQuaRE and AHP |
Health, NLP, and Social Media |
||
11h05 - 11h25 | Tereza Yallico Arias and Junior Fabian | Automatic detection of levels of violence against women with Natural Language Processing using Machine Learning and Deep Learning techniques |
11h25 - 11h45 | Taghreed Tarmom, Eric Atwell and Mohammad Alsalka | Deep Learning vs Compression-Based vs Traditional Machine Learning Classifiers to Detect Hadith Authenticity |
11h45 - 12h05 | Randa Zarnoufi and Mounia Abik | Classical Machine Learning vs Deep Learning for Detecting Cyber-Violence in Social Media |
PAUSE | ||
14h10 - 14h30 | Nuhu Ibrahim and Riza Batista-Navarro | Automatic Detection of Deaths from Social Networking Sites |
14h30 - 15h15 | Keynote Speaker: Marinka Zitnik | Title: Infusing Structure and Knowledge Into Biomedical AI. Abstract: Grand challenges in biology and medicine often lack annotated examples and require generalization to entirely new scenarios not seen during training. However, standard supervised learning is incredibly limited in scenarios such as designing novel medicines, modeling emerging pathogens, and treating rare diseases. In this talk, I present our efforts to overcome these obstacles by infusing structure and knowledge into learning algorithms. First, I will present general-purpose and scalable algorithms for few-shot learning on graphs. At the core is the notion of local subgraphs that transfer knowledge from one task to another, even when only a handful of labeled examples are available. This principle is theoretically justified as we show the evidence for predictions can be found in subgraphs surrounding the targets. I will conclude with applications in drug development and precision medicine where the algorithmic predictions were validated in human cells and led to the discovery of a new class of drugs. |
15h15 - 15h35 | Camila Mantilla-Saavedra and Juan Gutiérrez-Cárdenas | Model comparison for the classification of comments containing suicidal traits from Reddit via NLP and Supervised Learning |
15h35 - 15h55 | Syed Mehtab Alam, Elena Arsevska, Mathieu Roche and Maguelonne Teisseire | A data-driven score model to assess online news articles in Event-based surveillance system |
15h55 - 16h15 | Tomonari Masada | AmLDA: A Non-VAE Neural Topic Model |
16h15 - 17h00 | Keynote Speaker: Francisco Pereira | Title: Revealing interpretable object representations from human behaviour. Abstract: Objects can be characterized according to a vast number of possible criteria (e.g. animacy, shape, color, function), but some dimensions are more useful than others for making sense of the objects around us. In this talk, I will describe an ongoing effort by our collaborators to collect a behavioral dataset of millions of odd-one-out similarity judgements on thousands of objects, and a new approach to identify the "core dimensions" of object representations used in those judgements. Our approach models each object as a sparse, non-negative embedding, and judgements as a function of the similarity of those embeddings. The resulting model predicts subject behaviour on test data, as well as the fine-grained structure of object similarity. The dimensions of the embedding space are coherently interpretable by test subjects, and reflect degrees of taxonomic membership, functionality, and perceptual or structural attributes, among other characteristics. Further, naive subjects can accurately rate objects along these dimensions, without training. Collectively, these results demonstrate that human similarity judgments can be captured by a fairly low-dimensional, interpretable embedding that generalizes to external behaviour. |
17h00 - 17h20 | Asma Aldrees, Cherie Poland and Syeda Arzoo Irshad | Auditing Algorithms: Determining Ethical Parameters of Algorithmic Decision-Making Systems in Healthcare |
17h20 - 17h35 | Moises Meza, Willian Araujo, and Jesus Alvarado | Bibliometric analysis using Spark and HPC techniques to search of potential inhibitors targeting SARS-CoV-2 Main Protease |
Time | Author(s) | Presentation |
Image Processing |
||
9h05 - 9h20 | Ibrahim Shehzad, Adeel Zafar, Zahir Shah, and Zilli Huma | Breast Cancer CT-Scan Image Classification Using Transfer Learning |
9h20 - 9h40 | Alejandra Valeria Lucero Burbano, Sherald Damian Noboa Chavez and Manuel Eugenio Morocho Cayamcela | Plant Disease Classification and Severity Estimation: A Comparative Study of Multitask Convolutional Neural Networks and First Order Optimizers |
9h40 - 10h00 | Diego Hernán Suntaxi Domínguez, Oscar Vicente Guarnizo Cabezas, Jonnathan Fabricio Crespo Yaguana, Samantha Carolina Quintanchala Sandoval, Israel Gustavo Pineda Arias and Manuel Eugenio Morocho Cayamcela | Deep Learning and Computer Vision in Smart Agriculture: Datasets, Models, and Applications |
10h00 - 10h20 | Carla Rucoba, Efrain Ramos and Juan Gutierrez-Cardenas | Crack detection in oil paintings using morphological filters and K-SVD algorithm |
10h20 - 10h40 | Filomen Incahuanaco Quispe, Edward Hinojosa Cardenas, Denis Pilares Figueroa and Cesar Beltrán Castañón | CoffeeSE: Interpretable transfer learning method for estimating the severity of coffee rust |
10h40 - 11h00 | Joel Cabrera and Edwin Villanueva | Investigating generative neural-network models for building pest insect detectors in sticky trap images for the Peruvian horticulture |
Semantic and Machine Learning |
||
11h00 - 11h45 | Keynote Speaker: Vipin Kumar | Title: Big data in water: Opportunities and challenges for machine learning. Abstract: Water resources worldwide are coming under stress due to increasing demand from a growing population, increasing pollution, and depleting or uncertain supplies due to a changing climate in which droughts and floods have both become more frequent. As domains associated with water continue to experience tremendous data growth from models, sensors, and satellites, there is an unprecedented opportunity for machine learning to help address the urgent water challenges facing humanity. This talk will examine the role that big data and machine learning can play in advancing water science, the challenges faced by traditional machine learning methods in addressing the domain of water, and some early successes. |
11h45 - 12h00 | David Gatta, Kilian Hinteregger and Anna Fensel | Making Licensing of Content and Data Explicit with Semantics and Blockchain |
12h00 - 12h45 | Keynote Speaker: Natasha Noy | Title: Google Dataset Search: Building an open ecosystem for dataset discovery. Abstract: There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others’ work, and providing data journalists easier access to information and its provenance. This talk will discuss Dataset Search by Google, which provides search capabilities over potentially all dataset repositories on the Web. We will talk about the open ecosystem for describing datasets that we hope to encourage. |
12h45 - 13h05 | Chetraj Pandey, Rafal Angryk and Berkay Aydin | Deep Neural Networks based Solar Flare Prediction using Compressed Full-disk Line-of-sight Magnetograms |
13h05 - 13h25 | David Gatta, Kilian Hinteregger and Anna Fensel | Prediction Of Soil Saturated Electrical Conductivity By Statistical Learning |
13h25 - 13h35 | Closing SIMBig 2021 |