Wednesday 21 August

8h00 - 8h45 Registration and reception
8h45 - 9h00 Inauguration of SIMBig 2019
9h00 - 10h00
Keynote Speaker: Sophia Ananiadou
Title: Text Mining for Biomedical Applications
This talk will provide an overview of recent developments in neural information extraction at the National Centre for Text Mining to support biomedical applications. Applications range from pathway construction and database curation to semantic search and systematic review development. Two systems will be presented: a) LitPathExplorer, a visual text analytics tool that integrates advanced text mining, semi-supervised learning and interactive visualization to facilitate the exploration and analysis of pathway models, and b) RobotAnalyst, a web-based screening system combining machine learning and text mining for document prioritisation.
Session 1: Social Media and NLP
10h00 - 10h20 Spanish Sentiment Analysis with Universal Language Model Fine-Tuning: A detailed case of study Daniel Palomino and Jose Eduardo Ochoa Luna
10h20 - 10h40 Coffee break
10h40 - 11h00 Detecting Anomalies in Time-varying Media Crime News Using Tensor Decomposition Hugo Alatrista-Salas, Juandiego Morzan, Miguel Nunez-Del-Prado, Gustavo Yamada and Pablo Lavado Padilla
11h00 - 11h20 What can we learn from tweets? A textual analysis of tweets using #bacon as inductor word Erick Saldaña, Dominique Valentin, Jorge Behrens, Miriam Mabel Selani, Iliani Patinho and Carmen J Contreras-Castillo
11h20 - 11h40 A fuzzy linguistic approach for stakeholder prioritization Yasiel Pérez Vera and Anié Bermudez Peña
11h40 - 11h55 Controlling Formality and Style of Machine Translation Output Using AutoML on NMTs Varden Wang, Aditi Viswanathan, Antonina Kononova
11h55 - 12h10 Automatic Speech Recognition of Quechua Language Using HMM Toolkit Rodolfo Zevallos, Luis Camacho and Johanna Cordova
12h10 - 12h25 Collect Ethically: Eliminate and Reduce Bias in Twitter Datasets Lulwah Alkulaib, Abdulaziz Alhamadani, Taoran Ji and Chang-Tien Lu
14h00 - 15h00
Keynote Speaker: Vipin Kumar
Title: Big Data in Climate and Earth Sciences: Challenges and Opportunities for Data Science
The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about Earth and its environment are now continuously being generated by a large number of Earth observing satellites as well as physics-based earth system models running on large-scale computational platforms. These massive and information-rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions. This talk will discuss various challenges involved in analyzing these massive data sets as well as the opportunities they present for advancing both machine learning and the science of climate change, in the context of monitoring the state of the tropical forests and surface water on a global scale.
Session 2: Data Mining
15h00 - 15h20 A Place to Go: Locating Damaged Regions after Natural Disasters through Mobile Phone Data Galo Castillo-López, María-Belén Guaranda, Fabricio Layedra and Carmen Vaca
15h20 - 15h40 Coffee break
15h40 - 16h00 Recurrence Plot Representation for Multivariate Time-series Analysis Dennys Mallqui and Ricardo Fernandes
16h00 - 16h20 Privacy Preservation and Inference With Minimal Mobility Information Miguel Nunez-Del-Prado and Julián Salas
16h20 - 16h40 A Progressive Formalization of Tacit Knowledge to Improve Semantic Expressiveness of Biodiversity Data Andrea Corrêa Flôres Albuquerque and Jose Laurindo Campos Dos Santos
16h40 - 17h00 Big Data Recommender System for Encouraging Purchases in New Places Taking into Account Demographics Miguel Nunez-Del-Prado, Hugo Alatrista-Salas, Ana Luna and Isaias Hoyos
17h00 - 17h15 Implementation of an indoor location system for mobile-based museum guidance Dennis Núñez Fernández
18h00 - 20h00 Welcome Cocktail

Thursday 22 August

9h00 - 10h00
Keynote Speaker: Ravi Kumar
Title: Crowdsourced Geodata: Some Applications
Session 3: Machine Learning
10h00 - 10h20 Anomaly Detection and Levels of Automation for AI-supported system administration Odej Kao
10h20 - 10h40 Coffee break
10h40 - 11h00 Characterization of Salinity Impact on Synthetic Floc Strength Via Nonlinear Component Analysis Hang Yin, Patrick Carriere, Huey Lawson, Habib Mahamadian, Zhengmao Ye
11h00 - 11h20 Global Brand Perception based on Social Prestige, Credibility and Social Responsibility: A Clustering Approach Rosario Medina, Alvaro Talavera, Martín Hernani, Juan Lazo and José Afonso Mazzon
11h20 - 11h40 SCUT sampling technique with classification algorithms to classify child malnutrition Juan Baraybar-Huambo and Juan Gutierrez-Cardenas
11h40 - 12h00 Come with Me Now: New Potential Consumers Identification from Competitors Miguel Nunez-Del-Prado, Hugo Alatrista-Salas and Victoria Zevallos
12h00 - 12h15 An efficient set-based algorithm for variable streaming clustering Isaac Campos Ardiles, Jared León Malpartida and Fernando Campos Ardiles
14h00 - 15h00
Keynote Speaker: Michael Franklin
Title: Towards a New Discipline of Data Science
The emergence of Data Science has led to a flourishing of initiatives, centers, degrees, programs and organizational units at educational and research institutions around the world. The demand for data science know-how from students, parents, scientists and employers is strong and getting stronger. However, the interdisciplinary nature of the topic and the lack of a consensus around its definition raise challenges for its implementation in the modern university setting. Many ongoing efforts treat Data Science as simply a combination of topics from existing fields. While such an approach has obvious practical advantages, I believe that the challenges raised by Data Science imply that it should be more productively pursued as a new discipline in its own right. In this talk I will try to frame this larger question with a goal of initiating a discussion to identify the intellectual opportunities and research questions that could lie at the heart of a new discipline of Data Science.
Session 4: Semantic Web and Knowledge Bases
15h00 - 15h20 Using Embeddings to Predict Changes in Large Semantic Graphs Damian Barsotti and Martin Ariel Dominguez
15h20 - 15h40 Coffee break
15h40 - 16h40
Keynote Speaker: Nigam Shah
Title: Good machine learning for better healthcare
We will discuss learnings from Stanford Medicine's Program for Artificial Intelligence (AI) in Healthcare, whose mission is to bring AI technologies to the clinic safely, cost-effectively and ethically. Using our experience in deploying a predictive model to improve access to palliative care services, we will discuss potential solutions to issues relating to model correctness, interpretability, fairness, and equity, as well as issues such as autonomy of decision making and fiduciary responsibility. Drawing on our experience in running a clinical consult service for generating evidence from the collective experience of patients, we will discuss the challenges of, and potential solutions for, using aggregate patient data at the bedside.
Session 5: Biomedical Informatics
16h40 - 17h00 Sparse non-negative matrix factorization for retrieving genomes across metagenomes Vincent Prost, Stéphane Gazut and Thomas Brüls
17h00 - 17h20 Linguistic Fingerprints of Pro-vaccination and Anti-vaccination Writings Rebecca A Stachowicz
17h20 - 17h35 Comparing predictive machine learning algorithms in fit for work occupational health assessments Moises Stevend Meza, Saul Charapaqui, Katherine Arapa and Horacio Chacón
17h35 - 17h50 Tensorflow for Doctors Isha Agarwal, Rajkumar Kolakaluri, Michael Dorin and Mario Chong
17h50 - 18h05 Detection of NSCLC adenocarcinoma using supervised machine learning algorithms applicated to metabolomic profiles Paulo Vela-Anton and Diego Rondon-Soto
19h30 Gala Dinner

Friday 23 August

Session 6: Data-driven Software Engineering Chair: Miguel Nuñez del Prado / Ana Luna
9h30 - 10h00 An Industry Perspective on Data-Driven Software Engineering Ludmer Arcaya - Avantica
10h00 - 10h20 Design of cognitive tutor to diagnose the types of intelligence in students from 3 to 5 years of preschool Flor De Maria Olivares Ramos
10h20 - 10h40 Fake News in Spanish: Towards the building of a Corpus based on Twitter Braulio Andres Soncco Pimentel and Roxana Portugal
10h40 - 11h00 Coffee break
11h00 - 11h20 Big Data Use Case: Luca Transit Lourdes Guiulfo (Telefónica)
11h20 - 11h40 Development of a hand gesture based control interface using Deep Learning Dennis Núñez Fernández
11h40 - 12h00 Chronic Pain Estimation Through Deep Facial Descriptors Analysis Manasses Antoni Mauricio Condori, Jonathan David Peña Andagua, Erwin Junger Dianderas Caut, Leonidas Mauricio Condori, Jose Carlos Díaz Rosado and Antonio Manuel Moran Cárdenas
12h00 - 12h15 Recognition of the image of a person's silhouette, based on Viola-Jones Washington Garcia Quilachamin, Luzmila Pro Concepción and Jorge Herrera-Tapia
12h15 - 12h30 Super Resolution approach using Generative Adversarial Network models for improving Satellite Image Resolution Ferdinand Pineda, Victor Ayma, Robert Aduviri and César Beltrán
12h30 - 12h45 Peruvian sign language recognition using a hybrid deep neural network Yuri Vladimir Huallpa Vargas, Naysha Naydu Diaz Ccasa and Lauro Enciso Rodas
12h45 - 13h00 Closing