Wise-Supervision @ AKBC2022

Weak, Indirect and Self Supervision for Knowledge Extraction

Virtual Workshop at AKBC 2022, London, UK
14:00-18:00 (London time) on November 5, 2022

Overview

Knowledge extraction (KE) has traditionally been driven by task-specific human annotations. Recent years have seen increasing interest in KE with WISE supervision (Weak supervision, Indirect supervision, SElf-supervision, etc.). This workshop aims to provide a forum for researchers and practitioners from a broad range of communities, such as information extraction (IE), knowledge graphs (KG), the semantic web, and transfer learning, to discuss the challenges and promise of KE when human annotations are limited.

Wise-Supervision 2022 aims to bring together researchers from different areas related to KE. As such, the workshop welcomes and covers a wide range of topics, including (non-exclusively):

  • IE/KE with indirect supervision from textual entailment, summarization, etc.
  • IE/KE with weak supervision and denoising.
  • IE/KE with self-supervision, e.g., pretrained LMs for IE/KE.
  • KG construction and consolidation.
  • Low-resource IE/KE.
  • KE in industry settings.

Contact: wenpeng.yin@temple.edu

Accepted papers

OpenReview Submission Link

Registration

No registration is needed.

Attention: please use Zoom (https://temple.zoom.us/j/2508029698) for all workshop sessions.

Important Dates

  • Oct. 16: Paper submission deadline (extended by one day upon request)
  • Oct. 20: Notification of acceptance
  • Nov. 5: Workshop

All deadlines are 11:59 pm UTC-12 ("anywhere on Earth").

Keynote Speakers (in presentation order)

  • Prof. Eneko Agirre, University of the Basque Country
  • Prof. Heng Ji, University of Illinois Urbana-Champaign
  • Dr. Hoifung Poon, Microsoft Research, Redmond
  • Prof. Yizhou Sun, University of California, Los Angeles

  • Prof. Eneko Agirre: "Few-shot Information Extraction: Pre-train, prompt and entail"

    Abstract: Deep Learning has made tremendous progress in Natural Language Processing (NLP), where large pre-trained language models (PLM) fine-tuned on the target task have become the predominant tool. More recently, in a process called prompting, NLP tasks are rephrased as natural language text, allowing us to better exploit linguistic knowledge learned by PLMs and resulting in significant improvements. Still, PLMs have limited inference ability. In the Textual Entailment task, systems need to output whether the truth of a certain textual hypothesis follows from the given premise text. Manually annotated entailment datasets covering multiple inference phenomena have been used to infuse inference capabilities into PLMs. This talk will review these recent developments and present an approach that combines prompts and PLMs fine-tuned for textual entailment, yielding state-of-the-art results on Information Extraction (IE) using only a small fraction of the annotations. The approach has additional benefits, like the ability to learn from different schemas and inference datasets. These developments enable a new paradigm for IE where the expert can define the domain-specific schema using natural language and directly run those specifications, annotating a handful of examples in the process. A user interface based on this new paradigm will also be presented. Beyond IE, inference capabilities could be extended, acquired and applied from other tasks, opening a new research avenue where entailment and downstream task performance improve in tandem.
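    The core idea of the talk, casting IE as entailment, can be sketched as follows: each candidate relation is verbalized into a natural-language hypothesis, and an entailment model scores how well the sentence (the premise) supports it. The templates and the word-overlap scorer below are illustrative stand-ins, not the talk's actual models.

```python
# Toy sketch of entailment-based relation extraction. A real system would
# replace toy_entailment_score with a PLM fine-tuned on NLI data; the relation
# templates here are invented for illustration.

TEMPLATES = {
    "born_in": "{subj} was born in {obj}.",
    "works_for": "{subj} is an employee of {obj}.",
    "no_relation": "{subj} and {obj} are unrelated.",
}

def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis words found in the premise."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / len(h)

def classify_relation(sentence: str, subj: str, obj: str) -> str:
    """Pick the relation whose verbalized hypothesis is best entailed by the sentence."""
    scores = {
        rel: toy_entailment_score(sentence, tpl.format(subj=subj, obj=obj))
        for rel, tpl in TEMPLATES.items()
    }
    return max(scores, key=scores.get)

print(classify_relation("Ada Lovelace was born in London.", "Ada Lovelace", "London"))
```

    Because the schema lives entirely in the templates, a domain expert can add a new relation by writing one sentence, which is what enables the few-shot paradigm described above.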

    Bio: Eneko Agirre is a Professor of Informatics and Head of the HiTZ Basque Center of Language Technology at the University of the Basque Country, UPV/EHU, in San Sebastian, Spain. He has been active in Natural Language Processing and Computational Linguistics for decades. He received the Spanish Informatics Research Award in 2021, and is one of the 74 fellows of the Association for Computational Linguistics (ACL). He was President of ACL's SIGLEX, a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and an action editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM). He has received three Google Research Awards and five best paper awards and nominations. Dissertations under his supervision received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics. His research spans topics such as Word Sense Disambiguation, Semantic Textual Similarity, Unsupervised Machine Translation and resources for Basque. Most recently his research focuses on inference and deep learning language models.

  • Prof. Heng Ji: "Few-shot Event Argument Extraction via Natural Language or Programming Language Generation"

    Abstract: The goal of the event argument role labeling task is to find the arguments (participants) of an event. Traditional methods model this task as a supervised classification problem that requires a large amount of training data (around 4,000 fully annotated sentences). By contrast, when humans know some events happened (e.g., Halloween parade in Champaign), they would expect to know a list of arguments - attendees, time, location, organizer, etc. - actively seek information to fill in these slots by reading news or social media, and then be able to describe the information about these arguments to their friends. Therefore, if an intelligent information system knows how to narrate an event, it should be able to fill in the slots for the expected argument roles. We propose to re-frame the problem as conditional generation given a template (a list of arguments for each event type). Conditioned on the unfilled template and a given natural language context, the model is asked to generate a filled-in template with arguments. We can train this generator using a large language model learned from natural language, or, even better, from programming language, both of which are readily available. For example, event-argument structures can be represented as a class object using code. This alignment between structures and code enables us to take advantage of Programming Language features such as inheritance and type annotation to introduce external knowledge or add constraints. Our model does not require any substantial amount of annotations for the information extraction task, and thus it is highly effective for zero-shot or few-shot settings. It can also work with long contexts beyond single sentences, bringing us one step closer to the original goal of information extraction - constructing a knowledge base from the entire corpus.
When only using 50 training instances for each event type, our framework is comparable to fully-supervised models trained on 4,202 event instances. When given the same 50-shot data, our approach outperforms the current state-of-the-art (SOTA) by 20.8% absolute F1.
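    The code-as-template idea from the abstract can be illustrated as follows: an event type's argument structure is written as a class, so inheritance encodes the event hierarchy and type annotations constrain each role; the model is then asked to complete an instantiation of that class from the text. The classes and example below are illustrative assumptions, not the talk's actual ontology.

```python
# Sketch: event-argument structures represented as code, so a code-trained
# generation model can "fill the template" by completing a class instantiation.

from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    trigger: str

@dataclass
class Attack(Event):          # inheritance encodes the event-type hierarchy
    attacker: List[str]
    target: List[str]
    place: List[str]          # type annotations constrain each argument role

prompt = (
    "Text: Rebels attacked the convoy near Kabul.\n"
    "Complete: Attack(trigger='attacked', attacker=, target=, place=)"
)

# A generation model would complete the prompt; here we hand-write the
# filled-in template the model is expected to produce.
filled = Attack(trigger="attacked", attacker=["Rebels"],
                target=["the convoy"], place=["Kabul"])
print(filled.attacker, filled.place)
```

    Representing the schema as code also makes constraints checkable: a completion that omits a role or returns the wrong type simply fails to instantiate the class.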

    Bio: Heng Ji is a professor in the Computer Science Department, and an affiliated faculty member in the Electrical and Computer Engineering Department, of the University of Illinois Urbana-Champaign. She is also an Amazon Scholar. She received her B.A. and M.A. in Computational Linguistics from Tsinghua University, and her M.S. and Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially on Multimedia Multilingual Information Extraction, Knowledge Base Population and Knowledge-driven Generation. She was selected as a "Young Scientist" and a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017. She was named as part of Women Leaders of Conversational AI (Class of 2023) by Project Voice. The awards she has received include the "AI's 10 to Watch" Award by IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, PACLIC2012 Best Paper runner-up, "Best of ICDM2013" paper award, "Best of SDM2013" paper award, ACL2018 Best Demo Paper nomination, ACL2020 Best Demo Paper Award, NAACL2021 Best Demo Paper Award, Google Research Awards in 2009 and 2014, IBM Watson Faculty Awards in 2012 and 2014, and a Bosch Research Award in 2014-2018. She was invited by the Secretary of the U.S. Air Force and AFRL to join the Air Force Data Analytics Expert Panel to inform the Air Force Strategy 2030. She is the lead of many multi-institution projects and tasks, including the U.S. ARL projects on information fusion and knowledge networks construction, the DARPA DEFT Tinker Bell team and the DARPA KAIROS RESIN team. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She was an associate editor for IEEE/ACM Transactions on Audio, Speech, and Language Processing, and served as Program Committee Co-Chair of many conferences including NAACL-HLT2018 and AACL-IJCNLP2022. She was elected secretary of the North American Chapter of the Association for Computational Linguistics (NAACL) for 2020-2023. Her research has been widely supported by U.S. government agencies (DARPA, ARL, IARPA, NSF, AFRL, DHS) and industry (Amazon, Google, Facebook, Bosch, IBM, Disney).

  • Dr. Hoifung Poon: "Self-Supervised AI for Precision Health"

    Abstract: The advent of big data promises to revolutionize medicine by making it more personalized and effective, but big data also presents a grand challenge of information overload. For example, tumor sequencing has become routine in cancer treatment, yet interpreting the genomic data requires painstakingly curating knowledge from a vast biomedical literature, which grows by thousands of papers every day. Electronic medical records contain high-definition patient information for speeding up clinical trial recruitment and drug development, but curating such real-world evidence from clinical notes can take hours for a single patient. Natural language processing (NLP) can play a key role in interpreting big data for precision medicine. In particular, machine reading can help unlock knowledge from text by substantially improving curation efficiency. However, standard supervised methods require labeled examples, which are expensive and time-consuming to produce at scale. In this talk, I'll present Project Hanover, where we overcome the annotation bottleneck by combining deep learning with probabilistic logic, by exploiting self-supervision from readily available resources such as ontologies and databases, and by leveraging domain-specific pretraining on unlabeled text. This enables us to extract knowledge from tens of millions of publications, structure real-world data for millions of cancer patients, and apply the extracted knowledge and real-world evidence to supporting precision oncology.
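    The self-supervision idea in the abstract, reusing curated databases instead of hand-labeled examples, is essentially distant supervision: any sentence mentioning an entity pair already known to a database is taken as a (noisy) positive example of the relation. The tiny knowledge base and sentences below are invented for illustration.

```python
# Sketch of distant supervision from a curated database: known (drug, gene)
# pairs act as a noisy labeler for free-text sentences. Real systems add
# denoising on top of these noisy labels.

KNOWN_PAIRS = {("gefitinib", "EGFR"), ("imatinib", "BCR-ABL")}

sentences = [
    "Gefitinib selectively inhibits EGFR signaling in tumor cells.",
    "EGFR mutations were common in the cohort.",
    "Imatinib targets the BCR-ABL fusion protein.",
]

def distant_labels(sentences, known_pairs):
    """Label a sentence positive if it mentions a known (drug, gene) pair."""
    labeled = []
    for s in sentences:
        hit = any(d.lower() in s.lower() and g in s for d, g in known_pairs)
        labeled.append((s, "interacts" if hit else "unlabeled"))
    return labeled

for sentence, label in distant_labels(sentences, KNOWN_PAIRS):
    print(label, "|", sentence)
```

    Because the database supplies the labels, this scales to tens of millions of publications with no per-sentence annotation; the price is label noise, which is why the talk pairs it with probabilistic logic and domain-specific pretraining.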

    Bio: Hoifung Poon is the Senior Director of Biomedical NLP at Microsoft Research and an affiliated professor at the University of Washington Medical School. He leads Project Hanover, with the overarching goal of structuring medical data for precision medicine. He has given tutorials on this topic at top conferences such as the Association for Computational Linguistics (ACL) and the Association for the Advancement of Artificial Intelligence (AAAI). His research spans a wide range of problems in machine learning and natural language processing (NLP), and his prior work has been recognized with Best Paper Awards from premier venues such as the North American Chapter of the Association for Computational Linguistics (NAACL), Empirical Methods in Natural Language Processing (EMNLP), and Uncertainty in AI (UAI). He received his PhD in Computer Science and Engineering from University of Washington, specializing in machine learning and NLP.

  • Prof. Yizhou Sun: "Combining Representation Learning and Logical Rule Reasoning for Knowledge Graph Inference"

    Abstract: Knowledge graph inference has been studied extensively due to its wide applications. It has been addressed by two lines of research, i.e., the more traditional logical rule reasoning and the more recent knowledge graph embedding (KGE). In this talk, we will introduce two recent developments in our group to combine these two worlds. First, we propose to leverage logical rules to bring in high-order dependency among entities and relations for KGE. By limiting the logical rules to definite Horn clauses, we are able to fully exploit the knowledge in logical rules and enable the mutual enhancement of logical rule-based reasoning and KGE in an extremely efficient way. Second, we propose to handle logical queries by representing fuzzy sets as specially designed vectors and retrieving answers via dense vector computation. In particular, we provide embedding-based logical operators that strictly follow the axioms required in fuzzy logic, which can be trained by self-supervised knowledge completion tasks. With additional query-answer pairs, the performance can be further enhanced. With this evidence, we believe combining logic with representation learning provides a promising direction for knowledge reasoning.
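    The fuzzy-set view of logical queries can be sketched concretely: an answer set becomes a vector of membership degrees in [0, 1], and the logical connectives become dense elementwise operations satisfying the fuzzy-logic axioms. The product t-norm used below is one standard choice and is an illustrative assumption, not necessarily the operators from the talk.

```python
# Sketch: fuzzy-set logical operators as dense vector computation.
# Each vector holds the membership degree of five entities in an answer set.

def f_and(a, b):   # conjunction via the product t-norm
    return [x * y for x, y in zip(a, b)]

def f_or(a, b):    # disjunction via the probabilistic sum (the dual t-conorm)
    return [x + y - x * y for x, y in zip(a, b)]

def f_not(a):      # standard fuzzy negation
    return [1.0 - x for x in a]

p = [1.0, 0.8, 0.0, 0.3, 1.0]   # membership in fuzzy answer set P
q = [1.0, 0.2, 0.0, 0.9, 0.0]   # membership in fuzzy answer set Q

print(f_and(p, q))              # entities answering "P AND Q"
print(f_or(p, f_not(p)))        # excluded middle holds only approximately
```

    In the embedding-based setting these vectors are produced by learned encoders, so the operators can be trained end-to-end on self-supervised knowledge-completion objectives while still obeying the fuzzy-logic axioms by construction.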

    Bio: Yizhou Sun is an associate professor in the Department of Computer Science at UCLA. She received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2012. Her principal research interest is in mining graphs/networks, and more generally in data mining, machine learning, and network science, with a focus on modeling novel problems and proposing scalable algorithms for large-scale, real-world applications. She is a pioneering researcher in mining heterogeneous information networks, with a recent focus on deep learning on graphs/networks. Yizhou has over 150 publications in books, journals, and major conferences. Tutorials of her research have been given in many premier conferences. She is a recipient of the KDD Best Student Paper Award, ACM SIGKDD Doctoral Dissertation Award, Yahoo ACE (Academic Career Enhancement) Award, NSF CAREER Award, CS@ILLINOIS Distinguished Educator Award, Amazon Research Awards (twice), Okawa Foundation Research Award, and VLDB Test of Time Award.

Schedule


Session | London time | Speaker's local time | Session chair
Opening speech | 14:00-14:05 (5 mins) | 15:00-15:05 | Barbara Plank
Keynote by Prof. Eneko Agirre: "Few-shot Information Extraction: Pre-train, prompt and entail" | 14:05-14:45 (40 mins) | 15:05-15:45 | Barbara Plank
Keynote by Prof. Heng Ji: "Few-shot Event Argument Extraction via Natural Language or Programming Language Generation" | 14:50-15:30 (40 mins) | 09:50-10:30 | Wenpeng Yin
Coffee break | 15:30-16:00 (30 mins) | - | -
Keynote by Dr. Hoifung Poon: "Self-Supervised AI for Precision Health" | 16:00-16:40 (40 mins) | 09:00-09:40 | Benjamin Roth
Keynote by Prof. Yizhou Sun: "Combining Representation Learning and Logical Rule Reasoning for Knowledge Graph Inference" | 16:40-17:20 (40 mins) | 09:40-10:20 | Muhao Chen
Accepted paper: "KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction", presented by Ningyu Zhang | 17:20-17:28 (8 mins) | 01:20-01:28 (11/6/2022) | Hongming Zhang
Accepted paper: "SepLL: Separating Latent Class Labels from Weak Supervision Noise", presented by Andreas Stephan | 17:28-17:36 (8 mins) | 18:28-18:36 | Hongming Zhang
Accepted paper: "OpenStance: Real-world Zero-shot Stance Detection", presented by Hanzi Xu | 17:36-17:44 (8 mins) | 13:36-13:44 | Hongming Zhang
Accepted paper: "Cross-Lingual Speaker Identification Using Distant Supervision", presented by Ben Zhou | 17:44-17:52 (8 mins) | 13:44-13:52 | Hongming Zhang
Accepted paper: "Towards Improved Distantly Supervised Multilingual Named-Entity Recognition for Tweets", presented by Ramy Eskander | 17:52-18:00 (8 mins) | 13:52-14:00 | Hongming Zhang
Closing remarks | 18:00-18:05 (5 mins) | 14:00-14:05 | Wenpeng Yin

Organizing Committee

  • Wenpeng Yin, Temple University
  • Muhao Chen, University of Southern California
  • Lifu Huang, Virginia Tech
  • Huan Sun, The Ohio State University
  • Hongming Zhang, Tencent AI Lab, Seattle
  • Benjamin Roth, University of Vienna
  • Barbara Plank, LMU Munich