Accurate table retrieval from unstructured sources

2018 COE Engineering Design Project (EB01)

Faculty Lab Coordinator

Ebrahim Bagheri

Topic Category

Software / Data Engineering

Preamble

Search engines can retrieve the web pages that have the highest information value for a user’s search query. However, they cannot identify and retrieve tabular information that is embedded within unstructured textual content on web pages.

Objective

The goal of this project is to develop cutting-edge techniques that allow for the efficient identification and retrieval of tables for a given user query. This has important implications for search in application-specific domains such as financial data.
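To make the retrieval task concrete, the following is a minimal sketch of ranking candidate tables against a query using simple term overlap with TF-IDF-style weighting. It is an illustration only and is not the method of the baseline paper. The table layout (a dict with "caption", "headers" and "rows"), the function names and the sample data are all assumptions introduced for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def table_text(table):
    """Flatten a table (hypothetical dict layout) into one string."""
    cells = [cell for row in table["rows"] for cell in row]
    return " ".join([table["caption"]] + table["headers"] + cells)


def rank_tables(query, tables):
    """Return (caption, score) pairs sorted from most to least relevant."""
    docs = [Counter(tokenize(table_text(t))) for t in tables]
    n = len(docs)
    # Document frequency: in how many tables each term occurs.
    df = Counter(term for doc in docs for term in doc)
    scored = []
    for index, doc in enumerate(docs):
        score = 0.0
        for term in set(tokenize(query)):
            if term in doc:
                idf = math.log((n + 1) / (df[term] + 1)) + 1.0
                score += (1.0 + math.log(doc[term])) * idf
        scored.append((score, index))
    # Highest score first; ties keep the original table order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(tables[i]["caption"], s) for s, i in scored]


# Made-up sample tables, for illustration only.
TABLES = [
    {
        "caption": "Population of European capitals",
        "headers": ["City", "Population"],
        "rows": [["Paris", "2100000"], ["Berlin", "3600000"]],
    },
    {
        "caption": "Quarterly revenue by region",
        "headers": ["Region", "Q1 revenue", "Q2 revenue"],
        "rows": [["North", "10", "12"], ["South", "8", "9"]],
    },
]

if __name__ == "__main__":
    for caption, score in rank_tables("revenue by region", TABLES):
        print(f"{score:.3f}  {caption}")
```

A lexical scorer like this only matches exact terms, so a query such as "sales per area" would miss the revenue table entirely. Closing that kind of gap is the sort of problem the project's enhancements are expected to address.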

Partial Specifications

Students will learn about cutting-edge techniques in Natural Language Processing and Information Retrieval.

Suggested Approach

The project will build upon and improve the work presented in https://arxiv.org/pdf/1802.06159.pdf

Group Responsibilities

The group will be responsible for replicating the work presented in the baseline paper, working with the Faculty Lab Coordinator (FLC) to further enhance the algorithm, implementing the enhancements, performing experiments, and reporting on the findings. All students will be responsible for understanding the baseline paper and participating in group brainstorming sessions.

Student A Responsibilities

Preparing the dataset for the experiments and assisting with the implementation of the enhancements to the baseline.

Student B Responsibilities

Implementing the baseline code and replicating the results of the baseline paper.

Student C Responsibilities

Leading the implementation of the enhanced code as well as running the experiments.

Course Co-requisites

N/A

EB01: Accurate table retrieval from unstructured sources | Ebrahim Bagheri | Friday August 24th 2018 at 12:27 AM