Ebrahim Bagheri
Software / Data Engineering
Search engines can retrieve the Web pages that are most relevant to a user's search query. However, they cannot identify and retrieve tabular information that is embedded within unstructured textual content on those pages.
The goal of this project is to develop cutting-edge techniques for the efficient identification and retrieval of tables relevant to a given user query. This has important implications for search in application-specific domains such as financial data.
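To make the task concrete, the following is a minimal sketch of query-based table ranking. It assumes each table has already been extracted into a caption, headers, and rows, and it scores tables with a plain TF-IDF cosine similarity. This is a simple lexical stand-in for illustration only, not the method of the baseline paper; the function names (`tokenize`, `table_to_text`, `rank_tables`) and the table dictionary layout are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def table_to_text(table):
    """Flatten a table (assumed keys: caption, headers, rows) into one string."""
    cells = [cell for row in table["rows"] for cell in row]
    return " ".join([table["caption"], *table["headers"], *cells])


def rank_tables(query, tables):
    """Return (score, caption) pairs sorted by TF-IDF cosine similarity to the query."""
    docs = [Counter(tokenize(table_to_text(t))) for t in tables]
    n = len(docs)
    # Document frequency: in how many tables each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

    def weigh(counts):
        # Terms never seen in any table are dropped from the vector.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(cosine(query_vec, weigh(doc)), t["caption"]) for doc, t in zip(docs, tables)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up example tables, for illustration only.
    tables = [
        {"caption": "League standings", "headers": ["Team", "Points"],
         "rows": [["Lions", "42"], ["Hawks", "39"]]},
        {"caption": "Quarterly revenue by segment", "headers": ["Quarter", "Revenue"],
         "rows": [["Q1", "120"], ["Q2", "135"]]},
    ]
    for score, caption in rank_tables("quarterly revenue", tables):
        print(f"{score:.3f}  {caption}")
```

A purely lexical score like this misses tables that answer the query with different vocabulary, which is the kind of limitation the enhancements in this project would aim to address.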
Students will learn about cutting-edge techniques in Natural Language Processing and Information Retrieval.
The project will build upon and improve the approach presented in https://arxiv.org/pdf/1802.06159.pdf
The group will be responsible for replicating the work presented in the baseline paper, working with the FLC to further enhance the algorithm, implementing the enhancements, performing experiments, and reporting on the findings. All students will be responsible for understanding the baseline paper and participating in group brainstorming sessions.
Preparing the dataset for the experiments and assisting with the implementation of the enhancements to the baseline.
Implementing and replicating the baseline code.
Leading the implementation of the enhanced code as well as running the experiments.
N/A
EB01: Accurate table retrieval from unstructured sources | Ebrahim Bagheri | Friday August 24th 2018 at 12:27 AM