C Lee Giles

Penn State University

H-index: 117

North America-United States

Professor Information

University

Penn State University

Position

___

Citations(all)

59431

Citations(since 2020)

16873

Cited By

47536

hIndex(all)

117

hIndex(since 2020)

60

i10Index(all)

458

i10Index(since 2020)

259

University Profile Page

Penn State University

Research & Interests List

Information extraction

search engines

information retrieval

deep learning

digital libraries

Top articles of C Lee Giles

SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing.

Authors

Ting-Yao Hsu,Chieh-Yang Huang,Shih-Hong Huang,Ryan Rossi,Sungchul Kim,Tong Yu,C Lee Giles,Ting-Hao K Huang

Published Date

2024/5

Automated Detection and Analysis of Data Practices Using A Real-World Corpus

Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy excerpts with predefined data practice descriptions. We further conduct a case study to evaluate our approach on a real-world policy, demonstrating its effectiveness in simplifying complex policies. Experiments show that our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.

Authors

Mukund Srinath,Pranav Venkit,Maria Badillo,Florian Schaub,C Lee Giles,Shomir Wilson

Journal

arXiv preprint arXiv:2402.11006

Published Date

2024/2/16
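The abstract does not say which matching methods were tried, so as a purely illustrative sketch of the general idea, here is a minimal bag-of-words cosine-similarity matcher that pairs a policy excerpt with the closest predefined data practice description (all example texts and names below are hypothetical):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # lowercase word tokens; a real system would use a proper NLP tokenizer
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(excerpt, practices):
    # return the data practice description most similar to the excerpt
    e = Counter(tokenize(excerpt))
    return max(practices, key=lambda p: cosine(e, Counter(tokenize(p))))

practices = [
    "This service collects your device location",
    "This service shares your data with third parties",
]
excerpt = "We may disclose personal information to third-party partners."
print(best_match(excerpt, practices))
```

A production pipeline would replace the bag-of-words vectors with learned embeddings and train on the ToS;DR annotations, but the matching structure is the same.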

Stability Analysis of Various Symbolic Rule Extraction Methods from Recurrent Neural Network

This paper analyzes two competing rule extraction methodologies: quantization and equivalence query. We trained RNN models, extracting DFAs with quantization approaches (k-means and SOM) and with equivalence-query methods, across initialization seeds. We sampled datasets from the Tomita and Dyck grammars and trained them on four RNN cells: LSTM, GRU, O2RNN, and MIRNN. Our experiments establish the superior performance of O2RNN and quantization-based rule extraction over the alternatives. The equivalence-query approach, primarily proposed for regular grammars, performs similarly to quantization methods on the Tomita languages when the networks are perfectly trained. For partially trained RNNs, however, equivalence query shows instability in the number of DFA states; for the Tomita 5 and Tomita 6 languages it produced far more states than the ground truth, whereas quantization methods yield rules with state counts very close to the ground-truth DFA. Among RNN cells, O2RNN consistently produces the most stable DFAs. For the Dyck languages, we observe that although GRU outperforms the other RNNs in network performance, the DFAs extracted from O2RNN have higher performance and better stability. Stability is computed as the standard deviation of test-set accuracy across networks trained with different seeds. On the Dyck languages, quantization methods outperformed equivalence query, with better stability in both accuracy and the number of states; equivalence query often showed large accuracy deviations for GRU and MIRNN, while the deviation for quantization methods remained small. In many instances with LSTM and GRU, the DFAs extracted by …

Authors

Neisarg Dave,Daniel Kifer,C Lee Giles,Ankur Mali

Journal

arXiv preprint arXiv:2402.02627

Published Date

2024/2/4
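The quantization approach the abstract describes clusters an RNN's continuous hidden states into a finite set of abstract states and reads DFA transitions off the recorded trajectories. A toy sketch of that second step, with 1-D hidden states and centroids assumed to come from a k-means pass (all values here are made up for illustration):

```python
from collections import defaultdict

def nearest(h, centroids):
    # assign a (toy, 1-D) hidden state to the index of its nearest centroid
    return min(range(len(centroids)), key=lambda i: abs(h - centroids[i]))

def extract_dfa(transitions, centroids):
    """Build a DFA transition table from recorded RNN transitions.

    transitions: list of (hidden_before, input_symbol, hidden_after)."""
    dfa = defaultdict(dict)
    for h0, sym, h1 in transitions:
        dfa[nearest(h0, centroids)][sym] = nearest(h1, centroids)
    return dict(dfa)

# hypothetical hidden-state trajectory of a trained RNN on alphabet {0, 1}
transitions = [(0.1, "1", 0.9), (0.9, "1", 0.1), (0.1, "0", 0.1), (0.9, "0", 0.9)]
centroids = [0.0, 1.0]  # e.g. obtained by running k-means over all hidden states
dfa = extract_dfa(transitions, centroids)
print(dfa)  # {0: {'1': 1, '0': 0}, 1: {'1': 0, '0': 1}}
```

Here the recovered machine is the two-state parity automaton, which is the kind of ground-truth DFA the Tomita comparisons in the paper are measured against.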

A provably stable neural network turing machine with finite precision and time

We introduce a neural stack architecture with a differentiable parameterized stack operator approximating stack push and pop operations. We prove the stability of this stack architecture for arbitrarily many stack operations, showing that the state of the neural stack still closely resembles the state of a discrete stack. Using the neural stack with a recurrent neural network, we devise a neural network Pushdown Automaton (nnPDA). A new theoretical bound shows that an nnPDA can recognize any PDA using only finite precision state neurons in finite time. By using two neural stacks to construct a neural tape together with a recurrent neural network, we define a neural network Turing Machine (nnTM). Just like the neural stack, we show these architectures are also stable. Furthermore, we show the nnTM is Turing complete. It requires finite precision state neurons with an arbitrary number of stack neurons to recognize …

Authors

John Stogin,Ankur Mali,C Lee Giles

Journal

Information Sciences

Published Date

2024/2/1
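The paper's exact stack operator is not reproduced here, but a common formulation of a differentiable stack (in the spirit of continuous-stack architectures) keeps a list of values with fractional "strengths" and makes push and pop soft, weighted operations. A minimal sketch under that assumption:

```python
def stack_step(values, strengths, push, pop, v):
    """One soft stack update: remove `pop` units of strength from the top,
    push value `v` with strength `push`, and return a soft top-of-stack read."""
    new_strengths = []
    remaining = pop
    for s in reversed(strengths):            # consume pop mass from the top down
        keep = max(0.0, s - remaining)
        remaining = max(0.0, remaining - s)
        new_strengths.append(keep)
    new_strengths.reverse()
    values = values + [v]
    strengths = new_strengths + [push]
    read, budget = 0.0, 1.0                  # read at most 1 unit of mass from the top
    for val, s in zip(reversed(values), reversed(strengths)):
        take = min(s, budget)
        read += take * val
        budget -= take
        if budget <= 0.0:
            break
    return values, strengths, read

# with hard (0/1) push and pop probabilities, the soft stack behaves discretely
vals, strs, read = stack_step([], [], 1.0, 0.0, 3.0)      # push 3.0
vals, strs, read = stack_step(vals, strs, 1.0, 0.0, 5.0)  # push 5.0 -> read 5.0
vals, strs, read = stack_step(vals, strs, 0.0, 1.0, 0.0)  # pop      -> read 3.0
print(read)  # 3.0
```

Because push and pop are continuous in [0, 1], the whole update is differentiable, which is what lets a recurrent controller learn when to push and pop; the stability result in the paper bounds how far this soft state can drift from the discrete stack it approximates.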

Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text

The General Data Protection Regulation (GDPR) and other recent privacy laws require organizations to post their privacy policies, and place specific expectations on organizations' privacy practices. Privacy policies take the form of documents written in natural language, and one of the expectations placed upon them is that they remain up to date. To investigate legal compliance with this recency requirement at a large scale, we create a novel pipeline that includes crawling, regex-based extraction, candidate date classification and date object creation to extract updated and effective dates from privacy policies written in English. We then analyze patterns in policy dates using four web crawls and find that only about 40% of privacy policies online contain a date, thereby making it difficult to assess their regulatory compliance. We also find that updates in privacy policies are temporally concentrated around passage of …

Authors

Mukund Srinath,Lee Matheson,Pranav Narayanan Venkit,Gabriela Zanfir-Fortuna,Florian Schaub,C Lee Giles,Shomir Wilson

Published Date

2023/8/22
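The paper's actual regexes and classifier are not given in the abstract; as an illustration of the "regex-based extraction, candidate date classification and date object creation" stages, here is a minimal sketch with one hypothetical pattern for English "last updated" / "effective date" phrasings:

```python
import re
from datetime import datetime

# hypothetical pattern; a real pipeline would use many such patterns
DATE_RE = re.compile(
    r"(?:last\s+(?:updated|modified)|effective(?:\s+date)?)\s*:?\s*"
    r"(\w+\s+\d{1,2},\s+\d{4})",
    re.IGNORECASE,
)

def extract_policy_dates(policy_text):
    """Extract candidate updated/effective dates and parse them into date objects."""
    dates = []
    for m in DATE_RE.finditer(policy_text):
        try:
            dates.append(datetime.strptime(m.group(1), "%B %d, %Y").date())
        except ValueError:
            pass  # candidate did not parse; a classifier would filter such cases
    return dates

text = "Privacy Policy. Last updated: March 5, 2023. Effective date: April 1, 2023."
print(extract_policy_dates(text))  # [datetime.date(2023, 3, 5), datetime.date(2023, 4, 1)]
```

Producing `date` objects (rather than raw strings) is what enables the large-scale temporal analysis across the four web crawls.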

A prototype hybrid prediction market for estimating replicability of published work

We present a prototype hybrid prediction market and demonstrate the avenue it represents for meaningful human-AI collaboration. We build on prior work proposing artificial prediction markets as a novel machine learning algorithm. In an artificial prediction market, trained AI agents (bot traders) buy and sell outcomes of future events. Classification decisions can be framed as outcomes of future events, and accordingly, the price of an asset corresponding to a given classification outcome can be taken as a proxy for the system's confidence in that decision. By embedding human participants in these markets alongside bot traders, we can bring together insights from both. In this paper, we detail pilot studies with prototype hybrid markets for the prediction of replication study outcomes. We highlight challenges and opportunities, share insights from semi-structured interviews with hybrid market participants, and outline a vision for ongoing and future work.

Authors

P Lukowicz

Journal

HHAI 2023: Augmenting Human Intellect: Proceedings of the Second International Conference on Hybrid Human-Artificial Intelligence

Published Date

2023/7/7
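The abstract does not specify the market mechanism used, but a standard choice for prediction markets is Hanson's logarithmic market scoring rule (LMSR), under which the instantaneous price of a YES share directly encodes the market's probability estimate. A sketch assuming LMSR (not necessarily the paper's mechanism):

```python
import math

def lmsr_price(q_yes, q_no, b=10.0):
    """Instantaneous LMSR price of the YES share, a proxy for the market's
    confidence that the replication succeeds; b controls market liquidity."""
    e_yes = math.exp(q_yes / b)
    e_no = math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

print(lmsr_price(0.0, 0.0))        # 0.5: an empty market is maximally uncertain
print(lmsr_price(5.0, 0.0) > 0.5)  # buying YES shares pushes the price up
```

Whether the trades come from bot traders or embedded human participants, the same price readout serves as the system's confidence, which is what makes the hybrid human-AI setup possible.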

Backpropagation-free deep learning with recursive local representation alignment

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation (backprop), the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. Furthermore, researchers must continually develop various specialized techniques, such as particular weight initializations and enhanced activation functions, to ensure stable parameter optimization. Our goal is to seek an effective, neuro-biologically plausible alternative to backprop that can be used to train deep networks. In this paper, we propose a backprop-free procedure, recursive local representation alignment, for training large-scale architectures. Experiments with residual networks on CIFAR-10 and the large benchmark, ImageNet, show that our algorithm generalizes as well as backprop while converging sooner due to weight updates that are parallelizable and computationally less demanding. This is empirical evidence that a backprop-free algorithm can scale up to larger datasets.

Authors

Alexander G Ororbia,Ankur Mali,Daniel Kifer,C Lee Giles

Journal

Proceedings of the AAAI Conference on Artificial Intelligence

Published Date

2023/6/26
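The defining idea of such backprop-free schemes is that each layer chases a locally computed target produced through separate feedback weights, so no error gradient is propagated back through the forward path. The following toy two-layer sketch illustrates that local-target style of update; it is a simplification for illustration, not the paper's recursive local representation alignment algorithm, and all weights and data are made up:

```python
def matvec(M, v):
    return [sum(w * x for w, x in zip(row, v)) for row in M]

def lra_step(W1, W2, E1, x, target, lr=0.1, beta=0.5):
    """One backprop-free update: the hidden layer matches a local target
    formed via a separate feedback matrix E1 (not W2 transposed)."""
    h1 = [max(0.0, z) for z in matvec(W1, x)]               # hidden layer (ReLU)
    y = matvec(W2, h1)                                      # linear output
    e = [yi - ti for yi, ti in zip(y, target)]              # output error
    t1 = [h - beta * d for h, d in zip(h1, matvec(E1, e))]  # local hidden target
    for i in range(len(W2)):                                # local delta rule, layer 2
        for j in range(len(h1)):
            W2[i][j] -= lr * e[i] * h1[j]
    d1 = [(h - t) * (1.0 if h > 0.0 else 0.0) for h, t in zip(h1, t1)]
    for i in range(len(W1)):                                # local delta rule, layer 1
        for j in range(len(x)):
            W1[i][j] -= lr * d1[i] * x[j]
    return sum(ei * ei for ei in e)                         # squared output error

W1 = [[0.2, -0.1], [0.05, 0.3]]
W2 = [[0.1, -0.2]]
E1 = [[0.3], [-0.2]]  # fixed feedback weights, standing in for a random projection
losses = [lra_step(W1, W2, E1, [1.0, 0.5], [1.0]) for _ in range(300)]
print(losses[0] > losses[-1])  # True: the loss shrinks without backprop
```

Because each layer's update depends only on its own activity and a locally delivered target, the two layers could be updated in parallel, which is the source of the speedup the abstract describes.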

Summaries as captions: Generating figure captions for scientific documents with automated text summarization

Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs (e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations. We further conducted an in-depth investigation focused on two key challenges: (i) the common presence of low-quality author-written captions and (ii) the lack of clear standards for good captions. Our code and data are available at: https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task.

Authors

Chieh-Yang Huang,Ting-Yao Hsu,Ryan Rossi,Ani Nenkova,Sungchul Kim,Gromit Yeuk-Yin Chan,Eunyee Koh,Clyde Lee Giles,Ting-Hao 'Kenneth' Huang

Published Date

2023/2/23
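The first step of the pipeline the abstract describes is gathering the figure-referencing paragraphs that the summarizer will condense. A minimal sketch of that retrieval step (the regex and example text are illustrative; the fine-tuned PEGASUS summarization itself is not reproduced here):

```python
import re

def figure_referencing_paragraphs(paper_text, fig_num):
    """Collect paragraphs that mention a given figure, e.g. 'Figure 3 shows ...'."""
    pattern = re.compile(rf"\b(?:Figure|Fig\.?)\s*{fig_num}\b", re.IGNORECASE)
    return [p for p in paper_text.split("\n\n") if pattern.search(p)]

paper = (
    "We describe the method.\n\n"
    "Figure 3 shows accuracy rising with model size.\n\n"
    "As shown in Fig. 3, gains saturate beyond 1B parameters."
)
print(figure_referencing_paragraphs(paper, 3))  # the two Figure-3 paragraphs
```

The concatenated matches would then be fed to the summarization model, whose output serves as the candidate caption.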

Professor FAQs

What is C Lee Giles's h-index at Penn State University?

C Lee Giles's h-index is 117 overall and 60 when counting only citations since 2020.

What are C Lee Giles's research interests?

C Lee Giles's research interests are: information extraction, search engines, information retrieval, deep learning, and digital libraries.

What is C Lee Giles's total number of citations?

C Lee Giles has 59,431 citations in total.

What are the co-authors of C Lee Giles?

The co-authors of C Lee Giles are Prasenjit Mitra, David M Pennock, Daniel Kifer, Ah Chung Tsoi, Cornelia Caragea, Christian Omlin.

Co-Authors

Prasenjit Mitra
Penn State University
H-index: 64

David M Pennock
Rutgers, The State University of New Jersey
H-index: 55

Daniel Kifer
Penn State University
H-index: 43

Ah Chung Tsoi
University of Wollongong
H-index: 42

Cornelia Caragea
University of Illinois at Chicago
H-index: 40

Christian Omlin
Universitetet i Agder
H-index: 26
