Publications | Shengjie Li

2025

Graph-Based Multi-Trait Essay Scoring

Shengjie Li and Vincent Ng

To appear in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025

@inproceedings{li-ng-2025-graph,
  title = {Graph-Based Multi-Trait Essay Scoring},
  author = {Li, Shengjie and Ng, Vincent},
  editor = {},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2025},
  address = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url = {},
  doi = {},
  pages = {},
  pubstate = {forthcoming}
}

2024

ICLE++: Modeling Fine-Grained Traits for Holistic Essay Scoring

Shengjie Li and Vincent Ng

In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Jun 2024

The majority of the recently developed models for automated essay scoring (AES) are evaluated solely on the ASAP corpus. However, ASAP is not without its limitations. For instance, it is not clear whether models trained on ASAP can generalize well when evaluated on other corpora. In light of these limitations, we introduce ICLE++, a corpus of persuasive student essays annotated with both holistic scores and trait-specific scores. Not only can ICLE++ be used to test the generalizability of AES models trained on ASAP, but it can also facilitate the evaluation of models developed for newer AES problems such as multi-trait scoring and cross-prompt scoring. We believe that ICLE++, which represents a culmination of our long-term effort in annotating the essays in the ICLE corpus, contributes to the set of much-needed annotated corpora for AES research.

@inproceedings{li-ng-2024-icle,
  title = {{ICLE}++: Modeling Fine-Grained Traits for Holistic Essay Scoring},
  author = {Li, Shengjie and Ng, Vincent},
  editor = {Duh, Kevin and Gomez, Helena and Bethard, Steven},
  booktitle = {Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
  month = jun,
  year = {2024},
  address = {Mexico City, Mexico},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.naacl-long.468},
  doi = {10.18653/v1/2024.naacl-long.468},
  pages = {8465--8486},
}

Automated Essay Scoring: Recent Successes and Future Directions

Shengjie Li and Vincent Ng

In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Aug 2024

Survey Track

@inproceedings{ijcai2024p897,
  title = {Automated Essay Scoring: Recent Successes and Future Directions},
  author = {Li, Shengjie and Ng, Vincent},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on
                 Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor = {Larson, Kate},
  pages = {8114--8122},
  year = {2024},
  month = aug,
  note = {Survey Track},
  doi = {10.24963/ijcai.2024/897},
  url = {https://doi.org/10.24963/ijcai.2024/897},
}

Conundrums in Cross-Prompt Automated Essay Scoring: Making Sense of the State of the Art

Shengjie Li and Vincent Ng

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

Cross-prompt automated essay scoring (AES), an under-investigated but challenging task that has gained increasing popularity in the AES community, aims to train an AES system that can generalize well to prompts that are unseen during model training. While recently-developed cross-prompt AES models have combined essay representations that are learned via sophisticated neural architectures with so-called prompt-independent features, an intriguing question is: are complex neural models needed to achieve state-of-the-art results? We answer this question by abandoning sophisticated neural architectures and developing a purely feature-based approach to cross-prompt AES that adopts a simple neural architecture. Experiments on the ASAP dataset demonstrate that our simple approach to cross-prompt AES can achieve state-of-the-art results.

@inproceedings{li-ng-2024-conundrums,
  title = {Conundrums in Cross-Prompt Automated Essay Scoring: Making Sense of the State of the Art},
  author = {Li, Shengjie and Ng, Vincent},
  editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.acl-long.414},
  doi = {10.18653/v1/2024.acl-long.414},
  pages = {7661--7681},
}

Automated Essay Scoring: A Reflection on the State of the Art

Shengjie Li and Vincent Ng

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

@inproceedings{li-ng-2024-automated,
  title = {Automated Essay Scoring: A Reflection on the State of the Art},
  author = {Li, Shengjie and Ng, Vincent},
  editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.emnlp-main.991/},
  doi = {10.18653/v1/2024.emnlp-main.991},
  pages = {17876--17888},
}

2023

Multimodal Propaganda Processing

Vincent Ng and Shengjie Li

In Proceedings of the AAAI Conference on Artificial Intelligence, Sep 2023

Propaganda campaigns have long been used to influence public opinion via disseminating biased and/or misleading information. Despite the increasing prevalence of propaganda content on the Internet, few attempts have been made by AI researchers to analyze such content. We introduce the task of multimodal propaganda processing, where the goal is to automatically analyze propaganda content. We believe that this task presents a long-term challenge to AI researchers and that successful processing of propaganda could bring machine understanding one important step closer to human understanding. We discuss the technical challenges associated with this task and outline the steps that need to be taken to address it.

@article{Ng_Li_2023,
  title = {Multimodal Propaganda Processing},
  author = {Ng, Vincent and Li, Shengjie},
  year = {2023},
  month = sep,
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {37},
  number = {13},
  pages = {15368--15375},
  doi = {10.1609/aaai.v37i13.26792},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/26792},
}

2022

End-to-End Neural Discourse Deixis Resolution in Dialogue

Shengjie Li and Vincent Ng

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022

We adapt Lee et al.’s (2018) span-based entity coreference model to the task of end-to-end discourse deixis resolution in dialogue, specifically by proposing extensions to their model that exploit task-specific characteristics. The resulting model, dd-utt, achieves state-of-the-art results on the four datasets in the CODI-CRAC 2021 shared task.

@inproceedings{li-ng-2022-end,
  title = {End-to-End Neural Discourse Deixis Resolution in Dialogue},
  author = {Li, Shengjie and Ng, Vincent},
  year = {2022},
  month = dec,
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
  publisher = {Association for Computational Linguistics},
  address = {Abu Dhabi, United Arab Emirates},
  pages = {11322--11334},
  doi = {10.18653/v1/2022.emnlp-main.778},
  url = {https://aclanthology.org/2022.emnlp-main.778},
}

Neural Anaphora Resolution in Dialogue Revisited

Shengjie Li, Hideo Kobayashi, and Vincent Ng

In Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue, Oct 2022

We present the systems that we developed for all three tracks of the CODI-CRAC 2022 shared task, namely the anaphora resolution track, the bridging resolution track, and the discourse deixis resolution track. Combining an effective encoding of the input using the SpanBERT-Large encoder with an extensive hyperparameter search process, our systems achieved the highest scores in all phases of all three tracks.

@inproceedings{li-etal-2022-neural-anaphora,
  title = {Neural Anaphora Resolution in Dialogue Revisited},
  author = {Li, Shengjie and Kobayashi, Hideo and Ng, Vincent},
  year = {2022},
  month = oct,
  booktitle = {Proceedings of the CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue},
  publisher = {Association for Computational Linguistics},
  address = {Gyeongju, Republic of Korea},
  pages = {32--47},
  url = {https://aclanthology.org/2022.codi-crac.4},
}

2021

The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis

Shengjie Li, Hideo Kobayashi, and Vincent Ng

In Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue, Nov 2021

The CODI-CRAC 2021 shared task is the first shared task that focuses exclusively on anaphora resolution in dialogue and provides three tracks, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. We perform a cross-task analysis of the systems that participated in the shared task in each of these tracks.

@inproceedings{li-etal-2021-codi,
  title = {The {CODI}-{CRAC} 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis Resolution in Dialogue: A Cross-Team Analysis},
  author = {Li, Shengjie and Kobayashi, Hideo and Ng, Vincent},
  year = {2021},
  month = nov,
  booktitle = {Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue},
  publisher = {Association for Computational Linguistics},
  address = {Punta Cana, Dominican Republic},
  pages = {71--95},
  doi = {10.18653/v1/2021.codi-sharedtask.8},
  url = {https://aclanthology.org/2021.codi-sharedtask.8},
}

Neural Anaphora Resolution in Dialogue

Hideo Kobayashi, Shengjie Li, and Vincent Ng

In Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue, Nov 2021

We describe the systems that we developed for the three tracks of the CODI-CRAC 2021 shared task, namely entity coreference resolution, bridging resolution, and discourse deixis resolution. Our team ranked second for entity coreference resolution, first for bridging resolution, and first for discourse deixis resolution.

@inproceedings{kobayashi-etal-2021-neural,
  title = {Neural Anaphora Resolution in Dialogue},
  author = {Kobayashi, Hideo and Li, Shengjie and Ng, Vincent},
  year = {2021},
  month = nov,
  booktitle = {Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue},
  publisher = {Association for Computational Linguistics},
  address = {Punta Cana, Dominican Republic},
  pages = {16--31},
  doi = {10.18653/v1/2021.codi-sharedtask.2},
  url = {https://aclanthology.org/2021.codi-sharedtask.2},
}

2020

Cross-modal Coherence Modeling for Caption Generation

Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, and Matthew Stone

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image–caption coherence relations, we annotate 10,000 instances from publicly-available image–caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.

@inproceedings{alikhani-etal-2020-cross,
  title = {Cross-modal Coherence Modeling for Caption Generation},
  author = {Alikhani, Malihe and Sharma, Piyush and Li, Shengjie and Soricut, Radu and Stone, Matthew},
  year = {2020},
  month = jul,
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  publisher = {Association for Computational Linguistics},
  address = {Online},
  pages = {6525--6535},
  doi = {10.18653/v1/2020.acl-main.583},
  url = {https://aclanthology.org/2020.acl-main.583},
}