For NLP practitioners, life is easier when text conforms to widely held standards. One reason is that the large annotated datasets on which NLP systems are often trained conform to those standards. Another is that standardization limits the number of distinct linguistic elements in a text.
Divergence from standards reveals the limitations of NLP systems. This is a common problem. But does it have common solutions? The field of text normalization attempts to find them.
What if you had only one reliable physical movement with which to interact with a computer? Then you would be interested in binary typing. In these studies, we use information theory to design binary typing systems, and then we test them on actual humans. The results suggest that people are not fond of unfamiliar things, unpredictability, or messing up.
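To make the information-theoretic idea concrete, here is a minimal sketch of one way a binary typing code could be built: a Huffman code, in which each symbol is selected by a sequence of yes/no actions and likelier symbols need fewer of them. The symbol probabilities below are made up for illustration, and the function names `huffman_code` and `expected_length` are hypothetical; this is not the exact design used in the studies.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return a dict mapping each symbol to a binary codeword string."""
    if len(probs) == 1:
        # A single symbol still needs one action to select it.
        return {next(iter(probs)): "0"}
    tiebreak = count()  # unique counter so tuple comparison never reaches the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable groups, prefixing their codewords.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def expected_length(probs, codes):
    """Average number of binary actions needed per symbol."""
    return sum(p * len(codes[s]) for s, p in probs.items())


# Illustrative (made-up) distribution over four symbols.
probs = {"e": 0.5, "t": 0.25, "a": 0.125, "o": 0.125}
codes = huffman_code(probs)
avg = expected_length(probs, codes)
```

With this toy distribution, the most probable symbol gets a one-action codeword and the two least probable symbols get three-action codewords, so the average cost per symbol is lower than the two actions a fixed-length code would require.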
Beckley, Russell and Roark, Brian (2013), Pair Language Models for Deriving Alternative Pronunciations and Spellings from Pronunciation Dictionaries, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics. PDF
Brian Roark, Russ Beckley, Chris Gibbons and Melanie Fried-Oken. 2013. Huffman scanning: using language models within fixed-grid keyboard emulation. Computer Speech and Language, 27(6): 1212-1234. PDF
Bedrick, S., Beckley, R., Roark, B., & Sproat, R. (2012). Robust kaomoji detection in Twitter. Presented at the LSM '12: Proceedings of the Second Workshop on Language in Social Media, Association for Computational Linguistics. PDF
Beckley, Russ, & Roark, B. (2011). Asynchronous fixed-grid scanning with dynamic codes. Presented at the SLPAT '11: Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, Association for Computational Linguistics. PDF
Taghva, K., Beckley, R., & Coombs, J. (2011). Name extraction and formal concept analysis. Presented at the ICCS'11: Proceedings of the 19th international conference on Conceptual structures for discovering knowledge, Springer-Verlag. PDF
Taghva, K., Beckley, R., & Coombs, J. (2006). The effects of OCR error on the extraction of private information. Presented at the DAS'06: Proceedings of the 7th international conference on Document Analysis Systems, Springer-Verlag. PDF
Taghva, K., Beckley, R., & Coombs, J. (2007a). Extracting Carbon Copy Names and Organizations from a Heterogeneous Document Collection (Vol. 2). Presented at the ICDAR '07: Proceedings of the Ninth International Conference on Document Analysis and Recognition, IEEE Computer Society. PDF
Taghva, K., Beckley, R., & Sadeh, M. (2005a). A Stemming Algorithm for the Farsi Language (Vol. 1). Presented at the ITCC '05: Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), IEEE Computer Society. PDF
Taghva, K., Beckley, R., & Sadeh, M. (2005b). A stemming algorithm for the Farsi language (Vol. 1, pp. 158–162). Presented at the Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on. doi:10.1109/ITCC.2005.40