Bing Liu (computer scientist)

Bing Liu (born 1963) is a Chinese-American professor of computer science who specializes in data mining, machine learning, and natural language processing. In 2002, he became a scholar at the University of Illinois at Chicago (UIC). [1] He holds a PhD from the University of Edinburgh (1988). [2] [3] His PhD advisors were Austin Tate and Kenneth Williamson Currie, and his thesis was titled "Reinforcement Planning for Resource Allocation and Constraint Satisfaction". [4]

Academic research

He developed a mathematical model that can reveal fake advertising. [5] He also teaches the course "Data Mining" at UIC during the fall and spring semesters. Grading in the course is usually based on a project and several quizzes and examinations.

He is best known for his research on sentiment analysis (also called opinion mining), fake/deceptive opinion detection, and using association rules for prediction. He also made important contributions to learning from positive and unlabeled examples (or PU learning), Web data extraction, and interestingness in data mining.

Two of his research papers, published at KDD-1998 and KDD-2004, received KDD Test-of-Time awards in 2014 and 2015. In 2013, he was elected chair of SIGKDD, the ACM Special Interest Group on Knowledge Discovery and Data Mining.

Research on Association Rules For Prediction

Association rule-based classification takes into account the relationships between each item in a dataset and the class to which that item is to be assigned. [6] The setting assumes two classes, a positive class and a negative class, into which items are classified. [6] Some classification algorithms only decide whether an item (case) belongs to the positive class, without estimating how likely it is to belong to that class. [6] Liu and his collaborators described an association rule-based classification algorithm that takes into account the relationship between items and both the positive and negative classes. [6] Each item is given a probability or score of being in the positive class or the negative class, and the items are then ranked by how likely they are to be in the positive class. [6]
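The idea of scoring and ranking cases with class association rules can be illustrated with a small sketch. This is not the formula from the cited paper: it is a simplified stand-in that assumes single-item rules of the form "item -> positive", and combines their confidences using rule support as the weight. The function names (`mine_item_rules`, `score`, `rank`) and the toy data are hypothetical.

```python
from collections import defaultdict

def mine_item_rules(cases, min_support=1):
    """Build one rule per item: (confidence of item -> pos, support count).

    `cases` is a list of (items, label) pairs, with label "pos" or "neg".
    Assumption: only single-item rules are mined, to keep the sketch short.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for items, label in cases:
        for item in set(items):
            counts[item][label] += 1
    rules = {}
    for item, c in counts.items():
        total = c["pos"] + c["neg"]
        if total >= min_support:
            rules[item] = (c["pos"] / total, total)
    return rules

def score(items, rules, default=0.5):
    """Support-weighted average confidence of all rules matching the case."""
    matched = [rules[i] for i in set(items) if i in rules]
    if not matched:
        return default  # no evidence either way
    weight = sum(support for _, support in matched)
    return sum(conf * support for conf, support in matched) / weight

def rank(new_cases, rules):
    """Order cases from most to least likely to be positive."""
    return sorted(new_cases, key=lambda items: score(items, rules), reverse=True)

# Toy training data (hypothetical).
train = [
    ({"a", "b"}, "pos"),
    ({"a", "c"}, "pos"),
    ({"b", "c"}, "neg"),
    ({"c"}, "neg"),
]
rules = mine_item_rules(train)
ranking = rank([["c"], ["a"], ["a", "c"]], rules)
```

With this toy data, item "a" appears only in positive cases and "c" mostly in negative ones, so a case containing only "a" ranks above one containing both, which in turn ranks above one containing only "c". The point of the sketch is the output type: a graded score per case rather than a hard positive/negative decision.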

Research on Sentiment Analysis

In the paper "Opinion Word Expansion and Target Extraction through Double Propagation", Qiu, Liu, Bu and Chen studied the relationship between opinion lexicons and opinion targets. [7] An opinion lexicon is a set of opinion words, and opinion targets are the topics on which an opinion is expressed. [7] The authors describe how their algorithm starts from a limited opinion word set and, through double propagation, builds a more detailed opinion word set from a set of sentences. Double propagation is the back-and-forth process between the word set and the targets: known opinion words are used to find targets, and known targets are used to find further opinion words, so the word set keeps updating itself. [7] Some algorithms require set rules and are thus limited in what they can do and in how they provide updated opinion lists. [7] Their algorithm requires only an initial word set, which is updated by finding relations between the words in the set and the target words, or vice versa. [7] The algorithm is run on a body of text such as a set of sentences or a paragraph. [7]
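The back-and-forth expansion described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the syntactic relations have already been extracted from the sentences and are supplied as hand-written tuples, and the relation names (`mod`, `conj_adj`, `conj_noun`) and the function `double_propagation` are hypothetical.

```python
def double_propagation(relations, seed_opinion_words):
    """Grow opinion-word and target sets from a small seed lexicon.

    `relations` is a list of (kind, left, right) tuples (assumed input):
      ("mod", adjective, noun)      adjective modifies noun
      ("conj_adj", adj1, adj2)      two adjectives joined by a conjunction
      ("conj_noun", noun1, noun2)   two nouns joined by a conjunction
    """
    opinion_words = set(seed_opinion_words)
    targets = set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for kind, left, right in relations:
            if kind == "mod":
                # Known opinion word -> the noun it modifies is a target.
                if left in opinion_words and right not in targets:
                    targets.add(right)
                    changed = True
                # Known target -> the adjective modifying it is an opinion word.
                if right in targets and left not in opinion_words:
                    opinion_words.add(left)
                    changed = True
            elif kind == "conj_adj":
                # An opinion word passes its status to a conjoined adjective.
                for a, b in ((left, right), (right, left)):
                    if a in opinion_words and b not in opinion_words:
                        opinion_words.add(b)
                        changed = True
            elif kind == "conj_noun":
                # A target passes its status to a conjoined noun.
                for a, b in ((left, right), (right, left)):
                    if a in targets and b not in targets:
                        targets.add(b)
                        changed = True
    return opinion_words, targets

# Toy relations (hypothetical), as if extracted from product reviews.
relations = [
    ("mod", "good", "screen"),
    ("mod", "amazing", "screen"),
    ("conj_adj", "amazing", "sharp"),
    ("mod", "sharp", "lens"),
    ("conj_noun", "lens", "battery"),
    ("mod", "red", "case"),
]
opinion_words, targets = double_propagation(relations, {"good"})
```

Starting from the single seed "good", the loop finds "screen" as a target, then "amazing" and "sharp" as opinion words, then "lens" and "battery" as targets. The pair "red"/"case" is never reached because neither word is connected to the seed, which shows why the method needs only an initial word set plus relations found in the text.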

Honors and awards

Publications

Peer-reviewed Article List

References

  1. Christy Levy (February 19, 2013). "On the internet, no one knows you're lying". Retrieved January 1, 2015.
  2. "Bing Liu CV - Biography".
  3. "Bing Liu - The Mathematics Genealogy Project". Mathematics Genealogy Project.
  4. Liu, Bin (1988). "Reinforcement Planning for Resource Allocation and Constraint Satisfaction". Edinburgh Research Archive. Retrieved 17 January 2022.
  5. David Streitfeld (January 26, 2012). "For $2 a Star, an Online Retailer Gets 5-Star Product Reviews". The New York Times.
  6. Liu, Bing; Ma, Yiming; Wong, Ching Kian; Yu, Philip S. (2003-03-01). "Scoring the Data Using Association Rules". Applied Intelligence. 18 (2): 119–135. doi:10.1023/A:1021931008240. ISSN 1573-7497. S2CID 10307615.
  7. Qiu, Guang; Liu, Bing; Bu, Jiajun; Chen, Chun (March 2011). "Opinion Word Expansion and Target Extraction through Double Propagation". Computational Linguistics. 37 (1): 9–27. doi:10.1162/coli_a_00034. ISSN 0891-2017. S2CID 1578481.
  8. "ACM Fellows Named for Computing Innovations that Are Advancing Technology in the Digital Age". ACM. 8 December 2015. Archived from the original on 9 December 2015. Retrieved 9 December 2015.
  9. "AAAI Fellows Elected in 2016". AAAI. 2016. Retrieved 2 February 2016.
  10. Wu, Xindong; Kumar, Vipin; Ross Quinlan, J.; Ghosh, Joydeep; Yang, Qiang; Motoda, Hiroshi; McLachlan, Geoffrey J.; Ng, Angus; Liu, Bing; Yu, Philip S.; Zhou, Zhi-Hua (January 2008). "Top 10 algorithms in data mining". Knowledge and Information Systems. 14 (1): 1–37. doi:10.1007/s10115-007-0114-2. hdl: 10983/15329 . ISSN   0219-1377. S2CID   2367747.
  11. Liu, Bing (1995). "A unified framework for consistency check". International Journal of Intelligent Systems. 10 (8): 691–713. doi:10.1002/int.4550100802. S2CID   37397676.
  12. Zhang, Lei; Wang, Shuai; Liu, Bing (July 2018). "Deep learning for sentiment analysis: A survey". Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 8 (4). arXiv: 1801.07883 . doi: 10.1002/widm.1253 . ISSN   1942-4787.
  13. Wang, Guan; Xie, Sihong; Liu, Bing; Yu, Philip S. (September 2012). "Identify Online Store Review Spammers via Social Review Graph". ACM Transactions on Intelligent Systems and Technology. 3 (4): 1–21. doi:10.1145/2337542.2337546. ISSN   2157-6904. S2CID   6041150.
  14. Yu, Zeng; Li, Tianrui; Yu, Ning; Pan, Yi; Chen, Hongmei; Liu, Bing (2019-02-28). "Reconstruction of Hidden Representation for Robust Feature Extraction". ACM Transactions on Intelligent Systems and Technology. 10 (2): 1–24. arXiv: 1710.02844 . doi:10.1145/3284174. ISSN   2157-6904. S2CID   23537050.
  15. Wang, Jing; Yu, Clement T.; Yu, Philip S.; Liu, Bing; Meng, Weiyi (2015-10-26). "Diversionary Comments under Blog Posts". ACM Transactions on the Web. 9 (4): 1–34. doi:10.1145/2789211. ISSN   1559-1131. S2CID   15011104.
  16. Bing Liu; Wynne Hsu; Lai-Fun Mun; Hing-Yan Lee (November–December 1999). "Finding interesting patterns using user expectations". IEEE Transactions on Knowledge and Data Engineering. 11 (6): 817–832. doi:10.1109/69.824588.
  17. Yanhong Zhai; Bing Liu (December 2006). "Structured Data Extraction from the Web Based on Partial Tree Alignment". IEEE Transactions on Knowledge and Data Engineering. 18 (12): 1614–1628. doi:10.1109/TKDE.2006.197. ISSN   1041-4347. S2CID   506970.
  18. Yu, Huilin; Qian, Tieyun; Liang, Yile; Liu, Bing (December 2020). "AGTR: Adversarial Generation of Target Review for Rating Prediction". Data Science and Engineering. 5 (4): 346–359. doi: 10.1007/s41019-020-00141-1 . ISSN   2364-1185.
  19. Bing Liu (July 1997). "Route finding by using knowledge about the road network". IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans. 27 (4): 436–448. doi:10.1109/3468.594911.
  20. Liu, Bing (July 1993). "Problem acquisition in scheduling domains". Expert Systems with Applications. 6 (3): 257–265. doi:10.1016/0957-4174(93)90054-A.
  21. Liu, Bing (July 1993). "Knowledge-based factory scheduling: Resource allocation and constraint satisfaction". Expert Systems with Applications. 6 (3): 349–359. doi:10.1016/0957-4174(93)90060-J.
  22. Bing Liu; Grossman, R.; Yanhong Zhai (November 2004). "Mining Web Pages for Data Records". IEEE Intelligent Systems. 19 (6): 49–55. doi:10.1109/MIS.2004.68. ISSN   1541-1672. S2CID   3240731.
  23. Bing Liu; Wynne Hsu; Shu Chen; Yiming Ma (September 2000). "Analyzing the subjective interestingness of association rules". IEEE Intelligent Systems. 15 (5): 47–55. doi:10.1109/5254.889106. ISSN   1094-7167.
  24. Liu, Bing; Tuzhilin, Alexander (February 2008). "Managing large collections of data mining models". Communications of the ACM. 51 (2): 85–89. doi:10.1145/1314215.1314230. ISSN   0001-0782. S2CID   9140117.
  25. Liu, Qian; Gao, Zhiqiang; Liu, Bing; Zhang, Yuanlin (July 2016). "Automated rule selection for opinion target extraction". Knowledge-Based Systems. 104: 74–88. doi:10.1016/j.knosys.2016.04.010. S2CID   397572.
  26. Liu, Bing (June 2017). "Lifelong machine learning: a paradigm for continuous learning". Frontiers of Computer Science. 11 (3): 359–361. doi:10.1007/s11704-016-6903-6. ISSN   2095-2228. S2CID   3410376.
  27. Poria, Soujanya; Soon, Ong Yew; Liu, Bing; Bing, Lidong (March 2021). "Affect Recognition for Multimodal Natural Language Processing". Cognitive Computation. 13 (2): 229–230. doi: 10.1007/s12559-020-09738-0 . ISSN   1866-9956.
  28. Qian, Yuhua; Xu, Hang; Liang, Jiye; Liu, Bing; Wang, Jieting (2015-10-01). "Fusing Monotonic Decision Trees". IEEE Transactions on Knowledge and Data Engineering. 27 (10): 2717–2728. doi:10.1109/TKDE.2015.2429133. ISSN   1041-4347. S2CID   1906702.
  29. Wang, Hao; Yang, Yan; Liu, Bing; Fujita, Hamido (January 2019). "A study of graph-based system for multi-view clustering". Knowledge-Based Systems. 163: 1009–1019. doi:10.1016/j.knosys.2018.10.022. S2CID   56482120.
  30. Li, Huayi; Liu, Bing; Mukherjee, Arjun; Shao, Jidong (2014-09-30). "Spotting Fake Reviews using Positive-Unlabeled Learning". Computación y Sistemas. 18 (3). doi:10.13053/cys-18-3-2035. ISSN   1405-5546. S2CID   5264540.
  31. Zhai, Zhongwu; Liu, Bing; Wang, Jingyuan; Xu, Hua; Jia, Peifa (July 2012). "Product Feature Grouping for Opinion Mining". IEEE Intelligent Systems. 27 (4): 37–44. doi:10.1109/MIS.2011.38. ISSN   1541-1672. S2CID   1882536.
  32. Apte, Chidanand; Liu, Bing; Pednault, Edwin P. D.; Smyth, Padhraic (August 2002). "Business applications of data mining". Communications of the ACM. 45 (8): 49–53. doi:10.1145/545151.545178. ISSN   0001-0782. S2CID   15896869.
  33. Li, Yanni; Li, Hui; Wang, Zhi; Liu, Bing; Cui, Jiangtao; Fei, Hang (2020). "ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering". IEEE Transactions on Knowledge and Data Engineering. 34 (2): 617–630. doi:10.1109/TKDE.2020.2990196. ISSN   1041-4347. S2CID   218993907.
  34. Grossman, Robert; Kasturi, Pavan; Hamelberg, Donald; Liu, Bing (March 2004). "An Empirical Study of the Universal Chemical Key Algorithm for Assigning Unique Keys to Chemical Compounds". Journal of Bioinformatics and Computational Biology. 02 (1): 155–171. doi:10.1142/S021972000400051X. ISSN   0219-7200. PMID   15272437.
  35. Bing Liu; Siew-Hwee Choo; Shee-Ling Lok; Sing-Meng Leong; Soo-Chee Lee; Foong-Ping Poon; Hwee-Har Tan (October 1994). "Finding the shortest route using cases, knowledge, and Djikstra's algorithm". IEEE Expert. 9 (5): 7–11. doi:10.1109/64.331478. ISSN   0885-9000.
  36. Liu, Bing (March 1994). "Specific Constraint Handling in Constraint Satisfaction Problems". International Journal on Artificial Intelligence Tools. 03 (1): 79–96. doi:10.1142/S0218213094000066. ISSN   0218-2130.