Global Counterfactual Explainer for Graph Neural Networks
Accepted paper Global Counterfactual Explanation for Graph Neural Networks at WSDM’23. Graph neural networks (GNNs) find applications in various domains such as computational biology, natural language processing, and computer security. Owing to their popularity, there is an increasing need to explain GNN predictions since GNNs are black-box machine learning models. One way to address this is counterfactual reasoning where the objective is to change the GNN prediction by minimal changes in the input graph. Existing methods for counterfactual explanation of GNNs are limited to instance-specific local reasoning. This approach has two major limitations of not being able to offer global recourse policies and overloading human cognitive ability with too much information. In this work, we study the global explainability of GNNs through global counterfactual reasoning. Specifically, we want to find a small set of representative counterfactual graphs that explains all input graphs. Towards this goal, we propose GCFExplainer, a novel algorithm powered by vertex-reinforced random walks on an edit map of graphs with a greedy summary. Extensive experiments on real graph datasets show that the global explanation from GCFExplainer provides important high-level insights of the model behavior and achieves a 46.9% gain in recourse coverage and a 9.5% reduction in recourse cost compared to the state-of-the-art local counterfactual explainers.
Signed Embedding for Polarized Graphs
Accepted paper POLE: Polarized Embedding for Signed Networks at WSDM’22. From the 2016 U.S. presidential election to the 2021 Capitol riots to the spread of misinformation related to COVID-19, many have blamed social media for today’s deeply divided society. Recent advances in machine learning for signed networks hold the promise to guide small interventions with the goal of reducing polarization in social media. However, existing models are especially ineffective in predicting conflicts (or negative links) among users. This is due to a strong correlation between link signs and the network structure, where negative links between polarized communities are too sparse to be predicted even by state-of-the-art approaches. To address this problem, we first design a partition-agnostic polarization measure for signed graphs based on the signed random-walk and show that many real-world graphs are highly polarized. Then, we propose POLE (POLarized Embedding for signed networks), a signed embedding method for polarized graphs that captures both topological and signed similarities jointly via signed autocovariance. Through extensive experiments, we show that POLE significantly outperforms state-of-the-art methods in signed link prediction, particularly for negative links with gains of up to one order of magnitude.
Graph Representation Learning Based on Random-walks
Research track paper A Broader Picture of Random-walk Based Graph Embedding at KDD’21. Graph embedding based on random-walks supports effective solutions for many graph-related downstream tasks. However, the abundance of embedding literature has made it increasingly difficult to compare existing methods and to identify opportunities to advance the state-of-the-art. Meanwhile, existing work has left several fundamental questions—such as how embeddings capture different structural scales and how they should be applied for effective link prediction—unanswered. This paper addresses these challenges with an analytical framework for random-walk based graph embedding that consists of three components: a random-walk process, a similarity function, and an embedding algorithm. Our framework not only categorizes many existing approaches but naturally motivates new ones. With it, we illustrate novel ways to incorporate embeddings at multiple scales to improve downstream task performance. We also show that embeddings based on autocovariance similarity, when paired with dot product ranking for link prediction, outperform state-of-the-art methods using Pointwise Mutual Information similarity by up to 100%.
XLNet Fine-tuning Tradeoffs with Limited Hardware
Course project for Information Retrieval. In this project, we explored alternative settings that allow XLNet, a state-of-the-art contextualized word embedding model, to fine-tune with limited hardware. We considered varying three different components: sequence length, number of fine-tuned layers, and downstream fully connected layers. We showed that for question answering with SQuAD 2.0 dataset, larger sequence lengths, more fine-tuned layers, and deeper fully connected layers lead to increased performance in terms of F1 score, but also increased time per iteration. Tradeoffs should be made depending on the available hardware and time when tuning the components.
Fault-tolerant Distributed Banking System Based on RAFT and blockchain
Course project for Advanced Distributed System. We implemented a fault-tolerant distributed banking application based on the RAFT distributed consensus protocol and blockchain. The system has multiple servers that maintain replicated lists of transactions made by clients, and store them in the format of blockchains. We applied Nakamoto’s Proof-of-Work concept that has been used in Bitcoin in the blockchain. RAFT has been used as the underlying consensus protocol to ensure proper blockchain replication. We also guarantee fault-tolerance for two types of failures: node failure (crash) and network partition. Detailed project description can be found on the GitHub page.
Prospect Theory for Group Decision Making Dynamics
Research project supported by Network Science of Teams, a Multidisciplinary University Research Initiative. Our goal is to understand how interaction in a group affects one’s behavior when making risky decisions (such as gambles). To this end, we extended the Prospect Theory to model group-level risky decision making dynamics with an interpersonal influence system. We collected and processed data from two human-subject experiments and applied our model to analyze the group behavior. Results show that the group behavior shifts towards consensus and the shift can be explained by the self-reported influence.
A Network Flow Approach to Bike Sharing
Course project for Topics in Combinatorial Algorithms. For a real-world bike-sharing scheme, re-balancing the bikes across different bike parking lots is a non-trivial problem. Specifically, since different stations can have different levels of supply and demand during the day, and the natural flow of bikes might not be enough to satisfy the demand, bike-sharing companies need to relocate the bikes once in a while to meet the demands, while minimizing the costs. In this project, we formulated the problem as a min-cost flow problem and solved it with efficient min-cost solvers. We also proposed a greedy algorithm that partitions the network according to a different time, which further increased the scalability. Experiment results showed that our algorithm can find the optimal relocation solution of large networks in a reasonable time and with a small trade-off of performance (~5%), the greedy version can find sub-optimal solutions in seconds.
Transfer Learning for Community Detection in Multiplex Networks
Part of overseas research training module and final year project of Yingcai Honors College. Based on our observation of real-world networks, we formulated the multiplex community detection problem as to refine community detection results in some layer with transferred knowledge from other layers, instead of finding a community partition across several or all layers. To solve this new problem, we provide a representation-based multiplex community detection framework, which first learns a shared common feature representation and layer-specific feature representations simultaneously and then cluster the combined representations for community partition. The framework is implemented with an extended symmetric non-negative matrix factorization approach for learning representations and k-means for clustering representations. Experimental results show that our implementation outperforms other methods in terms of modularity, especially when the target layer which receives transferred knowledge contains much noise (e.g. sparser than other layers).
Overlapping Community Detection Based on Game Theory-incorporated Label Propagation Dynamics
Part of the undergraduate research training module of Yingcai Honors College. Following the classical work of label propagation based dynamics algorithms (see, COPRA and SLPA), the proposed algorithm incorporated gain and loss functions from game theory to better model the intrinsic dynamics between agents in a social network. The gain function was introduced as the local overlapping modularity of a node to maximize global overlapping modularity. The loss function was based on the concept of ‘energy’, which determines how many labels an agent is capable of transferring. Similar agents tend to share labels easily, while labels between agents with different characteristics are hard to exchange, if not impossible. In terms of overlapping modularity, the algorithm yielded results comparable to state-of-the-art algorithms in real-life datasets, e.g., Zachary’s karate club, dolphins, netscience, etc.
Image Restoration Based on Ranked-order Based Adapative Median Filter
Course project for Digital Image Processing. Implemented Ranked-order based Adaptive Median Filter (RAMF, see Adaptive Median Filters: New Algorithms and Results) with Matlab, and compared the algorithm with simple median filters against salt-and-pepper noise. Results showed that while median filters with small mask sizes can not handle high-density noise effectively, those with large masks tend to blur the image to an unacceptable extent. On the contrary, RAMF can eliminate the impulses perfectly, while keeping other parts intact.
Signal Processing System for Imbedded Transparent Ultrasonic Emitter in Smartphones
Course project for Writing for Science and Technology. We designed a signal processing system that is suitable for use with the transparent ultrasonic emitter applied to the screen of a smartphone based on the US2016/0255441 A1. This system can help provide high‐quality audio in an efficient and economic manner, be easily embedded into a smartphone, and function well in harsh conditions with a long lifespan. A prototypical implementation of the system comprises six modules: an equalizer to handle the flat response, a compressor to narrow the range of audio amplitude, a low pass filter, a high pass filter, a modulator to combine the signal with a carrier ultrasonic wave generated from the oscillator, and another high pass filter to eliminate all audio frequencies. It is expected to both fill the gap of the missing design and bring considerable economic benefits.
Analysis about efficiency of a cycle containing a linear process
We calculated the efficiency of a cycle containing a non-standard linear process with either ideal gas or Van Der Waals gas as its medium. This process is composed of a linear expansion process with a negative slope, an isobaric compression process, and an isochoric pressurized process in a pV diagram. We also illustrated the endothermic and exothermic characteristics of this linear process and discussed the number of conversion points. We believe this analysis can serve as an useful example in the teaching of thermodynamics in universities.
Break through the Bottleneck: Optimizing Designs for Toll Plazas with a Cellular-automation Approach
A work for Problem B of Mathematical Contest in Modeling 2017. The objective of this project is to quantify the optimal design of the merging area in a toll plaza. Specifically, we first established a cellular-automation model to simulate the traffic flow of the fan-in area. Then, a general metric, Composite Performance Index (CPI), is formulated to evaluate the performance of a merging area design. After that, we implemented our model with different shapes and merging patterns and conducted sensitivity analysis. The results show that for a eight-to-four fan-in scheme, our purposely designed double-slant toll plaza, with approximately 1450.1 square yards’ coverage, outperforms traditional rectangular plazas and single-slant plazas.
Bath as the Roman Bath: Optimazing Strategies for Reheating Bathtub Based on Water Thermodynamics
A work for Problem A of Mathematical Contest in Modeling 2016. We provided a series of strategies that optimize water usage while maintaining satisfactory experience for bathers using modern bathtubs. Specifically, we accomplished the following: determination of temperate distribution with thermodynamics; development a measure of satisfactory temperature requirement; criteria for strategies that can be easy to handle and economical; and evaluation of versatility, in which different shapes of bathtubs, postures of bathers and characteristics of additives are simulated.
Face the Goodgrant Challenge
Course project for Operations Research. We provided a solution for the Goodgrant challenge in Mathematical Contest in Modeling 2016. We formulated it as an optimization problem. The objective function is the sum of increase in students’ performance in each university. The decision variables are the investment granted to each university. And the constraint is the total amount of money available. We first manually selected several dimensions that can indicate students’ performance, and applied principle component analysis to extract several principle components. Then, we fitted the current amount of donation with students’ performance with a fitting function. Finally, we determined the increment of students’ performance after donation and optimize our objective function, based on which we provided our selection of universities that are worth more donation.
Analog Communication System Design Based on Simulink Toolbox
Course project for Random Signal Analysis. We constructed a multi-channel analog communication system based on Amplitude Modulation (AM) broadcast system in Chengdu with Simulink toolbox.The system consists of a modulation module, a multiplex module, a channel module, a demultiplex module and a demodulation module. Based on our system, we discussed the influence of modulation methods and channel conditions on the multiple-channel system performance.
Frequency Spectrum Analysis Based on DFT
Course project for Digital Signal Processing. Presented a digital spectrum analysis scheme that clearly identifies the frequency components of a linear combination of sinusoids with relatively close frequencies but distinct amplitudes. In an effort to obtain the best resolution as well as saving computational burdens, the choices of the type and length of window function and length of DFT were discussed in details and a special method by avoiding spectrum leakage is highlighted.
Generation and Detection of DTMF Signals with GUI Support
Course project for Digital Signal Processing. Simulated the process of Dual-tone multi-frequency (DTMF) signal generation and detection based on methods implemented in hardware DSP modules with Matlab. Also designed a Graphical User Interface (GUI) for convenience and better illustration.
Parcel with Wings: A Zero-one Programming Model to Manage Logistic Centers for Online Retailers
A work for MCM mock competition in UESTC. We provided a solution for electronic commerce company to allocate its warehouses wisely across the states, maintaining delivery time and optimizing investment. Specifically, with our zero-one programming model, we accomplished the following: extraction of key features of warehouse locations and guarantee of one-day shipping; optimization in a real-life manner by incorporating several independent criteria, such as digital shopper density and tax rates; and a status ranking system to provide insight as to flexible adjustment when additional constraints such as no warehouse in a specific state are applied.
Computer Generation of Random Variables Based on Transformation Method
Course project for Random Signal Analysis. Implemented the transformation method in Matlab that is capable of generating any required random variables given probability density or mass function. Tested the algorithm by generating binomial random variables and two independent normal random variables and verifying their respective properties. In addition to those, some tricks for fast generation of binomial random variables and common fallacies in variation were discussed.
An Implementation of Multi-thread Capable HTTP Server Based on Java Socket
Course project for Computer Networks. Implemented an HTTP server based on Java Socket. This server is capable of handling files of html, jpg, ico, css, js, and pdf types, thus enabling the normal response of nearly any static websites, which is especially suitable for uses such as a personal homepage. In addition, taking advantage of the Thread class in Java, this server also supports multi-thread processing, providing correct responses when multiple users send queries concurrently. Finally, the applicability of this server was demonstrated by showing how it works properly for a personal homepage case.
Smart Bulb Design with Supporting Android Application
Part of Yingcai Innovation Training Course. We designed and fabricated a smart bulb with a supporting android application. The shape of this bulb is based on an ancient Chinese legend, the Lotus Lateran, and we fabricated the model with 3D printing technology. The bulb itself is capable of being controlled by touching, and can adjust the brightness based on environment light and infrared sensing (that is, can sense human approaching and turn itself on automatically). In addition, thanks to the Wifi communication module, it can be controlled by an Android application we developed once it connects to the Internet.
Empathy Experiment: Mute for a Day and Visual Impairment Simulation
Course project for Psychology and Modern Life. We designed two experiments that can rouse empathy of participants. The first one requires participants to be muted for a one-day trip in a foreign land, where the local language is incomprehensible to them. This experiment is aimed to help participants experience the feeling of the dumb and deaf. The second experiment is covering participants’ eyes or glasses with non-transparent materials, such as tapes and cellophane papers, in an effort to simulate visual impairment scenarios. Participants then are required to conduct a visually intensive task, such as watching a movie, and report their feeling. We did these experiments ourselves and provided first-hand analysis.
How to Create Delicious Moments of Joy: a Case Study for Management
Course project for Management. We interviewed a graduate from College of Management, National Chiao Tung University, who at that time, was a customer services & logistical lead in Modelez Taiwan Limited. The questions asked covers topics including managers, corporation culture and structure, employee pressure and countermeasure, objective and planning and strategy. We conduct comparative analysis about the answers we summarized and those exists in our textbook, and provided remarks about management in theory and in practice.
Role of Compulsoriness in Students' Evaluation of Teaching System
Extended from course projects of Academic English Training courses. Designed an instrument in form of questionnaire to measure students’ perception, motivation, feedback quality and self-reported compulsoriness of students’ evaluation of teaching system (SET). Based on results from over one hundred participants across China, the negative correlation between compulsoriness and students’ behaviors were revealed. In addition to that, some practical suggestions for enhancement of feedback for universities currently adopting a compulsory SET system were provided.
Amplitude-Modulation Broadcast System Design Based On Simulink Toolbox
Course project for Signal and System. A multi-channel analog communication system based on amplitude-modulation broadcast system in Chengdu was constructed. In addition to that, the effects of noise intensity and damping rate on system performance were analyzed. All the system was simulated using Simulink Toolbox in Matlab.
Simple Digital Library Based on Access/SQL
Course project for Data Structure and Algorithm. We designed a simple digital library with Microsoft Access as our platform. Based on relational database, the system has the following functions: book management, user management, borrow/return management, record query, etc. Thanks to Access, our library provides a nice graphical user interface that allows user without professional knowledge of database to access freely.
Fire Alarm Circuit Based on Temperature with Tunable Sensitivity
Course project for Analog Circuits. An analog circuit that can detect and provide an alarm for fire breaking was designed based on temperature change in a fire scenario. At a normal temperature, the circuit remains silent and shows no visual signal. Once the temperature reach an abnormal state, such as 50 degree Celsius, the circuit rings the buzzer and light the LED. The sensitivity or the temperature threshold can be easily tuned by adjusting the variable resistor. Both simulation results and hardware implementation were given, showing a consistency between theory and reality.
A Brief Introduction to The Realization of User Mobility in Modern Mobile Communication
Course project for Freshman Seminar: Communication System Session. One of the most prominent challenge for any kind of mobile communication system, support for user mobility, was discussed in this paper. Using Global System for Mobile Communication (GSM) as an example, the following aspects were covered: the network structure, mobility management techniques and call processing techniques.
A Design for Score Increase in Final-exam Review Period Based on Dynamic Programming
Course project for Mathematical Analysis II. From a very practical perspective of students, we designed an algorithm to help us manage our time for review before final examinations. The algorithm is in a dynamic programming manner, considering the discrete relationship between expected score increase and time invested. The time for reviewing were divided into two parts, namely, the phase when students still have to attend to at least one lectures and when all lectures end and time are at their total disposal. The optimal time allocation for our own case in that semester was provided.
Design and Test of A Frequency-Adjustable Square Wave Generator
Course project for Circuit Analysis. We designed a square wave generator with adjustable frequency. This circuit uses a 555 timer as the function generator module, which allows inexpensive implementation. To stabilize duty ratio to half, an additional triode was added. Simulation and hardware implementation were conducted and showed ideal and consistent performance.
Design of An Adjustable Voltage-Regulated DC Power Supply
Course project for Circuit Analysis. We designed a voltage-regulated DC power supply with adjustable output. The circuit composes of a transformer, a rectification module, a filter module and a voltage regulation module to accomplish the required output. The regulation is achieved with two three-terminal voltage regulators, i.e., LM317 and LM337, and the adjustable feature is supported by a potentiometer. Simulation results were presented and hardware implementation was conducted to test its performance.
Input-output Model, A Case Study of Sichuan Province
Course project for Linear Algebra. We applied a matrix algebra model for macro-economics to the case of Sichuan Province. Specifically, we collected the input-output table in 2007 from government website of Sichuan Province, and drafted a plan for the change of output products in the coming five years. Based on these figures, we derived the average growth rate for final demand of each industry. The computation was conducted with Mathematica.