Table of Contents
General Chairs’ Welcome – This Year’s Theme: Data Science for Social Good
Research Track Program Chairs’ Welcome
Industry & Government Track Program Chairs’ Welcome
Bloomberg Welcome
KDD’14 Conference on Knowledge Discovery & Data Mining Organization
KDD’14 Research Track Senior Program Committee
KDD’14 Research Track Program Committee
KDD’14 Industry & Government Track Senior Program Committee
KDD’14 Industry & Government Track Program Committee

The Battle for the Future of Data Mining (Page 1)
Data, Predictions, and Decisions in Support of People and Society (Page 2)
A Data Driven Approach to Diagnosing and Treating Disease (Page 3)
Bugbears Or Legitimate Threats? (Social) Scientists' Criticisms of Machine Learning? (Page 4)

Research Session 1: Location-based Services
Prediction of Human Emergency Behavior and Their Mobility Following Large-scale Disaster (Page 5)
Inferring User Demographics and Social Strategies in Mobile Social Networks (Page 15)
Travel Time Estimation of a Path Using Sparse Trajectories (Page 25)
Modeling Human Location Data with Mixtures of Kernel Densities (Page 35)
A Cost-Effective Recommender System for Taxi Drivers (Page 45)

Research Session 2: Applications to Healthcare and Medicine I
LUDIA: an Aggregate-Constrained Low-Rank Reconstruction Algorithm to Leverage Publicly Released Health Data (Page 55)
People on Drugs: Credibility of User Statements in Health Communities (Page 65)
Unfolding Physiological State: Mortality Modelling in Intensive Care Units (Page 75)
Unsupervised Learning of Disease Progression Models (Page 85)
Good-Enough Brain Model: Challenges, Algorithms and Discoveries in Multi-Subject Experiments (Page 95)

Research Session 3: Applications to Healthcare and Medicine II
FUNNEL: Automatic Mining of Spatially Coevolving Epidemics (Page 105)
Marble: High-Throughput Phenotyping from Electronic Health Records via Sparse Nonnegative Tensor Factorization (Page 115)
Scalable Noise Mining in Long-Term Electrocardiographic Time-Series to Predict Death Following Heart Attacks (Page 125)
From Micro to Macro: Data Driven Phenotyping by Densification of Longitudinal Electronic Medical Records (Page 135)
Clinical Risk Prediction with Multilinear Sparse Logistic Regression (Page 145)
Dual Beta Process Priors for Latent Cluster Discovery in Chronic Obstructive Pulmonary Disease (Page 155)

Research Session 4: Recommender Systems
COM: A Generative Model for Group Recommendation (Page 163)
Leveraging User Libraries to Bootstrap Collaborative Filtering (Page 173)
Topic-Factorized Ideal Point Estimation Model for Legislative Voting Network (Page 183)
Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS) (Page 193)
User Effort Minimization Through Adaptive Diversification (Page 203)

Research Session 5: Clustering
Relevant Overlapping Subspace Clusters on Categorical Data (Page 213)
Batch Discovery of Recurring Rare Classes Toward Identifying Anomalous Samples (Page 223)
A Dirichlet Multinomial Mixture Model-Based Approach for Short Text Clustering (Page 233)
Representative Clustering of Uncertain Data (Page 243)
SMVC: Semi-Supervised Multi-View Clustering in Subspace Projections (Page 253)

Research Session 6: Supervised Learning I
FastXML: A Fast, Accurate and Stable Tree-Classifier for eXtreme Multi-label Learning (Page 263)
A Multi-Class Boosting Method with Direct Optimization (Page 273)
An Efficient Algorithm for Weak Hierarchical Lasso (Page 283)
Online Multiple Kernel Regression (Page 293)
Class-Distribution Regularized Consensus Maximization for Alleviating Overfitting in Model Combination (Page 303)

Research Session 7: Supervised Learning II
Large Margin Distribution Machine (Page 313)
Distance Metric Learning Using Dropout: A Structured Regularization Approach (Page 323)
Box Drawings for Learning with Imbalanced Data (Page 333)
Incremental and Decremental Training for Linear Classification (Page 343)
Supervised Deep Learning with Auxiliary Networks (Page 353)

Research Session 8: Trend, Anomaly and Novelty Detection
Sleep Analytics and Online Selective Anomaly Detection (Page 362)
GLAD: Group Anomaly Detection in Social Media Analysis (Page 372)
FBLG: A Simple and Effective Approach for Temporal Dependence Discovery from Time Series Data (Page 382)
Learning Time-Series Shapelets (Page 392)
Utilizing Temporal Patterns for Estimating Uncertainty in Interpretable Early Decision Making (Page 402)

Research Session 9: Data Streams
Prototype-based Learning on Concept-drifting Data Streams (Page 412)
Detecting Moving Object Outliers in Massive-Scale Trajectory Streams (Page 422)
The Setwise Stream Classification Problem (Page 432)
Streamed Approximate Counting of Distinct Elements: Beating Optimal Batch Methods (Page 442)
Time-Varying Learning and Content Analytics via Sparse Factor Analysis (Page 452)

Research Session 10: Active Learning
Active-Transductive Learning with Label-Adapted Kernels (Page 462)
Active Learning for Sparse Bayesian Multilabel Classification (Page 472)
Large-Scale Adaptive Semi-Supervised Learning via Unified Inductive and Transductive Model (Page 482)
Active Semi-Supervised Learning Using Sampling Theory for Graph Signals (Page 492)
Active Collaborative Permutation Learning (Page 502)

Research Session 11: Feature Selection
Effective Global Approaches for Mutual Information Based Feature Selection (Page 512)
Gradient Boosted Feature Selection (Page 522)
Simultaneous Feature and Feature Group Selection Through Hard Thresholding (Page 532)
Safe and Efficient Screening for Sparse Support Vector Machine (Page 542)
Factorized Sparse Learning Models with Interpretable High Order Feature Interactions (Page 552)

Research Session 12: Statistical Techniques for Big Data
Parallel Gibbs Sampling for Hierarchical Dirichlet Processes via Gamma Processes Equivalence (Page 562)
Empirical Glitch Explanations (Page 572)
Learning with Dual Heterogeneity: A Nonparametric Bayes Model (Page 582)
Online Chinese Restaurant Process (Page 591)
Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (Page 601)

Research Session 13: Scaling-up Methods for Big Data
Improving the Modified Nyström Method Using Spectral Shifting (Page 611)
Fast Flux Discriminant for Large-Scale Sparse Nonlinear Classification (Page 621)
Scalable Histograms on Large Probabilistic Data (Page 631)
Correlation Clustering in MapReduce (Page 641)
Scaling Out Big Data Missing Value Imputations: Pythia vs. Godzilla (Page 651)

Research Session 14: Large-scale Optimization and Learning
Efficient Mini-Batch Training for Stochastic Optimization (Page 661)
Streaming Submodular Maximization: Massive Data Summarization on the Fly (Page 671)
Distance Queries from Sampled Data: Accurate and Efficient (Page 681)
Improved Testing of Low Rank Matrices (Page 691)
DeepWalk: Online Learning of Social Representations (Page 701)

Research Session 15: Web Mining
Open-Domain Quantity Queries on Web Tables: Annotation, Response, and Consensus Models (Page 711)
Crowdsourced Time-sync Video Tagging Using Temporal and Personalized Topic Modeling (Page 721)
Identifying and Labeling Search Tasks via Query-based Hawkes Processes (Page 731)
LaSEWeb: Automating Search Strategies over Semi-Structured Web Data (Page 741)
Personalized Search Result Diversification via Structured Learning (Page 751)

Research Session 16: Transfer Learning
Efficient Multi-Task Feature Learning with Calibration (Page 761)
Multi-Task Copula By Sparse Graph Regression (Page 771)
Unifying Learning to Rank and Domain Adaptation: Enabling Cross-Task Document Scoring (Page 781)
Scalable Heterogeneous Translated Hashing (Page 791)
Matching Users and Items Across Domains to Improve the Recommendation Quality (Page 801)

Research Session 17: Recommendations and Ratings
Optimal Recommendations Under Attraction, Aversion, and Social Influence (Page 811)
ClusCite: Effective Citation Recommendation by Information Network-Based Clustering (Page 821)
GeoMF: Joint Geographical Modeling and Matrix Factorization for Point-of-Interest Recommendation (Page 831)
Detecting Anomalies in Dynamic Rating Data: A Robust Probabilistic Model for Rating Evolution (Page 841)
Product Selection Problem: Improve Market Share by Learning Consumer Behavior (Page 851)

Research Session 18: Topic Modeling
TCS: Efficient Topic Discovery Over Crowd-Oriented Service Data (Page 861)
SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds (Page 871)
Experiments with Non-Parametric Topic Models (Page 881)
Reducing the Sampling Complexity of Topic Models (Page 891)
Dynamics of News Events and Social Media Reaction (Page 901)

Research Session 19: Security and Privacy
Differentially Private Network Data Release via Structural Inference (Page 911)
Exponential Random Graph Estimation under Differential Privacy (Page 921)
Top-k Frequent Itemsets via Differentially Private FP-Trees (Page 931)
CatchSync: Catching Synchronized Behavior in Large Directed Graphs (Page 941)
Mobile App Recommendations with Security and Privacy Awareness (Page 951)

Research Session 20: Dimensionality Reduction
Fast DTT: A Near Linear Algorithm for Decomposing a Tensor into Factor Tensors (Page 967)
Clustering and Projected Clustering with Adaptive Neighbors (Page 977)
LWI-SVD: Low-rank, Windowed, Incremental Singular Value Decompositions on Time-Evolving Data Sets (Page 987)
Provable Deterministic Leverage Score Sampling (Page 997)
Semantic Visualization for Spherical Representation (Page 1007)

Research Session 21: Novel Applications
Grouping Students in Educational Settings (Page 1017)
Inferring Gas Consumption and Pollution Emission of Vehicles Throughout a City (Page 1027)
Methods for Ordinal Peer Grading (Page 1037)
Exploiting Geographic Dependencies for Real Estate Appraisal: A Mutual Perspective of Ranking and Clustering (Page 1047)
Towards Scalable Critical Alert Mining (Page 1057)

Research Session 22: Crowds and Markets
From Labor to Trader: Opinion Elicitation via Online Crowds as a Market (Page 1067)
Optimal Real-Time Bidding for Display Advertising (Page 1077)
Quantifying Herding Effects in Crowd Wisdom (Page 1087)
Modeling Delayed Feedback in Display Advertising (Page 1097)
Networked Bandits with Disjoint Linear Payoffs (Page 1106)

Research Session 23: Text Mining
Mining Topics in Documents: Standing on the Shoulders of Big Data (Page 1116)
Integrating Spreadsheet Data via Accurate and Low-Effort Extraction (Page 1126)
Sentiment Expression Conditioned by Affective Transitions and Social Forces (Page 1136)
Entity Profiling with Varying Source Reliabilities (Page 1146)
Open Question Answering Over Curated and Extracted Knowledge Bases (Page 1156)

Research Session 24: Dynamic Graph Analysis
Non-Parametric Scan Statistics for Event Detection and Forecasting in Heterogeneous Social Media Graphs (Page 1166)
Event Detection in Activity Networks (Page 1176)
FEMA: Flexible Evolutionary Multi-Faceted Analysis for Dynamic Behavioral Pattern Discovery (Page 1186)
Profit-Maximizing Cluster Hires (Page 1196)
On Social Event Organization (Page 1206)

Research Session 25: Diffusion in Social and Information Networks
A Bayesian Framework for Estimating Properties of Network Diffusions (Page 1216)
Scalable Diffusion-Aware Optimization of Network Topology (Page 1226)
Probabilistic Latent Network Visualization: Inferring and Embedding Diffusion Networks (Page 1236)
MMrate: Inferring Multi-Aspect Diffusion Networks with Multi-Pattern Cascades (Page 1246)
Stability of Influence Maximization (Page 1256)

Research Session 26: Social and Information Networks
Who to Follow and Why: Link Prediction with Explanations (Page 1266)
Activity-edge Centric Multi-label Classification for Mining Heterogeneous Information Networks (Page 1276)
Meta-Path Based Multi-Network Collective Link Prediction (Page 1286)
Fast Influence-based Coarsening for Large Networks (Page 1296)
Minimizing Seed Set Selection with Probabilistic Coverage Guarantee in a Social Network (Page 1306)

Research Session 27: Graph Mining and Modeling
Core Decomposition of Uncertain Graphs (Page 1316)
Learning Multifractal Structure in Large Networks (Page 1326)
Temporal Skeletonization on Sequential Data: Patterns, Categorization, and Visualization (Page 1336)
Focused Clustering and Outlier Detection in Large Attributed Graphs (Page 1346)
Inside the Atoms: Ranking on a Network of Networks (Page 1356)

Research Session 28: Network Community Detection
Community Membership Identification from Small Seed Sets (Page 1366)
Community Detection in Graphs through Correlation (Page 1376)
Heat Kernel Based Community Detection (Page 1386)
On the Permanence of Vertices in Network Communities (Page 1396)
The Interplay Between Dynamics and Networks: Centrality, Communities, and Cheeger Inequality (Page 1406)

Research Session 29: Scaling-up Graph Algorithms
Almost Linear-Time Algorithms for Adaptive Betweenness Centrality Using Hypergraph Sketches (Page 1416)
Efficient SimRank Computation via Linearization (Page 1426)
FAST-PPR: Scaling Personalized PageRank Estimation for Large Graphs (Page 1436)
Graph Sample and Hold: A Framework for Big-Graph Analytics (Page 1446)
Balanced Graph Edge Partition (Page 1456)

Research Session 30: Social Network Analysis
Using Strong Triadic Closure to Characterize Ties in Social Networks (Page 1466)
Network Structural Analysis via Core-Tree-Decomposition (Page 1476)
Analyzing Expert Behaviors in Collaborative Networks (Page 1486)
Predicting Long-Term Impact of CQA Posts: A Comprehensive Viewpoint (Page 1496)
Who Are Experts Specializing in Landscape Photography? Analyzing Topic-Specific Authority on Content Sharing Services (Page 1506)

Industry & Government Invited Talks
Frontiers in E-commerce Personalization (Page 1516)
Predictive Modeling in Practice (Page 1517)
Medicine in the Age of Electronic Health Records (Page 1518)
Algorithms for Interpretable Machine Learning (Page 1519)
Data Science Through the Lens of Social Science (Page 1520)
Information Environment Security (Page 1521)
Big Data for Social Good (Page 1522)
Bringing Data Science to the Speakers of Every Language (Page 1523)

Guilt by Association: Large Scale Malware Detection by Mining File-relation Graphs (Page 1524)
Mining Text Snippets for Images on the Web (Page 1534)
Predicting Student Risks Through Longitudinal Analysis (Page 1544)
Novel Geospatial Interpolation Analytics for General Meteorological Measurements (Page 1553)
Targeting Direct Cash Transfers to the Extremely Poor (Page 1563)
Scalable Hands-Free Transfer Learning for Online Advertising (Page 1573)
Correlating Events with Time Series for Incident Diagnosis (Page 1583)
Proactive Workflow Modeling By Stochastic Processes with Application to Healthcare Operation and Management (Page 1593)
Activity Ranking in LinkedIn Feed (Page 1603)
Budget Pacing for Targeted Online Advertisements at LinkedIn (Page 1613)
Large Scale Predictive Modeling for Micro-Simulation of 3G Air Interface Load (Page 1620)
Unveiling Clusters of Events for Alert and Incident Management in Large-Scale Enterprise IT (Page 1630)
Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-Commerce (Page 1640)
Corporate Residence Fraud Detection (Page 1650)
Modeling Mass Protest Adoption in Social Network Communities Using Geometric Brownian Motion (Page 1660)
Shallow Semantic Parsing of Product Offering Titles (for better automatic hyperlink insertion) (Page 1670)
A Case Study: Privacy Preserving Release of Spatio-Temporal Density in Paris (Page 1679)
Scalable Near Real-Time Failure Localization of Data Center Networks (Page 1689)
Improving Management of Aquatic Invasions by Integrating Shipping Network, Ecological, and Environmental Data: Data Mining for Social Good (Page 1699)
FoodSIS: A Text Mining System to Improve the State of Food Safety in Singapore (Page 1709)
A Hazard Based Approach to User Return Time Prediction (Page 1719)
Predicting Employee Expertise for Talent Management in the Enterprise (Page 1729)
Applying Data Mining Techniques to Address Critical Process Optimization Needs in Advanced Manufacturing (Page 1739)
EARS (Earthquake Alert and Report System): A Real Time Decision Support System for Earthquake Crisis Management (Page 1749)
Knock It Off: Profiling the Online Storefronts of Counterfeit Merchandise (Page 1759)
Up Next: Retrieval Methods for Large Scale Related Video Suggestion (Page 1769)
Identifying Tourists from Public Transport Commuters (Page 1779)
Spatially Embedded Co-Offence Prediction Using Supervised Learning (Page 1789)
'Beating the News' with EMBERS: Forecasting Civil Unrest Using Open Source Indicators (Page 1799)
LASTA: Large Scale Topic Assignment on Multiple Social Networks (Page 1809)
New Algorithms for Parking Demand Management and a City Scale Deployment (Page 1819)
Reducing Gang Violence Through Network Influence Based Targeting of Social Programs (Page 1829)
Modeling Impression Discounting in Large-scale Recommender Systems (Page 1837)
ISIS: A Networked-Epidemiology Based Pervasive Web App for Infectious Disease Pandemic Planning and Response (Page 1847)
Seven Rules of Thumb for Web Site Experimenters (Page 1857)
Log-based Predictive Maintenance (Page 1867)
Automated Hypothesis Generation Based on Mining Scientific Literature (Page 1877)
A System to Grade Computer Programming Skills Using Machine Learning (Page 1887)
An Empirical Study of Reserve Price Optimisation in Real-Time Bidding (Page 1897)
Large-Scale High-Precision Topic Modeling on Twitter (Page 1907)
Early Prediction of Code Blue Using Electronic Medical Records (Page 1917)
Large Scale Visual Recommendations from Street Fashion Images (Page 1925)
We Know What You Want to Buy: A Demographic-based System for Product Recommendation on Microblogs (Page 1935)
Modeling Professional Similarity by Mining Professional Career Trajectories (Page 1945)
Filling Context-Ad Vocabulary Gaps with Click Logs (Page 1955)

Does Social Good Justify Risking Personal Privacy? (Page 1965)

Scaling Up Deep Learning (Page 1966)
Constructing and Mining Web-Scale Knowledge Graphs: KDD 2014 Tutorial (Page 1967)
Bringing Structure to Text: Mining Phrases, Entities, Topics, and Hierarchies (Page 1968)
Computational Epidemiology (Page 1969)
Management and Analytic of Biomedical Big Data with Cloud-Based In-Memory Database and Dynamic Querying: A Hands-on Experience with Real-world Data (Page 1970)
The Recommender Problem Revisited: Morning Tutorial (Page 1971)
Correlation Clustering: from Theory to Practice (Page 1972)
Deep Learning (Page 1973)
Network Mining and Analysis for Social Applications (Page 1974)
Sampling for Big Data: A Tutorial (Page 1975)
Statistically Sound Pattern Discovery (Page 1976)
Recommendation in Social Media: Recent Advances and New Frontiers (Page 1977)