Abstract: Tracking and collecting fast-evolving online discussions provides vast data for studying social media usage and its role in people's public lives. However, collecting social media data using a static set of keywords fails to satisfy the growing need to monitor dynamic conversations and to study fast-changing topics. We propose a dynamic keyword search method to maximize the coverage of relevant information in fast-evolving online discussions. The method uses word embedding models to represent the semantic relations between keywords and predictive models to forecast the future trajectory of keywords. We also implement a visual user interface to aid in the decision making process in each round of keyword updates. This allows for both human-assisted tracking and fully-automated data collection. In simulations using historical #MeToo data in 2017, our human-assisted tracking method outperforms the traditional static baseline method significantly, achieving 37.1% improvement in F-1 score in the task of tracking the top trending keywords. We conduct a contemporary case study to cover dynamic conversations about the recent Presidential Inauguration and to test the dynamic data collection system. Our case studies reflect the effectiveness of our process and also points to the potential challenges in future deployment.

Publication: arXiv
ID: CaltechAUTHORS:20210510-121148191

]]>

Abstract: We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity. In this work, we propose deep Bayesian quadrature policy gradient (DBQPG), a computationally efficient high-dimensional generalization of Bayesian quadrature, for policy gradient estimation. We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a consistent improvement in the sample complexity and average return for several deep policy gradient algorithms, and, (iii) the uncertainty in gradient estimation that can be incorporated to further improve the performance.

Publication: arXiv Vol.: 35
ID: CaltechAUTHORS:20201106-120212166

]]>

Abstract: We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LQGOPT, a novel adaptive control algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound construction. LQGOPT efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LQGOPT, and prove the first Õ(√T) regret upper bound for adaptive control of linear quadratic Gaussian (LQG) systems with convex cost, where T is the time horizon of the problem.

Publication: arXiv
ID: CaltechAUTHORS:20200403-141835981

]]>

Abstract: Humans have an inherent ability to learn novel concepts from only a few samples and generalize these concepts to different situations. Even though today's machine learning models excel with a plethora of training data on standard recognition tasks, a considerable gap exists between machine-level pattern recognition and human-level concept learning. To narrow this gap, the Bongard Problems (BPs) were introduced as an inspirational challenge for visual cognition in intelligent systems. Albeit new advances in representation learning and learning to learn, BPs remain a daunting challenge for modern AI. Inspired by the original one hundred BPs, we propose a new benchmark Bongard-LOGO for human-level concept learning and reasoning. We develop a program-guided generation technique to produce a large set of human-interpretable visual cognition problems in action-oriented LOGO language. Our benchmark captures three core properties of human cognition: 1) context-dependent perception, in which the same object may have disparate interpretations given different contexts; 2) analogy-making perception, in which some meaningful concepts are traded off for other meaningful concepts; and 3) perception with a few samples but infinite vocabulary. In experiments, we show that the state-of-the-art deep learning methods perform substantially worse than human subjects, implying that they fail to capture core human cognition properties. Finally, we discuss research directions towards a general architecture for visual reasoning to tackle this benchmark.

Publication: arXiv
ID: CaltechAUTHORS:20201109-074710530

]]>

Abstract: Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios that can vastly differ from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and scenarios involving fabrics of different shapes like shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes.

Publication: arXiv
ID: CaltechAUTHORS:20201109-123525639

]]>

Abstract: Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting. This is because these kinds of challenging tasks require learning long-term spatio-temporal correlations in the video sequence. In this paper, we propose a higher-order convolutional LSTM model that can efficiently learn these correlations, along with a succinct representations of the history. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. To make this feasible in terms of computation and memory requirements, we propose a novel convolutional tensor-train decomposition of the higher-order model. This decomposition reduces the model complexity by jointly approximating a sequence of convolutional kernels as a low-rank tensor-train factorization. As a result, our model outperforms existing approaches, but uses only a fraction of parameters, including the baseline models. Our results achieve state-of-the-art performance in a wide range of applications and datasets, including the multi-steps video prediction on the Moving-MNIST-2 and KTH action datasets as well as early activity recognition on the Something-Something V2 dataset.

Publication: arXiv
ID: CaltechAUTHORS:20200402-134911700

]]>

Abstract: Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam—a multiplicative version of the Adam optimiser—and show that it can train state of the art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.

Publication: arXiv
ID: CaltechAUTHORS:20201106-120208748

]]>

Abstract: One of the main challenges in using deep learning-based methods for simulating physical systems and solving partial differential equations (PDEs) is formulating physics-based data in the desired structure for neural networks. Graph neural networks (GNNs) have gained popularity in this area since graphs offer a natural way of modeling particle interactions and provide a clear way of discretizing the continuum models. However, the graphs constructed for approximating such tasks usually ignore long-range interactions due to unfavorable scaling of the computational complexity with respect to the number of nodes. The errors due to these approximations scale with the discretization of the system, thereby not allowing for generalization under mesh-refinement. Inspired by the classical multipole methods, we purpose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Our multi-level formulation is equivalent to recursively adding inducing points to the kernel matrix, unifying GNNs with multi-resolution matrix factorization of the kernel. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.

Publication: arXiv
ID: CaltechAUTHORS:20201106-120222366

]]>

Abstract: Neural networks are vulnerable to input perturbations such as additive noise and adversarial attacks. In contrast, human perception is much more robust to such perturbations. The Bayesian brain hypothesis states that human brains use an internal generative model to update the posterior beliefs of the sensory input. This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment. Inspired by such hypothesis, we enforce self-consistency in neural networks by incorporating generative recurrent feedback. We instantiate this design on convolutional neural networks (CNNs). The proposed framework, termed Convolutional Neural Networks with Feedback (CNN-F), introduces a generative feedback with latent variables to existing CNN architectures, where consistent predictions are made through alternating MAP inference under a Bayesian framework. In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.

Publication: arXiv
ID: CaltechAUTHORS:20201106-120201944

]]>

Abstract: Scale has been central to the success of deep learning with the availability of large-scale data and compute infrastructure. However, for further progress, scale has to be coupled with novel algorithms. Next-generation AI will be unsupervised, robust and adaptive. It will incorporate more structure and domain knowledge. Examples include tensors, graphs, physical laws, and simulations. I will describe efficient frameworks that enable developers to easily prototype such models, e.g., Tensorly to incorporate tensorized architectures, NVIDIA Isaac to incorporate physically valid simulations and NVIDIA RAPIDS for end-to-end data analytics. I will then lay out some outstanding problems in this area.

ID: CaltechAUTHORS:20210507-122910987

]]>

Abstract: We propose MESHFREEFLOWNET, a novel deep learning-based super-resolution framework to generate continuous (grid-free) spatio-temporal solutions from the lowresolution inputs. While being computationally efficient, MESHFREEFLOWNET accurately recovers the fine-scale quantities of interest. MESHFREEFLOWNET allows for: (i) the output to be sampled at all spatio-temporal resolutions, (ii) a set of Partial Differential Equation (PDE) constraints to be imposed, and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal domains owing to its fully convolutional encoder. We empirically study the performance of MESHFREEFLOWNET on the task of super-resolution of turbulent flows in the Rayleigh-Bénard convection problem. Across a diverse set of evaluation metrics, we show that MESHFREEFLOWNET significantly outperforms existing baselines. Furthermore, we provide a large scale implementation of MESHFREEFLOWNET and show that it efficiently scales across large clusters, achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of less than 4 minutes. We provide an opensource implementation of our method that supports arbitrary combinations of PDE constraints.

Publication: arXiv
ID: CaltechAUTHORS:20200526-153937049

]]>

Abstract: In this work, we propose a robust network-in-the-loop control system for autonomous navigation and landing of an Unmanned-Aerial-Vehicle (UAV). To estimate the UAV’s absolute pose, we develop a deep neural network (DNN) architecture for visual-inertial odometry, which provides a robust alternative to traditional methods. We first evaluate the accuracy of the estimation by comparing the prediction of our model to traditional visual-inertial approaches on the publicly available EuRoC MAV dataset. The results indicate a clear improvement in the accuracy of the pose estimation up to 25% over the baseline. Finally, we integrate the data-driven estimator in the closed-loop flight control system of Airsim, a simulator available as a plugin for Unreal Engine, and we provide simulation results for autonomous navigation and landing.

Publication: arXiv
ID: CaltechAUTHORS:20200108-154918519

]]>

Abstract: Sketching is a randomized dimensionalityreduction method that aims to preserve relevant information in large-scale datasets. In this paper, we propose a novel extension known as Higher-order Count Sketch (HCS). We derive efficient (approximate) computation of various tensor operations such as tensor products and tensor contractions directly on the sketched data. HCS is the first sketch to fully exploit the multi-dimensional nature of higher-order tensors.

Publication: arXiv
ID: CaltechAUTHORS:20190327-085821224

]]>

Abstract: We introduce a new algorithm for the numerical computation of Nash equilibria of competitive two-player games. Our method is a natural generalization of gradient descent to the two-player setting where the update is given by the Nash equilibrium of a regularized bilinear local approximation of the underlying game. It avoids oscillatory and divergent behaviors seen in alternating gradient descent. Using numerical experiments and rigorous analysis, we provide a detailed comparison to methods based on optimism and consensus and show that our method avoids making any unnecessary changes to the gradient dynamics while achieving exponential (local) convergence for (locally) convex-concave zero sum games. Convergence and stability properties of our method are robust to strong interactions between the players, without adapting the stepsize, which is not the case with previous methods. In our numerical experiments on non-convex-concave problems, existing methods are prone to divergence and instability due to their sensitivity to interactions among the players, whereas we never observe divergence of our algorithm. The ability to choose larger stepsizes furthermore allows our algorithm to achieve faster convergence, as measured by the number of model evaluations.

Publication: 33rd Conference on Neural Information Processing Systems
ID: CaltechAUTHORS:20190905-154241002

]]>

Abstract: Precise near-ground trajectory control is difficult for multi-rotor drones, due to the complex aerodynamic effects caused by interactions between multi-rotor airflow and the environment. Conventional control methods often fail to properly account for these complex effects and fall short in accomplishing smooth landing. In this paper, we present a novel deep-learning-based robust nonlinear controller (Neural-Lander) that improves control performance of a quadrotor during landing. Our approach combines a nominal dynamics model with a Deep Neural Network (DNN) that learns high-order interactions. We apply spectral normalization (SN) to constrain the Lipschitz constant of the DNN. Leveraging this Lipschitz property, we design a nonlinear feedback linearization controller using the learned model and prove system stability with disturbance rejection. To the best of our knowledge, this is the first DNN-based nonlinear feedback controller with stability guarantees that can utilize arbitrarily large neural nets. Experimental results demonstrate that the proposed controller significantly outperforms a Baseline Nonlinear Tracking Controller in both landing and cross-table trajectory tracking cases. We also empirically show that the DNN generalizes well to unseen data outside the training domain.

Publication: arXiv
ID: CaltechAUTHORS:20190205-100744248

]]>

Abstract: We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-grams. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component of the mixture can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no probabilistic model, and dictionary-level probabilistic embeddings, which do not incorporate subword structures, on several word-similarity benchmarks, including English RareWord and foreign language datasets. We also achieve state-of-art performance on benchmarks that measure ability to discern different meanings. Thus, the proposed model is the first to achieve multi-sense representations while having enriched semantics on rare words.

Publication: arXiv
ID: CaltechAUTHORS:20190327-085800530

]]>

Abstract: Visual Question Answering (VQA) requires integration of feature maps with drastically different structures. Image descriptors have structures at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types. A lot of previous works use complex models to extract feature representations but neglect to use high-level information summary such as question types in learning. In this work, we propose Question Type-guided Attention (QTA). It utilizes the information of question type to dynamically balance between bottom-up and top-down visual features, respectively extracted from ResNet and Faster R-CNN networks. We experiment with multiple VQA architectures with extensive input ablation studies over the TDIUC dataset and show that QTA systematically improves the performance by more than 5% across multiple question type categories such as “Activity Recognition”, “Utility” and “Counting” on TDIUC dataset compared to the state-of-art. By adding QTA on the state-of-art model MCB, we achieve 3% improvement in overall accuracy. Finally, we propose a multi-task extension to predict question types which generalizes QTA to applications that lack question type, with a minimal performance loss.

Publication: arXiv Vol.: IV No.: 11208
ID: CaltechAUTHORS:20190327-085753056

]]>

Abstract: We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) model, which can be trained with fast closed-form updates and its samples can be drawn efficiently through the Gaussian distribution. We apply our method to a wide range of Atari games in Arcade Learning Environments. Since BDQN carries out more efficient exploration, it is able to reach higher rewards substantially faster than a key baseline, double deep Q network DDQN.

ID: CaltechAUTHORS:20181101-121222226

]]>

Abstract: Tensors offer a natural representation for many kinds of data frequently encountered in machine learning. Images, for example, are naturally represented as third order tensors, where the modes correspond to height, width, and channels. In particular, tensor decompositions are noted for their ability to discover multi-dimensional dependencies and produce compact low-rank approximations of data. In this paper, we explore the use of tensor contractions as neural network layers and investigate several ways to apply them to activation tensors. Specifically, we propose the Tensor Contraction Layer (TCL), the first attempt to incorporate tensor contractions as end-to-end trainable neural network layers. Applied to existing networks, TCLs reduce the dimensionality of the activation tensors and thus the number of model parameters. We evaluate the TCL on the task of image recognition, augmenting popular networks (AlexNet, VGG). The resulting models are trainable end-to-end. We evaluate TCL's performance on the task of image recognition, using the CIFAR100 and ImageNet datasets, studying the effect of parameter reduction via tensor contraction on performance. We demonstrate significant model compression without significant impact on the accuracy and, in some cases, improved performance.

ID: CaltechAUTHORS:20180321-103123441

]]>

Abstract: Tensor decomposition is positioned to be a pervasive tool in the era of big data. In this paper, we resolve many of the key algorithmic questions regarding robustness, memory efficiency, and differential privacy of tensor decomposition. We propose simple variants of the tensor power method which enjoy these strong properties. We propose the first streaming method with a linear memory requirement. Moreover, we present a noise calibrated tensor power method with efficient privacy guarantees. At the heart of all these guarantees lies a careful perturbation analysis derived in this paper which improves up on the existing results significantly.

Publication: arXiv
ID: CaltechAUTHORS:20190401-123322786

]]>

Abstract: Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. In this paper, we propose and evaluate new BLAS-like primitives that are capable of performing a wide range of tensor contractions on CPU and GPU efficiently. We begin by focusing on single-index contractions involving all the possible configurations of second-order and third-order tensors. Then, we discuss extensions to more general cases. Existing approaches for tensor contractions spend large amounts of time restructuring the data which typically involves explicit copy and transpose operations. In this work, we summarize existing approaches and present library-based approaches that avoid memory movement. Through systematic benchmarking, we demonstrate that our approach can achieve 10x speedup on a K40c GPU and 2x speedup on dual-socket Haswell-EP CPUs, using MKL and CUBLAS respectively, for small and moderate tensor sizes. This is relevant in many machine learning applications such as deep learning, where tensor sizes tend to be small, but require numerous tensor contraction operations to be performed successively. Concretely, we implement a Tucker decomposition and show that using our kernels yields atleast an order of magnitude speedup as compared to state-of-the-art libraries.

ID: CaltechAUTHORS:20170920-112945692

]]>

Abstract: Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in statistical learning of latent variable models and in data mining. In this paper, we propose fast and randomized tensor CP decomposition algorithms based on sketching. We build on the idea of count sketches, but introduce many novel ideas which are unique to tensors. We develop novel methods for randomized computation of tensor contractions via FFTs, without explicitly forming the tensors. Such tensor contractions are encountered in decomposition methods such as tensor power iterations and alternating least squares. We also design novel colliding hashes for symmetric tensors to further save time in computing the sketches. We then combine these sketching ideas with existing whitening and tensor power iterative techniques to obtain the fastest algorithm on both sparse and dense tensors. The quality of approximation under our method does not depend on properties such as sparsity, uniformity of elements, etc. We apply the method for topic modeling and obtain competitive results.

Publication: arXiv Vol.: 1
ID: CaltechAUTHORS:20190401-123256793

]]>

Abstract: Scalable probabilistic modeling and prediction in high dimensional multivariate time-series, such as dynamic social networks with co-evolving nodes and edges, is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Here, we address this problem through the discovery of hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTM), in which tree-structured latent variables incorporate the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance on two real world datasets, one from the students' attempts at answering questions in a psychology MOOC and the other from Twitter users participating in an emergency management discussion and interacting with one another. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series.

ID: CaltechAUTHORS:20170927-083449769

]]>

Abstract: This note is a short version of that in [1]. It is intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.

No.: 9355 ISSN: 0302-9743

ID: CaltechAUTHORS:20170920-143816175

]]>

Abstract: We propose a robust formulation for the noncooperative rate-maximization game in MIMO Gaussian interference channels under bounded channel uncertainty. The proposed robust game needs little additional computation and requires no additional information exchange among users when compared to the nominal game and thus maintains the low-complexity and distributed nature of the MIMO waterfilling algorithm. The robust rate-maximization game is shown to be equivalent to the nominal game with modified direct-channel matrices. The equilibrium solution of the robust rate-maximization game and the required iterative algorithm to obtain the solution are presented. Sufficient conditions for the uniqueness of the equilibrium and the convergence of the algorithm are also presented. Simulation results indicate that the robust solution in the presence of channel uncertainty performs better than the nominal solution with zero uncertainty, due to the users being more conservative in their power allocation when there is channel uncertainty.

ID: CaltechAUTHORS:20170925-100945197

]]>

Abstract: The problem of distributed load balancing among m agents operating in an n-server slotted system is considered. A randomized local search mechanism, FCD (fast, concurrent and distributed) algorithm, is implemented concurrently by each agent associated with a user. It involves switching to a different server with a certain exploration probability and then backtracking with a probability proportional to the ratio of the measured loads in the two servers (in consecutive time slots). The exploration and backtracking operations are executed concurrently by users in local alternating time slots. To ensure that users do not switch to other servers asymptotically, each user chooses the exploration probability to be decaying polynomially with time for decaying rate β ∈ [0.5, 1]. The backtracking decision is then based on an estimate of the server load which is computed based on local information. Thus, FCD algorithm does not require synchronization or coordination with other users. The main contribution of this work, besides the FCD algorithm, is the analysis of the convergence time for the system to be approximately balanced, i.e. to reach an c-Nash equilibrium. We show that the system reaches an c-Nash equilibrium in expected time O (max {n log n/ϵ + n^(1/β), (n^3/m^3 log n^2/ϵ)1/β}) when m > n^2. This implies that the convergence rate is robust with large scale system(large user population), and is not affected by imperfect measurements of the server load. We also extend our analysis to open systems where users arrive and depart from a system with an initial load of m users. We allow for general time-dependent arrival processes (including heavy-tailed processes) and consider a uniform and a load-oblivious routing of the arrivals to the servers. A wide class of departure processes including load-dependent departures from the servers is also allowed. Our analysis demonstrates that it is possible to design fast, concurrent and distributed load balancing mechanisms in large multi-agent systems via randomized local search.

ID: CaltechAUTHORS:20170925-100358224

]]>

Abstract: We consider the problem of inferring the topology of an M-by-N network by sending probes between M sources and N receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., 1-by-N's or 2-by-2's) and then merge these components to identify the M-by-N topology. In this paper, we focus on the second part. In particular, we assume that a 1by-N topology is given and that all 2-by-2 components can be queried and learned using end-to-end probes. The problem is which 2-by-2's to query and how to merge them with the 1-byN, so as to exactly identify the 2-by-N topology, and optimize a number of performance metrics including measurement traffic, time complexity, and memory usage. We provide a lower bound, ⌈N/2⌉, on the number of 2-by-2's required by any active learning algorithm and we also propose a greedy algorithm that is nearoptimal and efficient in practice. It follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding 2-by-2, and merges it with the given 1-by-N. The algorithm requires exactly N - 1 steps, which is much less than all (N:2) possible 2-by-2's, and it correctly identifies the 2-by-N topology.

ID: CaltechAUTHORS:20170925-095252031

]]>

Abstract: We introduce a new class of measurement matrices for compressed sensing, using low order summaries over binary sequences of a given length. We prove recovery guarantees for three reconstruction algorithms using the proposed measurements, including ℓ_1 minimization and two combinatorial methods. In particular, one of the algorithms recovers k-sparse vectors of length N in sublinear time poly(k log N), and requires at most O(k log N log log N) measurements. The empirical oversampling constant of the algorithm is significantly better than existing sublinear recovery algorithms such as Chaining Pursuit and Sudocodes. In particular, for 10^3 ≤ N ≤ 10^(12) and k = 100, the oversampling factor is between 5 to 25. We provide preliminary insight into how the proposed constructions, and the fast recovery scheme can be used in a number of practical applications such as market basket analysis, and real time compressed sensing implementation.

ID: CaltechAUTHORS:20120406-072754339

]]>

Abstract: The problem of designing policies for in-network function computation with minimum energy consumption subject to a latency constraint is considered. The scaling behavior of the energy consumption under the latency constraint is analyzed for random networks, where the nodes are uniformly placed in growing regions and the number of nodes goes to infinity. The special case of sum function computation and its delivery to a designated root node is considered first. A policy which achieves order-optimal average energy consumption in random networks subject to the given latency constraint is proposed. The scaling behavior of the optimal energy consumption depends on the path-loss exponent of wireless transmissions and the dimension of the Euclidean region where the nodes are placed. The policy is then extended to computation of a general class of functions which decompose according to maximal cliques of a proximity graph such as the k-nearest neighbor graph or the geometric random graph. The modified policy achieves order-optimal energy consumption albeit for a limited range of latency constraints.

ID: CaltechAUTHORS:20170925-092119382

]]>

Abstract: We consider the problem of tracking the topology of a large-scale dynamic network with limited monitoring resources. By modeling the dynamics of links as independent ON-OFF Markov chains, we formulate the problem as that of maximizing the overall accuracy of tracking link states when only a limited number of network elements can be monitored at each time step. We consider two forms of sampling policies: link sampling, where we directly observe the selected links, and node sampling, where we observe states of the links adjacent to the selected nodes. We reduce the link sampling problem to a Restless Multi-armed Bandit (RMB) and prove its indexability under certain conditions. By applying the Whittle's index policy, we develop an efficient link sampling policy with methods to compute the Whittle's index explicitly. Under node sampling, we use a linear programming (LP) formulation to derive an extended policy that can be reduced to determining the nodes with maximum coverage of the Whittle's indices. We also derive performance upper bounds in both sampling scenarios. Simulations show the efficacy of the proposed policies. Compared with the myopic policy, our solution achieves significantly better tracking performance for heterogeneous links.

ID: CaltechAUTHORS:20170925-091212425

]]>

Abstract: The problem of competitive rate-maximization is an important signal-processing problem for power-constrained multi-user systems. It involves solving the power control problem for mutually interfering users operating across multiple frequencies. We introduced robust rate-maximization game for systems with bounded channel uncertainty. In this paper, we analyse the effect of uncertainty on the global efficiency of the robust rate-maximization game. For a two-user scenario with large number of frequencies, we show that the robust-optimization equilibrium tends to move towards FDMA solution as the uncertainty bound increases and thus increases the sum-rate for interference-constrained systems where FDMA is Pareto-optimal. These results are verified through simulations.

ID: CaltechAUTHORS:20170922-134024046

]]>

Abstract: The problem of composite binary hypothesis testing of Markov forest (or tree) distributions is considered. The worst-case type-II error exponent is derived under the Neyman-Pearson formulation. Under simple null hypothesis, the error exponent is derived in closed-form and is characterized in terms of the so-called bottleneck edge of the forest distribution. The least favorable distribution for detection is shown to be Markov on the second-best max-weight spanning tree with mutual information edge weights. A necessary and sufficient condition to have positive error exponent is derived.

ID: CaltechAUTHORS:20170922-085233225

]]>

Abstract: For Gaussian graphical models with cycles, loopy belief propagation often performs reasonably well, but its convergence is not guaranteed and the computation of variances is generally incorrect. In this paper, we identify a set of special vertices called a feedback vertex set whose removal results in a cycle-free graph. We propose a feedback message passing algorithm in which non-feedback nodes send out one set of messages while the feedback nodes use a different message update scheme. Exact inference results can be obtained in O(k^2n), where k is the number of feedback nodes and n is the total number of nodes. For graphs with large feedback vertex sets, we describe a tractable approximate feedback message passing algorithm. Experimental results show that this procedure converges more often, faster, and provides better results than loopy belief propagation.

ID: CaltechAUTHORS:20121004-153356182

]]>

Abstract: We consider spatial graphical models on random Euclidean points, applicable for data in sensor and social networks. We establish limit laws for general functions of the graphical model such as the mean value, the entropy rate etc. as the number of nodes goes to infinity under certain conditions. These conditions require the corresponding Gibbs measure to be spatially mixing and for the random graph of the model to satisfy a certain localization property known as stabilization. Graphs such the k nearest neighbor graph and the geometric disc graph belong to the class of stabilizing graphs. Intuitively, these conditions require the data at each node not to have strong dependence on data and positions of nodes far away. Finally, it is shown that spatial mixing of the Gibbs measure on a random graph holds when a suitably defined degree-dependent (but otherwise independent) node percolation does not have a giant component.

ID: CaltechAUTHORS:20170922-084526901

]]>

Abstract: The problem of cooperative allocation among multiple secondary users to maximize cognitive system throughput is considered. The channel availability statistics are initially unknown to the secondary users and are learnt via sensing samples. Two distributed learning and allocation schemes which maximize the cognitive system throughput or equivalently minimize the total regret in distributed learning and allocation are proposed. The first scheme assumes minimal prior information in terms of pre-allocated ranks for secondary users while the second scheme is fully distributed and assumes no such prior information. The two schemes have sum regret which is provably logarithmic in the number of sensing time slots. A lower bound is derived for any learning scheme which is asymptotically logarithmic in the number of slots. Hence, our schemes achieve asymptotic order optimality in terms of regret in distributed learning and allocation.

ID: CaltechAUTHORS:20170922-083321456

]]>

Abstract: The problem of decentralized power allocation for competitive rate maximization in a frequency-selective Gaussian interference channel is considered. In the absence of perfect knowledge of channel state information (CSI), a distribution-free robust game is formulated. A robust-optimization equilibrium (RE) is proposed where each player formulates a best response to the worst-case interference. The conditions for existence, uniqueness and convergence of the RE are derived. It is shown that the convergence reduces as the uncertainty increases. Simulations show an interesting phenomenon where the proposed RE moves closer to a Pareto-optimal solution as the CSI uncertainty bound increases, when compared to the classical Nash equilibrium under perfect CSI. Thus, the robust-optimization equilibrium successfully counters bounded channel uncertainty and increases system sum-rate due to users being more conservative about causing interference to other users.

ID: CaltechAUTHORS:20170922-084018628

]]>

Abstract: The problem of learning tree-structured Gaussian graphical models from i.i.d. samples is considered. The influence of the tree structure and the parameters of the Gaussian distribution on the learning rate as the number of samples increases is discussed. Specifically, the error exponent corresponding to the event that the estimated tree structure differs from the actual unknown tree structure of the distribution is analyzed. Finding the error exponent reduces to a least-squares problem in the very noisy learning regime. In this regime, it is shown that universally, the extremal tree structures which maximize and minimize the error exponent are the star and the Markov chain for any fixed set of correlation coefficients on the edges of the tree. In other words, the star and the chain graphs represent the hardest and the easiest structures to learn in the class of tree-structured Gaussian graphical models. This result can also be intuitively explained by correlation decay: pairs of nodes which are far apart, in terms of graph distance, are unlikely to be mistaken as edges by the maximum-likelihood estimator in the asymptotic regime.

ID: CaltechAUTHORS:20170921-160724769

]]>

Abstract: The problem of maximum-likelihood learning of the structure of an unknown discrete distribution from samples is considered when the distribution is Markov on a tree. Large-deviation analysis of the error in estimation of the set of edges of the tree is performed. Necessary and sufficient conditions are provided to ensure that this error probability decays exponentially. These conditions are based on the mutual information between each pair of variables being distinct from that of other pairs. The rate of error decay, or error exponent, is derived using the large-deviation principle. The error exponent is approximated using Euclidean information theory and is given by a ratio, to be interpreted as the signal-to-noise ratio (SNR) for learning. Numerical experiments show the SNR approximation is accurate.

ID: CaltechAUTHORS:20170921-154915776

]]>

Abstract: The problem of binary hypothesis testing is considered when the measurements are drawn from a Markov random field (MRF) under each hypothesis. Spatial dependence of the measurements is incorporated by explicitly modeling the influence of sensor node locations on the clique potential functions of each MRF hypothesis. The nodes are placed i.i.d. in expanding areas with increasing sample size. Asymptotic performance of hypothesis testing is analyzed through the Neyman-Pearson type-II error exponent. The error exponent is expressed as the limit of a functional over dependency edges of the MRF hypotheses for acyclic graphs. Using the law of large numbers for graph functionals, the error exponent is derived.

ID: CaltechAUTHORS:20170921-154140237

]]>

Abstract: A novel formulation for optimal sensor selection and in-network fusion for distributed inference known as the prize- collecting data fusion (PCDF) is proposed in terms of optimal tradeoff between the costs of aggregating the selected set of sensor measurements and the resulting inference performance at the fusion center. For i.i.d. measurements, PCDF reduces to the prize-collecting Steiner tree (PCST) with the single-letter Kullback-Leibler divergence as the penalty at each node, as the number of nodes goes to infinity. PCDF is then analyzed under a correlation model specified by a Markov random field (MRF) with a given dependency graph. For a special class of dependency graphs, a constrained version of the PCDF reduces to the PCST on an augmented graph. In this case, an approximation algorithm is given with the approximation ratio depending only on the number of profitable cliques in the dependency graph. Based on these results, two heuristics are proposed for node selection under general correlation structure, and their performance is studied via simulations.

ID: CaltechAUTHORS:20170921-153522501

]]>

Abstract: Peer-to-peer (P2P) file distribution is a scalable way to disseminate content to a wide audience. This paper presents an algorithm by which download times are sequentially minimized; that is, the first peer's download time is minimized, and subsequent peers' times are minimized conditional on their predecessors' times being minimized. This objective gives robustness to the file distribution in the case that the network may be partitioned. It is also an important step towards the natural objective of minimizing the average download time, which is made challenging by the combinatorial structure of the problem. This optimality result not only provides fundamental insight to scheduling in such P2P systems, but also can serve as a benchmark to evaluate practical algorithms and illustrate the scalability of P2P networks.

ID: CaltechAUTHORS:20170810-112107197

]]>

Abstract: The problem of tracking end-to-end service-level transactions in the absence of instrumentation support is considered. The transaction instances progress through a state-transition model and generate time-stamped footprints on entering each state in the model. The goal is to track individual transactions using these footprints even when the footprints may not contain any tokens uniquely identifying the transaction instances that generated them. Assuming a semi-Markov process model for state transitions, the transaction instances are tracked probabilistically by matching them to the available footprints according to the maximum likelihood (ML) criterion. Under the ML-rule, for a two-state system, it is shown that the probability that all the instances are matched correctly is minimized when the transition times are i.i.d. exponentially distributed. When the transition times are i.i.d. distributed, the ML-rule reduces to a minimum weight bipartite matching and reduces further to a first-in first-out match for a special class of distributions. For a multi-state model with an acyclic state transition digraph, a constructive proof shows that the ML-rule reduces to splicing the results of independent matching of many bipartite systems.

ID: CaltechAUTHORS:20170927-141103230

]]>

Abstract: The problem of minimum cost in-network fusion of measurements, collected from distributed sensors via multihop routing is considered. A designated fusion center performs an optimal statistical-inference test on the correlated measurements, drawn from a Markov random field. Conditioned on the delivery of a sufficient statistic for inference to the fusion center, the structure of optimal routing and fusion is shown to be a Steiner tree on a transformed graph. This Steiner-tree reduction preserves the approximation ratio, which implies that any Sterner- tree approximation can be employed for minimum cost fusion with the same approximation ratio. The proposed fusion scheme involves routing packets of two types viz., raw measurements sent for local processing, and aggregates obtained on combining these processed values. The performance of heuristics for minimum cost fusion are evaluated through theory and simulations, showing a significant saving in routing costs, when compared to routing all the raw measurements to the fusion center.

ID: CaltechAUTHORS:20170920-154142664

]]>

Abstract: We consider the problem of online monitoring of transaction instances in enterprise environments, based on footprints left by the instances in system logs and using a state-based reference model of the transaction. Unlike existing approaches, we do not rely on any platform-specific knowledge, neither do we assume footprints to carry correlating identifiers, as injected through instrumentation. We outline a solution for tracking transaction instances at individual and aggregate levels, present preliminary results on theoretical analysis of monitoring precision and conclude with directions of ongoing and future research.

ID: CaltechAUTHORS:20170920-161153053

]]>

Abstract: The problem of hypothesis testing against independence for a Gauss-Markov random field (GMRF) with nearest-neighbor dependency graph is analyzed. The sensors measuring samples from the signal field are placed IID according to the uniform distribution. The asymptotic performance of Neyman-Pearson detection is characterized through the large-deviation theory. An expression for the error exponent is derived using a special law of large numbers for graph functionals. The exponent is analyzed for different values of the variance ratio and correlation. It is found that a more correlated GMRF has a higher exponent (improved detection performance) at low values of the variance ratio, whereas the opposite is true at high values of the ratio.

ID: CaltechAUTHORS:20170920-152542556

]]>

Abstract: The problem of routing of sensor observations for optimal detection of a Markov random field (MRF) at a designated fusion center is analyzed. Assuming that the correlation structure of the MRF is defined by the nearest-neighbor dependency graph, routing schemes which minimize the total energy consumption are analyzed. It is shown that the optimal routing scheme involves data fusion at intermediate nodes and requires transmissions of two types viz., the raw sensor data and the aggregates of log-likelihood ratio (LLR). The raw data is transmitted among the neighbors in the dependency graph and local contributions to the LLR are computed. These local contributions are then aggregated and delivered to the fusion center. A 2-approximation routing algorithm (DFMRF) is proposed and it has a transmission multidigraph consisting of the dependency graph and the directed minimum spanning tree, with the directions toward the fusion center.

ID: CaltechAUTHORS:20170920-153229570

]]>

Abstract: We consider the problem of distributed detection over a multi-access channel. Assuming a random number of sensors transmitting their observations using type based multiple access, we derive the detection performance using large deviations principle as the mean number of sensors goes to infinity. We characterize the performance in terms of error exponents. We provide comparison with the case when the number of sensors is deterministic. We generalize this scheme to multiple collections, propose a minimum sum-rate detector and characterize its error exponents.

Vol.: 4
ID: CaltechAUTHORS:20170920-144901114

]]>

Abstract: The problem of distributed detection and estimation in a sensor network over a multiaccess fading channel is considered. A communication scheme known as the type-based random access (TBRA) is employed and its performance is characterized with respect to the mean transmission rate and the channel coherence index. For extreme values of channel coherence index i.e., 0 and ∞, we give an optimal TBRA scheme which is essentially a sensor activation strategy that achieves the optimal allocation of transmission energy to spatial and temporal domains. For channels with zero coherence index, it is shown that there exists a finite optimal mean transmission rate maximizing performance. This optimal rate can be calculated numerically or estimated using the Gaussian approximation. On the other hand, for channels with infinite coherence index (i.e., no fading) the optimal strategy is to allocate all the energy to the spatial domain. Numerical examples and simulations confirm our theory.

ID: CaltechAUTHORS:20170920-145744615

]]>