This talk explores scheduling challenges in providing probabilistic Byzantine fault tolerance in a hybrid cloud environment consisting of nodes with varying reliability levels, compute power, and monetary cost. In this context, the probabilistic Byzantine fault tolerance guarantee refers to the confidence level that the result of a given computation is correct despite potential Byzantine failures. We formally define a family of such scheduling problems, distinguished by whether they treat a given latency limit as a constraint and optimize the monetary budget, or vice versa. For the case where the latency bound is a constraint and the budget should be optimized, we present several heuristic protocols and compare them using extensive simulations.
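To give a flavor of the probabilistic guarantee discussed above, the sketch below computes the confidence that a majority vote over replicas returns the correct result, assuming each node fails independently with a known reliability. This is an illustrative model only, not the protocols presented in the talk.

```python
from itertools import combinations

def majority_correct_prob(reliabilities):
    """Probability that a strict majority of replicas returns the correct
    result, given each replica's independent probability of being correct.
    Enumerates all majority subsets (fine for the small replica counts
    typical of Byzantine-tolerant scheduling)."""
    n = len(reliabilities)
    need = n // 2 + 1  # strict majority
    total = 0.0
    for k in range(need, n + 1):
        for good in combinations(range(n), k):
            p = 1.0
            for i in range(n):
                p *= reliabilities[i] if i in good else 1 - reliabilities[i]
            total += p
    return total
```

For example, replicating on three nodes with reliabilities 0.9, 0.9, and 0.7 yields a confidence of 0.936, which a scheduler could compare against a target confidence before committing budget to more replicas.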
Pierre Sens obtained his Ph.D. in Computer Science in 1994, and the “Habilitation à diriger des recherches” in 2000, from Paris 6 University, France. Currently, he is a Full Professor at Sorbonne Université. His research interests include distributed systems and algorithms, large scale data storage, fault tolerance, and cloud computing. Pierre Sens is heading the Delys group, which is a joint research team between LIP6 and Inria Paris. He was a member of the Program Committee of major conferences in the areas of distributed systems and parallelism (DISC, ICDCS, IPDPS, OPODIS, ICPP, Europar,…) and serves as General Chair of SBAC and EDCC. Overall, he has published over 150 papers in international journals and conferences and has advised 24 Ph.D. theses.
A causal broadcast ensures that messages are delivered to all nodes (processes) while preserving the causal relations among messages. We have proposed [ICPP 2018] a causal broadcast protocol for distributed systems whose nodes are logically organized in a virtual hypercube-like topology called VCube. Messages are broadcast by dynamically building spanning trees rooted at the message's source node. By using multiple trees, the contention bottleneck of a single-root spanning tree approach is avoided. Furthermore, different trees can intersect at some nodes. Hence, by taking advantage of both the out-of-order reception of causally related messages at a node and these path intersections, a node can delay forwarding a message to one or more of its children in the tree when it knows that those children cannot yet satisfy some of the message's causal dependencies. Such a delay does not induce any overhead. Experimental evaluation conducted on top of the PeerSim simulator confirms the communication effectiveness of our causal broadcast protocol in terms of latency and message traffic reduction.
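The causal dependencies mentioned above are commonly tracked with vector clocks. The following minimal sketch shows the standard delivery condition a node checks before delivering a broadcast message; the VCube protocol in the talk layers its tree-based forwarding (and the delayed-forwarding optimization) on top of this kind of test.

```python
def can_deliver(msg_vc, sender, local_vc):
    """Standard vector-clock causal delivery test: the message must be
    the next one expected from its sender, and every other dependency
    recorded in the message's clock must already be delivered locally."""
    return (msg_vc[sender] == local_vc[sender] + 1 and
            all(msg_vc[k] <= local_vc[k]
                for k in range(len(local_vc)) if k != sender))
```

A message that fails this test is buffered (or, in the protocol above, its forwarding to a child is delayed by the parent) until the missing causal predecessors arrive.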
Luciana Arantes holds a degree in computer science from Unicamp, a master's degree from the Escola Politécnica of the University of São Paulo, and a Ph.D. from Université Pierre et Marie Curie (UPMC), Paris, France. Since 2011 she has been a professor/researcher at the Faculty of Sciences of Sorbonne Université (formerly UPMC) and a member of the Delys group, a collaboration between LIP6 (Laboratoire d’Informatique de Paris 6) and INRIA. Her research focuses on proposing and adapting distributed algorithms for heterogeneous, dynamic, and failure-prone environments such as mobile networks, Cloud, P2P, and Grids.
The presentation intends to identify the role of optical interconnects in next-generation distributed processing systems and will review the challenges and recent developments in board-level and on-board chip-to-chip interconnection for Data Centre and High-Performance Computing applications.
Dr. George T. Kanellos is currently a Lecturer in the High-Performance Networks Group at the University of Bristol. He received both his BSc in ECE/CS and his Ph.D. from the National Technical University of Athens (NTUA) - Photonics Communications Research Laboratory (PCRL), in 2002 and 2008 respectively. His research areas have spanned from on-chip integrated III-V/Si optical memories and their application in new cache-free computing paradigms, to on-board optical interconnect technology and the design of new processor-to-processor interconnection schemes. Dr. Kanellos's current research exploits recent advances in photonic integration to introduce radically new architectural approaches and concepts across all four hierarchical levels of Data Center and Computing interconnection: on-chip, on-board, board-to-board, and rack-to-rack. Dr. Kanellos has published more than 80 articles in scientific journals and international conferences, including several invited contributions.
Failure detection is a prerequisite to failure mitigation and a key component for building distributed algorithms requiring resilience. This talk introduces the problem of failure detection in asynchronous networks where transmission delays are unknown. We show how distributed failure detector oracles can be used to address fundamental problems such as consensus, k-set agreement, or mutual exclusion. Finally, we focus on how to build scalable failure detectors.
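As a concrete illustration of the oracles discussed above, here is a minimal heartbeat-based failure detector sketch in the style of an eventually perfect detector: a process is suspected when its heartbeat is late, and a false suspicion is corrected by unsuspecting the process and enlarging the timeout. This is a textbook-style simplification, not a specific protocol from the talk.

```python
class HeartbeatDetector:
    """Minimal heartbeat failure detector. Suspects a process whose
    heartbeat has not arrived within the current timeout; a heartbeat
    from a suspected process removes the suspicion and doubles the
    timeout, so false suspicions eventually stop (in a partially
    synchronous system)."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.last_heartbeat = {}   # process id -> time of last heartbeat
        self.suspected = set()

    def heartbeat(self, pid, now):
        self.last_heartbeat[pid] = now
        if pid in self.suspected:  # false suspicion detected: back off
            self.suspected.discard(pid)
            self.timeout *= 2

    def check(self, now):
        for pid, t in self.last_heartbeat.items():
            if now - t > self.timeout:
                self.suspected.add(pid)
        return self.suspected
```

Consensus-style algorithms then query `suspected` instead of relying on bounded message delays, which is exactly the role a failure detector oracle plays.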
Pierre Sens obtained his Ph.D. in Computer Science in 1994, and the “Habilitation à diriger des recherches” in 2000, from Paris 6 University, France. Currently, he is a Full Professor at Sorbonne Université. His research interests include distributed systems and algorithms, large scale data storage, fault tolerance, and cloud computing. Pierre Sens is heading the Delys group, which is a joint research team between LIP6 and Inria Paris. He was a member of the Program Committee of major conferences in the areas of distributed systems and parallelism (DISC, ICDCS, IPDPS, OPODIS, ICPP, Europar,…) and serves as General Chair of SBAC and EDCC. Overall, he has published over 150 papers in international journals and conferences and has advised 24 Ph.D. theses.
In the context of HPC platforms, individual nodes nowadays consist of heterogeneous processing resources such as GPUs and multicores. These resources share communication and storage resources, inducing complex co-scheduling effects and making it hard to predict the exact duration of a task or of a communication. To cope with these issues, dynamic runtime schedulers such as StarPU have been developed. These systems base their decisions at runtime on the state of the platform and possibly on static task priorities computed offline. In this paper, our goal is to quantify performance variability in the context of heterogeneous HPC nodes, focusing on very regular dense linear algebra kernels such as Cholesky and LU factorizations. We therefore first concentrate on evaluating the variability of the individual block kernels. Then, we analyze the impact of this variability at the scale of a full application on a dynamic runtime scheduler such as StarPU, in order to determine whether the strategies designed in the context of MapReduce applications to cope with stragglers could be transferred to HPC systems, or whether the dynamic nature of runtime schedulers is enough to cope with actual performance variations, even in the presence of task dependencies.
A natural trend in the multicore evolution is the increase in the number of integrated cores, while keeping the configuration of a shared main memory. One common packaging is Non-Uniform Memory Access (NUMA), where the overall available memory is made of several blocks that are physically separated but interconnected so as to form a virtually contiguous memory with unified addressing. From the programming point of view, this aspect is completely virtual, and ordinary programmers are often not even aware of this technical reality. The main consequence of not addressing the NUMA configuration is the poor scalability that can be observed with a standard parallelization. This penalty is the conjunction of remote accesses and bus contention, among other effects. In this talk, we will explain the concept, how it is technically handled, what the effects are, and how to deal with this programming concern.
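The remote-access penalty mentioned above can be illustrated with a toy cost model: each thread pinned to a NUMA node pays a lower latency for data resident on its own node than for data on a remote node. The latency figures below are illustrative placeholders, not measurements of any particular machine.

```python
def avg_access_latency(placement, local_ns=80, remote_ns=140):
    """Toy NUMA cost model. `placement` is a list of
    (thread_node, data_node) pairs; an access is local when the two
    nodes match and remote otherwise. Returns the mean access latency
    in nanoseconds (contention effects are deliberately ignored)."""
    costs = [local_ns if t == d else remote_ns for t, d in placement]
    return sum(costs) / len(costs)
```

Even this crude model shows why first-touch allocation and thread pinning matter: a placement where every thread touches remote memory pays the remote latency on every access, before bus contention makes things worse.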
Short bio: Claude Tadonki (M) has been a senior researcher and lecturer at MINES ParisTech - PSL (Paris, France) since 2011. He holds a Ph.D. and an HDR in computer science from the University of Rennes and from Paris-Sud University, respectively. After six years of cutting-edge research in operational research and theoretical computer science at the University of Geneva, he relocated to France to work for EMBL, University of Paris-Sud, LAL-CNRS, and then MINES ParisTech. His main research topics include High Performance Computing, Parallel Computing, Operational Research, Matrix Computation, Combinatorial Algorithms and Complexity, Mathematical Programming, Scientific and Technical Programming, and Automatic Code Transformations. Claude Tadonki has worked at several laboratories and universities, has initiated various scientific projects and national/international collaborations, and has given a significant number of CS courses in different contexts, including industry. He is an active member of well-established scientific societies and a reviewer for high-impact international journals and top-rank conferences. He has published numerous papers in journals and international conferences, is very active in international collaborations, and has co-organized several HPC conferences and forums.
Cloud platforms have emerged as a prominent environment to execute different classes of applications providing on-demand resources as well as scalability. They usually offer several types of Virtual Machines (VMs) which have different guarantees in terms of availability and volatility, provisioning the same resource through multiple pricing models. For instance, in the Amazon EC2 cloud, the user pays per hour for on-demand VMs while spot VMs are unused instances available for a lower price. Despite the financial advantages, a spot VM can be terminated or hibernated by EC2 at any moment.
Using both hibernation-prone spot VMs (for cost reasons) and on-demand VMs, we propose in this paper the Hibernation-Aware Dynamic Scheduler (HADS), which uses those VMs to execute applications composed of independent tasks (bag-of-tasks) with deadline constraints. Besides that, we also define the problem of temporal failures, which occur when a spot VM hibernates and does not resume within a time that guarantees the application's deadline. Our scheduling approach thus aims at minimizing the monetary cost of bag-of-tasks applications in the EC2 cloud, respecting their deadlines even in the presence of hibernation and avoiding temporal failures. Performance results from real executions using Amazon EC2 VMs confirm the effectiveness of our scheduling and its ability to tolerate temporal failures.
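The temporal-failure condition defined above can be sketched as a simple guard: if a hibernated spot VM cannot resume early enough to finish the remaining work before the deadline, the task must be migrated to an on-demand VM. This is an illustrative simplification of the idea, not the HADS algorithm itself, and all parameter names are hypothetical.

```python
def must_migrate(now, deadline, remaining_work, resume_estimate):
    """Temporal-failure guard (illustrative). All times are in the same
    unit. `remaining_work` is the compute time still needed by the task;
    `resume_estimate` is the predicted time at which the hibernated spot
    VM would resume. Migrate to an on-demand VM when even an immediate
    or predicted resume leaves too little time before the deadline."""
    latest_restart = deadline - remaining_work  # last safe restart time
    return resume_estimate > latest_restart or now > latest_restart
```

In a real scheduler this check would run whenever a hibernation notification arrives, trading the extra on-demand cost of migration against the risk of missing the deadline.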
Connectivity is a key issue for smart cities. Providing Internet access to a large number of users and things requires intensive network densification, an operation that increases the number of access points in order to provide a high density of users and things with Internet connections. This presentation describes how to reach this goal using an uberization solution. We compare our solution with 5G and Fog networking solutions. We also describe how the network can be secured with blockchain solutions.
Short bio: Guy Pujolle was a computer science professor at Sorbonne University during 1981-1993 and has been again from 2000 to the present day. He is a member of The Royal Academy of Lund, Sweden. Before that, he was a member of the Institut Universitaire de France from 2009 to 2014. Guy Pujolle has received several prizes for his work and publications, in particular the Grand Prix of the French Academy of Sciences in 2013. Guy Pujolle is a pioneer in high-speed networking, having led the development of the first Gbps network, tested in 1980. He also holds patents on metamorphic networks, green communications, and security in the Internet of Things.