Information and Network Technologies

Distributed, Parallel and Collaborative Systems

Thesis proposals

Researchers

Research group

Parallel and distributed scientific applications: performance and efficiency

Parallel and distributed programming paradigms and environments currently face several bottlenecks that limit our ability to build efficient applications for concurrent computation.

We need to know the platforms, their performance, and the underlying hardware and networking technologies, and we must be able to produce optimized software that can statically or dynamically take advantage of the available computational resources.

In this line of research we study different approaches to producing better scientific applications and to building tools (based on automatic performance analysis) that can understand the application model and the underlying programming paradigm. We try to tune their performance to a dynamically changing computational environment, in which the resources (and their characteristics) can be homogeneous or heterogeneous depending on the hardware platform. In particular, we focus our research on shared-memory and message-passing paradigms, and on many-core/multi-core environments, including multi-core CPUs, GPUs (computing on graphics cards) and cluster/grid/cloud/supercomputing platforms.
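As a toy illustration of the kind of metrics such performance-analysis tools reason about, the Python sketch below derives speedup and parallel efficiency from a set of measured runtimes. The timings and core counts are hypothetical, not measurements of any real application.

```python
# Toy illustration (hypothetical numbers): derive speedup and parallel
# efficiency from measured runtimes, the kind of metrics an automatic
# performance analysis tool inspects when tuning resource usage.

# Hypothetical wall-clock times (seconds) measured at several core counts;
# a real tool would collect these from execution traces.
measured = {1: 400.0, 2: 210.0, 4: 115.0, 8: 70.0, 16: 55.0}

t_serial = measured[1]
for cores, t in sorted(measured.items()):
    speedup = t_serial / t
    efficiency = speedup / cores
    print(f"{cores:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

Efficiency typically degrades as cores are added; deciding where that trade-off stops paying off, automatically and per platform, is part of what this research line studies.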

Dr Josep Jorba

Mail: jjorbae@uoc.edu

WINE

Community-owned systems at the edge
 
Edge computing is a form of cloud computing in which part of the computation (data and/or services) is hosted on resources spread across the Internet (“at the edges”). By community-owned systems at the edge we refer to systems that host their data and services on personal computers (mostly desktop computers or single-board computers such as the Raspberry Pi) voluntarily contributed by participants in the system. Community-owned systems at the edge are self-owned (community members own the computers where data and services are hosted); self-managed (with a decentralized and loosely coupled structure); and self-growing. They also share the following characteristics:
 
(a) No central authority is responsible for providing the required computational resources.
 
(b) Computing resources are heterogeneous (in software and hardware) and of low capacity, spread across the Internet, in contrast with the high-capacity clusters of traditional clouds.
 
(c) The computational resources belong to individual users and are shared to build the collective computational infrastructure.
 
Regarding reliability and QoS, these community-owned systems at the edge have to guarantee the following to the user:
 
* Availability: the user can access data anytime from anywhere;
 
* Freshness: the user gets up-to-date data; and
 
* Immediacy: the user obtains the data within a time that feels immediate.
 
Therefore, this kind of system has to (a) make clever and efficient use of the (likely scarce) contributed resources (storage, bandwidth, and CPU) to avoid wasting them; and (b) provide privacy and security guarantees.
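As a minimal sketch of the availability dimension, the snippet below estimates how many replicas of a data object are needed on voluntarily contributed nodes so that at least one replica is online with a target probability. It assumes independent node failures, and the node availabilities are hypothetical figures for desktop machines.

```python
# Minimal sketch, assuming independent node failures and hypothetical
# per-node availabilities: how many replicas are needed so that at
# least one is online with a target probability.

def replicas_needed(node_availabilities, target):
    """Greedily add replicas (best nodes first) until the probability
    that at least one replica is online reaches `target`."""
    p_all_offline = 1.0
    for k, a in enumerate(sorted(node_availabilities, reverse=True), start=1):
        p_all_offline *= (1.0 - a)          # all chosen replicas offline
        if 1.0 - p_all_offline >= target:
            return k
    return None  # target unreachable with these nodes

# Desktop nodes online roughly 40-70% of the time (hypothetical figures):
nodes = [0.7, 0.6, 0.5, 0.5, 0.4, 0.4]
print(replicas_needed(nodes, target=0.95))  # -> 4
```

Even this toy model makes the tension visible: meeting a high availability target on low-availability nodes multiplies the storage and bandwidth consumed, which is exactly why resource usage must be optimized.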
 
We are looking for PhD candidates interested in large-scale distributed systems applied to community-owned systems at the edge, in fields such as (a) creating social networks hosted on these kinds of systems, (b) optimal allocation of data and services to resources, (c) availability prediction, (d) efficient usage of resources, or (e) privacy and security.
 

Dr Joan Manuel Marquès

Mail: jmarquesp@uoc.edu

WINE

Migration of Parallel Applications to Cloud Computing architectures

Scientific parallel applications usually require a lot of computing resources to solve complex problems. Traditionally, these applications have been executed in cluster or supercomputing environments.

With the advent of cloud computing, an interesting new platform has emerged for executing scientific parallel applications that require High Performance Computing (HPC): it provides a scalable, elastic, practical, and low-cost environment that can satisfy the computational and storage demands of many scientific parallel applications.

The migration of HPC parallel applications to cloud environments comes with several advantages, but due to the complex interaction between parallel applications and the virtualization layer, many applications may suffer performance inefficiencies when they scale. This problem is particularly serious when an application is executed many times over a long period of time.

To use these virtual systems with a large number of cores efficiently, it is important to know the application's performance behavior on the system before executing it. The ideal number of processes and resources required to run the application may vary from one system to another, due to differences in hardware virtualization architectures. Moreover, it is well known that using more resources does not always yield higher performance. Without this information, cores may be used inefficiently, causing problems such as failing to achieve the expected speedup and increasing energy and economic costs.
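As a back-of-the-envelope illustration of why more cores is not always better, the sketch below combines Amdahl's law with a flat per-core-hour price: runtime gains flatten out while cost keeps growing. The serial fraction, single-core runtime, and price are hypothetical, not taken from any measured application or cloud provider.

```python
# Hedged sketch: Amdahl's law plus a flat per-core-hour price shows
# diminishing runtime returns and rising cost as cores are added.
# All three constants below are assumptions for illustration only.

SERIAL_FRACTION = 0.05      # assumed non-parallelizable share of the work
PRICE_PER_CORE_HOUR = 0.04  # assumed cloud price (USD)
T1_HOURS = 10.0             # assumed single-core runtime

def amdahl_speedup(cores, f=SERIAL_FRACTION):
    # Amdahl's law: speedup = 1 / (f + (1 - f) / p)
    return 1.0 / (f + (1.0 - f) / cores)

for cores in (1, 2, 4, 8, 16, 32, 64, 128):
    t = T1_HOURS / amdahl_speedup(cores)
    cost = t * cores * PRICE_PER_CORE_HOUR
    print(f"{cores:>4} cores: {t:6.2f} h, {cost:6.2f} USD")
```

With a 5% serial fraction, going from 64 to 128 cores barely reduces runtime but roughly doubles the bill; a prediction tool aims to find this knee of the curve before the user pays for it.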

In this research line we study different approaches for building novel methodologies and automatic performance analysis tools to analyze and predict application behaviour on a specific cloud platform. In this way, users have valuable information to execute a parallel application efficiently on a target virtual cloud architecture (e.g., selecting the right number of cloud resources, tuning virtualization parameters, etc.). Moreover, the developed tools provide information to detect possible inefficiencies that become potential bottlenecks on the specific system.

 

Dr Josep Jorba

Mail: jjorbae@uoc.edu

WINE

Resource-Allocation Mechanisms for Voluntary Large Scale Distributed Systems

Volunteer Computing is a type of large-scale distributed system formed by aggregating computers donated by volunteers. These computers are usually off-the-shelf, heterogeneous resources belonging to different administrative authorities (users) and exhibiting uncertain behavior regarding connectivity and failures. Thus, the resource-allocation methods in such systems depend heavily on the availability of resources. On the one hand, resources tend to be scarce; on the other hand, computers exhibiting low-availability patterns – which are the most frequent type – are discarded, or used at a high cost only when highly available nodes are saturated.

 
Our research group has developed different allocation mechanisms to create large-scale distributed systems based on computational resources voluntarily contributed by users, such as:
 
* Sergio Gonzalo, Joan Manuel Marquès, Alberto García-Villoria, Javier Panadero, Laura Calvet (2022). CLARA: A Novel Clustering-Based Resource-Allocation Mechanism for Exploiting Low-Availability Complementarities of Voluntarily Contributed Nodes. Future Generation Computer Systems. Volume 128, pages 248–264.
 
CLARA is a clustering-based resource-allocation mechanism that takes advantage of complementarities between nodes with low-availability patterns. Combining such nodes into groups of complementary nodes offers an availability level equivalent to that of a single highly available node. These groups of complementary nodes are maintained using a lazy reassignment algorithm. Consequently, a significant number of nodes with low-availability patterns can be considered by the resource-allocation mechanism for service placement. The mechanism maximizes the use of poor-quality computational resources to satisfy user quality requirements while minimizing the number of reassignments between nodes. The capacity of the system for providing user services is greatly increased, while the load on highly available nodes is markedly reduced.
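The snippet below is a toy illustration of the complementarity idea only, not the CLARA algorithm itself: nodes are modeled as 24-hour on/off vectors, and two low-availability nodes whose online hours complement each other behave, together, like one highly available node. The hourly traces are invented.

```python
# Toy illustration of availability complementarity (not CLARA itself):
# nodes as hypothetical 24-hour on/off vectors; grouping complementary
# nodes raises the fraction of hours in which the group is reachable.

def availability(*nodes):
    """Fraction of hours in which at least one of the nodes is online."""
    return sum(any(h) for h in zip(*nodes)) / len(nodes[0])

# Hypothetical traces: a PC online 9h-17h vs. one online 17h-01h.
office_pc = [1 if 9 <= h < 17 else 0 for h in range(24)]
home_pc   = [1 if h >= 17 or h < 1 else 0 for h in range(24)]

print(availability(office_pc))           # ~0.33 alone
print(availability(home_pc))             # ~0.33 alone
print(availability(office_pc, home_pc))  # ~0.67 combined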
 
* Javier Panadero, Jésica de Armas, Xavier Serra, Joan Manuel Marquès. (2018). Multi criteria biased randomized method for resource allocation in distributed systems: Application in a volunteer computing system. Future Generation Computer Systems. Volume 82, pages 29–40.
 
This paper proposes a heuristic method based on a weighting system to determine resource quality. A biased-randomized procedure then selects resources accordingly in an extremely fast way.
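The sketch below illustrates the general biased-randomization idea: candidates are sorted by a quality weight and then selected with a geometrically decreasing probability, so good resources are strongly preferred but not always picked. The weights and the beta parameter are illustrative and do not reproduce the paper's exact weighting scheme.

```python
# Minimal sketch of biased randomization over a quality-sorted list;
# node names, weights, and beta are hypothetical, not from the paper.
import math
import random

def biased_pick(candidates, weights, beta=0.3):
    """Return one candidate; better-weighted ones are chosen more often.
    beta close to 1 -> almost greedy; beta close to 0 -> almost uniform."""
    ranked = sorted(candidates, key=lambda c: weights[c], reverse=True)
    # Sample a rank from a geometric distribution, capped to list length.
    k = int(math.log(1.0 - random.random()) / math.log(1.0 - beta))
    return ranked[min(k, len(ranked) - 1)]

weights = {"n1": 0.9, "n2": 0.7, "n3": 0.4, "n4": 0.2}
print([biased_pick(list(weights), weights) for _ in range(10)])
```

The appeal of this scheme is that each selection costs essentially one random draw over a pre-sorted list, which is what makes the procedure extremely fast at scale.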
 
This PhD research line aims to continue these works. A promising focus is to explore how network information (bandwidth, latency, …) can be used to minimize latency between replicas of a service, or how it can reduce the carbon footprint, energy consumption, … necessary to maintain these replicas.
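As a hypothetical illustration of how latency information could drive placement, the snippet below brute-forces the set of k replica nodes that minimizes the worst pairwise round-trip time. Node names and latencies are invented, and exhaustive search is only viable at toy scale; designing heuristics that do this well at community scale is part of the open problem.

```python
# Hypothetical sketch: choose k replica nodes minimizing the worst
# pairwise RTT. Latency values (ms) below are invented for illustration.
from itertools import combinations

latency = {
    ("a", "b"): 12, ("a", "c"): 80, ("a", "d"): 95,
    ("b", "c"): 70, ("b", "d"): 90, ("c", "d"): 15,
}

def rtt(x, y):
    # Latencies are symmetric; look up the pair in either order.
    return latency.get((x, y), latency.get((y, x)))

def best_placement(nodes, k):
    return min(combinations(sorted(nodes), k),
               key=lambda g: max(rtt(x, y) for x, y in combinations(g, 2)))

print(best_placement({"a", "b", "c", "d"}, 2))  # -> ('a', 'b')
```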

Dr Joan Manuel Marquès

Mail: jmarquesp@uoc.edu

WINE