The workshop will be held on April 19, 9:00 am - 3:00 pm EDT (New York/Toronto time). The aim is to hold a live event, complemented by the infrastructure required for online participation.
09:00 - 09:10 EDT
Introduction [Talk]
09:10 - 09:35
Research Paper: George Kousiouris (Harokopio University of Athens) and Dimosthenis Kyriazis (University of Piraeus). [Talk] — [Slide]
Enabling Containerized, Parametric and Distributed Database Deployment and Benchmarking as a Service.
Abstract: Containerized environments introduce a set of performance challenges that require
extensive measurements and benchmarking to identify and model application behavior regarding a
variety of parameters. Databases present extra challenges given their extensive need for
synchronization and orchestration of a benchmark run, especially in microservice-oriented
technologies (such as container platforms) and dynamic business models such as DBaaS. The latter
dictate the need to include co-allocation scenarios in the experimentation. In this work we describe
the adaptation of our open source, baseline load injection as a service tool, Flexibench, in order
to enable the automated, parametric launching and measurement of containerized and distributed
databases as a service. Adaptation and synchronization needs are described for ensuring test
sequence, implementation details of the adapter are presented and applied through a case study on
MySQL. Therefore a performance engineer can directly test selected configuration and performance of
a database in a given target workload with simple REST invocations for experiment setup and
execution. Experimentation starts from adapting the official MySQL docker images as well as OLTP
Bench Client ones and investigates scenarios such as parameter sweep experiments for DB deployment
and co-allocation scenarios where multiple DB instances are sharing physical nodes, as expected in
the DBaaS paradigm.
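For readers unfamiliar with load injection as a service, the sketch below illustrates roughly how such an experiment could be driven over REST; the host, endpoint paths and parameter names are hypothetical illustrations and are not taken from Flexibench's actual API.

    # Illustrative only: the host, endpoints and parameter names below are
    # hypothetical and do not reflect Flexibench's documented API.
    import requests

    BASE = "http://flexibench.example:8080"   # assumed deployment URL

    # Describe a parameter-sweep experiment: containerized MySQL instances
    # driven by an OLTP-style benchmark client, with co-allocation on one node.
    experiment = {
        "image": "mysql:8.0",                # official MySQL Docker image
        "benchmark": "oltpbench",
        "sweep": {"innodb_buffer_pool_size": ["512M", "1G", "2G"]},
        "co_allocation": 2,                  # DB instances sharing a physical node
    }

    # Set up the experiment, trigger its execution, then fetch the results.
    exp = requests.post(f"{BASE}/experiments", json=experiment).json()
    requests.post(f"{BASE}/experiments/{exp['id']}/run")
    print(requests.get(f"{BASE}/experiments/{exp['id']}/results").json())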
09:35 - 10:00
Research Paper: Franz Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, and Lauritz Thamsen (Technische Universität Berlin). [Talk] — [Slide]
PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT.
Abstract: IoT devices have become an integral part of our lives and the industry. Many of
these devices run real-time systems or are used as part of them. As these devices receive network
packets over IP networks, the network interface informs the CPU about their arrival using interrupts
that might preempt critical processes. Therefore, the question arises whether network interrupts
pose a threat to the real-timeness of these devices. However, there are few tools to investigate
this issue. We present a playground which enables researchers to conduct experiments in the context
of network
interrupt simulation. The playground comprises different network interface controller
implementations, load generators and timing utilities. It forms a flexible and easy to use
foundation for future network interrupt research. We conduct two verification experiments and two
real world examples. The latter give insight into the impact of the interrupt handling strategy
parameters and the influence of different load types on the execution time with respect to these
parameters.
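As a rough illustration of the kind of measurement such a playground supports (and not the PIERES tooling itself), the sketch below times a stand-in critical task while a background thread floods a local UDP socket; on the target embedded hardware the load would arrive on a physical network interface and trigger real interrupts.

    # Generic sketch of a network-load interference experiment; not the PIERES
    # playground API. On a desktop OS this mostly shows scheduling contention,
    # whereas on embedded hardware the load would cause real NIC interrupts.
    import socket, threading, time

    # Local UDP sink so the generated load has somewhere to go.
    sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sink.bind(("127.0.0.1", 0))
    target = sink.getsockname()

    def udp_flood(stop, size=1024):
        """Background load generator: sends UDP packets as fast as possible."""
        out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = b"x" * size
        while not stop.is_set():
            out.sendto(payload, target)

    def critical_task(iterations=200_000):
        """Stand-in for a time-critical computation."""
        acc = 0
        for i in range(iterations):
            acc += i * i
        return acc

    stop = threading.Event()
    threading.Thread(target=udp_flood, args=(stop,), daemon=True).start()

    samples = []
    for _ in range(100):
        start = time.perf_counter()
        critical_task()
        samples.append(time.perf_counter() - start)
    stop.set()

    print(f"min={min(samples):.6f}s max={max(samples):.6f}s "
          f"jitter={max(samples) - min(samples):.6f}s")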
10:00 - 11:00
Keynote: Petr Tuma (Charles University). [Talk] — [Slide]
Tracking Performance of the Graal Compiler on Public Benchmarks.
Abstract: For the past three years, we have used several public Java benchmarks (DaCapo,
ScalaBench, Renaissance, SPECjvm2008) to track the performance changes introduced by the daily
development changes of the Graal compiler. The talk will outline how we tackle common measurement
issues such as measurement scheduling and change detection, summarize observed parameters of the
performance changes themselves, and then discuss factors that impact the usefulness of such testing
for the compiler development process, for example (1) what changes are useful to report (or not),
(2) what changes are missed by the public benchmarks, or (3) what aspects of the compiler behavior
make such testing difficult (or easy).
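One common way to detect such changes between daily builds (shown below purely as a generic illustration, not the speaker's actual pipeline) is to bootstrap a confidence interval for the relative difference in mean benchmark time and flag a change when the interval excludes zero.

    # Generic bootstrap-based change detection; not the actual Graal tracking setup.
    import random
    import statistics as st

    def relative_change_ci(old, new, reps=10_000, alpha=0.01):
        """Bootstrap CI for the relative change in mean run time (new vs. old)."""
        diffs = []
        for _ in range(reps):
            o = [random.choice(old) for _ in old]
            n = [random.choice(new) for _ in new]
            diffs.append((st.mean(n) - st.mean(o)) / st.mean(o))
        diffs.sort()
        return diffs[int(reps * alpha / 2)], diffs[int(reps * (1 - alpha / 2))]

    # Hypothetical per-iteration times (seconds) from two daily builds of one benchmark.
    old_build = [1.02, 1.01, 1.03, 1.00, 1.02, 1.01]
    new_build = [1.06, 1.07, 1.05, 1.06, 1.08, 1.06]
    lo, hi = relative_change_ci(old_build, new_build)
    if lo > 0 or hi < 0:
        print(f"performance change detected: {lo:+.1%} .. {hi:+.1%}")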
11:00 - 11:25
Research Paper: Sören Henning and Wilhelm Hasselbring (Kiel University). [Talk] — [Slide]
How to Measure Scalability of Distributed Stream Processing Engines?
Abstract: Scalability is promoted as a key quality feature of modern big data stream
processing engines. However, even though research made huge efforts to provide precise definitions
and corresponding metrics for the term scalability, experimental scalability evaluations or
benchmarks of stream processing engines apply different and inconsistent metrics. With this paper,
we aim to establish general metrics for scalability of stream processing engines. Derived from
common definitions of scalability in cloud computing, we propose two metrics: a load capacity
function and a resource demand function. Both metrics relate provisioned resources and load
intensities, while requiring specific service level objectives to be fulfilled. We show how these
metrics can be employed for scalability benchmarking and discuss their advantages in comparison to
other metrics, used for stream processing engines and other software systems.
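To make the two proposed metrics concrete, the sketch below gives a minimal reading of them, assuming benchmark runs yield a predicate that tells whether a given resource amount meets the service level objectives under a given load intensity; the function and parameter names are our own illustration, not the paper's notation.

    # Minimal sketch of the two metrics, assuming an SLO predicate obtained from
    # benchmark runs; names and the example predicate are illustrative only.
    def load_capacity(resources, loads, slo_fulfilled):
        """Highest tested load intensity the given resources handle within SLOs."""
        ok = [l for l in loads if slo_fulfilled(resources, l)]
        return max(ok) if ok else None

    def resource_demand(load, resource_levels, slo_fulfilled):
        """Smallest tested resource amount that handles the given load within SLOs."""
        ok = [r for r in resource_levels if slo_fulfilled(r, load)]
        return min(ok) if ok else None

    # Synthetic example: SLOs hold whenever resources >= load / 50.
    met = lambda resources, load: resources >= load / 50
    print(load_capacity(3, [50, 100, 150, 200], met))   # -> 150
    print(resource_demand(100, [1, 2, 3, 4], met))      # -> 2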
11:25 - 11:50
Research Paper: Adriano Lange, Marcos Sunyé (UFPR), and Tiago Kepe (IFPR). [Slide]
Performance Interference on Key-Value Stores in Multi-tenant Environments: When Block Size and Write Requests Matter.
Abstract: Key-value stores are currently used by major cloud computing vendors, such as
Google, Facebook, and LinkedIn, to support large-scale applications with concurrent read and write
operations. Based on very simple data access APIs, the key-value stores can deliver outstanding
throughput, which have been hooked up to high-performance solid-state drives (SSDs) to boost this
performance even further. However, measuring performance interference on SSDs while sharing cloud
computing resources is complex and not well covered by current benchmarks and tools. Different
applications can concurrently access these resources until becoming overloaded without notice either
by the benchmark or the cloud application. In this paper, we define a methodology to measure the
problem of performance interference. Depending on the block size and the proportion of concurrent
write operations, we show how a key-value store may quickly degrade throughput until becoming almost
inoperative while sharing persistent storage resources with other tenants.
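As a generic illustration of the factors named in the title (and not the authors' actual harness), the sketch below enumerates tenant workload configurations over block size and write ratio; each configuration would drive one benchmark run against co-located key-value store instances sharing the same SSD.

    # Hypothetical experiment grid; the block sizes, ratios and tenant count are
    # illustrative and not taken from the paper.
    from itertools import product

    block_sizes = [4 * 1024, 64 * 1024, 512 * 1024]   # bytes per request
    write_ratios = [0.1, 0.5, 0.9]                    # fraction of write operations
    tenants = 4                                       # co-located KV-store instances

    for bs, wr in product(block_sizes, write_ratios):
        config = {"tenants": tenants, "block_size": bs, "write_ratio": wr}
        print(config)   # one shared-storage benchmark run per configuration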
11:50 - 12:15
Industrial Talk: Andreas Grabner (Dynatrace). [Talk: Part1 - Part2] — [Slide]
Performance as a Self-Service based on SLIs/SLOs with Keptn.
Abstract: Inspired by how companies like PayPal, Intuit or Dynatrace have been implementing Performance as a Self-Service, we included this use case into Keptn - a CNCF Open Source project. Keptn provides Performance as a Self-Service by automating deployment, testing and evaluation of a new artifact (e.g., a container). Keptn queries custom-defined SLIs (Service Level Indicators) from multiple data sources (testing tools, monitoring tools, …), automatically validates them against SLOs (Service Level Objectives) and provides this feedback through ChatOps, the Keptn API or the Keptn Bridge. Join this session and learn how to set up Keptn, how to define SLIs, SLOs and the tests that should be executed, and how to make it available to anybody in your organization as a self-service option.
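Keptn declares SLOs in its own configuration files; as a tool-agnostic illustration of the evaluation step described above (not Keptn's actual file format or API), the sketch below validates queried SLI values against SLO thresholds and produces a pass/fail verdict.

    # Tool-agnostic sketch of SLI/SLO evaluation; metric names and thresholds are
    # made up for illustration and do not reflect Keptn's SLO file format.
    slis = {                      # values queried from testing/monitoring tools
        "response_time_p95_ms": 412.0,
        "error_rate_pct": 0.4,
        "throughput_rps": 180.0,
    }
    slos = {                      # pass criteria for the new artifact
        "response_time_p95_ms": lambda v: v <= 500.0,
        "error_rate_pct":       lambda v: v <= 1.0,
        "throughput_rps":       lambda v: v >= 150.0,
    }

    results = {name: check(slis[name]) for name, check in slos.items()}
    verdict = "pass" if all(results.values()) else "fail"
    print(results, "->", verdict)   # feedback sent via ChatOps, API or Bridge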
12:15 - 13:10
Panel [Talk]
Performance Testing in DevOps.
Leading experts from academia and industry will discuss approaches to integrating performance testing into DevOps. Panelists are Andreas Grabner (Dynatrace), Alexander Podelko (MongoDB), Weiyi (Ian) Shang (Concordia University), and Petr Tuma (Charles University). The panel is moderated by Tse-Hsun (Peter) Chen (Concordia University).
13:10 - 13:35
Research Paper: Wajdi Halabi, Daniel Smith, Linh Ngo, Amy Apon (Clemson University), John Hill (Georgia Institute of Technology), Jason Anderson, and Brandon Posey (BMW IT Research Center). [Talk] — [Slide]
Viability of Azure IoT Hub for Processing High Velocity Large Scale IoT Data.
Abstract: We utilize the Clemson supercomputer to generate a massive workload for testing the performance of Microsoft Azure IoT Hub. The workload emulates sensor data from a large manufacturing facility. We study the effects of message frequency, distribution, and size on round-trip latency for different IoT Hub configurations. Significant variation in latency occurs when the system exceeds IoT Hub specifications. The results are predictable and well-behaved for a well-engineered system and can meet soft real-time deadlines.
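A minimal sketch of the round-trip measurement idea, using a local UDP echo as a stand-in transport rather than the Azure IoT SDK: a timestamp is embedded in each emulated sensor message and compared when the message comes back, while message frequency and size can be varied.

    # Generic round-trip latency harness over a local UDP echo; a real run would
    # replace this stand-in transport with the cloud messaging client.
    import json, socket, statistics, threading, time

    def echo_server(sock):
        while True:
            data, addr = sock.recvfrom(65536)
            sock.sendto(data, addr)

    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(target=echo_server, args=(server,), daemon=True).start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)

    def measure(frequency_hz=100, size_bytes=256, count=200):
        """Send emulated sensor messages and report mean/max round-trip latency."""
        padding = "x" * size_bytes
        latencies = []
        for i in range(count):
            msg = json.dumps({"seq": i, "sent": time.perf_counter(), "pad": padding})
            client.sendto(msg.encode(), server.getsockname())
            echoed = json.loads(client.recvfrom(65536)[0])
            latencies.append(time.perf_counter() - echoed["sent"])
            time.sleep(1.0 / frequency_hz)
        return statistics.mean(latencies), max(latencies)

    print(measure())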
13:35 - 14:00
Industrial Talk: Xiaosong Lou (Blackline). [Talk] — [Slide]
Concurrent User Modeling - An Alternative Approach to Classic Queuing Theory.
Abstract: Concurrent User is a random variable and one of the key performance metrics in a system. Many performance issues that can only be exposed under load are related to the increased number of concurrent users. Without properly simulating the distribution of concurrent users, a load test will not expose the system to realistic production stress levels. Traditional analysis on system concurrency is based on the state probabilities of the corresponding queuing models. Unfortunately, we have not seen this approach as a common practice, partly due to its constraints and limitations. We propose an analytical alternative to the classic queuing theory for modeling Concurrent User. This model helps us determine whether the simulated workload is a proper representation of the expected production scenario. Unlike the queuing model that can make predictions, our proposal proved to be more useful and convenient in verifying the results of load tests.
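The talk proposes its own analytical model; purely as background (and not the speaker's method), the sketch below shows the classic baseline: by Little's law the mean number of concurrent users equals the session arrival rate times the mean session duration, and with Poisson arrivals the concurrency observed at a random instant is Poisson distributed around that mean.

    # Classic baseline for concurrent-user counts, shown only as context; it is
    # not the alternative model proposed in the talk. Rates are assumed values.
    import math

    arrival_rate = 0.5        # user sessions started per second (assumed)
    mean_session_s = 60.0     # average session duration in seconds (assumed)

    mean_concurrent = arrival_rate * mean_session_s   # Little's law: L = lambda * W

    def poisson_pmf(k, mean):
        return math.exp(-mean) * mean ** k / math.factorial(k)

    # Probability of seeing more than a given concurrency level at a random instant.
    threshold = 45
    p_exceed = 1.0 - sum(poisson_pmf(k, mean_concurrent) for k in range(threshold + 1))
    print(f"mean concurrent users = {mean_concurrent:.0f}, "
          f"P(concurrency > {threshold}) = {p_exceed:.4f}")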
14:00 - 14:55
Keynote: Kishor Trivedi (Duke University). [Talk] — [Slide]
Accelerated Life-testing Applied to Software Systems.
Abstract: An important metric of software reliability is the mean-time-to-failure (MTTF) of the software system. To estimate this metric, a straightforward method is first collecting a sufficient number of samples of software inter-failure times and then using this sequence of inter-failure times to statistically infer the estimate of its mean and a confidence interval. However, this process is hindered by the fact that the samples of software inter-failures are time-consuming to collect, especially for highly reliable software systems. Furthermore, large and complex software systems are known to contain a significant number of elusive bugs known as Mandelbugs. One sub-type of Mandelbugs is known as aging-related bugs. Another sub-type is known as concurrency bugs. Mandelbugs are triggered not just by the inputs or the workload presented to the software but also by the execution environment of the software, such as the operating system and other concurrently running software. Many different factors in the execution environment affect this type of failure occurrence. Accelerated-life testing (ALT) is a known systematic method that has been extensively applied in speeding up the experimental estimation of MTTFs of high-reliability hardware systems. The application of ALT in the context of software systems is the subject of this talk.
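Purely as background for the estimation step mentioned in the abstract (and not the ALT method itself), the sketch below computes a point estimate and a confidence interval for MTTF under the common assumption of exponentially distributed inter-failure times, using SciPy's chi-square quantile function.

    # Background sketch: MTTF point estimate and confidence interval assuming
    # exponentially distributed inter-failure times; not the ALT method itself.
    from scipy.stats import chi2

    def mttf_with_ci(inter_failure_times, confidence=0.95):
        n = len(inter_failure_times)
        total = sum(inter_failure_times)
        mttf = total / n                   # sample mean as the point estimate
        alpha = 1.0 - confidence
        # 2 * total / MTTF follows a chi-square distribution with 2n degrees of freedom.
        lower = 2.0 * total / chi2.ppf(1.0 - alpha / 2.0, 2 * n)
        upper = 2.0 * total / chi2.ppf(alpha / 2.0, 2 * n)
        return mttf, (lower, upper)

    # Hypothetical inter-failure times in hours, e.g. from an accelerated test run.
    samples = [310.0, 95.0, 540.0, 210.0, 150.0, 420.0, 60.0, 380.0]
    print(mttf_with_ci(samples))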
14:55 - 15:00
Conclusion.
Software systems (e.g., smartphone apps, desktop applications, telecommunication infrastructures, cloud and enterprise systems, etc.) have strict requirements on software performance. Failure to meet these requirements may cause business losses, customer defection, brand damage and other serious consequences. In addition to conventional functional testing, the performance of these systems must be verified through load testing or benchmarking to ensure quality service.

Load testing and benchmarking software systems are difficult tasks, which require a great understanding of the system under test and customer behavior. Practitioners face many challenges such as tooling (choosing and implementing the testing tools), environments (software and hardware setup) and time (limited time to design, test, and analyze). This one-day workshop brings together software testing researchers, practitioners and tool developers to discuss the challenges and opportunities of conducting research on load testing and benchmarking software systems.

Accepted papers will be published in the Proceedings. Submissions can be research papers, position papers, case studies or experience reports addressing issues including but not limited to the following: