The workshop will be held on April 19, 9:00 am - 3:00 pm EDT (New York/Toronto time). The aim is to hold a live event, complemented by the infrastructure required for online participation.
09:00 - 09:10 EDT
Introduction [Talk]
09:10 - 09:35
Research Paper: George Kousiouris (Harokopio University of Athens) and Dimosthenis Kyriazis (University of Piraeus). [Talk] — [Slide]
Enabling Containerized, Parametric and Distributed Database Deployment and Benchmarking as a Service.
Abstract: Containerized environments introduce a set of performance challenges that require
extensive measurements and benchmarking to identify and model application behavior regarding a
variety of parameters. Databases present extra challenges given their extensive need for
synchronization and orchestration of a benchmark run, especially in microservice-oriented
technologies (such as container platforms) and dynamic business models such as DBaaS. The latter
dictate the need to include co-allocation scenarios in the experimentation. In this work we describe
the adaptation of our open source, baseline load injection as a service tool, Flexibench, in order
to enable the automated, parametric launching and measurement of containerized and distributed
databases as a service. Adaptation and synchronization needs are described for ensuring test
sequence, implementation details of the adapter are presented and applied through a case study on
MySQL. Therefore a performance engineer can directly test selected configuration and performance of
a database in a given target workload with simple REST invocations for experiment setup and
execution. Experimentation starts from adapting the official MySQL docker images as well as OLTP
Bench Client ones and investigates scenarios such as parameter sweep experiments for DB deployment
and co-allocation scenarios where multiple DB instances are sharing physical nodes, as expected in
the DBaaS paradigm.
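For readers unfamiliar with load injection as a service, the sketch below illustrates roughly how such an experiment could be driven over REST; the host, endpoint paths and parameter names are hypothetical illustrations and are not taken from Flexibench's actual API.

    # Illustrative only: the host, endpoints and parameter names below are
    # hypothetical and do not reflect Flexibench's documented API.
    import requests

    BASE = "http://flexibench.example:8080"   # assumed deployment URL

    # Describe a parameter-sweep experiment: containerized MySQL instances
    # driven by an OLTP-style benchmark client, with co-allocation on one node.
    experiment = {
        "image": "mysql:8.0",                # official MySQL Docker image
        "benchmark": "oltpbench",
        "sweep": {"innodb_buffer_pool_size": ["512M", "1G", "2G"]},
        "co_allocation": 2,                  # DB instances sharing a physical node
    }

    # Set up the experiment, trigger its execution, then fetch the results.
    exp = requests.post(f"{BASE}/experiments", json=experiment).json()
    requests.post(f"{BASE}/experiments/{exp['id']}/run")
    print(requests.get(f"{BASE}/experiments/{exp['id']}/results").json())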
09:35 - 10:00
Research Paper: Franz Bender, Jan Jonas Brune, Nick Lauritz Keutel, Ilja Behnke, and Lauritz Thamsen (Technische Universität Berlin). [Talk] — [Slide]
PIERES: A Playground for Network Interrupt Experiments on Real-Time Embedded Systems in the IoT.
Abstract: IoT devices have become an integral part of our lives and the industry. Many of
these devices run real-time systems or are used as part of them. As these devices receive network
packets over IP networks, the network interface informs the CPU about their arrival using interrupts
that might preempt critical processes. Therefore, the question arises whether network interrupts
pose a threat to the real-timeness of these devices. However, there are few tools to investigate
this issue. We present a playground which enables researchers to conduct experiments in the context
of network
interrupt simulation. The playground comprises different network interface controller
implementations, load generators and timing utilities. It forms a flexible and easy to use
foundation for future network interrupt research. We conduct two verification experiments and two
real world examples. The latter give insight into the impact of the interrupt handling strategy
parameters and the influence of different load types on the execution time with respect to these
parameters.
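As a rough illustration of the kind of measurement such a playground supports (and not the PIERES tooling itself), the sketch below times a stand-in critical task while a background thread floods a local UDP socket; on the target embedded hardware the load would arrive on a physical network interface and trigger real interrupts.

    # Generic sketch of a network-load interference experiment; not the PIERES
    # playground API. On a desktop OS this mostly shows scheduling contention,
    # whereas on embedded hardware the load would cause real NIC interrupts.
    import socket, threading, time

    # Local UDP sink so the generated load has somewhere to go.
    sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sink.bind(("127.0.0.1", 0))
    target = sink.getsockname()

    def udp_flood(stop, size=1024):
        """Background load generator: sends UDP packets as fast as possible."""
        out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        payload = b"x" * size
        while not stop.is_set():
            out.sendto(payload, target)

    def critical_task(iterations=200_000):
        """Stand-in for a time-critical computation."""
        acc = 0
        for i in range(iterations):
            acc += i * i
        return acc

    stop = threading.Event()
    threading.Thread(target=udp_flood, args=(stop,), daemon=True).start()

    samples = []
    for _ in range(100):
        start = time.perf_counter()
        critical_task()
        samples.append(time.perf_counter() - start)
    stop.set()

    print(f"min={min(samples):.6f}s max={max(samples):.6f}s "
          f"jitter={max(samples) - min(samples):.6f}s")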
10:00 - 11:00
Keynote: Petr Tuma (Charles University). [Talk] — [Slide]
Tracking Performance of the Graal Compiler on Public Benchmarks.
Abstract: For the past three years, we have used several public Java benchmarks (DaCapo,
ScalaBench, Renaissance, SPECjvm2008) to track the performance changes introduced by the daily
development changes of the Graal compiler. The talk will outline how we tackle common measurement
issues such as measurement scheduling and change detection, summarize observed parameters of the
performance changes themselves, and then discuss factors that impact the usefulness of such testing
for the compiler development process, for example (1) what changes are useful to report (or not),
(2) what changes are missed by the public benchmarks, or (3) what aspects of the compiler behavior
make such testing difficult (or easy).
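One common way to detect such changes between daily builds (shown below purely as a generic illustration, not the speaker's actual pipeline) is to bootstrap a confidence interval for the relative difference in mean benchmark time and flag a change when the interval excludes zero.

    # Generic bootstrap-based change detection; not the actual Graal tracking setup.
    import random
    import statistics as st

    def relative_change_ci(old, new, reps=10_000, alpha=0.01):
        """Bootstrap CI for the relative change in mean run time (new vs. old)."""
        diffs = []
        for _ in range(reps):
            o = [random.choice(old) for _ in old]
            n = [random.choice(new) for _ in new]
            diffs.append((st.mean(n) - st.mean(o)) / st.mean(o))
        diffs.sort()
        return diffs[int(reps * alpha / 2)], diffs[int(reps * (1 - alpha / 2))]

    # Hypothetical per-iteration times (seconds) from two daily builds of one benchmark.
    old_build = [1.02, 1.01, 1.03, 1.00, 1.02, 1.01]
    new_build = [1.06, 1.07, 1.05, 1.06, 1.08, 1.06]
    lo, hi = relative_change_ci(old_build, new_build)
    if lo > 0 or hi < 0:
        print(f"performance change detected: {lo:+.1%} .. {hi:+.1%}")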
11:00 - 11:25
Research Paper: Sören Henning and Wilhelm Hasselbring (Kiel University). [Talk] — [Slide]
How to Measure Scalability of Distributed Stream Processing Engines?
Abstract: Scalability is promoted as a key quality feature of modern big data stream
processing engines. However, even though research made huge efforts to provide precise definitions
and corresponding metrics for the term scalability, experimental scalability evaluations or
benchmarks of stream processing engines apply different and inconsistent metrics. With this paper,
we aim to establish general metrics for scalability of stream processing engines. Derived from
common definitions of scalability in cloud computing, we propose two metrics: a load capacity
function and a resource demand function. Both metrics relate provisioned resources and load
intensities, while requiring specific service level objectives to be fulfilled. We show how these
metrics can be employed for scalability benchmarking and discuss their advantages in comparison to
other metrics, used for stream processing engines and other software systems.
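To make the two proposed metrics concrete, the sketch below gives a minimal reading of them, assuming benchmark runs yield a predicate that tells whether a given resource amount meets the service level objectives under a given load intensity; the function and parameter names are our own illustration, not the paper's notation.

    # Minimal sketch of the two metrics, assuming an SLO predicate obtained from
    # benchmark runs; names and the example predicate are illustrative only.
    def load_capacity(resources, loads, slo_fulfilled):
        """Highest tested load intensity the given resources handle within SLOs."""
        ok = [l for l in loads if slo_fulfilled(resources, l)]
        return max(ok) if ok else None

    def resource_demand(load, resource_levels, slo_fulfilled):
        """Smallest tested resource amount that handles the given load within SLOs."""
        ok = [r for r in resource_levels if slo_fulfilled(r, load)]
        return min(ok) if ok else None

    # Synthetic example: SLOs hold whenever resources >= load / 50.
    met = lambda resources, load: resources >= load / 50
    print(load_capacity(3, [50, 100, 150, 200], met))   # -> 150
    print(resource_demand(100, [1, 2, 3, 4], met))      # -> 2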
11:25 - 11:50
Research Paper: Adriano Lange, Marcos Sunyé (UFPR), and Tiago Kepe (IFPR). [Slide]
Performance Interference on Key-Value Stores in Multi-tenant Environments: When Block Size and Write Requests Matter.
Abstract: Key-value stores are currently used by major cloud computing vendors, such as
Google, Facebook, and LinkedIn, to support large-scale applications with concurrent read and write
operations. Based on very simple data access APIs, the key-value stores can deliver outstanding
throughput, which have been hooked up to high-performance solid-state drives (SSDs) to boost this
performance even further. However, measuring performance interference on SSDs while sharing cloud
computing resources is complex and not well covered by current benchmarks and tools. Different
applications can concurrently access these resources until becoming overloaded without notice either
by the benchmark or the cloud application. In this paper, we define a methodology to measure the
problem of performance interference. Depending on the block size and the proportion of concurrent
write operations, we show how a key-value store may quickly degrade throughput until becoming almost
inoperative while sharing persistent storage resources with other tenants.
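As a generic illustration of the factors named in the title (and not the authors' actual harness), the sketch below enumerates tenant workload configurations over block size and write ratio; each configuration would drive one benchmark run against co-located key-value store instances sharing the same SSD.

    # Hypothetical experiment grid; the block sizes, ratios and tenant count are
    # illustrative and not taken from the paper.
    from itertools import product

    block_sizes = [4 * 1024, 64 * 1024, 512 * 1024]   # bytes per request
    write_ratios = [0.1, 0.5, 0.9]                    # fraction of write operations
    tenants = 4                                       # co-located KV-store instances

    for bs, wr in product(block_sizes, write_ratios):
        config = {"tenants": tenants, "block_size": bs, "write_ratio": wr}
        print(config)   # one shared-storage benchmark run per configuration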
11:50 - 12:15
Industrial Talk: Andreas Grabner (Dynatrace). [Talk: Part1 - Part2] — [Slide]
Performance as a Self-Service based on SLIs/SLOs with Keptn.
Abstract: Inspired by how companies like PayPal, Intuit or Dynatrace have been implementing Performance as a Self-Service, we included this use case into Keptn - a CNCF Open Source project. Keptn provides Performance as a Self-Service by automating deployment, testing and evaluation of a new artifact (e.g., a container). Keptn queries custom-defined SLIs (Service Level Indicators) from multiple data sources (testing tools, monitoring tools, …), automatically validates them against SLOs (Service Level Objectives) and provides this feedback through ChatOps, the Keptn API or the Keptn Bridge. Join this session and learn how to set up Keptn, how to define SLIs, SLOs and the tests that should be executed, and how to make it available to anybody in your organization as a self-service option.
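Keptn declares SLOs in its own configuration files; as a tool-agnostic illustration of the evaluation step described above (not Keptn's actual file format or API), the sketch below validates queried SLI values against SLO thresholds and produces a pass/fail verdict.

    # Tool-agnostic sketch of SLI/SLO evaluation; metric names and thresholds are
    # made up for illustration and do not reflect Keptn's SLO file format.
    slis = {                      # values queried from testing/monitoring tools
        "response_time_p95_ms": 412.0,
        "error_rate_pct": 0.4,
        "throughput_rps": 180.0,
    }
    slos = {                      # pass criteria for the new artifact
        "response_time_p95_ms": lambda v: v <= 500.0,
        "error_rate_pct":       lambda v: v <= 1.0,
        "throughput_rps":       lambda v: v >= 150.0,
    }

    results = {name: check(slis[name]) for name, check in slos.items()}
    verdict = "pass" if all(results.values()) else "fail"
    print(results, "->", verdict)   # feedback sent via ChatOps, API or Bridge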
12:15 - 13:10
Panel [Talk]
Performance Testing in DevOps.
Leading experts from academia and industry will discuss approaches to integrating performance testing into DevOps. Panelists are Andreas Grabner (Dynatrace), Alexander Podelko (MongoDB), Weiyi (Ian) Shang (Concordia University), and Petr Tuma (Charles University). The panel is moderated by Tse-Hsun (Peter) Chen (Concordia University).
13:10 - 13:35
Research Paper: Wajdi Halabi, Daniel Smith, Linh Ngo, Amy Apon (Clemson University), John Hill (Georgia Institute of Technology), Jason Anderson, and Brandon Posey (BMW IT Research Center). [Talk] — [Slide]
Viability of Azure IoT Hub for Processing High Velocity Large Scale IoT Data.
Abstract: We utilize the Clemson supercomputer to generate a massive workload for testing the performance of Microsoft Azure IoT Hub. The workload emulates sensor data from a large manufacturing facility. We study the effects of message frequency, distribution, and size on round-trip latency for different IoT Hub configurations. Significant variation in latency occurs when the system exceeds IoT Hub specifications. The results are predictable and well-behaved for a well-engineered system and can meet soft real-time deadlines.
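A minimal sketch of the round-trip measurement idea, using a local UDP echo as a stand-in transport rather than the Azure IoT SDK: a timestamp is embedded in each emulated sensor message and compared when the message comes back, while message frequency and size can be varied.

    # Generic round-trip latency harness over a local UDP echo; a real run would
    # replace this stand-in transport with the cloud messaging client.
    import json, socket, statistics, threading, time

    def echo_server(sock):
        while True:
            data, addr = sock.recvfrom(65536)
            sock.sendto(data, addr)

    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    threading.Thread(target=echo_server, args=(server,), daemon=True).start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)

    def measure(frequency_hz=100, size_bytes=256, count=200):
        """Send emulated sensor messages and report mean/max round-trip latency."""
        padding = "x" * size_bytes
        latencies = []
        for i in range(count):
            msg = json.dumps({"seq": i, "sent": time.perf_counter(), "pad": padding})
            client.sendto(msg.encode(), server.getsockname())
            echoed = json.loads(client.recvfrom(65536)[0])
            latencies.append(time.perf_counter() - echoed["sent"])
            time.sleep(1.0 / frequency_hz)
        return statistics.mean(latencies), max(latencies)

    print(measure())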
13:35 - 14:00
Industrial Talk: Xiaosong Lou (Blackline). [Talk] — [Slide]
Concurrent User Modeling - An Alternative Approach to Classic Queuing Theory.
Abstract: Concurrent User is a random variable and one of the key performance metrics in a system. Many performance issues that can only be exposed under load are related to the increased number of concurrent users. Without properly simulating the distribution of concurrent users, a load test will not expose the system to realistic production stress levels. Traditional analysis on system concurrency is based on the state probabilities of the corresponding queuing models. Unfortunately, we have not seen this approach as a common practice, partly due to its constraints and limitations. We propose an analytical alternative to the classic queuing theory for modeling Concurrent User. This model helps us determine whether the simulated workload is a proper representation of the expected production scenario. Unlike the queuing model that can make predictions, our proposal proved to be more useful and convenient in verifying the results of load tests.
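The talk proposes its own analytical model; purely as background (and not the speaker's method), the sketch below shows the classic baseline: by Little's law the mean number of concurrent users equals the session arrival rate times the mean session duration, and with Poisson arrivals the concurrency observed at a random instant is Poisson distributed around that mean.

    # Classic baseline for concurrent-user counts, shown only as context; it is
    # not the alternative model proposed in the talk. Rates are assumed values.
    import math

    arrival_rate = 0.5        # user sessions started per second (assumed)
    mean_session_s = 60.0     # average session duration in seconds (assumed)

    mean_concurrent = arrival_rate * mean_session_s   # Little's law: L = lambda * W

    def poisson_pmf(k, mean):
        return math.exp(-mean) * mean ** k / math.factorial(k)

    # Probability of seeing more than a given concurrency level at a random instant.
    threshold = 45
    p_exceed = 1.0 - sum(poisson_pmf(k, mean_concurrent) for k in range(threshold + 1))
    print(f"mean concurrent users = {mean_concurrent:.0f}, "
          f"P(concurrency > {threshold}) = {p_exceed:.4f}")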
14:00 - 14:55
Keynote: Kishor Trivedi (Duke University). [Talk] — [Slide]
Accelerated Life-testing Applied to Software Systems.
Abstract: An important metric of software reliability is the mean-time-to-failure (MTTF) of the software system. To estimate this metric, a straightforward method is first collecting a sufficient number of samples of software inter-failure times and then using this sequence of inter-failure times to statistically infer the estimate of its mean and a confidence interval. However, this process is hindered by the fact that the samples of software inter-failures are time-consuming to collect, especially for highly reliable software systems. Furthermore, large and complex software systems are known to contain a significant number of elusive bugs known as Mandelbugs. One sub-type of Mandelbugs is known as aging-related bugs. Another sub-type is known as concurrency bugs. Mandelbugs are triggered not just by the inputs or the workload presented to the software but also by the execution environment of the software, such as the operating system and other concurrently running software. Many different factors in the execution environment affect this type of failure occurrence. Accelerated-life testing (ALT) is a known systematic method that has been extensively applied in speeding up the experimental estimation of MTTFs of high-reliability hardware systems. The application of ALT in the context of software systems is the subject of this talk.
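Purely as background for the estimation step mentioned in the abstract (and not the ALT method itself), the sketch below computes a point estimate and a confidence interval for MTTF under the common assumption of exponentially distributed inter-failure times, using SciPy's chi-square quantile function.

    # Background sketch: MTTF point estimate and confidence interval assuming
    # exponentially distributed inter-failure times; not the ALT method itself.
    from scipy.stats import chi2

    def mttf_with_ci(inter_failure_times, confidence=0.95):
        n = len(inter_failure_times)
        total = sum(inter_failure_times)
        mttf = total / n                   # sample mean as the point estimate
        alpha = 1.0 - confidence
        # 2 * total / MTTF follows a chi-square distribution with 2n degrees of freedom.
        lower = 2.0 * total / chi2.ppf(1.0 - alpha / 2.0, 2 * n)
        upper = 2.0 * total / chi2.ppf(alpha / 2.0, 2 * n)
        return mttf, (lower, upper)

    # Hypothetical inter-failure times in hours, e.g. from an accelerated test run.
    samples = [310.0, 95.0, 540.0, 210.0, 150.0, 420.0, 60.0, 380.0]
    print(mttf_with_ci(samples))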
14:55 - 15:00
Conclusion.
Software systems (e.g., smartphone apps, desktop applications, telecommunication infrastructures, cloud and enterprise systems, etc.) have strict requirements on software performance. Failure to meet these requirements may cause business losses, customer defection, brand damage and other serious consequences. In addition to conventional functional testing, the performance of these systems must be verified through load testing or benchmarking to ensure quality service.

Load testing and benchmarking software systems are difficult tasks, which require a great understanding of the system under test and customer behavior. Practitioners face many challenges such as tooling (choosing and implementing the testing tools), environments (software and hardware setup) and time (limited time to design, test, and analyze). This one-day workshop brings together software testing researchers, practitioners and tool developers to discuss the challenges and opportunities of conducting research on load testing and benchmarking software systems.

Accepted papers will be published in the Proceedings. Submissions can be research papers, position papers, case studies or experience reports addressing issues including but not limited to the following: