
We anticipate that the experimental results and performance analysis will have implications for various storage systems. We also perform a comparison study of the NVMe SSD with a SATA SSD. We analyze the performance of the NVMe SSD in terms of different performance metrics with microbenchmark and database workloads. The maximum throughput is 2.5 GB/s and 800 MB/s for reading and writing 4 KB, respectively. This paper provides the results of an empirical evaluation and analysis of the performance of a recent NVM Express solid-state drive (NVMe SSD) developed by Samsung Electronics, a flash-based PCIe-attached SSD built to follow the NVMe specification. While the technology is commercially viable, it is important to consider the performance of NVM devices with the NVMe specification under different I/O configurations and to analyze workloads on the storage to achieve better performance. Industry and academia created the NVMe specification to elicit the highest performance from NVM devices. Results show that Lightswap reduces page fault handling latency by 3-5 times and improves the throughput of memcached by more than 40% compared with state-of-the-art swapping systems. Emerging non-volatile memory (NVM) technology with high throughput and scalability has attracted considerable interest in cloud and enterprise storage systems.
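To put the reported 2.5 GB/s and 800 MB/s figures in terms of request rates, the following minimal sketch converts throughput to I/O operations per second. It assumes that "4 KB" refers to a request size of 4096 bytes and that GB and MB are decimal units; neither assumption is confirmed by the text, and the function name `iops` is purely illustrative.

```python
def iops(throughput_bytes_per_s, request_bytes):
    """Convert a throughput figure into requests per second."""
    return throughput_bytes_per_s / request_bytes

# Assumption: each request transfers 4 KB = 4096 bytes.
REQUEST_BYTES = 4 * 1024

# Assumption: GB and MB are decimal (10**9 and 10**6 bytes).
read_iops = iops(2.5e9, REQUEST_BYTES)    # roughly 610 thousand requests/s
write_iops = iops(800e6, REQUEST_BYTES)   # roughly 195 thousand requests/s

if __name__ == "__main__":
    print(f"read:  {read_iops:,.0f} IOPS")
    print(f"write: {write_iops:,.0f} IOPS")
```

Under these assumptions, sustaining the stated read throughput would require on the order of six hundred thousand 4 KB requests per second, which illustrates why per-request software overhead matters for such devices.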

We implement Lightswap in our production-level system and evaluate it with YCSB workloads running on memcached. Finally, we propose a try-catch framework in Lightswap to deal with paging errors, which are exacerbated by the scaling of process technology. Second, we co-design Lightswap with lightweight threads (LWT) to improve system throughput and make it transparent to user applications.

First, to avoid kernel involvement, a novel page fault handling mechanism is proposed that handles page faults in user space and further eliminates the heavy I/O stack with the help of user-space I/O drivers. In this paper, we redesign the swapping system and propose Lightswap, a high-performance user-space swapping scheme that supports paging with both local SSDs and remote memories. However, the heavy I/O stack makes traditional kernel-based swapping suffer from several critical performance issues. Conventional swapping can enlarge the memory capacity by paging out inactive pages to disks. Memory-intensive applications, such as in-memory databases, caching systems, and key-value stores, increasingly demand larger main memory to fit their working sets. This result implies that our virtual memory subsystem for mmap can effectively extend the main memory with fast storage devices. The system with insufficient memory and our mmio achieves 92% of the performance of the resource-rich system. We also compare our system to a system that has enough memory to keep all data in the main memory. Experimental results show that our optimized mmio has up to 7x better performance than the original mmio.
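The idea of paging out inactive pages to a slower tier, mentioned above for conventional swapping, can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses a plain least-recently-used policy and an in-memory dictionary as a stand-in for the disk or remote memory, and the class name `ToySwap` is hypothetical. It is not the Lightswap design and does not model the kernel's actual replacement policy.

```python
from collections import OrderedDict

class ToySwap:
    """Toy pager: keeps at most `capacity` pages resident and evicts
    the least recently used page to a backing store (assumed LRU policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # page id -> data, least recent first
        self.backing = {}              # stand-in for disk or remote memory
        self.faults = 0                # counts every miss, including first touch

    def access(self, page, data=None):
        if page in self.resident:
            # Hit: mark the page as most recently used.
            self.resident.move_to_end(page)
        else:
            # Miss ("page fault"): page in from the backing store if present.
            self.faults += 1
            value = self.backing.pop(page, None)
            if len(self.resident) >= self.capacity:
                # Page out the least recently used page to make room.
                victim, victim_data = self.resident.popitem(last=False)
                self.backing[victim] = victim_data
            self.resident[page] = value
        if data is not None:
            self.resident[page] = data  # a store to the page
        return self.resident[page]

if __name__ == "__main__":
    swap = ToySwap(capacity=2)
    swap.access(1, "a")
    swap.access(2, "b")
    swap.access(1)          # page 1 becomes most recently used
    swap.access(3, "c")     # evicts page 2
    print(swap.access(2), swap.faults)
```

In this model every miss costs a trip to the backing store, so the fault-handling path dominates once the working set exceeds `capacity`; that is the path the abstract says is slowed by the heavy I/O stack.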

We modify the Linux kernel to implement our optimization techniques and evaluate our prototyped system with low-latency storage devices. To reduce the overheads and fully exploit the fast storage devices, we present several optimization techniques.
Throughout our investigation, we find that the overhead of the Linux virtual memory subsystem, negligible on HDDs, prevents applications from using the full performance of fast storage devices. In this article, we examine the Linux virtual memory subsystem and the mmio path to determine the influence of fast storage on the existing Linux kernel. However, this expectation is only partly met when fast storage devices are used, since the virtual memory subsystem does not reflect the performance characteristics of those devices. It is widely expected that better storage devices will lead to better performance. Generally, the performance of storage devices has a direct impact on the performance of mmio. When mmio is used, hot data tend to reside in the main memory and cold data are located in storage devices such as HDDs and SSDs; data placement in the memory hierarchy depends on the virtual memory subsystem of the operating system.

As more data are located in the main memory, the performance of applications can be enhanced owing to the effect of a large cache. The number of applications that use mmio is increasing because memory semantics can provide better performance than file semantics (i.e., read/write). The mapping allows applications to access data from files through memory semantics (i.e., load/store), and it provides ease of programming. In modern operating systems, memory-mapped I/O (mmio) is an important access method that maps a file or file-like resource to a region of memory.
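The difference between file semantics and memory semantics described above can be sketched with Python's standard `mmap` module. This is a minimal illustration and not the authors' kernel work; the helper names and the temporary file are assumptions made for the example.

```python
import mmap
import os
import tempfile

def read_with_file_semantics(path, offset, length):
    """Read bytes through explicit seek/read calls (file semantics)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def read_with_memory_semantics(path, offset, length):
    """Read the same bytes by slicing a memory mapping (load-style access)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]

def store_with_memory_semantics(path, offset, data):
    """Update the file by assigning into the mapping (store-style access)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[offset:offset + len(data)] = data
            m.flush()  # write the modified pages back to the file

def demo():
    # Hypothetical scratch file used only for this illustration.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello mmio world")
        store_with_memory_semantics(path, 6, b"MMIO")
        return (read_with_file_semantics(path, 6, 4),
                read_with_memory_semantics(path, 0, 5))
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(demo())
```

With the mapping, the application indexes the file like a byte array and the operating system decides which pages stay in memory, which is the data placement role of the virtual memory subsystem.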
