设为首页  
联系我们  
加入收藏  
网页制作 冲浪宝典 图形图像 操作系统 软件教学 编程开发 认证考试 安全技术 站长专区 文学驿站 娱乐天地 游戏天地 办公软件
文章搜索
您的位置: 首页 >> 文章首页 >> 编程开发 >> Visual C++ >> Server Strategies -- Programming High Performance WinSock Server
精品推荐
Visual C++点击TOP10
·用WINSOCK实现聊天室的VC++程序设计
·利用mfc编写activex控件
·用vc实现生产者消费者问题
·DirectX8.0
·C/C++ 大量经典编程书籍下载
·VC快捷键大全
·CreateFileMapping的MSDN翻译和使用心得
·Windows环境下的麦克风录音系统
·挂钩Windows API
·如何开发OPC Server
编程开发点击TOP10
·数字小键盘指法练习
·用C语言编通讯录程序(初学者级别的)
·ASP.NET 程序中常用的三十三种代码
·我写的Java学生成绩管理系统源代码
·CHK文件恢复工具
·java笔试题
·Modem 常用AT指令集
·异常java.sql.SQLException: Io exception:The Network Adapter could not establish connection
·单片机模拟I2C总线及24C02(I2C EEPROM)读写实例(源代码)
·C++经典电子书下载
精选专题

Server Strategies -- Programming High Performance WinSock Server

作者: 来源:网络文章 时间:2005-12-13 17:19:54

Server Strategies

In this section, we'll take a look at several strategies for handling resources depending on the nature of the server. Also, the more control you have over the design of the client and server allows you to design both accordingly to avoid the limitations and bottlenecks discussed previously. Again, there is no foolproof method that will work 100 percent in all situations. Servers can be divided roughly into two categories: high throughput and high connections. A high throughput server is more concerned with pushing data on a small number of connections. Of course, the meaning of the phrase “small number of connections” is relative to the amount of resources available on the server. A high connection server is more concerned with handling a large number of connections and is not attempting to push large data amounts.

In the next two sections, we'll discuss both high throughput and high connection server strategies. After that, we'll look at performance numbers gathered from the server samples provided on the companion CD.

High Throughput

An FTP server is an example of a high throughput server. It is concerned with delivering bulk content. In this case, the server is concerned with processing each connection to minimize the amount of time required to transfer the data. To do so, the server must limit the number of concurrent connections because the greater the simultaneous connections, the lower the throughput will be on each connection. An example would be an FTP server that refuses a connection because it is too busy.

The goal for this strategy is I/O. The server should keep enough receives or sends posted to maximize throughput. Because each overlapped I/O requires memory to be locked as well as a small portion of non-paged pool for each IRP associated with the operation, it is important to limit I/O to a small set of connections. It is possible for the server to continually accept connections and have a relatively high number of established connections, but I/O must be limited to a smaller set.

In this case, the server may post a number of sends or receives on a subset of the established clients. For example, the server could handle client connections in a first-in, first-out manner and post a number of overlapped sends and/or receives on the first 100 connections. After those clients are handled, the server can move on the next set of clients in the queue. In this model, the number of outstanding send and receive operations are limited to a smaller set of connections. This prevents the server from blindly posting I/O operations on every connection, which could quickly exhaust the server's resources.

The server should take care to monitor the number of operations outstanding on each connection so it may prevent malicious clients from attacking it. For example, a server designed to receive data from a client, process it, and send some sort of response should keep track of how many sends are outstanding. If the client is simply flooding the server with data but not posting any receives, the server may end up posting dozens of overlapped sends that will never complete. In this case, once the server finds that there are too many outstanding operations, it can close the connection.

Maximizing Connections

Maximizing the number of concurrent client connections is the more difficult of the two strategies. Handling the I/O on each connection becomes difficult. A server cannot simply post one or more sends or receives on each connection because the amount of memory (both in terms of locked pages and non-paged pool) is great. In this scenario, the server is interested in handling many connections at the eXPense of throughput on each connection. An example of this would be an instant messenger server. The server would handle many thousands of connections but would need to send or receive only a small number of bytes at a time.

For this strategy, the server does not necessarily want to post an overlapped receive on each connection because this would involve locking many pages for each of the receive buffers. Instead, the server can post an overlapped zero-byte receive. Once the receive completes, the server would perform a non-blocking receive until WSAEWOUDLBLOCK is returned. This allows the server to immediately receive all buffered data received on that connection. Because this model is geared toward clients that send data intermittently, it minimizes the number of locked pages but still allows processing of data on each connection.

Performance Numbers

This section covers performance numbers from the different servers provided in Chapters 5 and 6. The various servers tested are those using blocking sockets, non-blocking with select, WSAAsynCSelect, WSAEventSelect, overlapped I/O with events, and overlapped I/O with completion ports. Table 6-3 summarizes the results of these tests. For each I/O model, there are a couple of entries. The first entry is where 7000 connections were attempted from three clients. For all of these tests, the server is an echo server. That is, for each connection that is accepted, data is received and sent back to the client. The first entry for each I/O model represents a high-throughput server where the client sends data as fast as possible to the server. Each of the sample servers does not limit the number of concurrent connections. The remaining entries represent the connections when the clients limit the rate in which they send data so as to not overrun the bandwidth available on the network. The second entry for each I/O model represents 12,000 connections from the client, which is rate limiting the data sent. If the server was able to handle the majority of the 12,000 connections, then the third entry is the maximum number of clients the server was able to handle.

As we mentioned, the servers used are those provided from Chapter 5 except for the I/O completion port server, which is a slightly modified version of the Chapter 5 completion port server except that it limits the number of outstanding operations. This completion port server limits the number of outstanding send operations to 200 and posts just a single receive on each client connection. The client used in this test is the I/O completion port client from Chapter 5. Connections were established in blocks of 1000 clients by specifying the ‘-c 1000' option on the client. The two x86-based clients initiated a maximum of 12,000 connections and the Itanium system was used to establish the remaining clients in blocks of 4000. In the tests that were rate limited, each client block was limited to 200,000 bytes per second (using the ‘-r 200000' switch). So the average send throughput for that entire block of clients was limited to 200,000 bytes per second (not that each client was limited to this amount).

Table 6-3 I/O Method Performance Comparison

I/O Model

Attempted/Connected

Memory Used (KB)

Non-Paged Pool

CPU Usage

Threads

Throughput (Send/ Receive Bytes Per Second)

Blocking

7000/ 1008

25,632

36,121

10–60%

2016

2,198,148/ 2,198,148

12,000/ 1008

25,408

36,352

5– 40%

2016

404,227/ 402,227

Non- blocking

7000/ 4011

4208

135,123

95–100%*

1

0/0

12,000/ 5779

5224

156,260

95–100%*

1

0/0

WSA- Async Select

7000/ 1956

3640

38,246

75–85%

3

1,610,204/ 1,637,819

12,000/ 4077

4884

42,992

90–100%

3

652,902/ 652,902

WSA- Event Select

7000/ 6999

10,502

36,402

65–85%

113

4,921,350/ 5,186,297

12,000/ 11,080

19,214

39,040

50–60%

192

3,217,493/ 3,217,493

46,000/ 45,933

37,392

121,624

80–90%

791

3,851,059/ 3,851,059

Over- lapped (events)

7000/ 5558

21,844

34,944

65–85%

66

5,024,723/ 4,095,644

12,000/12,000

60,576

48,060

35–45%

195

1,803,878/ 1,803,878

49,000/48,997

241,208

155,480

85–95%

792

3,865,152/ 3,834,511

Over- lapped (comple- tion port)

7000/ 7000

36,160

31,128

40–50%

2

6,282,473/ 3,893,507

12,000/12,000

59,256

38,862

40–50%

2

5,027,914/ 5,027,095

50,000/49,997

242,272

148,192

55–65%

2

4,326,946/ 4,326,496

The server was a Pentium 4 1.7 GHz Xeon with 768 MB memory. Clients were established from three machines: Pentium 2 233MHz with 128 MB memory, Pentium 2 350 MHz with 128 MB memory, and an Itanium 733 MHz with 1 GB memory. The test network was a 100 MB isolated hub. All of the machines tested had Windows XP installed.

The blocking model is the poorest performing of all the models. The blocking server spawns two threads for each client connection: one for sending data and one for receiving it. In both test cases, the server was unable to handle a fraction of the connections because it hit a system resource limit on creating threads. Thus the CreateThread call was failing with ERROR_NOT_ENOUGH_MEMORY. The remaining client connections failed with WSAECONNREFUSED.

The non-blocking model faired only somewhat better. It was able to accept more connections but ran into a CPU limitation. The non-blocking server puts all the connected sockets into an FD_SET, which is passed into select. When select completes, the server uses the FD_ISSET macro to search to determine if that socket is signaled. This becomes inefficient because the number of connections increases. Just to determine if a socket is signaled, a linear search through the array is required! To partially alleviate this problem, the server can be redesigned so that it iteratively steps through the FD_SETs returned from select. The only issue is that the server then needs to be able to quickly find the SOCKET_INFO strUCture associated with that socket handle. In this case, the server can provide a more sophisticated cataloging mechanism, such as a hash tree, which allows quicker lookups. Also note that the non-paged pool usage is extremely high. This is because both AFD and TCP are buffering data on the client connections because the server is unable to read the data fast enough (as indicated by the zero-byte throughput) as indicated by the high CPU usage.

The WSAAsyncSelect model is acceptable for a small number of clients but does not scale well because the overhead of the message loop quickly bogs down its capability to process messages fast enough. In both tests, the server is able to handle only about a third of the connections made. The clients receive many WSAECONNREFUSED errors indicating that the server cannot handle the FD_ACCEPT messages quickly enough so the listen backlog is not exhausted. However, even for those connections accepted, you will notice that the average throughput is rather low (even in the case of the rate limited clients).

Surprisingly, the WSAEventSelect model performed very well. In all the tests, the server was, for the most part, able to handle all the incoming connections while obtaining very good data throughput. The drawback to this model is the overhead required to manage the thread pool for new connections. Because each thread can wait on only 64 events, when new connections are established new threads have to be created to handle them. Also, in the last test case in which more than 45,000 connections were established, the machine became very sluggish. This was most likely due to the great number of threads created to service the many connections. The overhead for switching between the 791 threads becomes significant. The server reached a point at which it was unable to accept any more connections due to numerous WSAENOBUFS errors. In addition, the client application reached its limitation and was unable to sustain the already established connections (we'll discuss this in detail later).

The overlapped I/O with events model is similar to the WSAEventSelect in terms of scalability. Both models rely on thread pools for event notification, and both reach a limit at which the thread switching overhead becomes a factor in how well it handles client communication. The performance numbers for this model almost exactly mirror that of WSAEventSelect. It does surprisingly well until the number of threads increases.

The last entry is for overlapped I/O with completion ports, which is the best performing of all the I/O models. The memory usage (both user and non-paged pool) and accepted clients are similar to both the overlapped I/O with events and WSAEventSelect model. However, the real difference is in CPU usage. The completion port model used only around 60 percent of the CPU, but the other two models required substantially more horsepower to maintain the same number of connections. Another significant difference is that the completion port model also allowed for slightly better throughput.

While carrying out these tests, it became apparent that there was a limitation introduced due to the nature of the data interaction between client and server. The server is designed to be an echo server such that all data received from the client was sent back. Also, each client continually sends data (even if it's at a lower rate) to the server. This results in data always pending on the server's socket (either in the TCP buffers or in AFD's per-socket buffers, which are all non-paged pool). For the three well-performing models, only a single receive is performed at a time; however, this means that for the majority of the time, there is still data pending. It is possible to modify the server to perform a non-blocking receive once data is indicated on the connection. This would drain the data buffered on the machine. The drawback to this approach in this instance is that the client is constantly sending and it is possible that the non-blocking receive could return a great deal of data, which would lead to starvation of other connections (as the thread or completion thread would not be able to handle other events or completion notices). Typically, calling a non-blocking receive until WSAEWOULDBLOCK works on connections where data is transmitted in intervals and not in a continuous manner.

From these performance numbers it is easily deduced that WSAEventSelect and overlapped I/O offer the best performance. For the two event based models, setting up a thread pool for handling event notification is cumbersome but still allows for Excellent performance for a moderately stressed server. Once the connections increase and the number of threads increases, then scalability becomes an issue as more CPU is consumed for context switching between threads. The completion port model still offers the ultimate scalability because CPU usage is less of a factor as the number of clients increases.


Server Strategies -- Programming High Performance WinSock Server 相关文章:
Server Strategies -- Programming High Performance WinSock Server 相关软件:
特别声明:本站除部分特别声明禁止转载的专稿外的其他文章可以自由转载,但请务必注明出处和原始作者。文章版权归文章原始作者所有。对于被本站转载文章的个人和网站,我们表示深深的谢意。如果本站转载的文章有版权问题请联系编辑人员,我们尽快予以更正。
转载请注明来源:http://www.xgdown.com