When writing a crawler to scrape information from websites, you will find that some sites enforce security policies, so requests made through WebClient fail with the error: "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host."

Here is the code I started with:
```csharp
// Requires: using System; using System.Net; using System.Text; using System.Threading.Tasks;
public static Task<string> getHtmlByUrl(string url)
{
    // Bridge WebClient's event-based async pattern to a Task
    var taskCompletionSource = new TaskCompletionSource<string>();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    var webClient = new WebClient(); // note: never disposed in this version
    webClient.Encoding = Encoding.UTF8;
    webClient.DownloadStringCompleted += (s, e) =>
    {
        if (e.Error != null)
            taskCompletionSource.TrySetException(e.Error);
        else if (e.Cancelled)
            taskCompletionSource.TrySetCanceled();
        else
            taskCompletionSource.TrySetResult(e.Result);
    };
    webClient.DownloadStringAsync(new Uri(url));
    return taskCompletionSource.Task;
}
```
If the target server imposes no request restrictions, the code above runs fine. Some servers, however, do impose them. The most obvious workaround is to forge some headers, which gives the following code:
```csharp
public static Task<string> getHtmlByUrl(string url)
{
    var taskCompletionSource = new TaskCompletionSource<string>();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    var webClient = new WebClient();
    webClient.Encoding = Encoding.UTF8;
    // Forged headers (note the HTTP header is spelled "Referer")
    webClient.Headers["Referer"] = url;
    webClient.Headers["Content-Type"] = "application/x-www-form-urlencoded";
    webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36";
    webClient.DownloadStringCompleted += (s, e) =>
    {
        if (e.Error != null)
            taskCompletionSource.TrySetException(e.Error);
        else if (e.Cancelled)
            taskCompletionSource.TrySetCanceled();
        else
            taskCompletionSource.TrySetResult(e.Result);
    };
    webClient.DownloadStringAsync(new Uri(url));
    return taskCompletionSource.Task;
}
```
That still wasn't enough. After reading this article: https://blog.csdn.net/zikizhh/article/details/104531875/, I tried one of its suggestions: dispose of the WebClient once the request finishes. So here I wrap the WebClient in a using block:
```csharp
public static Task<string> getHtmlByUrl(string url)
{
    // Bridge WebClient's event-based async pattern to a Task
    var taskCompletionSource = new TaskCompletionSource<string>();
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    var webClient = new WebClient();
    webClient.Encoding = Encoding.UTF8;
    // Forged headers
    webClient.Headers["Referer"] = url;
    webClient.Headers["Content-Type"] = "application/x-www-form-urlencoded";
    webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36";
    webClient.DownloadStringCompleted += (s, e) =>
    {
        // Dispose inside the completion handler: wrapping the whole method
        // body in using would dispose the client when the method returns,
        // racing with the still-pending async download
        using (webClient)
        {
            if (e.Error != null)
                taskCompletionSource.TrySetException(e.Error);
            else if (e.Cancelled)
                taskCompletionSource.TrySetCanceled();
            else
                taskCompletionSource.TrySetResult(e.Result);
        }
    };
    webClient.DownloadStringAsync(new Uri(url));
    return taskCompletionSource.Task;
}
```
This still doesn't eliminate the error entirely, but the probability of hitting it dropped dramatically.
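Another mitigation often suggested for this particular error (not part of my original code, so treat it as a sketch) is to disable HTTP keep-alive, so each request opens a fresh connection instead of reusing one the server may have silently dropped. With WebClient that means overriding GetWebRequest in a subclass; the class name and the CreateRequest helper below are my own:

```csharp
using System;
using System.Net;

// Sketch: a WebClient subclass that turns off HTTP keep-alive for
// every request it issues, avoiding reuse of connections the server
// may already have closed.
public class NoKeepAliveWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.KeepAlive = false; // one connection per request
        }
        return request;
    }

    // Exposes the configured request so it can be inspected; purely for demonstration
    public WebRequest CreateRequest(Uri address) => GetWebRequest(address);
}
```

You would then instantiate NoKeepAliveWebClient instead of WebClient in the method above; everything else stays the same.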
If you like, you can also quietly randomize the User-Agent, which may help further; feel free to randomize a few more of the digits:
```csharp
// Here I sprinkle some random digits into the browser version numbers.
// Use a single Random instance: two created back to back can share a
// time-based seed on older .NET and produce identical values.
var random = new Random();
webClient.Headers["User-Agent"] =
    $"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/5{random.Next(0, 10)}7.36 " +
    $"(KHTML, like Gecko) Chrome/11{random.Next(0, 10)}.0.0.0 Safari/537.36";
```
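Going one step further than tweaking digits, you could pick a whole User-Agent string at random from a small pool of realistic ones. This is my own variant, not from the original code; the helper name and the pool contents are illustrative:

```csharp
using System;

// Sketch: pick a complete, realistic User-Agent string at random
// instead of perturbing digits inside a single one.
public static class RandomUserAgent
{
    private static readonly Random Rng = new Random();

    private static readonly string[] Pool =
    {
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/118.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Safari/605.1.15",
    };

    public static string Next()
    {
        return Pool[Rng.Next(Pool.Length)];
    }
}
```

Usage would simply be `webClient.Headers["User-Agent"] = RandomUserAgent.Next();` before issuing the request.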