WebClient crawler error: "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host"

Published 2023-10-07 01:51:52 Author: 凌晨10点13分

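When writing a crawler to scrape information from the web, some sites enforce security policies, and a request made through WebClient then fails with the error: "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host."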

Here is the code I wrote at first:

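        public static Task<string> getHtmlByUrl(string url)
        {
            var taskCompletitionSource = new TaskCompletionSource<string>(); // wraps WebClient's event-based async API in a Task
                ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
                // webClient is not declared in this snippet; presumably a shared WebClient field defined elsewhere
                webClient.Encoding = Encoding.UTF8;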
                webClient.DownloadStringCompleted += (s, e) =>
                {
                    if (e.Error != null)
                    {
                        taskCompletitionSource.TrySetException(e.Error);
                    }
                    else if (e.Cancelled)
                    {
                        taskCompletitionSource.TrySetCanceled();
                    }
                    else
                    {
                        taskCompletitionSource.TrySetResult(e.Result);
                    }
                };
                webClient.DownloadStringAsync(new Uri(url));
                return taskCompletitionSource.Task;
        }

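If the target server places no restrictions on requests, the code above runs fine. Some servers do restrict requests, though. The first fix that comes to mind is to forge a few headers, which gives the following code: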

      public static Task<string> getHtmlByUrl(string url)
        {
            var taskCompletitionSource = new TaskCompletionSource<string>();
                ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
                webClient.Encoding = Encoding.UTF8;
                webClient.Headers["Referer"] = url; // forged header (the standard name is "Referer", not "refer")
                webClient.Headers["Content-Type"] = "application/x-www-form-urlencoded"; // forged header (standard name is "Content-Type")
                webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"; // forged header
                webClient.DownloadStringCompleted += (s, e) =>
                {
                    if (e.Error != null)
                    {
                        taskCompletitionSource.TrySetException(e.Error);
                    }
                    else if (e.Cancelled)
                    {
                        taskCompletitionSource.TrySetCanceled();
                    }
                    else
                    {
                        taskCompletitionSource.TrySetResult(e.Result);
                    }
                };
                webClient.DownloadStringAsync(new Uri(url));
                return taskCompletitionSource.Task;
        }

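That still did not work. After reading this article: https://blog.csdn.net/zikizhh/article/details/104531875/, I tried one of its suggestions, which is to release the WebClient once the request has finished. Here I wrap the WebClient in a using block: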

        // async/await keeps the using block open until the download completes;
        // returning the Task directly would dispose the WebClient right after the request starts.
        public static async Task<string> getHtmlByUrl(string url)
        {
            var taskCompletitionSource = new TaskCompletionSource<string>(); // wraps WebClient's event-based async API in a Task
            using (var webClient = new WebClient()) // added using
            {
                ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
                webClient.Encoding = Encoding.UTF8;
                webClient.Headers["Referer"] = url; // forged header (the standard name is "Referer", not "refer")
                webClient.Headers["Content-Type"] = "application/x-www-form-urlencoded"; // forged header
                webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"; // forged header
                webClient.DownloadStringCompleted += (s, e) =>
                {
                    if (e.Error != null)
                    {
                        taskCompletitionSource.TrySetException(e.Error);
                    }
                    else if (e.Cancelled)
                    {
                        taskCompletitionSource.TrySetCanceled();
                    }
                    else
                    {
                        taskCompletitionSource.TrySetResult(e.Result);
                    }
                };
                webClient.DownloadStringAsync(new Uri(url));
                return await taskCompletitionSource.Task;
            }
        }

However, this still does not solve the problem completely, although the same error now shows up far less often.
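
As a side note on the code itself: since .NET Framework 4.5, WebClient has a built-in DownloadStringTaskAsync method, so the manual event-to-Task wrapping above is not strictly required. Below is a minimal sketch under that assumption. The class name HtmlFetcher and the method name GetHtmlByUrlAsync are hypothetical and do not come from the original code. It only shortens the code and is not expected to change how often the server drops the connection.

```csharp
using System;
using System.Net;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper class, named here for illustration only.
public static class HtmlFetcher
{
    public static async Task<string> GetHtmlByUrlAsync(string url)
    {
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

        // One WebClient per request, disposed only after the awaited download finishes.
        using (var webClient = new WebClient())
        {
            webClient.Encoding = Encoding.UTF8;
            webClient.Headers["Referer"] = url; // forged header
            webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"; // forged header

            // Built-in Task-based download; no DownloadStringCompleted handler needed.
            return await webClient.DownloadStringTaskAsync(new Uri(url));
        }
    }
}
```

A caller would use it as `string html = await HtmlFetcher.GetHtmlByUrlAsync(url);`, with any download error surfacing as an exception from the await.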

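If possible, also randomize the forged User-Agent on each request, which may work even better. If you don't mind the extra effort, randomize several of the numbers in it.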
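
One way to randomize the numbers in the User-Agent is sketched below. This is only an illustration: the class name RandomUserAgent, the method name Next, and the chosen number ranges are all assumptions, not taken from the original code, and there is no guarantee that any particular site accepts the generated strings.

```csharp
using System;

// Hypothetical helper, named here for illustration only.
public static class RandomUserAgent
{
    private static readonly Random Rng = new Random();

    // Returns a Chrome-style User-Agent with randomized version numbers.
    public static string Next()
    {
        int major, build, patch;
        lock (Rng) // System.Random is not thread-safe
        {
            major = Rng.Next(100, 118);   // assumed range: 100..117
            build = Rng.Next(1000, 6000); // assumed range: 1000..5999
            patch = Rng.Next(0, 200);     // assumed range: 0..199
        }

        return "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
               $"(KHTML, like Gecko) Chrome/{major}.0.{build}.{patch} Safari/537.36";
    }
}
```

In the method above, the fixed string would then be replaced with `webClient.Headers["User-Agent"] = RandomUserAgent.Next();`.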